8 procs -> 1850 s -> 14800 s-monoproc
16 procs -> 966 s -> 15456 s-monoproc
32 procs -> 554 s -> 17728 s-monoproc
64 procs -> 312 s -> 19968 s-monoproc
128 procs -> 223 s -> 28544 s-monoproc
256 procs -> 211 s -> 54016 s-monoproc
512 procs -> 97 s -> 49664 s-monoproc
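The "s-monoproc" column reads as processor count times loop wall time (for example 8 x 1850 = 14800). The sketch below recomputes it and adds a parallel efficiency relative to the 8-processor run. The wall times are the `real` field of the `loop` timer in each run's log; the names `RUNS`, `cpu_seconds` and `efficiency` are illustrative and not part of the benchmarked code.

```python
# (procs, loop wall time in seconds) read from the `loop` timer of each run's log.
RUNS = [(8, 1850), (16, 966), (32, 554), (64, 312), (128, 223), (256, 211), (512, 97)]


def cpu_seconds(procs, wall_s):
    """Single-processor-equivalent cost ("s-monoproc"): procs * wall time."""
    return procs * wall_s


def efficiency(procs, wall_s, base=RUNS[0]):
    """Parallel efficiency relative to a baseline run (default: 8 procs)."""
    base_procs, base_wall = base
    return cpu_seconds(base_procs, base_wall) / cpu_seconds(procs, wall_s)


if __name__ == "__main__":
    for procs, wall_s in RUNS:
        print(f"{procs:4d} procs  {wall_s:5d} s  "
              f"{cpu_seconds(procs, wall_s):6d} s-monoproc  "
              f"efficiency {efficiency(procs, wall_s):.2f}")
```

An efficiency of 1.0 means the run costs the same total processor time as the 8-processor baseline; lower values show how much of the added hardware is lost to overhead.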
---------------------------------------------
512 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:1:47,user:0:1:1,sys:0:0:40)[clk:10758]
  loop : (real:0:1:37,user:0:0:59,sys:0:0:37)[clk:9721] (100 calls X 9.721000e+01)
FFT timer : (real:0:1:38,user:0:0:52,sys:0:0:39)[clk:9841]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:1:38,user:0:0:52,sys:0:0:39)[clk:9841] (509 calls X 1.933399e+01)
    FFT (out of place) only : (real:0:0:12,user:0:0:11,sys:0:0:0)[clk:1219] (2545 calls X 4.789784e-01)
    FFTW (out of place) Transposition only : (real:0:1:25,user:0:0:40,sys:0:0:38)[clk:8578] (2036 calls X 4.213163e+00)
azur::array timer root : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:487]
  view = expr : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:487] (1620 calls X 3.006173e-01)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (2 calls X 0.000000e+00)
    fftw4<dbl> /= int : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:41] (204 calls X 2.009804e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:80] (606 calls X 1.320132e-01)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:85] (202 calls X 4.207921e-01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:45] (103 calls X 4.368932e-01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:49] (99 calls X 4.949495e-01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:38] (99 calls X 3.838384e-01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:52] (99 calls X 5.252525e-01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:1,sys:0:0:0)[clk:95] (198 calls X 4.797980e-01)
cubby::field timer root : (real:0:1:28,user:0:0:43,sys:0:0:38)[clk:8851]
  scalar::transpose_blocks_when_received : (real:0:1:25,user:0:0:40,sys:0:0:38)[clk:8574] (3054 calls X 2.807466e+00)
  scalar::copy_transposed : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:117] (781824 calls X 1.496500e-04)
  vector::in_place_curl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:49] (202 calls X 2.425743e-01)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (66 calls X 6.060606e-02)
  vector::vec_prod : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:84] (202 calls X 4.158416e-01)
  vector::project : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:46] (101 calls X 4.554456e-01)
  scalar::dealias : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:92] (606 calls X 1.518152e-01)
---------------------------------------------
256 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:3:39,user:0:1:57,sys:0:1:26)[clk:21974]
  loop : (real:0:3:31,user:0:1:55,sys:0:1:23)[clk:21134] (100 calls X 2.113400e+02)
FFT timer : (real:0:3:19,user:0:1:38,sys:0:1:24)[clk:19919]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:3:19,user:0:1:38,sys:0:1:24)[clk:19919] (509 calls X 3.913359e+01)
    FFT (out of place) only : (real:0:0:27,user:0:0:26,sys:0:0:0)[clk:2712] (2545 calls X 1.065619e+00)
    FFTW (out of place) Transposition only : (real:0:2:50,user:0:1:11,sys:0:1:24)[clk:17097] (2036 calls X 8.397347e+00)
azur::array timer root : (real:0:0:9,user:0:0:9,sys:0:0:0)[clk:989]
  view = expr : (real:0:0:9,user:0:0:9,sys:0:0:0)[clk:988] (1620 calls X 6.098765e-01)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (2 calls X 0.000000e+00)
    fftw4<dbl> /= int : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:108] (204 calls X 5.294118e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:158] (606 calls X 2.607261e-01)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:146] (202 calls X 7.227723e-01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:91] (103 calls X 8.834952e-01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:87] (99 calls X 8.787879e-01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:78] (99 calls X 7.878788e-01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:0,sys:0:0:0)[clk:104] (99 calls X 1.050505e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:212] (198 calls X 1.070707e+00)
cubby::field timer root : (real:0:2:57,user:0:1:17,sys:0:1:24)[clk:17722]
  scalar::transpose_blocks_when_received : (real:0:2:50,user:0:1:11,sys:0:1:24)[clk:17095] (3054 calls X 5.597577e+00)
  scalar::copy_transposed : (real:0:0:4,user:0:0:3,sys:0:0:0)[clk:415] (390912 calls X 1.061620e-03)
  vector::in_place_curl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:121] (202 calls X 5.990099e-01)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:10] (66 calls X 1.515152e-01)
  vector::vec_prod : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:191] (202 calls X 9.455445e-01)
  vector::project : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:112] (101 calls X 1.108911e+00)
  scalar::dealias : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:192] (606 calls X 3.168317e-01)
---------------------------------------------
128 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:3:50,user:0:2:32,sys:0:1:11)[clk:23065]
  loop : (real:0:3:43,user:0:2:29,sys:0:1:9)[clk:22364] (100 calls X 2.236400e+02)
FFT timer : (real:0:3:9,user:0:1:54,sys:0:1:8)[clk:18957]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1]
  FFT (out of place) and transposition time : (real:0:3:9,user:0:1:54,sys:0:1:8)[clk:18956] (509 calls X 3.724165e+01)
    FFT (out of place) only : (real:0:0:54,user:0:0:53,sys:0:0:0)[clk:5480] (2545 calls X 2.153242e+00)
    FFTW (out of place) Transposition only : (real:0:2:12,user:0:0:57,sys:0:1:8)[clk:13256] (2036 calls X 6.510806e+00)
azur::array timer root : (real:0:0:21,user:0:0:20,sys:0:0:1)[clk:2143]
  view = expr : (real:0:0:21,user:0:0:20,sys:0:0:1)[clk:2143] (1620 calls X 1.322839e+00)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (2 calls X 0.000000e+00)
    fftw4<dbl> /= int : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:218] (204 calls X 1.068627e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:360] (606 calls X 5.940594e-01)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:343] (202 calls X 1.698020e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:1,user:0:0:0,sys:0:0:1)[clk:179] (103 calls X 1.737864e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:188] (99 calls X 1.898990e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:166] (99 calls X 1.676768e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:219] (99 calls X 2.212121e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:456] (198 calls X 2.303030e+00)
cubby::field timer root : (real:0:2:25,user:0:1:10,sys:0:1:8)[clk:14501]
  scalar::transpose_blocks_when_received : (real:0:2:12,user:0:0:57,sys:0:1:8)[clk:13256] (3054 calls X 4.340537e+00)
  scalar::copy_transposed : (real:0:0:8,user:0:0:8,sys:0:0:0)[clk:840] (195456 calls X 4.297643e-03)
  vector::in_place_curl : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:239] (202 calls X 1.183168e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:20] (66 calls X 3.030303e-01)
  vector::vec_prod : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:371] (202 calls X 1.836634e+00)
  vector::project : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:235] (101 calls X 2.326733e+00)
  scalar::dealias : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:380] (606 calls X 6.270627e-01)
---------------------------------------------
64 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:5:20,user:0:4:17,sys:0:1:0)[clk:32058]
  loop : (real:0:5:12,user:0:4:11,sys:0:0:58)[clk:31217] (100 calls X 3.121700e+02)
FFT timer : (real:0:3:59,user:0:3:1,sys:0:0:55)[clk:23956]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:3:59,user:0:3:1,sys:0:0:55)[clk:23956] (509 calls X 4.706483e+01)
    FFT (out of place) only : (real:0:1:53,user:0:1:53,sys:0:0:0)[clk:11368] (2545 calls X 4.466798e+00)
    FFTW (out of place) Transposition only : (real:0:2:1,user:0:1:4,sys:0:0:54)[clk:12192] (2036 calls X 5.988212e+00)
azur::array timer root : (real:0:0:42,user:0:0:40,sys:0:0:2)[clk:4241]
  view = expr : (real:0:0:42,user:0:0:40,sys:0:0:2)[clk:4241] (1620 calls X 2.617901e+00)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (2 calls X 1.000000e+00)
    fftw4<dbl> /= int : (real:0:0:3,user:0:0:4,sys:0:0:0)[clk:392] (204 calls X 1.921569e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:707] (606 calls X 1.166667e+00)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:685] (202 calls X 3.391089e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:3,user:0:0:1,sys:0:0:2)[clk:372] (103 calls X 3.611650e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5] (1 calls X 5.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:379] (99 calls X 3.828283e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:335] (99 calls X 3.383838e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:435] (99 calls X 4.393939e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:9,user:0:0:8,sys:0:0:0)[clk:904] (198 calls X 4.565657e+00)
cubby::field timer root : (real:0:2:26,user:0:1:29,sys:0:0:54)[clk:14664]
  scalar::transpose_blocks_when_received : (real:0:2:1,user:0:1:4,sys:0:0:54)[clk:12191] (3054 calls X 3.991814e+00)
  scalar::copy_transposed : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1497] (97728 calls X 1.531803e-02)
  vector::in_place_curl : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:462] (202 calls X 2.287129e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:36] (66 calls X 5.454546e-01)
  vector::vec_prod : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:723] (202 calls X 3.579208e+00)
  vector::project : (real:0:0:4,user:0:0:5,sys:0:0:0)[clk:498] (101 calls X 4.930693e+00)
  scalar::dealias : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:754] (606 calls X 1.244224e+00)
---------------------------------------------
32 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:9:28,user:0:8:6,sys:0:1:21)[clk:56842]
  loop : (real:0:9:14,user:0:7:54,sys:0:1:19)[clk:55442] (100 calls X 5.544200e+02)
FFT timer : (real:0:6:49,user:0:5:36,sys:0:1:11)[clk:40931]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:6:49,user:0:5:36,sys:0:1:11)[clk:40931] (509 calls X 8.041454e+01)
    FFT (out of place) only : (real:0:3:51,user:0:3:51,sys:0:0:0)[clk:23192] (2545 calls X 9.112770e+00)
    FFTW (out of place) Transposition only : (real:0:2:49,user:0:1:37,sys:0:1:11)[clk:16984] (2036 calls X 8.341846e+00)
azur::array timer root : (real:0:1:23,user:0:1:18,sys:0:0:4)[clk:8300]
  view = expr : (real:0:1:23,user:0:1:18,sys:0:0:4)[clk:8300] (1620 calls X 5.123457e+00)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (2 calls X 1.500000e+00)
    fftw4<dbl> /= int : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:752] (204 calls X 3.686275e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5] (1 calls X 5.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1450] (606 calls X 2.392739e+00)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1303] (202 calls X 6.450495e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:7,user:0:0:2,sys:0:0:4)[clk:735] (103 calls X 7.135922e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:10] (1 calls X 1.000000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9] (1 calls X 9.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9] (1 calls X 9.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:749] (99 calls X 7.565657e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:663] (99 calls X 6.696970e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:8,user:0:0:8,sys:0:0:0)[clk:841] (99 calls X 8.494949e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:17,user:0:0:17,sys:0:0:0)[clk:1743] (198 calls X 8.803030e+00)
cubby::field timer root : (real:0:3:38,user:0:2:26,sys:0:1:11)[clk:21854]
  scalar::transpose_blocks_when_received : (real:0:2:49,user:0:1:37,sys:0:1:11)[clk:16983] (3054 calls X 5.560904e+00)
  scalar::copy_transposed : (real:0:0:29,user:0:0:29,sys:0:0:0)[clk:2909] (48864 calls X 5.953258e-02)
  vector::in_place_curl : (real:0:0:9,user:0:0:9,sys:0:0:0)[clk:904] (202 calls X 4.475247e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:71] (66 calls X 1.075758e+00)
  vector::vec_prod : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1387] (202 calls X 6.866337e+00)
  vector::project : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1015] (101 calls X 1.004951e+01)
  scalar::dealias : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1494] (606 calls X 2.465347e+00)
---------------------------------------------
16 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:16:32,user:0:15:10,sys:0:1:21)[clk:99275]
  loop : (real:0:16:6,user:0:14:48,sys:0:1:18)[clk:96658] (100 calls X 9.665800e+02)
FFT timer : (real:0:11:21,user:0:10:18,sys:0:1:2)[clk:68169]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:11:21,user:0:10:18,sys:0:1:2)[clk:68169] (509 calls X 1.339273e+02)
    FFT (out of place) only : (real:0:7:32,user:0:7:31,sys:0:0:0)[clk:45267] (2545 calls X 1.778664e+01)
    FFTW (out of place) Transposition only : (real:0:3:33,user:0:2:30,sys:0:1:2)[clk:21351] (2036 calls X 1.048674e+01)
azur::array timer root : (real:0:2:42,user:0:2:33,sys:0:0:8)[clk:16223]
  view = expr : (real:0:2:42,user:0:2:33,sys:0:0:8)[clk:16223] (1620 calls X 1.001420e+01)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (2 calls X 3.000000e+00)
    fftw4<dbl> /= int : (real:0:0:15,user:0:0:15,sys:0:0:0)[clk:1550] (204 calls X 7.598039e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:10] (1 calls X 1.000000e+01)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:27,user:0:0:27,sys:0:0:0)[clk:2770] (606 calls X 4.570957e+00)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8] (1 calls X 8.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:25,user:0:0:25,sys:0:0:0)[clk:2535] (202 calls X 1.254951e+01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:14,user:0:0:5,sys:0:0:8)[clk:1419] (103 calls X 1.377670e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:16] (1 calls X 1.600000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:15] (1 calls X 1.500000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:14] (1 calls X 1.400000e+01)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:15] (1 calls X 1.500000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:20] (1 calls X 2.000000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1491] (99 calls X 1.506061e+01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1322] (99 calls X 1.335354e+01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:16,user:0:0:16,sys:0:0:0)[clk:1631] (99 calls X 1.647475e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:33,user:0:0:33,sys:0:0:0)[clk:3389] (198 calls X 1.711616e+01)
cubby::field timer root : (real:0:5:9,user:0:4:6,sys:0:1:2)[clk:30909]
  scalar::transpose_blocks_when_received : (real:0:3:33,user:0:2:30,sys:0:1:2)[clk:21351] (3054 calls X 6.991159e+00)
  scalar::copy_transposed : (real:0:0:56,user:0:0:56,sys:0:0:0)[clk:5611] (24432 calls X 2.296578e-01)
  vector::in_place_curl : (real:0:0:17,user:0:0:17,sys:0:0:0)[clk:1776] (202 calls X 8.792079e+00)
  scalar::local_energy : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:137] (66 calls X 2.075758e+00)
  vector::vec_prod : (real:0:0:26,user:0:0:26,sys:0:0:0)[clk:2618] (202 calls X 1.296040e+01)
  vector::project : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2052] (101 calls X 2.031683e+01)
  scalar::dealias : (real:0:0:29,user:0:0:29,sys:0:0:0)[clk:2975] (606 calls X 4.909241e+00)
---------------------------------------------
8 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:31:36,user:0:30:13,sys:0:1:22)[clk:189657]
  loop : (real:0:30:50,user:0:29:30,sys:0:1:19)[clk:185022] (100 calls X 1.850220e+03)
FFT timer : (real:0:21:49,user:0:21:5,sys:0:0:43)[clk:130935]
  planification time : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]
  FFT (out of place) and transposition time : (real:0:21:49,user:0:21:5,sys:0:0:43)[clk:130935] (509 calls X 2.572397e+02)
    FFT (out of place) only : (real:0:15:27,user:0:15:26,sys:0:0:0)[clk:92727] (2545 calls X 3.643497e+01)
    FFTW (out of place) Transposition only : (real:0:5:55,user:0:5:12,sys:0:0:43)[clk:35564] (2036 calls X 1.746758e+01)
azur::array timer root : (real:0:4:54,user:0:4:36,sys:0:0:17)[clk:29461]
  view = expr : (real:0:4:54,user:0:4:36,sys:0:0:17)[clk:29461] (1620 calls X 1.818580e+01)
    fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:12] (2 calls X 6.000000e+00)
    fftw4<dbl> /= int : (real:0:0:26,user:0:0:26,sys:0:0:0)[clk:2642] (204 calls X 1.295098e+01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:19] (1 calls X 1.900000e+01)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:47,user:0:0:47,sys:0:0:0)[clk:4706] (606 calls X 7.765676e+00)
    fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:47,user:0:0:47,sys:0:0:0)[clk:4738] (202 calls X 2.345545e+01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:27,user:0:0:10,sys:0:0:17)[clk:2779] (103 calls X 2.698058e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:30] (1 calls X 3.000000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:30] (1 calls X 3.000000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:28] (1 calls X 2.800000e+01)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:20] (1 calls X 2.000000e+01)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:26] (1 calls X 2.600000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:35] (1 calls X 3.500000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:28,user:0:0:28,sys:0:0:0)[clk:2859] (99 calls X 2.887879e+01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:24,user:0:0:24,sys:0:0:0)[clk:2479] (99 calls X 2.504040e+01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:29,user:0:0:29,sys:0:0:0)[clk:2942] (99 calls X 2.971717e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:1:1,user:0:1:0,sys:0:0:0)[clk:6102] (198 calls X 3.081818e+01)
cubby::field timer root : (real:0:9:3,user:0:8:20,sys:0:0:43)[clk:54354]
  scalar::transpose_blocks_when_received : (real:0:5:55,user:0:5:12,sys:0:0:43)[clk:35562] (3054 calls X 1.164440e+01)
  scalar::copy_transposed : (real:0:1:49,user:0:1:48,sys:0:0:0)[clk:10920] (12216 calls X 8.939096e-01)
  vector::in_place_curl : (real:0:0:34,user:0:0:34,sys:0:0:0)[clk:3436] (202 calls X 1.700990e+01)
  scalar::local_energy : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:274] (66 calls X 4.151515e+00)
  vector::vec_prod : (real:0:0:50,user:0:0:50,sys:0:0:0)[clk:5074] (202 calls X 2.511881e+01)
  vector::project : (real:0:0:40,user:0:0:40,sys:0:0:0)[clk:4096] (101 calls X 4.055445e+01)
  scalar::dealias : (real:0:0:59,user:0:0:59,sys:0:0:0)[clk:5910] (606 calls X 9.752475e+00)
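To compare runs without reading the logs by eye, a single timer entry can be parsed and the share of the loop spent in transposition computed. The sketch below is a hypothetical helper, not part of the benchmarked code. It assumes `real` is hours:minutes:seconds and `clk` counts hundredths of a second, which is inferred from the numbers (for example `clk:9721` next to `real:0:1:37`) rather than documented.

```python
import re

# One timer entry, e.g.
#   loop : (real:0:1:37,user:0:0:59,sys:0:0:37)[clk:9721] (100 calls X 9.721000e+01)
# The trailing "(N calls X avg)" part is optional (root timers do not have it).
ENTRY = re.compile(
    r"^\s*(?P<name>.+?) : "
    r"\(real:(?P<real>[\d:]+),user:(?P<user>[\d:]+),sys:(?P<sys>[\d:]+)\)"
    r"\[clk:(?P<clk>\d+)\]"
    r"(?: \((?P<calls>\d+) calls X (?P<per_call>[-+.\deE]+)\))?\s*$"
)


def hms_to_seconds(text):
    """Convert 'h:m:s' (assumed field layout) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def parse_entry(line):
    """Parse one timer line into a dict; raise ValueError if it does not match."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"not a timer entry: {line!r}")
    calls = match.group("calls")
    return {
        "name": match.group("name"),
        "real_s": hms_to_seconds(match.group("real")),
        "user_s": hms_to_seconds(match.group("user")),
        "sys_s": hms_to_seconds(match.group("sys")),
        "clk": int(match.group("clk")),
        "calls": int(calls) if calls is not None else None,
        "per_call": float(match.group("per_call")) if calls is not None else None,
    }


# Entries copied from the 512-processor and 8-processor logs.
LOOP_512 = "loop : (real:0:1:37,user:0:0:59,sys:0:0:37)[clk:9721] (100 calls X 9.721000e+01)"
TRANSPOSE_512 = ("FFTW (out of place) Transposition only : "
                 "(real:0:1:25,user:0:0:40,sys:0:0:38)[clk:8578] (2036 calls X 4.213163e+00)")
LOOP_8 = "loop : (real:0:30:50,user:0:29:30,sys:0:1:19)[clk:185022] (100 calls X 1.850220e+03)"
TRANSPOSE_8 = ("FFTW (out of place) Transposition only : "
               "(real:0:5:55,user:0:5:12,sys:0:0:43)[clk:35564] (2036 calls X 1.746758e+01)")


def transposition_share(loop_line, transpose_line):
    """Fraction of the loop's clk ticks spent in transposition."""
    return parse_entry(transpose_line)["clk"] / parse_entry(loop_line)["clk"]


if __name__ == "__main__":
    print(f"512 procs: {transposition_share(LOOP_512, TRANSPOSE_512):.2f}")
    print(f"  8 procs: {transposition_share(LOOP_8, TRANSPOSE_8):.2f}")
```

The ratio uses `clk` rather than `real` because `clk` carries two more digits of resolution, which matters for the short entries in the large runs.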
Last modified on Sep 29, 2011 2:47:39 PM