processors -> loop time (wall clock, 100 time steps) -> processors × loop time (s_mono)

256³ grid (global<256,256,256>)
8 -> 423 s -> 3384 s_mono
16 -> 241 s -> 3856 s_mono
32 -> 138 s -> 4416 s_mono
64 -> 81 s -> 5184 s_mono

512³ grid (global<512,512,512>)
32 -> 1087 s -> 34784 s_mono
64 -> 549 s -> 35136 s_mono
128 -> 383 s -> 49024 s_mono
256 -> 269 s -> 68864 s_mono
512 -> 225 s -> 115200 s_mono

1024³ grid (global<1024,1024,1024>)
128 -> 2946 s -> 377088 s_mono
256 -> 1596 s -> 408576 s_mono
512 -> 1329 s -> 680448 s_mono
1024 -> 899 s -> 920576 s_mono
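Because s_mono is processors × loop time, dividing the smallest run's s_mono by a larger run's s_mono gives the strong-scaling parallel efficiency relative to that smallest run (1.0 = perfect scaling). A short Python sketch over the summary numbers; the relative-efficiency definition is a standard choice assumed here, not something the logs state:

```python
# Summary data copied from above: grid -> [(processors, loop seconds), ...]
RUNS = {
    "256^3": [(8, 423), (16, 241), (32, 138), (64, 81)],
    "512^3": [(32, 1087), (64, 549), (128, 383), (256, 269), (512, 225)],
    "1024^3": [(128, 2946), (256, 1596), (512, 1329), (1024, 899)],
}


def scaling_table(runs):
    """Return (p, t, s_mono, speedup, efficiency) rows relative to the first (smallest) run."""
    p0, t0 = runs[0]
    rows = []
    for p, t in runs:
        s_mono = p * t                   # processor-seconds, as in the summary
        speedup = t0 / t                 # wall-clock speedup vs. the smallest run
        efficiency = (p0 * t0) / s_mono  # 1.0 would be perfect strong scaling
        rows.append((p, t, s_mono, speedup, efficiency))
    return rows


if __name__ == "__main__":
    for grid, runs in RUNS.items():
        print(grid)
        for p, t, s_mono, speedup, eff in scaling_table(runs):
            print(f"  {p:5d} procs {t:5d} s {s_mono:7d} s_mono "
                  f"speedup {speedup:5.2f} efficiency {eff:6.1%}")
```

For example, the 64-processor 256³ run keeps 3384/5184 ≈ 65% of the 8-processor efficiency, while the 512-processor 512³ run drops to about 30% of the 32-processor one.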
---------------------------------------------
8 Processor run, global<256,256,256> for 100 time steps
main timer : (real:0:12:24,user:0:12:23,sys:0:0:0)[clk:74491]
  loop : (real:0:7:3,user:0:7:2,sys:0:0:0)[clk:42318] (100 calls X 4.231800e+02)
FFT timer : (real:0:9:5,user:0:9:4,sys:0:0:0)[clk:54553]
  planification time : (real:0:5:8,user:0:5:8,sys:0:0:0)[clk:30856]
  FFT and transposition time : (real:0:3:56,user:0:3:56,sys:0:0:0)[clk:23696] (509 calls X 4.655403e+01)
    FFT only : (real:0:2:15,user:0:2:15,sys:0:0:0)[clk:13561] (2545 calls X 5.328487e+00)
    Transposition only : (real:0:1:21,user:0:1:21,sys:0:0:0)[clk:8160] (1832 calls X 4.454148e+00)
azur::array timer root : (real:0:2:16,user:0:2:16,sys:0:0:0)[clk:13671]
  view = expr : (real:0:2:16,user:0:2:16,sys:0:0:0)[clk:13668] (2538 calls X 5.385343e+00)
    fftw3<dbl> id= dbl : (real:0:0:8,user:0:0:8,sys:0:0:0)[clk:891] (510 calls X 1.747059e+00)
    fftw3<dbl> /= dbl : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1251] (612 calls X 2.044118e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5] (1 calls X 5.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2062] (606 calls X 3.402640e+00)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (3 calls X 2.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2120] (202 calls X 1.049505e+01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:10,user:0:0:9,sys:0:0:0)[clk:1000] (103 calls X 9.708738e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9] (1 calls X 9.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:16] (1 calls X 1.600000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1270] (99 calls X 1.282828e+01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1073] (99 calls X 1.083838e+01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1304] (99 calls X 1.317172e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:26,user:0:0:26,sys:0:0:0)[clk:2611] (198 calls X 1.318687e+01)
cubby::field timer root : (real:0:2:19,user:0:2:19,sys:0:0:0)[clk:13975]
  scalar::transpose_blocks_when_received : (real:0:1:28,user:0:1:28,sys:0:0:0)[clk:8882] (3054 calls X 2.908317e+00)
    scalar::copy_transposed : (real:0:0:48,user:0:0:48,sys:0:0:0)[clk:4841] (12216 calls X 3.962836e-01)
  vector::in_place_curl : (real:0:0:11,user:0:0:11,sys:0:0:0)[clk:1140] (202 calls X 5.643564e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:80] (66 calls X 1.212121e+00)
  vector::vec_prod : (real:0:0:18,user:0:0:18,sys:0:0:0)[clk:1803] (202 calls X 8.925742e+00)
  vector::project : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:790] (101 calls X 7.821782e+00)
  scalar::dealias : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1280] (606 calls X 2.112211e+00)
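Every timer entry has the shape `name : (real:h:m:s,user:h:m:s,sys:h:m:s)[clk:N] (C calls X A)`, where A is the average clk per call, so C × A ≈ N; root timers omit the calls part. A minimal Python parser sketch for one entry follows. The clk tick length is not stated in the logs: the 1/100 s used here is an assumption inferred from the loop entry above (real 0:7:3 = 423 s versus clk:42318).

```python
import re

# One timer entry, e.g.
# "loop : (real:0:7:3,user:0:7:2,sys:0:0:0)[clk:42318] (100 calls X 4.231800e+02)"
ENTRY = re.compile(
    r"^(?P<name>.+?) : "
    r"\(real:(?P<h>\d+):(?P<m>\d+):(?P<s>\d+),user:[\d:]+,sys:[\d:]+\)"
    r"\[clk:(?P<clk>\d+)\]"
    r"(?: \((?P<calls>\d+) calls X (?P<avg>[-+.\deE]+)\))?$"
)

CLK_PER_SECOND = 100  # assumption: inferred from real vs. clk, not documented in the logs


def parse_entry(line):
    """Parse one timer entry into a dict; raises ValueError if the line does not match."""
    m = ENTRY.match(line.strip())
    if m is None:
        raise ValueError(f"not a timer entry: {line!r}")
    real_s = int(m["h"]) * 3600 + int(m["m"]) * 60 + int(m["s"])
    clk = int(m["clk"])
    calls = int(m["calls"]) if m["calls"] else None   # root timers carry no call count
    avg = float(m["avg"]) if m["avg"] else None       # average clk per call
    return {"name": m["name"], "real_s": real_s, "clk": clk,
            "clk_s": clk / CLK_PER_SECOND, "calls": calls, "avg_clk": avg}


if __name__ == "__main__":
    e = parse_entry("loop : (real:0:7:3,user:0:7:2,sys:0:0:0)[clk:42318] (100 calls X 4.231800e+02)")
    print(e["name"], e["real_s"], e["clk_s"], e["calls"] * e["avg_clk"])
```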
---------------------------------------------
16 Processor run, global<256,256,256> for 100 time steps
main timer : (real:0:6:24,user:0:6:22,sys:0:0:1)[clk:38421]
  loop : (real:0:4:1,user:0:3:59,sys:0:0:0)[clk:24122] (100 calls X 2.412200e+02)
FFT timer : (real:0:4:48,user:0:4:46,sys:0:0:0)[clk:28805]
  planification time : (real:0:2:7,user:0:2:7,sys:0:0:0)[clk:12790]
  FFT and transposition time : (real:0:2:40,user:0:2:38,sys:0:0:0)[clk:16015] (509 calls X 3.146365e+01)
    FFT only : (real:0:0:57,user:0:0:56,sys:0:0:0)[clk:5778] (2545 calls X 2.270334e+00)
    Transposition only : (real:0:1:27,user:0:1:27,sys:0:0:0)[clk:8718] (1832 calls X 4.758734e+00)
azur::array timer root : (real:0:1:4,user:0:1:4,sys:0:0:0)[clk:6458]
  view = expr : (real:0:1:4,user:0:1:4,sys:0:0:0)[clk:6458] (2538 calls X 2.544523e+00)
    fftw3<dbl> id= dbl : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:466] (510 calls X 9.137255e-01)
    fftw3<dbl> /= dbl : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:565] (612 calls X 9.232026e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:10,user:0:0:9,sys:0:0:0)[clk:1002] (606 calls X 1.653465e+00)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (3 calls X 1.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:10,user:0:0:9,sys:0:0:0)[clk:1000] (202 calls X 4.950495e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:498] (103 calls X 4.834951e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8] (1 calls X 8.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:593] (99 calls X 5.989899e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:495] (99 calls X 5.000000e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:588] (99 calls X 5.939394e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1203] (198 calls X 6.075758e+00)
cubby::field timer root : (real:0:2:2,user:0:2:2,sys:0:0:0)[clk:12286]
  scalar::transpose_blocks_when_received : (real:0:1:36,user:0:1:36,sys:0:0:0)[clk:9672] (3054 calls X 3.166994e+00)
    scalar::copy_transposed : (real:0:0:22,user:0:0:22,sys:0:0:0)[clk:2223] (24432 calls X 9.098723e-02)
  vector::in_place_curl : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:607] (202 calls X 3.004951e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:47] (66 calls X 7.121212e-01)
  vector::vec_prod : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1029] (202 calls X 5.094059e+00)
  vector::project : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:353] (101 calls X 3.495049e+00)
  scalar::dealias : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:578] (606 calls X 9.537954e-01)
---------------------------------------------
32 Processor run, global<256,256,256> for 100 time steps
main timer : (real:0:3:17,user:0:3:12,sys:0:0:1)[clk:19741]
  loop : (real:0:2:18,user:0:2:13,sys:0:0:1)[clk:13804] (100 calls X 1.380400e+02)
FFT timer : (real:0:2:28,user:0:2:23,sys:0:0:1)[clk:14821]
  planification time : (real:0:0:50,user:0:0:50,sys:0:0:0)[clk:5009]
  FFT and transposition time : (real:0:1:38,user:0:1:33,sys:0:0:1)[clk:9812] (509 calls X 1.927701e+01)
    FFT only : (real:0:0:29,user:0:0:27,sys:0:0:0)[clk:2905] (2545 calls X 1.141454e+00)
    Transposition only : (real:0:0:58,user:0:0:56,sys:0:0:1)[clk:5891] (1832 calls X 3.215611e+00)
azur::array timer root : (real:0:0:33,user:0:0:33,sys:0:0:0)[clk:3363]
  view = expr : (real:0:0:33,user:0:0:33,sys:0:0:0)[clk:3363] (2538 calls X 1.325059e+00)
    fftw3<dbl> id= dbl : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:240] (510 calls X 4.705882e-01)
    fftw3<dbl> /= dbl : (real:0:0:3,user:0:0:2,sys:0:0:0)[clk:302] (612 calls X 4.934641e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:517] (606 calls X 8.531353e-01)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (3 calls X 6.666667e-01)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:528] (202 calls X 2.613861e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:255] (103 calls X 2.475728e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:305] (99 calls X 3.080808e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:265] (99 calls X 2.676768e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:303] (99 calls X 3.060606e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:622] (198 calls X 3.141414e+00)
cubby::field timer root : (real:0:1:18,user:0:1:16,sys:0:0:1)[clk:7854]
  scalar::transpose_blocks_when_received : (real:0:1:6,user:0:1:3,sys:0:0:1)[clk:6603] (3054 calls X 2.162082e+00)
    scalar::copy_transposed : (real:0:0:8,user:0:0:6,sys:0:0:0)[clk:844] (48864 calls X 1.727243e-02)
  vector::in_place_curl : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:300] (202 calls X 1.485149e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:22] (66 calls X 3.333333e-01)
  vector::vec_prod : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:480] (202 calls X 2.376238e+00)
  vector::project : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:177] (101 calls X 1.752475e+00)
  scalar::dealias : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:272] (606 calls X 4.488449e-01)
---------------------------------------------
64 Processor run, global<256,256,256> for 100 time steps
main timer : (real:0:1:49,user:0:1:43,sys:0:0:1)[clk:10996]
  loop : (real:0:1:21,user:0:1:15,sys:0:0:1)[clk:8169] (100 calls X 8.169000e+01)
FFT timer : (real:0:1:25,user:0:1:18,sys:0:0:1)[clk:8504]
  planification time : (real:0:0:26,user:0:0:26,sys:0:0:0)[clk:2613]
  FFT and transposition time : (real:0:0:58,user:0:0:52,sys:0:0:1)[clk:5891] (509 calls X 1.157367e+01)
    FFT only : (real:0:0:11,user:0:0:8,sys:0:0:0)[clk:1100] (2545 calls X 4.322200e-01)
    Transposition only : (real:0:0:41,user:0:0:37,sys:0:0:1)[clk:4144] (1832 calls X 2.262009e+00)
azur::array timer root : (real:0:0:16,user:0:0:16,sys:0:0:0)[clk:1663]
  view = expr : (real:0:0:16,user:0:0:16,sys:0:0:0)[clk:1662] (2538 calls X 6.548463e-01)
    fftw3<dbl> id= dbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:100] (510 calls X 1.960784e-01)
    fftw3<dbl> /= dbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:119] (612 calls X 1.944444e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:264] (606 calls X 4.356436e-01)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (3 calls X 3.333333e-01)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:258] (202 calls X 1.277228e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:138] (103 calls X 1.339806e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:162] (99 calls X 1.636364e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:133] (99 calls X 1.343434e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:163] (99 calls X 1.646465e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:313] (198 calls X 1.580808e+00)
cubby::field timer root : (real:0:0:53,user:0:0:49,sys:0:0:1)[clk:5309]
  scalar::transpose_blocks_when_received : (real:0:0:46,user:0:0:42,sys:0:0:1)[clk:4669] (3054 calls X 1.528815e+00)
    scalar::copy_transposed : (real:0:0:3,user:0:0:2,sys:0:0:0)[clk:301] (97728 calls X 3.079977e-03)
  vector::in_place_curl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:145] (202 calls X 7.178218e-01)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9] (66 calls X 1.363636e-01)
  vector::vec_prod : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:253] (202 calls X 1.252475e+00)
  vector::project : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:90] (101 calls X 8.910891e-01)
  scalar::dealias : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:143] (606 calls X 2.359736e-01)
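In the 256³ runs the main timer exceeds the loop timer by roughly the planification time, which suggests that the loop times quoted in the summary leave out most of the one-off planification cost. The Python sketch below checks this using clk values copied from the main timer, loop and planification time entries above; it assumes (the logs do not say so) that planification happens outside the time-stepping loop.

```python
# 256^3 runs, clk values copied from the logs:
# (processors, main timer, loop, planification time)
RUNS_256 = [
    (8, 74491, 42318, 30856),
    (16, 38421, 24122, 12790),
    (32, 19741, 13804, 5009),
    (64, 10996, 8169, 2613),
]


def outside_loop(runs):
    """Per run: processors, clk spent outside the loop, the share of that time
    matched by planification, and planification as a share of the main timer."""
    rows = []
    for p, main, loop, plan in runs:
        outside = main - loop  # time in the main timer but not in the 100-step loop
        rows.append((p, outside, plan / outside, plan / main))
    return rows


if __name__ == "__main__":
    for p, outside, share_outside, share_main in outside_loop(RUNS_256):
        print(f"{p:3d} procs: {outside:6d} clk outside loop, "
              f"{share_outside:5.1%} of it planification, "
              f"planification = {share_main:5.1%} of main timer")
```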
---------------------------------------------
32 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:29:55,user:0:29:45,sys:0:0:5)[clk:179571]
  loop : (real:0:18:7,user:0:18:0,sys:0:0:4)[clk:108726] (100 calls X 1.087260e+03)
FFT timer : (real:0:23:26,user:0:23:17,sys:0:0:4)[clk:140619]
  planification time : (real:0:11:17,user:0:11:16,sys:0:0:0)[clk:67776]
  FFT and transposition time : (real:0:12:8,user:0:12:1,sys:0:0:4)[clk:72843] (509 calls X 1.431100e+02)
    FFT only : (real:0:3:49,user:0:3:48,sys:0:0:0)[clk:22951] (2545 calls X 9.018075e+00)
    Transposition only : (real:0:7:3,user:0:6:57,sys:0:0:3)[clk:42374] (1832 calls X 2.312991e+01)
azur::array timer root : (real:0:4:27,user:0:4:27,sys:0:0:0)[clk:26762]
  view = expr : (real:0:4:27,user:0:4:27,sys:0:0:0)[clk:26762] (2538 calls X 1.054452e+01)
    fftw3<dbl> id= dbl : (real:0:0:19,user:0:0:19,sys:0:0:0)[clk:1952] (510 calls X 3.827451e+00)
    fftw3<dbl> /= dbl : (real:0:0:23,user:0:0:23,sys:0:0:0)[clk:2399] (612 calls X 3.919935e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:12] (1 calls X 1.200000e+01)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:41,user:0:0:41,sys:0:0:0)[clk:4119] (606 calls X 6.797029e+00)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (3 calls X 4.333333e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:41,user:0:0:41,sys:0:0:0)[clk:4133] (202 calls X 2.046040e+01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2052] (103 calls X 1.992233e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:26] (1 calls X 2.600000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:27] (1 calls X 2.700000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:26] (1 calls X 2.600000e+01)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:17] (1 calls X 1.700000e+01)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:21] (1 calls X 2.100000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:32] (1 calls X 3.200000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:24,user:0:0:24,sys:0:0:0)[clk:2457] (99 calls X 2.481818e+01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2038] (99 calls X 2.058586e+01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:24,user:0:0:24,sys:0:0:0)[clk:2482] (99 calls X 2.507071e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:49,user:0:0:49,sys:0:0:0)[clk:4950] (198 calls X 2.500000e+01)
cubby::field timer root : (real:0:9:37,user:0:9:31,sys:0:0:4)[clk:57759]
  scalar::transpose_blocks_when_received : (real:0:7:54,user:0:7:48,sys:0:0:4)[clk:47487] (3054 calls X 1.554912e+01)
    scalar::copy_transposed : (real:0:1:23,user:0:1:20,sys:0:0:0)[clk:8322] (48864 calls X 1.703094e-01)
  vector::in_place_curl : (real:0:0:24,user:0:0:24,sys:0:0:0)[clk:2471] (202 calls X 1.223267e+01)
  scalar::local_energy : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:213] (66 calls X 3.227273e+00)
  vector::vec_prod : (real:0:0:42,user:0:0:42,sys:0:0:0)[clk:4268] (202 calls X 2.112871e+01)
  vector::project : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1421] (101 calls X 1.406931e+01)
  scalar::dealias : (real:0:0:18,user:0:0:19,sys:0:0:0)[clk:1899] (606 calls X 3.133663e+00)
---------------------------------------------
64 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:15:23,user:0:15:15,sys:0:0:3)[clk:92320]
  loop : (real:0:9:9,user:0:9:3,sys:0:0:2)[clk:54978] (100 calls X 5.497800e+02)
FFT timer : (real:0:12:1,user:0:11:55,sys:0:0:2)[clk:72151]
  planification time : (real:0:5:16,user:0:5:16,sys:0:0:0)[clk:31694]
  FFT and transposition time : (real:0:6:44,user:0:6:38,sys:0:0:2)[clk:40457] (509 calls X 7.948330e+01)
    FFT only : (real:0:1:58,user:0:1:57,sys:0:0:0)[clk:11895] (2545 calls X 4.673871e+00)
    Transposition only : (real:0:4:7,user:0:4:2,sys:0:0:2)[clk:24737] (1832 calls X 1.350273e+01)
azur::array timer root : (real:0:2:18,user:0:2:18,sys:0:0:0)[clk:13841]
  view = expr : (real:0:2:18,user:0:2:18,sys:0:0:0)[clk:13840] (2538 calls X 5.453113e+00)
    fftw3<dbl> id= dbl : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1023] (510 calls X 2.005882e+00)
    fftw3<dbl> /= dbl : (real:0:0:11,user:0:0:12,sys:0:0:0)[clk:1199] (612 calls X 1.959150e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2145] (606 calls X 3.539604e+00)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (3 calls X 2.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2172] (202 calls X 1.075247e+01)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1050] (103 calls X 1.019417e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:12] (1 calls X 1.200000e+01)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8] (1 calls X 8.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (1 calls X 1.100000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:16] (1 calls X 1.600000e+01)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1265] (99 calls X 1.277778e+01)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1067] (99 calls X 1.077778e+01)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:13,user:0:0:12,sys:0:0:0)[clk:1301] (99 calls X 1.314141e+01)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:25,user:0:0:25,sys:0:0:0)[clk:2531] (198 calls X 1.278283e+01)
cubby::field timer root : (real:0:5:25,user:0:5:20,sys:0:0:2)[clk:32572]
  scalar::transpose_blocks_when_received : (real:0:4:33,user:0:4:28,sys:0:0:2)[clk:27357] (3054 calls X 8.957760e+00)
    scalar::copy_transposed : (real:0:0:36,user:0:0:33,sys:0:0:0)[clk:3637] (97728 calls X 3.721554e-02)
  vector::in_place_curl : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1268] (202 calls X 6.277228e+00)
  scalar::local_energy : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:106] (66 calls X 1.606061e+00)
  vector::vec_prod : (real:0:0:21,user:0:0:20,sys:0:0:0)[clk:2100] (202 calls X 1.039604e+01)
  vector::project : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:715] (101 calls X 7.079208e+00)
  scalar::dealias : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1023] (606 calls X 1.688119e+00)
---------------------------------------------
128 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:9:23,user:0:9:17,sys:0:0:1)[clk:56329]
  loop : (real:0:6:23,user:0:6:18,sys:0:0:1)[clk:38323] (100 calls X 3.832300e+02)
FFT timer : (real:0:7:39,user:0:7:34,sys:0:0:1)[clk:45948]
  planification time : (real:0:2:33,user:0:2:33,sys:0:0:0)[clk:15329]
  FFT and transposition time : (real:0:5:6,user:0:5:1,sys:0:0:1)[clk:30619] (509 calls X 6.015520e+01)
    FFT only : (real:0:1:14,user:0:1:12,sys:0:0:0)[clk:7416] (2545 calls X 2.913949e+00)
    Transposition only : (real:0:3:18,user:0:3:15,sys:0:0:1)[clk:19882] (1832 calls X 1.085262e+01)
azur::array timer root : (real:0:1:11,user:0:1:11,sys:0:0:0)[clk:7168]
  view = expr : (real:0:1:11,user:0:1:11,sys:0:0:0)[clk:7167] (2538 calls X 2.823877e+00)
    fftw3<dbl> id= dbl : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:576] (510 calls X 1.129412e+00)
    fftw3<dbl> /= dbl : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:689] (612 calls X 1.125817e+00)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1081] (606 calls X 1.783828e+00)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (3 calls X 1.000000e+00)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:11,user:0:0:11,sys:0:0:0)[clk:1104] (202 calls X 5.465346e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:545] (103 calls X 5.291262e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8] (1 calls X 8.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:644] (99 calls X 6.505051e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:539] (99 calls X 5.444445e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:643] (99 calls X 6.494949e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1301] (198 calls X 6.570707e+00)
cubby::field timer root : (real:0:4:11,user:0:4:8,sys:0:0:1)[clk:25197]
  scalar::transpose_blocks_when_received : (real:0:3:45,user:0:3:41,sys:0:0:1)[clk:22511] (3054 calls X 7.370989e+00)
    scalar::copy_transposed : (real:0:0:20,user:0:0:19,sys:0:0:0)[clk:2066] (195456 calls X 1.057015e-02)
  vector::in_place_curl : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:730] (202 calls X 3.613861e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:41] (66 calls X 6.212121e-01)
  vector::vec_prod : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1034] (202 calls X 5.118812e+00)
  vector::project : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:389] (101 calls X 3.851485e+00)
  scalar::dealias : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:492] (606 calls X 8.118812e-01)
---------------------------------------------
256 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:5:51,user:0:5:50,sys:0:0:0)[clk:35130]
  loop : (real:0:4:29,user:0:4:29,sys:0:0:0)[clk:26974] (100 calls X 2.697400e+02)
FFT timer : (real:0:5:1,user:0:5:1,sys:0:0:0)[clk:30168]
  planification time : (real:0:1:1,user:0:1:1,sys:0:0:0)[clk:6168]
  FFT and transposition time : (real:0:4:0,user:0:3:59,sys:0:0:0)[clk:24000] (509 calls X 4.715128e+01)
    FFT only : (real:0:0:30,user:0:0:30,sys:0:0:0)[clk:3042] (2545 calls X 1.195285e+00)
    Transposition only : (real:0:3:5,user:0:3:5,sys:0:0:0)[clk:18579] (1832 calls X 1.014138e+01)
azur::array timer root : (real:0:0:32,user:0:0:32,sys:0:0:0)[clk:3210]
  view = expr : (real:0:0:32,user:0:0:32,sys:0:0:0)[clk:3209] (2538 calls X 1.264381e+00)
    fftw3<dbl> id= dbl : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:214] (510 calls X 4.196078e-01)
    fftw3<dbl> /= dbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:176] (612 calls X 2.875817e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:510] (606 calls X 8.415841e-01)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (3 calls X 6.666667e-01)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:566] (202 calls X 2.801980e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:273] (103 calls X 2.650486e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:2,user:0:0:3,sys:0:0:0)[clk:298] (99 calls X 3.010101e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:252] (99 calls X 2.545455e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:313] (99 calls X 3.161616e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:584] (198 calls X 2.949495e+00)
cubby::field timer root : (real:0:3:40,user:0:3:39,sys:0:0:0)[clk:22017]
  scalar::transpose_blocks_when_received : (real:0:3:27,user:0:3:27,sys:0:0:0)[clk:20777] (3054 calls X 6.803209e+00)
    scalar::copy_transposed : (real:0:0:6,user:0:0:5,sys:0:0:0)[clk:636] (390912 calls X 1.626965e-03)
  vector::in_place_curl : (real:0:0:3,user:0:0:2,sys:0:0:0)[clk:315] (202 calls X 1.559406e+00)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:23] (66 calls X 3.484848e-01)
  vector::vec_prod : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:422] (202 calls X 2.089109e+00)
  vector::project : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:151] (101 calls X 1.495049e+00)
  scalar::dealias : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:327] (606 calls X 5.396039e-01)
---------------------------------------------
512 Processor run, global<512,512,512> for 100 time steps
main timer : (real:0:4:29,user:0:4:28,sys:0:0:0)[clk:26936]
  loop : (real:0:3:45,user:0:3:45,sys:0:0:0)[clk:22564] (100 calls X 2.256400e+02)
FFT timer : (real:0:4:3,user:0:4:2,sys:0:0:0)[clk:24319]
  planification time : (real:0:0:37,user:0:0:37,sys:0:0:0)[clk:3767]
  FFT and transposition time : (real:0:3:25,user:0:3:25,sys:0:0:0)[clk:20552] (509 calls X 4.037721e+01)
    FFT only : (real:0:0:11,user:0:0:11,sys:0:0:0)[clk:1150] (2545 calls X 4.518664e-01)
    Transposition only : (real:0:2:52,user:0:2:52,sys:0:0:0)[clk:17294] (1832 calls X 9.439957e+00)
azur::array timer root : (real:0:0:17,user:0:0:17,sys:0:0:0)[clk:1732]
  view = expr : (real:0:0:17,user:0:0:17,sys:0:0:0)[clk:1732] (2538 calls X 6.824271e-01)
    fftw3<dbl> id= dbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:106] (510 calls X 2.078431e-01)
    fftw3<dbl> /= dbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:145] (612 calls X 2.369281e-01)
    fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw3<dbl> id= fftw3<dbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:266] (606 calls X 4.389439e-01)
    fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (3 calls X 3.333333e-01)
    fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:281] (202 calls X 1.391089e+00)
    fftw4<dbl> id= fftw4<dbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:143] (103 calls X 1.388350e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1] (1 calls X 1.000000e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0] (1 calls X 0.000000e+00)
    fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2] (1 calls X 2.000000e+00)
    fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:163] (99 calls X 1.646465e+00)
    fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:126] (99 calls X 1.272727e+00)
    fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:166] (99 calls X 1.676768e+00)
    fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:323] (198 calls X 1.631313e+00)
cubby::field timer root : (real:0:3:19,user:0:3:18,sys:0:0:0)[clk:19907]
  scalar::transpose_blocks_when_received : (real:0:3:12,user:0:3:12,sys:0:0:0)[clk:19254] (3054 calls X 6.304519e+00)
    scalar::copy_transposed : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:253] (781824 calls X 3.236022e-04)
  vector::in_place_curl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:155] (202 calls X 7.673267e-01)
  scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:16] (66 calls X 2.424242e-01)
  vector::vec_prod : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:267] (202 calls X 1.321782e+00)
  vector::project : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:89] (101 calls X 8.811881e-01)
  scalar::dealias : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:126] (606 calls X 2.079208e-01)
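Across the five 512³ runs, Transposition only shrinks far more slowly than FFT only as processors are added. The Python sketch below quantifies this from the clk values above. It assumes FFT only and Transposition only are both parts of FFT and transposition time (their sum stays below it in every run) and measures speedups relative to the 32-processor run.

```python
# 512^3 runs, clk values copied from the logs:
# (processors, FFT only, Transposition only, FFT and transposition time)
RUNS_512 = [
    (32, 22951, 42374, 72843),
    (64, 11895, 24737, 40457),
    (128, 7416, 19882, 30619),
    (256, 3042, 18579, 24000),
    (512, 1150, 17294, 20552),
]


def breakdown(runs):
    """Per run: processors, transposition share of FFT+transposition time,
    FFT-only speedup and transposition speedup relative to the first run."""
    _, fft0, tr0, _ = runs[0]
    rows = []
    for p, fft, tr, total in runs:
        rows.append((p, tr / total, fft0 / fft, tr0 / tr))
    return rows


if __name__ == "__main__":
    for p, share, fft_speedup, tr_speedup in breakdown(RUNS_512):
        print(f"{p:4d} procs: transposition {share:5.1%} of FFT+transposition, "
              f"FFT-only speedup {fft_speedup:5.2f}, transposition speedup {tr_speedup:5.2f}")
```

Going from 32 to 512 processors (16×), FFT only speeds up by about 20× while Transposition only speeds up by only about 2.45×, so the transposition share rises from about 58% to about 84%.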
128 Processor run, global<1024,1024,1024> for 100 time steps
main timer : (real:1:15:4,user:1:14:38,sys:0:0:11)[clk:450471]
loop : (real:0:49:6,user:0:48:43,sys:0:0:8)[clk:294619] (100 calls X 2.946190e+03)
FFT timer : (real:1:1:49,user:1:1:27,sys:0:0:8)[clk:370918]
planification time : (real:0:24:20,user:0:24:20,sys:0:0:0)[clk:146079]
FFT and transposition time : (real:0:37:28,user:0:37:6,sys:0:0:8)[clk:224839] (509 calls X 4.417269e+02)
FFT only : (real:0:8:40,user:0:8:39,sys:0:0:0)[clk:52091] (2545 calls X 2.046798e+01)
Transposition only : (real:0:24:28,user:0:24:10,sys:0:0:7)[clk:146848] (1832 calls X 8.015720e+01)
azur::array timer root : (real:0:9:11,user:0:9:10,sys:0:0:0)[clk:55182]
view = expr : (real:0:9:11,user:0:9:10,sys:0:0:0)[clk:55182] (2538 calls X 2.174232e+01)
fftw3<dbl> id= dbl : (real:0:0:39,user:0:0:39,sys:0:0:0)[clk:3972] (510 calls X 7.788235e+00)
fftw3<dbl> /= dbl : (real:0:0:50,user:0:0:50,sys:0:0:0)[clk:5077] (612 calls X 8.295752e+00)
fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:25] (1 calls X 2.500000e+01)
fftw3<dbl> id= fftw3<dbl> : (real:0:1:23,user:0:1:23,sys:0:0:0)[clk:8399] (606 calls X 1.385974e+01)
fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:26] (3 calls X 8.666667e+00)
fftw4<cdbl> id= fftw4<cdbl> : (real:0:1:24,user:0:1:24,sys:0:0:0)[clk:8452] (202 calls X 4.184158e+01)
fftw4<dbl> id= fftw4<dbl> : (real:0:0:40,user:0:0:40,sys:0:0:0)[clk:4095] (103 calls X 3.975728e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:52] (1 calls X 5.200000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:48] (1 calls X 4.800000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:52] (1 calls X 5.200000e+01)
fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:34] (1 calls X 3.400000e+01)
fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:42] (1 calls X 4.200000e+01)
fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:63] (1 calls X 6.300000e+01)
fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:50,user:0:0:50,sys:0:0:0)[clk:5076] (99 calls X 5.127273e+01)
fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:42,user:0:0:42,sys:0:0:0)[clk:4269] (99 calls X 4.312121e+01)
fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:51,user:0:0:51,sys:0:0:0)[clk:5170] (99 calls X 5.222222e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:1:43,user:0:1:43,sys:0:0:0)[clk:10328] (198 calls X 5.216162e+01)
cubby::field timer root : (real:0:31:18,user:0:30:58,sys:0:0:8)[clk:187885]
scalar::transpose_blocks_when_received : (real:0:27:56,user:0:27:36,sys:0:0:8)[clk:167668] (3054 calls X 5.490111e+01)
scalar::copy_transposed : (real:0:2:10,user:0:2:7,sys:0:0:0)[clk:13054] (195456 calls X 6.678741e-02)
vector::in_place_curl : (real:0:0:47,user:0:0:47,sys:0:0:0)[clk:4788] (202 calls X 2.370297e+01)
scalar::local_energy : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:403] (66 calls X 6.106061e+00)
vector::vec_prod : (real:0:1:22,user:0:1:22,sys:0:0:0)[clk:8240] (202 calls X 4.079208e+01)
vector::project : (real:0:0:28,user:0:0:28,sys:0:0:0)[clk:2884] (101 calls X 2.855445e+01)
scalar::dealias : (real:0:0:39,user:0:0:38,sys:0:0:0)[clk:3901] (606 calls X 6.437294e+00)
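Each entry in these logs follows the pattern `name : (real:H:M:S,user:H:M:S,sys:H:M:S)[clk:N] (C calls X AVG)`, where root timers omit the call-count suffix. The short Python sketch below parses one entry and checks two things that seem to hold throughout the logs: the trailing average is simply `clk / calls`, and `clk` appears to count hundredths of a second (the `loop` timer's 294619 clk against a real time of 0:49:06, i.e. 2946 s). The regular expression and the helper names `parse_entry` and `hms_to_seconds` are illustrative assumptions, not part of the timing library that produced the logs.

```python
import re

# Assumed entry layout (inferred from the logs, not from the library source):
#   name : (real:H:M:S,user:H:M:S,sys:H:M:S)[clk:N] (C calls X AVG)
# The "(C calls X AVG)" suffix is optional; root timers omit it.
ENTRY = re.compile(
    r"(?P<name>.+?) : \(real:(?P<real>\d+:\d+:\d+),"
    r"user:(?P<user>\d+:\d+:\d+),sys:(?P<sys>\d+:\d+:\d+)\)"
    r"\[clk:(?P<clk>\d+)\]"
    r"(?: \((?P<calls>\d+) calls X (?P<avg>[-+.\deE]+)\))?"
)


def hms_to_seconds(hms: str) -> int:
    """Convert an H:M:S field such as '0:49:6' to whole seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s


def parse_entry(line: str) -> dict:
    """Parse a single timer entry into a dict (hypothetical helper)."""
    m = ENTRY.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a timer entry: {line!r}")
    return {
        "name": m["name"],
        "real_s": hms_to_seconds(m["real"]),
        "clk": int(m["clk"]),
        "calls": int(m["calls"]) if m["calls"] else None,
        "avg_clk": float(m["avg"]) if m["avg"] else None,
    }


# Two entries copied verbatim from the 128-processor run.
loop = parse_entry(
    "loop : (real:0:49:6,user:0:48:43,sys:0:0:8)[clk:294619] (100 calls X 2.946190e+03)"
)
root = parse_entry(
    "azur::array timer root : (real:0:9:11,user:0:9:10,sys:0:0:0)[clk:55182]"
)

# The per-call figure is clk / calls.
print(loop["clk"] / loop["calls"], loop["avg_clk"])
# clk / 100 lines up with the real time in seconds.
print(loop["clk"] / 100, loop["real_s"])
```

Since every per-call average in the logs is expressed in clk, a `loop` average of 2.946190e+03 would then correspond to about 29.5 s per time step.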

256 Processor run, global<1024,1024,1024> for 100 time steps
main timer : (real:0:40:47,user:0:40:13,sys:0:0:10)[clk:244736]
loop : (real:0:26:36,user:0:26:5,sys:0:0:8)[clk:159666] (100 calls X 1.596660e+03)
FFT timer : (real:0:34:14,user:0:33:42,sys:0:0:8)[clk:205439]
planification time : (real:0:12:53,user:0:12:52,sys:0:0:0)[clk:77311]
FFT and transposition time : (real:0:21:21,user:0:20:50,sys:0:0:8)[clk:128128] (509 calls X 2.517249e+02)
FFT only : (real:0:3:52,user:0:3:51,sys:0:0:0)[clk:23201] (2545 calls X 9.116306e+00)
Transposition only : (real:0:15:0,user:0:14:34,sys:0:0:7)[clk:90004] (1832 calls X 4.912882e+01)
azur::array timer root : (real:0:4:27,user:0:4:26,sys:0:0:0)[clk:26732]
view = expr : (real:0:4:27,user:0:4:26,sys:0:0:0)[clk:26731] (2538 calls X 1.053231e+01)
fftw3<dbl> id= dbl : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2055] (510 calls X 4.029412e+00)
fftw3<dbl> /= dbl : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2107] (612 calls X 3.442811e+00)
fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
fftw3<dbl> id= fftw3<dbl> : (real:0:0:42,user:0:0:42,sys:0:0:0)[clk:4216] (606 calls X 6.957096e+00)
fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11] (3 calls X 3.666667e+00)
fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:42,user:0:0:42,sys:0:0:0)[clk:4279] (202 calls X 2.118317e+01)
fftw4<dbl> id= fftw4<dbl> : (real:0:0:20,user:0:0:19,sys:0:0:0)[clk:2001] (103 calls X 1.942719e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:33] (1 calls X 3.300000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:25] (1 calls X 2.500000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:23] (1 calls X 2.300000e+01)
fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:19] (1 calls X 1.900000e+01)
fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:20] (1 calls X 2.000000e+01)
fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:30] (1 calls X 3.000000e+01)
fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:23,user:0:0:23,sys:0:0:0)[clk:2379] (99 calls X 2.403030e+01)
fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2134] (99 calls X 2.155556e+01)
fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:25,user:0:0:25,sys:0:0:0)[clk:2562] (99 calls X 2.587879e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:48,user:0:0:48,sys:0:0:0)[clk:4822] (198 calls X 2.435353e+01)
cubby::field timer root : (real:0:18:46,user:0:18:16,sys:0:0:8)[clk:112688]
scalar::transpose_blocks_when_received : (real:0:17:8,user:0:16:38,sys:0:0:8)[clk:102817] (3054 calls X 3.366634e+01)
scalar::copy_transposed : (real:0:0:49,user:0:0:45,sys:0:0:0)[clk:4958] (390912 calls X 1.268316e-02)
vector::in_place_curl : (real:0:0:23,user:0:0:23,sys:0:0:0)[clk:2357] (202 calls X 1.166832e+01)
scalar::local_energy : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:203] (66 calls X 3.075758e+00)
vector::vec_prod : (real:0:0:38,user:0:0:38,sys:0:0:0)[clk:3871] (202 calls X 1.916337e+01)
vector::project : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1331] (101 calls X 1.317822e+01)
scalar::dealias : (real:0:0:21,user:0:0:21,sys:0:0:0)[clk:2109] (606 calls X 3.480198e+00)
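Comparing the 128- and 256-processor runs timer by timer shows which parts of a time step actually benefit from doubling the processor count. A minimal sketch, with the `clk` totals copied by hand from the two logs (the dictionary keys are just the timer names; nothing here calls the original code):

```python
# Total clk counts for a few timers, copied from the 128- and 256-processor
# runs (global<1024,1024,1024>, 100 time steps).
clk_128 = {
    "loop": 294619,
    "planification time": 146079,
    "FFT only": 52091,
    "Transposition only": 146848,
    "view = expr": 55182,
}
clk_256 = {
    "loop": 159666,
    "planification time": 77311,
    "FFT only": 23201,
    "Transposition only": 90004,
    "view = expr": 26731,
}

# Speedup obtained by doubling the processor count; the ideal value is 2.0,
# so speedup / 2 is the parallel efficiency of this step.
speedups = {name: clk_128[name] / clk_256[name] for name in clk_128}

for name, s in speedups.items():
    print(f"{name:20s} {s:5.2f}x  (parallel efficiency {s / 2:.0%})")
```

With these numbers, `FFT only` and `view = expr` scale at or slightly above the ideal factor of 2 (about 2.25x and 2.06x), while `Transposition only` improves by only about 1.63x, which holds the overall `loop` speedup to about 1.85x.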

512 Processor run, global<1024,1024,1024> for 100 time steps
main timer : (real:0:30:29,user:0:29:45,sys:0:0:10)[clk:182999]
loop : (real:0:22:9,user:0:21:27,sys:0:0:9)[clk:132993] (100 calls X 1.329930e+03)
FFT timer : (real:0:27:46,user:0:27:2,sys:0:0:10)[clk:166607]
planification time : (real:0:6:55,user:0:6:55,sys:0:0:0)[clk:41545]
FFT and transposition time : (real:0:20:50,user:0:20:7,sys:0:0:9)[clk:125062] (509 calls X 2.457014e+02)
FFT only : (real:0:1:44,user:0:1:44,sys:0:0:0)[clk:10446] (2545 calls X 4.104519e+00)
Transposition only : (real:0:16:42,user:0:16:4,sys:0:0:8)[clk:100259] (1832 calls X 5.472653e+01)
azur::array timer root : (real:0:1:49,user:0:1:49,sys:0:0:0)[clk:10928]
view = expr : (real:0:1:49,user:0:1:49,sys:0:0:0)[clk:10928] (2538 calls X 4.305753e+00)
fftw3<dbl> id= dbl : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:668] (510 calls X 1.309804e+00)
fftw3<dbl> /= dbl : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:785] (612 calls X 1.282680e+00)
fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
fftw3<dbl> id= fftw3<dbl> : (real:0:0:17,user:0:0:17,sys:0:0:0)[clk:1768] (606 calls X 2.917492e+00)
fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (3 calls X 1.000000e+00)
fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:16,user:0:0:16,sys:0:0:0)[clk:1674] (202 calls X 8.287128e+00)
fftw4<dbl> id= fftw4<dbl> : (real:0:0:9,user:0:0:9,sys:0:0:0)[clk:900] (103 calls X 8.737864e+00)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:12] (1 calls X 1.200000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13] (1 calls X 1.300000e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9] (1 calls X 9.000000e+00)
fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1054] (99 calls X 1.064646e+01)
fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:9,user:0:0:8,sys:0:0:0)[clk:908] (99 calls X 9.171718e+00)
fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1037] (99 calls X 1.047475e+01)
fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:20,user:0:0:20,sys:0:0:0)[clk:2069] (198 calls X 1.044950e+01)
cubby::field timer root : (real:0:19:37,user:0:18:54,sys:0:0:9)[clk:117776]
scalar::transpose_blocks_when_received : (real:0:18:58,user:0:18:15,sys:0:0:9)[clk:113828] (3054 calls X 3.727177e+01)
scalar::copy_transposed : (real:0:0:23,user:0:0:16,sys:0:0:0)[clk:2315] (781824 calls X 2.961024e-03)
vector::in_place_curl : (real:0:0:8,user:0:0:8,sys:0:0:0)[clk:899] (202 calls X 4.450495e+00)
scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:87] (66 calls X 1.318182e+00)
vector::vec_prod : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1410] (202 calls X 6.980198e+00)
vector::project : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:548] (101 calls X 5.425743e+00)
scalar::dealias : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1004] (606 calls X 1.656766e+00)

1024 Processor run, global<1024,1024,1024> for 100 time steps
main timer : (real:0:19:0,user:0:18:58,sys:0:0:1)[clk:114091]
loop : (real:0:14:59,user:0:14:58,sys:0:0:0)[clk:89964] (100 calls X 8.996400e+02)
FFT timer : (real:0:17:20,user:0:17:18,sys:0:0:0)[clk:104015]
planification time : (real:0:3:19,user:0:3:19,sys:0:0:0)[clk:19918]
FFT and transposition time : (real:0:14:0,user:0:13:59,sys:0:0:0)[clk:84097] (509 calls X 1.652200e+02)
FFT only : (real:0:0:52,user:0:0:53,sys:0:0:0)[clk:5261] (2545 calls X 2.067191e+00)
Transposition only : (real:0:11:25,user:0:11:24,sys:0:0:0)[clk:68590] (1832 calls X 3.743996e+01)
azur::array timer root : (real:0:1:8,user:0:1:7,sys:0:0:0)[clk:6813]
view = expr : (real:0:1:8,user:0:1:7,sys:0:0:0)[clk:6811] (2538 calls X 2.683609e+00)
fftw3<dbl> id= dbl : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:475] (510 calls X 9.313725e-01)
fftw3<dbl> /= dbl : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:588] (612 calls X 9.607843e-01)
fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (1 calls X 3.000000e+00)
fftw3<dbl> id= fftw3<dbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1058] (606 calls X 1.745875e+00)
fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3] (3 calls X 1.000000e+00)
fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1067] (202 calls X 5.282178e+00)
fftw4<dbl> id= fftw4<dbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:527] (103 calls X 5.116505e+00)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6] (1 calls X 6.000000e+00)
fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7] (1 calls X 7.000000e+00)
fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4] (1 calls X 4.000000e+00)
fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5] (1 calls X 5.000000e+00)
fftw4<cdbl> id= ((fftw4<cdbl> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8] (1 calls X 8.000000e+00)
fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:624] (99 calls X 6.303030e+00)
fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:529] (99 calls X 5.343434e+00)
fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:647] (99 calls X 6.535354e+00)
fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:12,user:0:0:12,sys:0:0:0)[clk:1252] (198 calls X 6.323232e+00)
cubby::field timer root : (real:0:13:27,user:0:13:26,sys:0:0:0)[clk:80756]
scalar::transpose_blocks_when_received : (real:0:13:2,user:0:13:1,sys:0:0:0)[clk:78244] (3054 calls X 2.562017e+01)
scalar::copy_transposed : (real:0:0:12,user:0:0:13,sys:0:0:0)[clk:1298] (1563648 calls X 8.301101e-04)
vector::in_place_curl : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:606] (202 calls X 3.000000e+00)
scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:49] (66 calls X 7.424242e-01)
vector::vec_prod : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1024] (202 calls X 5.069307e+00)
vector::project : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:331] (101 calls X 3.277228e+00)
scalar::dealias : (real:0:0:5,user:0:0:4,sys:0:0:0)[clk:500] (606 calls X 8.250825e-01)
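Putting the four global<1024,1024,1024> runs side by side, a short Python sketch can compute three derived quantities: the processors × seconds product (truncating the `loop` time to whole seconds, which reproduces the `s_mono` figures listed for this grid), a strong-scaling efficiency relative to the 128-processor run, and the share of `FFT and transposition time` spent in `Transposition only`. It assumes that `clk` counts hundredths of a second, which matches every run here (e.g. `loop` clk 89964 against real 0:14:59 at 1024 processors). The `runs` table and variable names are illustrative.

```python
# clk totals copied from the four global<1024,1024,1024> runs.
# Assumption: 1 clk = 1/100 s (consistent with the real-time fields).
runs = {
    # procs: (loop, FFT and transposition time, Transposition only)
    128: (294619, 224839, 146848),
    256: (159666, 128128, 90004),
    512: (132993, 125062, 100259),
    1024: (89964, 84097, 68590),
}

base_p = min(runs)
base_loop = runs[base_p][0]
results = []

for p, (loop, fft_tr, tr) in sorted(runs.items()):
    seconds = loop / 100                    # wall time for the 100 time steps
    proc_seconds = p * int(seconds)         # processors x whole seconds
    # Strong-scaling efficiency: measured speedup divided by ideal speedup.
    efficiency = (base_loop / loop) / (p / base_p)
    transp_share = tr / fft_tr              # fraction of FFT+transpose spent transposing
    results.append((p, proc_seconds, efficiency, transp_share))
    print(f"{p:5d} procs: {seconds:8.2f} s, {proc_seconds:7d} proc*s, "
          f"efficiency {efficiency:.2f}, transposition {transp_share:.0%}")
```

On these logs the efficiency falls to roughly 0.41 at 1024 processors, while the transposition share of the FFT phase grows from about 65% to about 82%, which suggests that the transposes, rather than the FFTs themselves, limit scaling at this grid size.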
Last modified on Mar 24, 2011 3:14:01 PM
Attachments (4)
- time_monoprocs_jade.png (40.6 KB): Time step (monoprocs) by grid points for jade
- time_step_jade.png (39.1 KB): Time step by grid points for jade
- time_monoprocs_jade_log.png (32.7 KB): Time step (monoprocs), log scale
- time_step_jade_log.png (31.9 KB): Time step, log scale