wiki:Fripp8G

Release 1480, 4 cpus for a 128^3 space

A very small diff, which indicates that the PathScale compiler has issues with simple high-level optimizations (gcc 1.4.4 does fine on that one).
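
To compare the runs on this page without reading the numbers by eye, the timer lines can be parsed mechanically. The sketch below is only an illustration: the regular expression and the helper names (`parse_timer_line`, `hms_to_seconds`) are hypothetical and not part of the benchmark code. It assumes, from the logs themselves, that `real`/`user`/`sys` are given as hours:minutes:seconds, that `clk` counts hundredths of a second (e.g. real 0:3:48 next to clk 22817), and that the figure after `X` is `clk` divided by the number of calls.

```python
import re

# Hypothetical pattern for one timer line as printed in the logs on this page, e.g.
#   "loop : (real:0:2:25,user:0:2:20,sys:0:0:4)[clk:14562]\t(100 calls X 1.456200e+02)"
# The trailing "(N calls X per_call)" part is optional (root timers omit it).
TIMER_RE = re.compile(
    r"^\s*(?P<name>.+?) : "
    r"\(real:(?P<real>\d+:\d+:\d+),user:(?P<user>\d+:\d+:\d+),sys:(?P<sys>\d+:\d+:\d+)\)"
    r"\[clk:(?P<clk>\d+)\]"
    r"(?:\s*\((?P<calls>\d+) calls X (?P<per_call>[0-9.eE+-]+)\))?\s*$"
)


def hms_to_seconds(hms):
    """Convert 'h:m:s' to seconds (assumed meaning of the real/user/sys fields)."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return 3600 * hours + 60 * minutes + seconds


def parse_timer_line(line):
    """Return a dict describing one timer line, or None if the line does not match."""
    match = TIMER_RE.match(line)
    if match is None:
        return None
    calls = match.group("calls")
    return {
        "name": match.group("name"),
        "real_s": hms_to_seconds(match.group("real")),
        "user_s": hms_to_seconds(match.group("user")),
        "sys_s": hms_to_seconds(match.group("sys")),
        "clk": int(match.group("clk")),
        "calls": int(calls) if calls else None,
        "per_call": float(match.group("per_call")) if calls else None,
    }


if __name__ == "__main__":
    sample = "\tloop : (real:0:2:25,user:0:2:20,sys:0:0:4)[clk:14562]\t(100 calls X 1.456200e+02)"
    entry = parse_timer_line(sample)
    # The per-call figure should agree with clk / calls.
    print(entry["name"], entry["clk"] / entry["calls"], entry["per_call"])
```

Lines that are not timers (the banner, the run description, the shell prompt) simply return `None`, so a whole pasted log can be fed through `parse_timer_line` line by line.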

****************************************************
max number of allocated scalars: 39
 ---------------------------------------------
   4 Processor run, global<128,128,128>
   for 100 time steps
main timer : (real:0:3:48,user:0:3:43,sys:0:0:4)[clk:22817]
	loop : (real:0:2:25,user:0:2:20,sys:0:0:4)[clk:14562]	(100 calls X 1.456200e+02)
	FFT timer : (real:0:2:31,user:0:2:31,sys:0:0:0)[clk:15192]
		FFT and transposition time : (real:0:2:31,user:0:2:31,sys:0:0:0)[clk:15192]	(512 calls X 2.967188e+01)
			FFT only : (real:0:0:38,user:0:0:38,sys:0:0:0)[clk:3849]	(2560 calls X 1.503516e+00)
		planification time : (real:0:1:18,user:0:1:18,sys:0:0:0)[clk:7889]
	azur::array timer root : (real:0:0:50,user:0:0:49,sys:0:0:1)[clk:5073]
		view = expr : (real:0:0:50,user:0:0:49,sys:0:0:1)[clk:5073]	(2380 calls X 2.131513e+00)
			basic3<flt> id= (basic3<int> + flt) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1]	(2 calls X 5.000000e-01)
			fftw4<cdbl> id= cdbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:86]	(106 calls X 8.113208e-01)
			fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3]	(4 calls X 7.500000e-01)
			fftw3<dbl> id= fftw3<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4]	(9 calls X 4.444444e-01)
			fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5]	(6 calls X 8.333333e-01)
			fftw3<dbl> id= dbl : (real:0:0:2,user:0:0:0,sys:0:0:1)[clk:209]	(516 calls X 4.050388e-01)
			fftw3<dbl> /= dbl : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:366]	(618 calls X 5.922330e-01)
			fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1348]	(406 calls X 3.320197e+00)
			fftw4<dbl> id= fftw4<dbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:339]	(205 calls X 1.653659e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13]	(2 calls X 6.500000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:10]	(2 calls X 5.000000e+00)
			fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7]	(2 calls X 3.500000e+00)
			fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9]	(2 calls X 4.500000e+00)
			fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:489]	(100 calls X 4.890000e+00)
			fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:386]	(100 calls X 3.860000e+00)
			fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:472]	(100 calls X 4.720000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1325]	(200 calls X 6.625000e+00)
	cubby::field timer root : (real:0:0:24,user:0:0:24,sys:0:0:0)[clk:2433]
		vector::in_place_curl : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:759]	(204 calls X 3.720588e+00)
		vector::vec_prod : (real:0:0:6,user:0:0:7,sys:0:0:0)[clk:696]	(204 calls X 3.411765e+00)
		vector::project : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:431]	(102 calls X 4.225490e+00)
		scalar::dealias : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:424]	(612 calls X 6.928105e-01)
		scalar::local_energy : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:123]	(66 calls X 1.863636e+00)
[alainm@fripp77 bench]$ 

Release 1479, 4 cpus for a 128^3 space

max number of allocated scalars: 39
 ---------------------------------------------
   4 Processor run, global<128,128,128>
   for 100 time steps
main timer : (real:0:3:52,user:0:3:47,sys:0:0:4)[clk:23211]
	loop : (real:0:2:30,user:0:2:25,sys:0:0:4)[clk:15007]	(100 calls X 1.500700e+02)
	FFT timer : (real:0:2:29,user:0:2:29,sys:0:0:0)[clk:14977]
		FFT and transposition time : (real:0:2:29,user:0:2:29,sys:0:0:0)[clk:14977]	(512 calls X 2.925195e+01)
			FFT only : (real:0:0:36,user:0:0:34,sys:0:0:0)[clk:3615]	(2560 calls X 1.412109e+00)
		planification time : (real:0:1:18,user:0:1:18,sys:0:0:0)[clk:7823]
	azur::array timer root : (real:0:0:49,user:0:0:48,sys:0:0:1)[clk:4944]
		view = expr : (real:0:0:49,user:0:0:48,sys:0:0:1)[clk:4944]	(2380 calls X 2.077311e+00)
			basic3<flt> id= (basic3<int> + flt) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:0]	(2 calls X 0.000000e+00)
			fftw4<cdbl> id= cdbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:83]	(106 calls X 7.830189e-01)
			fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4]	(4 calls X 1.000000e+00)
			fftw3<dbl> id= fftw3<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4]	(9 calls X 4.444444e-01)
			fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4]	(6 calls X 6.666667e-01)
			fftw3<dbl> id= dbl : (real:0:0:1,user:0:0:0,sys:0:0:1)[clk:187]	(516 calls X 3.624031e-01)
			fftw3<dbl> /= dbl : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:357]	(618 calls X 5.776699e-01)
			fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:12,user:0:0:13,sys:0:0:0)[clk:1291]	(406 calls X 3.179803e+00)
			fftw4<dbl> id= fftw4<dbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:330]	(205 calls X 1.609756e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13]	(2 calls X 6.500000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9]	(2 calls X 4.500000e+00)
			fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7]	(2 calls X 3.500000e+00)
			fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:9]	(2 calls X 4.500000e+00)
			fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:489]	(100 calls X 4.890000e+00)
			fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:364]	(100 calls X 3.640000e+00)
			fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:473]	(100 calls X 4.730000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1319]	(200 calls X 6.595000e+00)
	cubby::field timer root : (real:0:0:31,user:0:0:31,sys:0:0:0)[clk:3142]
		vector::in_place_curl : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:747]	(204 calls X 3.661765e+00)
		vector::vec_prod : (real:0:0:14,user:0:0:14,sys:0:0:0)[clk:1418]	(204 calls X 6.950980e+00)
		vector::project : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:431]	(102 calls X 4.225490e+00)
		scalar::dealias : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:430]	(612 calls X 7.026144e-01)
		scalar::local_energy : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:116]	(66 calls X 1.757576e+00)
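
Between the release 1480 and release 1479 logs above, the `cubby::field` sub-timers are nearly identical except for `vector::vec_prod`. The sketch below makes that comparison explicit using the `clk` values copied from those two logs. The helper names (`speedups`, `largest_change`) are hypothetical, and reading the change as the effect of the small diff mentioned at the top of the page is an assumption, since the page does not say which routine the diff touched.

```python
# clk values copied from the "cubby::field timer root" sections of the
# release 1480 and release 1479 logs on this page (4 cpus, 128^3).
CLK_1480 = {
    "vector::in_place_curl": 759,
    "vector::vec_prod": 696,
    "vector::project": 431,
    "scalar::dealias": 424,
    "scalar::local_energy": 123,
}
CLK_1479 = {
    "vector::in_place_curl": 747,
    "vector::vec_prod": 1418,
    "vector::project": 431,
    "scalar::dealias": 430,
    "scalar::local_energy": 116,
}


def speedups(old, new):
    """Ratio old/new per timer; values above 1 mean the new release is faster."""
    return {name: old[name] / new[name] for name in old if name in new and new[name] > 0}


def largest_change(old, new):
    """Return (timer name, ratio) for the timer whose ratio is farthest from 1."""
    ratios = speedups(old, new)
    name = max(ratios, key=lambda n: abs(ratios[n] - 1.0))
    return name, ratios[name]


if __name__ == "__main__":
    for name, ratio in sorted(speedups(CLK_1479, CLK_1480).items()):
        print(f"{name:24s} {ratio:5.2f}x")
    print("largest change:", largest_change(CLK_1479, CLK_1480))
```

The other four timers stay within a few percent of each other, which is within what one would expect from run-to-run noise on timers this short.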

Release 1473, 4 cpus for a 128^3 space

max number of allocated scalars: 39
 ---------------------------------------------
   4 Processor run, global<128,128,128>
   for 100 time steps
main timer : (real:0:3:45,user:0:3:39,sys:0:0:5)[clk:22524]
	loop : (real:0:2:23,user:0:2:17,sys:0:0:5)[clk:14323]	(100 calls X 1.432300e+02)
	FFT timer : (real:0:2:32,user:0:2:32,sys:0:0:0)[clk:15229]
		FFT and transposition time : (real:0:2:32,user:0:2:32,sys:0:0:0)[clk:15229]	(512 calls X 2.974414e+01)
			FFT only : (real:0:0:42,user:0:0:43,sys:0:0:0)[clk:4273]	(2560 calls X 1.669141e+00)
		planification time : (real:0:1:18,user:0:1:17,sys:0:0:0)[clk:7810]
	azur::array timer root : (real:0:0:39,user:0:0:38,sys:0:0:1)[clk:3998]
		view = expr : (real:0:0:39,user:0:0:38,sys:0:0:1)[clk:3997]	(2380 calls X 1.679412e+00)
			V3 x= expr_n(V3) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1]	(2 calls X 5.000000e-01)
			V4 = R : (real:0:0:1,user:0:0:0,sys:0:0:0)[clk:101]	(106 calls X 9.528302e-01)
			V4 = R : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2]	(4 calls X 5.000000e-01)
			V3 = V3 : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3]	(9 calls X 3.333333e-01)
			V3 x= R : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:2]	(6 calls X 3.333333e-01)
			V3 = R : (real:0:0:2,user:0:0:0,sys:0:0:1)[clk:225]	(516 calls X 4.360465e-01)
			V3 x= R : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:357]	(618 calls X 5.776699e-01)
			V4 = V4 : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:515]	(406 calls X 1.268473e+00)
			V4 = V4 : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:289]	(205 calls X 1.409756e+00)
			V4 x= V4 x ( V4 x V4 ) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:13]	(2 calls X 6.500000e+00)
			V4 x= V4 x V4 : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8]	(2 calls X 4.000000e+00)
			V4 x= V4 : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:7]	(2 calls X 3.500000e+00)
			V4 x= V4 : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5]	(2 calls X 2.500000e+00)
			V4 x= V4 x V4 : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:468]	(100 calls X 4.680000e+00)
			V4 x= V4 : (real:0:0:2,user:0:0:1,sys:0:0:0)[clk:201]	(100 calls X 2.010000e+00)
			V4 x= V4 x V4 : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:478]	(100 calls X 4.780000e+00)
			V4 x= V4 x ( V4 x V4 ) : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1321]	(200 calls X 6.605000e+00)
	cubby::field timer root : (real:0:0:31,user:0:0:31,sys:0:0:0)[clk:3153]
		vector::in_place_curl : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:772]	(204 calls X 3.784314e+00)
		vector::vec_prod : (real:0:0:13,user:0:0:13,sys:0:0:0)[clk:1396]	(204 calls X 6.843137e+00)
		vector::project : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:441]	(102 calls X 4.323529e+00)
		scalar::dealias : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:429]	(612 calls X 7.009804e-01)
		scalar::local_energy : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:114]	(66 calls X 1.727273e+00)
[alainm@fripp143 bench]$   

Release [1481], 32 cpus for a 256^3 space

****************************************************
max number of allocated scalars: 39
 ---------------------------------------------
   32 Processor run, global<256,256,256>
   for 100 time steps
main timer : (real:0:4:48,user:0:4:41,sys:0:0:6)[clk:28864]
	loop : (real:0:3:2,user:0:2:55,sys:0:0:5)[clk:18207]	(100 calls X 1.820700e+02)
	FFT timer : (real:0:3:40,user:0:3:39,sys:0:0:0)[clk:22016]
		FFT and transposition time : (real:0:3:40,user:0:3:39,sys:0:0:0)[clk:22016]	(512 calls X 4.300000e+01)
			FFT only : (real:0:0:37,user:0:0:37,sys:0:0:0)[clk:3798]	(4302 calls X 8.828452e-01)
		planification time : (real:0:1:40,user:0:1:40,sys:0:0:0)[clk:10054]
	azur::array timer root : (real:0:0:44,user:0:0:42,sys:0:0:1)[clk:4404]
		view = expr : (real:0:0:44,user:0:0:42,sys:0:0:1)[clk:4404]	(2380 calls X 1.850420e+00)
			basic3<flt> id= (basic3<int> + flt) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:1]	(2 calls X 5.000000e-01)
			fftw4<cdbl> id= cdbl : (real:0:0:1,user:0:0:1,sys:0:0:0)[clk:134]	(106 calls X 1.264151e+00)
			fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:4]	(4 calls X 1.000000e+00)
			fftw3<dbl> id= fftw3<dbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6]	(9 calls X 6.666667e-01)
			fftw3<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:3]	(6 calls X 5.000000e-01)
			fftw3<dbl> id= dbl : (real:0:0:2,user:0:0:0,sys:0:0:1)[clk:241]	(516 calls X 4.670543e-01)
			fftw3<dbl> /= dbl : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:333]	(618 calls X 5.388349e-01)
			fftw4<cdbl> id= fftw4<cdbl> : (real:0:0:9,user:0:0:9,sys:0:0:0)[clk:967]	(406 calls X 2.381773e+00)
			fftw4<dbl> id= fftw4<dbl> : (real:0:0:4,user:0:0:4,sys:0:0:0)[clk:436]	(205 calls X 2.126829e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * ((fftw4<cdbl> * dbl) + fftw4<cdbl>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:10]	(2 calls X 5.000000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> * (fftw4<cdbl> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:8]	(2 calls X 4.000000e+00)
			fftw4<cdbl> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:5]	(2 calls X 2.500000e+00)
			fftw4<cdbl> += fftw4<cdbl> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:6]	(2 calls X 3.000000e+00)
			fftw4<cdbl> id= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:378]	(100 calls X 3.780000e+00)
			fftw4<cdbl> -= fftw4<cdbl> : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:319]	(100 calls X 3.190000e+00)
			fftw4<cdbl> -= ((fftw4<cdbl> * dbl) * s2v<basic3<dbl>>) : (real:0:0:5,user:0:0:5,sys:0:0:0)[clk:539]	(100 calls X 5.390000e+00)
			fftw4<cdbl> id= (s2v<basic3<dbl>> swp(*) (fftw4<cdbl> + (fftw4<cdbl> * dbl))) : (real:0:0:10,user:0:0:10,sys:0:0:0)[clk:1011]	(200 calls X 5.055000e+00)
	cubby::field timer root : (real:0:1:36,user:0:1:36,sys:0:0:0)[clk:9697]
		scalar::transpose_blocks_when_received : (real:0:1:15,user:0:1:15,sys:0:0:0)[clk:7565]	(3072 calls X 2.462565e+00)
			scalar::copy_transposed : (real:0:0:8,user:0:0:8,sys:0:0:0)[clk:895]	(49152 calls X 1.820882e-02)
		vector::in_place_curl : (real:0:0:6,user:0:0:6,sys:0:0:0)[clk:644]	(204 calls X 3.156863e+00)
		vector::vec_prod : (real:0:0:7,user:0:0:7,sys:0:0:0)[clk:759]	(204 calls X 3.720588e+00)
		vector::project : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:379]	(102 calls X 3.715686e+00)
		scalar::dealias : (real:0:0:3,user:0:0:3,sys:0:0:0)[clk:309]	(612 calls X 5.049019e-01)
		scalar::local_energy : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:41]	(66 calls X 6.212121e-01)
[ponty@projekct TEST1]$              
Last modified on Aug 19, 2010 1:12:40 PM