wiki:Licallo2048P2P

Description

vb_basic test in release mode.

This is a 2048³ run on 64 nodes × 2 CPUs × 4 cores, i.e. 512 MPI processes.
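As a quick sanity check on the sizes involved, the sketch below works out each process's share of the grid. It assumes one MPI process per core (consistent with `-npernode 8` in the submission script and the "512 Processor run" line of the trace), a slab decomposition along one axis and 8-byte doubles. The decomposition cubby actually uses is not stated on this page, so treat the slab layout as an assumption.

```python
# Back-of-the-envelope sizing for a 2048^3 run on 64 nodes x 2 CPUs x 4 cores.
# Assumptions (not taken from cubby itself): one MPI process per core,
# a slab decomposition along one axis, and 8-byte doubles.

N = 2048                      # points per dimension (--cube-dim=2048)
NODES, CPUS, CORES = 64, 2, 4
BYTES_PER_DOUBLE = 8

procs = NODES * CPUS * CORES          # number of MPI processes
total_points = N ** 3                 # global grid size
points_per_proc = total_points // procs
planes_per_proc = N // procs          # slab thickness if one axis is split

# Memory for one real scalar field, globally and per process.
field_bytes_total = total_points * BYTES_PER_DOUBLE
field_bytes_per_proc = points_per_proc * BYTES_PER_DOUBLE

if __name__ == "__main__":
    print(f"processes           : {procs}")
    print(f"grid points         : {total_points:,}")
    print(f"points per process  : {points_per_proc:,}")
    print(f"planes per process  : {planes_per_proc}")
    print(f"one real field      : {field_bytes_total / 2**30:.0f} GiB total, "
          f"{field_bytes_per_proc / 2**20:.0f} MiB per process")
```

It reports 512 processes, 4 planes per process and 128 MiB per process for a single real field.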

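Right now, transposition dominates: in the trace below it takes 0:33:38 of the 1:01:46 total, against 0:15:45 for the FFTs themselves. We need to see whether multi-threading would help by reducing the number of messages (while increasing their size). Multi-threading can easily be applied to the host computation hot spots (mostly the FFTs).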
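The message-count trade-off can be made concrete with a simple all-to-all model. This sketch assumes that each transposition is a full exchange in which every process sends one block to every other process, that the data are complex doubles (16 bytes per point), and that threads inside a process do not exchange MPI messages. The helper `alltoall_messages` is hypothetical and the model ignores latency and bandwidth constants.

```python
# Simple all-to-all model of one global transposition.
# Assumptions: every process sends one block to each other process,
# the payload is complex double (16 bytes per point), and threads within
# a process do not exchange MPI messages.

N = 2048
CORES_TOTAL = 512
BYTES_PER_POINT = 16  # complex<double>

def alltoall_messages(n: int, procs: int, bytes_per_point: int):
    """Return (messages sent per process, messages in total, bytes per message)."""
    per_proc = procs - 1                        # one block to every other process
    total = procs * per_proc
    points_per_block = n ** 3 // procs ** 2     # local slab cut into one block per process
    return per_proc, total, points_per_block * bytes_per_point

if __name__ == "__main__":
    for threads in (1, 4, 8):                   # threads per MPI process
        procs = CORES_TOTAL // threads
        per_proc, total, size = alltoall_messages(N, procs, BYTES_PER_POINT)
        print(f"{threads} thread(s)/process: {procs:3d} processes, "
              f"{per_proc:3d} msgs/process, {total:6d} msgs total, "
              f"{size / 2**20:g} MiB per message")
```

In this model, going from one to eight threads per process cuts the number of messages from 261,632 to 4,032 while each message grows from 0.5 MiB to 32 MiB; the total volume moved stays about the same.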

Submission script

#!/bin/bash
#OAR -n vb2048-p2p
#OAR -l /nodes=64/cpu=2,walltime=20:00:00
#OAR -p gpu='NO'

ulimit -c unlimited
. /softs/openmpi-1.4.3-intel-11/env.sh
. /softs/boost_1_47_0-intel-11/env.sh
. /softs/fftw-3.3-intel-11/env.sh

echo $LD_LIBRARY_PATH

/softs/openmpi-1.4.3-intel-11/bin/mpiexec --mca mpi_yield_when_idle 1 --mca mpi_leave_pinned 0 --mca mpi_preconnect_mpi 1 \
    -x LD_LIBRARY_PATH -npernode 8 -machinefile $OAR_FILE_NODES \
    ../../../cubby --fft-verbose --cube-dim=2048 --fftw-planner=measure --fft-eff=speed --transposition=p2p

cubby.data

magnetic
Nts                 100
out_energy          10
v_check             1
velocity_samples_nb 10

The performance trace

   512 Processor run, global<2048,2048,2048>
   for 100 time steps
main timer : (real:1:1:46,user:0:43:55,sys:0:17:50)[clk:370607]
	loop : (real:0:57:22,user:0:40:51,sys:0:16:29)[clk:344217]	(100 calls X 3.442170e+03)
	FFT timer : (real:0:50:11,user:0:33:14,sys:0:16:56)[clk:301102]
		planification time : (real:0:0:9,user:0:0:8,sys:0:0:1)[clk:955]
		FFT and transposition time : (real:0:50:1,user:0:33:5,sys:0:16:55)[clk:300147]	(557 calls X 5.388635e+02)
			FFT only : (real:0:15:45,user:0:15:45,sys:0:0:0)[clk:94588]	(4773 calls X 1.981731e+01)
			FFTW Transposition only : (real:0:33:38,user:0:16:42,sys:0:16:55)[clk:201819]	(2108 calls X 9.573956e+01)
	azur::array timer root : (real:0:6:35,user:0:6:13,sys:0:0:21)[clk:39523]
		view = expr : (real:0:6:35,user:0:6:13,sys:0:0:21)[clk:39523]	(1680 calls X 2.352559e+01)
			fftw3<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:11]	(2 calls X 5.500000e+00)
			fftw4<c<dbl>> /= long int : (real:0:0:35,user:0:0:35,sys:0:0:0)[clk:3527]	(208 calls X 1.695673e+01)
			fftw4<dbl> id= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:17]	(1 calls X 1.700000e+01)
			fftw3<dbl> id= fftw3<dbl> : (real:0:1:19,user:0:1:16,sys:0:0:2)[clk:7925]	(654 calls X 1.211774e+01)
			fftw4<dbl> *= dbl : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:15]	(1 calls X 1.500000e+01)
			fftw4<c<dbl>> id= fftw4<c<dbl>> : (real:0:0:53,user:0:0:53,sys:0:0:0)[clk:5351]	(202 calls X 2.649010e+01)
			fftw4<dbl> id= fftw4<dbl> : (real:0:0:32,user:0:0:13,sys:0:0:18)[clk:3200]	(107 calls X 2.990654e+01)
			fftw4<c<dbl>> id= (s2v<basic3<dbl>> * ((fftw4<c<dbl>> * dbl) + fftw4<c<dbl>>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:52]	(1 calls X 5.200000e+01)
			fftw4<c<dbl>> id= (s2v<basic3<dbl>> * ((fftw4<c<dbl>> * dbl) + fftw4<c<dbl>>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:49]	(1 calls X 4.900000e+01)
			fftw4<c<dbl>> id= (s2v<basic3<dbl>> * (fftw4<c<dbl>> * dbl)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:34]	(1 calls X 3.400000e+01)
			fftw4<c<dbl>> *= s2v<basic3<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:45]	(1 calls X 4.500000e+01)
			fftw4<c<dbl>> += fftw4<c<dbl>> : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:33]	(1 calls X 3.300000e+01)
			fftw4<c<dbl>> id= ((fftw4<c<dbl>> * s2v<basic3<dbl>>) + (s2v<basic3<dbl>> * (fftw4<c<dbl>> * dbl))) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:65]	(1 calls X 6.500000e+01)
			fftw3<c<dbl>> id= (fftw3<c<dbl>> swp(+) (fftw3<c<dbl>> + fftw3<c<dbl>>)) : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:58](4 calls X 1.450000e+01)
			fftw4<c<dbl>> id= ((fftw4<c<dbl>> * dbl) * s2v<basic3<dbl>>) : (real:0:0:34,user:0:0:35,sys:0:0:0)[clk:3491]	(99 calls X 3.526263e+01)
			fftw4<c<dbl>> -= fftw4<c<dbl>> : (real:0:0:29,user:0:0:29,sys:0:0:0)[clk:2916]	(99 calls X 2.945455e+01)
			fftw4<c<dbl>> -= ((fftw4<c<dbl>> * dbl) * s2v<basic3<dbl>>) : (real:0:0:42,user:0:0:42,sys:0:0:0)[clk:4286]	(99 calls X 4.329293e+01)
			fftw4<c<dbl>> id= (s2v<basic3<dbl>> swp(*) (fftw4<c<dbl>> + (fftw4<c<dbl>> * dbl))) : (real:0:1:24,user:0:1:24,sys:0:0:0)[clk:8446]	(198 calls X 4.265657e+01)
	cubby::field timer root : (real:0:36:46,user:0:19:50,sys:0:16:56)[clk:220685]
		scalar::transpose_blocks_when_received : (real:0:33:35,user:0:16:42,sys:0:16:53)[clk:201567]	(3222 calls X 6.255959e+01)
			scalar::copy_transposed : (real:0:1:45,user:0:1:45,sys:0:0:0)[clk:10549]	(814592 calls X 1.295004e-02)
		vector::in_place_curl : (real:0:0:31,user:0:0:31,sys:0:0:0)[clk:3135]	(202 calls X 1.551980e+01)
		scalar::local_energy : (real:0:0:2,user:0:0:2,sys:0:0:0)[clk:271]	(66 calls X 4.106061e+00)
		vector::vec_prod : (real:0:0:51,user:0:0:51,sys:0:0:0)[clk:5156]	(202 calls X 2.552475e+01)
		vector::project : (real:0:0:38,user:0:0:38,sys:0:0:0)[clk:3826]	(101 calls X 3.788119e+01)
		scalar::dealias : (real:0:0:58,user:0:0:58,sys:0:0:0)[clk:5892]	(606 calls X 9.722773e+00)
		vector::div : (real:0:0:2,user:0:0:1,sys:0:0:0)[clk:252]	(4 calls X 6.300000e+01)
		scalar::dx : (real:0:0:2,user:0:0:1,sys:0:0:0)[clk:214]	(16 calls X 1.337500e+01)
		scalar::dy : (real:0:0:2,user:0:0:1,sys:0:0:0)[clk:220]	(16 calls X 1.375000e+01)
		scalar::dz : (real:0:0:2,user:0:0:1,sys:0:0:0)[clk:221]	(16 calls X 1.381250e+01)
		vector::max_abs_pos_help : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:94]	(4 calls X 2.350000e+01)
		vector::local_max_abs_pos_help : (real:0:0:0,user:0:0:0,sys:0:0:0)[clk:50]	(4 calls X 1.250000e+01)
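To read the trace quantitatively, the sketch below parses the `(real:H:M:S,...)[clk:N]` timer lines and expresses each one as a share of the time-step loop. It assumes that `clk` counts hundredths of a second, which matches the trace (the loop's 0:57:22 real time corresponds to clk:344217). The regular expression and helper names are hypothetical and only cover the line format quoted here.

```python
import re

# A few timer lines copied from the trace above (indentation dropped).
TRACE = """\
loop : (real:0:57:22,user:0:40:51,sys:0:16:29)[clk:344217] (100 calls X 3.442170e+03)
FFT only : (real:0:15:45,user:0:15:45,sys:0:0:0)[clk:94588] (4773 calls X 1.981731e+01)
FFTW Transposition only : (real:0:33:38,user:0:16:42,sys:0:16:55)[clk:201819] (2108 calls X 9.573956e+01)
azur::array timer root : (real:0:6:35,user:0:6:13,sys:0:0:21)[clk:39523]
"""

# name : (real:H:M:S,user:H:M:S,sys:H:M:S)[clk:N], optionally followed by (calls X avg)
LINE = re.compile(
    r"^\s*(?P<name>.+?) : \(real:(?P<real>[\d:]+),user:(?P<user>[\d:]+),"
    r"sys:(?P<sys>[\d:]+)\)\[clk:(?P<clk>\d+)\]"
    r"(?:\s*\((?P<calls>\d+) calls X (?P<avg>[\d.e+-]+)\))?"
)

def hms_to_seconds(hms: str) -> int:
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

def parse(trace: str) -> dict:
    """Map timer name -> seconds (clk assumed to be 1/100 s), real seconds and call count."""
    timers = {}
    for line in trace.splitlines():
        match = LINE.match(line)
        if match is None:
            continue
        timers[match["name"]] = {
            "seconds": int(match["clk"]) / 100.0,
            "real": hms_to_seconds(match["real"]),
            "calls": int(match["calls"]) if match["calls"] else None,
        }
    return timers

if __name__ == "__main__":
    timers = parse(TRACE)
    loop = timers["loop"]["seconds"]
    for name, t in timers.items():
        print(f"{name:26s} {t['seconds']:9.2f} s  {100 * t['seconds'] / loop:5.1f}% of loop")
```

On these lines it puts the transposition at about 58.6% of the loop and the FFTs themselves at about 27.5%.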

Results

energy_b

0.000000e+00 1.500000000000794e+00
1.000000e-02 1.470332161400734e+00
2.000000e-02 1.441308650819588e+00
3.000000e-02 1.412902100169330e+00
4.000000e-02 1.385088266721843e+00
5.000000e-02 1.357845578006378e+00
6.000000e-02 1.331154740654514e+00
7.000000e-02 1.304998405197896e+00
8.000000e-02 1.279360879032680e+00
9.000000e-02 1.254227880876765e+00
1.000000e-01 1.229586330981230e+00

energy_v

0.000000e+00 1.250000000000056e-01
1.000000e-02 1.177205428581816e-01
2.000000e-02 1.108648892214804e-01
3.000000e-02 1.044082833349543e-01
4.000000e-02 9.832744665994622e-02
5.000000e-02 9.260048207102142e-02
6.000000e-02 8.720678576620740e-02
7.000000e-02 8.212696607670689e-02
8.000000e-02 7.734276841312238e-02
9.000000e-02 7.283700569788115e-02
1.000000e-01 6.859349372779107e-02
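A quick sanity check can be run on the two energy series. This is a minimal sketch, assuming each file holds two whitespace-separated columns (time, energy) as listed above. It also assumes that `out_energy 10` in cubby.data means one line every 10 of the `Nts 100` steps and that the time step is constant, which would put it at 1e-3. The helpers are hypothetical, not tools shipped with cubby.

```python
# Sanity check of the energy series listed above (time, energy per line).
# Assumption: out_energy=10 means one line every 10 of the Nts=100 steps and
# the time step is constant, so dt = (spacing between lines) / OUT_ENERGY.

OUT_ENERGY = 10  # from cubby.data

ENERGY_B = """\
0.000000e+00 1.500000000000794e+00
1.000000e-02 1.470332161400734e+00
2.000000e-02 1.441308650819588e+00
3.000000e-02 1.412902100169330e+00
4.000000e-02 1.385088266721843e+00
5.000000e-02 1.357845578006378e+00
6.000000e-02 1.331154740654514e+00
7.000000e-02 1.304998405197896e+00
8.000000e-02 1.279360879032680e+00
9.000000e-02 1.254227880876765e+00
1.000000e-01 1.229586330981230e+00
"""

ENERGY_V = """\
0.000000e+00 1.250000000000056e-01
1.000000e-02 1.177205428581816e-01
2.000000e-02 1.108648892214804e-01
3.000000e-02 1.044082833349543e-01
4.000000e-02 9.832744665994622e-02
5.000000e-02 9.260048207102142e-02
6.000000e-02 8.720678576620740e-02
7.000000e-02 8.212696607670689e-02
8.000000e-02 7.734276841312238e-02
9.000000e-02 7.283700569788115e-02
1.000000e-01 6.859349372779107e-02
"""

def parse_series(text: str):
    """Return (times, values) from two whitespace-separated columns."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    times = [float(t) for t, _ in rows]
    values = [float(v) for _, v in rows]
    return times, values

def is_decreasing(values) -> bool:
    """True if every value is strictly smaller than the one before it."""
    return all(b < a for a, b in zip(values, values[1:]))

if __name__ == "__main__":
    for name, text in (("energy_b", ENERGY_B), ("energy_v", ENERGY_V)):
        times, values = parse_series(text)
        dt = (times[1] - times[0]) / OUT_ENERGY
        drop = 1.0 - values[-1] / values[0]
        print(f"{name}: {len(values)} samples, dt ~ {dt:g}, "
              f"decreasing={is_decreasing(values)}, relative drop={drop:.1%}")
```

On this run both series decrease monotonically, energy_b by about 18% and energy_v by about 45% over the 100 steps.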
Last modified on Oct 28, 2011 5:35:46 PM