Performance is tracked by timing runs of cubby and bench_fft.

The parameters of the benchmark:

Nts              100
out_energy       10

The 2009 trunk is now the yannick_trunk branch.
The 2010 trunk is the current trunk branch, to be compared with the yannick_trunk version or with another version.

The runs were benchmarked at resolutions 256^3 and 512^3 on different platforms.

For the 2010 trunk branch: set the resolution in the file cubby.cfg

cube-dim=256 (or 512) 

For the yannick_trunk branch: change the resolution in the cubby.hpp file, with NY = NX/(number of processors), and recompile.

//  PARAMETERS               

#define NPROC 32      //  Number of processors
#define NX 256
#define NY  8       //  if NPROC > 1, NY*NPROC is the total y size
#define NZ 256 
#define NMAX 256         //  must be max of NX, NY*NPROC, NZ
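
The relation between these parameters can be checked at compile time. The sketch below is only an illustration: the helper names (local_ny, splits_evenly, required_nmax) are hypothetical and are not part of cubby.hpp, and it assumes a cubic grid where the y direction is split evenly across processors.

```cpp
// Hypothetical helpers, not part of cubby.hpp: they only illustrate the
// relation between NPROC, NX, NY and NMAX described above.

// Local y size per processor: NY = NX / NPROC (assumes a cubic grid).
constexpr int local_ny(int nx, int nproc) { return nx / nproc; }

// True when the y direction splits evenly across the processors.
constexpr bool splits_evenly(int nx, int nproc) {
    return nproc > 0 && nx % nproc == 0;
}

constexpr int max2(int a, int b) { return a > b ? a : b; }

// NMAX must be the max of NX, NY*NPROC and NZ.
constexpr int required_nmax(int nx, int ny, int nproc, int nz) {
    return max2(nx, max2(ny * nproc, nz));
}

// Values from the 256^3, 32-processor example above.
static_assert(splits_evenly(256, 32), "NX must be divisible by NPROC");
static_assert(local_ny(256, 32) == 8, "NY should be 8");
static_assert(required_nmax(256, 8, 32, 256) == 256, "NMAX should be 256");
```

With the same helpers, a 512^3 run on 32 processors would give NY = 16 and NMAX = 512.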

Times are in seconds (CPU time).
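
For reference, CPU time (as opposed to wall-clock time) can be measured in standard C++ with std::clock. This is a minimal sketch, not necessarily how cubby or bench_fft collect their timings; the helper name cpu_seconds is hypothetical.

```cpp
#include <ctime>

// Hypothetical helper: returns the CPU time, in seconds, consumed by the
// calling process while fn() runs. std::clock counts processor time, so
// time spent sleeping or waiting is not included.
template <typename Fn>
double cpu_seconds(Fn fn) {
    const std::clock_t start = std::clock();
    fn();
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Note that in a parallel run each process reports its own CPU time, so per-process values have to be combined (for example by taking the maximum) before comparing platforms.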

Sporadic traces

Some results regarding the clk for future reference.

Benchmark 2011 from version 2109

Analysis of the trace: FFT balance with transposition

See the results in the files attached.

