FFTW in the Cubby code
The Cubby code needs to calculate a lot of Fourier transforms at each time step. So, the way to compute Fourier transforms plays a very important role in the Cubby code's performances. This short page explains why we have chosen FFTW(Fastest Fourier Transformation in the West) and how is this library used in the Cubby code.
About FFTW
We have chosen FFTW for several reasons, here are the main features of FFTW 3.2.1 :
- Very fast transforms of purely real input or output data
- Arbitrary-size transforms
- Both one-dimensional and multi-dimensional transforms
- Both C and Fortan interfaces
- Free software (released under the GNU License)
For more informations about FFTW, see their web page.
How FFTW is used in the Cubby code ?
With FFTW, computing a Fourier transform requires 4 steps:
- Allocation of the input and output arrays (fftw-malloc can be used)
real_type* data_as_real = get_storage_impl().data(); fftw_complex* data_as_cplx = reinterpret_cast<fftw_complex*>(get_storage_impl().data());
- Building of the plan (which depends on the size, the sense of the transform, the planner flag, the input, the output and other parameters)
fftw_plan p1 = fftw_plan_guru_dft_r2c( 1, &dims_z,1,&howmany_dims_z, data_as_real, data_as_cplx, FFTW_ESTIMATE);
- Execution of the plan
fftw_execute(p1);
- Free up memory
fftw_destroy_plan(p1);
In our code, the class where the Fourier transforms are computed is fftw_engine.cpp, located in the following directory : source:/cubby/branches/work/libcubby/cubby/field/fft/fftw_engine.cpp.
For more informations about the FFTW utilization, see their manual.
Influence of the planner flag
The planner flag control the rigor of the planning process, and can also impose restrictions on the type of transform algorithm that is employed. The different flags are, by increasing order of rigor : FFTW_ESTIMATE, FFTW_MEASURE, FFTW_PATIENT and FFTW_EXHAUSTIVE.
In our case, as the number of Fourier transform is gigantic, we use the flag FFTW_EXHAUSTIVE. Here are some results which shows the influence of the planner flag in the computing time (computing CPU time in seconds, with 10 000 time steps and a cube dimension of 256) :
- With 4 processors in mpifull4 :
Planner flag Vector test Scalar test FFTW_ESTIMATE 28 043 30 933 FFTW_MEASURE 23 656 26 406 FFTW_PATIENT 21 725 24 481 FFTW_EXHAUSTIVE 21 530 24 135
- With 16 processors in mpifull4 :
Planner flag Vector test Scalar test FFTW_ESTIMATE 8058 8873 FFTW_MEASURE 7 534 8 420 FFTW_PATIENT 7 145 7 808 FFTW_EXHAUSTIVE 6 523 7 139
Note that these tests are located in the following directory: source:/cubby/branches/work/main_bench_fft.cpp. With such a large number of Fourier transforms, we can save a lot of time by using the Exhaustive flag. In this case, the benchmark with the Exhaustive flag can run more than 23% faster than with the Estimate flag.
For more informations, see the documentation about the planner flags.
Speed up the usage with FFTW Wisdom
Thanks to FFTW, the strategy used for the first transform is reused for all the next transforms. Here is the time in seconds spent by FFTW to find the best strategy (planning time) for a cube dimension of 256 with 4 processors in mpifull4 :
Planner flag Planning time 1 FFT computing time (vectorial meth.) 1 FFT computing time (scalar meth.) FFTW_ESTIMATE 2.8 1.55 1.61 FFTW_MEASURE 6.1 1.29 1.34 FFTW_PATIENT 62.4 1.19 1.26 FFTW_EXHAUSTIVE 311.8 1.09 1.15
To minimize the planning time, FFTW Wisdom can be used. Indeed, this mechanism saves the plans to disk and restores them. In the Cubby code, we don't use this at the moment because all the nodes of the cluster are not necessarily in the same configuration and the plan which is used can be suboptimal.
How to set the planner flag ?
By default, the planner flag used for each Fourier transform is Exhaustive. But it is possible to set this flag manually in adding --fftw-planner [flag name] to the executable files launch command.Note that some abbreviations are allowed (EST, MEA, PAT and EXH) and you can use capital or small letters to set the planner flag.
$ ./cubby --cube-dim 256 --fftw-planner Mea
For example, this command will execute the Cubby code with a cube dimension of 256 and will use the planner flag FFTW_MEASURE.