
HDF5 in the Cubby code

The Cubby code manages a lot of large, complex and heterogeneous data, so it needs a file format which is well known in the scientific community and which provides very fast, parallel access. This short page explains why we have chosen the HDF5 file format (Hierarchical Data Format), why it is very convenient to use with typical software (Matlab, IDL, ...) and how this format is used in the Cubby code.

About HDF5

We have chosen the HDF5 file format for several reasons. Here are the main features of HDF5 1.8.3 :

  • Hierarchical format which can manage large and complex data collections
  • Allows data to be read and written at very high speeds
  • C, C++, Java and Fortran interfaces
  • Designed to support parallel I/O
  • Support for data compression
  • Metadata can be embedded to store additional information
  • Free software
  • Good maturity (20 years of development history)
  • Used by a wide range of engineering and scientific fields

For more information about HDF5, see the HDF Group website.

https://www.llnl.gov/str/April03/gifs/Cook2.jpg

A sample HDF5 file with groups to provide structure, datasets, raster images, and a palette.

Useful HDF5 utilities

  • h5ls : like Unix ls, it lists the contents of an HDF5 file (the -r option shows the whole file hierarchy recursively).
$ h5ls -r energ_b.h5
/                        Group
/Energy_outputs          Group
/Energy_outputs/Bench01  Dataset {11, 2}
/Energy_outputs/Bench02  Dataset {11, 2}
  • h5dump : it can be used to examine the contents of an HDF5 file (the -m option sets the printf-style format used to display floating-point values).
$ h5dump -m "%.9e" energ_b.h5
HDF5 "energ_b.h5" {
GROUP "/" {
   GROUP "Energy_outputs" {
      DATASET "Bench01" {
         DATATYPE  H5T_IEEE_F64LE
         DATASPACE  SIMPLE { ( 11, 2 ) / ( 11, 2 ) }
         DATA {
         (0,0): 0.000000000e+00, 1.500000000e+00,
         ... 
         (10,0): 1.000000000e+00, 2.029760570e-01
         }
      }
   }
}
}
  • h5cc : a wrapper around the C compiler (such as gcc), it can be used to compile programs which use HDF5.

This command can also be used to obtain information about the installed version of the HDF5 library :

$ h5cc -showconfig
  • hdfview : visual tool for browsing (file hierarchy in a tree structure) and editing (create new files, add or delete groups and datasets, ...) HDF4 and HDF5 files.

Note that h5ls, h5dump and h5cc are included in the HDF5 library (version 1.8.3 of HDF5 is available on this page) and hdfview can be found on this page. Many other programs that use HDF5, listed by application type, can be found on this page.
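
When the output of h5ls -r has to be processed by a script (for example to check which datasets a file contains), the listing format shown above can be parsed with a few lines of Python. This is only a minimal sketch: the names parse_h5ls and dataset_shapes are hypothetical, and the pattern assumes object paths without spaces and numeric dimensions as in the sample listing.

```python
import re

# Sample text copied from the `h5ls -r energ_b.h5` listing on this page.
SAMPLE = """\
/                        Group
/Energy_outputs          Group
/Energy_outputs/Bench01  Dataset {11, 2}
/Energy_outputs/Bench02  Dataset {11, 2}
"""

# One listing line: object path, object kind, optional {dims} for datasets.
# Assumption: paths contain no whitespace.
LINE_RE = re.compile(
    r"^(?P<path>/\S*)\s+(?P<kind>Group|Dataset)(?:\s+\{(?P<dims>[^}]*)\})?\s*$"
)


def parse_h5ls(text):
    """Return a list of (path, kind, shape); shape is None for groups."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simple pattern does not understand
        dims = match.group("dims")
        shape = None
        if dims is not None:
            # "11, 2" -> (11, 2); for a "current/maximum" pair keep the current size.
            # Non-numeric entries would raise ValueError in this sketch.
            shape = tuple(int(d.split("/")[0]) for d in dims.split(","))
        entries.append((match.group("path"), match.group("kind"), shape))
    return entries


def dataset_shapes(text):
    """Map each dataset path to its shape, ignoring groups."""
    return {path: shape for path, kind, shape in parse_h5ls(text) if kind == "Dataset"}


if __name__ == "__main__":
    for path, shape in dataset_shapes(SAMPLE).items():
        print(path, shape)
```

In practice the text would come from running h5ls -r on a real file instead of the SAMPLE string.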

How to convert existing Cubby data to HDF5 ?

In the Cubby code, a utility called convert_to_hdf5 can be used to generate HDF5 files from existing Cubby files (*.dat or *.datf). The HDF5 converter is built automatically when a make command is run in the work directory. It can also be built on its own with a make command in the HDF directory : source:/cubby/branches/work/Util_hdf.

For example, to convert the vx.dat file into the Bench_42.h5 file and put the converted dataset into the /Velocity group :

$ ./convert_to_hdf5 vx.dat -o Bench_42 -g Velocity
Successfully conversion done
$ h5ls -r Bench_42.h5
/                        Group
/Velocity                Group
/Velocity/vx             Dataset {64, 64, 64}

The add option (-a <file>) is useful to gather several Cubby data files into a single HDF5 file. For example :

$ ./convert_to_hdf5 vy.dat -a Bench_42.h5 -g Velocity
Successfully conversion done
$ h5ls -r Bench_42.h5
/                        Group
/Velocity                Group
/Velocity/vx             Dataset {64, 64, 64}
/Velocity/vy             Dataset {64, 64, 64}
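
The -o / -a pattern used in the two examples above (create the file with the first dataset, then add the others) can be sketched as a small Python helper that only builds the command lines. The helper name build_convert_commands is hypothetical, and it uses nothing beyond the -o, -a and -g options shown on this page; it is not the run_data_to_hdf5 script.

```python
def build_convert_commands(data_files, output_stem, group,
                           converter="./convert_to_hdf5"):
    """Build one converter command per input file.

    The first file creates the HDF5 file with -o, the following ones are
    added to it with -a, mirroring the two examples on this page.
    """
    commands = []
    for index, data_file in enumerate(data_files):
        if index == 0:
            # In the example, -o is given the name without the ".h5" suffix.
            target = ["-o", output_stem]
        else:
            # In the example, -a is given the full name of the existing file.
            target = ["-a", output_stem + ".h5"]
        commands.append([converter, data_file] + target + ["-g", group])
    return commands


if __name__ == "__main__":
    for command in build_convert_commands(["vx.dat", "vy.dat"],
                                          "Bench_42", "Velocity"):
        print(" ".join(command))
```

Each returned list can be passed to subprocess.run(command, check=True) to actually run the converter.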

To obtain one structured HDF5 file from several Cubby data files of the same run, a little Python script called run_data_to_hdf5 can be used :

$ ./run_data_to_hdf5.py -v
Converting point.data ...
Converting fbx.dat ...
Converting 00001_vx.dat ...
...

Screenshot of an HDF5 file created by the Python script

The HDF5 datasets are ordered by output type, but this structure can easily be changed in the script.

For more information about using the HDF5 converter, see the manual.

In the future, the Cubby code will be able to write HDF5 output directly, as an optional way of storing data.
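
When conversions are scripted, it can be handy to check quickly that a produced file really is an HDF5 file before going further. The sketch below relies only on the HDF5 format signature (the 8 bytes \x89 H D F \r \n \x1a \n, found at offset 0 or, when a user block is present, at offset 512, 1024, 2048, ...). The function name looks_like_hdf5 is hypothetical, and the check says nothing about whether the rest of the file is valid.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if the HDF5 format signature is found in the file.

    The signature is searched at offset 0, then 512, 1024, 2048, ...
    (the offsets allowed when the file starts with a user block).
    """
    with open(path, "rb") as handle:
        handle.seek(0, 2)          # move to the end to get the file size
        size = handle.tell()
        offset = 0
        while offset + len(HDF5_SIGNATURE) <= size:
            handle.seek(offset)
            if handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE:
                return True
            offset = 512 if offset == 0 else offset * 2
    return False
```

For example, looks_like_hdf5("Bench_42.h5") could be called right after the converter has run.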

How to deal with HDF5 data in the Cubby code ?

The I/O file format can be chosen with the following options :

  • --io-format : set the file format for read and write operations (legacy by default)
  • --in-format : set the file format for read operations (legacy by default)
  • --out-format : set the file format for write operations (legacy by default)

To use HDF5, the parallel version of the library must be installed. For example, to run with HDF5 for both read and write operations :

$ ./cubby --cube-dim 32 --io-format hdf5
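
How the three format options combine when several of them are given is not described on this page. The Python sketch below shows one plausible rule, assumed purely for illustration: --io-format sets both directions, and --in-format or --out-format, when given, override it for their own direction. The function name resolve_formats is hypothetical and the real Cubby behaviour may differ.

```python
import argparse

# The two format names that appear on this page.
FORMATS = ("legacy", "hdf5")


def resolve_formats(argv):
    """Return (input_format, output_format) for a list of arguments.

    Assumed rule (not taken from the Cubby sources): the direction-specific
    options win over --io-format, and everything defaults to "legacy".
    """
    parser = argparse.ArgumentParser(prog="cubby-sketch")
    parser.add_argument("--io-format", choices=FORMATS, default=None)
    parser.add_argument("--in-format", choices=FORMATS, default=None)
    parser.add_argument("--out-format", choices=FORMATS, default=None)
    args = parser.parse_args(argv)

    base = args.io_format or "legacy"
    return (args.in_format or base, args.out_format or base)


if __name__ == "__main__":
    print(resolve_formats(["--io-format", "hdf5"]))
    print(resolve_formats(["--out-format", "hdf5"]))
```

Under this assumed rule, reading legacy files while writing HDF5 would only need --out-format hdf5.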

Matlab & HDF5

  • To load an HDF5 dataset when the path is known :
>> data_vx = hdf5read('Bench_42.h5', '/Velocity/vx')
  • To obtain information about an HDF5 file :
>> info = hdf5info('Bench_42.h5')
  • To display a graphical browser for an HDF5 file and import items :
>> tool = hdftool('Bench_42.h5')

Some tips about how to manage HDF5 data with Matlab are available on this page.

IDL & HDF5

  • To open an HDF5 file and load a dataset when the path is known :
IDL> file_id = H5F_OPEN("Bench_42.h5")
IDL> data_id = H5D_OPEN(file_id, "/Velocity/vx")
IDL> data_vx = H5D_READ(data_id)
IDL> help, data_vx
DATA_VX         DOUBLE    = Array[64, 64, 64]
  • To parse an HDF5 file into an IDL structure containing object information and data :
IDL> bench_data = H5_PARSE('Bench_42.h5', /READ_DATA)
IDL> help, bench_data.velocity.vx, /STRUCTURE
** Structure <61c798>, 14 tags, length=296, data length=292, refs=2:
   _NAME           STRING    'vx'
   _TYPE           STRING    'DATASET'
   _FILE           STRING    'Bench_42.h5'
   _PATH           STRING    '/Velocity'
   _DATA           DOUBLE    Array[64, 64, 64]
   _NELEMENTS      ULONG64   262144
   _DATATYPE       STRING    'H5T_FLOAT'
   _PRECISION      LONG      64
   _VX_PARAMETERS  STRUCT    -> <Anonymous> Array[1]
   ...
  • To display a graphical browser for an HDF5 file and import items :
IDL> result = H5_BROWSER('Bench_42.h5')

Screenshot of the IDL HDF5 browser

Thanks to the HDF5 browser, it is possible to display file contents and to import a group or a dataset into an IDL session.

Note that if the DIALOG_READ keyword is specified (bench_browser = H5_BROWSER('Bench_42.h5', /DIALOG_READ)), then the result is a structure containing the selected group or dataset (as described for the H5_PARSE function), or zero if the Cancel button was pressed. If the DIALOG_READ keyword is not specified, then the result is the widget ID of the HDF5 browser.

Some tips about how to manage HDF5 data with IDL are available on this page.

Last modified on Oct 8, 2009 1:40:12 AM
