[Carpet] Too many HDF5 files?

Christian David Ott cott at as.arizona.edu
Wed Apr 25 20:50:38 CEST 2007




On Wed, 25 Apr 2007, Erik Schnetter wrote:

> On Apr 25, 2007, at 11:34:31, Thomas Radke wrote:
>
>> Erik Schnetter wrote:
>
>>> Should we implement another method
>>> to reduce the number of files?  We could e.g. write one output file per
>>> group, per thorn, or even per iteration.  Do you have another suggestion?
>> 
>> One file per group shouldn't be difficult to implement, one just needs
>> to copy the logic which is already there in IOScalar or IOASCII, right?
>> One file per thorn sounds a little strange to me. One per iteration
>> basically amounts to writing a checkpoint.
>> 
>>> If we combine different variables into the same file, is there anything
>>> special that needs to be added?  I think it should be possible to just
>>> write several variables into the same file, adding the meta-information
>>> (grid structure etc.) only once.  Is that correct?
>> 
>> Yes, I think so. I'll take a closer look at the one_file_per_group
>> implementation of IOScalar.
>
> If it is not too difficult, then I can also try that myself.

Note that several of our analysis tools (Amira, for example) will probably
choke on a Carpet HDF5 output file that contains multiple variables, so
grouping multiple variables into a single output file might not be the best
way to do things.


>> Would that be enough to reduce the total number of HDF5 output
>> files?
>
> However, I think the basic problem is that there is one file per processor --
> on 1000 processors the output directory will just overflow.  Maybe the 
> correct approach would be to have different output subdirectories on 
> different processors?  This should also improve performance a bit, since 
> changes to the subdirectories are then interesting only for a single 
> processor, whereas with a global output directory each processor has to be 
> informed about newly created files, resulting in lock contention.  (If I 
> understood Maciek correctly.)

I agree with this. The number of CPUs (which is increasing, given the better
scaling of future/experimental Carpet versions) is going to be the major
issue here, and having the option of using a separate output directory for
each CPU sounds good.
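
To make the per-processor directory idea concrete, here is a rough sketch
in Python of how output paths could be laid out. This is not Carpet code:
the function names, the "proc_NNNNNN" directory naming, the file-name
pattern, and the procs_per_dir knob are all assumptions made up for
illustration, and the files created are empty placeholders rather than
real HDF5 files.

```python
import os


def output_path(out_dir, var_name, proc, procs_per_dir=1):
    """Return the path processor `proc` would write `var_name` to.

    Hypothetical layout: processors are binned into subdirectories named
    after the first rank in the bin, so no subdirectory receives files
    from more than `procs_per_dir` processors.
    """
    first_rank = (proc // procs_per_dir) * procs_per_dir
    subdir = os.path.join(out_dir, "proc_%06d" % first_rank)
    # Illustrative file-name pattern: <variable>.file_<processor>.h5
    return os.path.join(subdir, "%s.file_%d.h5" % (var_name, proc))


def create_output_file(out_dir, var_name, proc, procs_per_dir=1):
    """Create the subdirectory if needed and touch an empty placeholder."""
    path = output_path(out_dir, var_name, proc, procs_per_dir)
    # exist_ok avoids an error if another processor in the same bin
    # created the directory first.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "ab").close()
    return path
```

With procs_per_dir=1, each processor only ever creates files in its own
subdirectory, and the top-level output directory holds one entry per
processor instead of one entry per variable per processor. A larger
procs_per_dir would trade fewer directories for more sharing.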

  - Christian


