[Carpet] Too many HDF5 files?
Christian David Ott
cott at as.arizona.edu
Wed Apr 25 20:50:38 CEST 2007
On Wed, 25 Apr 2007, Erik Schnetter wrote:
> On Apr 25, 2007, at 11:34:31, Thomas Radke wrote:
>
>> Erik Schnetter wrote:
>
>>> Should we implement another method
>>> to reduce the number of files? We could e.g. write one output file per
>>> group, per thorn, or even per iteration. Do you have another suggestion?
>>
>> One file per group shouldn't be difficult to implement, one just needs
>> to copy the logic which is already there in IOScalar or IOASCII, right ?
>> One file per thorn sounds a little strange to me. One per iteration
>> basically amounts to writing a checkpoint.
>>
>>> If we combine different variables into the same file, is there anything
>>> special that needs to be added? I think it should be possible to just
>>> write several variables into the same file, adding the meta-information
>>> (grid structure etc.) only once. Is that correct?
>>
>> Yes, I think so. I'll take a closer look at the one_file_per_group
>> implementation of IOScalar.
>
> If it is not too difficult, then I can also try that myself.
Note that a number of our analysis tools (Amira, for example) may (and
probably will) choke on multiple variables in a single Carpet HDF5
output file. So grouping multiple variables into one output file might
not be the best way to do things.
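For a rough sense of how much grouping would help, here is a minimal sketch of the file-count arithmetic. It assumes one file per processor for each output unit (a variable or a group). The function name and the variable and group counts are made up for illustration; only the 1000-processor figure comes from this thread.

```python
def count_output_files(n_procs, n_vars, n_groups, scheme):
    """Estimate how many HDF5 files land in the output directory.

    Illustrative model only: one file per processor for each output
    unit (variable or group). Not Carpet's actual bookkeeping.
    """
    units = {"per_variable": n_vars, "per_group": n_groups}[scheme]
    return n_procs * units


# Hypothetical run: 1000 processors, 50 variables arranged in 10 groups.
for scheme in ("per_variable", "per_group"):
    print(scheme, count_output_files(1000, 50, 10, scheme))
```

Under these assumptions, grouping shrinks the count only by the ratio of variables to groups, while the factor from the number of processors stays.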
>> Would that be enough in order to reduce the total number of HDF5 output
>> files ?
>
> However, I think the basic problem is that there is one file per processor --
> on 1000 processors the output directory will just overflow. Maybe the
> correct approach would be to have different output subdirectories on
> different processors? This should also improve performance a bit, since
> changes to the subdirectories are then interesting only for a single
> processor, whereas with a global output directory each processor has to be
> informed about newly created files, resulting in lock contention. (If I
> understood Maciek correctly.)
I agree with this. The number of CPUs (which keeps increasing, given the
better scaling of future/experimental Carpet versions) is going to be the
major issue here, and having the option of a separate output directory for
each CPU sounds good.
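To make the idea concrete, here is a minimal sketch of how a per-processor output path could be built. The function names, the "proc" subdirectory prefix and the file name pattern are assumptions for illustration, not what Carpet currently does.

```python
import os


def proc_output_path(out_dir, var_name, proc, n_procs):
    """Build a hypothetical per-processor output path.

    The subdirectory number is zero-padded to the width of the highest
    processor index so that directory listings sort in numerical order.
    """
    width = len(str(n_procs - 1))
    subdir = os.path.join(out_dir, f"proc{proc:0{width}d}")
    return os.path.join(subdir, f"{var_name}.file_{proc}.h5")


def ensure_proc_dir(path):
    """Create the processor's own subdirectory if it does not exist yet."""
    os.makedirs(os.path.dirname(path), exist_ok=True)


# Processor 7 of 1000 writing a (hypothetical) variable "rho":
print(proc_output_path("out", "rho", 7, 1000))
```

Each processor would then only ever create files in its own subdirectory, which is the property Erik mentions as reducing contention on a shared directory.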
- Christian