[Carpet] inefficient recovery from a checkpoint

Peter Diener diener at cct.lsu.edu
Fri Oct 19 18:29:29 CEST 2007


Hi,

I am actually using hdf5-1.8.0-beta3 on abe (from Erik's home directory).

I just tried to restart with Thomas' patch to only have one file open at a 
time. And it worked. My run is going along nicely again. It seems that it
was only on refinement level 6 that it needed to read other files. On 
levels 7, 8 and 9 it again read only checkpoint file 0.
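
The idea behind "only one file open at a time" can be sketched in a few lines of C++. This is not Thomas' patch and does not use HDF5. The class name SingleOpenFile and its methods are hypothetical, and plain std::fopen/std::fclose stand in for the real HDF5 file calls. The point is only that asking for a different checkpoint file closes the current one first, so the per-file overhead of many simultaneously open files presumably cannot pile up:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not Carpet's actual code): keeps at most one
// checkpoint file open at any time, closing the previous one before
// opening the next.
class SingleOpenFile {
public:
  SingleOpenFile() = default;
  SingleOpenFile(const SingleOpenFile&) = delete;             // avoid double fclose
  SingleOpenFile& operator=(const SingleOpenFile&) = delete;
  ~SingleOpenFile() { close(); }

  // Return a handle for `path`, reusing it if it is already the open file.
  std::FILE* get(const std::string& path) {
    if (fp_ != nullptr && path == path_) return fp_;
    close();                 // release the previously open file first
    assert(fp_ == nullptr);  // invariant: never two files open at once
    fp_ = std::fopen(path.c_str(), "rb");
    if (fp_ != nullptr) path_ = path;
    return fp_;
  }

  void close() {
    if (fp_ != nullptr) {
      std::fclose(fp_);
      fp_ = nullptr;
    }
    path_.clear();
  }

  bool is_open() const { return fp_ != nullptr; }
  const std::string& current() const { return path_; }

private:
  std::FILE* fp_ = nullptr;
  std::string path_;
};
```

A recovery loop would then call get() with whichever file holds the next piece of data, instead of opening every checkpoint file up front.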

Cheers,

   Peter


On Fri, 19 Oct 2007, Christian D. Ott wrote:

>
> Hi,
>
> On Fri, Oct 19, 2007 at 05:12:18AM -0500, Peter Diener wrote:
>
>> With the attached parfile it seems that when recovery is done for levels
>> 0-5 all information on processor 0 is read from checkpoint file 0. Then for
>> level 6, it somehow thinks it needs to read information from additional
>> checkpoint files. At some point while reading those files the run finally
>> dies with the following error:
>>
>> terminate called after throwing an instance of 'std::bad_alloc'
>>   what():  St9bad_alloc
>>
>> The question then is: why does it need information from multiple checkpoint
>> files, when it is restarted on exactly the same number of processors?
>
>
> Interesting Carpet behavior. Which hdf5 version is your code linked with?
> If you are not using 1.8, then I suggest you try it out: it does reduce
> the memory overhead that 1.6.x generates when iterating through files.
> Of course this does not fix the problem, it just relieves its symptoms...
>
> - Christian
>
>
> _______________________________________________
> developers mailing list
> developers at lists.carpetcode.org
> http://lists.carpetcode.org/listinfo/developers
>
>

