[Carpet] inefficient recovery from a checkpoint

Christian D. Ott cott at as.arizona.edu
Fri Oct 19 17:58:42 CEST 2007


Hi,

On Fri, Oct 19, 2007 at 05:12:18AM -0500, Peter Diener wrote:

> With the attached parfile it seems that when recovery is done for levels 
> 0-5 all information on processor 0 is read from checkpoint file 0. Then for 
> level 6, it somehow thinks it needs to read information from additional 
> checkpoint files. At some point while reading those files the run finally 
> dies with the following error:
>
> terminate called after throwing an instance of 'std::bad_alloc'
>   what():  St9bad_alloc
>
> The question then is: why does it need information from multiple checkpoint 
> files, when it is restarted on exactly the same number of processors?


Interesting Carpet behavior. Which HDF5 version is your code linked against?
If you are not using 1.8, I suggest you try it out -- it reduces the
memory overhead that 1.6.x generates when iterating through files.
Of course, this does not fix the problem; it only relieves the symptoms...
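
If you are unsure which version you have, a rough way to check is to ask
the HDF5 shared library itself. The sketch below is only an illustration:
it assumes a shared libhdf5 is installed where the system loader can find
it, and it reports the version of that library, which may not be the one
your Cactus executable was actually linked against. The helper names
hdf5_library_version and is_older_than_1_8 are made up for this example;
H5get_libversion is the standard HDF5 C call.

```python
import ctypes
import ctypes.util


def hdf5_library_version(path=None):
    """Return (major, minor, release) of an HDF5 shared library, or None.

    `path` is an optional explicit library path (hypothetical parameter);
    by default the system loader's search path is used.
    """
    path = path or ctypes.util.find_library("hdf5")
    if path is None:
        return None  # no shared HDF5 library found on this system
    try:
        lib = ctypes.CDLL(path)
        get_version = lib.H5get_libversion
    except (OSError, AttributeError):
        return None  # not loadable, or not an HDF5 library
    get_version.restype = ctypes.c_int  # herr_t: negative means failure
    major, minor, release = ctypes.c_uint(), ctypes.c_uint(), ctypes.c_uint()
    status = get_version(
        ctypes.byref(major), ctypes.byref(minor), ctypes.byref(release)
    )
    if status < 0:
        return None
    return (major.value, minor.value, release.value)


def is_older_than_1_8(version):
    """True if the (major, minor, ...) tuple predates the 1.8 series."""
    return tuple(version[:2]) < (1, 8)


if __name__ == "__main__":
    version = hdf5_library_version()
    if version is None:
        print("No shared HDF5 library found by the loader.")
    else:
        print("HDF5 %d.%d.%d" % version)
        if is_older_than_1_8(version):
            print("Older than 1.8; consider upgrading.")
```

If your build links HDF5 statically, or against a copy in a non-standard
location, pass that path explicitly or check the configure output of your
build instead.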

 - Christian 



