[Carpet] inefficient recovery from a checkpoint
Peter Diener
diener at cct.lsu.edu
Fri Oct 19 12:12:18 CEST 2007
Hi,
Here is some more detailed info (using IO::verbose = "full") from a job (on abe)
that fails to restart on the same number of processors that the checkpoint
files were produced with. The run producing the checkpoint files used
around 800-900 MB per process, and abe has 1 GB per core.
With the attached parfile it seems that when recovery is done for levels
0-5 all information on processor 0 is read from checkpoint file 0. Then
for level 6, it somehow thinks it needs to read information from
additional checkpoint files. At some point while reading those files the
run finally dies with the following error:
terminate called after throwing an instance of 'std::bad_alloc'
what(): St9bad_alloc
The question then is: why does it need information from multiple
checkpoint files, when it is restarted on exactly the same number of
processors?
Cheers,
Peter
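To make the question above concrete, here is a minimal sketch of why a process might need more than one checkpoint file. It assumes one file per writing process and uses 1D index intervals in place of Carpet's 3D bounding boxes. The function names and the example regions are hypothetical and are not taken from Carpet or from the failing run. If the decomposition on recovery matches the one at checkpoint time, only the process's own file overlaps its region. If a level is decomposed even slightly differently, neighbouring files start to overlap too.

```python
# Illustrative sketch only: 1D index intervals stand in for Carpet's 3D
# bounding boxes, and all names here are hypothetical, not Carpet's API.

def overlaps(a, b):
    """Return True if the closed intervals a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def files_needed(my_region, regions_in_files):
    """List the indices of checkpoint files whose stored region intersects my_region."""
    return [f for f, region in enumerate(regions_in_files)
            if overlaps(my_region, region)]

# Regions written by 4 processes at checkpoint time (assumed one file per process).
written = [(0, 24), (25, 49), (50, 74), (75, 99)]

# Same decomposition on recovery: process 0 needs only file 0.
same = files_needed((0, 24), written)

# Slightly different decomposition on recovery: process 0 now also needs file 1.
shifted = files_needed((0, 30), written)

print(same, shifted)
```

If something like this is happening, the grid structure on level 6 after recovery would differ from the one stored in the checkpoint, even though the number of processors is the same.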
On Thu, 18 Oct 2007, Erik Schnetter wrote:
> On Oct 18, 2007, at 08:46:53, Thomas Radke wrote:
>
>> Hi Erik,
>>
>> while debugging a BBH run recovery problem (the run always aborted due
>> to excessive memory allocation) I discovered that during recovery each
>> processor would iterate through _all_ the checkpoint files to read data
>> from, even though the run was restarted on the same number of
>> processors. Did something change in the way Carpet (the experimental
>> version) sets up the grid structure ?
>
> I am not aware of any changes that should cause this. However, I did
> recently make changes to the HDF5 I/O routines, and I could have introduced
> an error there. My changes added timers measuring I/O times and bytes
> transferred.
>
>> The parameter file I used has
>>
>> Carpet::regrid_during_recovery = no
>> IOHDF5::use_grid_structure_from_checkpoint = yes
>>
>> in it. What's the first parameter setting good for ?
>
> These parameters are for the development version of Carpet, or for old
> versions of the experimental version. The first parameter ensures that
> Carpet does not use a grid structure as chosen by a regridding thorn to
> regrid. A regridding thorn right before recovery would probably choose a
> slightly different grid structure. The old regridding mechanism in Carpet
> would read in the base level, regrid, read the next level, regrid, etc., all
> ignoring the grid structure in the checkpoint file. The new mechanism reads
> the grid structure from the checkpoint file, regrids once, and then never
> regrids again until the evolution has re-started.
>
> -erik
>
> --
> Erik Schnetter <schnetter at cct.lsu.edu>
>
> My email is as private as my paper mail. I therefore support encrypting
> and signing email messages. Get my PGP key from www.keyserver.net.
>
>
>
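The two recovery orderings described in the quoted reply can be sketched as follows. This is only an illustration of the ordering of steps as stated there. The function names are hypothetical, and the placement of the level reads after the single regrid in the new mechanism is an assumption.

```python
# Hypothetical sketch of the two recovery orderings described in the quoted
# reply; function names are illustrative and are not Carpet's API.

def recover_old(num_levels):
    """Old mechanism: read a level, regrid, read the next level, regrid, ...
    The grid structure stored in the checkpoint file is ignored."""
    steps = []
    for level in range(num_levels):
        steps.append(f"read level {level}")
        steps.append("regrid")
    return steps

def recover_new(num_levels):
    """New mechanism: read the grid structure from the checkpoint, regrid once.
    Assumption: the level data is read after that single regrid."""
    steps = ["read grid structure from checkpoint", "regrid"]
    steps += [f"read level {level}" for level in range(num_levels)]
    return steps

# Carpet::max_refinement_levels = 10 in the attached parfile.
old_steps = recover_old(10)
new_steps = recover_new(10)

print("old regrids:", old_steps.count("regrid"))
print("new regrids:", new_steps.count("regrid"))
```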
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ll_8.out
Type: application/octet-stream
Size: 673775 bytes
Desc:
Url : /archives/developers/attachments/20071019/e45d256a/attachment-0001.obj
-------------- next part --------------
#change me
#==============================================================================
# mass ratio 1/8, spin down a/M = 0.8
#==============================================================================
Cactus::cctk_run_title = "BBH inspiral"
Cactus::cctk_full_warnings = yes
Cactus::terminate = time
Cactus::cctk_final_time = 800.0
ActiveThorns = "Fortran"
ActiveThorns = "LocalInterp AEILocalInterp LocalReduce"
ActiveThorns = "Slab"
ActiveThorns = "IOUtil"
ActiveThorns = "Carpet CarpetLib CarpetInterp CarpetReduce CarpetSlab"
ActiveThorns = "NaNChecker"
ActiveThorns = "Boundary CartGrid3D CoordBase SymBase "
ActiveThorns = "CarpetRegrid2 SphericalSurface"
ActiveThorns = "ADMBase ADMCoupling ADMMacros CoordGauge SpaceMask StaticConformal PunctureTracker"
ActiveThorns = "Time"
ActiveThorns = "MoL"
ActiveThorns = "BSSN_MoL"
ActiveThorns = "ADMAnalysis"
ActiveThorns = "AHFinderDirect SphericalSurface"
ActiveThorns = "PsiKadelia WaveExtract"
ActiveThorns = "SphericalHarmonics SummationByParts"
ActiveThorns = "ADMConstraints"
ActiveThorns = "CarpetIOScalar CarpetIOASCII CarpetIOBasic CarpetIOHDF5"
ActiveThorns = "TwoPunctures"
ActiveThorns = "Dissipation"
ActiveThorns = "TimerReport MPIClock Formaline"
ActiveThorns = "ReflectionSymmetry " #RotatingSymmetry180"
ActiveThorns = "TmunuBase IsolatedHorizon"
ActiveThorns = "Kick"
### Initial data
ADMBase::metric_type = physical
ADMBase::initial_data = twopunctures
ADMBase::initial_lapse = twopunctures-averaged
ADMBase::initial_shift = zero
TwoPunctures::par_b = 3.99528
TwoPunctures::par_m_plus = 0.0641779
TwoPunctures::par_m_minus = 0.549007
TwoPunctures::par_P_plus[1] = 0.0482051
TwoPunctures::par_P_minus[1] = -0.0482051
TwoPunctures::par_S_plus[2] = -0.00987654
TwoPunctures::par_S_minus[2] = -0.632099
TwoPunctures::TP_epsilon = 1e-6
TwoPunctures::npoints_A = 68
TwoPunctures::npoints_B = 68
TwoPunctures::npoints_phi = 34
#TwoPunctures::grid_setup_method = evaluation
TwoPunctures::verbose = yes
TwoPunctures::do_residuum_debug_output = yes
TwoPunctures::do_initial_debug_output = yes
TwoPunctures::center_offset[0] = 3.10743
#Carpet::init_3_timelevels = no
MoL::initial_data_is_crap = yes
### Grid setup
Time::dtfac = 0.25
Carpet::time_refinement_factors = "[1,1,2,4,8,16,32,64,128,256]"
Carpet::verbose = no
Carpet::domain_from_coordbase = yes
CartGrid3D::type = coordbase
ReflectionSymmetry::reflection_x = no
ReflectionSymmetry::reflection_y = no
ReflectionSymmetry::reflection_z = yes
ReflectionSymmetry::avoid_origin_x = no
ReflectionSymmetry::avoid_origin_y = no
ReflectionSymmetry::avoid_origin_z = no
CoordBase::boundary_size_z_lower = 3
CoordBase::boundary_shiftout_z_lower = 1
CoordBase::domainsize = minmax
CoordBase::xmax = 258.048
CoordBase::ymax = 258.048
CoordBase::zmax = 258.048
CoordBase::xmin =-258.048
CoordBase::ymin =-258.048
CoordBase::zmin = 0.0
CoordBase::dx = 3.072
CoordBase::dy = 3.072
CoordBase::dz = 3.072
Carpet::max_refinement_levels = 10
CarpetRegrid2::regrid_every = 16
CarpetRegrid2::num_centres = 2
CarpetRegrid2::num_levels_1 = 10
CarpetRegrid2::position_x_1 = 7.10271
CarpetRegrid2::radius_1 [1] = 80.0 # 1.536
CarpetRegrid2::radius_1 [2] = 20.0 # 0.768
CarpetRegrid2::radius_1 [3] = 12.0 # 0.384
CarpetRegrid2::radius_1 [4] = 6.4 # 0.192
CarpetRegrid2::radius_1 [5] = 3.2 # 0.096
CarpetRegrid2::radius_1 [6] = 1.6 # 0.048
CarpetRegrid2::radius_1 [7] = 0.8 # 0.024
CarpetRegrid2::radius_1 [8] = 0.4 # 0.012
CarpetRegrid2::radius_1 [9] = 0.2 # 0.006
CarpetRegrid2::movement_threshold_1 = 0.04
CarpetRegrid2::position_x_2 = -0.88785
CarpetRegrid2::num_levels_2 = 8
CarpetRegrid2::radius_2 [1] = 80.0
CarpetRegrid2::radius_2 [2] = 20.0
CarpetRegrid2::radius_2 [3] = 12.0
CarpetRegrid2::radius_2 [4] = 6.4
CarpetRegrid2::radius_2 [5] = 3.2
CarpetRegrid2::radius_2 [6] = 1.6
CarpetRegrid2::radius_2 [7] = 0.8
CarpetRegrid2::movement_threshold_2 = 0.16
Driver::ghost_size = 3
Carpet::use_buffer_zones = yes
Carpet::prolongation_order_space = 5
Carpet::prolongation_order_time = 2
Carpet::convergence_level = 0
Carpet::enable_all_storage = no
Carpet::regrid_in_level_mode = yes
Carpet::regrid_during_initialisation = no
Carpet::init_each_timelevel = no
### Refinement tracking
PunctureTracker::track [0] = yes
PunctureTracker::initial_x[0] = +7.10271
PunctureTracker::track [1] = yes
PunctureTracker::initial_x[1] = -0.88785
PunctureTracker::modify_puncture[0] = 0
PunctureTracker::modify_puncture[1] = 1
PunctureTracker::modify_distance = 0.1
PunctureTracker::new_reflevel_number[0] = 7
PunctureTracker::new_reflevel_number[1] = 7
### Evolution
MoL::ODE_Method = rk4
MoL::MoL_Intermediate_Steps = 4
Carpet::num_integrator_substeps = 4
MoL::MoL_Num_Scratch_Levels = 1
ADMBase::evolution_method = adm_bssn
ADMMacros::spatial_order = 4
ADM_BSSN::stencil_size = 3
ADM_BSSN::timelevels = 3
ADM_BSSN::advection = upwind4
ADM_BSSN::bound = newrad
Boundary::radpower = 3
### Gauges
ADMBase::lapse_evolution_method = 1+log
ADM_BSSN::lapsesource = straight
ADM_BSSN::harmonic_f = 2.0
ADM_BSSN::force_lapse_positive = yes
ADM_BSSN::lapse_advection_coeff = 1.0
ADMBase::shift_evolution_method = gamma0
ADM_BSSN::ShiftGammaCoeff = 0.75
ADM_BSSN::BetaDriver = 3.0
ADM_BSSN::gamma_driver_advection_coeff = 1.0
ADM_BSSN::ApplyShiftBoundary = yes
### Dissipation
Dissipation::order = 5
Dissipation::epsdis = 0.1
Dissipation::vars = "
ADM_BSSN::ADM_BSSN_phi
ADM_BSSN::ADM_BSSN_metric
ADM_BSSN::ADM_BSSN_curv
ADM_BSSN::ADM_BSSN_K
ADM_BSSN::ADM_BSSN_gamma
ADMBase::lapse
ADMBase::shift
"
### Constraints
ADMConstraints::constraints_persist = yes
ADMConstraints::constraints_timelevels = 3
ADMConstraints::constraints_prolongation_type = none
### Horizon and extraction surfaces
# We choose the angular resolution such that it is approx. of same
# magnitude as the underlying Cartesian grid.
# Note that the angular resolution scales with the radius of the sphere.
# -> spheres with smaller R need less angular gridpoints.
# (Here I have chosen the same res. as the sphere with largest R)
# Also note that IsolatedHorizon is computationally intensive and gets
# low res surfaces
SphericalSurface::nsurfaces = 10
SphericalSurface::maxntheta = 120
SphericalSurface::maxnphi = 228
SphericalSurface::ntheta [0] = 32
SphericalSurface::nphi [0] = 120
SphericalSurface::symmetric_z [0] = yes
SphericalSurface::nghoststheta [0] = 2
SphericalSurface::nghostsphi [0] = 2
SphericalSurface::ntheta [1] = 32
SphericalSurface::nphi [1] = 120
SphericalSurface::symmetric_z [1] = yes
SphericalSurface::nghoststheta [1] = 2
SphericalSurface::nghostsphi [1] = 2
SphericalSurface::ntheta [2] = 32
SphericalSurface::nphi [2] = 120
SphericalSurface::symmetric_z [2] = yes
SphericalSurface::nghoststheta [2] = 2
SphericalSurface::nghostsphi [2] = 2
# Kick extraction surfaces
SphericalSurface::set_spherical [3] = yes
SphericalSurface::radius [3] = 30.0
SphericalSurface::ntheta [3] = 60
SphericalSurface::nphi [3] = 228
SphericalSurface::symmetric_z [3] = yes
SphericalSurface::nghoststheta [3] = 2
SphericalSurface::nghostsphi [3] = 2
SphericalSurface::set_spherical [4] = yes
SphericalSurface::radius [4] = 40.0
SphericalSurface::ntheta [4] = 60
SphericalSurface::nphi [4] = 228
SphericalSurface::symmetric_z [4] = yes
SphericalSurface::nghoststheta [4] = 2
SphericalSurface::nghostsphi [4] = 2
SphericalSurface::set_spherical [5] = yes
SphericalSurface::radius [5] = 50.0
SphericalSurface::ntheta [5] = 60
SphericalSurface::nphi [5] = 228
SphericalSurface::symmetric_z [5] = yes
SphericalSurface::nghoststheta [5] = 2
SphericalSurface::nghostsphi [5] = 2
SphericalSurface::set_spherical [6] = yes
SphericalSurface::radius [6] = 60.0
SphericalSurface::ntheta [6] = 60
SphericalSurface::nphi [6] = 228
SphericalSurface::symmetric_z [6] = yes
SphericalSurface::nghoststheta [6] = 2
SphericalSurface::nghostsphi [6] = 2
# Isolated Horizon Surfaces with lower res.
SphericalSurface::set_spherical [7] = yes
SphericalSurface::radius [7] = 30.0
SphericalSurface::ntheta [7] = 32
SphericalSurface::nphi [7] = 120
SphericalSurface::symmetric_z [7] = yes
SphericalSurface::nghoststheta [7] = 2
SphericalSurface::nghostsphi [7] = 2
SphericalSurface::set_spherical [8] = yes
SphericalSurface::radius [8] = 40.0
SphericalSurface::ntheta [8] = 32
SphericalSurface::nphi [8] = 120
SphericalSurface::symmetric_z [8] = yes
SphericalSurface::nghoststheta [8] = 2
SphericalSurface::nghostsphi [8] = 2
SphericalSurface::set_spherical [9] = yes
SphericalSurface::radius [9] = 50.0
SphericalSurface::ntheta [9] = 32
SphericalSurface::nphi [9] = 120
SphericalSurface::symmetric_z [9] = yes
SphericalSurface::nghoststheta [9] = 2
SphericalSurface::nghostsphi [9] = 2
### Spherical harmonics for psikadelia wave extraction
SphericalHarmonics::number_of_radii = 4
SphericalHarmonics::ex_radii [0] = 30
SphericalHarmonics::ex_radii [1] = 40
SphericalHarmonics::ex_radii [2] = 50
SphericalHarmonics::ex_radii [3] = 60
SphericalHarmonics::lmax = 8
SphericalHarmonics::interp_integration_order = 4
SphericalHarmonics::grid_type = cart3d
SphericalHarmonics::InterpPointsTheta = 113
SphericalHarmonics::InterpPointsPhi = 224
SphericalHarmonics::number_of_vars = 2
SphericalHarmonics::vars [0] = "PsiKadelia::psi4re"
SphericalHarmonics::SH_spin_weight [0] = -2
SphericalHarmonics::vars [1] = "PsiKadelia::psi4im"
SphericalHarmonics::SH_spin_weight [1] = -2
### Wave extraction via Zerilli
WaveExtract::out_every = 64
WaveExtract::maximum_detector_number = 4
WaveExtract::switch_output_format = 100
WaveExtract::rsch2_computation = "average Schwarzschild metric"
WaveExtract::l_mode = 7
WaveExtract::m_mode = 7
WaveExtract::detector_radius [0] = 30
WaveExtract::detector_radius [1] = 40
WaveExtract::detector_radius [2] = 50
WaveExtract::detector_radius [3] = 60
WaveExtract::maxntheta = 113
WaveExtract::maxnphi = 224
WaveExtract::ntheta [0] = 113
WaveExtract::nphi [0] = 224
WaveExtract::ntheta [1] = 113 #156
WaveExtract::nphi [1] = 224 #156
WaveExtract::ntheta [2] = 113 #156
WaveExtract::nphi [2] = 224 #156
WaveExtract::ntheta [3] = 113 #156
WaveExtract::nphi [3] = 224 #156
### norm of Einstein tensor
#
#ActiveThorns = "EinsteinNorm HarmonicFD"
#
#AHFinderDirect::set_mask_for_all_horizons = yes
#AHFinderDirect::old_style_mask_gridfn_name = "EinsteinNorm::AH_mask"
#AHFinderDirect::mask_radius_multiplier = 1
#AHFinderDirect::mask_radius_offset = 0
#AHFinderDirect::mask_buffer_thickness = 0
#AHFinderDirect::mask_is_noshrink = false
#
#EinsteinNorm::mask_out_constraints = yes
### Horizons
AHFinderDirect::N_horizons = 3
#AHFinderDirect::find_every = 8
AHFinderDirect::output_h_every = 0
AHFinderDirect::max_Newton_iterations__initial = 50
AHFinderDirect::max_Newton_iterations__subsequent = 50
AHFinderDirect::max_allowable_Theta_growth_iterations = 10
AHFinderDirect::max_allowable_Theta_nonshrink_iterations = 10
#AHFinderDirect::verbose_level = "algorithm details"
AHFinderDirect::geometry_interpolator_name = "Lagrange polynomial interpolation"
AHFinderDirect::geometry_interpolator_pars = "order=4"
AHFinderDirect::surface_interpolator_name = "Lagrange polynomial interpolation"
AHFinderDirect::surface_interpolator_pars = "order=4"
AHFinderDirect::move_origins = yes
AHFinderDirect::reshape_while_moving = yes
AHFinderDirect::predict_origin_movement = yes
AHFinderDirect::find_every_individual [1] = 8
AHFinderDirect::origin_x [1] = 7.10271
AHFinderDirect::initial_guess__coord_sphere__x_center [1] = 7.10271
AHFinderDirect::initial_guess__coord_sphere__radius [1] = 0.06
AHFinderDirect::which_surface_to_store_info [1] = 0
AHFinderDirect::set_mask_for_individual_horizon [1] = no
AHFinderDirect::reset_horizon_after_not_finding [1] = no
#AHFinderDirect::dont_find_after_individual [1] = 32768
AHFinderDirect::find_every_individual [2] = 64
AHFinderDirect::origin_x [2] = -0.88784
AHFinderDirect::initial_guess__coord_sphere__x_center [2] = -0.88784
AHFinderDirect::initial_guess__coord_sphere__radius [2] = 0.25
AHFinderDirect::which_surface_to_store_info [2] = 1
AHFinderDirect::set_mask_for_individual_horizon [2] = no
AHFinderDirect::reset_horizon_after_not_finding [2] = no
#AHFinderDirect::dont_find_after_individual [2] = 32768
AHFinderDirect::find_every_individual [3] = 64
AHFinderDirect::origin_x [3] = 0
AHFinderDirect::find_after_individual [3] = 32768
AHFinderDirect::initial_guess__coord_sphere__x_center [3] = 0
AHFinderDirect::initial_guess__coord_sphere__radius [3] = 0.8
AHFinderDirect::shiftout_factor [3] = 1.5
AHFinderDirect::which_surface_to_store_info [3] = 2
AHFinderDirect::set_mask_for_individual_horizon [3] = no
### Isolated horizon measurements
IsolatedHorizon::verbose = yes
IsolatedHorizon::veryverbose = no
IsolatedHorizon::interpolator = "Lagrange polynomial interpolation"
IsolatedHorizon::interpolator_options = "order=4"
IsolatedHorizon::spatial_order = 4
IsolatedHorizon::num_horizons = 6
IsolatedHorizon::surface_index [0] = 0
IsolatedHorizon::surface_index [1] = 1
IsolatedHorizon::surface_index [2] = 2
IsolatedHorizon::surface_index [3] = 3
IsolatedHorizon::surface_index [4] = 4
IsolatedHorizon::surface_index [5] = 5
### Wave extraction via PsiKadelia
PsiKadelia::psikadelia_persists = yes
PsiKadelia::weyl_timelevels = 3
PsiKadelia::PsiKadelia_min_radius = 30.0
#PsiKadelia::verbose_level = 1
#PsiKadelia::PsiKadelia_num_ref_levels = 4
PsiKadelia::ricci_prolongation_type = none
PsiKadelia::psif_vec = standard-radial
SummationByParts::order = 4
### Kick determination
Kick::nkick_surfaces = 4 #8
Kick::which_surface_to_take [0] = 3
Kick::which_surface_to_take [1] = 4
Kick::which_surface_to_take [2] = 5
Kick::which_surface_to_take [3] = 6
#Kick::which_surface_to_take [4] = 3
#Kick::which_surface_to_take [5] = 4
#Kick::which_surface_to_take [6] = 5
#Kick::which_surface_to_take [7] = 6
Kick::which_proc_to_take [0] = 0
Kick::which_proc_to_take [1] = 1
Kick::which_proc_to_take [2] = 2
Kick::which_proc_to_take [3] = 3
#Kick::which_proc_to_take [4] = 4
#Kick::which_proc_to_take [5] = 5
#Kick::which_proc_to_take [6] = 6
#Kick::which_proc_to_take [7] = 7
Kick::Schwarzschild_approx [0] = yes
Kick::Schwarzschild_approx [1] = yes
Kick::Schwarzschild_approx [2] = yes
Kick::Schwarzschild_approx [3] = yes
Kick::post_process_mode = no
Kick::Kick_ref_level = 1
### NaNCheck
NaNChecker::check_every = 64
NaNChecker::check_after = 0
NaNChecker::action_if_found = terminate
NaNChecker::check_vars = "ADM_BSSN::ADM_BSSN_metric"
NaNChecker::out_NaNmask = yes
### Checkpointing
CarpetIOHDF5::checkpoint = yes
IO::checkpoint_dir = $parfile
IO::checkpoint_ID = no
IO::recover = "autoprobe"
IO::checkpoint_every = 2048
IO::out_mode = np
IO::out_proc_every = 8
IO::checkpoint_keep = 1
IO::recover_dir = $parfile
IO::verbose = "full"
Carpet::regrid_during_recovery = no
CarpetIOHDF5::use_grid_structure_from_checkpoint = yes
### Performance timers
Cactus::cctk_timer_output = full
Carpet::print_timestats_every = 32
TimerReport::out_every = 2048
Carpet::output_timers_every = 2048
CarpetLib::print_timestats_every = 2048
CarpetLib::print_memstats_every = 2048
### Output
IOASCII::one_file_per_group = yes
IOBasic::outInfo_every = 1
IOBasic::outInfo_reductions = "norm2"
IOBasic::outInfo_vars = "
ADMConstraints::hamiltonian
ADM_BSSN::ADM_BSSN_K
"
IOScalar::outScalar_every = 128
IOScalar::one_file_per_group = yes
IOScalar::outScalar_vars = "
ADM_BSSN::ADM_BSSN_K
ADMConstraints::hamiltonian
"
IOASCII::out0D_every = 64
IOASCII::out0D_vars = "
ADMBase::lapse
ADMBase::shift
PsiKadelia::IJinvariants
IsolatedHorizon::ih_scalars
IsolatedHorizon::ih_multipole_moments
IsolatedHorizon::ih_state
IsolatedHorizon::ih_grid_real
SphericalSurface::sf_valid
SphericalSurface::sf_info
SphericalSurface::sf_origin
Carpet::timing
"
IOASCII::out1D_every = 256
IOASCII::out1D_vars = "
ADM_BSSN::ADM_BSSN_phi
ADM_BSSN::ADM_BSSN_metric
ADM_BSSN::ADM_BSSN_K
ADM_BSSN::ADM_BSSN_curv
ADM_BSSN::ADM_BSSN_gamma
ADMBase::lapse
ADMBase::shift
ADMConstraints::hamiltonian
ADMConstraints::momentum
"
IOASCII::out2D_every = 64
IOASCII::out2D_vars = "
SphericalHarmonics::decomposed_vars
"
# SphericalSurface::sf_radius
# IsolatedHorizon::ih_shapes
# IsolatedHorizon::ih_newman_penrose
# IsolatedHorizon::ih_weyl_scalars
# IsolatedHorizon::ih_killing_vector
# IsolatedHorizon::ih_invariant_coordinates
# IsolatedHorizon::ih_3determinant
# IsolatedHorizon::ih_fluxes
# IsolatedHorizon::ih_petrov
#IOHDF5::out_every = 64
#IOHDF5::compression_level = 1
#IOHDF5::out_vars = "
#"
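The trailing comments on the CarpetRegrid2::radius_1 lines in the parfile above (1.536, 0.768, ...) appear to be the grid spacing on each refinement level. A short sketch reproduces them from CoordBase::dx, assuming a spatial refinement factor of 2 between consecutive levels. The parfile does not set this factor explicitly, so it is an assumption here.

```python
# Values copied from the parameter file above.
coarse_dx = 3.072   # CoordBase::dx
num_levels = 10     # Carpet::max_refinement_levels

# Assumption: spatial refinement factor of 2 between consecutive levels.
spacings = [coarse_dx / 2**level for level in range(num_levels)]

for level, dx in enumerate(spacings):
    print(f"level {level}: dx = {dx:.3f}")
```

Under that assumption the finest level (level 9, present only around the first centre, which has num_levels_1 = 10) has a spacing of 0.006, which matches the comment on radius_1 [9].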