[Carpet] Problems with checkpoint recovery on AMD machine

Yosef Zlochower yosef at phys.utb.edu
Mon Apr 23 20:50:43 CEST 2007


AHFinderDirect::find_every   = 32
#AHFinderDirect::move_origins = yes

AHFinderDirect::geometry_interpolator_name = "Lagrange polynomial interpolation"
AHFinderDirect::geometry_interpolator_pars = "order=3"
AHFinderDirect::surface_interpolator_name  = "Lagrange polynomial interpolation"
AHFinderDirect::surface_interpolator_pars  = "order=3"

AHFinderDirect::N_horizons = 3

AHFinderDirect::origin_x                                 [1] =  3.5
AHFinderDirect::origin_y                                 [1] =  0.0
AHFinderDirect::origin_z                                 [1] =  0.0
AHFinderDirect::initial_guess__coord_sphere__x_center    [1] =  3.5
AHFinderDirect::initial_guess__coord_sphere__y_center    [1] =  0.0
AHFinderDirect::initial_guess__coord_sphere__z_center    [1] =  0.0
AHFinderDirect::initial_guess__coord_sphere__radius      [1] =  0.3
AHFinderDirect::which_surface_to_store_info              [1] = 0
AHFinderDirect::reset_horizon_after_not_finding          [1] = no
AHFinderDirect::verbose_level="algorithm debug"

AHFinderDirect::origin_x                                 [2] =  -3.5
AHFinderDirect::origin_y                                 [2] =  0.0
AHFinderDirect::origin_z                                 [2] =  0.0
AHFinderDirect::initial_guess__coord_sphere__x_center    [2] =  -3.5
AHFinderDirect::initial_guess__coord_sphere__y_center    [2] =  0.0
AHFinderDirect::initial_guess__coord_sphere__z_center    [2] =  0.0
AHFinderDirect::initial_guess__coord_sphere__radius      [2] =  0.3
AHFinderDirect::which_surface_to_store_info              [2] = 1
AHFinderDirect::reset_horizon_after_not_finding          [2] = no

AHFinderDirect::origin_x                                 [3] =  0.0
AHFinderDirect::initial_guess__coord_sphere__x_center    [3] =  0.0
AHFinderDirect::initial_guess__coord_sphere__radius      [3] =  2.5
AHFinderDirect::which_surface_to_store_info              [3] = 2
                                                              

Here is the output:
INFO (AHFinderDirect): proc 0: searching for horizon 1/3
INFO (AHFinderDirect): Newton_solve(): processor 0 working on horizon 1
INFO (AHFinderDirect):                 horizon_is_genuine=1
INFO (AHFinderDirect):                 there_is_another_genuine_horizon=0
INFO (AHFinderDirect): beginning iteration 1 (horizon_is_genuine=1)
INFO (AHFinderDirect):    expansion
INFO (AHFinderDirect):       checking that h is finite
INFO (AHFinderDirect):       xyz positions and derivative coefficients
INFO (AHFinderDirect):       interpolating {g_ij, K_ij} from Cactus grid
INFO (AHFinderDirect):          setting up interpolator derivative info
INFO (AHFinderDirect):          calling geometry interpolator (2166 points)
INFO (AHFinderDirect):       checking that geometry is finite
INFO (AHFinderDirect):       computing Theta(h)
cactus_ctest: /home/yosef/Cactus_New/configs/ctest/build/AHFinderDirect/patch/coords.cc:315:
void AHFinderDirect::local_coords::partial_xyz_wrt_r_mu_nu(fp, fp, fp, fp&, fp&, fp&, fp&, fp&, fp&, fp&, fp&, fp&): Assertion `jtutil::fuzzy<fp>::NE(r, 0.0)' failed.

cactus_ctest:1178 terminated with signal 6 at PC=3227e2e21d SP=7fbfffea08.  Backtrace:
/lib64/tls/libc.so.6(gsignal+0x3d)[0x3227e2e21d]
/lib64/tls/libc.so.6(abort+0xfe)[0x3227e2fa1e]
/lib64/tls/libc.so.6(__assert_fail+0xf1)[0x3227e27ae1]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd42f07]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd13adc]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd16653]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd1ee72]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd03e2f]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xd01043]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0x40bea3]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xc3d556]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0x40b759]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0x4106d9]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0x41083e]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0x40c7c2]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xc07b2e]
/home/yosef/Cactus_New/ctest/./cactus_ctest[0xc0ac26]
/home/yosef/Cactus_New/ctest/./cactus_ctest(__gxx_personality_v0+0x3b1)[0x407fe9]
/lib64/tls/libc.so.6(__libc_start_main+0xdb)[0x3227e1c3fb]
/home/yosef/Cactus_New/ctest/./cactus_ctest(__gxx_personality_v0+0x102)[0x407d3a]
INFO (AHFinderDirect):    Newton_solve(): Theta_is_ok=1
INFO (AHFinderDirect):    Theta rms-norm 5.8e-08, infinity-norm 2.7e-06
INFO (AHFinderDirect):    flags: found_this_horizon=0
INFO (AHFinderDirect):           this_horizon_needs_more_iterations=1
INFO (AHFinderDirect):           I_need_more_iterations=1
MPIRUN: 31 ranks have not yet exited 60 seconds after rank 2 (node n016.cluster) exited without reaching MPI_Finalize().
MPIRUN: Waiting another 60 seconds before terminating remaining 31 node processes
Mon Apr 23 14:47:53 EDT 2007
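For context on the failing check: judging from its name, partial_xyz_wrt_r_mu_nu() computes derivatives of the Cartesian (x, y, z) with respect to the patch coordinates (r, mu, nu). The assertion requires r to be nonzero, presumably because those derivatives involve a division by r. The C++ sketch below shows the general shape of a fuzzy "not equal" test guarding such a division. It is not the jtutil::fuzzy implementation. The names fuzzy_ne and inverse_radius, and the tolerance 1.0e-12, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical tolerance; the value actually used by jtutil::fuzzy is not
// shown in this thread.
constexpr double kFuzzyTolerance = 1.0e-12;

// Returns true if x and y differ by more than the tolerance. The tolerance is
// scaled by the operands' magnitude, with a floor of 1 on the scale factor.
bool fuzzy_ne(double x, double y) {
    const double scale = std::max({1.0, std::fabs(x), std::fabs(y)});
    return std::fabs(x - y) > kFuzzyTolerance * scale;
}

// Hypothetical guarded 1/r factor, analogous in spirit to the check that
// fails in coords.cc: abort if r is (fuzzily) zero, otherwise return 1/r.
double inverse_radius(double r) {
    assert(fuzzy_ne(r, 0.0) && "radius must be nonzero");
    return 1.0 / r;
}
```

With this tolerance, inverse_radius(2.0) returns 0.5. By contrast, inverse_radius(1.0e-15) aborts with an assertion failure, much as the coords.cc check fails when r is zero to within its tolerance.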


Erik Schnetter wrote:
> Hi Yosef,
>
> can you produce more debugging output?  Have a look at 
> AHFinderDirect::verbose_level, and please send more context -- at 
> least all the parameters that you set.
>
> -erik
>
> On Apr 23, 2007, at 13:34:52, Yosef Zlochower wrote:
>
>> Hi,
>>
>>   The problem seems to be with the "Erik" tagged version. I was actually
>> using a custom version. With the "Erik" version I get the error:
>> cactus_ctest:
>> /home/yosef/Cactus_New/configs/ctest/build/AHFinderDirect/patch/coords.cc:313:
>> void AHFinderDirect::local_coords::partial_xyz_wrt_r_mu_nu(fp, fp, fp, fp&, fp&, fp&, fp&, fp&, fp&, fp&, fp&, fp&): Assertion `jtutil::fuzzy<fp>::NE(r, 0.0)' failed.
>>
>> The recovery seems to work correctly with the standard version of
>> AHFinderDirect.
>>
>> Yosef
>>
>> Erik Schnetter wrote:
>>> On Apr 20, 2007, at 13:38:49, Yosef Zlochower wrote:
>>>
>>>> Hi,
>>>>
>>>>   I have been having problems restarting a Carpet run
>>>> on a new AMD x86_64 cluster with the mpich-infinipath version
>>>> of MPI.
>>>> AHFinderDirect gives the following error message on recovery:
>>>> WARNING level -1 in thorn AHFinderDirect processor 2 host n016.cluster
>>>>   (line 78 of /home/yosef/Cactus_New/configs/carpetdt/build/AHFinderDirect/jtutil/error_exit.cc):
>>>>   -> ***** row_sparse_Jacobian__UMFPACK::solve_linear_system():
>>>>         error return status=1 from umfpack_numeric() routine
>>>>
>>>> I tried compiling AHFinderDirect with LAPACK and got a similar
>>>> message.
>>>>
>>>>  I used both the stable and development versions of
>>>> Carpet as well as the PathScale and GNU (gcc4, gfortran4, g++4)
>>>> compilers. The operating system is CentOS 4.4.
>>>>
>>>> Any ideas?
>>>
>>> Hi Yosef.
>>>
>>> What version of the horizon finder are you using?  There are two
>>> branches in CVS; "HEAD" and "Erik".  You can also increase the horizon
>>> finder verbosity to find out more about this problem.
>>>
>>> -erik
>>>
>>> --Erik Schnetter <schnetter at cct.lsu.edu>
>>>
>>> My email is as private as my paper mail.  I therefore support 
>>> encrypting
>>> and signing email messages.  Get my PGP key from www.keyserver.net.
>>>
>>>
>>>
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> developers mailing list
>>> developers at lists.carpetcode.org
>>> http://lists.carpetcode.org/listinfo/developers
>>>
>>
>>
>


