[Carpet] MPI_WAITALL : Error code is in status

Jonathan Thornburg jthorn at aei.mpg.de
Mon May 29 15:15:04 CEST 2006


Hi, Erik,

On Fri, 26 May 2006, Erik Schnetter wrote:
> > The error message is clearly telling me that MPI_Waitall() died,
> > and put some information about what went wrong in its status structure.
> > Alas, a brief grep through Carpet/CarpetLib/src/* shows that each and
> > every call on MPI_Waitall() passes MPI_STATUSES_IGNORE as the 3rd
> > argument, so there's no status structure to look at.
> 
> These status arguments are usually not useful for error checking.  As your
> error message shows, the MPI routine aborted before it returned to Cactus, so
> there would be nothing to look at anyway.  This is the default behaviour of
> MPI.

However, it would have been very useful for me to know *which* of
the numerous MPI_Waitall() calls in {Carpet,CarpetInterp,CarpetReduce,Slab}
was producing the error.  Even better would have been the ability to
force a core-dump at that point, so I could go in after the fact and
get a stack traceback to see which thorn was active at the time...
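
One way to learn which call failed, sketched here under assumptions
rather than taken from Carpet: wrap each call in a macro that records
the call site and calls abort() on failure, so that a core file (where
the system permits core dumps) holds the stack.  The names check_call()
and CHECK_CALL are hypothetical.  For this to help with MPI_Waitall(),
the communicator's error handler would first have to be set to
MPI_ERRORS_RETURN (e.g. with MPI_Comm_set_errhandler()), since the
default handler MPI_ERRORS_ARE_FATAL aborts inside the library before
the call returns.  The sketch is kept MPI-free so that it compiles on
its own; the return code is passed in as a plain int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Carpet or Cactus).
 * Compares a return code against the expected "ok" value.  On a
 * mismatch it prints the failing expression and its call site to
 * stderr; if `fatal` is nonzero it then calls abort(), which raises
 * SIGABRT and normally produces a core file when core dumps are
 * enabled.  Returns 0 on success, 1 on a non-fatal failure. */
int check_call(int rc, int ok, const char *expr,
               const char *file, int line, int fatal)
{
    assert(expr != NULL && file != NULL);
    if (rc == ok)
        return 0;
    fprintf(stderr, "%s:%d: %s returned error code %d\n",
            file, line, expr, rc);
    fflush(stderr);   /* make sure the message is out before dying */
    if (fatal)
        abort();
    return 1;
}

/* Hypothetical macro: tags the call with __FILE__/__LINE__ and the
 * text of the call itself, and aborts on failure. */
#define CHECK_CALL(call, ok) \
    check_call((call), (ok), #call, __FILE__, __LINE__, 1)
```

A call site would then read something like
CHECK_CALL(MPI_Waitall(n, reqs, stats), MPI_SUCCESS).  Passing a real
status array instead of MPI_STATUSES_IGNORE also lets the per-request
MPI_ERROR fields be inspected when the return code is
MPI_ERR_IN_STATUS.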

I eventually tracked down the problem (I think -- I'm in the middle
of what I hope will be a fix now) with the aid of my new flesh option
to turn off stdout/stderr buffering.  I was calling CCTK_SyncGroup()
non-synchronously on different processors, i.e. the processors were
not all making the same sequence of sync calls.  Bad, bad.....
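
For reference, turning off stdio buffering can be sketched in plain C
with setvbuf().  This is only an illustration of the idea, not the
actual flesh option; disable_stdio_buffering() is a hypothetical name.
The point is that when stdout goes to a pipe or file (as it usually
does under an MPI launcher) it is fully buffered, so the last messages
before a crash can be lost unless buffering is switched off.

```c
#include <stdio.h>

/* Hypothetical helper (not the Cactus flesh implementation).
 * Switches stdout and stderr to unbuffered mode so that every write
 * reaches the OS immediately.  It should be called early, before any
 * other output is done on these streams.
 * Returns 0 if both setvbuf() calls succeed, -1 otherwise. */
int disable_stdio_buffering(void)
{
    int rc_out = setvbuf(stdout, NULL, _IONBF, 0);
    int rc_err = setvbuf(stderr, NULL, _IONBF, 0);
    return (rc_out == 0 && rc_err == 0) ? 0 : -1;
}
```

Unbuffered output is slower, so this is something to enable for
debugging runs rather than leave on in production.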

ciao,

-- 
-- Jonathan Thornburg <jthorn at aei.mpg.de>      
   Max-Planck-Institut fuer Gravitationsphysik (Albert-Einstein-Institut),
   Golm, Germany, "Old Europe"     http://www.aei.mpg.de/~jthorn/home.html      
   "Washing one's hands of the conflict between the powerful and the
    powerless means to side with the powerful, not to be neutral."
                                      -- quote by Freire / poster by Oxfam



