[Carpet] component geometry for multi-core processor chips
Steve White
steve.white at aei.mpg.de
Tue Aug 29 10:05:43 CEST 2006
Erik,
That was the easy part.
The harder question is how to efficiently pack the computational domain
with the clumps of processes.
"Clump" is new. Do we ever shy away from new nomenclature?
Let me clarify my terminology here:
  core:   hardware computational processing unit (previously CPU)
  node:   an individual computer in a cluster, each of which may
          contain several cores
  computational domain:
          the rectangular range of xyz values
  component:
          a rectangular subrange of the computational domain,
          whose interior calculations are executed on a single
          processor core
  clump:  a set of contiguous components, meant to be assigned to
          the processor cores of a single node.
* a reasonable assumption is that the number of cores on a node is a
multiple of 2 (or even a power of 2).
* it will be best for the clumps all to be the same. That is, the total
number of processes should be restricted to
k × n
where k is the clump size and n is the number of clumps, and the clumps
should all be arranged the same way. (As opposed to: on a 4-processor
machine, having some clumps arranged as 2 × 2 slabs and others as
1 × 4 chains.) Otherwise, several other issues arise, which in the end
may ruin the performance gain.
I would suggest warning the user if they have chosen a geometry that
doesn't permit a good decomposition into clumps, and falling back to the
old algorithm in that case.
* for clump size k = 1 or 8, the clumps can be cubical, so one can decompose
the computational domain into clumps just as it was decomposed into
components before.
* for clump size k = 2 (Peyote, Mike) or 4 (as on Belladonna), the clumps
can't be cubical, so one has to determine a good orientation for them to
best pack the computational domain.
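To make the orientation question concrete, here is a rough sketch in Python
(not Carpet code). It assumes the components are already laid out on a
regular process grid, lists every way to arrange k components as an
a × b × c block, keeps the arrangements that tile the process grid evenly,
and picks the one whose outer surface is smallest. Surface area is only a
stand-in for inter-node communication cost, and the names clump_shapes,
surface, and best_clump_shape are made up for this illustration.

```python
from itertools import product


def clump_shapes(k):
    """All ordered triples (a, b, c) of positive integers with a*b*c == k."""
    return [(a, b, k // (a * b))
            for a, b in product(range(1, k + 1), repeat=2)
            if k % (a * b) == 0]


def surface(extent):
    """Surface area of a rectangular box with the given (x, y, z) extent."""
    x, y, z = extent
    return 2 * (x * y + y * z + z * x)


def best_clump_shape(proc_grid, comp_extent, k):
    """Pick one clump arrangement for the whole domain.

    proc_grid   -- number of components along x, y, z
    comp_extent -- grid points per component along x, y, z
    k           -- clump size (cores per node)

    Returns (a, b, c), or None if no arrangement tiles the process grid;
    in that case the caller should warn and use the old algorithm.
    """
    # Keep only shapes that divide the process grid in every direction,
    # so that all clumps are arranged the same way.
    candidates = [s for s in clump_shapes(k)
                  if all(p % a == 0 for p, a in zip(proc_grid, s))]
    if not candidates:
        return None

    # Assumption: a smaller clump surface means less inter-node traffic.
    # Ties are broken by the shape tuple itself to keep the result stable.
    def cost(shape):
        extent = tuple(a * c for a, c in zip(shape, comp_extent))
        return (surface(extent), shape)

    return min(candidates, key=cost)
```

For example, best_clump_shape((4, 4, 2), (10, 10, 10), 4) chooses a 2 × 2
slab over a 1 × 4 chain, and best_clump_shape((3, 3, 3), (10, 10, 10), 2)
returns None because pairs cannot tile a 3 × 3 × 3 process grid.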
On 28.08.06, Erik Schnetter wrote:
> On Aug 28, 2006, at 01:52:15, Steve White wrote:
>
> >Second, it would be better not to ask the specific batch system (PBS)
> >about which processes are running on which nodes. I would prefer a
> >means that is independent of batch system.
> >
> >How about:
> >
> > if this is not the MPI root process
> > * send to MPI root process the result of
> > system( "uname -n" )
>
> You mean Util_GetHostName.
>
> > otherwise
> > * make a hash of
> > node_name, mpi_rank
> > * wait for message from each other process
> > * for each message,
> > add to the hash the message body with mpi_rank
> > * add to the hash the present node name with mpi_rank = 0
> >
> >Now you have an easy association of nodes to MPI rank numbers.
>
> Clever. I didn't think of that. I was thinking along the lines of
> running a short benchmark, determining latencies and bandwidths
> between the individual processors. That would unfortunately be quite
> expensive, since you would need to test each processor pair, and the
> individual communications would influence each other.
>
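The bookkeeping on the root process in the scheme quoted above might look
like the following Python sketch. It assumes the host names have already
been collected at the root, one per MPI rank and in rank order; the
messaging itself is left out. The names ranks_by_node and clump_size are
hypothetical.

```python
from collections import defaultdict


def ranks_by_node(hostnames):
    """Map each node name to the list of MPI ranks running on it.

    hostnames[i] is assumed to be the host name reported by rank i.
    """
    nodes = defaultdict(list)
    for rank, name in enumerate(hostnames):
        nodes[name].append(rank)
    return dict(nodes)


def clump_size(nodes):
    """Return the number of ranks per node if it is the same on every
    node, otherwise None (the clumps would not all be the same)."""
    sizes = {len(ranks) for ranks in nodes.values()}
    return sizes.pop() if len(sizes) == 1 else None
```

With hostnames = ["n1", "n1", "n2", "n2"], ranks_by_node gives
{"n1": [0, 1], "n2": [2, 3]} and clump_size of that result is 2.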
--
Steve White : Programmer
Max-Planck-Institut für Gravitationsphysik Albert-Einstein-Institut
Am Mühlenberg 1, D-14476 Golm, Germany +49-331-567-7625