[Carpet] component geometry for multi-core processor chips
Erik Schnetter
schnetter at cct.lsu.edu
Tue Aug 29 18:38:07 CEST 2006
On Aug 29, 2006, at 03:05:43, Steve White wrote:
> Erik,
>
> That was the easy part.
>
> The harder question is how to efficiently pack the computational domain
> with the clumps of processes.
>
> "Clump" is new. Do we ever shy away from new nomenclature?
No, we usually invent several nomenclatures independently. You say
"clump"? I should say "cluster" then. (I won't.)
> Let me clarify my terminology here:
> core: hardware computational processing unit (previously CPU)
> node: an individual computer in a cluster, each of which may
> contain several cores
> computational domain: the rectangular range of xyz values
> component: a rectangular subrange of the computational domain,
> whose interior calculations are executed on a single
> processor core
> clump: a set of contiguous components, meant to be assigned to
> the processor cores of a single node.
"node" is a bad name; it doesn't start with a "c". Seriously, why do
most words in the Cactus/Carpet world start with a "c"?
MPI uses the notion of "process". I therefore prefer to speak of
"processors", where a "processor" is the entity associated with an MPI
"process". That could be a single core (which is what we usually do
today), or we could have two processes per core (if we enable
hyperthreading), or we could have two cores per process (e.g. if we
want to use more memory per process). I would like to be able to
speak of processors independent of how they are mapped to the actual
hardware.
> * a reasonable assumption is that the number of cores on a node is a
> multiple of 2 (or even a power of 2).
>
> * it will be best for the clumps all to be the same. That is, the total
> number of processes should be restricted to
> k × n
> where k is the clump size, and the clumps should all be arranged
> the same way. (As opposed to: on a 4 processor machine, having some
> clumps arranged as 2 × 2 slabs, and others as 1 × 4 chains.) Otherwise,
> several other issues arise, which in the end may ruin the
> performance gain.
That immediately suggests an algorithm: First decompose the domain
into n regions (one per node), then split each region into k regions
(which is easy if k is a power of 2).
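The two-level idea above can be sketched roughly as follows. This is only an illustration, not Carpet's actual decomposition code: the names split_box and decompose are hypothetical, boxes are assumed to be tuples of half-open integer ranges (lo, hi) per dimension, and each cut is assumed to go through the longest axis.

```python
def split_box(box, count):
    """Recursively cut `box` into `count` rectangular pieces.

    `box` is a tuple of half-open integer ranges, one (lo, hi) per axis.
    Assumption: always cut the longest axis, in proportion to the number
    of pieces that end up on each side of the cut.
    """
    if count == 1:
        return [box]
    left = count // 2
    right = count - left
    # Pick the longest axis (the first one wins on ties).
    axis = max(range(len(box)), key=lambda d: box[d][1] - box[d][0])
    lo, hi = box[axis]
    cut = lo + (hi - lo) * left // count
    box_a = box[:axis] + ((lo, cut),) + box[axis + 1:]
    box_b = box[:axis] + ((cut, hi),) + box[axis + 1:]
    return split_box(box_a, left) + split_box(box_b, right)


def decompose(domain, n_nodes, k_procs):
    """First one region (clump) per node, then k components per clump."""
    return [split_box(clump, k_procs) for clump in split_box(domain, n_nodes)]


def volume(box):
    """Number of grid points in a box."""
    result = 1
    for lo, hi in box:
        result *= hi - lo
    return result


if __name__ == "__main__":
    domain = ((0, 64), (0, 64), (0, 64))
    for node, clump in enumerate(decompose(domain, n_nodes=4, k_procs=2)):
        print("node", node, clump)
```

The sketch does not guard against boxes that are too small to be cut further, and it does not try to keep the components within a clump equally shaped, which is the part that matters for the packing question discussed below.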
> I would suggest to warn the user if they have chosen a geometry that
> doesn't permit a good decomposition into clumps, and fall back to the
> old algorithm in that case.
>
> * for clump size k = 1 or 8, one can treat the (cubical) clumps as
> components were treated before as regards decomposing the
> computational domain.
Right.
> * for clump size k = 2 (Peyote, Mike) or 4 (as on Belladonna), the
> clumps can't be cubical, so one has to determine a good orientation
> for them to best pack the computational domain.
If the intra-clump communication is much faster than the inter-clump
communication, then the clumps should themselves be cubic, even if
the processors' domains are not. Whether this is true or not is to
be determined by experiment.
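To make the trade-off concrete, here is a small back-of-the-envelope sketch for k = 2. It assumes that communication volume is proportional to face area (ghost zone width and latency are ignored), and the function names are hypothetical. It compares a cubic clump cut into two flat components with a 2 x 1 x 1 chain of two cubic components of the same total volume.

```python
def surface_area(shape):
    """Outer face area of a box; a proxy for inter-clump communication."""
    x, y, z = shape
    return 2 * (x * y + y * z + z * x)


def internal_area(shape, splits):
    """Area of the cut planes when the box is split into
    splits[0] x splits[1] x splits[2] equal components; a proxy for
    intra-clump communication."""
    x, y, z = shape
    px, py, pz = splits
    return (px - 1) * y * z + (py - 1) * x * z + (pz - 1) * x * y


s = 32.0                            # side of one cubic component (assumed)
chain = (2 * s, s, s)               # two cubic components in a row
side = (2 * s ** 3) ** (1.0 / 3.0)  # cubic clump with the same volume
cube = (side, side, side)

for name, shape in (("chain", chain), ("cube", cube)):
    print(name,
          "inter-clump area:", round(surface_area(shape), 1),
          "intra-clump area:", round(internal_area(shape, (2, 1, 1)), 1))
```

Under these assumptions the cubic clump has the smaller outer surface but the larger internal interface, so it only pays off if intra-clump communication really is much cheaper. The sketch says nothing about whether that holds on real hardware; that still needs the experiment.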
A natural extension of this (often talked about, but never made to
work efficiently) is to use not MPI but OpenMP within a node. That
would require some additional code near the main evolution loops.
-erik
--
Erik Schnetter <schnetter at cct.lsu.edu>
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from www.keyserver.net.