The examples directory in the Cell Messaging Layer distribution is the best way to learn how to use the Cell Messaging Layer. The files in the minimal subdirectory demonstrate the minimal amount of code needed on the PPE and the SPE for a do-nothing program. The files in the showcase subdirectory show how to use all of the MPI functions implemented by the Cell Messaging Layer. At the time of this writing, those functions include the following:
There is documentation on the Web for each of these functions (e.g., at http://www.mpich.org/static/docs/v3.1/). See also the spe/include/mpi.h file, installed as part of the Cell Messaging Layer, for the complete set of function prototypes.
The Cell Messaging Layer can run in either hybrid or non-hybrid mode. Hybrid mode is used in systems like Roadrunner in which the Cell processors are not directly connected to the network but only indirectly through a host processor. Non-hybrid mode is used when there is no network or when the Cell processors can access the network directly.
The SPE program must #include <mpi.h>, which provides a set of datatypes and functions in addition to the MPI calls listed above:
typedef uint64_t ppe_funcptr;
Refer to an RPC function provided by the PPE.
ppe_funcptr cellmsg_accept_rpc (void);
Accept a pointer to a function provided by the PPE. Every SPE must call cellmsg_accept_rpc() in the same order as every other SPE and in the same order that the PPE called cellmsg_provide_rpc().
void cellmsg_rpc (ppe_funcptr ppefunc, void *toppe, uint32_t toppebytes, uint32_t toppeswap, void *fromppe, uint32_t fromppebytes, uint32_t fromppeswap);
Invoke a function on the PPE. The function is passed toppebytes bytes of data from toppe with byte swapping in sets of toppeswap bytes and returns fromppebytes bytes of data into fromppe with byte swapping in sets of fromppeswap bytes. toppeswap and fromppeswap should normally be set to the size of the corresponding data type (e.g., sizeof(int) to transmit one or more int values). However, as a convenience, the value CML_BYTE_SWAP_NOT_NEEDED can be specified instead to assert that no byte swapping is needed because the SPE and PPE have the same endianness. (Both are big-endian in the current generation of the Cell processor.)
The PPE program must #include <cellmsg.h>, which provides a set of datatypes and functions:
typedef struct {
  void *buffer;      /* Pointer to a properly aligned buffer */
  uint32_t numbytes; /* Number of valid bytes in the above */
  int localrank;     /* Local rank of the initiator */
} cellmsg_rpc_data;
cellmsg_rpc_data represents the input and output data for a SPE-initiated remote procedure call.
typedef void (*ppe_funcptr)(cellmsg_rpc_data *in_out_data);
A ppe_funcptr is a pointer to a PPE function that can be invoked from the SPE.
typedef void (*host_funcptr)(cellmsg_rpc_data *in_out_data);
A host_funcptr is a pointer to a host function that can be invoked from the PPE (hybrid mode only).
void cellmsg_init (int *argc, char ***argv);
Initialize the Cell Messaging Layer given pointers to the main() function's argument count and argument list. This function must be called before any other Cell Messaging Layer function.
void cellmsg_run (void *spemain, int spe_argc, char *spe_argv[]);
Load a SPE program (either a filename string or a pointer to a spe_program_handle_t), passing it an argument count and a list of arguments.
void cellmsg_finalize (void);
Shut down the Cell Messaging Layer. No other Cell Messaging Layer function should be called after cellmsg_finalize().
int cellmsg_spes_per_ppe (void);
Return the number of SPEs managed by each PPE.
int cellmsg_ppes (void);
Return the total number of PPEs.
int cellmsg_this_ppe (void);
Return the caller's PPE number.
void cellmsg_provide_rpc (ppe_funcptr ppefunc);
Make a function available to the caller's SPEs. Each SPE must call cellmsg_accept_rpc() to accept the function.
host_funcptr cellmsg_accept_rpc (void);
Accept a function made available by the caller's host (hybrid mode only).
void cellmsg_rpc (host_funcptr hostfunc, void *tohost, uint32_t tohostbytes, uint32_t tohostswap, void *fromhost, uint32_t fromhostbytes, uint32_t fromhostswap);
Invoke a function on the host (hybrid mode only). The function is passed tohostbytes bytes of data from tohost with byte swapping in sets of tohostswap bytes and returns fromhostbytes bytes of data into fromhost with byte swapping in sets of fromhostswap bytes. tohostswap and fromhostswap should normally be set to the size of the corresponding data type (e.g., sizeof(int) to transmit one or more int values). If the host and the PPE have the same endianness (not the case in any known system at the time of this writing), the convenience value CML_BYTE_SWAP_NOT_NEEDED can be specified instead to assert that no byte swapping is needed.
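To make "byte swapping in sets of tohostswap bytes" concrete, the following is a minimal sketch of what such a transformation does to a buffer: the bytes within each consecutive group of a given size are reversed. The function name swap_in_sets and its signature are hypothetical and are not part of the Cell Messaging Layer; this illustrates the concept only and is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT a Cell Messaging Layer function): reverse the
 * bytes within each consecutive group of `setsize` bytes in `buf`.
 * `nbytes` must be a whole multiple of `setsize`. */
void swap_in_sets(void *buf, size_t nbytes, size_t setsize)
{
    uint8_t *bytes = buf;

    assert(setsize > 0 && nbytes % setsize == 0);
    for (size_t base = 0; base < nbytes; base += setsize) {
        /* Exchange the i-th byte from the front of the group with the
         * i-th byte from the back of the group. */
        for (size_t i = 0; i < setsize / 2; i++) {
            uint8_t tmp = bytes[base + i];
            bytes[base + i] = bytes[base + setsize - 1 - i];
            bytes[base + setsize - 1 - i] = tmp;
        }
    }
}
```

With a set size of sizeof(int), an array of int values has each element's byte order reversed independently, which is why the swap arguments are normally the size of the element type rather than the size of the whole buffer. A set size of 1 leaves the data unchanged.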
The host program must #include <cellmsg.h>, which provides a set of datatypes and functions:
typedef struct {
  void *buffer;      /* Pointer to a properly aligned buffer */
  uint32_t numbytes; /* Number of valid bytes in the above */
} cellmsg_rpc_data;
cellmsg_rpc_data represents the input and output data for a PPE-initiated remote procedure call.
typedef void (*host_funcptr)(cellmsg_rpc_data *in_out_data);
A host_funcptr is a pointer to a host function that can be invoked from the PPE.
void cellmsg_init (int *argc, char ***argv);
Initialize the Cell Messaging Layer given pointers to the main() function's argument count and argument list. This function must be called before any other Cell Messaging Layer function.
void cellmsg_run (char *progname, int argc, char **argv);
Load a PPE program (by filename), passing it an argument count and a list of arguments.
void cellmsg_finalize (void);
Shut down the Cell Messaging Layer. No other Cell Messaging Layer function should be called after cellmsg_finalize().
int cellmsg_ppes_per_host (void);
Return the number of PPEs on each host.
int cellmsg_hosts (void);
Return the total number of hosts.
int cellmsg_this_host (void);
Return the caller's host number.
void cellmsg_provide_rpc (host_funcptr hostfunc);
Make a function available to the caller's PPE. The PPE must call cellmsg_accept_rpc() to accept the function.
The functions invoked by cellmsg_rpc() are passed a cellmsg_rpc_data data structure that encapsulates the data provided by the initiator of the RPC, the number of bytes provided by the initiator, and the initiator's local rank. The invoked function returns data to the initiator via the same cellmsg_rpc_data structure. The following are some details regarding the use of cellmsg_rpc_data:
The invoked function may modify the fields of cellmsg_rpc_data. If buffer is changed, it should not point to stack-allocated memory because stack-allocated memory becomes invalid when the function returns. Use global, static, or malloc()'d memory instead.
Changing localrank has no effect.
To return data to the initiator, place it in cellmsg_rpc_data's buffer field and update the numbytes field accordingly.
buffer is guaranteed to be only as large as the numbytes specified by the initiator.
The localrank field represents the local rank of the initiator (typically, 0–7 or 0–15 for a SPE-to-PPE RPC invocation). It may be useful for indexing into an array of values.
MPI ranks are assigned such that they utilize all of the SPEs on one Cell before using any of the SPEs on the next Cell. That is, ranks 0 to 7 are on the first Cell, ranks 8 to 15 are on the second Cell, and so forth (assuming current hardware, with 8 SPEs per Cell).
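The rank layout just described amounts to integer division and remainder. The sketch below shows that arithmetic; the helper names cell_of_rank and slot_in_cell are hypothetical (the Cell Messaging Layer does not provide them), and the count of 8 SPEs per Cell is the current-hardware assumption stated above, passed as a parameter so that it is not hard-wired.

```c
#include <assert.h>

/* Assumption from the text: 8 SPEs per Cell on current hardware. */
enum { SPES_PER_CELL = 8 };

/* Hypothetical helper: zero-based index of the Cell hosting an MPI rank. */
int cell_of_rank(int rank, int spes_per_cell)
{
    assert(rank >= 0 && spes_per_cell > 0);
    return rank / spes_per_cell;
}

/* Hypothetical helper: position of a rank among the ranks on its Cell. */
int slot_in_cell(int rank, int spes_per_cell)
{
    assert(rank >= 0 && spes_per_cell > 0);
    return rank % spes_per_cell;
}
```

For example, with 8 SPEs per Cell, rank 7 is the last rank on the first Cell (index 0) and rank 8 is the first rank on the second Cell (index 1).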
The predefined MPI_COMM_MEM_DOMAIN communicator refers to all SPEs that share a main-memory virtual address space. It can be useful, for example, for sharing a pointer to main memory with only those SPEs that can access it.
The CMLMAXLOCALSPES environment variable limits the number of SPEs used by each PPE.
The PMPI profiling interface is supported for all MPI functions defined by the Cell Messaging Layer.
The MPI_Comm_get_attr() function accepts an MPI_CML_LOCAL_NEIGHBORS key, which returns the number of SPEs managed by a single PPE (typically 8 for a single Cell or 16 for a pair of Cells connected via a BIF connection).
The Cell Messaging Layer supports a convenient remote procedure call (RPC) mechanism that enables a SPE to invoke functions on the PPE and receive the results. (In hybrid mode, it also enables a PPE to invoke functions on the host and receive the results.) See the files in the examples/showcase or examples/showcase-hybrid directories for usage examples.
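As an illustration of the PPE side of this mechanism, here is a minimal sketch of a function with the documented ppe_funcptr shape. It sums the int values sent by the initiator and returns the total through the same cellmsg_rpc_data structure. Several things are assumptions made for the sketch: the structure is redeclared locally (copied from the PPE definition in this document) so that the code compiles without <cellmsg.h>; the handler name sum_ints and the constant MAX_LOCAL_SPES are hypothetical; and a static int array is assumed to satisfy the "properly aligned" requirement on buffer, which this document does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copy of the documented PPE-side structure so that this sketch is
 * self-contained.  Real code would #include <cellmsg.h> instead. */
typedef struct {
    void *buffer;       /* Pointer to a properly aligned buffer */
    uint32_t numbytes;  /* Number of valid bytes in the above */
    int localrank;      /* Local rank of the initiator */
} cellmsg_rpc_data;

/* Assumption: local ranks fall in 0-15, the largest range the text mentions. */
#define MAX_LOCAL_SPES 16

/* Hypothetical RPC handler: add up the ints supplied by the initiator and
 * hand back the single int total. */
void sum_ints(cellmsg_rpc_data *in_out_data)
{
    /* Static storage, one slot per local rank, so the result remains valid
     * after this function returns (stack memory would not). */
    static int totals[MAX_LOCAL_SPES];
    int rank = in_out_data->localrank;
    size_t count = in_out_data->numbytes / sizeof(int);
    int sum = 0;

    assert(rank >= 0 && rank < MAX_LOCAL_SPES);
    for (size_t i = 0; i < count; i++) {
        int value;
        /* Read only the numbytes the initiator supplied. */
        memcpy(&value, (char *)in_out_data->buffer + i * sizeof(int),
               sizeof(int));
        sum += value;
    }

    /* Return the result by re-pointing buffer and updating numbytes. */
    totals[rank] = sum;
    in_out_data->buffer = &totals[rank];
    in_out_data->numbytes = sizeof(int);
}
```

In a real program a function like this would presumably be registered on the PPE with cellmsg_provide_rpc() and accepted on each SPE with cellmsg_accept_rpc() before being invoked through cellmsg_rpc(); the showcase examples remain the authoritative reference for that sequence.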
In addition to supporting only a small subset of MPI, the current version of the Cell Messaging Layer has the following additional limitations:
More documentation for the Cell Messaging Layer is available from the Cell Messaging Layer project page on SourceForge.net.
Some initial performance data appears on a Cell Messaging Layer poster that was displayed in Los Alamos National Laboratory's booth at the SC08 conference (November 2008).
The following conference paper details the Cell Messaging Layer's implementation and presents a wealth of performance data:
Scott Pakin. Receiver-initiated Message Passing over RDMA Networks. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, Florida, April 2008.
Scott Pakin, pakin@lanl.gov