The Cell Messaging Layer
Documentation

Supported MPI functions

The examples directory in the Cell Messaging Layer distribution is the best way to learn how to use the Cell Messaging Layer. The files in the minimal subdirectory demonstrate the minimal amount of code needed on the PPE and the SPE for a do-nothing program. The files in the showcase subdirectory show how to use all of the MPI functions implemented by the Cell Messaging Layer. At the time of this writing, those functions include the following:

There is documentation on the Web for each of these functions (e.g., at http://www.mpich.org/static/docs/v3.1/). See also the spe/include/mpi.h file, installed as part of the Cell Messaging Layer, for the complete set of function prototypes.

Supported non-MPI functions

The Cell Messaging Layer can run in either hybrid or non-hybrid mode. Hybrid mode is used in systems like Roadrunner in which the Cell processors are not directly connected to the network but only indirectly through a host processor. Non-hybrid mode is used when there is no network or when the Cell processors can access the network directly.

SPE-callable functions

The SPE program must #include <mpi.h>, which provides a set of datatypes and functions in addition to the MPI calls listed above:

typedef uint64_t ppe_funcptr;

Refer to an RPC function provided by the PPE.

ppe_funcptr cellmsg_accept_rpc (void);

Accept a pointer to a function provided by the PPE. Every SPE must call cellmsg_accept_rpc() in the same order as every other SPE, and that order must match the order in which the PPE called cellmsg_provide_rpc().
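The ordering rule can be pictured with a toy model in plain C. This is only a sketch of the rule as stated above, not the library's implementation: the names model_provide_rpc() and model_accept_rpc(), the table sizes, and the per-SPE cursor are all hypothetical. The model assumes that the n-th accept on each SPE yields the n-th function the PPE provided.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t ppe_funcptr;          /* as documented for the SPE side */

enum { MAX_RPCS = 16, NUM_SPES = 8 };  /* hypothetical sizes for the model */

static ppe_funcptr provided[MAX_RPCS]; /* functions in the order provided */
static int num_provided;               /* how many have been provided */
static int next_accept[NUM_SPES];      /* per-SPE position in that order */

/* Model of the PPE side: record a function handle in call order. */
void model_provide_rpc(ppe_funcptr func)
{
    assert(num_provided < MAX_RPCS);
    provided[num_provided++] = func;
}

/* Model of the SPE side: the n-th accept on a given SPE returns the
 * n-th handle that was provided. */
ppe_funcptr model_accept_rpc(int spe)
{
    assert(spe >= 0 && spe < NUM_SPES);
    assert(next_accept[spe] < num_provided);
    return provided[next_accept[spe]++];
}
```

In this model, an SPE that accepted in a different order from its peers would bind the wrong handle to a name, which is why the order must match everywhere.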

void cellmsg_rpc (ppe_funcptr ppefunc,
                  void *toppe, uint32_t toppebytes, uint32_t toppeswap,
                  void *fromppe, uint32_t fromppebytes, uint32_t fromppeswap);

Invoke a function on the PPE. The function is passed toppebytes bytes of data from toppe, with byte swapping in sets of toppeswap bytes, and returns fromppebytes bytes of data into fromppe, with byte swapping in sets of fromppeswap bytes. toppeswap and fromppeswap should normally be set to the size of the corresponding data type (e.g., sizeof(int) to transmit one or more int values). However, as a convenience, the value CML_BYTE_SWAP_NOT_NEEDED can be specified instead to assert that no byte swapping is needed because the SPE and PPE have the same endianness. (Both are big-endian in the current generation of the Cell processor.)
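To make "byte swapping in sets of N bytes" concrete, here is a minimal sketch in plain C. The helper swap_in_sets() is hypothetical and is not part of the Cell Messaging Layer; it only illustrates the effect of reversing the bytes of each consecutive element of a given width, which is what converting an array of fixed-size values between endiannesses requires.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a Cell Messaging Layer function): reverse the
 * bytes of each consecutive width-byte element in buf.  nbytes must be a
 * multiple of width. */
void swap_in_sets(void *buf, size_t nbytes, size_t width)
{
    unsigned char *p = buf;

    assert(width > 0 && nbytes % width == 0);
    for (size_t off = 0; off < nbytes; off += width) {
        /* Swap the outermost pair of bytes and work inward. */
        for (size_t i = 0, j = width - 1; i < j; i++, j--) {
            unsigned char tmp = p[off + i];
            p[off + i] = p[off + j];
            p[off + j] = tmp;
        }
    }
}
```

With a width of 4, the eight bytes 1 2 3 4 5 6 7 8 become 4 3 2 1 8 7 6 5. A width of 1 leaves the data unchanged.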

PPE-callable functions

The PPE program must #include <cellmsg.h>, which provides a set of datatypes and functions:

typedef struct {
  void *buffer;            /* Pointer to a properly aligned buffer */
  uint32_t numbytes;       /* Number of valid bytes in the above */
  int localrank;           /* Local rank of the initiator */
} cellmsg_rpc_data;

cellmsg_rpc_data represents the input and output data for a SPE-initiated remote procedure call.

typedef void (*ppe_funcptr)(cellmsg_rpc_data *in_out_data);

A ppe_funcptr is a pointer to a PPE function that can be invoked from the SPE.
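A function of this type might look like the following sketch. The structure is redeclared locally as a stand-in so that the example compiles without the Cell Messaging Layer headers, and sum_ints() is a hypothetical function name. The sketch assumes that the reply is written into the same buffer and that numbytes is updated to the reply length; the documentation says only that results travel back through the same structure, so treat both details as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-in mirroring the structure documented above. */
typedef struct {
    void *buffer;            /* Pointer to a properly aligned buffer */
    uint32_t numbytes;       /* Number of valid bytes in the above */
    int localrank;           /* Local rank of the initiator */
} cellmsg_rpc_data;

/* Hypothetical RPC target: add up the int values sent by the initiator
 * and hand back a single int. */
void sum_ints(cellmsg_rpc_data *in_out_data)
{
    int *values = in_out_data->buffer;
    uint32_t count = in_out_data->numbytes / sizeof(int);
    int total = 0;

    assert(count > 0);
    for (uint32_t i = 0; i < count; i++)
        total += values[i];

    values[0] = total;                     /* assumption: reply reuses the buffer */
    in_out_data->numbytes = sizeof(int);   /* assumption: reply length goes here */
}
```

Because the function matches the void (*)(cellmsg_rpc_data *) signature, a pointer to it has the ppe_funcptr type and could be passed to cellmsg_provide_rpc().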

typedef void (*host_funcptr)(cellmsg_rpc_data *in_out_data);

A host_funcptr is a pointer to a host function that can be invoked from the PPE (hybrid mode only).

void cellmsg_init (int *argc, char ***argv);

Initialize the Cell Messaging Layer given pointers to the main() function's argument count and argument list. This function must be called before any other Cell Messaging Layer function.

void cellmsg_run (void *spemain, int spe_argc, char *spe_argv[]);

Load a SPE program (either a filename string or a pointer to a spe_program_handle_t), passing it an argument count and a list of arguments.

void cellmsg_finalize (void);

Shut down the Cell Messaging Layer. No other Cell Messaging Layer function should be called after cellmsg_finalize().

int cellmsg_spes_per_ppe (void);

Return the number of SPEs managed by each PPE.

int cellmsg_ppes (void);

Return the total number of PPEs.

int cellmsg_this_ppe (void);

Return the caller's PPE number.

void cellmsg_provide_rpc (ppe_funcptr ppefunc);

Make a function available to the caller's SPEs. Each SPE must call cellmsg_accept_rpc() to accept the function.

host_funcptr cellmsg_accept_rpc (void);

Accept a function made available by the caller's host (hybrid mode only).

void cellmsg_rpc (host_funcptr hostfunc,
                  void *tohost, uint32_t tohostbytes, uint32_t tohostswap,
                  void *fromhost, uint32_t fromhostbytes, uint32_t fromhostswap);

Invoke a function on the host (hybrid mode only). The function is passed tohostbytes bytes of data from tohost, with byte swapping in sets of tohostswap bytes, and returns fromhostbytes bytes of data into fromhost, with byte swapping in sets of fromhostswap bytes. tohostswap and fromhostswap should normally be set to the size of the corresponding data type (e.g., sizeof(int) to transmit one or more int values). If the host and the PPE have the same endianness (not the case in any known system at the time of this writing), the convenience value CML_BYTE_SWAP_NOT_NEEDED can be specified instead to assert that no byte swapping is needed.

Host-callable functions (hybrid mode only)

The host program must #include <cellmsg.h>, which provides a set of datatypes and functions:

typedef struct {
  void *buffer;            /* Pointer to a properly aligned buffer */
  uint32_t numbytes;       /* Number of valid bytes in the above */
} cellmsg_rpc_data;

cellmsg_rpc_data represents the input and output data for a PPE-initiated remote procedure call.

typedef void (*host_funcptr)(cellmsg_rpc_data *in_out_data);

A host_funcptr is a pointer to a host function that can be invoked from the PPE.

void cellmsg_init (int *argc, char ***argv);

Initialize the Cell Messaging Layer given pointers to the main() function's argument count and argument list. This function must be called before any other Cell Messaging Layer function.

void cellmsg_run (char *progname, int argc, char **argv);

Load a PPE program (by filename), passing it an argument count and a list of arguments.

void cellmsg_finalize (void);

Shut down the Cell Messaging Layer. No other Cell Messaging Layer function should be called after cellmsg_finalize().

int cellmsg_ppes_per_host (void);

Return the number of PPEs on each host.

int cellmsg_hosts (void);

Return the total number of hosts.

int cellmsg_this_host (void);

Return the caller's host number.

void cellmsg_provide_rpc (host_funcptr hostfunc);

Make a function available to the caller's PPE. The PPE must call cellmsg_accept_rpc() to accept the function.

Remote procedure call semantics

The functions invoked by cellmsg_rpc() are passed a cellmsg_rpc_data data structure that encapsulates the data provided by the initiator of the RPC, the number of bytes provided by the initiator, and the initiator's local rank. The invoked function returns data to the initiator via the same cellmsg_rpc_data structure. The following are some details regarding the use of cellmsg_rpc_data:

Additional features and characteristics

MPI ranks are assigned such that they utilize all of the SPEs on one Cell before using any of the SPEs on the next Cell. That is, ranks 0 to 7 are on the first Cell, ranks 8 to 15 are on the second Cell, and so forth (assuming current hardware, with 8 SPEs per Cell).
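The rank layout described above amounts to integer division and remainder. The following sketch spells that out; the helper names are hypothetical, and the constant of 8 SPEs per Cell is the figure quoted in the text, so it would need adjusting on other hardware or when fewer SPEs are in use.

```c
#include <assert.h>

/* From the text: 8 SPEs per Cell on current hardware. */
enum { SPES_PER_CELL = 8 };

/* Hypothetical helper: index of the Cell that hosts a given MPI rank. */
int cell_of_rank(int rank)
{
    assert(rank >= 0);
    return rank / SPES_PER_CELL;
}

/* Hypothetical helper: position of a given MPI rank within its Cell. */
int spe_within_cell(int rank)
{
    assert(rank >= 0);
    return rank % SPES_PER_CELL;
}
```

For example, rank 9 falls on the second Cell (index 1) and is the second SPE there (index 1).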

The predefined MPI_COMM_MEM_DOMAIN communicator refers to all SPEs that share a main-memory virtual address space. It can be useful, for example, for sharing a pointer to main memory with only those SPEs that can access it.

The CMLMAXLOCALSPES environment variable limits the number of SPEs used by each PPE.

The PMPI profiling interface is supported for all MPI functions defined by the Cell Messaging Layer.

The MPI_Comm_get_attr() function accepts an MPI_CML_LOCAL_NEIGHBORS key, which returns the number of SPEs managed by a single PPE (typically 8 for a single Cell or 16 for a pair of Cells connected via a BIF connection).

The Cell Messaging Layer supports a convenient remote procedure call (RPC) mechanism that enables a SPE to invoke functions on the PPE and receive the results. (In hybrid mode, it also enables a PPE to invoke functions on the host and receive the results.) See the files in the examples/showcase or examples/showcase-hybrid directories for usage examples.

Limitations

In addition to supporting only a small subset of MPI, the current version of the Cell Messaging Layer has the following additional limitations:

Further information

More documentation for the Cell Messaging Layer is available from the Cell Messaging Layer project page on SourceForge.net.

Some initial performance data appears on a Cell Messaging Layer poster that was displayed in Los Alamos National Laboratory's booth at the SC08 conference (November 2008).

The following conference paper details the Cell Messaging Layer's implementation and presents a wealth of performance data:

Scott Pakin. Receiver-initiated Message Passing over RDMA Networks. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, Florida, April 2008.

Scott Pakin, pakin@lanl.gov