Cell), which many people recognize as the microprocessor in the PlayStation 3. Each Cell contains eight high-speed vector processors called synergistic processing elements (SPEs). The Cell Messaging Layer makes it easy for the SPEs within a Cell, and across any number of networked Cells, to transfer data to each other and coordinate their operations. Specifically, the Cell Messaging Layer implements a small subset of the de facto standard Message Passing Interface (MPI) library, which makes it easy for an MPI programmer to adapt to Cell programming.
First, remember that the Cell Messaging Layer implements only a small subset of MPI. Beyond that, the Cell Messaging Layer keeps its footprint small by dividing MPI functions across a number of object files. The linker is smart enough to include only those object files that are actually referenced in the final executable. For example, if a program never calls MPI_Reduce() (either explicitly or implicitly by calling MPI_Allreduce()), the coll.o object file will not be linked in and will therefore take up no space. Table 1 lists the number of bytes used by each section of each Cell Messaging Layer object file at the time of this writing.
| Filename | text | data | bss | common | Total |
|---|---|---|---|---|---|
| barrier.o | 1356 | 0 | 332 | 0 | 1688 |
| bcast.o | 1096 | 4 | 152 | 0 | 1252 |
| coll.o | 552 | 0 | 0 | 0 | 552 |
| globals.o | 0 | 0 | 0 | 256 | 256 |
| info.o | 256 | 0 | 0 | 0 | 256 |
| init.o | 1484 | 0 | 64 | 0 | 1548 |
| pt2pt.o | 2512 | 0 | 1360 | 0 | 3872 |
| reduce.o | 2812 | 8 | 648 | 0 | 3468 |
| rpc.o | 296 | 0 | 0 | 0 | 296 |
| time.o | 128 | 0 | 8 | 0 | 136 |
| Max | 2812 | 8 | 1360 | 256 | 4436 |
| Total | 10492 | 12 | 2564 | 256 | 13324 |
As indicated by Table 1, the maximum amount of memory that the Cell Messaging Layer will ever use—if at least one function in every object file is invoked—is 13,324 bytes. In practice, this number can be reduced considerably by using overlays. For example, if all of the Cell Messaging Layer's code (text segment) is made to overlay application code, the Cell Messaging Layer requires only 2,832 bytes of resident data (data, bss, and common segments) in the worst case. Alternatively, if data overlays are also used, all of the Cell Messaging Layer's code and private data can fit in a shared 4,180-byte segment in the worst case (the sum of the per-column maxima in Table 1's Max row, excluding common), with only the 256 bytes of global data kept resident.
The point is that the Cell Messaging Layer offers a number of space-vs.-performance tradeoffs that a programmer can exploit. Furthermore, a program that uses only a few MPI functions needs less space than a program that uses many.
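To make the code-overlay tradeoff concrete, a GNU ld linker-script fragment along the following lines could place the Cell Messaging Layer's text sections in an overlay region shared with application code. This is a hypothetical sketch, not a script shipped with the Cell Messaging Layer; app_kernel.o is an assumed application object file, while the other object names come from Table 1:

```
/* Hypothetical GNU ld OVERLAY fragment: only one of the two overlay
 * sections occupies SPE local store at a time, so the Cell Messaging
 * Layer's code and the application kernel share the same address range. */
OVERLAY : NOCROSSREFS
{
  .app_text { app_kernel.o(.text) }
  .cml_text { barrier.o(.text) bcast.o(.text) coll.o(.text) info.o(.text)
              init.o(.text) pt2pt.o(.text) reduce.o(.text) rpc.o(.text)
              time.o(.text) }
}
```

With this layout, the resident cost of the Cell Messaging Layer's code is bounded by the larger of the two overlay sections rather than by their sum.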
The SPE program looks like any other MPI program. The PPE and—if running in hybrid mode—host programs contain a small amount of boilerplate code plus any RPC functions they want to make available to the SPEs. An online Hello, world code example shows a simple SPE program and the corresponding PPE code.
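That online example is not reproduced here, but because the Cell Messaging Layer implements a subset of standard MPI, the SPE side of such a program would look essentially like any minimal MPI program. The following sketch assumes only the core MPI calls named elsewhere in this document:

```c
/* Sketch of a minimal SPE-side MPI program for the Cell Messaging Layer.
   This is illustrative, not the online example referenced above. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this SPE's global rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of SPEs */
    printf("Hello, world, from rank %d of %d.\n", rank, size);
    MPI_Finalize();
    return 0;
}
```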
On the PPE, the cellmsg_spes_per_ppe() function returns the number of SPEs managed by each PPE. On the SPE, there are two alternatives: either pass the Cell Messaging Layer-specific MPI_CML_LOCAL_NEIGHBORS key to MPI_Comm_get_attr(), or query the size of the Cell Messaging Layer-specific MPI_COMM_MEM_DOMAIN communicator with MPI_Comm_size().
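The two SPE-side alternatives might look as follows. The identifiers MPI_CML_LOCAL_NEIGHBORS and MPI_COMM_MEM_DOMAIN are the Cell Messaging Layer extensions named above; the assumption that the attribute value is delivered as a pointer to an int follows standard MPI attribute-caching convention but is not confirmed by this document:

```c
/* Sketch: two ways for an SPE to learn how many SPEs share its PPE. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int local, flag;
    int *neighbors;  /* assumed: attribute delivered as pointer to int */

    MPI_Init(&argc, &argv);

    /* Alternative 1: query the MPI_CML_LOCAL_NEIGHBORS attribute. */
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_CML_LOCAL_NEIGHBORS,
                      &neighbors, &flag);
    if (flag)
        printf("SPEs on this PPE: %d\n", *neighbors);

    /* Alternative 2: take the size of the memory-domain communicator. */
    MPI_Comm_size(MPI_COMM_MEM_DOMAIN, &local);
    printf("SPEs on this PPE: %d\n", local);

    MPI_Finalize();
    return 0;
}
```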
Some early performance data appear on a Cell Messaging Layer poster that was displayed in Los Alamos National Laboratory's booth at the SC08 conference (November 2008).
The following conference paper details the Cell Messaging Layer's implementation and presents a wealth of performance data:
Scott Pakin. Receiver-initiated Message Passing over RDMA Networks. In Proceedings of the 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), Miami, Florida, April 2008.