Wavefront Explanation
A line-by-line walkthrough of a sample coNCePTuaL program

On this page, we describe the coNCePTuaL implementation of a wavefront program.

Terminology

A task is a unit of computation and is typically implemented as a process or thread. (coNCePTuaL shuns the terms process and thread so as not to mislead the programmer into thinking that processes or threads are necessarily being used; processor and processing element are avoided because they refer to hardware; rank doesn't imply an ability to compute; and, node is too overloaded of a term.) All tasks execute concurrently (i.e., with no implied ordering among them).

A message is a chunk of data that is sent by one task and received by another. Sending and receiving are often described as comprising two steps: posting, in which all of the information needed to send or a receive a message is provided to the communication library, and completing, in which the task waits for the message to be sent/received.

Processing the command line

Each of the first four lines of code defines a command-line argument in terms of a coNCePTuaL variable, a help string (output when the program is run with --help), a long option name, a short option name, and a default value. The 1E5 as the default value for the reps variable is coNCePTuaL notation for 1×105 or 10,000.

xdim is "Mesh width (tasks)" and comes from "--width" or "-w" with default 1.
ydim is "Mesh height (tasks)" and comes from "--height" or "-h" with default 1.
reps is "Number of wavefronts to time" and comes from "--reps" or "-r" with default 1E5.
msgsize is "Message size (bytes)" and comes from "--msgsize" or "-m" with default 0.

The outer loop

The coNCePTuaL language supports various types of bounded loops. The for loop used in the wavefront program is the simplest type. It repeats a given number of times (in this case, reps times) without specifying a loop variable.

For reps repetitions {
}

Measuring time

A coNCePTuaL program automatically maintains a variety of counters: elapsed time, number of messages sent/received, number of bytes sent/received, and bit errors encountered during communication. When a task resets its counters it assigns zero to all of them, effectively marking the beginning of some code to measure.

The logs statement specifies what gets written to the program's log files. A coNCePTuaL program creates one log file per task although for convenience these can be merged into a single file using the supplied ncptl-logmerge utility. Various data-aggregating functions (arithmetic/geometric/harmonic mean, standard deviation, median, maximum, minimum, final, histogram, only, or all) can be applied to the measured data. The wavefront program's logs statement logs both the mean and the standard deviation of elapsed_usecs (a predefined variable set to the elapsed time in microseconds since the last resets) divided by the maximum message distance (i.e., xdim+ydim-1). The "Per-hop latency (usecs)" string is a column header that precedes the data in the log file.

  task 0 resets its counters then
  task 0 logs the mean of elapsed_usecs/(xdim+ydim-1) as "Per-hop latency (usecs)"
         and the standard deviation of elapsed_usecs/(xdim+ydim-1) as "Per-hop latency (usecs)"

The then keyword is a statement separator much like a semicolon in C. (Actually, it's more like Pascal's semicolon; C's semicolon is in fact a statement terminator.)

Expressing wavefront communication

The entire wavefront communication pattern proper can be expressed in only two lines of coNCePTuaL code:

  all tasks src send a msgsize-byte message to tasks dst such that dst = src+xdim \/ (dst=src+1 /\ dst mod xdim <> 0) then
  task num_tasks-1 sends a msgsize-byte message to task 0 then

The send statement represents a blocking send operation. That is, it both posts a send (or multiple sends in this case) and awaits its completion. The receivers implicitly post a matching, blocking receive.

The preceding code utilizes the following logical and relational operators:

/\
logical and (∧)
\/
logical or (∨)
<>
is not equal to (≠)
=
is equal to (=)

Let's first examine the second line because it's a bit simpler. num_tasks is a predefined variable that holds the total number of tasks in the program. Because tasks are numbered starting from zero, num_tasks-1 represents the last task in the program. The code specifies that the last task in the program send a message of length msgsize bytes to the first task in the program (task 0). (msgsize is defined by a command-line argument.) A subtle but important point is that the sends statement implicitly posts a matching receive for every message sent. Hence, the second line shown above is actually specifying two operations: that task num_tasks-1 send a message to task 0 and that task  receive a message from task num_tasks-1.

The first line shown above specifies all of the communication for the wavefront pattern apart from the lone message from the final task to task 0. The first phrase, all tasks src, says that the statement applies to all tasks in the program and, furthermore, that each task should refer to its task number through a variable called src for the remainder of the statement. The next phrase, sends a msgsize-byte message, is the same as in the second line and indicates that all tasks are sending messages. The final phrase, to tasks dst such that dst = src+xdim \/ (dst=src+1 /\ dst mod xdim <> 0), designates as message recipients all tasks whose task number matches a sender's task number—and recall that all tasks are senders—plus the number of tasks in a row (i.e., the task directly beneath a sending task on a 2-D array) or whose task number is one greater than a sender's task number (i.e., the task directly to the right of a sending task on a 2-D array) as long as the destination is not in the first column of the 2-D array (to avoid sending from the rightmost column to the leftmost column in the next row).

The magic that makes this work is that coNCePTuaL semantics dictate that all receives are posted before any send is posted and that messages sent to or received from nonexistent tasks are silently ignored. Consequently, the program begins with all processes blocked in a receive except for task 0, who has nothing to receive and can therefore send messages to the right and downwards. Both tasks who receive a message from task 0 can then send messages to the right and downwards. The procedure continues until the final task receives both of its messages. It has no task to its right and no task beneath it so it has no further communication tasks to perform.

Summary

  1. A coNCePTuaL program comprises one or more concurrently running tasks.
  2. Tasks communicate by sending and receiving messages.
  3. Parsing the command line is easy: only one coNCePTuaL statement per command-line option is required.
  4. Programs can exploit predefined variables such as num_tasks (the total number of tasks in the program) and elapsed_usecs (elapsed time in microseconds).
  5. Performance data is recorded by specifying an expression to log and, optionally, any of a number of data-aggregating functions (e.g., mean, median, etc.) to apply to the list of data values.
  6. Communication is specified in terms of a set of processes sending messages to another set of processes instead of requiring each communication operation to be specified individually.
  7. By default, every send implicitly posts a matching receive.
Scott Pakin, pakin@lanl.gov