On this page, we describe the coNCePTuaL implementation of a wavefront program.
A task is a unit of computation and is typically
implemented as a process or thread. (coNCePTuaL shuns the terms
process
and thread
so as not to mislead the programmer
into thinking that processes or threads are necessarily being used;
processor
and processing element
are avoided because
they refer to hardware; rank
doesn't imply an ability to
compute; and, node
is too overloaded of a term.) All tasks
execute concurrently (i.e., with no implied ordering among
them).
A message is a chunk of data that is sent by one task and received by another. Sending and receiving are often described as comprising two steps: posting, in which all of the information needed to send or a receive a message is provided to the communication library, and completing, in which the task waits for the message to be sent/received.
Each of the first four lines of code defines a command-line
argument in terms of a coNCePTuaL variable,
a help string (output when the program is run with --help), a long option name, a short
option name, and a default value. The 1E5
as the
default value for the reps
variable is coNCePTuaL notation for 1×105 or
10,000.
The coNCePTuaL language supports various
types of bounded loops. The for loop
used in the wavefront program is the simplest type. It repeats a
given number of times (in this case, reps
times) without
specifying a loop variable.
A coNCePTuaL program automatically maintains a variety of counters: elapsed time, number of messages sent/received, number of bytes sent/received, and bit errors encountered during communication. When a task resets its counters it assigns zero to all of them, effectively marking the beginning of some code to measure.
The logs statement specifies what
gets written to the program's log files. A coNCePTuaL program creates one log file per task
although for convenience these can be merged into a single file using
the supplied ncptl-logmerge utility.
Various data-aggregating functions (arithmetic/geometric/harmonic
mean, standard deviation, median, maximum, minimum, final, histogram,
only, or all) can be applied to the measured data. The wavefront
program's logs statement logs both the
mean and the standard deviation of elapsed_usecs
(a
predefined variable set to the elapsed time in microseconds since the
last resets) divided by the maximum
message distance (i.e., xdim+ydim-1
). The
"Per-hop latency (usecs)"
string is a column header that
precedes the data in the log file.
The then keyword is a statement separator much like a semicolon in C. (Actually, it's more like Pascal's semicolon; C's semicolon is in fact a statement terminator.)
The entire wavefront communication pattern proper can be expressed in only two lines of coNCePTuaL code:
The send statement represents a blocking send operation. That is, it both posts a send (or multiple sends in this case) and awaits its completion. The receivers implicitly post a matching, blocking receive.
The preceding code utilizes the following logical and relational operators:
/\
\/
<>
=
Let's first examine the second line because it's a bit simpler.
num_tasks
is a predefined variable that holds the total
number of tasks in the program. Because tasks are numbered starting
from zero, num_tasks-1
represents the last task in the
program. The code specifies that the last task in the program send a
message of length msgsize
bytes to the first task in the
program (task 0). (msgsize
is defined by a command-line argument.) A subtle but
important point is that the sends
statement implicitly posts a matching receive for every message sent.
Hence, the second line shown above is actually specifying two
operations: that task num_tasks-1
send a message to
task 0 and that task receive a message from
task num_tasks-1
.
The first line shown above specifies all of the communication for
the wavefront pattern apart from the lone message from the final task
to task 0. The first phrase, all tasks
src
, says that the statement
applies to all tasks in the program and, furthermore, that each task
should refer to its task number through a variable called
src
for the remainder of the statement. The next phrase,
sends a msgsize-byte message
, is the same as
in the second line and indicates that all tasks are sending messages.
The final phrase,
to tasks dst
such that dst = src+xdim \/ (dst=src+1 /\
dst mod xdim <>
0)
, designates as message recipients all tasks whose
task number matches a sender's task number—and recall that all
tasks are senders—plus the number of tasks in a row
(i.e., the task directly beneath a sending task on a 2-D array)
or whose task number is one greater than a sender's task number
(i.e., the task directly to the right of a sending task on a 2-D
array) as long as the destination is not in the first column of the
2-D array (to avoid sending from the rightmost column to the leftmost
column in the next row).
The magic that makes this work is that coNCePTuaL semantics dictate that all receives are posted before any send is posted and that messages sent to or received from nonexistent tasks are silently ignored. Consequently, the program begins with all processes blocked in a receive except for task 0, who has nothing to receive and can therefore send messages to the right and downwards. Both tasks who receive a message from task 0 can then send messages to the right and downwards. The procedure continues until the final task receives both of its messages. It has no task to its right and no task beneath it so it has no further communication tasks to perform.
num_tasks
(the total number of tasks in the program) and
elapsed_usecs
(elapsed time in microseconds).