interpret backend
Like the c_udgram backend (see The c_udgram backend), the interpret backend is designed to help programmers ensure the correctness of coNCePTuaL code. The interpret backend does not output code. As its name implies, interpret is an interpreter of coNCePTuaL programs rather than a compiler. interpret exhibits the following salient features:
• interpret merely simulates communication. It also skips over statements such as COMPUTES/SLEEPS (see Delaying execution) and TOUCHES (see Touching memory).
• interpret can simulate massively parallel computer systems from a single process.
• As interpret runs, it checks for common communication errors such as deadlocks, asynchronous sends and receives that are never completed, and blocking operations left over at the end of the program (which would likely cause hung tasks under a real messaging layer).
The drawbacks are that interpret is slow when interpreting control-intensive programs and that timing measurements are not indicative of any real network. (The interpret backend utilizes logical time rather than physical time.)
interpret is intended primarily as a development tool for helping ensure the correctness of coNCePTuaL programs.
The interpret backend accepts all of the command-line options described in Running coNCePTuaL programs, plus the following four options:
-H, --hierarchy=<string>   Latency hierarchy as a comma-separated list of task_factor:latency_delta pairs [default: "tasks:1"]
-K, --kill-reps=<number>   If nonzero, perform FOR...REPETITIONS loop bodies exactly once [default: 0]
-M, --mcastsync=<number>   Perform an implicit synchronization after a multicast (0=no; 1=yes) [default: 0]
-T, --tasks=<number>       Number of tasks to use [default: 1]
Normally, the interpret backend assigns unit latency to every communication operation. The --hierarchy option can make communication with distant tasks observe more latency than communication with nearby tasks. An explanation of the argument to --hierarchy is presented in Task latency hierarchies.
To save execution time, the --kill-reps option alters the behavior of all FOR … REPETITIONS statements in the program, treating them as if they read, ‘FOR 1 REPETITION’. That is, it ignores warmup repetitions, synchronizations, and the specified number of repetitions, always using ‘1’ instead.
A multicast operation (see Multicasting) is normally treated as multiple point-to-point operations with the same send time. The --mcastsync option instructs the interpret backend to perform an implicit barrier synchronization at the end of the multicast.
The --tasks option specifies the number of tasks to simulate. Because this number can be quite large, the NCPTL_LOG_ONLY environment variable (see Environment Variables) may be used to limit the set of processors that are allowed to create log files. That way, if task 0 is the only task out of thousands that logs any data, NCPTL_LOG_ONLY can specify that only one log file will be produced, not thousands. By default, all processors create a log file.
All other command-line arguments are passed to the program being interpreted.
The --output option, described in Compiling coNCePTuaL programs, has special meaning to the interpret backend. When --output is used, interpret dumps a list of events to the specified file after a successful run. For example, the coNCePTuaL program ‘ALL TASKS t ASYNCHRONOUSLY SEND A 384 BYTE MESSAGE TO TASK t XOR 2 THEN ALL TASKS AWAIT COMPLETION’ results in the following event dump:
Task 0 posted a NEWSTMT at time 0 and completed it at time 0
Task 0 posted a RECEIVE at time 0 and completed it at time 0
Task 0 posted a SEND at time 1 and completed it at time 1
Task 0 posted a WAIT_ALL at time 2 and completed it at time 2
Task 1 posted a NEWSTMT at time 0 and completed it at time 0
Task 1 posted a RECEIVE at time 0 and completed it at time 0
Task 1 posted a SEND at time 1 and completed it at time 1
Task 1 posted a WAIT_ALL at time 2 and completed it at time 2
Task 2 posted a NEWSTMT at time 0 and completed it at time 0
Task 2 posted a RECEIVE at time 0 and completed it at time 0
Task 2 posted a SEND at time 1 and completed it at time 1
Task 2 posted a WAIT_ALL at time 2 and completed it at time 2
Task 3 posted a NEWSTMT at time 0 and completed it at time 0
Task 3 posted a RECEIVE at time 0 and completed it at time 0
Task 3 posted a SEND at time 1 and completed it at time 1
Task 3 posted a WAIT_ALL at time 2 and completed it at time 2
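An event dump in this shape is easy to post-process. The following Python sketch is not part of coNCePTuaL; it assumes every line of the dump follows the "Task N posted a KIND at time A and completed it at time B" pattern seen in the example, and the names EVENT_RE and parse_event_dump are hypothetical helpers introduced only for illustration.

```python
import re

# Assumed line format, inferred from the example dump only.
EVENT_RE = re.compile(
    r"Task (\d+) posted an? (\w+) at time (\d+) and completed it at time (\d+)"
)

def parse_event_dump(text):
    """Return a list of (task, kind, posted, completed) tuples."""
    events = []
    for match in EVENT_RE.finditer(text):
        task, kind, posted, completed = match.groups()
        events.append((int(task), kind, int(posted), int(completed)))
    return events

# Task 0's portion of the example dump.
sample = """\
Task 0 posted a NEWSTMT at time 0 and completed it at time 0
Task 0 posted a RECEIVE at time 0 and completed it at time 0
Task 0 posted a SEND at time 1 and completed it at time 1
Task 0 posted a WAIT_ALL at time 2 and completed it at time 2
"""

events = parse_event_dump(sample)
# Latest logical completion time among the parsed events.
finish_time = max(completed for _, _, _, completed in events)
print(len(events), finish_time)
```

The times in the tuples are logical times, so finish_time says how many simulated steps the task needed, not how long it would take on a real network.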
As an example of the interpret backend’s usage, here’s how to simulate 100,000 processors communicating in a simple ring pattern:
% ncptl --backend=interpret --lenient --program='All tasks t send nummsgs 1024 gigabyte messages to task t+1 then task num_tasks-1 sends nummsgs 1024 gigabyte messages to task 0.' --tasks=100000 --nummsgs=5
The preceding command ran to completion in under 5 minutes on a 1.5GHz Xeon uniprocessor workstation—not too bad considering that 488 petabytes of data are transmitted on the program’s critical path.
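The 488-petabyte figure can be checked with a little arithmetic. The Python sketch below assumes binary units (1 gigabyte = 2^30 bytes, 1 petabyte = 2^50 bytes) and assumes that each of the 100,000 hops around the ring lies on the critical path; neither assumption is stated explicitly in the text above.

```python
# Assumption: binary units, as the 488 figure suggests.
GIGABYTE = 2**30
PETABYTE = 2**50

num_tasks = 100_000            # --tasks=100000
nummsgs = 5                    # --nummsgs=5
msg_bytes = 1024 * GIGABYTE    # each message is 1024 gigabytes

# Assumption: every hop of the ring is serialized on the critical path.
critical_path_bytes = num_tasks * nummsgs * msg_bytes
critical_path_petabytes = critical_path_bytes / PETABYTE

print(int(critical_path_petabytes))  # prints 488
```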
The interpret backend is especially useful for finding communication-related program errors:
% ncptl --backend=interpret --quiet --program='All tasks t send a 10 doubleword message to task (t+1) mod num_tasks.' --tasks=3
<command line>: The following tasks have deadlocked: 0 --> 2 --> 1 --> 0
Deadlocked tasks are shown with ‘-->’ signifying “is blocked waiting for”. In the preceding example, all receives are posted before all sends. Hence, task 0 is blocked waiting for task 2 to send it a message. Task 2, in turn, is blocked waiting for task 1 to send it a message. Finally, task 1 is blocked waiting for task 0 to send it a message, which creates a cycle of dependencies.
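The cycle of dependencies can be illustrated with a small wait-for graph. The Python sketch below is not the interpret backend's actual algorithm; it only assumes that each blocked task waits for exactly one other task, and the name find_deadlock_cycle is a hypothetical helper.

```python
def find_deadlock_cycle(waits_for):
    """Return the tasks of a wait-for cycle (first task repeated at the end), or None."""
    for start in sorted(waits_for):
        path = []
        seen_at = {}
        task = start
        # Follow "is blocked waiting for" edges until they end or repeat.
        while task in waits_for and task not in seen_at:
            seen_at[task] = len(path)
            path.append(task)
            task = waits_for[task]
        if task in seen_at:
            return path[seen_at[task]:] + [task]
    return None

# In the example, task t sends to (t+1) mod num_tasks, and every task posts
# its receive first, so the receiver (t+1) mod num_tasks waits for sender t.
num_tasks = 3
waits_for = {(t + 1) % num_tasks: t for t in range(num_tasks)}

cycle = find_deadlock_cycle(waits_for)
print(" --> ".join(str(t) for t in cycle))  # prints 0 --> 2 --> 1 --> 0
```

A graph without a cycle, such as {0: 1}, yields None, which corresponds to a program that does not deadlock under this simplified model.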
The interpret backend can find other errors, as well:
% ncptl --backend=interpret --quiet --program='All tasks t asynchronously send a 10 doubleword message to task (t+1) mod num_tasks.' --tasks=4
<command line>: The program ended with the following leftover-event errors:
  * Task 0 posted an asynchronous RECEIVE that was never waited for
  * Task 0 posted an asynchronous SEND that was never waited for
  * Task 0 sent a message to task 1 that was never received
  * Task 1 posted an asynchronous RECEIVE that was never waited for
  * Task 1 posted an asynchronous SEND that was never waited for
  * Task 1 sent a message to task 2 that was never received
  * Task 2 posted an asynchronous RECEIVE that was never waited for
  * Task 2 posted an asynchronous SEND that was never waited for
  * Task 2 sent a message to task 3 that was never received
  * Task 3 posted an asynchronous RECEIVE that was never waited for
  * Task 3 posted an asynchronous SEND that was never waited for
  * Task 3 sent a message to task 0 that was never received
(A message received ASYNCHRONOUSLY is not considered received until after the corresponding AWAITS COMPLETION; hence, all of the ‘was never received’ messages listed above.)
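The idea behind the "never waited for" errors can be sketched as simple bookkeeping of pending asynchronous operations. The Python below is a simplified illustration under assumed semantics, not the backend's implementation: the PendingOps class and its methods are hypothetical, and the sketch does not attempt to reproduce the "was never received" lines.

```python
from collections import defaultdict

class PendingOps:
    """Track asynchronous operations that have been posted but not awaited."""

    def __init__(self):
        self.pending = defaultdict(list)  # task -> list of operation kinds

    def post(self, task, kind):
        self.pending[task].append(kind)

    def await_completion(self, task):
        # Awaiting completion retires everything the task has posted so far.
        self.pending[task].clear()

    def leftovers(self):
        return [
            f"Task {task} posted an asynchronous {kind} that was never waited for"
            for task in sorted(self.pending)
            for kind in self.pending[task]
        ]

num_tasks = 4
ops = PendingOps()
for t in range(num_tasks):
    ops.post(t, "RECEIVE")  # receives are posted before sends
    ops.post(t, "SEND")

errors = ops.leftovers()    # no task ever awaited completion
print(len(errors))          # prints 8

for t in range(num_tasks):
    ops.await_completion(t)
print(len(ops.leftovers())) # prints 0
```

Under this model, adding an await step for every task clears the error list, which mirrors the effect of ending the program with AWAIT COMPLETION.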
% ncptl --backend=interpret --quiet --program='Task 0 sends a 40 kilobyte message to unsuspecting task 1 then task 0 receives a 40 kilobyte message from task 1.' --tasks=2
<command line>: The program ended with the following leftover-event errors:
  * Task 0 sent a message to task 1 that was never received
  * Task 1 terminated before satisfying task 0's RECEIVE operation
In short, it is well worth testing the correctness of new coNCePTuaL programs with interpret before performing timing runs with one of the message-passing backends.
• Task latency hierarchies: Specifying latency as a function of distance