

3.3.6 The interpret backend

Like the c_udgram backend (see The c_udgram backend), the interpret backend is designed to help programmers ensure the correctness of coNCePTuaL code. The interpret backend does not output code. As its name implies, interpret is an interpreter of coNCePTuaL programs rather than a compiler. interpret exhibits the following salient features:

  1. Some programs run faster under interpret than when compiled because the interpreter does not actually send messages; it merely simulates communication. It also skips over statements such as COMPUTES/SLEEPS (see Delaying execution) and TOUCHES (see Touching memory).
  2. interpret can simulate massively parallel computer systems from a single process.
  3. As interpret runs it checks for common communication errors such as deadlocks, asynchronous sends and receives that are never completed, and blocking operations left over at the end of the program (which would likely cause hung tasks under a real messaging layer).

The drawbacks are that interpret is slow when interpreting control-intensive programs and that timing measurements are not indicative of any real network. (The interpret backend utilizes logical time rather than physical time.) interpret is intended primarily as a development tool for helping ensure the correctness of coNCePTuaL programs.

The interpret backend accepts all of the command-line options described in Running coNCePTuaL programs, plus the following four options:

  -H, --hierarchy=<string>     Latency hierarchy as a comma-separated list
                               of task_factor:latency_delta pairs [default:
                               "tasks:1"]
  -K, --kill-reps=<number>     If nonzero, perform FOR...REPETITIONS loop
                               bodies exactly once [default: 0]
  -M, --mcastsync=<number>     Perform an implicit synchronization after a
                               multicast (0=no; 1=yes) [default: 0]
  -T, --tasks=<number>         Number of tasks to use [default: 1]

Normally, the interpret backend assigns unit latency to every communication operation. The --hierarchy option can make communication with distant tasks observe more latency than communication with nearby tasks. An explanation of the argument to --hierarchy is presented in Task latency hierarchies.

To save execution time, the --kill-reps option alters the behavior of all FOR...REPETITIONS statements in the program, treating each as if it read ‘FOR 1 REPETITION’. That is, it ignores warmup repetitions, synchronizations, and the specified number of repetitions, always using ‘1’ instead.

A multicast operation (see Multicasting) is normally treated as multiple point-to-point operations with the same send time. The --mcastsync option instructs the interpret backend to perform an implicit barrier synchronization at the end of the multicast.

The --tasks option specifies the number of tasks to simulate. Because this number can be quite large, the NCPTL_LOG_ONLY environment variable (see Environment Variables) may be used to limit the set of processors that are allowed to create log files. That way, if task 0 is the only task out of thousands that logs any data, NCPTL_LOG_ONLY can specify that only one log file will be produced, not thousands. By default, all processors create a log file.

All other command-line arguments are passed to the program being interpreted.

The --output option, described in Compiling coNCePTuaL programs, has special meaning to the interpret backend. When --output is used, interpret dumps a list of events to the specified file after a successful run. For example, the coNCePTuaL program ‘ALL TASKS t ASYNCHRONOUSLY SEND A 384 BYTE MESSAGE TO TASK t XOR 2 THEN ALL TASKS AWAIT COMPLETION’ results in the following event dump:

Task 0 posted a NEWSTMT at time 0 and completed it at time 0
Task 0 posted a RECEIVE at time 0 and completed it at time 0
Task 0 posted a SEND at time 1 and completed it at time 1
Task 0 posted a WAIT_ALL at time 2 and completed it at time 2
Task 1 posted a NEWSTMT at time 0 and completed it at time 0
Task 1 posted a RECEIVE at time 0 and completed it at time 0
Task 1 posted a SEND at time 1 and completed it at time 1
Task 1 posted a WAIT_ALL at time 2 and completed it at time 2
Task 2 posted a NEWSTMT at time 0 and completed it at time 0
Task 2 posted a RECEIVE at time 0 and completed it at time 0
Task 2 posted a SEND at time 1 and completed it at time 1
Task 2 posted a WAIT_ALL at time 2 and completed it at time 2
Task 3 posted a NEWSTMT at time 0 and completed it at time 0
Task 3 posted a RECEIVE at time 0 and completed it at time 0
Task 3 posted a SEND at time 1 and completed it at time 1
Task 3 posted a WAIT_ALL at time 2 and completed it at time 2
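
The dump lists tasks 0 through 3, which suggests a four-task run. As a rough illustration of which tasks exchange messages under ‘t XOR 2’, the following Python sketch computes each task's partner. The helper name xor_partners is hypothetical and is not part of coNCePTuaL; the sketch only mirrors the arithmetic in the program text.

```python
# Illustrative sketch only: xor_partners is a hypothetical helper,
# not part of coNCePTuaL or the interpret backend.
def xor_partners(num_tasks, mask=2):
    """Map each task t to its partner t XOR mask."""
    return {t: t ^ mask for t in range(num_tasks)}

# Assumption: four tasks, as implied by the event dump above.
partners = xor_partners(4)
for task, partner in sorted(partners.items()):
    print(f"Task {task} sends its 384-byte message to task {partner}")
```

With four tasks, tasks 0 and 2 are paired and tasks 1 and 3 are paired, so each task has exactly one send and one receive, consistent with the one RECEIVE and one SEND event listed per task.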

As an example of the interpret backend’s usage, here’s how to simulate 100,000 processors communicating in a simple ring pattern:

% ncptl --backend=interpret --lenient --program='All tasks t send
    nummsgs 1024 gigabyte messages to task t+1 then task num_tasks-1
    sends nummsgs 1024 gigabyte messages to task 0.' --tasks=100000
    --nummsgs=5

The preceding command ran to completion in under 5 minutes on a 1.5GHz Xeon uniprocessor workstation—not too bad considering that 488 petabytes of data are transmitted on the program’s critical path.
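
The 488-petabyte figure can be reproduced with a little arithmetic. The Python sketch below assumes binary prefixes (a gigabyte as 2^30 bytes and a petabyte as 2^50 bytes) and assumes that all 100,000 batches of nummsgs sends count toward the total; both are assumptions made for illustration, not statements from the interpret backend itself.

```python
# Back-of-the-envelope check of the data volume in the ring example.
# Assumption: binary prefixes (1 gigabyte = 2**30 bytes, 1 petabyte = 2**50 bytes).
GIB = 2**30
PIB = 2**50

num_tasks = 100_000          # --tasks=100000
nummsgs = 5                  # --nummsgs=5
message_bytes = 1024 * GIB   # each message is 1024 gigabytes

# Assumption: every task's batch of nummsgs sends counts toward the total.
total_bytes = num_tasks * nummsgs * message_bytes
total_pib = total_bytes / PIB
print(f"{total_pib:.2f} petabytes")
```

Under these assumptions the total comes to a little over 488 petabytes, matching the figure quoted above.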

The interpret backend is especially useful for finding communication-related program errors:

% ncptl --backend=interpret --quiet --program='All tasks t send
    a 10 doubleword message to task (t+1) mod num_tasks.' --tasks=3
<command line>: The following tasks have deadlocked: 0 --> 2 --> 1
    --> 0

Deadlocked tasks are shown with ‘-->’ signifying “is blocked waiting for”. In the preceding example, all receives are posted before all sends. Hence, task 0 is blocked waiting for task 2 to send it a message. Task 2, in turn, is blocked waiting for task 1 to send it a message. Finally, task 1 is blocked waiting for task 0 to send it a message, which creates a cycle of dependencies.
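
The cycle of dependencies can be pictured as a wait-for graph in which each task points at the task it is blocked on. The Python sketch below is a minimal illustration of that idea under the assumption that every task posts a blocking receive from task (t-1) mod num_tasks before sending. It is not the algorithm the interpret backend actually uses, and the names blocked_on and find_cycle are hypothetical.

```python
# Illustrative sketch only: not the interpret backend's implementation.
def blocked_on(num_tasks):
    """Assumption: task t first blocks receiving from (t-1) mod num_tasks."""
    return {t: (t - 1) % num_tasks for t in range(num_tasks)}

def find_cycle(waits_for, start=0):
    """Follow wait-for edges from start; return the cycle as a list, or None."""
    path = []    # tasks visited so far, in order
    seen = {}    # task -> index of that task in path
    task = start
    while task in waits_for:
        if task in seen:
            # Revisiting a task closes the cycle.
            return path[seen[task]:] + [task]
        seen[task] = len(path)
        path.append(task)
        task = waits_for[task]
    return None  # reached a task that is not blocked on anyone

cycle = find_cycle(blocked_on(3))
print(" --> ".join(str(t) for t in cycle))
```

For three tasks the sketch prints the same chain as the error message above, 0 --> 2 --> 1 --> 0.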

The interpret backend can find other errors as well:

% ncptl --backend=interpret --quiet --program='All tasks t
    asynchronously send a 10 doubleword message to task (t+1) mod
    num_tasks.' --tasks=4
<command line>: The program ended with the following leftover-event
    errors:
   * Task 0 posted an asynchronous RECEIVE that was never waited for
   * Task 0 posted an asynchronous SEND that was never waited for
   * Task 0 sent a message to task 1 that was never received
   * Task 1 posted an asynchronous RECEIVE that was never waited for
   * Task 1 posted an asynchronous SEND that was never waited for
   * Task 1 sent a message to task 2 that was never received
   * Task 2 posted an asynchronous RECEIVE that was never waited for
   * Task 2 posted an asynchronous SEND that was never waited for
   * Task 2 sent a message to task 3 that was never received
   * Task 3 posted an asynchronous RECEIVE that was never waited for
   * Task 3 posted an asynchronous SEND that was never waited for
   * Task 3 sent a message to task 0 that was never received

(A message received ASYNCHRONOUSLY is not considered received until after the corresponding AWAITS COMPLETION; hence, all of the ‘was never received’ messages listed above.)

% ncptl --backend=interpret --quiet --program='Task 0 sends a 40
    kilobyte message to unsuspecting task 1 then task 0 receives a 40
    kilobyte message from task 1.' --tasks=2
<command line>: The program ended with the following leftover-event
    errors:
   * Task 0 sent a message to task 1 that was never received
   * Task 1 terminated before satisfying task 0's RECEIVE operation

In short, it is well worth testing the correctness of new coNCePTuaL programs with interpret before performing timing runs with one of the message-passing backends.



Scott Pakin, pakin@lanl.gov