Designing ParallelZone’s Parallel Runtime

Designing ParallelZone described the overall design points which went into the architecture of ParallelZone. Here we focus on the architecture of the parallel runtime component.

Why Do We Need a Parallel Runtime?

The primary goal of ParallelZone is to be a runtime system for parallel software. To accomplish this ParallelZone must have abstractions the software can use to interact with the runtime environment in a parallel manner. Ideally these interactions should make the task of writing parallel code as simple as possible.

Design Considerations

The parallel runtime will be responsible for addressing many of the considerations raised in the Designing ParallelZone section. Including:

  1. Object-oriented (both the parallel runtime itself and the data it operates on)

  2. Support for process, CPU-threading, and GPU-threading

  3. Extendable if/when new hardware parallelism arises (e.g. tensor cores)

  4. Support for MPI

  5. Scheduling of tasks

Additionally, we recognize that the parallel runtime will:

  1. manage the lifetime of parallel resources (think MPI_initialize)

  2. track resource usage

  3. logical partitioning

  4. needs to understand locality/affinity of hardware

  5. compatible with not being the top (i.e., should work even when it’s not wrapping MPI_COMM_WORLD).

Parallel Runtime Design

../../_images/parallel_runtime_design.png

Fig. 6 High-level architecture of the parallel runtime component of ParallelZone.

Fig. 6 provides a high-level overview of the parallel runtime in ParallelZone. The RuntimeView class is an abstraction which models the runtime environment. The ResourceSet class is used to group resources into affinity sets. The various hardware classes (e.g., CPU, RAM, etc.) and the Scheduler instances are still considered opaque entities for the purposes of this page.

RuntimeView

Full discussion Designing RuntimeView Class.

As the name suggests RuntimeView is a view of the actual runtime environment. This means it does not own the resources it accesses. Similarly, copying RuntimeView instances do not create more resources, instead it just makes another view to the same set of resources. Conceptually, RuntimeView is a container filled with ResourceSet objects.

The RuntimeView class directly addresses (numbers refer to above):

  1. RuntimeView is an object, methods in the interface assume they interact with objects.

  2. Processes map to ResourceSet objects, so managing ResourceSet objects is a large part of supporting process parallelism.

  1. Ultimately powered by MPI

  2. Responsible for managing tasks which use more than one process.

  3. Reference counting of RuntimeView used to manage MPI lifetime

  4. Tracks resource usage above the process level.

  5. Supports logical partitioning at or above the process level.

  6. Responsible for the overall mapping of hardware to ResourceSet

  7. State is tied to underlying MPI communicator. Splitting the communicator is allowed.

ResourceSet

Full discussion is at Designing ParallelZone’s ResourceSet Class.

Each process is associated with a ResourceSet instance. The ResourceSet affiliated with the current process contains the resources which are local and directly accessible to that process.

  1. It’s an object, uses objects to model resources, and assumes it will interact with objects.

  2. Managing parallelism of CPU/GPU threads handled by sub-objects.

  3. Additional members can be added for new hardware.

  4. Data movement from one ResourceSet to another uses MPI.

  5. Responsible for managing tasks below the process level.

  6. Tracks resource usage below process level

  7. Allows logical partitioning of local resources

  8. Understands that its resources are local and resources in other ResourceSet objects are in general local.

Hardware Classes

See discussions at Designing RAM Class.

These classes have not been designed in detail yet. For now we note they should allow the user to query whatever information they want in order to make informed runtime decisions. They also are responsible for encapsulating the mechanisms for interacting with that device (e.g., sending result from the memory of one ResourceSet to that of another).

Scheduler Classes

The scheduler has not been designed yet.

Additional Options

Ultimately the runtime environment is a singleton. It may make more sense to actually create a class Runtime which holds the state of the runtime environment and have RuntimeView really hold a reference to the Runtime object. The primary benefit of this is that it better maps to MPI (which really is a singleton).