The ParallelZone Worldview

Note

TL;DR. In ParallelZone, the RuntimeView instance is your handle to the runtime environment. RuntimeView is a container of ResourceSet objects. ResourceSet objects describe the resources (e.g., RAM, CPUs, GPUs) that processes have direct/local access to.

To use ParallelZone it is helpful to understand the abstraction model at a high level.

Focusing on Resources

In ParallelZone we go beyond MPI by mapping tasks/data to hardware, bypassing the rank concept to the extent possible. More specifically, we start by assuming that when a multi-process program starts running, some total set of resources is available to that program (here “resource” is a catchall term that includes CPUs, GPUs, memory, etc.). This total set may or may not include all of the resources on the computer, and ParallelZone may or may not be able to access all of the program’s resources. Regardless, the set of resources ParallelZone can access forms the RuntimeView.

../_images/nesting_runtime_views.png

Fig. 3 Here the blue oval depicts the total resources the program is allowed to use and the green oval shows the resources the program lets ParallelZone use. In the top scenario ParallelZone manages all of the program’s resources, whereas in the bottom scenario it manages only some of the program’s resources.

To better illustrate the relationship between the program’s total resources and the RuntimeView, Fig. 3 shows two possible ways a RuntimeView may be initialized. The top scenario occurs when the entire program uses ParallelZone as its parallel runtime. In this scenario a job is spun up with some set of resources and ParallelZone is allowed to manage all of them (N.B., while both scenarios in Fig. 3 show ParallelZone managing only one node, ParallelZone can manage multiple nodes). The bottom scenario, which is also common, occurs when a program relying on ParallelZone is being driven by another program. In this scenario, the program has access to more resources than it shares with ParallelZone.

Programs built on ParallelZone treat the RuntimeView as the full set of resources, whether or not it actually is.
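The two scenarios in Fig. 3 can be sketched with plain Python sets. This is only a toy model of the idea, not ParallelZone's API: `make_runtime_view` and the resource labels are hypothetical names invented for the illustration.

```python
def make_runtime_view(program_resources, shared=None):
    """Return the resources a (toy) runtime is allowed to manage.

    program_resources: everything the program may use (the blue oval).
    shared: the subset handed to the runtime (the green oval). Passing
            None models the top scenario, where the runtime manages
            all of the program's resources.
    """
    program_resources = frozenset(program_resources)
    if shared is None:
        return program_resources
    shared = frozenset(shared)
    if not shared <= program_resources:
        raise ValueError("cannot share resources the program does not have")
    return shared


# Hypothetical resource labels, for illustration only.
program = {"cpu0", "cpu1", "gpu0", "ram"}

top_view = make_runtime_view(program)                      # manages everything
bottom_view = make_runtime_view(program, {"cpu0", "ram"})  # manages a subset
```

Code written against either view only ever sees the contents of that view, which is why it can treat the view as the full set of resources in both cases.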

Resource Affinity

Simply knowing the total amount of resources available isn’t quite enough. We also need to know which resources the current process has an affinity for. In ParallelZone we keep this simple by partitioning each process’s resources into two sets: those it has an affinity for and those it doesn’t. The set of resources a process has an affinity for is termed that process’s ResourceSet. The ResourceSet of a process is populated with the resources in the RuntimeView that are physically located on the node where the process is running.

../_images/resourceset_mapping.png

Fig. 4 ParallelZone’s view of the runtime environment for the common scenario of one process per node.

Fig. 4 illustrates how ParallelZone sees the hardware in the runtime environment when the program has one process running on each node. In this scenario RuntimeView is managing the entire runtime environment, meaning the RuntimeView object can see both nodes. The RuntimeView is then split into two ResourceSet instances, one per process, each seeing the node its process is running on. In this scenario the ResourceSet objects are disjoint, i.e., they do not share resources.

ParallelZone does not restrict users to running one process per node. If a user runs more processes per node, then ParallelZone will give each of those processes its own ResourceSet; however, the ResourceSet objects will no longer be disjoint since each process on a node can see the same resources.
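The affinity rule described above (a process's ResourceSet holds the resources on the node where that process runs) can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: `Resource`, `build_resource_sets`, and `are_disjoint` are hypothetical helpers written for this illustration and are not part of ParallelZone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A hypothetical piece of hardware, identified by node and name."""
    node: str
    name: str


def build_resource_sets(runtime_view, process_nodes):
    """Give each process the resources located on its own node.

    runtime_view: set of Resource objects (toy stand-in for RuntimeView).
    process_nodes: list whose i-th entry is the node process i runs on.
    """
    return [
        frozenset(r for r in runtime_view if r.node == node)
        for node in process_nodes
    ]


def are_disjoint(resource_sets):
    """Return True if no resource appears in more than one set."""
    seen = set()
    for rs in resource_sets:
        if seen & rs:
            return False
        seen |= rs
    return True


runtime_view = {
    Resource("node0", "cpu"), Resource("node0", "ram"),
    Resource("node1", "cpu"), Resource("node1", "ram"),
}

# One process per node, as in Fig. 4: the sets do not overlap.
one_per_node = build_resource_sets(runtime_view, ["node0", "node1"])

# Two processes on node0: both see the same node0 resources.
two_on_node0 = build_resource_sets(runtime_view, ["node0", "node0", "node1"])
```

In this toy model, `are_disjoint(one_per_node)` holds while `are_disjoint(two_on_node0)` does not, mirroring the two cases described in the text.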

The Vision

At the moment ParallelZone’s feature set is fairly bare-bones. The medium-term goal is for the ResourceSet objects to have task schedulers. Users will estimate the resources needed for a task and tell a particular ResourceSet to run the task. The schedulers will automatically figure out how best to run the task based on runtime conditions. Longer term, we want to take this a step further and add task schedulers to RuntimeView. The task schedulers on RuntimeView would accept task graphs and take care of assigning the tasks in the graph to the schedulers in the individual ResourceSet objects.