.. Copyright 2022 NWChemEx-Project
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. _runtime_view_design:

###########################
Designing RuntimeView Class
###########################

The need for the ``RuntimeView`` class grew out of
:ref:`parallel_runtime_design`. This page fleshes out that design in more
detail.

*************************************
Why Do We Need the RuntimeView Class?
*************************************

As discussed in :ref:`parallel_runtime_design`, the ``RuntimeView`` class is
ParallelZone's abstraction for modeling the runtime environment. ``RuntimeView``
will also serve as the top-level :ref:`api` for accessing ParallelZone
functionality.

*****************************************
What Should the RuntimeView Class Manage?
*****************************************

ParallelZone assumes it is managing the runtime environment for a potentially
multi-process program. In general that program can be running on a laptop, the
number one supercomputer in the world, or anything in between. With that in
mind, the design of ``RuntimeView`` needs to address the following
considerations:

1. ``RuntimeView`` is a view of the runtime environment at the highest level.
   System-wide state will be accessible from ``RuntimeView``. State includes:

   - System-wide scheduling
   - System-wide printing
   - Processes
   - Hardware

#. Multi-process operations need to go through ``RuntimeView``.
#. MPI compatibility.
#. Flexibility of backend.
#. Setup/teardown of parallel resources.

   - See :ref:`understanding_runtime_initialization_finalization` for more
     details; in short, we need callbacks.

************************
RuntimeView Architecture
************************

.. _fig_runtime_view:

.. figure:: assets/runtime_view.png
   :align: center

   Schematic illustration of the RuntimeView class and its major pieces.

The architecture of ``RuntimeView`` is shown in :numref:`fig_runtime_view`. This
addresses the above considerations by (numbering is from above):

1. Introducing components for each piece of state. Process-to-component
   affinity is maintained by ``ResourceSet`` objects.
#. Exposing MPI all-to-all operations at the ``RuntimeView`` level.

   - MPI all-to-one and one-to-all operations are delegated to ``RAM`` and
     ``GPU`` objects in a particular ``ResourceSet``.
   - This facilitates selecting start/end points: the object an operation is
     invoked on identifies where the data starts from or ends up.

#. MPI support happens via the ``CommPP`` class.

#. The use of the PIMPL design allows us to hide many of the backend types. It
   also facilitates writing an implementation for a different backend down the
   line (although the API would need to change too).

#. Storing callbacks allows us to tie the teardown of parallel resources to the
   lifetime of the ``RuntimeView``, i.e., ``RuntimeView`` will automatically
   finalize any parallel resources which depend on ``RuntimeView`` before
   finalizing itself.

   - Note, finalization callbacks are stored in a stack so that they run in
     reverse registration order (last in, first out). This is the controlled
     teardown order usually needed for libraries with initialize/finalize
     functions.
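
To make items 4 and 5 more concrete, the following is a minimal, self-contained
sketch of how a PIMPL could own a stack of finalization callbacks. It is not
ParallelZone's implementation: ``RuntimeViewSketch``, its nested ``PIMPL``
struct, and ``teardown_order`` are hypothetical names introduced for
illustration. Only the ``stack_callback`` name is taken from the proposed API.

.. code-block:: c++

   #include <functional>
   #include <memory>
   #include <stack>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical stand-in for RuntimeView; models only the callback stack.
   class RuntimeViewSketch {
   public:
       using callback_type = std::function<void()>;

       RuntimeViewSketch();
       ~RuntimeViewSketch();

       // Registers a finalization routine to run when this object is destroyed
       void stack_callback(callback_type cb);

   private:
       struct PIMPL; // Backend details stay out of the public interface
       std::unique_ptr<PIMPL> m_pimpl_;
   };

   // In a real library this part would live in the source file.
   struct RuntimeViewSketch::PIMPL {
       // Backend handles (e.g., a communicator) would also be stored here
       std::stack<callback_type> callbacks;
   };

   RuntimeViewSketch::RuntimeViewSketch() : m_pimpl_(std::make_unique<PIMPL>()) {}

   RuntimeViewSketch::~RuntimeViewSketch() {
       // Run callbacks in reverse registration order (last in, first out)
       auto& callbacks = m_pimpl_->callbacks;
       while(!callbacks.empty()) {
           auto cb = std::move(callbacks.top());
           callbacks.pop();
           cb();
       }
       // A real implementation would finalize its own resources after this
   }

   void RuntimeViewSketch::stack_callback(callback_type cb) {
       m_pimpl_->callbacks.push(std::move(cb));
   }

   // Records the order in which two hypothetical libraries are finalized.
   std::vector<std::string> teardown_order() {
       std::vector<std::string> log;
       {
           RuntimeViewSketch rt;
           rt.stack_callback([&log]() { log.push_back("library A finalized"); });
           rt.stack_callback([&log]() { log.push_back("library B finalized"); });
       } // rt is destroyed here, which triggers the callbacks
       return log;
   }

Calling ``teardown_order()`` returns the message for library B followed by the
message for library A, i.e., the library registered last is finalized first.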

Some finer points:

- The scheduler is envisioned as taking task graphs and scheduling them in a
  distributed and threaded manner. The latter relies on collaboration with
  ``ResourceSet`` instances.
- The Logger is an instance that every process can access. It represents the
  authoritative printing of the program. Its exact behavior should be
  customizable, but it is assumed this logger is always called in a SIMD manner
  by all processes. Default behavior is to redirect all output to ``/dev/null``
  except that of the root process.
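
The default logging behavior described above can be sketched as follows. This
is only an illustration under simplifying assumptions: ``RootOnlyLogger`` and
``simulate_logging`` are hypothetical names, the rank is passed in by hand
rather than queried from MPI, and the processes are simulated with a loop.

.. code-block:: c++

   #include <ostream>
   #include <sstream>
   #include <string>

   // Hypothetical logger: every rank calls print, only the root's output is kept.
   class RootOnlyLogger {
   public:
       // rank: rank of the calling process (would come from MPI in practice)
       // sink: stream that receives the root's output
       // root: rank whose output is kept
       RootOnlyLogger(int rank, std::ostream& sink, int root = 0) :
         m_is_root_(rank == root), m_sink_(sink) {}

       void print(const std::string& msg) {
           // Non-root ranks discard the message (the "/dev/null" behavior)
           if(m_is_root_) m_sink_ << msg << '\n';
       }

   private:
       bool m_is_root_;
       std::ostream& m_sink_;
   };

   // Simulates n_ranks processes which all log the same message.
   std::string simulate_logging(int n_ranks) {
       std::ostringstream sink;
       for(int rank = 0; rank < n_ranks; ++rank) {
           RootOnlyLogger log(rank, sink);
           log.print("Hello from the program");
       }
       return sink.str();
   }

Even though every simulated rank calls ``print``, the sink ends up with a single
copy of the message.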

*************
Proposed APIs
*************

Examples of all-to-all communications:

.. code-block:: c++

   auto rt = get_runtime();

   auto data = generate_data();

   // This is an all gather
   auto output = rt.gather(data);

   // This is an all reduce; op is the reduction operation (e.g., a sum)
   auto output2 = rt.reduce(data, op);
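
For readers less familiar with these collectives, the sketch below models what
each rank holds afterwards. It is a serial model, not an MPI program and not the
proposed API: ``model_all_gather`` and ``model_all_reduce`` are hypothetical
helpers, and we assume for simplicity that each rank contributes a single
``int``.

.. code-block:: c++

   #include <functional>
   #include <numeric>
   #include <vector>

   // rank_data[i] is the value contributed by rank i.
   // Returns the buffer held by each rank after an all gather: every rank
   // ends up with every rank's contribution.
   std::vector<std::vector<int>> model_all_gather(
     const std::vector<int>& rank_data) {
       return std::vector<std::vector<int>>(rank_data.size(), rank_data);
   }

   // Returns the value held by each rank after an all reduce: every rank
   // ends up with the same reduced value.
   template<typename Op>
   std::vector<int> model_all_reduce(const std::vector<int>& rank_data, Op op) {
       if(rank_data.empty()) return {};
       const int reduced = std::accumulate(rank_data.begin() + 1,
                                           rank_data.end(), rank_data.front(),
                                           op);
       return std::vector<int>(rank_data.size(), reduced);
   }

With three ranks contributing 1, 2, and 3, the all gather leaves ``{1, 2, 3}``
on every rank, and an all reduce with ``std::plus<int>`` leaves 6 on every rank.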


Example of tying another library's parallel runtime teardown to the lifetime of
a ``RuntimeView`` (note this is only relevant when ParallelZone starts MPI):

.. code-block:: c++

   // Create a RuntimeView object
   RuntimeView rt;

   // Initialize the other library
   other_library_initialize();

   // Register the corresponding finalization routine with the RuntimeView
   rt.stack_callback(other_library_finalize);

.. note::

   As written the APIs assume the data is going to/from RAM. If we eventually
   want to support other memory spaces we could create overloads which take the
   target space. In particular we note that we can NOT do things like:

   .. code-block:: c++

      auto output = rt.my_resource_set().ram().gather(data);

   because that would result in deadlock: every rank would issue an all-to-one
   call in which it considers itself the root.