.. Copyright 2023 NWChemEx-Project
..
.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at
..
.. http://www.apache.org/licenses/LICENSE-2.0
..
.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

.. _module_design:

##############################
Designing the Module Component
##############################

The :ref:`call_graph_design` section motivated the need for a module component
(although the fact that we've written "module" like a thousand times by now
ought to have suggested it too...).

*****************************
What is the Module Component?
*****************************

See :ref:`module` for PluginPlay's definition of a module. The module component
is primarily responsible for interfacing PluginPlay to the algorithm the module
developer wrote.

*******************************
Module Component Considerations
*******************************

.. _mc_user_interface:

User interface.
   Stemming from :ref:`call_graph_design`, one of the motivating factors for
   the module component is to provide a mechanism for a user to interface
   their algorithm with PluginPlay.

.. _mc_memoization:

Memoization.
   Also stemming from :ref:`call_graph_design`, caching/memoization of a module
   is the responsibility of the module component.

   - Ideally, memoization is largely automated.
   - Developers may need to mark modules as incapable of being memoized when
     they are not sufficiently "pure" (*i.e.*, they have side effects, are
     non-deterministic, and/or depend on global input).

.. _mc_construction_phase:

Have a construction phase.
   Modules will ultimately be classes.
   The construction phase will happen in the constructor and may include:

   - Registering the module's metadata.
   - Initializing constant state needed by the module.

.. _mc_run_hook:

Expose a ``run`` hook.
   As agreed upon in :ref:`call_graph_design`, executing a module happens by
   calling a ``run`` member. Members of the module component must expose such
   a member.

   - The ``run`` member should contain minimal branching. Traditional if/else
     logic needed for determining what function to run should be handled by
     selecting submodules ahead of time.

.. _mc_store_call_back_points:

Store callback points.
   The graph only works if nodes can call other nodes. Modules must be able to
   call submodules, which requires somehow being able to hold the submodules.

.. _mc_driver_modules:

Driver modules.
   Motivation for driver modules is given below. In short, we need modules
   whose sole purpose is to do some setup and then call one or more other
   modules.

Need for Driver Modules
=======================

.. _fig_switching_modules:

.. figure:: assets/switching_modules.png
   :align: center

   Left: the original graph. Right: the graph resulting from using module "E"
   instead of "D". The question is how can both graphs be loaded into the
   ``ModuleManager`` simultaneously?

Consider the two graphs shown in :numref:`fig_switching_modules`. Let's call
the left graph "L" and the right graph "R". If we choose to have graph "L" be
the default graph that is loaded into the ``ModuleManager``, the user can go
from graph "L" to graph "R" by telling the ``ModuleManager`` to switch the
submodule node "C" uses from node "D" to node "E". While viable, this is not
necessarily user-friendly, as running "R" vs. "L" means the user needs to know
to switch "D" to "E".

If we wanted to make it easy to run both "R" and "L", one option is to make
copies of the "A" and "C" modules. Let "RA" and "RC" respectively be those
copies.
Then it becomes possible to have both the "L" and "R" graphs loaded into the
``ModuleManager`` by default. More specifically, "L" is loaded in the same
manner as before, while "R" is loaded by having "RC" call "E" and "RA" call
"B" and "RC". While this solution works, it can be tedious depending on how
nested the graph is. It also can be wasteful because the two graphs may have a
substantial amount of overlap.

***********************
Module Component Design
***********************

.. _fig_module_design:

.. figure:: assets/module_design.png
   :align: center

   Module component design showing the contact points. Users of the module go
   through the ``Module`` class :ref:`api`. PluginPlay interacts with the user
   of the module via the ``ModulePIMPL`` class and the module developer via the
   ``ModuleBase`` class. Module developers interact with the component by
   deriving from ``ModuleBase``.

The design of the module component is shown in :numref:`fig_module_design`.
The subsections below go over the major pieces in more detail.

Module Development
==================

Following traditional object-oriented practices, module developers implement
modules by deriving from the ``ModuleBase`` class. In the constructor of their
module, module developers set the property type(s) their module satisfies, any
additional inputs/results (beyond those specified by the property type),
callback hooks used throughout, and the metadata (version, author, papers to
cite, *etc.*). The actual state provided in the constructor is stored in the
``ModuleBase`` part of the object and preserved as provided. When users change
inputs or callbacks, the user's requests are actually stored in the
``ModulePIMPL``.

The other half of implementing a module is done when the module developer
overrides the ``run_()`` member. This member is assumed to be a pure function
(a pure function always returns the same results for the same inputs, and has
no side effects).
PluginPlay helps enforce this assumption by making the ``run_()`` member
``const``. The need for a pure function is brought on by the desired black-box
nature and for memoization purposes. To be treated as a black box, the module
must receive no "hidden" inputs, including from global variables, files, or
state not registered with PluginPlay.

In practice, particularly when considering modules meant to be called
iteratively, a module may need access to modifiable state. This is where the
"Temporary Cache" comes in. The derived class is able to put/get data in/out
of the temporary cache using a key-value system.

Driver Module Development
=========================

To address :ref:`mc_driver_modules` we introduce the idea of a driver module.

Design 1.0
----------

.. note::

   This is here for historical context; it is NOT current.

To ensure that driver modules interoperate with other modules, driver modules
also inherit from ``ModuleBase``. Keeping with :ref:`mc_run_hook`, we want the
``run`` member of the driver module to have minimal branching; thus, logic for
swapping modules should happen before ``run`` is called.

Our solution is to introduce ``ModuleBase::pre_run``. This method allows the
derived module to manipulate the input values and the submodules ``run`` will
call before they are passed to ``run``. By default, ``ModuleBase::pre_run``
will just return the inputs and submodules provided to it. To define a driver
module, the module developer overrides the default implementation. For
symmetry we also introduce ``ModuleBase::post_run``, which allows the derived
class to manipulate the results before they are given back to the caller of
``Module::run``.

The official C++ API for declaring a module is to use the ``DECLARE_MODULE``
macro. If the user is going to override ``pre_run`` or ``post_run``, this
changes the declaration needed (*i.e.*, the signature for ``pre_run`` and/or
``post_run`` must be part of the declaration).
To avoid an API break we introduce a new macro, ``DECLARE_MODULE_DRIVER``. For
symmetry we require users to override both ``pre_run`` and ``post_run`` if
they choose to write a driver (even if they only need one or the other).

Design 2.0
----------

In prototyping design 1.0, it was realized that ``Module::run`` looks like:

.. code-block:: c++

   std::tie(inputs, submods) = module.pre_run(inputs, submods);
   auto rv = module.run(inputs, submods);
   rv = module.post_run(inputs, submods, rv);

With nothing between ``pre_run`` and ``run`` (or ``run`` and ``post_run``),
there is no reason (aside from partitioning preference) why the module
developer can't just put their pre-run and post-run logic inside their
module's ``run`` override. More specifically, the same inputs and submods that
would go to ``pre_run`` can just be fed to ``run``, and the same logic which
would have happened in ``pre_run`` can just happen in ``run``. Similarly, all
information which would have been fed into ``post_run`` is also available in
``run``.

Ultimately, it was thus realized that pre- and post-conditions can be handled
by the design as is.

*******
Summary
*******

The above design specifically addresses the stated considerations by:

:ref:`mc_user_interface`
   - Module developers inherit from ``ModuleBase`` and fill in the virtual
     ``run_`` member.
   - Metadata for the module can be registered with ``ModuleBase`` (and thus
     PluginPlay) in the derived class's constructor.

:ref:`mc_memoization`
   - ``ModulePIMPL`` performs memoization.

:ref:`mc_construction_phase`
   - Derived classes use their constructor to set metadata.

:ref:`mc_run_hook`
   - ``Module`` exposes the ``run`` member (and the more useful ``run_as``),
     which executes the module.

:ref:`mc_store_call_back_points`
   - ``ModuleBase`` records the hooks (property types and associated tag) for
     each callback location.
   - ``ModulePIMPL`` holds the bound callbacks for each hook.

:ref:`mc_driver_modules`
   - Driver modules can be