Molecule Design
This section contains notes on the design of Chemist’s molecule component.
What is a Molecule?
In chemistry, a molecule is a set of atoms which are covalently bonded to one another. In computational chemistry, particularly quantum chemistry, the covalent-bonding requirement is usually dropped, i.e., a molecule is simply a set of atoms. Our present discussion assumes the computational chemistry definition.
Why do we need a Molecule (component)?
Computational chemistry aims to predict chemical phenomena. Part of the input to a computational chemistry package will always be some sort of chemical system. The molecule is used to represent the atomic part of that chemical system. Thus we need a molecule component so that users can specify the atoms in their system.
Molecule Considerations
The concept of a molecule class is widely ingrained in most, if not all, object-oriented computational chemistry packages. As such, many developers have an implicit idea of what the molecule should do. In this section we explicitly list the considerations which went into our molecule component.
- Data structure.
Chemist is designed so that classes are primarily data structures. Manipulations of those data structures are primarily done via functions (particularly modules).
- Nuclei reuse.
Typical quantum chemistry use cases treat the nuclei as point charges. Ideally, the infrastructure underlying the nuclei reuses the infrastructure for point charges.
- Quantum electrons.
Typical quantum chemistry use cases treat electrons as quantum mechanical particles. The main repercussion of this is that, while the number of electrons of an isolated atom is well-defined, when atoms come together it becomes impossible to unambiguously assign electrons to atoms, and we must talk about “the molecule’s electrons,” not “an atom’s electrons.” Sub-considerations:
Charge and spin are properties of the Molecule as a whole.
Molecule really shouldn’t be a set of atoms, since that would preserve the electron-to-atom assignments; it should instead contain a set of nuclei.
- Performance.
Thinking of molecules as a set of atoms suggests that an array-of-structures is a natural representation. However, it is also widely known that, because modern hardware relies on vectorization, a structure-of-arrays usually performs better.
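To make the array-of-structures versus structure-of-arrays distinction concrete, here is a minimal C++ sketch. The names (namespace sketch, PointAoS, PointsAoS, PointsSoA, translate_x) are hypothetical and only illustrate the two memory layouts; they are not part of Chemist.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Array-of-structures: each point's x, y, z sit next to each other in memory.
struct PointAoS {
    double x, y, z;
};
using PointsAoS = std::vector<PointAoS>;

// Structure-of-arrays: all x values are contiguous, all y values are
// contiguous, and so on.
struct PointsSoA {
    std::vector<double> x, y, z;

    void push_back(double xi, double yi, double zi) {
        x.push_back(xi);
        y.push_back(yi);
        z.push_back(zi);
    }
    std::size_t size() const { return x.size(); }
};

// SoA: the loop touches a single contiguous array of doubles.
inline void translate_x(PointsSoA& pts, double dx) {
    for (std::size_t i = 0; i < pts.size(); ++i) pts.x[i] += dx;
}

// AoS: the same operation reads x values that are interleaved with y and z.
inline void translate_x(PointsAoS& pts, double dx) {
    for (auto& p : pts) p.x += dx;
}

}  // namespace sketch
```

Both overloads compute the same result; the difference is only the memory access pattern, and the contiguous loop in the SoA version is typically the easier one for a compiler to vectorize.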
Not In Scope
Given the prevalence of the molecule class in other software packages, we have also explicitly listed other design points which were considered but which we ultimately decided are out of scope for the molecule component.
- Hierarchical structure.
There are a number of electronic structure techniques (e.g., the many-body expansion, symmetry-adapted perturbation theory, or corrections for basis-set superposition error) which require one to decompose the system of interest into subsystems. While these are valid use cases for an electronic structure package, we have opted to treat them at a higher level. Reasoning:
Any hierarchical approach will necessarily need a molecular representation of each subsystem.
We require a non-hierarchical structure which can be used by methods which will assume a non-hierarchical input, i.e., we want a class that guarantees that the input should be treated in a non-hierarchical manner.
- Basis set.
Most electronic structure packages consider the atomic basis set to be an input to the package. For methods relying on atom-centered basis functions, the conversion from atomic basis set to molecular basis set is intrinsically linked to the molecule, because the basis functions reside at the same points in space as the nuclei. We have opted to represent the atomic and molecular basis sets with separate classes, and not have them be part of the molecule component. Reasoning:
In practice the molecule is used to construct the molecular basis set. Once the molecular basis set is created, the vast majority of infrastructure needs the basis set, not the molecule (the molecule is minimally still needed for Hamiltonian terms involving the nuclei). Thus separating them helps separate concerns.
Atom-centered basis sets are not the only basis sets in common usage. In particular, plane waves are commonly used. Tying atom-centered basis sets to the molecule would create an awkward situation where plane-wave-based codes would have to ignore, or raise an error for, basis-set state set by the user.
- Connectivity.
It is not uncommon to want some notion of which atoms are bonded to each other. That said, there are a number of ways to actually determine such connectivity, suggesting it would be better to create modules for each possible algorithm for mapping a molecule to a connectivity table. Given the somewhat non-trivial nature of establishing connectivity, requiring the user to provide a connectivity table is heavy-handed. Similarly, tying the molecule to a particular algorithm is unlikely to satisfy all users; thus, we have opted to represent the connectivity of a molecule with an entirely different object (the ConnectivityTable).
- Topology.
Here we define the topology of a molecule as the distances, angles, and torsion angles between and among the atoms in the molecule. Following from arguments nearly identical to those presented for the connectivity, we have opted to store the topology in a different class as well (the Topology class).
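A minimal sketch of this separation, assuming a hypothetical Molecule that stores only coordinates and a hypothetical ConnectivityTable that stores bonded index pairs. The distance-cutoff rule in bonds_by_distance is just one possible algorithm, chosen for illustration and not taken from Chemist; other algorithms would be additional functions with the same Molecule-to-ConnectivityTable signature. The distance helper stands in for the kind of quantity a separate Topology object would hold.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

using Coord = std::array<double, 3>;

// The molecule is plain data (here only coordinates); it knows nothing
// about bonds.
struct Molecule {
    std::vector<Coord> coords;
};

// Connectivity lives in its own object: a list of bonded pairs (i < j).
struct ConnectivityTable {
    std::vector<std::pair<std::size_t, std::size_t>> bonds;
};

// A topology-style quantity: the distance between two centers.
inline double distance(const Coord& a, const Coord& b) {
    const double dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// One possible connectivity algorithm: two atoms are "bonded" if they are
// closer than a fixed cutoff.
inline ConnectivityTable bonds_by_distance(const Molecule& mol, double cutoff) {
    ConnectivityTable table;
    const std::size_t n = mol.coords.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            if (distance(mol.coords[i], mol.coords[j]) < cutoff)
                table.bonds.emplace_back(i, j);
    return table;
}

}  // namespace sketch
```

Because the algorithm is a free function, swapping in a different bonding rule does not require touching the Molecule type.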
Overview of Molecule Design
Subject to the considerations raised in Molecule Considerations, the overall design of the molecule component is shown in Fig. 6. The component comprises two related class hierarchies. The point hierarchy describes the individual discretized units of the system, whereas the point-set hierarchy represents the overall set of discretized units.
Point Hierarchy
In addressing the Nuclei reuse consideration, we have opted to design the Atom class as a derived class in a hierarchy. This allows each of the base classes to be reused for other concepts. The designs of the Atom class and its base classes are summarized in the following subsections.
Point Class
In scientific simulations we often have a number of point-like objects (point
multipole moments, point masses, grid points, etc.). While we rarely only
care about the location of these points, i.e., each point also usually has
additional state, these point-like objects do all share the common property of
being located somewhere in Cartesian space. The Point
class has been
introduced to represent such objects.
The Point
class primarily serves as:
code-factorization for state common to the various point-like objects,
a means of writing generic algorithms which only require knowledge of an object’s coordinates
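The two roles of Point listed above can be sketched as follows, assuming a simplified, hypothetical Point that stores only three Cartesian coordinates and a hypothetical GridPoint derived class that carries a weight. This is not Chemist's actual Point API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

// Hypothetical Point: only a location in Cartesian space.
class Point {
public:
    Point(double x, double y, double z) : r_{x, y, z} {}
    double coord(std::size_t q) const {
        assert(q < 3);
        return r_[q];
    }
    double x() const { return r_[0]; }
    double y() const { return r_[1]; }
    double z() const { return r_[2]; }

private:
    std::array<double, 3> r_;
};

// A generic algorithm that needs nothing but coordinates, so it accepts
// Point and anything derived from Point.
inline double distance_squared(const Point& a, const Point& b) {
    double d2 = 0.0;
    for (std::size_t q = 0; q < 3; ++q) {
        const double dq = a.coord(q) - b.coord(q);
        d2 += dq * dq;
    }
    return d2;
}

// A point-like object with extra state (e.g., a quadrature grid point).
class GridPoint : public Point {
public:
    GridPoint(double x, double y, double z, double w) : Point(x, y, z), w_(w) {}
    double weight() const { return w_; }

private:
    double w_;
};

}  // namespace sketch
```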
Point Charge Class
When we associate a charge with a point in space we get a point charge, and the
PointCharge
class is designed to represent such a charge. Relative to the
Point
class, the PointCharge
class adds the charge of the point charge.
The PointCharge
class serves as:
code-factorization for state common to point-charge-like objects, including both point charges and, more relevantly, nuclei,
a means of writing generic algorithms that work with any point charge, be it nuclei or actual point charges.
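As a sketch of such a generic algorithm, the following assumes simplified, hypothetical Point and PointCharge classes and computes the Coulomb interaction energy q_a q_b / r between two point charges in atomic units. Any class derived from PointCharge could be passed to coulomb_energy unchanged.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

namespace sketch {

class Point {
public:
    Point(double x, double y, double z) : r_{x, y, z} {}
    double coord(std::size_t q) const { return r_[q]; }

private:
    std::array<double, 3> r_;
};

// PointCharge adds exactly one piece of state to Point: the charge.
class PointCharge : public Point {
public:
    PointCharge(double q, double x, double y, double z) : Point(x, y, z), q_(q) {}
    double charge() const { return q_; }

private:
    double q_;
};

// Coulomb interaction q_a * q_b / r (atomic units). Written against
// PointCharge, so nuclei and plain point charges are handled alike.
inline double coulomb_energy(const PointCharge& a, const PointCharge& b) {
    double r2 = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        const double d = a.coord(i) - b.coord(i);
        r2 += d * d;
    }
    return a.charge() * b.charge() / std::sqrt(r2);
}

}  // namespace sketch
```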
Nucleus Class
Main discussion: Designing the Nucleus Component.
Quantum chemistry overwhelmingly views nuclei as massive point charges. The
Nucleus class adds mass and atomic number to the PointCharge base class. In
practice, given the Quantum electrons consideration, we expect Nucleus objects
to be the discretization of the Molecule class, and thus much of the state of
Molecule will be tied up in Nucleus objects.
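A minimal sketch of Nucleus extending PointCharge, assuming simplified, hypothetical class definitions. Setting the charge from the atomic number inside the constructor is an assumption of this sketch, not documented Chemist behavior; the mass value is an approximate O-16 mass in daltons used only as sample data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

namespace sketch {

class Point {
public:
    Point(double x, double y, double z) : r_{x, y, z} {}
    double coord(std::size_t q) const { return r_[q]; }

private:
    std::array<double, 3> r_;
};

class PointCharge : public Point {
public:
    PointCharge(double q, double x, double y, double z) : Point(x, y, z), q_(q) {}
    double charge() const { return q_; }

private:
    double q_;
};

// Nucleus: a massive point charge. Assumption: its charge equals Z.
class Nucleus : public PointCharge {
public:
    Nucleus(std::size_t Z, double mass, double x, double y, double z)
        : PointCharge(static_cast<double>(Z), x, y, z), Z_(Z), mass_(mass) {}
    std::size_t Z() const { return Z_; }
    double mass() const { return mass_; }

private:
    std::size_t Z_;
    double mass_;
};

// Written against PointCharge; a Nucleus can be passed in directly.
inline double charge_of(const PointCharge& q) { return q.charge(); }

}  // namespace sketch
```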
Atom Class
In practice, when people build molecules they do so by specifying atoms. In
electronic structure theory, an atom is a set of electrons and a nucleus. We
introduce the Atom
class to represent isolated atoms. We primarily see
the Atom
class serving as:
a means of building Molecule objects by push_back-ing Atom objects, and
a potential building block of a “classical” molecule object by, for example, setting the number of electrons from a charge-partitioning scheme such as Mulliken or Löwdin charges.
Of some note in our design, Atom
contains a Nucleus
and does not
inherit from it. This is because atoms are in general not just nuclei (the
notable exceptions being \(\text{H}^+\) and \(\text{He}^{2+}\)), but
are nuclei AND electrons.
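The composition relationship, and the way electrons lose their atom assignment once atoms are pushed into a molecule, can be sketched as follows. The Nucleus, Atom, and Molecule types here are simplified, hypothetical stand-ins (coordinates omitted), not Chemist's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Minimal stand-in for a nucleus (coordinates omitted for brevity).
struct Nucleus {
    std::size_t Z;
    double mass;
};

// Atom HAS-A Nucleus (composition), plus its own electron count.
class Atom {
public:
    Atom(Nucleus nuc, std::size_t n_electrons) : nuc_(nuc), n_e_(n_electrons) {}
    const Nucleus& nucleus() const { return nuc_; }
    std::size_t n_electrons() const { return n_e_; }
    // Net charge of the isolated atom: protons minus electrons.
    int charge() const {
        return static_cast<int>(nuc_.Z) - static_cast<int>(n_e_);
    }

private:
    Nucleus nuc_;
    std::size_t n_e_;
};

// push_back keeps the nucleus but only adds the atom's electrons to a
// running total; which atom they came from is forgotten.
class Molecule {
public:
    void push_back(const Atom& a) {
        nuclei_.push_back(a.nucleus());
        n_electrons_ += a.n_electrons();
    }
    std::size_t size() const { return nuclei_.size(); }
    std::size_t n_electrons() const { return n_electrons_; }

private:
    std::vector<Nucleus> nuclei_;
    std::size_t n_electrons_ = 0;
};

}  // namespace sketch
```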
Point Set Hierarchy
The point set hierarchy mirrors the point hierarchy, except that each class
contains one or more of its namesake discretized units. As explained in more
detail below, the introduction of new classes (as opposed to simply using
std::vector<Point>, for example) is in anticipation of the Performance
consideration. More specifically, by having actual classes we are able to
define an API which is decoupled from the implementation, whereas if we
declared Molecule as a typedef of std::vector<Atom>, the API would be tied to
an array-of-structures implementation.
Note
In C++, having polymorphic containers with polymorphic elements introduces
a lot of edge cases we do not want to deal with. So in practice Charges
does not actually derive from PointSet, and Nuclei does not actually
derive from Charges. Instead, we allow implicit conversion to read-only
references of the bases.
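The implicit-conversion approach in the note can be sketched roughly as follows, assuming that Nuclei owns a Charges object internally. All class and function names are hypothetical and coordinates are omitted; the point is only that a function written against const Charges& accepts a Nuclei without any inheritance.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Stand-in for Charges: charges stored contiguously (coordinates omitted).
class Charges {
public:
    void push_back(double q) { q_.push_back(q); }
    std::size_t size() const { return q_.size(); }
    double operator[](std::size_t i) const { return q_[i]; }

private:
    std::vector<double> q_;
};

// Nuclei does NOT derive from Charges. It owns a Charges object and exposes
// it through an implicit conversion to a read-only reference.
class Nuclei {
public:
    void push_back(std::size_t Z, double mass) {
        charges_.push_back(static_cast<double>(Z));
        Z_.push_back(Z);
        mass_.push_back(mass);
    }
    std::size_t Z(std::size_t i) const { return Z_[i]; }
    double mass(std::size_t i) const { return mass_[i]; }

    operator const Charges&() const { return charges_; }

private:
    Charges charges_;
    std::vector<std::size_t> Z_;
    std::vector<double> mass_;
};

// Written once against Charges; accepts Nuclei via the conversion.
inline double total_charge(const Charges& qs) {
    double sum = 0.0;
    for (std::size_t i = 0; i < qs.size(); ++i) sum += qs[i];
    return sum;
}

}  // namespace sketch
```

Because the conversion yields a const reference, code holding the Charges view cannot add charges that lack a matching mass or atomic number.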
Point Set Class
When one has more than one Point
object, we say they have a set of points.
The PointSet
class is introduced to represent a container of Point
objects. In particular PointSet
is envisioned as:
providing an array-of-structures API, potentially by returning PointView objects (objects which act like Point instances, but do not own their state),
having a structure-of-arrays implementation (to facilitate vectorization), and
being an ordered set (points should appear at most once and be indexable).
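The three goals for PointSet can be sketched together as follows. PointSet and PointView here are simplified, hypothetical classes: storage is one contiguous array per axis, operator[] returns a non-owning view, and push_back rejects exact duplicates so each point appears at most once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Non-owning, Point-like view into a PointSet's per-axis arrays.
// Views are invalidated if the set grows (the vectors may reallocate).
class PointView {
public:
    PointView(const double* x, const double* y, const double* z)
        : x_(x), y_(y), z_(z) {}
    double x() const { return *x_; }
    double y() const { return *y_; }
    double z() const { return *z_; }

private:
    const double* x_;
    const double* y_;
    const double* z_;
};

// Array-of-structures API over structure-of-arrays storage.
class PointSet {
public:
    // Ordered set: an already-present point is not added again. Exact
    // floating-point comparison is used here purely for simplicity.
    bool push_back(double x, double y, double z) {
        for (std::size_t i = 0; i < size(); ++i)
            if (x_[i] == x && y_[i] == y && z_[i] == z) return false;
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
        return true;
    }
    std::size_t size() const { return x_.size(); }

    PointView operator[](std::size_t i) const {
        assert(i < size());
        return PointView(&x_[i], &y_[i], &z_[i]);
    }

    // Contiguous x coordinates, e.g. for a vectorized kernel.
    const double* x_data() const { return x_.data(); }

private:
    std::vector<double> x_, y_, z_;
};

}  // namespace sketch
```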
Charges Class
The Charges class is simply the point-charge analog of the PointSet class.
The main design points:
charges stored contiguously,
could introduce PointChargeView for an array-of-structures API, and
presumably adds a method for computing the total charge of the point charges.
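A minimal sketch of these design points, assuming a hypothetical Charges class that keeps per-axis coordinate arrays and one contiguous array of charges. The total_charge method corresponds to the presumed addition above; its name and the coords accessor are assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

namespace sketch {

// Coordinates in per-axis arrays plus one contiguous array of charges.
class Charges {
public:
    void push_back(double q, double x, double y, double z) {
        q_.push_back(q);
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
    }
    std::size_t size() const { return q_.size(); }
    double charge(std::size_t i) const { return q_[i]; }

    // Assembles the i-th position on demand from the per-axis arrays.
    std::array<double, 3> coords(std::size_t i) const {
        return {x_[i], y_[i], z_[i]};
    }

    // The charges are contiguous, so the total is a single reduction.
    double total_charge() const {
        return std::accumulate(q_.begin(), q_.end(), 0.0);
    }

private:
    std::vector<double> q_, x_, y_, z_;
};

}  // namespace sketch
```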
Nuclei Class
When our container contains Nucleus
objects, the container is now a
Nuclei
object. Notably:
deriving from Charges satisfies Nuclei reuse,
mass and atomic numbers can be collected into contiguous arrays, and
NucleusView could be introduced for maintaining an array-of-structures API.
Molecule Class
The namesake of the molecule component, the Molecule
class is the
culmination of the design so far. Notably the Molecule
class:
acts like an array of Nucleus objects, consistent with Quantum electrons,
stores the electronic charge of the system (the net number of electrons), which is different from the total charge of the nuclei,
stores the spin multiplicity of the electrons, and
provides access to the nuclei as a Nuclei object, which addresses Nuclei reuse.
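The electronic state described above can be sketched as follows, assuming a hypothetical Molecule that stores nuclei, the net charge, and the spin multiplicity. The number of electrons is derived as the total nuclear charge minus the net charge. The constructor rejects combinations that cannot occur: the number of unpaired electrons (multiplicity minus one) must not exceed the number of electrons and must have the same parity. This validation is an illustrative assumption, not documented Chemist behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

namespace sketch {

struct Nucleus {
    std::size_t Z;
    double mass;
};

// Electrons are stored only implicitly: net charge plus multiplicity.
class Molecule {
public:
    Molecule(std::vector<Nucleus> nuclei, int charge, unsigned multiplicity)
        : nuclei_(std::move(nuclei)), charge_(charge), mult_(multiplicity) {
        const int n_e = n_electrons();
        const int n_unpaired = static_cast<int>(mult_) - 1;
        if (n_e < 0 || n_unpaired < 0 || n_unpaired > n_e ||
            (n_e - n_unpaired) % 2 != 0)
            throw std::invalid_argument("inconsistent charge/multiplicity");
    }

    const std::vector<Nucleus>& nuclei() const { return nuclei_; }
    int charge() const { return charge_; }
    unsigned multiplicity() const { return mult_; }

    int nuclear_charge() const {
        int sum = 0;
        for (const auto& n : nuclei_) sum += static_cast<int>(n.Z);
        return sum;
    }
    // Electrons = total nuclear charge - net molecular charge.
    int n_electrons() const { return nuclear_charge() - charge_; }

private:
    std::vector<Nucleus> nuclei_;
    int charge_;
    unsigned mult_;
};

}  // namespace sketch
```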
Molecule Design Summary
To summarize how Chemist’s molecule component addresses the considerations raised in Molecule Considerations:
- Data structure.
This consideration factored more into deciding what was and was not part of the molecule component. In particular, the desire for the classes to be data structures precluded inclusion of complicated methods, such as those for determining topology or connectivity.
- Nuclei reuse.
The fact that Nucleus and Nuclei respectively derive from the PointCharge and Charges classes means the classes can be used seamlessly wherever PointCharge and Charges are used.
- Quantum electrons.
The Molecule class contains a set of Nucleus objects, not a set of Atom objects. This reflects the fact that once we create the Molecule we can no longer unambiguously assign electrons to centers. The (quantum) electrons in the Molecule class are stored implicitly via the charge and multiplicity.
- Performance.
By opting for full-fledged classes for the sets in the component, we are able to separate the API of the class from its implementation, which would not be as easy if we had opted to represent a molecule as a std::vector<Atom>, for example.
Future Considerations
We could template Atom on the type of the nucleus and introduce additional nucleus types for cases where the nucleus is to be treated quantum mechanically.
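The templated Atom idea could look roughly like this. PointNucleus, QuantumNucleus (with its basis_exponents member), and the atomic_number helper are all hypothetical; the sketch only shows that code written against Atom<NucleusType> accepts either nucleus type.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

namespace sketch {

// Classical point nucleus, as in the current design.
struct PointNucleus {
    std::size_t Z;
    double mass;
};

// Hypothetical nucleus treated quantum mechanically, here carrying the
// exponents of some nuclear basis functions.
struct QuantumNucleus {
    std::size_t Z;
    double mass;
    std::vector<double> basis_exponents;
};

// Atom templated on the nucleus type; point nuclei remain the default.
template <typename NucleusType = PointNucleus>
class Atom {
public:
    Atom(NucleusType nuc, std::size_t n_electrons)
        : nuc_(std::move(nuc)), n_e_(n_electrons) {}
    const NucleusType& nucleus() const { return nuc_; }
    std::size_t n_electrons() const { return n_e_; }

private:
    NucleusType nuc_;
    std::size_t n_e_;
};

// Code that needs only Z works for any nucleus type exposing Z.
template <typename NucleusType>
std::size_t atomic_number(const Atom<NucleusType>& a) {
    return a.nucleus().Z;
}

}  // namespace sketch
```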