Molecule Design

This section contains notes on the design of Chemist’s molecule component.

What is a Molecule?

In chemistry, molecules are sets of covalently bonded atoms. In computational chemistry, particularly quantum chemistry, the covalent-bonding requirement is usually dropped, i.e., a molecule is simply a set of atoms. The present discussion assumes the computational chemistry definition.

Why do we need a Molecule (component)?

Computational chemistry aims to predict chemical phenomena. Part of the input to a computational chemistry package will always be some sort of chemical system. The molecule is used to represent the atomic part of that chemical system. Thus we need a molecule component so that users can specify the atoms in their system.

Molecule Considerations

The concept of a molecule class is widely ingrained in most, if not all, object-oriented computational chemistry packages. As such, many developers have an implicit idea of what a molecule class should do. In this section we explicitly list the considerations that went into the design of our molecule component.

Data structure.

Chemist is designed so that classes are primarily data structures. Manipulations of those data structures are primarily done via functions (particularly modules).

Nuclei reuse.

In typical quantum chemistry use cases, the nuclei are treated as point charges. Ideally, the infrastructure underlying the nuclei reuses the infrastructure for point charges.

Quantum electrons.

Typical quantum chemistry use cases treat electrons as quantum mechanical particles. The main repercussion of this is that, while the number of electrons of an isolated atom is well defined, once atoms come together it becomes impossible to unambiguously assign electrons to atoms; we must talk about “the molecule’s electrons,” not “an atom’s electrons.” Sub-considerations:

  • Charge and spin should be properties of the Molecule as a whole.

  • Molecule really should not be a set of atoms, since that would preserve the electron-to-atom assignments. It should instead contain a set of nuclei.

Performance.

Thinking of molecules as sets of atoms suggests that an array-of-structures is the natural representation. However, it is also widely known that, because modern hardware relies heavily on vectorization, a structure-of-arrays representation typically performs better.
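To make the two layouts concrete, here is a minimal C++ sketch (all type names are hypothetical; this is not Chemist’s implementation) that stores the same positions and charges both ways and sums the charges. In the array-of-structures version consecutive charges are separated in memory by the coordinates, whereas the structure-of-arrays version reads a single contiguous array of doubles, the access pattern compilers vectorize most readily.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures (AoS): each record keeps all of one center's data
// together, mirroring "a molecule is a set of atoms".
struct CenterAoS {
    std::array<double, 3> r;  // Cartesian coordinates
    double charge;
};
using MoleculeAoS = std::vector<CenterAoS>;

// Structure-of-arrays (SoA): each quantity lives in its own contiguous array.
struct MoleculeSoA {
    std::vector<double> x, y, z, charge;

    std::size_t size() const {
        // Invariant: all arrays have the same length.
        assert(x.size() == charge.size() && y.size() == charge.size() &&
               z.size() == charge.size());
        return charge.size();
    }

    void push_back(double xi, double yi, double zi, double qi) {
        x.push_back(xi);
        y.push_back(yi);
        z.push_back(zi);
        charge.push_back(qi);
    }
};

// AoS reduction: consecutive charges are separated by the coordinates in
// memory, so the loop reads with a stride.
double total_charge(const MoleculeAoS& m) {
    double q = 0.0;
    for (const auto& center : m) q += center.charge;
    return q;
}

// SoA reduction: the loop reads one contiguous array of doubles.
double total_charge(const MoleculeSoA& m) {
    double q = 0.0;
    const std::size_t n = m.size();
    for (std::size_t i = 0; i < n; ++i) q += m.charge[i];
    return q;
}
```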

Not In Scope

Given the prevalence of the molecule class in other software packages, we have also explicitly listed other design points which were considered, but ultimately deemed out of scope for the molecule component.

Hierarchical structure.

There are a number of electronic structure techniques (e.g., the many-body expansion, symmetry-adapted perturbation theory, or corrections for basis-set superposition error) which require one to decompose the system of interest into subsystems. While these are valid use cases for an electronic structure package, we have opted to treat them at a higher level. Reasoning:

  • Any hierarchical approach will necessarily need a molecular representation of each subsystem.

  • We require a non-hierarchical structure which can be used by methods that assume non-hierarchical input, i.e., we want a class which guarantees that its contents are to be treated in a non-hierarchical manner.
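One way to handle the hierarchy “at a higher level” is sketched below, purely as an illustration under our own assumptions: a hypothetical FragmentedSystem owns a flat supersystem plus lists of indices, and hands out each subsystem as an ordinary, flat molecule. For brevity the molecule is a plain std::vector of a simplified Nucleus struct, and the charge and multiplicity of each fragment are ignored; neither choice reflects Chemist’s actual classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified stand-ins (hypothetical, for brevity only).
struct Nucleus {
    std::size_t Z;   // atomic number
    double x, y, z;  // Cartesian coordinates
};
using Molecule = std::vector<Nucleus>;  // flat, non-hierarchical

// Hypothetical higher-level object: the hierarchy lives here, not in Molecule.
struct FragmentedSystem {
    Molecule supersystem;
    std::vector<std::vector<std::size_t>> fragments;  // indices into supersystem

    // Each subsystem is itself an ordinary (flat) Molecule, so methods that
    // assume non-hierarchical input can consume it directly.
    Molecule fragment(std::size_t f) const {
        assert(f < fragments.size());
        Molecule sub;
        for (std::size_t i : fragments[f]) {
            assert(i < supersystem.size());
            sub.push_back(supersystem[i]);
        }
        return sub;
    }
};
```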

Basis set.

Most electronic structure packages consider the atomic basis set to be an input to the package. For methods relying on atom-centered basis functions, the conversion from atomic basis set to molecular basis set is intrinsically linked to the molecule, because the basis functions reside at the same points in space as the nuclei. We have opted to represent the atomic and molecular basis sets with separate classes, and not to make them part of the molecule component. Reasoning:

  • In practice the molecule is used to construct the molecular basis set. Once the molecular basis set is created, the vast majority of infrastructure needs the basis set, not the molecule (at a minimum, the molecule is still needed for Hamiltonian terms involving the nuclei). Thus separating them helps separate concerns.

  • Atom-centered basis sets are not the only basis sets in common use; in particular, plane waves are commonly used. Tying atom-centered basis sets to the molecule would create an awkward situation where plane-wave-based codes would have to ignore that state, or raise an error, if the user set it.
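A minimal sketch of this separation, assuming hypothetical types of our own (AtomicBasisSet, Shell, MolecularBasisSet, build_molecular_basis) rather than Chemist’s actual basis-set classes: the molecule (here just a list of simplified nuclei) is consulted once to place atomic shells on the nuclei, and everything downstream can then work with the resulting molecular basis alone. Real shells also carry exponents and contraction coefficients, which are omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Simplified nucleus: atomic number and position.
struct Nucleus {
    std::size_t Z;
    std::array<double, 3> r;
};

// Hypothetical atomic basis set: only the angular momentum of each shell.
struct AtomicBasisSet {
    std::vector<unsigned> shell_l;
};

// A shell placed at a specific point in space.
struct Shell {
    unsigned l;
    std::array<double, 3> center;
};
using MolecularBasisSet = std::vector<Shell>;

// The molecule is consulted only here, to center each atomic basis set on the
// matching nucleus. std::map::at throws std::out_of_range for an unknown Z.
MolecularBasisSet build_molecular_basis(
    const std::vector<Nucleus>& nuclei,
    const std::map<std::size_t, AtomicBasisSet>& library) {
    MolecularBasisSet basis;
    for (const auto& nuc : nuclei) {
        for (unsigned l : library.at(nuc.Z).shell_l) {
            basis.push_back(Shell{l, nuc.r});
        }
    }
    return basis;
}
```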

Connectivity.

It is not uncommon to want some notion of which atoms are bonded to each other. That said, there are a number of ways to determine such connectivity, which suggests it is better to create a module for each algorithm that maps a molecule to a connectivity table. Given the somewhat non-trivial nature of establishing connectivity, requiring the user to provide a connectivity table would be heavy-handed. Similarly, tying the molecule to a particular algorithm is unlikely to satisfy all users. Thus, we have opted to represent the connectivity of a molecule with an entirely separate object (the ConnectivityTable).
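To illustrate keeping connectivity out of the molecule, the sketch below (hypothetical names throughout, not Chemist’s API) treats a connectivity algorithm as any function from geometry to a table of bonded pairs, and provides one simple such algorithm: bond every pair closer than a fixed cutoff. Real bond perception usually relies on element-dependent criteria such as covalent radii; the fixed cutoff is a deliberate simplification, and swapping in a better algorithm requires no change to the molecule.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Simplified molecule stand-in: just the Cartesian coordinates of each center.
using Coords = std::vector<std::array<double, 3>>;
// Bonded pairs (i, j) with i < j.
using ConnectivityTable = std::vector<std::pair<std::size_t, std::size_t>>;

// Any connectivity algorithm is just a function from geometry to a table;
// the molecule itself stores no bonds.
using ConnectivityAlgorithm = std::function<ConnectivityTable(const Coords&)>;

// One possible algorithm: bond every pair closer than a fixed cutoff.
ConnectivityAlgorithm make_distance_cutoff(double cutoff) {
    return [cutoff](const Coords& r) {
        ConnectivityTable bonds;
        for (std::size_t i = 0; i < r.size(); ++i) {
            for (std::size_t j = i + 1; j < r.size(); ++j) {
                double d2 = 0.0;
                for (std::size_t k = 0; k < 3; ++k) {
                    const double d = r[i][k] - r[j][k];
                    d2 += d * d;
                }
                if (std::sqrt(d2) < cutoff) bonds.emplace_back(i, j);
            }
        }
        return bonds;
    };
}
```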

Topology.

Here we define the topology of a molecule as the distances, angles, and torsion angles between and among the atoms in the molecule. Following arguments nearly identical to those presented for connectivity, we have opted to store the topology in a separate class as well (the Topology class).

Overview of Molecule Design

Fig. 6 Major pieces of the molecule component of Chemist.

Subject to the considerations raised in Molecule Considerations, the overall design of the molecule component is shown in Fig. 6. The component comprises two related class hierarchies. The point hierarchy describes the individual discretized units of the system, whereas the point-set hierarchy represents sets of those discretized units.

Point Hierarchy

To address the Nuclei reuse consideration, we have opted to build the Atom class on top of a class hierarchy (Point, PointCharge, and Nucleus). This allows each class in the hierarchy to be reused for other concepts. The design of the Atom class and of the classes it builds on is summarized in the following subsections.

Point Class

In scientific simulations we often have a number of point-like objects (point multipole moments, point masses, grid points, etc.). While we rarely care only about the location of these points (each point usually also has additional state), these point-like objects all share the common property of being located somewhere in Cartesian space. The Point class has been introduced to represent such objects.

The Point class primarily serves as:

  • code-factorization for state common to the various point-like objects, and

  • a means of writing generic algorithms which only require knowledge of an object’s coordinates.
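A minimal C++ sketch of such a class, together with one coordinate-only generic algorithm. The accessor name coord and the free function distance are our own illustrative assumptions, not Chemist’s API. Because distance takes const Point&, it also accepts any class publicly derived from Point.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical minimal Point: only a location in Cartesian space.
class Point {
public:
    Point() = default;
    Point(double x, double y, double z) : r_{{x, y, z}} {}

    // Coordinate accessor; i = 0, 1, 2 for x, y, z.
    double coord(std::size_t i) const {
        assert(i < 3);
        return r_[i];
    }

private:
    std::array<double, 3> r_{};
};

// A generic algorithm that needs nothing but coordinates.
double distance(const Point& a, const Point& b) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        const double d = a.coord(i) - b.coord(i);
        d2 += d * d;
    }
    return std::sqrt(d2);
}
```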

Point Charge Class

When we associate a charge with a point in space we get a point charge, and the PointCharge class is designed to represent such a charge. Relative to the Point class, the PointCharge class adds the charge of the point charge.

The PointCharge class serves as:

  • code-factorization for state common to point-charge-like objects, including both point charges and, more relevantly, nuclei,

  • a means of writing generic algorithms that work with any point charge, be it nuclei or actual point charges.
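Continuing the same hypothetical C++ sketch (Point is repeated so the block stands alone; all names are illustrative, not Chemist’s API), PointCharge adds a charge to Point, and a generic function computes the classical Coulomb energy of any collection whose elements can be viewed as PointCharge objects. Atomic units are assumed, so the energy is the sum over pairs i < j of q_i q_j / r_ij.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

class Point {
public:
    Point(double x, double y, double z) : r_{{x, y, z}} {}
    double coord(std::size_t i) const {
        assert(i < 3);
        return r_[i];
    }

private:
    std::array<double, 3> r_;
};

// Hypothetical PointCharge: a Point plus a charge (atomic units assumed).
class PointCharge : public Point {
public:
    PointCharge(double q, double x, double y, double z) : Point(x, y, z), q_(q) {}
    double charge() const { return q_; }

private:
    double q_;
};

double distance(const Point& a, const Point& b) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        const double d = a.coord(i) - b.coord(i);
        d2 += d * d;
    }
    return std::sqrt(d2);
}

// Classical Coulomb energy in atomic units. Each element is bound to a
// const PointCharge&, so a container of PointCharge, or of any class publicly
// derived from it (e.g., a nucleus class), works unchanged.
template <typename Container>
double coulomb_energy(const Container& charges) {
    double energy = 0.0;
    for (std::size_t i = 0; i < charges.size(); ++i) {
        const PointCharge& qi = charges[i];
        for (std::size_t j = i + 1; j < charges.size(); ++j) {
            const PointCharge& qj = charges[j];
            energy += qi.charge() * qj.charge() / distance(qi, qj);
        }
    }
    return energy;
}
```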

Nucleus Class

Main discussion: Designing the Nucleus Component.

Quantum chemistry overwhelmingly views nuclei as massive point charges. The Nucleus class adds mass and atomic number to the PointCharge base class. In practice, given the Quantum electrons consideration, we expect Nucleus objects to be the discretization of the Molecule class, and thus much of the state of Molecule will be tied up in Nucleus objects.
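The sketch below (hypothetical names; the units of mass are left unspecified) adds a Nucleus class on top of minimal Point and PointCharge classes. We assume, purely for illustration, that a nucleus’s charge is set equal to its atomic number. The point of the example is that potential_at is written once against PointCharge and accepts a Nucleus without modification.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

class Point {
public:
    Point(double x, double y, double z) : r_{{x, y, z}} {}
    double coord(std::size_t i) const {
        assert(i < 3);
        return r_[i];
    }

private:
    std::array<double, 3> r_;
};

class PointCharge : public Point {
public:
    PointCharge(double q, double x, double y, double z) : Point(x, y, z), q_(q) {}
    double charge() const { return q_; }

private:
    double q_;
};

// Hypothetical Nucleus: a massive point charge with an atomic number.
// Assumption for this sketch: the charge is set equal to the atomic number.
class Nucleus : public PointCharge {
public:
    Nucleus(std::size_t Z, double mass, double x, double y, double z)
        : PointCharge(static_cast<double>(Z), x, y, z), Z_(Z), mass_(mass) {}
    std::size_t Z() const { return Z_; }
    double mass() const { return mass_; }

private:
    std::size_t Z_;
    double mass_;
};

double distance(const Point& a, const Point& b) {
    double d2 = 0.0;
    for (std::size_t i = 0; i < 3; ++i) {
        const double d = a.coord(i) - b.coord(i);
        d2 += d * d;
    }
    return std::sqrt(d2);
}

// Electrostatic potential of a point charge at a point, in atomic units.
// It is written against PointCharge, so a Nucleus can be passed unchanged.
double potential_at(const PointCharge& q, const Point& r) {
    return q.charge() / distance(q, r);
}
```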

Atom Class

In practice, when people build molecules they do so by specifying atoms. In electronic structure theory, an atom is a set of electrons and a nucleus. We introduce the Atom class to represent isolated atoms. We primarily see the Atom class serving as:

  • a means of building Molecule objects by push_back-ing Atom objects, and

  • a potential building block of a “classical” molecule object, for example, by setting the number of electrons on each atom from a charge-partitioning scheme such as Mulliken or Löwdin charges.

Of note in our design, Atom contains a Nucleus rather than inheriting from it. This is because atoms are in general not just nuclei (the notable exceptions being \(\text{H}^+\) and \(\text{He}^{2+}\)), but nuclei AND electrons.
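A short C++ sketch of the composition relationship, using a simplified stand-in for Nucleus (all names are hypothetical, not Chemist’s API). Because Atom holds a Nucleus rather than inheriting from one, an Atom cannot accidentally be passed where a bare nucleus or point charge is expected, and its net charge accounts for its electrons.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Simplified, hypothetical stand-in for the Nucleus class.
struct Nucleus {
    std::size_t Z;            // atomic number
    double mass;              // units left unspecified in this sketch
    std::array<double, 3> r;  // Cartesian coordinates
};

// Hypothetical Atom: it HAS a Nucleus (composition) plus a number of
// electrons; it is not itself a Nucleus or a point charge.
class Atom {
public:
    Atom(Nucleus nuc, std::size_t n_electrons)
        : nuc_(nuc), n_electrons_(n_electrons) {}

    // Neutral atom by default: as many electrons as the atomic number.
    explicit Atom(Nucleus nuc) : Atom(nuc, nuc.Z) {}

    const Nucleus& nucleus() const { return nuc_; }
    std::size_t n_electrons() const { return n_electrons_; }

    // Net charge in atomic units: nuclear charge minus number of electrons.
    double charge() const {
        return static_cast<double>(nuc_.Z) - static_cast<double>(n_electrons_);
    }

private:
    Nucleus nuc_;
    std::size_t n_electrons_;
};
```

Constructing an Atom from a hydrogen nucleus with zero electrons gives an atom of charge +1, i.e., \(\text{H}^+\), one of the exceptions noted above in which the atom consists of the nucleus alone.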

Point Set Hierarchy

The point set hierarchy mirrors the point hierarchy, except that each class contains one or more of its namesake discretized units. As explained in more detail below, the introduction of new classes (as opposed to simply using std::vector<Point>, for example) is in anticipation of the Performance consideration. More specifically, having actual classes lets us define an API that is decoupled from the implementation, whereas declaring Molecule as a typedef of std::vector<Atom> would tie the API to the array-of-structures layout of std::vector.

Note

In C++, having polymorphic containers with polymorphic elements introduces a lot of edge cases we do not want to deal with. So, in practice, Charges does not actually derive from PointSet, and Nuclei does not actually derive from Charges. Instead, we allow implicit conversion to read-only references of the (conceptual) bases.
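A minimal sketch of what “implicit conversion to read-only references of the bases” can look like in C++, under our own assumptions (the class layouts and the mean_z helper are hypothetical, not Chemist’s code): Charges holds a PointSet member instead of inheriting from it, and a conversion operator lets a Charges object bind to a const PointSet&.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical PointSet storing coordinates as a structure of arrays.
class PointSet {
public:
    void push_back(double x, double y, double z) {
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
    }
    std::size_t size() const { return x_.size(); }
    double z(std::size_t i) const {
        assert(i < size());
        return z_[i];
    }

private:
    std::vector<double> x_, y_, z_;
};

// Hypothetical Charges: it does NOT derive from PointSet. It owns a PointSet
// and exposes it through an implicit conversion to a read-only reference.
class Charges {
public:
    void push_back(double q, double x, double y, double z) {
        points_.push_back(x, y, z);
        q_.push_back(q);
    }
    std::size_t size() const { return q_.size(); }
    double charge(std::size_t i) const {
        assert(i < size());
        return q_[i];
    }

    // A Charges can be passed wherever a const PointSet& is expected, but
    // callers cannot modify the points through that reference.
    operator const PointSet&() const { return points_; }

private:
    PointSet points_;
    std::vector<double> q_;
};

// A generic algorithm written only against PointSet: mean z coordinate.
double mean_z(const PointSet& ps) {
    assert(ps.size() > 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < ps.size(); ++i) sum += ps.z(i);
    return sum / static_cast<double>(ps.size());
}
```

Because the conversion yields a const reference, generic PointSet algorithms can read a Charges object, but nothing can add a point through the PointSet without also adding a charge, so the coordinate and charge arrays stay the same length.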

Point Set Class

When we have more than one Point object, we say we have a set of points. The PointSet class is introduced to represent a container of Point objects. In particular, PointSet is envisioned as:

  • providing an array-of-structures API, potentially by returning PointView objects (objects which act like Point instances, but do not own their state),

  • having a structure-of-arrays implementation (to facilitate vectorization), and

  • being an ordered set (points should appear at most once and be indexable).
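A minimal C++ sketch of these three points, under our own assumptions (PointView, insert, find, and x_data are hypothetical names, not Chemist’s API): coordinates live in three contiguous arrays, operator[] returns a non-owning PointView so callers can still write array-of-structures style code, and insert refuses duplicates while preserving insertion order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view: acts like a Point but aliases the storage of
// a PointSet. It is invalidated if the PointSet reallocates (e.g., on insert).
class PointView {
public:
    PointView(double& x, double& y, double& z) : x_(&x), y_(&y), z_(&z) {}
    double& x() const { return *x_; }
    double& y() const { return *y_; }
    double& z() const { return *z_; }

private:
    double* x_;
    double* y_;
    double* z_;
};

// Hypothetical PointSet: structure-of-arrays storage behind an
// array-of-structures API.
class PointSet {
public:
    std::size_t size() const { return x_.size(); }

    // Index of the point (x, y, z), or size() if absent. Exact comparison is a
    // simplification; a real implementation would likely use a tolerance.
    std::size_t find(double x, double y, double z) const {
        for (std::size_t i = 0; i < size(); ++i) {
            if (x_[i] == x && y_[i] == y && z_[i] == z) return i;
        }
        return size();
    }

    // Ordered-set semantics: appends the point unless it is already present.
    // Returns true if the point was added.
    bool insert(double x, double y, double z) {
        if (find(x, y, z) != size()) return false;
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
        return true;
    }

    // Array-of-structures style access.
    PointView operator[](std::size_t i) {
        assert(i < size());
        return PointView(x_[i], y_[i], z_[i]);
    }

    // Structure-of-arrays access, e.g., for vectorizable kernels.
    const std::vector<double>& x_data() const { return x_; }

private:
    std::vector<double> x_, y_, z_;
};
```

As with any view into a std::vector, a PointView should not be held across an insert, since reallocation invalidates it.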

Charges Class

The Charges class is simply the extension of the PointSet class to PointCharge objects. The main design points:

  • charges stored contiguously,

  • could introduce PointChargeView for an array-of-structures API

  • presumably adds a method for computing the total charge of the point charges.
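A minimal C++ sketch of these design points, assuming hypothetical member names of our own (charge_data, total_charge) rather than Chemist’s API: the charges occupy one contiguous array, which can be handed to vectorized code as a raw pointer, and the total charge is a single pass over that array. Coordinate accessors are omitted for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical Charges: coordinates and charges each stored in their own
// contiguous array (structure of arrays).
class Charges {
public:
    void push_back(double q, double x, double y, double z) {
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
        q_.push_back(q);
    }
    std::size_t size() const { return q_.size(); }

    double charge(std::size_t i) const {
        assert(i < size());
        return q_[i];
    }

    // Pointer to the contiguous charges, e.g., for vectorized code.
    const double* charge_data() const { return q_.data(); }

    // Sum of all charges: one pass over one contiguous array.
    double total_charge() const {
        return std::accumulate(q_.begin(), q_.end(), 0.0);
    }

private:
    std::vector<double> x_, y_, z_, q_;
};
```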

Nuclei Class

When our container contains Nucleus objects, the container is now a Nuclei object. Notably:

  • deriving from Charges satisfies the Nuclei reuse consideration,

  • mass and atomic numbers can be collected into contiguous arrays, and

  • a NucleusView class can maintain an array-of-structures API.
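A minimal C++ sketch of the last two points, under our own assumptions (NucleusView, center_of_mass, and the member layout are hypothetical; the charges inherited from the Charges concept are omitted for brevity): atomic numbers, masses, and coordinates each live in a contiguous array, a read-only NucleusView presents one nucleus as a single object, and the center of mass is computed directly from the arrays.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical read-only view of one nucleus stored inside a Nuclei object.
struct NucleusView {
    const std::size_t& Z;
    const double& mass;
    const double& x;
    const double& y;
    const double& z;
};

// Hypothetical Nuclei: one contiguous array per quantity.
class Nuclei {
public:
    void push_back(std::size_t Z, double mass, double x, double y, double z) {
        Z_.push_back(Z);
        mass_.push_back(mass);
        x_.push_back(x);
        y_.push_back(y);
        z_.push_back(z);
    }
    std::size_t size() const { return Z_.size(); }

    // Array-of-structures style access through a non-owning view.
    NucleusView operator[](std::size_t i) const {
        assert(i < size());
        return NucleusView{Z_[i], mass_[i], x_[i], y_[i], z_[i]};
    }

    // Center of mass computed directly from the contiguous arrays.
    std::array<double, 3> center_of_mass() const {
        double m_total = 0.0;
        std::array<double, 3> com{{0.0, 0.0, 0.0}};
        for (std::size_t i = 0; i < size(); ++i) {
            m_total += mass_[i];
            com[0] += mass_[i] * x_[i];
            com[1] += mass_[i] * y_[i];
            com[2] += mass_[i] * z_[i];
        }
        assert(m_total > 0.0);
        for (double& c : com) c /= m_total;
        return com;
    }

private:
    std::vector<std::size_t> Z_;
    std::vector<double> mass_, x_, y_, z_;
};
```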

Molecule Class

The namesake of the molecule component, the Molecule class is the culmination of the design so far. Notably the Molecule class:

  • acts like an array of Nucleus objects, consistent with the Quantum electrons consideration,

  • stores the charge of the system (and thus, implicitly, the number of electrons), which is different from the total charge of the nuclei,

  • stores the spin multiplicity of the electrons, and

  • provides access to the nuclei as a Nuclei object, which addresses the Nuclei reuse consideration.
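The sketch below, in C++ and using hypothetical, simplified types (not Chemist’s actual Molecule), illustrates these points: push_back accepts an Atom but keeps only its nucleus, folding the atom’s electrons into a molecule-wide charge, and the number of electrons is derived from the nuclear charges and that net charge. The multiplicity check uses the standard relation multiplicity = 2S + 1, i.e., multiplicity - 1 unpaired electrons, with the remaining electrons paired.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Simplified stand-ins (hypothetical): a nucleus and an isolated atom.
struct Nucleus {
    long Z;          // atomic number
    double x, y, z;  // Cartesian coordinates
};
struct Atom {
    Nucleus nucleus;
    long n_electrons;
};

// Hypothetical Molecule: nuclei plus a molecule-wide charge and spin
// multiplicity. Electrons are never assigned to individual nuclei.
class Molecule {
public:
    // Only the atom's nucleus is stored; its electrons are folded into the
    // molecule-wide charge.
    void push_back(const Atom& atom) {
        nuclei_.push_back(atom.nucleus);
        charge_ += atom.nucleus.Z - atom.n_electrons;
    }

    std::size_t size() const { return nuclei_.size(); }
    const Nucleus& operator[](std::size_t i) const {
        assert(i < size());
        return nuclei_[i];
    }

    long charge() const { return charge_; }
    long multiplicity() const { return multiplicity_; }
    void set_charge(long q) { charge_ = q; }  // does not re-check multiplicity

    // Number of electrons implied by the nuclei and the net charge.
    long n_electrons() const {
        long z_total = 0;
        for (const auto& n : nuclei_) z_total += n.Z;
        return z_total - charge_;
    }

    // multiplicity - 1 electrons are unpaired; the rest must form pairs.
    void set_multiplicity(long m) {
        const long unpaired = m - 1;
        const long n = n_electrons();
        if (m < 1 || unpaired > n || (n - unpaired) % 2 != 0) {
            throw std::invalid_argument("multiplicity inconsistent with electron count");
        }
        multiplicity_ = m;
    }

private:
    std::vector<Nucleus> nuclei_;
    long charge_ = 0;
    long multiplicity_ = 1;
};
```

In this sketch set_charge does not revalidate the stored multiplicity; a real class would need to decide how to keep the two consistent.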

Molecule Design Summary

To summarize how Chemist’s molecule component addresses the considerations raised in Molecule Considerations:

Data structure.

This consideration factored more into deciding what was and was not part of the molecule component. In particular, the desire for the classes to be data structures precluded inclusion of complicated methods, such as those for determining topology or connectivity.

Nuclei reuse.

Because Nucleus and Nuclei respectively derive from the PointCharge and Charges classes, they can be used seamlessly wherever PointCharge and Charges are used.

Quantum electrons.

The Molecule class contains a set of Nucleus objects, not a set of Atom objects. This reflects the fact that once we create the Molecule we can no longer unambiguously assign electrons to centers. The (quantum) electrons in the Molecule class are stored implicitly via the charge and multiplicity.

Performance.

By opting for full-fledged classes for the sets in the component, we are able to separate the API of the class from its implementation, which would not be as easy if we had opted to represent a molecule as a std::vector<Atom> for example.

Future Considerations

  • Could template Atom on the type of the nucleus and introduce additional nucleus types in cases where the nucleus is to be treated quantum mechanically.