Motivating TensorWrapper
Why Tensors?
TensorWrapper defines a tensor as a multi-dimensional array of values. The values are usually scalars, but in general can be other tensors. In practice this means that more or less any indexed quantity is a tensor (according to TensorWrapper). Given the prevalence of indexed quantities in physics equations, tensors can be seen as a natural domain-specific language (DSL) for physics, since most physics theories can be expressed succinctly in terms of tensors. As a perfect example consider the energy, \(E\), of an \(N\)-dimensional harmonic oscillator:
\[E = \frac{1}{2}\sum_{i=1}^{N} k_i\left(r_i - r^{(0)}_i\right)^2,\]
where \(k_i\) and \(\left(r_i-r^{(0)}_i\right)\) are, respectively, the force constant and the displacement in the \(i\)-th dimension. This is a perfect example because the equation is simple to express in terms of tensors, but few people would actually code it up as a tensor equation, which brings us to…
Are tensors a sufficient DSL?
In practice, most people would compute \(E\) with something like:
double e = 0.0;
for(auto i = 0; i < N; ++i){
    auto dr = r[i] - r0[i]; // displacement in the i-th dimension
    e += k[i] * dr * dr;
}
return 0.5 * e;
That is, they’d use a for-loop. Why? Consider pseudocode for the tensor-based version:
Tensor dr, dr2, e;
dr("i") = r("i") - r0("i");    // displacements
dr2("i") = dr("i") * dr("i");  // squared displacements
e = 0.5 * k("i") * dr2("i");   // contraction over the repeated index i
Ignoring compiler optimizations and taking the code at face value, the operation counts are the same for each algorithm (\(N\) subtractions, \(N\) squares, and an \(N\)-element dot product). The primary difference is that the tensor version stores two additional \(N\)-element vectors (the displacements and the squares of the displacements). While not strictly inferable from the code presented here, assuming r, r0, and k are something akin to a double *, it is also quite likely that the compiler will be able to better optimize the loop-based version.
So if the for-loop produces better overall code, why would we want the DSL version? The short answer is that the for-loop-based version is not optimal in all situations. By using the DSL, we can express our intent while punting the optimizations to the backend. This works because, with modern object-oriented programming techniques, there is a disconnect between the apparent API and how things are actually implemented. The point being, just because the tensor DSL code looks like it uses extra memory doesn’t mean it really does (simple reference-counting techniques could identify dr and dr2 as intermediates). Furthermore, it’s entirely possible that, as written, the implementation underlying the tensor DSL is parallelized and/or GPU accelerated. The loop-based version, however, is not parallelized and would need to be rewritten for threading and/or process parallelization, which can potentially be a lot of work if multiple loop-based implementations need to be rewritten.
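To make the “punting to the backend” idea concrete, below is a minimal, hypothetical sketch of the expression-template technique a DSL backend could use; the Vec, BinExpr, and dot names are invented for illustration and are not TensorWrapper’s actual API. Taken at face value the user code still names dr and dr2, but because each operator merely builds a lazy expression, no \(N\)-element intermediates are allocated and all of the arithmetic happens in one fused loop that the backend is free to thread or offload.
#include <cstddef>
#include <functional>
#include <iostream>
#include <vector>

// A leaf tensor: here just a rank-1 array of doubles.
struct Vec {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }
};

// A lazy element-wise expression: nothing is computed until it is indexed,
// so assigning it to a named variable does not allocate an N-element array.
template<typename L, typename R, typename Op>
struct BinExpr {
    const L& lhs;
    const R& rhs;
    Op op;
    double operator[](std::size_t i) const { return op(lhs[i], rhs[i]); }
    std::size_t size() const { return lhs.size(); }
};

// Operators build expression nodes instead of computing temporaries.
template<typename L, typename R>
auto operator-(const L& l, const R& r) {
    return BinExpr<L, R, std::minus<double>>{l, r, {}};
}

template<typename L, typename R>
auto operator*(const L& l, const R& r) {
    return BinExpr<L, R, std::multiplies<double>>{l, r, {}};
}

// Contraction over the repeated index: the only place real work happens.
// The displacement and its square live in registers, not in extra vectors,
// and this single fused loop is where a backend could add threading (e.g.,
// an OpenMP reduction) or GPU dispatch without changing the user's code.
template<typename Expr>
double dot(const Vec& k, const Expr& e) {
    double sum = 0.0;
    // #pragma omp parallel for reduction(+:sum)  // a backend decision
    for(std::size_t i = 0; i < e.size(); ++i) sum += k[i] * e[i];
    return sum;
}

int main() {
    Vec r{{1.0, 2.0, 3.0}}, r0{{0.5, 1.5, 2.5}}, k{{2.0, 2.0, 2.0}};
    auto dr  = r - r0;   // an expression, not an N-element intermediate
    auto dr2 = dr * dr;  // still an expression
    double E = 0.5 * dot(k, dr2);
    std::cout << "E = " << E << '\n';  // prints E = 0.75
    return 0;
}
Again, this is only a sketch under simplifying assumptions (rank-1 tensors, element-wise operations, a single contraction), but it illustrates why the DSL’s apparent memory and performance costs need not be its real costs.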
Why TensorWrapper?
In short, there’s no tensor library out there (that we can find) that strives to provide the full-featured DSL we want. While a number of tensor libraries have a DSL, they assume you’re willing to drop out of the DSL for the more complicated optimizations. In practice, those are precisely the optimizations we are interested in.