Tensor Train / MPS Tensors and Operators
This chapter is still work in progress. Currently missing are sections on:
- introduction with a short explanation of the TT format (both TTTensor and TTOperator)
- advanced usage:
  - lazy evaluation
  - access to nodes
  - require_correct_format()
  - sparse components
In the following, whenever something is said about the TTTensor class, it also holds true for the TTOperator class unless stated otherwise. Statements made explicitly about the TTOperator class, however, do not hold for the TTTensor class.
Construction
The basic constructors for TTTensor and TTOperator objects work analogously to the respective Tensor constructors.
// create a degree 0 TTTensor with entry 0
xerus::TTTensor();
// create a 1x1x1x1 TTOperator with entry 0
xerus::TTOperator(4);
// create a 2x3x4 TTTensor with all entries equal to 0
xerus::TTTensor({2,3,4});
// create a random TTTensor of dimensions 4x4x4 and ranks (2, 2)
xerus::TTTensor::random({4,4,4}, {2,2});
// and a random TTOperator of dimensions 5x5x5x5 and ranks (2)
xerus::TTOperator::random({5,5,5,5}, {2});
// construct an identity operator of dimensions (2x3) x (2x3)
xerus::TTOperator::identity({2,3,2,3});
// a 2x3x4 TTTensor with all entries = 1
xerus::TTTensor::ones({2,3,4});
// a 3x4x3x4 TTTensor with superdiagonal = 1 (where all 4 indices coincide) and = 0 otherwise
// note that this TTTensor is of rank r = min(dimension)
xerus::TTTensor::kronecker({3,4,3,4});
// a 2x2x2 TTTensor with a 1 in position {1,1,1} and 0 everywhere else
xerus::TTTensor::dirac({2,2,2}, {1,1,1});
# create a degree 0 TTTensor with entry 0
xerus.TTTensor()
# create a 1x1x1x1 TTOperator with entry 0
xerus.TTOperator(4)
# create a 2x3x4 TTTensor with all entries equal to 0
xerus.TTTensor([2,3,4])
# create a random TTTensor of dimensions 4x4x4 and ranks (2, 2)
xerus.TTTensor.random([4,4,4], [2,2])
# and a random TTOperator of dimensions 5x5x5x5 and ranks (2)
xerus.TTOperator.random([5,5,5,5], [2])
# construct an identity operator of dimensions (2x3) x (2x3)
xerus.TTOperator.identity([2,3,2,3])
# a 2x3x4 TTTensor with all entries = 1
xerus.TTTensor.ones([2,3,4])
# a 3x4x3x4 TTTensor with superdiagonal = 1 (where all 4 indices coincide) and = 0 otherwise
# note that this TTTensor is of rank r = min(dimension)
xerus.TTTensor.kronecker([3,4,3,4])
# a 2x2x2 TTTensor with a 1 in position {1,1,1} and 0 everywhere else
xerus.TTTensor.dirac([2,2,2], [1,1,1])
Note that all of the above constructors, except for the identity operator, work both for TTTensors and TTOperators.
If the TTTensor was derived mathematically, its individual components are likely known explicitly. In such cases the TTTensor can also be constructed component by component in xerus.
// construct a TTTensor of the correct degree
xerus::TTTensor U(3);
// and set every component individually to previously calculated Tensors
U.set_component(0, c0);
U.set_component(1, c1);
U.set_component(2, c2);
// ensure that U is canonicalized
U.canonicalize_left();
# construct a TTTensor of the correct degree
U = xerus.TTTensor(3)
# and set every component individually to previously calculated Tensors
U.set_component(0, c0)
U.set_component(1, c1)
U.set_component(2, c2)
# ensure that U is canonicalized
U.canonicalize_left()
Note two things here:
- The components passed to .set_component() have to be of dimensions $r_{i-1}\times n_i\times r_i$ (for TTTensors) respectively $r_{i-1}\times m_i\times n_i\times r_i$ (for TTOperators), as illustrated in the sketch below. Neither ranks nor external dimensions have to coincide with the respective values the TTTensor had before the call to set_component, but before any other function is used, the ranks of neighboring components have to be consistent.
- As xerus will not check the components for orthogonality, it will assume that a TTTensor constructed this way is not canonicalized (unless only the core was changed). To ensure numerical stability of the calculations you should always finish your construction with a call to .canonicalize_left() or fix the canonicalization in another way (see below).
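For illustration, a minimal sketch of components c0, c1 and c2 that would fit the example above; the external dimensions 2x3x4, the ranks (2, 2) and the random entries are arbitrary choices made here for demonstration.
// hypothetical components following the pattern r_{i-1} x n_i x r_i
// for a 2x3x4 TTTensor with ranks (2, 2); note that r_0 = r_3 = 1
xerus::Tensor c0 = xerus::Tensor::random({1, 2, 2});
xerus::Tensor c1 = xerus::Tensor::random({2, 3, 2});
xerus::Tensor c2 = xerus::Tensor::random({2, 4, 1});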
Core Position and Canonicalization
By default xerus will always construct TTTensors in which all but the first component are orthogonal (i.e. canonicalized with core position 0 / left-canonical). Unless you explicitly construct TTTensors via .set_component() or change their core position, you thus don't have to worry about this detail.
If two or more TTTensors that do not have the same core position (or of which one is not canonicalized at all) are added, contracted (with TTOperators) or multiplied entrywise, xerus will not attempt to canonicalize the result. As such calculations can become numerically very unstable, you should be very cautious when you do this.
If you want to ensure that several TTTensors are compatible, you can check .canonicalized to see if the object is canonicalized and .corePosition to see which component is not orthogonal.
Canonicalization can be achieved by a call to .move_core(), .canonicalize_left() (equivalent to .move_core(0)) or .canonicalize_right() (equivalent to .move_core(.degree()-1)). (There is also .assume_core_position() if you really know what you are doing.)
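As a minimal sketch (using a randomly constructed TTTensor as an example), checking and changing the core position could look as follows.
// a freshly constructed random TTTensor is left-canonical, i.e. its core position is 0
auto V = xerus::TTTensor::random({4,4,4}, {2,2});
XERUS_LOG(info, "canonicalized: " << V.canonicalized << ", core position: " << V.corePosition);
// move the core to the last component, making V right-canonical
V.move_core(V.degree()-1);
XERUS_LOG(info, "new core position: " << V.corePosition);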
Casting from and to Tensors
Fully contracting a TTTensor network to a single tensor is simple, both theoretically and in xerus: simply cast the TTTensor to a Tensor. The other direction is a bit more complicated. By default, xerus will calculate the exact (up to machine precision) rank-revealing decomposition of the input tensor via successive SVDs. Optional arguments can modify this behaviour though, to instead calculate a quasi-best approximation with a given rank or precision.
// construct a random tensor
auto A = xerus::Tensor::random({10,10,10,10});
// calculate a quasi-best approximation of A with ranks (8,8,8)
xerus::TTTensor ttA(A, {8,8,8});
// contract the TT network to a single tensor
xerus::Tensor A2(ttA);
// and output the error of the low-rank approximation
XERUS_LOG(info, "approximation quality: " << frob_norm(A2-A)/frob_norm(A));
# construct a random tensor
A = xerus.Tensor.random([10,10,10,10])
# calculate a quasi-best approximation of A with ranks (8,8,8)
ttA = xerus.TTTensor(A, [8,8,8])
# contract the TT network to a single tensor
A2 = xerus.Tensor(ttA)
# and output the error of the low-rank approximation
print("approximation quality:", frob_norm(A2-A)/frob_norm(A))
Ranks and Rounding
To decrease the rank of an already existing TTTensor, xerus offers the .round() method. It can be called with either a single rank that will be used as a maximum for all positions, a tuple of maximal ranks, or a precision eps that determines how small singular values need to be relative to the largest singular value to be discarded.
Determining the current ranks can be done with .ranks() or with .rank(i).
// construct a random tensor
auto A = xerus::Tensor::random({5,5,5,5});
// calculate an exact TT decomposition of A
xerus::TTTensor ttA(A);
// we expect this to be of maximal rank (5, 25, 5)
XERUS_LOG(info, "ranks: " << ttA.ranks());
// from maximal rank (25) down to 1
for (size_t r = 25; r > 0; --r) {
    // round ttA down to rank r
    ttA.round(r);
    // and determine the approximation error
    XERUS_LOG(info, "rank: " << ttA.rank(1) << " approximation error: "
        << frob_norm(xerus::Tensor(ttA)-A)/frob_norm(A));
}
# construct a random tensor
A = xerus.Tensor.random([5,5,5,5])
# calculate an exact TT decomposition of A
ttA = xerus.TTTensor(A)
# we expect this to be of maximal rank (5, 25, 5)
print("ranks:", ttA.ranks())
# from maximal rank (25) down to 1
for r in reversed(xrange(1, 26)):
    # round ttA down to rank r
    ttA.round(r)
    # and determine the approximation error
    print("rank:", ttA.rank(1), "approximation error:",
        xerus.frob_norm(xerus.Tensor(ttA)-A)/A.frob_norm())
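The other calling conventions of .round() mentioned above work analogously; the following sketch uses an arbitrarily chosen rank tuple and precision.
// construct another random TTTensor to be rounded
auto ttB = xerus::TTTensor::random({5,5,5,5}, {4,4,4});
// round down to a maximal rank tuple
ttB.round(std::vector<size_t>{3, 4, 3});
// discard all singular values smaller than 1e-8 relative to the largest one
ttB.round(1e-8);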
Rounding to a fixed rank tuple, i.e. finding a quasi-best approximation on the set of TTTensors of this smaller rank, is a projection that is often used in Riemannian algorithms and is necessary because many operations on TTTensors increase their ranks.
(For a more precise but also more time consuming alternative to .round() see the chapter on the ALS algorithm.)
Operation | Rank of the Result |
---|---|
TTTensor + TTTensor | $r_1 + r_2$ |
TTOperator * TTTensor | $r_1 \cdot r_2$ |
entrywise_product(TTTensor, TTTensor) | $r_1 \cdot r_2$ |
This change in ranks can also be used on purpose to increase the rank of a candidate solution.
// increase the ranks of ttA by 1 (with high probability)
ttA += ttA.frob_norm() * 1e-10
* xerus::TTTensor::random(ttA.dimensions, std::vector<size_t>(ttA.degree()-1,1));
# increase the ranks of ttA by 1 (with high probability)
ttA += ttA.frob_norm() * 1e-10 \
* xerus.TTTensor.random(ttA.dimensions, [1]*(ttA.degree()-1))
Note, however, that the above method will only increase the ranks with high probability and only up to the maximal possible ranks.
Finally, we want to mention the soft thresholding operator .soft_threshold(), as it can also decrease the rank of a TTTensor. Note, however, that it does not provide a quasi-best approximation.
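A minimal sketch (with an arbitrarily chosen threshold), assuming .soft_threshold() takes the threshold as its argument:
// subtract 0.1 from every singular value; singular values that would become
// negative are set to zero, which reduces the corresponding ranks
ttA.soft_threshold(0.1);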
Output and Storing
In analogy to the Tensor class, TTTensors can be queried for their degree (.degree()), their dimensions (.dimensions) and their Frobenius norm (.frob_norm()), though not for their 1-norm (as it is hard to calculate).
Of particular interest are also the above-mentioned ranks of a given TTTensor, which can be obtained with .ranks() or with .rank(i).
TTTensors themselves have no .to_string method, as it is not quite obvious what that function should do. To obtain an output similar to that of the Tensor class you can cast the object to Tensor though.
// construct a random TTTensor of rank 2
auto U = xerus::TTTensor::random({4,4,4}, 2);
// allow piping of vectors to streams
using xerus::misc::operator<<;
// query for different information:
XERUS_LOG(info, "degree: " << U.degree());
XERUS_LOG(info, "dimensions: " << U.dimensions);
XERUS_LOG(info, "frob_norm: " << U.frob_norm());
XERUS_LOG(info, "ranks: " << U.ranks());
XERUS_LOG(info, "entries:\n" << xerus::Tensor(U).to_string());
# construct a random TTTensor of rank 2
U = xerus.TTTensor.random([4,4,4], [2,2])
# query for different information:
print("degree:", U.degree())
print("dimensions:", U.dimensions)
print("frob_norm:", U.frob_norm())
print("ranks:", U.ranks())
print("entries:\n", xerus.Tensor(U).to_string())
Storing and restoring TTTensors to / from files uses the same functions as Tensors do: save_to_file() and load_from_file(). These functions perform no transformation to other formats.
// construct a random 3x3x3 TTTensor
auto A = xerus::TTTensor::random({3, 3, 3}, 2);
// store the TTTensor to the file "tensor.dat"
xerus::misc::save_to_file(A, "tensor.dat");
// load the TTTensor from the file
auto B = xerus::misc::load_from_file<xerus::TTTensor>("tensor.dat");
// the following line would throw an exception
// auto C = xerus::misc::load_from_file<xerus::Tensor>("tensor.dat");
// check for correct reconstruction
XERUS_LOG(info, "restoration error: " << frob_norm(A-B));
# construct a random 3x3x3 TTTensor
A = xerus.TTTensor.random([3, 3, 3], 2)
# store the TTTensor to the file "tensor.dat"
xerus.misc.save_to_file(A, "tensor.dat")
# load the TTTensor from the file
B = xerus.misc.load_from_file("tensor.dat")
# the type of B is determined automatically by the content of the file
print("loaded type:", type(B))
# check for correct reconstruction
print("restoration error:", xerus.frob_norm(A-B))
While operator[] cannot be used to set a single entry of a TTTensor, it does provide a way to efficiently calculate one. If you need to read many entries though, there might be more efficient means to do so. To calculate (close to) all entries you should definitely cast the network to a single Tensor. If you need many entries, but it is inefficient to calculate the contracted tensor due to its size, using a SinglePointMeasurementSet might be the most efficient way (see the corresponding chapter Tensor Completion to see how to do that).
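As a minimal sketch (with arbitrarily chosen dimensions and entry position), reading a single entry via operator[] could look like this:
// construct a random TTTensor and read one entry without contracting the whole network
auto W = xerus::TTTensor::random({4,4,4}, {2,2});
double entry = W[{1,2,3}];
XERUS_LOG(info, "entry (1,2,3): " << entry);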
Operators and Modifications
All well defined operators (+, -, scalar *, / scalar, +=, -=, /= scalar, *= scalar) are implemented for TTTensors with their canonical meaning.
In analogy to the Tensor class, the methods .fix_mode(),
.remove_slate() and .resize_mode() can
be implemented efficiently and are thus provided by xerus
. (For an example of these see the Tensor chapter.)
Similarly, the Hadamard product is implemented as entrywise_product(), though it necessarily increases the rank as stated above.
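A minimal sketch of the entrywise product (with arbitrarily chosen dimensions and ranks), illustrating the rank growth stated in the table above:
// the entrywise product of two TTTensors of ranks (2,2) has ranks of (at most) (4,4)
auto X = xerus::TTTensor::random({4,4,4}, {2,2});
auto Y = xerus::TTTensor::random({4,4,4}, {2,2});
auto Z = xerus::entrywise_product(X, Y);
// allow piping of vectors to streams
using xerus::misc::operator<<;
XERUS_LOG(info, "ranks of the entrywise product: " << Z.ranks());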
Because the indexed equations for TTTensors are somewhat limited (see below), two more functions are provided explicitly for TTTensors: .transpose(), which exchanges the first half of the modes of a TTOperator with its second half; and the dyadic product dyadic_product(), which accepts (either two or) a list of TTTensors to join in a dyadic product. The latter is useful e.g. for operators of the common form $M_1 \otimes I \otimes I + M_2 \otimes A \otimes I + M_3 \otimes I \otimes A$ as shown in the following example:
// "calculate" M_1, M_2, M3 and A
auto M1 = xerus::TTOperator::random({100,100}, {});
auto M2 = xerus::TTOperator::random({100,100}, {});
auto M3 = xerus::TTOperator::random({100,100}, {});
auto A = xerus::TTOperator::random({10,10}, {});
auto I = xerus::TTOperator::identity({10,10});
// construct total operator
auto Op = xerus::dyadic_product({M1, I, I})
+ xerus::dyadic_product({M2, A, I})
+ xerus::dyadic_product({M3, I, A});
# "calculate" M_1, M_2, M3 and A
M1 = xerus.TTOperator.random([100,100], [])
M2 = xerus.TTOperator.random([100,100], [])
M3 = xerus.TTOperator.random([100,100], [])
A = xerus.TTOperator.random([10,10], [])
I = xerus.TTOperator.identity([10,10])
# construct total operator
Op = xerus.dyadic_product([M1, I, I]) \
+ xerus.dyadic_product([M2, A, I]) \
+ xerus.dyadic_product([M3, I, A])
This is only well defined for TTOperators (and TTTensors), because the index order of the operator functionality is fixed by the format. As the Tensor class does not prescribe such an index order, the same functionality cannot be provided for it (it can be achieved there via indexed equations though).
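A minimal sketch of .transpose() (with arbitrarily chosen dimensions and rank):
// construct a random (2*3) x (2*3) TTOperator and transpose a copy of it
auto P = xerus::TTOperator::random({2,3,2,3}, {2});
auto Pt = P;
// exchange the first half of the modes (2,3) with the second half (2,3)
Pt.transpose();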
Indexed Equations
Reordering the modes of a Tensor in the TT format is highly non-trivial. As a consequence, indexed equations with tensors in the TT format are somewhat limited - if you want to stay in the TT format.
At the time of this writing (xerus version 3), xerus will only recognize that the result of an indexed equation is again in the TT format if it was obtained by summation, contraction with a TTOperator or multiplication with scalars - all without a change of index order. In any other case the result will be an object of the more general TensorNetwork class (see Tensor Networks).
// declare the indices used in the following equations
xerus::Index i, j, k;
auto U = xerus::TTTensor::random({10,10,10}, {2,2});
// fast index reshuffle
xerus::TensorNetwork A;
A(i,j,k) = U(k,j,i);
// A is not stored as a TTTensor even though it would still be
// one from a mathematical point of view.
// slow index reshuffle
xerus::TTTensor B;
B(i,j,k) = U(k,j,i);
// xerus does not recognize the similarity in the networks and will thus
// fully contract U to a single tensor and then use SVDs to construct B
# declare the indices used in the following equations
i,j,k = xerus.indices(3)
U = xerus.TTTensor.random([10,10,10], [2,2])
# fast index reshuffle
A = xerus.TensorNetwork()
A(i,j,k) << U(k,j,i)
# A is not stored as a TTTensor even though it would still be
# one from a mathematical point of view.
# slow index reshuffle
B = xerus.TTTensor()
B(i,j,k) << U(k,j,i)
# xerus does not recognize the similarity in the networks and will thus
# fully contract U to a single tensor and then use SVDs to construct B
Which method you should choose depends on your use case. If you only want to contract other tensors to it, it is absolutely fine to use A. As a general TensorNetwork it does not define a summation though, and calculating its Frobenius norm will be much slower than for U (for more details see Tensor Networks). Calculating B was relatively slow, but as a one-time cost it might be acceptable in many use cases.
The ideal solution performance-wise would furthermore have been to construct B via .set_component() from the components of U (obtained with .get_component()), though that in turn would have been more work to write and might be less understandable to read (a sketch of this approach follows below).
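A minimal sketch of this component-wise construction for the reversal B(i,j,k) = U(k,j,i), reusing U from the C++ example above (this sketch is not part of the original documentation):
// reversing the mode order of the full tensor reverses the order of the components
// and the order of the modes within each component
xerus::Index a, n, b;
xerus::TTTensor B2(3);
for (size_t pos = 0; pos < 3; ++pos) {
    xerus::Tensor comp;
    comp(a, n, b) = U.get_component(2 - pos)(b, n, a);
    B2.set_component(pos, comp);
}
B2.canonicalize_left();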
Typical, well-defined indexed equations that TTTensors and TTOperators were built for look as follows:
// well defined and efficient indexed equation with TTTensors and TTOperators:
// x = x + \alpha * A^T(Ax - b)
ttx(i&0) = ttx(i&0) + alpha * ttA(j/2, i/2) * ( ttA(j/2, k/2) * ttx(k&0) - ttb(j&0) );
# well defined and efficient indexed equation with TTTensors and TTOperators:
# x = x + \alpha * A^T(Ax - b)
ttx(i&0) << ttx(i&0) + alpha * ttA(j/2, i/2) * ( ttA(j/2, k/2) * ttx(k&0) - ttb(j&0) )