a companion to matrix.py: that file contains implementations, while this one collects explanations of facts for use as tools
(not quite modelled after a textbook; if an explanation of something is good, i will link it rather than attempting to reproduce it here)
though whether to store matrices as lists of column vectors or lists of row covectors is a matter of convention
however, note that elements of vectors/matrices are 0-indexed unless specified otherwise
disclaimer: at time of writing, due to the way my uni operates i have only taken a first-year linalg course (which was Not Great and not even specific to maths students); the 'further linear algebra' one is delivered to two year groups at once, once per two years, and i'm writing this just on the cusp before starting it (so expect to see it changed)
in inverting a triangular matrix, each element of the output depends only on the input's submatrix spanning from that position to the diagonal; see this post on how to get a single element of such an inverse in terms of a determinant (warning: computing the det this way is not meaningfully faster than computing the inv, nor generally useful)
this means truncating a set of infinite triangulars to their top-left \(n\times n\) submatrices is a ring homomorphism
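a quick numpy check of the homomorphism property (sizes and entries arbitrary; `trunc` is just a name for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def trunc(m, n):
    """top-left n-by-n submatrix"""
    return m[:n, :n]

# two lower-triangular matrices standing in for truncations of infinite ones
a = np.tril(rng.integers(1, 5, (6, 6)))
b = np.tril(rng.integers(1, 5, (6, 6)))

n = 3
# truncation commutes with the ring operations
assert (trunc(a + b, n) == trunc(a, n) + trunc(b, n)).all()
assert (trunc(a @ b, n) == trunc(a, n) @ trunc(b, n)).all()
```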
to be written
for a matrix \(M\) with an eigenvector \(\vec v\) of eigenvalue \(\lambda\), the map of multiplying from the right by \(\vec v\) can be considered a ring homomorphism \(\mathbb C[M]\to\mathbb C[\lambda]\)
in particular, this means we have (for any polynom) the eigenvalue transitivity rule that eigenvals(polynom(mat)) = map(polynom,eigenvals(mat))
(more generally this holds for univariate functions defined by power series; they converge on the matrix iff every eigenvalue is within their radius of convergence)
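eg. a numpy check with an arbitrary small matrix and polynomial:

```python
import numpy as np

m = np.array([[0., 1.], [2., 1.]])   # eigenvalues 2 and -1

def polynom(x):
    """x^3 - 2x^2 + 3, on a scalar or (via matrix powers) a square matrix."""
    if np.ndim(x) == 2:
        return x @ x @ x - 2 * (x @ x) + 3 * np.eye(len(x))
    return x**3 - 2 * x**2 + 3

lhs = np.sort(np.linalg.eigvals(polynom(m)))
rhs = np.sort(polynom(np.linalg.eigvals(m)))
assert np.allclose(lhs, rhs)   # {0, 3} both ways
```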
recall that (as detailed in
the outer product of two vectors \(w:=\vec v\otimes\vec u\) is the \(\mathrm{dim}(\vec v)\cdot\mathrm{dim}(\vec u)\)-dimensional vector whose element \(w_{i,j}\) is \(v_iu_j\) (it is not a matrix; one could listify its dimensions by writing \(w_{i,j}\) as \(w_{\mathrm{dim}(\vec u)\cdot i+j}\); ie. we replace each element of \(v\) with its own \(u\) scaled by the original value)
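numpy's kron applied to 1-d arrays performs exactly this listification:

```python
import numpy as np

v = np.array([1, 2, 3])
u = np.array([10, 20])

# w_{i,j} = v_i * u_j, listified as w_{dim(u)*i + j}
w = np.array([v[i] * u[j] for i in range(len(v)) for j in range(len(u))])
assert (w == np.kron(v, u)).all()
print(w)  # [10 20 20 40 30 60]
```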
in particular, if \(M\) is a matrix acting on vectors \(\vec v\) in a vector space \(V\) and \(N\) a matrix acting on \(\vec u\in U\), we define the matrix \(M\otimes N\) so that \((M\otimes N)(\vec v\otimes \vec u)=(M\vec v\otimes N\vec u)\); if we listify the vector space (and likewise replace each element of \(M\) with its own \(N\) scaled by the original value), this is called the Kronecker product of \(M\) and \(N\).
in particular, each eigenvec \(\vec v\) of \(M\) with eigenval \(\lambda_v\) outer-producted with an eigenvec \(\vec u\) of \(N\) with eigenval \(\lambda_u\) is an eigenvector of \(M\otimes N\) with eigenval \(\lambda_v\lambda_u\)
so, to take the minpoly of the product of two algebraic numbers, you can take the charpoly of the Kronecker product of their minpolies' companion matrices
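eg. for \(\sqrt2\cdot\sqrt3=\sqrt6\) (a numpy sketch; note the charpoly comes out as a power of the minpoly, here \((x^2-6)^2\), so one would still take the squarefree/irreducible factor):

```python
import numpy as np

c2 = np.array([[0., 2.], [1., 0.]])  # companion matrix of x^2 - 2 (roots +-sqrt2)
c3 = np.array([[0., 3.], [1., 0.]])  # companion matrix of x^2 - 3 (roots +-sqrt3)
k = np.kron(c2, c3)                  # eigenvalues (+-sqrt2)(+-sqrt3) = +-sqrt6
charpoly = np.poly(k)                # coefficients, leading term first
assert np.allclose(charpoly, [1, 0, -12, 0, 36])  # (x^2 - 6)^2
```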
via the eigenval trans. rule, the eigenvalues of \(I+M\) are 1 plus the eigenvals of \(M\); more generally, we can make the \(I\) in one of the outer-multiplicand vector spaces do something else in the other space and still have this hold, ie. \(M\otimes I_U+I_V\otimes N\) has eigenvals equal to sums of those of \(M,N\) (or linear combinations by weighting either side)
since we already know how to reciprocate an algebraic (by reversing its minpoly), this completes the proof that algebraics form a field
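eg. \(\sqrt2+\sqrt3\), whose minpoly \(x^4-10x^2+1\) pops out directly:

```python
import numpy as np

c2 = np.array([[0., 2.], [1., 0.]])    # companion of x^2 - 2
c3 = np.array([[0., 3.], [1., 0.]])    # companion of x^2 - 3
i2 = np.eye(2)
s = np.kron(c2, i2) + np.kron(i2, c3)  # eigenvalues +-sqrt2 +-sqrt3
assert np.allclose(np.poly(s), [1, 0, -10, 0, 1])  # x^4 - 10x^2 + 1
```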
we consider (not necessarily finite) state machines, where \(M_{ij}\) is the 'amount' (quantity, probability, etc.) of a Creature in \(j\) that moves to \(i\) (so we multiply the matrix with a vector of creatures to iterate a rule which moves/splits/reproduces them into new locations)
if all columns sum to 1, the Creature occupies a probability distribution of locations, while if they have different sums, it grows/reproduces at different rates in different places (ie. if matrix elements count multiplicities of edges between nodes, quantities of Creatures count paths by endpoint; see
in the probability distribution case, there are some cool things one can do; in particular, there is a very nice result allowing the computation of expected waiting time for the Creature to enter a particular state
if a matrix is partitionable into blocks to be of the form \(M=\begin{array}{|c|c|}\hline Q&S\\\hline 0&R\\\hline\end{array}\) (ie. once a Creature has undergone a transition in \(S\) to enter the (square) \(Q\) block, there is no escape; if there is you must pretend there is not), then, taking \(Q=1\) so that each absorbing state simply holds its Creature (where \(1\) is the identity matrix), \(M^n=\begin{array}{|c|c|}\hline 1&S{1-R^n\over1-R}\\\hline 0&R^n\\\hline\end{array}\). since each column sums to 1 and every transient state can eventually reach \(Q\), \(\lim\limits_{n\to\infty}R^n=0\), so \(\lim\limits_{n\to\infty}M^n=\begin{array}{|c|c|}\hline 1&{S\over1-R}\\\hline 0&0\\\hline\end{array}\)
the cool result is that \(\left(1\over1-R\right)_{ij}\) is the expected time spent in state \(i\) after starting in state \(j\), so the \(j\)th column sum is the total expected time to get from \(j\) to \(Q\).
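a numpy sketch on an assumed toy chain (symmetric gambler's ruin on \(\{0,1,2,3\}\), absorbing at both ends, where the expected absorption time from state \(k\) is known to be \(k(3-k)=2\)):

```python
import numpy as np

# R is the transient block (states 1 and 2), columns indexed by source
# state as elsewhere in these notes
r = np.array([[0.0, 0.5],
              [0.5, 0.0]])
n = np.linalg.inv(np.eye(2) - r)  # the fundamental matrix 1/(1-R)
print(n.sum(axis=0))              # [2. 2.]: expected steps to absorption
```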
in general this result allows one to compute waiting times efficiently, like
say you have \(j\) coins which each have a probability \(p\) of landing death (whereupon they die; write \(q:=1-p\) for the survival chance), and you flip all survivors each turn.
where states represent quantities of coins alive, the chance of \(j\) transitioning to \(i\) is \(M_{ij}=\binom jip^{j-i}q^i\), so \(M^n_{ij}=\binom jiq^{ni}(1-q^n)^{j-i}\), so \(\left(\frac1{1-M}\right)_{ij}=\binom ji\sum_{n=0}^\infty q^{ni}(1-q^n)^{j-i}\), and the expected number of turns until termination (starting from \(j\) coins) is thus \(\sum_{n=0}^\infty\sum_{i=1}^j\binom jiq^{ni}(1-q^n)^{j-i}=\sum_{n=0}^\infty\left(1-(1-q^n)^j\right)=\sum_{i=1}^j\binom ji{(-1)^{i-1}\over1-q^i}\).
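a numpy check that the fundamental-matrix column sum agrees with the closed form (taking \(p=\frac12\), \(j=3\), where the answer comes out to \(\frac{22}7\)):

```python
import numpy as np
from math import comb

p = 0.5; q = 1 - p; j = 3           # death chance p, survival chance q, j coins
# transient block over states 1..j (coins still alive), columns = source state
r = np.array([[comb(src, tgt) * p**(src - tgt) * q**tgt if tgt <= src else 0.0
               for src in range(1, j + 1)] for tgt in range(1, j + 1)])
n = np.linalg.inv(np.eye(j) - r)    # fundamental matrix 1/(1-R)
waiting = n[:, j - 1].sum()         # expected turns starting from j live coins
closed = sum(comb(j, i) * (-1)**(i - 1) / (1 - q**i) for i in range(1, j + 1))
assert np.isclose(waiting, closed) and np.isclose(closed, 22 / 7)
```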
houses A and B contain \(k\) and \(n-k\) fleas respectively. each turn, a random flea jumps from its house to the opposite one.
(this is sometimes equivalently formulated via random walks between vertices of a hypercube, where each flea represents a dimension)
so our Markov matrix is \(M_{ij}=\begin{cases}\frac jn&i=j-1\\1-\frac jn&i=j+1\\0&\text{else}\end{cases}\)
here we instead use the fundamental theorem of ergodic Markov chains; if a chain is ergodic (for any pair of states \(i,j\), the chance of hitting \(j\) starting at \(i\) converges to \(1\) as # iterations \(\to\infty\)), it has a unique stationary distribution with eigenvalue 1, and the expected waiting time to return to a starting position is the reciprocal of the proportion of this distribution that it occupies
here we can verify that the stationary distribution has \(\binom nk\over2^n\) in state \(k\), since \(\binom nk=\frac{n-k+1}n\binom n{k-1}+\frac{k+1}n\binom n{k+1}\); so the waiting time to return to state \(k\) is \(2^n\over\binom nk\), and its average over a uniformly random \(k\) (via \(\frac1{(n+1)\binom nk}=[x^{n+1}](x-1)^{n-k}\log\frac1{1-x}\)) becomes \(2^n[x^{n+1}]{1-(x-1)^{n+1}\over1-(x-1)}\log\frac1{1-x}=\sum_{k=1}^{n+1}{2^{k-1}\over k}\)
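a numpy check of both the stationarity and the averaged-return-time identity (for an arbitrary \(n=5\)):

```python
import numpy as np
from math import comb

n = 5
# m[i][j]: probability that j fleas in house A becomes i fleas after one jump
m = np.zeros((n + 1, n + 1))
for j in range(n + 1):
    if j > 0:
        m[j - 1, j] = j / n
    if j < n:
        m[j + 1, j] = 1 - j / n
pi = np.array([comb(n, k) / 2**n for k in range(n + 1)])
assert np.allclose(m @ pi, pi)      # binom(n,k)/2^n is indeed stationary
avg = (1 / pi).mean()               # return times 1/pi_k, averaged over k
assert np.isclose(avg, sum(2**(k - 1) / k for k in range(1, n + 2)))
```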
(in its own section because it pertains to the two preceding ones)
there are a few nice facts that hold whenever a matrix is symmetric
two important cases in which matrices are symmetric are adjacency matrices of undirected graphs and quadratic forms
a QF is a matrix \(Q\) that acts on vectors by \(Q(\vec x)=\vec x^TQ\vec x=\sum_{i,j\in[0..n)}Q_{ij}x_ix_j\)
in particular \(Q\)'s action is invariant under transferring some 'mass' from \(Q_{ij}\) to \(Q_{ji}\), so one can canonicalise it to either a triangular or symmetric matrix
so any QF \(Q\), written in symmetric form, can be diagonalised (by the spectral theorem) as \(D:=R^{-1}QR\) with \(R\) orthogonal, so \(Q=RDR^{-1}\); since \(R\) is orthogonal, \(R^{-1}=R^T\)
then when this diagonalised version acts on the vector \(\vec x\) it can be written as \(\vec x^TD\vec x=\vec x^TR^TQR\vec x=(R\vec x)^TQ(R\vec x)\)
ie. \(Q\) acts as an eigenvalue-weighted sum of squares of coordinates in the orthonormal basis of \(\mathbb R^n\) given by the columns of \(R\)
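a numpy sketch of the canonicalisation and diagonalisation (arbitrary random form):

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.standard_normal((3, 3))   # an arbitrary (asymmetric) quadratic form
q = (a + a.T) / 2                 # canonicalise to symmetric form
x = rng.standard_normal(3)
assert np.isclose(x @ a @ x, x @ q @ x)  # symmetrising preserves the action

lam, r = np.linalg.eigh(q)        # q = r @ diag(lam) @ r.T, with r orthogonal
y = r.T @ x                       # coordinates of x in the basis of r's columns
assert np.isclose(x @ q @ x, (lam * y**2).sum())  # weighted sum of squares
```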
this gives us a cool fact; if we consider a quadratic form's algebraic curve \(Q(\vec x)=c\) to have an associated function \(f\) from \(\mathbb S^{n-1}\) (the surface of an infinitesimal eyeball at the origin) to \(\mathbb R\), which gives the squared distance of roots in the ray outwards from each point on the eye, then (assuming nondegeneracy, ie. that \(Q\) is of full rank)
two other things about quadratic forms
i have a somewhat nice writeup for this residing in
for instance, the all-1s sequence's autodirichonvolution is
>>> print(stratrix(matpow(tap(lambda i: tap(lambda j: j%i==0,range(1,13)),range(1,13)),2)))
(1,2,2,3,2,4,2,4,3,4,2,6,
,1, ,2, ,2, ,3, ,2, ,4,
, ,1, , ,2, , ,2, , ,3,
, , ,1, , , ,2, , , ,2,
, , , ,1, , , , ,2, , ,
, , , , ,1, , , , , ,2,
, , , , , ,1, , , , , ,
, , , , , , ,1, , , , ,
, , , , , , , ,1, , , ,
, , , , , , , , ,1, , ,
, , , , , , , , , ,1, ,
, , , , , , , , , , ,1)
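for reference, the same computation with plain numpy instead of matrix.py's helpers:

```python
import numpy as np

n = 12
# divisibility matrix over 1..n: a[i][j] = 1 iff (i+1) divides (j+1)
a = np.array([[(j + 1) % (i + 1) == 0 for j in range(n)]
              for i in range(n)], dtype=int)
sq = a @ a          # counts chains i | k | j
print(sq[0])        # [1 2 2 3 2 4 2 4 3 4 2 6], the divisor-count function
```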
although this is not very useful to implement; even if you have a perfect sparse matrix handler, the # of nonzero entries is \(\sum_{k=1}^n\lfloor n/k\rfloor\approx n\ln n\)
you can represent operations on the vector space of power series as matrices, with Stirling numbers being the change-of-basis coefficients between ordinary and rising/falling powers; then rewrite the resulting sum formula for each term of the matrix via bivariate generating functions and annihilate the coefficient extractors; this is discussed in
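the change-of-basis matrices themselves are easy to build from the Stirling recurrences and check against each other (a numpy sketch):

```python
import numpy as np

n = 7
s2 = np.zeros((n, n), dtype=int)  # 2nd kind: x^i = sum_k s2[i,k] x^(k falling)
s1 = np.zeros((n, n), dtype=int)  # signed 1st kind: x^(i falling) = sum_k s1[i,k] x^k
s2[0, 0] = s1[0, 0] = 1
for i in range(1, n):
    for k in range(1, i + 1):
        s2[i, k] = s2[i - 1, k - 1] + k * s2[i - 1, k]
        s1[i, k] = s1[i - 1, k - 1] - (i - 1) * s1[i - 1, k]
# the two changes of basis are mutually inverse
assert (s2 @ s1 == np.eye(n, dtype=int)).all()
```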
see also the matrix cookbook (and its adaptation into the tensor cookbook)