The Matrix-Vector Product and the SVD
Fri, 17 Feb 2023
Computer Science, Data Science, Machine Learning, Mathematics, Statistics
The singular value decomposition (SVD) allows one to re-write a given matrix as a sum of rank-one matrices. Specifically, using the SVD, one may re-write a given matrix \(\mathbf{A}\) as follows:
\[\displaylines{\mathbf{A}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T = \sum\limits_{i=1}^{n}{\mathbf{U}_i\mathbf{\Sigma}_{ii}\mathbf{V}_i^T} }\ ,\]
where \(\mathbf{V}_i^T\) is the transpose of the \(i\)-th column of \(\mathbf{V}\). Further, the Eckart-Young-Mirsky theorem proves that the best rank-\(k\) approximation to the matrix \(\mathbf{A}\) is found by summing only the first \(k\) terms of the right-hand sum.
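As a quick sanity check, here is a minimal NumPy sketch (the matrix size and random seed are arbitrary choices for illustration) that rebuilds \(\mathbf{A}\) from its rank-one terms and confirms that truncating the sum after \(k\) terms leaves an error governed by the discarded singular values:

```python
import numpy as np

# Arbitrary example matrix, tall so that m >= n.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Reduced SVD: U is 6x4, s holds the singular values, Vt holds the rows V_i^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one matrices U_i * sigma_i * V_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Truncating after k terms gives the best rank-k approximation (Eckart-Young-Mirsky);
# its Frobenius error equals the root-sum-square of the discarded singular values.
k = 2
A_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.linalg.norm(A - A_k))       # Frobenius error of the rank-k approximation
print(np.sqrt(np.sum(s[k:] ** 2)))   # matches the discarded singular values
```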
Properties of the SVD state that the columns of the matrix \(\mathbf{U}\) contain the eigenvectors of the matrix \(\mathbf{A}\mathbf{A}^T\). Further, the columns of the matrix \(\mathbf{V}\) contain the eigenvectors of the matrix \(\mathbf{A}^T\mathbf{A}\). Finally, the non-zero elements of the diagonal matrix \(\mathbf{\Sigma}\) are the square roots of the eigenvalues of \(\mathbf{A}^T\mathbf{A}\) or \(\mathbf{A}\mathbf{A}^T\).
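These properties are straightforward to verify numerically; the following sketch (again with an arbitrary random matrix) checks each one against NumPy's eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U are eigenvectors of A A^T: (A A^T) U_i = sigma_i^2 U_i.
for i in range(len(s)):
    assert np.allclose(A @ A.T @ U[:, i], s[i] ** 2 * U[:, i])

# Columns of V (rows of Vt) are eigenvectors of A^T A: (A^T A) V_i = sigma_i^2 V_i.
for i in range(len(s)):
    assert np.allclose(A.T @ A @ Vt[i, :], s[i] ** 2 * Vt[i, :])

# The singular values are the square roots of those eigenvalues.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
assert np.allclose(np.sqrt(np.clip(eigvals, 0, None)), s)
```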
Putting all these together provides an interesting insight into the behavior of the standard matrix-vector product. Specifically, consider an arbitrary vector \(\mathbf{b}\) and the matrix-vector product \(\mathbf{A}\mathbf{b}\). If \(\mathbf{A}\) is an \(m \times n\) matrix and \(\mathbf{b}\) is an \(n \times 1\) vector, then the resulting product is a linear combination of the columns of \(\mathbf{A}\) and hence an \(m \times 1\) vector.
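For concreteness, a short sketch of this column-combination view, using an arbitrary tall matrix so that \(m \ge n\):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))   # m x n
b = rng.standard_normal(4)        # n x 1

# A b is a linear combination of the columns of A, weighted by the entries of b.
combo = sum(b[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ b, combo)
```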
Another perspective on this product is obtained by considering the rank-one decomposition. The matrix-vector product may be re-written as:
\[\displaylines{\mathbf{A}\mathbf{b}= \sum\limits_{i=1}^{n}{\mathbf{U}_i\mathbf{\Sigma}_{ii}\mathbf{V}_i^T\mathbf{b}} }\ .\]
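The equivalence of the two forms can be checked directly; a minimal sketch, assuming the reduced SVD so that the sum runs over \(n = \min(m, n)\) terms:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each term projects b onto the i-th right singular vector (V_i^T b, a scalar),
# scales it by sigma_i, and uses it as the coefficient of the i-th left singular vector.
Ab = sum(U[:, i] * s[i] * (Vt[i, :] @ b) for i in range(len(s)))
assert np.allclose(A @ b, Ab)
```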
Note that \(\mathbf{V}_i^T\mathbf{b}\) is the product of a \(1 \times n\) vector with an \(n \times 1\) vector and so is equivalent to the standard dot product. In addition, \(\mathbf{\Sigma}_{ii}\) is the \(i\)-th singular value and so is also a scalar. Since the columns of \(\mathbf{V}\) are already unit vectors, if the vector \(\mathbf{b}\) is also normalized then this dot product is exactly the cosine similarity between \(\mathbf{b}\) and \(\mathbf{V}_i\).
This product may be seen as a measurement of the extent to which \(\mathbf{b}\) and \(\mathbf{V}_i\) point in the same direction. Specifically, the cosine similarity ranges from 1 to -1 for vectors pointing in the same to opposite directions, respectively. Next, this product is scaled in proportion to the variance explained by the \(i\)-th component by multiplying by the \(i\)-th singular value. The resulting scalar is then multiplied by the \(i\)-th eigenvector of \(\mathbf{A}\mathbf{A}^T\), and finally the resulting vectors are summed together.
In this way, the matrix-vector product \(\mathbf{A}\mathbf{b}\) is written as a linear combination of the eigenvectors of \(\mathbf{A}\mathbf{A}^T\). Specifically, the more that \(\mathbf{b}\) points in the direction of \(\mathbf{V}_i\), the more the resulting sum moves in the direction of the \(i\)-th left singular vector \(\mathbf{U}_i\), scaled in proportion to the explained variance.
One can imagine beginning at the origin and then taking \(n\) steps. At each step, the dot product of \(\mathbf{b}\) with the \(i\)-th right singular vector, together with the \(i\)-th singular value, determines how far to move in (or against) the direction of the \(i\)-th left singular vector. After \(n\) steps one arrives at the vector \(\mathbf{A}\mathbf{b}\).
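That step-by-step picture can be made literal; the sketch below (with arbitrary example data) walks from the origin one singular triplet at a time and reports the remaining distance to \(\mathbf{A}\mathbf{b}\):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Start at the origin and take one step per singular triplet: the dot product
# V_i^T b and the singular value sigma_i set the step length (and sign) along U_i.
position = np.zeros(A.shape[0])
for i in range(len(s)):
    step = s[i] * (Vt[i, :] @ b) * U[:, i]
    position += step
    print(f"after step {i + 1}: distance to Ab = {np.linalg.norm(A @ b - position):.6f}")

assert np.allclose(position, A @ b)
```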