The Matrix-Vector Product and the SVD
Fri, 17 Feb 2023
Computer Science, Data Science, Machine Learning, Mathematics, Statistics
The singular value decomposition (SVD) allows one to re-write a given matrix as a sum of rank-one matrices. Specifically, using the SVD, one may re-write a given matrix \(\mathbf{A}\) as follows:
\[\displaylines{\mathbf{A}=\mathbf{U}\mathbf{\Sigma}\mathbf{V}^T = \sum\limits_{i=1}^{n}{\mathbf{U}_i\mathbf{\Sigma}_{ii}\mathbf{V}_i^T} }\ ,\]
where \(\mathbf{V}_i^T\) is the transpose of the \(i\)-th column of \(\mathbf{V}\). Further, the Eckart-Young-Mirsky theorem proves that the best rank-\(k\) approximation to the matrix \(\mathbf{A}\) is found by summing only the first \(k\) terms of the right-hand sum.
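As a quick sanity check, here is a minimal NumPy sketch (the matrix size and random seed are arbitrary choices for illustration) that rebuilds \(\mathbf{A}\) from its rank-one terms and confirms that truncating the sum after \(k\) terms leaves an error governed by the discarded singular values:

```python
import numpy as np

# Arbitrary example matrix, tall so that m >= n.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# Reduced SVD: U is 6x4, s holds the singular values, Vt holds the rows V_i^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rebuild A as a sum of rank-one matrices U_i * sigma_i * V_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A, A_rebuilt)

# Truncating after k terms gives the best rank-k approximation (Eckart-Young-Mirsky);
# its Frobenius error equals the root-sum-square of the discarded singular values.
k = 2
A_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.linalg.norm(A - A_k))       # Frobenius error of the rank-k approximation
print(np.sqrt(np.sum(s[k:] ** 2)))   # matches the discarded singular values
```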
Properties of the SVD state that the columns of the matrix \(\mathbf{U}\) contain the eigenvectors of the matrix \(\mathbf{A}\mathbf{A}^T\). Further, the columns of the matrix \(\mathbf{V}\) contain the eigenvectors of the matrix \(\mathbf{A}^T\mathbf{A}\). Finally, the non-zero elements of the diagonal matrix \(\mathbf{\Sigma}\) are the square roots of the eigenvalues of \(\mathbf{A}^T\mathbf{A}\) or \(\mathbf{A}\mathbf{A}^T\).
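These properties are straightforward to verify numerically; the following sketch (again with an arbitrary random matrix) checks each one against NumPy's eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Columns of U are eigenvectors of A A^T: (A A^T) U_i = sigma_i^2 U_i.
for i in range(len(s)):
    assert np.allclose(A @ A.T @ U[:, i], s[i] ** 2 * U[:, i])

# Columns of V (rows of Vt) are eigenvectors of A^T A: (A^T A) V_i = sigma_i^2 V_i.
for i in range(len(s)):
    assert np.allclose(A.T @ A @ Vt[i, :], s[i] ** 2 * Vt[i, :])

# The singular values are the square roots of those eigenvalues.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]   # descending order
assert np.allclose(np.sqrt(np.clip(eigvals, 0, None)), s)
```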
Putting all these together provides an interesting insight into the behavior of the standard matrix-vector product. Specifically, consider an arbitrary vector \(\mathbf{b}\) and the matrix-vector product \(\mathbf{A}\mathbf{b}\). If \(\mathbf{A}\) is an \(m \times n\) matrix and \(\mathbf{b}\) is an \(n \times 1\) vector, then the resulting product is a linear combination of the columns of \(\mathbf{A}\) and hence an \(m \times 1\) vector.
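For concreteness, a short sketch of this column-combination view, using an arbitrary tall matrix so that \(m \ge n\):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))   # m x n
b = rng.standard_normal(4)        # n x 1

# A b is a linear combination of the columns of A, weighted by the entries of b.
combo = sum(b[j] * A[:, j] for j in range(A.shape[1]))
assert np.allclose(A @ b, combo)
```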
Another perspective on this product is obtained by considering the rank-one decomposition. The matrix-vector product may be re-written as:
\[\displaylines{\mathbf{A}\mathbf{b}= \sum\limits_{i=1}^{n}{\mathbf{U}_i\mathbf{\Sigma}_{ii}\mathbf{V}_i^T\mathbf{b}} }\ .\]
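The equivalence of the two forms can be checked directly; a minimal sketch, assuming the reduced SVD so that the sum runs over \(n = \min(m, n)\) terms:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Each term projects b onto the i-th right singular vector (V_i^T b, a scalar),
# scales it by sigma_i, and uses it as the coefficient of the i-th left singular vector.
Ab = sum(U[:, i] * s[i] * (Vt[i, :] @ b) for i in range(len(s)))
assert np.allclose(A @ b, Ab)
```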
Note that \(\mathbf{V}_i^T\mathbf{b}\) is the product of a \(1 \times n\) vector with an \(n \times 1\) vector and so is equivalent to the standard dot product. In addition, \(\mathbf{\Sigma}_{ii}\) is the \(i\)-th singular value and so is also a scalar. Since the columns of \(\mathbf{V}\) are already unit vectors, if the vector \(\mathbf{b}\) is also normalized then this dot product is exactly the cosine similarity between \(\mathbf{b}\) and \(\mathbf{V}_i\).
This product may be seen as a measurement of the extent to which \(\mathbf{b}\) and \(\mathbf{V}_i\) point in the same direction. Specifically, the cosine similarity ranges from 1 to -1 for vectors pointing in the same to opposite directions, respectively. Next, this product is scaled in proportion to the variance explained by the \(i\)-th component by multiplying by the \(i\)-th singular value. The resulting scalar is then multiplied by the \(i\)-th eigenvector of \(\mathbf{A}\mathbf{A}^T\), and finally the resulting vectors are summed together.
In this way, the matrix-vector product \(\mathbf{A}\mathbf{b}\) is written as a linear combination of the eigenvectors of \(\mathbf{A}\mathbf{A}^T\). Specifically, the more that \(\mathbf{b}\) points in the direction of \(\mathbf{V}_i\), the more the resulting sum moves in the direction of the \(i\)-th left singular vector \(\mathbf{U}_i\), scaled in proportion to the explained variance.
One can imagine beginning at the origin and then taking \(n\) steps. At each step, the dot product of \(\mathbf{b}\) with the \(i\)-th right singular vector, together with the \(i\)-th singular value, determines how far to move in (or against) the direction of the \(i\)-th left singular vector. After \(n\) steps one arrives at the vector \(\mathbf{A}\mathbf{b}\).
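That step-by-step picture can be made literal; the sketch below (with arbitrary example data) walks from the origin one singular triplet at a time and reports the remaining distance to \(\mathbf{A}\mathbf{b}\):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((6, 4))
b = rng.standard_normal(4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Start at the origin and take one step per singular triplet: the dot product
# V_i^T b and the singular value sigma_i set the step length (and sign) along U_i.
position = np.zeros(A.shape[0])
for i in range(len(s)):
    step = s[i] * (Vt[i, :] @ b) * U[:, i]
    position += step
    print(f"after step {i + 1}: distance to Ab = {np.linalg.norm(A @ b - position):.6f}")

assert np.allclose(position, A @ b)
```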