\[\displaylines{\mathbf{\Sigma} = ({\mathbf{X}-\mathbf{U}})^T\mathbf{W}({\mathbf{X}-\mathbf{U}})}\ .\]
Again, this definition is problematic for sparse matrices: centering by subtracting the column means \(\mathbf{U}\) destroys the sparsity of the matrix. For this reason, the SVD of the uncentered matrix \(\mathbf{X}\) is often employed as an alternative (e.g. TruncatedSVD). Unfortunately, the resulting components may be dominated by the column means and produce misleading results. However, the weighted covariance can be expanded so that explicit centering is never required:\[\displaylines{\mathbf{\Sigma} = ({\mathbf{X}-\mathbf{U}})^T\mathbf{W}({\mathbf{X}-\mathbf{U}})}\]
\[\displaylines{= {(\mathbf{W}^{1/2}(\mathbf{X}-\mathbf{U}))}^T{(\mathbf{W}^{1/2}(\mathbf{X}-\mathbf{U}))}}\]
\[\displaylines{= {(\mathbf{W}^{1/2}\mathbf{X}-\mathbf{W}^{1/2}\mathbf{U})}^T{(\mathbf{W}^{1/2}\mathbf{X}-\mathbf{W}^{1/2}\mathbf{U})}}\]
\[\displaylines{= {(\mathbf{W}^{1/2}\mathbf{X})^T(\mathbf{W}^{1/2}\mathbf{X})-(\mathbf{W}^{1/2}\mathbf{X})^T(\mathbf{W}^{1/2}\mathbf{U})-(\mathbf{W}^{1/2}\mathbf{U})^T(\mathbf{W}^{1/2}\mathbf{X})+(\mathbf{W}^{1/2}\mathbf{U})^T(\mathbf{W}^{1/2}\mathbf{U})}}\]
\[\displaylines{= {\mathbf{X}^T\mathbf{W}\mathbf{X}-\mathbf{X}^T\mathbf{W}\mathbf{U}-\mathbf{U}^T\mathbf{W}\mathbf{X}+\mathbf{U}^T\mathbf{W}\mathbf{U}}}\ .\]
Now, each of the four terms in the equation is an \(n \times n\) matrix that can be computed without the memory-intensive centering operation. However, this formula can be simplified further by noting,\[\displaylines{\mathbf{X}^T\mathbf{W}\mathbf{U} = \mathbf{X}^T\mathbf{W}^T\mathbf{U} = (\mathbf{W}\mathbf{X})^T\mathbf{U}=(\mathbf{U}^T(\mathbf{W}\mathbf{X}))^T }\ .\]
Further, since the sample weights are non-negative and normalized so that \(\mathbf{Tr}[{\mathbf{W}}]=1\), and each column of \(\mathbf{U}\) holds the corresponding weighted column mean of \(\mathbf{X}\) repeated \(m\) times (equivalently, every row of \(\mathbf{U}\) equals \(\mathbf{\mu}^T\)), the expression \((\mathbf{W}\mathbf{X})^T\mathbf{U}\) is equivalent to the outer product of the weighted column means: \(\mathbf{\mu}\mathbf{\mu}^T\). Since this outer product is symmetric, the other cross term \(\mathbf{U}^T\mathbf{W}\mathbf{X}\) is the same matrix, and the formula reduces to:\[\displaylines{= ({\mathbf{X}^T\mathbf{W}\mathbf{X}-2\mathbf{\mu}\mathbf{\mu}^T+\mathbf{U}^T\mathbf{W}\mathbf{U})}}\ .\]
Finally, the last term contains the column mean matrix twice. Each element \((\mathbf{U}^T\mathbf{W}\mathbf{U})_{ij}\) is the weighted sum \(\sum_k w_k \mu_i \mu_j\) of the product of the i-th and j-th column means over the \(m\) samples. However, as \(\mathbf{Tr}[{\mathbf{W}}]=1\), the weights sum to one and each element reduces to \(\mu_i\mu_j\), so this term is also equivalent to the above outer product: \(\mathbf{\mu}\mathbf{\mu}^T\).\[\displaylines{\mathbf{\Sigma} = {\mathbf{X}^T\mathbf{W}\mathbf{X}-2\mathbf{\mu}\mathbf{\mu}^T+\mathbf{\mu}\mathbf{\mu}^T}}\]
\[\displaylines{= {\mathbf{X}^T\mathbf{W}\mathbf{X}-\mathbf{\mu}\mathbf{\mu}^T}}\ .\]
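As a concrete illustration, this final formula can be evaluated directly on a sparse matrix. Below is a minimal NumPy/SciPy sketch; the function name `weighted_covariance` and the variable names are illustrative assumptions, not part of any library.

```python
import numpy as np
import scipy.sparse as sp

def weighted_covariance(X, w):
    """Weighted covariance of a sparse m x n matrix X, computed without centering.

    The weights are normalized so that the diagonal weight matrix W
    satisfies Tr[W] = 1, as assumed in the derivation above.
    """
    w = np.asarray(w, dtype=np.float64)
    w = w / w.sum()                                   # enforce Tr[W] = 1
    mu = np.asarray(X.T @ w).ravel()                  # weighted column means
    WX = sp.diags(w) @ X                              # W X, still sparse
    sigma = np.asarray((X.T @ WX).toarray()) - np.outer(mu, mu)  # X^T W X - mu mu^T
    return sigma, mu

# Example: sparse data with uniform sample weights
rng = np.random.default_rng(0)
X = sp.random(1000, 50, density=0.05, random_state=rng, format="csr")
Sigma, mu = weighted_covariance(X, np.ones(1000))
```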
With \(\mathbf{\Sigma}\) estimated, the eigenvalue decomposition can be computed using a routine for symmetric matrices. That is, find an orthogonal matrix \(\mathbf{Q}\) and a diagonal matrix \(\mathbf{\Lambda}\) such that\[\displaylines{\mathbf{\Sigma} = {\mathbf{Q}\mathbf{\Lambda}\mathbf{Q}^T}}\ .\]
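For instance, with NumPy this step might look like the following sketch; `np.linalg.eigh` returns eigenvalues in ascending order, so the columns are reordered to obtain the leading components (`n_components` is an assumed parameter, and `scipy.sparse.linalg.eigsh` could be substituted to compute only the top eigenpairs).

```python
import numpy as np

def leading_components(sigma, n_components):
    """Top principal axes of the symmetric covariance estimate sigma."""
    eigvals, Q = np.linalg.eigh(sigma)                # symmetric routine, ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    return Q[:, order], eigvals[order]                # principal components and variances
```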
The principal components are then recovered from the columns of \(\mathbf{Q}\). When the number of samples \(m\) is smaller than the number of features \(n\), it is cheaper to work in sample space instead. Writing \(\mathbf{A}=\mathbf{W}^{1/2}(\mathbf{X}-\mathbf{U})\), let \(\mathbf{v}\) be an eigenvector of the \(m \times m\) matrix \(\mathbf{A}\mathbf{A}^T\),\[\displaylines{(\mathbf{A}\mathbf{A}^T)\mathbf{v} = \lambda\mathbf{v} }\]
\[\displaylines{\mathbf{A}^T(\mathbf{A}\mathbf{A}^T)\mathbf{v} = \mathbf{A}^T\lambda\mathbf{v} }\]
\[\displaylines{(\mathbf{A}^T\mathbf{A})(\mathbf{A}^T\mathbf{v}) = \lambda(\mathbf{A}^T\mathbf{v}) }\]
\[\displaylines{(\mathbf{A}^T\mathbf{A})\mathbf{w} = \lambda\mathbf{w} }\ ,\]
which implies that \(\mathbf{w}=\mathbf{A}^T\mathbf{v}\) is an eigenvector of \(\mathbf{A}^T\mathbf{A}\) with the same eigenvalue. Thus, the approach is to compute the smaller matrix \(\mathbf{A}\mathbf{A}^T\) and then use the above relation to recover the principal components. Writing \(\mathbf{V}=\mathbf{W}^{1/2}\), this matrix expands as\[\displaylines{{(\mathbf{W}^{1/2}(\mathbf{X}-\mathbf{U}))}{(\mathbf{W}^{1/2}(\mathbf{X}-\mathbf{U}))}^T }\]
\[\displaylines{= \mathbf{V}[(\mathbf{X}-\mathbf{U})(\mathbf{X}-\mathbf{U})^T]\mathbf{V}^T }\]
\[\displaylines{=\mathbf{V}[\mathbf{X}\mathbf{X}^T - \mathbf{X}\mathbf{U}^T - \mathbf{U}\mathbf{X}^T + \mathbf{U}\mathbf{U}^T]\mathbf{V}^T }\ .\]
In this case, the cross terms do not simplify as much because the \(\mathbf{W}^{1/2}\) factors sit outside the bracketed product. However, the two cross terms are simply transposes of each other, and every column of \(\mathbf{X}\mathbf{U}^T\) equals \(\mathbf{X}\mathbf{\mu}\), so the product can be computed once as a matrix-vector product and then transposed. The final term is a constant matrix in which every element equals \(\mathbf{\mu}^T\mathbf{\mu}\). Ultimately, broadcasting allows the final three terms to be computed cheaply. Once the eigenvectors \(\mathbf{Q}\) of this \(m \times m\) matrix are found, the principal components are recovered as\[\displaylines{(\mathbf{V}(\mathbf{X}-\mathbf{U}))^T\mathbf{Q} = (\mathbf{V}\mathbf{X})^T\mathbf{Q} - (\mathbf{V}\mathbf{U})^T\mathbf{Q}}\ .\]
Both terms above are efficiently computed with broadcasting: \(\mathbf{V}\mathbf{X}\) is simply a row scaling of the sparse matrix, and since every row of \(\mathbf{U}\) equals \(\mathbf{\mu}^T\), the second term is an outer product involving \(\mathbf{\mu}\). Finally, these components may be re-normalized to form orthonormal principal components.
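Putting the sample-space route together, a minimal sketch under the same assumptions might look as follows; the function name `weighted_pca_samplespace` and the parameter `n_components` are illustrative, and the broadcasting expressions mirror the terms derived above.

```python
import numpy as np
import scipy.sparse as sp

def weighted_pca_samplespace(X, w, n_components):
    """Weighted PCA via the m x m matrix A A^T, useful when m < n."""
    w = np.asarray(w, dtype=np.float64)
    w = w / w.sum()                                   # Tr[W] = 1
    s = np.sqrt(w)                                    # diagonal of V = W^{1/2}
    mu = np.asarray(X.T @ w).ravel()                  # weighted column means

    # Inner m x m matrix X X^T - X U^T - U X^T + U U^T via broadcasting:
    # every column of X U^T equals X mu, and every element of U U^T equals mu^T mu.
    XXt = (X @ X.T).toarray()
    c = np.asarray(X @ mu).ravel()
    inner = XXt - c[:, None] - c[None, :] + mu @ mu
    G = s[:, None] * inner * s[None, :]               # V [ . ] V^T

    # Eigenvectors of the small symmetric matrix A A^T
    eigvals, Q = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_components]
    Qk = Q[:, order]

    # Recover components as (V X)^T Q - (V U)^T Q, then re-normalize
    comps = np.asarray(X.T @ (s[:, None] * Qk)) - np.outer(mu, s @ Qk)
    comps /= np.linalg.norm(comps, axis=0)
    return comps, eigvals[order]
```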