\[\displaylines{\mathbf{A}\mathbf{x}=\mathbf{y}}\ ,\]
where \(\mathbf{A}\) is the m x n data matrix, \(\mathbf{x}\) is the n x 1 vector of regression coefficients, and \(\mathbf{y}\) is the m x 1 vector of target values. As discussed in a previous post, estimation and interpretation of this problem are made difficult if the matrix \(\mathbf{A}\) suffers from multicollinearity. Unfortunately, this is frequently the case in practical problems. In the sense used here, the columns of \(\mathbf{A}\) are decorrelated when\[\displaylines{\mathbf{A}^{T}\mathbf{A}=\mathbf{I}_n}\ ,\]
where \(\mathbf{I}_n\) is the n x n identity matrix. Consider an n x n transformation matrix that is capable of decorrelating the columns of \(\mathbf{A}\). That is, consider a matrix \(\mathbf{W}\) such that\[\displaylines{(\mathbf{A}\mathbf{W})^{T}(\mathbf{A}\mathbf{W})=\mathbf{I}_n}\ .\]
Using properties of the matrix transpose, this is equivalent to\[\displaylines{\mathbf{W}^T\mathbf{A}^{T}\mathbf{A}\mathbf{W} =\mathbf{W}^T\mathbf{X}\mathbf{W}=\mathbf{I}_n}\ ,\]where \(\mathbf{X}\) denotes the n x n matrix \(\mathbf{A}^{T}\mathbf{A}\).
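As a concrete illustration of the condition above, the following sketch (a minimal example assuming NumPy; the synthetic matrix and variable names are illustrative only) builds a small data matrix with correlated columns and shows that \(\mathbf{A}^{T}\mathbf{A}\) is far from the identity; the transformation \(\mathbf{W}\) derived below is what closes this gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic data matrix with deliberately correlated columns:
# the second column is a noisy copy of the first.
m, n = 100, 3
A = rng.normal(size=(m, n))
A[:, 1] = A[:, 0] + 0.1 * rng.normal(size=m)

# Before any transformation, A^T A has large off-diagonal entries,
# reflecting the correlation between columns -- it is far from I_n.
print(np.round(A.T @ A, 2))
```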
It can be shown that \(\mathbf{X}\) is at least positive semi-definite: for any n x 1 vector \(\mathbf{a}\),\[\displaylines{\mathbf{a}^T\mathbf{X}\mathbf{a}}\]
\[\displaylines{=\mathbf{a}^T\mathbf{A}^T\mathbf{A}\mathbf{a}}\]
\[\displaylines{=(\mathbf{A}\mathbf{a})^T\mathbf{A}\mathbf{a}}\]
\[\displaylines{=\mathbf{b}^T\mathbf{b}\geq 0 }\ ,\]
where the substitution \(\mathbf{b}=\mathbf{A}\mathbf{a}\) has been made. Thus, since \(\mathbf{X}\) is symmetric and positive semi-definite, it may be re-written as the product of an n x n lower triangular matrix \(\mathbf{L}\) with its transpose:\[\displaylines{\mathbf{X}=\mathbf{L}\mathbf{L}^{T}}\ .\]
This decomposition is known as the Cholesky decomposition, and it is guaranteed to exist for all real-valued symmetric positive semidefinite matrices. (When \(\mathbf{A}\) has full column rank, \(\mathbf{X}\) is in fact positive definite and the factor \(\mathbf{L}\) is invertible; this is assumed in what follows.) Substituting this decomposition into the decorrelation condition gives\[\displaylines{\mathbf{I}_n=\mathbf{W}^T\mathbf{X}\mathbf{W}}\]
\[\displaylines{=\mathbf{W}^T(\mathbf{L}\mathbf{L}^{T})\mathbf{W}}\]
\[\displaylines{=(\mathbf{W}^T\mathbf{L})(\mathbf{L}^{T}\mathbf{W})}\]
\[\displaylines{=(\mathbf{L}^{T}\mathbf{W})^{T}(\mathbf{L}^{T}\mathbf{W})}\ .\]
Thus, the matrix \((\mathbf{L}^{T}\mathbf{W})\) is an orthogonal matrix. However, in the above equation, all matrices are now of dimension n x n, and further, the matrix \(\mathbf{L}^{T}\) is an upper triangular matrix. From this, a convenient solution for performing decorrelation is apparent: it is sufficient to let \(\mathbf{W}=(\mathbf{L}^{T})^{-1}=\mathbf{L}^{-T}\), as\[\displaylines{(\mathbf{A}\mathbf{W})^{T}(\mathbf{A}\mathbf{W})}\]
\[\displaylines{=\mathbf{W}^T\mathbf{A}^{T}\mathbf{A}\mathbf{W}}\]
\[\displaylines{=\mathbf{W}^T\mathbf{X}\mathbf{W}}\]
\[\displaylines{=\mathbf{W}^T(\mathbf{L}\mathbf{L}^{T})\mathbf{W}}\]
\[\displaylines{=(\mathbf{L}^{T}\mathbf{W})^T(\mathbf{L}^{T}\mathbf{W})}\]
\[\displaylines{=(\mathbf{L}^{T}\mathbf{L}^{-T})^T(\mathbf{L}^{T}\mathbf{L}^{-T})}\]
\[\displaylines{=\mathbf{I}_n^T\mathbf{I}_n=\mathbf{I}_n}\]
produces the desired result. Thus, the inverse of the transposed Cholesky factor of \(\mathbf{X}=\mathbf{A}^{T}\mathbf{A}\) decorrelates the columns of the input matrix. Further, this approach is also convenient as \(\mathbf{L}^{T}\) is triangular, so the required inverse is inexpensive to compute. Now, let \(\mathbf{B}=\mathbf{A}\mathbf{W}\) denote the decorrelated data matrix and \(\mathbf{\tilde{x}}\) the regression coefficients in the transformed space, so that the regression problem becomes\[\displaylines{\mathbf{B}\mathbf{\tilde{x}}=\mathbf{y}}\ .\]
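A minimal sketch of this whitening step, assuming NumPy and SciPy (the synthetic data and variable names are illustrative), computes \(\mathbf{L}\), forms \(\mathbf{W}=\mathbf{L}^{-T}\) via a triangular solve rather than an explicit inverse, and verifies that \(\mathbf{B}=\mathbf{A}\mathbf{W}\) has orthonormal columns:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)

# Synthetic data matrix with correlated columns (as in the earlier sketch).
m, n = 100, 3
A = rng.normal(size=(m, n))
A[:, 1] = A[:, 0] + 0.1 * rng.normal(size=m)

# Scatter matrix X = A^T A and its lower-triangular Cholesky factor L.
X = A.T @ A
L = cholesky(X, lower=True)      # X = L @ L.T

# W = L^{-T}: obtained by solving the triangular system L^T W = I_n.
W = solve_triangular(L.T, np.eye(n), lower=False)

# B = A W has (numerically) orthonormal columns: B^T B = I_n.
B = A @ W
assert np.allclose(B.T @ B, np.eye(n))
```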
The scalar loss of the least squares formulation can be written as:\[\displaylines{L=(\mathbf{y}-\mathbf{B}\mathbf{\tilde{x}})^{T}(\mathbf{y}-\mathbf{B}\mathbf{\tilde{x}})}\ .\]
Setting the derivative of this expression with respect to \(\mathbf{\tilde{x}}\) to zero produces\[\displaylines{\frac{\partial L}{\partial \mathbf{\tilde{x}}}=-2\mathbf{B}^T\mathbf{y}+2\mathbf{B}^T\mathbf{B}\mathbf{\tilde{x}}=0}\ .\]
Now, from above, \(\mathbf{B}^T\mathbf{B}=\mathbf{I}_n\), and so this equation simplifies as follows:
\[\displaylines{\mathbf{B}^T\mathbf{B}\mathbf{\tilde{x}}=\mathbf{B}^T\mathbf{y}}\]
\[\displaylines{\mathbf{I}_n\mathbf{\tilde{x}}=\mathbf{B}^T\mathbf{y}}\]
\[\displaylines{\mathbf{\tilde{x}}=\mathbf{B}^T\mathbf{y}}\ .\]
Thus, the least squares coefficients in the transformed space are simply the product of the transpose of the decorrelated matrix \(\mathbf{B}\) with the target vector. Substituting this back into the transformed problem gives the linear least squares approximation:\[\displaylines{\mathbf{\hat{y}}=\mathbf{B}\mathbf{\tilde{x}}=\mathbf{B}\mathbf{B}^T\mathbf{y}}\ .\]
From this, it is apparent that the quality of the approximation improves the closer the m x m scatter matrix \(\mathbf{B}\mathbf{B}^T\) is to the identity matrix, which occurs when the rows of \(\mathbf{B}\) are orthonormal. Finally, the coefficients in the original space can be recovered. Since \(\mathbf{B}=\mathbf{A}\mathbf{W}\),\[\displaylines{\mathbf{B}\mathbf{\tilde{x}}=\mathbf{\hat{y}}}\]
\[\displaylines{\mathbf{A}\mathbf{W}\mathbf{\tilde{x}}=\mathbf{\hat{y}}}\]
\[\displaylines{\mathbf{A}\mathbf{x}=\mathbf{\hat{y}}}\ .\]
From the above, it is apparent that the coefficients in the original space may be recovered using the following equation:\[\displaylines{\mathbf{x}=\mathbf{W}\mathbf{\tilde{x}}}\ .\]
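Continuing the sketch above (again assuming NumPy and SciPy, with an illustrative synthetic target \(\mathbf{y}\)), the coefficients can be computed in the decorrelated space and mapped back with \(\mathbf{W}\); the result agrees with a standard least-squares solver applied to the original problem:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)

# Synthetic data with correlated columns and a known-coefficient target.
m, n = 100, 3
A = rng.normal(size=(m, n))
A[:, 1] = A[:, 0] + 0.1 * rng.normal(size=m)
y = A @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=m)

# Decorrelate: W = L^{-T}, where A^T A = L L^T.
L = cholesky(A.T @ A, lower=True)
W = solve_triangular(L.T, np.eye(n), lower=False)
B = A @ W

# Coefficients in the decorrelated space, then mapped back to the original space.
x_tilde = B.T @ y
x = W @ x_tilde

# Matches an off-the-shelf least-squares solution of A x = y.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x, x_ref)
```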
In summary, this approach may be used, among other things, to orthogonalize the columns of a matrix and to perform least-squares approximation. In total, it requires computing one Cholesky decomposition, one inverse of a triangular matrix, and several matrix multiplications.
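The procedure is compact enough to collect into a single helper. The sketch below (assuming NumPy and SciPy; the function name is illustrative, not from the original post) performs exactly the operations listed above: one Cholesky decomposition, one triangular inverse, and a few matrix multiplications.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular


def decorrelate_and_solve(A, y):
    """Orthonormalize the columns of A via the Cholesky factor of A^T A,
    then solve the least-squares problem A x ~ y in the decorrelated space.

    Returns (B, W, x): B = A @ W has orthonormal columns, and
    x = W @ (B.T @ y) are the coefficients in the original space.
    """
    n = A.shape[1]
    L = cholesky(A.T @ A, lower=True)                  # one Cholesky decomposition
    W = solve_triangular(L.T, np.eye(n), lower=False)  # one triangular inverse
    B = A @ W                                          # remaining matrix products
    x = W @ (B.T @ y)
    return B, W, x
```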