How bad is Cholesky decomposition for OLS?

I've heard that I should never use Cholesky decomposition for OLS as it's super unstable, so I thought I'd try a very simple example to see how bad it could be.

First let's set up the problem in R:

X <- rbind(c(1, 1), c(1.00001, 1))

kappa(t(X) %*% X)

The condition number is 160002098109! Oooh boy, coefficients will be off by at least a billion for sure!

y <- rnorm(2)

L <- chol(t(X) %*% X)

z <- solve(t(L), t(X) %*% y)



result <- data.frame(chol = solve(L, z), qr = coefficients(lm.fit(X, y, method = 'qr')))

result



        chol        qr

x1 -148230.9 -148231.0

x2  148231.9  148232.1

...almost no difference. What happened? If the condition number is so big that no method can handle it well, what is the range of condition numbers at which the Cholesky solution is in fact appreciably less stable than with QR decomposition?

asked yesterday

badmax

41017

add a comment |

I've heard that I should never use Cholesky decomposition for OLS as it's super unstable, so I thought I'd try a very simple example to see how bad it could be.

First let's set up the problem in R:

X <- rbind(c(1, 1), c(1.00001, 1))

kappa(t(X) %*% X)

The condition number is 160002098109! Oooh boy, coefficients will be off by at least a billion for sure!

y <- rnorm(2)

L <- chol(t(X) %*% X)

z <- solve(t(L), t(X) %*% y)



result <- data.frame(chol = solve(L, z), qr = coefficients(lm.fit(X, y, method = 'qr')))

result



        chol        qr

x1 -148230.9 -148231.0

x2  148231.9  148232.1

asked yesterday

badmax

41017

add a comment |

I've heard that I should never use Cholesky decomposition for OLS as it's super unstable, so I thought I'd try a very simple example to see how bad it could be.

First let's set up the problem in R:

X <- rbind(c(1, 1), c(1.00001, 1))

kappa(t(X) %*% X)

The condition number is 160002098109! Oooh boy, coefficients will be off by at least a billion for sure!

y <- rnorm(2)

L <- chol(t(X) %*% X)

z <- solve(t(L), t(X) %*% y)



result <- data.frame(chol = solve(L, z), qr = coefficients(lm.fit(X, y, method = 'qr')))

result



        chol        qr

x1 -148230.9 -148231.0

x2  148231.9  148232.1

asked yesterday

badmax

41017

I've heard that I should never use Cholesky decomposition for OLS as it's super unstable, so I thought I'd try a very simple example to see how bad it could be.

First let's set up the problem in R:

X <- rbind(c(1, 1), c(1.00001, 1))

kappa(t(X) %*% X)

The condition number is 160002098109! Oooh boy, coefficients will be off by at least a billion for sure!

y <- rnorm(2)

L <- chol(t(X) %*% X)

z <- solve(t(L), t(X) %*% y)



result <- data.frame(chol = solve(L, z), qr = coefficients(lm.fit(X, y, method = 'qr')))

result



        chol        qr

x1 -148230.9 -148231.0

x2  148231.9  148232.1

regression machine-learning least-squares

asked yesterday

badmax

41017

asked yesterday

badmax

41017

asked yesterday

badmax

41017

asked yesterday

badmax

41017

asked yesterday

badmax

41017

add a comment |

1 Answer
1

active

oldest

votes

What you are seeing is exactly what one would expect. The condition number of your matrix $X$ is about $10^6$. Double precision floating point calculations give about 18 figures of accuracy so, in double precision, you would expect to get at most $18-6=12$ significant figures for the estimated coefficients from QR and at most $18-6-6=6$ significant figures from Choleski, after allowing for accuracy lost due to conditioning.
(Choleski loses twice the number of significant figures because forming $X^TX$ squares the condition number.)
Hence the coefficients from the two algorithms should start to differ in the 6th significant figure, which is exactly what you see. Your results show that for x2 the relative difference between the two algorithms is about
$$ 0.2 / 148232 = 1.35 times 10^{-6},$$
which matches the condition number you started out with surprisingly well.

The condition number of $X$ is a reasonable guide to precision when the residuals are small. If we make sure that the exact coefficients are 1 and 1.00001 by

> y <- X %*% c(1,1.00001)

and rerun your calculation we get

> result

                 chol                qr

x1 0.9999833544961814 0.999999999968327

x2 1.0000266455870466 1.000010000031673

confirming that QR is in fact correct to 12 significant figures and Choleski is correct to 6 significant figures.

The fitted values from Choleski are generally more precise than the estimated coefficients. If the fitted values are more important to you than the coefficients, and that will often be so in a highly collinear cases, then Choleski is usually fine.
Choleski is also about as good as QR when the residuals are very large, but this is the hardest case for both algorithms.
My PhD supervisor was Australia's foremost numerical analysis (Mike Osborne) and he used to say that Choleski was not as bad as it's made out to be.

The full sensitivity analysis for Choleski vs QR is given in Section 5.3.8 of Golub and Van Loan (1996), and is very much more complex than the simple calculation I used above. The sensitivity of Choleski is roughly proportion to $kappa + rho kappa^2$ where $kappa$ is the condition number of $X$ and $rho$ is a theoretical quantity that is almost impossible to compute in practice.
$rho$ depends on the size of the residuals-- it is roughly proportional to but much smaller than the average squared residual.
The sensitivity of QR is somewhat better than Choleski although not always as good as I have suggested above. Golub and Van Loan remark:

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!

Reference

Golub, GH, and Van Loan CF (1996). Matrix Computations. Johns Hopkins University Press, Baltimore.

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "65"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f389353%2fhow-bad-is-cholesky-decomposition-for-ols%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

The condition number of $X$ is a reasonable guide to precision when the residuals are small. If we make sure that the exact coefficients are 1 and 1.00001 by

> y <- X %*% c(1,1.00001)

and rerun your calculation we get

> result

                 chol                qr

x1 0.9999833544961814 0.999999999968327

x2 1.0000266455870466 1.000010000031673

confirming that QR is in fact correct to 12 significant figures and Choleski is correct to 6 significant figures.

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!

Reference

Golub, GH, and Van Loan CF (1996). Matrix Computations. Johns Hopkins University Press, Baltimore.

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

add a comment |

The condition number of $X$ is a reasonable guide to precision when the residuals are small. If we make sure that the exact coefficients are 1 and 1.00001 by

> y <- X %*% c(1,1.00001)

and rerun your calculation we get

> result

                 chol                qr

x1 0.9999833544961814 0.999999999968327

x2 1.0000266455870466 1.000010000031673

confirming that QR is in fact correct to 12 significant figures and Choleski is correct to 6 significant figures.

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!

Reference

Golub, GH, and Van Loan CF (1996). Matrix Computations. Johns Hopkins University Press, Baltimore.

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

add a comment |

The condition number of $X$ is a reasonable guide to precision when the residuals are small. If we make sure that the exact coefficients are 1 and 1.00001 by

> y <- X %*% c(1,1.00001)

and rerun your calculation we get

> result

                 chol                qr

x1 0.9999833544961814 0.999999999968327

x2 1.0000266455870466 1.000010000031673

confirming that QR is in fact correct to 12 significant figures and Choleski is correct to 6 significant figures.

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!

Reference

Golub, GH, and Van Loan CF (1996). Matrix Computations. Johns Hopkins University Press, Baltimore.

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

The condition number of $X$ is a reasonable guide to precision when the residuals are small. If we make sure that the exact coefficients are 1 and 1.00001 by

> y <- X %*% c(1,1.00001)

and rerun your calculation we get

> result

                 chol                qr

x1 0.9999833544961814 0.999999999968327

x2 1.0000266455870466 1.000010000031673

confirming that QR is in fact correct to 12 significant figures and Choleski is correct to 6 significant figures.

At the very minimum, this discussion should convince you how difficult it can be to choose the "right" algorithm!

Reference

Golub, GH, and Van Loan CF (1996). Matrix Computations. Johns Hopkins University Press, Baltimore.

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

edited yesterday

answered yesterday

Gordon Smyth

4,7381125

answered yesterday

Gordon Smyth

4,7381125

answered yesterday

Gordon Smyth

4,7381125

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Cross Validated!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

Use MathJax to format equations. MathJax reference.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Argthtjtr