Exclude observations with measurements below limit of detection?
I am analysing a dataset for the relationship between an exposure variable x and a response y (in my case, these are the urinary concentration of a specific compound and a measure of cognitive function). x is measured using an analytical method with a lower detection limit, and approximately 12% of the population have concentrations below that limit.
In a first analysis, I compared y between participants above and below the detection limit and found a significant difference - which is not surprising.
My question is: when I conduct a regression analysis for y ~ x, should I exclude all observations with x below the detection limit, or not? It does affect the results and actually reverses the association: if I include all observations, the association is positive; if I exclude them, it is negative.
Tags: regression, censoring, chemometrics
asked 2 days ago by Gux, edited 2 days ago by cbeleites
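As a concrete illustration of the comparison the question describes, here is a minimal sketch (my addition) that fits y ~ x once with all observations and once with the below-detection-limit rows dropped. It assumes the data sit in a CSV with columns "x" and "y" and a known detection limit; the file name, column names and LOD value are placeholders, not from the question.

    # Minimal sensitivity check: does the y ~ x slope depend on how the
    # below-detection-limit observations are handled?
    import pandas as pd
    import statsmodels.formula.api as smf

    LOD = 0.5                                    # placeholder detection limit, same units as x
    df = pd.read_csv("exposure_cognition.csv")   # placeholder file with columns "x" and "y"

    fit_all      = smf.ols("y ~ x", data=df).fit()                   # keep every reported value
    fit_detected = smf.ols("y ~ x", data=df[df["x"] >= LOD]).fit()   # drop x < LOD

    print("slope using all observations:", fit_all.params["x"])
    print("slope excluding x < LOD:     ", fit_detected.params["x"])
    # A sign flip between these two fits, as reported in the question, means the
    # handling of the below-LOD values is driving the conclusion.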
This is a censoring problem as mentioned below. Taking censored data into account is more complicated, but it must be done; one can't ignore censored data. Good luck and have fun, this is a good problem.
– Robert Dodier
2 days ago
2 Answers
Answer by cbeleites:
original answer:
Don't exclude cases solely because they are below the LLOQ (lower limit of quantitation)!
- The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention marking the concentration where the relative error of the analyses falls below 10 %.
- Note that LLOQ is often computed assuming homoscedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume a different absolute error for cases below and above LLOQ. From that point of view, LLOQ is essentially just a way to express the absolute uncertainty of the analytical method in a concentration unit, like fuel economy in l/100 km vs. miles/gallon (a small worked example follows below this list).
- Even if the analytical error is concentration dependent, two cases whose true concentrations are almost the same, one slightly below and one slightly above LLOQ, have almost the same uncertainty.
- (Left) censoring the data (which is the technical term for excluding cases below LLOQ) leads to all kinds of complications in subsequent data analyses (and you'd need to use particular statistical methods that can deal with such data).
- Say thank you to your clinical lab for providing you with the full data: I've met many people who have the opposite difficulty - getting a report that just says "below LLOQ", with no possibility of recovering any further information.
Bottom line: never censor your data unless you have really, really good reasons for doing so.
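A small worked example of that convention (my illustration, assuming homoscedastic absolute error $\sigma$ and the 10 % criterion): the relative error at concentration $c$ is $\sigma / c$, so $LLOQ = \sigma / 0.1 = 10\,\sigma$, and a result reported at $\frac{LLOQ}{2} = 5\,\sigma$ simply carries a relative uncertainty of about 20 % - noisier than results above LLOQ, but not meaningless.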
Update about replacing below-LLOQ results by $\frac{LLOQ}{\sqrt 2}$, in view of the comments below:
From the comments below I think that we may be using terms in slightly different ways - possibly due to coming from different fields:
- linear range: concentration range where we get a linear (can be relaxed to strictly monotonic) dependency of the signal as a function of concentration.
- calibration range: concentration range covered by calibration and validation samples. Outside this [precisely: the range covered by validation], we don't really know how our method behaves.
=> I totally agree that something$^{TM}$ needs to be done if 1/8 of the samples are outside the calibration range and those samples are moreover important for reaching correct conclusions in the study.
- LLOQ (aka LQ or LOQ) is a method performance characteristic. It can be inside or outside the calibration range. The most basic definition of LLOQ I'm aware of specifies a relative error that must not be exceeded in the quantitation range. In my field, it is typically set to 10 % (but that can vary, and in fact should vary according to the requirements of the application; so the 10 % relative error is to me like the 5 % threshold for $p$ in significance testing - a default convention).
I've encountered various replacement constants for $\leq$ LLOQ concentrations, ranging from 0 through $\frac{LLOQ}{2}$ to LLOQ, as well as random values in the concentration range below LLOQ ($\frac{LLOQ}{\sqrt 2}$ is new in that collection - what's the idea behind the $\frac{1}{\sqrt 2}$?). These are normally desperate attempts to impute values for censored data where not even (gu)estimates are available.
With the above definition of LLOQ and validation data available for the whole concentration range encountered in your samples, replacing concentrations below LLOQ would amount to throwing away most of the little information you have for those samples, and that's typically not what you want. See my answer to this related question for illustrations.
Also, this wouldn't avoid the need for statistical methods that work with censored data, so there really isn't much difference from excluding.
In your case, however, would it be possible to extend the validated concentration range somewhat further towards the low side? Even if you don't extend the calibration range accordingly (which would IMHO be best).
You say that whether or not you exclude those 12 % of your samples has a large influence on your findings. So the idea here would be to rescue the study by establishing linearity and analytical error for a sufficiently large fraction of your samples to get stable estimates at the higher level (the study question).
While this is not as good as having proper calibration from the beginning, the unexpected is, after all, something that has to be expected in research. With appropriate caution in the conclusions, this would IMHO be acceptable for early research (in contrast to later stages, where better knowledge of the expected concentration range would be available, or to work establishing this as an analytical/clinical method).
There are situations where your calibration function starts with a plateau for low concentrations before reaching a suitable sensitivity in the linear range (e.g. a particular absolute amount of analyte is masked/lost due to adsorption, ...). In contrast to low concentrations outside the calibration/validation range but still inside the linear range, you basically cannot say anything about concentrations in that plateau.
I think of this as a "wet lab/chemical censoring". In that case => use statistical methods for censored data.
The higher level question here is whether your analytical method is fit for purpose.
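To make the trade-offs discussed above concrete, here is a rough simulation sketch (my addition, not part of the original answer): a data set with a known true slope, homoscedastic analytical error, and an LLOQ set at 10 times that error, so that "keep the reported values", "exclude below-LLOQ cases" and "substitute $\frac{LLOQ}{\sqrt 2}$" can be compared against a known truth. All numbers are arbitrary placeholders.

    import numpy as np

    rng = np.random.default_rng(1)
    n, true_slope, sigma_analytical = 2000, 2.0, 0.05
    lloq = 10 * sigma_analytical                        # ~10 % relative error at the LLOQ

    x_true = rng.lognormal(mean=0.13, sigma=0.7, size=n)        # assumed exposure distribution,
                                                                # roughly 12 % ends up below LLOQ
    y = 1.0 + true_slope * x_true + rng.normal(0, 0.5, size=n)  # outcome depends on the TRUE x
    x_reported = x_true + rng.normal(0, sigma_analytical, size=n)   # what the lab reports

    def slope(x, y):
        return np.polyfit(x, y, deg=1)[0]               # plain least-squares slope of y ~ x

    below = x_reported < lloq
    x_substituted = np.where(below, lloq / np.sqrt(2), x_reported)

    print("fraction below LLOQ:      ", below.mean())
    print("true slope:               ", true_slope)
    print("keep reported values:     ", slope(x_reported, y))
    print("exclude below-LLOQ cases: ", slope(x_reported[~below], y[~below]))
    print("substitute LLOQ/sqrt(2):  ", slope(x_substituted, y))

Because the true slope is known here, one can see directly how much information the below-LLOQ readings still carry and what each ad-hoc handling does to the estimate.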
@JamesPhillips: plus, from my chemist's world-view, that makes sense only if we actually have distinct subpopulations, i.e. clusters, as opposed to a continuum where a rather arbitrary threshold cuts off a tail - in the latter case a regression is more sensible. Note that if you cut between clusters of cases, you have less of a thresholding/censoring problem, whereas cutting "through the middle" of a single population leads to complications that are somewhat related to those of censoring.
– cbeleites
2 days ago
@Gux: There are some statements in your comment that don't fit the terminology I'm familiar with... so I'm not entirely sure I understand you correctly. E.g. "LLOQ was pre-specified and used for validation" -> in my vocabulary, LLOQ is one of the parameters/characteristics established (measured, estimated in the statistical sense) as part of [method] validation. You can specify during method development that an LLOQ better than x is needed - but it still needs to be established what the actual method capability is, and that is then the LLOQ you got.
– cbeleites
yesterday
LOD is established similarly to LLOQ (it is a different concentration, corresponding to a different application question - as a rule of thumb/0th-order approximation/rough guesstimate, you can expect it to be around 1/3 of the LLOQ). Yes, LOD can be sample (e.g. matrix) dependent. But in that case it doesn't make any sense to assume that the LLOQ is not affected as well!?
– cbeleites
yesterday
What are your batches? (I'll tackle the LLOQ/$\sqrt 2$ in an update to the answer.)
– cbeleites
yesterday
One more question: did the lab tell you their calibration range?
– cbeleites
yesterday
Answer by Acccumulation:
Suppose you have an X measured with an error whose standard deviation is 100. If X is measured to be 1000, then the expected value of the true X (assuming a well-calibrated measurement system) is around 1000. But suppose the measured amount is 110. It's possible that the actual value is 310 - being two standard deviations away from the measurement is possible. But if X is something that can't be negative, the actual value can't be two standard deviations below the measured value. Thus there is likely a slight bias: the expected value of the true X, given that the measured value is 110, is slightly more than 110. As your measured value gets smaller and smaller, this bias gets larger and larger.
Whether this happens, and to what degree, depends on a lot of factors, such as what the distribution of actual values is and how the measurement error acts. But the bottom line is that including measured values below the LLOQ quite possibly can harm the validity of the regression. Unfortunately, removing them does not eliminate the problem, and can make it worse. You'll need to look at how the error is being modeled, what assumptions are being made about the actual distribution, and what methods there are to compensate. That removing them inverts the relationship certainly is a red flag that care needs to be taken in dealing with them.
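A small numerical sketch of that argument (my addition; it assumes a flat prior on the non-negative true value and Gaussian measurement error with standard deviation 100, in which case the true value given a measurement m follows a normal distribution truncated at zero):

    from scipy.stats import norm

    sigma = 100.0
    for m in (1000.0, 110.0):
        # mean of a Normal(m, sigma^2) truncated to [0, inf):
        # m + sigma * phi(m/sigma) / Phi(m/sigma)
        expected_true = m + sigma * norm.pdf(m / sigma) / norm.cdf(m / sigma)
        print(f"measured {m:6.0f}  ->  E[true value | measured] ~ {expected_true:7.1f}")
    # measured 1000 stays essentially at 1000, measured 110 moves up to roughly 135:
    # the closer the reading is to zero, the larger the upward correction.

With a different distribution of actual values and a different error model, the size (and even the direction) of the shift changes, which is exactly the answer's point that the details matter.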