Exclude observations with measurements below limit of detection?














I am analysing a dataset for the relationship between an exposure variable x and a response y (in my case, urinary concentration of a specific compound and a measure of cognitive function). x is measured with an analytical method that has a lower detection limit - and approximately 12% of the population have concentrations below that detection limit.


In a first analysis, I compared y between participants with x above and below the detection limit, and found a significant difference - which is not surprising.


My question is: when I conduct a regression analysis of y ~ x, should I exclude all observations with x below the detection limit, or not? It does affect the results and actually reverses the association (if I include all observations, the association is positive; if I exclude them, it is negative).
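As a toy illustration (simulated data, not my real dataset) of how dropping observations below a detection limit can flip the sign of a fitted slope - e.g. when the below-limit group clusters at low x with low y while the association within the detected range is negative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical construction: 12% of exposures fall below the detection
# limit (dl) and cluster near zero with low response, while within the
# detected range the x-y association is negative.
dl = 1.0
x_below = rng.uniform(0.0, dl, 120)
y_below = rng.normal(-1.0, 0.3, 120)
x_above = rng.uniform(dl, 5.0, 880)
y_above = 2.0 - 0.2 * x_above + rng.normal(0.0, 0.3, 880)

x = np.concatenate([x_below, x_above])
y = np.concatenate([y_below, y_above])

# OLS slopes with and without the below-detection-limit observations
slope_all = np.polyfit(x, y, 1)[0]                    # positive here
slope_detected = np.polyfit(x_above, y_above, 1)[0]   # negative here
print(slope_all, slope_detected)
```

In this construction the between-group difference dominates the full-sample fit, so including everyone gives a positive slope while the detected-only fit is negative - qualitatively the same reversal I observe.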










    This is a censoring problem as mentioned below. Taking censored data into account is more complicated, but it must be done; one can't ignore censored data. Good luck and have fun, this is a good problem.
    – Robert Dodier
    2 days ago




















regression censoring chemometrics






edited 2 days ago









cbeleites











asked 2 days ago









Gux









2 Answers





















original answer:



Don't exclude cases solely because they are below LLOQ! (lower limit of quantitation)




  • The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention marking the concentration where the relative error of the analysis falls below 10 %.

  • Note that the LLOQ is often computed assuming homoscedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume a different absolute error for cases below or above the LLOQ. From that point of view, the LLOQ is essentially just a way to express the absolute uncertainty of the analytical method in a concentration unit (like fuel economy in l/100 km vs. miles/gallon).

  • Even if analytical error is concentration dependent, two cases with true concentration almost the same but slightly below and above LLOQ have almost the same uncertainty.

  • Excluding cases below the LLOQ amounts to (left) censoring your data - that is the technical term - and censoring leads to all kinds of complications in subsequent data analyses (you'd need particular statistical methods that can deal with such data).

  • Be thankful that your clinical lab provides you with the full data: I've met many people with the opposite difficulty - getting a report that just says "below LLOQ", with no possibility of recovering any further information.


Bottom line: never censor your data unless you have really, really good reasons for doing so.
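If some values really do arrive censored, here is a minimal sketch (my own toy example, assuming a lognormal concentration distribution - not the OP's data) of how below-LLOQ cases can enter a maximum-likelihood fit as a censored contribution instead of being dropped:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical concentrations; suppose the lowest ~12 % are only
# reported as "< LLOQ", so their exact values are unavailable.
x = rng.lognormal(mean=0.0, sigma=1.0, size=500)
lloq = np.quantile(x, 0.12)
censored = x < lloq
x_det = x[~censored]

def negloglik(params):
    """Censored lognormal likelihood: density for detected values,
    probability mass P(X < LLOQ) for each censored one."""
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    ll_det = (stats.norm.logpdf(np.log(x_det), mu, sigma) - np.log(x_det)).sum()
    ll_cens = censored.sum() * stats.norm.logcdf(np.log(lloq), mu, sigma)
    return -(ll_det + ll_cens)

mu_hat, sigma_hat = minimize(negloglik, x0=[0.5, 1.5], method="Nelder-Mead").x
print(mu_hat, sigma_hat)  # recovers roughly the true (0, 1)
```

The same censored-likelihood idea underlies Tobit-type regression models; the point is that a "< LLOQ" report still carries information (the probability mass below the limit) that substitution or exclusion throws away.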





update about replacing below-LLOQ results by $\frac{LLOQ}{\sqrt 2}$, and in view of the comments below:



From the comments below I think that we may be using terms in slightly different ways - possibly due to coming from different fields:





  • linear range: the concentration range where the signal is a linear (this can be relaxed to strictly monotonic) function of concentration.

  • calibration range: concentration range covered by calibration and validation samples. Outside this [precisely: the range covered by validation], we don't really know how our method behaves.

    => I totally agree that something™ needs to be done if 1/8 of the samples are outside the calibration range and those samples are, moreover, important for reaching correct conclusions in the study.


  • LLOQ (aka LQ or LOQ) is a method performance characteristic. It can be inside or outside the calibration range. The most basic definition of the LLOQ I'm aware of specifies a relative error that must not be exceeded in the quantitation range. In my field, it is typically set to 10 % (but that can vary, and in fact should vary according to the requirements of the application - so the 10 % relative error is to me like the 5 % threshold for $p$ in significance testing: a default convention).



I've encountered various replacement constants for $\leq$ LLOQ concentrations, ranging from 0 through $\frac{LLOQ}{2}$ to the LLOQ itself, plus random values in the concentration range below the LLOQ ($\frac{LLOQ}{\sqrt 2}$ is new in that collection - what's the idea behind the $\frac{1}{\sqrt 2}$?). These are normally desperate attempts to impute values for censored data where not even (gu)estimates are available.




  • With the above definition of LLOQ and validation data available for the whole concentration range encountered in your samples, replacing concentrations below LLOQ would amount to throwing away most of the little information you have for those samples, and that's typically not what you want. See my answer to this related question for illustrations.

    Also, this wouldn't avoid the need for statistical methods that work with censored data, so there really isn't much difference from excluding them.


  • In your case, however, would it be possible to extend the validated concentration range somewhat further to the low side? Even if you don't extend the calibration range accordingly (extending it would IMHO be best).

    You say that whether or not you exclude those 12 % of your samples has a large influence on your findings. So the idea here would be to rescue the study by establishing linearity and analytical error for a sufficiently large fraction of your samples to get stable estimates at the higher level (the study question).

    While not as good as having proper calibration from the beginning, the unexpected is, after all, something that has to be expected in research. With appropriate caution in the conclusions, this would IMHO be acceptable for early research (in contrast to later stages, where better knowledge of the expected concentration range would be available, or to work establishing this as an analytical/clinical method).


  • There are situations where your calibration function starts with a plateau for low concentrations before reaching a suitable sensitivity in the linear range (e.g. a particular absolute amount of analyte is masked/lost due to adsorption, ...). In contrast to low concentrations outside calibration/validation range but still inside linear range, you basically cannot say anything for concentrations in that plateau.

    I think of this as a "wet lab/chemical censoring". In that case => use statistical methods for censored data.

    The higher level question here is whether your analytical method is fit for purpose.
















    @JamesPhillips: plus, from my chemist's world-view, that makes sense only if we actually have distinct subpopulations, i.e. clusters, as opposed to a continuum where a rather arbitrary threshold cuts off a tail - in that case a regression is more sensible. Note that if you cut between clusters of cases, you have less of a thresholding/censoring problem, whereas cutting "through the middle" of a single population leads to complications that are somewhat related to those of censoring.
    – cbeleites
    2 days ago






    @Gux: There are some statements in your comment that don't make sense with the terminology I'm familiar with... so I'm not entirely sure I understand you correctly. E.g. "LLOQ was pre-specified and used for validation" -> in my vocabulary, the LLOQ is one of the parameters/characteristics established (measured, estimated in the statistical sense) as part of [method] validation. You can specify during method development that an LLOQ better than x is needed - but the actual method capability still needs to be established, and that is then the LLOQ you got.
    – cbeleites
    yesterday






    The LOD is established similarly to the LLOQ (it is a different concentration, corresponding to a different application question - and as a rule of thumb/0th-order approximation/rough guesstimate you can expect it to be around 1/3 of the LLOQ). Yes, the LOD can be sample (e.g. matrix) dependent. But in that case it doesn't make any sense to assume the LLOQ is not affected as well!?
    – cbeleites
    yesterday






    What are your batches? (I'll tackle the LLOQ/$\sqrt 2$ in an update to the answer.)
    – cbeleites
    yesterday








    One more question: did the lab tell you their calibration range?
    – cbeleites
    yesterday




















Suppose you have an X measured with an error that has a standard deviation of 100. If X is measured as 1000, then the expected true value (assuming a well-calibrated measurement system) is around 1000. But suppose the measured value is 110. The actual value could be 310 - that is only two standard deviations from the measurement. But if X is something that can't be negative, the actual value can't be two standard deviations below the measured value. Thus there is likely a slight bias: the expected true value of X, given a measured value of 110, is slightly more than 110. As the measured value gets smaller and smaller, this bias gets larger and larger.
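A quick simulation (my own sketch, with hypothetical lognormal true values and additive Gaussian measurement error) illustrating this conditional bias:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Nonnegative true values (mostly a few hundred) measured with sd = 100
true = rng.lognormal(mean=6.0, sigma=0.5, size=n)
measured = true + rng.normal(scale=100.0, size=n)

def mean_true_given_measured(center, width=10.0):
    """Average true value among observations measured within +/- width of center."""
    sel = np.abs(measured - center) < width
    return true[sel].mean()

# Near typical values the conditional bias is modest relative to the value;
# near the floor, the expected true value is pulled well above the measurement.
print(mean_true_given_measured(1000.0))
print(mean_true_given_measured(110.0))
```

With these assumed distributions, the average true value behind a measurement of 110 is far above 110, exactly because true values can't dip below zero while the measurement error can push readings down there.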



Whether this happens, and to what degree, depends on a lot of factors, such as what the distribution of actual values is and how the measurement error acts. But the bottom line is that including measured values below the LLOQ quite possibly can harm the validity of the regression. Unfortunately, removing them does not eliminate the problem, and can make it worse. You'll need to look at how the error is being modeled, what assumptions are being made about the actual distribution, and what methods there are to compensate. That removing them inverts the relationship certainly is a red flag that care needs to be taken in dealing with them.






    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "65"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstats.stackexchange.com%2fquestions%2f388567%2fexclude-observations-with-measurements-below-limit-of-detection%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    5












    $begingroup$



    original answer:



    Don't exclude cases solely because they are below LLOQ! (lower limit of quantitation)




    • The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention to mark the concentration where the relative error of the analyses falls below 10 %.

    • Note that LLOQ is often computed assuming homescedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume different absolute error for cases below or above LLOQ. From that point of view, LLOQ is essentially just a way to express the absoute uncertainty of the analytical method in a concentration unit. (Like fuel economy in l/100 km vs. miles/gallon)

    • Even if analytical error is concentration dependent, two cases with true concentration almost the same but slightly below and above LLOQ have almost the same uncertainty.

    • (Left) censoring data (which is the technical term for excluding cases below LLOQ) leads to all kinds of complications in consecutive data analyses (and you'd need to use particular statistical methods that can deal with such data).

    • Say thank you to your clinical lab that they provide you with full data: I've met many people who have the opposite difficulty: getting a report that just says below LLOQ, and no possibility to recover any further information.


    Bottomline: never censor your data unless you have really, really good reasons for doing so.





    update about replacing below LLOQ results by $frac{LLOQ}{sqrt 2}$ and in view of the comments below:



    From the comments below I think that we may be using terms in slightly different ways - possibly due to coming from different fields:





    • linear range: concentration range where we get linear (can be relaxed to strictly monotonous) dependency of signal as function of concentration.

    • calibration range: concentration range covered by calibration and validation samples. Outside this [precisely: the range covered by validation], we don't really know how our method behaves.

      => I totally agree that something$^{TM}$ needs to be done if 1/8 of the samples are outside calibration range and those samples moreover are important to reach correct conclusions in the study.


    • LLOQ (aka LQ or LOQ) is a method performance characteristic. They can be inside or outside the calibration range. The most basic definition of LLOQ I'm aware of is specifying a relative error that must not be exceeded in the quantitation range. In my field, it is typically set to 10 % (but that can vary, and in fact should vary according to the requirements of the application. So the 10 % relative error to me is like the 5 % threshold for $p$ in significance testing, a default convention)



    I've encountered various replacement constants for $leq$ LLOQ concentrations, ranging from 0 over $frac{LLOQ}{2}$ to LLOQ and random values in the concentration range below LLOQ ( $frac{LLOQ}{sqrt 2}$ is new in that collection - what's the idea behind the $frac{1}{sqrt 2}$?). These are normally desperate attempts to impute values for censored data where not even (gu)estimates are available.




    • With the above definition of LLOQ and validation data available for the whole concentration range encountered in your samples, replacing concentrations below LLOQ would amount to throwing away most of the little information you have for those samples, and that's typically not what you want. See my answer to this related question for illustrations.

      Also, this wouldn't avoid the necessity for using statistical methods that work with censored data, so there really isn't much difference to excluding.


    • In your case, however, would it be possible to extend the validated concentration range somewhat further to the low side? Even if you don't extend the calibration range accordingly (which weuld IMHO be best).

      You say that excluding or not those 12 % of your samples has a large influence on your findings. So the idea here would be to rescue the study by establishing linearity and analytical error for a sufficiently large fraction of your samples to get stable estimates at the higher level (study question).

      While not being as good as having proper calibration from the beginning, the unexpected after all is something that has to be expected in research. With appropriate caution in the conclusions, this would IMHO be acceptable for early research (in contrast to later stages where better knowledge of expected concentration range would be available or work to establish this as analytical/clinical method).


    • There are situations where your calibration function starts with a plateau for low concentrations before reaching a suitable sensitivity in the linear range (e.g. a particular absolute amount of analyte is masked/lost due to adsorption, ...). In contrast to low concentrations outside calibration/validation range but still inside linear range, you basically cannot say anything for concentrations in that plateau.

      I think of this as a "wet lab/chemical censoring". In that case => use statistical methods for censored data.

      The higher level question here is whether your analytical method is fit for purpose.







    share|cite|improve this answer











    $endgroup$









    • 1




      $begingroup$
      @JamesPhillips: plus, from my chemist's world-view, that makes sense only if we actually have distinct subpopulations, i.e. clusters as opposed to a continuum where a rather arbitrary threshold cuts of a tail - in that case a regression is more sensible. Note that if you cut between clusters of cases, you have less of a thresholding/censoring problem. Whereas cutting "through the middle" of a single population leads to complications that are somewhat related to those of censoring.
      $endgroup$
      – cbeleites
      2 days ago






    • 1




      $begingroup$
      @Gux: There are some statements in you comment that are not sensible with the terminology I'm familiar with... so I'm not entirely sure I understand you correctly. E.g. "LLOQ was pre-specified and used for validation" -> LLOQ is one of the parameters/characteristics established (measured, estimated in the statistical sense) as part of [method] validation in my vocabulary. You can specify for method development that LLOQ better than x is needed - but it still needs to be established what the actual method capability is, and that is then the LLOQ you got.
      $endgroup$
      – cbeleites
      yesterday






    • 1




      $begingroup$
      LOD is established similarly to LLOQ (it is a different concentration, corresponding to a different application question - and as a rule of thumb/0th order approximation/rough guesstimate you can expect it to be around 1/3 of LLOQ). Yes, LOD can be sample (e.g. matrix) dependent. But in that case it doesn't make any sense to assume LLOQ not to be affected as well!?
      $endgroup$
      – cbeleites
      yesterday






    • 1




      $begingroup$
      What are your batches? (I'll tackle the LLOQ/$sqrt 2$ in an update to the answer.)
      $endgroup$
      – cbeleites
      yesterday








    • 1




      $begingroup$
      One more question: did the lab tell you their calibration range?
      $endgroup$
      – cbeleites
      yesterday
















    5












    $begingroup$



    original answer:



    Don't exclude cases solely because they are below LLOQ! (lower limit of quantitation)




    • The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention to mark the concentration where the relative error of the analyses falls below 10 %.

    • Note that LLOQ is often computed assuming homescedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume different absolute error for cases below or above LLOQ. From that point of view, LLOQ is essentially just a way to express the absoute uncertainty of the analytical method in a concentration unit. (Like fuel economy in l/100 km vs. miles/gallon)

    • Even if analytical error is concentration dependent, two cases with true concentration almost the same but slightly below and above LLOQ have almost the same uncertainty.

    • (Left) censoring data (which is the technical term for excluding cases below LLOQ) leads to all kinds of complications in consecutive data analyses (and you'd need to use particular statistical methods that can deal with such data).

    • Say thank you to your clinical lab that they provide you with full data: I've met many people who have the opposite difficulty: getting a report that just says below LLOQ, and no possibility to recover any further information.


    Bottomline: never censor your data unless you have really, really good reasons for doing so.





    update about replacing below LLOQ results by $frac{LLOQ}{sqrt 2}$ and in view of the comments below:



    From the comments below I think that we may be using terms in slightly different ways - possibly due to coming from different fields:





    • linear range: concentration range where we get linear (can be relaxed to strictly monotonous) dependency of signal as function of concentration.

    • calibration range: concentration range covered by calibration and validation samples. Outside this [precisely: the range covered by validation], we don't really know how our method behaves.

      => I totally agree that something$^{TM}$ needs to be done if 1/8 of the samples are outside calibration range and those samples moreover are important to reach correct conclusions in the study.


    • LLOQ (aka LQ or LOQ) is a method performance characteristic. They can be inside or outside the calibration range. The most basic definition of LLOQ I'm aware of is specifying a relative error that must not be exceeded in the quantitation range. In my field, it is typically set to 10 % (but that can vary, and in fact should vary according to the requirements of the application. So the 10 % relative error to me is like the 5 % threshold for $p$ in significance testing, a default convention)



    I've encountered various replacement constants for $leq$ LLOQ concentrations, ranging from 0 over $frac{LLOQ}{2}$ to LLOQ and random values in the concentration range below LLOQ ( $frac{LLOQ}{sqrt 2}$ is new in that collection - what's the idea behind the $frac{1}{sqrt 2}$?). These are normally desperate attempts to impute values for censored data where not even (gu)estimates are available.




    • With the above definition of LLOQ and validation data available for the whole concentration range encountered in your samples, replacing concentrations below LLOQ would amount to throwing away most of the little information you have for those samples, and that's typically not what you want. See my answer to this related question for illustrations.

      Also, this wouldn't avoid the necessity for using statistical methods that work with censored data, so there really isn't much difference to excluding.


    • In your case, however, would it be possible to extend the validated concentration range somewhat further to the low side? Even if you don't extend the calibration range accordingly (which weuld IMHO be best).

      You say that excluding or not those 12 % of your samples has a large influence on your findings. So the idea here would be to rescue the study by establishing linearity and analytical error for a sufficiently large fraction of your samples to get stable estimates at the higher level (study question).

      While not being as good as having proper calibration from the beginning, the unexpected after all is something that has to be expected in research. With appropriate caution in the conclusions, this would IMHO be acceptable for early research (in contrast to later stages where better knowledge of expected concentration range would be available or work to establish this as analytical/clinical method).


    • There are situations where your calibration function starts with a plateau for low concentrations before reaching a suitable sensitivity in the linear range (e.g. a particular absolute amount of analyte is masked/lost due to adsorption, ...). In contrast to low concentrations outside calibration/validation range but still inside linear range, you basically cannot say anything for concentrations in that plateau.

      I think of this as a "wet lab/chemical censoring". In that case => use statistical methods for censored data.

      The higher level question here is whether your analytical method is fit for purpose.







    share|cite|improve this answer











    $endgroup$









    • 1




      $begingroup$
      @JamesPhillips: plus, from my chemist's world-view, that makes sense only if we actually have distinct subpopulations, i.e. clusters as opposed to a continuum where a rather arbitrary threshold cuts of a tail - in that case a regression is more sensible. Note that if you cut between clusters of cases, you have less of a thresholding/censoring problem. Whereas cutting "through the middle" of a single population leads to complications that are somewhat related to those of censoring.
      $endgroup$
      – cbeleites
      2 days ago






    • 1




      $begingroup$
      @Gux: There are some statements in you comment that are not sensible with the terminology I'm familiar with... so I'm not entirely sure I understand you correctly. E.g. "LLOQ was pre-specified and used for validation" -> LLOQ is one of the parameters/characteristics established (measured, estimated in the statistical sense) as part of [method] validation in my vocabulary. You can specify for method development that LLOQ better than x is needed - but it still needs to be established what the actual method capability is, and that is then the LLOQ you got.
      $endgroup$
      – cbeleites
      yesterday






    • 1




      $begingroup$
      LOD is established similarly to LLOQ (it is a different concentration, corresponding to a different application question - and as a rule of thumb/0th order approximation/rough guesstimate you can expect it to be around 1/3 of LLOQ). Yes, LOD can be sample (e.g. matrix) dependent. But in that case it doesn't make any sense to assume LLOQ not to be affected as well!?
      $endgroup$
      – cbeleites
      yesterday






    • 1




      $begingroup$
      What are your batches? (I'll tackle the LLOQ/$sqrt 2$ in an update to the answer.)
      $endgroup$
      – cbeleites
      yesterday








    • 1




      $begingroup$
      One more question: did the lab tell you their calibration range?
      $endgroup$
      – cbeleites
      yesterday














    5












    5








    5





    $begingroup$



    original answer:



    Don't exclude cases solely because they are below LLOQ! (lower limit of quantitation)




    • The LLOQ is not a magic hard threshold below which nothing can be said. It is rather a convention to mark the concentration where the relative error of the analyses falls below 10 %.

    • Note that LLOQ is often computed assuming homoscedasticity, i.e. the absolute error being independent of the concentration. That is, you don't even assume a different absolute error for cases below or above LLOQ. From that point of view, LLOQ is essentially just a way to express the absolute uncertainty of the analytical method in a concentration unit. (Like fuel economy in l/100 km vs. miles/gallon)

    • Even if analytical error is concentration dependent, two cases with true concentration almost the same but slightly below and above LLOQ have almost the same uncertainty.

    • (Left) censoring data (which is the technical term for excluding cases below LLOQ) leads to all kinds of complications in consecutive data analyses (and you'd need to use particular statistical methods that can deal with such data).

    • Say thank you to your clinical lab that they provide you with full data: I've met many people who have the opposite difficulty: getting a report that just says below LLOQ, and no possibility to recover any further information.


    Bottomline: never censor your data unless you have really, really good reasons for doing so.
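    To make the censoring point concrete, here is a minimal sketch (my illustration, not part of the original answer) of the kind of statistical method that handles left-censored data: a maximum-likelihood fit of a lognormal concentration distribution in which below-LOD observations contribute only a censored term $P(X < LOD)$ to the likelihood. The distribution, the LOD value, and all numbers are made up for the demonstration; in the actual study, where the censored variable is a covariate in y ~ x, the same likelihood idea applies but the model is more involved.

    ```python
    import numpy as np
    from scipy import stats, optimize

    rng = np.random.default_rng(0)

    # Simulated "true" concentrations and a detection limit (hypothetical numbers).
    true = rng.lognormal(mean=1.0, sigma=0.6, size=500)
    lod = 1.5
    observed = true[true >= lod]      # fully quantified values
    n_cens = np.sum(true < lod)       # below-LOD: only the count is known

    def neg_loglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)     # parameterize so sigma stays positive
        # Log-likelihood of a left-censored lognormal sample:
        # observed values enter via the density, censored ones via the CDF at LOD.
        ll = stats.lognorm.logpdf(observed, s=sigma, scale=np.exp(mu)).sum()
        ll += n_cens * stats.lognorm.logcdf(lod, s=sigma, scale=np.exp(mu))
        return -ll

    res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    print(mu_hat, sigma_hat)  # should land near the generating values 1.0 and 0.6
    ```

    Note that neither dropping the censored cases nor substituting a constant for them is needed: the below-LOD observations carry real information (their count, and the fact that they are below LOD), and the censored likelihood uses exactly that.
    
    
    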





    update about replacing below-LLOQ results by $\frac{LLOQ}{\sqrt 2}$, in view of the comments below:



    From the comments below I think that we may be using terms in slightly different ways - possibly due to coming from different fields:





    • linear range: concentration range where we get linear (can be relaxed to strictly monotonous) dependency of signal as function of concentration.

    • calibration range: concentration range covered by calibration and validation samples. Outside this [precisely: the range covered by validation], we don't really know how our method behaves.

      => I totally agree that something$^{TM}$ needs to be done if 1/8 of the samples are outside calibration range and those samples moreover are important to reach correct conclusions in the study.


    • LLOQ (aka LQ or LOQ) is a method performance characteristic. It can be inside or outside the calibration range. The most basic definition of LLOQ I'm aware of specifies a relative error that must not be exceeded in the quantitation range. In my field, it is typically set to 10 % (but that can vary, and in fact should vary according to the requirements of the application - so the 10 % relative error to me is like the 5 % threshold for $p$ in significance testing, a default convention)



    I've encountered various replacement constants for $\leq$ LLOQ concentrations, ranging from 0 through $\frac{LLOQ}{2}$ to LLOQ itself, and random values in the concentration range below LLOQ ($\frac{LLOQ}{\sqrt 2}$ is new in that collection - what's the idea behind the $\frac{1}{\sqrt 2}$?). These are normally desperate attempts to impute values for censored data where not even (gu)estimates are available.




    • With the above definition of LLOQ and validation data available for the whole concentration range encountered in your samples, replacing concentrations below LLOQ would amount to throwing away most of the little information you have for those samples, and that's typically not what you want. See my answer to this related question for illustrations.

      Also, this wouldn't avoid the necessity of using statistical methods that work with censored data, so there really isn't much difference from excluding them.


    • In your case, however, would it be possible to extend the validated concentration range somewhat further to the low side? Even if you don't extend the calibration range accordingly (though doing so would IMHO be best).

      You say that excluding or not those 12 % of your samples has a large influence on your findings. So the idea here would be to rescue the study by establishing linearity and analytical error for a sufficiently large fraction of your samples to get stable estimates at the higher level (study question).

      While not being as good as having proper calibration from the beginning, the unexpected after all is something that has to be expected in research. With appropriate caution in the conclusions, this would IMHO be acceptable for early research (in contrast to later stages where better knowledge of expected concentration range would be available or work to establish this as analytical/clinical method).


    • There are situations where your calibration function starts with a plateau for low concentrations before reaching a suitable sensitivity in the linear range (e.g. a particular absolute amount of analyte is masked/lost due to adsorption, ...). In contrast to low concentrations outside calibration/validation range but still inside linear range, you basically cannot say anything for concentrations in that plateau.

      I think of this as a "wet lab/chemical censoring". In that case => use statistical methods for censored data.

      The higher level question here is whether your analytical method is fit for purpose.


















    $endgroup$













    edited 19 hours ago

























    answered 2 days ago









    cbeleites




















    $begingroup$

    Suppose you have an X measured with error having a standard deviation of 100. If X is measured to be 1000, then the expected true value (assuming a well-calibrated measurement system) is around 1000. But suppose the measured amount is 110. It's possible that the actual value is 310 - that is only two standard deviations away. But if X is something that can't be negative, the actual value can't lie two standard deviations below the measured value. There is thus likely a slight bias: the expected true value of X, given a measured value of 110, is slightly more than 110. As the measured value gets smaller and smaller, this bias gets larger and larger.



    Whether this happens, and to what degree, depends on a lot of factors, such as what the distribution of actual values is and how the measurement error acts. But the bottom line is that including measured values below the LLOQ quite possibly can harm the validity of the regression. Unfortunately, removing them does not eliminate the problem, and can make it worse. You'll need to look at how the error is being modeled, what assumptions are being made about the actual distribution, and what methods there are to compensate. That removing them inverts the relationship certainly is a red flag that care needs to be taken in dealing with them.
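    This thought experiment is easy to check by simulation. The sketch below is my illustration, not part of the answer; the lognormal shape and all numbers are arbitrary, chosen only so that true values are non-negative and measured values near 110 are mostly produced by larger true values plus negative noise. It shows that the conditional bias, mean(true) - mean(measured) within a narrow band of measured values, is large at low measured values and much smaller at high ones.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Non-negative true values with additive measurement noise of sd = 100,
    # mirroring the example above; the lognormal parameters are arbitrary.
    true = rng.lognormal(mean=6.0, sigma=0.5, size=200_000)  # median = exp(6), about 403
    measured = true + rng.normal(0.0, 100.0, size=true.size)

    for lo, hi in [(60, 160), (950, 1050)]:
        sel = (measured > lo) & (measured < hi)
        bias = true[sel].mean() - measured[sel].mean()
        print(f"measured in ({lo}, {hi}): mean(true) - mean(measured) = {bias:+.1f}")
    ```

    The low band shows a large positive bias (true values there are mostly well above the measurements), while the high band shows only a small one - which is exactly why low measured values are the problematic region for the regression.
    
    
    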















    $endgroup$


















        answered 2 days ago









        Acccumulation





























