Based on the ideas of Parameter Estimation and Fitting Probability Distributions, what stops us from making...


























Currently I am doing an introduction to parameter estimation and fitting probability distributions to sets of data. In a small synopsis, my understanding of the whole process is the following:



1) We collect a large amount of raw data, which comes from an underlying probability distribution. We then "graph" the data (perhaps in the form of a bar chart or something similar at least in the 2D and 3D cases).



2) Observing this visual presentation we go through our list of existing probability distributions and form an opinion on which distribution appears to fit the data most precisely.



3) We then take a large sample from this data and attempt to estimate the parameters of our chosen probability distribution by using the array of techniques available at our disposal.
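The three steps above can be sketched end to end in code. This is a minimal illustration only, assuming (for the sake of the example) that the underlying distribution happens to be exponential:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# 1) Collect raw data (here: simulated from an exponential distribution
#    with true scale 2.0, standing in for real measurements).
data = rng.exponential(scale=2.0, size=1000)

# 2) "Graph" the data: a histogram summarises its shape.
counts, edges = np.histogram(data, bins=30)

# 3) Estimate the parameters of the chosen distribution by maximum
#    likelihood (scipy.stats.expon.fit, with the location fixed at 0).
loc, scale = stats.expon.fit(data, floc=0)
print(round(scale, 2))  # close to the true scale of 2.0
```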



I have a few questions:



i) Is the outline above the procedure used to get parameter estimates?



ii) (more important) What stops us from making any function a probability distribution? What I mean is: we have this visual representation of the data, and perhaps none of the presently known probability distributions aligns with the data. What stops us from just saying "this continuous function will now be a distribution, as long as it satisfies the necessary axioms"? Is there something more rigorous to this? (Perhaps I just haven't arrived there yet in my studies.)































    A probability distribution does not need to be one of the "named" distributions. Any nonnegative function that integrates to 1 can be used as a probability density. Maybe you should look into nonparametric methods.
    – kjetil b halvorsen
    yesterday
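The comment above can be made concrete: any nonnegative function with a finite integral can be turned into a density by dividing by that integral. A minimal sketch using numerical integration (the function f below is an arbitrary made-up example):

```python
import numpy as np
from scipy.integrate import quad

# An arbitrary nonnegative function on [0, 3] (made up for illustration).
def f(x):
    return x**2 * np.exp(-x)

# Normalising constant: the integral of f over its support.
Z, _ = quad(f, 0.0, 3.0)

# Dividing by Z yields a valid probability density on [0, 3].
def pdf(x):
    return f(x) / Z

# Check: the normalised function is nonnegative and integrates to 1.
total, _ = quad(pdf, 0.0, 3.0)
print(round(total, 6))  # 1.0
```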
















Tags: distributions, fitting, estimators, theory






asked yesterday by dc3rd


















1 Answer






































1) We collect a large amount of raw data, which comes from an
underlying probability distribution. We then "graph" the data (perhaps
in the form of a bar chart or something similar at least in the 2D and
3D cases).




If the data has more than two dimensions (usually the case), then you cannot graph it. You can graph only the marginal distributions, not the joint distribution.




2) Observing this visual presentation we go through our list of
existing probability distributions and form an opinion on which
distribution appears to fit the data most precisely.




No. First of all, as stated above, graphs don't tell you the whole story. Second, many distributions can look very similar. Third, there is no such thing as a "list of existing distributions". You can go through a list of popular distributions, but the list of all possible distributions is infinite (you can come up with your own distribution, or define mixtures of any number of distributions -- this alone makes the list infinite).



Usually, based on what you know about the data (plots, summary statistics, knowledge of what the data represents and how it was collected), you choose one or a few distributions that make sense for the data. For example, if it is a count of independent binary outcomes in a fixed number of trials, then most likely you will use a binomial distribution. To better understand when particular distributions make sense, you can check the Statistics 110 lectures by Joe Blitzstein.
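For the binomial case mentioned above, parameter estimation is especially simple: with the number of trials n known, the maximum-likelihood estimate of the success probability is just the pooled success fraction. A quick sketch on made-up data:

```python
import numpy as np

rng = np.random.default_rng(1)

n_trials = 20                 # known number of trials per observation
true_p = 0.3                  # the parameter we pretend not to know
counts = rng.binomial(n_trials, true_p, size=500)

# For Binomial(n, p) with n known, the maximum-likelihood estimate of p
# is total successes divided by total trials.
p_hat = counts.sum() / (n_trials * len(counts))
print(round(p_hat, 3))  # close to 0.3
```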



Moreover, even if you tried several different distributions, you would compare them not by how the data looks, but by model fit statistics (see questions tagged as model-selection).
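One common fit statistic is AIC, computed from a maximum-likelihood fit. A sketch of comparing candidate distributions this way rather than by eye (the gamma-distributed data are simulated for illustration, and the fixed location is counted as a parameter here, which is close enough for a sketch):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.gamma(shape=2.0, scale=1.5, size=1000)

def aic(dist, data, **fit_kwargs):
    # Fit by maximum likelihood, then AIC = 2k - 2 * log-likelihood.
    params = dist.fit(data, **fit_kwargs)
    loglik = np.sum(dist.logpdf(data, *params))
    return 2 * len(params) - 2 * loglik

candidates = {
    "gamma": aic(stats.gamma, data, floc=0),
    "expon": aic(stats.expon, data, floc=0),
    "norm": aic(stats.norm, data),
}
best = min(candidates, key=candidates.get)
print(best)  # the gamma fit should win on these data
```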




3) We then take a large sample from this data and attempt to estimate
the parameters of our chosen probability distribution by using the
array of techniques available at our disposal.




Generally yes, if possible.




ii) (more important) What stops us from making any function a
probability distribution? What I mean is we have this visual
representation of the data, perhaps none of the known probability
distributions that we have presently align with the data. What stops
us from just saying "this continuous function will now be a
distribution as long as it satisfies the necessary axioms." Is there
something more rigorous to this? (Perhaps I just haven't arrived
there yet in my studies).




If the function satisfies the mathematical definition of a probability density function, or a probability mass function, then it is one. But usually the goal is not to find a distribution that looks exactly like your data. If you wanted that, you would use the empirical distribution function, or something like a kernel density estimate, that follows your data closely. In most cases we choose simpler distributions that look only approximately like the empirical distribution. We use distributions to build simplified models of reality that can extend beyond the data you collected. Said differently, you don't want the distribution to overfit your data.
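This contrast, a flexible estimate that tracks the data versus a simple parametric summary, can be sketched with scipy's kernel density estimator next to a fitted normal (the data are simulated for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(loc=5.0, scale=2.0, size=400)

# Nonparametric: a kernel density estimate follows the data closely.
kde = stats.gaussian_kde(data)

# Parametric: a two-parameter normal fit is a much simpler summary.
mu, sigma = stats.norm.fit(data)

# Both yield a density we can evaluate at any point, e.g. at x = 5:
x = 5.0
print(round(float(kde(x)[0]), 3), round(stats.norm.pdf(x, mu, sigma), 3))
```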



Here you can find an example: What is meant by using a probability distribution to model the output data for a regression problem?




























    Thank you very much for this explanation. It really does provide me with the clarity that I was looking for.
    – dc3rd
    yesterday











answered yesterday by Tim







