Sparse 3D binning/histogram function
up vote
8
down vote
favorite
This is a function written in Python I wrote to process galaxy catalogs for cosmology research, however it is more generally applicable than just my pipeline.
The goal is to take a Python list of coordinates in 3D, optionally with associated weights, define a cubic grid over the coordinates which divides the points into a variable number of "bins" or cells, and then computes the total number of points in each bin (or the sum of the weights). These sums are then associated to either the coordinates or indices of the bin.
Obviously there are existing functions in Python libraries which do this, namely numpy.histogram, scipy.stats.histogram, and less obviously numpy.unique. The reason I didn't use those is that I had to process huge catalogs of ${sim}10^6$ galaxies or more, and I wanted to make fairly small bins. The histogram functions store empty bins in memory, so I would often run out of memory trying to store huge numpy arrays of mostly zeros. numpy.unique avoids this, but it cannot handle summing weights instead of just counts.
So I created this function, abusing using the defaultdict subclass of the native Python dictionary to gain the summing feature. I found it to be both fast enough and that it solved my memory problems, but I am interested in improving it.
from collections import defaultdict
"""Accepts a python list of 3D spatial points, e.g. [[x1,y1,z1],...],
optionally with weights e.g. [[x1,x2,x3,w1],...], and returns the sparse
histogram (i.e. no empty bins) with bins of resolution (spacing) given by
res.
The weights option allows you to chose to histogram over counts
instead of weights (equivalent to all weights being 1).
The bin_index option lets you return the points with their bin indices
(the integers representing how many bins in each direction to walk to
find the specified bin) rather than centerpoint coordinates."""
def sparse_hist(points, res, weights=True, bin_index=False):
def _binindex(point):
point = point[:3]
bi = [int(x//res) for x in point]
bi = tuple(bi)
return bi
def _bincenter(point):
point = point[:3]
bc = [(x//res+0.5)*res for x in point]
bc = tuple(bc)
return bc
if not bin_index:
if weights:
pointlist = [(_bincenter(x), x[3]) for x in points]
else:
pointlist = [(_bincenter(x), 1) for x in points]
else:
if weights:
pointlist = [(_binindex(x), x[3]) for x in points]
else:
pointlist = [(_binindex(x), 1) for x in points]
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
return pointdict
python python-3.x statistics memory-optimization
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
add a comment |
up vote
8
down vote
favorite
This is a function written in Python I wrote to process galaxy catalogs for cosmology research, however it is more generally applicable than just my pipeline.
The goal is to take a Python list of coordinates in 3D, optionally with associated weights, define a cubic grid over the coordinates which divides the points into a variable number of "bins" or cells, and then computes the total number of points in each bin (or the sum of the weights). These sums are then associated to either the coordinates or indices of the bin.
Obviously there are existing functions in Python libraries which do this, namely numpy.histogram, scipy.stats.histogram, and less obviously numpy.unique. The reason I didn't use those is that I had to process huge catalogs of ${sim}10^6$ galaxies or more, and I wanted to make fairly small bins. The histogram functions store empty bins in memory, so I would often run out of memory trying to store huge numpy arrays of mostly zeros. numpy.unique avoids this, but it cannot handle summing weights instead of just counts.
So I created this function, abusing using the defaultdict subclass of the native Python dictionary to gain the summing feature. I found it to be both fast enough and that it solved my memory problems, but I am interested in improving it.
from collections import defaultdict
"""Accepts a python list of 3D spatial points, e.g. [[x1,y1,z1],...],
optionally with weights e.g. [[x1,x2,x3,w1],...], and returns the sparse
histogram (i.e. no empty bins) with bins of resolution (spacing) given by
res.
The weights option allows you to chose to histogram over counts
instead of weights (equivalent to all weights being 1).
The bin_index option lets you return the points with their bin indices
(the integers representing how many bins in each direction to walk to
find the specified bin) rather than centerpoint coordinates."""
def sparse_hist(points, res, weights=True, bin_index=False):
def _binindex(point):
point = point[:3]
bi = [int(x//res) for x in point]
bi = tuple(bi)
return bi
def _bincenter(point):
point = point[:3]
bc = [(x//res+0.5)*res for x in point]
bc = tuple(bc)
return bc
if not bin_index:
if weights:
pointlist = [(_bincenter(x), x[3]) for x in points]
else:
pointlist = [(_bincenter(x), 1) for x in points]
else:
if weights:
pointlist = [(_binindex(x), x[3]) for x in points]
else:
pointlist = [(_binindex(x), 1) for x in points]
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
return pointdict
python python-3.x statistics memory-optimization
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago
add a comment |
up vote
8
down vote
favorite
up vote
8
down vote
favorite
This is a function written in Python I wrote to process galaxy catalogs for cosmology research, however it is more generally applicable than just my pipeline.
The goal is to take a Python list of coordinates in 3D, optionally with associated weights, define a cubic grid over the coordinates which divides the points into a variable number of "bins" or cells, and then computes the total number of points in each bin (or the sum of the weights). These sums are then associated to either the coordinates or indices of the bin.
Obviously there are existing functions in Python libraries which do this, namely numpy.histogram, scipy.stats.histogram, and less obviously numpy.unique. The reason I didn't use those is that I had to process huge catalogs of ${sim}10^6$ galaxies or more, and I wanted to make fairly small bins. The histogram functions store empty bins in memory, so I would often run out of memory trying to store huge numpy arrays of mostly zeros. numpy.unique avoids this, but it cannot handle summing weights instead of just counts.
So I created this function, abusing using the defaultdict subclass of the native Python dictionary to gain the summing feature. I found it to be both fast enough and that it solved my memory problems, but I am interested in improving it.
from collections import defaultdict
"""Accepts a python list of 3D spatial points, e.g. [[x1,y1,z1],...],
optionally with weights e.g. [[x1,x2,x3,w1],...], and returns the sparse
histogram (i.e. no empty bins) with bins of resolution (spacing) given by
res.
The weights option allows you to chose to histogram over counts
instead of weights (equivalent to all weights being 1).
The bin_index option lets you return the points with their bin indices
(the integers representing how many bins in each direction to walk to
find the specified bin) rather than centerpoint coordinates."""
def sparse_hist(points, res, weights=True, bin_index=False):
def _binindex(point):
point = point[:3]
bi = [int(x//res) for x in point]
bi = tuple(bi)
return bi
def _bincenter(point):
point = point[:3]
bc = [(x//res+0.5)*res for x in point]
bc = tuple(bc)
return bc
if not bin_index:
if weights:
pointlist = [(_bincenter(x), x[3]) for x in points]
else:
pointlist = [(_bincenter(x), 1) for x in points]
else:
if weights:
pointlist = [(_binindex(x), x[3]) for x in points]
else:
pointlist = [(_binindex(x), 1) for x in points]
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
return pointdict
python python-3.x statistics memory-optimization
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
This is a function written in Python I wrote to process galaxy catalogs for cosmology research, however it is more generally applicable than just my pipeline.
The goal is to take a Python list of coordinates in 3D, optionally with associated weights, define a cubic grid over the coordinates which divides the points into a variable number of "bins" or cells, and then computes the total number of points in each bin (or the sum of the weights). These sums are then associated to either the coordinates or indices of the bin.
Obviously there are existing functions in Python libraries which do this, namely numpy.histogram, scipy.stats.histogram, and less obviously numpy.unique. The reason I didn't use those is that I had to process huge catalogs of ${sim}10^6$ galaxies or more, and I wanted to make fairly small bins. The histogram functions store empty bins in memory, so I would often run out of memory trying to store huge numpy arrays of mostly zeros. numpy.unique avoids this, but it cannot handle summing weights instead of just counts.
So I created this function, abusing using the defaultdict subclass of the native Python dictionary to gain the summing feature. I found it to be both fast enough and that it solved my memory problems, but I am interested in improving it.
from collections import defaultdict
"""Accepts a python list of 3D spatial points, e.g. [[x1,y1,z1],...],
optionally with weights e.g. [[x1,x2,x3,w1],...], and returns the sparse
histogram (i.e. no empty bins) with bins of resolution (spacing) given by
res.
The weights option allows you to chose to histogram over counts
instead of weights (equivalent to all weights being 1).
The bin_index option lets you return the points with their bin indices
(the integers representing how many bins in each direction to walk to
find the specified bin) rather than centerpoint coordinates."""
def sparse_hist(points, res, weights=True, bin_index=False):
def _binindex(point):
point = point[:3]
bi = [int(x//res) for x in point]
bi = tuple(bi)
return bi
def _bincenter(point):
point = point[:3]
bc = [(x//res+0.5)*res for x in point]
bc = tuple(bc)
return bc
if not bin_index:
if weights:
pointlist = [(_bincenter(x), x[3]) for x in points]
else:
pointlist = [(_bincenter(x), 1) for x in points]
else:
if weights:
pointlist = [(_binindex(x), x[3]) for x in points]
else:
pointlist = [(_binindex(x), 1) for x in points]
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
return pointdict
python python-3.x statistics memory-optimization
python python-3.x statistics memory-optimization
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
edited Dec 3 at 10:25
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
asked Dec 3 at 0:30
Davis
434
434
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
New contributor
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Davis is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago
add a comment |
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago
add a comment |
2 Answers
2
active
oldest
votes
up vote
5
down vote
accepted
Since you don't care about the distinct values, only the sum in each bin, you can avoid one of these for loops:
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
And instead sum directly:
histogram = defaultdict(int)
for i, weight in pointlist:
histogram[i] += weight
I also renamed the variables so it is a bit clearer what they represent.
Instead of making pointlist a list, rename it to points (or something else if you want to avoid reusing the name) and make it a generator. Currently your memory requirements are $mathcal{O}(n+m)$, where $n$ is the number of points you have (since you store them in an intermediate list) and $m$ is the number of bins (in your output dictionary). If you used a generator, this would drop to $mathcal{O}(m)$.
points = ((_bincenter(x), x[3]) for x in points)
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
add a comment |
up vote
7
down vote
Some suggestions:
- Passing the code through at least one linter such as flake8 or pycodestyle will make for more idiomatic and therefore readable code.
- Do you have any automated tests? If so, do they test all four combinations of
weightsandbin_index? - Do you really need
weightsandbin_indexto be defaulted? I'm generally suspicious of code which defaults parameter values without a very clear reason.
Functions are first class citizens in Python, so rather than your nested
ifstatements you can decide which method to call and what the second parameter should be with two unnestedifstatements:
if bin_index:
binning_function = _bincenter
else:
…
if weights:
…
- Naming is very important. I can't easily follow your code because things are named for what they are and not for what they contain or what they are used for. For example,
pointdicttells me nothing about its contents andrescould be short for “result”, “response” or something else entirely. - Reusing variables is generally discouraged - they make the information flow confusing, and they should be optimized away by the interpreter.
- I don't remember the precedence rules other than for simple arithmetic by heart, so I'd enforce it using parentheses in expressions like
x//res+0.5.
Rather than using a defaultdict(list) you can just pointdict.get(key, ) if you know the dict is sparse. This is purposely not a suggestion because whether it's appropriate or not depends on the programming environment, but it is an option.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
accepted
Since you don't care about the distinct values, only the sum in each bin, you can avoid one of these for loops:
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
And instead sum directly:
histogram = defaultdict(int)
for i, weight in pointlist:
histogram[i] += weight
I also renamed the variables so it is a bit clearer what they represent.
Instead of making pointlist a list, rename it to points (or something else if you want to avoid reusing the name) and make it a generator. Currently your memory requirements are $mathcal{O}(n+m)$, where $n$ is the number of points you have (since you store them in an intermediate list) and $m$ is the number of bins (in your output dictionary). If you used a generator, this would drop to $mathcal{O}(m)$.
points = ((_bincenter(x), x[3]) for x in points)
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
add a comment |
up vote
5
down vote
accepted
Since you don't care about the distinct values, only the sum in each bin, you can avoid one of these for loops:
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
And instead sum directly:
histogram = defaultdict(int)
for i, weight in pointlist:
histogram[i] += weight
I also renamed the variables so it is a bit clearer what they represent.
Instead of making pointlist a list, rename it to points (or something else if you want to avoid reusing the name) and make it a generator. Currently your memory requirements are $mathcal{O}(n+m)$, where $n$ is the number of points you have (since you store them in an intermediate list) and $m$ is the number of bins (in your output dictionary). If you used a generator, this would drop to $mathcal{O}(m)$.
points = ((_bincenter(x), x[3]) for x in points)
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
add a comment |
up vote
5
down vote
accepted
up vote
5
down vote
accepted
Since you don't care about the distinct values, only the sum in each bin, you can avoid one of these for loops:
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
And instead sum directly:
histogram = defaultdict(int)
for i, weight in pointlist:
histogram[i] += weight
I also renamed the variables so it is a bit clearer what they represent.
Instead of making pointlist a list, rename it to points (or something else if you want to avoid reusing the name) and make it a generator. Currently your memory requirements are $mathcal{O}(n+m)$, where $n$ is the number of points you have (since you store them in an intermediate list) and $m$ is the number of bins (in your output dictionary). If you used a generator, this would drop to $mathcal{O}(m)$.
points = ((_bincenter(x), x[3]) for x in points)
Since you don't care about the distinct values, only the sum in each bin, you can avoid one of these for loops:
pointdict = defaultdict(list)
for k,v in pointlist:
pointdict[k].append(v)
for key,val in pointdict.items():
val = sum(val)
pointdict.update({key:val})
And instead sum directly:
histogram = defaultdict(int)
for i, weight in pointlist:
histogram[i] += weight
I also renamed the variables so it is a bit clearer what they represent.
Instead of making pointlist a list, rename it to points (or something else if you want to avoid reusing the name) and make it a generator. Currently your memory requirements are $mathcal{O}(n+m)$, where $n$ is the number of points you have (since you store them in an intermediate list) and $m$ is the number of bins (in your output dictionary). If you used a generator, this would drop to $mathcal{O}(m)$.
points = ((_bincenter(x), x[3]) for x in points)
edited Dec 3 at 13:54
answered Dec 3 at 9:41
Graipher
22.8k53384
22.8k53384
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
add a comment |
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
I'm accepting this answer due to it not only improving my readability and structure but also my algorithm, however both answers have contributed significantly.
– Davis
yesterday
add a comment |
up vote
7
down vote
Some suggestions:
- Passing the code through at least one linter such as flake8 or pycodestyle will make for more idiomatic and therefore readable code.
- Do you have any automated tests? If so, do they test all four combinations of
weightsandbin_index? - Do you really need
weightsandbin_indexto be defaulted? I'm generally suspicious of code which defaults parameter values without a very clear reason.
Functions are first class citizens in Python, so rather than your nested
ifstatements you can decide which method to call and what the second parameter should be with two unnestedifstatements:
if bin_index:
binning_function = _bincenter
else:
…
if weights:
…
- Naming is very important. I can't easily follow your code because things are named for what they are and not for what they contain or what they are used for. For example,
pointdicttells me nothing about its contents andrescould be short for “result”, “response” or something else entirely. - Reusing variables is generally discouraged - they make the information flow confusing, and they should be optimized away by the interpreter.
- I don't remember the precedence rules other than for simple arithmetic by heart, so I'd enforce it using parentheses in expressions like
x//res+0.5.
Rather than using a defaultdict(list) you can just pointdict.get(key, ) if you know the dict is sparse. This is purposely not a suggestion because whether it's appropriate or not depends on the programming environment, but it is an option.
add a comment |
up vote
7
down vote
Some suggestions:
- Passing the code through at least one linter such as flake8 or pycodestyle will make for more idiomatic and therefore readable code.
- Do you have any automated tests? If so, do they test all four combinations of
weightsandbin_index? - Do you really need
weightsandbin_indexto be defaulted? I'm generally suspicious of code which defaults parameter values without a very clear reason.
Functions are first class citizens in Python, so rather than your nested
ifstatements you can decide which method to call and what the second parameter should be with two unnestedifstatements:
if bin_index:
binning_function = _bincenter
else:
…
if weights:
…
- Naming is very important. I can't easily follow your code because things are named for what they are and not for what they contain or what they are used for. For example,
pointdicttells me nothing about its contents andrescould be short for “result”, “response” or something else entirely. - Reusing variables is generally discouraged - they make the information flow confusing, and they should be optimized away by the interpreter.
- I don't remember the precedence rules other than for simple arithmetic by heart, so I'd enforce it using parentheses in expressions like
x//res+0.5.
Rather than using a defaultdict(list) you can just pointdict.get(key, ) if you know the dict is sparse. This is purposely not a suggestion because whether it's appropriate or not depends on the programming environment, but it is an option.
add a comment |
up vote
7
down vote
up vote
7
down vote
Some suggestions:
- Passing the code through at least one linter such as flake8 or pycodestyle will make for more idiomatic and therefore readable code.
- Do you have any automated tests? If so, do they test all four combinations of
weightsandbin_index? - Do you really need
weightsandbin_indexto be defaulted? I'm generally suspicious of code which defaults parameter values without a very clear reason.
Functions are first class citizens in Python, so rather than your nested
ifstatements you can decide which method to call and what the second parameter should be with two unnestedifstatements:
if bin_index:
binning_function = _bincenter
else:
…
if weights:
…
- Naming is very important. I can't easily follow your code because things are named for what they are and not for what they contain or what they are used for. For example,
pointdicttells me nothing about its contents andrescould be short for “result”, “response” or something else entirely. - Reusing variables is generally discouraged - they make the information flow confusing, and they should be optimized away by the interpreter.
- I don't remember the precedence rules other than for simple arithmetic by heart, so I'd enforce it using parentheses in expressions like
x//res+0.5.
Rather than using a defaultdict(list) you can just pointdict.get(key, ) if you know the dict is sparse. This is purposely not a suggestion because whether it's appropriate or not depends on the programming environment, but it is an option.
Some suggestions:
- Passing the code through at least one linter such as flake8 or pycodestyle will make for more idiomatic and therefore readable code.
- Do you have any automated tests? If so, do they test all four combinations of
weightsandbin_index? - Do you really need
weightsandbin_indexto be defaulted? I'm generally suspicious of code which defaults parameter values without a very clear reason.
Functions are first class citizens in Python, so rather than your nested
ifstatements you can decide which method to call and what the second parameter should be with two unnestedifstatements:
if bin_index:
binning_function = _bincenter
else:
…
if weights:
…
- Naming is very important. I can't easily follow your code because things are named for what they are and not for what they contain or what they are used for. For example,
pointdicttells me nothing about its contents andrescould be short for “result”, “response” or something else entirely. - Reusing variables is generally discouraged - they make the information flow confusing, and they should be optimized away by the interpreter.
- I don't remember the precedence rules other than for simple arithmetic by heart, so I'd enforce it using parentheses in expressions like
x//res+0.5.
Rather than using a defaultdict(list) you can just pointdict.get(key, ) if you know the dict is sparse. This is purposely not a suggestion because whether it's appropriate or not depends on the programming environment, but it is an option.
answered Dec 3 at 2:08
l0b0
4,112923
4,112923
add a comment |
add a comment |
Davis is a new contributor. Be nice, and check out our Code of Conduct.
Davis is a new contributor. Be nice, and check out our Code of Conduct.
Davis is a new contributor. Be nice, and check out our Code of Conduct.
Davis is a new contributor. Be nice, and check out our Code of Conduct.
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208899%2fsparse-3d-binning-histogram-function%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Have you considered using some kind of kernel density estimation? This generally gives more useful results than a histogram for sparse, only locally clustered data.
– leftaroundabout
Dec 3 at 16:50
@leftaroundabout I haven't been using this code for research for some time, but the original purpose was so that once one had some binned approximation of the catalog, one could map the weights through a function tailored to upweighting particular features in the statistics (e.g. two point correlation function), perhaps enhancing the Fisher information contained in the statistics. I am not sure whether a KDE could be used for such a purpose, but it's not impossible.
– Davis
2 days ago