How can I calculate the probability for each NumPy row at once?

I have a function for calculating the probability, shown below:

import math
import numpy as np

def multinormpdf(x, mu, var):  # density of a multivariate Gaussian at point x
    k = len(x)
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = math.sqrt(((2*math.pi)**k)*det)
    numerator = np.dot((x - mu).transpose(), inv)
    numerator = np.dot(numerator, (x - mu))
    numerator = math.exp(-0.5 * numerator)
    return numerator/denominator


I also have a mean vector, a covariance matrix, and a 2D NumPy array of test data:

mu = np.array([100, 105, 42])    # mean vector
var = np.array([[100, 124, 11],  # covariance matrix
                [124, 150, 44],
                [11, 44, 130]])

arr = np.array([[42, 234, 124],  # arr is a 43923794 x 3 matrix
                [123, 222, 112],
                [42, 213, 11],
                # ... (about 40,000,000 rows in total)
                [23, 55, 251]])


I have to calculate the probability for each row, so I used this code:

for i in arr:
    print(multinormpdf(i, mu, var))  # mu and var are already known


But it is very slow.

Is there a faster way to calculate the probabilities?
Or is there a way to calculate them for the whole test array at once, as a batch?
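
One thing that may be worth checking before optimizing: the Gaussian density is only defined when the covariance matrix is symmetric and positive definite. The sample `var` shown in the question does not seem to satisfy this (its determinant is negative), in which case `math.sqrt` would raise a `ValueError`. A minimal sketch of such a check, assuming NumPy is available; the helper name `is_valid_covariance` is made up for illustration:

```python
import numpy as np

def is_valid_covariance(cov):
    """Return True if cov is a square, symmetric, positive-definite matrix."""
    cov = np.asarray(cov, dtype=float)
    if cov.ndim != 2 or cov.shape[0] != cov.shape[1]:
        return False
    if not np.allclose(cov, cov.T):
        return False
    try:
        # For symmetric input, the Cholesky factorization succeeds
        # only when the matrix is positive definite.
        np.linalg.cholesky(cov)
    except np.linalg.LinAlgError:
        return False
    return True

var = np.array([[100, 124, 11],
                [124, 150, 44],
                [11, 44, 130]])

print(is_valid_covariance(var))        # False: the leading 2x2 block has determinant -376
print(is_valid_covariance(np.eye(3)))  # True
```

If the check fails on real data, the covariance estimate itself probably needs another look before any speed-up matters.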










  • Where does normpdf come from?
    – Nils Werner, Nov 21 '18 at 21:54

  • @Nils Werner I think that is not important. But I updated code for normpdf
    – YeongHwa Jin, Nov 22 '18 at 2:36

  • Can you give a complete example with input and output?
    – Nils Werner, Nov 22 '18 at 7:17

  • @Nils Werner I uploaded more explanation
    – YeongHwa Jin, Nov 22 '18 at 12:04

  • Your code is not valid Python code, and does not work, even after fixing the syntax issues. Please post a proper MVCE!
    – Nils Werner, Nov 22 '18 at 12:09


















python python-3.x probability






Question asked Nov 21 '18 at 16:28 by YeongHwa Jin, edited Nov 29 '18 at 14:05 by YeongHwa Jin














2 Answers














You can vectorize your function easily:

import numpy as np

def fast_multinormpdf(x, mu, var):
    mu = np.asarray(mu)
    var = np.asarray(var)
    k = x.shape[-1]                    # dimensionality of each sample
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = np.sqrt(((2*np.pi)**k)*det)
    numerator = np.dot((x - mu), inv)  # each row times the inverse covariance
    numerator = np.sum((x - mu) * numerator, axis=-1)  # row-wise quadratic form
    numerator = np.exp(-0.5 * numerator)
    return numerator/denominator


arr = np.array([[42, 234, 124],
                [123, 222, 112],
                [42, 213, 11],
                [42, 213, 11]])

mu = [0, 0, 1]
var = [[1, 100, 100],
       [100, 1, 100],
       [100, 100, 1]]

slow_out = np.array([multinormpdf(i, mu, var) for i in arr])
fast_out = fast_multinormpdf(arr, mu, var)

np.allclose(slow_out, fast_out)  # True
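
For rows far from the mean, the density itself can underflow to 0.0, so it is sometimes more convenient to work with the log-density instead. A minimal sketch, assuming a symmetric positive-definite covariance and a 2D input; `multinorm_logpdf`, the Cholesky-based formulation, and the sample `cov` are illustrative assumptions rather than part of `fast_multinormpdf`:

```python
import numpy as np

def multinorm_logpdf(x, mu, cov):
    """Log-density of N(mu, cov) for each row of x, where x has shape (n, k)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = x.shape[-1]
    L = np.linalg.cholesky(cov)                 # cov = L @ L.T, L lower triangular
    z = np.linalg.solve(L, (x - mu).T)          # whitened residuals, shape (k, n)
    maha = np.sum(z * z, axis=0)                # squared Mahalanobis distance per row
    log_det = 2.0 * np.sum(np.log(np.diag(L)))  # log(det(cov)) from the Cholesky factor
    return -0.5 * (k * np.log(2.0 * np.pi) + log_det + maha)

mu = np.array([100.0, 105.0, 42.0])
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 0.5],
                [0.0, 0.5, 2.0]])   # hypothetical positive-definite covariance
pts = np.array([[100.0, 105.0, 42.0],
                [101.0, 104.0, 43.0],
                [500.0, 500.0, 500.0]])

logp = multinorm_logpdf(pts, mu, cov)
# The far-away third row underflows to 0.0 as a plain density,
# but its log-density stays finite.
print(np.exp(logp[2]), np.isfinite(logp[2]))
```

Taking `np.exp` of the result recovers the ordinary density wherever it is representable.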


On this benchmark, fast_multinormpdf is roughly 800 times faster than the unvectorized multinormpdf:



long_arr = np.tile(arr, (10000, 1))

%timeit np.array([multinormpdf(i, mu, var) for i in long_arr])
# 2.12 s ± 93.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fast_multinormpdf(long_arr, mu, var)
# 2.56 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
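
With roughly 40,000,000 rows, the intermediate arrays (`x - mu` and its product with `inv`) are each as large as the input, so memory may become the limiting factor. One possible way around this is to process the rows in chunks. A hedged sketch; `multinormpdf_chunked` and `chunk_size` are hypothetical names, the sample `cov` is made up, and the `einsum` call is simply another way to write the row-wise quadratic form:

```python
import numpy as np

def multinormpdf_chunked(x, mu, cov, chunk_size=1_000_000):
    """Gaussian density for each row of x, evaluated in fixed-size chunks."""
    x = np.asarray(x)
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    k = x.shape[-1]
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2 * np.pi) ** k) * np.linalg.det(cov))
    out = np.empty(x.shape[0], dtype=float)
    for start in range(0, x.shape[0], chunk_size):
        stop = start + chunk_size
        diff = x[start:stop] - mu  # temporary of at most chunk_size rows
        # diff[i] @ inv @ diff[i] for every row i of the chunk
        quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
        out[start:stop] = np.exp(-0.5 * quad) / norm
    return out

mu = np.array([100.0, 105.0, 42.0])
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 0.5],
                [0.0, 0.5, 2.0]])  # hypothetical positive-definite covariance
rng = np.random.default_rng(0)
x = mu + rng.normal(size=(10, 3))

probs = multinormpdf_chunked(x, mu, cov, chunk_size=4)
print(probs.shape)  # (10,)
```

The chunk size trades peak memory against per-chunk overhead; the default here is only a guess and would need tuning on the real data.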





  • Thanks Nils! I appreciate your advice!

    – YeongHwa Jin
    Nov 22 '18 at 13:24



















You can try numba. Just decorate your function with @numba.vectorize.

import numba

@numba.vectorize
def multinormpdf(x, mu, var):
    # ...
    return calculated_probability

new_arr = multinormpdf(arr, mu, var)


If your multinormpdf doesn't contain any unsupported functions, it can be accelerated. See the list of supported NumPy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html



Moreover, you can use the experimental feature target='parallel' like this.



@numba.vectorize(target='parallel')





  • My input for multinormpdf is already a NumPy array (like [42, 234, 124] or [123, 222, 112]), not a scalar, so the call looks like multinormpdf([51, 23, 251], mu_vector, cov_matrix). Can I still use @numba.vectorize?

    – YeongHwa Jin
    Nov 22 '18 at 3:41
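
On this follow-up: as far as I know, `numba.vectorize` compiles a function of scalar arguments into a NumPy ufunc, so a function that takes a whole row does not fit it directly; Numba's `guvectorize` is the decorator intended for array-valued arguments. The underlying idea of declaring "core dimensions" can be illustrated with plain NumPy's `np.vectorize` and its `signature` argument. This is only a sketch of the calling convention: `np.vectorize` still loops in Python and is not a speed-up, and `point_pdf`, `batched_pdf`, and the sample `cov` are made-up names and values:

```python
import numpy as np

def point_pdf(x, mu, cov):
    """Gaussian density at a single point x (1-D array of length k)."""
    k = x.shape[0]
    diff = x - mu
    quad = diff @ np.linalg.solve(cov, diff)  # diff.T @ inv(cov) @ diff
    return np.exp(-0.5 * quad) / np.sqrt(((2 * np.pi) ** k) * np.linalg.det(cov))

# Core dimensions: x is (k), mu is (k), cov is (k, k); the result is a scalar.
# Any extra leading dimension of x is looped over automatically.
batched_pdf = np.vectorize(point_pdf, signature='(k),(k),(k,k)->()')

mu = np.array([100.0, 105.0, 42.0])
cov = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 0.5],
                [0.0, 0.5, 2.0]])  # hypothetical positive-definite covariance
arr = np.array([[100.0, 105.0, 42.0],
                [101.0, 104.0, 43.0],
                [99.0, 106.0, 41.0]])

probs = batched_pdf(arr, mu, cov)
print(probs.shape)  # (3,)
```

For actual speed, a fully vectorized NumPy expression or a compiled loop is likely the better route; the signature string here mainly shows how row-shaped inputs are described.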













First answer (vectorized fast_multinormpdf): answered Nov 21 '18 at 21:58 by Nils Werner, edited Nov 22 '18 at 9:16













Second answer (numba): answered Nov 21 '18 at 20:28 by anch2150












