random.shuffle very slow in Python 3 with list











I am using Python 3, and I am trying to generate a list of index tuples and shuffle it, so that I can later use it to select random values from a sample. The sample has two parameters, the sample size and the number of dimensions. Here is how I generate the list of indexes and then shuffle it:



import itertools
import random

dimension = 5
sample_size = 100

generate_indexes = itertools.combinations(range(sample_size), dimension)
all_indexes = list(generate_indexes)

# here I do the shuffle
random.shuffle(all_indexes)


The problem is that when I increase the dimension number, it takes a long time to produce the result. Even with a dimension of 5 it takes very long, or it never finishes.

Is there any way to make it faster?

I ask because I have a multidimensional sample that contains values, and I want to select a random number of values from that sample based on all_indexes.










  • You're generating a list of 75287520 elements to shuffle - that's likely not to be quick... Do you definitely need to do this for what you're trying to achieve?
    – Jon Clements
    Nov 19 at 16:24










  • Yes, I need to have 20 dimensions for my problem, not 5. I think the way I am doing it is not the right one.
    – azeez
    Nov 19 at 16:25










  • Yeah, you should definitely rethink your approach. Even without the shuffle, this will run you out of memory well before you hit 20 dimensions.
    – glibdud
    Nov 19 at 16:29










  • It is the list creation that is consuming most of the time. A generator-based approach might be faster.
    – Austin
    Nov 19 at 16:30










  • @Austin I had the same thought and tested it. The shuffle is taking longer than the list creation by a large margin.
    – Andrew McDowell
    Nov 19 at 16:33















python-3.x list random shuffle






asked Nov 19 at 16:15 by azeez, edited Nov 19 at 16:31

1 Answer (accepted)

As pointed out in the comments, you're generating a very large list (75287520 tuples for these values) and then shuffling it. This is not going to be quick, but depending on what you actually need, there may be faster ways of getting what you want.



I ran your code on my machine and found that generating the list of all combinations took about 8 seconds and shuffling it took around 75 seconds. If you need to increase the dimension, this time will increase significantly, and the memory needed to store such a large list could become a problem as well.
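To get a feel for how quickly the list grows, the number of tuples can be computed without building any of them. This is a minimal sketch, assuming Python 3.8 or later (for math.comb); count_combinations is a hypothetical helper name, not something from the question.

```python
import math


def count_combinations(sample_size, dimension):
    """Number of tuples itertools.combinations(range(sample_size), dimension) yields."""
    return math.comb(sample_size, dimension)


# Print the list length for a few dimensions with sample_size = 100.
for dimension in (2, 5, 20):
    print(dimension, count_combinations(100, dimension))
```

For sample_size = 100 and dimension = 5 this gives 75287520, the figure quoted in the comments. For dimension = 20 the count is more than 10^20, which is why the full list cannot be stored at all.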



If you are not going to need all of the random indexes, you may be better off drawing a new sample each time you need one, using:



random.sample(range(sample_size), dimension)


This returns a list of dimension distinct elements chosen at random from 0 to sample_size - 1. This took about 0.0001 seconds to run with your values of dimension and sample_size. If you don't need too many of the random values, then generating a new one each time will be much quicker (and more memory efficient).



There are two issues with this that I can see. Firstly you aren't guaranteed that each new sample won't be a repeat of a previous one, but this can be easily resolved by storing them as you go, and checking if they've been used already.



random_indexes = []  # samples accepted so far

new_sample = random.sample(range(sample_size), dimension)
if new_sample not in random_indexes:
    random_indexes.append(new_sample)
else:
    pass  # Handle this however you need.


This does add more run time, but again will be faster if you don't need too many of your samples.



The other difference is that the approach you have used generates tuples of elements that are always sorted, so (1,2,3,4,5) will be an element of all_indexes, but (5,4,3,2,1) will not. random.sample can produce them in any order, so both could occur. If this is an issue, you will have to resolve it, perhaps by putting them into a set before adding them to the list:



new_sample = set(random.sample(range(sample_size), dimension))
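Both issues above can be handled together by sorting each sample into a tuple and tracking the tuples already seen in a set. The following is only a sketch under stated assumptions: draw_unique_combinations and its count parameter are hypothetical names, and math.comb assumes Python 3.8 or later.

```python
import math
import random


def draw_unique_combinations(sample_size, dimension, count):
    """Draw `count` distinct sorted index tuples without enumerating all of them."""
    if count > math.comb(sample_size, dimension):
        raise ValueError("fewer distinct combinations exist than were requested")
    seen = set()   # fast membership test for tuples drawn so far
    result = []    # keeps the tuples in the order they were drawn
    while len(result) < count:
        # Sorting makes (5, 4, 3, 2, 1) and (1, 2, 3, 4, 5) the same tuple,
        # matching the ordered tuples that itertools.combinations yields.
        combo = tuple(sorted(random.sample(range(sample_size), dimension)))
        if combo not in seen:
            seen.add(combo)
            result.append(combo)
    return result


random_indexes = draw_unique_combinations(sample_size=100, dimension=5, count=100)
```

Repeats are simply redrawn, so this stays fast as long as count is small compared with the total number of combinations. It slows down as count approaches that total.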





  • Thank you very much for this good comment. The problem now is that, yes, I need all the random index tuples. That means if my sample_size is 10, then I need 10 random index tuples, [(0, 2), (0, 1),....... if the dimension is 2.
    – azeez
    Nov 19 at 17:40






  • In the end, I used random_indexx = random.sample(all_indexes, sample_size) to select a number of random index tuples equal to sample_size. I did it this way because I want to make sure I will not have any duplicates.
    – azeez
    Nov 19 at 18:01












  • That's good, you're definitely better off sampling then. If you only need sample_size samples, then you don't need all of all_indexes. With dimension = 2 and sample_size = 10 for example, the size of all_indexes would be 45. In your original example, sample_size is 100, but all_indexes is of size 75287520.
    – Andrew McDowell
    Nov 19 at 20:44











answered Nov 19 at 17:09 by Andrew McDowell