random.shuffle very slow in Python 3 with list
I am using Python 3.x, and I am trying to generate a list of index tuples and shuffle it, so that I can use the indexes later to select random values from a sample. The sample has two parameters: the sample size and the number of dimensions. Here is how I generate the list of indexes and then shuffle it:
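```python
import itertools
import random

dimension = 5
sample_size = 100
# Every dimension-sized combination of the indexes 0..sample_size-1
generate_indexes = itertools.combinations(range(sample_size), dimension)
all_indexes = list(generate_indexes)
# here I do the shuffle
random.shuffle(all_indexes)
```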
The problem is that when I increase the dimension, it takes a long time to produce a result; even with a dimension of 5 it takes very long or never finishes.
Is there any way to make it faster?
I need this because I have a multidimensional sample that contains values, and I want to select a random number of values from that sample based on all_indexes.
edited Nov 19 at 16:31
asked Nov 19 at 16:15
azeez
You're generating a list of 75287520 elements to shuffle - that's likely not to be quick... Do you definitely need to do this for what you're trying to achieve?
– Jon Clements♦
Nov 19 at 16:24
Yes, I actually need 20 dimensions for my problem, not 5. I think the way I'm doing it is not the right one.
– azeez
Nov 19 at 16:25
Yeah, you should definitely rethink your approach. Even without the shuffle, this will run you out of memory well before you hit 20 dimensions.
– glibdud
Nov 19 at 16:29
It is the list creation that is consuming all the time. A generator-based approach might be faster.
– Austin
Nov 19 at 16:30
@Austin I had the same thought and tested it. The shuffle is taking longer than the list creation by a large margin.
– Andrew McDowell
Nov 19 at 16:33
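The 75287520 figure in the first comment is the binomial coefficient C(100, 5), i.e. the number of tuples itertools.combinations(range(100), 5) yields. A small sketch (assuming Python 3.8+ for math.comb; the helper name num_index_tuples is hypothetical) shows how quickly the size of all_indexes grows with dimension, which is why building the list at 20 dimensions is not feasible:

```python
import itertools
from math import comb  # Python 3.8+


def num_index_tuples(sample_size, dimension):
    """Number of tuples itertools.combinations(range(sample_size), dimension) yields."""
    return comb(sample_size, dimension)


# Sanity check on a size small enough to actually materialise.
assert num_index_tuples(10, 2) == len(list(itertools.combinations(range(10), 2)))

# Size of all_indexes for sample_size = 100 at a few dimensions.
for dimension in (2, 5, 10, 20):
    print(dimension, num_index_tuples(100, dimension))
```

For dimension = 5 this gives 75287520, matching the comment; at dimension = 20 the count is above 10**20, far more tuples than could ever be held in memory.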
1 Answer
As pointed out in the comments, you're generating a very large list and then shuffling it. This is not going to be quick, but depending on what you actually need, there may be faster ways of getting what you want.
I ran your code on my machine and found that generating the list of all combinations took about 8 seconds and shuffling it took around 75 seconds. If you need to increase the dimension, these times will increase significantly, not to mention that the memory needed to store such large lists could start to become significant.
If you are not going to need all of the random indexes, you may be better off sampling each time using:

```python
random.sample(range(sample_size), dimension)
```
This returns a list of dimension distinct elements chosen at random from range(sample_size), i.e. from 0 up to sample_size - 1. This took about 0.0001 seconds to run with your values of dimension and sample_size. If you don't need too many of the random values, then generating a new one each time will be much quicker (and more memory-efficient).
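A minimal sketch of a single draw with the question's values; the helper name draw_indexes is hypothetical, and it simply wraps the random.sample call above so the properties just described (distinct values, all in range(sample_size)) can be checked:

```python
import random

sample_size = 100
dimension = 5


def draw_indexes(sample_size, dimension):
    """Return `dimension` distinct indexes chosen uniformly from range(sample_size)."""
    return random.sample(range(sample_size), dimension)


new_sample = draw_indexes(sample_size, dimension)
print(new_sample)  # a list of 5 distinct ints, in random order
```

Note that random.sample raises ValueError if dimension is larger than sample_size, since it never repeats an element.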
There are two issues with this that I can see. Firstly, you aren't guaranteed that each new sample won't be a repeat of a previous one, but this can easily be resolved by storing them as you go and checking whether they've been used already:
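```python
import random

sample_size, dimension = 100, 5
random_indexes = []  # samples kept so far

new_sample = random.sample(range(sample_size), dimension)
if new_sample not in random_indexes:
    random_indexes.append(new_sample)
else:
    pass  # A repeat: handle this however you need (e.g. draw again).
```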
This does add more run time, but again will be faster if you don't need too many of your samples.
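The `not in` check above scans the whole random_indexes list on every draw, so it slows down as the list grows. One alternative, sketched here assuming you want to reject only exact (order-sensitive) repeats, is to also keep a set of tuples for constant-time membership tests. The function name draw_unique_samples is hypothetical, and the guard uses math.perm (Python 3.8+) so the loop cannot run forever when more samples are requested than exist:

```python
import random
from math import perm  # Python 3.8+


def draw_unique_samples(sample_size, dimension, count):
    """Draw `count` distinct results of random.sample (order matters), rejecting repeats."""
    if count > perm(sample_size, dimension):
        raise ValueError("count exceeds the number of possible ordered samples")
    seen = set()         # tuples drawn so far, for fast membership tests
    random_indexes = []  # the accepted samples, in the order they were drawn
    while len(random_indexes) < count:
        # Lists are unhashable, so convert each draw to a tuple before storing it in the set.
        new_sample = tuple(random.sample(range(sample_size), dimension))
        if new_sample not in seen:
            seen.add(new_sample)
            random_indexes.append(new_sample)
        # Otherwise it is a repeat: simply draw again.
    return random_indexes


samples = draw_unique_samples(100, 5, 1000)
```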
The other difference is that your approach generates tuples whose elements are always sorted, so (1,2,3,4,5) will be an element of all_indexes, but (5,4,3,2,1) will not. random.sample can produce them in any order, so both could occur. If this is an issue, you will have to resolve it, perhaps by putting each sample into a set before adding it to the list:
```python
new_sample = set(random.sample(range(sample_size), dimension))
```
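A set cannot itself be stored in another set, so if you also want fast membership checks, a sorted tuple is a convenient alternative canonical form. It is exactly the form itertools.combinations produces, so (5, 4, 3, 2, 1) and (1, 2, 3, 4, 5) both map to (1, 2, 3, 4, 5). A small sketch; the helper name draw_combination is hypothetical:

```python
import itertools
import random


def draw_combination(sample_size, dimension):
    """Draw one random combination in the sorted-tuple form itertools.combinations uses."""
    return tuple(sorted(random.sample(range(sample_size), dimension)))


combo = draw_combination(10, 2)
# Every draw is one of the tuples the original all_indexes list would contain.
assert combo in set(itertools.combinations(range(10), 2))
```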
answered Nov 19 at 17:09
Andrew McDowell
Thank you very much for this good answer. The problem is that yes, I do need the random indexes: if my sample_size is 10, then I need 10 random index tuples, e.g. [(0, 2), (0, 1), ...] if the dimension is 2.
– azeez
Nov 19 at 17:40
In the end, I use random_indexx = random.sample(all_indexes, sample_size) to select a number of random index tuples equal to sample_size. I did it this way because I want to make sure I won't have any duplicates.
– azeez
Nov 19 at 18:01
That's good, you're definitely better off sampling then. If you only need sample_size samples, then you don't need all of all_indexes. With dimension = 2 and sample_size = 10, for example, the size of all_indexes would be 45. In your original example, sample_size is 100, but all_indexes is of size 75287520.
– Andrew McDowell
Nov 19 at 20:44
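For the use case described in these comments (sample_size distinct index tuples, no duplicates, in the same form as all_indexes), the pieces of the answer can be combined so that all_indexes is never built. This is a sketch, not the answerer's code, and random_combinations is a hypothetical name. Rejecting repeats of uniformly drawn combinations should give the same distribution as random.sample(all_indexes, count), while only storing the tuples actually returned:

```python
import random
from math import comb  # Python 3.8+


def random_combinations(sample_size, dimension, count):
    """Return `count` distinct random combinations of range(sample_size), each a sorted tuple.

    Intended as a stand-in for
    random.sample(list(itertools.combinations(range(sample_size), dimension)), count)
    that never materialises every combination.
    """
    if count > comb(sample_size, dimension):
        raise ValueError("count exceeds the number of possible combinations")
    seen = set()
    result = []
    while len(result) < count:
        combo = tuple(sorted(random.sample(range(sample_size), dimension)))
        if combo not in seen:  # reject repeats so every returned tuple is distinct
            seen.add(combo)
            result.append(combo)
    return result


# The 20-dimension case from the question's comments, with sample_size = 100:
random_indexx = random_combinations(100, 20, 100)
```

This stays fast while count is small relative to comb(sample_size, dimension); if count approaches that total, most draws are rejected, and for small cases enumerating with itertools.combinations becomes the better option.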