Sort list of objects by a number of attributes using a generator
up vote
4
down vote
favorite
I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.
So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group
and gender
. So you'd first collect everybody that's got the 30-39
age group, then all the men (M
) and all women (W
) from that age group.
Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;
def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)
def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):
for gender_group in group_standings(age_group, 'gender'):
set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)
For set_positions
to function correctly it needs the whole group so that it can sort by the point_field
value then set the position_field
value.
Debugging the generator, groupby
isn't collecting all objects matching the key as I'd expected. The output is something like;
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]
To confirm, for set_positions
to function, the list provided by the generator would need to contain all objects in the 20-29
age group, but as above, objects from that group are being found on multiple iterations of the list.
python generator itertools
add a comment |
up vote
4
down vote
favorite
I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.
So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group
and gender
. So you'd first collect everybody that's got the 30-39
age group, then all the men (M
) and all women (W
) from that age group.
Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;
def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)
def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):
for gender_group in group_standings(age_group, 'gender'):
set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)
For set_positions
to function correctly it needs the whole group so that it can sort by the point_field
value then set the position_field
value.
Debugging the generator, groupby
isn't collecting all objects matching the key as I'd expected. The output is something like;
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]
To confirm, for set_positions
to function, the list provided by the generator would need to contain all objects in the 20-29
age group, but as above, objects from that group are being found on multiple iterations of the list.
python generator itertools
1
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49
add a comment |
up vote
4
down vote
favorite
up vote
4
down vote
favorite
I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.
So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group
and gender
. So you'd first collect everybody that's got the 30-39
age group, then all the men (M
) and all women (W
) from that age group.
Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;
def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)
def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):
for gender_group in group_standings(age_group, 'gender'):
set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)
For set_positions
to function correctly it needs the whole group so that it can sort by the point_field
value then set the position_field
value.
Debugging the generator, groupby
isn't collecting all objects matching the key as I'd expected. The output is something like;
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]
To confirm, for set_positions
to function, the list provided by the generator would need to contain all objects in the 20-29
age group, but as above, objects from that group are being found on multiple iterations of the list.
python generator itertools
I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.
So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group
and gender
. So you'd first collect everybody that's got the 30-39
age group, then all the men (M
) and all women (W
) from that age group.
Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;
def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)
def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):
for gender_group in group_standings(age_group, 'gender'):
set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)
For set_positions
to function correctly it needs the whole group so that it can sort by the point_field
value then set the position_field
value.
Debugging the generator, groupby
isn't collecting all objects matching the key as I'd expected. The output is something like;
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]
DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]
DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]
To confirm, for set_positions
to function, the list provided by the generator would need to contain all objects in the 20-29
age group, but as above, objects from that group are being found on multiple iterations of the list.
python generator itertools
python generator itertools
edited Nov 17 at 14:51
asked Nov 17 at 14:20
markwalker_
4,31853572
4,31853572
1
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49
add a comment |
1
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49
1
1
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49
add a comment |
2 Answers
2
active
oldest
votes
up vote
4
down vote
accepted
It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g
to a list in group_standings
function but applied it only when you pass gender_group
to set_positions
.
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
add a comment |
up vote
1
down vote
groupby
works on adjacent elements
As per @MikhailBerlinkov's answer, groupby
aggregates consecutive items only which are the same, optionally using a key
argument for comparison.
It may help to see an example:
from itertools import groupby
L = [1, 1, 1, 2, 2, 2, 1, 1]
res = [list(j) for _, j in groupby(L)]
[[1, 1, 1], [2, 2, 2], [1, 1]]
As you can see, the groups of 1
values are split into two separate lists.
Sort before grouping
Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L
as before):
L_sorted = sorted(L)
res = [list(j) for i, j in groupby(L_sorted)]
[[1, 1, 1, 1, 1], [2, 2, 2]]
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
4
down vote
accepted
It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g
to a list in group_standings
function but applied it only when you pass gender_group
to set_positions
.
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
add a comment |
up vote
4
down vote
accepted
It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g
to a list in group_standings
function but applied it only when you pass gender_group
to set_positions
.
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
add a comment |
up vote
4
down vote
accepted
up vote
4
down vote
accepted
It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g
to a list in group_standings
function but applied it only when you pass gender_group
to set_positions
.
It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g
to a list in group_standings
function but applied it only when you pass gender_group
to set_positions
.
answered Nov 17 at 15:13
Mikhail Berlinkov
74447
74447
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
add a comment |
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
– markwalker_
Nov 17 at 23:24
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
You're welcome.
– Mikhail Berlinkov
Nov 18 at 2:12
add a comment |
up vote
1
down vote
groupby
works on adjacent elements
As per @MikhailBerlinkov's answer, groupby
aggregates consecutive items only which are the same, optionally using a key
argument for comparison.
It may help to see an example:
from itertools import groupby
L = [1, 1, 1, 2, 2, 2, 1, 1]
res = [list(j) for _, j in groupby(L)]
[[1, 1, 1], [2, 2, 2], [1, 1]]
As you can see, the groups of 1
values are split into two separate lists.
Sort before grouping
Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L
as before):
L_sorted = sorted(L)
res = [list(j) for i, j in groupby(L_sorted)]
[[1, 1, 1, 1, 1], [2, 2, 2]]
add a comment |
up vote
1
down vote
groupby
works on adjacent elements
As per @MikhailBerlinkov's answer, groupby
aggregates consecutive items only which are the same, optionally using a key
argument for comparison.
It may help to see an example:
from itertools import groupby
L = [1, 1, 1, 2, 2, 2, 1, 1]
res = [list(j) for _, j in groupby(L)]
[[1, 1, 1], [2, 2, 2], [1, 1]]
As you can see, the groups of 1
values are split into two separate lists.
Sort before grouping
Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L
as before):
L_sorted = sorted(L)
res = [list(j) for i, j in groupby(L_sorted)]
[[1, 1, 1, 1, 1], [2, 2, 2]]
add a comment |
up vote
1
down vote
up vote
1
down vote
groupby
works on adjacent elements
As per @MikhailBerlinkov's answer, groupby
aggregates consecutive items only which are the same, optionally using a key
argument for comparison.
It may help to see an example:
from itertools import groupby
L = [1, 1, 1, 2, 2, 2, 1, 1]
res = [list(j) for _, j in groupby(L)]
[[1, 1, 1], [2, 2, 2], [1, 1]]
As you can see, the groups of 1
values are split into two separate lists.
Sort before grouping
Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L
as before):
L_sorted = sorted(L)
res = [list(j) for i, j in groupby(L_sorted)]
[[1, 1, 1, 1, 1], [2, 2, 2]]
groupby
works on adjacent elements
As per @MikhailBerlinkov's answer, groupby
aggregates consecutive items only which are the same, optionally using a key
argument for comparison.
It may help to see an example:
from itertools import groupby
L = [1, 1, 1, 2, 2, 2, 1, 1]
res = [list(j) for _, j in groupby(L)]
[[1, 1, 1], [2, 2, 2], [1, 1]]
As you can see, the groups of 1
values are split into two separate lists.
Sort before grouping
Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L
as before):
L_sorted = sorted(L)
res = [list(j) for i, j in groupby(L_sorted)]
[[1, 1, 1, 1, 1], [2, 2, 2]]
answered Nov 17 at 17:19
jpp
83.1k194896
83.1k194896
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53352076%2fsort-list-of-objects-by-a-number-of-attributes-using-a-generator%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49