Sort list of objects by a number of attributes using a generator











up vote
4
down vote

favorite












I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.



So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group and gender. So you'd first collect everybody that's got the 30-39 age group, then all the men (M) and all women (W) from that age group.



Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;



def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)


def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):

for gender_group in group_standings(age_group, 'gender'):

set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)


For set_positions to function correctly it needs the whole group so that it can sort by the point_field value then set the position_field value.



Debugging the generator, groupby isn't collecting all objects matching the key as I'd expected. The output is something like;



DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]

DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]


To confirm, for set_positions to function, the list provided by the generator would need to contain all objects in the 20-29 age group, but as above, objects from that group are being found on multiple iterations of the list.










share|improve this question




















  • 1




    Include the shortest code necessary to reproduce it in the question itself.
    – aaron
    Nov 17 at 14:49















up vote
4
down vote

favorite












I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.



So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group and gender. So you'd first collect everybody that's got the 30-39 age group, then all the men (M) and all women (W) from that age group.



Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;



def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)


def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):

for gender_group in group_standings(age_group, 'gender'):

set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)


For set_positions to function correctly it needs the whole group so that it can sort by the point_field value then set the position_field value.



Debugging the generator, groupby isn't collecting all objects matching the key as I'd expected. The output is something like;



DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]

DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]


To confirm, for set_positions to function, the list provided by the generator would need to contain all objects in the 20-29 age group, but as above, objects from that group are being found on multiple iterations of the list.










share|improve this question




















  • 1




    Include the shortest code necessary to reproduce it in the question itself.
    – aaron
    Nov 17 at 14:49













up vote
4
down vote

favorite









up vote
4
down vote

favorite











I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.



So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group and gender. So you'd first collect everybody that's got the 30-39 age group, then all the men (M) and all women (W) from that age group.



Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;



def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)


def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):

for gender_group in group_standings(age_group, 'gender'):

set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)


For set_positions to function correctly it needs the whole group so that it can sort by the point_field value then set the position_field value.



Debugging the generator, groupby isn't collecting all objects matching the key as I'd expected. The output is something like;



DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]

DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]


To confirm, for set_positions to function, the list provided by the generator would need to contain all objects in the 20-29 age group, but as above, objects from that group are being found on multiple iterations of the list.










share|improve this question















I've got a list of objects which will number somewhere between the thousands & tens of thousands. These objects could be thought of as people which I'm looking to rank based on a score they have.



So first of all they're split into groups by age, then gender etc. At each point a ranking is provided corresponding to that age/gender category. The fields on the objects are age_group and gender. So you'd first collect everybody that's got the 30-39 age group, then all the men (M) and all women (W) from that age group.



Creating a new list at each of these points is very memory intensive so I'm attempting to use a generator & itertools to group using the original list. So I've got a function to do that;



def group_standings(_standings, field):
""" sort list of standings by a given field """
getter = operator.attrgetter(field)
for k, g in itertools.groupby(_standings, getter):
yield list(g)


def calculate_positions(standings):
"""
sort standings by age_group then gender & set position based on point value
"""
for age_group in group_standings(standings, 'age_group'):

for gender_group in group_standings(age_group, 'gender'):

set_positions(
standings=gender_group,
point_field='points',
position_field='position',
)


For set_positions to function correctly it needs the whole group so that it can sort by the point_field value then set the position_field value.



Debugging the generator, groupby isn't collecting all objects matching the key as I'd expected. The output is something like;



DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbe10>, <Standing object at 0x7fc86fedbe50>, <Standing object at 0x7fc86fedbe90>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbed0>]

DEBUG generating k 30-39
DEBUG generating g [<Standing object at 0x7fc86fedbf10>]

DEBUG generating k 20-29
DEBUG generating g [<Standing object at 0x7fc86fedbf50>, <Standing object at 0x7fc86fedbf90>, <Standing object at 0x7fc86fedbfd0>, <Standing object at 0x7fc856ecc050>, <Standing object at 0x7fc856ecc090>, <Standing object at 0x7fc856ecc0d0>, <Standing object at 0x7fc856ecc110>, <Standing object at 0x7fc856ecc150>, <Standing object at 0x7fc856ecc190>, <Standing object at 0x7fc856ecc1d0>]


To confirm, for set_positions to function, the list provided by the generator would need to contain all objects in the 20-29 age group, but as above, objects from that group are being found on multiple iterations of the list.







python generator itertools






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 17 at 14:51

























asked Nov 17 at 14:20









markwalker_

4,31853572




4,31853572








  • 1




    Include the shortest code necessary to reproduce it in the question itself.
    – aaron
    Nov 17 at 14:49














  • 1




    Include the shortest code necessary to reproduce it in the question itself.
    – aaron
    Nov 17 at 14:49








1




1




Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49




Include the shortest code necessary to reproduce it in the question itself.
– aaron
Nov 17 at 14:49












2 Answers
2






active

oldest

votes

















up vote
4
down vote



accepted










It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
Also, I wouldn't cast g to a list in group_standings function but applied it only when you pass gender_group to set_positions.






share|improve this answer





















  • Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
    – markwalker_
    Nov 17 at 23:24










  • You're welcome.
    – Mikhail Berlinkov
    Nov 18 at 2:12


















up vote
1
down vote














groupby works on adjacent elements



As per @MikhailBerlinkov's answer, groupby aggregates consecutive items only which are the same, optionally using a key argument for comparison.



It may help to see an example:



from itertools import groupby

L = [1, 1, 1, 2, 2, 2, 1, 1]

res = [list(j) for _, j in groupby(L)]

[[1, 1, 1], [2, 2, 2], [1, 1]]


As you can see, the groups of 1 values are split into two separate lists.



Sort before grouping



Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L as before):



L_sorted = sorted(L)

res = [list(j) for i, j in groupby(L_sorted)]

[[1, 1, 1, 1, 1], [2, 2, 2]]





share|improve this answer





















    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














     

    draft saved


    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53352076%2fsort-list-of-objects-by-a-number-of-attributes-using-a-generator%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes








    up vote
    4
    down vote



    accepted










    It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
    Also, I wouldn't cast g to a list in group_standings function but applied it only when you pass gender_group to set_positions.






    share|improve this answer





















    • Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
      – markwalker_
      Nov 17 at 23:24










    • You're welcome.
      – Mikhail Berlinkov
      Nov 18 at 2:12















    up vote
    4
    down vote



    accepted










    It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
    Also, I wouldn't cast g to a list in group_standings function but applied it only when you pass gender_group to set_positions.






    share|improve this answer





















    • Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
      – markwalker_
      Nov 17 at 23:24










    • You're welcome.
      – Mikhail Berlinkov
      Nov 18 at 2:12













    up vote
    4
    down vote



    accepted







    up vote
    4
    down vote



    accepted






    It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
    Also, I wouldn't cast g to a list in group_standings function but applied it only when you pass gender_group to set_positions.






    share|improve this answer












    It happens because groupby function assumes that the input iterable is already sorted by key (see documentation). It's made for performance but is confusing.
    Also, I wouldn't cast g to a list in group_standings function but applied it only when you pass gender_group to set_positions.







    share|improve this answer












    share|improve this answer



    share|improve this answer










    answered Nov 17 at 15:13









    Mikhail Berlinkov

    74447




    74447












    • Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
      – markwalker_
      Nov 17 at 23:24










    • You're welcome.
      – Mikhail Berlinkov
      Nov 18 at 2:12


















    • Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
      – markwalker_
      Nov 17 at 23:24










    • You're welcome.
      – Mikhail Berlinkov
      Nov 18 at 2:12
















    Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
    – markwalker_
    Nov 17 at 23:24




    Thank you, that's obvious looking at it now, but completely missed that in the docs when we implemented this yesterday.
    – markwalker_
    Nov 17 at 23:24












    You're welcome.
    – Mikhail Berlinkov
    Nov 18 at 2:12




    You're welcome.
    – Mikhail Berlinkov
    Nov 18 at 2:12












    up vote
    1
    down vote














    groupby works on adjacent elements



    As per @MikhailBerlinkov's answer, groupby aggregates consecutive items only which are the same, optionally using a key argument for comparison.



    It may help to see an example:



    from itertools import groupby

    L = [1, 1, 1, 2, 2, 2, 1, 1]

    res = [list(j) for _, j in groupby(L)]

    [[1, 1, 1], [2, 2, 2], [1, 1]]


    As you can see, the groups of 1 values are split into two separate lists.



    Sort before grouping



    Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L as before):



    L_sorted = sorted(L)

    res = [list(j) for i, j in groupby(L_sorted)]

    [[1, 1, 1, 1, 1], [2, 2, 2]]





    share|improve this answer

























      up vote
      1
      down vote














      groupby works on adjacent elements



      As per @MikhailBerlinkov's answer, groupby aggregates consecutive items only which are the same, optionally using a key argument for comparison.



      It may help to see an example:



      from itertools import groupby

      L = [1, 1, 1, 2, 2, 2, 1, 1]

      res = [list(j) for _, j in groupby(L)]

      [[1, 1, 1], [2, 2, 2], [1, 1]]


      As you can see, the groups of 1 values are split into two separate lists.



      Sort before grouping



      Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L as before):



      L_sorted = sorted(L)

      res = [list(j) for i, j in groupby(L_sorted)]

      [[1, 1, 1, 1, 1], [2, 2, 2]]





      share|improve this answer























        up vote
        1
        down vote










        up vote
        1
        down vote










        groupby works on adjacent elements



        As per @MikhailBerlinkov's answer, groupby aggregates consecutive items only which are the same, optionally using a key argument for comparison.



        It may help to see an example:



        from itertools import groupby

        L = [1, 1, 1, 2, 2, 2, 1, 1]

        res = [list(j) for _, j in groupby(L)]

        [[1, 1, 1], [2, 2, 2], [1, 1]]


        As you can see, the groups of 1 values are split into two separate lists.



        Sort before grouping



        Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L as before):



        L_sorted = sorted(L)

        res = [list(j) for i, j in groupby(L_sorted)]

        [[1, 1, 1, 1, 1], [2, 2, 2]]





        share|improve this answer













        groupby works on adjacent elements



        As per @MikhailBerlinkov's answer, groupby aggregates consecutive items only which are the same, optionally using a key argument for comparison.



        It may help to see an example:



        from itertools import groupby

        L = [1, 1, 1, 2, 2, 2, 1, 1]

        res = [list(j) for _, j in groupby(L)]

        [[1, 1, 1], [2, 2, 2], [1, 1]]


        As you can see, the groups of 1 values are split into two separate lists.



        Sort before grouping



        Instead, you can sort your list of objects before grouping. For a large list of objects, say of length n, this takes O(n log n) time. Here's an example (using same L as before):



        L_sorted = sorted(L)

        res = [list(j) for i, j in groupby(L_sorted)]

        [[1, 1, 1, 1, 1], [2, 2, 2]]






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 17 at 17:19









        jpp

        83.1k194896




        83.1k194896






























             

            draft saved


            draft discarded



















































             


            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53352076%2fsort-list-of-objects-by-a-number-of-attributes-using-a-generator%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

            Alcedinidae

            RAC Tourist Trophy