Python 3 pandas.groupby.filter












I am trying to perform a groupby filter that is very similar to the example in this documentation: pandas groupby filter



>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
...                          'foo', 'bar'],
...                    'B': [1, 2, 3, 4, 5, 6],
...                    'C': [2.0, 5., 8., 1., 2., 9.]})
>>> grouped = df.groupby('A')
>>> grouped.filter(lambda x: x['B'].mean() > 3.)
     A  B    C
1  bar  2  5.0
3  bar  4  1.0
5  bar  6  9.0


I am trying to return a DataFrame that has all 3 columns but only 2 rows: for each group in column A, the row with the minimum value of column B. I tried the following line of code:



grouped.filter(lambda x: x['B'] == x['B'].min())


But this doesn't work, and I get this error:
TypeError: filter function returned a Series, but expected a scalar bool



The DataFrame I am trying to return should look like this:



     A  B    C
0  foo  1  2.0
1  bar  2  5.0


I would appreciate any help you can provide. Thank you, in advance, for your help.







python pandas dataframe






edited 1 hour ago by weliketocode
asked 6 hours ago by FinProg








  • The docstring can seem a bit ambiguous: "Return a copy of a DataFrame excluding elements from groups that do not satisfy..." You aren't excluding elements from groups; you are excluding from the DataFrame all elements of groups that do not satisfy the single condition.

    – ALollz
    5 hours ago











  • Thank you for your help.

    – FinProg
    1 hour ago











  • @ALollz: please file a docbug to improve the docstring

    – smci
    1 hour ago


























5 Answers
>>> df.loc[df.groupby('A')['B'].idxmin()]
     A  B    C
1  bar  2  5.0
0  foo  1  2.0
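
For what it's worth, here is a minimal sketch of what that one-liner does step by step, using the DataFrame from the question. The names idx and result are only illustrative, and the sort_index() call is an optional extra to get the rows back in their original order.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

# idxmin() gives, for each group, the index label of the row with the smallest B.
idx = df.groupby('A')['B'].idxmin()

# .loc with those labels selects the full rows; sort_index() restores the original row order.
result = df.loc[idx].sort_index()
print(result)
```

Note that if several rows in a group tie for the minimum, idxmin() picks only one of them, so you always get one row per group.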






answered 3 hours ago by BallpointBen













    • Thank you very much for your solution.

      – FinProg
      1 hour ago



















df.groupby('A').apply(lambda x: x.loc[x['B'].idxmin(), ['B', 'C']]).reset_index()






answered 6 hours ago by kudeh













    • Thank you very much for your solution.

      – FinProg
      1 hour ago



















No need for groupby :-)

df.sort_values('B').drop_duplicates('A')
Out[288]:
     A  B    C
0  foo  1  2.0
1  bar  2  5.0






answered 5 hours ago by Wen-Ben























There's a fundamental difference: in the documentation example, there is a single Boolean value per group. That is, the entire group is returned if its mean is greater than 3. In your example, you want to filter specific rows within a group.

For your task, the usual trick is to sort the values and use .head or .tail to keep the row with the smallest or largest value, respectively:

df.sort_values('B').groupby('A').head(1)

#      A  B    C
# 0  foo  1  2.0
# 1  bar  2  5.0

For more complicated queries you can use .transform or .apply to create a Boolean Series to slice with. This is also the safer choice if multiple rows share the minimum and you need all of them:

df[df.groupby('A').B.transform(lambda x: x == x.min())]

#      A  B    C
# 0  foo  1  2.0
# 1  bar  2  5.0
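
To make the point about ties concrete, here is a small sketch on made-up data (not the DataFrame from the question) in which group 'foo' has two rows sharing the minimum of B. The variable names are hypothetical, and the mask is built with transform('min'), which should be equivalent to the lambda version above.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 of group 'foo' tie for the minimum of B.
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 1, 4],
                   'C': [2.0, 5.0, 8.0, 1.0]})

# Sort-and-head keeps exactly one row per group, even when there is a tie.
one_per_group = df.sort_values('B').groupby('A').head(1)

# The Boolean mask keeps every row whose B equals its group's minimum.
mask = df['B'] == df.groupby('A')['B'].transform('min')
all_minima = df[mask]

print(one_per_group)
print(all_minima)
```

With this data the first approach returns 2 rows and the second returns 3, so which one you want depends on whether tied rows should all be kept.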






edited 5 hours ago
answered 5 hours ago by ALollz













            • Thank you very much for your solutions. I really appreciate your help.

              – FinProg
              1 hour ago



















The short answer:

grouped.apply(lambda x: x[x['B'] == x['B'].min()])




... and the longer one:

Your grouped object has 2 groups:

In[25]: for group in grouped:
   ...:     print(group)
   ...:
('bar',
     A  B    C
1  bar  2  5.0
3  bar  4  1.0
5  bar  6  9.0)

('foo',
     A  B    C
0  foo  1  2.0
2  foo  3  8.0
4  foo  5  2.0)


The filter() method of a GroupBy object filters groups as whole entities, NOT their individual rows. So with the filter() method you can obtain only 4 results:

• an empty DataFrame (0 rows),

• the rows of the group 'bar' (3 rows),

• the rows of the group 'foo' (3 rows),

• the rows of both groups (6 rows).

Nothing else, regardless of the Boolean function passed to the filter() method.
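
A quick way to convince yourself of this is to try one predicate for each of the four outcomes. This is only a sketch on the DataFrame from the question; the predicates and variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})
grouped = df.groupby('A')

# Each predicate returns one scalar bool per group,
# so a group is either kept whole or dropped whole.
none_kept = grouped.filter(lambda g: False)
bar_only = grouped.filter(lambda g: g['B'].mean() > 3.0)
foo_only = grouped.filter(lambda g: g['B'].min() == 1)
both_kept = grouped.filter(lambda g: True)

sizes = [len(none_kept), len(bar_only), len(foo_only), len(both_kept)]
print(sizes)
```

The row counts can only ever be 0, 3, 3 or 6 here, which is why a predicate returning a per-row Series is rejected with the TypeError from the question.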





So you have to use some other method. An appropriate one is the very flexible apply() method, which lets you apply an arbitrary function that

• takes a DataFrame (one group of the GroupBy object) as its only parameter,

• returns either a pandas object or a scalar.


In your case that function should return, for each of your 2 groups, the 1-row DataFrame with the minimal value in column 'B', so we will use the Boolean mask

group['B'] == group['B'].min()

to select that row (or possibly several rows, if there are ties):

In[26]: def select_min_b(group):
   ...:     return group[group['B'] == group['B'].min()]

Now, passing this function to the apply() method of the GroupBy object grouped, we obtain

In[27]: grouped.apply(select_min_b)
Out[27]:
         A  B    C
A
bar 1  bar  2  5.0
foo 0  foo  1  2.0
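
If it helps to see the split / apply / combine steps spelled out, the following is a hand-rolled approximation using the same select_min_b function. It is only a sketch to illustrate the idea, not what pandas does internally; the names pieces and result are made up, and unlike apply() it keeps the original flat index instead of adding the group key as an index level.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar'],
                   'B': [1, 2, 3, 4, 5, 6],
                   'C': [2.0, 5.0, 8.0, 1.0, 2.0, 9.0]})

def select_min_b(group):
    # Keep the row(s) whose B equals the minimum of B within this group.
    return group[group['B'] == group['B'].min()]

# Split: iterating over a GroupBy yields (key, sub-DataFrame) pairs.
# Apply: call the function on each sub-DataFrame.
pieces = [select_min_b(group) for _, group in df.groupby('A')]

# Combine: concatenate the per-group results into one DataFrame.
result = pd.concat(pieces)
print(result)
```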




Note:

The same, but as a single command (using a lambda function):

grouped.apply(lambda group: group[group['B'] == group['B'].min()])






edited 4 hours ago
answered 5 hours ago by MarianD













            • Wow, thank you so very much for your detailed solutions! Thank you for taking the time to provide me with such thorough explanations. I hope to return the favor one day.

              – FinProg
              1 hour ago


















