pandas - how to create multiple columns in groupby with conditional?



























I need to group a DataFrame and create two columns in the result: one with a simple count of rows per group, and another with a conditional count, as in the example:

[image: example input data]

The qtd_ok column counts only the rows whose status is 'OK'.

[image: expected output]



I tried this, but I do not know how to add the total count in the same groupby:



df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')















      python pandas dataframe pandas-groupby














      edited Nov 21 '18 at 14:43









      jpp











      asked Nov 21 '18 at 14:15









      Hiago Bonamelli
























          3 Answers
































          First create a boolean helper column A with assign, then aggregate it with agg: sum counts only the OK values (each True counts as 1), and size counts all rows in each group:



          df = (df.assign(A=(df['status'] == 'OK'))
                  .groupby(['column1', 'column2', 'column3'])['A']
                  .agg([('qtd_ok', 'sum'), ('qtd', 'size')])
                  .astype(int)
                  .reset_index())
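The reason sum works as a conditional count is that pandas treats True as 1 and False as 0 when summing a boolean Series. A minimal sketch of just that step, using made-up status values (they are an assumption for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical status values, only for illustration
status = pd.Series(['OK', 'error', 'OK', 'OK'])

is_ok = status == 'OK'     # boolean Series: True where status is 'OK'
n_ok = int(is_ok.sum())    # True counts as 1, False as 0
n_all = int(is_ok.size)    # size counts every row, matching or not

print(n_ok, n_all)
```

The groupby version above applies the same two operations once per group.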


          Sample:



          import pandas as pd

          df = pd.DataFrame({
              'column1': ['a'] * 9,
              'column2': ['a'] * 4 + ['b'] * 5,
              'column3': list('aaabaabbb'),
              'status': list('aabaaabba'),
          })

          print(df)
            column1 column2 column3 status
          0       a       a       a      a
          1       a       a       a      a
          2       a       a       a      b
          3       a       a       b      a
          4       a       b       a      a
          5       a       b       a      a
          6       a       b       b      b
          7       a       b       b      b
          8       a       b       b      a

          # In this sample, 'a' plays the role of 'OK'
          df = (df.assign(A=(df['status'] == 'a'))
                  .groupby(['column1', 'column2', 'column3'])['A']
                  .agg([('qtd_ok', 'sum'), ('qtd', 'size')])
                  .astype(int)
                  .reset_index())
          print(df)
            column1 column2 column3  qtd_ok  qtd
          0       a       a       a       2    3
          1       a       a       b       1    1
          2       a       b       a       2    2
          3       a       b       b       1    3
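As a possible variation, the same result can probably be written with named aggregation, which pandas has supported since version 0.25. This is only a sketch: the helper column name is_ok is an assumption chosen for readability, and 'a' again stands in for 'OK' as in the sample data.

```python
import pandas as pd

# Sample data from the answer above; 'a' plays the role of 'OK'
df = pd.DataFrame({
    'column1': ['a'] * 9,
    'column2': ['a'] * 4 + ['b'] * 5,
    'column3': list('aaabaabbb'),
    'status': list('aabaaabba'),
})

res = (
    df.assign(is_ok=df['status'].eq('a'))   # is_ok is a hypothetical helper name
      .groupby(['column1', 'column2', 'column3'])
      .agg(qtd_ok=('is_ok', 'sum'),         # conditional count
           qtd=('is_ok', 'size'))           # total rows per group
      .reset_index()
)
print(res)
```

Each keyword names an output column and maps it to a (source column, function) pair, so the list of tuples is not needed.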
























          • Thank you!! It worked – Hiago Bonamelli, Nov 21 '18 at 15:33






          • Good one to know, +1 – pygo, Nov 21 '18 at 15:54

































          pd.crosstab



          You can use pd.crosstab with margins=True:



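          # df is the sample DataFrame from @jezrael's answer, before it is aggregated
          list_of_lists = df.iloc[:, :-1].values.T.tolist()
          condition = df['status'].eq('a')

          # rownames labels the index levels; plain lists would otherwise
          # be named row_0, row_1, row_2
          res = (pd.crosstab(list_of_lists, condition, margins=True,
                             rownames=df.columns[:-1].tolist())
                   .drop('All', level=0)
                   .reset_index())

          print(res)

          status column1 column2 column3  False  True  All
          0            a       a       a      1     2    3
          1            a       a       b      0     1    1
          2            a       b       a      0     2    2
          3            a       b       b      2     1    3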
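To get column names closer to the qtd_ok and qtd columns the question asks for, one option is to label the condition with strings before calling pd.crosstab and to rename the margin through margins_name. The following is a sketch under those assumptions; the label 'other' and the variable names are hypothetical, and 'a' again stands in for 'OK'.

```python
import pandas as pd

# Sample data from @jezrael's answer; 'a' plays the role of 'OK'
df = pd.DataFrame({
    'column1': ['a'] * 9,
    'column2': ['a'] * 4 + ['b'] * 5,
    'column3': list('aaabaabbb'),
    'status': list('aabaaabba'),
})

keys = ['column1', 'column2', 'column3']

# String labels instead of True/False; 'other' is a hypothetical label
label = df['status'].eq('a').map({True: 'qtd_ok', False: 'other'})

# margins_name='qtd' names the total column (and the total row) 'qtd'
ct = pd.crosstab([df[k] for k in keys], label, margins=True, margins_name='qtd')

ct = ct.drop('qtd', level=0)        # drop the grand-total row
res = ct[['qtd_ok', 'qtd']].reset_index().rename_axis(columns=None)
print(res)
```

Passing Series rather than plain lists lets crosstab pick up the index level names from the Series themselves.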


























          • Another nice solution +1 – pygo, Nov 21 '18 at 15:54

































          Another idea: count with groupby and a lambda, which can be extended further.

          >>> df
             colum1    colum2    colum3 status
          0   unit1  section1  content1     OK
          1   unit1  section1  content1     OK
          2   unit1  section1  content1  error
          3   unit1  section1  content2     OK
          4   unit1  section2  content1     OK
          5   unit1  section2  content1     OK
          6   unit1  section2  content2  error
          7   unit1  section2  content2  error
          8   unit1  section2  content2     OK

          Using groupby with a lambda:

          >>> df.groupby(['colum1', 'colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
             colum1    colum2    colum3  status
          0   unit1  section1  content1       2
          1   unit1  section1  content2       1
          2   unit1  section2  content1       2
          3   unit1  section2  content2       1

          Here the status column of the result holds the number of OK rows per group. Passing case=False makes the match case-insensitive, so 'ok' is counted as well.
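One way the lambda idea might be extended to return both counts at once is named aggregation on the grouped status column. This is a sketch, not part of the original answer; the DataFrame is rebuilt from the rows printed above, and the output names qtd_ok and qtd are taken from the question.

```python
import pandas as pd

# Rebuilt from the rows shown in the answer above
df = pd.DataFrame({
    'colum1': ['unit1'] * 9,
    'colum2': ['section1'] * 4 + ['section2'] * 5,
    'colum3': ['content1', 'content1', 'content1', 'content2',
               'content1', 'content1', 'content2', 'content2', 'content2'],
    'status': ['OK', 'OK', 'error', 'OK', 'OK', 'OK', 'error', 'error', 'OK'],
})

res = (
    df.groupby(['colum1', 'colum2', 'colum3'])['status']
      .agg(qtd_ok=lambda s: s.str.contains('OK', case=False).sum(),  # conditional count
           qtd='size')                                               # total rows per group
      .reset_index()
)
print(res)
```

Note that str.contains matches substrings, so a status such as 'NOT OK' would also be counted; s.eq('OK').sum() is the stricter choice when only exact matches should count.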









































            answered Nov 21 '18 at 14:21









            jezrael (answer using assign and agg)
















            edited Nov 21 '18 at 15:36

























            answered Nov 21 '18 at 14:41









            jpp (pd.crosstab answer)
















                edited Nov 21 '18 at 15:48

























                answered Nov 21 '18 at 15:32









                pygo (groupby with lambda answer)





























