pandas - how to create multiple columns in groupby with conditional?

I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:

enter image description here

The qtd_ok column counts only those that have 'OK'

enter image description here

I tried this, but I do not know how to add the total count in the same groupby:

df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

add a comment |

I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:

enter image description here

The qtd_ok column counts only those that have 'OK'

enter image description here

I tried this, but I do not know how to add the total count in the same groupby:

df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

add a comment |

I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:

enter image description here

The qtd_ok column counts only those that have 'OK'

enter image description here

I tried this, but I do not know how to add the total count in the same groupby:

df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:

enter image description here

The qtd_ok column counts only those that have 'OK'

enter image description here

I tried this, but I do not know how to add the total count in the same groupby:

df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')

python pandas dataframe pandas-groupby

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

edited Nov 21 '18 at 14:43

jpp

99.5k2161110

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

asked Nov 21 '18 at 14:15

Hiago Bonamelli

617

add a comment |

3 Answers
3

active

oldest

votes

First create helper column A with assign and then aggregate by agg functions sum for count only OK values and size for count all values per groups:

df = (df.assign(A=(df['status']== 'OK'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

Sample:

df = pd.DataFrame({

        'column1':['a'] * 9,

        'column2':['a'] * 4 + ['b'] * 5,

        'column3':list('aaabaabbb'),

        'status':list('aabaaabba'),

})



print (df)

  column1 column2 column3 status

0       a       a       a      a

1       a       a       a      a

2       a       a       a      b

3       a       a       b      a

4       a       b       a      a

5       a       b       a      a

6       a       b       b      b

7       a       b       b      b

8       a       b       b      a



df = (df.assign(A=(df['status']== 'a'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

print (df)

  column1 column2 column3  qtd_ok  qtd

0       a       a       a       2    3

1       a       a       b       1    1

2       a       b       a       2    2

3       a       b       b       1    3

answered Nov 21 '18 at 14:21

jezrael

331k24273351

1

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

1

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

add a comment |

`pd.crosstab`

You can use pd.crosstab with margins=True:

# data from @jezrael



list_of_lists = df.iloc[:, :-1].values.T.tolist()

condition = df['status'].eq('a')



res = pd.crosstab(list_of_lists, condition, margins=True)

        .drop('All', level=0).reset_index()



print(res)



status column1 column2 column3  False  True  All

0            a       a       a      1     2    3

1            a       a       b      0     1    1

2            a       b       a      0     2    2

3            a       b       b      2     1    3

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

1

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

add a comment |

Just an idea to count with groupby with lambda which can further be enhanced ..

>>> df

  colum1    colum2    colum3 status

0  unit1  section1  content1     OK

1  unit1  section1  content1     OK

2  unit1  section1  content1  error

3  unit1  section1  content2     OK

4  unit1  section2  content1     OK

5  unit1  section2  content1     OK

6  unit1  section2  content2  error

7  unit1  section2  content2  error

8  unit1  section2  content2     OK

using groupby with lambda..

 >>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()

  colum1    colum2    colum3  status

0  unit1  section1  content1       2

1  unit1  section1  content2       1

2  unit1  section2  content1       2

3  unit1  section2  content2       1

Also can use case=False for ignorecase for ok.

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414028%2fpandas-how-to-create-multiple-columns-in-groupby-with-conditional%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

First create helper column A with assign and then aggregate by agg functions sum for count only OK values and size for count all values per groups:

df = (df.assign(A=(df['status']== 'OK'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

Sample:

df = pd.DataFrame({

        'column1':['a'] * 9,

        'column2':['a'] * 4 + ['b'] * 5,

        'column3':list('aaabaabbb'),

        'status':list('aabaaabba'),

})



print (df)

  column1 column2 column3 status

0       a       a       a      a

1       a       a       a      a

2       a       a       a      b

3       a       a       b      a

4       a       b       a      a

5       a       b       a      a

6       a       b       b      b

7       a       b       b      b

8       a       b       b      a



df = (df.assign(A=(df['status']== 'a'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

print (df)

  column1 column2 column3  qtd_ok  qtd

0       a       a       a       2    3

1       a       a       b       1    1

2       a       b       a       2    2

3       a       b       b       1    3

answered Nov 21 '18 at 14:21

jezrael

331k24273351

1

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

1

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

add a comment |

First create helper column A with assign and then aggregate by agg functions sum for count only OK values and size for count all values per groups:

df = (df.assign(A=(df['status']== 'OK'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

Sample:

df = pd.DataFrame({

        'column1':['a'] * 9,

        'column2':['a'] * 4 + ['b'] * 5,

        'column3':list('aaabaabbb'),

        'status':list('aabaaabba'),

})



print (df)

  column1 column2 column3 status

0       a       a       a      a

1       a       a       a      a

2       a       a       a      b

3       a       a       b      a

4       a       b       a      a

5       a       b       a      a

6       a       b       b      b

7       a       b       b      b

8       a       b       b      a



df = (df.assign(A=(df['status']== 'a'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

print (df)

  column1 column2 column3  qtd_ok  qtd

0       a       a       a       2    3

1       a       a       b       1    1

2       a       b       a       2    2

3       a       b       b       1    3

answered Nov 21 '18 at 14:21

jezrael

331k24273351

1

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

1

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

add a comment |

First create helper column A with assign and then aggregate by agg functions sum for count only OK values and size for count all values per groups:

df = (df.assign(A=(df['status']== 'OK'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

Sample:

df = pd.DataFrame({

        'column1':['a'] * 9,

        'column2':['a'] * 4 + ['b'] * 5,

        'column3':list('aaabaabbb'),

        'status':list('aabaaabba'),

})



print (df)

  column1 column2 column3 status

0       a       a       a      a

1       a       a       a      a

2       a       a       a      b

3       a       a       b      a

4       a       b       a      a

5       a       b       a      a

6       a       b       b      b

7       a       b       b      b

8       a       b       b      a



df = (df.assign(A=(df['status']== 'a'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

print (df)

  column1 column2 column3  qtd_ok  qtd

0       a       a       a       2    3

1       a       a       b       1    1

2       a       b       a       2    2

3       a       b       b       1    3

answered Nov 21 '18 at 14:21

jezrael

331k24273351

First create helper column A with assign and then aggregate by agg functions sum for count only OK values and size for count all values per groups:

df = (df.assign(A=(df['status']== 'OK'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

Sample:

df = pd.DataFrame({

        'column1':['a'] * 9,

        'column2':['a'] * 4 + ['b'] * 5,

        'column3':list('aaabaabbb'),

        'status':list('aabaaabba'),

})



print (df)

  column1 column2 column3 status

0       a       a       a      a

1       a       a       a      a

2       a       a       a      b

3       a       a       b      a

4       a       b       a      a

5       a       b       a      a

6       a       b       b      b

7       a       b       b      b

8       a       b       b      a



df = (df.assign(A=(df['status']== 'a'))

        .groupby(['column1', 'column2', 'column3'])['A']

        .agg([('qtd_ok','sum'),('qtd','size')])

        .astype(int)

        .reset_index())

print (df)

  column1 column2 column3  qtd_ok  qtd

0       a       a       a       2    3

1       a       a       b       1    1

2       a       b       a       2    2

3       a       b       b       1    3

answered Nov 21 '18 at 14:21

jezrael

331k24273351

answered Nov 21 '18 at 14:21

jezrael

331k24273351

answered Nov 21 '18 at 14:21

jezrael

331k24273351

answered Nov 21 '18 at 14:21

jezrael

331k24273351

1

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

1

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

add a comment |

1

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

1

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

Thank you!! It worked

– Hiago Bonamelli
Nov 21 '18 at 15:33

Good one to know, +1

– pygo
Nov 21 '18 at 15:54

add a comment |

`pd.crosstab`

You can use pd.crosstab with margins=True:

# data from @jezrael



list_of_lists = df.iloc[:, :-1].values.T.tolist()

condition = df['status'].eq('a')



res = pd.crosstab(list_of_lists, condition, margins=True)

        .drop('All', level=0).reset_index()



print(res)



status column1 column2 column3  False  True  All

0            a       a       a      1     2    3

1            a       a       b      0     1    1

2            a       b       a      0     2    2

3            a       b       b      2     1    3

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

1

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

add a comment |

`pd.crosstab`

You can use pd.crosstab with margins=True:

# data from @jezrael



list_of_lists = df.iloc[:, :-1].values.T.tolist()

condition = df['status'].eq('a')



res = pd.crosstab(list_of_lists, condition, margins=True)

        .drop('All', level=0).reset_index()



print(res)



status column1 column2 column3  False  True  All

0            a       a       a      1     2    3

1            a       a       b      0     1    1

2            a       b       a      0     2    2

3            a       b       b      2     1    3

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

1

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

add a comment |

`pd.crosstab`

You can use pd.crosstab with margins=True:

# data from @jezrael



list_of_lists = df.iloc[:, :-1].values.T.tolist()

condition = df['status'].eq('a')



res = pd.crosstab(list_of_lists, condition, margins=True)

        .drop('All', level=0).reset_index()



print(res)



status column1 column2 column3  False  True  All

0            a       a       a      1     2    3

1            a       a       b      0     1    1

2            a       b       a      0     2    2

3            a       b       b      2     1    3

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

`pd.crosstab`

You can use pd.crosstab with margins=True:

# data from @jezrael



list_of_lists = df.iloc[:, :-1].values.T.tolist()

condition = df['status'].eq('a')



res = pd.crosstab(list_of_lists, condition, margins=True)

        .drop('All', level=0).reset_index()



print(res)



status column1 column2 column3  False  True  All

0            a       a       a      1     2    3

1            a       a       b      0     1    1

2            a       b       a      0     2    2

3            a       b       b      2     1    3

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

edited Nov 21 '18 at 15:36

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

answered Nov 21 '18 at 14:41

jpp

99.5k2161110

1

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

add a comment |

1

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

Another nice solution +1

– pygo
Nov 21 '18 at 15:54

add a comment |

Just an idea to count with groupby with lambda which can further be enhanced ..

>>> df

  colum1    colum2    colum3 status

0  unit1  section1  content1     OK

1  unit1  section1  content1     OK

2  unit1  section1  content1  error

3  unit1  section1  content2     OK

4  unit1  section2  content1     OK

5  unit1  section2  content1     OK

6  unit1  section2  content2  error

7  unit1  section2  content2  error

8  unit1  section2  content2     OK

using groupby with lambda..

 >>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()

  colum1    colum2    colum3  status

0  unit1  section1  content1       2

1  unit1  section1  content2       1

2  unit1  section2  content1       2

3  unit1  section2  content2       1

Also can use case=False for ignorecase for ok.

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

add a comment |

Just an idea to count with groupby with lambda which can further be enhanced ..

>>> df

  colum1    colum2    colum3 status

0  unit1  section1  content1     OK

1  unit1  section1  content1     OK

2  unit1  section1  content1  error

3  unit1  section1  content2     OK

4  unit1  section2  content1     OK

5  unit1  section2  content1     OK

6  unit1  section2  content2  error

7  unit1  section2  content2  error

8  unit1  section2  content2     OK

using groupby with lambda..

 >>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()

  colum1    colum2    colum3  status

0  unit1  section1  content1       2

1  unit1  section1  content2       1

2  unit1  section2  content1       2

3  unit1  section2  content2       1

Also can use case=False for ignorecase for ok.

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

add a comment |

Just an idea to count with groupby with lambda which can further be enhanced ..

>>> df

  colum1    colum2    colum3 status

0  unit1  section1  content1     OK

1  unit1  section1  content1     OK

2  unit1  section1  content1  error

3  unit1  section1  content2     OK

4  unit1  section2  content1     OK

5  unit1  section2  content1     OK

6  unit1  section2  content2  error

7  unit1  section2  content2  error

8  unit1  section2  content2     OK

using groupby with lambda..

 >>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()

  colum1    colum2    colum3  status

0  unit1  section1  content1       2

1  unit1  section1  content2       1

2  unit1  section2  content1       2

3  unit1  section2  content2       1

Also can use case=False for ignorecase for ok.

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

Just an idea to count with groupby with lambda which can further be enhanced ..

>>> df

  colum1    colum2    colum3 status

0  unit1  section1  content1     OK

1  unit1  section1  content1     OK

2  unit1  section1  content1  error

3  unit1  section1  content2     OK

4  unit1  section2  content1     OK

5  unit1  section2  content1     OK

6  unit1  section2  content2  error

7  unit1  section2  content2  error

8  unit1  section2  content2     OK

using groupby with lambda..

 >>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()

  colum1    colum2    colum3  status

0  unit1  section1  content1       2

1  unit1  section1  content2       1

2  unit1  section2  content1       2

3  unit1  section2  content2       1

Also can use case=False for ignorecase for ok.

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

edited Nov 21 '18 at 15:48

answered Nov 21 '18 at 15:32

pygo

3,0551619

answered Nov 21 '18 at 15:32

pygo

3,0551619

answered Nov 21 '18 at 15:32

pygo

3,0551619

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

搜尋此網誌

Argthtjtr