pandas - how to create multiple columns in groupby with conditional?
I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:
The qtd_ok
column counts only those that have 'OK'
I tried this, but I do not know how to add the total count in the same groupby
:
df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')
python pandas dataframe pandas-groupby
add a comment |
I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:
The qtd_ok
column counts only those that have 'OK'
I tried this, but I do not know how to add the total count in the same groupby
:
df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')
python pandas dataframe pandas-groupby
add a comment |
I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:
The qtd_ok
column counts only those that have 'OK'
I tried this, but I do not know how to add the total count in the same groupby
:
df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')
python pandas dataframe pandas-groupby
I need to group a dataframe, but I need to create two columns, one that is a simple count and another that is a count with conditional, as in the example:
The qtd_ok
column counts only those that have 'OK'
I tried this, but I do not know how to add the total count in the same groupby
:
df.groupby(['column1', 'column2', 'column3']).apply(lambda x : x['status'].sum() == 'OK')
python pandas dataframe pandas-groupby
python pandas dataframe pandas-groupby
edited Nov 21 '18 at 14:43
jpp
99.5k2161110
99.5k2161110
asked Nov 21 '18 at 14:15
Hiago BonamelliHiago Bonamelli
617
617
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
add a comment |
pd.crosstab
You can use pd.crosstab
with margins=True
:
# data from @jezrael
list_of_lists = df.iloc[:, :-1].values.T.tolist()
condition = df['status'].eq('a')
res = pd.crosstab(list_of_lists, condition, margins=True)
.drop('All', level=0).reset_index()
print(res)
status column1 column2 column3 False True All
0 a a a 1 2 3
1 a a b 0 1 1
2 a b a 0 2 2
3 a b b 2 1 3
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
add a comment |
Just an idea to count with groupby with lambda which can further be enhanced ..
>>> df
colum1 colum2 colum3 status
0 unit1 section1 content1 OK
1 unit1 section1 content1 OK
2 unit1 section1 content1 error
3 unit1 section1 content2 OK
4 unit1 section2 content1 OK
5 unit1 section2 content1 OK
6 unit1 section2 content2 error
7 unit1 section2 content2 error
8 unit1 section2 content2 OK
using groupby with lambda..
>>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
colum1 colum2 colum3 status
0 unit1 section1 content1 2
1 unit1 section1 content2 1
2 unit1 section2 content1 2
3 unit1 section2 content2 1
Also can use case=False
for ignorecase for ok
.
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414028%2fpandas-how-to-create-multiple-columns-in-groupby-with-conditional%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
add a comment |
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
add a comment |
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
First create helper column A
with assign
and then aggregate by agg
functions sum
for count only OK
values and size
for count all values per groups:
df = (df.assign(A=(df['status']== 'OK'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
Sample:
df = pd.DataFrame({
'column1':['a'] * 9,
'column2':['a'] * 4 + ['b'] * 5,
'column3':list('aaabaabbb'),
'status':list('aabaaabba'),
})
print (df)
column1 column2 column3 status
0 a a a a
1 a a a a
2 a a a b
3 a a b a
4 a b a a
5 a b a a
6 a b b b
7 a b b b
8 a b b a
df = (df.assign(A=(df['status']== 'a'))
.groupby(['column1', 'column2', 'column3'])['A']
.agg([('qtd_ok','sum'),('qtd','size')])
.astype(int)
.reset_index())
print (df)
column1 column2 column3 qtd_ok qtd
0 a a a 2 3
1 a a b 1 1
2 a b a 2 2
3 a b b 1 3
answered Nov 21 '18 at 14:21
jezraeljezrael
331k24273351
331k24273351
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
add a comment |
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
1
1
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
Thank you!! It worked
– Hiago Bonamelli
Nov 21 '18 at 15:33
1
1
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
Good one to know, +1
– pygo
Nov 21 '18 at 15:54
add a comment |
pd.crosstab
You can use pd.crosstab
with margins=True
:
# data from @jezrael
list_of_lists = df.iloc[:, :-1].values.T.tolist()
condition = df['status'].eq('a')
res = pd.crosstab(list_of_lists, condition, margins=True)
.drop('All', level=0).reset_index()
print(res)
status column1 column2 column3 False True All
0 a a a 1 2 3
1 a a b 0 1 1
2 a b a 0 2 2
3 a b b 2 1 3
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
add a comment |
pd.crosstab
You can use pd.crosstab
with margins=True
:
# data from @jezrael
list_of_lists = df.iloc[:, :-1].values.T.tolist()
condition = df['status'].eq('a')
res = pd.crosstab(list_of_lists, condition, margins=True)
.drop('All', level=0).reset_index()
print(res)
status column1 column2 column3 False True All
0 a a a 1 2 3
1 a a b 0 1 1
2 a b a 0 2 2
3 a b b 2 1 3
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
add a comment |
pd.crosstab
You can use pd.crosstab
with margins=True
:
# data from @jezrael
list_of_lists = df.iloc[:, :-1].values.T.tolist()
condition = df['status'].eq('a')
res = pd.crosstab(list_of_lists, condition, margins=True)
.drop('All', level=0).reset_index()
print(res)
status column1 column2 column3 False True All
0 a a a 1 2 3
1 a a b 0 1 1
2 a b a 0 2 2
3 a b b 2 1 3
pd.crosstab
You can use pd.crosstab
with margins=True
:
# data from @jezrael
list_of_lists = df.iloc[:, :-1].values.T.tolist()
condition = df['status'].eq('a')
res = pd.crosstab(list_of_lists, condition, margins=True)
.drop('All', level=0).reset_index()
print(res)
status column1 column2 column3 False True All
0 a a a 1 2 3
1 a a b 0 1 1
2 a b a 0 2 2
3 a b b 2 1 3
edited Nov 21 '18 at 15:36
answered Nov 21 '18 at 14:41
jppjpp
99.5k2161110
99.5k2161110
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
add a comment |
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
1
1
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
Another nice solution +1
– pygo
Nov 21 '18 at 15:54
add a comment |
Just an idea to count with groupby with lambda which can further be enhanced ..
>>> df
colum1 colum2 colum3 status
0 unit1 section1 content1 OK
1 unit1 section1 content1 OK
2 unit1 section1 content1 error
3 unit1 section1 content2 OK
4 unit1 section2 content1 OK
5 unit1 section2 content1 OK
6 unit1 section2 content2 error
7 unit1 section2 content2 error
8 unit1 section2 content2 OK
using groupby with lambda..
>>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
colum1 colum2 colum3 status
0 unit1 section1 content1 2
1 unit1 section1 content2 1
2 unit1 section2 content1 2
3 unit1 section2 content2 1
Also can use case=False
for ignorecase for ok
.
add a comment |
Just an idea to count with groupby with lambda which can further be enhanced ..
>>> df
colum1 colum2 colum3 status
0 unit1 section1 content1 OK
1 unit1 section1 content1 OK
2 unit1 section1 content1 error
3 unit1 section1 content2 OK
4 unit1 section2 content1 OK
5 unit1 section2 content1 OK
6 unit1 section2 content2 error
7 unit1 section2 content2 error
8 unit1 section2 content2 OK
using groupby with lambda..
>>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
colum1 colum2 colum3 status
0 unit1 section1 content1 2
1 unit1 section1 content2 1
2 unit1 section2 content1 2
3 unit1 section2 content2 1
Also can use case=False
for ignorecase for ok
.
add a comment |
Just an idea to count with groupby with lambda which can further be enhanced ..
>>> df
colum1 colum2 colum3 status
0 unit1 section1 content1 OK
1 unit1 section1 content1 OK
2 unit1 section1 content1 error
3 unit1 section1 content2 OK
4 unit1 section2 content1 OK
5 unit1 section2 content1 OK
6 unit1 section2 content2 error
7 unit1 section2 content2 error
8 unit1 section2 content2 OK
using groupby with lambda..
>>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
colum1 colum2 colum3 status
0 unit1 section1 content1 2
1 unit1 section1 content2 1
2 unit1 section2 content1 2
3 unit1 section2 content2 1
Also can use case=False
for ignorecase for ok
.
Just an idea to count with groupby with lambda which can further be enhanced ..
>>> df
colum1 colum2 colum3 status
0 unit1 section1 content1 OK
1 unit1 section1 content1 OK
2 unit1 section1 content1 error
3 unit1 section1 content2 OK
4 unit1 section2 content1 OK
5 unit1 section2 content1 OK
6 unit1 section2 content2 error
7 unit1 section2 content2 error
8 unit1 section2 content2 OK
using groupby with lambda..
>>> df.groupby(['colum1','colum2', 'colum3'])['status'].apply(lambda x: x[x.str.contains('OK', case=False)].count()).reset_index()
colum1 colum2 colum3 status
0 unit1 section1 content1 2
1 unit1 section1 content2 1
2 unit1 section2 content1 2
3 unit1 section2 content2 1
Also can use case=False
for ignorecase for ok
.
edited Nov 21 '18 at 15:48
answered Nov 21 '18 at 15:32
pygopygo
3,0551619
3,0551619
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53414028%2fpandas-how-to-create-multiple-columns-in-groupby-with-conditional%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown