populating pandas columns based on values in other columns
My dataframe contains these columns
ID Address1 Address1-State Address1-City Address2 Address2-State Address2-City Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago
Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2
In the above case final dataframe would be like this
ID Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls
Currently,I am doing this
def fill_add(address1,address2):
if address1!='':
address=address1
elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''
return address
def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)
Do I have to do the same for all the other columns?Is there a better way?
Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.
python pandas
add a comment |
My dataframe contains these columns
ID Address1 Address1-State Address1-City Address2 Address2-State Address2-City Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago
Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2
In the above case final dataframe would be like this
ID Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls
Currently,I am doing this
def fill_add(address1,address2):
if address1!='':
address=address1
elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''
return address
def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)
Do I have to do the same for all the other columns?Is there a better way?
Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.
python pandas
add a comment |
My dataframe contains these columns
ID Address1 Address1-State Address1-City Address2 Address2-State Address2-City Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago
Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2
In the above case final dataframe would be like this
ID Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls
Currently,I am doing this
def fill_add(address1,address2):
if address1!='':
address=address1
elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''
return address
def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)
Do I have to do the same for all the other columns?Is there a better way?
Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.
python pandas
My dataframe contains these columns
ID Address1 Address1-State Address1-City Address2 Address2-State Address2-City Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago
Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2
In the above case final dataframe would be like this
ID Address State City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls
Currently,I am doing this
def fill_add(address1,address2):
if address1!='':
address=address1
elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''
return address
def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)
Do I have to do the same for all the other columns?Is there a better way?
Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.
python pandas
python pandas
edited Nov 23 '18 at 4:15
amanda smith
asked Nov 23 '18 at 3:34
amanda smithamanda smith
575
575
add a comment |
add a comment |
3 Answers
3
active
oldest
votes
1st modify your column , then using groupby
+first
df=df.replace('',np.nan)#prepare for first
df.columns=df.columns.str.replace('d+','')
df.columns=df.columns.str.split('-').str[-1]
newdf=df.groupby(level=0,axis=1).first()
newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
newdf
Out[40]:
Address City ID State
0 6th street Mpls 1 MN
1 15th St Flint 2 MI
2 Essexb St New York 3 NY
3 7 street SE Mpls 4 MN
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
add a comment |
import numpy as np
df=df.replace('',np.nan)
addr_1=['ID','Address1','Address1-State','Address1-City']
addr_2=['ID','Address2','Address2-State','Address2-City']
new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])
new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values
#print(new_df)
ID Address State City
0 1 6th street MN Mpls
1 2 15th St MI Flint
2 3 Essex St NY New York
3 4 7 street SE MN Mpls
add a comment |
(Given you do not have any duplicate index)
Select indices you want to fill with Adress1:
Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index
then put Address1 data in your desired columns:
df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]
Now select indices you want to fill with address2:
Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index
then fill these also:
df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]
Remove columns that you do not want:
df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
1st modify your column , then using groupby
+first
df=df.replace('',np.nan)#prepare for first
df.columns=df.columns.str.replace('d+','')
df.columns=df.columns.str.split('-').str[-1]
newdf=df.groupby(level=0,axis=1).first()
newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
newdf
Out[40]:
Address City ID State
0 6th street Mpls 1 MN
1 15th St Flint 2 MI
2 Essexb St New York 3 NY
3 7 street SE Mpls 4 MN
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
add a comment |
1st modify your column , then using groupby
+first
df=df.replace('',np.nan)#prepare for first
df.columns=df.columns.str.replace('d+','')
df.columns=df.columns.str.split('-').str[-1]
newdf=df.groupby(level=0,axis=1).first()
newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
newdf
Out[40]:
Address City ID State
0 6th street Mpls 1 MN
1 15th St Flint 2 MI
2 Essexb St New York 3 NY
3 7 street SE Mpls 4 MN
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
add a comment |
1st modify your column , then using groupby
+first
df=df.replace('',np.nan)#prepare for first
df.columns=df.columns.str.replace('d+','')
df.columns=df.columns.str.split('-').str[-1]
newdf=df.groupby(level=0,axis=1).first()
newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
newdf
Out[40]:
Address City ID State
0 6th street Mpls 1 MN
1 15th St Flint 2 MI
2 Essexb St New York 3 NY
3 7 street SE Mpls 4 MN
1st modify your column , then using groupby
+first
df=df.replace('',np.nan)#prepare for first
df.columns=df.columns.str.replace('d+','')
df.columns=df.columns.str.split('-').str[-1]
newdf=df.groupby(level=0,axis=1).first()
newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
newdf
Out[40]:
Address City ID State
0 6th street Mpls 1 MN
1 15th St Flint 2 MI
2 Essexb St New York 3 NY
3 7 street SE Mpls 4 MN
edited Nov 23 '18 at 4:10
answered Nov 23 '18 at 4:03
Wen-BenWen-Ben
118k83469
118k83469
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
add a comment |
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
– amanda smith
Nov 23 '18 at 4:07
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left
– Wen-Ben
Nov 23 '18 at 4:10
add a comment |
import numpy as np
df=df.replace('',np.nan)
addr_1=['ID','Address1','Address1-State','Address1-City']
addr_2=['ID','Address2','Address2-State','Address2-City']
new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])
new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values
#print(new_df)
ID Address State City
0 1 6th street MN Mpls
1 2 15th St MI Flint
2 3 Essex St NY New York
3 4 7 street SE MN Mpls
add a comment |
import numpy as np
df=df.replace('',np.nan)
addr_1=['ID','Address1','Address1-State','Address1-City']
addr_2=['ID','Address2','Address2-State','Address2-City']
new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])
new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values
#print(new_df)
ID Address State City
0 1 6th street MN Mpls
1 2 15th St MI Flint
2 3 Essex St NY New York
3 4 7 street SE MN Mpls
add a comment |
import numpy as np
df=df.replace('',np.nan)
addr_1=['ID','Address1','Address1-State','Address1-City']
addr_2=['ID','Address2','Address2-State','Address2-City']
new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])
new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values
#print(new_df)
ID Address State City
0 1 6th street MN Mpls
1 2 15th St MI Flint
2 3 Essex St NY New York
3 4 7 street SE MN Mpls
import numpy as np
df=df.replace('',np.nan)
addr_1=['ID','Address1','Address1-State','Address1-City']
addr_2=['ID','Address2','Address2-State','Address2-City']
new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])
new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values
#print(new_df)
ID Address State City
0 1 6th street MN Mpls
1 2 15th St MI Flint
2 3 Essex St NY New York
3 4 7 street SE MN Mpls
edited Nov 23 '18 at 5:16
answered Nov 23 '18 at 5:10
pydpyd
2,13211129
2,13211129
add a comment |
add a comment |
(Given you do not have any duplicate index)
Select indices you want to fill with Adress1:
Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index
then put Address1 data in your desired columns:
df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]
Now select indices you want to fill with address2:
Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index
then fill these also:
df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]
Remove columns that you do not want:
df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)
add a comment |
(Given you do not have any duplicate index)
Select indices you want to fill with Adress1:
Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index
then put Address1 data in your desired columns:
df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]
Now select indices you want to fill with address2:
Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index
then fill these also:
df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]
Remove columns that you do not want:
df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)
add a comment |
(Given you do not have any duplicate index)
Select indices you want to fill with Adress1:
Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index
then put Address1 data in your desired columns:
df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]
Now select indices you want to fill with address2:
Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index
then fill these also:
df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]
Remove columns that you do not want:
df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)
(Given you do not have any duplicate index)
Select indices you want to fill with Adress1:
Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index
then put Address1 data in your desired columns:
df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]
Now select indices you want to fill with address2:
Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index
then fill these also:
df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]
Remove columns that you do not want:
df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)
edited Nov 23 '18 at 4:24
answered Nov 23 '18 at 4:19
Joseph ChoiJoseph Choi
838
838
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown