populating pandas columns based on values in other columns

My dataframe contains these columns

ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City

 1    6th street   MN              Mpls

 2                                               15th St      MI           Flint

 3                 MA              Boston          Essex St   NY            New York

 4    7 street SE  MN              Mpls           8th St      IL             Chicago

Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2

In the above case final dataframe would be like this

 ID     Address   State    City

  1      6th street   MN    Mpls

  2      15th St      MI    Flint

  3      Essex St     NY     New York

  4     7 street SE  MN     Mpls

Currently,I am doing this

def fill_add(address1,address2):

if address1!='':

    address=address1



elif address1=='' and address2!='':

    address=address2

elif address1=='' and address2=='':

     address=''



return address





def fill_add_apply(df):

df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)

Do I have to do the same for all the other columns?Is there a better way?

Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

add a comment |

My dataframe contains these columns

ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City

 1    6th street   MN              Mpls

 2                                               15th St      MI           Flint

 3                 MA              Boston          Essex St   NY            New York

 4    7 street SE  MN              Mpls           8th St      IL             Chicago

Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2

In the above case final dataframe would be like this

 ID     Address   State    City

  1      6th street   MN    Mpls

  2      15th St      MI    Flint

  3      Essex St     NY     New York

  4     7 street SE  MN     Mpls

Currently,I am doing this

def fill_add(address1,address2):

if address1!='':

    address=address1



elif address1=='' and address2!='':

    address=address2

elif address1=='' and address2=='':

     address=''



return address





def fill_add_apply(df):

df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)

Do I have to do the same for all the other columns?Is there a better way?

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

add a comment |

My dataframe contains these columns

ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City

 1    6th street   MN              Mpls

 2                                               15th St      MI           Flint

 3                 MA              Boston          Essex St   NY            New York

 4    7 street SE  MN              Mpls           8th St      IL             Chicago

Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2

In the above case final dataframe would be like this

 ID     Address   State    City

  1      6th street   MN    Mpls

  2      15th St      MI    Flint

  3      Essex St     NY     New York

  4     7 street SE  MN     Mpls

Currently,I am doing this

def fill_add(address1,address2):

if address1!='':

    address=address1



elif address1=='' and address2!='':

    address=address2

elif address1=='' and address2=='':

     address=''



return address





def fill_add_apply(df):

df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)

Do I have to do the same for all the other columns?Is there a better way?

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

My dataframe contains these columns

ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City

 1    6th street   MN              Mpls

 2                                               15th St      MI           Flint

 3                 MA              Boston          Essex St   NY            New York

 4    7 street SE  MN              Mpls           8th St      IL             Chicago

Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2

In the above case final dataframe would be like this

 ID     Address   State    City

  1      6th street   MN    Mpls

  2      15th St      MI    Flint

  3      Essex St     NY     New York

  4     7 street SE  MN     Mpls

Currently,I am doing this

def fill_add(address1,address2):

if address1!='':

    address=address1



elif address1=='' and address2!='':

    address=address2

elif address1=='' and address2=='':

     address=''



return address





def fill_add_apply(df):

df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)

Do I have to do the same for all the other columns?Is there a better way?

python pandas

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

edited Nov 23 '18 at 4:15

asked Nov 23 '18 at 3:34

amanda smith

575

asked Nov 23 '18 at 3:34

amanda smith

575

asked Nov 23 '18 at 3:34

amanda smith

575

add a comment |

3 Answers
3

active

oldest

votes

1st modify your column , then using groupby +first

df=df.replace('',np.nan)#prepare for first 



df.columns=df.columns.str.replace('d+','')

df.columns=df.columns.str.split('-').str[-1]

newdf=df.groupby(level=0,axis=1).first()

newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()

newdf

Out[40]: 

       Address      City  ID State

0  6th street       Mpls   1    MN

1      15th St     Flint   2    MI

2    Essexb St  New York   3    NY

3  7 street SE      Mpls   4    MN

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

add a comment |

import numpy as np



df=df.replace('',np.nan)



addr_1=['ID','Address1','Address1-State','Address1-City']

addr_2=['ID','Address2','Address2-State','Address2-City']



new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])



new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values



#print(new_df)

    ID  Address     State   City

0   1   6th street  MN      Mpls

1   2   15th St     MI      Flint

2   3   Essex St    NY      New York

3   4   7 street SE MN      Mpls

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

add a comment |

(Given you do not have any duplicate index)

Select indices you want to fill with Adress1:

Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index

then put Address1 data in your desired columns:

df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]

Now select indices you want to fill with address2:

Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index

then fill these also:

df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]

Remove columns that you do not want:

df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

1st modify your column , then using groupby +first

df=df.replace('',np.nan)#prepare for first 



df.columns=df.columns.str.replace('d+','')

df.columns=df.columns.str.split('-').str[-1]

newdf=df.groupby(level=0,axis=1).first()

newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()

newdf

Out[40]: 

       Address      City  ID State

0  6th street       Mpls   1    MN

1      15th St     Flint   2    MI

2    Essexb St  New York   3    NY

3  7 street SE      Mpls   4    MN

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

add a comment |

1st modify your column , then using groupby +first

df=df.replace('',np.nan)#prepare for first 



df.columns=df.columns.str.replace('d+','')

df.columns=df.columns.str.split('-').str[-1]

newdf=df.groupby(level=0,axis=1).first()

newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()

newdf

Out[40]: 

       Address      City  ID State

0  6th street       Mpls   1    MN

1      15th St     Flint   2    MI

2    Essexb St  New York   3    NY

3  7 street SE      Mpls   4    MN

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

add a comment |

1st modify your column , then using groupby +first

df=df.replace('',np.nan)#prepare for first 



df.columns=df.columns.str.replace('d+','')

df.columns=df.columns.str.split('-').str[-1]

newdf=df.groupby(level=0,axis=1).first()

newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()

newdf

Out[40]: 

       Address      City  ID State

0  6th street       Mpls   1    MN

1      15th St     Flint   2    MI

2    Essexb St  New York   3    NY

3  7 street SE      Mpls   4    MN

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

1st modify your column , then using groupby +first

df=df.replace('',np.nan)#prepare for first 



df.columns=df.columns.str.replace('d+','')

df.columns=df.columns.str.split('-').str[-1]

newdf=df.groupby(level=0,axis=1).first()

newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()

newdf

Out[40]: 

       Address      City  ID State

0  6th street       Mpls   1    MN

1      15th St     Flint   2    MI

2    Essexb St  New York   3    NY

3  7 street SE      Mpls   4    MN

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

edited Nov 23 '18 at 4:10

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

answered Nov 23 '18 at 4:03

Wen-Ben

118k83469

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

add a comment |

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

– amanda smith
Nov 23 '18 at 4:07

@amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

– Wen-Ben
Nov 23 '18 at 4:10

add a comment |

import numpy as np



df=df.replace('',np.nan)



addr_1=['ID','Address1','Address1-State','Address1-City']

addr_2=['ID','Address2','Address2-State','Address2-City']



new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])



new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values



#print(new_df)

    ID  Address     State   City

0   1   6th street  MN      Mpls

1   2   15th St     MI      Flint

2   3   Essex St    NY      New York

3   4   7 street SE MN      Mpls

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

add a comment |

import numpy as np



df=df.replace('',np.nan)



addr_1=['ID','Address1','Address1-State','Address1-City']

addr_2=['ID','Address2','Address2-State','Address2-City']



new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])



new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values



#print(new_df)

    ID  Address     State   City

0   1   6th street  MN      Mpls

1   2   15th St     MI      Flint

2   3   Essex St    NY      New York

3   4   7 street SE MN      Mpls

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

add a comment |

import numpy as np



df=df.replace('',np.nan)



addr_1=['ID','Address1','Address1-State','Address1-City']

addr_2=['ID','Address2','Address2-State','Address2-City']



new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])



new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values



#print(new_df)

    ID  Address     State   City

0   1   6th street  MN      Mpls

1   2   15th St     MI      Flint

2   3   Essex St    NY      New York

3   4   7 street SE MN      Mpls

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

import numpy as np



df=df.replace('',np.nan)



addr_1=['ID','Address1','Address1-State','Address1-City']

addr_2=['ID','Address2','Address2-State','Address2-City']



new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])



new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values



#print(new_df)

    ID  Address     State   City

0   1   6th street  MN      Mpls

1   2   15th St     MI      Flint

2   3   Essex St    NY      New York

3   4   7 street SE MN      Mpls

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

edited Nov 23 '18 at 5:16

answered Nov 23 '18 at 5:10

pyd

2,13211129

answered Nov 23 '18 at 5:10

pyd

2,13211129

answered Nov 23 '18 at 5:10

pyd

2,13211129

add a comment |

(Given you do not have any duplicate index)

Select indices you want to fill with Adress1:

Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index

then put Address1 data in your desired columns:

df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]

Now select indices you want to fill with address2:

Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index

then fill these also:

df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]

Remove columns that you do not want:

df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

add a comment |

(Given you do not have any duplicate index)

Select indices you want to fill with Adress1:

Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index

then put Address1 data in your desired columns:

df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]

Now select indices you want to fill with address2:

Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index

then fill these also:

df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]

Remove columns that you do not want:

df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

add a comment |

(Given you do not have any duplicate index)

Select indices you want to fill with Adress1:

Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index

then put Address1 data in your desired columns:

df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]

Now select indices you want to fill with address2:

Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index

then fill these also:

df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]

Remove columns that you do not want:

df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

(Given you do not have any duplicate index)

Select indices you want to fill with Adress1:

Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index

then put Address1 data in your desired columns:

df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]

Now select indices you want to fill with address2:

Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index

then fill these also:

df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]

Remove columns that you do not want:

df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

edited Nov 23 '18 at 4:24

answered Nov 23 '18 at 4:19

Joseph Choi

838

answered Nov 23 '18 at 4:19

Joseph Choi

838

answered Nov 23 '18 at 4:19

Joseph Choi

838

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

3HmK Ug81YExgC6 7k,J19U,1JNOnAiF N

搜尋此網誌

Argthtjtr