populating pandas columns based on values in other columns












2















My dataframe contains these columns



ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
1 6th street MN Mpls
2 15th St MI Flint
3 MA Boston Essex St NY New York
4 7 street SE MN Mpls 8th St IL Chicago


Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



In the above case final dataframe would be like this



 ID     Address   State    City
1 6th street MN Mpls
2 15th St MI Flint
3 Essex St NY New York
4 7 street SE MN Mpls


Currently,I am doing this



def fill_add(address1,address2):
if address1!='':
address=address1

elif address1=='' and address2!='':
address=address2
elif address1=='' and address2=='':
address=''

return address


def fill_add_apply(df):
df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


Do I have to do the same for all the other columns?Is there a better way?



Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










share|improve this question





























    2















    My dataframe contains these columns



    ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
    1 6th street MN Mpls
    2 15th St MI Flint
    3 MA Boston Essex St NY New York
    4 7 street SE MN Mpls 8th St IL Chicago


    Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



    In the above case final dataframe would be like this



     ID     Address   State    City
    1 6th street MN Mpls
    2 15th St MI Flint
    3 Essex St NY New York
    4 7 street SE MN Mpls


    Currently,I am doing this



    def fill_add(address1,address2):
    if address1!='':
    address=address1

    elif address1=='' and address2!='':
    address=address2
    elif address1=='' and address2=='':
    address=''

    return address


    def fill_add_apply(df):
    df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


    Do I have to do the same for all the other columns?Is there a better way?



    Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
    In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










    share|improve this question



























      2












      2








      2








      My dataframe contains these columns



      ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 MA Boston Essex St NY New York
      4 7 street SE MN Mpls 8th St IL Chicago


      Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



      In the above case final dataframe would be like this



       ID     Address   State    City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 Essex St NY New York
      4 7 street SE MN Mpls


      Currently,I am doing this



      def fill_add(address1,address2):
      if address1!='':
      address=address1

      elif address1=='' and address2!='':
      address=address2
      elif address1=='' and address2=='':
      address=''

      return address


      def fill_add_apply(df):
      df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


      Do I have to do the same for all the other columns?Is there a better way?



      Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
      In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.










      share|improve this question
















      My dataframe contains these columns



      ID   Address1   Address1-State   Address1-City  Address2  Address2-State  Address2-City   Address    State      City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 MA Boston Essex St NY New York
      4 7 street SE MN Mpls 8th St IL Chicago


      Now I want to populate Address field in such a way that if Address1 is blank,populate Address2 and the state city fields of Address2



      In the above case final dataframe would be like this



       ID     Address   State    City
      1 6th street MN Mpls
      2 15th St MI Flint
      3 Essex St NY New York
      4 7 street SE MN Mpls


      Currently,I am doing this



      def fill_add(address1,address2):
      if address1!='':
      address=address1

      elif address1=='' and address2!='':
      address=address2
      elif address1=='' and address2=='':
      address=''

      return address


      def fill_add_apply(df):
      df['Address']=df.apply(lambda row:fill_add(row['Address1'],row['Address2']),axis=1)


      Do I have to do the same for all the other columns?Is there a better way?



      Just to clarify, in ID=3 Address,State,City should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.
      In short, if Address1 is blank it should choose Address2,Address2-State and Address2-City,even if Address1-State and Address1-City are not blank.







      python pandas






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 23 '18 at 4:15







      amanda smith

















      asked Nov 23 '18 at 3:34









      amanda smithamanda smith

      575




      575
























          3 Answers
          3






          active

          oldest

          votes


















          1














          1st modify your column , then using groupby +first



          df=df.replace('',np.nan)#prepare for first 

          df.columns=df.columns.str.replace('d+','')
          df.columns=df.columns.str.split('-').str[-1]
          newdf=df.groupby(level=0,axis=1).first()
          newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
          newdf
          Out[40]:
          Address City ID State
          0 6th street Mpls 1 MN
          1 15th St Flint 2 MI
          2 Essexb St New York 3 NY
          3 7 street SE Mpls 4 MN





          share|improve this answer


























          • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

            – amanda smith
            Nov 23 '18 at 4:07











          • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

            – Wen-Ben
            Nov 23 '18 at 4:10





















          1














          import numpy as np

          df=df.replace('',np.nan)

          addr_1=['ID','Address1','Address1-State','Address1-City']
          addr_2=['ID','Address2','Address2-State','Address2-City']

          new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

          new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

          #print(new_df)
          ID Address State City
          0 1 6th street MN Mpls
          1 2 15th St MI Flint
          2 3 Essex St NY New York
          3 4 7 street SE MN Mpls





          share|improve this answer

































            0














            (Given you do not have any duplicate index)



            Select indices you want to fill with Adress1:



            Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


            then put Address1 data in your desired columns:



            df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


            Now select indices you want to fill with address2:



            Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


            then fill these also:



            df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


            Remove columns that you do not want:



            df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





            share|improve this answer

























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              3 Answers
              3






              active

              oldest

              votes








              3 Answers
              3






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              1














              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer


























              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

                – amanda smith
                Nov 23 '18 at 4:07











              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

                – Wen-Ben
                Nov 23 '18 at 4:10


















              1














              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer


























              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

                – amanda smith
                Nov 23 '18 at 4:07











              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

                – Wen-Ben
                Nov 23 '18 at 4:10
















              1












              1








              1







              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN





              share|improve this answer















              1st modify your column , then using groupby +first



              df=df.replace('',np.nan)#prepare for first 

              df.columns=df.columns.str.replace('d+','')
              df.columns=df.columns.str.split('-').str[-1]
              newdf=df.groupby(level=0,axis=1).first()
              newdf.loc[df.iloc[:,1].isnull(),:]=df.groupby(level=0,axis=1).last()
              newdf
              Out[40]:
              Address City ID State
              0 6th street Mpls 1 MN
              1 15th St Flint 2 MI
              2 Essexb St New York 3 NY
              3 7 street SE Mpls 4 MN






              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Nov 23 '18 at 4:10

























              answered Nov 23 '18 at 4:03









              Wen-BenWen-Ben

              118k83469




              118k83469













              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

                – amanda smith
                Nov 23 '18 at 4:07











              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

                – Wen-Ben
                Nov 23 '18 at 4:10





















              • Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

                – amanda smith
                Nov 23 '18 at 4:07











              • @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

                – Wen-Ben
                Nov 23 '18 at 4:10



















              Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

              – amanda smith
              Nov 23 '18 at 4:07





              Thanks! But in ID=3 it should be "Essex St NY New York" because Address 1 is blank so it should choose Address2 and the city and state of Address2.

              – amanda smith
              Nov 23 '18 at 4:07













              @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

              – Wen-Ben
              Nov 23 '18 at 4:10







              @amandasmith check the update, also if this is what you need , would you like accept it ? check mark at the left

              – Wen-Ben
              Nov 23 '18 at 4:10















              1














              import numpy as np

              df=df.replace('',np.nan)

              addr_1=['ID','Address1','Address1-State','Address1-City']
              addr_2=['ID','Address2','Address2-State','Address2-City']

              new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

              new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

              #print(new_df)
              ID Address State City
              0 1 6th street MN Mpls
              1 2 15th St MI Flint
              2 3 Essex St NY New York
              3 4 7 street SE MN Mpls





              share|improve this answer






























                1














                import numpy as np

                df=df.replace('',np.nan)

                addr_1=['ID','Address1','Address1-State','Address1-City']
                addr_2=['ID','Address2','Address2-State','Address2-City']

                new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                #print(new_df)
                ID Address State City
                0 1 6th street MN Mpls
                1 2 15th St MI Flint
                2 3 Essex St NY New York
                3 4 7 street SE MN Mpls





                share|improve this answer




























                  1












                  1








                  1







                  import numpy as np

                  df=df.replace('',np.nan)

                  addr_1=['ID','Address1','Address1-State','Address1-City']
                  addr_2=['ID','Address2','Address2-State','Address2-City']

                  new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                  new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                  #print(new_df)
                  ID Address State City
                  0 1 6th street MN Mpls
                  1 2 15th St MI Flint
                  2 3 Essex St NY New York
                  3 4 7 street SE MN Mpls





                  share|improve this answer















                  import numpy as np

                  df=df.replace('',np.nan)

                  addr_1=['ID','Address1','Address1-State','Address1-City']
                  addr_2=['ID','Address2','Address2-State','Address2-City']

                  new_df=pd.DataFrame(df[addr_1].values.copy(),columns=['ID','Address','State','City'])

                  new_df.loc[new_df['Address'].isnull(),:]=df.loc[df['Address1'].isnull(),addr_2].values

                  #print(new_df)
                  ID Address State City
                  0 1 6th street MN Mpls
                  1 2 15th St MI Flint
                  2 3 Essex St NY New York
                  3 4 7 street SE MN Mpls






                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Nov 23 '18 at 5:16

























                  answered Nov 23 '18 at 5:10









                  pydpyd

                  2,13211129




                  2,13211129























                      0














                      (Given you do not have any duplicate index)



                      Select indices you want to fill with Adress1:



                      Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                      then put Address1 data in your desired columns:



                      df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                      Now select indices you want to fill with address2:



                      Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                      then fill these also:



                      df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                      Remove columns that you do not want:



                      df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





                      share|improve this answer






























                        0














                        (Given you do not have any duplicate index)



                        Select indices you want to fill with Adress1:



                        Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                        then put Address1 data in your desired columns:



                        df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                        Now select indices you want to fill with address2:



                        Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                        then fill these also:



                        df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                        Remove columns that you do not want:



                        df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





                        share|improve this answer




























                          0












                          0








                          0







                          (Given you do not have any duplicate index)



                          Select indices you want to fill with Adress1:



                          Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                          then put Address1 data in your desired columns:



                          df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                          Now select indices you want to fill with address2:



                          Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                          then fill these also:



                          df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                          Remove columns that you do not want:



                          df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)





                          share|improve this answer















                          (Given you do not have any duplicate index)



                          Select indices you want to fill with Adress1:



                          Address1_index = df.loc[!df.Address1.empty() and !df.Address1-State.empty() and !df.Address1-City.empty()].index


                          then put Address1 data in your desired columns:



                          df.loc[Adress1_index, ["Adress", "State", "City"]] = df.loc[Adress1_index, ["Adress1", "Adress1-State", "Adress1-City"]]


                          Now select indices you want to fill with address2:



                          Address2_index = df.loc[df.Adress1.empty() or df.Adress1-State.empty() or df.Adress1-City.empty()].index


                          then fill these also:



                          df.loc[Adress2_index, ["Adress", "State", "City"]] = df.loc[Adress2_index, ["Adress2", "Adress2-State", "Adress2-City"]]


                          Remove columns that you do not want:



                          df.drop(["Address1", "Adress1-State", "Adress1-City", "Address2", "Adress2-State", "Adress2-City"], axis = 1, inplace = True)






                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Nov 23 '18 at 4:24

























                          answered Nov 23 '18 at 4:19









                          Joseph ChoiJoseph Choi

                          838




                          838






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53440366%2fpopulating-pandas-columns-based-on-values-in-other-columns%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                              Alcedinidae

                              Origin of the phrase “under your belt”?