Replacing all 0s in a column in python dataframe with column's median value changes datatype to 'O'












I have a large pandas DataFrame with 10,000 rows and 33 columns.
One of the columns is 'Age', which has dtype 'int64' and a considerable number of missing values.



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 33 columns):
customer 10000 non-null int64
age 10000 non-null int64


The missing values have been recorded as 0 in the data. Missing values:



 df['customer'][df[' age']==0].count()
>2942


I am trying to replace all such 0s with the median value:



df[' age'].replace(to_replace=0, value = df[' age'].median, inplace = True)


This seems to run fine, but it changes the dtype of the column to 'O' (object):



df[' age'].dtype
>dtype('O')


What is going wrong?







python pandas replace types median
















asked Nov 20 '18 at 16:15









aquarian47


















Use df[' age'].median(). pd.Series.median is a method; you have to call it to get the value.
– jpp
Nov 20 '18 at 16:23














2 Answers


















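It is probably better to replace the missing data with NaN first, and then fill those NaN values with the median.

Otherwise the zeros that stand in for missing data are themselves counted when the median is calculated, which can pull it down.

import numpy as np
import pandas as pd

df = pd.DataFrame([0, 1, 2, 3], columns=['data'])
# Mark the zeros as missing in this column only; it becomes float64 because NaN is a float
df['data'] = df['data'].replace(0, np.nan)
print(df)

   data
0   NaN
1   1.0
2   2.0
3   3.0

# median() skips NaN by default, so the fill value is the median of 1, 2, 3
df.fillna(df.median())

   data
0   2.0
1   1.0
2   2.0
3   3.0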
















answered Nov 20 '18 at 16:20

yatu
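The NaN-then-median idea can also be applied to a single column of a wider frame, so that the other columns are left alone. The following is only a minimal sketch: the miniature frame, the column name 'age' (without the leading space used in the question), and the final rounding back to int64 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question.
df = pd.DataFrame({
    "customer": [101, 102, 103, 104, 105],
    "age": [0, 25, 0, 40, 31],
})

# Step 1: turn the placeholder zeros into real missing values.
# The column becomes float64 because NaN is a float.
df["age"] = df["age"].replace(0, np.nan)

# Step 2: median() ignores NaN, so only the real ages contribute.
age_median = df["age"].median()
df["age"] = df["age"].fillna(age_median)

# Optional (assumption): round and cast back if whole-number ages are wanted.
df["age"] = df["age"].round().astype("int64")

print(age_median)
print(df)
```

Rounding is a choice, not a requirement. With an even number of valid ages the median can end in .5, in which case keeping the column as float64 may be preferable.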

























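Replace

df[' age'].replace(to_replace=0, value = df[' age'].median, inplace = True)

with

df[' age'].replace(to_replace=0, value = df[' age'].median(), inplace = True)

Without the parentheses, df[' age'].median is the method object itself rather than a number, so the zeros are replaced by that object and the column ends up with dtype 'O' (object). Calling the method worked for me.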

















answered Nov 20 '18 at 16:19

Stian Ulriksen
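To see why the missing parentheses matter, the sketch below compares the method object with its return value. The small Series and the variable names are made up for illustration; it is not a reproduction of the asker's data.

```python
import pandas as pd

# Hypothetical ages, with 0 standing in for missing values.
ages = pd.Series([0, 25, 0, 40, 31], name="age")

# Without parentheses you get the bound method itself, not a number.
print(callable(ages.median))      # True: it is a function-like object
print(ages.median())              # calling it returns the numeric median

# A method object cannot be stored in an int64 or float64 array, so a
# Series that contains one falls back to generic Python objects.
mixed = pd.Series([25, ages.median])
print(mixed.dtype)                # object

# Passing the called result keeps the column numeric.
fixed = ages.replace(to_replace=0, value=ages.median())
print(fixed.dtype)
print(fixed.tolist())
```

Note that this median still counts the zeros, which is the concern raised in the other answer.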





























