Replacing all 0s in a column in python dataframe with column's median value changes datatype to 'O'
I have a large pandas dataframe with 10000 rows and 33 columns.
One of the columns is 'Age', which has datatype 'int64' and a considerable number of missing values.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 33 columns):
customer 10000 non-null int64
age 10000 non-null int64
The missing values have been recorded as 0 in the data. Number of missing values:
df['customer'][df[' age']==0].count()
>2942
I am trying to replace all such 0s with the median value:
df[' age'].replace(to_replace=0, value = df[' age'].median, inplace = True)
This seems to run fine. But it changes the datatype of the column to O:
df[' age'].dtype
>dtype('O')
What is going wrong?
python pandas replace types median
asked Nov 20 '18 at 16:15
aquarian47
Use df[' age'].median(). pd.Series.median is a method, so you have to call it to return the value. – jpp
Nov 20 '18 at 16:23
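A minimal sketch of what this comment describes, using a small made-up Series rather than the asker's data. Without the parentheses, `median` is a bound method object, and `replace` stores that object wherever it finds a 0, which appears to be what pushes the column to the object dtype.

```python
import pandas as pd

# Hypothetical stand-in for the ' age' column; the values are invented.
ages = pd.Series([0, 25, 0, 40, 31], dtype="int64")

# Without parentheses this is the method itself, not a number.
print(callable(ages.median))

# Passing the method as the replacement value puts a Python object into
# every slot that held 0, so the result can no longer be stored as int64.
broken = ages.replace(to_replace=0, value=ages.median)
print(broken.dtype)

# Each replaced entry is the method object, not a median value.
print(callable(broken.iloc[0]))
```

No error is raised because any Python object is a valid replacement value, which would explain why the original call "seems to run fine".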
2 Answers
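It is probably better to replace the missing data with NaNs first, and then fill those NaN values with the median.
Otherwise the zeros that stand in for missing data are included when the median is calculated.
import numpy as np
import pandas as pd
df = pd.DataFrame([0, 1, 2, 3], columns=['data'])
# Mark the zeros as missing; the column becomes float64 so it can hold NaN
df['data'] = df['data'].replace(0, np.nan)
print(df)
   data
0   NaN
1   1.0
2   2.0
3   3.0
# Fill the NaNs with the median of the remaining values (1, 2, 3)
df.fillna(df.median())
   data
0   2.0
1   1.0
2   2.0
3   3.0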
answered Nov 20 '18 at 16:20
yatu
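To see how much the placeholder zeros can shift the median, here is a rough sketch on invented ages (not the asker's data). It uses `Series.mask` to turn the zeros into NaN, which is one of several ways to do this step.

```python
import pandas as pd

# Invented sample: three of the seven ages are recorded as 0 (missing).
ages = pd.Series([0, 25, 0, 40, 31, 0, 58], name="age")

# Median over everything, zeros included.
median_with_zeros = ages.median()

# mask() replaces the zeros with NaN (the Series becomes float64),
# and median() skips NaN by default.
with_nan = ages.mask(ages == 0)
median_without_zeros = with_nan.median()

# Fill the missing entries with the median of the real values only.
filled = with_nan.fillna(median_without_zeros)

print(median_with_zeros, median_without_zeros)
print(filled.tolist())
```

With roughly 29% of the rows recorded as 0 in the question's data, the two medians could differ noticeably there as well. The filled column is float64; if whole-number ages are wanted, it can be rounded and cast back with `astype('int64')`.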
Replace
df[' age'].replace(to_replace=0, value = df[' age'].median, inplace = True)
with
df[' age'].replace(to_replace=0, value = df[' age'].median(), inplace = True)
That worked for me. With the parentheses, the median value is passed instead of the method object.
answered Nov 20 '18 at 16:19
Stian Ulriksen
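A small sketch of the corrected call on an invented DataFrame. The column name ' age' (with its leading space) is copied from the question; the sample data and the choice to assign the result back instead of using inplace=True on a column selection are assumptions made for this example.

```python
import pandas as pd

# Invented sample data; ' age' keeps the leading space used in the question.
df = pd.DataFrame({"customer": [1, 2, 3, 4, 5], " age": [0, 25, 0, 40, 31]})

# median() is called here, so a number is passed as the replacement value.
median_age = df[" age"].median()

# Assign the result back to the column rather than modifying a selection in place.
df[" age"] = df[" age"].replace(to_replace=0, value=median_age)

print(df[" age"].dtype)
print((df[" age"] == 0).sum())
```

Note that this median is still computed with the zeros included, which is the concern raised in the other answer.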