Adding categorical columns into the prediction model
I have a dataframe of customers and information about their activity, and I've built a model that predicts whether they buy the product or not. My label is the column 'did_buy', which is 1 if a customer bought and 0 if not. My model currently uses only the numeric columns, but I'd also like to add categorical columns to the predictive model, and I'm not sure how to convert them and use them in my X train. Here is a glimpse of my dataframe columns:
Company_Sector         Company_size  DMU_Final  Joining_Date  Country
Finance and Insurance  10            End User   2010-04-13    France
Public Administration  1             End User   2004-09-22    France
Some more columns:
linkedin_shared_connections  online_activity  did_buy  Sale_Date
11                           65               1        2016-05-23
13                           100              1        2016-01-12
python pandas numpy scikit-learn data-science
Are you not able to use the categorical variables for the model? What error are you getting? Scikit-learn would automatically apply one-hot encoding to the categorical variables.
– Ashok KS
Nov 21 '18 at 12:15
Did you have a look at pd.get_dummies?
– DeanLa
Nov 21 '18 at 12:18
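As a rough illustration of the pd.get_dummies suggestion, here is a minimal sketch. The toy dataframe only mirrors a few column names from the question; its values are illustrative, and the choice of which columns to encode is an assumption.

```python
import pandas as pd

# Toy data mirroring a few columns from the question (values are illustrative).
df = pd.DataFrame({
    "Company_Sector": ["Finance and Insurance", "Public Administration"],
    "Country": ["France", "France"],
    "online_activity": [65, 100],
    "did_buy": [1, 1],
})

# One-hot encode the string columns; the numeric column passes through unchanged.
X = pd.get_dummies(
    df.drop(columns="did_buy"),
    columns=["Company_Sector", "Country"],
)
y = df["did_buy"]

print(X.columns.tolist())
```

Each distinct category becomes its own indicator column, so X contains no string values and can be passed to a scikit-learn estimator.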
I used the numeric variables such as 'online_activity' and 'linkedin_shared_connections' to predict 'did_buy', and the results were pretty good. But when I add a categorical column such as 'Company_Sector', I get the error message 'can't convert string to float'.
– Dataminer1
Nov 21 '18 at 12:23
Another problem is converting the date column 'joining_date'. I used this code:
data['joining_date'] = pd.to_datetime(data['joining_date'])
data['joining_date'] = data['joining_date'].map(dt.datetime.toordinal)
but it prints all the dates in 1970.
– Dataminer1
Nov 21 '18 at 12:27
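A minimal sketch of one way to turn a date column into numeric features follows. The derived column names (joining_ordinal, joining_year, joining_month) are hypothetical, and the sample dates are the two rows shown in the question. One possible cause of the 1970 dates, stated here as an assumption because the full code isn't shown, is that the integer ordinals are later reinterpreted as datetimes; pandas reads bare integers as nanoseconds since 1970-01-01.

```python
import pandas as pd

# Two sample dates taken from the question's Joining_Date column.
data = pd.DataFrame({"Joining_Date": ["2010-04-13", "2004-09-22"]})

# Parse the strings into proper datetimes first.
data["Joining_Date"] = pd.to_datetime(data["Joining_Date"])

# Keep the datetime column intact and store numeric features in new columns
# (the new column names are hypothetical).
data["joining_ordinal"] = data["Joining_Date"].map(lambda ts: ts.toordinal())
data["joining_year"] = data["Joining_Date"].dt.year
data["joining_month"] = data["Joining_Date"].dt.month

print(data)
```

Writing the ordinals to a separate integer column, rather than overwriting the datetime column, avoids any later re-parsing of those integers as timestamps.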
@AshokKS No, it won't. Scikit-learn will complain about not being able to convert strings to float. The user needs to do the encoding themselves.
– Vivek Kumar
Nov 21 '18 at 13:18
asked Nov 21 '18 at 11:51
Dataminer1
1 Answer
There are several ways to convert categorical variables into numerical or binary variables.
For example, the Country column in your dataframe holds values such as France and China. One option is to map each value to an integer code with scikit-learn's LabelEncoder, which numbers the classes from 0 in sorted order:
{China: 0, France: 1, ...}
Note that integer codes imply an order between categories. For features with no natural order, one-hot encoding (such as pd.get_dummies, mentioned in the comments) avoids that.
# Import libraries
from sklearn import preprocessing
import pandas as pd
# Create a label encoder object and fit it to the Country column
# (df is the dataframe from the question)
label_encoder = preprocessing.LabelEncoder()
label_encoder.fit(df['Country'])
# View the learned classes, e.g. ['China', 'France', ...]
list(label_encoder.classes_)
# Transform the Country column into integer codes
label_encoder.transform(df['Country'])
# Convert integer codes back into category names,
# e.g. [0, 0, 1] -> ['China', 'China', 'France'] for the classes above
list(label_encoder.inverse_transform([0, 0, 1]))
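As an alternative to integer codes, the sketch below one-hot encodes the categorical columns inside a scikit-learn pipeline. It is only an illustration under stated assumptions: the toy rows (including the China values), the use of LogisticRegression as the classifier, and the step names are not from the question.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy data using column names from the question (values are illustrative).
df = pd.DataFrame({
    "Country": ["France", "China", "France", "China"],
    "Company_Sector": ["Finance and Insurance", "Public Administration",
                       "Public Administration", "Finance and Insurance"],
    "online_activity": [65, 100, 20, 10],
    "did_buy": [1, 1, 0, 0],
})

categorical = ["Country", "Company_Sector"]
numeric = ["online_activity"]

# One-hot encode the categorical columns; pass numeric columns through as-is.
preprocess = ColumnTransformer(
    [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
    remainder="passthrough",
)

# The classifier is an assumption; any scikit-learn estimator could go here.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

X = df[categorical + numeric]
y = df["did_buy"]
model.fit(X, y)
predictions = model.predict(X)
print(predictions)
```

Keeping the encoder inside the pipeline means the same encoding learned on the training data is applied at prediction time, and handle_unknown="ignore" lets unseen categories be encoded as all zeros instead of raising an error.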
answered Nov 29 '18 at 17:21
Mohammad Hoseini