Role of stringsAsFactors in dataframe
Please look at these two data frames in R.
When I run this code, emp.data1 and emp.data2 look the same, even though stringsAsFactors is FALSE in one of them and TRUE in the other. So what is the role of stringsAsFactors in data frames?
# Create the data frames.
emp.data1 <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE  # Here stringsAsFactors is FALSE
)
emp.data2 <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = TRUE  # Here stringsAsFactors is TRUE
)
r dataframe
edited Nov 22 '18 at 10:14 by RLave
asked Nov 22 '18 at 10:10 by Seyed Ali Aletaha
Compare str(emp.data1) and str(emp.data2). – jay.sf, Nov 22 '18 at 10:11
They are not the same, as you claim. Try identical(emp.data1, emp.data2) and all.equal(emp.data1, emp.data2). – Rui Barradas, Nov 22 '18 at 10:30
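The checks suggested in the comments above can be run as a short script. This is only a sketch: `make_emp` is a hypothetical helper introduced here to avoid repeating the question's data twice, and it is not part of the original post.

```r
# Hypothetical helper (not in the original question): builds the question's
# data frame with the requested stringsAsFactors setting.
make_emp <- function(as_factors) {
  data.frame(
    emp_id = 1:5,
    emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
    salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
    start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                           "2014-05-11", "2015-03-27")),
    stringsAsFactors = as_factors
  )
}

emp.data1 <- make_emp(FALSE)
emp.data2 <- make_emp(TRUE)

# identical() is a strict yes/no comparison of the two objects.
same <- identical(emp.data1, emp.data2)

# all.equal() returns TRUE when the objects match, otherwise a character
# vector describing the differences it found.
diffs <- all.equal(emp.data1, emp.data2)

print(same)
print(diffs)

# str() shows the type of each column, which is where the two differ.
str(emp.data1)
str(emp.data2)
```

Printing the two data frames shows the same values, which is probably why they look the same at first glance; the difference is in the column type of emp_name, not in what is displayed.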
2 Answers
Read the docs.
With stringsAsFactors = TRUE, data.frame() converts the character (string) columns of the data frame to factors instead of leaving them as character vectors. In statistical analysis, factors are useful for categorical variables. Which one you want depends on what you want to do with the data.
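To make the character/factor distinction concrete, here is a minimal sketch using the names from the question. It assumes that calling `factor()` on a character vector is a fair stand-in for what `stringsAsFactors = TRUE` does to a string column; the variable names `as_chr` and `as_fct` are illustrative only.

```r
emp_name <- c("Rick", "Dan", "Michelle", "Ryan", "Gary")

as_chr <- emp_name          # stays a plain character vector
as_fct <- factor(emp_name)  # stored as integer codes plus a set of levels

# The levels are the distinct values, sorted.
fct_levels <- levels(as_fct)

# Each element is stored as the position of its value in the levels.
fct_codes <- as.integer(as_fct)

# Converting back to character recovers the original strings.
round_trip <- as.character(as_fct)

print(fct_levels)
print(fct_codes)
print(identical(round_trip, as_chr))
```

A character column is usually the better fit for free text such as names, while a factor suits a column with a small, fixed set of categories.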
answered Nov 22 '18 at 10:29 by Ben T.
This setting changes the data type of string columns.
sapply(emp.data1, class)
emp_id emp_name salary start_date
"integer" "character" "numeric" "Date"
sapply(emp.data2, class)
emp_id emp_name salary start_date
"integer" "factor" "numeric" "Date"
As you can see, the class of emp_name is factor when this option is turned on (stringsAsFactors = TRUE), and character when it is turned off.
Factors are used when doing data analysis or visualization. For example, in the iris data set, which ships with R, we can look at the distribution of petal length and petal width while using color to indicate the species.
library(ggplot2)
sapply(iris, class)
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()
Labeling a column as a factor lets R know that some sort of grouping is going on, and R will automatically determine the different groups (or "levels").
Explicit factor labeling makes it easier to work with categorical data.
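The grouping behaviour described above can also be seen without ggplot2. The following is a small base R sketch on the built-in iris data; the variable names are illustrative, and it only shows a few of the functions that treat factor levels as groups.

```r
# Species is already a factor in the built-in iris data set.
species_levels <- levels(iris$Species)

# table() counts the rows in each level.
counts <- table(iris$Species)

# tapply() applies a function within each level, here the mean petal length.
mean_petal_length <- tapply(iris$Petal.Length, iris$Species, mean)

print(species_levels)
print(counts)
print(mean_petal_length)
```

Plotting and modelling functions use the same levels to decide how many groups there are, which is what the color mapping in the ggplot2 example relies on.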
answered Nov 22 '18 at 10:57 by Ken Osborne