Role of stringsAsFactors in dataframe












Please look at these two data frames in R.

When I run this code, emp.data1 and emp.data2 appear to be the same, even though stringsAsFactors is FALSE in one of them and TRUE in the other. So what is the role of stringsAsFactors in data frames?



# Create the data frames.
emp.data1 <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = FALSE  # Here stringsAsFactors is FALSE
)
emp.data2 <- data.frame(
  emp_id = 1:5,
  emp_name = c("Rick", "Dan", "Michelle", "Ryan", "Gary"),
  salary = c(623.3, 515.2, 611.0, 729.0, 843.25),
  start_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
                         "2015-03-27")),
  stringsAsFactors = TRUE  # Here stringsAsFactors is TRUE
)






r dataframe






edited Nov 22 '18 at 10:14









RLave

asked Nov 22 '18 at 10:10









Seyed Ali Aletaha


  • Compare str(emp.data1) and str(emp.data2).

    – jay.sf
    Nov 22 '18 at 10:11






  • They are not the same, as you claim. Try identical(emp.data1, emp.data2) and all.equal(emp.data1, emp.data2).

    – Rui Barradas
    Nov 22 '18 at 10:30
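
The checks suggested in the comments can be run as in the following sketch. It builds two small data frames that differ only in stringsAsFactors and compares them; the names df_chr and df_fct and the shortened columns are illustrative assumptions, not taken from the question.

```r
# Two data frames that differ only in stringsAsFactors.
# df_chr and df_fct are hypothetical names used for illustration.
df_chr <- data.frame(
  emp_id = 1:3,
  emp_name = c("Rick", "Dan", "Michelle"),
  stringsAsFactors = FALSE
)
df_fct <- data.frame(
  emp_id = 1:3,
  emp_name = c("Rick", "Dan", "Michelle"),
  stringsAsFactors = TRUE
)

# The printed tables look alike, but str() reveals the column types.
str(df_chr)
str(df_fct)

# identical() returns a single TRUE or FALSE.
same <- identical(df_chr, df_fct)
print(same)

# all.equal() returns TRUE, or a character vector describing the differences.
diffs <- all.equal(df_chr, df_fct)
print(diffs)
```

Printing a data frame shows only the values, which is probably why the two objects looked the same at the console.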



















2 Answers

Read the docs.

When stringsAsFactors is TRUE, data.frame() converts every character column to a factor instead of leaving it as a character vector. In statistical analysis, factors are useful for categorical variables, so which setting you want depends on what you plan to do with the data.








          answered Nov 22 '18 at 10:29









          Ben T.
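
To make the character versus factor distinction concrete, here is a minimal sketch using a plain vector instead of a data frame column. The variable names are hypothetical. Note also that in R 4.0.0 and later the default for data.frame() is stringsAsFactors = FALSE, so the conversion happens only when it is requested.

```r
# names_chr and names_fct are illustrative names, not from the question.
names_chr <- c("Rick", "Dan", "Michelle", "Dan")
names_fct <- factor(names_chr)

# A factor stores integer codes plus a set of levels
# (by default the sorted unique values).
lv <- levels(names_fct)
codes <- as.integer(names_fct)
print(lv)
print(codes)

# String functions such as nchar() expect character data,
# so convert the factor back explicitly before using them.
lens <- nchar(as.character(names_fct))
print(lens)
```

If the column holds free text such as names, a character vector is usually the more convenient type; a factor pays off when the values form a fixed set of categories.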


This setting changes the data type of string columns.

sapply(emp.data1, class)
     emp_id    emp_name      salary  start_date
  "integer" "character"   "numeric"      "Date"

sapply(emp.data2, class)
     emp_id    emp_name      salary  start_date
  "integer"    "factor"   "numeric"      "Date"

As you can see, the class of emp_name is factor when this option is turned on (stringsAsFactors = TRUE) and character when it is turned off.

Factors are used when doing data analysis or visualization. For example, in the iris data set, which ships with R, we can look at the distribution of petal length and petal width while using color to indicate the species.

library(ggplot2)
sapply(iris, class)
ggplot(iris, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point()

Labeling a column as a factor lets R know that some kind of grouping is present, and R will automatically determine the distinct groups (or "levels").

Labeling categorical columns as factors explicitly lets functions for summarizing, plotting, and modeling treat them as groups rather than as free text.







                  answered Nov 22 '18 at 10:57









                  Ken Osborne
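
The grouping behaviour described in this answer can also be seen without ggplot2. The sketch below uses only base R and the built-in iris data; the variable names are assumptions chosen for illustration.

```r
# iris ships with R; its Species column is already a factor.
species_levels <- levels(iris$Species)
print(species_levels)

# table() counts the rows that fall into each level.
counts <- table(iris$Species)
print(counts)

# tapply() applies a function within each group defined by the factor.
mean_petal_length <- tapply(iris$Petal.Length, iris$Species, mean)
print(mean_petal_length)
```

The levels of the factor define the groups here, which is the same information ggplot2 uses to assign one color per species.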
