Making a new dataframe based on certain conditions












-2















I have a dataframe object in R, sample of which is as follows:



4   5   3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7


From this dataframe, I want to create a new dataframe of 20 columns, as follows:




  1. Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).


  2. $2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
    $1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.


  3. All other column values are zero.



A sample output for this data would be as follows:



0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


How can we do this in R? A huge thanks in advance!










share|improve this question


















  • 1





    how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

    – Onyambu
    Nov 21 '18 at 17:23
















-2















I have a dataframe object in R, sample of which is as follows:



4   5   3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7


From this dataframe, I want to create a new dataframe of 20 columns, as follows:




  1. Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).


  2. $2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
    $1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.


  3. All other column values are zero.



A sample output for this data would be as follows:



0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


How can we do this in R? A huge thanks in advance!










share|improve this question


















  • 1





    how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

    – Onyambu
    Nov 21 '18 at 17:23














-2












-2








-2








I have a dataframe object in R, sample of which is as follows:



4   5   3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7


From this dataframe, I want to create a new dataframe of 20 columns, as follows:




  1. Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).


  2. $2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
    $1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.


  3. All other column values are zero.



A sample output for this data would be as follows:



0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


How can we do this in R? A huge thanks in advance!










share|improve this question














I have a dataframe object in R, sample of which is as follows:



4   5   3
4 5 9
4 5 2
4 6 4
4 10 4
4 10 3
4 10 7
4 10 2
4 9 3
4 9 7
4 10 4
4 10 3
4 6 8
4 5 4
12 3 6
12 4 1
12 4 2
12 4 7


From this dataframe, I want to create a new dataframe of 20 columns, as follows:




  1. Only one row in the new dataframe,for each unique value in $1. Hence for this sample data, the new dataframe should have 2 rows(unique 4,12).


  2. $2 represents the column number of the new dataframe, in which the value of $3(of this dataframe) is to be filled. If there are repeating cases, the median of the values of $3 is to be taken. For example, for
    $1 value 4, 5 is repeated 4 times, and in the new dataframe, column 5 of the first row should have the value median(3,9,2,4) =3.


  3. All other column values are zero.



A sample output for this data would be as follows:



0 0 0 0 3 4 0 0 3 4 0 0 0 0 0 0 0 0 0 0
0 0 3 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0


How can we do this in R? A huge thanks in advance!







r dataframe






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Nov 21 '18 at 16:52









rishirishi

627




627








  • 1





    how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

    – Onyambu
    Nov 21 '18 at 17:23














  • 1





    how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

    – Onyambu
    Nov 21 '18 at 17:23








1




1





how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

– Onyambu
Nov 21 '18 at 17:23





how can the median of (3,9,2,4) be 3 and how can the median of c(4,8) be 4??

– Onyambu
Nov 21 '18 at 17:23












2 Answers
2






active

oldest

votes


















0














Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:



library(dplyr)

df$V1 <- as.numeric(as.factor(df$V1))
values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)

for(i in 1:nrow(new_df)){
for(j in 1:ncol(new_df)){
value <- values$median[values$V1==i & values$V2==j]
if(length(value)>0){
new_df[i,j] = value
}
}
}

new_df

[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
[1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0


The outcome is slightly different though.






share|improve this answer































    0














    df = transform(df,V1=factor(V1))
    fill = matrix(0,length(levels(df$V1)),20)
    df2=aggregate(V3~.,df,function(x)floor(median(x)))
    fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
    fill
    [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
    [1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
    [2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
    [,15] [,16] [,17] [,18] [,19] [,20]
    [1,] 0 0 0 0 0 0
    [2,] 0 0 0 0 0 0





    share|improve this answer























      Your Answer






      StackExchange.ifUsing("editor", function () {
      StackExchange.using("externalEditor", function () {
      StackExchange.using("snippets", function () {
      StackExchange.snippets.init();
      });
      });
      }, "code-snippets");

      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "1"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: true,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: 10,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416960%2fmaking-a-new-dataframe-based-on-certain-conditions%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      0














      Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:



      library(dplyr)

      df$V1 <- as.numeric(as.factor(df$V1))
      values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
      new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)

      for(i in 1:nrow(new_df)){
      for(j in 1:ncol(new_df)){
      value <- values$median[values$V1==i & values$V2==j]
      if(length(value)>0){
      new_df[i,j] = value
      }
      }
      }

      new_df

      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
      [1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
      [2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0


      The outcome is slightly different though.






      share|improve this answer




























        0














        Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:



        library(dplyr)

        df$V1 <- as.numeric(as.factor(df$V1))
        values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
        new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)

        for(i in 1:nrow(new_df)){
        for(j in 1:ncol(new_df)){
        value <- values$median[values$V1==i & values$V2==j]
        if(length(value)>0){
        new_df[i,j] = value
        }
        }
        }

        new_df

        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
        [1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
        [2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0


        The outcome is slightly different though.






        share|improve this answer


























          0












          0








          0







          Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:



          library(dplyr)

          df$V1 <- as.numeric(as.factor(df$V1))
          values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
          new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)

          for(i in 1:nrow(new_df)){
          for(j in 1:ncol(new_df)){
          value <- values$median[values$V1==i & values$V2==j]
          if(length(value)>0){
          new_df[i,j] = value
          }
          }
          }

          new_df

          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
          [1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
          [2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0


          The outcome is slightly different though.






          share|improve this answer













          Are you sure that your expected outcome is correct? I think there is an error in calculating the median in your question, as also pointed out in the comments. You could do it as follows:



          library(dplyr)

          df$V1 <- as.numeric(as.factor(df$V1))
          values <- df %>% group_by(V1,V2) %>% summarise(median=median(V3))
          new_df <- matrix(0,nrow=length(unique(df$V1)), ncol=20)

          for(i in 1:nrow(new_df)){
          for(j in 1:ncol(new_df)){
          value <- values$median[values$V1==i & values$V2==j]
          if(length(value)>0){
          new_df[i,j] = value
          }
          }
          }

          new_df

          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
          [1,] 0 0 0 0 3.5 6 0 0 5 3.5 0 0 0 0 0 0 0 0 0 0
          [2,] 0 0 6 2 0.0 0 0 0 0 0.0 0 0 0 0 0 0 0 0 0 0


          The outcome is slightly different though.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Nov 21 '18 at 17:46









          otwtmotwtm

          36110




          36110

























              0














              df = transform(df,V1=factor(V1))
              fill = matrix(0,length(levels(df$V1)),20)
              df2=aggregate(V3~.,df,function(x)floor(median(x)))
              fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
              fill
              [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
              [1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
              [2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
              [,15] [,16] [,17] [,18] [,19] [,20]
              [1,] 0 0 0 0 0 0
              [2,] 0 0 0 0 0 0





              share|improve this answer




























                0














                df = transform(df,V1=factor(V1))
                fill = matrix(0,length(levels(df$V1)),20)
                df2=aggregate(V3~.,df,function(x)floor(median(x)))
                fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
                fill
                [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
                [1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
                [2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
                [,15] [,16] [,17] [,18] [,19] [,20]
                [1,] 0 0 0 0 0 0
                [2,] 0 0 0 0 0 0





                share|improve this answer


























                  0












                  0








                  0







                  df = transform(df,V1=factor(V1))
                  fill = matrix(0,length(levels(df$V1)),20)
                  df2=aggregate(V3~.,df,function(x)floor(median(x)))
                  fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
                  fill
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
                  [1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
                  [2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
                  [,15] [,16] [,17] [,18] [,19] [,20]
                  [1,] 0 0 0 0 0 0
                  [2,] 0 0 0 0 0 0





                  share|improve this answer













                  df = transform(df,V1=factor(V1))
                  fill = matrix(0,length(levels(df$V1)),20)
                  df2=aggregate(V3~.,df,function(x)floor(median(x)))
                  fill[cbind(as.integer(df2$V1),df2$V2)]=df2$V3
                  fill
                  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
                  [1,] 0 0 0 0 3 6 0 0 5 3 0 0 0 0
                  [2,] 0 0 6 2 0 0 0 0 0 0 0 0 0 0
                  [,15] [,16] [,17] [,18] [,19] [,20]
                  [1,] 0 0 0 0 0 0
                  [2,] 0 0 0 0 0 0






                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Nov 21 '18 at 17:57









                  OnyambuOnyambu

                  15.8k1521




                  15.8k1521






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Stack Overflow!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53416960%2fmaking-a-new-dataframe-based-on-certain-conditions%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                      Alcedinidae

                      Origin of the phrase “under your belt”?