From long to wide format based on just two columns in RStudio



























This is my data frame (sample below).

I have a data frame with six columns; the last column contains the values. The column 'code' contains s (sire) and d (dam), and the column 'Sex' contains M and f. The column 'offspring' holds about two thousand offspring.



seq parent code Sex offspring                 Value
1   49032  s    M   J44010_CCG7YANXX_2_661_X4 -0.38455056
2   48741  s    M   J44010_CCG7YANXX_2_661_X4  0.10574340
3   48757  s    M   J44010_CCG7YANXX_2_661_X4  0.39572906
4   48465  d    f   J44010_CCG7YANXX_2_661_X4  0.43409006
5   48521  d    f   J44010_CCG7YANXX_2_661_X4  0.40337447
6   48703  d    f   J44010_CCG7YANXX_2_661_X4 -0.38148980


The column 'parent' holds the IDs of both males and females.
I want to put the dam's (female's) ID, code and sex in columns right beside the sire's (male's), and keep the sire value and the dam value in separate columns, so that 'Value' is split into two columns.



The resulting data frame should look like this:



seq parent1 sirecode Sex parent2 damcode Sex offspring sireValue   damvalue
1   49032   s        M   48465   d       f   J44010    -0.38455056  0.43409006
2   48741   s        M   48521   d       f   J44010     0.10574340  0.40337447
3   48757   s        M   48703   d       f   J44010     0.39572906 -0.38148980


So each offspring will have 3 or 4 pairs of parents.

I tried to use the dcast function on it.






























  • How do we know which male parent to match with which female parent? All the offspring are identical as far as I can tell.

    – iod
    Nov 23 '18 at 1:20











  • I have just given the example of one offspring; there are other offspring just like it. The male parent (sire 1) and the female parent (dam 1) form a pair, so the rows are in sequence: 1. Sire 1, 2. Sire 2, 3. Sire 3, 4. Dam 1, 5. Dam 2, 6. Dam 3.

    – Koushik Das
    Nov 23 '18 at 1:40

























r














edited Nov 23 '18 at 2:19 by kit

asked Nov 23 '18 at 1:04 by Koushik Das






















1 Answer














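We can use dcast from data.table after creating a sequence column, n, that numbers the rows within each code/Sex group, so that sire i and dam i share the same n:

library(data.table)
# n = position of each row within its (code, Sex) group: 1, 2, 3 for the sires and 1, 2, 3 for the dams
setDT(df1)[, n := seq_len(.N), .(code, Sex)]
# rowid(n) is 1 for the first row with a given n (here the sire, since sires come first) and 2 for the second (the dam)
dcast(df1, n + offspring ~ rowid(n), value.var = c('parent', 'code', 'Sex', 'Value'), sep = "")
# n offspring parent1 parent2 code1 code2 Sex1 Sex2 Value1 Value2
#1: 1 J44010_CCG7YANXX_2_661_X4 49032 48465 s d M f -0.3845506 0.4340901
#2: 2 J44010_CCG7YANXX_2_661_X4 48741 48521 s d M f 0.1057434 0.4033745
#3: 3 J44010_CCG7YANXX_2_661_X4 48757 48703 s d M f 0.3957291 -0.3814898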
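The question mentions about two thousand offspring, so it may help to number the pairs within each offspring and to spread by code rather than by rowid(n). The following is only a sketch of that variant, assuming sires and dams are listed in matching order within each offspring, as the asker describes in the comments. The two-offspring sample, the shortened offspring IDs and the column name pair are made up for illustration.

```r
library(data.table)

# Hypothetical sample with two offspring (not the asker's real data)
dt <- data.table(
  parent    = c(49032L, 48741L, 48465L, 48521L, 50001L, 50002L),
  code      = c("s", "s", "d", "d", "s", "d"),
  Sex       = c("M", "M", "f", "f", "M", "f"),
  offspring = c(rep("J44010", 4), rep("J44011", 2)),
  Value     = c(-0.38, 0.11, 0.43, 0.40, 0.25, -0.12)
)

# pair = position of each row within its offspring and code,
# so the i-th sire and the i-th dam of an offspring get the same index
dt[, pair := seq_len(.N), by = .(offspring, code)]

# One row per offspring and pair; each value column is split by code (s / d)
wide <- dcast(dt, offspring + pair ~ code,
              value.var = c("parent", "Sex", "Value"))
print(wide)
```

The result has one row per offspring and pair, with columns such as parent_s, parent_d, Value_s and Value_d.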




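In base R, we can use reshape. Run this on the original data.frame: after setDT(), df1[-1] would drop the first row instead of the first column.

# n = position of each row within its Sex group (the pair index)
df1$n <- with(df1, ave(seq_along(Sex), Sex, FUN = seq_along))
# n1 = 1 for the first row of each pair (the sire), 2 for the second (the dam)
df1$n1 <- with(df1, ave(n, n, FUN = seq_along))
# drop the seq column, then spread parent, code, Sex and Value by n1
reshape(df1[-1], idvar = c('n', 'offspring'), timevar = 'n1', direction = 'wide')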
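Another base R option is to split the rows into sires and dams and join them on a pair index. This is a minimal sketch rather than part of the original answer; it assumes the i-th sire and the i-th dam of each offspring belong together, and the sample data, the pair column and the _sire/_dam suffixes are illustrative choices.

```r
# Hypothetical sample with two offspring (not the asker's real data)
df <- data.frame(
  parent    = c(49032L, 48741L, 48465L, 48521L, 50001L, 50002L),
  code      = c("s", "s", "d", "d", "s", "d"),
  Sex       = c("M", "M", "f", "f", "M", "f"),
  offspring = c(rep("J44010", 4), rep("J44011", 2)),
  Value     = c(-0.38, 0.11, 0.43, 0.40, 0.25, -0.12),
  stringsAsFactors = FALSE
)

# pair = position of each row within its offspring and code
df$pair <- ave(seq_len(nrow(df)), df$offspring, df$code, FUN = seq_along)

# Split by role, then join sire i with dam i of the same offspring
sires <- df[df$code == "s", ]
dams  <- df[df$code == "d", ]
wide  <- merge(sires, dams, by = c("offspring", "pair"),
               suffixes = c("_sire", "_dam"))
print(wide)
```

merge() keeps only pairs that have both a sire and a dam; passing all = TRUE would keep unmatched rows and fill the missing side with NA.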


data



df1 <- structure(list(
  seq = 1:6,
  parent = c(49032L, 48741L, 48757L, 48465L, 48521L, 48703L),
  code = c("s", "s", "s", "d", "d", "d"),
  Sex = c("M", "M", "M", "f", "f", "f"),
  offspring = c("J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4",
                "J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4",
                "J44010_CCG7YANXX_2_661_X4", "J44010_CCG7YANXX_2_661_X4"),
  Value = c(-0.38455056, 0.1057434, 0.39572906, 0.43409006, 0.40337447, -0.3814898)),
  class = "data.frame", row.names = c(NA, -6L))











edited Nov 23 '18 at 1:43

answered Nov 23 '18 at 1:25 by akrun































