How to assign a unique code for duplicate rows in this 'df' in R?












1














I have this data frame df



df <- data.frame(stringsAsFactors=FALSE,
id = c(1L, 2L, 3L, 4L, 5L, 6L),
Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),
Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),
Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),
GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),
Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),
Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)
)

df

id Country Year Time.step GSA.numb Species Quantity
1 ESP 1965 Month GSA 5 Mullus 500
2 ESP 1965 Month GSA 5 Mullus 200
3 ESP 1965 Month GSA 5 Mullus 200
4 ITA 1965 Month GSA 17 Eledone 350
5 ITA 1965 Month GSA 17 Eledone 350
6 ITA 1965 Month GSA 17 Eledone 125


I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:



df$dup <- duplicated(df[,2:7]) #No id! 


result:



id  Country   Year    Time.step    GSA.numb  Species   Quantity dup
1 ESP 1965 Month GSA 5 Mullus 500 FALSE
2 ESP 1965 Month GSA 5 Mullus 200 FALSE
3 ESP 1965 Month GSA 5 Mullus 200 TRUE
4 ITA 1965 Month GSA 17 Eledone 350 FALSE
5 ITA 1965 Month GSA 17 Eledone 350 TRUE
6 ITA 1965 Month GSA 17 Eledone 125 FALSE


Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:



aspected.df

id Country Year Time.step GSA.numb Species Quantity dup ref
1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA


I tried with:



with(df, ave(as.character(Species), df[,2:6], FUN = make.unique)) 


but result is:



[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"


I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)










share|improve this question
























  • Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
    – arg0naut
    Nov 20 '18 at 12:37
















1














I have this data frame df



df <- data.frame(stringsAsFactors=FALSE,
id = c(1L, 2L, 3L, 4L, 5L, 6L),
Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),
Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),
Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),
GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),
Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),
Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)
)

df

id Country Year Time.step GSA.numb Species Quantity
1 ESP 1965 Month GSA 5 Mullus 500
2 ESP 1965 Month GSA 5 Mullus 200
3 ESP 1965 Month GSA 5 Mullus 200
4 ITA 1965 Month GSA 17 Eledone 350
5 ITA 1965 Month GSA 17 Eledone 350
6 ITA 1965 Month GSA 17 Eledone 125


I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:



df$dup <- duplicated(df[,2:7]) #No id! 


result:



id  Country   Year    Time.step    GSA.numb  Species   Quantity dup
1 ESP 1965 Month GSA 5 Mullus 500 FALSE
2 ESP 1965 Month GSA 5 Mullus 200 FALSE
3 ESP 1965 Month GSA 5 Mullus 200 TRUE
4 ITA 1965 Month GSA 17 Eledone 350 FALSE
5 ITA 1965 Month GSA 17 Eledone 350 TRUE
6 ITA 1965 Month GSA 17 Eledone 125 FALSE


Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:



aspected.df

id Country Year Time.step GSA.numb Species Quantity dup ref
1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA


I tried with:



with(df, ave(as.character(Species), df[,2:6], FUN = make.unique)) 


but result is:



[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"


I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)










share|improve this question
























  • Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
    – arg0naut
    Nov 20 '18 at 12:37














1












1








1


1





I have this data frame df



df <- data.frame(stringsAsFactors=FALSE,
id = c(1L, 2L, 3L, 4L, 5L, 6L),
Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),
Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),
Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),
GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),
Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),
Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)
)

df

id Country Year Time.step GSA.numb Species Quantity
1 ESP 1965 Month GSA 5 Mullus 500
2 ESP 1965 Month GSA 5 Mullus 200
3 ESP 1965 Month GSA 5 Mullus 200
4 ITA 1965 Month GSA 17 Eledone 350
5 ITA 1965 Month GSA 17 Eledone 350
6 ITA 1965 Month GSA 17 Eledone 125


I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:



df$dup <- duplicated(df[,2:7]) #No id! 


result:



id  Country   Year    Time.step    GSA.numb  Species   Quantity dup
1 ESP 1965 Month GSA 5 Mullus 500 FALSE
2 ESP 1965 Month GSA 5 Mullus 200 FALSE
3 ESP 1965 Month GSA 5 Mullus 200 TRUE
4 ITA 1965 Month GSA 17 Eledone 350 FALSE
5 ITA 1965 Month GSA 17 Eledone 350 TRUE
6 ITA 1965 Month GSA 17 Eledone 125 FALSE


Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:



aspected.df

id Country Year Time.step GSA.numb Species Quantity dup ref
1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA


I tried with:



with(df, ave(as.character(Species), df[,2:6], FUN = make.unique)) 


but result is:



[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"


I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)










share|improve this question















I have this data frame df



df <- data.frame(stringsAsFactors=FALSE,
id = c(1L, 2L, 3L, 4L, 5L, 6L),
Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),
Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),
Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),
GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),
Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),
Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)
)

df

id Country Year Time.step GSA.numb Species Quantity
1 ESP 1965 Month GSA 5 Mullus 500
2 ESP 1965 Month GSA 5 Mullus 200
3 ESP 1965 Month GSA 5 Mullus 200
4 ITA 1965 Month GSA 17 Eledone 350
5 ITA 1965 Month GSA 17 Eledone 350
6 ITA 1965 Month GSA 17 Eledone 125


I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:



df$dup <- duplicated(df[,2:7]) #No id! 


result:



id  Country   Year    Time.step    GSA.numb  Species   Quantity dup
1 ESP 1965 Month GSA 5 Mullus 500 FALSE
2 ESP 1965 Month GSA 5 Mullus 200 FALSE
3 ESP 1965 Month GSA 5 Mullus 200 TRUE
4 ITA 1965 Month GSA 17 Eledone 350 FALSE
5 ITA 1965 Month GSA 17 Eledone 350 TRUE
6 ITA 1965 Month GSA 17 Eledone 125 FALSE


Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:



aspected.df

id Country Year Time.step GSA.numb Species Quantity dup ref
1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA


I tried with:



with(df, ave(as.character(Species), df[,2:6], FUN = make.unique)) 


but result is:



[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"


I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)







r dataframe duplicates id rowname






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 20 '18 at 11:54









dmi3kno

1,785521




1,785521










asked Nov 20 '18 at 11:20









skylobo

766




766












  • Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
    – arg0naut
    Nov 20 '18 at 12:37


















  • Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
    – arg0naut
    Nov 20 '18 at 12:37
















Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37




Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37












4 Answers
4






active

oldest

votes


















4














A data.table approach, starting from the initial file:



library(data.table)

setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))),
by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]


Output:



   id Country Year Time.step GSA.numb Species Quantity   dup  ref
1: 1 ESP 1965 Month GSA5 Mullus 500 FALSE <NA>
2: 2 ESP 1965 Month GSA5 Mullus 200 FALSE <NA>
3: 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
4: 4 ITA 1965 Month GSA17 Eledone 350 FALSE <NA>
5: 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
6: 6 ITA 1965 Month GSA17 Eledone 125 FALSE <NA>


A tidyverse approach (with dup already created before):



library(tidyverse)

df %>%
group_by_at(vars(2:7)) %>%
mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))


Output:



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  
<int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA


If you'd like to create the dup column within the statement:



df %>% 
group_by_at(vars(2:7)) %>%
mutate(
dup = row_number() > 1,
ref = ifelse(dup, paste0("id", first(id)), NA_character_))


Output:



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  
<int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA





share|improve this answer































    2














    You could use tidyverse functions to quickly id the duplicates



    df$dup <- duplicated(df[,2:7]) #No id! 

    library(tidyverse)

    df %>%
    group_by(dup) %>%
    mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))

    #> # A tibble: 6 x 9
    #> # Groups: dup [2]
    #> id Country Year Time.step GSA.numb Species Quantity dup ref
    #> <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
    #> 1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
    #> 2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
    #> 3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE id1
    #> 4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
    #> 5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE id2
    #> 6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA





    share|improve this answer





















    • Noticed the id's are the result of the dup loop, not the matched row
      – Evan Friedland
      Nov 20 '18 at 12:10










    • Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
      – skylobo
      Nov 20 '18 at 12:44



















    0














    This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.



    example data (used dput(control = NULL) so the characters/factors were converted to numeric)



    df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 
    Country = c(1965, 1965, 1965, 1965, 1965, 1965),
    Year = c(1, 1, 1, 1, 1, 1),
    Time.step = c(1, 1, 1, 1, 1, 1),
    GSA.numb = c(5, 5, 5, 17, 17, 17),
    Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))


    The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.



    df$dup <- duplicated(df)
    dupes <- df[df$dup,]
    df$ref <- NA # initialize
    for(i in 1:nrow(dupes)){
    z=which(df[,1] == dupes[i,1]&
    df[,2] == dupes[i,2]&
    df[,3] == dupes[i,3]&
    df[,4] == dupes[i,4]&
    df[,5] == dupes[i,5]&
    df[,6] == dupes[i,6]&
    df[,7] == dupes[i,7]) # make sure not to include that $dup column!
    df$ref[z[-1]] <- paste0("=id",min(z))
    }
    df
    # id Country Year Time.step GSA.numb Species Quantity dup ref
    #1 1 1965 1 1 5 2 500 FALSE <NA>
    #2 1 1965 1 1 5 2 200 FALSE <NA>
    #3 1 1965 1 1 5 2 200 TRUE =id2
    #4 2 1965 1 1 17 1 350 FALSE <NA>
    #5 2 1965 1 1 17 1 350 TRUE =id4
    #6 2 1965 1 1 17 1 125 FALSE <NA>


    Even though you could tighten this up with apply functions, this will run quicker.






    share|improve this answer





























      0














      Using tidyverse:



      df %>%
      group_by_at(vars(-id)) %>% #Group by all variables except of id
      mutate(n = n(), #Identifying the duplicate rows
      dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE
      ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row
      select(-n)

      id Country Year Time.step GSA.numb Species Quantity dup ref
      <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
      1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE <NA>
      2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE <NA>
      3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
      4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE <NA>
      5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
      6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE <NA>





      share|improve this answer





















        Your Answer






        StackExchange.ifUsing("editor", function () {
        StackExchange.using("externalEditor", function () {
        StackExchange.using("snippets", function () {
        StackExchange.snippets.init();
        });
        });
        }, "code-snippets");

        StackExchange.ready(function() {
        var channelOptions = {
        tags: "".split(" "),
        id: "1"
        };
        initTagRenderer("".split(" "), "".split(" "), channelOptions);

        StackExchange.using("externalEditor", function() {
        // Have to fire editor after snippets, if snippets enabled
        if (StackExchange.settings.snippets.snippetsEnabled) {
        StackExchange.using("snippets", function() {
        createEditor();
        });
        }
        else {
        createEditor();
        }
        });

        function createEditor() {
        StackExchange.prepareEditor({
        heartbeatType: 'answer',
        autoActivateHeartbeat: false,
        convertImagesToLinks: true,
        noModals: true,
        showLowRepImageUploadWarning: true,
        reputationToPostImages: 10,
        bindNavPrevention: true,
        postfix: "",
        imageUploader: {
        brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
        contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
        allowUrls: true
        },
        onDemand: true,
        discardSelector: ".discard-answer"
        ,immediatelyShowMarkdownHelp:true
        });


        }
        });














        draft saved

        draft discarded


















        StackExchange.ready(
        function () {
        StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53391887%2fhow-to-assign-a-unique-code-for-duplicate-rows-in-this-df-in-r%23new-answer', 'question_page');
        }
        );

        Post as a guest















        Required, but never shown

























        4 Answers
        4






        active

        oldest

        votes








        4 Answers
        4






        active

        oldest

        votes









        active

        oldest

        votes






        active

        oldest

        votes









        4














        A data.table approach, starting from the initial file:



        library(data.table)

        setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))),
        by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]


        Output:



           id Country Year Time.step GSA.numb Species Quantity   dup  ref
        1: 1 ESP 1965 Month GSA5 Mullus 500 FALSE <NA>
        2: 2 ESP 1965 Month GSA5 Mullus 200 FALSE <NA>
        3: 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
        4: 4 ITA 1965 Month GSA17 Eledone 350 FALSE <NA>
        5: 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
        6: 6 ITA 1965 Month GSA17 Eledone 125 FALSE <NA>


        A tidyverse approach (with dup already created before):



        library(tidyverse)

        df %>%
        group_by_at(vars(2:7)) %>%
        mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))


        Output:



             id Country  Year Time.step GSA.numb Species Quantity dup   ref  
        <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
        1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
        2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
        3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
        4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
        5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
        6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA


        If you'd like to create the dup column within the statement:



        df %>% 
        group_by_at(vars(2:7)) %>%
        mutate(
        dup = row_number() > 1,
        ref = ifelse(dup, paste0("id", first(id)), NA_character_))


        Output:



             id Country  Year Time.step GSA.numb Species Quantity dup   ref  
        <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
        1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
        2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
        3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
        4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
        5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
        6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA





        share|improve this answer




























          4














          A data.table approach, starting from the initial file:



          library(data.table)

          setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))),
          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]


          Output:



             id Country Year Time.step GSA.numb Species Quantity   dup  ref
          1: 1 ESP 1965 Month GSA5 Mullus 500 FALSE <NA>
          2: 2 ESP 1965 Month GSA5 Mullus 200 FALSE <NA>
          3: 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
          4: 4 ITA 1965 Month GSA17 Eledone 350 FALSE <NA>
          5: 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
          6: 6 ITA 1965 Month GSA17 Eledone 125 FALSE <NA>


          A tidyverse approach (with dup already created before):



          library(tidyverse)

          df %>%
          group_by_at(vars(2:7)) %>%
          mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))


          Output:



               id Country  Year Time.step GSA.numb Species Quantity dup   ref  
          <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
          1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
          2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
          3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
          4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
          5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
          6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA


          If you'd like to create the dup column within the statement:



          df %>% 
          group_by_at(vars(2:7)) %>%
          mutate(
          dup = row_number() > 1,
          ref = ifelse(dup, paste0("id", first(id)), NA_character_))


          Output:



               id Country  Year Time.step GSA.numb Species Quantity dup   ref  
          <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
          1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
          2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
          3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
          4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
          5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
          6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA





          share|improve this answer


























            4












            4








            4






            A data.table approach, starting from the initial file:



            library(data.table)

            setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))),
            by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]


            Output:



               id Country Year Time.step GSA.numb Species Quantity   dup  ref
            1: 1 ESP 1965 Month GSA5 Mullus 500 FALSE <NA>
            2: 2 ESP 1965 Month GSA5 Mullus 200 FALSE <NA>
            3: 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4: 4 ITA 1965 Month GSA17 Eledone 350 FALSE <NA>
            5: 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6: 6 ITA 1965 Month GSA17 Eledone 125 FALSE <NA>


            A tidyverse approach (with dup already created before):



            library(tidyverse)

            df %>%
            group_by_at(vars(2:7)) %>%
            mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))


            Output:



                 id Country  Year Time.step GSA.numb Species Quantity dup   ref  
            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
            1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
            2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
            3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
            5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA


            If you'd like to create the dup column within the statement:



            df %>% 
            group_by_at(vars(2:7)) %>%
            mutate(
            dup = row_number() > 1,
            ref = ifelse(dup, paste0("id", first(id)), NA_character_))


            Output:



                 id Country  Year Time.step GSA.numb Species Quantity dup   ref  
            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
            1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
            2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
            3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
            5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA





            share|improve this answer














            A data.table approach, starting from the initial file:



            library(data.table)

            setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))),
            by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]


            Output:



               id Country Year Time.step GSA.numb Species Quantity   dup  ref
            1: 1 ESP 1965 Month GSA5 Mullus 500 FALSE <NA>
            2: 2 ESP 1965 Month GSA5 Mullus 200 FALSE <NA>
            3: 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4: 4 ITA 1965 Month GSA17 Eledone 350 FALSE <NA>
            5: 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6: 6 ITA 1965 Month GSA17 Eledone 125 FALSE <NA>


            A tidyverse approach (with dup already created before):



            library(tidyverse)

            df %>%
            group_by_at(vars(2:7)) %>%
            mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))


            Output:



                 id Country  Year Time.step GSA.numb Species Quantity dup   ref  
            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
            1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
            2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
            3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
            5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA


            If you'd like to create the dup column within the statement:



            df %>% 
            group_by_at(vars(2:7)) %>%
            mutate(
            dup = row_number() > 1,
            ref = ifelse(dup, paste0("id", first(id)), NA_character_))


            Output:



                 id Country  Year Time.step GSA.numb Species Quantity dup   ref  
            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
            1 1 ESP 1965 Month GSA5 Mullus 500 FALSE NA
            2 2 ESP 1965 Month GSA5 Mullus 200 FALSE NA
            3 3 ESP 1965 Month GSA5 Mullus 200 TRUE id2
            4 4 ITA 1965 Month GSA17 Eledone 350 FALSE NA
            5 5 ITA 1965 Month GSA17 Eledone 350 TRUE id4
            6 6 ITA 1965 Month GSA17 Eledone 125 FALSE NA






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 20 '18 at 12:34

























            answered Nov 20 '18 at 12:26









            arg0naut

            2,077314




            2,077314

























                2














                You could use tidyverse functions to quickly id the duplicates



                df$dup <- duplicated(df[,2:7]) #No id! 

                library(tidyverse)

                df %>%
                group_by(dup) %>%
                mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))

                #> # A tibble: 6 x 9
                #> # Groups: dup [2]
                #> id Country Year Time.step GSA.numb Species Quantity dup ref
                #> <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                #> 1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
                #> 2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
                #> 3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE id1
                #> 4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
                #> 5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE id2
                #> 6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA





                share|improve this answer





















                • Noticed the id's are the result of the dup loop, not the matched row
                  – Evan Friedland
                  Nov 20 '18 at 12:10










                • Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                  – skylobo
                  Nov 20 '18 at 12:44
















                2














                You could use tidyverse functions to quickly id the duplicates



                df$dup <- duplicated(df[,2:7]) #No id! 

                library(tidyverse)

                df %>%
                group_by(dup) %>%
                mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))

                #> # A tibble: 6 x 9
                #> # Groups: dup [2]
                #> id Country Year Time.step GSA.numb Species Quantity dup ref
                #> <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                #> 1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
                #> 2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
                #> 3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE id1
                #> 4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
                #> 5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE id2
                #> 6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA





                share|improve this answer





















                • Noticed the id's are the result of the dup loop, not the matched row
                  – Evan Friedland
                  Nov 20 '18 at 12:10










                • Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                  – skylobo
                  Nov 20 '18 at 12:44














                2












                2








                2






                You could use tidyverse functions to quickly id the duplicates



                df$dup <- duplicated(df[,2:7]) #No id! 

                library(tidyverse)

                df %>%
                group_by(dup) %>%
                mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))

                #> # A tibble: 6 x 9
                #> # Groups: dup [2]
                #> id Country Year Time.step GSA.numb Species Quantity dup ref
                #> <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                #> 1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
                #> 2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
                #> 3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE id1
                #> 4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
                #> 5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE id2
                #> 6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA





                share|improve this answer












                You could use tidyverse functions to quickly id the duplicates



                df$dup <- duplicated(df[,2:7]) #No id! 

                library(tidyverse)

                df %>%
                group_by(dup) %>%
                mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))

                #> # A tibble: 6 x 9
                #> # Groups: dup [2]
                #> id Country Year Time.step GSA.numb Species Quantity dup ref
                #> <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                #> 1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE NA
                #> 2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE NA
                #> 3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE id1
                #> 4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE NA
                #> 5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE id2
                #> 6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE NA






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 20 '18 at 11:35









                dmi3kno

                1,785521




                1,785521












                • Noticed the id's are the result of the dup loop, not the matched row
                  – Evan Friedland
                  Nov 20 '18 at 12:10










                • Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                  – skylobo
                  Nov 20 '18 at 12:44


















                • Noticed the id's are the result of the dup loop, not the matched row
                  – Evan Friedland
                  Nov 20 '18 at 12:10










                • Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                  – skylobo
                  Nov 20 '18 at 12:44
















                Noticed the id's are the result of the dup loop, not the matched row
                – Evan Friedland
                Nov 20 '18 at 12:10




                Noticed the id's are the result of the dup loop, not the matched row
                – Evan Friedland
                Nov 20 '18 at 12:10












                Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                – skylobo
                Nov 20 '18 at 12:44




                Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
                – skylobo
                Nov 20 '18 at 12:44











                0














                This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.



                example data (used dput(control = NULL) so the characters/factors were converted to numeric)



                df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 
                Country = c(1965, 1965, 1965, 1965, 1965, 1965),
                Year = c(1, 1, 1, 1, 1, 1),
                Time.step = c(1, 1, 1, 1, 1, 1),
                GSA.numb = c(5, 5, 5, 17, 17, 17),
                Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))


                The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.



                df$dup <- duplicated(df)
                dupes <- df[df$dup,]
                df$ref <- NA # initialize
                for(i in 1:nrow(dupes)){
                z=which(df[,1] == dupes[i,1]&
                df[,2] == dupes[i,2]&
                df[,3] == dupes[i,3]&
                df[,4] == dupes[i,4]&
                df[,5] == dupes[i,5]&
                df[,6] == dupes[i,6]&
                df[,7] == dupes[i,7]) # make sure not to include that $dup column!
                df$ref[z[-1]] <- paste0("=id",min(z))
                }
                df
                # id Country Year Time.step GSA.numb Species Quantity dup ref
                #1 1 1965 1 1 5 2 500 FALSE <NA>
                #2 1 1965 1 1 5 2 200 FALSE <NA>
                #3 1 1965 1 1 5 2 200 TRUE =id2
                #4 2 1965 1 1 17 1 350 FALSE <NA>
                #5 2 1965 1 1 17 1 350 TRUE =id4
                #6 2 1965 1 1 17 1 125 FALSE <NA>


                Even though you could tighten this up with apply functions, this will run quicker.






                share|improve this answer


























                  0














                  This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.



                  example data (used dput(control = NULL) so the characters/factors were converted to numeric)



                  df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 
                  Country = c(1965, 1965, 1965, 1965, 1965, 1965),
                  Year = c(1, 1, 1, 1, 1, 1),
                  Time.step = c(1, 1, 1, 1, 1, 1),
                  GSA.numb = c(5, 5, 5, 17, 17, 17),
                  Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))


                  The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.



                  df$dup <- duplicated(df)
                  dupes <- df[df$dup,]
                  df$ref <- NA # initialize
                  for(i in 1:nrow(dupes)){
                  z=which(df[,1] == dupes[i,1]&
                  df[,2] == dupes[i,2]&
                  df[,3] == dupes[i,3]&
                  df[,4] == dupes[i,4]&
                  df[,5] == dupes[i,5]&
                  df[,6] == dupes[i,6]&
                  df[,7] == dupes[i,7]) # make sure not to include that $dup column!
                  df$ref[z[-1]] <- paste0("=id",min(z))
                  }
                  df
                  # id Country Year Time.step GSA.numb Species Quantity dup ref
                  #1 1 1965 1 1 5 2 500 FALSE <NA>
                  #2 1 1965 1 1 5 2 200 FALSE <NA>
                  #3 1 1965 1 1 5 2 200 TRUE =id2
                  #4 2 1965 1 1 17 1 350 FALSE <NA>
                  #5 2 1965 1 1 17 1 350 TRUE =id4
                  #6 2 1965 1 1 17 1 125 FALSE <NA>


                  Even though you could tighten this up with apply functions, this will run quicker.






                  share|improve this answer
























                    0












                    0








                    0






                    This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.



                    example data (used dput(control = NULL) so the characters/factors were converted to numeric)



                    df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 
                    Country = c(1965, 1965, 1965, 1965, 1965, 1965),
                    Year = c(1, 1, 1, 1, 1, 1),
                    Time.step = c(1, 1, 1, 1, 1, 1),
                    GSA.numb = c(5, 5, 5, 17, 17, 17),
                    Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))


                    The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.



                    df$dup <- duplicated(df)
                    dupes <- df[df$dup,]
                    df$ref <- NA # initialize
                    for(i in 1:nrow(dupes)){
                    z=which(df[,1] == dupes[i,1]&
                    df[,2] == dupes[i,2]&
                    df[,3] == dupes[i,3]&
                    df[,4] == dupes[i,4]&
                    df[,5] == dupes[i,5]&
                    df[,6] == dupes[i,6]&
                    df[,7] == dupes[i,7]) # make sure not to include that $dup column!
                    df$ref[z[-1]] <- paste0("=id",min(z))
                    }
                    df
                    # id Country Year Time.step GSA.numb Species Quantity dup ref
                    #1 1 1965 1 1 5 2 500 FALSE <NA>
                    #2 1 1965 1 1 5 2 200 FALSE <NA>
                    #3 1 1965 1 1 5 2 200 TRUE =id2
                    #4 2 1965 1 1 17 1 350 FALSE <NA>
                    #5 2 1965 1 1 17 1 350 TRUE =id4
                    #6 2 1965 1 1 17 1 125 FALSE <NA>


                    Even though you could tighten this up with apply functions, this will run quicker.






                    share|improve this answer












                    This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.



                    example data (used dput(control = NULL) so the characters/factors were converted to numeric)



                    df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 
                    Country = c(1965, 1965, 1965, 1965, 1965, 1965),
                    Year = c(1, 1, 1, 1, 1, 1),
                    Time.step = c(1, 1, 1, 1, 1, 1),
                    GSA.numb = c(5, 5, 5, 17, 17, 17),
                    Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))


                    The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.



                    df$dup <- duplicated(df)
                    dupes <- df[df$dup,]
                    df$ref <- NA # initialize
                    for(i in 1:nrow(dupes)){
                    z=which(df[,1] == dupes[i,1]&
                    df[,2] == dupes[i,2]&
                    df[,3] == dupes[i,3]&
                    df[,4] == dupes[i,4]&
                    df[,5] == dupes[i,5]&
                    df[,6] == dupes[i,6]&
                    df[,7] == dupes[i,7]) # make sure not to include that $dup column!
                    df$ref[z[-1]] <- paste0("=id",min(z))
                    }
                    df
                    # id Country Year Time.step GSA.numb Species Quantity dup ref
                    #1 1 1965 1 1 5 2 500 FALSE <NA>
                    #2 1 1965 1 1 5 2 200 FALSE <NA>
                    #3 1 1965 1 1 5 2 200 TRUE =id2
                    #4 2 1965 1 1 17 1 350 FALSE <NA>
                    #5 2 1965 1 1 17 1 350 TRUE =id4
                    #6 2 1965 1 1 17 1 125 FALSE <NA>


                    Even though you could tighten this up with apply functions, this will run quicker.







                    share|improve this answer












                    share|improve this answer



                    share|improve this answer










                    answered Nov 20 '18 at 12:08









                    Evan Friedland

                    2,0991621




                    2,0991621























                        0














                        Using tidyverse:



                        df %>%
                        group_by_at(vars(-id)) %>% #Group by all variables except of id
                        mutate(n = n(), #Identifying the duplicate rows
                        dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE
                        ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row
                        select(-n)

                        id Country Year Time.step GSA.numb Species Quantity dup ref
                        <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                        1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE <NA>
                        2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE <NA>
                        3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
                        4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE <NA>
                        5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
                        6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE <NA>





                        share|improve this answer


























                          0














                          Using tidyverse:



                          df %>%
                          group_by_at(vars(-id)) %>% #Group by all variables except of id
                          mutate(n = n(), #Identifying the duplicate rows
                          dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE
                          ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row
                          select(-n)

                          id Country Year Time.step GSA.numb Species Quantity dup ref
                          <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                          1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE <NA>
                          2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE <NA>
                          3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
                          4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE <NA>
                          5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
                          6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE <NA>





                          share|improve this answer
























                            0












                            0








                            0






                            Using tidyverse:



                            df %>%
                            group_by_at(vars(-id)) %>% #Group by all variables except of id
                            mutate(n = n(), #Identifying the duplicate rows
                            dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE
                            ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row
                            select(-n)

                            id Country Year Time.step GSA.numb Species Quantity dup ref
                            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                            1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE <NA>
                            2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE <NA>
                            3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
                            4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE <NA>
                            5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
                            6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE <NA>





                            share|improve this answer












                            Using tidyverse:



                            df %>%
                            group_by_at(vars(-id)) %>% #Group by all variables except of id
                            mutate(n = n(), #Identifying the duplicate rows
                            dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE
                            ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row
                            select(-n)

                            id Country Year Time.step GSA.numb Species Quantity dup ref
                            <int> <chr> <int> <chr> <chr> <chr> <int> <lgl> <chr>
                            1 1 ESP 1965 Month GSA 5 Mullus 500 FALSE <NA>
                            2 2 ESP 1965 Month GSA 5 Mullus 200 FALSE <NA>
                            3 3 ESP 1965 Month GSA 5 Mullus 200 TRUE =id2
                            4 4 ITA 1965 Month GSA 17 Eledone 350 FALSE <NA>
                            5 5 ITA 1965 Month GSA 17 Eledone 350 TRUE =id4
                            6 6 ITA 1965 Month GSA 17 Eledone 125 FALSE <NA>






                            share|improve this answer












                            share|improve this answer



                            share|improve this answer










                            answered Nov 20 '18 at 12:17









                            tmfmnk

                            1,8901412




                            1,8901412






























                                draft saved

                                draft discarded




















































                                Thanks for contributing an answer to Stack Overflow!


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                To learn more, see our tips on writing great answers.





                                Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                                Please pay close attention to the following guidance:


                                • Please be sure to answer the question. Provide details and share your research!

                                But avoid



                                • Asking for help, clarification, or responding to other answers.

                                • Making statements based on opinion; back them up with references or personal experience.


                                To learn more, see our tips on writing great answers.




                                draft saved


                                draft discarded














                                StackExchange.ready(
                                function () {
                                StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53391887%2fhow-to-assign-a-unique-code-for-duplicate-rows-in-this-df-in-r%23new-answer', 'question_page');
                                }
                                );

                                Post as a guest















                                Required, but never shown





















































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown

































                                Required, but never shown














                                Required, but never shown












                                Required, but never shown







                                Required, but never shown







                                Popular posts from this blog

                                "Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

                                Alcedinidae

                                Origin of the phrase “under your belt”?