How to assign a unique code for duplicate rows in this 'df' in R?

I have this data frame df

df <- data.frame(stringsAsFactors=FALSE,

          id = c(1L, 2L, 3L, 4L, 5L, 6L),

     Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),

        Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),

   Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),

    GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),

     Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),

    Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)

                )



df



   id  Country   Year    Time.step    GSA.numb  Species   Quantity

    1    ESP     1965     Month       GSA 5      Mullus     500   

    2    ESP     1965     Month       GSA 5      Mullus     200  

    3    ESP     1965     Month       GSA 5      Mullus     200 

    4    ITA     1965     Month       GSA 17     Eledone    350

    5    ITA     1965     Month       GSA 17     Eledone    350 

    6    ITA     1965     Month       GSA 17     Eledone    125

I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:

df$dup <- duplicated(df[,2:7]) #No id!

result:

id  Country   Year    Time.step    GSA.numb  Species   Quantity dup

 1    ESP     1965     Month       GSA 5      Mullus     500   FALSE

 2    ESP     1965     Month       GSA 5      Mullus     200   FALSE

 3    ESP     1965     Month       GSA 5      Mullus     200   TRUE

 4    ITA     1965     Month       GSA 17     Eledone    350   FALSE

 5    ITA     1965     Month       GSA 17     Eledone    350   TRUE

 6    ITA     1965     Month       GSA 17     Eledone    125   FALSE

Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:

aspected.df



id  Country Year  Time.step  GSA.numb  Species   Quantity dup  ref  

 1  ESP     1965  Month      GSA 5      Mullus     500   FALSE NA

 2  ESP     1965  Month      GSA 5      Mullus     200   FALSE NA

 3  ESP     1965  Month      GSA 5      Mullus     200   TRUE  =id2

 4  ITA     1965  Month      GSA 17     Eledone    350   FALSE NA

 5  ITA     1965  Month      GSA 17     Eledone    350   TRUE  =id4

 6  ITA     1965  Month      GSA 17     Eledone    125   FALSE NA

I tried with:

with(df, ave(as.character(Species), df[,2:6], FUN = make.unique))

but result is:

[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"

I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37

add a comment |

I have this data frame df

df <- data.frame(stringsAsFactors=FALSE,

          id = c(1L, 2L, 3L, 4L, 5L, 6L),

     Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),

        Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),

   Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),

    GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),

     Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),

    Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)

                )



df



   id  Country   Year    Time.step    GSA.numb  Species   Quantity

    1    ESP     1965     Month       GSA 5      Mullus     500   

    2    ESP     1965     Month       GSA 5      Mullus     200  

    3    ESP     1965     Month       GSA 5      Mullus     200 

    4    ITA     1965     Month       GSA 17     Eledone    350

    5    ITA     1965     Month       GSA 17     Eledone    350 

    6    ITA     1965     Month       GSA 17     Eledone    125

I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:

df$dup <- duplicated(df[,2:7]) #No id!

result:

id  Country   Year    Time.step    GSA.numb  Species   Quantity dup

 1    ESP     1965     Month       GSA 5      Mullus     500   FALSE

 2    ESP     1965     Month       GSA 5      Mullus     200   FALSE

 3    ESP     1965     Month       GSA 5      Mullus     200   TRUE

 4    ITA     1965     Month       GSA 17     Eledone    350   FALSE

 5    ITA     1965     Month       GSA 17     Eledone    350   TRUE

 6    ITA     1965     Month       GSA 17     Eledone    125   FALSE

Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:

aspected.df



id  Country Year  Time.step  GSA.numb  Species   Quantity dup  ref  

 1  ESP     1965  Month      GSA 5      Mullus     500   FALSE NA

 2  ESP     1965  Month      GSA 5      Mullus     200   FALSE NA

 3  ESP     1965  Month      GSA 5      Mullus     200   TRUE  =id2

 4  ITA     1965  Month      GSA 17     Eledone    350   FALSE NA

 5  ITA     1965  Month      GSA 17     Eledone    350   TRUE  =id4

 6  ITA     1965  Month      GSA 17     Eledone    125   FALSE NA

I tried with:

with(df, ave(as.character(Species), df[,2:6], FUN = make.unique))

but result is:

[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"

I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37

add a comment |

I have this data frame df

df <- data.frame(stringsAsFactors=FALSE,

          id = c(1L, 2L, 3L, 4L, 5L, 6L),

     Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),

        Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),

   Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),

    GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),

     Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),

    Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)

                )



df



   id  Country   Year    Time.step    GSA.numb  Species   Quantity

    1    ESP     1965     Month       GSA 5      Mullus     500   

    2    ESP     1965     Month       GSA 5      Mullus     200  

    3    ESP     1965     Month       GSA 5      Mullus     200 

    4    ITA     1965     Month       GSA 17     Eledone    350

    5    ITA     1965     Month       GSA 17     Eledone    350 

    6    ITA     1965     Month       GSA 17     Eledone    125

I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:

df$dup <- duplicated(df[,2:7]) #No id!

result:

id  Country   Year    Time.step    GSA.numb  Species   Quantity dup

 1    ESP     1965     Month       GSA 5      Mullus     500   FALSE

 2    ESP     1965     Month       GSA 5      Mullus     200   FALSE

 3    ESP     1965     Month       GSA 5      Mullus     200   TRUE

 4    ITA     1965     Month       GSA 17     Eledone    350   FALSE

 5    ITA     1965     Month       GSA 17     Eledone    350   TRUE

 6    ITA     1965     Month       GSA 17     Eledone    125   FALSE

Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:

aspected.df



id  Country Year  Time.step  GSA.numb  Species   Quantity dup  ref  

 1  ESP     1965  Month      GSA 5      Mullus     500   FALSE NA

 2  ESP     1965  Month      GSA 5      Mullus     200   FALSE NA

 3  ESP     1965  Month      GSA 5      Mullus     200   TRUE  =id2

 4  ITA     1965  Month      GSA 17     Eledone    350   FALSE NA

 5  ITA     1965  Month      GSA 17     Eledone    350   TRUE  =id4

 6  ITA     1965  Month      GSA 17     Eledone    125   FALSE NA

I tried with:

with(df, ave(as.character(Species), df[,2:6], FUN = make.unique))

but result is:

[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"

I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

I have this data frame df

df <- data.frame(stringsAsFactors=FALSE,

          id = c(1L, 2L, 3L, 4L, 5L, 6L),

     Country = c("ESP", "ESP", "ESP", "ITA", "ITA", "ITA"),

        Year = c(1965L, 1965L, 1965L, 1965L, 1965L, 1965L),

   Time.step = c("Month", "Month", "Month", "Month", "Month", "Month"),

    GSA.numb = c("GSA 5", "GSA 5", "GSA 5", "GSA 17", "GSA 17", "GSA 17"),

     Species = c("Mullus", "Mullus", "Mullus", "Eledone", "Eledone", "Eledone"),

    Quantity = c(500L, 200L, 200L, 350L, 350L, 125L)

                )



df



   id  Country   Year    Time.step    GSA.numb  Species   Quantity

    1    ESP     1965     Month       GSA 5      Mullus     500   

    2    ESP     1965     Month       GSA 5      Mullus     200  

    3    ESP     1965     Month       GSA 5      Mullus     200 

    4    ITA     1965     Month       GSA 17     Eledone    350

    5    ITA     1965     Month       GSA 17     Eledone    350 

    6    ITA     1965     Month       GSA 17     Eledone    125

I have some duplicated row, as: 3 and 5.
I can create a column for F or T logic value when the row is duplicated:

df$dup <- duplicated(df[,2:7]) #No id!

result:

id  Country   Year    Time.step    GSA.numb  Species   Quantity dup

 1    ESP     1965     Month       GSA 5      Mullus     500   FALSE

 2    ESP     1965     Month       GSA 5      Mullus     200   FALSE

 3    ESP     1965     Month       GSA 5      Mullus     200   TRUE

 4    ITA     1965     Month       GSA 17     Eledone    350   FALSE

 5    ITA     1965     Month       GSA 17     Eledone    350   TRUE

 6    ITA     1965     Month       GSA 17     Eledone    125   FALSE

Now, I would like a new column (in a dynamic way, my true df is very big, with many row, column and variable) where is possible to view the number of duplicated row when is TRUE, like this:

aspected.df



id  Country Year  Time.step  GSA.numb  Species   Quantity dup  ref  

 1  ESP     1965  Month      GSA 5      Mullus     500   FALSE NA

 2  ESP     1965  Month      GSA 5      Mullus     200   FALSE NA

 3  ESP     1965  Month      GSA 5      Mullus     200   TRUE  =id2

 4  ITA     1965  Month      GSA 17     Eledone    350   FALSE NA

 5  ITA     1965  Month      GSA 17     Eledone    350   TRUE  =id4

 6  ITA     1965  Month      GSA 17     Eledone    125   FALSE NA

I tried with:

with(df, ave(as.character(Species), df[,2:6], FUN = make.unique))

but result is:

[1] "Mullus"    "Mullus.1"  "Mullus.2"  "Eledone"   "Eledone.1" "Eledone.2"

I think I need more code input . Which function are useful? (duplicated,make.unit, row.names and so on...)

r dataframe duplicates id rowname

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

edited Nov 20 '18 at 11:54

dmi3kno

1,785521

asked Nov 20 '18 at 11:20

skylobo

766

asked Nov 20 '18 at 11:20

skylobo

766

asked Nov 20 '18 at 11:20

skylobo

766

Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37

add a comment |

Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37

Note that you have accepted an answer that doesn't correspond to your desired output; you may want to correct your output table.
– arg0naut
Nov 20 '18 at 12:37

add a comment |

4 Answers
4

active

oldest

votes

A data.table approach, starting from the initial file:

library(data.table)



setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))), 

          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]

Output:

   id Country Year Time.step GSA.numb Species Quantity   dup  ref

1:  1     ESP 1965     Month     GSA5  Mullus      500 FALSE <NA>

2:  2     ESP 1965     Month     GSA5  Mullus      200 FALSE <NA>

3:  3     ESP 1965     Month     GSA5  Mullus      200  TRUE  id2

4:  4     ITA 1965     Month    GSA17 Eledone      350 FALSE <NA>

5:  5     ITA 1965     Month    GSA17 Eledone      350  TRUE  id4

6:  6     ITA 1965     Month    GSA17 Eledone      125 FALSE <NA>

A tidyverse approach (with dup already created before):

library(tidyverse)



df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

If you'd like to create the dup column within the statement:

df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(

    dup = row_number() > 1,

    ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

add a comment |

You could use tidyverse functions to quickly id the duplicates

df$dup <- duplicated(df[,2:7]) #No id! 



library(tidyverse)



df %>% 

 group_by(dup) %>% 

 mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))



#> # A tibble: 6 x 9

#> # Groups:   dup [2]

#>      id Country  Year Time.step GSA.numb Species Quantity dup   ref  

#>   <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

#> 1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE NA   

#> 2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE NA   

#> 3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  id1  

#> 4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE NA   

#> 5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  id2  

#> 6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE NA

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

add a comment |

This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.

example data (used dput(control = NULL) so the characters/factors were converted to numeric)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 

           Country = c(1965, 1965, 1965, 1965, 1965, 1965), 

           Year = c(1, 1, 1, 1, 1, 1), 

           Time.step = c(1, 1, 1, 1, 1, 1), 

           GSA.numb = c(5, 5, 5, 17, 17, 17), 

           Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))

The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.

df$dup <- duplicated(df)

dupes <- df[df$dup,]

df$ref <- NA # initialize 

for(i in 1:nrow(dupes)){

  z=which(df[,1] == dupes[i,1]&

          df[,2] == dupes[i,2]&

          df[,3] == dupes[i,3]&

          df[,4] == dupes[i,4]&

          df[,5] == dupes[i,5]&

          df[,6] == dupes[i,6]&

          df[,7] == dupes[i,7]) # make sure not to include that $dup column!

  df$ref[z[-1]] <- paste0("=id",min(z))

}

df

#  id Country Year Time.step GSA.numb Species Quantity   dup  ref

#1  1    1965    1         1        5       2      500 FALSE <NA>

#2  1    1965    1         1        5       2      200 FALSE <NA>

#3  1    1965    1         1        5       2      200  TRUE =id2

#4  2    1965    1         1       17       1      350 FALSE <NA>

#5  2    1965    1         1       17       1      350  TRUE =id4

#6  2    1965    1         1       17       1      125 FALSE <NA>

Even though you could tighten this up with apply functions, this will run quicker.

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

add a comment |

Using tidyverse:

df %>%

  group_by_at(vars(-id)) %>% #Group by all variables except of id

  mutate(n = n(), #Identifying the duplicate rows

         dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE

         ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row

 select(-n)



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE <NA> 

2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE <NA> 

3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  =id2 

4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE <NA> 

5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  =id4 

6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE <NA>

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53391887%2fhow-to-assign-a-unique-code-for-duplicate-rows-in-this-df-in-r%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

4 Answers
4

active

oldest

votes

4 Answers
4

active

oldest

votes

A data.table approach, starting from the initial file:

library(data.table)



setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))), 

          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]

Output:

   id Country Year Time.step GSA.numb Species Quantity   dup  ref

1:  1     ESP 1965     Month     GSA5  Mullus      500 FALSE <NA>

2:  2     ESP 1965     Month     GSA5  Mullus      200 FALSE <NA>

3:  3     ESP 1965     Month     GSA5  Mullus      200  TRUE  id2

4:  4     ITA 1965     Month    GSA17 Eledone      350 FALSE <NA>

5:  5     ITA 1965     Month    GSA17 Eledone      350  TRUE  id4

6:  6     ITA 1965     Month    GSA17 Eledone      125 FALSE <NA>

A tidyverse approach (with dup already created before):

library(tidyverse)



df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

If you'd like to create the dup column within the statement:

df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(

    dup = row_number() > 1,

    ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

add a comment |

A data.table approach, starting from the initial file:

library(data.table)



setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))), 

          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]

Output:

   id Country Year Time.step GSA.numb Species Quantity   dup  ref

1:  1     ESP 1965     Month     GSA5  Mullus      500 FALSE <NA>

2:  2     ESP 1965     Month     GSA5  Mullus      200 FALSE <NA>

3:  3     ESP 1965     Month     GSA5  Mullus      200  TRUE  id2

4:  4     ITA 1965     Month    GSA17 Eledone      350 FALSE <NA>

5:  5     ITA 1965     Month    GSA17 Eledone      350  TRUE  id4

6:  6     ITA 1965     Month    GSA17 Eledone      125 FALSE <NA>

A tidyverse approach (with dup already created before):

library(tidyverse)



df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

If you'd like to create the dup column within the statement:

df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(

    dup = row_number() > 1,

    ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

add a comment |

A data.table approach, starting from the initial file:

library(data.table)



setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))), 

          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]

Output:

   id Country Year Time.step GSA.numb Species Quantity   dup  ref

1:  1     ESP 1965     Month     GSA5  Mullus      500 FALSE <NA>

2:  2     ESP 1965     Month     GSA5  Mullus      200 FALSE <NA>

3:  3     ESP 1965     Month     GSA5  Mullus      200  TRUE  id2

4:  4     ITA 1965     Month    GSA17 Eledone      350 FALSE <NA>

5:  5     ITA 1965     Month    GSA17 Eledone      350  TRUE  id4

6:  6     ITA 1965     Month    GSA17 Eledone      125 FALSE <NA>

A tidyverse approach (with dup already created before):

library(tidyverse)



df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

If you'd like to create the dup column within the statement:

df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(

    dup = row_number() > 1,

    ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

A data.table approach, starting from the initial file:

library(data.table)



setDT(df)[, `:=` (dup = seq_len(.N) > 1, ref = paste0("id", first(id))), 

          by = .(Country, Year, Time.step, GSA.numb, Species, Quantity)][dup == FALSE, ref := NA]

Output:

   id Country Year Time.step GSA.numb Species Quantity   dup  ref

1:  1     ESP 1965     Month     GSA5  Mullus      500 FALSE <NA>

2:  2     ESP 1965     Month     GSA5  Mullus      200 FALSE <NA>

3:  3     ESP 1965     Month     GSA5  Mullus      200  TRUE  id2

4:  4     ITA 1965     Month    GSA17 Eledone      350 FALSE <NA>

5:  5     ITA 1965     Month    GSA17 Eledone      350  TRUE  id4

6:  6     ITA 1965     Month    GSA17 Eledone      125 FALSE <NA>

A tidyverse approach (with dup already created before):

library(tidyverse)



df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

If you'd like to create the dup column within the statement:

df %>% 

  group_by_at(vars(2:7)) %>% 

  mutate(

    dup = row_number() > 1,

    ref = ifelse(dup, paste0("id", first(id)), NA_character_))

Output:

     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA5     Mullus       500 FALSE NA   

2     2 ESP      1965 Month     GSA5     Mullus       200 FALSE NA   

3     3 ESP      1965 Month     GSA5     Mullus       200 TRUE  id2  

4     4 ITA      1965 Month     GSA17    Eledone      350 FALSE NA   

5     5 ITA      1965 Month     GSA17    Eledone      350 TRUE  id4  

6     6 ITA      1965 Month     GSA17    Eledone      125 FALSE NA

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

edited Nov 20 '18 at 12:34

answered Nov 20 '18 at 12:26

arg0naut

2,077314

answered Nov 20 '18 at 12:26

arg0naut

2,077314

answered Nov 20 '18 at 12:26

arg0naut

2,077314

add a comment |

You could use tidyverse functions to quickly id the duplicates

df$dup <- duplicated(df[,2:7]) #No id! 



library(tidyverse)



df %>% 

 group_by(dup) %>% 

 mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))



#> # A tibble: 6 x 9

#> # Groups:   dup [2]

#>      id Country  Year Time.step GSA.numb Species Quantity dup   ref  

#>   <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

#> 1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE NA   

#> 2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE NA   

#> 3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  id1  

#> 4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE NA   

#> 5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  id2  

#> 6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE NA

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

add a comment |

You could use tidyverse functions to quickly id the duplicates

df$dup <- duplicated(df[,2:7]) #No id! 



library(tidyverse)



df %>% 

 group_by(dup) %>% 

 mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))



#> # A tibble: 6 x 9

#> # Groups:   dup [2]

#>      id Country  Year Time.step GSA.numb Species Quantity dup   ref  

#>   <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

#> 1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE NA   

#> 2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE NA   

#> 3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  id1  

#> 4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE NA   

#> 5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  id2  

#> 6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE NA

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

add a comment |

You could use tidyverse functions to quickly id the duplicates

df$dup <- duplicated(df[,2:7]) #No id! 



library(tidyverse)



df %>% 

 group_by(dup) %>% 

 mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))



#> # A tibble: 6 x 9

#> # Groups:   dup [2]

#>      id Country  Year Time.step GSA.numb Species Quantity dup   ref  

#>   <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

#> 1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE NA   

#> 2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE NA   

#> 3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  id1  

#> 4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE NA   

#> 5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  id2  

#> 6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE NA

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

You could use tidyverse functions to quickly id the duplicates

df$dup <- duplicated(df[,2:7]) #No id! 



library(tidyverse)



df %>% 

 group_by(dup) %>% 

 mutate(ref=ifelse(dup, paste0("id",1:n()), NA_character_))



#> # A tibble: 6 x 9

#> # Groups:   dup [2]

#>      id Country  Year Time.step GSA.numb Species Quantity dup   ref  

#>   <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

#> 1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE NA   

#> 2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE NA   

#> 3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  id1  

#> 4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE NA   

#> 5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  id2  

#> 6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE NA

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

answered Nov 20 '18 at 11:35

dmi3kno

1,785521

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

add a comment |

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

Noticed the id's are the result of the dup loop, not the matched row
– Evan Friedland
Nov 20 '18 at 12:10

Thank you...question: the output is a list? How can view only ref column? sorry but I don't now 'mutate' function
– skylobo
Nov 20 '18 at 12:44

add a comment |

This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.

example data (used dput(control = NULL) so the characters/factors were converted to numeric)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 

           Country = c(1965, 1965, 1965, 1965, 1965, 1965), 

           Year = c(1, 1, 1, 1, 1, 1), 

           Time.step = c(1, 1, 1, 1, 1, 1), 

           GSA.numb = c(5, 5, 5, 17, 17, 17), 

           Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))

The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.

df$dup <- duplicated(df)

dupes <- df[df$dup,]

df$ref <- NA # initialize 

for(i in 1:nrow(dupes)){

  z=which(df[,1] == dupes[i,1]&

          df[,2] == dupes[i,2]&

          df[,3] == dupes[i,3]&

          df[,4] == dupes[i,4]&

          df[,5] == dupes[i,5]&

          df[,6] == dupes[i,6]&

          df[,7] == dupes[i,7]) # make sure not to include that $dup column!

  df$ref[z[-1]] <- paste0("=id",min(z))

}

df

#  id Country Year Time.step GSA.numb Species Quantity   dup  ref

#1  1    1965    1         1        5       2      500 FALSE <NA>

#2  1    1965    1         1        5       2      200 FALSE <NA>

#3  1    1965    1         1        5       2      200  TRUE =id2

#4  2    1965    1         1       17       1      350 FALSE <NA>

#5  2    1965    1         1       17       1      350  TRUE =id4

#6  2    1965    1         1       17       1      125 FALSE <NA>

Even though you could tighten this up with apply functions, this will run quicker.

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

add a comment |

This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.

example data (used dput(control = NULL) so the characters/factors were converted to numeric)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 

           Country = c(1965, 1965, 1965, 1965, 1965, 1965), 

           Year = c(1, 1, 1, 1, 1, 1), 

           Time.step = c(1, 1, 1, 1, 1, 1), 

           GSA.numb = c(5, 5, 5, 17, 17, 17), 

           Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))

The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.

df$dup <- duplicated(df)

dupes <- df[df$dup,]

df$ref <- NA # initialize 

for(i in 1:nrow(dupes)){

  z=which(df[,1] == dupes[i,1]&

          df[,2] == dupes[i,2]&

          df[,3] == dupes[i,3]&

          df[,4] == dupes[i,4]&

          df[,5] == dupes[i,5]&

          df[,6] == dupes[i,6]&

          df[,7] == dupes[i,7]) # make sure not to include that $dup column!

  df$ref[z[-1]] <- paste0("=id",min(z))

}

df

#  id Country Year Time.step GSA.numb Species Quantity   dup  ref

#1  1    1965    1         1        5       2      500 FALSE <NA>

#2  1    1965    1         1        5       2      200 FALSE <NA>

#3  1    1965    1         1        5       2      200  TRUE =id2

#4  2    1965    1         1       17       1      350 FALSE <NA>

#5  2    1965    1         1       17       1      350  TRUE =id4

#6  2    1965    1         1       17       1      125 FALSE <NA>

Even though you could tighten this up with apply functions, this will run quicker.

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

add a comment |

This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.

example data (used dput(control = NULL) so the characters/factors were converted to numeric)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 

           Country = c(1965, 1965, 1965, 1965, 1965, 1965), 

           Year = c(1, 1, 1, 1, 1, 1), 

           Time.step = c(1, 1, 1, 1, 1, 1), 

           GSA.numb = c(5, 5, 5, 17, 17, 17), 

           Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))

The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.

df$dup <- duplicated(df)

dupes <- df[df$dup,]

df$ref <- NA # initialize 

for(i in 1:nrow(dupes)){

  z=which(df[,1] == dupes[i,1]&

          df[,2] == dupes[i,2]&

          df[,3] == dupes[i,3]&

          df[,4] == dupes[i,4]&

          df[,5] == dupes[i,5]&

          df[,6] == dupes[i,6]&

          df[,7] == dupes[i,7]) # make sure not to include that $dup column!

  df$ref[z[-1]] <- paste0("=id",min(z))

}

df

#  id Country Year Time.step GSA.numb Species Quantity   dup  ref

#1  1    1965    1         1        5       2      500 FALSE <NA>

#2  1    1965    1         1        5       2      200 FALSE <NA>

#3  1    1965    1         1        5       2      200  TRUE =id2

#4  2    1965    1         1       17       1      350 FALSE <NA>

#5  2    1965    1         1       17       1      350  TRUE =id4

#6  2    1965    1         1       17       1      125 FALSE <NA>

Even though you could tighten this up with apply functions, this will run quicker.

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

This example uses base R and matches the found duplicates with the original value. It's helpful if you have multiple duplicates for a single row as well.

example data (used dput(control = NULL) so the characters/factors were converted to numeric)

df <- data.frame(id = c(1, 1, 1, 2, 2, 2), 

           Country = c(1965, 1965, 1965, 1965, 1965, 1965), 

           Year = c(1, 1, 1, 1, 1, 1), 

           Time.step = c(1, 1, 1, 1, 1, 1), 

           GSA.numb = c(5, 5, 5, 17, 17, 17), 

           Species = c(2, 2, 2, 1, 1, 1), Quantity = c(500, 200, 200, 350, 350, 125))

The code is vectorized so, despite the external loop, it should run fairly quickly on your large dataframe.

df$dup <- duplicated(df)

dupes <- df[df$dup,]

df$ref <- NA # initialize 

for(i in 1:nrow(dupes)){

  z=which(df[,1] == dupes[i,1]&

          df[,2] == dupes[i,2]&

          df[,3] == dupes[i,3]&

          df[,4] == dupes[i,4]&

          df[,5] == dupes[i,5]&

          df[,6] == dupes[i,6]&

          df[,7] == dupes[i,7]) # make sure not to include that $dup column!

  df$ref[z[-1]] <- paste0("=id",min(z))

}

df

#  id Country Year Time.step GSA.numb Species Quantity   dup  ref

#1  1    1965    1         1        5       2      500 FALSE <NA>

#2  1    1965    1         1        5       2      200 FALSE <NA>

#3  1    1965    1         1        5       2      200  TRUE =id2

#4  2    1965    1         1       17       1      350 FALSE <NA>

#5  2    1965    1         1       17       1      350  TRUE =id4

#6  2    1965    1         1       17       1      125 FALSE <NA>

Even though you could tighten this up with apply functions, this will run quicker.

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

answered Nov 20 '18 at 12:08

Evan Friedland

2,0991621

add a comment |

Using tidyverse:

df %>%

  group_by_at(vars(-id)) %>% #Group by all variables except of id

  mutate(n = n(), #Identifying the duplicate rows

         dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE

         ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row

 select(-n)



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE <NA> 

2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE <NA> 

3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  =id2 

4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE <NA> 

5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  =id4 

6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE <NA>

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

add a comment |

Using tidyverse:

df %>%

  group_by_at(vars(-id)) %>% #Group by all variables except of id

  mutate(n = n(), #Identifying the duplicate rows

         dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE

         ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row

 select(-n)



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE <NA> 

2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE <NA> 

3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  =id2 

4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE <NA> 

5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  =id4 

6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE <NA>

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

add a comment |

Using tidyverse:

df %>%

  group_by_at(vars(-id)) %>% #Group by all variables except of id

  mutate(n = n(), #Identifying the duplicate rows

         dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE

         ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row

 select(-n)



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE <NA> 

2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE <NA> 

3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  =id2 

4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE <NA> 

5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  =id4 

6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE <NA>

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

Using tidyverse:

df %>%

  group_by_at(vars(-id)) %>% #Group by all variables except of id

  mutate(n = n(), #Identifying the duplicate rows

         dup = ifelse(seq_along(n) > 1, TRUE, FALSE), #Coding the first unique row as TRUE and others as FALSE

         ref = ifelse(dup == TRUE, paste0("=id", first(id[dup == FALSE])), NA_character_)) %>% #Pasting the id of the first unique row

 select(-n)



     id Country  Year Time.step GSA.numb Species Quantity dup   ref  

  <int> <chr>   <int> <chr>     <chr>    <chr>      <int> <lgl> <chr>

1     1 ESP      1965 Month     GSA 5    Mullus       500 FALSE <NA> 

2     2 ESP      1965 Month     GSA 5    Mullus       200 FALSE <NA> 

3     3 ESP      1965 Month     GSA 5    Mullus       200 TRUE  =id2 

4     4 ITA      1965 Month     GSA 17   Eledone      350 FALSE <NA> 

5     5 ITA      1965 Month     GSA 17   Eledone      350 TRUE  =id4 

6     6 ITA      1965 Month     GSA 17   Eledone      125 FALSE <NA>

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

answered Nov 20 '18 at 12:17

tmfmnk

1,8901412

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

Some of your past answers have not been well-received, and you're in danger of being blocked from answering.

Please pay close attention to the following guidance:

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

u,C3xt,qNP BOVy,2tL6yTE,4i,Ywn WbcuTyHShbNjKFiZvQPlvVf,N qoer,lUNAS2pCld7aP7rNK4iH7JeeFr Tsw

搜尋此網誌

Argthtjtr