How to merge (full join) a list of more than three data.tables correctly?

.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}

When using merge.data.table() I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge() with data.frame versions works fine.

I use this code, to achieve a full join avoid time-split columns with merge.data.table().

So, in base R

Reduce(function(...) merge(..., all=TRUE), L)

runs fine, whereas in data.table

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)

trows the error:

  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,

p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove 

or rename the duplicate(s) and try again.

In addition: Warning message:

In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  :



 Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove

or rename the duplicate(s) and try again.

Note that the issue appears to be cumulative somehow...

While

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])

still runs fine,

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])

throws a warning, and from here

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])

finally the error starts.

Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?

Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge() works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.

Data and code to reproduce

# names vector mimicking my data   

nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04", 

"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50", 

"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66", 

"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", 

"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", 

"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17", 

"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", 

"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", 

"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"

), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02", 

"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", 

"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20", 

"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", 

"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63", 

"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60", 

"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"

), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", 

"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", 

"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", 

"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160", 

"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167", 

"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", 

"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", 

"p.l01r", "p.l02"))



n <- 10  # row numbers



fun <- function(x) {

  # fun() produces a n x 22 data.frame

  set.seed(x)

  d <- data.frame(

    cbind(id=1:n, yr=x, 

          matrix(rnorm(n*(20)), n)))

}



tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6))  # produce six data frames



list2env(tmp, globalenv())  # get data frames into globalenv



# insert extra columns into some df's

df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])

df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])

df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])

df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])



# giving var names:

names(df1)[3:22] <- nm[[1]][3:22]

names(df2)[3:22] <- nm[[2]][3:22]

names(df3)[4:23] <- nm[[3]][4:23]

names(df4)[4:23] <- nm[[4]][4:23]

names(df5)[4:23] <- nm[[5]][4:23]

names(df5)[4:23] <- nm[[6]][4:23]



# list of data.frames:

L <- list(df1, df2, df3, df4, df5, df6)



# list of data.tables:

library(data.table)

L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))

_{Also related: 1, 2}

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18

p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36

When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39

p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51

1

r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01

|
show 3 more comments

I use this code, to achieve a full join avoid time-split columns with merge.data.table().

So, in base R

Reduce(function(...) merge(..., all=TRUE), L)

runs fine, whereas in data.table

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)

trows the error:

  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,

p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove 

or rename the duplicate(s) and try again.

In addition: Warning message:

In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  :



 Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove

or rename the duplicate(s) and try again.

Note that the issue appears to be cumulative somehow...

While

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])

still runs fine,

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])

throws a warning, and from here

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])

finally the error starts.

Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?

Data and code to reproduce

# names vector mimicking my data   

nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04", 

"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50", 

"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66", 

"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", 

"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", 

"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17", 

"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", 

"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", 

"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"

), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02", 

"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", 

"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20", 

"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", 

"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63", 

"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60", 

"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"

), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", 

"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", 

"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", 

"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160", 

"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167", 

"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", 

"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", 

"p.l01r", "p.l02"))



n <- 10  # row numbers



fun <- function(x) {

  # fun() produces a n x 22 data.frame

  set.seed(x)

  d <- data.frame(

    cbind(id=1:n, yr=x, 

          matrix(rnorm(n*(20)), n)))

}



tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6))  # produce six data frames



list2env(tmp, globalenv())  # get data frames into globalenv



# insert extra columns into some df's

df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])

df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])

df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])

df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])



# giving var names:

names(df1)[3:22] <- nm[[1]][3:22]

names(df2)[3:22] <- nm[[2]][3:22]

names(df3)[4:23] <- nm[[3]][4:23]

names(df4)[4:23] <- nm[[4]][4:23]

names(df5)[4:23] <- nm[[5]][4:23]

names(df5)[4:23] <- nm[[6]][4:23]



# list of data.frames:

L <- list(df1, df2, df3, df4, df5, df6)



# list of data.tables:

library(data.table)

L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))

_{Also related: 1, 2}

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18

p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36

When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39

p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51

1

r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01

|
show 3 more comments

I use this code, to achieve a full join avoid time-split columns with merge.data.table().

So, in base R

Reduce(function(...) merge(..., all=TRUE), L)

runs fine, whereas in data.table

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)

trows the error:

  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,

p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove 

or rename the duplicate(s) and try again.

In addition: Warning message:

In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  :



 Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove

or rename the duplicate(s) and try again.

Note that the issue appears to be cumulative somehow...

While

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])

still runs fine,

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])

throws a warning, and from here

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])

finally the error starts.

Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?

Data and code to reproduce

# names vector mimicking my data   

nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04", 

"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50", 

"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66", 

"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", 

"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", 

"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17", 

"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", 

"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", 

"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"

), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02", 

"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", 

"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20", 

"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", 

"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63", 

"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60", 

"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"

), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", 

"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", 

"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", 

"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160", 

"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167", 

"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", 

"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", 

"p.l01r", "p.l02"))



n <- 10  # row numbers



fun <- function(x) {

  # fun() produces a n x 22 data.frame

  set.seed(x)

  d <- data.frame(

    cbind(id=1:n, yr=x, 

          matrix(rnorm(n*(20)), n)))

}



tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6))  # produce six data frames



list2env(tmp, globalenv())  # get data frames into globalenv



# insert extra columns into some df's

df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])

df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])

df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])

df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])



# giving var names:

names(df1)[3:22] <- nm[[1]][3:22]

names(df2)[3:22] <- nm[[2]][3:22]

names(df3)[4:23] <- nm[[3]][4:23]

names(df4)[4:23] <- nm[[4]][4:23]

names(df5)[4:23] <- nm[[5]][4:23]

names(df5)[4:23] <- nm[[6]][4:23]



# list of data.frames:

L <- list(df1, df2, df3, df4, df5, df6)



# list of data.tables:

library(data.table)

L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))

_{Also related: 1, 2}

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

I use this code, to achieve a full join avoid time-split columns with merge.data.table().

So, in base R

Reduce(function(...) merge(..., all=TRUE), L)

runs fine, whereas in data.table

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)

trows the error:

  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,

p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove 

or rename the duplicate(s) and try again.

In addition: Warning message:

In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  :



 Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 

  x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,

p.l08.x,p.l10.x,p.l11.x,p.l12.x,

p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove

or rename the duplicate(s) and try again.

Note that the issue appears to be cumulative somehow...

While

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])



Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])

still runs fine,

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])

throws a warning, and from here

Reduce(function(...)

  merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])

finally the error starts.

Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?

Data and code to reproduce

# names vector mimicking my data   

nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04", 

"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50", 

"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66", 

"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", 

"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", 

"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17", 

"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", 

"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", 

"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"

), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02", 

"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", 

"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20", 

"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", 

"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01", 

"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", 

"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"

), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03", 

"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m", 

"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", 

"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", 

"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", 

"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47", 

"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63", 

"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60", 

"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"

), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", 

"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", 

"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", 

"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.", 

"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160", 

"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167", 

"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", 

"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", 

"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", 

"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101", 

"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.", 

"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", 

"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", 

"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", 

"p.l01r", "p.l02"))



n <- 10  # row numbers



fun <- function(x) {

  # fun() produces a n x 22 data.frame

  set.seed(x)

  d <- data.frame(

    cbind(id=1:n, yr=x, 

          matrix(rnorm(n*(20)), n)))

}



tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6))  # produce six data frames



list2env(tmp, globalenv())  # get data frames into globalenv



# insert extra columns into some df's

df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])

df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])

df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])

df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])



# giving var names:

names(df1)[3:22] <- nm[[1]][3:22]

names(df2)[3:22] <- nm[[2]][3:22]

names(df3)[4:23] <- nm[[3]][4:23]

names(df4)[4:23] <- nm[[4]][4:23]

names(df5)[4:23] <- nm[[5]][4:23]

names(df5)[4:23] <- nm[[6]][4:23]



# list of data.frames:

L <- list(df1, df2, df3, df4, df5, df6)



# list of data.tables:

library(data.table)

L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))

_{Also related: 1, 2}

r merge data.table

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

edited Nov 24 '18 at 10:24

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

asked Nov 23 '18 at 14:10

jay.sf

6,76131841

Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18

p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36

When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39

p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51

1

r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01

|
show 3 more comments

Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18

p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36

When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39

p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51

1

r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01

Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18

p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36

When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39

p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51

r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01

|
show 3 more comments

0

active

oldest

votes

Your Answer

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53448228%2fhow-to-merge-full-join-a-list-of-more-than-three-data-tables-correctly%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

0

active

oldest

votes

0

active

oldest

votes

draft saved

draft discarded

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

This page is only for reference, If you need detailed information, please check here

aObmdp63,83dwoyLUUkDW6ZSJ9AFjZJXiBt118 Q hwhigoOBLug FZEL,s,BqpSP024wN

搜尋此網誌

Argthtjtr

How to merge (full join) a list of more than three data.tables correctly?

Data and code to reproduce

Data and code to reproduce

Data and code to reproduce

Data and code to reproduce

0

Your Answer

Post as a guest

0

0

Post as a guest

Popular posts from this blog

"Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

Alcedinidae

7 octobre