How to merge (full join) a list of more than three data.tables correctly?





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







0















When using merge.data.table() I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge() with data.frame versions works fine.



I use this code, to achieve a full join avoid time-split columns with merge.data.table().



So, in base R



Reduce(function(...) merge(..., all=TRUE), L)


runs fine, whereas in data.table



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)


trows the error:



  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :

Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.


Note that the issue appears to be cumulative somehow...



While



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])


still runs fine,



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])


throws a warning, and from here



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])


finally the error starts.



Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?



Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge() works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.



Data and code to reproduce



# names vector mimicking my data   
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))

n <- 10 # row numbers

fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}

tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames

list2env(tmp, globalenv()) # get data frames into globalenv

# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])

# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]

# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)

# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))


Also related: 1, 2










share|improve this question

























  • Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

    – Jonny Phelps
    Nov 23 '18 at 15:18













  • p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

    – jay.sf
    Nov 23 '18 at 15:36











  • When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

    – Jonny Phelps
    Nov 23 '18 at 15:39











  • p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

    – jay.sf
    Nov 23 '18 at 15:51








  • 1





    r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

    – chinsoon12
    Nov 26 '18 at 1:01




















0















When using merge.data.table() I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge() with data.frame versions works fine.



I use this code, to achieve a full join avoid time-split columns with merge.data.table().



So, in base R



Reduce(function(...) merge(..., all=TRUE), L)


runs fine, whereas in data.table



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)


trows the error:



  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :

Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.


Note that the issue appears to be cumulative somehow...



While



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])


still runs fine,



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])


throws a warning, and from here



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])


finally the error starts.



Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?



Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge() works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.



Data and code to reproduce



# names vector mimicking my data   
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))

n <- 10 # row numbers

fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}

tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames

list2env(tmp, globalenv()) # get data frames into globalenv

# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])

# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]

# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)

# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))


Also related: 1, 2










share|improve this question

























  • Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

    – Jonny Phelps
    Nov 23 '18 at 15:18













  • p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

    – jay.sf
    Nov 23 '18 at 15:36











  • When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

    – Jonny Phelps
    Nov 23 '18 at 15:39











  • p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

    – jay.sf
    Nov 23 '18 at 15:51








  • 1





    r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

    – chinsoon12
    Nov 26 '18 at 1:01
















0












0








0








When using merge.data.table() I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge() with data.frame versions works fine.



I use this code, to achieve a full join avoid time-split columns with merge.data.table().



So, in base R



Reduce(function(...) merge(..., all=TRUE), L)


runs fine, whereas in data.table



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)


trows the error:



  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :

Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.


Note that the issue appears to be cumulative somehow...



While



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])


still runs fine,



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])


throws a warning, and from here



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])


finally the error starts.



Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?



Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge() works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.



Data and code to reproduce



# names vector mimicking my data   
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))

n <- 10 # row numbers

fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}

tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames

list2env(tmp, globalenv()) # get data frames into globalenv

# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])

# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]

# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)

# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))


Also related: 1, 2










share|improve this question
















When using merge.data.table() I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge() with data.frame versions works fine.



I use this code, to achieve a full join avoid time-split columns with merge.data.table().



So, in base R



Reduce(function(...) merge(..., all=TRUE), L)


runs fine, whereas in data.table



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)


trows the error:



  Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt,  : 
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :

Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.


Note that the issue appears to be cumulative somehow...



While



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])

Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])


still runs fine,



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])


throws a warning, and from here



Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])


finally the error starts.



Thus, merge.data.table() appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?



Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge() works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.



Data and code to reproduce



# names vector mimicking my data   
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))

n <- 10 # row numbers

fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}

tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames

list2env(tmp, globalenv()) # get data frames into globalenv

# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])

# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]

# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)

# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))


Also related: 1, 2







r merge data.table






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 24 '18 at 10:24







jay.sf

















asked Nov 23 '18 at 14:10









jay.sfjay.sf

6,76131841




6,76131841













  • Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

    – Jonny Phelps
    Nov 23 '18 at 15:18













  • p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

    – jay.sf
    Nov 23 '18 at 15:36











  • When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

    – Jonny Phelps
    Nov 23 '18 at 15:39











  • p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

    – jay.sf
    Nov 23 '18 at 15:51








  • 1





    r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

    – chinsoon12
    Nov 26 '18 at 1:01





















  • Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

    – Jonny Phelps
    Nov 23 '18 at 15:18













  • p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

    – jay.sf
    Nov 23 '18 at 15:36











  • When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

    – Jonny Phelps
    Nov 23 '18 at 15:39











  • p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

    – jay.sf
    Nov 23 '18 at 15:51








  • 1





    r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

    – chinsoon12
    Nov 26 '18 at 1:01



















Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18







Respond to the error about the duplicates: cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)}))); cols[, .N, by=cols][N>1,]. e.g. you have column p.l01 4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x and .y after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge

– Jonny Phelps
Nov 23 '18 at 15:18















p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36





p.101 is always combined with year so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x, p.101.y. p.101.something, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).

– jay.sf
Nov 23 '18 at 15:36













When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39





When I try Reduce(function(...) merge(..., all=TRUE), L) from your post, it removes p.l04. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here

– Jonny Phelps
Nov 23 '18 at 15:39













p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51







p.l04 is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))

– jay.sf
Nov 23 '18 at 15:51






1




1





r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01







r u looking for Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt) ?

– chinsoon12
Nov 26 '18 at 1:01














0






active

oldest

votes












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53448228%2fhow-to-merge-full-join-a-list-of-more-than-three-data-tables-correctly%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























0






active

oldest

votes








0






active

oldest

votes









active

oldest

votes






active

oldest

votes
















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53448228%2fhow-to-merge-full-join-a-list-of-more-than-three-data-tables-correctly%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

"Incorrect syntax near the keyword 'ON'. (on update cascade, on delete cascade,)

Alcedinidae

Origin of the phrase “under your belt”?