How to merge (full join) a list of more than three data.tables correctly?
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
When using merge.data.table()
I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge()
with data.frame
versions works fine.
I use this code, to achieve a full join avoid time-split columns with merge.data.table()
.
So, in base R
Reduce(function(...) merge(..., all=TRUE), L)
runs fine, whereas in data.table
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)
trows the error:
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
Note that the issue appears to be cumulative somehow...
While
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])
still runs fine,
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])
throws a warning, and from here
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])
finally the error starts.
Thus, merge.data.table()
appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?
Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge()
works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.
Data and code to reproduce
# names vector mimicking my data
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))
n <- 10 # row numbers
fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}
tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames
list2env(tmp, globalenv()) # get data frames into globalenv
# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])
# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]
# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)
# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))
Also related: 1, 2
r merge data.table
|
show 3 more comments
When using merge.data.table()
I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge()
with data.frame
versions works fine.
I use this code, to achieve a full join avoid time-split columns with merge.data.table()
.
So, in base R
Reduce(function(...) merge(..., all=TRUE), L)
runs fine, whereas in data.table
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)
trows the error:
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
Note that the issue appears to be cumulative somehow...
While
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])
still runs fine,
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])
throws a warning, and from here
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])
finally the error starts.
Thus, merge.data.table()
appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?
Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge()
works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.
Data and code to reproduce
# names vector mimicking my data
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))
n <- 10 # row numbers
fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}
tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames
list2env(tmp, globalenv()) # get data frames into globalenv
# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])
# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]
# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)
# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))
Also related: 1, 2
r merge data.table
Respond to the error about the duplicates:cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
;cols[, .N, by=cols][N>1,]
. e.g. you have columnp.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting.x
and.y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge
– Jonny Phelps
Nov 23 '18 at 15:18
p.101
is always combined withyear
so there should expectedly be no problem. W/o this answer I'd expect the output to bep.101.x
,p.101.y
.p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).
– jay.sf
Nov 23 '18 at 15:36
When I tryReduce(function(...) merge(..., all=TRUE), L)
from your post, it removesp.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here
– Jonny Phelps
Nov 23 '18 at 15:39
p.l04
is not in the names, tryany(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
1
r u looking forReduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?
– chinsoon12
Nov 26 '18 at 1:01
|
show 3 more comments
When using merge.data.table()
I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge()
with data.frame
versions works fine.
I use this code, to achieve a full join avoid time-split columns with merge.data.table()
.
So, in base R
Reduce(function(...) merge(..., all=TRUE), L)
runs fine, whereas in data.table
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)
trows the error:
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
Note that the issue appears to be cumulative somehow...
While
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])
still runs fine,
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])
throws a warning, and from here
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])
finally the error starts.
Thus, merge.data.table()
appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?
Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge()
works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.
Data and code to reproduce
# names vector mimicking my data
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))
n <- 10 # row numbers
fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}
tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames
list2env(tmp, globalenv()) # get data frames into globalenv
# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])
# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]
# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)
# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))
Also related: 1, 2
r merge data.table
When using merge.data.table()
I get ugly error messages about duplicated columns names—strangely enough only if there are more than three data.tables. On the contrary standard merge()
with data.frame
versions works fine.
I use this code, to achieve a full join avoid time-split columns with merge.data.table()
.
So, in base R
Reduce(function(...) merge(..., all=TRUE), L)
runs fine, whereas in data.table
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt, names))), L.dt)
trows the error:
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,
p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
In addition: Warning message:
In merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
Error in merge.data.table(..., all = TRUE, by = Reduce(intersect, lapply(L.dt, :
x has some duplicated column name(s): p.l01.x,p.l02.x,p.l03.x,p.l05.x,p.l06.x,p.l07.x,
p.l08.x,p.l10.x,p.l11.x,p.l12.x,
p.l01.y,p.l02.y,p.l03.y,p.l05.y,p.l06.y,p.l07.y,p.l08.y,p.l10.y,p.l11.y,p.l12.y. Please remove
or rename the duplicate(s) and try again.
Note that the issue appears to be cumulative somehow...
While
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:2], names))), L.dt[1:2])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:3], names))), L.dt[1:3])
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:4], names))), L.dt[1:4])
still runs fine,
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:5], names))), L.dt[1:5])
throws a warning, and from here
Reduce(function(...)
merge(..., all=TRUE, by=Reduce(intersect, lapply(L.dt[1:6], names))), L.dt[1:6])
finally the error starts.
Thus, merge.data.table()
appears to be working fine only until the number of data.tables exceeds 3?? Any ideas how I could solve this issue?
Note: With easy example data no error appears. So it could be caused by my data. But I have no idea why—standard merge()
works fine! My real data has about dim=15.000*500, some variable names match throughout the list, others are individual. This MCVE seems to reproduce the issue quite well.
Data and code to reproduce
# names vector mimicking my data
nm <- list(c("p.d17m", "p.d17", "p.d29", "p.d31", "p.n03", "p.n04",
"p.n05", "p.d36", "p.d40", "p.d41", "p.d45", "p.d46", "p.d50",
"p.d51", "p.d55", "p.d56", "p.d60", "p.d61", "p.d65", "p.d66",
"p.d70", "p.d71", "p.n08"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10",
"p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18",
"p.l20", "p.l21", "p.l22", "p.l23", "p.l25"), c("p.d17m", "p.d17",
"p.c44", "p.l01", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07",
"p.l08", "p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16",
"p.l17", "p.l18", "p.l20", "p.l21", "p.l22", "p.l23", "p.l25"
), c("p.d17m", "p.d17", "p.c44", "p.l01", "p.l01r", "p.l02",
"p.l03", "p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11",
"p.l12", "p.l13", "p.l15", "p.l16", "p.l17", "p.l18", "p.l20",
"p.l21", "p.l22", "p.l23"), c("p.d17m", "p.d17", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12", "p.l13", "p.l15", "p.l16", "p.l17",
"p.l18", "p.l20", "p.l21", "p.l22", "p.l23"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07", "p.l08", "p.l10", "p.l11", "p.l12"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.l01",
"p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07", "p.l08",
"p.l10", "p.l11", "p.l12"), c("pdate.", "p.d200", "p.d201", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05"), c("pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.l01", "p.l01r", "p.l02", "p.l03", "p.l05", "p.l06", "p.l07"
), c("pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.l01", "p.l01r", "p.l02", "p.l03",
"p.l05", "p.l06", "p.l07"), c("plingu.", "pdate.", "p.d17m",
"p.d17", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165",
"p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50",
"p.c60", "p.c61", "p.c62", "p.c63", "p.c64", "p.c65"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162",
"p.d163", "p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c47",
"p.c48", "p.c49", "p.c50", "p.c60", "p.c61", "p.c62", "p.c63",
"p.c64"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.c60",
"p.c61", "p.c62", "p.c63", "p.c64"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01"
), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a",
"p.d110b", "p.d160", "p.d161", "p.d162", "p.d163", "p.d164",
"p.d165", "p.d166", "p.d167", "p.c44", "p.c47", "p.c48", "p.c49",
"p.c50", "p.l01", "p.l01r", "p.l02"), c("hab_ch.", "plingu.",
"pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b", "p.d160",
"p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166", "p.d167",
"p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01", "p.l01r",
"p.l02"), c("hab_ch.", "plingu.", "pdate.", "p.d17m", "p.d17",
"p.d110a", "p.d110b", "p.d160", "p.d161", "p.d162", "p.d163",
"p.d164", "p.d165", "p.d166", "p.d167", "p.c44", "p.c100", "p.c101",
"p.c102", "p.c103", "p.c47", "p.c48", "p.c49"), c("hab_ch.",
"plingu.", "pdate.", "p.d17m", "p.d17", "p.d110a", "p.d110b",
"p.d160", "p.d161", "p.d162", "p.d163", "p.d164", "p.d165", "p.d166",
"p.d167", "p.c44", "p.c47", "p.c48", "p.c49", "p.c50", "p.l01",
"p.l01r", "p.l02"))
n <- 10 # row numbers
fun <- function(x) {
# fun() produces a n x 22 data.frame
set.seed(x)
d <- data.frame(
cbind(id=1:n, yr=x,
matrix(rnorm(n*(20)), n)))
}
tmp <- setNames(lapply(1:6, fun), paste0("df", 1:6)) # produce six data frames
list2env(tmp, globalenv()) # get data frames into globalenv
# insert extra columns into some df's
df3 <- cbind(id=df3[, 1], foo=sample(1e3:2e3, nrow(df3), replace=TRUE), df3[, -1])
df4 <- cbind(id=df4[, 1], foo=sample(1e3:2e3, nrow(df4), replace=TRUE), df4[, -1])
df5 <- cbind(id=df5[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df5[, -1])
df6 <- cbind(id=df6[, 1], foo=sample(1e3:2e3, nrow(df5), replace=TRUE), df6[, -1])
# giving var names:
names(df1)[3:22] <- nm[[1]][3:22]
names(df2)[3:22] <- nm[[2]][3:22]
names(df3)[4:23] <- nm[[3]][4:23]
names(df4)[4:23] <- nm[[4]][4:23]
names(df5)[4:23] <- nm[[5]][4:23]
names(df5)[4:23] <- nm[[6]][4:23]
# list of data.frames:
L <- list(df1, df2, df3, df4, df5, df6)
# list of data.tables:
library(data.table)
L.dt <- lapply(L, function(x) setkeyv(data.table(x), c("id", "yr")))
Also related: 1, 2
r merge data.table
r merge data.table
edited Nov 24 '18 at 10:24
jay.sf
asked Nov 23 '18 at 14:10
jay.sfjay.sf
6,76131841
6,76131841
Respond to the error about the duplicates:cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
;cols[, .N, by=cols][N>1,]
. e.g. you have columnp.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting.x
and.y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge
– Jonny Phelps
Nov 23 '18 at 15:18
p.101
is always combined withyear
so there should expectedly be no problem. W/o this answer I'd expect the output to bep.101.x
,p.101.y
.p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).
– jay.sf
Nov 23 '18 at 15:36
When I tryReduce(function(...) merge(..., all=TRUE), L)
from your post, it removesp.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here
– Jonny Phelps
Nov 23 '18 at 15:39
p.l04
is not in the names, tryany(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
1
r u looking forReduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?
– chinsoon12
Nov 26 '18 at 1:01
|
show 3 more comments
Respond to the error about the duplicates:cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
;cols[, .N, by=cols][N>1,]
. e.g. you have columnp.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting.x
and.y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge
– Jonny Phelps
Nov 23 '18 at 15:18
p.101
is always combined withyear
so there should expectedly be no problem. W/o this answer I'd expect the output to bep.101.x
,p.101.y
.p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).
– jay.sf
Nov 23 '18 at 15:36
When I tryReduce(function(...) merge(..., all=TRUE), L)
from your post, it removesp.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here
– Jonny Phelps
Nov 23 '18 at 15:39
p.l04
is not in the names, tryany(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
1
r u looking forReduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?
– chinsoon12
Nov 26 '18 at 1:01
Respond to the error about the duplicates:
cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
; cols[, .N, by=cols][N>1,]
. e.g. you have column p.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x
and .y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge– Jonny Phelps
Nov 23 '18 at 15:18
Respond to the error about the duplicates:
cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
; cols[, .N, by=cols][N>1,]
. e.g. you have column p.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting .x
and .y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge– Jonny Phelps
Nov 23 '18 at 15:18
p.101
is always combined with year
so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x
, p.101.y
. p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).– jay.sf
Nov 23 '18 at 15:36
p.101
is always combined with year
so there should expectedly be no problem. W/o this answer I'd expect the output to be p.101.x
, p.101.y
. p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).– jay.sf
Nov 23 '18 at 15:36
When I try
Reduce(function(...) merge(..., all=TRUE), L)
from your post, it removes p.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here– Jonny Phelps
Nov 23 '18 at 15:39
When I try
Reduce(function(...) merge(..., all=TRUE), L)
from your post, it removes p.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here– Jonny Phelps
Nov 23 '18 at 15:39
p.l04
is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
p.l04
is not in the names, try any(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
1
1
r u looking for
Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?– chinsoon12
Nov 26 '18 at 1:01
r u looking for
Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?– chinsoon12
Nov 26 '18 at 1:01
|
show 3 more comments
0
active
oldest
votes
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53448228%2fhow-to-merge-full-join-a-list-of-more-than-three-data-tables-correctly%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
0
active
oldest
votes
0
active
oldest
votes
active
oldest
votes
active
oldest
votes
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53448228%2fhow-to-merge-full-join-a-list-of-more-than-three-data-tables-correctly%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Respond to the error about the duplicates:
cols <- data.table(cols=unlist(lapply(L, function(X){colnames(X)})))
;cols[, .N, by=cols][N>1,]
. e.g. you have columnp.l01
4 times. What do you want the output to be for this when there are 4 versions of these in the output? data.table handles some duplicates by putting.x
and.y
after, but it has limits. It's more robust to change the names of your data, going in to the merge, than alter them post-merge– Jonny Phelps
Nov 23 '18 at 15:18
p.101
is always combined withyear
so there should expectedly be no problem. W/o this answer I'd expect the output to bep.101.x
,p.101.y
.p.101.something
, but at least this is solved in the linked answer. "but it has limits" - that's exactly the point. In base R it's working - in data.table not (yet).– jay.sf
Nov 23 '18 at 15:36
When I try
Reduce(function(...) merge(..., all=TRUE), L)
from your post, it removesp.l04
. I wouldn't call that working, I'd call that a silent bug. the data.table error is at least helpful here– Jonny Phelps
Nov 23 '18 at 15:39
p.l04
is not in the names, tryany(sapply(L, function(x) "p.l04" %in% names(x)))
– jay.sf
Nov 23 '18 at 15:51
1
r u looking for
Reduce(function(x, y) merge(x, y, all=TRUE, by=intersect(names(x), names(y))), L.dt)
?– chinsoon12
Nov 26 '18 at 1:01