How can I avoid complex for loops?

I am currently working with a series of large datasets and I'm trying to improve how I write scripts in R. I tend to rely mostly on for loops, which I know can be cumbersome and slow, especially with very large datasets.



I have heard a lot of people recommending the apply() family to avoid complex for loops, but I am struggling to get my head around using them to apply multiple functions in one go.



Here is some simple example data:



A <- data.frame('Area' = c(4, 6, 5),
'flow' = c(1, 1, 1))
B <- data.frame('Area' = c(6, 8, 4),
'flow' = c(1, 2, 1))
files <- list(A, B)
frames <- list('A', 'B')


What I want to do is sort the data by the 'flow' variable in descending order, then add columns for the percentage of total 'flow' and total 'Area' that each row represents, and finally add two further columns with the cumulative percentage of each variable.



Currently I use this for loop:



sort_files <- list()
n <- 1
for(i in files){
name <- frames[n]
nom <- paste(name,'_sorted', sep = '')
data <- i[order(-i$flow),]
area <- sum(i$Area)
total <- sum(i$flow)
data$area_portion <- (data$Area/area)*100
data$flow_portion <- (data$flow/total)*100
data$cum_area <- cumsum(data$area_portion)
data$cum_flow <- cumsum(data$flow_portion)
assign(nom, data)
df <- get(paste(name,'_sorted', sep = ''))
sort_files[[nom]] <- df
n <- n + 1
}


This works, but it seems overly complex and ugly, and I suspect it runs slower than a better-written script would.



How can I simplify and neaten up the above code?



This is the expected output:



sort_files

$A_sorted
  Area flow area_portion flow_portion  cum_area  cum_flow
1    4    1     26.66667     33.33333  26.66667  33.33333
2    6    1     40.00000     33.33333  66.66667  66.66667
3    5    1     33.33333     33.33333 100.00000 100.00000

$B_sorted
  Area flow area_portion flow_portion  cum_area cum_flow
2    8    2     44.44444           50  44.44444       50
1    6    1     33.33333           25  77.77778       75
3    4    1     22.22222           25 100.00000      100

  • You didn't define pos_files in the script above. Also, names is a function, so you'd better not define an object with that name. – markus, yesterday
  • av_portion is also missing, although I understand it's the mean of Area. files is also an R function. – patL, yesterday
  • @tom91: can you add the expected output too? – Tung, yesterday
  • @markus and patL Sorry! I just realised I copied over the script with the actual variable names and not the test one. I have updated it now. – tom91, yesterday
  • @Tung Expected output has been added to the bottom. – tom91, yesterday

Tags: r, for-loop

Asked yesterday by tom91; edited yesterday by double-beep.

2 Answers

Using lapply to loop over files and dplyr's mutate to add the new columns:



library(dplyr)

setNames(lapply(files, function(x)
x %>%
arrange(desc(flow)) %>%
mutate(area_portion = Area/sum(Area)*100,
flow_portion = flow/sum(flow) * 100,
cum_area = cumsum(area_portion),
cum_flow = cumsum(flow_portion))
),paste0(frames, "_sorted"))


#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100




Or, going completely the tidyverse way, we can replace lapply with map and setNames with set_names:



library(tidyverse)

map(set_names(files, str_c(frames, "_sorted")),
. %>% arrange(desc(flow)) %>%
mutate(area_portion = Area/sum(Area)*100,
flow_portion = flow/sum(flow) * 100,
cum_area = cumsum(area_portion),
cum_flow = cumsum(flow_portion)))


Updated the tidyverse approach following some pointers from @Moody_Mudskipper.

  • This is excellent, and exactly the kind of thing I was after. Out of interest, what are the benefits of going the tidyverse route? – tom91, yesterday
  • @tom91 In this case, not much benefit, I would say. But some people find the tidyverse more readable and easier to understand. – Ronak Shah, yesterday
  • Some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse, you can use str_c (it's almost the same, but has a few differences: stackoverflow.com/questions/53118271/…). (2) You don't need to unlist frames. (3) To avoid these embedded parentheses over several lines, you could put the set_names after a pipe at the end OR (and this is what I'd do) rename files instead, so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%. – Moody_Mudskipper, yesterday
  • You would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(... – Moody_Mudskipper, yesterday
  • @Moody_Mudskipper Cool, thanks. Updated the answer. Hope I covered all the points you mentioned, and in the right way :) – Ronak Shah, yesterday

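A minimal sketch of the first option in point (3) of the comment above, with set_names() moved to the end of the pipe. It assumes dplyr and purrr are installed, and it uses a purrr formula lambda (~ .x %>% ...) in place of the . %>% functional chain; the two forms should produce the same list.

```r
library(dplyr)
library(purrr)

# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files <- list(A, B)
frames <- list("A", "B")

sort_files <- files %>%
  # Run the whole sort-and-mutate chain on each data frame in turn
  map(~ .x %>%
        arrange(desc(flow)) %>%
        mutate(area_portion = Area / sum(Area) * 100,
               flow_portion = flow / sum(flow) * 100,
               cum_area = cumsum(area_portion),
               cum_flow = cumsum(flow_portion))) %>%
  # Name the results last, at the end of the pipe
  set_names(paste0(frames, "_sorted"))

sort_files
```

Reading top to bottom, this keeps the data flowing through one pipe without nesting the map() call inside set_names().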

You could also define a function first...



f <- function(data) {

  # sort data by flow, largest first
  # ([[ ]] returns the column as a plain vector, which is what order() expects)
  data <- data[order(data[["flow"]], decreasing = TRUE), ]

  # apply your functions
  data[["area_portion"]] <- data[["Area"]] / sum(data[["Area"]]) * 100
  data[["flow_portion"]] <- data[["flow"]] / sum(data[["flow"]]) * 100
  data[["cum_area"]] <- cumsum(data[["area_portion"]])
  data[["cum_flow"]] <- cumsum(data[["flow_portion"]])
  data
}


...and then use lapply to, ahem, apply f to your list:



out <- lapply(files, f)
out
#[[1]]
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#[[2]]
#  Area flow area_portion flow_portion  cum_area cum_flow
#2    8    2     44.44444           50  44.44444       50
#1    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100


If you want to name the elements of out, you can use setNames:



out <- setNames(lapply(files, f), paste0(c("A", "B"), "_sorted"))
# or
# out <- setNames(lapply(files, f), paste0(unlist(frames), "_sorted"))

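A base R sketch of the other option raised in the comments: name files first. lapply() keeps the names of the list it is given, so the result comes out already named and no setNames() call is needed afterwards. The function from this answer is repeated here (with [[ ]] column access) so the block runs on its own.

```r
# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files <- list(A, B)
frames <- list("A", "B")

# Same steps as f() above: sort by flow (largest first), then add the columns
f <- function(data) {
  data <- data[order(data[["flow"]], decreasing = TRUE), ]
  data[["area_portion"]] <- data[["Area"]] / sum(data[["Area"]]) * 100
  data[["flow_portion"]] <- data[["flow"]] / sum(data[["flow"]]) * 100
  data[["cum_area"]] <- cumsum(data[["area_portion"]])
  data[["cum_flow"]] <- cumsum(data[["flow_portion"]])
  data
}

# Name the input list first; lapply() carries the names over to its result
names(files) <- paste0(unlist(frames), "_sorted")
out <- lapply(files, f)
out
```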

  • Create a function, of course! I should have thought of that; far simpler than a complex for loop! Thanks! – tom91, yesterday

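For comparison, a sketch of the question's original loop with the assign()/get() round trip and the manual counter n removed; it rebuilds the example files and frames from the question. Much of the clutter came from those two pieces rather than from the loop itself, and the apply-family versions are mainly a readability gain here: they are not guaranteed to run noticeably faster than a pre-allocated loop like this one.

```r
# Example data from the question
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files <- list(A, B)
frames <- list("A", "B")

# Pre-allocate a named list instead of growing it via assign()/get()
sort_files <- vector("list", length(files))
names(sort_files) <- paste0(unlist(frames), "_sorted")

# seq_along() supplies the index directly, so no manual counter is needed
for (i in seq_along(files)) {
  data <- files[[i]]
  data <- data[order(-data$flow), ]
  data$area_portion <- data$Area / sum(data$Area) * 100
  data$flow_portion <- data$flow / sum(data$flow) * 100
  data$cum_area <- cumsum(data$area_portion)
  data$cum_flow <- cumsum(data$flow_portion)
  sort_files[[i]] <- data
}

sort_files
```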

The lapply/dplyr answer above was posted yesterday by Ronak Shah (last edited yesterday).













  • This is excellent, and exactly the kind of thing I was after. Out of interest what are the benefits of going the tidyverse route?

    – tom91
    yesterday






  • 1





    @tom91 In this case not much benefit I would say. But some people find tidyverse more readable and easy to understand.

    – Ronak Shah
    yesterday













  • some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences : stackoverflow.com/questions/53118271/… ). (2) you don't need to unlist frames. (3) To make avoid these embedded parentheses over several lines you could put the set_names after a pipe in the end OR (and this is what I'd do), rename files instead so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.

    – Moody_Mudskipper
    yesterday











  • you would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

    – Moody_Mudskipper
    yesterday






  • 1





    @Moody_Mudskipper cool..Thanks. Updated the answer. Hope I did cover all the points you mentioned and in the right way :)

    – Ronak Shah
    yesterday





















  • This is excellent, and exactly the kind of thing I was after. Out of interest what are the benefits of going the tidyverse route?

    – tom91
    yesterday






  • 1





    @tom91 In this case not much benefit I would say. But some people find tidyverse more readable and easy to understand.

    – Ronak Shah
    yesterday













  • some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences : stackoverflow.com/questions/53118271/… ). (2) you don't need to unlist frames. (3) To make avoid these embedded parentheses over several lines you could put the set_names after a pipe in the end OR (and this is what I'd do), rename files instead so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.

    – Moody_Mudskipper
    yesterday











  • you would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

    – Moody_Mudskipper
    yesterday






  • 1





    @Moody_Mudskipper cool..Thanks. Updated the answer. Hope I did cover all the points you mentioned and in the right way :)

    – Ronak Shah
    yesterday



















This is excellent, and exactly the kind of thing I was after. Out of interest what are the benefits of going the tidyverse route?

– tom91
yesterday





This is excellent, and exactly the kind of thing I was after. Out of interest what are the benefits of going the tidyverse route?

– tom91
yesterday




1




1





@tom91 In this case not much benefit I would say. But some people find tidyverse more readable and easy to understand.

– Ronak Shah
yesterday







@tom91 In this case not much benefit I would say. But some people find tidyverse more readable and easy to understand.

– Ronak Shah
yesterday















some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences : stackoverflow.com/questions/53118271/… ). (2) you don't need to unlist frames. (3) To make avoid these embedded parentheses over several lines you could put the set_names after a pipe in the end OR (and this is what I'd do), rename files instead so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.

– Moody_Mudskipper
yesterday





some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same but has a few differences : stackoverflow.com/questions/53118271/… ). (2) you don't need to unlist frames. (3) To make avoid these embedded parentheses over several lines you could put the set_names after a pipe in the end OR (and this is what I'd do), rename files instead so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.

– Moody_Mudskipper
yesterday













you would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

– Moody_Mudskipper
yesterday





you would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

– Moody_Mudskipper
yesterday




1




1





@Moody_Mudskipper cool..Thanks. Updated the answer. Hope I did cover all the points you mentioned and in the right way :)

– Ronak Shah
yesterday







@Moody_Mudskipper cool..Thanks. Updated the answer. Hope I did cover all the points you mentioned and in the right way :)

– Ronak Shah
yesterday















6














You could also define a function first ..



f <- function(data) {

# sort data by flow
data <- data[order(data['flow'], decreasing = TRUE), ]

# apply your functions
data["area_portion"] <- data['Area'] / sum(data['Area']) * 100
data["flow_portion"] <- data['flow'] / sum(data['flow']) * 100
data["cum_area"] <- cumsum(data['area_portion'])
data["cum_flow"] <- cumsum(data['flow_portion'])
data
}


.. and use lapply to, ahhm, apply f to your list



out <- lapply(files, f)
out
#[[1]]
# Area flow area_portion flow_portion cum_area cum_flow
#1 4 1 26.66667 33.33333 26.66667 33.33333
#2 6 1 40.00000 33.33333 66.66667 66.66667
#3 5 1 33.33333 33.33333 100.00000 100.00000

#[[2]]
# Area flow area_portion flow_portion cum_area cum_flow
#2 8 2 44.44444 50 44.44444 50
#1 6 1 33.33333 25 77.77778 75
#3 4 1 22.22222 25 100.00000 100


If you want to change the names of out you can use setNames



out <- setNames(lapply(files, f), paste0(c("A", "B"), "_sorted"))
# or
# out <- setNames(lapply(files, f), paste0(unlist(frames), "_sorted"))





share|improve this answer





















  • 2





    Creat a function, of course! I should of thought of that, far simpler than a complex for loop! Thanks!

    – tom91
    yesterday
















answered yesterday









markus








