How can I avoid complex for loops?

I am currently working with a series of large datasets and I'm trying to improve how I write scripts in R. I mostly use for loops, which I know can be cumbersome and slow, especially with very large datasets.

I have heard a lot of people recommending the apply() family to avoid complex for loops, but I am struggling to get my head around using them to apply multiple functions in one go.

Here is some simple example data:

A <- data.frame('Area' = c(4, 6, 5),
                'flow' = c(1, 1, 1))

B <- data.frame('Area' = c(6, 8, 4),
                'flow' = c(1, 2, 1))

files <- list(A, B)
frames <- list('A', 'B')

What I want to do is sort each data frame by the 'flow' variable in descending order, then add columns for the percentage of total 'flow' and 'Area' that each row represents, and finally add two further columns holding the cumulative percentage of each variable.

Currently I use this for loop:

sort_files <- list()
n <- 1

for(i in files){
  name <- frames[n]
  nom <- paste(name,'_sorted', sep = '')
  data <- i[order(-i$flow),]
  area <- sum(i$Area)
  total <- sum(i$flow)
  data$area_portion <- (data$Area/area)*100
  data$flow_portion <- (data$flow/total)*100
  data$cum_area <- cumsum(data$area_portion)
  data$cum_flow <- cumsum(data$flow_portion)
  assign(nom, data)
  df <- get(paste(name,'_sorted', sep = ''))
  sort_files[[nom]] <- df
  n <- n + 1
}

This works, but it seems overly complex and ugly, and I suspect it runs far slower than a better-written script would.

How can I simplify and neaten up the above code?

This is the expected output:

sort_files

$A_sorted
  Area flow area_portion flow_portion  cum_area  cum_flow
1    4    1     26.66667     33.33333  26.66667  33.33333
2    6    1     40.00000     33.33333  66.66667  66.66667
3    5    1     33.33333     33.33333 100.00000 100.00000

$B_sorted
  Area flow area_portion flow_portion  cum_area cum_flow
2    8    2     44.44444           50  44.44444       50
1    6    1     33.33333           25  77.77778       75
3    4    1     22.22222           25 100.00000      100

asked yesterday by tom91
edited yesterday by double-beep

You didn't define pos_files in the script above. Also, names is a function, so you had better not define an object with that name.

– markus
yesterday

av_portion is also missing, although I understand it's the mean of Area. files is also an R function.

– patL
yesterday

@tom91: can you add the expected output too?

– Tung
yesterday

@markus and patL Sorry! I just realised I copied over the script with the actual variable names and not the test one. I have updated it now.

– tom91
yesterday

@Tung Expected output has been added to the bottom

– tom91
yesterday

Tags: r, for-loop


2 Answers

Using lapply to loop over files and dplyr's mutate to add the new columns:

library(dplyr)

setNames(lapply(files, function(x)
          x %>%
            arrange(desc(flow)) %>%
            mutate(area_portion = Area/sum(Area)*100,
                   flow_portion = flow/sum(flow) * 100,
                   cum_area = cumsum(area_portion),
                   cum_flow = cumsum(flow_portion))
), paste0(frames, "_sorted"))

#$A_sorted
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#$B_sorted
#  Area flow area_portion flow_portion  cum_area cum_flow
#1    8    2     44.44444           50  44.44444       50
#2    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100

Or, going fully the tidyverse way, we can replace lapply with map and setNames with set_names:

library(tidyverse)

map(set_names(files, str_c(frames, "_sorted")),
  . %>% arrange(desc(flow)) %>%
  mutate(area_portion = Area/sum(Area)*100,
         flow_portion = flow/sum(flow) * 100,
         cum_area = cumsum(area_portion),
         cum_flow = cumsum(flow_portion)))

Updated the tidyverse approach following some pointers from @Moody_Mudskipper.

answered yesterday by Ronak Shah (edited yesterday)

This is excellent, and exactly the kind of thing I was after. Out of interest, what are the benefits of going the tidyverse route?

– tom91
yesterday

@tom91 In this case, not much benefit, I would say. But some people find tidyverse code more readable and easier to understand.

– Ronak Shah
yesterday

Some very minor points, forgive me for scratching that itch: (1) if you really want to go full tidyverse you can use str_c (it's almost the same as paste0 but has a few differences: stackoverflow.com/questions/53118271/… ). (2) You don't need to unlist frames. (3) To avoid these embedded parentheses spread over several lines, you could put the set_names after a pipe at the end OR (and this is what I'd do) rename files instead, so you get the naming done ASAP. (4) function(x) x %>% can be replaced by a functional chain . %>%.

– Moody_Mudskipper
yesterday

You would end up with something starting with map(set_names(files, str_c(frames, "_sorted")), . %>% arrange(...

– Moody_Mudskipper
yesterday

@Moody_Mudskipper Cool, thanks. Updated the answer. I hope I covered all the points you mentioned, and in the right way :)

– Ronak Shah
yesterday

You could also define a function first ..

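f <- function(data) {

  # sort data by flow, largest first
  # ([[ ]] extracts each column as a plain vector rather than a one-column data frame)
  data <- data[order(data[["flow"]], decreasing = TRUE), ]

  # percentage of the total that each row represents
  data[["area_portion"]] <- data[["Area"]] / sum(data[["Area"]]) * 100
  data[["flow_portion"]] <- data[["flow"]] / sum(data[["flow"]]) * 100

  # running totals of those percentages
  data[["cum_area"]] <- cumsum(data[["area_portion"]])
  data[["cum_flow"]] <- cumsum(data[["flow_portion"]])

  data
}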

.. and use lapply to, ahhm, apply f to your list

out <- lapply(files, f)
out

#[[1]]
#  Area flow area_portion flow_portion  cum_area  cum_flow
#1    4    1     26.66667     33.33333  26.66667  33.33333
#2    6    1     40.00000     33.33333  66.66667  66.66667
#3    5    1     33.33333     33.33333 100.00000 100.00000

#[[2]]
#  Area flow area_portion flow_portion  cum_area cum_flow
#2    8    2     44.44444           50  44.44444       50
#1    6    1     33.33333           25  77.77778       75
#3    4    1     22.22222           25 100.00000      100
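A side note on the row labels in the output above: the 2, 1, 3 on the second data frame are the original row names, carried along when the rows are reordered with base R indexing, whereas the dplyr output in the other answer shows 1, 2, 3. If sequential labels are preferred, a minimal sketch (variable names here are illustrative only) is to reset the row names after sorting:

```r
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))

# Reorder rows by descending flow; base R keeps the original row names.
sorted <- B[order(-B$flow), ]
kept_labels <- rownames(sorted)

# Assigning NULL resets the row names to a plain 1..n sequence.
rownames(sorted) <- NULL
reset_labels <- rownames(sorted)

print(kept_labels)
print(reset_labels)
```

Whether to reset them is a matter of taste; the computed columns are the same either way.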

If you want to change the names of out, you can use setNames:

out <- setNames(lapply(files, f), paste0(c("A", "B"), "_sorted"))
# or
# out <- setNames(lapply(files, f), paste0(unlist(frames), "_sorted"))
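Another option, offered as a sketch rather than as part of the answer, is to name the input list before calling lapply, since lapply keeps the names of the list it is given. The helper name add_portions is hypothetical and only mirrors the steps of f; building files as a named list (instead of keeping a separate frames object) is likewise an assumption made for illustration.

```r
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))

# Name the list up front so labels and data cannot drift out of sync.
files <- list(A = A, B = B)
names(files) <- paste0(names(files), "_sorted")

# Hypothetical helper mirroring f: sort by flow, add percentage and cumulative columns.
add_portions <- function(df) {
  df <- df[order(-df$flow), ]
  df$area_portion <- df$Area / sum(df$Area) * 100
  df$flow_portion <- df$flow / sum(df$flow) * 100
  df$cum_area <- cumsum(df$area_portion)
  df$cum_flow <- cumsum(df$flow_portion)
  df
}

# lapply() returns a list carrying the same names as its input.
out <- lapply(files, add_portions)
print(names(out))
```

With this layout no setNames call is needed afterwards, and individual results can be reached as out$A_sorted or out[["B_sorted"]].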

answered yesterday by markus (edited yesterday)

Create a function, of course! I should have thought of that. It is far simpler than a complex for loop. Thanks!

– tom91
yesterday
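As a hedged footnote to this exchange: once the steps live in a function, the loop itself also becomes short, because assign() and get() are no longer needed. The sketch below uses a hypothetical helper, add_portions, that follows the same steps as the question's loop, and checks that a plain loop and lapply give the same result. The move to lapply is mainly about readability; it is not guaranteed to be faster than a loop that writes into a preallocated list.

```r
A <- data.frame(Area = c(4, 6, 5), flow = c(1, 1, 1))
B <- data.frame(Area = c(6, 8, 4), flow = c(1, 2, 1))
files  <- list(A, B)
frames <- c("A", "B")

# Hypothetical helper: the per-data-frame steps from the question.
add_portions <- function(df) {
  df <- df[order(-df$flow), ]
  df$area_portion <- df$Area / sum(df$Area) * 100
  df$flow_portion <- df$flow / sum(df$flow) * 100
  df$cum_area <- cumsum(df$area_portion)
  df$cum_flow <- cumsum(df$flow_portion)
  df
}

# Loop version: fill a preallocated, named list directly.
loop_out <- vector("list", length(files))
names(loop_out) <- paste0(frames, "_sorted")
for (k in seq_along(files)) {
  loop_out[[k]] <- add_portions(files[[k]])
}

# lapply version of the same computation.
apply_out <- setNames(lapply(files, add_portions), paste0(frames, "_sorted"))

# TRUE when both approaches produce exactly the same list.
same <- identical(loop_out, apply_out)
print(same)
```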
