# Mapping a list of functions to a list of datasets with a list of columns as arguments

[This article was first published on **Econometrics and Free Software**, and kindly contributed to R-bloggers].


This week I had the opportunity to teach R at my workplace, again. This course was the “advanced R” course, and unlike the one I taught at the end of last year, I had one more day (so 3 days in total) where I could show my colleagues the joys of the `tidyverse` and R.

To finish the section on programming with R, which was the very last section of the whole 3-day course, I wanted to blow their minds. I had already shown them packages from the `tidyverse` in the previous days, such as `dplyr`, `purrr` and `stringr`, among others. I taught them how to use `ggplot2`, `broom` and `modelr`. They also liked `janitor` and `rio` very much. I noticed that it took them a bit more time and effort to digest `purrr::map()` and `purrr::reduce()`, but they all seemed to see how powerful these functions were. To finish on a very high note, I showed them the ultimate `purrr::map()` use case.

Consider the following: imagine you are working on a list of datasets. These datasets might be the same, but for different years, or for different countries, or they might be completely different datasets. If you use `rio::import_list()` to read them into R, you get them in a nice list. Let’s consider the following list as an example:

    library(tidyverse)

    data(mtcars)
    data(iris)

    data_list = list(mtcars, iris)
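As an aside, here is a minimal sketch of how such a list could come from files on disk with `rio::import_list()`. The temporary directory and the file names are made up for the illustration; in practice you would point `list.files()` at your own folder.

```r
library(rio)

# Write two example datasets to a temporary directory
# (hypothetical file names, only used for this illustration)
tmp_dir <- tempfile("datasets_")
dir.create(tmp_dir)
write.csv(mtcars, file.path(tmp_dir, "mtcars.csv"), row.names = FALSE)
write.csv(iris, file.path(tmp_dir, "iris.csv"), row.names = FALSE)

# Collect the file paths and read every file into one list of data frames
paths <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
data_from_disk <- import_list(paths)

length(data_from_disk)
```

Each element of `data_from_disk` is a data frame, so the list can be passed to `purrr` functions exactly like `data_list`.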

I made the choice to have completely different datasets. Now, I would like to map some functions to the columns of these datasets. If I only worked on one, for example on `mtcars`, I would do something like:

    my_summarise_f = function(dataset, cols, funcs){
      dataset %>%
        summarise_at(vars(!!!cols), funs(!!!funcs))
    }

And then I would use my function like so:

    mtcars %>%
      my_summarise_f(quos(mpg, drat, hp), quos(mean, sd, max))
    ##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
    ## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
    ##   hp_max
    ## 1    335
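A note for readers on recent versions of `dplyr`: `funs()` has been deprecated since `dplyr` 0.8.0, and `summarise_at()` is superseded by `across()`. Here is a sketch of how the same idea could look with `across()`. The name `my_summarise_across()` is hypothetical, and unlike the function above it assumes that `cols` is a character vector of column names and `funcs` a named list of functions, rather than quosures.

```r
library(dplyr)

# Hypothetical across()-based variant of my_summarise_f().
# Assumes: `cols` is a character vector of column names,
#          `funcs` is a named list of functions.
my_summarise_across <- function(dataset, cols, funcs) {
  dataset %>%
    summarise(across(all_of(cols), funcs))
}

res <- my_summarise_across(
  mtcars,
  cols  = c("mpg", "drat", "hp"),
  funcs = list(mean = mean, sd = sd, max = max)
)

names(res)
```

The result columns are still named `<column>_<function>`, but they come out grouped by column (`mpg_mean`, `mpg_sd`, `mpg_max`, ...) rather than by function.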

`my_summarise_f()` takes a dataset, a list of columns and a list of functions as arguments and uses tidy evaluation (the `!!!` operator splices each list of quosures into `vars()` and `funs()`) to apply `mean()`, `sd()`, and `max()` to the columns `mpg`, `drat` and `hp` of `mtcars`. That’s pretty useful, but not useful enough! Now I want to apply this to the list of datasets I defined above. For this, let’s define the list of columns I want to work on:

    cols_mtcars = quos(mpg, drat, hp)
    cols_iris = quos(Sepal.Length, Sepal.Width)

    cols_list = list(cols_mtcars, cols_iris)

Now, let’s use some `purrr` magic to apply the functions I want to the columns I have defined in `cols_list`:

    map2(data_list, cols_list, my_summarise_f, funcs = quos(mean, sd, max))
    ## [[1]]
    ##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
    ## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
    ##   hp_max
    ## 1    335
    ##
    ## [[2]]
    ##   Sepal.Length_mean Sepal.Width_mean Sepal.Length_sd Sepal.Width_sd
    ## 1          5.843333         3.057333       0.8280661      0.4358663
    ##   Sepal.Length_max Sepal.Width_max
    ## 1              7.9             4.4
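If `map2()` still feels a bit magical, it can help to write the equivalent loop once. The sketch below uses a deliberately simple helper, `col_stat()`, which is hypothetical and works on character column names, so that it runs without any tidy evaluation. The point is only to show that `map2()` walks both lists in parallel and passes any extra argument unchanged to every call.

```r
library(purrr)

data_list <- list(mtcars, iris)
cols_list <- list(c("mpg", "hp"), c("Sepal.Length", "Sepal.Width"))

# Hypothetical helper: apply `stat` to each selected column of `dataset`
col_stat <- function(dataset, cols, stat) {
  sapply(dataset[cols], stat)
}

# map2() pairs the i-th dataset with the i-th set of columns;
# `stat = mean` is forwarded as-is to every call
via_map2 <- map2(data_list, cols_list, col_stat, stat = mean)

# The same computation written as an explicit loop
via_loop <- vector("list", length(data_list))
for (i in seq_along(data_list)) {
  via_loop[[i]] <- col_stat(data_list[[i]], cols_list[[i]], stat = mean)
}

identical(via_map2, via_loop)
```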

That’s pretty useful, but not useful enough! I also want to apply different functions to different datasets!

Well, let’s define a list of functions then:

    funcs_mtcars = quos(mean, sd, max)
    funcs_iris = quos(median, min)

    funcs_list = list(funcs_mtcars, funcs_iris)

Because there is no `map3()`, we need to use `pmap()`:

    pmap(
      list(
        dataset = data_list,
        cols = cols_list,
        funcs = funcs_list
      ),
      my_summarise_f)
    ## [[1]]
    ##   mpg_mean drat_mean  hp_mean   mpg_sd   drat_sd    hp_sd mpg_max drat_max
    ## 1 20.09062  3.596563 146.6875 6.026948 0.5346787 68.56287    33.9     4.93
    ##   hp_max
    ## 1    335
    ##
    ## [[2]]
    ##   Sepal.Length_median Sepal.Width_median Sepal.Length_min Sepal.Width_min
    ## 1                 5.8                  3              4.3               2

Now I’m satisfied! Let me tell you, this blew their minds!
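One detail of `pmap()` that is easy to miss: when the elements of the list are named, they are matched to the function’s arguments by name, so their order in the list does not matter. The sketch below illustrates this with a hypothetical `across()`-based helper, `summarise_cols()`, which assumes character column names and named lists of functions instead of quosures.

```r
library(dplyr)
library(purrr)

# Hypothetical helper. Assumes: `cols` is a character vector,
# `funcs` is a named list of functions.
summarise_cols <- function(dataset, cols, funcs) {
  summarise(dataset, across(all_of(cols), funcs))
}

# The elements are deliberately NOT in the order of the function's
# arguments: pmap() matches them by name
args <- list(
  funcs   = list(list(mean = mean, sd = sd), list(median = median, min = min)),
  cols    = list(c("mpg", "hp"), c("Sepal.Length", "Sepal.Width")),
  dataset = list(mtcars, iris)
)

results <- pmap(args, summarise_cols)

names(results[[2]])
```

With an unnamed list, the elements would be matched by position instead, so naming them is the safer habit.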

To be able to use things like that, I told them to always solve a problem for a single example, and from there, try to generalize their solution using functional programming tools found in `purrr`.

If you found this blog post useful, you might want to follow me on twitter for blog post updates.
