class: title-slide, center, middle

<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.6.0/css/all.css" integrity="sha384-aOkxzJ5uQz7WBObEZcHvV5JvRW3TUc2rNPA7pe3AwnsUohiw1Vj2Rgx2KSOkF5+h" crossorigin="anonymous">

<style>
.center2 {
  margin: 0;
  position: absolute;
  top: 50%;
  left: 50%;
  -ms-transform: translate(-50%, -50%);
  transform: translate(-50%, -50%);
}
.rcorners1 {
  margin: auto;
  border-radius: 25px;
  background: #ada500;
  padding: 10px;
  /* width: 50%; */
}
</style>

<style type="text/css">
.right-column{
  padding-top: 0;
}
.remark-code, .remark-inline-code {
  font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;
  font-size: 90%;
}
</style>

<div class="my-logo-left"> <img src="img/edubron-en-rgb.jpg" width="100%" /> </div>

<div class="my-logo-right"> <img src="img/Logo Methods Hub.png" width="100%"/> </div>

# A reProducible woRkflow with Quarto

.font160[
.SW-greenD[Part 3]
]

.font120[
.SW-greenD[*Data manipulation with*]
.UA-red[*`dplyr`*]
]

Sven De Maeyer & Tine van Daal

.font80[
.UA-red[
2nd - 3rd March, 2026
]
]

---
class: inverse-green, left

# Overview

.center2[
- Tidyverse --- ([Click here](#part1))

- The `dplyr` package --- ([Click here](#part2))
]

---
class: inverse-green, center, middle
name: part1

# 1. Tidyverse

---

## Welcome to the .UA-red[`tidyverse`]

.center2[
<img src="tidyverse_data_science.png" alt="" width="100%" height="100%" />
]

---

## Why .UA-red[`tidyverse`]?

<br>

More accessible for beginners

<br>

Consistent approach for all potential tasks

<br>

Powerful potential applications with minimum 'effort'

<br>

Can give you the confidence to explore `R`

---

## Tibble

Normally we work with a .SW-greenD[dataframe] in `R`, but data can also come in more complex structures (e.g., lists, matrices, ...)
In the `tidyverse` ecosystem we work with a simple form of data-structure: a `tibble`

A tibble is a dataframe that fits the **tidy data** principle

.footnotesize[

``` r
Friends
```

```
## # A tibble: 108 × 4
##    student occassion condition fluency
##      <dbl>     <dbl>     <dbl>   <dbl>
##  1       1         1         1   101.
##  2       1         2         1   104.
##  3       1         3         1   117.
##  4       2         1         2    98.8
##  5       2         2         2   107.
##  6       2         3         2   111.
##  7       3         1         3   105.
##  8       3         2         3   102.
##  9       3         3         3   101.
## 10       4         1         1   102.
## # ℹ 98 more rows
```
]

---

## What is **tidy data**?

<img src="tidydata_1.jpeg" alt="" width="80%" height="80%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---

## What is **tidy data**?

<img src="tidydata_2.jpeg" alt="" width="80%" height="80%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---

## What is **tidy data**?

<img src="tidydata_3.jpeg" alt="" width="80%" height="80%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---
class: inverse-green, center, middle
name: part2

# 2. The .UA-red[`dplyr`] package

---

## .UA-red[`dplyr`] ...

.Large[is THE package to work with tidy data!]
<br>

<br>

.SW-greenD[**VERBS**] are at the core:

- `filter()`
- `mutate()`
- `select()`
- `group_by() + summarise()`
- `arrange()`
- `rename()`
- `relocate()`
- `*_join()` (e.g., `left_join()`, `inner_join()`)

---

<img src="dplyr_cheatsheet.jpg" alt="" width="60%" height="60%" />

https://raw.githubusercontent.com/rstudio/cheatsheets/master/data-transformation.pdf

---

## The .UA-red[`%>%`] operator (a 'pipe')

.left-column[
<img src="magrittr_stxndz.png" alt="" width="100%" height="100%" />
<br>
<p align="center">To create <br>.SW-greenD[**a chain of functions**] </p>
]

.right-column[
Instead of

``` r
mean(c(1,2,3,4))
```

or

``` r
Numbers <- c(1,2,3,4)
mean(Numbers)
```

you can do

``` r
c(1,2,3,4) %>% mean( )
```

With the **`%>%`** you can write a sentence like:

> *I .UA-red[`%>%`] woke up .UA-red[`%>%`], took a shower .UA-red[`%>%`], got breakfast .UA-red[`%>%`], took the train .UA-red[`%>%`] and arrived at the ICO course .UA-red[`%>%`] …*
]

---

## .UA-red[`filter()`]

<img src="dplyr_filter.jpeg" alt="" width="70%" height="70%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---

## Let's apply .UA-red[`filter()`]

With the Friends data:

> .SW-greenD[*We keep only the observations from the first measurement occasion in condition 1*]

``` r
Friends_Occ1 <- Friends %>%
  filter(occassion == 1 & condition == 1)
```

.UA-red[`==`] means *equal to* (notice the two `=` signs!)
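A minimal sketch of the same filter on a small made-up tibble. The `toy` data and its values are hypothetical (this is not the Friends data), and the sketch assumes the `dplyr` package is installed:

```r
library(dplyr)  # provides filter() and the %>% pipe

# Hypothetical toy data, only for illustration
toy <- tibble::tibble(
  student   = c(1, 1, 2, 2),
  occasion  = c(1, 2, 1, 2),
  condition = c(1, 1, 2, 2)
)

# Keep only the rows where BOTH conditions are TRUE
toy_occ1 <- toy %>%
  filter(occasion == 1 & condition == 1)

toy_occ1
```

Separating the conditions with a comma, as in `filter(occasion == 1, condition == 1)`, gives the same result as `&`.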
> .SW-greenD[*Let's clean the data: we keep only the observations with a fluency value below 300 that is not equal to 0*]

``` r
Friends_clean <- Friends %>%
  filter(fluency < 300 & fluency != 0)
```

.UA-red[`!=`] means *not equal to*

---

## .UA-red[`mutate()`]

<img src="dplyr_mutate.png" alt="" width="50%" height="50%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---

## Let's apply .UA-red[`mutate()`]

With the Friends data:

> .SW-greenD[*We calculate a new variable containing the fluency scores minus the average of fluency*]

``` r
Friends <- Friends %>%
  mutate(
    fluency_centered = fluency - mean(fluency, na.rm = TRUE)
  )
```

---

## Let's apply .UA-red[`mutate()`]

With the Friends data:

> .SW-greenD[*We create a factor for condition*]

``` r
Friends <- Friends %>%
  mutate(
    condition_factor = as.factor(condition)
  )

str(Friends$condition_factor)
```

```
##  Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3 1 ...
```

---

## Let's apply .UA-red[`select()`]

.font-size140[To **select** variables.]

Some examples with the Friends data:

> .SW-greenD[*We only select `condition` and `occassion` and inspect the result with the `str()` function*]

.footnotesize[

``` r
Friends %>%
  select(
    condition,
    occassion
  ) %>%
  str()
```

```
## tibble [108 × 2] (S3: tbl_df/tbl/data.frame)
##  $ condition: num [1:108] 1 1 1 2 2 2 3 3 3 1 ...
##   ..- attr(*, "value.labels")= Named chr [1:3] "3" "2" "1"
##   .. ..- attr(*, "names")= chr [1:3] "No subtitles" "Spanish" "English"
##  $ occassion: num [1:108] 1 2 3 1 2 3 1 2 3 1 ...
##  - attr(*, "variable.labels")= Named chr(0)
##   ..- attr(*, "names")= chr(0)
##  - attr(*, "codepage")= int 1252
```
]

---

## Rename variables with .UA-red[`rename()`]

Notice how the variable `occassion` is misspelled! Pretty annoying when coding...

But we can easily **rename** variables.
The syntax is `rename(new_name = old_name)`

> .SW-greenD[*Rename the variable `occassion` to `occasion`* ]

.footnotesize[

``` r
Friends <- Friends %>%
  rename(
    occasion = occassion
  )
```
]

---

## Super combo 1: .UA-red[`group_by() + summarize( )`]

Transform a tibble into a *grouped tibble* with `group_by()`

Calculate summary statistics per group with `summarize()`

> .SW-greenD[*Calculate the average and the standard deviation of fluency per condition* ]

.footnotesize[

``` r
Friends %>%
  group_by(
    condition
  ) %>%
  summarize(
    mean_fluency = mean(fluency),
    sd_fluency = sd(fluency)
  )
```

```
## # A tibble: 3 × 3
##   condition mean_fluency sd_fluency
##       <dbl>        <dbl>      <dbl>
## 1         1         109.       9.08
## 2         2         108.       6.02
## 3         3         103.       4.17
```
]

---

## Super combo 1: .UA-red[`group_by() + summarize( )`]

> .SW-greenD[*Calculate the number of observations for each combination of condition and occasion* ]

.footnotesize[

``` r
Friends %>%
  group_by(
    occasion,
    condition
  ) %>%
  summarize(
    n_observations = n()
  )
```

```
## # A tibble: 9 × 3
## # Groups:   occasion [3]
##   occasion condition n_observations
##      <dbl>     <dbl>          <int>
## 1        1         1             12
## 2        1         2             12
## 3        1         3             12
## 4        2         1             12
## 5        2         2             12
## 6        2         3             12
## 7        3         1             12
## 8        3         2             12
## 9        3         3             12
```
]

---

## Super combo 2: .UA-red[`mutate() + case_when( )`]

<img src="dplyr_case_when_sm.png" alt="" width="70%" height="70%" />

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---

## Super combo 2: .UA-red[`mutate() + case_when( )`]

To **recode** variables into new variables!
.pull-left[
> .SW-greenD[*We create a new categorical version of fluency with 3 groups, then we select fluency and this new variable and have a look at the first 5 observations...* ]]

.pull-right[
.footnotesize[

``` r
Friends %>%
  mutate(
    fluency_grouped = case_when(
      fluency < 106.625 - 7.1 ~ 'low',
      fluency >= 106.625 - 7.1 & fluency < 106.625 + 7.1 ~ 'average',
      fluency >= 106.625 + 7.1 ~ 'high'
    )
  ) %>%
  select(
    fluency,
    fluency_grouped
  ) %>%
  head(5)
```

```
## # A tibble: 5 × 2
##   fluency fluency_grouped
##     <dbl> <chr>
## 1   101.  average
## 2   104.  average
## 3   117.  high
## 4    98.8 low
## 5   107.  average
```
]
]

---

## How to define conditions

<br>

.UA-red[`x == y`] `\(\rightarrow\)` 'x is **equal** to y'

.UA-red[`x != y` ] `\(\rightarrow\)` 'x is **NOT equal** to y'

<br>

.UA-red[`x < y`] `\(\rightarrow\)` 'x is **smaller** than y'

.UA-red[`x <= y`] `\(\rightarrow\)` 'x is **smaller than or equal** to y'

<br>

.UA-red[`x > y`] `\(\rightarrow\)` 'x is **greater** than y'

.UA-red[`x >= y`] `\(\rightarrow\)` 'x is **greater than or equal** to y'

---

## Boolean operators

<br>

We can combine conditions!

<br>

<br>

.large[.UA-red[`&`]] represents the Boolean operator **AND**<br>
.footnotesize[*for example: `gender == 1 & age <= 18`*]

<br>

.large[.UA-red[`|`]] represents the Boolean operator **OR**<br>
.footnotesize[*for example: `gender == 1 | gender == 2`*]

<br>

.large[.UA-red[`!`]] represents the Boolean operator **NOT**<br>
.footnotesize[*for example: `gender == 1 & !(age <= 18)`*]

---

## Interactive tutorial about .UA-red[`dplyr`]

Looking for more material and a place to practise your skills?

This free online tutorial (made with the `learnr` package) is strongly recommended!
<img src="Interactive_tutorial_dplyr.jpg" alt="" width="50%" height="50%" />

https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome

---
class: inverse-blue

# <i class="fas fa-laptop-code" style="color: #FF0035;"></i> Exercise `dplyr`

.left-column[  ]

.right-column[
- You can find the qmd-file .SW-greenD[ `Exercises_dplyr.qmd`] in the Exercises folder of the project you created yesterday (Exercises > Exercise2_dplyr)

- Open this document

- You get a set of tasks with empty code blocks to start coding

- Write and test the necessary code

- Stuck? No worries!
    - We are here to help
    - Help each other
    - There is a solution key (.SW-greenD[`Exercises_dplyr_solutions.qmd`])
]
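
---

## Chaining the verbs

A minimal sketch of how the verbs from this part can be chained in one pipeline. The `toy` tibble, its values and the cut-off of 100 are made up for illustration (this is not the Friends data), and the sketch assumes the `dplyr` package is installed:

```r
library(dplyr)

# Hypothetical toy data, only for illustration
toy <- tibble::tibble(
  student   = c(1, 2, 3, 4, 5, 6),
  condition = c(1, 1, 2, 2, 3, 3),
  fluency   = c(101, 104, 98, 0, 105, 111)
)

toy_summary <- toy %>%
  filter(fluency != 0) %>%              # drop the score of 0
  mutate(
    fluency_grouped = case_when(        # recode into two groups
      fluency < 100 ~ "low",            # hypothetical cut-off
      TRUE          ~ "high"            # everything else
    )
  ) %>%
  group_by(condition) %>%               # one group per condition
  summarise(
    mean_fluency = mean(fluency),
    n_high       = sum(fluency_grouped == "high")
  ) %>%
  arrange(desc(mean_fluency))           # highest mean first

toy_summary
```

Each step receives the result of the previous one, so the pipeline reads from top to bottom like the sentence on the pipe slide.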