A reProducible woRkflow with Quarto

class: title-slide, center, middle

<style>

.center2 {
  margin: 0;
  position: absolute;
  top: 50%;
  left: 50%;
  -ms-transform: translate(-50%, -50%);
  transform: translate(-50%, -50%);
}

.rcorners1 {
  margin: auto;
  border-radius: 25px;
  background: #ada500;
  padding: 10px;
  /* width: 50%; */
}

.remark-code, .remark-inline-code {
  font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;
  font-size: 90%;
}

</style>

# A reProducible woRkflow with Quarto
.font160[
.SW-greenD[Part 3]
]
.font120[
.SW-greenD[*Data manipulation with*] .UA-red[*`dplyr`*]
]
Sven De Maeyer & Tine van Daal

.font80[
.UA-red[
2nd - 3rd March, 2026
]
]

---
class: inverse-green, left

# Overview

.center2[
- Tidyverse --- ([Click here](#part1))
- The `dplyr` package --- ([Click here](#part2))

]

---
class: inverse-green, center, middle
name: part1

# 1. Tidyverse

---

## Welcome to the .UA-red[`tidyverse`]

.center2[
<img src="tidyverse_data_science.png" alt="" width="100%" height="100%" />
]

---

## Why .UA-red[`tidyverse`]?
<br>
More accessible for beginners
<br>

Consistent approach for all potential tasks
<br>

Powerful potential applications with minimum 'effort'
<br>

Can give you the confidence to explore `R`
---
## Tibble

Normally we work with a .SW-greenD[dataframe] in `R`, but `R` also has other, more complex data structures (e.g., lists, matrices, ...)

In the `tidyverse` ecosystem we work with one simple data structure: a `tibble`

A tibble is a modern version of the dataframe that is well suited to the **tidy data** principle

.footnotesize[

``` r
Friends
```

```
## # A tibble: 108 × 4
##    student occassion condition fluency
##      <dbl>     <dbl>     <dbl>   <dbl>
##  1       1         1         1   101. 
##  2       1         2         1   104. 
##  3       1         3         1   117. 
##  4       2         1         2    98.8
##  5       2         2         2   107. 
##  6       2         3         2   111. 
##  7       3         1         3   105. 
##  8       3         2         3   102. 
##  9       3         3         3   101. 
## 10       4         1         1   102. 
## # ℹ 98 more rows
```

]
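
---
## Creating a .UA-red[`tibble`]

A minimal sketch of how a tibble could be built by hand or converted from a classic dataframe. The object names and values are made up for illustration and are not part of the Friends data.

``` r
library(dplyr)  # dplyr re-exports tibble() and as_tibble()

# Build a small tibble by hand (made-up values)
toy <- tibble(
  student = c(1, 1, 2),
  fluency = c(101, 104, 98.8)
)

# Convert a classic dataframe into a tibble
toy_df  <- data.frame(student = c(1, 2), fluency = c(101, 98.8))
toy_tbl <- as_tibble(toy_df)

class(toy_tbl)
```

A tibble is still a dataframe, so functions that expect a dataframe generally keep working.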

---
## What is **tidy data**?

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---
class: inverse-green, center, middle
name: part2

# 2. The .UA-red[`dplyr`] package

---

## .UA-red[`dplyr`] ...

.Large[is THE package for working with tidy data!]

.SW-greenD[**VERBS**] are at the core:

- `filter()`
- `mutate()`
- `select()`
- `group_by() + summarise()`
- `arrange()`
- `rename()`
- `relocate()`
- `left_join()` (and the other `*_join()` functions)
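
---

## Sorting, moving and joining

Sorting rows, moving columns and joining tables are not worked out in the slides that follow, so here is a minimal sketch. The two small tibbles are made up for illustration, and `left_join()` is used as one member of the join family.

``` r
library(dplyr)

# Made-up scores and a made-up lookup table with condition labels
scores <- tibble(
  student   = c(1, 2, 3),
  condition = c(1, 2, 3),
  fluency   = c(101, 98.8, 105)
)
labels <- tibble(
  condition = c(1, 2, 3),
  subtitles = c("English", "Spanish", "No subtitles")
)

result <- scores %>%
  arrange(desc(fluency)) %>%                # sort rows, highest fluency first
  relocate(fluency, .before = student) %>%  # move fluency to the front
  left_join(labels, by = "condition")       # add the label of each condition

result
```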

---

The `dplyr` cheatsheet: https://raw.githubusercontent.com/rstudio/cheatsheets/master/data-transformation.pdf
---

## The .UA-red[`%>%`] operator (a 'pipe')
.left-column[
<img src="magrittr_stxndz.png" alt="" width="100%" height="100%" />
<br>
<p align="center">To create <br>.SW-greenD[**a chain of functions**] </p>

]

.right-column[

Instead of

``` r
mean(c(1,2,3,4))
```

``` r
Numbers <- c(1,2,3,4)
mean(Numbers)
```
you can do

``` r
c(1,2,3,4) %>% 
  mean( )
```

With the **`%>%`** you can write a sentence like:

> *I .UA-red[`%>%`] woke up .UA-red[`%>%`], took a shower .UA-red[`%>%`], got breakfast .UA-red[`%>%`], took the train .UA-red[`%>%`] and arrived at the ICO course .UA-red[`%>%`] …*
]
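
---

## A longer chain with .UA-red[`%>%`]

The pipe becomes more useful once several functions follow each other. A small sketch with made-up numbers, comparing a nested call with the same steps written as a chain:

``` r
library(dplyr)  # makes %>% available

numbers <- c(1, 4, 9, 16)

# Nested call: read from the inside out
nested <- round(mean(sqrt(numbers)), 1)

# Piped call: read from top to bottom
piped <- numbers %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)
```

Since R 4.1, base `R` also has a native pipe `|>`, which behaves similarly in simple chains like this one.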

---

## .UA-red[`filter()`]

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---
## Let's apply .UA-red[`filter()`]

With the Friends data:

> .SW-greenD[*We keep only the observations from the first measurement occasion in condition 1*]

``` r
Friends_Occ1 <- Friends %>%
  filter(occassion == 1 & condition == 1)
```

.UA-red[`==`] is *equals* (notice the 2 = signs!)

> .SW-greenD[*Let's clean some data: we keep only observations with a fluency value below 300 that is not equal to 0*]

``` r
Friends_clean <- Friends %>%
  filter(fluency < 300 & fluency != 0)
```

.UA-red[`!=`] means *not equal to*
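
---
## More ways to .UA-red[`filter()`]

Two variations that may come in handy, sketched on a made-up tibble (not the real Friends data): conditions separated by a comma, and `%in%` to match several values at once.

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  occasion  = c(1, 1, 2, 3),
  condition = c(1, 2, 1, 3),
  fluency   = c(101, 0, 350, 105)
)

# Conditions separated by a comma are combined with AND
first_in_c1 <- toy %>%
  filter(occasion == 1, condition == 1)

# %in% keeps rows whose value matches any element of a vector
c1_or_c3 <- toy %>%
  filter(condition %in% c(1, 3))
```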

---
## .UA-red[`mutate()`]

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---
## Let's apply .UA-red[`mutate()`]

With the Friends data:

> .SW-greenD[*We calculate a new variable containing the fluency scores minus the average of fluency*]

``` r
Friends <- Friends %>%
  mutate(
    fluency_centered = fluency - mean(fluency, na.rm = TRUE)
    )
```

---

## Let's apply .UA-red[`mutate()`]

With the Friends data:

> .SW-greenD[*We create a factor for condition*]

``` r
Friends <- Friends %>%
  mutate(
    condition_factor = as.factor(condition)
  )

str(Friends$condition_factor)
```

```
##  Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 3 3 3 1 ...
```
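
---
## Several new variables in one .UA-red[`mutate()`]

A single `mutate()` call can create several variables, and later lines can use variables created earlier in the same call. A sketch on made-up data; the factor labels are taken from the value labels stored in the Friends data, and the variable `fluency_z` is a hypothetical addition.

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  condition = c(1, 2, 3, 1),
  fluency   = c(100, 110, 90, 120)
)

toy <- toy %>%
  mutate(
    # centered score: deviation from the overall mean
    fluency_centered = fluency - mean(fluency, na.rm = TRUE),
    # standardised score, reusing the variable created one line earlier
    fluency_z = fluency_centered / sd(fluency, na.rm = TRUE),
    # factor with readable labels instead of "1", "2", "3"
    condition_factor = factor(
      condition,
      levels = c(1, 2, 3),
      labels = c("English", "Spanish", "No subtitles")
    )
  )
```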

---
## Let's apply .UA-red[`select()`]

.font-size140[To **select** variables.]

Some examples with the Friends data:

> .SW-greenD[*We only select `condition` and `occassion` and inspect the result with the `str()` function*]

.footnotesize[

``` r
Friends %>%
  select(
    condition, occassion
  ) %>%
  str()
```

```
## tibble [108 × 2] (S3: tbl_df/tbl/data.frame)
##  $ condition: num [1:108] 1 1 1 2 2 2 3 3 3 1 ...
##   ..- attr(*, "value.labels")= Named chr [1:3] "3" "2" "1"
##   .. ..- attr(*, "names")= chr [1:3] "No subtitles" "Spanish" "English"
##  $ occassion: num [1:108] 1 2 3 1 2 3 1 2 3 1 ...
##  - attr(*, "variable.labels")= Named chr(0) 
##   ..- attr(*, "names")= chr(0) 
##  - attr(*, "codepage")= int 1252
```
]
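
---
## More ways to .UA-red[`select()`]

Besides naming the variables one by one, `select()` also accepts a minus sign and selection helpers. A sketch on a made-up tibble; the column names are hypothetical.

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  student          = c(1, 2),
  fluency          = c(101, 98.8),
  fluency_centered = c(1.1, -1.1),
  group            = c("a", "b")
)

# Everything except student
no_student <- toy %>% select(-student)

# Columns whose name starts with "fluency"
fluency_cols <- toy %>% select(starts_with("fluency"))

# All numeric columns
numeric_cols <- toy %>% select(where(is.numeric))
```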
---
## Rename variables with .UA-red[`rename()`]

Notice how the variable `occassion` is misspelled! Pretty annoying when coding... But we can easily **rename** variables.

The syntax is `rename(new_name = old_name)`

> .SW-greenD[*Rename the variable `occassion` to `occasion`* ]

.footnotesize[

``` r
Friends <- Friends %>%
  rename(
    occasion = occassion
  )
```
]
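
---
## Renaming several variables at once

`rename()` accepts more than one `new_name = old_name` pair, and `rename_with()` applies a function to the names. A sketch on a made-up tibble; the new name `score` is only an example.

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(occassion = c(1, 2), fluency = c(101, 104))

renamed <- toy %>%
  rename(
    occasion = occassion,  # new_name = old_name
    score    = fluency
  )

# rename_with() applies a function to the column names
upper <- toy %>% rename_with(toupper)
```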

---
## Super combo 1: .UA-red[`group_by() + summarize( )`]

Transform a tibble to a *grouped tibble* making use of `group_by()`

Calculate summary stats per group making use of `summarize()`

> .SW-greenD[*Calculate the average fluency and standard deviation per condition* ]

.footnotesize[

``` r
Friends %>%
  group_by(
    condition
  ) %>%
  summarize(
    mean_fluency = mean(fluency),
    sd_fluency   = sd(fluency)
  )
```

```
## # A tibble: 3 × 3
##   condition mean_fluency sd_fluency
##       <dbl>        <dbl>      <dbl>
## 1         1         109.       9.08
## 2         2         108.       6.02
## 3         3         103.       4.17
```
]

---
## Super combo 1: .UA-red[`group_by() + summarize( )`]

> .SW-greenD[*Calculate the number of observations for each combination of condition and occasion* ]

.footnotesize[

``` r
Friends %>%
  group_by(
    occasion, condition
  ) %>%
  summarize(
    n_observations  = n()
  )
```

```
## # A tibble: 9 × 3
## # Groups:   occasion [3]
##   occasion condition n_observations
##      <dbl>     <dbl>          <int>
## 1        1         1             12
## 2        1         2             12
## 3        1         3             12
## 4        2         1             12
## 5        2         2             12
## 6        2         3             12
## 7        3         1             12
## 8        3         2             12
## 9        3         3             12
```
]
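
---
## Super combo 1: dropping the groups

The output above still mentions `Groups: occasion [3]`: after `summarize()` the result stays grouped by the first grouping variable. A sketch on made-up data of how the grouping could be dropped, and of `count()` as a shortcut when only the number of rows is needed.

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(
  occasion  = c(1, 1, 1, 2, 2, 2),
  condition = c(1, 1, 2, 1, 2, 2),
  fluency   = c(101, 104, 99, 107, 105, 102)
)

# .groups = "drop" returns an ungrouped tibble
per_cell <- toy %>%
  group_by(occasion, condition) %>%
  summarize(
    n_observations = n(),
    mean_fluency   = mean(fluency),
    .groups = "drop"
  )

# count() groups, counts and ungroups in one step
counts <- toy %>% count(occasion, condition)
```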

---
## Super combo 2: .UA-red[`mutate() + case_when( )`]

<p align="right">.footnotesize[.SW-greenD[*Artwork by @allison_horst*]] </p>

---
## Super combo 2: .UA-red[`mutate() + case_when( )`]

To **recode** variables into new variables!

.pull-left[
> .SW-greenD[*We create a new categorical version of fluency with 3 groups, then we select this new variable and look at the first 5 observations...* ]]

.pull-right[
.footnotesize[

``` r
Friends %>%
  mutate(
    fluency_grouped = case_when(
      fluency < 106.625 - 7.1 ~ 'low',
      fluency >= 106.625 - 7.1 & fluency < 106.625 + 7.1 ~ 'average',
      fluency >= 106.625 + 7.1 ~ 'high'
    )
  ) %>% 
  select(
    fluency,
    fluency_grouped
    ) %>%
  head(5)
```

```
## # A tibble: 5 × 2
##   fluency fluency_grouped
##     <dbl> <chr>          
## 1   101.  average        
## 2   104.  average        
## 3   117.  high           
## 4    98.8 low            
## 5   107.  average
```
]
]
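
---
## Super combo 2: cut-offs computed from the data

The cut-offs in the previous example are typed in by hand. Assuming they stand for the mean plus or minus one standard deviation (an assumption, the slides do not say so), they could also be computed inside `case_when()`. A sketch on made-up data:

``` r
library(dplyr)

# Made-up data for illustration only
toy <- tibble(fluency = c(90, 100, 105, 110, 120))

toy <- toy %>%
  mutate(
    fluency_grouped = case_when(
      fluency <  mean(fluency) - sd(fluency) ~ "low",
      fluency >= mean(fluency) + sd(fluency) ~ "high",
      TRUE                                   ~ "average"  # everything else
    )
  )
```

Conditions are checked from top to bottom and the first match wins. Note that the final `TRUE` also catches missing values, so data with `NA`s would need an extra condition.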

---

## How to define conditions
<br>
.UA-red[`x == y`] `$\rightarrow$` 'x is **equal** to y'

.UA-red[`x != y` ] `$\rightarrow$` 'x is **NOT equal** to y'

<br>

.UA-red[`x < y`] `$\rightarrow$` 'x is **smaller** than y'

.UA-red[`x <= y`] `$\rightarrow$` 'x is **smaller than or equal** to y'

<br>

.UA-red[`x > y`] `$\rightarrow$` 'x is **greater** than y'

.UA-red[`x >= y`] `$\rightarrow$` 'x is **greater than or equal** to y'
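
---

## Conditions in action

Each condition returns `TRUE` or `FALSE` for every element it is applied to. A small sketch with made-up values for `x` and `y`:

``` r
x <- c(1, 5, 10)
y <- 5

x == y   # FALSE  TRUE FALSE
x != y   #  TRUE FALSE  TRUE
x <  y   #  TRUE FALSE FALSE
x <= y   #  TRUE  TRUE FALSE
x >  y   # FALSE FALSE  TRUE
x >= y   # FALSE  TRUE  TRUE
```

`filter()` keeps the rows for which the condition is `TRUE`.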

---

## Boolean operators
<br>
We can combine conditions!
<br>
<br>

.large[.UA-red[`&`]] represents the Boolean operator **AND**<br>
.footnotesize[*for example: `gender == 1 & age <= 18`*]
<br>

.large[.UA-red[`|`]] represents the Boolean operator **OR**<br> 
.footnotesize[*for example: `gender == 1 | gender == 2`*]
<br>

.large[.UA-red[`!`]] represents the Boolean operator **NOT**<br>
.footnotesize[*for example: `gender == 1 & !(age <= 18)`*]
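
---

## Combined conditions in .UA-red[`filter()`]

The three operators can be tried out directly inside `filter()`. A sketch on a made-up tibble; the variables `gender` and `age` and their values are hypothetical.

``` r
library(dplyr)

# Made-up data for illustration only
people <- tibble(
  gender = c(1, 1, 2, 3),
  age    = c(16, 30, 17, 45)
)

# AND: both conditions must hold
young_g1 <- people %>% filter(gender == 1 & age <= 18)

# OR: at least one condition must hold
g1_or_g2 <- people %>% filter(gender == 1 | gender == 2)

# NOT: negates the condition between the parentheses
adult_g1 <- people %>% filter(gender == 1 & !(age <= 18))
```

The parentheses after `!` are optional here, but they make clear which condition is being negated.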

---

## Interactive tutorial about .UA-red[`dplyr`]

Looking for more material and a place to practise your skills? This free online tutorial (made with the `learnr` package) is strongly recommended!

https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome

---
class: inverse-blue

# <i class="fas fa-laptop-code" style="color: #FF0035;"></i> Exercise `dplyr`

.left-column[
![](https://media.giphy.com/media/A9grgCQ0Dm012/giphy.gif)
]
.right-column[
- You can find the qmd file .SW-greenD[`Exercises_dplyr.qmd`] in the Exercises folder of the project you created yesterday (Exercises > Exercise2_dplyr)

- Open this document

- You get a set of tasks with empty code blocks to start coding

- Write and test the necessary code

- Stuck? No worries!
  - We are here to help
  - Help each other
  - There is a solution key (.SW-greenD[`Exercises_dplyr_solutions.qmd`])

]