class: title-slide, center, middle

<link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.6.0/css/all.css" integrity="sha384-aOkxzJ5uQz7WBObEZcHvV5JvRW3TUc2rNPA7pe3AwnsUohiw1Vj2Rgx2KSOkF5+h" crossorigin="anonymous">

<style>
.center2 {
  margin: 0;
  position: absolute;
  top: 50%;
  left: 50%;
  -ms-transform: translate(-50%, -50%);
  transform: translate(-50%, -50%);
}
.rcorners1 {
  margin: auto;
  border-radius: 25px;
  background: #ada500;
  padding: 10px;
  /* width: 50%; */
}
</style>

<style type="text/css">
.right-column{
  padding-top: 0;
}
.remark-code, .remark-inline-code {
  font-family: 'Source Code Pro', 'Lucida Console', Monaco, monospace;
  font-size: 90%;
}
</style>

<div class="my-logo-left"> <img src="img/edubron-en-rgb.jpg" width="100%" /> </div>

<div class="my-logo-right"> <img src="img/Logo Methods Hub.png" width="100%"/> </div>

# A repRoducible woRkflow with Quarto

.font160[
.SW-greenD[Part 1]
]

.font120[
.SW-greenD[*About reproducibility and R*]
]

Sven De Maeyer & Tine van Daal

.font80[
.UA-red[
2nd - 3rd March, 2026
]
]

---
class: inverse-green, left

# Overview

.center2[

1. RepRoducibility?! --- ([click here](#part1))

2. Sta`R`t to be `R`eproducible --- ([click here](#part2))

3. A woRld of packages! --- ([click here](#part3))

4. Working with `R`-projects --- ([click here](#part4))

5. Importing data --- ([click here](#part5))
]

---
class: inverse-green, center, middle
name: part1

# 1. Rep`R`oducibility?!

---

## Key principles of Open Science

<br>
<br>

--

.font140[.SW-greenD[**Transparency**]]<br> .SW-greenL[Openness about our methodology, data, and results.]

--

.font140[.SW-greenD[**Openness**]]<br>.SW-greenL[Research and research outputs are freely available to everyone.]

--

.font140[.SW-greenD[**Reproducibility**]]<br>.SW-greenL[Research methods and data are shared in such a way that others can **follow, reproduce, and reuse** the decisions we have made.]
<br>
<br>
<br>
<br>
<br>
<br>
<br>
<br>

.font80[Definitions taken from Edubron's [mission statement](https://www.uantwerpen.be/en/research-groups/edubron/research/open-science/) regarding Open Science.]

---

## A reproducible workflow = your external (research) hard drive

.Large["*Can you redo that analysis to check ...?*"]

--

<center>
<img src="img/reproducible_not.jpg" width="600" height="453" />
</center>

---

## Reproducible or replicable?

.pull-left[
<img src="img/reproducible-definitiongrid.jpg" width="538" height="382" />

<br>
<br>
<br>
<br>

.footnote[
.font50[
<i class="fas fa-link" style="color: #FF0035;"></i> * Figure from The Turing Way Community & Scriberia (2024) downloaded from https://zenodo.org/records/13882307 *
]
]
]

--

.pull-right[
<br>
<br>
<br>

.right[
.font140[
*What does it take to do <br>reproducible research?*
]
]
]

---

## Towards a reproducible workflow

<center>
<img src="img/Open Huis Workflow.png" width="600" height="453" />
</center>

---

## A reproducible quantitative workflow

<img src="https://r4ds.hadley.nz/diagrams/data-science/base.png" width="538" height="382" />

<br>
<br>

.footnote[
.font50[
<i class="fas fa-link" style="color: #FF0035;"></i> * Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023) R for Data Science (second edition). O'Reilly. Also available online: https://r4ds.hadley.nz/ *
]
]

---

## Why do we talk about rep<img src="img/Rlogo.svg" width="5%"/>oducibility?

.pull-left[
`R` is a powerful tool for:

- Data cleaning and preparation
- Statistical analysis
- Data visualisation
- ...

<img src="Slides_part1_files/figure-html/unnamed-chunk-2-1.png" width="320px" />
]

--

.pull-right[
<br>
`R` facilitates .SW-greenD[**reproducibility**] (and transparency) by allowing you to write a **script**.
<br>
<br>
<br>
<br>
<br>

.font80[
``` r
library(tidyverse)   #load tidyverse package
library(datasauRus)  #load datasauRus package

datasaurus_dozen %>%
  filter(dataset=="star") %>%  #filter rows
  ggplot(
    aes(x = x, y = y)  #define x- and y-axis
  )+
  geom_point(size = 1.2) +  #change size of points
  annotate(  #add text
    geom = "text",
    x = 60, y = 52,
    size = 8,
    fontface = "bold",
    color = "#AAAA17",
    label = "R is an open science staR!"
  ) +
  theme_void() +  #set theme of plot
  theme(legend.position = "none")
```
]
]

---
class: inverse-green, center, middle
name: part2

# 2. Sta`R`t to be `R`eproducible

---

## RStudio to make you`R` life easie`R`

<left>
<img src="img/rstudio-panes-labeled.jpeg" width="55%"/>
</left>

<br>
<br>

.footnote[
.font50[
<i class="fas fa-link" style="color: #FF0035;"></i> * Figure from RStudio User Guide (Release 2026.01.01) https://docs.posit.co/ide/user/ide/get-started/ *
]
]

---
class: inverse-blue, center, middle

## Let's make a script in `RStudio`

.rcorners1[.Large[
<i class="fas fa-code" style="color: #FF0035;"></i> *Time to get started with the real work and code together*
]
.small[You can find the code on the next slides.
There is also a script `Fruit.R` that contains the same code (see course material for Part 1 online)]
]

---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: character vector]

We create a vector and call it _.UA-blue[Fruit]_

.small[
``` r
Fruit <- c("Apples", "Bananas", "Lemons", "Berries", "Peaches", NA)
```
]

.font80[
- .UA-red[`c()`] stands for .SW-greenD[*concatenate*]: all elements between the brackets (separated by `,`) are combined into one vector
- .UA-red[`<-`] means that we want to store the result in an object (a vector) that we call `Fruit`
- the quotation marks .UA-red[`" "`] that surround the elements indicate that these are of the type *character*
]

Let's look at the object _.UA-blue[Fruit]_ by printing it to the console

.small[
``` r
Fruit
```

```
## [1] "Apples"  "Bananas" "Lemons"  "Berries" "Peaches" NA
```
]

We can also check the structure of the object _.UA-blue[Fruit]_ by using the **.UA-red[`str()`]** function

.small[
``` r
str(Fruit)
```

```
## chr [1:6] "Apples" "Bananas" "Lemons" "Berries" "Peaches" NA
```
]

---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: numeric vector]

We create a vector called _.UA-blue[Weight]_

.small[
``` r
Weight <- c(230, 191, 93, 100, 48, 244)
```
]

<br>

In this case, the elements are stored as numeric elements

.small[
``` r
str(Weight)
```

```
## num [1:6] 230 191 93 100 48 244
```
]

<br>

Now, we can start calculating...
For example, we can apply the function .UA-red[`mean( )`]

.small[
``` r
mean(Weight)
```

```
## [1] 151
```
]

---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: logical vector]

We create a vector called _.UA-blue[Yellow]_

``` r
Yellow <- c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE)
```

<br>

The elements of the vector _.UA-blue[Yellow]_ are logical values (`TRUE` or `FALSE`)

``` r
str(Yellow)
```

```
## logi [1:6] FALSE TRUE TRUE FALSE FALSE FALSE
```

<br>

Let us have a closer look at the vector _.UA-blue[Yellow]_

``` r
Yellow
```

```
## [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
```

---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: a data frame]

.pull-left[
We create a data frame called _.UA-blue[Fruit_data]_ using the function **.UA-red[`data.frame( )`]**

.small[
``` r
Fruit_data <- data.frame(Fruit, Weight, Yellow)
```
]

<br>

Let's have a look at *.UA-blue[Fruit_data]*

.small[
``` r
Fruit_data
```

```
##     Fruit Weight Yellow
## 1  Apples    230  FALSE
## 2 Bananas    191   TRUE
## 3  Lemons     93   TRUE
## 4 Berries    100  FALSE
## 5 Peaches     48  FALSE
## 6    <NA>    244  FALSE
```
]
]

.pull-right[
We can check the structure of the object _.UA-blue[Fruit_data]_

.small[
``` r
str(Fruit_data)
```

```
## 'data.frame': 6 obs. of 3 variables:
## $ Fruit : chr "Apples" "Bananas" "Lemons" "Berries" ...
## $ Weight: num 230 191 93 100 48 244
## $ Yellow: logi FALSE TRUE TRUE FALSE FALSE FALSE
```
]
]

---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: subsetting a data frame]

.pull-left[
The **.UA-red[`$`] operator** is used to refer to a single column (a vector) of a data frame

.small[
``` r
Fruit_data$Weight
```

```
## [1] 230 191  93 100  48 244
```
]

By **indexing**, specific rows and/or columns can be selected: .UA-red[`[row, column]`]
]

.pull-right[
**Examples of indexing**:

- Retrieve the element that is located at row 1 and column 1

.small[
``` r
Fruit_data[1,1]
```

```
## [1] "Apples"
```
]

- Retrieve all elements in column 3

.small[
``` r
Fruit_data[,3]
```

```
## [1] FALSE  TRUE  TRUE FALSE FALSE FALSE
```
]

- Retrieve all elements in row 3

.small[
``` r
Fruit_data[3,]
```

```
##    Fruit Weight Yellow
## 3 Lemons     93   TRUE
```
]
]

---

## Types of objects

<br>

.pull-left[
- *vectors*
- matrices
- arrays
- *data frames*
- lists
- *functions*
...
]

.pull-right[
<img src="img/Data_Types.png" width="80%" height="80%" />
]

.footnote[.small[
<i class="fas fa-link" style="color: #FF0035;"></i> * Figure from https://www.edureka.co/blog/r-programming-language *]
]

---

## Vectors

<img src="img/Vectors.png" width="40%" height="40%" />

<br>

.footnote[
.small[
<i class="fas fa-link" style="color: #FF0035;"></i> * Figure from https://www.edureka.co/blog/r-programming-language *
]
]

---

## This is a *repRoducible* workshop!

Various types of files are used during the workshop:

- R-scripts (as example) (.R)
- Rmd-files (for slides) (.Rmd)
- Qmd-files (for exercises) (.qmd)
- data sets (.sav, .csv, ...)
- html-version of slides and exercises (.html)

All these files can be found at the dedicated web-page: https://r-workshop-edubron26.netlify.app/

---
class: inverse-green, center, middle
name: part3

# 3. A woRld of packages!

---

## R-Packages?!
.pull-left[
- The unive<img src="img/Rlogo.svg" width="8%"/>se is in constant development
- Package = extension of the `Base`-functions
- Range from .SW-greenD[specialised] to .SW-greenD[generic/universal] packages
- Overview on .font100[https://cran.r-project.org/web/packages/available_packages_by_name.html]
- Which package(s) to use?
]

.pull-right[
<img src="img/R_packages.jpg" width="100%" height="100%" style="display: block; margin: auto 0 auto auto;" />
]

---

## The `tidyverse` package is a .UA-red[**MUST-HAVE**] for everyone!

<br>
<br>

<img src="img/Tidyverse_screen.jpg" width="70%" height="70%" />

.footnote[
.font50[
<i class="fas fa-link" style="color: #FF0035;"></i> * More information at https://tidyverse.org *
]
]

---

## A reproducible workflow with tidyverse

<center>
<img src="img/tidyverse-package-workflow.png" width="75%"/>
</center>

<br>
<br>

.footnote[
.font50[
<i class="fas fa-link" style="color: #FF0035;"></i> * Figure from Teach Data Science https://teachdatascience.com/tidyverse/ *
]
]

---

## .UA-red[ `install.packages( )`] and .SW-greenD[`library( )`]

.pull-left[
Packages can be .UA-red[**downloaded and installed**]

- by 'clicking' in `RStudio`
- **by using a function in the console or script**

*What Sven usually does: `install.packages()`*

*What Tine usually does: click and go!*

For example,

``` r
install.packages("tidyverse", dependencies = TRUE)
```
]

--

.pull-right[
Packages should be .SW-greenD[**activated**] at the start of the session

- by 'clicking' in `RStudio`
- **by using a function in the console or script**

<br>

*What we both usually do: `library()`*

For example,

``` r
library("tidyverse")
```
]

---
class: inverse-green, center, middle
name: part4

# 4. Working with `R`-projects

---

## Handling all these files

<br>

During analyses, we will regularly **handle different files**:

- data sets
- scripts
- output (html-files; figures; ...)

<br>
<br>

Consequently, we should **refer to these files in our code**.

---

## Taking the right path ...
There are two ways to refer to these files.

.pull-left[
.SW-greenD[**USING ABSOLUTE PATHS**]

We can do this using an *ABSOLUTE path*.

Below is an example of an absolute path:

``` r
'c:/Users/Sven/Mijn Documenten/UAntwerpen/Analyses/ProjectX/R_Script/Analysescript1.R'
```

.font80[Absolute paths make it difficult to share your work; you get into trouble when you change laptops; ... ]
]

--

.pull-right[
.SW-greenD[**USING RELATIVE PATHS**]

To avoid these problems, `RStudio` introduced the concept of an *.UA-red[`R`-Project].*

Within a project you can use *RELATIVE paths* (that start from the folder in which you save the project).

<br>

Below is an example of a relative path:

``` r
'R_Script/Analysescript1.R'
```
]

---

## Sharing projects?

<br>

If Tine puts a project of Sven on her laptop, the .SW-greenD[ABSOLUTE paths] might get her into trouble.

For example, the Rmd-file 'Slides_part1.Rmd' is located at

`'c:/Users/Sven/Dropbox/R_workshop_Edubron26/Presentations/Part 1/Slides_part1.Rmd'`

<br>
<br>

Of course, Tine doesn't have the same structure on her laptop...

<br>
<br>

To facilitate sharing of projects, the package **`here`** has been created.

---

## Creating relative paths with function .UA-red[`here()`]

The function **`here()`** creates paths relative to the top-level directory of the project.
``` r
library(here)
here()
```

```
## [1] "/Users/svendemaeyer/Library/CloudStorage/OneDrive-UniversiteitAntwerpen/Bestanden van Tine van Daal - A repRoducible woRkflow with Quarto"
```

``` r
here("Presentations", "Part 1", "Slides_part1.Rmd")
```

```
## [1] "/Users/svendemaeyer/Library/CloudStorage/OneDrive-UniversiteitAntwerpen/Bestanden van Tine van Daal - A repRoducible woRkflow with Quarto/Presentations/Part 1/Slides_part1.Rmd"
```

---
class: inverse-blue, center, middle

## Exercises

.rcorners1[.Large[
<i class="fas fa-code" style="color: #FF0035;"></i> .white[*Creating a project in `RStudio`*]

<br>

.small[You can find the folder `Exercises` on GitHub: <https://github.com/Sdemaeyer2/R_workshop_Edubron26/>. Download this folder and put it somewhere on your laptop.]
]
]

---

## Creating a project in `RStudio`

Click on .UA-red[`File/New Project...`]

Next, you can choose:

<img src="img/New_Project.jpg" width="50%" height="50%" />

---
class: inverse-green, center, middle
name: part5

# 5. Importing data

---

## Data comes in different formats

Data exist in various formats, but the most common ones are:

- MS Excel
- SPSS / Stata / SAS
- text (csv, tab-delimited)

Various methods and packages have been developed to import these types of data.

We will show some possibilities.

---

## Importing MS Excel data with the .UA-red[`readxl`] package

Make sure that the package is installed on your laptop.

Then, you can use the code below to import the data:

.small[
``` r
library(readxl)

Data <- read_excel(
  path = "<path to folder and file>"
)
```
]

*Imagine*: you have an Excel file .SW-greenD[**Datacollection1**] with multiple sheets

*Goal*: you only want to import the sheet named .SW-greenD[**Group2**] and save it as a data frame .SW-greenD[**Data_Group2**]

.small[
``` r
Data_Group2 <- read_excel(
  path = "<path to folder and file>/Datacollection1.xlsx",
  sheet = "Group2"
)
```
]

You don't need the argument `sheet` if:

- the Excel file only includes 1 sheet
- there are multiple sheets but you only want to import the first one

---

## Importing SPSS data with package .UA-red[`foreign`]

Make sure that the package is installed on your laptop.

Then, you can use the code below to import the data:

``` r
library(foreign)

Data <- read.spss(
  file = "<path to folder and file>",
  use.value.labels = FALSE,
  to.data.frame = TRUE
)
```

> .SW-greenD[`use.value.labels`] argument: <br> <br> In SPSS, values can be labelled. Do you want these labels to appear in the R data frame? <br> <br> *For example: a variable with 3 categories: 1 = "Low", 2 = "Average", 3 = "High". These labels are included in the SPSS-file.
You can choose to import these labels in the R data frame (`use.value.labels = TRUE`) or only the numbers 1, 2 and 3 that represent these categories (`use.value.labels = FALSE`)*

---

## Importing csv-files with function .UA-red[`read.table( )`]

.pull-left[
csv-files usually look like this .font80[(separated by .UA-red[.large[`,`]] or .UA-red[.large[`;`]]) ]

.small[
``` r
Column1, Column2, Column3
1, 3, 5
2, 4, 6
8, 10, 99
```
]

To import these data, the function .UA-red[`read.table( )`] is most convenient...

.small[
``` r
Data <- read.table(
  file = "<path to folder and file>",
  header = TRUE,
  sep = ",",
  dec = "."
)
```
]
]

.pull-right[
- **.SW-greenD[`header`]** argument: <br> <br> Does the first row include column (variable) names or not?

- **.SW-greenD[`sep`]** argument: <br> <br> Which character is used to separate the values in the different columns?

- **.SW-greenD[`dec`]** argument: <br> <br> Which character is used to indicate a decimal point? .font80[(Can be a point or a comma)]
]
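
---

### <i class="fas fa-code" style="color: #FF0035;"></i> .SW-greenD[Code-Blocks: trying out `read.table( )`]

A minimal sketch of how the arguments `header`, `sep` and `dec` work together. It assumes a small, made-up csv-file that we first write to a temporary location; the object names `csv_file` and `Data_example` and the values in the file are hypothetical and only serve as illustration.

.small[
``` r
# Write a small example csv-file to a temporary file (hypothetical data)
csv_file <- tempfile(fileext = ".csv")
writeLines(
  c("Column1,Column2,Column3",
    "1,3,5.5",
    "2,4,6",
    "8,10,99"),
  con = csv_file
)

# Import the file: the first row holds the column names,
# values are separated by "," and "." is the decimal point
Data_example <- read.table(
  file = csv_file,
  header = TRUE,
  sep = ",",
  dec = "."
)

# Check the structure of the imported data frame
str(Data_example)
```
]

For files that use `;` as separator and `,` as decimal point, changing the arguments to `sep = ";"` and `dec = ","` should do the job.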