---
title: "First steps in R"
pagetitle: "First steps in R"
output:
  html_document:
    code_folding: show # allows toggling of showing and hiding code. Remove if not using code.
    code_download: true # allows the user to download the source .Rmd file. Remove if not using code.
    includes:
      after_body: footer.html # include a custom footer.
    toc: true
    toc_depth: 3
    toc_float:
      collapsed: false
      smooth_scroll: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```

## Introduction

:::intro
RMarkdown lets you combine R code and text in one document to create dynamic, reproducible reports.
:::

By embedding code directly within your written explanations, RMarkdown ensures that your analysis, results, and visuals are automatically updated if the data or code changes—making it easy for others (and your future self!) to understand, verify, and rerun your work exactly as it was done.

In this short tutorial, we will learn:

- How to use R code within an RMarkdown document
- How to import and explore our dataset
- How to rename columns


## Intro to Data in R

### Basic Syntax of R 

The most important components of an R script are **objects** and **functions**. Objects store information, and functions are used to work with that information.

**Assignment operators**, **pipes** and **arguments** are used to link objects and functions and communicate what we want to do.

![](images/day2_RSyntax.png)


#### Objects
An object is anything you create and name in R. It can be a number, a dataset, a function, or even a plot. Objects take on content from everything to the right of the assignment operator.

```{r}
x <- 5 #x is now an object that holds the value 5
b <- "Anna" #b is now an object that holds the character Anna
```
**Notes:** 

 - Since "Anna" is text (a character string), it needs to be wrapped in quotation marks (we will learn more about data types tomorrow).
 
 - The symbol `#` is used within a code chunk to insert comments. R ignores everything after `#` on a line, so comments don't affect how the code runs; plain text that is not marked as a comment is read as code and will usually generate an error.

#### Functions
A function is a set of instructions that accomplishes a task. Functions are often (though not always) performed on an argument. They do something, like calculate, sort, or plot. You call a function by writing its name followed by parentheses.
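
For example, the chunk below calls a few functions that come with base R. `round()` and `toupper()` act on the value we pass in, while `Sys.Date()` takes no argument at all, which is why functions are only *often* performed on an argument.

```{r}
round(3.14159)     # rounds a number to the nearest whole number: 3
toupper("anna")    # converts text to upper case: "ANNA"
Sys.Date()         # returns today's date; this function takes no argument
```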

#### Arguments
Arguments are the details you give a function so it knows what to do. They go inside the parentheses of a function. Let's take a look at the function `mean()`.

:::question
Type `mean` in the search box of the Help tab (the bottom-right panel in RStudio's default layout), or run `?mean` in the Console. The results will provide a description of the function, including its arguments.
:::


**Argument** |**What it means** | **Default Value**
|:------|:------|:-------|
| x | The R object (for example, a vector of numbers) whose mean we want to find | No default; required |
| trim | A number between 0 and 0.5: the fraction of observations removed from each end (the highest and lowest values) before the mean is computed. Useful for a trimmed mean that is less sensitive to outliers. | 0 (no trimming) |
| na.rm | `TRUE` or `FALSE`: whether NA (missing) values should be removed before the calculation | `FALSE` (NA values are kept, so a single NA makes the result NA) |
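
A short illustration of these arguments, using a small made-up vector called `scores` (hypothetical values, not from our dataset) that contains one outlier (100) and one missing value (`NA`):

```{r}
scores <- c(2, 4, 6, 8, 100, NA)   # made-up values with an outlier and a missing value

mean(scores)                            # NA: by default (na.rm = FALSE) a missing value makes the result NA
mean(scores, na.rm = TRUE)              # 24: the NA is dropped, then (2 + 4 + 6 + 8 + 100) / 5
mean(scores, trim = 0.2, na.rm = TRUE)  # 6: 20% of the 5 values (1 value) is dropped from each end, leaving 4, 6, 8
```

Because the arguments are given by name (`na.rm = TRUE`, `trim = 0.2`), their order inside the parentheses does not matter.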



#### Assignment Operators
This is how you store a value in R. It's like saying: "Let this name hold this value." The operator takes the result of whatever is on its right (a value, another object, or a function call) and assigns it to the name on its left.

```{r}
name <- "Maria" #Now name holds the string "Maria".
```

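**Note:** The assignment operator `<-` is itself a function in R: a 'store' function that assigns information to an object. The arrow `<-` is the most common choice. `=` also assigns at the top level, but inside a function call it is used to set arguments (as in `mean(x, na.rm = TRUE)`), so `<-` is usually preferred for assignment.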
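
The chunk below sketches the difference between the two uses of `=`. At the top level it assigns just like `<-`, but inside a function call it only matches a value to an argument name. (`total` and `avg` are illustrative names.)

```{r}
total = 10                    # top-level `=` assigns, exactly like total <- 10
avg <- mean(x = c(2, 4, 6))   # inside mean(), `x = ...` sets the argument x; it does not create or change an object called x
avg                           # 4
```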

You can assign a new value to an existing object name. When you assign again, the previous content is replaced.

```{r}
name <- "Anna" #name now holds the string "Anna"; the previous value "Maria" has been replaced.
```

**Why Overwriting is Useful**

 - As your analysis becomes more complicated, you often build your results step-by-step.
 
 - Instead of creating dozens of different object names, you can reuse the same object name to store updated versions of your data or results.
 
 - This keeps your environment clean and your code easier to read.
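
A minimal sketch of this step-by-step style, reusing one illustrative object name, `result`, instead of creating `result1`, `result2`, and so on:

```{r}
result <- c(3, 1, 2)    # start with some raw numbers
result <- sort(result)  # overwrite with a sorted version: 1 2 3
result <- result * 10   # overwrite again with every value multiplied by 10
result                  # 10 20 30
```

The trade-off is that earlier versions are gone: to get back to the raw numbers, rerun the code from the top.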


#### Pipes
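The pipe `|>` chains steps together in a readable way: it passes the result on its left as the first argument to the function on its right. Instead of nesting functions inside one another, you move step by step, like following a recipe.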

```{r}
mynumbers <- c(1, 2, 3) #storing the numbers 1, 2 and 3 in the object called "mynumbers"
mynumbers |> mean() #take the object mynumbers and pipe it into the mean function
```
**Note:** In R, `c()` stands for “combine” or “concatenate”. The role of `c()` is to combine the values inside it into a vector — a basic data structure in R. It takes the individual numbers 1, 2, and 3 and creates a single vector: 1 2 3. You can think of `c()` as “gluing” elements together into one group. You will learn more about data types and structures tomorrow.
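
To see what "instead of nesting" means, here is the same calculation written both ways with some made-up numbers. The nested version has to be read from the inside out; the piped version reads left to right. Note that the native pipe `|>` requires R 4.1.0 or later.

```{r}
nested <- round(mean(c(1.234, 5.678)), 1)        # inside out: c(), then mean(), then round()
piped  <- c(1.234, 5.678) |> mean() |> round(1)  # left to right, same steps
nested                    # 3.5
identical(nested, piped)  # TRUE
```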

## Packages and Libraries

Packages are collections of R functions. To use a function that comes from a package, we first need to install the package that contains it.

### Tidyverse

When we talk about "Base R" we refer to the original functions and syntax included with R—no extra packages needed. Base R contains many functions like `read.csv()`, `mean()`, `subset()`, and `plot()`. It can be very powerful and flexible but sometimes it is less intuitive for beginners.

We will install **Tidyverse**, which is a collection of packages designed to make data analysis easier and more consistent.

Think of the tidyverse as a toolbox that gives you simple and readable functions for the most common steps in working with data:

 - Importing data (e.g., readr, readxl)
 - Cleaning and transforming data (e.g., dplyr, tidyr)
 - Visualizing data (e.g., ggplot2)
 - Working with strings (e.g., stringr) or dates (e.g., lubridate)

All these packages follow the same logic and syntax, so once you learn one, the others feel familiar too.
For additional information, visit the <a href="https://www.tidyverse.org/">tidyverse info page</a>.

We will be using packages from the tidyverse later today and tomorrow.

### Install Package
To install a package we use the function `install.packages()`. 

```{r}
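# Remove the # below and run this line once to install the tidyverse.
# It is left as a comment so the package is not reinstalled every time the document is knitted.
#install.packages("tidyverse")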
```
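
Because packages only need to be installed once, one common pattern is to check first and install only what is missing. The sketch below uses base R's `requireNamespace()`, which returns `TRUE` if a package is installed and can be loaded, and `FALSE` otherwise. `needed_pkgs`, `is_installed` and `missing_pkgs` are illustrative names, and the installation line is left commented out.

```{r}
needed_pkgs <- c("stats", "tidyverse")   # packages this analysis needs

# TRUE/FALSE for each package: is it installed and loadable?
is_installed <- vapply(needed_pkgs, requireNamespace, logical(1), quietly = TRUE)

missing_pkgs <- needed_pkgs[!is_installed]
missing_pkgs   # character(0) if everything is already installed

# install.packages(missing_pkgs)   # uncomment to install only the missing ones
```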

### Load Libraries
Packages are stored in libraries (folders on your computer). Once a package is installed, we load it into our session with the function `library()`.

```{r}
library(tidyverse)
```

:::flag
Note that the package name needs to be in quotation marks when installing the package, but not when loading it with `library()`.

Packages only need to be installed once.

Packages need to be loaded with `library()` in each new work session.
:::
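
To check this for yourself, the sketch below loads **tools**, a package that ships with R but is not attached at startup, so nothing needs to be installed. `library()` accepts the name with or without quotation marks (only `install.packages()` requires them), and `search()` lists the packages attached to the current session. After restarting R, `tools` disappears from that list until you call `library()` again.

```{r}
library("tools")   # quoted name: also accepted by library()
library(tools)     # unquoted name, as we did for tidyverse
"package:tools" %in% search()   # TRUE: tools is now attached to this session
```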


Remember the Tidyverse Data Science Workflow? Today we will be focusing on the first two steps:

![](images/day2_workflow.png)
<a href="https://telapps.london.edu/analytics_with_R/tidyverse.html">Source</a>

## Read Data

### Read a csv file
To import a csv file we can use the `read_csv()` function from the **readr** package and assign the result to a new object we will call *js_data*. Storing the data in an object lets us pass it to other functions later on.
```{r}
js_data <- read_csv("data/timeuse_day1_na.csv")
```
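
If you want to try `read_csv()` without our data file, the self-contained sketch below writes a tiny made-up csv file to a temporary location (`tempfile()`) and reads it back. The values and the names `demo_path` and `demo` are purely illustrative. `show_col_types = FALSE` silences the column-type message that `read_csv()` prints; this argument assumes readr version 2.0 or later.

```{r}
library(readr)

# write a small made-up csv file to a temporary location
demo_path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,durSleep",
             "1,1,510",
             "2,2,420"), demo_path)

# read it back; the result is a tibble (the tidyverse version of a data frame)
demo <- read_csv(demo_path, show_col_types = FALSE)
demo
```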

### Read Other Formats
In our example, the data are stored in a csv file. The **readr** package from the tidyverse can also read other plain-text formats with functions such as `read_tsv()` (tab-separated values), `read_delim()` (files with any delimiter you specify, including commas and tabs), `read_table()` (whitespace-separated columns) and `read_log()` (web log files).

There are other functions and packages that allow us to import different file types. 

**File Type** |**Function** | **Package**
|:------|:------|:-------|
| .csv | `read_csv()` | readr |
| .xlsx | `read.xlsx()` | xlsx |
| .sav | `read_sav()` | haven |
| .sas7bdat, .sas7bcat | `read_sas()` | haven |
| .dta | `read_dta()` | haven |
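
As an illustration of the readr functions mentioned above, `read_delim()` takes a `delim` argument, so it can read files that use other separators. The sketch assumes a made-up semicolon-separated file written to a temporary location; `semi_path` and `semi` are illustrative names.

```{r}
library(readr)

semi_path <- tempfile(fileext = ".txt")
writeLines(c("id;province",
             "1;46",
             "2;59"), semi_path)

# delim tells read_delim() which character separates the columns
semi <- read_delim(semi_path, delim = ";", show_col_types = FALSE)
semi
```
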

## Listing Column Names
To ask for a list of all the column names in our dataset we can use the `names()` function.
```{r}
names(js_data)
```

Notice that the original column names don't clearly describe what each variable is. We will rename the columns later in this tutorial to make the data easier to work with.

## Head Function
The `head()` function displays the first rows of the dataset (six by default). When the data are stored as a tibble, as they are after `read_csv()`, the printout also shows the data type assigned to each column. You will learn more about data types tomorrow.

```{r, data-isolation, results = FALSE}
head(js_data)
```

```{r, echo = FALSE, message = FALSE, warning=FALSE}
library(kableExtra)
head(js_data) |>
  kbl() |>
  kable_paper() |>
  scroll_box(width = "500px", height = "200px")
```
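
`head()` shows six rows by default; its `n` argument changes that, and `tail()` works the same way from the bottom of the dataset. The sketch below uses `mtcars`, a small example dataset that ships with R, so it runs even without our data. With our dataset the equivalent would be, for example, `head(js_data, n = 10)`.

```{r}
head(mtcars, n = 3)   # first 3 rows instead of the default 6
tail(mtcars, n = 2)   # last 2 rows
nrow(head(mtcars))    # 6: the default number of rows
```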

## Viewing Data
To view the full dataset we use the `View()` function. This opens the dataset in a spreadsheet-style viewer (in RStudio, as a new tab next to your scripts).
```{r, eval=FALSE}
View(js_data)
```


## Change Column Names
We mentioned earlier that we want column names that better describe the content of each variable. To change column names we can use the function `rename()`.

The function `rename()` comes from the **dplyr** package, one of the packages installed and loaded with the tidyverse.

:::walkthrough
Type the following code to change the column name from "PUMFID" to "id":
```{r}
js_data <- js_data |>
  rename("id" = "PUMFID")
```
Did it work?
:::
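
In `rename()` the new name goes on the left of `=` and the current name on the right. The sketch below uses a tiny made-up tibble, `toy`, so you can experiment without touching *js_data*. It also shows that the column names do not have to be quoted.

```{r}
library(dplyr)

toy <- tibble(PUMFID = c(10000, 10001), SEX = c(1, 1))   # made-up values

toy <- toy |>
  rename(id = PUMFID,   # new name = current name
         sex = SEX)     # unquoted names work as well as quoted ones

names(toy)   # "id" "sex"
```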

## Your Turn!

:::question
Now copy the following code to rename the rest of the columns (click **Show** to see the code).

```{r, class.source = 'fold-hide'}
js_data <- js_data |>
  rename("ageGrp" = "AGEGR10",
         "sex" = "SEX",
         "maritalStat" = "MARSTAT",
         "province" = "PRV",
         "popCenter" = "LUC_RST",
         "eduLevel" = "EHG_ALL",
         "feelRushed" = "GTU_110",
         "extraTime" = "GTU_130",
         "durSleep" = "DUR01",
         "durMealPrep" = "DUR05",
         "durEating" = "DUR06",
         "durAlone" = "DURS200",
         "durDriving" = "DURL313",
         "durWork" = "DUR08",
         "durShoolSite" = "DUR13",
         "durSchoolOnline" = "DUR14",
         "durStudy" = "DUR15",
         "mainStudy" = "MRW_20",
         "mainJobHunting" = "MRW_30",
         "mainWork" = "MRW_40",
         "worked12m" = "MRW_D40A",
         "workedWeek" = "MRW_D40B",
         "enrollStat" = "EDM_02",
         "dailyTexts" = "TST_01",
         "timeSlowDown" = "TCS_110",
         "timeWorkaholic" = "TCS_120",
         "timeNotFamFriends" = "TCS_150",
         "timeWantAlone" = "TCS_200")
```

:::
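
An alternative some people prefer for long renames is to keep the name pairs in a named character vector (new name = current name) and pass it through `all_of()`, a selection helper available in dplyr. A minimal sketch with a made-up one-row tibble and an illustrative `lookup` vector:

```{r}
library(dplyr)

lookup <- c(id = "PUMFID", ageGrp = "AGEGR10")   # new name = current name

toy2 <- tibble(PUMFID = 10000, AGEGR10 = 5)      # made-up one-row example
toy2 <- toy2 |> rename(all_of(lookup))

names(toy2)   # "id" "ageGrp"
```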

:::question
Use the function `names()` to display the new column names of *js_data*.
```{r}
names(js_data)
```
:::


## Save your work
Saving in R's own format (RData) preserves the data types and metadata assigned to the dataset. 
The plain-text csv format is better for sharing the data, because almost any software can open it.

```{r}
save(js_data, file="data/timeuse_day2.RData")
```

```{r}
write_csv(js_data, file="data/timeuse_day2.csv")
```
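
The difference between the two formats can be checked with a small round trip. This sketch uses a made-up data frame, `demo_df`, with a factor column (a data type you will meet tomorrow) and writes both files to temporary locations, so your `data` folder is untouched.

```{r}
library(readr)

demo_df <- data.frame(sex = factor(c("Male", "Female")),   # made-up values
                      durSleep = c(510, 420))

rdata_path <- tempfile(fileext = ".RData")
csv_path   <- tempfile(fileext = ".csv")
save(demo_df, file = rdata_path)
write_csv(demo_df, file = csv_path)

# load() restores objects under their original names; local() keeps this copy separate
from_rdata <- local({
  load(rdata_path)
  demo_df
})
from_csv <- read_csv(csv_path, show_col_types = FALSE)

class(from_rdata$sex)   # "factor": the data type survived
class(from_csv$sex)     # "character": the csv file stores only the text
```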

## Upload to OSF

At the end of each work session, remember to save your data as .RData and .csv, and also your RMarkdown file (.Rmd). We will upload those files to OSF.

![](images/osf/osfUpload.gif)



