FJUEL workshop
Dynamic documents in R: Introduction to Quarto

Session 2: Exploring data frames, creating tables and graphs

Session 2: Learning goals

  • Learn the basics of working with data frames in R
  • Understand what piping means
  • Create tables using the {dplyr} package
  • Create graphs using {ggplot2}

Action: Explore PhDPublications dataset

Get data

  • Install the AER package (that’s where the data are)
install.packages("AER")

Load the packages AER and tidyverse:

library(AER)
library(tidyverse)
  • Run the following code to load the data set:
data(PhDPublications)
  • The data set now appears as an object in your workspace (Environment pane, top right)

Save typing

  • Assign the data frame to a new object called d (this makes a copy)
  • Assignment operator:
    • <-
    • shortcut: Alt + - (Mac: Option + -)
d <- PhDPublications
  • RStudio also offers code completion: Tab key
  • Try it: type Ph and press the Tab key
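To see what "copy" means here, the sketch below changes a value in a copy and checks that the original data set is untouched. R uses copy-on-modify: after assignment both names point to the same data, and an actual copy is made only when one of them is changed. The object name d2 is hypothetical and used only for this illustration, so your d stays as it is.

```r
library(AER)
data(PhDPublications)

# d2 is a hypothetical name used only for this illustration
d2 <- PhDPublications
d2$articles[1] <- 99L          # modify the first value in the copy only

d2$articles[1]                 # 99
PhDPublications$articles[1]    # still 0: the original is unchanged
```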

Inspect contents of data frame

  • Structure (variable names, types, first values): str()
  • First few rows: head()
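A minimal sketch of both functions applied to d; dim() is added here as a quick way to get the number of rows and columns (915 and 6, matching the str() output).

```r
library(AER)
data(PhDPublications)
d <- PhDPublications

str(d)       # variable names, types and first values
head(d)      # first 6 rows (the default)
head(d, 3)   # first 3 rows
dim(d)       # rows and columns: 915 6
```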

  • Variables
    • articles: number of articles published during the last 3 years of the PhD
    • gender: male or female
    • married: no or yes
    • kids: number of children younger than 6
    • prestige: prestige of the graduate program
    • mentor: number of articles published by the mentor

Output of str(d)

'data.frame':   915 obs. of  6 variables:
 $ articles: int  0 0 0 0 0 0 0 0 0 0 ...
 $ gender  : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 2 1 1 2 ...
 $ married : Factor w/ 2 levels "no","yes": 2 1 1 2 1 2 1 2 1 2 ...
 $ kids    : int  0 0 0 1 0 2 0 2 0 0 ...
 $ prestige: num  2.52 2.05 3.75 1.18 3.75 ...
 $ mentor  : int  7 6 6 3 26 2 3 4 6 0 ...
 - attr(*, "datalabel")= chr "Academic Biochemists / S Long"
 - attr(*, "time.stamp")= chr "30 Jan 2001 10:49"
 - attr(*, "formats")= chr [1:6] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
 - attr(*, "types")= int [1:6] 98 98 98 98 102 98
 - attr(*, "val.labels")= chr [1:6] "" "sexlbl" "marlbl" "" ...
 - attr(*, "var.labels")= chr [1:6] "Articles in last 3 yrs of PhD" "Gender: 1=female 0=male" "Married: 1=yes 0=no" "Number of children < 6" ...
 - attr(*, "version")= int 6
 - attr(*, "label.table")=List of 6
  ..$ marlbl: Named num [1:2] 0 1
  .. ..- attr(*, "names")= chr [1:2] "Single" "Married"
  ..$ sexlbl: Named num [1:2] 0 1
  .. ..- attr(*, "names")= chr [1:2] "Men" "Women"
  ..$       : NULL
  ..$       : NULL
  ..$       : NULL
  ..$       : NULL

Structure your Quarto document

  • Use headings
    • ### R setup (load packages)
    • ### Data (load data)
    • ### Initial data analysis (tables, graphs)
  • Keyboard shortcut to insert a code chunk:
    • Ctrl + Alt + I (Mac: Command + Option + I)

Create tables

R package: dplyr

  • Part of the tidyverse
  • Piping
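    • |> (the native pipe, available since R 4.1)
    • Read it as “and then”: the result on the left is passed as the first argument to the function on the right
    • Shortcut: Ctrl + Shift + M (Mac: Cmd + Shift + M); to get |> instead of %>%, tick “Use native pipe operator” under Tools > Global Options > Code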
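A short check of what the pipe does: x |> f(y) is the same call as f(x, y). The sketch compares a piped and a nested call and shows how chaining reads left to right. It assumes R 4.1 or later, where |> is built in.

```r
library(AER)
data(PhDPublications)
d <- PhDPublications

# The left-hand side becomes the first argument of the function on the right
a <- d |> head(3)
b <- head(d, 3)
identical(a, b)            # TRUE

# Chaining: take d, and then the first 3 rows, and then count them
d |> head(3) |> nrow()     # 3
nrow(head(d, 3))           # same result, written inside out
```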

How many male and female PhD students?

  • Useful functions
    • group_by(): split the data set into groups
    • summarise() (or summarize()): compute summaries for each group
    • n(): count the rows in each group
d |>
  group_by(gender) |> 
  summarise(
    N = n())
# A tibble: 2 × 2
  gender     N
  <fct>  <int>
1 male     494
2 female   421
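For simple counts, dplyr also offers count(), which combines group_by() and summarise(n = n()) in one step. A sketch; note that the count column is named n rather than N.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

# Same counts as the group_by() + summarise() version above
tab_gender <- d |>
  count(gender)
tab_gender
```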

Task

  • Create tables to answer the following questions
  • How many PhD students…
    • … are married?
    • … have no kids?
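One possible solution, best tried on your own first. The married table follows the same pattern as the gender table; for "no kids" the sketch groups by a logical condition, so the row with no_kids = TRUE counts students without children. The column name no_kids is an illustrative choice.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

# Married or not: same pattern as for gender
tab_married <- d |>
  group_by(married) |>
  summarise(N = n())
tab_married

# No kids: group by a condition (TRUE = no children under 6)
tab_nokids <- d |>
  group_by(no_kids = kids == 0) |>
  summarise(N = n())
tab_nokids
```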

Number of articles by number of kids

  • Summary functions: mean(), median(), sd(), max()
d |> 
  group_by(kids) |> 
  summarise(
    N = n(),
    mean_pubs = mean(articles),
    median_pubs = median(articles),
    sd_pubs = sd(articles),
    max_pubs = max(articles))
# A tibble: 4 × 6
   kids     N mean_pubs median_pubs sd_pubs max_pubs
  <int> <int>     <dbl>       <dbl>   <dbl>    <int>
1     0   599     1.72            1   1.93        19
2     1   195     1.76            1   2.05        12
3     2   105     1.54            1   1.74        11
4     3    16     0.812           1   0.911        3
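Summary tables can be extended with mutate() after summarise(). The sketch adds the share of students in each kid group and rounds the mean for display; the column name share is an illustrative choice. With a single grouping variable, summarise() drops the grouping, so sum(N) is the total of 915.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

tab_kids <- d |>
  group_by(kids) |>
  summarise(
    N = n(),
    mean_pubs = mean(articles)) |>
  # the result is no longer grouped, so sum(N) is the total
  mutate(
    share = N / sum(N),               # proportion of students per number of kids
    mean_pubs = round(mean_pubs, 2))  # round for display
tab_kids
```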

Create graphs

R package ggplot2

  • Part of the tidyverse
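  • Based on the “Grammar of Graphics”: a plot is built in layers joined with +, starting from data and aesthetic mappings (aes()) and adding a geometry (e.g. geom_bar(), geom_histogram())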

How many PhD students have 0, 1, 2 or 3 kids?

d |> ggplot(aes(x = kids)) + 
    geom_bar()

How many PhD students are married?

d |> ggplot(aes(x = married)) + 
    geom_bar()
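A second variable can be mapped to colour. The sketch fills the bars by gender and uses position = "dodge" to place the bars side by side instead of stacking them.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

# fill = gender colours the bars; "dodge" puts them next to each other
p_married <- d |>
  ggplot(aes(x = married, fill = gender)) +
  geom_bar(position = "dodge")
p_married
```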

Distribution: Number of articles published by mentor

d |> ggplot(aes(x = mentor)) + 
    geom_histogram()
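By default geom_histogram() uses 30 bins and prints a message suggesting a better binwidth. The sketch sets binwidth directly; the value 5 is an arbitrary choice, so try others.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

# Each bar covers a range of 5 mentor articles
p_mentor <- d |>
  ggplot(aes(x = mentor)) +
  geom_histogram(binwidth = 5)
p_mentor
```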

Explore themes

d |> ggplot(aes(x = mentor)) + 
    geom_histogram() +
    theme_minimal()
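A sketch for comparing themes: store the plot once, add readable axis labels with labs(), and then add different complete themes that ship with ggplot2. The label texts and binwidth are illustrative choices.

```r
library(AER)
library(tidyverse)
data(PhDPublications)
d <- PhDPublications

p_base <- d |>
  ggplot(aes(x = mentor)) +
  geom_histogram(binwidth = 5) +
  labs(x = "Articles published by mentor", y = "Number of PhD students")

p_base + theme_minimal()
p_base + theme_bw()
p_base + theme_classic()
```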

Session 2: Practice

Time for practice! The tasks are available here (there is also a link on my website).