Session 1: R project setup and Quarto documents

Reproducibility
Your data analysis project should be self-contained, which means that everything that is needed to reproduce the results is stored in the project folder.
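One practical consequence is that files are referred to by paths relative to the project folder, not by absolute paths that exist on only one computer. The following is a minimal sketch of that idea. The folder name `my_project`, the subfolder `data` and the file `example.csv` are hypothetical, and a temporary directory stands in for the real project folder.

```r
# Stand-in for the project folder (hypothetical; in practice this is
# the folder that contains your RStudio project)
project_dir <- file.path(tempdir(), "my_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# A path relative to the project folder: valid on any computer that has the folder
rel_path <- file.path("data", "example.csv")

# Save a small made-up table inside the project and read it back
toy <- data.frame(id = 1:3, value = c(2.5, 4.0, 1.5))
write.csv(toy, file.path(project_dir, rel_path), row.names = FALSE)
toy_again <- read.csv(file.path(project_dir, rel_path))
```

Because everything is stored under the project folder, copying that one folder to another machine should be enough to rerun the code.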



Markdown formatting in Quarto documents
- ![]() inserts an image
- **bold** and *italics*

R basics
- install.packages() installs a package, library() loads it
- ## R setup, where you load the packages you need
- <- assigns a value to an object (shortcut: Alt + -)
- function(argument = "...")
- str()
- head()
- mean(), median()

Exercise: the PhDPublications dataset
- Load the AER package (that's where the data is)
- Assign the dataset to d (= copy it) with <-: type Ph and hit the Tab key
- Inspect d with str() and head()

Variables
- articles: # articles published during last 3 years of PhD
- gender
- married
- kids: # of children less than 6 years old
- prestige: prestige of graduate program
- mentor: # articles published by mentor

Output of str(d):

'data.frame': 915 obs. of 6 variables:
$ articles: int 0 0 0 0 0 0 0 0 0 0 ...
$ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 2 1 1 2 ...
$ married : Factor w/ 2 levels "no","yes": 2 1 1 2 1 2 1 2 1 2 ...
$ kids : int 0 0 0 1 0 2 0 2 0 0 ...
$ prestige: num 2.52 2.05 3.75 1.18 3.75 ...
$ mentor : int 7 6 6 3 26 2 3 4 6 0 ...
- attr(*, "datalabel")= chr "Academic Biochemists / S Long"
- attr(*, "time.stamp")= chr "30 Jan 2001 10:49"
- attr(*, "formats")= chr [1:6] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
- attr(*, "types")= int [1:6] 98 98 98 98 102 98
- attr(*, "val.labels")= chr [1:6] "" "sexlbl" "marlbl" "" ...
- attr(*, "var.labels")= chr [1:6] "Articles in last 3 yrs of PhD" "Gender: 1=female 0=male" "Married: 1=yes 0=no" "Number of children < 6" ...
- attr(*, "version")= int 6
- attr(*, "label.table")=List of 6
..$ marlbl: Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Single" "Married"
..$ sexlbl: Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Men" "Women"
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
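The steps above can be sketched in a few lines of R. This is only an illustration: the vector `pubs` is made up, and the real data are loaded only if the AER package is already installed (otherwise run `install.packages("AER")` once first).

```r
# Assignment with <- (shortcut in RStudio: Alt + -)
pubs <- c(0, 1, 1, 3, 5)   # made-up publication counts, for illustration only

# Calling functions: function(argument = value)
mean_pubs   <- mean(x = pubs)     # arithmetic mean
median_pubs <- median(x = pubs)   # middle value of the sorted counts

# Load the real data only if the AER package is installed
if (requireNamespace("AER", quietly = TRUE)) {
  data("PhDPublications", package = "AER")  # makes PhDPublications available
  d <- PhDPublications                      # d is a copy of the dataset
  str(d)                                    # structure: variables and their types
  print(head(d))                            # first six rows
}
```

Keeping the copy in `d` leaves the original dataset untouched while you work on it.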
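Structure of the Quarto document
- ### R setup (load packages)
- ### Data (load data)
- ### Descriptive analysis (tables, graphs)

Descriptive tables with dplyr (part of the tidyverse)
- |> passes the object on its left to the function on its right
- group_by() divides the dataset into subsets
- summarize() summarizes each subset
- n() counts how many observations there are
- mean(), median(), sd(), max()

Example:

    library(dplyr)  # provides group_by(), summarise() and n()

    d |>
      group_by(kids) |>   # one subset per number of young children
      summarise(
        N = n(),
        mean_pubs = mean(articles),
        median_pubs = median(articles),
        sd_pubs = sd(articles),
        max_pubs = max(articles))

    # A tibble: 4 x 6
       kids     N mean_pubs median_pubs sd_pubs max_pubs
      <int> <int>     <dbl>       <dbl>   <dbl>    <int>
    1     0   599     1.72            1   1.93        19
    2     1   195     1.76            1   2.05        12
    3     2   105     1.54            1   1.74        11
    4     3    16     0.812           1   0.911        3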
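Grouping and summarizing follows a split, apply, combine pattern: split the rows by a variable, apply a function to each subset, and combine the results into one table. As a conceptual illustration only, the sketch below does the same in base R with tapply(). This is a swapped-in base R technique, not what dplyr runs internally, and the `toy` data frame is made up.

```r
# Made-up toy data (not the PhDPublications values)
toy <- data.frame(
  kids     = c(0, 0, 0, 1, 1, 2),
  articles = c(2, 0, 4, 1, 3, 5)
)

# tapply() splits articles by kids and applies a function to each subset
n_by_kids    <- tapply(toy$articles, toy$kids, length)  # plays the role of n()
mean_by_kids <- tapply(toy$articles, toy$kids, mean)    # mean per subset
max_by_kids  <- tapply(toy$articles, toy$kids, max)     # maximum per subset

# Combine the per-group results into one summary table
summary_table <- data.frame(
  kids      = as.numeric(names(n_by_kids)),
  N         = as.vector(n_by_kids),
  mean_pubs = as.vector(mean_by_kids),
  max_pubs  = as.vector(max_by_kids)
)
print(summary_table)
```

The dplyr version expresses the same idea in one readable pipeline, which is why it is usually more convenient for descriptive tables.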
Graphs with ggplot2 (part of the tidyverse)
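As a hedged sketch of what a first ggplot2 graph could look like, the example below plots the mean number of articles by number of young children, using the four group means from the summary table above. The bar chart and the axis labels are illustrative choices, and the plot is drawn only if ggplot2 is installed.

```r
# Group means copied from the summary table (mean_pubs by kids)
pubs_by_kids <- data.frame(
  kids      = c(0, 1, 2, 3),
  mean_pubs = c(1.72, 1.76, 1.54, 0.812)
)

# Draw the plot only if ggplot2 is installed (install.packages("ggplot2"))
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(pubs_by_kids, aes(x = factor(kids), y = mean_pubs)) +
    geom_col() +                                   # one bar per group
    labs(x = "Children under 6", y = "Mean number of articles")
  print(p)
}
```

A ggplot2 graph is built in layers: the data and the aesthetic mapping come first, then a geom that decides how the values are drawn, then labels.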