Session 2: Exploring dataframes, creating tables and graphs
{dplyr} package
{ggplot2}
PhDPublications dataset
AER package (that’s where the data are)
Load the packages AER and tidyverse:
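A minimal sketch of the loading step, assuming both packages are already installed (the commented `install.packages()` line is only needed once and is not part of the original slides):

```r
# Install once if needed (assumption: not yet installed on this machine)
# install.packages(c("AER", "tidyverse"))

library(AER)        # contains the PhDPublications dataset
library(tidyverse)  # attaches dplyr, ggplot2 and other tidyverse packages
```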
Save the dataset as d (= copy it) with <-
Tip: type Ph and hit the Tab key to autocomplete the dataset name
Inspect it with str() and head()
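These steps might look as follows. This is a sketch, not necessarily the exact code from the slides: the call to `data()` is an assumption that is harmless if the dataset is already available after loading AER.

```r
library(AER)

data("PhDPublications")   # make the dataset available in the workspace
d <- PhDPublications      # copy it under the shorter name d

str(d)    # structure: number of observations, variable names and types
head(d)   # first six rows of the data frame
```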
articles: # articles published during last 3 years of PhD
gender
married
kids: # of children less than 6 years old
prestige: prestige of graduate program
mentor: # articles published by mentor
'data.frame': 915 obs. of 6 variables:
$ articles: int 0 0 0 0 0 0 0 0 0 0 ...
$ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 2 1 1 2 ...
$ married : Factor w/ 2 levels "no","yes": 2 1 1 2 1 2 1 2 1 2 ...
$ kids : int 0 0 0 1 0 2 0 2 0 0 ...
$ prestige: num 2.52 2.05 3.75 1.18 3.75 ...
$ mentor : int 7 6 6 3 26 2 3 4 6 0 ...
- attr(*, "datalabel")= chr "Academic Biochemists / S Long"
- attr(*, "time.stamp")= chr "30 Jan 2001 10:49"
- attr(*, "formats")= chr [1:6] "%9.0g" "%9.0g" "%9.0g" "%9.0g" ...
- attr(*, "types")= int [1:6] 98 98 98 98 102 98
- attr(*, "val.labels")= chr [1:6] "" "sexlbl" "marlbl" "" ...
- attr(*, "var.labels")= chr [1:6] "Articles in last 3 yrs of PhD" "Gender: 1=female 0=male" "Married: 1=yes 0=no" "Number of children < 6" ...
- attr(*, "version")= int 6
- attr(*, "label.table")=List of 6
..$ marlbl: Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Single" "Married"
..$ sexlbl: Named num [1:2] 0 1
.. ..- attr(*, "names")= chr [1:2] "Men" "Women"
..$ : NULL
..$ : NULL
..$ : NULL
..$ : NULL
### R setup
(load packages)
### Data
(load data)
### Initial data analysis
(tables, graphs)

dplyr (part of the tidyverse)
|> (pipe)
group_by(): divide the dataset
summarize(): summarize subsets
n(): count how many there are
mean(), median(), sd(), max()
d |>
  group_by(kids) |>
  summarise(
    N = n(),
    mean_pubs = mean(articles),
    median_pubs = median(articles),
    sd_pubs = sd(articles),
    max_pubs = max(articles)
  )
# A tibble: 4 × 6
   kids     N mean_pubs median_pubs sd_pubs max_pubs
  <int> <int>     <dbl>       <dbl>   <dbl>    <int>
1     0   599     1.72            1   1.93        19
2     1   195     1.76            1   2.05        12
3     2   105     1.54            1   1.74        11
4     3    16     0.812           1   0.911        3
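The same pattern works for any grouping variable. As an illustration only (the choice of `gender` and the name `by_gender` are assumptions, not taken from the slides), a grouped summary by gender could be sketched like this:

```r
library(AER)
library(dplyr)

data("PhDPublications")
d <- PhDPublications

# One row per gender, with the group size and two summary statistics
by_gender <- d |>
  group_by(gender) |>
  summarise(
    N = n(),
    mean_pubs = mean(articles),
    median_pubs = median(articles)
  )

by_gender
```

Swapping `gender` for `married`, or listing both inside `group_by()`, divides the dataset differently without changing the rest of the pipeline.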
ggplot2 (part of the tidyverse)
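A minimal sketch of two graphs for this dataset, assuming `d` is the copy of PhDPublications. The plot types, axis labels and object names (`p_bar`, `p_box`) are illustrative choices, not necessarily the ones used in the session:

```r
library(AER)
library(ggplot2)

data("PhDPublications")
d <- PhDPublications

# Bar chart: how many observations have 0, 1, 2, ... articles
p_bar <- ggplot(d, aes(x = articles)) +
  geom_bar() +
  labs(x = "Articles in last 3 years of PhD", y = "Count")

# Boxplots: distribution of articles by number of young children
# (kids is an integer, so it is converted to a factor for grouping)
p_box <- ggplot(d, aes(x = factor(kids), y = articles)) +
  geom_boxplot() +
  labs(x = "Children under 6", y = "Articles")

p_bar
p_box
```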
Time for practice! The tasks are available here. (There is also a link on my website.)