Session 1: Practice

Getting help

To understand how functions work, you can refer to the built-in R help desk. For example, to learn more about the function mean(), type this into the console (lower left panel), then hit Enter:

?mean
  • What does the argument trim = do?
  • The argument na.rm = talks about “NA values”. Make sure you understand what NA means! If necessary, google it (“r na values”)
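To make the idea of NA concrete, here is a minimal sketch using a made-up vector (the name scores and its values are purely illustrative and not part of any data set). It shows how a single missing value affects mean(), and what na.rm = TRUE changes.

```r
# Hypothetical test scores; NA marks a missing observation
scores <- c(2, 4, NA, 6)

# is.na() flags which entries are missing
is.na(scores)

# With a missing value present, the mean is itself unknown (NA)
m_with_na <- mean(scores)
m_with_na

# na.rm = TRUE drops the NA first, so the mean is taken over 2, 4 and 6
m_without_na <- mean(scores, na.rm = TRUE)
m_without_na   # 4
```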

Working with data frames

In R, data tables are called data frames. There are several built-in datasets, which can be loaded easily using the function data(). Let’s look at one which records vocabulary test scores based on a simple 10-word test.

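The data set is available through the package {car} (in current versions it lives in the companion package {carData}, which is loaded automatically along with {car}), so we need to install that package first. Then we load it.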

install.packages("car")
library(car)

Now we are ready to load the Vocab data set:

data(Vocab)

It should appear as an object in the top right panel (“Environment”).

Find out more about the data set using R’s help desk:

?Vocab
  • Inspect the contents of the data frame.
    • Each row is a participant - how many are there?
    • How many variables (columns) are there?
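One possible way to inspect a data frame is sketched below on a small made-up example (toy is a hypothetical data frame invented for illustration; its column names merely mirror the variables mentioned in this worksheet). The same functions can be applied to any data frame.

```r
# A hypothetical five-row data frame, for illustration only
toy <- data.frame(
  year       = c(2004, 2004, 2006, 2006, 2008),
  sex        = c("Female", "Male", "Female", "Female", "Male"),
  education  = c(12, 16, 10, 12, 14),
  vocabulary = c(6, 9, 4, 7, 8)
)

n_rows <- nrow(toy)   # number of rows
n_cols <- ncol(toy)   # number of columns (variables)
dim(toy)              # rows and columns in one call
names(toy)            # column names
str(toy)              # compact overview: type and first values of each column
head(toy, 3)          # first three rows
```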

You can access specific columns (i.e. variables) in data frames using the dollar sign ($). For instance, by typing Vocab$education you can access the variable education. Try typing the following, which prints the first 1,000 entries of the column vocabulary:

Vocab$vocabulary
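Printing a whole column can flood the console. As a rough sketch, again on a hypothetical data frame called toy, the extracted column can be stored in an object and only its first few entries shown with head().

```r
# Hypothetical data frame, for illustration only
toy <- data.frame(
  education  = c(12, 16, 10, 12),
  vocabulary = c(6, 9, 4, 7)
)

# $ extracts one column as a plain vector
edu <- toy$education

head(edu, 3)    # only the first three entries
length(edu)     # how many entries the column has

# Double square brackets with the column name give the same vector
same_column <- identical(toy$education, toy[["education"]])
same_column
```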

Make sure you make use of the code completion functionality in RStudio. When typing Vocab, type Vo and then hit the Tab key, which opens a drop-down menu offering several options for code completion. Use the arrow keys to select one. Similarly, when typing Vocab$, RStudio knows you are looking for a column in the data frame, so again use the Tab key to save time.

You can now apply functions to these variables. To obtain the mean number of years of education, for instance, use:

mean(Vocab$education)
  • Add text to the Quarto notebook explaining what the code does and what the output is.

  • Render the document and make sure everything looks OK.

Tasks:

  • Find the median number of years of education in the sample.
  • Find the maximum and minimum.
  • Find out what the function range() does.
  • Find the interquartile range, which is a measure of variation. Use Google for help.
  • Find the mean and median vocabulary score.
  • Find the standard deviation of the vocabulary scores.
  • Add text to the Quarto notebook explaining what the code does and what the output is.
  • Render the document and make sure everything looks OK.
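If you get stuck, the sketch below shows the kind of functions involved, applied to a small made-up vector rather than to the Vocab data (the name x and its values are illustrative assumptions). The comments give the results for this toy vector only.

```r
# Hypothetical values, for illustration only
x <- c(2, 4, 4, 5, 7, 9, 11)

x_mean   <- mean(x)     # 6
x_median <- median(x)   # 5
x_min    <- min(x)      # 2
x_max    <- max(x)      # 11
x_range  <- range(x)    # 2 11
x_iqr    <- IQR(x)      # 4 (upper quartile 8 minus lower quartile 4)
x_sd     <- sd(x)       # square root of 10, about 3.16
```

Swapping x for a column such as Vocab$education applies the same calculation to the real data.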

Additional tasks (use Google for help)

  • Find the lower and upper quartile for the variable years of education.
  • Use the function table() to see the distribution of
    • Sex: Are there more male or more female participants?
    • Year: What pattern do you notice across years?
    • Years of education: Which value is the most frequent one? Why?
  • Render the document and make sure everything looks OK.
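As a starting point for these tasks, here is a minimal sketch of quantile() and table() on made-up data (the vector education and its values are hypothetical). quantile() is used here with R's default method; other quartile definitions can give slightly different numbers.

```r
# Hypothetical years of education, for illustration only
education <- c(8, 10, 12, 12, 12, 14, 16)

# Lower and upper quartile in one call
quartiles <- quantile(education, probs = c(0.25, 0.75))
quartiles   # 25%: 11, 75%: 13

# table() counts how often each distinct value occurs
edu_counts <- table(education)
edu_counts

# The most frequent value is the name of the largest count
most_common <- names(which.max(edu_counts))
most_common   # "12"
```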