To submit this assignment, upload the full document to Blackboard, including the original questions, your code, and the output. Submit your assignment as a knitted .pdf (preferred) or .html file.

  1. Variable assignment (1 mark)

    1. Assign the value 5 to the variable/object a. Display a. (0.25 marks)

    2. Assign the result of 10/3 to the variable b. Display b. (0.25 marks)

    3. Write a function that adds two numbers and returns their sum. Use it to assign the sum of a and b to result. Display result. (In practice, there is already a more sophisticated built-in function for this: result <- sum(a, b)) (0.25 marks)

    4. Write a function that multiplies two numbers and returns their product. Use it to assign the product of a and b to product. Display product. (In practice, there is already a more sophisticated built-in function for this: product <- prod(a, b)) (0.25 marks)

  2. Vectors (1 mark)

    1. Create a vector v with all integers 0-30, and a vector w with every third integer in the same range. (0.25 marks)

    2. What is the difference in lengths of the vectors v and w? (0.25 marks)

    3. Create a new vector, v_square, with the square of elements at indices 3, 6, 7, 10, 15, 22, 23, 24, and 30 from the variable v. Hint: Use indexing rather than a for loop. (0.25 marks)

    4. Calculate the mean and median of the first five values from v_square. (0.25 marks)

  3. Boolean indexing (1 mark)

    1. Create a boolean vector v_bool indicating which elements of v are greater than 20. How many values are greater than 20? Hint: When used in arithmetic, R treats TRUE as 1 and FALSE as 0, so simple arithmetic is enough to find this out. (0.5 marks)

    2. Display the output of v[TRUE]. Explain why you think R outputs this. (0.25 marks) (Note: this is not really something you would ever need to do in practice!)

    3. Use the variable v_bool as an index to extract the elements from v that are bigger than 20. What are the min and max values of this new vector? (0.25 marks)

  4. Data frames (2 marks)

    1. R includes many built-in data frames, and you can find more details about them online. What are the column names of the built-in data frame beaver1? How many observations (rows) and variables (columns) does it have? (0.5 marks)

    2. Display both the first 6 and the last 6 rows of this data frame. Show how to do so both with indexing and with specialized functions. (0.5 marks)

    3. What are the minimum, mean, and maximum body temperatures in this data set? Hint: Remember that each column in a data frame is a vector, so you can use the same functions as in the earlier questions on vectors. (0.5 marks)

    4. Use the summary function to display an overview of the temp column. (0.25 marks)

    5. Use a single instance of the summary function to display an overview of the time and temp columns. (0.25 marks)

  5. Data frames with dplyr (3 marks)

    1. Say we're attempting to calculate the mean temperature in the beaver1 dataset. What is wrong with the following chain of dplyr commands? (0.5 marks)

      beaver1 %>%
          filter( %>%
          summarise(mean_temp = mean(temp))

    2. Use dplyr to randomly sample 20 rows from beaver1. Calculate the mean temperature of this subset. (0.5 marks) Hint: You may want to refer to the dplyr cheatsheet for this.

    3. Using the full beaver1 dataset, calculate the mean temperature for day 346. (0.25 marks) Note: Use the full dataset for this part and for parts 4-6 below.

    4. Rather than using filter() to calculate the mean for each day separately, you can use the more convenient group_by() to aggregate measurements by a categorical value (such as the day column in beaver1). Use this approach to calculate the mean temperature and the mean activity level for each day in the dataset. (0.5 marks)

    5. Explain in words what the average activity level from the calculation above means. Hint: Remember that you can read a description of the columns online. (0.25 marks)

    6. How many observations are there per day in this dataset? (0.25 marks)

    7. How many observations are there per day when the beaver is active outside the retreat? (0.25 marks)

    8. Group the data by both activity level and day of observation. Which variable seems more closely related to high body temperature: activity level or day of measurement? (0.5 marks)
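
As a general reference for the dplyr verbs used in this section, the sketch below applies filter(), slice_sample(), group_by(), summarise(), and n() to the built-in mtcars data frame rather than to beaver1. This shows the syntax without answering the questions above. It assumes dplyr 1.0.0 or later, which provides slice_sample() and the .groups argument of summarise(). The object names high_hp_mpg, sampled, by_cyl, and by_cyl_am are illustrative only.

```r
# Illustrative sketch on the built-in mtcars data frame (not beaver1).
# Assumes dplyr >= 1.0.0, which provides slice_sample() and .groups.
library(dplyr)

# filter() + summarise(): mean mpg of cars with more than 100 horsepower
high_hp_mpg <- mtcars %>%
  filter(hp > 100) %>%
  summarise(mean_mpg = mean(mpg))

# slice_sample(): draw 5 random rows; set.seed() makes the draw reproducible
set.seed(1)
sampled <- mtcars %>%
  slice_sample(n = 5)

# group_by() + summarise(): one summary row per number of cylinders,
# with n() counting how many rows fall into each group
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),
            mean_hp  = mean(hp),
            n_cars   = n())

# Grouping by two variables gives one row per cyl/am combination;
# .groups = "drop" returns an ungrouped result
by_cyl_am <- mtcars %>%
  group_by(cyl, am) %>%
  summarise(mean_mpg = mean(mpg), .groups = "drop")

print(high_hp_mpg)
print(sampled)
print(by_cyl)
print(by_cyl_am)
```

The same pattern applies to any data frame. Only the data frame and column names change.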

This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.