Assignment 4: Exploration, linear and mixed-effects models

To submit this assignment, upload the full document on blackboard, including the original questions, your code, and the output. Submit you assignment as a knitted .pdf (prefered) or .html file.

Visualization (3 marks)

Import the tidyverse library. We will be using the same beaver1 dataset that we used in last week’s assignment.
```
library(tidyverse)
```
1. Create a histogram to visualize the distribution of the beavers’ body temperatures, separating the temperature data based on the beaver’s activity level. (after transforming it into a categorical variable the way you did for your last assignment). Describe the properties of the distribution. When creating this plot for the purpose of evaluating temperature, what argument did you adjust and why? (1 mark)
2. What type of variables are temperature and time of day? With this in mind, create a visualization that will help you get a better understanding of the relationship between these variables. (0.5 mark)
3. Create a single box plot to simultaneously visualise temperature, activity, and day. (0.5 mark)
4. What is one prediction you might make about the relationships among your variables (based on the patterns you observed)? Create a visualization that illustrates your prediction, improving on your previous plots in at least one way. State why this plot is an improvement. (1 mark)
Outliers (2 marks)
1. In the beaver1 dataset, there are some particularly high/low body temperature measurements. Give an example of a systematic or random error (state which) that could have influenced these values. (0.5 marks)
2. Consider whether these values would affect your ability to test whether temperature varies by activity level. You should generate plots and/or perform statistical tests with and without these points, and make an informed decision about whether they should be kept or dropped (Hint: you may want to either create a second data set or get creative with colour.) State whether you would remove the points and why. (1.5 marks)
Linear models (3 marks)

Run the following code to load the CO2 dataset.

```r
co2_df <- as_data_frame(as.matrix(CO2)) %>% 
    mutate(conc = as.integer(conc),
           uptake = as.numeric(uptake))
```

```
## Warning: `as_data_frame()` is deprecated, use `as_tibble()` (but mind the new semantics).
## This warning is displayed once per session.
```

a. Look through the help documentation (?CO2) to understand what each
variable means. Imagine you were running a statistical model to assess the
effects of chilling on plant CO2 uptake. What would the $y$ and $x$
variables be in such a model? What about if you were trying to assess the
relationship between ambient CO~2~ concentrations and plant uptake? Briefly
defend these choices. (1 mark)

b. How much does `uptake` change if `conc` goes up by 10 mL/L? Write out the
interpretation as a simple statement of this contribution of `conc` on
`uptake`. How much CO2 would you predict plants to uptake if atmospheric
concentrations were 2,450 mL/L?. Show your work. (2 marks)

Linear mixed-effects models (4 marks).

Santangelo et al. (2018) were interested in understanding how plant defenses, herbivores, and pollinators influence the expression of plant floral traits (e.g. flower size). Their experiment had 3 treatments, each with 2 levels: Plant defense (2 levels: defended vs. undefended), herbivory (2 levels: reduced vs. ambient) and pollination (2 levels: open vs. supplemental). These treatments were fully crossed for a total of 8 treatment combinations. In each treatment combination, they grew 4 individuals from each of 25 plant genotypes for a total of 800 plants (8 treatment combinations x 25 genotypes x 4 individuals per genotype). Plants were grown in a common garden at the Koffler Scientific Reserve (UofT’s field research station) and 6 floral traits were measured on all plants throughout the summer. We will analyze how the treatments influenced one of these traits in this exercise. Run the code chunk below to download the data, which includes only a subset of the columns from the full dataset:
```
library(tidyverse)

plant_data <- "https://uoftcoders.github.io/rcourse/data/Santangelo_JEB_2018.csv"
download.file(plant_data, "Santangelo_JEB_2018.csv")
plant_data <- read_csv("Santangelo_JEB_2018.csv", 
                       col_names = TRUE)
```
```
## Parsed with column specification:
## cols(
##   Genotype = col_character(),
##   Pollination = col_character(),
##   Herbivory = col_character(),
##   HCN = col_character(),
##   Flower.date = col_double(),
##   Avg.Bnr.Wdth = col_double(),
##   Avg.Bnr.Ht = col_double(),
##   Biomass.plant = col_double(),
##   Num.flwrs = col_double(),
##   Total.Inf = col_double(),
##   Block = col_character()
## )
```
```
glimpse(plant_data)
```
```
## Observations: 792
## Variables: 11
## $ Genotype      <chr> "173-1", "173-1", "173-1", "173-1", "173-1", "173-1", "…
## $ Pollination   <chr> "Open", "Open", "Open", "Supp", "Supp", "Supp", "Open",…
## $ Herbivory     <chr> "Reduced", "Ambient", "Reduced", "Ambient", "Ambient", …
## $ HCN           <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",…
## $ Flower.date   <dbl> 86, 23, 12, 25, 16, 38, 22, 61, 32, 25, 30, 33, 54, 23,…
## $ Avg.Bnr.Wdth  <dbl> 3.32, NA, 3.02, 3.14, 2.76, NA, 3.11, 3.07, 3.16, 3.16,…
## $ Avg.Bnr.Ht    <dbl> 6.50, NA, 7.02, 6.03, 5.20, NA, 5.69, 6.18, 6.04, 5.94,…
## $ Biomass.plant <dbl> 9.27, 8.34, 7.74, 15.76, 4.16, 7.75, 48.67, 10.32, 28.8…
## $ Num.flwrs     <dbl> 52.5, 67.0, 57.5, NA, NA, 47.5, 60.0, 58.5, 66.5, 64.5,…
## $ Total.Inf     <dbl> 4, 1, 21, 9, 0, 15, 41, 19, 17, 23, 129, 46, 35, 35, 22…
## $ Block         <chr> "E", "A", "E", "A", "A", "A", "D", "F", "C", "C", "A", …
```
```
head(plant_data)
```
```
## # A tibble: 6 x 11
##   Genotype Pollination Herbivory HCN   Flower.date Avg.Bnr.Wdth Avg.Bnr.Ht
##   <chr>    <chr>       <chr>     <chr>       <dbl>        <dbl>      <dbl>
## 1 173-1    Open        Reduced   Yes            86         3.32       6.5 
## 2 173-1    Open        Ambient   Yes            23        NA         NA   
## 3 173-1    Open        Reduced   Yes            12         3.02       7.02
## 4 173-1    Supp        Ambient   Yes            25         3.14       6.03
## 5 173-1    Supp        Ambient   Yes            16         2.76       5.2 
## 6 173-1    Supp        Ambient   Yes            38        NA         NA   
## # … with 4 more variables: Biomass.plant <dbl>, Num.flwrs <dbl>,
## #   Total.Inf <dbl>, Block <chr>
```
You can see that the data contain 792 observations (i.e. plants, 8 died during the experiment). There are 50 genotypes across 3 treatments: Herbivory, Pollination, and HCN (i.e. hydrogen cyanide, a plant defense). There are 6 plant floral traits: Number of days to first flower, banner petal length, banner petal width, plant biomass, number of flowers, and number of inflorescences. Finally, since plants that are closer in space in the common garden may have similar trait expression due to more similar environments, the authors included 6 spatial “blocks” to account for this environmental variation (i.e. Plant from block A “share” an environment and those from block B “share” an environment, etc.). Also keep in mind that each treatment combination contains 4 individuals of each genotype, which are likely to have similar trait expression due simply to shared genetics.
1. Use the lme4 and lmerTest R packages to run a linear mixed-effects model examining how herbivores (Herbivory), Pollinators (Pollination), plant defenses (HCN) and all interactions influences the width of banner petals (Avg.Bnr.Wdth) produced by plants while accounting for variation due to spatial block and plant genotype. Also allow the intercept for Genotype to vary across the levels of the herbivory treatment. (1 mark: 0.5 for correct fixed effects specification and 0.5 for correct random effects structure). You only need to specify the model for this part of the question.
2. Summarize (i.e. get the output) the model that you ran in part (a). Did any of the treatments have a significant effect on banner petal length? If so, which ones? Based on your examination of the model output, how can you tell which level of the significant treatments resulted in longer or shorter mean banner petal widths? Make a statement for each significant main effects in the model (i.e. not interactions). If none of the main effects are significant, then simply write “there are no significant main effects in the model” (0.5 marks).
3. Using dplyr and gglot2, plot the mean banner width for one of the significant interactions in the model above (whichever you choose). The idea is to show how both treatments interact to influence the mean length of banner petals using a combination of different colours, linetypes, shapes, etc. on the same plot (i.e., no faceting). Avoid overlapping points in the figure. Also include error bars/bands with one standard error around the mean. As a reminder, I have included the formula to calculate the standard error of the mean below. (1.5 marks).
\[ SE = \frac{sd}{\sqrt{n}} \]
1. After accounting for the fixed effects, what percentage of the variation in banner petal width was explained by each of the random effects in the model? Show yor work. (0.5 marks).
2. Descibe the pattern you see in the figure generated in part (c). Why do you think the interaction you plotted was significant in the model? (0.5 marks)

This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.