To submit this assignment, upload the full document on blackboard, including the original questions, your code, and the output. Submit your assignment as a knitted .pdf (preferred) or .html file.

  1. In this exercise, we will once again use the data of Santangelo et al. (In press) that you used in assignment 5. Let’s go ahead and load in the data.
library(tidyverse)
library(lme4)
library(lmerTest)

Santangelo_data <- paste0("https://raw.githubusercontent.com/",
                          "James-S-Santangelo/rcourse/",
                          "master/data/Santangelo_JEB_2018.csv")
download.file(Santangelo_data, "Santangelo_JEB_2018.csv")
Santangelo_data <- read_csv("Santangelo_JEB_2018.csv", 
                            col_names = TRUE)
glimpse(Santangelo_data)
## Observations: 792
## Variables: 11
## $ Genotype      <chr> "173-1", "173-1", "173-1", "173-1", "173-1", "17...
## $ Pollination   <chr> "Open", "Open", "Open", "Supp", "Supp", "Supp", ...
## $ Herbivory     <chr> "Reduced", "Ambient", "Reduced", "Ambient", "Amb...
## $ HCN           <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",...
## $ Flower.date   <int> 86, 23, 12, 25, 16, 38, 22, 61, 32, 25, 30, 33, ...
## $ Avg.Bnr.Wdth  <dbl> 3.32, NA, 3.02, 3.14, 2.76, NA, 3.11, 3.07, 3.16...
## $ Avg.Bnr.Ht    <dbl> 6.50, NA, 7.02, 6.03, 5.20, NA, 5.69, 6.18, 6.04...
## $ Biomass.plant <dbl> 9.27, 8.34, 7.74, 15.76, 4.16, 7.75, 48.67, 10.3...
## $ Num.flwrs     <dbl> 52.5, 67.0, 57.5, NA, NA, 47.5, 60.0, 58.5, 66.5...
## $ Total.Inf     <int> 4, 1, 21, 9, 0, 15, 41, 19, 17, 23, 129, 46, 35,...
## $ Block         <chr> "E", "A", "E", "A", "A", "A", "D", "F", "C", "C"...
head(Santangelo_data)
## # A tibble: 6 x 11
##   Genotype Pollination Herbivory HCN   Flower.date Avg.Bnr.Wdth Avg.Bnr.Ht
##   <chr>    <chr>       <chr>     <chr>       <int>        <dbl>      <dbl>
## 1 173-1    Open        Reduced   Yes            86         3.32       6.5 
## 2 173-1    Open        Ambient   Yes            23        NA         NA   
## 3 173-1    Open        Reduced   Yes            12         3.02       7.02
## 4 173-1    Supp        Ambient   Yes            25         3.14       6.03
## 5 173-1    Supp        Ambient   Yes            16         2.76       5.2 
## 6 173-1    Supp        Ambient   Yes            38        NA         NA   
## # ... with 4 more variables: Biomass.plant <dbl>, Num.flwrs <dbl>,
## #   Total.Inf <int>, Block <chr>
  1. Create a mixed-effects model with banner petal length (Avg.Bnr.Ht) as the response variable and HCN, Pollination, and their interaction as fixed-effect predictors. Be sure to account for variation due to Genotype, Block and the Genotype by Pollination interactions by including these terms as random effects. This will be our saturated model. [Hint: Take a look at lecture 8 for a reminder on how to code random effects using lmer(), especially how interation (i.e. crossed) random-effects are coded]. (0.5 marks)

  2. We will generate a reduced model from the saturated model in (a). Should we use AIC or AICc. Why? Show your calculation. (0.5 marks)

  3. Using the approach described in lecture 11, optimize the random effect structure of this model. Show the AIC/AICc output for each model of varying random effect strucure. Describe in one sentence what the optimal random effect structure of the model is and why. [Hint: Think carefully about which random effects should and should not be removed from the model. See lecture 11 for the logic behind this decision and assignment 5 for a description of the dataset that will help guide this decision]. (1 mark)

  4. Using the model with the optimal random-effect structure identified in (c), find the optimal fixed-effect structure. Be sure to showw all the models and their AIC/AICc scores (1 mark)

  5. Based on the AIC/AICc output from (d), generate a your final model with both optimized fixed and random effects. Summarize the model and interpret its output. Is there a significant effect of any treatment? If so, which one(s) and in which direction (i.e. larger or smaller banner petals in which treatment factor level?). Use the model’s output to support your answer. You only need to interpret significant main effects here (i.e. not interactions). (0.5 marks)

  6. Use dplyr and ggplot2 to plot the banner petal length of plants by the main effect that showed a significant effect in the optimized model above. The figure should show the mean plus and minus a single standard error of the mean. Suggest one biological interpretation of the pattern you see in the figure and in the model (i.e. why do you think this would happen). [Hint: geom_errorbar()]. (0.25 marks)

  7. Do you think we were justified in interpreting a single model? Why or why not? What alternative approach could we have used? (0.25 marks).

  1. Install thecluster package so we can use the dataset “animals”. Use the ?animals to look at the documentation for the dataset, and load the packages below:
#install.packages("cluster")
library(cluster)
library(psych)
## 
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
head(animals)
##     war fly ver end gro hai
## ant   1   1   1   1   2   1
## bee   1   2   1   1   2   2
## cat   2   1   2   1   1   2
## cpl   1   1   1   1   1   2
## chi   2   1   2   2   2   2
## cow   2   1   2   1   2   2
  1. First, clean the data so that there are no missing values (recall: the principal() function does not handle NA’s well). Then, create a correlation matrix and create an inital principal components matrix without rotation and with the full number of possible factors. Create a scree plot. Based on the eigenvalues and scree plot, decide on an appropriate number of components for your model and justify your decision. (1 mark)

  2. Run your principal components analysis using the number of factors/components you have selected. Interpret the output using the psych package we used in class. Assign each variable to a component. Do the components you’ve come up with make sense to you? What do they have in common, if anything (1 mark)?

  3. Let’s go back to our survey data. If you need to re-download it, use the code below:

# If you need to re-download the survey:
download.file("https://ndownloader.figshare.com/files/2292169", "survey.csv") 
survey <- read_csv("survey.csv")
## Parsed with column specification:
## cols(
##   record_id = col_integer(),
##   month = col_integer(),
##   day = col_integer(),
##   year = col_integer(),
##   plot_id = col_integer(),
##   species_id = col_character(),
##   sex = col_character(),
##   hindfoot_length = col_integer(),
##   weight = col_integer(),
##   genus = col_character(),
##   species = col_character(),
##   taxa = col_character(),
##   plot_type = col_character()
## )
survey %>% mutate_if(is.character, as.factor) -> survey #uses dplyr function to change all character vectors to factors
head(survey)
## # A tibble: 6 x 13
##   record_id month   day  year plot_id species_id sex   hindfoot_length
##       <int> <int> <int> <int>   <int> <fct>      <fct>           <int>
## 1         1     7    16  1977       2 NL         M                  32
## 2        72     8    19  1977       2 NL         M                  31
## 3       224     9    13  1977       2 NL         <NA>               NA
## 4       266    10    16  1977       2 NL         <NA>               NA
## 5       349    11    12  1977       2 NL         <NA>               NA
## 6       363    11    12  1977       2 NL         <NA>               NA
## # ... with 5 more variables: weight <int>, genus <fct>, species <fct>,
## #   taxa <fct>, plot_type <fct>

Choose two appropriate variables to run MANOVA on. Choose two categorical predictor variables. Run the analysis (without follow-up analyses), interpret the results, and explain your choice of variables. (1 mark).

  1. When would we prefer to use MANOVA over ANOVA? When is it not appropriate to use MANOVA? List one example each (1 mark).

This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.