To submit this assignment, upload the full document on blackboard, including the original questions, your code, and the output. Submit your assignment as a knitted
```r library(tidyverse) library(lme4) library(lmerTest) Santangelo_data <- "https://uoftcoders.github.io/rcourse/data/Santangelo_JEB_2018.csv" download.file(Santangelo_data, "Santangelo_JEB_2018.csv") Santangelo_data <- read_csv("Santangelo_JEB_2018.csv", col_names = TRUE) glimpse(Santangelo_data) ``` ``` ## Observations: 792 ## Variables: 11 ## $ Genotype <chr> "173-1", "173-1", "173-1", "173-1", "173-1", "173-1", "… ## $ Pollination <chr> "Open", "Open", "Open", "Supp", "Supp", "Supp", "Open",… ## $ Herbivory <chr> "Reduced", "Ambient", "Reduced", "Ambient", "Ambient", … ## $ HCN <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",… ## $ Flower.date <dbl> 86, 23, 12, 25, 16, 38, 22, 61, 32, 25, 30, 33, 54, 23,… ## $ Avg.Bnr.Wdth <dbl> 3.32, NA, 3.02, 3.14, 2.76, NA, 3.11, 3.07, 3.16, 3.16,… ## $ Avg.Bnr.Ht <dbl> 6.50, NA, 7.02, 6.03, 5.20, NA, 5.69, 6.18, 6.04, 5.94,… ## $ Biomass.plant <dbl> 9.27, 8.34, 7.74, 15.76, 4.16, 7.75, 48.67, 10.32, 28.8… ## $ Num.flwrs <dbl> 52.5, 67.0, 57.5, NA, NA, 47.5, 60.0, 58.5, 66.5, 64.5,… ## $ Total.Inf <dbl> 4, 1, 21, 9, 0, 15, 41, 19, 17, 23, 129, 46, 35, 35, 22… ## $ Block <chr> "E", "A", "E", "A", "A", "A", "D", "F", "C", "C", "A", … ``` ```r head(Santangelo_data) ``` ``` ## # A tibble: 6 x 11 ## Genotype Pollination Herbivory HCN Flower.date Avg.Bnr.Wdth Avg.Bnr.Ht ## <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> ## 1 173-1 Open Reduced Yes 86 3.32 6.5 ## 2 173-1 Open Ambient Yes 23 NA NA ## 3 173-1 Open Reduced Yes 12 3.02 7.02 ## 4 173-1 Supp Ambient Yes 25 3.14 6.03 ## 5 173-1 Supp Ambient Yes 16 2.76 5.2 ## 6 173-1 Supp Ambient Yes 38 NA NA ## # … with 4 more variables: Biomass.plant <dbl>, Num.flwrs <dbl>, ## # Total.Inf <dbl>, Block <chr> ``` a. Model selection works best when there are no missing values in your dataset. We will be identifying the best model that predict variation in flowering time (`Flower.date`) across plants. Create a dataset that excludes rows where there is missing flowering date data. (0.25 marks) b. We want to know how HCN, herbivores, and pollinators influence flowering date. We also think that the effect of herbivores and pollinators on flowering data might depend on whether the plant is producing HCN. Create a model that includes fixed-effects that test these predictions. Be sure to account for variation due to `Genotype`, `Block` and the `Genotype` by `Herbivory` interactions by including these terms as random effects. This will be our saturated model. You can ignore `boundary (singular) fit` warnings that may arise. (1 marks) c. We will generate a reduced model from the saturated model in (a). Should we use AIC or AIC~c~. Why? Show your calculation. (0.5 marks) d. Using the approach described in lecture 11, optimize the random effect structure of this model. Show the AIC/AIC~c~ output for each model of varying random effect strucure. Provide a one sentence justification for each random effect the model, justifying whether it is fixed (i.e., in every model) or whether some models will drop this effect. Describe in one sentence what the optimal random effect structure of the model is and why. (0.5 marks) e. Using the model with the optimal random-effect structure identified in (c), find the optimal fixed-effect structure. Be sure to show all the models and their AIC/AIC~c~ scores. (1.5 mark) f. Based on the AIC/AIC~c~ output from (d), generate your final model with both optimized fixed and random effects. Summarize the model and interpret its output. Is there a significant effect of any treatment? If so, which one(s) and in which direction. Make a statement about the significant treatments' effects on flowering date. Use the model's output to support your answer. You only need to interpret significant main effects here (i.e. not interactions). (0.75 marks) g. Do you think we were justified in interpreting a single model? Why or why not? What alternative approach could we have used? (0.25 marks). h. Use `dplyr` and `ggplot2` to plot the flowering date of plants by the _main_ effect that showed a significant effect in the optimized model above. The figure should show the mean plus and minus a single standard error of the mean. Suggest one biological interpretation of the pattern you see in the figure and in the model (i.e. why do you think this would happen). If there are no significant effects in the model, simply write "There are no significant effects!". (0.25 marks)
Here are some relevant snippets of code taken from the lecture notes to get you started on this exercise.
library(lavaan) kenya.wide <- read.csv("kenya.wide.csv", header=TRUE, sep=",") kenya.pca <- kenya.wide %>% dplyr::select(arabiensis, gambiae, funestus, merus) %>% #choose relevant columns mutate_all(sqrt) %>% #this is the Hellinger transformation prcomp(.) #pipe directly into baseR function for PCA axes <- data.frame(kenya.pca$x) kenya.wide <- bind_cols(kenya.wide, axes) kenya.wide$s.abun <- log(kenya.wide$total.abundance) kenya.wide$s.sr <- log(kenya.wide$SR) kenya.wide$s.pfpr <- log(kenya.wide$PfPR) sem02 <- ' # regressions l.pfpr ~ a*l.sr + b*l.abun + c*PC2 # correlations l.sr ~~ d*l.abun PC2 ~~ e*l.sr # defined parameters indirect.abun := (a*d) #indirect effect of abundance via SR indirect.abun2 := (d*e*c) #indirect effect of abundance via PC2 total.abun := b + (a*d) + (d*e*c) '
a. Complete the structural equation model by adding in calculations for the indirect and total effects of species richness (SR) and composition (PC2) on malaria prevalence (PfPR). (0.5 marks) b. Evaluate the model, bootstrapping confidence intervals for path coefficients with seed #778. Which predictor had the largest _direct_ effect on malaria prevalence? How about _total_ effect? Briefly explain these effects in plain english (1 sentence each). (1 mark)
This dataset consists of the same information as kenya.wide, with the addition of one new columns for “EIR”.
a. Convert this dataset to the wide format. Fill cells in the wide dataset with the **relative abundance** of each species, and include the columns "total abundance" and "EIR" in the final product. (Hint: use xxxx_join to add the desired columns to the wide dataset after you spread it) (Hint2: pivot_wider() may be easier to use than spread()) (1 marks) b. Construct a series of linear models to investigate the relationship between EIR and i) total mosquito abundance and ii) the abundance of each species. Interpret the results of these models. (Hint: is EIR a simple function of total mosquito abundance, or is there a particular species that is contributing disproportionately to it?) (0.5 mark) c. Investigate the influence of total abundance and community structure (use the first two PC axes) on EIR with a strcutural equation model. Include only direct effects only in this model, and pretend we have reason to believe total abundance is associated with community composition. i. Briefly explain the correlation structure you have chosen for your predictors, total abundance, PC1, and PC2. (0.5 marks) ii. Evaluate the model. Are these results congruent with your findings from part (a)? (0.5 marks)
This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.