To submit this assignment, upload the full document on blackboard, including the original questions, your code, and the output. Submit both your .Rmd source file and a knitted .html version of the same file.

1. Ricker model (4 marks).

The Ricker model is a stochastic model for growth of a population. Its two parameters are the death rate $$b$$ and growth rate $$r$$ of the population. The carrying capacity of the system is $$r/b$$.

$$N_{t+1} = N_t e^{r - bN_t + \xi_t}$$

1. Use the following code to simulate a dataset of the number of fish returning to spawn each year. Create a point and line plot with time on x-axis and the y on the y-axis using ggplot2. Include a horizontal line at the carrying capacity and name the axis labels something more meaningful (0.5 marks).
# Simulating data for the Ricker Model function
simulate_data <-
function(initial_condition = 1000,
growth_rate = 2.0,
carrying_capacity = 2000,
noise = 0.75,
timesteps = 100) {

# So random number generator is consistent
set.seed(2100)

death_rate <- growth_rate / carrying_capacity

y <- numeric(timesteps + 1) # empty vector of population sizes
y[1] <- initial_condition # set first element to a initial condition
for (i in 2:(timesteps + 1)) {
y[i] <- y[i - 1] * exp(rnorm(n = 1, growth_rate - death_rate * y[i - 1], noise))
}

data <- data.frame(time = seq(1, length(y), by = 1), y = y)
return(data)
}

# Simulate data:
ricker_df <- simulate_data()
1. Use least squares to fit the simulated data to the model (hint: check with lecture 10 material, fitting models with two parameters, for similar code). Follow these steps. (1.5 marks)

1. Make a vector of $$N_t$$ and $$N_{t+1}$$, so that $$N_t$$ is the independent variable and $$N_{t+1}$$ is the data you are fitting
2. Create a function ricker_pred_fn that: a) has these two arguments, r, b, in that order; b) calculates $$N_{t+1}$$ given $$N_t$$, according to the Ricker model equation; c) subtracts the $$N_{t+1}$$ from the predicted Ricker model to create the residual; d) and returns the squared residual.
3. Create two sequence of rvals (for $$r$$) ranging from 0.5 to 3 and bvals (for $$b$$) ranging from 0.0001 to 0.0015.
4. Either use a for loop or use the expand.grid + mapply combination (check lecture 10 material) to apply every combination of rvals and bvals to the ricker_pred_fn function.
5. Plot the resulting matrix using wireframe.
6. Use which to extract the best-fitting values of $$r$$ and $$b$$.
2. Rearrange the Ricker model formula so that it has the regression form $$y = a + Xb + e$$, where $$y = ln(N_{t+1}/N_t)$$ (hint: taking the logarithm cancels out the exponential $$e$$). Write down the formula below. What are $$a$$, $$b$$, $$X$$, and $$e$$ equivalents in the Ricker model equation? (0.5 mark).

3. Create a new data frame called ricker_df2 with two columns $$y$$ and $$x$$ based on the data from ricker_df, with the new variables in the form of the equation in 1c) above (hint: reuse the variables from 1b). Use ricker_df2 to fit a linear regression model of the rearranged Ricker equation using glm. Print out a tidy version of the model results, keeping only the term, estimate, confidence interval, and p-value. What are the estimates for $$r$$ and $$b$$? Are they the same as the constants used from 1a)? (1 mark)

4. Is there a difference between the fit using glm and the fit using least squares? Comment on why or why not there may be a difference. Is there an advantage to using one method over the other? (0.5 mark)

1. Generalized Linear Models (GLM). Run the below code chunk. (4 marks)
library(tidyverse)
co2_df <- as_data_frame(as.matrix(CO2)) %>%
mutate(conc = as.integer(conc),
uptake = as.numeric(uptake))
1. Look through the help documentation (?CO2) to understand what each variable means. Which variable(s) do you think would be the $$y$$ in the GLM model? Which variable(s) would be the $$x$$? Briefly defend these choices. (0.25 marks)

2. Create at most two plots that explore the data to eventually justify the final variables you will use in the glm model. The order and type of plots shown should highlight your thought process for your choice of variables in the model. (1 mark)

3. Write up the GLM equation, substituting in your final variables that you will use. You can write it as a comment in a code chunk, as plain text, or as a LaTeX equation (same as how we’ve shown it in class). (0.25 mark)

4. Run the glm model, tidy it up, and keep only the term, estimate, confidence interval, and p-value. (optional: if you want to create a table in the html output, use the knitr::kable() function) (1 mark)

5. Insert the numbers from the glm output into the GLM equation. (0.25 mark)

6. How much does uptake change if conc goes up by 10 mL/L? (Note: it is intentional that there is no mention of the other variables in the model.) Write out the interpretation as a simple statement of this contribution of conc on uptake, when the other variables are also in the model. (0.25 mark)

7. In regression modelling, there is a concept known as “interaction terms”. What this means is that the influence a variable has on the dependent variable ($$y$$) changes when another variable also changes. For example, in mammals, body temperature and air temperature are not related to each other, while in reptiles, body temperature is related to air temperature. So the regression equation would be:

$BodyTemp = a + b_1 AirTemp + b_2 Reptile + b_3 (AirTemp \times Reptile)$

In glm, the formula would look like: BodyTemp ~ AirTemp + Reptile + AirTemp:Reptile. Do you think there is a potential interaction in the co2_df dataset? If so then: write up the glm formula; run the model; tidy it up; and keep the term, estimate, confidence interval, and p-value. Print this model output. Now, insert in the numbers in the GLM formula and write out the interpretations of the different conditions of the variables. Remember, character data is converted into (in this dataset) 0 and 1 in the regression equation. If you don’t think there are interactions, justify why. (1 marks)