Lesson preamble

Learning objectives

  • Appreciate bifurcations in models
  • Use least squares to fit a model to data

Lesson outline

Total lesson time: 2 hours

  • Review eigenvalues and eigenvectors (10 minutes)
  • Vectorizing linear systems of differential equations: another perspective on eigenvalues (20 minutes)
  • Bifurcations (20 minutes)
  • Fitting models to data
    • Fitting simulated data for a single parameters (20 minutes)
    • Fitting real data for two parameters (40 minutes)


  • Download this plant biomass dataset and put it in your working directory.
  • install.packages('lattice')
  • install.packages('phaseR')
  • install.packages('deSolve')
  • install.packages('ggplot2') (or tidyverse)
  • install.packages('dplyr') (or tidyverse)
  • install.packages('tidyr') (or tidyverse)

Recap: quantitative analysis of fixed points and stability

Here are the steps to quantitatively analyze a model. These steps are written for a two-dimensional system, but they can be applied to a system of any number of dimensions.

This is a generic two-dimensional system. There are two differential equations, and each is a function of both \(x\) and \(y\).

\[ \frac{dx}{dt} = f(x, y) \\ \frac{dy}{dt} = g(x,y)\]

  1. Fixed points: Find all combinations of \(x\) and \(y\) that satisfy \(f(x, y) = 0\) and \(g(x, y) = 0\).

  2. Stability: Calculate the Jacobian matrix and find its eigenvalues for each fixed point. This is the Jacobian matrix:

\[J=\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \\ \frac{\partial g}{\partial x} & \frac{\partial g}{\partial y} \end{bmatrix}\]

  • Find eigenvalues at each fixed point using the function eigen in R, or by solving the equation \(\text{Det}(A - \lambda \mathbb{1}) = 0\).
  • For a set of eigenvalues (\(\lambda_1\), \(\lambda_2\)):
    • If all eigenvalues have negative real parts, the fixed point is stable.
    • If one or more eigenvalues has a positive real part, the fixed point is unstable.
    • If the eigenvalues have an imaginary (complex) part, the fixed point is oscillatory.

Vectorizing systems of differential equations

To see why eigenvalues tell us about the stability of a fixed point, let’s look at a linear system of differential equations and its solution.

Consider this system of differential equations:

\[\frac{d x_1}{dt}=Ax_1+Bx_2 \\ \frac{d x_2}{dt} = Cx_1+Dx_2\]

This system is linear: each term only has either \(x_1\) or \(x_2\) in it, and there are no terms that have higher powers of \(x_1\) and \(x_2\) (such as \(x_1 x_2\) or \(x_1^2\)).

Because it’s linear, this system can be written as a matrix equation:

\[\frac{d \vec{x}}{dt} = E\vec{x}\]

where \(\vec{x}\) is two-element vector \((x_1,x_2)\), \(\frac{d \vec{x}}{dt}\) is also a two-element vector \((\frac{d x_1}{dt}, \frac{d x_2}{dt})\), and

\[E=\begin{bmatrix} A & B \\ C & D \end{bmatrix}\]

To solve this, we assume that solutions exist of the form \(\vec{x} = \text{e}^{\lambda t} \vec{v}\), where \(\vec{v}\) is an unknown fixed vector (basically a “direction”), and \(\lambda\) is a growth rate, also unknown. If we can find solutions of this form, we know they correspond to exponential growth or decay in the direction described by \(\vec{v}\).

Plugging \(\vec{x}\) into the vector equation above:

\[ \frac{d \vec{x}}{dt} = \frac{d}{dt} \text{e}^{\lambda t} \vec{v} = \lambda\text{e}^{\lambda t} \vec{v} \]

\[\lambda\text{e}^{\lambda t} \vec{v} = E \text{e}^{\lambda t} \vec{v} \\ E\vec{v} = \lambda \vec{v}\]

This is an eigenvalue-eigenvector equation, which means that the directions we were after are the eigenvectors of \(E\), and the growth rates are the eigenvalues.

Linear system example

\[\frac{dx}{dt} = y \\ \frac{dy}{dt} = -2x - 3y\]

The matrix form of this system is

\[\frac{d \vec{x}}{dt} = E\vec{x}\]

where the matrix \(E\) is

\[E = \begin{bmatrix} 0 & 1 \\ -2 & -3 \end{bmatrix}\]

To solve this system, we look for eigenvalues and eigenvectors of \(E\):

Let’s look at the eigenvectors on a phase portrait.

## [1] "Note: colour has been reset as required"

The two eigenvector directions \(\vec{v_1}\) and \(\vec{v_2}\) (green and purple dashed lines) represent the directions in which the flow follows \(\text{e}^{\lambda_1 t} \vec{v_1}\) or \(\text{e}^{\lambda_2 t} \vec{v_2}\). Other trajectories can be linear combinations of these two solutions:

\[\vec{x}(t) =c_l \text{e}^{\lambda_1 t} \vec{v_1} + c_2 \text{e}^{\lambda_2 t} \vec{v_2}\]

Now we can see why the eigenvalues determine stability: in a linear (or linearized) system, the eigenvalues determine whether the system shows exponential growth, decay, or oscillations, because the model solution is a combination of exponential functions whose parameters are the eigenvalues.

By calculating the Jacobian for any system we encounter, we are converting the system into a linear system like the one above, and all the same logic applies.


A bifurcation is a change in the number or stability of fixed points in a system as a function of one of the system parameters. This is a pretty abstract definition, so let’s look at an example right away.

A model of fishing

Here is a model that describes the population size of a species of fish as they are being hunted.

\[\frac{dN}{dt} = rN\left(1-\frac{N}{K}\right) - HN\]

In the absence of fishing, the population is assumed to grow logistically — this is the first term in the equation. The effects of fishing are modeled by the term \(- HN\), which says that fish are caught or “harvested” at a rate that depends on \(N\). This is plausible — when fewer fish are available, it is harder to find them and so the daily catch drops.


Find the fixed points analytically for this system.

Right away, we can see something interesting about one of the fixed points. If \(H > r\), then the second fixed point becomes negative. In a real population, there would now be only a single fixed point at \(N=0\) since \(N\) can’t be negative.

Next, let’s look at the stability analytically. We calculate \(\frac{\partial f(N)}{\partial N}\) and evaluate it at each fixed point, where \(f(N) = rN(1-\frac{N}{K}) - HN\).

\[\frac{\partial f(N)}{\partial N} = \frac{\partial}{\partial N} \left(rN(1-\frac{N}{K}) - HN \right) \\ =r - \frac{2rN}{K} - H\]

For the fixed point \(N=0\):

\[\frac{\partial f(N)}{\partial N} = r-H\]

This is positive if \(r > H\) and negative if \(r < H\). This means \(H = r\) is a bifurcation point: at this value of \(H\) (assuming \(r\) is fixed), this fixed point changes stability.

For the fixed point \(N = K\left(1-\frac{H}{r}\right)\):

\[\frac{\partial f(N)}{\partial N} = -r+H\]

This is positive if \(r < H\) and negative if \(r > H\). This fixed point also changes stability at the bifurcation point \(H = r\).

Finally, let’s plot \(dN/dt\) vs \(N\) and confirm our result.

First, define a function for the model.

Next, we choose parameters and create a sequence to plot.

And finally we plot in the usual way.

Fitting models to data

Finding data is a massive field, and there are many different strategies for choosing parameters for a model depending on the assumptions you make about your data and model. Today we will talk about arguably the most common way to fit data — least squares fitting. (Linear regression, which will be covered next class, is a special case of least squares fitting.) I will show you a method for doing ‘brute force’ least squares fitting so that you can fit any model you like using the same procedure, but there are special cases you might encounter in which there are faster and/or better ways to implement least squares fitting.

Fitting simulated data with a single parameter

Suppose we sample repeatedly from a bacterial population over time, and suppose we want to fit our sampled data to an exponential growth model.

Let’s simulate some data from an exponential function to work with.

Now let’s assume we don’t know the growth rate \(r\) and we want to extract it from the data — we want to fit the data to an exponential growth model and find the value of \(r\) that gives the best fit.

To do this, we need a way to tell if a fit is good or bad. What criteria should we use to determine if a fit is good? One option is to just try several parameter values and try to manually adjust the parameter until the fit looks good. This is imprecise and not reproducible, but it’s often a good place to start to get an idea of what parameter range to check.

The idea behind least squares fitting is that we want to minimize the difference between our model’s prediction and the actual data, and we do that by minimizing the sum of the squares of the residuals. Residuals (or errors) are the difference between the model and the data for each data point. The purpose of taking the square of each residual is to take out the influence of the sign of the residual.

The sum of the squares of the residuals is just a number, and now we have a criteria we can use to determine which of two fits is better: whichever fit minimizes that number is the better fit.

In practice, there are many ways to find the particular parameter that gives the best fit. One way is to start at some parameter value, then start adjusting it by small amounts and checking if the fit gets better or worse. If it gets worse, go in a different direction. If it gets better, keep going in that direction until it either starts getting worse again or the amount by which it gets better is very small. This type of algorithm is called gradient descent.

Another option is to try a range of parameter values and choose the one that gives the best fit out of that list. This is simpler to implement than the previous algorithm, but it’s also computationally more costly — it can take a long time.

We will do some examples of the second method today, what we’ll call ‘brute-force least squares’.

Let’s plot the sum of residuals squared vs. \(r\) to find which value of \(r\) fits best:

We can see visually that the minimum is around \(r = 0.2\), but to extract that number from the list we can use the function which:

We got the ‘correct’ value of \(r\), the one that we originally used to simulate the data.

Finally, let’s plot the fit against the original data:

You can play around with this simulated data and see what happens to the fit if you change the amount of noise that gets added to the data.

Fitting models with two or more parameters

Now let’s fit a model with more parameters, and let’s use some real data.

We can use our usual tricks to see what’s in our dataset:

Let’s plot just one of the habitats, sites, and treatments vs. time and colour by species.

Empetrum nigrum looks like it could be following logistic growth, so let’s try to fit the data to that model. First, let’s make a new variable with just the data we want to fit.

Now we define a logistic model function so that we can numerically generate the model’s solution and compare it to the data.

Let’s try running a single numerical solution and plotting it alongside the data.

Now we’ll define a grid of \(r\) and \(K\) values to try, then calculate a numerical solution for each combination of parameters.

Added: The more R way of doing the above code is:

## [1] TRUE

We can use the library lattice to plot the surface:

Now we extract the values of \(r\) and \(K\) that minimize the sum of residuals:

And now we can plot the best fit curve with the data:

You can do this process with more than two parameters as well, but the search space will be larger and the simulations necessary will take longer.

Assumptions of least squares

When is it appropriate to use least squares to fit your data?

Assumptions of least squares:

  • The noise or error is Gaussian-distributed: the most likely value for a data point is the predicted value, but there is a Gaussian distribution with some width that determines how likely it is that the data will have a different value.
  • All data points have the same variance in their error: the width of the Gaussian distribution governing the error is the same for each data point. This can be modified though if it’s not a good assumption.
  • The noise or error is only in the dependent variable(s), not in the independent variable(s).
  • Each data point is independent of the other data points.

Note: in all this talk of fitting, what we’re doing doesn’t tell us which model fits our data best, only what parameters for a particular model fit best. The choice of model is up to us. There are ways to distinguish between fits for different models which will be discussed next class.

Extras: Maximum Likelihood

Maximum likelihood is a way of finding parameters of a given probability distribution that best match data. It answers the question: which paremeter(s) make the data most likely to have occured?

Likelihood is defined as \(P(\text{data} | \text{model})\), which you can read as the probability of the data given the model. This is subtly different from \(P(\text{model} | \text{data})\), the probability of the model given the data, which is what you’re really after. But according to Bayes’ theorem, these two quantities are proportional, and so in practice, maximizing the likelihood is equivalent to maximizing the probability of a particular model given your data.

Bayes’ theorem, for the curious:

\[ P(A | B) = \frac{P(B | A)P(A)}{P(B)}\]

Least squares from maximum likelihood

Least squares is the maximum likelihood solution for the assumption that errors are Gaussian distributed. This assumption can be formulated like this: for a set of data \(y\) as a function of \(x\), let \(e_i\) be the difference between the predicted and measured value of \(y\) at point \(x_i\). We assume \(e_i\) follows a Gaussian or normal distribution with mean \(0\) and variance \(\sigma\):

\[P(e_i) = \frac{1}{\sqrt{2 \pi \sigma^2}} \text{e}^{-\frac{e_i^2}{2\sigma^2}}\]

We also assume that each data point \(y_i\) is independent, so the probability for the entire dataset is a product of all the probabilities for each data point:

\[P(\vec{e}) = P(e_1)P(e_2) ... P(e_N) = \prod_{i=1}^N P(e_i) \]

This is the likelihood, this is the quantity we want to maximize. To make our lives easier, we can also maximize the logarithm of the likelihood instead, since the logarithm is an increasing function and its maximum will still be in the same spot.

The log-likelihood is

\[\sum_{i=1}^N \text{log}P(e_i) = \sum_{i=1}^N \left( \text{log} \frac{1}{\sqrt{2 \pi \sigma^2}} - \frac{e_i^2}{2\sigma^2} \right)\]

\[ = N \text{log} \frac{1}{\sqrt{2 \pi \sigma^2}} - \frac{1}{2\sigma^2}\sum_{i=1}^N e_i^2\]

The first term is a constant, and so we can forget about it when looking for the maximum. What we’re left with is wanting to maximize

\[- \frac{1}{2\sigma^2}\sum_{i=1}^N e_i^2\]

Since \(\sigma\) is a constant, we can drop the prefactor as well. Finally, we can change the sign and minimize what’s left:

\[\sum_{i=1}^N e_i^2\]

But remember that \(e_i\) is just the difference between the predicted \(y_i\) and the actual data point \(y_i\), so the quantity above is just the sum of the squares of the residuals, exactly what we want to minimize in least squares.

Extras: using the analytic solution for a model to fit data

Instead of simulating the model, if you know the analytic solution you can use it directly like we did with the exponential model.

This is the analytic solution for the logistic model:

\[N(t) = \frac{N_0 K\text{e}^{rt}}{N_0(\text{e}^{rt}-1)+K}\]

Now to extract the values of \(r\) and \(K\) that minimize the sum of residuals:

This work is licensed under a Creative Commons Attribution 4.0 International License. See the licensing page for more details about copyright information.