3.4 Class 7: Infectious Diseases
Before getting into any biology, let’s learn about making our own functions. Throughout the course, we’ve made use of R functions such as print(), c(), and ggplot(); defining our own functions allows us to
Defining our own functions allows us to perform actions over and over on multiple sets of inputs.
Below, let’s practice creating a function that takes one input, a list, and returns the number of things in it:
# A vector containing the contents of my fridge
refrigerator <- c("Apples", "Pears", "Bread", "Eggs", "Cheddar",
"Brie", "Lettuce", "Shallots", "Cabbage")
# The function definition
countFood <- function(fridge){ # Fridge is a local variable - it does not exist outside the function
numItems <- length(fridge)
print(paste("I have", numItems, "things in my fridge."))
}
# Two more fridges
myFriendsPantry <- c("olives")
fridge <- c("veal", "steak", "halloumi")
# Run the fridge function on all three sets of inputs:
countFood(refrigerator)## [1] "I have 9 things in my fridge."
## [1] "I have 1 things in my fridge."
## [1] "I have 3 things in my fridge."
The countFood() function prints out a value, but doesn’t actually store anything to memory. Very often, we want to use the output from our function for downstream analysis. To save the oputput of our function, we need to include a return() statement.
For example, the function below generates a poly A tail sequence by pasting together a series of "A"’s, taking as input the length of the sequence to return:
# Define function
polyAtail <- function(length){
polyA <- rep("A", times = length) # Create a vector of A's
return(paste(polyA, collapse = "")) # Convert that vector into a string
}
# Run polyAtail() function on two inputs
polyA <- polyAtail(63)
polyA2 <- polyAtail(17)
# Print results
print(polyA)## [1] "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
## [1] "AAAAAAAAAAAAAAAAA"
3.4.1 The SIR Model
In the function below, we implement the simplest form of the SIR model. The SIR model tracks the numbers of Susceptible, Infected, and Recovered individuals in a population, assuming a constant birth/death rate. :
SIR <- function(S, I, R, # Starting number of individuals, as a fraction of total population size
mu, B, gamma, # death/birth rate, contact lengths, recovery rate
simTime, step){
# Initialize total population size
N <- 1
# lists to store population sizes
sList <- c(S)
iList <- c(I)
rList <- c(R)
# loop through times
for (time in seq(simTime/step)){
# Calculate rates of change
dS <- (mu * (N - S) - B * I * S / N) * step
dI <- (B * I * S / N - (mu + gamma) * I) * step
dR <- (gamma * I - mu * R) * step
# Update population sizes based on rate of change
S <- S + dS
I <- I + dI
R <- R + dR
# Update lists
sList <- c(sList, S)
iList <- c(iList, I)
rList <- c(rList, R)
}
# Store results to data frame
df <- data.frame(
time = seq(length(sList)) * step,
S = sList,
I = iList,
R = rList
)
return(df)
}The function above returns a data frame containing time and population size for individuals of all categories.
We would also like to plot our data. Since we may want to do this over and over again for different inputs, it also makes sense to plot through a function. Notice that because we just want to create a plot here, we don’t need to save anything to memory. Therefore, this function does not have a return() statement:
# Define plotting function
plotSIR <- function(SIR_data){
# Convert to tall format
SIR_data_tall <- melt(SIR_data, id.vars = "time")
# Fix the column names
colnames(SIR_data_tall) <- c("Time", "Population", "Size")
# Plot, coloring by population
ggplot(SIR_data_tall, aes(x = Time,
y = Size,
color = Population)) +
geom_line() +
theme_bw()
}We can also make a function to just plot the infected individuals. This is identical to the function above, except that we subset our individuals while plotting (SIR_data_tall[SIR_data_tall$Population == "I", ]):
# Define plotting function
plotInfected<- function(SIR_data){
# Convert to tall format
SIR_data_tall <- melt(SIR_data, id.vars = "time")
# Fix the column names
colnames(SIR_data_tall) <- c("Time", "Population", "Size")
# Plot, coloring by population
ggplot(SIR_data_tall[SIR_data_tall$Population == "I", ], aes(x = Time,
y = Size,
color = Population)) +
geom_line() +
theme_bw()
}Now that we’ve defined our functions, we can run them on a set of data:
# Initial fraction of populations in each category
S <- 0.19
I <- 0.01
R <- 0.8
mu <- 1/(50 * 52) # Birth rate
B <- 2 # Contact lengths
gamma <- 0.5 # Recovery rate
simTime <- 52 * 100
step <- 0.1
SIR_data <- SIR(S, I, R, mu, B, gamma,simTime, step)
plotSIR(SIR_data)

As with other multi-population models, we can the trajectories of multiple populations as they relate to each other:

Note that in this model, there is a cyclical oscillation approaching an equilibrium as time goes on. The new spikes in case number are a result of new births in the population increasing the susceptible population.
3.4.2 SEIR
The SEIR model extends the standard SIR model by including a new phase - Exposed individuals. Exposed individuals have contracted the infection but are not yet symptomatic.
We’ve also added a couple new terms to make this model more realistic:
The parameter
pnow specifies the at birth vaccination rate. As a result, some fractions of individuals in each generation immediately enter the R category rather than S at birth.The parameter
alphaspecifies the rate of death induced by the disease. This quantity of individuals is removed from the I population. Becausealphacan in fact lead to population size change, we now need to recalculateNwith each time step, reflecting the changing total population size.
SEIR <- function(S, E, I, R, # Starting number of individuals
mu, B, gamma, sigma, alpha, p, # death/birth rate, contact lengths, recovery rate, disease progression rate, death rate from disease, vaccination proportion
simTime, step){
# Initialize the total population size (start at 100%)
N <- 1
# lists to store population sizes
sList <- c(S)
eList <- c(E)
iList <- c(I)
rList <- c(R)
# loop through times
for (time in seq(simTime/step)){
# Calculate rates of change
dS <- (mu * (N * (1-p) - S) - B * I * S / N) * step # (1-p) reflects vaccination
dE <- (B * I * S / N - (mu + sigma) * E) * step
dI <- (sigma * E - (mu + gamma + alpha) * I) * step
dR <- (gamma * I - mu * R + mu * N * p) * step # mu * N * p reflects vaccination
# Update population sizes based on rates of change
S <- S + dS
E <- E + dE
I <- I + dI
R <- R + dR
# Update lists
sList <- c(sList, S)
eList <- c(eList, E)
iList <- c(iList, I)
rList <- c(rList, R)
# Recalculate total population size
N <- sum(S, E, I, R)
}
# Store results to data frame
df <- data.frame(
time = seq(length(sList)) * step,
S = sList,
E = eList,
I = iList,
R = rList
)
return(df)
}Now, running on a population:
# Initializing proportion of individuals in each population
S <- 0.19
E <- 0.0
I <- 0.01
R <- 0.8
mu <- 1/(50 * 52) # Birth rate
B <- 2 # Contact lengths
sigma <- 0.2 # Time spent exposed prior to symptoms
gamma <- 0.5 # Recovery rate
alpha <- 0.1 # Death rate from sickness
p <- 0.2 # At Birth Vaccination Rate
simTime <- 52 * 50
step <- 0.1
# Run Simulation
SEIR_data <- SEIR(S, E, I, R, mu, B, gamma, sigma, alpha, p, simTime, step)
# Plot
plotSIR(SEIR_data)
Now, plotting just the infected population:

And lastly, overlaying the infected and exposed populations on the same figure:
ggplot(SEIR_data, aes(x = S, y = I)) +
geom_path() +
geom_path(data = SEIR_data, aes(x = S, y = E), color = "plum") +
theme_classic()