COVID-19 in Norway: A Case Study#

Norwegian Institute of Public Health (FolkeHelseInstituttet) has published Norwegian surveillance data in a format that is easily accessible from a machine-reading perspective.
🔗 Surveillance Data Repository ⚠️ The repository is now archived (set to read-only)

In this chapter, we use real COVID-19 data from the Norwegian Institute of Public Health (FHI) to explore disease spread and estimate the reproduction number \(\mathcal{R}(t).\)
We then use these estimates in a SIRS compartmental model to simulate the epidemic under different scenarios.

1. Load Python packages#

We begin by importing the necessary Python packages and ensuring required libraries are installed.

# --- Core packages ---
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.integrate import odeint
import requests

# --- Install optional packages if missing ---
try:
    from epyestim import covid19
except ImportError:
    !pip install epyestim pymc --quiet
    from epyestim import covid19

2. Load data from MSIS (Norwegian Institute of Public Health)#

We use the dataset data_covid19_msis_by_time_sex_age_latest.csv, which includes weekly COVID-19 case counts in Norway by age and sex.

A short description of the data is in _DOCUMENTATION_data_covid19_msis_by_time_sex_age.txt can be also viewed for context.

url_msis="https://github.com/folkehelseinstituttet/surveillance_data/raw/master/covid19/data_covid19_msis_by_time_sex_age_latest.csv"

msis = pd.read_csv(url_msis) # Read the CSV file into a DataFrame
msis['date']=pd.to_datetime(msis.date) # convert date to datetime



doc_url = "https://raw.githubusercontent.com/folkehelseinstituttet/surveillance_data/master/covid19/_DOCUMENTATION_data_covid19_msis_by_time_sex_age.txt"

response = requests.get(doc_url) # Get the documentation 
documentation_text = response.text #Save the content as text

# To display the content, uncomment the line below
#print(documentation_text) # Print or display the content

msis.head() # Print the first few rows of the DataFrame
granularity_time granularity_geo location_code border age sex year week yrwk season x date n date_of_publishing
0 week nation norge 2020 0-9 female 2020 8 2020-08 2019/2020 31.0 2020-02-23 0 2022-11-14
1 week nation norge 2020 0-9 male 2020 8 2020-08 2019/2020 31.0 2020-02-23 0 2022-11-14
2 week nation norge 2020 10-19 female 2020 8 2020-08 2019/2020 31.0 2020-02-23 0 2022-11-14
3 week nation norge 2020 10-19 male 2020 8 2020-08 2019/2020 31.0 2020-02-23 0 2022-11-14
4 week nation norge 2020 20-29 female 2020 8 2020-08 2019/2020 31.0 2020-02-23 1 2022-11-14

3. Explore the data#

3.1. Summorize the data by age and sex#

fig, ax = plt.subplots(1, 2, figsize=(14, 5))
sns.barplot(data=msis.groupby('age')['n'].sum().reset_index(), x='age', y='n', ax=ax[0])
sns.barplot(data=msis.groupby('sex')['n'].sum().reset_index(), x='sex', y='n', ax=ax[1])
ax[0].set_title("Total by Age"); ax[1].set_title("Total by Gender")
plt.tight_layout()
plt.show()
../_images/df686ccc7fbc7f877a90ad056ecb3587464c10164d80840b63c774908cff8144.png

3.2. Visualize time series#

We visualize case counts by age and sex on linear and logarithmic scales.

# Create the subplots with shared x-axis
fig, ax = plt.subplots(2, 1, figsize=(14, 8), sharex=True)

# First subplot: Linear scale
sns.lineplot(
    data=msis,
    x='date',
    y='n',
    hue='age',
    style='sex',
    markers=True,
    dashes=False,
    ax=ax[0]
)
ax[0].set_title('Number of cases by age and sex (Linear Scale)')

# Second subplot: Logarithmic scale
sns.lineplot(
    data=msis,
    x='date',
    y='n',
    hue='age',
    style='sex',
    markers=True,
    dashes=False,
    ax=ax[1]
)
ax[1].set_yscale('log')  # Set y-axis to log scale
ax[1].set_title('Number of cases by age and sex (Logarithmic Scale)')


# Rotate x-axis tick labels for better readability
plt.xticks(rotation=45)

# Reduce the number of x-axis ticks by showing every Nth tick
N = 10  # Adjust N as needed
xticks = msis['date'].unique()[::N]
ax[1].set_xticks(xticks)

# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plot
plt.show()
../_images/cfc246adcddab5db6479d127a7b1dd9117594df7650365a76936868032416710.png

3.3. Aggregate all the cases#

Next, we combine all the data to calculte all cases across age and gender:

# Group by date and sum the cases
weekly_cases = msis.groupby('date', as_index=False)['n'].sum().set_index('date')
weekly_cases.head()
n
date
2020-02-23 1
2020-03-01 30
2020-03-08 217
2020-03-15 1215
2020-03-22 1438
fig, ax = plt.subplots(figsize=(14, 6))

sns.lineplot(
    data=weekly_cases,
    x=weekly_cases.index,
    y='n',
    ax=ax
)

ax.set_title('Total number of cases (Linear Scale)')


# Rotate x-axis tick labels for better readability
plt.xticks(rotation=45)

# Reduce the number of x-axis ticks by showing every Nth tick
N = 10  # Adjust N as needed
xticks = weekly_cases.index[::N]
ax.set_xticks(xticks)
ax.grid(True)  # Add grid to the first subplot

# Adjust layout to prevent overlap
plt.tight_layout()

# Display the plot
plt.show()
../_images/104de79e9e2851d30cc1c8588ea71c71c71f39cf6fb0f728695491c64a0c9d5b.png

5. Estimate Reproduction Number \(R(t)\)#

We estimate the effective reproduction number \(\mathcal{R}(t)\) using the Python package epyestim, which is based on Bayesian inference for time-varying reproduction numbers.

The method uses:

  • A smoothed time series of daily incidence data

  • A serial interval distribution (i.e., time between successive cases)

  • A delay distribution between infection and case reporting

epyestim accounts for uncertainty and returns quantiles (e.g., 2.5%, 97.5%) as well as the posterior mean of \(\mathcal{R}(t)\).
These estimates are crucial for evaluating when transmission is increasing (\(R(t) > 1\)) or under control (\(R(t) < 1\)).

We apply it to the interpolated case data from MSIS.

import epyestim.covid19 as covid19

daily_cases = weekly_cases.resample('D').interpolate(method='linear') #interolate data to daily cases
r_estimates = covid19.r_covid(daily_cases['n'])
r_estimates.head()
cases R_mean R_var Q0.025 Q0.5 Q0.975
2020-02-28 21.0 4.852091 0.035857 4.486573 4.849615 5.231146
2020-02-29 25.0 3.409442 0.014790 3.175703 3.407999 3.651504
2020-03-01 30.0 2.632046 0.007420 2.465315 2.631087 2.803539
2020-03-02 56.0 2.299896 0.004671 2.168185 2.299228 2.435981
2020-03-03 83.0 2.236439 0.003513 2.122974 2.235927 2.353221

We can now plot the obtained estimated \(R(t)\) the \(97.5\%\) confidence interval of the estimate.

# Plot the R estimates with uncertainty
fig, ax = plt.subplots(figsize=(15, 4))
sns.lineplot(data=r_estimates, x=r_estimates.index, y='R_mean', ax=ax, label='R')
ax.fill_between(r_estimates.index, r_estimates['Q0.025'], r_estimates['Q0.975'], color='b', alpha=0.2, label='97.5% condifence')
ax.axhline(y=1, color='r', linestyle='--', label='R=1')
ax.set_xlabel('Date')
ax.set_ylabel('R value')
ax.set_title('Estimated R over Time with Uncertainty')

# uncomment the following lines to set x-axis limits
#start_date, end_date= pd.to_datetime('2020-04-01'), pd.to_datetime('2020-09-01')
#ax.set_xlim(start_date, end_date)
ax.set_ylim(0, 2)
ax.legend()
plt.show()
../_images/04793666a5c5b59381e3584a2432cba62f20b4448cd7a2273e3d9e6ac8ac4236.png

6. SIRS Model with Time-Varying Reproduction Number \(\mathcal{R}(t)\)#

In this section, we define and solve a SIRS model where the effective reproduction number \(\mathcal{R}(t)\) varies with time, based on the estimate above.

Model Overview#

The model follows the classic SIRS dynamics with compartments:

  • \(S(t)\): susceptible individuals

  • \(I(t)\): infected individuals

  • \(R(t)\): recovered (and temporarily immune) individuals

We incorporate \(\mathcal{R}(t)\) directly into the model using a time-dependent transmission rate:

\[ \beta(t) = \gamma \mathcal{R}(t) \]

This requires interpolating \(\mathcal{R}(t)\) values at arbitrary time points using the R_interp function.

Model Equations#

\[\begin{split} \begin{aligned} \frac{dS}{dt} &= -\beta(t) \frac{S I}{N} + \eta R \\ \frac{dI}{dt} &= \beta(t) \frac{S I}{N} - \gamma I \\ \frac{dR}{dt} &= \gamma I - \eta R \end{aligned} \end{split}\]

Parameters#

Symbol

Meaning

Value

\(\gamma\)

Recovery rate

\(1/7\) (1 infection lasts ~7 days)

\(\eta\)

Loss of immunity rate

\(1/30\) (immunity lasts ~30 days)

\(N\)

Total population

5,400,000

\(\mathcal{R}(t)\)

Time-varying reproduction number

Estimated from data

\(I_0\)

Initial infected individuals

From case data on the first date

\(S_0, R_0\)

Initial susceptible/recovered

Computed from \(N\) and \(I_0\)

Initial Setup#

We define:

  • R_interp(t, Rt, key): interpolates the \(R(t)\) values for continuous time

  • deriv(...): defines the SIRS differential equations with \(\beta(t)\)

  • Time grid \(t\): in days since the start (can be selected within the range of estimated \(\mathcal{R}(t)\))

This setup is then passed to the odeint solver to simulate the SIRS model.

def R_interp(t, Rt, key):
    """
    Interpolates the R values for the given time points.
    str ='R_mean' or 'Q0.975' or 'Q0.025'
    t: Time points for interpolation, in days since the start date refereced to in Rt.index[0]
    """
    days =(Rt.index - Rt.index[0]).days
    R_values = np.interp(t, days, Rt[key])
    return R_values

def deriv(X, t, N, gamma, eta, Rt, key ='R_mean'):
    """
    Compute the derivatives for the SIR model, with estimated R(t) values.
    For the odeint solver to work, we interpolate the R values for the time points, see R_interp function.
    """
    S, I, R = X
    beta =  R_interp(t,Rt,key)*gamma
    dSdt = -beta * S * I / N + eta*R
    dIdt = beta * S * I / N - gamma * I
    dRdt = gamma * I - eta*R
    return dSdt, dIdt, dRdt

#Define the parameters
gamma, eta = 1/7, 1/30
N = 5_400_000
day0, dayN = r_estimates.index[14], r_estimates.index[-1] 
Rt = r_estimates.loc[day0:dayN]
t = np.linspace(0, (dayN - day0).days, (dayN - day0).days + 1)
I0 = daily_cases.loc[day0]['n']
X0 = N - I0, I0, 0
 
print('The model is initialized...')      
The model is initialized...

6.1. Run the numerical simulations#

Run the code using mean estimate \(\mathcal{R}(t)\) and its \(97.5\%\) quantiles.

SIR_dict = {}


for key in ['R_mean', 'Q0.025', 'Q0.975']:
    ret = odeint(deriv, X0, t, args=(N, gamma, eta, Rt, key))
    SIR_dict['S for ' + key] = ret.T[0]
    SIR_dict['I for ' + key] = ret.T[1]
    SIR_dict['R for ' + key] = ret.T[2]

SIR = pd.DataFrame(SIR_dict, index=t)
SIR.head()
S for R_mean I for R_mean R for R_mean S for Q0.025 I for Q0.025 R for Q0.025 S for Q0.975 I for Q0.975 R for Q0.975
0.0 5.399070e+06 929.857143 0.000000 5.399070e+06 929.857143 0.000000 5.399070e+06 929.857143 0.000000
1.0 5.398903e+06 963.902794 133.162223 5.398908e+06 959.297067 132.836126 5.398898e+06 968.605135 133.494343
2.0 5.398745e+06 989.100082 266.103762 5.398755e+06 979.979820 264.821012 5.398734e+06 998.448705 267.415215
3.0 5.398596e+06 1006.554397 397.668186 5.398612e+06 993.057811 394.835050 5.398579e+06 1020.434264 400.571564
4.0 5.398455e+06 1017.628440 526.901376 5.398478e+06 999.914528 521.965100 5.398432e+06 1035.916169 531.972602

6.2. Plot the results#

fig, ax = plt.subplots(figsize=(15, 4))
sns.lineplot(data=SIR, x=SIR.index, y='I for R_mean', ax=ax, label='SIRS')
ax.fill_between(SIR.index, SIR['I for Q0.025'], SIR['I for Q0.975'], color='b', alpha=0.2, label='97.5 % confidence')

sns.lineplot(data=daily_cases.loc[day0:dayN], x=SIR.index, y='n', ax=ax, label='MSIS dataset')


ax.set_xlabel('days since ' + str(day0.date())) 
ax.set_ylabel('Number of People') 
ax.set_title('Number of Infected, I(t)')
ax.legend()

plt.grid(True)
plt.show()
../_images/b45d2433f4b1292b90996b88d3376507bc40dd46055e3f4cc5caa71639359b58.png

7. Real-World Considerations and Model Limitations#

In this example, the reported COVID-19 case data have been used both to estimate the time-varying reproduction number \(\mathcal{R}(t)\) and to evaluate the fit of the SIRS model to observed dynamics. As a result, it is not surprising that the model qualitatively matches the data relatively well. The simulation is based on information derived from the same data it is compared against.

Important considerations#

  • The true value of \(\mathcal{R}(t)\) is not directly observable and must be estimated from imperfect and delayed data sources.

  • Reported case numbers depend on factors such as testing capacity, case definitions, reporting delays, and public health policies.

  • The model does not include important real-world factors such as vaccination, public behavior changes, or non-pharmaceutical interventions (e.g., lockdowns or mask mandates).

  • The SIRS model used here assumes a homogeneously mixing population, constant recovery and reinfection rates, and does not include age structure, geography, or other heterogeneities.

Conclusion#

Mathematical models are simplified representations of complex systems. They are valuable for exploring hypotheses and testing possible scenarios, but they cannot account for every aspect of a real-world epidemic.

Exploration#

Now that the basic SIRS model has been implemented and compared with data, you are encouraged to explore further. Try the following:

Vary the model parameters#

  • What happens if the recovery rate \(\gamma\) is increased or decreased?

  • How does the duration of immunity (controlled by \(\eta\)) affect the long-term dynamics?

  • Try different values of the initial conditions \(I_0\), \(R_0\), and see how they change the peak and timing of infections.

Extend the model structure#

  • Implement an SIERS model by adding an “Exposed” compartment for individuals who are infected but not yet infectious.

  • Add age structureand and/ore explore model with vaccination, the vaccination data can be obtaind from 🔗 Surveillance Data Repository

  • Explore how different vaccination rates or timings change the spread of the disease.


Exploring these extensions will help deepen your understanding of epidemic modeling and the role of mathematical assumptions in interpreting public health data.