Imperial College simulation code for COVID-19

I finally managed to get Neil Ferguson’s COVID-19 simulation running on my iMAC (4.2 GHz Quad-Core i7 with 16MB of memory). So it is untrue that you need some parallel supercomputer to run these simulations. The trick though is to first to switch off the  default “parallelism” option in the make file before compiling the software. I suspect that Ferguson’s original c-code has recently been repackaged in C++ , along with a new  directory structure all cleaned up for the release on GITHUB. The famous but now shorter 5000 line routine though is still there.

Having looked into the software a bit,  I would describe his model as a Monte Carlo simulation of  how an infectious disease evolves within the population of any country. A country (such as the UK) is divided  into administrative units (regions) with population distributions parameterised by town size, age profiles, household sizes, on  a detailed population grid. It is certainly not a simple SIR model – like mine.

The disease (Covid) is seeded in some location (London ?)  with an initial 100 infected persons. The model then simulates the transmission rates within and across different locations (eg. schools, workplaces, home, transport, pubs etc.), types of person (age), and between towns (transport). As more people get infected so the number needing hospital treatment, ICU beds and eventually deaths is calculated with time. I think in the original paper the results were based on initial estimates of R and IFR data coming from China.

Different intervention scenarios aimed at curtailing the epidemic can then also  be simulated, for example by “deducing” their effect on reducing the demand on NHS ICU beds  and on the overall infection and death rates.

The guts of the simulation seems to be all in one file CovidSim.cpp which was originally written by Neil Fergusson in C, but during the cleanup process transferred to C++.  There have recently been some scathing code reviews in the media, but I am not going to follow suit and criticise his code just for the hell of it. Yes it should have been structured better, but I am also pretty sure it is correct and has been evolving  over a long period of time simulating ever more complex scenarios.  When scientists write their own code they just want results and  probably never consider the necessity for it all to be made publicly available. Now Ferguson  finds himself in a global maelstrom so basically he  had to publish his code, but beforehand it had to be at least partly sanitised.

Some reviewers have claimed that it has bugs because  the results are “non deterministic” or”stochastic”. However in his defence,  I would say that is exactly what Monte Carlo simulations always do. You can never get exactly the same result twice. Furthermore it is pretty obviously to me that the code is still under active development right now, since  it  is being added to in real time, for example  to simulate UK’s new “test, track and trace” policy.

It also looks like another team in his group at IC have been working on the Github release because in order to run the Covid scenarios you now also need to have both Python and R installed. ( see here ). Python is used to generate the sample data and R is used to plot the results. Python alone would have been more than sufficient. As a result it is not going to be easy for the man in the street to get this software up and running. Luckily I had Python and R already installed.

Having struggled through the software build, I finally managed to run their sample COVID simulations provided for the UK which has two scenarios a) No intervention and b) Full Lockdown. It is run with an effective R = 3, which seems rather high to me ( R was 2.4 in Ferguson’s original paper). It runs on my Mac in about 50 mins. So here are the plots generated by their R-scripts (not really to my taste).

Over 600,000 deaths in the UK with no intervention (R=3, IFR = 1%) and about 40,000 deaths with Lockdown intervention.

Deaths/week peak at ~1500 in mid-April with lockdown measures compared to a peak of 20,000/week with no intervention!

Finally these are the predicted deaths per week under lockdown measures (the no-intervention case reaches ~130K for last week in April).

Deaths per week and by age group under lockdown intervention

So how has this worked out in the UK? These figures can now be compared to the actual data which is probably best represented by the independent ONS study of excess weekly deaths.

ONS data on excess deaths in the UK in 2020

If you look at the bottom red curve which shows  excess deaths directly related to COVID-19 then you can see that the lockdown predictions more or less agree with the outcomes. Does that mean that  Neil Ferguson really did call the right shot?

Probably  he did, yet still today we have a fundamental lack of knowledge of the real-time  numbers of infected and recovered persons in the UK.  This current obsession with R is actually misleading government policy because R will naturally change during any epidemic. R will always reach 1 in coincidence with the peak in cases and then fall rapidly towards zero.

At some point the costs of continuous stop-go lockdowns based on fear alone will become unbearable. We will have to learn to live with this virus.

Posted in Public Health | 40 Comments

Social networks and epidemics

Simple SIR models essentially assume that the infection rate of any  population is homogeneous, i.e. everyone stands an equal chance of getting infected. However this isn’t really true, because some people have very large social networks, while  the majority vary from  the the low tens, down to a handful of regular contacts. Politicians, business leaders and Scientists tend to have very large networks of regular contacts across the world, whereas most people have far smaller family and work related networks. The so-called super-spreaders are the first infected members of large social networks.

Social networks derived from news reports concerning the Iraq Insurgency in 2007 (my work). This just illustrates how large variations in network size can be.

At the start of an infectious disease outbreak the probability of someone getting infected if they are a member of a large network and then spreading it is very high. These are the super spreaders which force the initial reproduction rate R0 to very high values. It is no real surprise that Boris Johnson, Mat Hancock, Chris Whitty, Dominic Cummins and Neil Fergusson all got infected together as they formed part of the government’s  large COVID/SAGE  network. However as an outbreak progresses, the larger networks become increasingly already infected, so that remaining networks for new infections will therefore tend to get smaller and smaller. As a direct result of this gradual process, R will naturally begin to reduce,  even without imposing any extra social distancing policies. R is simply proportional to the size of network that the average infected person belongs to, so as mean network size diminishes so does R.

The Imperial model has a complex model of the distribution of UK population in cities, towns and villages and simulates travel and commuting. However as far as I can tell it  does not include social networks nor could it easily do so. There are just too many types of network for them to model – entertainment, businesses, councils, sports clubs, etc. All we can really do is to estimate how R might reduce naturally with time as the susceptible network size reduces.

My model uses a contact rate \beta which is the contact rate per day resulting in infection and 1/g which represents the average infectious period in days. Initially beta is 0.5 and 1/g = 5 but in order to simulate the decrease in social network size I reduce \beta by 0.003 per day. As a result R slowly reduces from 2.5 on March 1st to 1.0 on June 2nd. Note that this does not include any government led extra social distancing measures.

Here is a comparison  with this modified model with UK accumulated deaths

UK cumulative deaths compared to a slowing in R with average social network size

It is clear that the model doesn’t properly represents the shape of the UK data. It also seems clear that the first cases in UK also appeared probably 2 weeks before March 1st. One reason the curve can’t fit is because  the UK also imposed a lockdown on March 23rd which changed the dynamics. Next we look at Sweden which did not impose a lockdown.

Comparison of a reducing network model  with Sweden’s  accumulated deaths per million people.

The model is almost a perfect fit, so  I suspect that Sweden may well  be on the right track. If this intuitive decrease in  R due to ever decreasing  social networks of the susceptible really  is the driver, then by mid June the Swedish epidemic will be over through herd immunity.

Posted in Public Health | 17 Comments

Herd Immunity

Note: My definition of Herd Immunity is the total % of the population who get infected once the epidemic has completely finished. Most people define it as just the % of the population infected once  the peak is reached. This gives much lower figures – 60% rather than 90% !

When a new disease enters into a community without prior immunity it will spread  if R0 > 1 . R0 is simply the number of people the first person with the disease infects before he/she recovers (or dies). If R0 < 1 then the disease fades quickly away and has no lasting effect,  while if R0 >> 1 the faster the disease spreads. Avian Flu was an example of a lethal disease but fortunately for us had R0 < 1. People got infected by birds but did not easily pass it on to others before they died. As a result there was no pandemic, despite WHO warnings of one at the time.

In the UK there are 65 million people all initially susceptible to any new infectious disease arriving in the country. It is estimated for Covid-19 that  R0 probably was  around 2.4 in Wuhan. However for simplicity let’s consider a general case where some disease has R0 = 4 and that the infectious period is 5 days.

At the start of the epidemic one infected person arrives in UK  and infects 4 others before recovering. These then each infect 4 others so that the infected population will increase as 4^n every 5 days.


It is probably only after ~1 month that anyone really notices that there is a problem, but by then the epidemic is already increasing “exponentially” out of control. However there is a safety catch.  Eventually each new infected person begins to meet some of those also infected or already recovered, so now cannot pass it on to 4 new cases. The reservoir of susceptible people is quickly running out and R is now diminishing fast. It reduces to 1 at the peak of the outbreak and then falls dramatically as the epidemic collapses. The population is then said to have reached “herd immunity”, and the UK is afterwards immune to any new infections arriving from abroad. One interesting fact is that herd immunity is always reached before everyone in the country is infected.  A percentage of the population will always escape any infection, but this percentage depends critically on the initial value of R0. Here are two examples.

Herd immunity for R0 = 1.2.  Only 30% get infected before Herd Immunity is reached.

However if R0 is much larger than 2 it is a different story.

Herd immunity is only reached after 90% of the population have been infected, but at least the epidemic ends much faster!

Governments can reduce “R” through social distancing measures, but this is a tricky process because to return to “normal” life infections would essentially have to drop to zero, perhaps through track & trace. Alternatively we could eventually reach herd immunity with maintaining R=1.2 but that would take 9 months, and even then may not be sustainable while infection rates outside the country remain at R=2.4.  So In both cases international travel might still need to be controlled indefinitely.

The only certain way out of this dilemma remains either a rapid development of an effective vaccine, or a drug treatment which  renders the disease no worse than a cold.


Posted in Public Health, Uncategorized | 16 Comments