Imperial College simulation code for COVID-19

I finally managed to get Neil Ferguson’s COVID-19 simulation running on my iMAC (4.2 GHz Quad-Core i7 with 16MB of memory). So it is untrue that you need some parallel supercomputer to run these simulations. The trick though is to first to switch off the  default “parallelism” option in the make file before compiling the software. I suspect that Ferguson’s original c-code has recently been repackaged in C++ , along with a new  directory structure all cleaned up for the release on GITHUB. The famous but now shorter 5000 line routine though is still there.

Having looked into the software a bit,  I would describe his model as a Monte Carlo simulation of  how an infectious disease evolves within the population of any country. A country (such as the UK) is divided  into administrative units (regions) with population distributions parameterised by town size, age profiles, household sizes, on  a detailed population grid. It is certainly not a simple SIR model – like mine.

The disease (Covid) is seeded in some location (London ?)  with an initial 100 infected persons. The model then simulates the transmission rates within and across different locations (eg. schools, workplaces, home, transport, pubs etc.), types of person (age), and between towns (transport). As more people get infected so the number needing hospital treatment, ICU beds and eventually deaths is calculated with time. I think in the original paper the results were based on initial estimates of R and IFR data coming from China.

Different intervention scenarios aimed at curtailing the epidemic can then also  be simulated, for example by “deducing” their effect on reducing the demand on NHS ICU beds  and on the overall infection and death rates.

The guts of the simulation seems to be all in one file CovidSim.cpp which was originally written by Neil Fergusson in C, but during the cleanup process transferred to C++.  There have recently been some scathing code reviews in the media, but I am not going to follow suit and criticise his code just for the hell of it. Yes it should have been structured better, but I am also pretty sure it is correct and has been evolving  over a long period of time simulating ever more complex scenarios.  When scientists write their own code they just want results and  probably never consider the necessity for it all to be made publicly available. Now Ferguson  finds himself in a global maelstrom so basically he  had to publish his code, but beforehand it had to be at least partly sanitised.

Some reviewers have claimed that it has bugs because  the results are “non deterministic” or”stochastic”. However in his defence,  I would say that is exactly what Monte Carlo simulations always do. You can never get exactly the same result twice. Furthermore it is pretty obviously to me that the code is still under active development right now, since  it  is being added to in real time, for example  to simulate UK’s new “test, track and trace” policy.

It also looks like another team in his group at IC have been working on the Github release because in order to run the Covid scenarios you now also need to have both Python and R installed. ( see here ). Python is used to generate the sample data and R is used to plot the results. Python alone would have been more than sufficient. As a result it is not going to be easy for the man in the street to get this software up and running. Luckily I had Python and R already installed.

Having struggled through the software build, I finally managed to run their sample COVID simulations provided for the UK which has two scenarios a) No intervention and b) Full Lockdown. It is run with an effective R = 3, which seems rather high to me ( R was 2.4 in Ferguson’s original paper). It runs on my Mac in about 50 mins. So here are the plots generated by their R-scripts (not really to my taste).

Over 600,000 deaths in the UK with no intervention (R=3, IFR = 1%) and about 40,000 deaths with Lockdown intervention.

Deaths/week peak at ~1500 in mid-April with lockdown measures compared to a peak of 20,000/week with no intervention!

Finally these are the predicted deaths per week under lockdown measures (the no-intervention case reaches ~130K for last week in April).

Deaths per week and by age group under lockdown intervention

So how has this worked out in the UK? These figures can now be compared to the actual data which is probably best represented by the independent ONS study of excess weekly deaths.

ONS data on excess deaths in the UK in 2020

If you look at the bottom red curve which shows  excess deaths directly related to COVID-19 then you can see that the lockdown predictions more or less agree with the outcomes. Does that mean that  Neil Ferguson really did call the right shot?

Probably  he did, yet still today we have a fundamental lack of knowledge of the real-time  numbers of infected and recovered persons in the UK.  This current obsession with R is actually misleading government policy because R will naturally change during any epidemic. R will always reach 1 in coincidence with the peak in cases and then fall rapidly towards zero.

At some point the costs of continuous stop-go lockdowns based on fear alone will become unbearable. We will have to learn to live with this virus.

About Clive Best

PhD High Energy Physics Worked at CERN, Rutherford Lab, JET, JRC, OSVision
This entry was posted in Public Health. Bookmark the permalink.

40 Responses to Imperial College simulation code for COVID-19

  1. Mick Wilson says:

    Another thoughtful piece, Clive. I agree that pillorying Ferguson has been fruitless, but understand how the uncertainty is unsettling “deterministic” programmers and other armchair experts. In an age when software models in systems managing life-and-death systems used in flight control, nuclear medicine and electrical power distribution have been trained up or at least characterized parameterized by Monte Carlo-type methods to ascertain reliability and reporoducibility envelopes… well, its gets fuzzy.

    • Clive Best says:

      Thanks Mick,

      Modelling human behaviour as well as an epidemic is always going to be difficult.

      • Really telling when one sees climate scientists — many who say that natural variability is chaotic and therefore impossible to predict — claim that the epidemiologist’s models are all wrong (see Annan, etc).

        In reality, natural climate variability is easy to model in comparison to the game theory of human behavior. The climate modelers just have to buckle down and find the deterministic patterns. The worst assumption one can make is to assume chaos or randomness when the determinism is front and center, see this recent post based on an EGU presentation

  2. Hugo says:

    Again nice work.

    I think a Ro of 3 is a bit high.
    Sars covid19 RO is much lower than that.
    Its not influenza or Measles

    The RO bottle neck is in the viral load.
    Population density.


    • Clive Best says:

      R is going to depend on location and time. Right now R is probably >2 in care homes and in some hospitals but < 0.5 in London streets ! So R0 is the initial infection rate and R must fall to 1 at the peak and zero when the epidemic ends. However R0 also determines the % of people infected and the length of the epidemic. Lowering R reduces total deaths but always extends the length of the epidemic.

      • John Bradley says:

        Clive, grateful if you would explain why ‘lowering R reduces total deaths’ over the timespan of the epidemic, please.

        • Clive Best says:

          I try to explain why R makes a difference in this post

          The basic reason is that the percentage of the population who get infected to achieve herd immunity falls dramatically with R, but it takes much longer to achieve it.

          It is a playoff between balancing the length of an epidemic with the total number of deaths. The lower R is then the longer the peak lasts (up to 6 months!) but the lower the number of deaths result. This also allows time for better treatments to be developed and possibly a vaccine.

          Lockdown and social distancing reduces R and saves lives, but at the cost on the economy. The long term knock on effects of this may cost even more lives because we can’t return to normal (international travel, pubs and nightclubs etc.) without risking R going above one.

          Let’s hope a vaccine or effective treatment works soon.

          • John Bradley says:

            Thank you for the explanation Clive.

            R(0) is a measure of the ease with which a virus is transmitted and is a function of the virus’ innate contagiousness, and the unmitigated environment which it inhabits. For the reasons you cite, the lower a virus’ R(0) the lower the total deaths, ceteris paribus. Understood.

            R is the effective reproduction number. The point I was trying to make was that lowering R by modifying the environment which the virus inhabits (eg interventions such as suppression), reduces total deaths only if those interventions are maintained indefinitely. If interventions are relaxed R will rise and over the long term total deaths will be the same as in an unmitigated scenario (assuming NHS not overwhelmed, no vaccine). In other words unless we remain in a state of lockdown for an implausibly long time, suppression defers total Covid deaths but doesn’t reduce them. The only justification for suppression might be to stop the NHS being overwhelmed and/or to keep people alive long enough for the cavalry to arrive in the form of a vaccine/effective treatment.

  3. JerryJames says:

    In terms of the impact on the Healthcare system it would be useful to see hospitalizations. Deaths get headlines and is of value but patients in beds is what will tax the infrastructure. If we can keep that under control and implement testing and tracing, then we can feel better about “opening up”.

  4. Clive you say

    However in his defence, I would say that is exactly what Monte Carlo simulations always do. You can never get exactly the same result twice.

    What happens if you set the seed?

    And as a bonus question would you regard setting the seed as optional?



    • Clive Best says:

      You should always get the same trends, but you can’t say that person A will get infected by person B. All you can say is that there is a say 60% probability that they get infected in some place. Then you throw a dice to decide. So in one run he gets infected and in another run he doesn’t. However when you average over millions of people it doesn’t matter.

      • Chris Long says:

        You should get exactly the same results if you seed the RNG and design the software carefully. The Imperial team had some non-deterministic behaviour due to multiple threads calling the RNG in a non-deterministic sequence. There were some recent updates to the software to fix that, so they are aware that the simulation should be deterministic and are trying to make it so.

      • My understanding is if you always seed the random number generator with same seed, the sequence of ‘random’ numbers generated will be the same. If this is done repeated runs will produce the same results (be deterministic). This is why the ability to seed random number generators was provided.

        Or have I missed something?

        • Your understanding is correct, Jeremy. There is some nuance to this though; the order of the random numbers returned will be the same for the same seed, but different parts of the code may call the generator in different orders each time the code is run.

          Imagine two parts of the code, A and B, each have a few calls to the generator and are being run concurrently (i.e. you have no guarantee from the computer as to the interleaving of A and B). Lets assume that the seed guarantees “random” output 3, 1, 4, 1, 5, 9. Denoting each call to the random number generator as its source (A or B) and its order within the source. One run may receive the numbers:

          A1 – 3
          A2 – 1
          A3 – 4
          B1 – 1
          B2 – 5
          B3 – 9

          Another run may be:

          B1 – 3
          B2 – 1
          B3 – 4
          A1 – 1
          A2 – 5
          A3 – 9

          And a third one may interleave them:

          A1 – 3
          B1 – 1
          B2 – 4
          A2 – 1
          B3 – 5
          A3 – 9

          The only guarantee you have from the computer is that all the As will be in order and all the Bs, but you can’t tell anything about how they compare.

          Although the random numbers come out in the same order, if A calculates the sum then its outputs are 8, 15, 13. If B calculates the product then its outputs are 45, 12, 20. Here we get non-determinism despite a well-behaved random number generator with the same seed! It’s the responsibility of the software engineer to make sure this doesn’t happen.

          If there is any skew in the way this happens then even averaging multiple runs won’t fix it.

        • Alex Moss says:

          Indeed, I have read criticism that different results are a provided with the same random seed, which is a separate issue from stochastic simulation.

      • Thank you for taking the time to write this article Clive. It was refreshing to read that you weren’t going to criticise the code for the mere sake of it 🙂

        I disagree with the assertion that you can explain away non-determinism as “what Monte Carlo (MC) simultations always do” though. There is determinism from the perspective of the computer, i.e. using a pseudo-random number generator (PRNG) as against a truly random source, and from the perspective of the user. The latter is certainly the very point of MC models, but the former is not a prerequisite for this, and I’d argue that it undermines the validity of the code by making it impossible to prove parity between the maths and its implementation in software. To say that you “can never get exactly the same result twice” is simply incorrect.

        Re your dice response: the difference between reality and simulation is that the sim allows you to repeat the dice roll, with the same seed, and get the same result. Assuming a good PRNG, then you get reliable trends and averages to interpret in a MC model, but the key point is that it’s entirely repeatable. Any other non-determinism comes from a different source (e.g. a race condition) and can’t be differentiated from a bug.

        As an example, let’s say we have a simple model of Y=X+2 where X is distributed uniformly in the range (0,1)—we explore the distribution Y by sampling from X and adding 2. With the same seed, every trial will result in the same sample x, so to claim that non-determinism in a single sample of y is still ok is to say that the +2 operation is stochastic—it’s a bug. The same holds regardless of the distribution of X.

        There are certainly greater complexities when multiple concurrent calls are made to the PRNG because their ordering isn’t guaranteed. However, being difficult is not the same as being impossible and it’s the job of the software engineer to manage this such that they can prove the correctness of their implementation.

        But can’t you just take the average and wave these problems aside? No. The difference between “precision” and “accuracy” is important here. Precision is how tightly your darts land on the board, and accuracy is how close the centroid is to the bullseye (accuracy and precision are lay terms for mean and variance, respectively). Taking an average of multiple runs only improves precision by decreasing the standard error of the mean as n increases. High precision and low accuracy is just “better wrong” but it’s still wrong—you won’t win a game of darts with every dart landing in the exact same spot if that spot isn’t on the board. This analogy is slightly flawed because in the high-precision game of darts you can measure how wrong you are and adjust accordingly, but this isn’t possible when estimating unknown properties such as in the Imperial model.

        The burden of proof for correctness of code lies with the engineer that writes it. Having non-deterministic output makes testing more difficult because it’s impossible to codify assumptions through tests of the form “input A into function B must get output C”. The moment there is any hint of non-determinism, the results of these tests change from a binary correct or not to a probabilistic “maybe correct”, and these “maybes” accumulate to such a point that the trustworthiness of the code is questionable. Science (and software engineering for that matter) is based on reproducibility, and non-deterministic bugs undermine this. We have no way of knowing if the statistics and the software are equivalent.

        • I’ve always thought threading was the root of all evil.

          Donald Knuth and Tony Hoare ain’t that keen neither.

        • It’s true that, from the procedural point of view, it’s always best to engineer the model so that given the appropriate seed will reproduce the same output. However, let’s not lose focus on two aspects:

          1) This model is from the early 90s. The Mersenne twister – to put things into context – was published in 1997. Non reproducible Montecarlo simulations were just normal back then and this model was extremely advanced in the epidemiological world ( I would argue it’s still overkilled but that’s the topic for another discussion). A quick look at the code shows me that the implentation of seeds was done fairly recently, in 2017. They just did not feel the need for it before probably because they were running the model a few times and then bootstrapping the results.

          2) When you’re making predictions, all it matters is the ability to have predictive power. This model has predictive power. I don’t know if it outperforms the simplest SIRs because I am not an epidemiologist and I don’t know the literature but – as Clive shows even in this blog post – it works so that’s what matters. Ferguson was attacked with a number of ludicrous points (e.g. the software is not multi-threaded or they can’t spell household) but the major argument is that the prediction of non-intervention holds true – if nothing because it is a simple one to make given the R0 of the virus. A back of the envelope calculation would still return 500k deaths in the UK with not-intervention.

          3) (bonus point) What I personally find shocking is that this group managed to get this far without ever publishing the code before.

  5. Alex Moss says:

    Thanks for detailed review! The main criteria on which to see if the model was correct is the counterfactual. What if we had maintained mitigation instead of suppression strategy. Of course we can never know this for the UK. Except am I right in thinking that the model would show exponential growth in deaths for some time under a mitigation scenario to the 250k max? If so how does this match the data that shows death growth rate was falling (i.e. slowing down) from early march?

  6. Waza says:

    Great work.
    The age distribution is out of whack.
    Examples from ships and aged care suggest crew and staff have higher R0 but lower fatalities

  7. John Bradley says:

    Many thanks Clive. I wonder if you can shed any light on a couple of things that have been bugging me.
    First, would I be right in thinking that if you assumed an IFR of 0.1% rather than the nearly 1% assumed, the total number of deaths would be a tenth ie in the Counterfactual about 50k rather than 500k?
    Second, how does the model produce a reduction in deaths of about 250k in ‘mitigation’ compared with the Counterfactual if deaths are taken over a two year period (which they are in the 16 March paper, Table 4)? Or indeed 200k or so reduction going from ‘mitigation’ to ‘suppression’?

    I would have thought that suppression defers deaths and only reduces deaths if people die from not being able to be treated ie through lack of ICU capacity.

    • Clive Best says:

      Yes if you take an overall IFR of 0.1% and R0=2.4 (Ferguson’s first paper) then you would get 50,000 deaths in the UK, if you let the epidemic run its course without any intervention. I think the Gupta group in Oxford are proposing this is the case, but that would mean that nearly 60% of the population have already had COVID-19 and we are on the verge of reaching herd immunity.

      We will only know the true figure when we have antibody tests to measure how many people have already had it. I think the real answer is probably about 10-12% or about 7million people. This would then give an IFR of 0.05%.

      • John Bradley says:

        Thanks Clive.
        Any thoughts on my second query – the reason for the reduction in deaths from 500k (do nothing) to 240k (mitigation) to a few tens of thousand (suppression).

  8. Rob L says:

    In the ‘Death incidence by model run’ graph, you have the caption ‘200,000’ deaths per week with no intervention, but reads like 20,000 per week. Can you clarify?

    Moreover, what does ‘no intervention’ mean? Is there any allowance for the fact that people will avoid social contact if there is a big scary disease around, even if the govt doesn’t ban it? For example, the pubs were pretty from 16 March onwards when govt said to avoid them, but they weren’t closed till the weekend before lockdown.

    • Clive Best says:

      You’re right! That was my mistake. The graph shows deaths/week so the correct figure is 20,000/week peak value.

    • Clive Best says:

      You’re also right that people would naturally socially distance themselves during an outbreak. It doesn’t need “government intervention” to ensure that. R naturally falls because the super-spreader “jet-setters” types get infected first.

      • Börje Månsson says:

        You can verify this natural decrease in R0 by using the corona hospital data (H) and the total verified corona cases (CT).
        If you assume H are proportional to the real number of infected people (I) you can calculate: (H, I, S, CT are all per capita and the SIR model is used)
        R=R0*S =1+1/g*d/dt[ln(I)]=1+1/g*d/dt[ln(H)].

        (You do not have to know the constant of proportionality)
        Then it is reasonable to assume (1-S) is proportional to CT (1-S is the total number of people who has been infected at some time.)

        1-S=k*CT or S=1-k*CT

        If you know the constant k you can then calculate
        For Sweden k=20-30
        The result is an exponential fall of R0 of the form:

        R0(t)=a+ (R0(0)-a)*exp(-c*t) where a <R0(0) is a limiting value.(R0(0) is the initial R0)
        This leads to a lower herd immunity value. Instead of herd immunity at 1-1/R0(0),we have herd immunity at 1-1/a where a< R0(0).

        • Clive Best says:

          Yes, if R reduces naturally to say ~1.2 then herd immunity can be reached at a 30% final infection rate.

          • “Yes, if R reduces naturally to say ~1.2 then herd immunity can be reached at a 30% final infection rate.”

            Remember to interpret this curve correctly. R has nothing to do with the asymptotic cumulative level, but only in the *rate* at which it achieves the cumulative

  9. Bruce Edmonds says:

    If only Nature had forced Ferguson to publish his code when they published his paper, we would be in a better position today – maybe several years of checking, critiquing and improving this code ready for a true crisis!

    • Clive Best says:

      It is reproducible on the large scale. Each time you run it though there are small differences because it tracks random person to person infections across the UK population distribution. That is why for his “Report9” paper Neil Fergusson run the code 10 times for each scenario and then took the average. I ran it just once because I don’t have a supercomputer. So I think some people simply misunderstood what the code does.

      Far more important are the assumptions about how many days before symptoms show, how many days people are infectious, and the infection rate. It turns out that he was too optimistic. R0 was 3.0 and not 2.4

      • Wes.H says:

        On the “large scale”, Code Check reproduced with our current github master what the team produced privately (with their non-open-source version of the code) at the time Report 9 was published – there were differences he commented on.

        But additionally the results from CodeCheck *exactly* byte-match the results the group find, and that you should expect to find by running the scripts in the Report 9 folder yourself – at least tested with a range of compilers on Windows and Linux. If Clang-10 behaves similar on Mac to other platforms, there’s no reason why you should get byte-identical results on Mac too.

        • Wes.H says:

          (sorry – terrible typo at the end – no reason why you *shouldn’t* get byte-identical results on Mac…!!)

        • Clive Best says:


          Why then do you run the code 10 times for each report9 intervention scenario ? i.e. /NR=10

          I agree that the code should give identical results if run twice with identical input and random number seeds. I had wondered if instead the random number generator had called SYSTIME or something which would mean two runs were not identical.

          • Wes.H says:

            The /NR:10 will run the same model 10 times in one go, without reinitialising the random seeds between runs, so you’ll get 10 different output realisations (10 stochastic instances) – with which to do likelihood evaluations.

            It’s just easier than doing 10 separate runs and keeping track of 10 sets of manually provided random seeds for the command-line.

            And if you repeat the whole command-line again including the /NR:10, you’d get a replica of the ensemble of 10 instances.

  10. Wes H. says:

    Hi Clive, thank you for a well balanced article – and indeed for building and running the code. Many of the most vocal critics of the code did not try building it, and from the nature of their critique, probably did not even look at it much (eg, the “it’s written in fortran” spiel…)

    Some bits of info that might be useful…

    1. You shouldn’t have to switch off parallelism in the build process, but Macs with C++ sometimes differ from other platforms… (and from each other…) – at one point, the Xcode C/C++ compiler didn’t support threading – I’m not sure if this is still the case – so you may need to manually install/upgrade gcc. And some Mac systems seem to not have OpenMP libraries, so you have to `brew install libomp`.

    That said: 16Gb might be a little tight for a full UK run, and more threads will need a bit more memory, so I think for your Mac it’s borderline. On my modest desktop I see about 17Gb in use on an 8-core job, and jobs take about 6 minutes.

    2. Indeed, there is now a bit of use of C++, and not just pure C. No desperate “drive” on our part to move into C++, but more that there was no longer a strong argument to remain in pure C, with a growing team of developers involved.

    3. Determinism. Short story: right now the code is fully deterministic, and you’ll get identical output if you compile with gcc/clang/intel/MS compilers, on Windows or Linux, with the same inputs, including number of threads. Perhaps other platforms/compilers too. The thread-count requirement sometimes draws criticism, but is entirely expected and unavoidable, as each thread needs its own private psuedo random-number generator.

    Slightly longer story: it was always deterministic when used in the ways the group expected it to be used – namely, do the slow population initialisation once (with cmd-line /S), and from then on use the network file with /L). The problem was really one of clearer documentation – coupled with the fact that the critiques never filed an issue or got in touch with the group about it. Eventually, the team commented out one line to single-thread that section, and at a bit of performance cost in initialisation, determinism was ensured if all runs are used with /S.

    One more thing: doing lots of runs with different seeds isn’t just about taking the average; you use the instances to do likelihood calculations, creating error bars around the central figure. The spread is critical in knowing how much you should trust the average.

    (Nothing changed with seeds in 2017 – it’s always been this way)

    4. Python and R aren’t strict dependencies for building/running the code; you can run the executable as it is, and load the resulting tables in to Excel if you like. But should academics have a compute cluster and want to run 400 or so report 9 jobs in that way, the python scripts are ready made for that purpose. Among our group, python and R are in more common use than C++ compilers as it happens, so it’s an optional dependency, and not hard to satisfy.

    5. Lastly – yes, many things could have been done differently in an ideal world, but “just releasing earlier” is not as simple as everyone makes out. Academia has lots of long-lived code that isn’t to current standards. Publishing it, or not publishing it, has its risks. The group here went the path of trying to make it as readable and accessible as possible for a release. Hopefully everyone can understand why that choice was made, even if they don’t agree. When code pre-dates github by most of a decade, it’s not always easy to catch up, when the same code is in still in continual ongoing development, and Academia does not provide much space for stopping to refactor and modernise older code, or to reinvent the whole thing in order to answer a small incremental question.

    Apologies for the rather lengthy comment, but I felt you might appreciate a bit more detail, having discussed it so far. Thanks again for taking the time.

    • Clive Best says:


      Thank you for your detailed comments. The ambition and effort involved in developing CovidSim is impressive. There is no doubt about that.

      I was probably lucky to get the UK to run in 16GB. It crashes if I try to run the US ! I originally tried to install OpenMP but had problems before then deciding to build it with `-DUSE_OPENMP=OFF

      Maybe that is the reason why my Mac can run the UK cases. Since then I have also managed to run a set of Report 9 cases ( serially – but it took me 12 hours . So I probably should really upgrade the memory to 32GB.

      Don’t worry about the code purists. I used to program in Fortran !

      I understand the pressure you guys were under once the “Ferguson” model was being blamed for the lockdown both here and in the US !

Leave a Reply