Category: Statistics


Estimating SARS-CoV-2 infection counts

It seems like everyone is obsessively following how the number of COVID-19 cases keeps evolving. Understandable, being in the middle of a pandemic and all. The problem is that reported COVID-19 cases include only laboratory-confirmed cases, and because testing is limited, they significantly undercount the actual number of infections. I've seen suggestions that looking at death counts would give a more accurate picture.

This has led me to try my hand at estimating the number of active infections based on daily death counts. The basic calculation is quite easy. Assuming an infection fatality rate (the percentage of infected people who die) of 1% gives an estimate of 100 people infected with SARS-CoV-2 for every death. Assuming an average recovery time (recovery meaning either getting better or dying) of three weeks means that for each death, the 100 infected people were infected, on average, three weeks earlier. So each death implies that there were 100 people with a SARS-CoV-2 infection during the preceding three weeks. Calculating the total number of infected at any point in time is then just a matter of repeating this calculation for each day's death count and adding up all the contributions that overlap that day. Finally, I add a little smoothing by running a moving average over the results.
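A minimal Python sketch of the calculation described above is shown below. It assumes the input is a list of daily death counts (not cumulative totals), that each death on day d stands for 1/IFR infections active on each of the recovery-time days before d, and that the smoothing is a centered 7-day moving average. The function names and the 7-day window are illustrative choices, not details taken from the original analysis.

```python
def estimate_infected(daily_deaths, ifr=0.01, recovery_days=21):
    """Estimate active infections per day from daily death counts.

    Assumption: each death on day d implies 1/ifr infections that were
    active on each of the `recovery_days` days before d.
    """
    multiplier = 1.0 / ifr  # IFR of 1% -> 100 infected per death
    infected = [0.0] * len(daily_deaths)
    for day, deaths in enumerate(daily_deaths):
        # Spread this day's implied infections back over the recovery window.
        for earlier in range(max(0, day - recovery_days), day):
            infected[earlier] += deaths * multiplier
    # The final `recovery_days` entries are underestimates: the deaths that
    # would reveal those infections have not been reported yet.
    return infected


def moving_average(values, window=7):
    """Centered moving average; days without a full window are dropped."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centered average")
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


# Usage, with `deaths` being a list of daily death counts:
# smoothed = moving_average(estimate_infected(deaths, ifr=0.01, recovery_days=21))
```

The incomplete tail of the estimate is the same three-week lag that limits how close to the present the charts can go.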

Using data from https://covid19api.com/, the result for Germany looks like this:

Germany infected counts

Here we can see how SARS-CoV-2 started to spread in February, and from the end of February through the middle of March, we got a clean exponential growth curve. During the latter part of March, once everyone realized what was going on, people started social distancing voluntarily and various restrictions were put in place. This is visible as the exponential growth starts tapering off. At the end of March, Germany went into full lockdown, and the peak of the pandemic was reached very quickly after that. While the lockdown was in effect, the number of infected continued to drop rapidly. Towards the end of April, as the lockdown was cautiously lifted, the rate at which the number of infected was dropping started to slow down. One downside of this sort of model is that it lags by three weeks: infections that are active today only show up in the death counts about three weeks from now, so we can't get a feel for the current situation in Germany.

There are a lot of assumptions underlying the previous calculations, some important, others not so much. The moving average should smooth over weekend effects (fewer deaths reported on weekends, with a spike the following Monday to catch up) and other small reporting issues like that. The assumption that everyone's recovery time is three weeks, when in reality it varies from person to person, is not a big problem either. The effect of those with shorter recovery times will be canceled out by the effect of those with longer recovery times, especially once the number of deaths per day gets large.
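To illustrate why a moving average takes care of the weekend effect, here is a small Python sketch using a hypothetical reporting pattern: a steady 10 deaths per day, with half of each weekend's deaths reported on the following Monday instead. Any seven consecutive days contain exactly one of each weekday, so a 7-day window always covers the same weekly total and the reporting pattern drops out of the smoothed series. The numbers are made up for illustration.

```python
def moving_average(values, window=7):
    """Centered moving average; days without a full window are dropped."""
    half = window // 2
    return [sum(values[i - half:i + half + 1]) / window
            for i in range(half, len(values) - half)]


# Hypothetical reported counts, Monday first. The true rate is a steady
# 10 deaths per day, but half of each weekend's deaths are reported Monday.
reported_week = [20, 10, 10, 10, 10, 5, 5]
reported = reported_week * 4  # four weeks of data

smoothed = moving_average(reported)
print(smoothed)  # every value is 10.0: the weekly pattern is gone
```

This only works cleanly when the window length matches the period of the reporting pattern; a 5-day window, for example, would leave some of the weekly wobble in place.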

The assumed average recovery time makes more of a difference, though. If the average recovery time is longer, there is more overlap between different infected people, and the curve gets taller. Going from a two-week recovery time to a four-week recovery time almost doubles the peak infected count. You can see the effect of this here:

Effect of recovery time on infected count
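The size of this effect can be reasoned about directly. With a recovery window of R days, each day's estimate is the sum of deaths over the following R days, times 1/IFR. A 28-day window is exactly two adjacent 14-day windows, so its peak can never be more than twice the 14-day peak, and it only gets close to double when deaths stay roughly level for the whole four weeks. The Python sketch below checks this on a hypothetical bell-shaped death curve; the curve and the function name are illustrative assumptions, not real data.

```python
import math


def window_estimates(daily_deaths, recovery_days, ifr=0.01):
    """For each day t with a full window of later data, estimate active
    infections as the deaths on days t+1 .. t+recovery_days, divided by ifr."""
    n = len(daily_deaths)
    return [sum(daily_deaths[t + 1:t + 1 + recovery_days]) / ifr
            for t in range(n - recovery_days)]


# Hypothetical bell-shaped death curve peaking on day 60 (not real data).
deaths = [100 * math.exp(-((d - 60) / 15) ** 2) for d in range(120)]

peaks = {r: max(window_estimates(deaths, r)) for r in (14, 21, 28)}
ratio = peaks[28] / peaks[14]
print(peaks)
print(f"4-week vs 2-week peak: {ratio:.2f}x")  # always between 1x and 2x
```

The narrower the death curve is relative to the window, the further the ratio falls below 2, because the longer window starts summing days with few deaths.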

The assumption with the largest effect on the infected count is, without a doubt, the infection fatality rate (IFR). An IFR of 1% leads to a multiplication factor of 100, meaning 100 infected for each death. Dropping the IFR to 0.5% increases the multiplication factor to 200, which doubles the estimated number of infected. This effect is visible in the following chart:

Effect of IFR on infected count
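Written as a formula, the estimate makes it clear why the IFR has such a large effect. Using notation introduced here for illustration (not from the original), let D(d) be the deaths reported on day d and R the recovery time in days; the estimated number of infected on day t is then:

```latex
I(t) = \frac{1}{\mathrm{IFR}} \sum_{d=t+1}^{t+R} D(d),
\qquad
\frac{1}{0.01} = 100,
\qquad
\frac{1}{0.005} = 200
```

The IFR only appears as the overall factor 1/IFR, so halving it doubles I(t) on every day, not just at the peak, and leaves the shape of the curve unchanged. The recovery time R, by contrast, changes which deaths are summed, which is why it affects the shape of the curve as well as its height.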

Another assumption underlying this entire project is that the reports of COVID-19 deaths are accurate, or at least more accurate than confirmed infections. The Economist has done some modeling on this by comparing reported COVID-19 deaths to the total number of excess deaths, that is, deaths above what would normally be expected at that time of year. The reasoning is that with everyone social distancing, fewer traffic accidents and the like should, if anything, lead to a small drop in deaths compared to a normal year, so a large bump in excess deaths is probably almost entirely attributable to COVID-19. The numbers produced by The Economist suggest that countries like Sweden are doing quite a good job of reporting COVID-19 deaths, while Italy is missing a significant number of them, which means this method undercounts Italy's infections as well.
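The idea behind the excess-death comparison can be sketched in a few lines of Python. This is a simplified illustration, not The Economist's actual model: it assumes you already have observed all-cause deaths and an expected baseline for the same periods (for example, an average of previous years), and the function names are hypothetical.

```python
def excess_deaths(observed, expected):
    """Deaths above the baseline expected for each period."""
    return [obs - exp for obs, exp in zip(observed, expected)]


def reporting_ratio(reported_covid, observed, expected):
    """Fraction of total excess deaths captured by the official COVID-19
    death count. Close to 1.0 suggests good reporting; well below 1.0
    suggests COVID-19 deaths are being missed."""
    total_excess = sum(excess_deaths(observed, expected))
    if total_excess <= 0:
        raise ValueError("no excess deaths in this period; ratio undefined")
    return sum(reported_covid) / total_excess
```

Since the infected estimate is directly proportional to the death count, a ratio well below 1.0 means the estimate for that country is too low by roughly the same factor.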

Let's look at some results for other countries. Below are the numbers for four Nordic countries. Note that the Y-axis range differs between the countries: Sweden's peak infected count is roughly ten times Finland's.

Infection counts in Nordic countries

The clear peaks show that lockdown strategies work as a way to limit the spread of SARS-CoV-2. Sweden never went into lockdown and instead relied on a mix of voluntary social distancing and some limits on large gatherings. This shows up in the data both as a much higher peak infected count and as a much slower decline. But even Sweden's light-touch approach was enough to halt the growth of the pandemic and start bringing the infected counts down.

We can also look at some results for other European countries. Note that the Y-axis range for Germany is different.

Infected counts in large European countries

It is interesting how similar the infected counts are for the UK, France, and Italy. The UK is the worst of the three, but its peak isn't that much higher; it is also the slowest to come down. Germany is on another level, with only a quarter as many infected as the UK. You can also see that Italy was ahead of the other countries, both in showing a clear exponential growth curve as early as the middle of February and in peaking in the middle of March, after its national lockdown went into effect in early March. The UK is lagging behind in getting its numbers under control: as of three weeks ago, the latest point this model can estimate, it was still dealing with 500k infected.

It is striking how effective the various lockdowns have been. The rate of decline varies somewhat depending on how tight the lockdown is, but even Sweden's light-touch approach got the counts to drop. It will be interesting to see how things develop as lockdowns end. Some countries (the UK, I'm looking at you) seem to be willing to start opening up while they still have quite substantial infected counts.