**We can be confident the population prevalence of Covid-19 is less than 1%, but what else can we conclude, asks biostatistics professor Thomas Lumley.**

One of the known unknowns about the NZ coronavirus epidemic is the number of cases we have not detected. There will have been a mixture of people who didn’t get any symptoms, people who are going to show symptoms but haven’t yet, people who got moderately sick but didn’t get tested, and people whose deaths were attributed to some pre-existing condition without testing.

In making the decision to loosen restrictions, we care mostly about people who are currently infected, who aren’t (currently) sick enough to get testing, and who aren’t known contacts of previous cases. What can we say about this number — the “community prevalence” of undetected coronavirus infection in New Zealand?

One upper bound is that we’re currently seeing about 1% positive tests in people who either have symptoms or are close contacts of cases. The prevalence in close contacts of cases must be higher than in the general population — this is an infectious disease — so we can be fairly confident the population prevalence is less than 1%.

Are there any other constraints? Well, infection isn’t a static process. If you have coronavirus in 1% of Kiwis, which is about 50,000 people, they will pass it on to other people and they themselves will recover. At the moment, under alert level four, the epidemic modellers at Te Pūnaha Matatini are estimating a reproduction number of about 0.5, so 50,000 cases will infect half that many new people. Now, if we’re missing nearly all the cases, the modelling might not be all that accurate, but there would have to be tens of thousands of new infections. And at least a few percent of those new cases will be sick enough to need medical treatment. We would quickly notice that many people showing up to hospitals with (by assumption) no known contacts. It isn’t happening. Personally, I have a hard time believing in a prevalence as high as 0.2%, which would mean 10,000 cases, of which we’re missing over 85%.
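The arithmetic behind that argument fits in a few lines. This is only a rough sketch: the population of 5 million is an assumed round figure (it is what makes 1% come to 50,000), and the 3% needing treatment is a stand-in for "a few percent", not an estimate from the modellers.

```python
# Back-of-envelope check. POPULATION and TREATMENT_FRACTION are assumed round numbers.
POPULATION = 5_000_000        # assumed: roughly the NZ population
R_EFFECTIVE = 0.5             # reproduction number quoted for alert level four
TREATMENT_FRACTION = 0.03     # assumed stand-in for "a few percent"


def implied_burden(prevalence):
    """Return (current cases, next-generation infections, new cases needing treatment)."""
    current = prevalence * POPULATION
    new_infections = current * R_EFFECTIVE
    needing_treatment = new_infections * TREATMENT_FRACTION
    return current, new_infections, needing_treatment


for prevalence in (0.01, 0.002):
    current, new, treated = implied_burden(prevalence)
    print(f"{prevalence:.1%} prevalence: {current:,.0f} cases, "
          f"{new:,.0f} new infections, {treated:,.0f} needing treatment")
```

Even under these loose assumptions, a 1% prevalence implies hundreds of people turning up for treatment with no known link to an existing case.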

The other constraint would come from testing healthy people, which is why the government has started doing that. If you wanted an accurate estimate for the population as a whole, you’d need some sort of random population sample. In the short term, though, it makes more sense to take a sensibly constructed random sample of supermarkets and then test their customers and employees. If there’s major undetected spread, supermarkets are one of the likely places for it to happen, and they’re also a convenient place to find people who are already leaving home, so you can test them without barging into their bubbles. So, we aren’t getting a true population prevalence estimate, but we are getting an estimate of something a bit like it but probably higher. A good way to structure this kind of sampling would be to pick supermarkets randomly, with probability proportional to the estimated number of customers of the supermarket; you might want to modify that a bit to ensure sampling in some areas for equity and Treaty reasons.
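Sampling with probability proportional to size can be sketched as below. The store names and customer counts are hypothetical, and the draw is made with replacement for simplicity; a real survey design would more likely sample without replacement and stratify by region, which is not shown here.

```python
import random

# Hypothetical stores and estimated weekly customers, for illustration only.
customers = {
    "Store A": 12_000,
    "Store B": 8_000,
    "Store C": 3_000,
    "Store D": 1_000,
}


def sample_stores(customers, k, seed=None):
    """Draw k stores with probability proportional to estimated customers.

    Draws are made with replacement, so a large store can be picked more than once.
    """
    rng = random.Random(seed)
    names = list(customers)
    weights = [customers[name] for name in names]
    return rng.choices(names, weights=weights, k=k)


print(sample_stores(customers, k=2, seed=1))
```

In this made-up example, Store A has half of all customers, so it should turn up in about half of all draws.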

How many do we need to test? It depends on how sure you want to be. If we sample 10,000 people and four are positive, we could estimate prevalence at four out of 10,000, or 0.04%. Actually, given that the test isn’t perfect and misses some cases, we’d estimate about 0.06%. But what if no one is positive? The best estimate clearly isn’t zero.
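The correction for an imperfect test amounts to dividing by the test's sensitivity. Here is a minimal sketch, assuming a sensitivity of 70% and no false positives. The 70% is an illustrative value chosen because it gives roughly the 0.06% figure; it is not a number stated in this article.

```python
def adjusted_prevalence(positives, tested, sensitivity):
    """Estimate prevalence when the test misses a fraction (1 - sensitivity) of true cases.

    Assumes the test never gives a false positive.
    """
    return (positives / tested) / sensitivity


raw = 4 / 10_000
adjusted = adjusted_prevalence(4, 10_000, sensitivity=0.7)  # 0.7 is an assumed value
print(f"raw estimate {raw:.2%}, adjusted estimate {adjusted:.2%}")
```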

The question gets more extreme with smaller sample sizes: if we sample 350 people (as was done at the Queenstown Pak’nSave) and find no cases, what can we say about the prevalence? The classical answer, a valuable trick for hallway statistical consulting, is that the chance of seeing no cases in N tests is less than 5% if the true rate is 3/N or higher. So, if we see no cases in 350 people, we can be pretty sure the prevalence was less than 3/350, or about 1%. Since we were already pretty sure the prevalence was way less than 1%, that hasn’t got us much further forward. We’re eventually going to want thousands, or tens of thousands, of tests. The Queenstown testing was only a start.
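The 3/N rule is easy to check numerically. The sketch below treats each person tested as an independent draw at the same prevalence and ignores test error, both simplifying assumptions. It compares the rule with the exact bound, which is the prevalence at which zero positives in N tests would happen exactly 5% of the time.

```python
def prob_no_positives(prevalence, n):
    """Chance that n independent tests all come back negative."""
    return (1 - prevalence) ** n


def rule_of_three(n):
    """Approximate 95% upper bound on prevalence after zero positives in n tests."""
    return 3 / n


def exact_upper_bound(n, alpha=0.05):
    """Prevalence p that solves (1 - p) ** n == alpha."""
    return 1 - alpha ** (1 / n)


n = 350
print(f"rule of three: {rule_of_three(n):.3%}")
print(f"exact bound:   {exact_upper_bound(n):.3%}")
print(f"chance of zero positives at 3/N: {prob_no_positives(rule_of_three(n), n):.3f}")
```

The rule of three is slightly conservative: the exact bound is a little lower, and the gap shrinks as N grows.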

After that introduction, you’ll understand my reaction when RNZ’s Checkpoint reported there had been a positive test in the Queenstown supermarket, with only two-thirds of the samples run through the lab.

CORRECTION: The random testing at Queenstown has not picked up a positive result. Remaining results due Saturday. https://t.co/ohuMZBG4J3

— Checkpoint (@CheckpointRNZ) April 17, 2020

Fortunately, it turns out there had been a misunderstanding and there has not yet been a positive result from this community testing. If the true rate is 0.1% there’s a good chance we’ll see a community-positive test soon; if it’s 0.01%, not for a while. And if we’re really at the level of eliminating community transmission, even longer.

*Thomas Lumley is professor of biostatistics at the University of Auckland. An earlier version of this post appeared at StatsChat.*