
Earlier this week, in the midst of a wildly stressful couple of weeks, I got a rude red “positive” light on the Metrix reader running my COVID-19 test. These tests have a 99% specificity (i.e., they give a negative result for 99% of samples that don’t have COVID in them). So from a naïve perspective, it would seem that seeing the red “positive” light on the reader would mean there was only a 1% chance of a false positive, and thus approximately a 99% chance I had COVID, on top of everything else!

But it is not so, and I don’t have COVID – and this story has important implications not just for people testing themselves for COVID under circumstances uncommon in the general population, but for anyone who ever gets screened for any kind of medical condition. To understand why and become confident in interpreting screening results, we need to dive into some math. Fear not, I have intuitive explanations and interactive widgets!

Background

The first thing to bear in mind is that this was a periodic screening test. By “screening,” I mean that I had no reason to think I had COVID when I ran the test: I hadn’t had any known exposures or done anything especially high-risk, I live by myself and work from home, local numbers are in a steep decline, and I had no symptoms. I would have given myself maybe a one in a thousand chance of having COVID before taking the test.

Why, then, was I doing a test? Someone very close to me was disabled by a past round of COVID and is immunocompromised and still susceptible to even more damage, so our options to keep them safe and comfortable are for me to be as careful about getting COVID as them (possible but constraining), to wear masks around each other (fine sometimes, but annoying to do always), or to somehow verify that I almost certainly don’t have COVID. Screening before we hang out using a lab-style molecular test, which catches infections more reliably and at earlier stages than the rapid test cards you get at CVS, is a convenient way to do that.

Given that I thought I maybe had a 1/1,000 chance of having COVID, seeing a positive result was quite surprising. For a moment, I freaked out. But anyone who has studied statistics or medical screening for very long should have alarm bells going off in their head right now, because this is a common problem that arises when you look for statistical evidence of a rare condition. So I sat down and did the math. It turns out that if we accept that 1/1,000 estimate and use Bayesian reasoning – which we certainly should in a case like this – there was actually only a 9% chance that the red light on the reader meant I had COVID.

Bayesian reasoning

For the uninitiated, Bayesian reasoning is a mathematical formalization of the intuitive idea that you should take your prior estimate (often just called a prior) of the probability of something into account when interpreting new evidence about it. For instance, if I point a telescope at the Moon and show you a spectrometer reading that says there’s a 95% chance that it’s made of green cheese, you should still be disinclined to believe the Moon is made of green cheese, and you certainly shouldn’t think there’s a 95% chance it is. There are several good reasons for this, but one is that your prior estimate of the probability that the Moon was made of green cheese was presumably extraordinarily low. If you thought there was a one in a million chance it was made of green cheese before, you can take this new spectrometer evidence into account and update your chance to a bit more than one in a million, but you shouldn’t update much; if you started out thinking it was incredibly unlikely, moderately strong new evidence shouldn’t change your mind by itself. The popular saying “extraordinary claims require extraordinary evidence” comes from this line of reasoning.

Optional theoretical detour: If you’ve studied any statistics, you might have run into Bayes’ Theorem, which describes how to update your probability based on new evidence.

\[P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}\]

Read \(P\) as “the probability of” and \(|\) as “given that”. That is, the chance (\(P(A|B)\)) that the Moon is made of green cheese (\(A\)) given the spectrometer reading (\(B\)) is your estimate of the probability that you would see that spectrometer reading if the Moon were made of green cheese (\(P(B|A)\)), times your prior estimate of the probability that the Moon is made of green cheese (\(P(A)\)), divided by your estimate of the unconditional probability of getting the spectrometer reading when pointing it at the Moon, regardless of whether it is in fact made of green cheese (\(P(B)\)).

The calculation we’ll be doing later is actually doing this exact math in disguise, but I’ll be explaining why we use those numbers intuitively rather than just plugging them into this formula and calling it a day. If you want an exercise, try to figure out how the numbers I use correspond to the parts of this formula.
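If it helps to see the formula in code, here’s a tiny Python sketch applying it to the green cheese example. The spectrometer likelihoods (95% if the Moon is green cheese, 5% if it isn’t) are made-up numbers for illustration; only the one-in-a-million prior comes from the discussion above.

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Assumed likelihoods: the reading appears 95% of the time if the Moon
# really is green cheese, and 5% of the time if it isn't.
p_a = 1e-6                             # prior: one in a million
p_b = 0.95 * p_a + 0.05 * (1 - p_a)    # total probability of the reading
posterior = bayes(0.95, p_a, p_b)
print(posterior)  # ≈ 1.9e-05 – higher than the prior, but still very unlikely
```

Note how the strong-sounding evidence moves the probability up by a factor of about twenty, yet it remains tiny, because the prior was so low to begin with.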

In the case of my COVID test, using the number that comes straight out of the COVID test without incorporating some kind of prior estimate would only be sensible if I previously had no information whatsoever about whether I had COVID – i.e., I thought it was a 50/50 chance. Obviously that would be absurd, even with no knowledge of my risk factors, symptoms, or the local environment, because we all know that most people don’t have COVID most of the time – even the majority of tests people take when they are coughing and think they probably have COVID are negative.

Why only a 9% chance?

When we interpret the results of a positive COVID test, we need to ask how likely it is that this positive result is real. More precisely, thinking about the complete set of all possible worlds, is the sample I scraped out of my nose and put into the machine part of the group of samples that really had COVID and were correctly identified as having COVID (true positives), or the group of samples that didn’t have COVID and were wrongly identified as having COVID (false positives)? To get an accurate assessment of my chance of having COVID, we need to consider the chance of each of these outcomes compared to all of the possible ways the world could be arranged.

Here’s the key thing to understand intuitively. In worlds where I am testing with no reason to believe I have COVID, positive samples are extremely unlikely, because the chance a person with low risk factors and no symptoms has COVID virus in his nose at any given moment is low. Even if the test correctly detects 100% of positive samples, the chance of a true positive result can’t be higher than the chance of a positive sample. This means that, even if the chance of a false positive test result is extremely small, the chance of a true positive may well be even smaller. If that’s the case, any positive result you observe is more likely to be a false positive than a true positive, maybe by quite a bit.

The Metrix COVID test run with a nasal swab has a 97% sensitivity (97% of samples that do have COVID will test positive). It has a 99% specificity (99% of samples that don’t have COVID will test negative). So if we accept my prior estimate of a 1/1,000 chance of having COVID, we should expect true positives to be 97% of the 1/1,000 possible worlds in which I actually have COVID. We should expect false positives to be 1% of the 999/1,000 possible worlds in which I don’t have COVID. (Why 1%? 99% specificity is equivalent to a 1% false positive rate, since samples that don’t have COVID and don’t test negative are false positives.)

It remains only to do the math. True positives are \(0.97 \times 0.001 = 0.00097\) of possible outcomes. False positives are \(0.01 \times 0.999 = 0.00999\) of possible outcomes. Add those together to get the proportion of all outcomes in which the test comes back positive, 0.01096; then divide the proportion of false positives by the proportion of all positives, \(0.00999 / 0.01096 \approx 0.911\). That is, 91.1% of positive results are false positives, given our prior of 1/1,000: ergo, even given a positive test result, I still probably didn’t have COVID.
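If you’d rather see that arithmetic as code, here’s a short Python version (the variable names are mine; the numbers are the ones above):

```python
# Screening math: sensitivity 0.97, specificity 0.99, prior 1/1,000.
sensitivity = 0.97
specificity = 0.99
prior = 0.001

true_pos = sensitivity * prior               # 0.00097
false_pos = (1 - specificity) * (1 - prior)  # 0.00999
all_pos = true_pos + false_pos               # 0.01096

print(f"P(COVID | positive) = {true_pos / all_pos:.3f}")            # ≈ 0.089
print(f"P(false positive | positive) = {false_pos / all_pos:.3f}")  # ≈ 0.911
```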

If you didn’t completely follow this section, you might want to go back and read it again before you continue – it represents the core idea you need to understand screening results.

Interpreting results

Does the fact that 91% of positive screening results are false positives mean Metrix tests are useless? Not at all – it just means you have to interpret the results in the context of what you already know, rather than treating the pretty lights on top of the reader as a binary yes/no answer as to whether you have COVID.

Soapbox: There’s a reason every test uses the labels “positive” and “negative” rather than “you have COVID” and “you don’t have COVID”: they mean very different things. Unfortunately, no test instructions I’ve seen attempt to give any context on any of the ideas in this post, so the difference goes unnoticed by most people.

Maybe all of this is pretty complicated for an average user, but even some very basic guidance would be a big improvement: “If you take this test without reason to believe you have COVID, incorrect positive results are more likely than normal. If you get a positive result, tentatively assume you have COVID, but take a second test the next day to confirm your result.” Indeed, the instructions of some tests currently on the market actually do the reverse and incorrectly state that even when testing without reason to suspect you have COVID, a positive test means you almost certainly have it!

Also, if you think the current guidance is understandable to an average user, let me quote you some real instructions from a cheap pharmacy lateral flow test I have lying around:

[This is] a rapid test for the detection of SARS-CoV-2 antigens in anterior nasal specimens. For Emergency Use Authorization (EUA) use only. In vitro diagnostic use only....A negative test result indicates that the virus that causes COVID-19 was not detected in your sample. A negative result is presumptive, meaning it is not certain that you do not have COVID-19. You may still have COVID-19 and you may still be contagious. There is a higher chance of false negative results with antigen tests compared to laboratory-based tests such as PCR.

This is the kind of language you use when your readers have at least an amateur interest in medicine or epidemiology, not when you’re writing for the general public. And declining to attempt any numbers at all is cowardly and unhelpful; does “you may still be contagious” mean there’s a one in a thousand chance or a one in three chance?

I regularly find myself amazed that I, a moderately intelligent random person, can trivially find more accurate information than what’s printed in the instructions of an FDA-approved product sold to millions of people. The average quality of published instructions is extremely bad. To be fair, there are some complex social and political factors contributing to this state of affairs.

If you think you might have COVID

The first reason the Metrix test comes out better than it looks is that the numbers we arrived at above apply for screenings where you have no reason to believe you have COVID (and they’re moderately sensitive to changes in my 1/1,000 prior probability, which I think is a reasonable estimate, but is obviously an estimate). This is not how most people use COVID tests.

If you have symptoms or some other good reason to believe you might have COVID (say, you were in a confined space with a bunch of people who were coughing and now you’re sick, or someone you live with tested positive for COVID), your prior estimate will presumably be much higher than 1/1,000. Suppose you’re testing because you’re sure you have some respiratory virus or other and you think there’s a 1/10 chance it’s COVID. Now the math is:

  • True positives: \(0.97 \times 0.10 = 0.097\)
  • False positives: \(0.01 \times 0.90 = 0.009\)

Now the situation is reversed, and the true positives are 92% of positive results (\(\frac{0.097}{0.097 + 0.009} \approx 0.915\)). That’s high enough you can sensibly act as if you have COVID when the red light comes on, and if you want to know the answer for sure, you can do a second test to confirm the next day.

Similarly, for a 1/100 prior, which might be reasonable if, for instance, you think you might have been near someone with COVID but you don’t feel sick, true positives are about 50% of positive results.
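If you’d like to check these numbers yourself, here’s a small Python helper (the function name is mine; the parameter defaults are the Metrix figures from above):

```python
def p_covid_given_positive(prior, sensitivity=0.97, specificity=0.99):
    """Probability a positive result is a true positive, for a given prior."""
    true_pos = sensitivity * prior
    false_pos = (1 - specificity) * (1 - prior)
    return true_pos / (true_pos + false_pos)

# The three priors discussed in the post: screening, possible exposure, symptoms.
for prior in (1/1000, 1/100, 1/10):
    print(f"prior {prior:.3f} -> posterior {p_covid_given_positive(prior):.2f}")
```

Running it reproduces the three cases: roughly 9%, 50%, and 92%.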

Here’s a widget you can play around with to see what the probabilities look like for any combination of prior, sensitivity, and specificity. This is general math that works for any test with similar parameters; it is not just for COVID tests (or even just for medical tests).

If you’re curious, the specificity of most cheap rapid tests is similar to that of Metrix tests, but the sensitivity is much lower; a reasonable estimate might be 70%. (More details on Wikipedia.)

Full disclosure: I had Claude do most of the work on the widget and all the widgets on this page. I never use AI to write English text I publish, including this post.

If you think you don’t have COVID

Even when doing screening with a 1/1,000 prior, the test is still providing lots of useful information. If the test comes back negative, it lowers the probability you have COVID from 0.1% to 0.003% (0.1% prior times the 3% chance of a false negative), which is a big deal when you’re worried about permanently damaging someone’s health.
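The same arithmetic works for negative results. Here’s a Python sketch of the exact Bayesian version of this update (it agrees with the shortcut multiplication above to within rounding):

```python
# Exact negative-test update, same Metrix numbers as above:
# sensitivity 0.97 (so 3% false negatives), specificity 0.99, prior 1/1,000.
sensitivity, specificity, prior = 0.97, 0.99, 0.001

false_neg = (1 - sensitivity) * prior  # has COVID, but test says negative
true_neg = specificity * (1 - prior)   # no COVID, test correctly negative
p_covid_given_negative = false_neg / (false_neg + true_neg)
print(f"{p_covid_given_negative:.5f}")  # ≈ 0.00003, i.e. about 0.003%
```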

0.1% might already seem pretty low, but if you really don’t want to get COVID, it’s actually surprisingly bad if you’re taking that chance repeatedly. Suppose for the sake of argument that, if you’re infectious, there’s a 50% chance of transmitting COVID to someone over whatever interaction we’re talking about. If you don’t mask or test and you always have a 1/1,000 chance of having COVID without knowing it, it takes only about 210 interactions with someone to reach a 10% chance of giving them at least one case of COVID. But if you confirm with a negative Metrix test every time, that number goes up to about 7,000.

(In reality, the transmission rate is probably less than 50% in most cases, especially if you are asymptomatic, which likely means lower viral load on average – though it depends on how close you’re getting, how long you spend together, how well the other person’s immune system is working, and other factors.)

Meanwhile, if the test comes back positive, it may not reliably mean you have COVID, but it still gives you lots of information. In my case, I went from thinking there was a 1/1,000 chance I had COVID to thinking there was a 1/10 chance I had COVID, when the probability I’m normally trying to achieve is about 1/33,000. That’s very important information; it sure changed what I did next.

Play with this widget to see how testing affects transmission rates:

And the math: If there’s a 1/1,000 chance you have asymptomatic COVID on each interaction and a 50% chance of transmitting COVID if you have COVID, there’s a 1/2,000 chance of transmission on each interaction. Then the chance that you don’t transmit COVID on each interaction is 1,999/2,000, and assuming these are independent events (probably not completely true, but likely close enough to be a reasonable estimate), the chance that you avoid transmitting COVID even once over \(n\) interactions is \(\left(\frac{1999}{2000}\right)^{n}\). We want to figure out what value of \(n\) increases the accumulated probability of one transmission to 10% (or decreases the probability of not having even one transmission to 90%), so set this equal to 90% and solve for \(n\):

\[\begin{align*} 0.9 &= \left(\frac{1999}{2000}\right)^{n}\\ \log 0.9 &= \log \left(\frac{1999}{2000}\right)^{n}\\ \log 0.9 &= n \log \left(\frac{1999}{2000}\right)\\ n &= \frac{\log 0.9}{ \log \left(\frac{1999}{2000}\right) }\\ n &\approx 210.67\\ \end{align*}\]

For the testing case, we just adjust the chance of having COVID at the beginning accordingly.
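The same derivation in a few lines of Python (the function name and defaults are my own; the probabilities are the ones used above):

```python
import math

def interactions_to_risk(p_covid, p_transmit=0.5, target=0.10):
    """Interactions before the cumulative transmission risk reaches `target`,
    assuming independent interactions."""
    p_per_interaction = p_covid * p_transmit
    return math.log(1 - target) / math.log(1 - p_per_interaction)

print(round(interactions_to_risk(0.001)))    # no testing: ≈ 210
print(round(interactions_to_risk(0.00003)))  # after a negative test: ≈ 7,000
```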

Is getting a false positive surprising?

Several people have expressed surprise on hearing that I got a false positive result: false positives seem like they should be rare. If you’re considering only the single test, that’s true: there’s only a 1/100 chance of a false positive. But using a reasoned Bayesian approach, under my circumstances, it’s not surprising at all. If you regularly screen yourself over months or years, it would be surprising if you didn’t get a false positive at some point. Consider: if someone handed you a fair die and offered you $5 if you rolled a six on the first try, it would be mildly surprising if you got one (there’s only about a 17% chance). But if you rolled it 25 times, it would instead be surprising if you never got any sixes.
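For the code-inclined, that cumulative calculation looks like this in Python (the test counts are arbitrary examples; the 1% rate is the Metrix false positive rate from above):

```python
# Chance of at least one false positive over repeated screenings,
# assuming a 1% false positive rate per (COVID-free) test.
fp_rate = 0.01
for n in (10, 52, 100, 365):
    p_at_least_one = 1 - (1 - fp_rate) ** n
    print(f"{n:4d} tests -> {p_at_least_one:.1%}")
```

After a year of daily screening, at least one false positive is all but guaranteed.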

Here’s a widget showing your cumulative chance of a false positive given repeat screenings:

Takeaways

If you test yourself for a condition there’s a good chance you have, e.g., because it’s reasonably common and you’re having symptoms of that condition, and the test is reasonably reliable, you can generally treat that as strong evidence you do in fact have the condition. It’s still good to bear in mind that most tests are wrong occasionally, and if there’s a lot riding on the result (like the need to perform some unpleasant medical procedure), you may want to do the math before making any decisions.

When it comes to screening yourself for a condition you probably don’t have, though, I would argue you shouldn’t do it at all unless you understand the ideas presented in this post or you’re being guided by someone who does and is good at communicating what the results mean. And you should have a solid reason to do any screening, even if you fully understand the math. People often say things like “might as well, just to be safe!”, but when the chance of a false positive is, say, 10 times higher than the chance of a true positive, this isn’t necessarily the safe and rational move, even without considering the financial cost of doing the test. False positives are usually very stressful, especially if they’re for something like cancer or HIV, and in an unpleasant number of cases they can lead to unnecessary interventions that worsen your life. When the chance there’s anything going on in the first place is extremely small, the burden of false positives can easily exceed the value of a true positive result.

This doesn’t mean screening is inherently bad, of course. I’m not about to stop doing COVID screening under the right circumstances; I knew this was a risk from the beginning, and for me and my friend the benefits outweigh the risks (especially because the risks are minor here in the grand scheme of things: even if I got so far as to take a course of Paxlovid and stay home for a few days based on a false positive, that’s unlikely to cause me any lasting harm). Similarly, well-executed STI and cancer screenings with good evidence for public health benefits can save lives at very reasonable costs, especially for people at unusually high risk. But don’t do screenings “just because”; weigh the costs against the benefits, and if your doctor suggests screenings without mentioning these tradeoffs, proceed with caution.

Lastly, if you get a positive result from any screening, you should strongly consider doing repeat or follow-up testing to increase your confidence that the result is real. In the case of COVID tests, testing again the next day (or even the same day, though various failure modes are excluded by waiting a little bit) can give you a much better idea of what’s actually going on.

In the test result + prior probability widget, you can approximate the strength of evidence a second test gives you by treating the probability you got the first time as your prior estimate for the second test. This isn’t 100% accurate because the math assumes the tests are completely independent of each other, while in reality there are various reasons the results could be correlated (e.g., the Metrix reader could be broken, someone with COVID could have sneezed on your package of tests, or you could have an infection that’s just at the threshold of detectability so that whether a test finds it is essentially random), but for most purposes I think it’s a decent estimate.
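If you prefer doing that chaining in code, here’s a sketch (function name mine; it makes the same independence assumption just described):

```python
def update(prior, result, sensitivity=0.97, specificity=0.99):
    """One Bayesian update for a positive or negative test result,
    assuming results are independent given infection status."""
    if result == "positive":
        num = sensitivity * prior
        den = num + (1 - specificity) * (1 - prior)
    else:
        num = (1 - sensitivity) * prior
        den = num + specificity * (1 - prior)
    return num / den

p = 0.001                  # screening prior
p = update(p, "positive")  # first test: red light (≈ 0.089)
p = update(p, "negative")  # follow-up the next day comes back negative
print(round(p, 5))         # back down to ≈ 0.003
```

A single negative follow-up takes the probability from about 9% back down near the baseline, which is why repeat testing after a surprising positive is so valuable.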

Test result + prior probability widget

This is the same widget presented earlier on the page, repeated here in its own section so it’s easy to find if you need to come back and use it on some future date.