A Compressed Sensing Approach to Group-Testing for COVID-19 Detection

You are here

Inside Signal Processing Newsletter Home Page

Top Reasons to Join SPS Today!

1. IEEE Signal Processing Magazine
2. Signal Processing Digital Library*
3. Inside Signal Processing Newsletter
4. SPS Resource Center
5. Career advancement & recognition
6. Discounts on conferences and publications
7. Professional networking
8. Communities for students, young professionals, and women
9. Volunteer opportunities
10. Coming soon! PDH/CEU credits
Click here to learn more.

News and Resources for Members of the IEEE Signal Processing Society

A Compressed Sensing Approach to Group-Testing for COVID-19 Detection

By: 
Ajit Rajwade and Manoj Gopalkrishnan

The coronavirus (COVID-19) pandemic has led to widespread morbidity and mortality. Several countries have been forced into prolonged lockdowns in a desperate attempt to control the pandemic by physical distancing. Such lockdowns have devastated livelihoods and put large numbers of people at grave risk of prolonged unemployment, poverty, and even starvation.

Is there an alternative to lockdown? Early identification of infected individuals can enable quarantining of the individuals and thus control the spread of the disease. It is not enough to quarantine symptomatic individuals because infected individuals are often asymptomatic for many days. Widespread testing of asymptomatic individuals with the RT-PCR (Reverse Transcription Polymerase Chain Reaction) method can help identify the infected individuals early. However, widespread testing of asymptomatic individuals is not an available option in many countries due to constraints on resources such as time, basic equipment, cost, skilled manpower, and reagents.

The use of group testing can make widespread testing of asymptomatic individuals an available option. The current low rate of COVID-19 infection in the world population means that most samples tested are not infected, so that most tests are wasted on uninfected samples. Group testing is a process of pooling together samples of 'n' different people into multiple pools, and testing the pools instead of each individual sample. A negative result on the pool implies that all 'n' samples were negative. This saves a huge amount of testing resources, especially with low infection rates.

Group testing for medical applications has a long history dating back to the Dorfman pooling method from 1940s which was proposed for testing of blood samples for syphilis. Simple group testing schemes have already been applied in the field by several research labs in Germany, Nebraska and Israel for COVID-19 testing. Recently, Wuhan health authorities carried out a very large scale group testing for COVID-19, employing groups of size 10 and 5.

The problem with testing asymptomatic individuals is that they can get infected after they get tested. Ideally we should be testing these individuals frequently, perhaps a few times a month, or even a few times a week. Simple group testing schemes are not suited for this because their savings comes at the cost of a longer time to result. Simple group testing schemes require pooling of samples and a second round of RNA extraction and testing for all samples in positive pools. This second round can increase the time to result and be laborious to perform in laboratories where the RNA extraction process is done manually. In situations like daily testing where the result needs to be delivered fast or the additional labour required for a second round of RNA extraction is not an option, these schemes are less attractive.

Can the savings of simple pooling be retained while simultaneously returning individual-level quantitative results in a single round of testing? Yes, by use of techniques from compressed sensing. Compressed sensing is a well-known paradigm for acquisition of information (typically images) directly in compressed format, as opposed to complete acquisition followed by compression later. In the context of its application for pooled testing, this gives rise to a model equation of the following form: 'y = Ax + g'. Here 'g' is the measurement noise. 'A' is a carefully designed binary 'mixing' matrix of size 'm' by 'n', with 'm' much smaller than 'n'. The aim is to estimate the ground truth 'x' from observations 'y' and the matrix 'A'. This is a system of linear equations where the number 'm' of equations is far fewer than the number 'n' of unknowns. Typically one will not have a unique solution. However it is possible to design 'A' to ensure a unique sparse solution 'x'. Moreover, the algorithms for recovery of 'x' are computationally efficient and robust against noise [CandesWakin2008,Berinde2008].

We present Tapestry Pooling which is inspired from the theory of compressed sensing [CandesWakin2008]. Here 'x' is a vector of 'n' elements, where the i-th element 'x(i)' represents the viral concentration in the sample of the i-th person. If x(i) > 0, it means that the i-th person is infected. If x(i) = 0, it means the person is non-infected. Since prevalence rates of infection are low, we can regard 'x' to be a sparse vector, i.e. with mostly zeros. The Tapestry method creates a small number 'm' of pools, where 'm' is much less than 'n'. Each pool is constituted of equal volume aliquots from 3n/m samples. The element A(i,j) = 1$ i and only if the j-th sample has been included in the i-th pool, and 0 otherwise. Instead of testing each of the 'n' people individually, we test only the 'm' pools in the RT-PCR machine. The vector 'y' is a vector of 'm' elements representing the total viral concentration in each of the 'm' pools as measured by the RT-PCR machine.

In designing Tapestry, we had to confront three challenges. First, how do we design the mixing matrix 'A'? Second, what is an appropriate noise model that captures the underlying physical reality of the PCR test? Third, given the particular properties of our 'A' matrix and the noise model, what is a good recovery algorithm that estimates virus numbers, recovers support of 'x' accurately, and exploits the non-negativity of 'x' and 'y'?

The design of the pooling matrix 'A' requires imposing constraints that 'A' be binary, sparse and that no sparse vector (apart from the 0 vector) lies in its nullspace. The matrix 'A' should be sparse so that not too many samples contribute to any pool. This is to simplify the implementation of the pooling procedure from a technician's point of view. In Tapestry, the 'A' matrices we employ are Kirkman triples and Social Golfer triples from the Combinatorial Theory of Designs.

We introduced a new noise model. When a test is negative, PCR returns negative unless there is cross-contamination with another sample. Hence there is no noise when a test is negative. When a test is positive, qPCR estimates an affine transform of the log of the viral concentrations, leading to a multiplicative noise model.

Our recovery algorithm combines combinatorial ideas from classical group testing literature with compressed sensing ideas. We apply the classical group testing algorithm called COMP (Combinatorial Orthogonal Matching Pursuit) to eliminate samples that are clearly negative. We then apply compressed sensing techniques adapted to our noise model to the residual equations.

All these innovations allow Tapestry to deliver as many as 1000 individual results with as few as 90 tests in a single round of PCR at a prevalence rate of one percent. The quantitative results show remarkable concordance with results that would be obtained by separate individual testing of each sample.

Tapestry has a number of salient features, compared to other available group testing methods:

  1. Tapestry delivers results in a single round of testing without the need for a second confirmatory round.
     
  2. The number 'm' of required tests is O(k log n) as per compressed sensing theory [CandesWakin2008,Berinde2008]. In the targeted use cases where the number of infected samples 'k' is much less than 'n', we see that 'm' too is much less than 'n'. Consequently we obtain significant savings in testing time as well as testing resources such as number of tests, quantity of reagents, and manpower.
     
  3. Unlike many existing methods, Tapestry reconstructs viral loads i.e., number of copies of the virus per sample. This additional information may be clinically relevant.
     
  4. Tapestry can estimate the number of infected samples 'k'. This gives Tapestry a graceful failure mode. When it encounters a batch where the number of infected samples turns out to be much larger than the anticipated number, Tapestry first recognizes that this is the case by estimating the number of infected samples. It then returns a list of samples that are suspected to be positive. This list comes with the promise that with high probability it includes all the true positives. At the same time, the algorithm tries to keep the size of this list as small as possible.
     
  5. Tapestry allows PCR test measurements to be noisy. We develop a novel noise model to describe noise in PCR experiments. Our algorithms are tested on this noise model in simulation. Each sample goes to exactly three pools, and each pool has the same number of samples. This simplifies the experimental design, conserves sample, keeps pipetting overhead to a minimum, and makes sure that dilution due to pool size is in a manageable regime.

Our single-round testing technique can be deployed in many different scenarios such as the following:

  1. Testing of 105 symptomatic individuals in 45 tests.
     
  2. Testing of 195 asymptomatic individuals in 45 tests assuming a low rate of infection. A good use case for this is airport security personnel, delivery personnel, or hospital staff.
     
  3. Testing of 399 individuals in 63 tests. This can be used to test students coming back to campuses, or police force, or asymptomatic people in housing blocks and localities currently under quarantine.
     
  4. Testing of 961 people in 93 tests, assuming low infection rate. This might be suitable for airports and other places where samples can be collected and tested immediately, and it might be possible to obtain liquid handling robots.

People Involved

A number of people are involved with and contributing fruitfully to Tapestry. These include Profs. Manoj Gopalkrishnan and Ajit Rajwade from IIT Bombay, Prof. Dror Baron from NCSU, and Prof. Chandra Murthy from IISc Bangalore.

Outputs

A detailed description of our algorithms, theory and experimental results can be found in our medrxiv and arxiv pre-prints. We have designed an Android app named Byom Smart Testing to make our Tapestry protocol easy to deploy in the future. The app can be accessed here. We are also sharing our code and some amount of data here. Tapestry has a website with all the latest updates.

Related Work

There have been other groups working on different aspects of the problem of pooled testing, prominent examples being the binary PCR work of Dror Baron (see also this link), the work on Origami Assays for mixing matrix design by Peter Woolf and the work by the group of Noam Shental.

Acknowledgements

Manoj Gopalkrishnan acknowledges support by SERB grants EMR/2017/004089 and MTR/2018/000817. Ajit Rajwade acknowledges support from SERB grant MTR/2019/000691.

References

[CandesWakin2008]: E. Candes and M. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine = 2008]

[Berinde2008]: R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff and M. J. Strauss, Combining geometry and combinatorics: A unified approach to sparse signal recovery,46th Annual Allerton Conference on Communication, Control, and Computing, 2008.]

SPS Social Media

IEEE SPS Educational Resources

IEEE SPS Resource Center

IEEE SPS YouTube Channel