Date of Award


Document Type


Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Mathematical Science


Gallagher, Colin M

Committee Member

Kulasekera, Karunarathna B


This dissertation consists of three projects in the area of group testing. Group testing, through the use of pooling, has proven to be an efficient means of reducing the time and cost associated with screening for a binary characteristic of interest, such as infection status. The salient feature that provides these gains in efficiency is that testing is performed on pooled specimens rather than on specimens one by one. In Chapter 1, we present a general introduction to group testing. Typically, the statistical literature on group testing has investigated pooled testing for the purposes of either case identification or estimation. In this dissertation, we focus mainly on the estimation problem, which involves developing regression models that relate individual-level covariates to testing responses observed from pooled specimens. Existing research on estimation in group testing has focused primarily on parametric regression models, in which the form of the link function is assumed known and only a finite number of regression parameters have to be estimated. Recently, to avoid specifying the link function and to increase modeling flexibility, nonparametric group testing regression models have been studied. These studies consider the case where each individual has one continuous explanatory variable and the link function is a univariate probability curve. Existing methods of estimating this unknown function are based on local moment estimators. In Chapter 2, we propose a new nonparametric estimation procedure using a local likelihood approach. For ease of illustration, in this part we consider the situation where each individual is assigned to exactly one pool and only the pooled specimen is tested. Further, we assume that the assay used for screening is perfect. Both of these assumptions are relaxed in the remaining chapters of this dissertation. 
We show that our proposed estimator is asymptotically normal and attains the optimal nonparametric estimation rate. The finite-sample performance of the method is demonstrated through simulated examples and a real data analysis. To pursue a more suitable technique for modeling group testing data, in Chapter 3 we develop a general semiparametric framework that allows for the inclusion of not only one continuous covariate but also multiple explanatory variables, all variants of decoding information, and imperfect testing. The asymptotic properties of our estimators are presented, and guidance on finite-sample implementation is provided. We illustrate the performance of our methods through simulation and by applying them to chlamydia and gonorrhea data collected by the Nebraska Public Health Laboratory as part of the Infertility Prevention Project. In Chapter 4, we focus on evaluating the effect of misclassification in testing pools constructed according to any type of group testing algorithm. The existing assumptions regarding misclassification are somewhat restrictive; if they are invalid, the estimation procedure can yield severely biased estimators. In this work, we relax previously made assumptions regarding testing error rates by acknowledging the underlying mechanistic structure of the diagnostic test being employed. For ease of illustration, we concentrate mainly on parametric regression methods and propose a general estimation framework that allows for the analysis of data arising from all group testing strategies. The finite-sample performance of our proposed methodology is investigated through simulation and by applying our techniques to hepatitis B data from a study involving Irish prisoners. Through these studies, we show that our methods can result in more efficient parameter estimates, when compared to competing procedures that make use of individual-level data, at a fraction of the cost of data collection. 
Before proceeding to the main body of this dissertation, I would like to clarify that the notation defined in this work is self-contained within each chapter.