International Journal of Statistics in Medical Research, 2018, 7, 18-24
E-ISSN: 1929-6029/18 © 2018 Lifescience Global
Probability Sampling in Matched Case-Control Study in Drug Abuse
Surya Raj Niraula¹,* and Frederick A. Connell²
¹School of Public Health and Community Medicine, B.P. Koirala Institute of Health Sciences, Dharan, Nepal
²School of Public Health and Community Medicine, University of Washington, Seattle, USA
Abstract: Although random sampling is generally considered to be the gold standard for population-based research, the
majority of drug abuse research is based on non-random sampling despite the well-known limitations of this kind of
sampling. We compared the statistical properties of two surveys of drug abuse in the same community: one using
snowball sampling of drug users who then identified “friend controls” and the other using a random sample of non-drug
users (controls) who then identified “friend cases”. Models to predict drug abuse based on risk factors were developed
for each data set using conditional logistic regression. Bootstrap analysis of the random-sample data set showed less
variation and did not change the significance of the predictors when compared to the non-bootstrap analysis.
The area under the ROC curve for the model derived from the random-sample data set was similar when the model was fitted to either
data set (0.93 for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived
from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly
different (0.98 vs. 0.83, p<0.001). The proposed method of random sampling of controls appears to be superior from a
statistical perspective to snowball sampling and may represent a viable alternative to snowball sampling.
Keywords: Random sampling, bootstrapping, non-random sampling, ROC curve.
INTRODUCTION
Illicit drug use is a ‘hidden’ and often socially
stigmatized activity [1]. The illegal and stigmatized
behaviors of illicit drug users endow them with ‘low
social visibility’ [2]. The illegality of drug use and the
heterogeneity of drug users make representative
community surveys difficult. Such problems do not
occur in alcohol and smoking research [3,4].
A common methodological limitation in drug abuse
research is that it is frequently based on
non-probability sampling methods. Commonly used
non-random sampling methods include snowball sampling,
convenience sampling, the privileged access interviewer
method, respondent-driven sampling and contact
tracing [5-8]. Furthermore, if one uses an institution-based
case-control design, there is a high likelihood of
Berkson’s bias [9]. Regardless of the care with which
research based on these sampling methods is
conducted and the ‘adequacy’ of sample size, there is
no guarantee that results from these studies will be
generalizable to the population from which subjects
were selected.
This paper presents a new random sampling
strategy for research on ‘hidden’ populations and
compares statistical properties of this method to those
of a sample from the same community derived from
snowball sampling.
*Address correspondence to this author at the School of Public Health and
Community Medicine, B.P. Koirala Institute of Health Sciences, Dharan, Nepal;
Tel: +977 9842035218; Fax: +977 25 520251; E-mail: sniraula@yahoo.com
METHODS
The data for this paper were derived from a study of
risk factors for drug abuse conducted in Dharan
municipality in eastern Nepal. Nepal is a landlocked
country bordered by India and China, covering an area
of 147,181 km² with a population of about 26.5 million.
A total of 116,181 people reside in the 103.38 km²
area of Dharan [10]. Two matched case-control data
sets were formed using 1) snowball sampling and 2)
community-based random sampling methods for
comparison (Figure 1). In both samples, cases (drug
abusers) were persons aged between 15 and 40 years
who met the DSM-IV (Diagnostic and Statistical Manual
of Mental Disorders-IV) [11] criteria for drug abuse
using the CAGE screen [12]. Controls were restricted
to persons in the same age group who had never taken
any psychoactive drugs except as prescribed by
doctors.
Snowball Sample
Sixteen potential drug abusers were identified by
interviewing five ex-drug abusers, four drop-in-center
in-charges (Auxiliary Nurse Midwives) and four drug
abuse outreach workers. Six of these were under the
severe influence of drugs and were excluded. The
remaining ten agreed to participate in the interview.
Each case was asked to name a friend who was a drug
abuser (a new case) and a friend who had never been
involved in the abuse of drugs (control). One hundred
fifty case-control pairs were identified in this way
(Figure 1).
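The chain-referral procedure described above can be illustrated with a minimal Python sketch. It is not the study's field protocol: the first-in, first-out interview order, the handling of repeat nominations, and the helper names (snowball_pairs, name_case_friend, name_control_friend) are assumptions introduced for illustration only, and the toy nomination rules stand in for real interview responses.

```python
from collections import deque


def snowball_pairs(seeds, name_case_friend, name_control_friend, target_pairs):
    """Recruit case-control pairs by chain referral.

    Each interviewed case nominates one friend who is a drug abuser
    (a new case) and one friend who has never abused drugs (a control).
    All names and the interview order are hypothetical.
    """
    queue = deque(seeds)          # cases waiting to be interviewed
    seen_cases = set(seeds)       # guards against interviewing a case twice
    pairs = []
    while queue and len(pairs) < target_pairs:
        case = queue.popleft()
        control = name_control_friend(case)
        if control is not None:   # a case may be unable to name a control
            pairs.append((case, control))
        new_case = name_case_friend(case)
        if new_case is not None and new_case not in seen_cases:
            seen_cases.add(new_case)
            queue.append(new_case)
    return pairs


# Toy example: ten seed cases (IDs 0-9), as in the study; the nomination
# rules below are artificial and always yield a new, distinct friend.
pairs = snowball_pairs(
    seeds=list(range(10)),
    name_case_friend=lambda case: case + 10,
    name_control_friend=lambda case: f"ctrl-{case}",
    target_pairs=150,
)
print(len(pairs))
```

Because every recruited case is reached through the friendship network of a seed, the composition of the final sample depends on who the seeds were, which is the source of the non-representativeness discussed in the Introduction.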