International Journal of Statistics in Medical Research, 2018, 7, 18-24
E-ISSN: 1929-6029/18 © 2018 Lifescience Global

Probability Sampling in Matched Case-Control Study in Drug Abuse

Surya Raj Niraula¹,* and Frederick A. Connell²
¹ School of Public Health and Community Medicine, B.P. Koirala Institute of Health Sciences, Dharan, Nepal
² School of Public Health and Community Medicine, University of Washington, Seattle, USA

Abstract: Although random sampling is generally considered the gold standard for population-based research, the majority of drug abuse research is based on non-random sampling, despite the well-known limitations of this kind of sampling. We compared the statistical properties of two surveys of drug abuse in the same community: one using snowball sampling of drug users who then identified “friend controls”, and the other using a random sample of non-drug users (controls) who then identified “friend cases”. Models to predict drug abuse from risk factors were developed for each data set using conditional logistic regression. Bootstrap analysis of the random-sample data set showed less variation and did not change the significance of the predictors when compared to the non-bootstrap analysis. The area under the ROC curve for the model derived from the random-sample data set was similar when the model was fitted to either data set (0.93 for random-sample data vs. 0.91 for snowball-sample data, p=0.35); however, when the model derived from the snowball-sample data set was fitted to each of the data sets, the areas under the curve were significantly different (0.98 vs. 0.83, p<.001). The proposed method of random sampling of controls appears to be statistically superior to snowball sampling and may represent a viable alternative to it.

Keywords: Random sampling, bootstrapping, non-random sampling, ROC curve.

INTRODUCTION

Illicit drug use is a ‘hidden’ and often socially stigmatized activity [1].
The illegal and stigmatized behaviors of illicit drug users endow them with ‘low social visibility’ [2]. The illegality of drug use and the heterogeneity of drug users make representative community surveys difficult. This problem does not arise in alcohol and smoking research [3,4]. A common methodological limitation of drug abuse research is that it is frequently based on non-probability sampling methods. Commonly used non-random sampling methods include snowball sampling, convenience sampling, the privileged access interviewer method, respondent-driven sampling and contact tracing [5-8]. Furthermore, if one uses an institution-based case-control design, there is a high likelihood of Berkson’s bias [9]. Regardless of the care with which research based on these sampling methods is conducted and the ‘adequacy’ of the sample size, there is no guarantee that results from these studies will be generalizable to the population from which subjects were selected.

This paper presents a new random sampling strategy for research on ‘hidden’ populations and compares the statistical properties of this method with those of a snowball sample drawn from the same community.

*Address correspondence to this author at the School of Public Health and Community Medicine, B.P. Koirala Institute of Health Sciences, Dharan, Nepal; Tel: +977 9842035218; Fax: +977 25 520251; E-mail: sniraula@yahoo.com

METHODS

The data for this paper were derived from a study of risk factors for drug abuse conducted in Dharan municipality in eastern Nepal. Nepal is a landlocked country bordered by India and China, covering an area of 147,181 km² with a population of about 26.5 million. A total of 116,181 people reside in the 103.38 km² area of Dharan [10]. For comparison, two matched case-control data sets were formed using 1) snowball sampling and 2) community-based random sampling (Figure 1).
In both samples, cases (drug abusers) were persons aged between 15 and 40 years who met the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders-IV) [11] criteria for drug abuse using the CAGE screen [12]. Controls were restricted to persons in the same age group who had never taken any psychoactive drugs, except as prescribed by a doctor.

Snowball Sample

Sixteen potential drug abusers were identified by interviewing five ex-drug abusers, four drop-in center in-charges (Auxiliary Nurse Midwives) and four drug abuse outreach workers. Six of these were under the severe influence of drugs and were excluded. The remaining ten agreed to participate in the interview. Each case was asked to name a friend who was a drug abuser (a new case) and a friend who had never been involved in the abuse of drugs (a control). One hundred fifty case-control pairs were identified in this way (Figure 1).
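
The chain-referral procedure described above can be illustrated with a minimal Python sketch. It is not the authors' fieldwork protocol: the population sizes, the identifiers, the helper functions `name_case` and `name_control`, and the rule that friends are named uniformly at random are all assumptions made purely for illustration. Only the ten seed cases and the target of 150 pairs come from the text.

```python
import random
from collections import deque


def snowball_recruit(seed_cases, name_case, name_control, target_pairs):
    """Chain-referral recruitment of matched case-control pairs.

    seed_cases   : initial cases who agreed to be interviewed
    name_case    : callable(case, recruited) -> a friend who is a new case, or None
    name_control : callable(case, used_controls) -> a friend control, or None
    """
    queue = deque(seed_cases)        # cases waiting to be interviewed
    recruited = set(seed_cases)      # every case already in the chain
    used_controls = set()            # controls already paired with a case
    pairs = []
    while queue and len(pairs) < target_pairs:
        case = queue.popleft()
        control = name_control(case, used_controls)
        if control is not None:
            used_controls.add(control)
            pairs.append((case, control))
        referral = name_case(case, recruited)
        if referral is not None:
            recruited.add(referral)
            queue.append(referral)   # the named friend is interviewed later
    return pairs


# Hypothetical community: sizes and identifiers are illustrative assumptions.
rng = random.Random(0)
cases = [f"case{i}" for i in range(400)]
controls = [f"ctrl{i}" for i in range(2000)]


def name_case(case, recruited):
    # Assumption: the named friend is a random case not yet in the chain.
    candidates = [c for c in cases if c not in recruited]
    return rng.choice(candidates) if candidates else None


def name_control(case, used_controls):
    # Assumption: the named friend is a random control not yet paired.
    candidates = [c for c in controls if c not in used_controls]
    return rng.choice(candidates) if candidates else None


# Ten seed cases and a target of 150 pairs, as reported in the text.
pairs = snowball_recruit(cases[:10], name_case, name_control, target_pairs=150)
print(len(pairs))
```

In this toy version every case names friends at random, so the sketch shows only the mechanics of the chain. In a real community, referrals follow social ties, which is the reason a snowball sample carries no guarantee of representativeness.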