Investigating Samples Representativeness for an Online Experiment in Java Code Search

Rafael Maiani de Mello
Federal University of Rio de Janeiro
P.O. Box 68511, Brazil
+55 21 3938 8712
rmaiani@cos.ufrj.br

Kathryn T. Stolee
Iowa State University
209 Atanasoff Hall
Ames, IA 50011
kstolee@iastate.edu

Guilherme Horta Travassos
Federal University of Rio de Janeiro
P.O. Box 68511, Brazil
+55 21 3938 8712
ght@cos.ufrj.br

Abstract—Context: The results of large-scale studies in software engineering can be significantly impacted by the representativeness of their samples. Diverse population sources can be used to support sampling for such studies. Goal: To compare two samples, one from the crowdsourcing platform Mechanical Turk and another from the professional social network LinkedIn, in an online experiment evaluating the relevance of Java code snippets to programming tasks. Method: We compare the samples (subjects' experience and programming habits) and the experimental results across three experimental trials. Results: LinkedIn subjects present significantly higher levels of experience in Java programming and in programming in general than Mechanical Turk subjects. The experimental results revealed a significant difference between the samples and suggested that LinkedIn subjects were more pessimistic than Mechanical Turk subjects, despite a high level of consistency in the experimental results. Conclusion: The combined use of multiple sampling sources can benefit large-scale studies in software engineering, especially when heterogeneity is desired in the population. Thus, it can be useful to investigate and characterize alternative sampling sources for performing large-scale studies in software engineering.

Keywords—experimental software engineering; sampling; population; survey; sampling frame

I. INTRODUCTION

In statistics, a sampling frame is the source from which a sample, i.e., a subset of units from a study population, can be retrieved [1].
In the context of Software Engineering (SE) research, primary studies are often conducted over samples established by convenience [2, 3, 4]. Student classes, research groups, and organizational units are common sampling frames from which individuals have been recruited to collaborate in SE quasi-experiments. As a consequence, the external validity of the evidence observed in such studies is significantly limited. Although the specialized nature of some SE problems allows them to be investigated through qualitative strategies such as action research [5] and case studies [6], many open research questions could be better answered through large-scale experiments and surveys, in which the representativeness of the sample can significantly impact the results.

Unlike areas in which the units of observation are controlled and can be applied in diverse experimental arrangements, SE research is hampered by a lack of available sampling frames composed of representative populations of individuals or groups of individuals, such as organizations and project teams [4, 7]. Not only does the variability of SE research contexts contribute to this scenario [8], but so does the business context of SE practice.

In this context, online experiments represent a good opportunity to investigate alternative sources of sampling [9] from which more adequate sampling frames can be established to support specific research contexts. An immediate contribution expected from using such sources is an increase in sample size, but the benefits are not limited to that. It is also expected that representative samples should be sufficiently heterogeneous with respect to the attributes previously established to characterize each individual in a specific study population [10].
The first two trials of an experiment on evaluating Java code snippets from three distinct search engines (Google, a source-code-specific search engine, Merobase, and a research prototype, Satsy) were conducted [11, 12]. Although the operationalization and the protocol of these trials presented some differences, both used as their population the anonymous workers of Amazon's crowdsourcing platform Mechanical Turk (MTurk). A third trial was then conducted with the members of an interest group on the professional social network LinkedIn (www.linkedin.com) as its population. This trial applied a systematic plan to recruit a random and geographically distributed sample from that group, following concepts from a framework originally developed to support researchers in establishing representative samples for large-scale SE surveys [9].

This work presents the third trial and examines the contributions of using LinkedIn as a source of sampling, in comparison with the samples and results obtained in the previous trials using MTurk. The contributions of this work are:

1. Operational replication of a study on code search results using LinkedIn for sampling (previously performed using MTurk).

978-1-4673-7899-4/15/$31.00 ©2015 IEEE