Investigating Samples Representativeness for an
Online Experiment in Java Code Search
Rafael Maiani de Mello
Federal University of Rio de Janeiro
P.O. Box 68511, Brazil
+55 21 3938 8712
rmaiani@cos.ufrj.br
Kathryn T. Stolee
Iowa State University
209 Atanasoff Hall
Ames, IA 50011
kstolee@iastate.edu
Guilherme Horta Travassos
Federal University of Rio de Janeiro
P.O. Box 68511, Brazil
+55 21 3938 8712
ght@cos.ufrj.br
Abstract— Context: The results of large-scale studies in
software engineering can be significantly impacted by
sample representativeness. Diverse population sources can
be used to support sampling for such studies. Goal: To
compare two samples, one from the crowdsourcing platform
Mechanical Turk and another from the professional social
network LinkedIn, in an online experiment evaluating the
relevance of Java code snippets to programming tasks.
Method: We compare the samples (subjects’ experience and
programming habits) and the experimental results across
three experimental trials. Results: LinkedIn subjects
present significantly higher levels of experience in Java
programming and in programming in general than Mechanical
Turk subjects. The experimental results revealed a
significant difference between the samples and suggested that
LinkedIn subjects were more pessimistic than Mechanical
Turk subjects, despite a high level of consistency in the
experimental results. Conclusion: The combined use of
sampling sources can benefit large-scale studies in
software engineering, especially when heterogeneity is desired
in the population. Thus, it can be useful to investigate and
characterize alternative sampling sources for performing
large-scale studies in software engineering.
Keywords— experimental software engineering; sampling;
population; survey; sampling frame
I. INTRODUCTION
In statistics, a sampling frame is the source from which
a sample, i.e., a subset of units from a study population,
can be retrieved [1]. In the context of Software Engineering
(SE) research, primary studies are often conducted over
samples established by convenience [2,3,4]. Student
classes, research groups, and organizational units are
common sampling frames from which individuals have
been recruited to collaborate in SE quasi-experiments. As a
consequence, the external validity of the evidence observed
in such studies is significantly limited.
Although the specialized nature of some SE problems
allows them to be investigated through qualitative
strategies such as action research [5] and case studies [6],
many open research questions could be better answered
through large-scale experiments and surveys, in which the
representativeness of the sample can significantly impact
the results. Unlike areas in which the units of observation
are controlled and can be applied in diverse experimental
arrangements, SE research is hampered by a lack of
available sampling frames composed of representative
populations of individuals or groups of individuals, such as
organizations and project teams [4, 7]. Not only does the
variability of SE research contexts contribute to this
scenario [8], but so does the business context of SE practice.
In this context, online experiments represent a good
opportunity to investigate alternative sampling sources
[9] from which more adequate sampling frames can be
established to support specific research contexts. An
immediate expected contribution of using such sources is an
increase in sample size, but the benefits are not limited to
that. Representative samples are also expected to be
sufficiently heterogeneous with respect to the attributes
previously established to characterize each individual in a
specific study population [10].
The first two trials of an experiment evaluating Java code
snippets from three distinct search engines (Google, a
source-code-specific search engine, Merobase, and a
research prototype, Satsy) were conducted [11, 12].
Although the operationalization and protocol of these
trials presented some differences, both used as their
population the anonymous workers of the crowdsourcing
platform Amazon Mechanical Turk (MTurk).
Then, a third trial was conducted with a population drawn
from an interest group on the professional social network
LinkedIn (www.linkedin.com). This trial applied a
systematic plan to recruit a random and geographically
distributed sample from that group, following concepts
from a framework originally developed to support
researchers in establishing representative samples for
large-scale SE surveys [9].
This work presents this third trial and examines the
contributions of using LinkedIn as a sampling source,
comparing its sample and results with those obtained in the
previous trials using MTurk. The contributions of this work
are:
1. An operational replication of a study on code search
results, using LinkedIn for sampling (previously
performed using MTurk).
978-1-4673-7899-4/15/$31.00 ©2015 IEEE