Educational Article

Opportunities for longitudinal data linkage in Scotland

Gareth Hagger-Johnson

Abstract
Scotland has existing data resources which are competitive internationally and available to researchers from elsewhere. The Scottish Informatics and Linkage Collaboration (SILC) was recently launched, allowing data sets to be linked within and between sectors (e.g. health to non-health). The purpose of this review article is to introduce and define key terms in data linkage, to describe the emerging data linkage resources available in Scotland and to describe the opportunities available in Scotland to researchers internationally. The review is aimed at researchers internationally who are interested in data linkage using Scottish data resources. The review makes particular reference to longitudinal health data but emphasises that linkage to non-health data allows research questions to be considered that were previously not answerable. The review is focused on longitudinal data resources (e.g. cohort studies and repeated measures designs), since they are usually the focus of data linkage research. The review concludes that any intended data linkage for research should be driven by a clear research question. The infrastructure already available and the launch of SILC will accelerate research in Scotland and generate new research questions that previously could not be considered answerable.

Keywords
Data linkage, longitudinal studies, Scotland, administrative data, epidemiology

Introduction
Scotland already has data resources which are competitive internationally and available to researchers from elsewhere.
The purpose of this review article is to describe the emerging data linkage resources available, so that researchers interested in data linkage opportunities become aware of what Scotland can offer. The review makes particular reference to health data (e.g. primary care, hospital records, cancer registries, mortality data) but emphasises that linkage to non-health data (e.g. education, social care and crime) may offer unique opportunities for answering certain research questions. The review is focused on longitudinal data resources (e.g. cohort studies and repeated measures designs), since these are often the focus of data linkage efforts, and are usually considered better quality evidence in the 'hierarchy of evidence' than cross-sectional data or case reports, for example.¹ Longitudinal data can, however, be analysed as cross-sectional data, by restricting the analysis to one measurement occasion if required. The review is also focused on data that have been digitised, but it is worth noting that a great deal of historic administrative data exists only in paper records.

Data linkage
Data linkage involves bringing together two records that belong to the same individual. This might involve linking records that belong to the same individual over time in the same data set (e.g. repeated hospital admissions for a patient), or linking records that belong to the same individual from two different data sets (e.g. a patient's health records to their social care records). Data linkage can result in two kinds of error: false matches (two people are assigned the same ID) or missed matches (the same person is assigned more than one ID).²,³

Senior Research Associate, Department of Epidemiology and Public Health, UCL, London

Corresponding author:
Gareth Hagger-Johnson, Department of Epidemiology and Public Health, UCL, 1-19 Torrington Place WC1E 6BT, UK.
Email: g.hagger-johnson@ucl.ac.uk

Scottish Medical Journal 0(0) 1–10
© The Author(s) 2015
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
DOI: 10.1177/0036933015575214
scm.sagepub.com
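
The two kinds of linkage error defined above can be illustrated with a small sketch. The Python example below is not taken from any Scottish linkage system; the records, the field names and the matching rule (exact agreement on surname and date of birth) are all hypothetical, chosen only to show how a false match and a missed match might arise. The "true person" label would not normally be known in practice and is included here purely so that the errors can be counted.

```python
from itertools import combinations

# Hypothetical records: (record_id, true_person, surname, date_of_birth).
# "true_person" is an assumed ground truth used only to count errors.
records = [
    ("r1", "A", "smith", "1950-01-01"),  # hospital admission for person A
    ("r2", "A", "smith", "1950-01-01"),  # repeat admission, same person
    ("r3", "A", "smyth", "1950-01-01"),  # same person, surname misspelt
    ("r4", "B", "jones", "1962-07-15"),
    ("r5", "C", "jones", "1962-07-15"),  # different person, same identifiers
]


def assign_ids(records):
    """Give the same ID to records that agree exactly on surname and date of birth."""
    key_to_id = {}
    assigned = {}
    for record_id, _true_person, surname, dob in records:
        key = (surname, dob)
        if key not in key_to_id:
            key_to_id[key] = len(key_to_id) + 1
        assigned[record_id] = key_to_id[key]
    return assigned


def linkage_errors(records, assigned):
    """Compare every pair of records and list false matches and missed matches."""
    false_matches, missed_matches = [], []
    for a, b in combinations(records, 2):
        same_person = a[1] == b[1]
        same_id = assigned[a[0]] == assigned[b[0]]
        if same_id and not same_person:
            false_matches.append((a[0], b[0]))   # two people share one ID
        elif same_person and not same_id:
            missed_matches.append((a[0], b[0]))  # one person has more than one ID
    return false_matches, missed_matches


assigned = assign_ids(records)
false_matches, missed_matches = linkage_errors(records, assigned)
print("False matches:", false_matches)
print("Missed matches:", missed_matches)
```

In this toy example the misspelt surname splits person A across two IDs (missed matches), while persons B and C, who happen to share identifiers, are merged under one ID (a false match).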