This event has passed.
NCRM: Introduction to Data Linkage and Analysing Linked Data
May 19, 2020, 8:00 am to May 20, 2020, 5:00 pm
This short course is designed to give participants a practical introduction to data linkage. It is aimed both at analysts who intend to link data themselves and at researchers who want to understand more about the linkage process and its implications for the analysis of linked data, particularly the implications of linkage error. Day 1 (Introduction to Data Linkage) will cover examples of the uses of data linkage, data preparation, and linkage methods, including deterministic and probabilistic approaches. Day 2 (Introduction to Analysing Linked Data) will cover the processing of linked data, the concepts of linkage error and bias, and the handling of linkage error in analysis. Examples will be drawn predominantly from health data, but the concepts apply to many other areas. The course combines lectures with practical sessions that will enable participants to put theory into practice.
The course covers:
- Overview of data linkage (data linkage systems, benefits of data linkage, types of projects)
- Overview of linkage methods (deterministic and probabilistic, privacy-preserving)
- The linkage process (data preparation, blocking, classification)
- Classifying linkage designs
- Evaluating linkage quality and bias (types of error, analysis of linked data)
- Reporting analysis of linked data
- Practical sessions (no coding required; see below)
By the end of the course, participants will:
- Understand the background and theory of data linkage methods
- Perform deterministic and probabilistic linkage
- Evaluate the success of data linkage
- Appropriately report analysis based on linked data
The course is aimed at analysts and researchers who need to understand data linkage techniques and how to analyse linked data. It provides an introduction to data linkage theory and methods for those who might be implementing data linkage or using linked data in their own work. Participants may be academic researchers in the social and health sciences, or may work in government, survey agencies, official statistics, charities, or the private sector.
The course does not assume any prior knowledge of data linkage. Some experience of using Excel or similar software will be useful for the practical sessions.
Recommended preparatory reading
- Sayers A, Ben-Shlomo Y, Blom AW, Steele F. Probabilistic record linkage. Int J Epidemiol. 2016;45(3):954–964. doi:10.1093/ije/dyv322
- Doidge JC, Harron K. Demystifying probabilistic linkage: Common myths and misconceptions. Int J Popul Data Sci. 2018;3(1):410. doi:10.23889/ijpds.v3i1.410
- Harron KL, Doidge JC, Knight HE, et al. A guide to evaluating linkage quality for the analysis of linked data. Int J Epidemiol. 2017;46(5):1699–1710. doi:10.1093/ije/dyx177
- Doidge JC, Harron KL. Linkage error bias. Int J Epidemiol. 2019; in press.
Participants will need to bring a laptop with Excel (or equivalent) and LinkPlus preinstalled, or be prepared to share one (sharing is encouraged in any case). Please note that LinkPlus is not compatible with Macs. Participants will receive the course slides in print or as a PDF.
The fee per teaching day is:
- £30 per day for UK/EU registered students
- £60 per day for staff at UK/EU academic institutions, UK/EU Research Council researchers, UK/EU public sector staff, and staff at UK/EU registered charities and recognised UK/EU research institutions
- £220 per day for all other participants
All fees include event materials and refreshments. They do not include lunch, travel, or accommodation costs.