This short course is taught online by statisticians from LSHTM and is part of the School’s Centre for Data and Statistical Science for Health.
Missing data frequently occur in both observational and experimental research. They lead to a loss of statistical power and, more importantly, may introduce bias into the analysis. In this course, we adopt a principled approach to handling missing data, in which the first step is a careful consideration of suitable assumptions regarding the missing data for a given study. Based on this, appropriate statistical methods can be identified that are valid under the chosen assumptions.
The overall aim of this course is for participants to learn how multiple imputation can be used to handle missing data in statistical analyses and to understand the assumptions under which it is valid. In addition to introducing the method in more standard settings, we will explore its use in a range of more advanced situations, including models with non-linearities and interactions, propensity score analysis, prognostic model development, and sensitivity analyses.
Who should apply?
Epidemiologists, biostatisticians, and other health researchers with strong quantitative skills and experience in statistical analysis. In particular, we will expect familiarity with regression models, such as linear and logistic regression, and with the interpretation of their results. Computer practicals will use the statistical software package R, so participants should be familiar with using R for statistical analyses. Full R code solutions will be provided.
Intended learning outcomes
- Understand the impact of missing data on statistical inferences, and the assumptions made about missingness mechanisms, including missing completely at random, missing at random, and missing not at random.
- Understand the assumptions under which multiple imputation can be used to provide valid inferences from a partially observed dataset, and be able to apply it appropriately using modern statistical software.
- Understand how multiple imputation can be applied in various advanced settings, including non-linearities and interactions, missing data sensitivity analysis, propensity score analysis, and prognostic model development.
Teaching format
The course is delivered online across 3 days. Each morning and afternoon session consists of a 1-hour lecture followed by a 1.5-hour computer practical.
Course Content
The course will cover:
- The effects of missing data on statistical inferences.
- Missingness mechanism assumptions, including missing completely at random, missing at random, and missing not at random.
- Multiple imputation for missing data, based on joint models and fully conditional specification approaches, and Rubin's pooling rules.
- Multiple imputation accommodating non-linearities and interactions.
- Multiple imputation for sensitivity analysis.
- Multiple imputation in the context of propensity score analysis.
- Multiple imputation in the context of prognostic model development and deployment.
Through computer practicals using R, participants will learn how to apply the statistical methods introduced in the course to realistic datasets.
Course Certificate and Assessment
There will be no formal assessment, but participants will receive a Certificate of Attendance.
Fees
The fee for the Statistical Analysis with Missing Data Using Multiple Imputation course for 2025 entry is £690.
Funding
A 50% discount is available for offer holders from low- or middle-income countries (LMICs) and for current research degree students.
When applying for discounted fees, please include proof of Research Degree student or LMIC status instead of your CV. LMIC status can be confirmed with a passport and proof of current residence.
Applying for this course
Applications for 2025 are now open and can be made via our online application form.
Please read LSHTM's Admissions policies prior to submitting your application.
LSHTM may cancel courses two weeks before the first day of the course if numbers prove insufficient. In those circumstances, course fees will be refunded.
Prerequisites
There are no formal prerequisites in terms of qualifications, but participants should be familiar with statistical methods such as linear and logistic regression and with the interpretation of statistical inferences, including confidence intervals, p-values, and regression coefficients. Participants will also need to be familiar with the statistical software package R, and should ensure they have R installed and ready to use for the computer practicals.