Seminar slides and recordings
Slides and recordings from the former Centre for Statistical Methodology's events are available below. If you experience any issues with the downloads/links on this page please e-mail: dash@lshtm.ac.uk.
Centre for Statistical Methodology Seminar
Title: Evolving approaches to choice of outcomes in clinical trials: win ratio, repeat events, etc
Abstract:
Many clinical trials have a composite outcome which is conventionally analysed as time-to-first-event. Better alternatives can put the events in a clinical hierarchy, leading to a win ratio analysis and/or make use of repeat events, e.g. hospitalisations. These evolving techniques will be described with examples from recent trials in cardiovascular diseases.
Speaker: Stuart Pocock, Professor of Medical Statistics, LSHTM
15 May 2023
Centre for Statistical Methodology Seminar
Title: Using Mendelian Randomisation to model the causal effect of cancer on health economic outcomes and to simulate the cost-effectiveness of anti-cancer interventions
Abstract:
Cancer is associated with significant economic impacts. Quantifying the scale of these impacts is challenged by confounding variables that jointly influence both cancer status and economic outcomes such as healthcare costs and quality of life. Moreover, the increasing costs attributed to cancer drug development complicate the cost-effective provision of cancer care.
Padraig Dixon addresses both challenges in this work by using germline genetic variation in the risk of incident cancer as instrumental variables in Mendelian Randomisation analyses of eight cancers. They developed causal estimates of the effect of bladder, breast, colorectal, lung, multiple myeloma, ovarian, prostate, and thyroid cancers on healthcare costs and quality adjusted life years (QALYs) using outcome data drawn from the UK Biobank cohort. They then used these results and methodologies to model the cost-effectiveness of a hypothetical population-wide preventative intervention based on a repurposed class of anti-diabetic drugs known as sodium-glucose co-transporter-2 (SGLT2) inhibitors very recently shown to reduce the odds of incident prostate cancer.
Genetic liability to prostate and breast cancers had material causal impacts on health economic outcomes. Mendelian Randomisation results for the less common cancers were associated with considerable uncertainty. SGLT2 inhibition was unlikely to be a cost-effective preventative intervention for prostate cancer, although this conclusion depended on the price at which these drugs would be offered for a novel anti-cancer indication.
These methods can be used to rapidly and efficiently estimate intervention cost-effectiveness for any disease or trait where Mendelian Randomisation is feasible.
Speaker: Padraig Dixon, University of Oxford
3 May 2023
Centre for Statistical Methodology Seminar
Title: G-formula with missing data via multiple imputation
Abstract:
The G-formula method is an increasingly popular method for estimating causal effects from observational data with a time-varying treatment or exposure. Most implementations of G-formula cannot accommodate missing values in exposure or confounder measurements, which are ubiquitous in study data.
In this talk, Dr Jonathan Bartlett will describe how G-formula can be implemented via multiple imputation and how, as part of this, missing data can be readily accommodated.
Speaker: Dr Jonathan Bartlett, Professor in Medical Statistics, LSHTM
8 February 2023
Centre for Statistical Methodology Seminar
Title: At-risk-measure sampling in case-control studies with aggregated data
Abstract:
Secondary sources of mobile-device data (e.g., as collected by commercial smartphone apps) can measure transient exposure with high accuracy and low researcher burden but present some challenges. First, these data commonly only sample the target population, so a case-control design may be useful. Second, data are often aggregated by location to preserve anonymity of users.
In this seminar, the speaker will discuss his paper, “At-risk-measure sampling in case-control studies with aggregated data”, which describes a method for using these types of data to estimate the incidence rate ratio from a hypothetical cohort study. Unlike incidence density sampling, a similar method, the described method directly samples the measure of the at-risk experience, such as person-distance travelled in studies of transportation risk.
Speaker: Michael D. Garber, PhD MPH, Postdoctoral Researcher, Rojas Public Health Lab, Colorado State University
25 January 2023
Centre for Statistical Methodology Seminar
Title: Bayes, buttressed by design-based ideas, is the best overarching paradigm for sample survey inference
Abstract:
This seminar covers conceptual arguments and examples suggesting that the Bayesian approach to survey inference can address the varied challenges of survey analysis.
The speaker discusses how Bayesian models that incorporate features of the complex design can yield inferences that are relevant to the specific data set obtained, but also have good repeated-sampling properties. Examples will focus on the role of auxiliary variables and sampling weights and methods for handling nonresponse.
Speaker: Professor Roderick Little, Richard D. Remington Distinguished University Professor of Biostatistics at University of Michigan.
22 November 2022
Centre for Statistical Methodology Symposium
Title: A celebration of 50 Years of the Cox model in memory of Sir David Cox
Abstract:
It has been 50 years since Sir David Cox’s seminal paper on what is now called the Cox model for the analysis of event time data was published in the Journal of the Royal Statistical Society (Cox, DR. Regression Models and Life Tables. JRSS (Series B) 1972; 34(2): 187-202.). The Cox model has been used in applications across many areas of research, as well as inspiring wide-ranging methodological developments. The paper is one of the most highly cited scientific papers. This symposium celebrated 50 years of the Cox model and remembered the work of Sir David Cox (1924-2022), including some of his many contributions to statistics.
10 November 2022
Centre for Statistical Methodology Seminar
Title: Integration of observational and randomised controlled trial data
Abstract:
In this talk, Dr Lauren Eyler Dang used a roadmap for causal inference to explore the challenges of integrating observational and RCT data, including considerations for designing such a hybrid trial. She discussed different approaches to data fusion, including a novel estimator that uses cross-validated targeted maximum likelihood estimation (CV-TMLE) to data-adaptively select and analyze the optimal experiment - RCT only (if no unbiased external data exists) or RCT with external data.
Speaker: Dr Lauren Eyler Dang, University of California, Berkeley
12 October 2022
Centre for Statistical Methodology Seminar
Title: M-estimation
Abstract:
To highlight the applicability of M-estimators to a variety of problems, Dr Paul Zivich reviewed examples in regression, dose-response relationships, causal inference, and transportability of randomised trials. Each example was illustrated with the corresponding estimation equations, data and computer code.
Speaker: Dr Paul Zivich, University of North Carolina at Chapel Hill
5 October 2022
Centre for Statistical Methodology Seminar
Title: The analysis of active-control trials: missing the point
Abstract:
In this talk, Professor Dunn discussed the analysis of active-control trials with a time-to-event outcome. He showed that the standard metric used in such trials, the rate ratio or rate difference comparing experimental and control arms, may not be the most clinically relevant measure. An alternative measure was proposed, which requires specification of one of two unobserved parameters. The problem was exemplified by studies from HIV pre-exposure prophylaxis, TB preventative treatment, and COVID-19 vaccines.
Speaker: Professor David Dunn, Professor of Medical Statistics at the MRC Clinical Trials Unit at UCL
1 June 2022
Centre for Statistical Methodology Seminar
Title: Spatial and spatio-temporal models in the time of COVID-19
Abstract:
In this talk, Professor Marta Blangiardo will present some of her recent work on spatial and spatio-temporal modelling of mortality data during the COVID-19 pandemic. She will begin with the first comprehensive analysis of the spatio-temporal differences in excess mortality during 2020 across five European countries (Greece, Italy, England, Spain, Switzerland), using a population-based design on all-cause mortality data. Sex-specific weekly mortality rates for each area (NUTS3 regions) were estimated, based on a comparison period (2015-2019), while adjusting for age, localised temporal trends and the effect of temperature. Then, all-cause weekly deaths and mortality rates at the same spatial resolution were predicted for 2020, based on the modelled spatio-temporal trends, so that excess deaths could be estimated.
Secondly, she will talk about a two-stage spatial model to quantify inequalities in excess mortality in people aged 40 years and older at the community level during the first year of the pandemic in England, Italy and Sweden.
Speaker: Professor Marta Blangiardo (Imperial College London)
4 May 2022
Centre for Statistical Methodology Seminar
Title: A trial emulation approach for policy evaluations with group-level longitudinal data
Abstract:
In this talk, we will explore the use of target trial emulation and difference-in-difference approaches to estimate the causal effect of a policy, illustrated by stay-at-home mandates in the US during the pandemic.
To limit the spread of the novel coronavirus, governments across the world implemented extraordinary physical distancing policies, such as stay-at-home orders. Numerous studies aim to estimate the effects of these policies. Many statistical and econometric methods, such as difference-in-differences, leverage repeated measurements and variation in timing to estimate policy effects, including in the COVID-19 context. Although these methods are less common in epidemiology, epidemiologic researchers are well accustomed to handling similar complexities in studies of individual-level interventions.
Target trial emulation emphasises the need to carefully design a nonexperimental study in terms of inclusion and exclusion criteria, covariates, exposure definition, and outcome measurement—and the timing of those variables. We argue that policy evaluations using group-level longitudinal (“panel”) data need to take a similar careful approach to study design that we refer to as policy trial emulation. This approach is especially important when intervention timing varies across jurisdictions; the main idea is to construct target trials separately for each treatment cohort (states that implement the policy at the same time) and then aggregate. We present a stylised analysis of the impact of state-level stay-at-home orders on total coronavirus cases. We argue that estimates from panel methods—with the right data and careful modeling and diagnostics—can help add to our understanding of many policies, though doing so is often challenging.
Speaker: Dr Eli Ben-Michael (Harvard University)
27 Apr 2022
Centre for Statistical Methodology series of seminars
This is a series of seminars on “Statistics for Infectious Diseases: learning from COVID-19 modelling”. These seminars are intended for students and researchers alike with interests in statistical inference for infectious diseases. Each seminar tackles one topic using different approaches.
Seminar 1: Human contacts and infectious disease transmission applied to COVID-19 modelling (30 Mar 2022)
Speakers: Dr Kiesha Prem (LSHTM) discussed “Projecting synthetic contact matrices for COVID-19 Modelling”. Dr James Munday (LSHTM) gave a talk on “Applying regularly collected contact data to epidemic response and real-time modelling application”.
Seminar 2: Assessing Impact of government interventions on COVID-19 (13 Apr 2022)
Speakers: Dr Leonid Chindelevitch gave a talk on “Evaluating the effectiveness of government interventions against the first and second wave of COVID-19” and Professor Lin Xihong’s talk discussed “Regression methods for understanding COVID-19 epidemic dynamics by integrating multiple sources of data”.
Seminar 3: Evaluation of public health interventions on COVID-19 (8 June 2022)
Centre for Statistical Methodology Seminar
Title: Emulating target trials using observational data: an application to estimating mortality of delays in appropriate antibiotic treatment and accounting for time-varying confounding and immortal time biases
Abstract:
Delays in treating bacteraemias with antibiotics to which the causative organism is susceptible are expected to adversely affect patient outcomes. However, the extent of the impact remains to be elucidated. This cannot ethically be addressed in a randomised trial. Therefore, an observational cohort study is likely to be the best alternative approach to determine the impact of delays in concordant antibiotic treatment on patient outcomes. There are, however, potentially important biases to be addressed, including time-varying confounding and immortal time bias.
We described two target trials, and performed analyses on observational data aiming, as far as possible, to emulate the trials. Among 1,203 patients with Acinetobacter species hospital-acquired bacteremia in a tertiary hospital in Thailand, 682 had ≥1 days of delays to concordant treatment. Surprisingly, crude 30-day mortality was lower in patients with delays of ≥3 days compared with those who had 1–2 days of delays. Accounting for confounders and immortal time bias resolved this paradox. Emulating a target trial, we found that these delays were associated with an absolute increase in expected 30-day mortality of 6.6% (95% confidence interval: 0.2, 13.0), from 33.8% to 40.4%.
Speaker: Dr Cherry Lim (Mahidol University)
09 Mar 2022
Centre for Statistical Methodology Seminar
Title: Handling missing data when estimating causal effects with Targeted Maximum Likelihood Estimation
Abstract:
This seminar introduced several approaches and presented a cutting-edge method for dealing with missing data when estimating causal effects within the Targeted Maximum Likelihood Estimation framework. This event included the presentation of statistical methods that would benefit any statistician, analyst or epidemiologist interested in expanding their methodological toolbox.
Speaker: Dr Ghazaleh Dashti (University of Melbourne)
16 Feb 2022
Centre for Statistical Methodology Symposium
Title: How can mathematical and statistical models combine with big data to improve our response to pandemics?
This symposium assessed the role of modelling and big data in preparing for pandemics and featured distinguished speakers, including Professor Sir Chris Whitty, Honorary Professor of Public and International Health, LSHTM and Chief Medical Officer for England, and Professor Dame Angela McLean, Chief Scientific Adviser for the Ministry of Defence.
It also included statistical and mathematical modellers, and leading policy advisors, who discussed the major research priorities that can help governments plan for the next pandemic.
This was a joint event with the Centre for the Mathematical Modelling of Infectious Diseases and the Centre for Epidemic Preparedness and Response.
Date: 2 February 2022
Centre for Statistical Methodology: Early Career Statistician Showcase
This event featured 8 short talks from early career statisticians from the Centre and beyond, including PhD students and postdoctoral researchers. Professor Chris Yau closed the event with a keynote.
Date: 10 Dec 2021
Short talks:
- Dr Emily Granger: “Investigating the effects of multiple treatments used in combination on health outcomes in people with cystic fibrosis”
- Charlotte Rutter: “Determinants of time trends in asthma prevalence: Global Asthma Network Phase I”
- Dr Rachel Sarguta: “Application of causal inference methods in the analysis of cluster-randomized trials of complex interventions”
- Dr Pierre Masselot: “Constrained estimation in generalized linear models and application to shape-constrained splines”
- Dr Mia Tackney: “Making every step count: handling missing accelerometer data in trials”
- Silvia Moler Zapata: “Local instrumental variable methods to estimate heterogeneous treatment effects: a case study of emergency surgery in the UK”
- Oliver Hines: “Variable importance measures for heterogeneous causal effects”
- Darren Scott: “Bayesian feature selection with variational inference”
Keynote talk:
Speaker: Professor Christopher Yau (University of Manchester)
Centre for Statistical Methodology Series of Seminars
A series of four sessions on modern concepts and methods relating to estimation of effects of treatments or exposures on survival and other time-to-event outcomes.
Talk 1: "Causal inference for survival outcomes: An introduction" (3 Nov 2021)
Speakers: Professor Bianca De Stavola (UCL) and Professor Ruth Keogh (LSHTM)
Talk 2: "Estimating adaptive treatment strategies for survival outcomes" (10 Nov 2021)
Speaker: Professor Erica Moodie (McGill University)
Talk 3: "On identification of vaccine effects in time-to-event settings" (24 Nov 2021)
Speaker: Professor Mats Stensrud (Ecole Polytechnique Fédérale de Lausanne)
Talk 4: "Shall we count the living or the dead?" (1 Dec 2021)
Speaker: Dr Anders Huitfeldt (University of Southern Denmark)
Centre for Statistical Methodology Symposium
Title: Tackling inequalities and exclusion in statistical research
This symposium aims to examine how results from statistical analysis are affected by the way that data arise and the algorithms that we use, and what methodological research on statistical design and analysis is required to identify and eliminate inherent inequalities.
Date: 17 Nov 2021
Speakers: Dr Rohini Mathur (LSHTM), Dr Mhairi Aitken (The Alan Turing Institute), Dr Darshali Vyas (Harvard University), Dr Sherri Rose (Stanford University).
Slides (Rohini Mathur, Mhairi Aitken, Darshali Vyas, Sherri Rose)
Centre for Statistical Methodology Seminar
Title: Common methods for missing data in marginal structural models: What works and why
Abstract:
Electronic health records (EHR) are useful for addressing health-related questions, such as estimating the marginal effect of treatment over a long period of time. In practice, a patient’s treatment exposure may not be constant over time but may be updated as their medical history evolves. In turn, the new treatment may affect future health events and individual factors, potentially associated with the outcome of interest.
Marginal structural models (MSMs) have been proposed to estimate marginal effects in this type of setting. The parameters of MSMs are often estimated using inverse-probability-of-treatment-weighting (IPTW) with weights accounting for time-varying confounding through the modelling of the treatment assignment mechanism. A major issue when applying this method is missing data among confounders, where a poorly informed analysis method will lead to biased estimates of treatment effects. Despite several approaches described in the literature for handling missing data in MSMs, there is little guidance on what works in practice and why.
In this presentation, we will review existing missing-data methods for MSMs and discuss the plausibility of their underlying assumptions. In particular, we will focus on complete case analysis, multiple imputation, last observation carried forward, the missingness pattern approach and inverse probability of missingness weighting, under three mechanisms for nonmonotone missing data encountered in research based on electronic health record data.
This session is relevant to statisticians and epidemiologists interested in long-term causal treatment effect estimation using electronic health records.
Speaker: Dr Clemence Leyrat (LSHTM)
9 Jun 2021
Centre for Statistical Methodology Seminar
Title: Anonymisation of data by synthesising data
Abstract:
The creation of synthetic datasets has been proposed as a statistical disclosure control solution, especially to generate public use files from confidential data or datasets shared within an organisation or company. It is also a tool to create “augmented datasets” to serve as input for micro-simulation models. The performance and acceptability of such a tool rely heavily on the quality of the synthetic data, i.e., on the statistical similarity between the synthetic and the true population of interest. Multiple approaches and tools have been developed to generate synthetic data. These approaches can be categorised into four main groups: synthetic reconstruction, combinatorial optimisation, model-based generation, and deep learning approaches. In addition, methods have been formulated to evaluate the quality of synthetic data.
In this presentation, the methods are not shown from the theoretical point of view; they are rather introduced in an applied and generally understandable fashion. We focus on new concepts for the model-based generation of synthetic data that avoids disclosure problems. At the end of the presentation, we introduce simPop, an open-source data synthesizer. simPop is a user-friendly R-package based on a modular object-oriented concept. It provides a highly optimised S4 class implementation of various methods, including calibration by iterative proportional fitting/updating and simulated annealing, and modeling or data fusion by logistic regression, regression tree methods and many other methods. Utility functions to deal with (age) heaping are implemented as well. An example is shown using real data from Official Statistics. The simulated data then serve as input for agent-based simulation and/or microsimulation, or can be shared within a company or organisation or between organisations without running into trouble with privacy and data protection laws. Synthetic data can even be used as open data for research and teaching.
Speaker: Dr Matthias Templ (Zurich University of Applied Sciences)
26 May 2021
Centre for Statistical Methodology Seminar
Title: Conformal inference of counterfactuals and individual treatment effects
Abstract:
Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework.
For completely randomised or stratified randomised experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomised experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals. This is joint work with Emmanuel Candès.
Speaker: Dr Lihua Lei (Stanford University)
24 March 2021
Centre for Statistical Methodology Series of Seminars
This is a series of three one-hour talks, running over three consecutive Wednesdays, on modern causal prediction.
Talk 1: "Clinical prediction models: a field in crisis" (3 February 2021)
Speaker: Professor Gary Collins (University of Oxford)
Talk 2: "Multi-Outcome Risk Prediction Modelling: current state-of-play and future research" (10 February 2021)
Speaker: Dr Glen Martin (University of Manchester)
Talk 3: "Dynamic survival prediction combining landmarking with a machine learning ensemble: Methodology and empirical comparison with standard techniques" (24 February 2021)
Speaker: Ms Kamaryn Tanner (LSHTM)
Centre for Statistical Methodology Seminar
Title: Causal inference in a time of coronavirus: tenofovir, tocilizumab, hydroxychloroquine
Speaker: Prof Miguel Hernán (Harvard)
27 January 2021
Centre for Statistical Methodology Seminar
Title: Spatio-temporal two-stage models for environmental health studies
Speakers: Prof Antonio Gasparrini & Dr Matteo Scortichini (LSHTM)
20 January 2021
Abstract:
Novel big data resources offer exceptional opportunities for environmental research, allowing linkage of health data with high-resolution exposure measurements in large populations and study areas. However, this new setting presents important analytical and computational issues, including: (i) issues in modelling potentially complex associations varying over spatial and temporal units; (ii) consideration of confounders and effect modifiers measured at different geographical levels; (iii) the exceptional computational burden of performing analyses spanning entire countries and several decades.
In this contribution, we present a novel spatio-temporal two-stage design to perform small-area analyses in environment-health epidemiological investigations. This framework will be illustrated in a small-area analysis of temperature-mortality associations using data collected in 34,753 Lower Layer Super Output Areas (LSOAs) in England and Wales in the period 1981-2018, including 9,697,753 deaths. Different designs are defined and applied to investigate geographical differences in the increased risks associated with heat and cold, to explore potential temporal variations, and to assess spatially and time-varying characteristics that can potentially modify the relationships.
Centre for Statistical Methodology Series of Seminars
This is a series of talks by LSHTM researchers on methodological work arising from the analysis of COVID-19 data.
Talk 1: "Routes taken and length of stay after hospital admission with COVID-19: Results and statistical challenges" (25 November 2020)
Speakers: Prof Ruth Keogh (LSHTM) and Dr Karla Diaz-Ordaz (LSHTM)
Talk 2: "Predicting risk of COVID-19 mortality in the general population" (02 December 2020)
Speaker: Dr Elizabeth Williamson (LSHTM)
Talk 3: "COVID-19 infectious disease modelling and statistics: myths, misperceptions and managing the way forward" (09 December 2020)
Speaker: Prof Nick Jewell (LSHTM and UC Berkeley)
Centre for Statistical Methodology Seminar
Title: Communicating statistics in the time of COVID
Speaker: Prof Sir David Spiegelhalter FRS OBE (University of Cambridge)
18 November 2020
Abstract:
The current epidemic is notable for the vast traffic in official and unofficial information and claims. Sir David Spiegelhalter will assess the trustworthiness of the way authorities have talked about statistics and risks, focussing attention on the regular releases of cases and deaths. In particular, he will look at how background actuarial risk can be used to get a perspective on the risks facing us all.
Centre for Statistical Methodology Seminar
Title: Prior constraints for causal inference in natural experiments
Speaker: Dr Sara Geneletti (London School of Economics)
21 October 2020
Abstract:
This talk draws on two projects Sara has been involved in, in which constraints were placed on prior distributions in order to obtain causal effect estimates.
The first project is the estimation of the causal effect of statins (a type of cholesterol lowering drug) in the UK population using a regression discontinuity design. We considered both continuous and binary outcomes and imposed constraints on the prior distributions of some parameters in order to stabilise and obtain causal effect estimates.
The second project involved generating continuous values for the severity of non-custodial sentences. A long-standing issue in criminology is that sentence types come in two flavours - custodial sentences measured in days and non-custodial sentences measured as factor levels - making it difficult to compare the two types of outcomes and evaluate the effect of policy changes on sentencing.
We describe a method to extend a continuous severity score based on sentence length to non-custodial outcomes. This method involves using "prior" constraints to impose an ordering by ensuring the severity of non-custodial outcomes cannot exceed certain thresholds. The data thus generated can be used as part of an interrupted time series design to estimate the causal effects of changing sentencing guidelines.
Centre for Statistical Methodology Seminar
Title: Real-time monitoring and short-term forecasting of the COVID-19 pandemic
Speaker: Dr Sebastian Funk (LSHTM)
08 October 2020
Abstract: Real-time monitoring and short-term forecasting of infectious diseases facilitate situational awareness and can inform public health planning. One prominent way to monitor ongoing outbreaks is via the reproduction number R.
In this talk, Dr Sebastian Funk will describe an ongoing effort to track R globally and discuss the practical and statistical challenges in estimating R, from uncertainty in underlying parameters, biases in the data used for estimation, and uncertain delays in the infection and reporting process. He will further explore the link between R and short-term forecasts of the outbreak trajectory and discuss limitations in the ability to make forecasts in the longer term.
Centre for Statistical Methodology Series of Seminars
This series is targeted at statisticians and econometricians interested in using data-adaptive methods for estimation while still obtaining valid inferences that are robust to certain types of model mis-specification. The talks are somewhat technical, but we will also be aiming to give a flavour of this rapidly evolving research area.
Doubly robust data-adaptive inference: Talk 1 (02 July 2020)
Speaker: Dr Oliver Dukes (Ghent University)
Doubly robust data-adaptive inference: Talk 2 (09 July 2020)
Speaker: Dr David Whitney (Imperial College London)
Doubly robust data-adaptive inference: Talk 3 (16 July 2020)
Speaker: Prof Stijn Vansteelandt (Ghent University and LSHTM)
Centre for Statistical Methodology Seminar
Title: Causal inference isn't what you think it is
Speaker: Prof Philip Dawid (University of Cambridge)
25 June 2020
Abstract: You may think that statistical causal inference is about inferring causation. You may think that it cannot be tackled with standard statistical tools, but requires additional structure, such as counterfactual reasoning, potential responses or graphical representations. I shall try to disabuse you of such woolly misconceptions by locating statistical causality firmly within the scope of traditional statistical decision theory. From this viewpoint, the enterprise of "statistical causality" could fruitfully be rebranded as "assisted decision making".
Centre for Statistical Methodology Lecture: “Targeted learning: The bridge from machine learning to statistical and causal inference”
4 March 2020
Speaker: Prof Mark van der Laan (University of California Berkeley)
Abstract: Society is drowning in data and the current practice of learning from data is to apply traditional statistical methods that are too simplistic, arbitrarily chosen, and subject to manipulation. Nonetheless, these methods inform policy and science, affecting our sense of reality and judgements. This talk exposes this deceptive practice, and presents a solution — a principled and reproducible approach, termed targeted learning, for generating actionable and truthful information from complex, real-world data. This approach unifies causal inference, machine learning and deep statistical theory to answer causal questions with statistical confidence.
This is a public lecture, intended for academics from several disciplines and those interested in the role of causal inference in machine learning. The audience will hear about the historical developments that led to the recent "marriage" of causality and machine learning, and then specifically about targeted learning.
Centre for Statistical Methodology Seminar
Title: Satellite-based machine learning models to estimate high-resolution environmental exposures across the UK.
Rochelle Schneider and Antonio Gasparrini (LSHTM)
Abstract:
Air pollution is a public health concern, especially fine particulate matter (PM2.5). Both long- and short-term PM2.5 exposures are associated with adverse health outcomes (such as increased mortality and morbidity). Epidemiological assessments often rely on measurements from monitoring networks, which are, however, geographically sparse and mostly located in major cities. Novel big data resources, such as aerosol optical depth (AOD) measurements from satellite instruments, offer wide spatio-temporal coverage and can address limitations of traditional exposure methods.
In this talk, we present satellite-based machine learning models to reconstruct levels of PM2.5 at high spatial and temporal resolution in Great Britain within the period 2003-2018. The model combines earth observation satellite measurements with multiple resources, including station data, climate and atmospheric models, traffic data, land-cover, and other geospatial features. The model then relies on a multi-stage random forest algorithm to predict PM2.5 concentrations at various temporal (daily to yearly) and spatial (1km to 100m) resolutions. Such exposure data can be linked to small-area or individual-level health databases to perform country-wide epidemiological analyses on the health risks associated with air pollution.
Centre for Statistical Methodology Workshop: “Methods in Integrative Genomics”
Speakers:
- Manuela Zucknick (University of Oslo): Multivariate structured Bayesian variable selection for treatment prediction in pharmacogenomic screens Slides (.pdf, 3.4 MB)
- Ernest Diez Benavente (LSHTM): A molecular barcode to inform the geographical origin and transmission dynamics of Plasmodium vivax malaria Slides (.pdf, 1.3 MB)
- Ricard Argelaguet (European Bioinformatics Institute): MOFA: a principled framework for the unsupervised integration of multi-omics data Slides (.pdf, 1.0 MB)
- Paul Kirk (MRC Biostatistics Unit, Cambridge): Integrative clustering approaches for multi-omics datasets Slides (.pdf, 8 MB)
Centre for Statistical Methodology Seminar
Title: Statistical methods for cost-effectiveness analysis: a personal history.
Andy Briggs (LSHTM)
Health Economics Theme
Abstract: The last 25 years have seen a large increase in the contribution that health economic analysis has made in national and international decisions about health care provision. Andy Briggs has been working at the interface between medical statistics and health economics throughout this period. In this talk he gives a personal history of that journey with an emphasis on how statistical thinking has improved the methods of health economic evaluation over that period. Looking to the future, there remains much potential for statistical methods to continue to improve the way in which we evaluate the cost-effectiveness of health care interventions and to improve health care decision making as a result.
Centre for Statistical Methodology Seminar
Title: Adjusting for selection bias due to missing data in electronic health records-based research.
Tanayott Thaweethai (Harvard School of Public Health)
Abstract: The widespread adoption of electronic health records (EHR) over the last decade has resulted in an explosion of data available to researchers, which has transformed the landscape of observational research. Since EHR are not collected for research purposes, observational studies using EHR are particularly susceptible to issues of missing data. I present a scalable method that considers a modularization of the data provenance, which entails breaking down the path to observing ‘complete’ data in the EHR into a sequence of decisions or events. Following modularization, the analyst has the flexibility to model each ‘step’ along the sequence individually using inverse probability weighting (IPW) or multiple imputation (MI). In some settings, this approach can even handle data suspected to be missing not at random. I establish the asymptotic properties of an estimator that combines IPW with MI, finding that Rubin’s standard combining rules can be substantially biased under certain conditions. I applied this approach to two settings: first, to address missing baseline and follow-up BMI in a study of bariatric surgery among patients with renal impairment, and second, to address missing eligibility criteria in a single-arm clinical trial where a synthetic control arm is built from patient EHR data.
Centre for Statistical Methodology Seminar
Title: Assessing causal effects in the presence of treatment switching through principal stratification.
Fabrizia Mealli (University of Florence)
Causal inference Theme
Abstract: Consider clinical trials focusing on survival outcomes for patients suffering from Acquired Immune Deficiency Syndrome (AIDS)-related illnesses or particularly painful cancers in advanced stages. These trials often allow patients in the control arm to switch to the treatment arm if their physical conditions are worse than certain tolerance levels. The Intention-To-Treat analysis compares groups formed by randomization regardless of the treatment actually received. Although it provides valid causal estimates of the effect of assignment, it does not measure the effect of the actual receipt of the treatment and ignores the information of treatment switching in the control group. Other existing methods propose to reconstruct the outcome a unit would have had if s/he had not switched. But these methods usually rely on strong assumptions, for example, there exists no relation between patient’s prognosis and switching behavior, or the treatment effect is constant. Clearly, the switching status of the units in the control group contains important post-treatment information, which is useful to characterize the treatment effect heterogeneity. We propose to re-define the problem of treatment switching using principal stratification and introduce new causal estimands, principal causal effects for patients belonging to subpopulations defined by the switching behavior under control. For statistical inference, we use a Bayesian approach to take into account that (i) switching happens in continuous time generating infinitely many principal strata; (ii) switching time is not defined for units who never switch in a particular experiment; and (iii) survival time and switching time are subject to censoring. 
We illustrate our framework using a synthetic dataset based on the Concorde study, a randomized controlled trial aimed to assess causal effects on time-to-disease progression or death of immediate versus deferred treatment with zidovudine among patients with asymptomatic HIV infection. Joint work with Alessandra Mattei and Peng Ding.
Centre for Statistical Methodology Seminar
Title: Selecting causal risk factors from high-throughput experiments using multivariable Mendelian randomization.
Verena Zuber (Imperial College London)
Bayesian Theme Seminar
Slides and audio available soon
Abstract: Modern high-throughput experiments provide a rich resource to investigate causal determinants of disease risk. Mendelian randomization (MR) is the use of genetic variants as instrumental variables to infer the causal effect of a specific risk factor on an outcome. Multivariable MR is an extension of the standard MR framework to consider multiple potential risk factors in a single model. However, current implementations of multivariable MR use standard linear regression and hence perform poorly with many risk factors.
Here, we propose a novel approach to multivariable MR based on Bayesian model averaging (MR-BMA) that scales to high-throughput experiments and can select biomarkers as causal risk factors for disease. In a realistic simulation study we show that MR-BMA can detect true causal risk factors even when the candidate risk factors are highly correlated. We illustrate MR-BMA by analysing publicly-available summarized data on metabolites to prioritise likely causal biomarkers for cardiovascular disease.
Centre for Statistical Methodology Symposium: “Quantitative approaches to personalised medicine”
12 November 2019
Speakers:
- Dr John Whittaker (GlaxoSmithKline Pharmaceuticals): The pharmaceutical industry and personalisation: what have we learnt, and what’s required in future? Slides (.pdf, 0.1MB)
- Professor Mihaela van der Schaar (University of Cambridge): Transforming medicine through Artificial Intelligence-enabled healthcare pathways Slides (.pdf, 1.5MB)
- Dr Brian Tom (MRC Biostatistics Unit, Cambridge): Personalising inter-donation intervals amongst blood donors Slides (.pdf, 0.4MB)
- Dr Karla Diaz-Ordaz (LSHTM): Using data-adaptive methods to investigate conditional treatment effects: towards personalised treatment regimes Slides (.pdf, 0.9MB)
- Professor Andrew Briggs (LSHTM): The economics of personalised medicine: threat or opportunity? Slides (.pdf, 0.4MB)
- Dr Stephen Senn (Independent Statistical Consultant, Edinburgh): A statistical sceptic’s view of personalised medicine Slides (.pdf, 0.9MB)
Centre for Statistical Methodology Seminar
Title: Dealing with missing binary outcomes in cluster randomized trials: weighting vs. imputation methods
Elizabeth L. Turner (Duke University)
Analysis of Clinical Trials Theme
Abstract: Cluster randomized trials are commonly used to evaluate the impact of public health interventions on a range of outcomes and in a range of global health settings. Yet, most CRTs have some missing outcome data and analysis of available data may be biased when outcome data are not missing completely at random. In this talk, we will focus on analysis of CRTs with binary outcomes using the generalized estimating equations (GEE) approach.
In this context, multilevel multiple imputation for GEE (MMI-GEE) has been widely used and methodological work has been undertaken to evaluate its properties (e.g. see work by LSHTM researchers including Hossain, Diaz-Ordaz and Bartlett). Performance of this method has been shown to be very good but there are some challenges to implementing this procedure in standard software. Alternative approaches such as inverse probability weighted GEE (W-GEE) are less common but may be easier to implement in practice. Therefore, we have evaluated properties of W-GEE methods and compared the results with MMI-GEE for binary outcomes using both simulations and using a real data example from a CRT to evaluate the effect of a teacher-training intervention on child literacy outcomes in Kenya. This is joint work with Lanqiu Yao, Fan Li and Melanie Prague.
Centre for Statistical Methodology Seminar
Title: Causal inference and competing events
Jessica Young (Harvard Medical School)
Causal Inference Theme
Abstract: In failure-time settings, a competing risk event is any event that makes it impossible for the event of interest to occur. For example, cardiovascular disease death is a competing event for prostate cancer death because an individual cannot die of prostate cancer once he has died of cardiovascular disease. Various statistical estimands have been defined as possible targets of inference in the classical competing risks literature. These include the so-called cause-specific hazard, subdistribution hazard, marginal hazard, cause-specific cumulative incidence and marginal cumulative incidence. Many reviews have described these statistical estimands and their estimating procedures with recommendations about their reporting when the goal is causal effect estimation. However, this previous work has not used a formal framework for characterizing causal effects and their identifying conditions which makes it difficult to evaluate these recommendations, even in a randomized trial with no loss to follow-up. Here we will place these estimands within a counterfactual framework for causal inference in order to:
- define counterfactual contrasts in each of these estimands under different treatment interventions
- interpret each contrast under data generating assumptions represented by a causal DAG
- understand identification of each of these contrasts in data with censoring events, including how identification can be evaluated with causal DAGs, and
- show how the combined choice of estimand and identifying assumptions leads to a choice of estimating procedure
Centre for Statistical Methodology Seminar
Title: Beyond the average: Contrasting targeted learning and causal forests for inference about conditional average treatment effects of social health insurance programmes
Noemi Kreif (University of York)
Big Data and Machine Learning Theme
Abstract: Researchers evaluating social policies are often interested in identifying individuals who would benefit most from a particular policy. Recently proposed causal inference approaches that incorporate machine learning (ML) have the potential to help explore treatment effect heterogeneity in a flexible yet principled way. We contrast two such approaches in a study evaluating the effects of enrollment in social health insurance schemes on health care utilisation of Indonesian mothers. First, we apply a double-machine learning approach, targeted minimum loss-based estimation (TMLE) where we estimate both the outcome regression and the propensity score flexibly using an ensemble ML approach. From the individual-level predictions of potential outcomes we calculate individual-level treatment effects and use a Random Forest (RF) procedure to identify the variables that predict these effects. We contrast this exploratory approach to an application of the Causal Forests method (Wager and Athey, 2018 JASA), which has been designed to directly estimate heterogeneous treatment effects, by modifying the standard RF algorithm to maximise the variance of the predicted treatment effects. In both analyses we find that the most important effect modifiers include educational status, age and household wealth. When reporting conditional average treatment effects (CATEs) for these subgroups, the methods agree that less well-educated and younger mothers would benefit more from health insurance than well-educated and older ones. The CATEs reported by the Causal Forests have larger confidence intervals than those reported by the TMLE approach, potentially due to the extra sample splitting step employed.
Centre for Statistical Methodology Seminar
Title: Post-“Modern Epidemiology”: when methods meet matter
George Davey Smith (University of Bristol)
Causal Inference Theme
Slides (pdf)
Abstract: In the last third of the 20th century, etiological epidemiology within academia in high-income countries shifted its primary concern from attempting to tackle the apparent epidemic of non-communicable diseases to an increasing focus on developing statistical and causal inference methodologies. This move was mutually constitutive with the failure of applied epidemiology to make major progress, with many of the advances in understanding the causes of non-communicable diseases coming from outside the discipline, while ironically revealing the infectious origins of several major conditions. Conversely, there were many examples of epidemiologic studies promoting ineffective interventions and little evident attempt to account for such failure. Major advances in concrete understanding of disease etiology have been driven by a willingness to learn about and incorporate into epidemiology developments in biology and cognate data science disciplines. If fundamental epidemiologic principles regarding the rooting of disease risk within populations are retained, recent methodological developments combined with increased biological understanding and data sciences capability should herald a fruitful post–modern.
Centre for Statistical Methodology Seminar
Causal Inference Theme
A new approach to generalizability of clinical trials
Anders Huitfeldt (LSE)
Centre for Statistical Methodology Seminar
Causal Inference Theme
Using Quantitative Bias Analysis to Deal with Misclassification in the Results Section, not the Discussion Section.
Matt Fox (Boston University)
Centre for Statistical Methodology Seminar
Statistical Computing Theme
An extended mixed-effects model for meta-analysis: statistical framework and the R package mixmeta.
Antonio Gasparrini and Francesco Sera (LSHTM)
Centre for Statistical Methodology Seminar
Big Data Theme
Large numbers of explanatory variables.
Heather Battey (Imperial College London)
Slides available soon
Centre for Statistical Methodology Seminar
Friday 14 December 2018
Design and analysis of trials where the outcome is a rate of change, with an introduction to a new Stata package for sample size calculation
Chris Frost and Amy Mullick (LSHTM)
Centre for Statistical Methodology Seminar
Friday 30 November 2018
Uncertainty and missing data in dietary intake and activity data.
Graham Horgan (Rowett Institute, University of Aberdeen)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Clinical Trials Theme
Friday 23 November 2018
Lessons learned from implementing a stratified medicine master protocol: The National Lung Matrix Trial
Prof Lucinda Billingham (University of Birmingham)
Slides available soon
Centre for Statistical Methodology Seminar
Clinical Trials Theme
Friday 2 November 2018
Response-Adaptive Randomisation: Implementing Optimality Criteria in Clinical Trials
Sofia Villar (MRC Biostatistics Unit, Cambridge)
Slides (pdf)
Centre for Statistical Methodology Seminar
Friday 26 October 2018
Framework and practical tool for eliciting expert priors in clinical trials with MNAR outcomes
Alexina Mason (LSHTM)
Slides (pdf)
Centre for Statistical Methodology Seminar
Friday 28 September 2018
Assessing comparative effectiveness of cancer treatments in the SEER-Medicare linked database: a causal approach
Lucia Petito (Harvard School of Public Health)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Missing Data & Measurement Error Theme
6 July 2018
Generating multiple imputation from multiple models to reflect missing data mechanism uncertainty: Application to a longitudinal clinical trial
Prof Ofer Harel (University of Connecticut)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
2 July 2018
Bayesian treatment comparison using parametric mixture priors computed from elicited histograms
Moreno Ursino (Cordeliers Research Centre, Paris)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Health Economics Theme
29 June 2018
Experiences of structured elicitation cost-effectiveness analyses
Marta Soares (University of York)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Time Series Regression Analysis Theme
18 May 2018
Case time series: a flexible design for big data epidemiological analyses
Antonio Gasparrini (LSHTM)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Causal Inference Theme
4 May 2018
How to obtain valid tests and confidence intervals for treatment effects after confounder selection?
Prof Stijn Vansteelandt (University of Ghent & LSHTM)
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Survival Analysis Theme
27 April 2018
Dynamic prediction in fertility
Nan van Geloven (University of Leiden)
Slides and audio (external website)
Early Career Researcher Showcase
26 March 2018
- Ruth Farmer (LSHTM): Dealing with time dependent confounding in diabetes pharmacoepidemiology: an application of marginal structural models to electronic health care records
- Baptiste Leurent (LSHTM): Sensitivity analysis for informative missing data in cost-effectiveness analysis
- Christen Gray (LSHTM): Use of the Bayesian family of methods for correcting exposure measurement error in polynomial regression models
- Jennifer Thompson (LSHTM): Advice for using generalised estimating equations in a stepped-wedge trial
- Benedetta Pongiglione (UCL): Disability and all-cause mortality in the older population: evidence from the English Longitudinal Study of Ageing
- Andrea Gabrio (UCL): Statistical issues in small/pilot cost-effectiveness analyses from individual level data
- Gillian Stresman (LSHTM): Spatial analysis to understand malaria transmission and the potential for spatially targeted interventions
- Prof Vern Farewell (MRC Biostatistics Unit): Use of a multi-state model with a composite arthritis outcome
Slides and audio (external website)
Centre for Statistical Methodology Seminar
Big Data Theme
26 January 2018
Statistical methods for real-time monitoring of health outcomes
Prof Peter Diggle (University of Lancaster)
Slides (.pdf, 8.2MB)
Slides and audio (external website)