
Using automated machine learning for large-scale, locally-tailored disease surveillance

Title of PhD project / theme

Using automated machine learning for large-scale, locally-tailored disease surveillance. 

Supervisory team

LSHTM
Lead: Thibaut Jombart (thibaut.jombart@lshtm.ac.uk, Faculty of Epidemiology and Population Health) 
John Edmunds

Nagasaki University
Xerxes Seposo Tesoro (seposo.xerxestesoro@nagasaki-u.ac.jp)
 

Brief description of project / theme

The COVID-19 pandemic has highlighted the need for rapid and accurate detection of localised disease hotspots. Unfortunately, traditional surveillance methods often struggle to accommodate local confounders such as differences in testing, reporting delays, and periodicity, all of which may vary from one location to another [1].

Automated machine learning (autoML) offers excellent potential for addressing such challenges, as it allows the best-fitting model to be selected automatically for each location separately, each time accounting for the local features that affect case dynamics. The new surveillance algorithm ‘ASMODEE’ implements this principle for COVID-19 surveillance [2], and is now part of surveillance pipelines at the World Health Organization (WHO) and in other public health agencies [3].
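The principle of per-location model selection followed by anomaly detection can be illustrated with a minimal sketch. This is not the ASMODEE implementation described in [2]: the two candidate models, the AIC-based selection, the simplified prediction band (residual standard deviation only, ignoring parameter uncertainty), and all names (`detect_recent_anomalies`, `CANDIDATES`, the toy series) are assumptions made for illustration.

```python
import math
from statistics import mean


def fit_constant(t, y):
    """Candidate 1: constant expected count."""
    mu = mean(y)
    return lambda x: mu


def fit_linear(t, y):
    """Candidate 2: linear trend fitted by ordinary least squares."""
    t_bar, y_bar = mean(t), mean(y)
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return lambda x: intercept + slope * x


# Hypothetical candidate set: name -> (fitting function, number of parameters)
CANDIDATES = {"constant": (fit_constant, 1), "linear": (fit_linear, 2)}


def gaussian_aic(rss, n, n_params):
    """AIC of a Gaussian model, up to an additive constant shared by all candidates."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * n_params


def detect_recent_anomalies(counts, k=3, z=1.96):
    """Select a model on all but the last k points, then flag recent outliers."""
    t = list(range(len(counts)))
    train_t, train_y = t[:-k], counts[:-k]
    n = len(train_y)
    best = None
    for name, (fit, n_params) in CANDIDATES.items():
        predict = fit(train_t, train_y)
        rss = sum((yi - predict(ti)) ** 2 for ti, yi in zip(train_t, train_y))
        score = gaussian_aic(rss, n, n_params)
        if best is None or score < best[0]:
            best = (score, name, predict, rss, n_params)
    _, name, predict, rss, n_params = best
    # Simplified band: z times the residual standard deviation of the chosen model
    resid_sd = math.sqrt(rss / (n - n_params))
    flagged = [ti for ti in t[-k:] if abs(counts[ti] - predict(ti)) > z * resid_sd]
    return name, flagged


# Toy daily counts for two hypothetical locations (invented for illustration)
locations = {
    "flat_with_spike": [10, 12, 9, 11, 10, 12, 9, 11, 10, 12, 9, 11, 10, 12, 11, 10, 30],
    "steady_growth": [6, 6, 10, 10, 14, 14, 18, 18, 22, 22, 26, 26, 30, 30, 34, 34, 38],
}
results = {place: detect_recent_anomalies(series) for place, series in locations.items()}
for place, (model, flagged) in results.items():
    print(place, model, flagged)
```

Each location is handled independently, so a place with steady growth is judged against its own trend rather than against a model that suits a place with flat incidence. A realistic pipeline would use count models, richer candidate sets (for example weekday effects) and proper prediction intervals.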

The aim of this project will be to further develop autoML approaches for disease surveillance. First, the candidate will assess the impact of locally varying confounders on the detection of disease hotspots, and compare the performance of ASMODEE with that of classical surveillance methods. This approach will also be extended to environmental predictors, with potential applications to non-communicable diseases. Second, we will investigate how spatial structure can be used to detect local ‘aberrations’, i.e. locations where case dynamics depart noticeably from those of their neighbours. Last, we will explore how machine learning methods can be used for predicting future hotspots, using the WHO COVID-19 monitoring platform (developed by TJ) as a case study.

This work will be conducted in close collaboration with the COVID-19 analytics team at the WHO. It will also benefit from support in software development at LSHTM through the Epiverse initiative [4] and the R Epidemics Consortium [5]. All developments will be made available as free, open-source software, to maximize the impact of our work.

References

[1] Salmon M, Schumacher D, Höhle M. Monitoring Count Time Series in R: Aberration Detection in Public Health Surveillance. Journal of Statistical Software, Articles. 2016;70: 1–35. 
[2] Jombart T, Ghozzi S, Schumacher D, Taylor TJ, Leclerc QJ, Jit M, et al. Real-time monitoring of COVID-19 dynamics using automated trend fitting and anomaly detection. Philos Trans R Soc Lond B Biol Sci. 2021;376: 20200266. 
[3] Jombart T. Outbreak Monitoring using ASMODEE: an example of automated data infrastructure. 15 Jul 2021 [cited 1 Nov 2021]. Available: https://asmodee-infrastructure-handbook.netlify.app/ 
[4] Epiverse: Distributed Pandemic Tools Program. 5 Aug 2021 [cited 1 Nov 2021]. Available: https://data.org/initiatives/epiverse/ 
[5] RECON - R Epidemics Consortium. 2018 [cited 26 Sep 2018]. Available: https://www.repidemicsconsortium.org/ 

The role of LSHTM and NU in this collaborative project

This project will benefit from the shared experience of LSHTM and NU supervisors in infectious disease epidemiology and modelling. As the objective is to develop tools which can accommodate different epidemiological contexts, the project will also benefit from comparing epidemic situations in Japan and in the UK. 
All supervisors have expertise in time series modelling. XST will bring the expertise in environmental epidemiology needed for extending the methodology to non-communicable diseases. TJ and JE will bring the required expertise in infectious disease epidemiology and surveillance.

Particular prior educational requirements for a student undertaking this project

Candidates should hold a degree in infectious disease epidemiology, modelling, statistics or data science. 

Skills we expect a student to develop/acquire whilst pursuing this project

The student will gain knowledge of statistical and machine learning methods, model selection, disease surveillance, R programming, and good practices for reproducible data science.