Introduction
I am a Data Scientist with a robust background in data management and analysis in population research. Since joining the School in 2017, I have contributed significantly to major international networks such as ALPHA and INSPIRE. My academic credentials include a BSc (Honours) in Physics and an MBA in Information Technology. I am also a passionate researcher, pursuing my PhD alongside my professional responsibilities. My professional focus includes developing data pipelines, harmonizing large datasets, and applying advanced data science tools to strengthen population research and policy.
Early Career Contributions to Population Research
- Pioneering Data Capture Systems: Spearheaded the introduction of electronic field data capture at the Vadu Health and Demographic Surveillance System (HDSS), transitioning from laptops to Android tablets for large-scale longitudinal field surveillance.
- INDEPTH Network Leadership: Served on the Scientific Advisory Committee and, under the iSHARE2 project, led data integration and the building of a data repository for HDSS sites in Africa, Asia, and Oceania (http://indepth-ishare.org/).
- Health Surveillance: Directed the technical team for India's National Surveillance System for Enteric Fever.
- Healthcare Workforce Database: Developed and deployed a platform for the Health Care Human Resource Database Building Project for the State of West Bengal, India, in collaboration with the Society for Health & Demographic Surveillance and the Department of Health and Family Welfare. The project assessed the capacity and distribution of healthcare professionals across the state.
- HDSS Data Systems: Supported the establishment of data management systems for HDSS sites at Lavale, Sewagram, and Kasurdi in India, run by various academic and medical institutions.
Data Science Roles in Major Networks
- ALPHA Network: Designed ETL pipelines for efficient data management using Pentaho Data Integration, and automated quality checks and harmonization processes within the Centre-in-a-Box (CiB) environment. (https://alpha.lshtm.ac.uk/)
- INSPIRE Network: Leveraged OHDSI tools to map data to the OMOP CDM, harmonized COVID-19 data, and developed synthetic datasets. (https://aphrc.org/inspire/ & https://inspiredata.network/)
ALPHA Network Contributions
At the ALPHA Network, I designed and implemented an advanced ETL (Extract, Transform, Load) pipeline using Pentaho Data Integration to manage site data efficiently within the Centre-in-a-Box (CiB) environment. The pipeline supports data quality, harmonization, and alignment with research objectives. I also developed and automated critical data management processes, including data uploads, quality checks, reporting, and harmonization, all of which are central to advancing HIV research and generating valuable health insights. In addition, I migrated HIV data from two HDSS sites into the standardized HICDEP format, improving the consistency and interoperability of data across the network.
INSPIRE Network Contributions
In INSPIRE, I leverage OHDSI tools to transform data from ALPHA specifications into the OMOP Common Data Model (CDM). I lead efforts to harmonize COVID-19 data from the Integrated Disease Surveillance and Response (IDSR) system across the African region into the OMOP CDM. Additionally, I configured the INSPIRE platform-as-a-service (PaaS) on Microsoft Azure, enabling efficient and scalable data processing.
A key highlight of my work is the generation of a synthetic COVID-19 dataset for WHO's IDSR system, together with support for developing an ETL pipeline that migrates these data to the OMOP CDM. This work, documented in a GitHub repository (https://github.com/tathagatabhattacharjee/), is now expanding to cover mental health, infectious, and non-communicable diseases.
INSPIRE PEACH Project
Under the INSPIRE umbrella, I was involved in the PEACH (Platform for Evaluation and Analysis of COVID-19 Harmonised Data) project. This initiative is building a pan-African data ecosystem that harmonizes health data to strengthen policy and pandemic response capabilities. (https://inspiredata.network/)
INSPIRE Mental Health Project
In the INSPIRE Mental Health Project, I contributed to the integration and harmonization of mental health data across diverse datasets using the OMOP Common Data Model (CDM). My work involved mapping mental health indicators and applying advanced analytics to improve the understanding of mental health conditions in low- and middle-income countries, with a focus on Africa. The initiative advances mental health research by providing standardized, high-quality data for global analysis.
Co-Principal Investigator, RESPIRE Platform III
As Co-PI for RESPIRE Platform III: Open Science, Data & Methodologies, I contribute to the development of open-access tools, promote data standardization, and support capacity building in respiratory health research.
(https://usher.ed.ac.uk/respire/platforms/open-science-data-methodologies)
Data Science Without Borders (DSWB)
DSWB is a global initiative that brings data science expertise together across institutions to strengthen public health research and policy in Africa. My role involves harmonizing diverse datasets, building analytical pipelines, and supporting partner organizations through capacity building and technical assistance. I also identify research questions from the datasets of the Pathfinder sites and apply Artificial Intelligence (AI) and Machine Learning (ML) methods to advance their analytics capabilities.
I am currently working on two projects: the INSPIRE Mental Health Project and the Data Science Without Borders (DSWB) project.
With a strong foundation in data science and public health, I am dedicated to advancing global health research through data-driven solutions and innovative methodologies.
Affiliations
Teaching
I taught ETL techniques as part of the Health Data Management Module for the MSc Health Data Science programme for three academic years, from 2020-2021 to 2022-2023.
Research
My research leverages Machine Learning (ML) techniques to enhance retrospective record linkage between Health & Demographic Surveillance Systems (HDSS) and health clinic data in low- and middle-income countries (LMICs). This work focuses on improving health monitoring and epidemiological analysis of both communicable and non-communicable diseases, with an emphasis on HIV. By integrating diverse health datasets, my goal is to generate actionable insights that can inform health interventions and strengthen healthcare systems in these regions.
My research is supervised by Professor Jim Todd and Dr. Emma Slaymaker from the Department of Population Health, together with Dr. Chodziwadziwa Kabudula from the University of the Witwatersrand, South Africa. Their guidance and expertise have been instrumental in shaping the direction of my work.