United Kingdom
Built and benchmarked LLM workflows leveraging transformer-based representation learning (SapBERT) to harmonise biomedical ontologies (12M+ entities), enabling consistent data integration across heterogeneous sources. Designed a quantitative framework for scoring disease prevalence across pharmaceutical markets, prioritising indications for early-stage drug discovery. Implemented features to support validation of target–indication pairs, mining structured genomic and disease databases to identify causal genes for target prioritisation. Collaborated with engineering and business analysis teams to develop retrieval-based, agentic LLM workflows for extracting structured insights from scientific literature. Developed few-shot prompting approaches for text-to-SQL queries of biomedical databases. Evaluated integration of knowledge graphs with retrieval-augmented generation (GraphRAG) to support conversational AI in drug development. Built and deployed an OpenSearch-based API for lexical retrieval over ontology terms, owning curation, indexing, and implementation end to end.
First author of “Equity in cancer genomics in the UK: Ancestry analysis for a national cancer cohort”, published in The Lancet Oncology (2025). Analysed effects of patient ancestry on clinical genomic diagnostics in the 100,000 Genomes Project Cancer Programme. Engineered pipelines to aggregate genetic data from Oxford Nanopore sequencing. Designed and executed benchmarking of techniques to detect, correct, and merge noisy genetic variant data.
Designed new bioinformatic methods for detecting copy number variation in samples prepared by selective whole genome amplification. Developed features for a sample tracking system in Django using agile software development methodologies. Analysed the effects of wet-lab protocols and bioinformatic approaches on malaria sample genotypes. Developed a rapid lineage-typing bioinformatic pipeline to identify SARS-CoV-2 variants of concern, informing reports sent to the UK Health Security Agency within hours of sequencing. Investigated the association between vaccination and infection with different SARS-CoV-2 lineages.
Analysed viral evolution of resistance to immune response and treatment using next-generation sequencing. Designed primers for hepatitis infection diagnostics. Employed random forests to predict sources of phylogenetic reconstruction errors. Developed MPI-based Python software to overcome missing data in phylogenetic reconstructions from shotgun sequencing.
Developed pipelines to assemble the 3.6 Gbp sunflower genome, integrating physical and genetic maps. Coordinated bioinformatic resources for a ~30-person evolutionary genetics lab. Authored a successful grant proposal for $20K worth of computational resources from Compute Canada.