Samuel Genheden

Associate Director @ AstraZeneca

Do Chemformers Dream of Organic Matter? Evaluating Transformer Models for Synthesis Prediction in the Pharmaceutical Domain

Language models like transformers have found a natural place in drug discovery, solving tasks such as property prediction, molecular optimization, and reactivity prediction. Transformer models trained on public data for synthesis prediction tasks, such as product and retrosynthesis prediction, have proven effective and sometimes outperform other approaches, including template-based retrosynthesis. In this contribution, we will outline our efforts to train transformer models for synthesis prediction and to introduce them into our production platform for synthesis planning, which is used daily by chemists. We will discuss the particular challenges faced when training on a large corpus of reaction data and compare the resulting models with those currently used by chemists for product prediction and retrosynthesis. We will show that transformer models trained on a diverse set of reactions can surpass existing models with impressive performance. Finally, we will outline the outstanding issues that prevent the full adoption of transformer models for synthesis prediction.

Biography

Samuel Genheden leads the Deep Chemistry team in Discovery Sciences, AstraZeneca R&D. He received his PhD in theoretical chemistry from Lund University in 2012, having studied computational methods to estimate ligand-binding affinities. He continued with postdocs at the Universities of Southampton and Gothenburg, where he simulated membrane phenomena using multiscale approaches. He joined the Molecular AI department at AstraZeneca in 2020 and became team leader in 2022. The team’s research focuses on the AiZynth platform for AI-assisted retrosynthesis planning. Samuel’s interests lie in studying chemical and biological systems with computers and using these approaches to impact drug development. He is a keen advocate for open-source software.