Yale chemists go ‘retro’ with new AI-based model

Retrosynthesis, a longstanding challenge in organic chemistry, has a complexity similar to a game of chess: there is a target molecule to reach, a set of basic starting materials, and a sequence of steps to get from one to the other.
But in retrosynthesis, the possibilities at each step multiply exponentially, making it extraordinarily difficult and time-consuming to find a route to the target.
In a recent study published in the Journal of Chemical Information and Modeling, however, Yale’s Victor Batista and members of his lab describe a novel, artificial intelligence-based approach to direct, multistep retrosynthesis. Batista is the John Gamble Kirkwood Professor of Chemistry in the FAS, a member of the Energy Sciences Institute on West Campus and the Yale Quantum Institute, and director of the Center for Quantum Dynamics on Modular Quantum Devices.
The new approach is three times more likely than previous methods to suggest a correct route to a target molecule on the first attempt, the researchers say. The model is open-source, is available through a public web portal, and has already filled more than 800 requests from 100 users.
“Instead of using older methods, we re-framed the problem as a sequence prediction task, allowing us to train a transformer model — the same architecture behind large language models like ChatGPT — to predict entire synthesis routes natively,” said Anton Morgunov, a Ph.D. candidate in the GSAS (and a member of Batista’s lab) and co-lead of the project with Ph.D. candidate Yu Shee.
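To give a sense of what "predicting entire routes as a sequence" means, here is a minimal sketch in Python. It is not the authors' code: the separator tokens, helper names, and the toy two-step route are illustrative assumptions. The idea it shows is only the encoding step, in which a multistep route (each step a product and its precursors, written as SMILES strings) is flattened into a single token sequence that a sequence-to-sequence transformer could be trained to emit in one pass.

```python
# Illustrative sketch: framing a multistep retrosynthesis route as one
# flat sequence. Separator tokens and helper names are made up for this
# example; they are not from the published model.

STEP_SEP = ">>"   # separates a product from its precursors within one step
ROUTE_SEP = "|"   # separates consecutive retrosynthetic steps

def encode_route(steps):
    """Flatten [(product, [precursors]), ...] into one string."""
    return ROUTE_SEP.join(
        f"{product}{STEP_SEP}{'.'.join(precursors)}"
        for product, precursors in steps
    )

def decode_route(text):
    """Recover the list of steps from the flat string."""
    steps = []
    for chunk in text.split(ROUTE_SEP):
        product, precursors = chunk.split(STEP_SEP)
        steps.append((product, precursors.split(".")))
    return steps

# A toy two-step route: aspirin from salicylic acid, which in turn comes
# from phenol. The SMILES strings are real molecules; the route is
# deliberately simplified for illustration.
route = [
    ("CC(=O)Oc1ccccc1C(=O)O", ["O=C(O)c1ccccc1O", "CC(=O)OC(C)=O"]),
    ("O=C(O)c1ccccc1O", ["Oc1ccccc1"]),
]

flat = encode_route(route)
assert decode_route(flat) == route
```

Once routes are encoded this way, predicting a synthesis becomes a text-generation problem, which is what lets a transformer produce a whole multistep route "natively" rather than one reaction at a time.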
Shee, Morgunov, and Batista are authors of the new study, along with Haote Li, who earned a Ph.D. in chemistry at Yale earlier this year.
The researchers noted that while they have not completely solved the challenge of retrosynthesis — their model struggles with particularly complex chemical structures — their approach shows promise and can be refined further.