Automatic for the chemist

UPDATE: This work eventually led to the Synthia software from Merck.

For decades, chemists have toiled over reaction flasks searching for new ways to mix and match atoms to make new molecules with which to cure ills, boost crops and generally improve our standard of living. There are countless still who spend their working days scouring the scientific literature for shortcuts and using trial and error to find fast and efficient synthetic routes to that all-powerful catalyst or a wonder drug from an obscure soil fungus. Less than flask-happy chemists hope to use computer programs to design their reactions for them and ultimately control the robot arm to shake the test-tube for them.

German chemist Johann Gasteiger, together with colleagues at the Institute for Organic Chemistry at the University of Erlangen-Nurnberg, has spent fifteen years or so designing a neural network program that might be a first step on the way to hanging up the lab-coat.

Why spend weeks designing a synthesis?
His system uses the accrued information found in commercial databases containing hundreds of thousands of chemical reactions – each with its own reaction conditions: cooking time, pressure, catalysts, reagents and acidity, listed together with physical parameters about the molecules involved.

Today, a chemist might search such a database manually or use a search program to pick out reactions of interest. This, according to Gasteiger, can get embarrassing: “A single search can lead to a list of several hundred reactions from a database that can contain millions,” he explains, “so manual analysis is both laborious and time-consuming.” One way to cut down on the effort involved is to classify the multitude of reactions.

Chemists have been classifying whole swathes of reactions for years by naming them after their inventors – the Wittig, the Beckmann, the Diels-Alder – but, posits Gasteiger, this system does not help very much in indicating to the chemist exactly what takes place in a particular reaction brew. This is especially true because there are literally dozens of variants in each class. He felt that the solution would be a neural network that could do the sorting for him. “There are two approaches to teaching a neural network chemistry,” explains Gasteiger, “supervised and unsupervised learning.” The former is labour intensive and involves presenting the network with input patterns for thousands of reactions and telling it which ones work in which circumstances. “We prefer the unsupervised approach,” says Gasteiger. “It cuts the workload considerably.”

How to teach a neural network chemistry? Gasteiger and his team have used a Kohonen network, a computer model of how our brains organise sensory information such as sights, sounds and tactile feelings, in which inputs are mapped onto a two-dimensional network of neurones. By extending this mapping process to the properties of reactions in a database they could gain important information about many reactions at once.

Instead of sensory inputs for the network the researchers used each factor affecting a reaction – such as temperature and acidity – and these co-ordinates were fed into the neural network.

The team picked a single broad class of reactions to test their networking ideas: reactions that involve adding a carbon-hydrogen group to an alkene. This type of reaction encompasses a variety of important schemes, including so-called Michael additions, Friedel-Crafts alkylations by alkenes and free radical additions to alkenes, which are used to produce many industrially useful chemicals such as esters for artificial flavourings.

They used a search program to narrow things down first, obtaining a set of 120 reactions from a 370,000-strong database. They then chose seven characteristic physical properties associated with the actual portion of the molecule that changes – the reaction centre – as the input for the neural network. These included the ability of the double bond between carbon atoms to attract electrons (its electronegativity), the total charge, and the degree of possible distortion of the electron cloud in the bond (its polarisability).

The network they used is a 12×12 grid of neurones, each carrying a “weight” for each of the seven chosen variables. When a reaction is input, it is mapped onto the neurone whose weights are most similar to the input variables. Reactions are input sequentially and after each entry the weights are adjusted to make them more similar to the input. The adjustment is greatest at the neurone that was hit and tails off with distance from it across the grid.

The next input, if it has similar variables, will be mapped on to a neurone close to the first, but if it is different it will land on a distant neurone, and the weights will be adjusted again. The result of these weight adjustments is that the network is trained to recognise patterns of parameters and to place a particular reaction accordingly. Eventually a 2D landscape of reactions is built up with similar reactions close to each other, forming groups of reaction types. Logically, reactions far apart in the landscape are very different. Isolated peaks in the landscape point to unusual and uncommon reactions.
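The training loop described above can be sketched in a few lines of Python with NumPy. This is only a minimal illustration of a generic Kohonen map, not Gasteiger's actual program: the function names, the learning-rate and neighbourhood schedules, the number of passes and the synthetic "reactions" (two made-up clusters of seven numbers each) are all assumptions chosen for the example.

```python
import numpy as np

def find_winner(weights, x):
    """Return the (row, col) of the neurone whose weights are closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))

def train_som(data, grid=12, epochs=20, lr0=0.5, sigma0=6.0, seed=0):
    """Train a grid x grid Kohonen map on data of shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    weights = rng.random((grid, grid, data.shape[1]))  # random starting weights
    rows, cols = np.indices((grid, grid))
    coords = np.stack([rows, cols], axis=-1)           # grid position of each neurone
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        # Present the reactions one at a time, in a fresh random order each pass.
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # assumed linear decay
            sigma = 0.5 + sigma0 * (1.0 - frac)        # assumed shrinking neighbourhood
            winner = find_winner(weights, x)
            # Squared grid distance of every neurone from the winner.
            grid_d2 = np.sum((coords - np.array(winner)) ** 2, axis=-1)
            # Adjustment is largest at the winner and tails off with distance.
            influence = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * influence[..., None] * (x - weights)
            step += 1
    return weights

# Synthetic stand-in for 120 reactions described by seven variables each.
demo_rng = np.random.default_rng(1)
reactions = np.clip(
    np.vstack([demo_rng.normal(0.2, 0.05, (60, 7)),
               demo_rng.normal(0.8, 0.05, (60, 7))]),
    0.0, 1.0)
som = train_som(reactions)
print(find_winner(som, reactions[0]), find_winner(som, reactions[-1]))
```

In practice the seven variables would need to be scaled to comparable ranges before training, otherwise the variable with the largest numerical spread dominates the distance calculation.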

The most exciting aspect of the way Gasteiger’s neural network can classify reactions is not that it verifies the system already used by chemists every day, but that if they have a new compound they can look at the seven variables, feed them into the trained network and the network will assign it to a specific neurone. This allows the chemist to see the likely reaction a molecule will undergo in the lab. For instance, if a molecule finds itself at the centre of the area of the map covered by the so-called Michael addition then it is likely to undergo a standard Michael addition. If it is further afield it will probably undergo something more exotic.
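Looking up a new reaction on a trained map amounts to finding its nearest neurone and reading off whatever label that neurone carries. The sketch below assumes a tiny hand-built 3×3 map with two variables per neurone in place of a real trained 12×12 map with seven; the weights, the label positions and the `assign` function are all hypothetical and serve only to show the lookup.

```python
import numpy as np

# Stand-in for a trained map: 3x3 neurones, two variables each (hypothetical values).
rows, cols = np.indices((3, 3))
weights = np.stack([rows / 2.0, cols / 2.0], axis=-1)  # shape (3, 3, 2)

# Hypothetical labels, as if known reactions had been mapped onto these neurones.
labels = {
    (0, 0): "Michael addition",
    (0, 1): "Michael addition",
    (1, 2): "Friedel-Crafts alkylation",
    (2, 2): "free radical addition",
}

def assign(weights, labels, x):
    """Map input x onto its nearest neurone and return (neurone, label)."""
    dists = np.linalg.norm(weights - np.asarray(x, dtype=float), axis=-1)
    r, c = np.unravel_index(np.argmin(dists), dists.shape)
    neurone = (int(r), int(c))
    return neurone, labels.get(neurone, "unlabelled")

print(assign(weights, labels, [0.05, 0.10]))
print(assign(weights, labels, [0.50, 0.00]))
```

An input that lands on an unlabelled neurone, or far from the middle of a labelled region, is the kind of case the article describes as likely to do "something more exotic".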

It took less than 20 seconds for Gasteiger’s team to train the network with their sample of 120 reactions on a Sun workstation. Training it on the full reaction database would therefore take little more than a day or two, allowing some time for checking. Gasteiger points out that computer time once the neural network is trained is very short (less than half a second), so making predictions about a particular molecule is very fast.
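As a rough check on that estimate, the arithmetic can be written out, assuming training time grows in direct proportion to the number of reactions (same map size, same number of passes). That linear scaling is an assumption for illustration, not something the team reported.

```python
sample_reactions = 120        # reactions in the test set
sample_seconds = 20           # upper bound quoted for training on the test set
database_reactions = 370_000  # size of the full database

# Assumed linear scaling of training time with the number of reactions.
full_seconds = sample_seconds * database_reactions / sample_reactions
full_hours = full_seconds / 3600
print(f"about {full_hours:.1f} hours")
```

Under that assumption the raw training comes to roughly 17 hours, which fits the figure of a day or two once checking is included.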

Classifying reactions is not the whole story though – once you know what type of reaction a molecule will undergo, the next step is to work out how it can be used to build up more complex molecules. Chemists usually picture a target molecule and cut it up into smaller jigsaw pieces that can then be re-assembled in the reaction flask. The difficulty lies not only in knowing where to make the breaks to simplify the reactions needed to put the puzzle back together, but in finding reactions that can make the lugs of each jigsaw piece fit together properly. This might be where Gasteiger’s neural network could help in predicting what would work.

This jigsaw approach was pioneered by Harvard chemist E. J. Corey. Corey’s own program for automating the process, LHASA (Logic and Heuristics Applied to Synthetic Analysis), is marketed by LHASA UK, a company based at the University of Leeds. According to Nigel Greene of LHASA UK, “LHASA is a knowledge-based expert system not a reaction database.” It uses what he calls transforms to describe a generic chemical reaction class, e.g. the Michael addition. These transforms are compiled manually from a study of the chemical literature. The program then searches the query compound for the correct structural requirements in order to apply the transforms, which is tantamount to picturing the break-up of the jigsaw.

According to James Hendrickson of Brandeis University, “there are literally millions of different routes possible, from different starting materials, to any substance of interest.” He and his team have devised a program (SYNGEN), which can find the shortest route to any molecule from available starting materials. First, SYNGEN looks for the best way to dismantle the target jigsaw. Then, for each dissection it generates the reactive chemical groups needed to carry out that reaction sequence to build the product. Results are displayed onscreen. “In a number of cases to date, the computer has generated the current industrial routes to several pharmaceuticals, such as estrone,” explains Hendrickson. SYNGEN has also proposed more efficient routes to numerous compounds such as lysergic acid, the precursor to ergot drugs and LSD. A new version of the program is in development ready for licensing to pharmaceutical companies this year.
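The idea of searching for the shortest route back to available starting materials can be illustrated with a toy example. This is not SYNGEN's algorithm: the compound names, the reaction table and the `fewest_steps` function are all hypothetical, and "shortest" here simply means the fewest reactions in total.

```python
import math

# Hypothetical toy network: each product maps to alternative sets of reactants.
REACTIONS = {
    "target": [["int_A", "int_B"], ["int_C"]],
    "int_A": [["sm_1"]],
    "int_B": [["sm_2", "sm_3"]],
    "int_C": [["int_D"]],
    "int_D": [["int_E"]],
    "int_E": [["sm_1"]],
}
AVAILABLE = {"sm_1", "sm_2", "sm_3"}  # hypothetical purchasable starting materials

def fewest_steps(compound, in_progress=frozenset()):
    """Return (number of reactions, ordered plan) for the shortest route found."""
    if compound in AVAILABLE:
        return 0, []
    if compound in in_progress:  # guard against circular routes
        return math.inf, []
    best = (math.inf, [])
    for reactants in REACTIONS.get(compound, []):
        steps, plan = 1, []      # one reaction to join these reactants
        for r in reactants:
            s, p = fewest_steps(r, in_progress | {compound})
            steps += s
            plan = plan + p
        if steps < best[0]:
            best = (steps, plan + [(reactants, compound)])
    return best

steps, plan = fewest_steps("target")
for reactants, product in plan:
    print(" + ".join(reactants), "->", product)
```

In this toy network the route through two intermediates made separately and then joined takes three reactions, beating the four-reaction linear route. A real program also has to weigh yields, cost and whether each proposed reaction is chemically feasible.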

William Jorgensen of Yale University in New Haven, Connecticut, is working on yet another program, CAMEO (Computer Assisted Mechanistic Evaluation of Organic reactions). The chemist feeds the starting materials – using a sketchpad – and the reaction conditions – via drop-down menus – into CAMEO, virtually speaking, and the program attempts to predict the course of the reaction. It assembles a reaction from underlying mechanistic steps because, as Jorgensen points out, a large fraction of organic reactions are just combinations of various fundamental steps.

Sometimes CAMEO (also marketed by LHASA UK) claims no reaction product will emerge because a chemical rule would be broken if one did. The chemist can then run the reaction again virtually in a different solvent or at a higher temperature and watch the result, cutting testing time in the lab.

Taken separately, the various programs may not seem to offer chemists a chance to boost their leisure time, but together they may provide a way of classifying reaction types, using a neural network to work out what type of reaction it might take to yield a new molecule, feeding it into CAMEO to see whether reactions with other molecules could lead to it and then using SYNGEN to optimise the route.

Some chemists are not worried about losing their jobs just yet, though. Al Meyers of Colorado State University at Fort Collins says, “There is a delicate balance between reacting species, solvents concentrations, selective reaction behaviour, and most important, the human ability to observe what is happening, cannot be incorporated into a reaction software package.” Software will play its role, though: “The synthesis programs can bring into focus the many options available to the seasoned chemist,” he adds.

We will have to wait and see who or what is shaking the reaction flasks in ten years’ time.