Automatic for the chemist

UPDATE: This work eventually led to the Synthia software from Merck.

For decades, chemists have toiled over reaction flasks searching for new ways to mix and match atoms to make new molecules with which to cure ills, boost crops and generally improve our standard of living. There are countless still who spend their working days scouring the scientific literature for shortcuts and using trial and error to find fast and efficient synthetic routes to that all-powerful catalyst or a wonder drug from an obscure soil fungus. Less than flask-happy chemists hope to use computer programs to design their reactions for them and ultimately control the robot arm to shake the test-tube for them.

German chemist Johann Gasteiger, together with colleagues at the Institute for Organic Chemistry at the University of Erlangen-Nürnberg, has spent fifteen years or so designing a neural network program that might be a first step on the way to hanging up the lab-coat.

Why spend weeks designing a synthesis?
His system uses the accrued information found in commercial databases containing hundreds of thousands of chemical reactions – each with its own reaction conditions: cooking time, pressure, catalysts, reagents and acidity, listed together with physical parameters about the molecules involved.

Today, a chemist might search such a database manually or use a search program to pick out reactions of interest. This, according to Gasteiger, can get embarrassing: “A single search can lead to a list of several hundred reactions from a database that can contain millions,” he explains, “so manual analysis is both laborious and time consuming.” One way to cut down on the effort involved is to classify the multitude of reactions.

Chemists have been classifying whole swathes of reactions for years by naming them after their inventors – the Wittig, the Beckmann, the Diels-Alder – but, posits Gasteiger, this system does not help very much in indicating to the chemist exactly what takes place in a particular reaction brew. This is especially true because there are literally dozens of variants in each class. He felt that the solution would be a neural network that could do the sorting for him. “There are two approaches to teaching a neural network chemistry”, explains Gasteiger, “supervised and unsupervised learning.” The former is labour intensive and involves presenting the network with input patterns for thousands of reactions and telling it which ones work in which circumstances. “We prefer the unsupervised approach,” says Gasteiger. “It cuts the workload considerably.”

How to teach a neural network chemistry? Gasteiger and his team have used a Kohonen network – a computer model of how our brains organise sensory information such as sights, sounds and tactile feelings – in which inputs are mapped onto a two-dimensional network of neurones. By extending this mapping process to the properties of reactions in a database they could gain important information about many reactions at once.

Instead of sensory inputs, the researchers used the factors affecting a reaction – such as temperature and acidity – as co-ordinates and fed these into the neural network.

The team picked a single broad class of reactions to test their networking ideas: reactions that involved adding a carbon-hydrogen group to an alkene. This class encompasses a variety of important schemes – so-called Michael additions, Friedel-Crafts alkylations by alkenes and free radical additions to alkenes – used to produce many industrially useful chemicals such as esters for artificial flavourings.

They used a search program to narrow things down first, obtaining a set of 120 reactions from a 370,000-strong database. They then chose seven characteristic physical properties associated with the actual portion of the molecule that changes – the reaction centre – as the input for the neural network. These included the ability of the double bond between carbon atoms to attract electrons (its electronegativity), the total charge, and the degree of possible distortion of the electron cloud in the bond (its polarisability).

The network they used is a 12×12 grid of neurones, each carrying a “weight” for every one of the seven chosen variables. When a reaction is input, it is mapped onto the neurone whose weights are most similar to the input variables. Reactions are input sequentially and after each entry the weights are adjusted to make them more similar to the input. The adjustment is greatest at the neurone that was hit and tails off with distance from it across the grid.

The next input, if it has similar variables, will be mapped on to a neurone close to the first, but if it is different it will land on a distant neurone, and the weights will be adjusted again. The result of these weight adjustments is that the network is trained to recognise patterns of parameters and to place a particular reaction accordingly. Eventually a 2D landscape of reactions is built up with similar reactions close to each other forming groups of reaction types. Logically, reactions far apart in the landscape are very different. Isolated peaks in the landscape point to unusual and uncommon reactions.

The most exciting aspect of the way Gasteiger’s neural network can classify reactions is not that it verifies the system already used by chemists every day, but that if they have a new compound they can look at the seven variables, feed them into the trained network and the network will assign it to a specific neurone. This allows the chemist to see the likely reaction a molecule will undergo in the lab. For instance, if a molecule finds itself at the centre of the area of the map covered by the so-called Michael addition then it is likely to undergo a standard Michael addition. If it is further afield it will probably undergo something more exotic.
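The training and look-up steps described above can be sketched in a few lines of Python. This is a minimal illustration of a Kohonen map, not Gasteiger's program: the 12×12 grid and the seven inputs come from the article, but the synthetic “reactions”, the Gaussian neighbourhood, the linearly shrinking learning rate and radius, the number of passes and all function names are assumptions made for the example.

```python
import math
import random

GRID = 12    # 12 x 12 neurones, as in the article
N_VARS = 7   # seven reaction-centre variables per reaction


def make_network(rng):
    """Give every neurone one random starting weight per input variable."""
    return [[[rng.random() for _ in range(N_VARS)] for _ in range(GRID)]
            for _ in range(GRID)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_neurone(weights, reaction):
    """Return (row, col) of the neurone whose weights are most similar to the input."""
    return min(((r, c) for r in range(GRID) for c in range(GRID)),
               key=lambda rc: squared_distance(weights[rc[0]][rc[1]], reaction))


def train(weights, reactions, epochs=10, rate0=0.5, radius0=6.0):
    """Present reactions one at a time and pull nearby neurones towards each."""
    for epoch in range(epochs):
        # Assumed schedule: learning rate and neighbourhood shrink linearly.
        frac = 1.0 - epoch / epochs
        rate = rate0 * frac
        radius = max(radius0 * frac, 0.5)
        for reaction in reactions:
            win_r, win_c = best_matching_neurone(weights, reaction)
            for r in range(GRID):
                for c in range(GRID):
                    grid_d2 = (r - win_r) ** 2 + (c - win_c) ** 2
                    # Adjustment is largest at the winner and tails off with distance.
                    influence = math.exp(-grid_d2 / (2.0 * radius ** 2))
                    w = weights[r][c]
                    for i in range(N_VARS):
                        w[i] += rate * influence * (reaction[i] - w[i])
    return weights


# Synthetic stand-in data (NOT real chemistry): two made-up reaction families,
# 60 reactions each, scattered around 0.2 and 0.8 in all seven variables.
rng = random.Random(0)


def synthetic_reaction(centre):
    return [min(1.0, max(0.0, rng.gauss(centre, 0.05))) for _ in range(N_VARS)]


reactions = ([synthetic_reaction(0.2) for _ in range(60)]
             + [synthetic_reaction(0.8) for _ in range(60)])
rng.shuffle(reactions)

weights = train(make_network(rng), reactions)

# Placing a "new" reaction is a single look-up on the trained map.
pos_a = best_matching_neurone(weights, [0.2] * N_VARS)
pos_b = best_matching_neurone(weights, [0.8] * N_VARS)
print("family A maps to neurone", pos_a)
print("family B maps to neurone", pos_b)
```

Here the two made-up families stand in for two reaction types. A new input is placed simply by asking for its best-matching neurone, which is why a prediction from a trained network takes so little computer time compared with the training itself.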

It took less than 20 seconds for Gasteiger’s team to train the network with their sample of 120 reactions on a Sun workstation. So to train it on the full reaction database would take little more than a day or two allowing some time for checking. Gasteiger points out that computer time once the neural network is trained is very short (less than half a second) so making predictions about a particular molecule is very fast.
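A back-of-envelope check on that estimate, using only the figures quoted in this article (120 reactions, under 20 seconds, a 370,000-reaction database), might look like the sketch below. It assumes training time grows in direct proportion to the number of reactions, which the article does not state; a larger map or more passes through the data would push the figure up. The function name is made up for the example.

```python
SAMPLE_REACTIONS = 120         # size of the test set
SAMPLE_SECONDS = 20            # upper bound quoted for training on the test set
DATABASE_REACTIONS = 370_000   # size of the full reaction database


def estimated_training_hours(n_reactions,
                             sample_n=SAMPLE_REACTIONS,
                             sample_seconds=SAMPLE_SECONDS):
    """Scale the sample timing linearly (an assumption) and convert to hours."""
    return n_reactions / sample_n * sample_seconds / 3600


hours = estimated_training_hours(DATABASE_REACTIONS)
print(f"Estimated training time: about {hours:.0f} hours")
```

On this simple scaling the pure computing time comes out at roughly 17 hours, which leaves room within “a day or two” for the checking Gasteiger mentions.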

Classifying reactions is not the whole story though – once you know what type of reaction a molecule will undergo, the next step is to work out how it can be used to build up more complex molecules. Chemists usually picture a target molecule and cut it up into smaller jigsaw pieces that can then be re-assembled in the reaction flask. The difficulty lies not only in knowing where to make the breaks to simplify the reactions needed to put the puzzle back together, but in finding reactions that can make the lugs of each jigsaw piece fit together properly. This might be where Gasteiger’s neural network could help in predicting what would work.

Harvard chemist E. J. Corey formalised this jigsaw approach as retrosynthetic analysis. His own program for automating the process, LHASA (Logic and Heuristics Applied to Synthetic Analysis), is marketed by LHASA UK, a company based at the University of Leeds. According to Nigel Greene of LHASA UK, “LHASA is a knowledge-based expert system not a reaction database.” It uses what he calls transforms to describe a generic chemical reaction class, e.g. the Michael addition. These transforms are compiled manually from a study of the chemical literature. The program then searches the query compound for the correct structural requirements in order to apply the transforms, which is tantamount to picturing the break-up of the jigsaw.

According to James Hendrickson of Brandeis University, “there are literally millions of different routes possible, from different starting materials, to any substance of interest.” He and his team have devised a program (SYNGEN), which can find the shortest route to any molecule from available starting materials. First, SYNGEN looks for the best way to dismantle the target jigsaw. Then, for each dissection it generates the reactive chemical groups needed to carry out that reaction sequence to build the product. Results are displayed onscreen. “In a number of cases to date, the computer has generated the current industrial routes to several pharmaceuticals, such as estrone,” explains Hendrickson. SYNGEN has also proposed more efficient routes to numerous compounds such as lysergic acid, the precursor to ergot drugs and LSD. A new version of the program is in development ready for licensing to pharmaceutical companies this year.

William Jorgensen of Yale University in New Haven, Connecticut, is working on yet another program, CAMEO (Computer Aided Mechanistic Evaluation of Organic reactions). The chemist feeds the starting materials – using a sketchpad – and the reaction conditions – via drop-down menus – into CAMEO, virtually speaking, and the program attempts to predict the course of the reaction. It assembles a reaction from underlying mechanistic steps because, as Jorgensen points out, a large fraction of organic reactions are just combinations of various fundamental steps.

Sometimes CAMEO (also marketed by LHASA UK) claims no reaction product will emerge because a chemical rule would be broken if one did. The chemist can then run the reaction again virtually in a different solvent or at a higher temperature and watch the result, cutting testing time in the lab.

Taken separately, the various programs may not seem to offer chemists a chance to boost their leisure time, but together they may provide a way of classifying reaction types and using a neural network to work out what type of reaction it might take to yield a new molecule, then feeding the result into CAMEO to see whether reactions with other molecules could lead to it, and finally using SYNGEN to optimise the route.

Some chemists are not worried about losing their jobs just yet though. Al Meyers of Colorado State University at Fort Collins says, “There is a delicate balance between reacting species, solvents concentrations, selective reaction behaviour, and most important, the human ability to observe what is happening, cannot be incorporated into a reaction software package.” Software will play its role though: “The synthesis programs can bring into focus the many options available to the seasoned chemist”, he adds.

We will have to wait and see who or what is shaking the reaction flasks in ten years’ time.

Interview with Eric Scerri

This “Personal Reactions” interview with Eric Scerri originally appeared in my column in The Alchemist webzine, 1998-04-03.

Biography:
Professor Eric Scerri, born 30th August 1953, Malta. Nominated for the Dexter Award in the History of Chemistry. Interested in the philosophy of chemistry, especially philosophical aspects of the periodic system and of quantum chemistry.


Position:
Assistant Professor, Bradley University, Illinois.

Major life events:
Gaining a PhD in History and Philosophy of Science at King’s College, London on the Relationship of Chemistry to Quantum Mechanics. Being invited to the home of philosopher of science Sir Karl Popper for a discussion on quantum mechanics, chemistry, philosophy, life and the universe. Going to the US as a postdoctoral fellow in History and Philosophy of Science at Caltech. Becoming editor of Foundations of Chemistry.


How did you get your current job?
Job advert in Chemical and Engineering News.

What do you enjoy about your work?
Lecturing to students and generally interacting with people. Being paid to do what I enjoy the most, chemistry.

What do you hate about your industry?
The presence of large numbers of people who do no research, do not keep up with recent developments and pontificate endlessly about how “professional” they are.

What was your first experiment?
My first experiment while teaching was the fountain experiment.

Did it work?
No it did not. As anyone who has tried it will tell you, it’s tricky. I made sure I got it to work the second time.

What was your chemistry teacher at school like?
Excellent, warm and inspiring. Both women: Mrs Davis and Mrs Walden at Walpole Grammar, Ealing, London. The school has now been demolished to make space for a housing estate.

Meeting Popper must have been a formative experience?
You bet! First, he got very angry with me because I had sent him an article in which I was criticising his views on the discovery of hafnium. According to him and many others, Bohr predicted that hafnium should be a transition metal and not a rare earth, and that led directly to the discovery of hafnium by Coster and von Hevesy. The full story is far more complicated, as I and others have emphasised.

Popper in fact accepted my specific criticisms on the hafnium case. I think his initial anger was a sort of knee-jerk reaction, which he had to all critics. After about five minutes, he became a perfectly charming host and answered all my questions and made me feel like an equal even in purely philosophical matters.

What is your greatest strength?
Presentation of ideas in lectures. Being able to criticise arguments.

Weakness?
Sometimes over-critical.

What advice would you give a younger scientist?
Concentrate on mastering mathematical techniques. If the student ever wants to go into theory she will have to be a master of mathematical techniques. Chemical theory is very, very interesting.

What would you rather be if not a scientist?
A jazz and blues musician.

In whose band?
In my own band! I have been playing since I was 16 or so.

Which scientist from history would you like to meet?
Linus Pauling

What would you ask him?
About the genesis of quantum chemistry and about the people he came into contact with during his postdoctoral stay in Germany. I think he had the deepest respect for them but was personally more interested in applications to chemistry than reaching a deep understanding of quantum physics. His own approach may have appeared a little too cavalier to the European purists. By his own admission Pauling was working with Bohr’s old quantum theory when he first went to Europe only to be informed by Wolfgang Pauli that more sophisticated versions of quantum mechanics had been developed. Pauling immediately made the switch.

How has the Internet influenced what you do?
Enormously. First of all on a practical level I can find addresses, e-mails, phone numbers of anyone I care to with a little bit of searching. If I read an interesting article I can track down the author and ask them a question a few moments after first reading their ideas.

I should also point out that the Internet brings problems. A student recently wrote a paper for me on the history of the periodic table. He referred exclusively to material on the Internet. Most of the paper was filled with inaccuracies, complete mistakes etc. It was not the student’s fault. The problem is that anyone can set up a beautifully illustrated web page without bothering about the academic content and cast it out on to the Web for unsuspecting students to find. There is of course no [peer] review process for what goes on to the Web.

Wasn’t the student a bit naive to assume total credibility of unqualified sources?
Okay, you are right. He was not a brilliant student and he was lazy. Let’s just say it is tempting for students to sit in their own rooms and surf the Web instead of getting their butts into the library.

Why do you think the public fears science?
Lack of knowledge of course and the hard-edged and clinical image portrayed by many scientists.

What are the ultimate goals for chemists?
I am a philosopher of chemistry and chemical educator. I cannot really answer this question which seems to be directed towards “real chemists”. But do you really mean “ultimate goals”? If I were a theoretical chemist I would say to be able to calculate everything from first principles so that we would never need to do experiments and could pack up and go home. If I were a real chemist reaching such “ultimate goals” would not be much fun.

What will chemistry do in the next ten years?
Nor am I a fortune-teller.

You could speculate though…
Well, I really think computational chemistry and modelling will go on expanding as quickly as do developments in the computer industry. Chemists are going to have to get used to the idea that more and more “experiments” will be done on the computer. This should not imply however that quantum chemistry could explain everything in chemistry – that chemistry has been reduced. Far from it. It just means that computational chemistry can be used as a useful tool along with the various spectroscopic techniques, which have already revolutionised chemistry.

What invention would you like to wipe from history?
Chemical weaponry

Shining, Unhappy Plants

It is the dead of night, one summer just after the turn of the next century. Despite the darkness, a Midwestern farmer is surveying his acres of crops. From several clumps of plants scattered randomly throughout his fields there emanates an eerie blue glow. The farmer worries: The plants are obviously under stress.

If scientists in the United Kingdom are right, this scene might be played out all over the world. Glowing blue plants may someday provide an early-warning system that will alert farmers to infection and herbivore attack in time for defensive action.

At the Institute of Cell and Molecular Biology at the University of Edinburgh, a team led by plant biochemist Anthony Trewavas has been developing a genetic-engineering program to meet this goal. They are working with a protein that causes certain marine creatures, such as the jellyfish Aequorea victoria, to give off light when they are attacked by predators. In response to touch, jellyfish cells fill rapidly with calcium ions, which act as a cellular alarm signal during the organism’s response to stress. The calcium ions bind to various molecules, including the protein aequorin. In binding to calcium, aequorin gains an influx of energy, which it dissipates by giving off photons. In other words, it glows.

Plant cells also have an electrical response to stresses such as infection, touch and cold shock. Calcium ions pour in, again playing a signaling role in mobilizing the organism’s defenses. Trewavas and his team wanted to effectively amplify the calcium signal so that the farmer could lend a helping hand to a stressed plant. He reported the team’s latest results at the annual Science Festival of the British Association for the Advancement of Science in Newcastle-upon-Tyne in September.

A motivation for the research is the widespread use of blanket spraying of pesticides. Farmers practice blanket spraying in anticipation of infection or infestation because they would lose crops if they waited for visible signs of attack on leaf surfaces–if you wait, it is often too late to rescue the harvest. Farmers equipped with an early-warning system might be able to spray in time to prevent losses, and to spray only areas affected.

In the early stages of their work, the Edinburgh team transferred the genes that code for the luminescent calcium-binding protein aequorin from the jellyfish into tobacco plants and mosses. They succeeded in their first goal: When wounded or infected or otherwise stressed, test plants responded quickly by giving off a very faint blue glow, detectable by ultrasensitive camera equipment.

“At the moment,” says Trewavas, “the light is not visible to the naked eye, but that is because this is a jellyfish gene, not a plant gene.” The jellyfish gene includes a number of DNA sections (codons) that plants use rarely, if ever, and this difference in how the genetic information is arranged limits plants’ ability to “read” the gene. “That means we need to resynthesize the gene to optimize it for plants,” he said.

The team hopes to increase expression of the protein, using appropriate promoters, so that the glow is visible in darkness. The choice of promoters could also make the signal more specific, so that, for instance, it would indicate a response to infection rather than to cold shock. Even if one seed in a thousand produced a plant capable of glowing, the warning would be more effective than that achieved in experiments using microinjected fluorescent dyes. Dyes that respond to accelerated calcium flow have been used to monitor plant stress, but these techniques are limited to single or small groups of cells.

Trewavas is optimistic that his technology will be available to farmers by 2000. “If the jellyfish can do it,” he says, “then so can we.” Neal Stewart, Jr., assistant professor of biology at the University of North Carolina at Chapel Hill, shares Trewavas’s bullish outlook and is beginning his own research. “I think that perhaps the year of commercialization may be optimistic–maybe not–but new and improved fluorescent proteins should be on line soon.”

The reference for my original article on this topic is American Scientist, Volume 84, Issue 1, p.25-26