All hail CAESA!

CAESA, Computer-Aided Estimation of Synthetic Accessibility, has a simple aim: to score a target compound by estimating how difficult it would be to synthesise in the chemical laboratory, and to produce a list of starting materials for the synthesis. It can do so within a few seconds, far faster than a team of post-docs checking laboratory shelves and supply catalogues. Indeed, CAESA can provide feasible solutions to some very complex synthetic problems.

The program works partly by analysing a target chemical's structure for complex features such as fused or bridged rings and assigning a complexity score. CAESA also recognises that apparent complexity might be circumvented if starting materials incorporating these complex features are available. By carrying out a retrosynthetic analysis, CAESA works backwards from the target, looking for appropriate and available starting materials that could be stitched together by known chemical reactions to rebuild the product. A higher score means a more difficult or a lower-yielding reaction.

It sounds like a straightforward and sensible idea. A medicinal chemist may have designed a nice molecule to dock with a disease-related enzyme, but how easy will it be to make the compound for testing? To be commercially useful, CAESA must match synthetic chemists doing the job themselves. There are four criteria on which a judgement might be made.

First, the program should find the lowest required number of synthetic steps. Secondly, it must take into account the reaction difficulty and/or the plausible yields at each step. Thirdly, the starting materials selected have to be either "off-the-shelf" or easily made in any laboratory. The developers collaborate with supply companies including Acros, Lancaster and Sigma-Aldrich to keep the databases of starting materials current. There are about 75,000 structures in the starting materials database of the CAESA package at present. Finally, the scheme must involve the minimal number of FGIs (Functional Group Interconversions); syntheses that have many FGIs are never easy or cheap.

For example, the simple FGI that converts a halide to an alcohol is assigned a rating of 1. In contrast, a much more sophisticated chemical change such as the Pauson-Khand reaction, a [2+2+1] cycloaddition that converts an alkene, an alkyne and carbon monoxide into a cyclopentenone, scores 4 in CAESA's eyes.
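To make the rating idea concrete, here is a minimal Python sketch. Only the two ratings (1 and 4) come from the text; the step names, the `STEP_RATINGS` table and the decision to add ratings up over a route are assumptions made for illustration, not a description of how CAESA combines its scores.

```python
# Hypothetical illustration: the two ratings are quoted in the article,
# but summing them over a route is an assumption, not CAESA's documented method.
STEP_RATINGS = {
    "halide_to_alcohol": 1,  # simple FGI, rated 1
    "pauson_khand": 4,       # [2+2+1] cycloaddition, rated 4
}


def route_difficulty(steps):
    """Return the summed rating of a route given as a list of step names."""
    return sum(STEP_RATINGS[step] for step in steps)


route = ["pauson_khand", "halide_to_alcohol"]
print(route_difficulty(route))
```

Under this assumed scheme, a route that needs a Pauson-Khand step would rank as harder than one built only from simple interconversions.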

Users have a degree of control over how CAESA will rank a product's synthetic complexity. A synthetic distance is input to begin the analysis, and this can be used to limit the total difficulty or number of steps a user will tolerate. CAESA then selects starting materials based on their total coverage of the target compound, so that the starting material which covers the largest part of the target is the most favoured. CAESA's underlying mode of action is that of a rule-based expert system: various built-in knowledge bases carry out different functions in order to identify starting materials and estimate the synthetic accessibility.
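The coverage idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: molecules are represented as sets of target atom indices rather than real structures, and `coverage` and `rank_by_coverage` are hypothetical names; a real system would need substructure matching to decide which atoms a starting material accounts for.

```python
def coverage(target_atoms, material_atoms):
    """Fraction of the target's atoms accounted for by a starting material."""
    return len(target_atoms & material_atoms) / len(target_atoms)


def rank_by_coverage(target_atoms, candidates):
    """Return candidate names ordered from highest to lowest coverage."""
    return sorted(
        candidates,
        key=lambda name: coverage(target_atoms, candidates[name]),
        reverse=True,
    )


# Toy data: a ten-atom target and three hypothetical starting materials,
# each described by the target atoms it would supply.
target = set(range(10))
candidates = {
    "material_A": {0, 1, 2},
    "material_B": {0, 1, 2, 3, 4, 5},
    "material_C": {7},
}
print(rank_by_coverage(target, candidates))
```

Here the material supplying six of the ten atoms is ranked first, mirroring the preference for whichever starting material covers most of the target.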

CAESA has three ever-growing retrosynthetic knowledge bases, each containing different transformations and information to allow an evaluation to be carried out. The first contains strategic disconnections, the retrosynthetic equivalent of a bond-making reaction. These simple and versatile disconnections, such as amide or ester disconnections, allow CAESA to cut a large target molecule into manageable chunks. The second knowledge base contains more general disconnections. This database is the program's main information source and is called into action in most analyses. The third knowledge base contains simple FGIs, and is used to transform simple intermediates. Another trick up CAESA's sleeve is a two-directional approach to the analysis: some simple synthetic transformations are applied to all starting materials in the database, generating an expanded virtual library of available materials.
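The forward half of that two-directional approach might look something like the following Python sketch. Everything here is hypothetical: compounds are plain text labels, `halide_to_alcohol` is a stand-in for a real transformation, and a single pass of expansion is assumed. It shows only the general shape of building a virtual library, not CAESA's actual implementation.

```python
def halide_to_alcohol(compound):
    """Toy forward transformation: turn a '-Br' label into an '-OH' label."""
    if compound.endswith("-Br"):
        return compound[:-3] + "-OH"
    return None  # transformation does not apply


def expand_library(materials, transforms):
    """Apply each transformation once to each material; keep the originals too."""
    expanded = set(materials)
    for material in materials:
        for transform in transforms:
            product = transform(material)
            if product is not None:
                expanded.add(product)
    return expanded


catalogue = {"benzyl-Br", "cyclohexyl-OH"}
virtual_library = expand_library(catalogue, [halide_to_alcohol])
print(sorted(virtual_library))
```

A retrosynthetic search that works back to "benzyl-OH" would then find a match in the expanded library even though only the bromide is in the catalogue.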

The developers carried out a number of validation tests on CAESA with 75 disparate chemical structures. The primary test involved comparing CAESA's answers with those devised by a group of expert synthetic chemists at a large pharmaceutical company. Out of the 75 target structures, 65 analyses were in full agreement with the schemes generated by the experts. Of the ten cases that were not quite right, seven contained fused heterocycles. Information on these common but synthetically complex groups is not currently included in CAESA in great detail but will be added in a future release. The same applies to compounds with phosphate groups, an area with few transformations. Most synthetic chemists know far more reactions than CAESA has in its current knowledge base, but in contrast, CAESA knows far more about the commercial availability of starting materials. CAESA has arrived, analysed and could one day conquer.

Industrial chemists and pharmaceutical scientists will love CAESA - it saves them time and effort! It allows targets to be assessed and organised according to complexity of synthesis as well as helping in the search for more efficient routes to a compound.

CAESA started life in the chemistry department at Leeds University. The client side works via a web browser on any operating system, and the server side can be customised to the user's own databases and ways of working.