Contents
Fragmenter: creating fragments by cleavage rules
Fragment statistics: sorting fragments by activity values
R-group decomposition: finding a scaffold with attached ligands
Fragmenter Basics
Fragmenter generates molecular fragments by cleaving single bonds according to predefined cleavage rules. The rules correspond to chemical reactions, which enhances the synthetic accessibility of the fragments, and they are given in the form of reaction molecules in the configuration XML. By default, all non-ring bonds matching the cleavage bonds in the rules are cleaved. However, a revision algorithm can be supplied that forbids certain cuts depending on predefined criteria (e.g. the resulting fragment size, the structural environment of the bond, or the number of cleaved bonds in the resulting fragments). Currently one such algorithm is implemented: the RECAP method.
The RECAP Method
By default, all non-ring bonds matching the cleavage bonds in the rules may be cleaved. The RECAP algorithm forbids the cleavage of some bonds according to the following rules:
1. Never cut a hydrogen-connecting bond.
2. Never cut a bond connecting a ring carbon and a hetero atom (optional).
3. Never cut ring bonds. (Fragmenter always applies this rule; it is listed here for completeness.)
4. Refuse a cut if any of the resulting fragments is on the specified Notlist.
5. Refuse a cut if the number of open bonds in any of the resulting fragments exceeds the specified limit.
6. Refuse a cut if the number of atoms in any of the resulting fragments is less than the predefined minimal atom count.
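The six rules above can be read as a chain of veto checks applied to each candidate cut. The following is a minimal sketch of that reading in Python, not Fragmenter's implementation: the CandidateCut record, the recap_allows function, and the default limits (max_open_bonds=2, min_atoms=3) are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateCut:
    """Hypothetical summary of one candidate cut and the fragments it would produce."""
    touches_hydrogen: bool       # the bond connects a hydrogen (rule 1)
    ring_carbon_to_hetero: bool  # the bond joins a ring carbon and a hetero atom (rule 2)
    is_ring_bond: bool           # the bond is part of a ring (rule 3)
    fragment_smiles: list        # SMILES of the resulting fragments (rule 4)
    open_bonds: list             # open-bond count of each resulting fragment (rule 5)
    atom_counts: list            # atom count of each resulting fragment (rule 6)


def recap_allows(cut, notlist=frozenset(), max_open_bonds=2, min_atoms=3,
                 protect_ring_carbon_hetero=True):
    """Return True if none of the six rules vetoes the cut.

    The numeric defaults are illustrative assumptions, not Fragmenter defaults.
    """
    if cut.touches_hydrogen:
        return False
    if protect_ring_carbon_hetero and cut.ring_carbon_to_hetero:
        return False  # rule 2 is optional, hence the switch
    if cut.is_ring_bond:
        return False
    if any(smiles in notlist for smiles in cut.fragment_smiles):
        return False
    if any(n > max_open_bonds for n in cut.open_bonds):
        return False
    if any(n < min_atoms for n in cut.atom_counts):
        return False
    return True


cut = CandidateCut(False, False, False, ["CCO", "c1ccccc1"], [1, 1], [3, 6])
print(recap_allows(cut))                   # allowed under the illustrative limits
print(recap_allows(cut, notlist={"CCO"}))  # vetoed by the Notlist rule
```

Keeping each rule as a separate early return makes it easy to switch the optional rule 2 on or off without touching the others.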
Fragmenter Cleavage Rules
The following rules are typically used with the RECAP algorithm, but Fragmenter accepts any custom cleavage rules described by reaction equations. The cleavage points on the fragments are labeled with the cleavage rules:
Fragmentation Example I.
An example fragmentation of tamoxifen (left), an oestrogen antagonist, and atenolol (right), an antihypertensive drug:
Fragmentation Example II.
An example fragmentation with amine-type cleavage bonds. Figure panels: input molecule; fragments; amine cleavage.
Fragmentation Example III.
All fragments of the same input molecule (extensive fragmentation):
Fragment Statistics Basics
FragmentStatistics creates statistical results from the output of Fragmenter. The simplest usage is to remove duplicate fragments and sort fragments by occurrence, but FragmentStatistics can also sort fragments by a scoring function based on molecule activity or other data read from the input molecules and stored together with the generated fragments.
Fragment Statistics Input / Output
The input of FragmentStatistics is the output of Fragmenter in cxsmiles format with the following fields:
1. SMILES string
2. atom labels storing fragment cleavage data
3. unique ID (used for the fragment duplicate check)
4. input molecule data read from an SDFile tag (optional, e.g. molecule activity)
The output of FragmentStatistics is a sorted cxsmiles table with the following data:
1. SMILES string
2. atom labels storing fragment cleavage data
3. atom count
4. fragment counts per activity category (number of identical fragments in each activity category, one field for each)
5. score
The Scoring Function
Fragments are sorted by activity, which is calculated in the form of a scoring function:
$$\mathrm{score} = \mathit{ac}^{\,x}\,(w_1 c_1 + w_2 c_2 + \dots + w_N c_N)$$
where
$\mathit{ac}$ is the heavy atom count,
$w_1, w_2, \dots, w_N$ are the category weights in descending order (default: from +1 to -1, equidistant),
$c_1, c_2, \dots, c_N$ are the fragment counts in each category, in descending activity order,
$x$ is the exponent of the heavy atom count (default: 1).
If there is no activity data, FragmentStatistics simply removes fragment duplicates and sorts fragments by $\mathit{ac}^{\,x} c_1$, where $c_1$ is the fragment count. With the default exponent of 1 the score is thus $\mathit{ac} \cdot c_1$. With two activity categories the default weights are +1 and -1, so the default scoring function is $\mathit{ac}\,(c_1 - c_2)$. With three categories the default weights are +1, 0 and -1, so the middle category drops out and the score is $\mathit{ac}\,(c_1 - c_3)$.
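The scoring function described above is simple enough to check by hand. A minimal Python sketch follows; the function names default_weights and fragment_score are hypothetical and are not part of FragmentStatistics. It assumes that a single category gets weight +1, which matches the no-activity case in the text.

```python
def default_weights(n):
    """Equidistant weights from +1 down to -1 for n activity categories."""
    if n < 1:
        raise ValueError("at least one category is required")
    if n == 1:
        return [1.0]  # no activity data: the score reduces to ac^x * c1
    step = 2.0 / (n - 1)
    return [1.0 - i * step for i in range(n)]


def fragment_score(atom_count, counts, weights=None, exponent=1.0):
    """Compute ac^x * (w1*c1 + ... + wN*cN) for one fragment."""
    if weights is None:
        weights = default_weights(len(counts))
    if len(weights) != len(counts):
        raise ValueError("one weight per category is required")
    weighted_sum = sum(w * c for w, c in zip(weights, counts))
    return atom_count ** exponent * weighted_sum


print(default_weights(3))                # [1.0, 0.0, -1.0]
print(fragment_score(5, [4]))            # 20.0, i.e. ac * c1
print(fragment_score(5, [3, 1]))         # 10.0, i.e. ac * (c1 - c2)
print(fragment_score(5, [3, 7, 1]))      # 10.0, i.e. ac * (c1 - c3)
```

The three-category call shows why the middle count has no effect under the default weights: its weight is exactly 0.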
Scoring Function Example: single cutoff value
Two activity ranges with cutoff value 0.5:
Scoring Function Example: discrete activity range
Discrete activity values:
Generating Fragment Statistics I.
We start with a large set of input molecules with activity data. Figure labels: Activity = 4; Activity = 0.05; Activity = 5; Activity = 50.
Generating Fragment Statistics II.
For the purpose of fragment statistics, start by generating a broad set of fragments without the RECAP (or any other) restrictions. Figure labels: Standardization; Cleavage reactions; Extensive fragmentation.
Generating Fragment Statistics III.
The generated fragments inherit the activity values from the parent molecule. Fragments (with cleavage and activity data) are stored in cxsmiles format. The activity data is stored in field_1, for example field_1 = 50.
Generating Fragment Statistics IV.
First make statistics with duplicate filtering and sorting. These are the 4 highest-scoring fragments (by score = atom count * occurrence):
field_0: atom count
field_1: fragment occurrence
field_2: score (field_0 * field_1)
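The duplicate filtering and sorting step can be illustrated with a short Python sketch. This is an assumption-laden toy, not the FragmentStatistics code: occurrence_table is a hypothetical name, and the fragment IDs "A", "B", "C" and their atom counts are made-up placeholders standing in for the unique IDs and atom counts of real fragments.

```python
from collections import Counter


def occurrence_table(fragments):
    """Collapse duplicate fragments and sort by atom count * occurrence.

    fragments: iterable of (fragment_id, atom_count) pairs, one pair per
    generated fragment, so duplicates appear more than once.
    Returns rows of (fragment_id, atom_count, occurrence, score).
    """
    fragments = list(fragments)
    occurrences = Counter(frag_id for frag_id, _ in fragments)
    atom_counts = {frag_id: ac for frag_id, ac in fragments}
    rows = [
        (frag_id, atom_counts[frag_id], n, atom_counts[frag_id] * n)
        for frag_id, n in occurrences.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)  # highest score first
    return rows


# Placeholder data: IDs and atom counts are invented for illustration.
generated = [("A", 6), ("B", 2), ("A", 6), ("C", 10), ("B", 2), ("B", 2)]
for row in occurrence_table(generated):
    print(row)
```

Note that a large fragment seen once ("C") can outrank a small fragment seen three times ("B"), which is the effect of multiplying by the atom count.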
Generating Fragment Statistics V.
Next include activity data in the scoring, with cutoff value 1. This means that molecules with an activity value of at least 1 are considered active, while all others are inactive. These are the 4 most active fragments (by score = atom count * (active occurrence - inactive occurrence)):
field_0: atom count
field_1: fragment occurrence in the active set (activity >= 1)
field_2: fragment occurrence in the inactive set (activity < 1)
field_3: score (atom count * (active occurrence - inactive occurrence))
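The active/inactive split with a single cutoff can be sketched as follows. Again this is only an illustration under stated assumptions: activity_table is a hypothetical function name, and the fragment IDs and atom counts are invented placeholders; only the cutoff of 1 and the activity values 4, 0.05, 5 and 50 are taken from the walkthrough above.

```python
from collections import defaultdict


def activity_table(records, cutoff=1.0):
    """Count each fragment in the active and inactive sets and score it.

    records: iterable of (fragment_id, atom_count, parent_activity).
    Returns rows of (fragment_id, atom_count, active, inactive, score),
    sorted by score = atom_count * (active - inactive), highest first.
    """
    counts = defaultdict(lambda: [0, 0])  # [active, inactive] per fragment
    atom_counts = {}
    for frag_id, atom_count, activity in records:
        atom_counts[frag_id] = atom_count
        category = 0 if activity >= cutoff else 1
        counts[frag_id][category] += 1
    rows = [
        (frag_id, atom_counts[frag_id], active, inactive,
         atom_counts[frag_id] * (active - inactive))
        for frag_id, (active, inactive) in counts.items()
    ]
    rows.sort(key=lambda row: row[4], reverse=True)
    return rows


# Placeholder fragments; each inherits the activity of its parent molecule.
records = [("A", 6, 4), ("A", 6, 0.05), ("A", 6, 50), ("B", 3, 0.05), ("B", 3, 5)]
for row in activity_table(records):
    print(row)
```

A fragment that occurs equally often in active and inactive molecules scores 0, so it sinks below fragments that are enriched in the active set.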
R-group Decomposition: the query
R-group decomposition is a special kind of substructure search that aims to find a central structure (the scaffold) and to identify its ligands at certain attachment positions. The query molecule consists of the scaffold and ligand attachment points represented by R-groups. By default the two R1 nodes must match identical structures, but this behaviour can be changed by setting the -p (--skip-same-structure-check) parameter.
R-group Decomposition: the targets
Our sample targets all contain the query (scaffold with R-group attachment points), but not all of them can satisfy the condition of identical R1 ligands:
single hit: identical R1 ligands
single hit: the same R1 ligand (R-bridge)
more hits: all with different R1 ligands
more hits: one with identical R1 ligands
R-group Decomposition: decomposition I.
Attachment points can be denoted by different symbols, depending on the -a (--attachment-symbol) option:
N: none
P: attachment point
A: any-atom (default)
M: atom map
L: atom label
R-group Decomposition: decomposition II.
SMILES table: the output is written as a SMILES table if the -f (--format) option is omitted. Otherwise the output is written as a molecule series in the specified output format, with atom color codes (separated by ; characters) stored in the DMAP property (SDF/MRV tag). The codes are:
0: scaffold atom
n: Rn ligand atom (n > 0)
-: non-hit atom
Example: 0;0;1;1;1;1;2;2;1;0;-;-
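The DMAP code string described above is easy to split into atom groups. The Python sketch below uses a hypothetical helper, parse_dmap, and assumes that the position of each code in the string corresponds to the atom's position in the molecule, counted from 0; that indexing is an assumption for illustration, not something the text states.

```python
def parse_dmap(dmap):
    """Split a DMAP string into scaffold atoms, ligand atoms and non-hit atoms.

    Returns (scaffold, ligands, non_hit), where scaffold and non_hit are lists
    of 0-based positions and ligands maps n to the positions coded as Rn.
    """
    scaffold, ligands, non_hit = [], {}, []
    for index, code in enumerate(dmap.split(";")):
        code = code.strip()
        if code == "-":
            non_hit.append(index)
        elif code == "0":
            scaffold.append(index)
        else:
            ligands.setdefault(int(code), []).append(index)
    return scaffold, ligands, non_hit


scaffold, ligands, non_hit = parse_dmap("0;0;1;1;1;1;2;2;1;0;-;-")
print(scaffold)  # [0, 1, 9]
print(ligands)   # {1: [2, 3, 4, 5, 8], 2: [6, 7]}
print(non_hit)   # [10, 11]
```

Grouping positions by code in this way is what a color-map lookup needs: one color for code 0, one per ligand number, and one for non-hit atoms.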
R-group Decomposition: decomposition III.
The DMAP property can be used in mview to color the atoms according to a color-map file that maps the color codes to colors. Set the -p (--skip-same-structure-check) option to allow the two R1 nodes to match different ligands. Finally, use the -A (--allhits) option to see all possible decompositions. In this way our last target will also have two decompositions:
References
Fragmenter, fragment statistics:
R-group decomposition:
1. RECAP - Retrosynthetic Combinatorial Analysis Procedure: A Powerful New Technique for Identifying Privileged Molecular Fragments with Useful Applications in Combinatorial Chemistry. J. Chem. Inf. Comput. Sci. 1998,
2. Schneider, G. et al. De novo design of molecular architectures by evolutionary assembly of drug-derived building blocks. J. Comput.-Aided Mol. Des. 2000, 14,