Roland Knispel Business Analyst, Biologics and Plexus Suite

Presentation on theme: "Roland Knispel Business Analyst, Biologics and Plexus Suite"— Presentation transcript:

1 Roland Knispel, Business Analyst, Biologics and Plexus Suite
When Is a Large Biomolecule a Small Molecule? Compound Databases with a Twist
Biologics/Biomolecules/Biopolymers

Notes: When necessary, explain here the problem of finding the right word for the entities that most life scientists work with.

A biologic often covers all small molecules and macromolecules produced by biological systems, as opposed to synthetic products, which are typically small molecules but can also be macromolecules such as polymers, dendrimers etc. Biologics may also include supramolecular entities like vaccines, cell lines etc. The meaning of the word biologic is therefore not exact. It is usually used to distinguish an entity from one produced by chemical synthesis.

Wikipedia defines a biomolecule as any molecule that is present in living organisms, including large macromolecules such as proteins, polysaccharides, lipids, and nucleic acids, as well as small molecules such as primary metabolites, secondary metabolites, and natural products.

The word macromolecule is also misleading, because it only distinguishes a molecule from small molecules (Mw under 1000). A macromolecule may or may not be a polymer. There are macromolecules that we do not (fully) support, for example random polymers and dendrimers, and these are produced by synthetic means (random polymers, dendrimers, plastics etc.) or biological means (some polysaccharides, bioplastics).

"Biopolymers and derivatives" would be the most accurate term, but it is rarely used and might include random biopolymers like polysaccharides, bacterial cell walls etc. that we do not support. There is thus currently no single word that properly describes the set of molecules that we aim to support with this toolkit.

In this presentation we use the word Biologic, most often interchangeably with the word Biomolecule. We restrict the use of these words to entities composed of biological polymers (biopolymers) like peptides and nucleic acids and their derivatives, or any more complex entities (cell lines, vaccines etc.) that may properly be described by the presence or absence of the former entities.

2 Company fingerprint
ChemAxon provides chemical software development platforms and solutions for the biotechnology and pharmaceutical industries. They are also used successfully in publishing, flavors and fragrances research, and petroleum and fine chemicals development. Our innovation targets are to become leaders in web-based data management solutions, to bridge chemistry and biology, and to offer out-of-the-box solutions for the cloud.

3 Company fingerprint
HQ: Budapest, Hungary
  18+ years of experience
  3 offices
  130+ employees
  100% private company
  50+ implementation partners
  800+ clients
  200k+ academic users

4 Real world scenarios

5 Customer A
A happy and satisfied user of ChemAxon's Compound Registration said:
Customer: "I want to report a performance issue: a single submission takes > 12 min to register."
ChemAxon: "That's unusual, could you send us the structure?"
Customer: "Here it is:"

6

7 Customer A
A happy and satisfied user of ChemAxon's Compound Registration
Customer: "I want to report a performance issue: a single submission takes > 12 min to register."
ChemAxon: "That's unusual, could you send us the structure?"
Customer: "Here it is:"
ChemAxon: "We accelerated our tautomer check for proteins; protein registration now works in a few seconds. But really try to find a better alternative!"

8 Customer B
Another happy and satisfied user of ChemAxon's Compound Registration
Customer: "I want to report a performance issue: registration and search times on our system take minutes."
ChemAxon: "That's unusual, could we investigate your DB?"

9 Customer B

10 Customer B
Another happy and satisfied user of ChemAxon's Compound Registration
Customer: "I want to report a performance issue: registration and search times on our system take minutes."
ChemAxon: "That's unusual, could we investigate your DB?"
ChemAxon: "You have > 50k proteins in it, each stored as a single star atom (*) with the sequence as an atom attribute. Our JChem technology is not really optimized for that. We set up a custom pre-filter for you, but it would be better to find an alternative!"

11 Customer C
Customer: "Several of our peptide chemists draw their structures in our ELN using ChemDraw, from where they are automatically submitted to our Chemical Registration. The chemists receive a registration notification but sometimes cannot locate the structure in the registry DB afterwards. It turns out a registrar has redirected some of the structures to another bespoke registration system. Could you help us resolve this issue?"
Lost your peptide?

12 Lost your peptide?
ChemAxon: "Looks like we've got just the tools you need! We'll integrate them into your environment to make the life of your registrars and scientists easier."

13 Root problem
  Novel entities are 'massaged' into existing registration systems
  Bespoke registration systems are often siloed
  Human interpretation is often required but not objective
Effects
  Performance losses or bottlenecks
  Data integration trouble
  Unhappy scientists
  Reduced productivity

14 Please LAUNCH: Poll Question 1

15 Customer C
Customer: "Several of our peptide chemists draw their structures in our ELN using ChemDraw, from where they are automatically submitted to our Chemical Registration. The chemists get a registration notification but sometimes cannot locate the structure in the registry DB afterwards. It turns out a registrar has redirected some of the structures to another bespoke registration system. Could you help us resolve this?"
ChemAxon: "Looks like we have got just the tools you need! We'll integrate them into your environment to make the life of registrars and scientists easier."

16 What‘s in a structure

17 Cyclosporin A
Depiction: sequence-based
Common name: Cyclosporin A
Canonical SMILES:
IUPAC name (chemical): (3S,6S,9S,12R,15S,18S,21S,24S,30S,33S)-30-ethyl-33-[(E,1R,2R)-1-hydroxy-2-methylhex-4-enyl]-1,4,7,10,12,15,19,25,28-nonamethyl-6,9,18,24-tetrakis(2-methylpropyl)-3,21-di(propan-2-yl)-1,4,7,10,13,16,19,22,25,28,31-undecazacyclotritriacontane-2,5,8,11,14,17,20,23,26,29,32-undecone
IUPAC name (biological): cyclo[((2S)-2-aminobutyryl)-sarcosyl-N-methyl-L-leucyl-L-valyl-N-methyl-L-leucyl-L-alanyl-D-alanyl-N-methyl-L-leucyl-N-methyl-L-leucyl-N-methyl-L-valyl-N-methyl-(4R)-4-[(E)-but-2-enyl]-4-methyl-L-threonyl]
Closest natural sequence*: AALLVTAGLVL
Canonical HELM*: PEPTIDE1{A.[dA].[meL].[meL].[meV].[BMT].[Abu].[Sar].[meL].V.[meL]}$PEPTIDE1,PEPTIDE1,11:R2-1:R1$$$
InChI: InChI=1S/C62H111N11O12/c (15)52(75)51-56(79)65-43(26-2)58(81)67(18)33-48(74)68(19)44(29-34(3)4)55(78)66-49(38(11)12)61(84)69(20)45(30-35(5)6)54(77)63-41(16)53(76)64-42(17)57(80)70(21)46(31-36(7)8)59(82)71(22)47(32-37(9)10)60(83)72(23)50(39(13)14)62(85)73(51)24/h25,27,34-47,49-52,75H,26,28-33H2,1-24H3,(H,63,77)(H,64,76)(H,65,79)(H,66,78)/b27-25+/t40-,41+,42-,43+,44+,45+,46+,47+,49+,50+,51+,52-/m1/s1
InChI key: PMATZTZNYRCHOR-CGLBZJNRSA-N
CAS number:
Ref: PubChem CID
*ChemAxon generated
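The canonical HELM string on this slide packs the monomer sequence and the ring-closing connection into one line. As a rough illustration of how such a string is laid out, here is a minimal Python sketch that splits a simple HELM string into its polymers and connections. It is not the Biomolecule Toolkit parser: the function names `split_monomers` and `parse_helm` are hypothetical, and the sketch assumes that `$` and `|` never occur inside a monomer name.

```python
from typing import Dict, List, Tuple


def split_monomers(chain: str) -> List[str]:
    """Split 'A.[dA].[meL]' on dots that are not inside square brackets."""
    monomers: List[str] = []
    current = ""
    depth = 0
    for ch in chain:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "." and depth == 0:
            monomers.append(current)
            current = ""
        else:
            current += ch
    if current:
        monomers.append(current)
    return monomers


def parse_helm(helm: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (polymers, connections) from a simple HELM string.

    Assumption: sections are separated by '$' and the first two sections
    hold the polymers and the connections, as in the slide's example.
    """
    sections = helm.split("$")
    polymers: Dict[str, List[str]] = {}
    for part in sections[0].split("|"):
        name, _, rest = part.partition("{")
        polymers[name] = split_monomers(rest.rstrip("}"))
    connections = [c for c in sections[1].split("|") if c]
    return polymers, connections


cyclosporin = (
    "PEPTIDE1{A.[dA].[meL].[meL].[meV].[BMT].[Abu].[Sar].[meL].V.[meL]}"
    "$PEPTIDE1,PEPTIDE1,11:R2-1:R1$$$"
)
polymers, connections = parse_helm(cyclosporin)
print(len(polymers["PEPTIDE1"]), connections)
```

The single connection `11:R2-1:R1` links the last monomer back to the first, which is how the head-to-tail ring of cyclosporin A is expressed.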

18 Cyclosporin A
Depiction: 2D structure (MarvinJS)
Identifiers (common name, canonical SMILES, IUPAC names, closest natural sequence, canonical HELM, InChI, InChI key, CAS number) as on slide 17.

19 Cyclosporin A
Depiction: sequence (BioEddie)
Identifiers (common name, canonical SMILES, IUPAC names, closest natural sequence, canonical HELM, InChI, InChI key, CAS number) as on slide 17.

20 In a workflow
How to make the transition seamless
  How a chemist refines a structure (original, modified)
  How SAR is performed

21 Case study: ChEMBL v21 Data migration, enrichment and curation

22 ChEMBL v21
  19773 peptide biotherapeutics with associated activity data
  Molecules served as HELM or chemical structure file
  ChEMBL interface: searchable by CompoundID, chemical structure or select metadata

23 Standardize -> Convert -> Canonicalize -> Store/Search
Standardization steps:
  Ungroup S-groups
  Strip salts/solvents
  Remove explicit hydrogens
  Neutralize structure
  Aromatize
  Standardize functional groups
Example: CHEMBL297610
Tools used: KNIME, Standardizer node

24 Standardize -> Convert -> Canonicalize -> Store/Search
Example: CHEMBL223118
3 wrong conversions identified in the ChEMBL data set
HELM notations shown (labelled "correct" and "wrong" on the slide):
  PEPTIDE1{E.C.G.[X1091]}$$$$
  PEPTIDE1{E}|PEPTIDE2{C.G.[X1091]}$PEPTIDE1,PEPTIDE2,1:R3-1:R1$$$
Tools used: KNIME, Biomolecule Toolkit, BioEddie for image rendering

25 Standardize -> Convert -> Canonicalize -> Store/Search
864 HELM notations in ChEMBL changed
Duplicate filtering
CHEMBL412009
  Before: PEPTIDE1{[ac].[dE].[Phe(4-Cl)].[d3-Pal]}|PEPTIDE2{D.R.[dNal].L.K}|PEPTIDE3{P.[dDpr].[am]}$PEPTIDE2,PEPTIDE3,5:R2-1:R1|PEPTIDE2,PEPTIDE1,5:R3-2:R3|PEPTIDE1,PEPTIDE2,4:R2-1:R1|PEPTIDE3,PEPTIDE2,2:R3-1:R3$$$
  After: PEPTIDE1{[ac].[dE].[Phe(4-Cl)].[d3-Pal].D.R.[dNal].L.K.P.[dDpr].[am]}$PEPTIDE1,PEPTIDE1,11:R3-5:R3|PEPTIDE1,PEPTIDE1,9:R3-2:R3$$$
CHEMBL42623
  Before: PEPTIDE1{G.Y.G.F}$PEPTIDE1,PEPTIDE1,4:R2-1:R1$$$
  After: PEPTIDE1{F.G.Y.G}$PEPTIDE1,PEPTIDE1,4:R2-1:R1$$$
Tools used: KNIME, Biomolecule Toolkit
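The CHEMBL42623 example shows why canonicalization matters for duplicate filtering: a head-to-tail cyclic peptide can be written starting from any ring position, so G.Y.G.F and F.G.Y.G describe the same ring. The slides do not state which canonicalization rule the Biomolecule Toolkit applies. The Python sketch below uses one simple, assumed rule (pick the lexicographically smallest rotation of the monomer list), which happens to reproduce the slide's "after" form. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def canonical_rotation(monomers: List[str]) -> List[str]:
    """Return the lexicographically smallest rotation of a cyclic monomer list.

    Assumed rule for illustration only; the toolkit's actual rule is not
    given in the presentation.
    """
    if not monomers:
        return []
    rotations = [monomers[i:] + monomers[:i] for i in range(len(monomers))]
    return min(rotations)


def group_duplicates(cyclic: Dict[str, List[str]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group entry IDs whose rings are identical up to rotation."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for entry_id, monomers in cyclic.items():
        groups[tuple(canonical_rotation(monomers))].append(entry_id)
    return dict(groups)


# Hypothetical entry IDs; the first two are the same ring written differently.
entries = {
    "entry-1": ["G", "Y", "G", "F"],
    "entry-2": ["Y", "G", "F", "G"],
    "entry-3": ["G", "Y", "G", "A"],
}
print(canonical_rotation(entries["entry-1"]))
print(group_duplicates(entries))
```

For a head-to-tail ring of n monomers the connection stays `n:R2-1:R1` after rotation, which matches the unchanged `4:R2-1:R1` in the slide's before and after strings. Rings closed through side chains would also need their connection indices renumbered, which this sketch does not attempt.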

26 Standardize -> Convert -> Canonicalize -> Store/Search
Search by: sequence (incl. wildcards), chemical structure, modifications, metadata

Query: # hits
  Molecules containing "Oxytocin" in name field: 2
  Molecules with the natural analogue sequence of Oxytocin: 18
  Oxytocin-like sequences with non-standard amino acids: 17
  Oxytocin derivatives containing the chemical structure of penicillamine: 3
  Oxytocin derivatives containing the L-penicillamine monomer:

Tools used: Biomolecule Toolkit
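Searching by "natural analogue sequence" presupposes that every modified monomer can be mapped to a natural amino acid. The Cyclosporin A slide gives one worked pair: its canonical HELM and the closest natural sequence AALLVTAGLVL. The Python sketch below reproduces that mapping. The lookup table is inferred from that single slide and is not taken from any monomer library, and the function name and the `X` placeholder for unknown monomers are assumptions.

```python
from typing import Dict, List

# Natural analogues inferred from the Cyclosporin A slide only (illustrative).
NATURAL_ANALOGUE: Dict[str, str] = {
    "dA": "A",    # D-alanine
    "meL": "L",   # N-methyl-leucine
    "meV": "V",   # N-methyl-valine
    "BMT": "T",   # threonine derivative
    "Abu": "A",   # 2-aminobutyric acid
    "Sar": "G",   # sarcosine (N-methyl-glycine)
}


def closest_natural_sequence(monomers: List[str], unknown: str = "X") -> str:
    """Map each HELM monomer to a one-letter natural amino acid code."""
    letters = []
    for monomer in monomers:
        monomer_id = monomer.strip("[]")
        if len(monomer_id) == 1 and monomer_id.isupper():
            letters.append(monomer_id)  # already a natural one-letter code
        else:
            letters.append(NATURAL_ANALOGUE.get(monomer_id, unknown))
    return "".join(letters)


cyclosporin_monomers = [
    "A", "[dA]", "[meL]", "[meL]", "[meV]", "[BMT]",
    "[Abu]", "[Sar]", "[meL]", "V", "[meL]",
]
print(closest_natural_sequence(cyclosporin_monomers))  # AALLVTAGLVL
```

With such a derived sequence stored next to each entry, a plain sequence query can retrieve modified analogues as well as the parent peptide.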

27 BioEddie: Oxytocin from Sequence or MOL File

28 Biomolecule Toolkit and BioEddie
JChem and Marvin analogues for large molecule informatics

29 Biomolecule Toolkit
API (Java and RESTful) for:
  Native HELM support (HELM, HELM2, xHELM)
  Standardization
  Centralized DB storage
  Registration of entities and batches with custom business logic
  Search by sequence/chemical structure/metadata
  Conversion to/from Mol/FASTA/HELM
  Property calculations
Integrated in: InstantJChem, Texelia BioScity, IDBS E-Workbook

Main message: We provide a machine-readable format for the unambiguous description of biomolecules. We support fully and partially defined biomolecules. We have full metadata support, which is a lot more relevant for biomolecules than for small molecules.

Notes: You may be asked here whether we support random polymers. The Toolkit is not aimed at solving the description of random polymers (e.g. dendrimers etc.). Mention this here only if it is asked.

30 BioEddie JS application for all major browsers Easy editing
No-structure components Native support for MOL/HELM/sequence

31 BioEddie JS application for all major browsers Easy editing
No-structure components Native support for MOL/HELM/sequence Customizable views Multi-level annotations

32 BioEddie JS application for all major browsers Easy editing
No-structure components Native support for MOL/HELM/sequence Customizable views Multi-level annotations NEW: sequence domain support (Abs)

33 Supported entity types
General remark for the supported entity types section

General workflow
The user:
  defines known structural parts at the full chemical detail
  describes the unknown structural details to the best possible level using highly customizable attribute-value pairs (metadata)
  defines logical criteria on the treatment of these metadata for registration and searching
The Biomolecule Toolkit:
  enables the definition and registration of these data as defined by the user
  provides tools to visualize and search the data

Visualization of the data is possible at multiple levels: a schematic view of the entire entity, a schematic view of the entity as composed from building blocks, and a view of the full chemical structure of the entity (where applicable).

34 Nucleic acids
Nucleic acids with standard, non-standard or unnatural bases and backbone chain chemistries
Example: Mipomersen (Kynamro)

Scientific background: Mipomersen (the active ingredient in Kynamro) is an FDA-approved drug, administered by subcutaneous injection, that is used to treat homozygous familial hypercholesterolemia. It is a synthetic antisense oligonucleotide (ASO) targeting the apolipoprotein B mRNA, which leads to a reduction of cholesterol levels. The natural backbone chemistry of the nucleic acid is changed by replacing some of the sugars with 2'-O-methoxyethyl ribose and all phosphates with thiophosphate, rendering the molecule highly nuclease resistant. These changes increase the half-life of the nucleic acid in blood. Only the terminal five sugars at each end are modified to 2'-O-methoxyethyl ribose, while the rest of the sugars are left unchanged (gapmer antisense oligonucleotide design). Nucleic acids with all sugars replaced by 2'-O-methoxyethyl ribose are poor substrates for RNase H, the activity of which is needed for silencing the mRNA target.

35 Small peptides
Ribosomal, non-ribosomal or synthetic peptides with standard, post-translationally modified, non-standard or unnatural amino acids
Example: Goserelin (Zoladex)

Scientific background: Goserelin (the active ingredient in Zoladex) is an FDA-approved drug, given as a subcutaneous implant, for suppressing sex hormone production. It is used to inhibit the growth of sex hormone dependent cancers (breast, prostate etc.). It is also used to treat sexual aberrations.

Mechanism of action: goserelin is a gonadotropin-releasing hormone (GnRH) agonist. When constantly present it desensitizes the GnRH receptors, so that GnRH signaling is downregulated. This leads to reduced production and secretion of luteinizing hormone (LH) and follicle-stimulating hormone (FSH), leading to hypogonadism and thus a dramatic reduction of estradiol and testosterone levels in both sexes.

The fully synthetic peptide is a GnRH mimic. The peptide sequence is similar to a stretch of the Homo sapiens GnRH amino acid sequence. To increase its half-life in blood, the N-terminal glutamate is changed to pyroglutamate and the C-terminal glycine is replaced by azaglycine (peptides in the blood are primarily degraded by terminal peptidases, so blocking the ends increases stability). To increase the binding strength to the GnRH receptor, a glycine in the peptide chain is replaced by tert-butyl D-serine.

36 Large peptides (proteins)
Protein sequences including post-translationally modified residues, intrachain and interchain cross-links
Example: Trastuzumab (Herceptin), monoclonal antibody

Scientific background: Trastuzumab (the active ingredient in Herceptin, Genentech) is an FDA-approved drug for the treatment of HER2+ breast cancers. Approximately one in four breast cancers have increased HER2 expression. HER2 stands for human epidermal growth factor receptor 2. The drug is typically used in combination with traditional chemotherapies (doxorubicin, paclitaxel/docetaxel). Trastuzumab is a humanized antibody with specificity to HER2. Trastuzumab binding to HER2 leads to decreased signaling of HER2, which suppresses tumor growth. In addition, the antibody flags tumor cells for immune cell mediated destruction. The mechanism of action is not clearly known.

37 Conjugates
Molecule(s) bound with known chemistry to a known building block, but exact occupied binding site(s) unknown
Example: Ado-trastuzumab emtansine (Kadcyla), antibody drug conjugate
Schematic labels: K, Any, MCC, DM1, Lys, ADR=3.5

Scientific background: This FDA-approved ADC is trastuzumab (Herceptin) conjugated through a linker to DM1, a maytansinoid cytotoxic agent. Like the unconjugated version, it is used for the treatment of HER2+ cancers (mostly breast cancers). This is a use case for describing conjugation of small molecules to a single type of monomer at non-defined sites. The red bars on the schematic representation of the antibody denote the lysine residues found in the antibody. The epsilon amino groups of these lysines may be used semi-stochastically as conjugation sites. The bracket around the antibody denotes that a small molecule is known to be conjugated to the antibody, but its exact location is not determined.

38 Please LAUNCH: Poll Question 2

39 Agnostic registration

40 Agnostic registration
Input formats: MRV, SMILES, MDL Mol, HELM, FASTA
Perception engine
Registries: Compound Registration (small molecules), Biomolecule Registration (large molecules)

41 Agnostic registration
Input formats: MRV, SMILES, MDL Mol, HELM, FASTA
Perception engine steps on submission:
  Format identification
  Format validation
  Read to internal representation
  Perceive alternative representations
  Determine optimal storage
Registries: Compound Registration (small molecules), Biomolecule Registration (large molecules)
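The first perception-engine step listed above, format identification, could be sketched in Python as follows. This is a heuristic illustration only, not ChemAxon's implementation. The detection rules, the function names `identify_format` and `default_route`, and the default routing table are all assumptions. In particular, the later steps on the slide (perceiving alternative representations and determining optimal storage, for example recognizing a peptide submitted as a Mol file) are not attempted here.

```python
import re

# HELM simple polymers start with a typed, numbered ID such as PEPTIDE1{...}
HELM_PREFIX = re.compile(r"^(PEPTIDE|RNA|CHEM|BLOB)\d+\{")


def identify_format(text: str) -> str:
    """Guess the submission format from surface features (heuristic)."""
    stripped = text.strip()
    if HELM_PREFIX.match(stripped):
        return "HELM"
    if stripped.startswith(">"):
        return "FASTA"  # FASTA records begin with a '>' header line
    if stripped.startswith("<") and ("<MDocument" in stripped or "<cml" in stripped):
        return "MRV"  # assumed markers of a Marvin XML document
    if "M  END" in text or "V2000" in text or "V3000" in text:
        return "MDL Mol"
    if stripped and not any(ch.isspace() for ch in stripped):
        return "SMILES"  # fallback: a single token with no whitespace
    return "Other"


# Assumed default routing by format; a real engine would refine this
# after reading the structure itself.
DEFAULT_ROUTE = {
    "HELM": "Biomolecule Registration",
    "FASTA": "Biomolecule Registration",
    "MRV": "Compound Registration",
    "SMILES": "Compound Registration",
    "MDL Mol": "Compound Registration",
}


def default_route(text: str) -> str:
    """Return the registry a submission would go to by default."""
    return DEFAULT_ROUTE.get(identify_format(text), "'Other' Registration")


print(default_route("PEPTIDE1{G.Y.G.F}$PEPTIDE1,PEPTIDE1,4:R2-1:R1$$$"))
print(default_route("CC(=O)O"))
```

Routing on format alone is exactly what goes wrong in the Customer C scenario, where peptides drawn as chemical structures end up in the small-molecule registry. That is why the slide lists perception of alternative representations as a separate step after reading the input.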

42 Agnostic registration
Input formats: MRV, SMILES, MDL Mol, HELM, FASTA
Perception engine, ID Generator
Registries: Compound Registration (small molecules), Biomolecule Registration (large molecules)

43 Agnostic registration
Input formats: MRV, SMILES, MDL Mol, HELM, FASTA, Other
Perception engine, ID Generator
Registries: Compound Registration (small molecules), Biomolecule Registration (large molecules), 'Other' Registration, Next 'Other' Registration

45 Summary

46 Root problem
  Novel entities are 'massaged' into existing registration systems
  Bespoke registration systems are often siloed
  Human interpretation is often required but not objective
Cure
  Bespoke registration systems ensure performance, accuracy and consistency
  A perception engine helps to integrate registration and other workflows
  ChemAxon offers tools and services to implement them

47 Want to know more? Get in touch!
Roland Knispel

48 Q&A Please feel free to ask away!

