Computation in Biology Nagasuma Chandra Bioinformatics Centre & SERC IISc.

1 Computation in Biology Nagasuma Chandra Bioinformatics Centre & SERC IISc

2 Next-generation biologists must straddle computation and biology

3

4 Hierarchical structures in living systems: organism, organ, tissue, cell, organelle, supramolecular assembly, macromolecule

5

6 Genome sequence: a book of life (source: DOE-Genomes.org)

7 examplesfromenglishtext genomicbiologytakesaholisticapproachtomolecularbiologyandevolutionbystudyingthecompletegenomeitsgenesanditsproteinexpressionpatternsncbiprovidesseveralgenomicbiologytoolsandresourcesincludingorganismspecificpagesthatincludelinkstomanywebsitesanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinksprovidedonthispage.

8 examplesfromenglishtext genomicbiologytakesaholisticapproachtomolecularbiologyandevolutionbystudyingthecompletegenomeitsgenesanditsproteinexpressionpatternsncbiprovidesseveralgenomicbiologytoolsandresourcesincludingorganismspecificpagesthatincludelinkstomanywebsitesanddatabasesrelevanttothatspeciesweinviteyoutoexplorethelinksprovidedonthispage. Genomic biology takes a holistic approach to molecular biology and evolution by studying the complete genome, its genes, and its protein expression patterns. NCBI provides several genomic biology tools and resources, including organism-specific pages that include links to many web sites and databases relevant to that species. We invite you to explore the links provided on this page.
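
The unbroken text on this slide can be reproduced from the readable version by stripping case, spaces, and punctuation, which is roughly how a raw genome sequence looks before annotation. The sketch below is only an illustration of that analogy; the function name to_unbroken_stream is an assumption and does not come from the presentation.

```python
import string


def to_unbroken_stream(text: str) -> str:
    """Lowercase the text and keep only the letters a-z.

    Spaces, punctuation, digits, and hyphens are dropped, so word
    boundaries disappear just as gene boundaries are not marked in a
    raw genome sequence.
    """
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


if __name__ == "__main__":
    sentence = "Genomic biology takes a holistic approach to molecular biology."
    print(to_unbroken_stream(sentence))
```

Going in the other direction (recovering words from the stream) is the hard part, which is the point the slide makes about reading genomes.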

9 Molecular circuitry in the cell

10 Biochemical networks (source: www.expasy.ch)

11 Cellular networks. Characteristics of the yeast proteome: map of protein-protein interactions. H. Jeong, S.P. Mason, A.-L. Barabasi, Z.N. Oltvai, Nature, 411, 40-41 (2001).
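
A protein-protein interaction map is usually stored as a list of interacting pairs, and a common first analysis is to count how many partners each protein has (its degree). The following is a minimal sketch under stated assumptions: the protein names and the interaction list are hypothetical toy data, not values from the cited study, and the function names are invented for illustration.

```python
from collections import Counter

# Hypothetical toy interaction list; names are placeholders, not real data.
interactions = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P1", "P4"),
    ("P2", "P3"),
    ("P5", "P1"),
]


def degree_of_each_protein(edges):
    """Count the number of interaction partners for every protein."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


def degree_distribution(degree):
    """Map each degree value to the number of proteins having that degree."""
    return Counter(degree.values())


if __name__ == "__main__":
    degree = degree_of_each_protein(interactions)
    print("Most connected protein:", degree.most_common(1)[0])
    print("Degree distribution:", dict(degree_distribution(degree)))
```

In this toy network one protein (P1) has many more partners than the rest, which is the kind of pattern a degree count makes visible.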

12 Role of computation: data management; data analysis and interpretation; prediction; application

13 What you need: a model; a computational tool

14 Models: levels of modelling; abstraction level; hierarchy in living organisms

15 Abstraction level of the model

16 Molecular models: sequences; structures; genome sequences; the ‘omics’ era

17 Software tools: Accelrys, Tripos, MOE, BioSuite, Schrödinger, plus hundreds of academic software tools

18 What you can do in sequence space: determine the identity of the molecule; predict physicochemical properties; predict three-dimensional structure; predict function; apply in pharmaceutical and other industries

19 Examples: Accelrys GCG, MOE, BioSuite

20 Example usage

21 Examples of GCG capabilities: sequence comparison; database searching and retrieval; DNA/RNA secondary structure prediction; editing and publication; evolution; fragment assembly; gene finding and pattern recognition; sequence importing and exporting; mapping; primer selection; protein analysis

22 Single gene/protein sequence analysis: MOE. The colored bars over the sequences reflect the secondary structure of those sequences that have associated atomic coordinates. Chains with sequence-only data have no such bars. In this instance, seven of the chains in the family have structural data and can therefore be used as structural templates. The image illustrates the Residue Identity matrix in MOE, which shows that Chains 13 and 14 have the highest percent identity to the query sequence.
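
The percent identity reported in a residue identity matrix can be illustrated with a few lines of code. This is a minimal sketch, not MOE's implementation: it assumes the sequences are already aligned to equal length with "-" marking gaps, and it divides by the number of ungapped columns (other tools use different denominators). The sequences and names below are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(aligned: dict) -> dict:
    """Pairwise percent identity for every pair of named aligned sequences."""
    return {
        (m, n): percent_identity(aligned[m], aligned[n])
        for m in aligned
        for n in aligned
    }


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    aligned = {
        "query": "MKT-AYIA",
        "chain_a": "MKTGAYLA",
        "chain_b": "MRT-AYIA",
    }
    for (m, n), value in identity_matrix(aligned).items():
        print(f"{m:8s} {n:8s} {value:6.1f}")
```

Template selection for modelling typically starts from the rows of such a matrix with the highest identity to the query.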

23 Whole-genome sequence analysis: BioSuite

24 Structures: advantages of structural-level studies; the protein folding problem; the sequence-structure gap; the need to predict structure using computational methods; applications

25 Four levels of protein structure

26 Structures: advantages of structural-level studies; the protein folding problem; the sequence-structure gap; the need to predict structure using computational methods; applications

27 What you can do in structure space: visualize structures; build molecular models; manipulate; analyse; simulate molecular behaviour; apply in drug discovery

28 Visualization: Viewer module of InsightII. Screen elements: pulldowns, module icon, icon palette, command prompt, information area

29 Visualizations

30 Ligand-Protein Interaction

31 Aiding NMR structure determination

32 Aiding crystal structure determination: X-ray crystallography

33 Building molecular models: small molecules; proteins, nucleic acids, and carbohydrates; predicting protein structure (homology modelling, threading); modifications (site-directed mutants); protein-ligand complexes

34 BIOPOLYMER The Biopolymer module provides tools for building and modifying a wide range of biological macromolecules, including proteins, peptides, nucleic acids, and carbohydrates. It is useful in: building proteins and peptides; structural domain analysis; building carbohydrates; building nucleic acids; structural database searching. This module can in turn be used later by other programs for structure refinement and analysis of small and large molecules. Figure: backbone structure of the C-terminal fragment of an E. coli 50S ribosomal protein (in yellow), predicted from the carbon trace using the Protein/Backbone command of the Biopolymer module. The crystallographic backbone structure is shown superimposed in blue. The RMS deviation between corresponding backbone atoms of the two structures is 0.52 Å.
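
The RMS deviation quoted for the Biopolymer backbone prediction is the root of the mean squared distance between corresponding atoms. The sketch below shows only that arithmetic; it assumes the two structures are already superimposed and that atoms are listed in the same order, and it is not the code used by the Biopolymer module. The coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Made-up coordinates in Angstroms, for illustration only.
    predicted = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.2, 0.0)]
    crystal = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0), (3.0, 0.0, 0.3)]
    print(f"RMSD = {rmsd(predicted, crystal):.2f} A")
```

In practice an optimal superposition is computed first, since the RMSD depends on how the two structures are placed relative to each other.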

35 Manipulations, e.g., conformation tweaking. The following images are examples of this method of predicting conformations of a few long sidechains of PDB protein 1IC6.A. In each of the following figures, the native conformation is shown colored by element. In the left image, the predicted rotamer (the rotamer with the lowest deltaG) is shown in white. In the right image, all other rotamers generated by the conformational search are shown. Residues shown: ASP_187 and HIS_229.

36 MODELER MODELER uses a comparative modeling methodology to rapidly build structural models for protein sequences without a known structure. It derives 3D protein models without the time-consuming separate stages of core region identification and loop region building or searching that are inherent to manual homology modeling schemes. MODELER can create a model even with only one source protein. In this case, the structure of dihydrofolate reductase from Lactobacillus casei is used to generate a model for the E. coli protein. The model has a 2.2 Å RMS deviation from the crystal structure of the E. coli protein.

37 PROFILES-3D Profiles-3D offers a unique approach to structure prediction by measuring the compatibility between protein sequences and known protein structures, and then using this information to address the inverse protein folding problem. Profiles-3D enables you to investigate which particular fold an amino acid sequence is likely to adopt. Benefits: Profiles-3D can test the validity of a model or of preliminary structures derived from experimental data or modeling studies. Profiles-3D can suggest which 3D structure an amino acid sequence is likely to adopt by relating structural properties to amino acid sequence information. Reference template proteins identified by Profiles-3D can be used as input to the InsightII Homology and MODELER modules. Figure: the result of a “Profiles-3D Verify” run, showing a ribbon drawing of a model of myoglobin in which a single alpha-helix has been purposely misfolded. Profiles-3D has detected the misfolded region, and Insight II has automatically created the subset that was used to color the structure and ribbon.

38 MATCHMAKER MatchMaker uses an inverse-folding method to predict the 3D structure of a protein from its amino acid sequence. By comparing a new protein sequence to its topology fingerprint database, MatchMaker assesses the ability of a sequence to adopt characteristic topologies. Even in the absence of strong sequence similarity, MatchMaker generates high quality structural models. Figure: examples of MatchMaker output, including a histogram of sequence-structural compatibility (upper right), a sub-optimal alignment plot (upper left), an energy profile (middle left), and a prediction of structural elements (helix/beta strand, buried/exposed) for the input sequence.

39 Simulations: ‘Discover’

40 Analysis: protein characterization; protein comparison; sequence-structure-function relationships; active site detection; ligand binding mode analysis; electrostatic analysis

41 Structure analysis: quality check

42 PROTABLE ProTable is used to analyze and evaluate protein structures. ProTable creates Ramachandran plots, assesses deviation of local geometries and side chain rotameric states from standard protein values, and determines the energetics of each residue. Figure: results of a ProTable evaluation of a theoretical model of prostate-specific antigen (2PSA). MatchMaker energies reveal a loop (highlighted in green) that may require further refinement. Structures are colored by probability (purple and blue are low probability; orange and red are high probability). An automated Ramachandran analysis (right) identifies backbone torsions in borderline or disallowed regions.
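
A Ramachandran plot is built from two backbone torsion angles per residue: phi, defined by the atoms C(i-1), N, CA, C, and psi, defined by N, CA, C, N(i+1). The sketch below computes a torsion angle from four points. It is a generic geometric calculation, not ProTable's code; the helper names are assumptions, and the sign of the returned angle depends on the convention chosen in this sketch.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond for four 3D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points; for phi these would be C(i-1), N, CA, C.
    print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting phi against psi for every residue and comparing the points with the allowed regions gives the kind of quality check the slide describes.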

43 DELPHI DelPhi is a powerful and versatile Poisson-Boltzmann electrostatics simulation engine. DelPhi gives you the ability to determine the specificity of ligand-receptor interactions, which aids in accelerating drug discovery. DelPhi calculates electrostatic properties, including the effects of bulk solvent and ionic strength, for nucleic acids, polysaccharides, and complexes such as glycoproteins and protein/DNA. Figure: HIV protease rendered with an electrostatic contour surface, with a stick rendering of the drug inside the surface. Blue is positive charge, red is negative charge, and gray is neutral.

44 Applications: Drug Discovery

45 SITEID SiteID provides analysis and visualization tools leading to the identification of potential binding sites within or at the surface of biological targets. Applications: locate ligand binding pockets on a macromolecule; identify protein-protein interaction surfaces; identify constraints in a novel protein structure for 3D database searching to find or optimize lead compounds. Figure: the binding pocket of dihydrofolate reductase located by SiteID and shown as a MOLCAD surface. The red areas of the surface indicate contact atoms in the pocket, while the yellow areas show the residues in which those atoms are contained. The inhibitor (methotrexate) is shown in green.

46 STRUCTURE BASED DESIGN TOOLS Active site detection: MOE uses a fast geometric algorithm, based on Edelsbrunner’s alpha shapes, to detect candidate protein-ligand and protein-protein binding sites. Individual sites can be visualized or populated with “dummy atoms” for docking calculations or as starting points for de novo ligand design efforts. Figure: left, PDB 1AAQ (HIV-1 protease) and the first site located by the MOE Site Finder; middle, 1AAQ with the complexed ligand (hydroxyethylene isostere); right, the hydroxyethylene isostere overlaid with the calculated alpha spheres of the first site.

47 FLEXX FlexX rapidly docks a conformationally flexible ligand into a binding site, using an incremental construction algorithm that builds the ligand in the active site. FlexX is composed of four basic components: conformational flexibility; a set of possible protein-ligand interactions; a scoring function for the interactions; an algorithm for placement and incremental growth of the ligand from a defined core. Figure: a set of inhibitors docked into the active site of carboxypeptidase A by FlexX. The protein backbone and the active site surface were rendered using MOLCAD. The active site surface is color-coded by electrostatic potential.

48 RACHEL RACHEL performs automated combinatorial optimization of lead compounds by systematically derivatizing user-defined sites on the ligand. Applications: combinatorially enumerate user-defined sites on a lead scaffold to optimize binding within a receptor; bridge high-affinity ligand fragments positioned within the active site. Figure: the X-ray structure of N9 influenza virus neuraminidase (2QWK) shown with five ligands generated using RACHEL that are predicted to be active. Hydrogen bonds between the ligands and residues are indicated by dashed yellow lines. The surface was rendered using MOLCAD. Dark purple regions contain a greater acceptor/donor density, and light purple regions indicate areas where hydrogen bonding is less likely to occur.

49 HIGH THROUGHPUT DISCOVERY TOOLS HTS-QSAR: CCG’s Binary QSAR methodology is designed for building pass/fail models from data with high error content and standard molecular descriptors. The resulting probabilistic models (based on Bayesian statistical inference) are used as a biasing agent in the design of focused combinatorial libraries.

50 CHEMINFORMATICS TOOLS Molecular databases: the MOE Molecular Database is a disk-based spreadsheet central to the manipulation and visualization of large collections of compounds. Data can be imported and exported in various standard file formats and merged with structural or biological activity data. Panels: MOLECULAR DATABASE VIEWER, MOLECULAR DATABASE CALCULATOR.

51 SEARCH COMPARE Search Compare provides systematic conformational search and analysis, as well as superimposition and molecular similarity comparison. Figure: using Search Compare, two angiotensin II antagonists are flexibly superimposed based on field similarity (combined steric and electrostatic potentials).

52 UNITY Unity locates compounds in databases that match a pharmacophore or fit a receptor site. Applications: exploration of databases for compounds consistent with a pharmacophore hypothesis; lead explosion by retrieving similar compounds; virtual screening of compound databases to discover lead compounds; determining reagents in commercial databases that support combinatorial chemistry synthesis. Figure: a UNITY query constructed at the active site of the streptavidin/biotin complex (1STP). Yellow lines originate at hydrogen bonding sites of the protein (shown as spheres) and terminate within the spatial constraint for complementary ligand sites. A surface constraint at the protein/ligand interface is shown in green. The spatial cap in red accounts for a bifurcated interaction with an Asp carboxyl. Partial match groups are shown in different colors: red, yellow, or green.

53 CATALYST/SHAPE Catalyst/SHAPE identifies compounds that possess 3D shapes similar to a specified 3D conformation. FEATURES: performs flexible shape-based database searches; performs statistical analysis of shape indices of a particular database; simultaneously performs shape and pharmacophore searches via a merged query. Figure: the inhibitor methotrexate is displayed (left, hydrogens removed) in its conformation bound to the enzyme dihydrofolate reductase. On the right are 3D compounds retrieved from Derwent’s World Drug Index that best fit the shape of the bound conformation of methotrexate. This shape-based 3D search was performed with Accelrys’ Catalyst/SHAPE.

54 HypoGen Given only available experimental information, such as 2D structures and biological activities of a set of molecules, Catalyst can be used to generate general interaction hypotheses that explain variations in activity across that set. Figure: two 5-HT3 antagonists (green and yellow) mapped onto a six-feature hypothesis.

55 C2-LIGANDFIT C2.LigandFit provides active site finding, flexible docking, and scoring capabilities, allowing evaluation of compounds against a receptor site. Features: active site search by the flood filling method; fast conformational search for the ligand in the protein cavity; fast grid method for evaluation of protein-ligand interactions; clustering of docked conformers; multiple scoring functions. Figure: active site identification for HIV protease using the C2.LigandFit flood filling technique.
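
The flood filling idea named on this slide can be illustrated on a 3D grid: grid points not occupied by protein atoms are grouped into connected regions, and large enclosed regions become candidate sites. The sketch below is a generic breadth-first flood fill under stated assumptions (integer grid indices, 6-neighbour connectivity, free points supplied by the caller). It is not the C2.LigandFit implementation, and the function name is hypothetical.

```python
from collections import deque

# The six face-sharing neighbours of a grid point.
NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def flood_fill_regions(free_points):
    """Group free grid points into connected regions, largest region first.

    free_points: iterable of (i, j, k) integer grid indices that are not
    occupied by protein atoms.
    """
    unvisited = set(free_points)
    regions = []
    while unvisited:
        seed = unvisited.pop()
        region = {seed}
        queue = deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in NEIGHBOURS:
                point = (i + di, j + dj, k + dk)
                if point in unvisited:
                    unvisited.remove(point)
                    region.add(point)
                    queue.append(point)
        regions.append(region)
    regions.sort(key=len, reverse=True)
    return regions


if __name__ == "__main__":
    # Toy grid: one three-point pocket and one isolated free point.
    free = {(0, 0, 0), (1, 0, 0), (1, 1, 0), (5, 5, 5)}
    for region in flood_fill_regions(free):
        print(len(region), sorted(region))
```

A real site finder would also need to separate buried cavities from open solvent, which this sketch does not attempt.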

56 C2ADME TOOL C2ADME provides computational models for the prediction of absorption, distribution, metabolism, and excretion (ADME) properties derived from chemical structures. Features: C2ADME provides computational ADME/Tox prediction tools with the ability to predict problematic New Chemical Entities at an early stage of the development process; C2ADME currently includes models for passive intestinal absorption, blood-brain barrier (BBB) penetration, and aqueous solubility at 25°C. Figure: plot of Polar Surface Area (PSA) vs. LogP for a sample of the World Drug Index (WDI) database, showing the 95% and 99% confidence limit ellipses corresponding to the Absorption Model. The points are color coded by absorption level (Good, Moderate, Poor, and Very Poor).

57 In-built utilities: scripting (automation); session folders; log files

58 What you should remember: good computational practices. Other users are as important as yourself; do not use up licenses unduly. Preparation: evaluate the protocol and the choice of package, and follow job submission rules.

59 Access details: Insight/Catalyst/Cerius on SGI machines (base modules, several licenses); Tripos on SGI machines; MOE on Linux, Windows, and SGI; BioSuite on Linux

