The axiom: Databanks + New tools = New insights. SADIC (Simple Atom Depth Index Calculator); protein fold barcoding; CATH – ADAPT…

Presentation transcript:


Digging inside objects to discover their origins: the birth of the Earth, protein folding. SADIC: a new tool to analyze atom depth.

Atom depth, 2D (HEWL, PDB ID 4lzt)
Atom depth is calculated as the distance to one of:
the closest external water*
the closest dot of the water-accessible surface*
the closest surface-exposed atom*
* Chakravarty S, Varadarajan R. Residue depth: a novel parameter for the analysis of protein structure and stability. Structure Fold Des.
* Pintar A, Carugo O, Pongor S. Atom depth as a descriptor of the protein interior. Biophys J.
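The first of these distance-based definitions can be sketched in a few lines. This is only an illustration of "distance to the closest external water", not the implementation used in the cited papers; the function name and the coordinates are hypothetical.

```python
import math

def depth_to_nearest_water(atom_xyz, water_xyzs):
    """Distance-based (2D) atom depth: distance from one atom to the
    closest water position. Coordinates are (x, y, z) tuples in angstroms."""
    return min(math.dist(atom_xyz, w) for w in water_xyzs)

# Hypothetical coordinates, for illustration only.
atom = (0.0, 0.0, 0.0)
waters = [(3.0, 4.0, 0.0), (0.0, 0.0, 10.0)]
depth = depth_to_nearest_water(atom, waters)
print(depth)
```

A depth defined this way depends only on the single nearest reference point, which is why it is contrasted with the volume-based index in the following slides.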

Atom depth, 2D vs 3D (HEWL, PDB ID 4lzt): calculation of exposed volumes.
Daniele Varrazzo, Andrea Bernini, Ottavia Spiga, Arianna Ciutti, Stefano Chiellini, Vincenzo Venditti, Luisa Bracci and Neri Niccolai. Three-dimensional Computation of Atomic Depth in Complex Molecular Structures. Bioinformatics.

Atom depth, 3D (HEWL, PDB ID 4lzt): calculation of exposed volumes.

Atom depth, 3D (HEWL, PDB ID 4lzt): calculation of exposed volumes.
Depth index: $D_{i,r} = 2V_{i,r} / V_{0,r}$, where $V_{i,r}$ is the exposed volume of a sphere of radius $r$ centered on atom $i$ of the molecule and $V_{0,r}$ is the exposed volume of the same sphere when centered on an isolated atom.
The sphere radius $r$ should be the largest value for which $V_{i,r} = 0$ for the most buried atom.
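The depth index can be illustrated with a small Monte Carlo sketch. It assumes that "exposed volume" means the part of the probe sphere not occupied by any atom sphere, and it estimates that volume by random sampling; SADIC itself may compute volumes differently. The function names, the toy atom grid and the radii are hypothetical.

```python
import math
import random

def exposed_volume(center, atoms, r, n_samples=4000, seed=0):
    """Estimate the volume of the sphere (center, r) that lies outside every
    atom sphere in `atoms`, given as (x, y, z, radius) tuples."""
    rng = random.Random(seed)
    cx, cy, cz = center
    inside = free = 0
    while inside < n_samples:
        # Rejection sampling: uniform points in the cube, kept if in the sphere.
        dx, dy, dz = (rng.uniform(-r, r) for _ in range(3))
        if dx * dx + dy * dy + dz * dz > r * r:
            continue
        inside += 1
        px, py, pz = cx + dx, cy + dy, cz + dz
        if all((px - ax) ** 2 + (py - ay) ** 2 + (pz - az) ** 2 > ar * ar
               for ax, ay, az, ar in atoms):
            free += 1
    return 4.0 / 3.0 * math.pi * r ** 3 * free / inside

def depth_index(i, atoms, r, n_samples=4000, seed=0):
    """D_{i,r} = 2 V_{i,r} / V_{0,r}; r must exceed the atom's own radius."""
    center = atoms[i][:3]
    v_i = exposed_volume(center, atoms, r, n_samples, seed)       # atom in the molecule
    v_0 = exposed_volume(center, [atoms[i]], r, n_samples, seed)  # same atom, isolated
    return 2.0 * v_i / v_0

# Hypothetical toy "molecule": a 5 x 5 x 5 grid of overlapping atom spheres.
SPACING, RADIUS = 1.5, 1.7
grid = [(x * SPACING, y * SPACING, z * SPACING, RADIUS)
        for x in range(5) for y in range(5) for z in range(5)]
center_idx = grid.index((3.0, 3.0, 3.0, RADIUS))  # most buried atom
corner_idx = 0                                    # atom at a corner of the grid

print(depth_index(center_idx, grid, 3.0))
print(depth_index(corner_idx, grid, 3.0))
```

With this definition an isolated atom scores 2 and a fully buried atom scores 0, so smaller values mean deeper atoms. Using the same random seed for both volumes keeps the ratio free of sampling noise between the two estimates.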

Atom depth, 3D vs 2D (HEWL, PDB ID 4lzt)
Thr 47 α carbon: $D_{i,9} = 1.59$
Ile 58 α carbon: $D_{i,9} = 0.13$
Trp 28 α carbon: $D_{i,9} =$

3D atom depth analysis ($D_i$) from PDB ID 1UBQ

SBL Bioinformatics Projects
Projects correlated with SADIC: 1. fold-dependent aa compositions of protein cores; 2. towards i-SADIC
Projects not correlated with SADIC: 1. systematic analysis of PPI

$D_i$ analysis of protein atoms: defining structural layers in protein 3D structures. Each structural layer includes atoms with similar $D_i$ values, giving a fast and accurate analysis of the aa content of structural layers.

3VTR (chitinolytic enzyme, 572 aa): $D_i$ analysis of protein atoms

3D atom depth analysis: $D_i$ per atom and $D_{i,\max}$, from PDB ID 1UBQ
K63: N 0.19, CA 0.30, C 0.25, O 0.23, CB 0.50, CG 0.68, CD 0.91, CE 1.11, NZ 1.29
E24: N 0.38, CA 0.52, C 0.50, O 0.52, CB 0.76, CG 0.95, CD 1.17, OE, OE
L43: N 0.10, CA 0.05, C 0.11, O 0.18, CB 0.02, CG 0.02, CD, CD

$D_{i,\max}$ analysis of protein residues: defining aa occupancy in protein structural layers. Each structural layer includes residues with similar $D_{i,\max}$ values, giving a fast and accurate analysis of the aa distribution in protein structures.

$D_{i,\max}$ analysis of protein singles: quite a few proteins like to stay single (at least in the crystalline state). Bioinformatiha 2, Firenze, 18 October

A database of protein singles
Selection steps, with the number of entries at each step:
Experimental method: X-RAY (79,770)
Chain type: Protein (74,456)
Only 1 chain in asym. unit (28,803)
Oligomeric state: 1 (21,193)
Number of entities: 1 (3,517)
Homologue 95% identity (2,410)
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues

A database of protein singles
DOOPS: 2,410 proteins in the dataset; 4,657,574 atoms; 589,383 residues
Swiss-Prot: 540,958 proteins in the dataset (192 Maa)

$D_{i,\max}$ analysis of protein cores: the first quantitative analysis of a large array of protein cores!
DOOPS: 2,410 proteins; 4,657,574 atoms; 589,383 residues
Core aa if $D_{i,\max} < 0.2$ (layer $L_0$); ~20% of total molecular volume
$\sum_{\mathrm{DOOPS}} \mathrm{aa}(L_0) = 106{,}088$ (from 2,410 proteins)
Calculation of % amino acid content in $L_0$
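The core criterion can be sketched as follows. The sketch assumes that $D_{i,\max}$ is the largest $D_i$ among the atoms of a residue, which is how the name reads but is not spelled out in the slides. The 0.2 threshold is the one stated above; the function names are hypothetical, and apart from the K63 values from the 1UBQ slide the toy residues are invented for illustration.

```python
from collections import Counter

CORE_THRESHOLD = 0.2  # from the slide: core aa if D_imax < 0.2

def residue_dimax(atom_depths):
    """Map (residue number, residue name) to the largest D_i among its atoms.
    atom_depths: iterable of (res_number, res_name, atom_name, D_i)."""
    dimax = {}
    for res_number, res_name, _atom_name, d in atom_depths:
        key = (res_number, res_name)
        dimax[key] = max(d, dimax.get(key, d))
    return dimax

def core_composition(dimax, threshold=CORE_THRESHOLD):
    """Percent amino acid content among residues with D_imax below threshold."""
    core = Counter(name for (_num, name), d in dimax.items() if d < threshold)
    total = sum(core.values())
    return {name: 100.0 * n / total for name, n in core.items()} if total else {}

atoms = [
    # K63 of 1UBQ, values as listed in the transcript.
    (63, "LYS", "N", 0.19), (63, "LYS", "CA", 0.30), (63, "LYS", "C", 0.25),
    (63, "LYS", "O", 0.23), (63, "LYS", "CB", 0.50), (63, "LYS", "CG", 0.68),
    (63, "LYS", "CD", 0.91), (63, "LYS", "CE", 1.11), (63, "LYS", "NZ", 1.29),
    # Hypothetical residues, invented for this example.
    (1, "LEU", "CA", 0.05), (1, "LEU", "CB", 0.12),
    (2, "VAL", "CA", 0.10), (2, "VAL", "CB", 0.15),
    (3, "LEU", "CA", 0.08), (3, "LEU", "CD1", 0.19),
    (4, "SER", "CA", 0.15), (4, "SER", "OG", 0.45),
]

dimax = residue_dimax(atoms)
composition = core_composition(dimax)
print(dimax[(63, "LYS")])
print(composition)
```

Taking the maximum over a residue's atoms means a residue counts as core only when every one of its atoms is buried, so a long side chain that reaches the surface, as in K63, keeps the residue out of $L_0$.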

$D_i$ analysis of protein cores: folding clues from aa core composition?
CATH classification (columns: Class, Architectures, Topology, Homologous superfamily, Domains)
1 (mainly α): domains …,038
2 (mainly β): domains …,881
3 (α & β): domains …,029
4 (few sec. str.): domains …,588
Total: domains …,536

$D_i$ analysis of protein cores: folding clues from aa core composition?
DOOPS + CATH: selected architectures with ≥ 10 PDB files
# domain, total proteins (mono), per architecture: 213 (84); 84 (40); 19 (17); 10 (3); 17 (13); 57 (37); 94 (73); 134 (110); 12 (12); 84 (73); 52 (44); 139 (106); (8); 49 (49)
Total: 1,190 (872)

Towards protein folding barcodes
CATH-ADAPT: CATH - atom depth assisted protein tomography
Panels: Cys, PDB ID 1UZK(A01), ribbon; Leu and Phe, PDB ID 1RG8(A00), trefoil; Val, PDB ID 2IMH(A01), four layer sandwich; Ala, PDB ID 3CKC(A02), alpha horseshoe
Plot legend: aa %; average value (av); av + σ; av + 2σ; av - σ; av - 2σ
$D_i$ of 173,536 CATH domains: 28 h, 5’ (average comp. time 1.72 s/domain). Calculations performed on a 6-core 990X CPU based computer.
Table header: Class, Architectures, Topology, Homologous superfamily, Total; % L overall
aa % per selected architecture (decimal commas; values run together in the transcript):
ALA 13,2810,3221,4612,749,2610,058,439,325,510,6910,0812,5811,8814,9512,
ARG 0,61,280,241,3900,641,720,7500,551,111,750,30,470,
ASN 0,672,620,732,771,852,041,771,3602,12,90,961,522,82,
ASP 1,612,620,242,911,231,272,031,7902,12,93,021,772,340,
CYS 3,352,995,370,8322,842,041,464,420,922,832,11,491,861,43,
GLN 0,61,50,241,111,231,151,811,6900,461,562,150,991,41,
GLU 1,481,440,731,5201,151,191,0400,912,592,411,080,930,
GLY 8,058,729,7613,8516,059,9216,210,829,178,7811,8111,3512,6413,089,
HIS 1,011,62,441,110,620,760,790,5602,651,963,021,910,472,
ILE 12,689,9510,738,596,7913,6110,6810,7813,7612,811,7712,5311,537,0111,
LEU 23,8818,3422,4411,778,0217,1812,9713,9833,9416,5411,914,3314,2215,4213,
LYS 0,670,9101,1100,380,490,5600,090,621,360,5500,
MET 2,624,171,714,9902,82,653,151,832,932,762,412,393,271,
PHE 6,446,792,934,574,327,127,066,7315,67,224,956,186,074,216,
PRO 1,342,463,412,633,093,3132,7803,292,91,842,251,41,
SER 3,494,553,665,963,095,345,565,132,752,835,354,434,236,075,
THR 2,284,814,157,25,563,315,124,470,923,25,224,254,945,145,
TRP 1,011,5502,773,70,381,632,782,752,191,520,661,260,472,
TYR 2,623,690,244,572,471,272,694,380,923,293,121,582,3202,
VAL 12,349,689,517,629,8816,2812,7513,5111,9314,5312,8811,716,2919,1615,
# PDB: 213 (84); 84 (40); 19 (17); 10 (3); 17 (13); 57 (37); 94 (73); 134 (110); 12 (12); 84 (73); 52 (44); 139 (106); (8); 49 (49); total 2,410
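One way to read the av ± σ and av ± 2σ legend is as a banding scheme: for each amino acid, compare its core percentage in one architecture with the average over all architectures and record how many standard deviations away it lies. The sketch below follows that reading. The integer bands, the use of the population standard deviation, the function names and all numbers are assumptions made for illustration, not the CATH-ADAPT procedure or its data.

```python
from statistics import mean, pstdev

def barcode_band(value, av, sigma):
    """0 inside av ± σ, ±1 between σ and 2σ, ±2 at or beyond 2σ."""
    if sigma == 0:
        return 0
    z = (value - av) / sigma
    if z >= 2:
        return 2
    if z >= 1:
        return 1
    if z <= -2:
        return -2
    if z <= -1:
        return -1
    return 0

def barcodes(core_percent):
    """core_percent: {architecture: {aa: percent in the core}}.
    Returns {architecture: {aa: band}} using per-aa av and σ over architectures."""
    aas = sorted({aa for comp in core_percent.values() for aa in comp})
    stats = {}
    for aa in aas:
        values = [comp.get(aa, 0.0) for comp in core_percent.values()]
        stats[aa] = (mean(values), pstdev(values))
    return {
        arch: {aa: barcode_band(comp.get(aa, 0.0), *stats[aa]) for aa in aas}
        for arch, comp in core_percent.items()
    }

# Hypothetical core compositions (percent) for six made-up architectures.
toy = {
    "arch_A": {"LEU": 10.0, "CYS": 2.0},
    "arch_B": {"LEU": 10.0, "CYS": 2.0},
    "arch_C": {"LEU": 10.0, "CYS": 2.0},
    "arch_D": {"LEU": 10.0, "CYS": 2.0},
    "arch_E": {"LEU": 10.0, "CYS": 2.0},
    "arch_F": {"LEU": 40.0, "CYS": 2.0},
}
codes = barcodes(toy)
print(codes["arch_F"])
print(codes["arch_A"])
```

In this toy set only arch_F stands out, with leucine more than 2σ above the average, which is the kind of signal the highlighted amino acids in the slide panels appear to mark.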

Towards protein folding barcodes Putting the protein universe in order


towards i-SADIC (implemented SADIC)

towards i-SADIC (implemented SADIC) H/D exchange rate profiles


H/D exchange rate profiles: 2D atom depth or 3D atom depth
$dnw_i$: distance of atom $i$ from the nearest water molecule (2D atom depth)
$D_{i,9}$: atom depth index with a probe of radius 9 Å (3D atom depth)
Data from Pedersen TG, Thomsen NK, Andersen KV, Madsen JC, Poulsen FM. Determination of the rate constants k1 and k2 of the Linderstrom-Lang model for protein amide hydrogen exchange. A study of the individual amides in hen egg-white lysozyme. J Mol Biol (2).

iSADIC atom depth vs 3D atom depth: H/D exchange rate profiles
$D_{i,9}$: atom depth index with a probe of radius 9 Å
$iD_{i,9} = \dfrac{a\,D_{i,9} + b\,ASA_i}{c\,D_{i,9} + d\,dnw_i}$


protein-protein interface analysis biological vs crystallographic interfaces

crystallographic dimers biological dimers

ARG atoms: N, CA, C, O, CB, CG, CD, NE, CZ, NH1, NH2, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE, HH11, HH12, HH21, HH22
LYS atoms: N, CA, C, O, CB, CG, CD, CE, NZ, H, HA, HB2, HB3, HG2, HG3, HD2, HD3, HE2, HE3, HZ1, HZ2, HZ3