Prediction of protein function from sequence analysis

Prediction of protein function from sequence analysis Rita Casadio BIOCOMPUTING GROUP University of Bologna, Italy

The “omic” era
Genome Sequencing Projects:
Archaea: 74 species; in progress: 52
Bacteria: 973 species; in progress: 2266 species
Eukaryotic: complete: 23; draft assembly: 318; in progress: 359
http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html
Update: January 2010

The Data Bases of Biological Sequences and Structures
GenBank: 108,431,692 sequences; 106,533,156,756 nucleotides
35,5 HGE!
NR(*): 10,381,779 sequences; 3,542,056,219 residues
SwissProt: 514,212 sequences; 180,900,945 residues
PDB: 60,654 structures; membrane proteins <2%
Update: January 2009
(*) CDS translations+PDB+SwissProt+PIR+PRF
Example entry:
>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
SKQDVTEVEINENELKRLDEYANKDALNHYREIFKDLKSRGLYFILNMYH
WPLPLWLHDPIRVRRGDFTGPSGWLSTRTVYEFARFSAYIAWKFDDLVDE
YSTMNEPNVVGGLGYVGVKSGFPPGYLSFELSRRHMYNIIQAHARAYDGI
KSVSKKPVGIIYANSSFQPLTDKDMEAVEMAENDNRWWFFDAIIRGEITR
GNEKIVRDDLKGRLDWIGVNYYTRTVVKRTEKGYVSLGGYGHGCERNSVS
LAGLPTSDFGWEFFPEGLYDVLTKYWNRYHLYMYVTENGIADDADYQRPY
YLVSHVYQVHRAINSGADVRGYLHWSLADNYEWASGFSMRFGLLKVDYNT
KRLYWRPSALVYREIATNGAITDEIEHLNSVPPVKPLRH
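The BGAL_SULSO entry on this slide is written in FASTA format: a header line starting with ">" followed by the residues. As a minimal sketch of how such an entry can be read programmatically, the following Python snippet parses FASTA text into (header, sequence) pairs. The function name `parse_fasta` is an illustrative choice, not part of the presentation, and only the first two lines of the slide's sequence are used to keep the example short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# First two sequence lines of the BGAL_SULSO entry shown on the slide
example = """>BGAL_SULSO BETA-GALACTOSIDASE Sulfolobus solfataricus.
MYSFPNSFRFGWSQAGFQSEMGTPGSEDPNTDWYKWVHDPENMAAGLVSG
DLPENGPGYWGNYKTFHDNAQKMGLKIARLNVEWSRIFPNPLPRPQNFDE
"""

records = parse_fasta(example)
header, sequence = records[0]
entry_name = header.split()[0]
print(entry_name, len(sequence), "residues read")
```

For real work an established parser (for example the one in Biopython) is usually preferable; the sketch only shows how little structure the format has.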

From Genotype to Phenotype
>protein kinase acctgttgatggcgacagggactgtatgctgatctatgctgatgcatgcatgctgactactgatgtgggggctattgacttgatgtctatc....
Genes in DNA... (about 30,000 in the human genome)
…code for proteins...
…proteins correspond to functions...
…when they are expressed
From 5000 to 10000 proteins per tissue
Proteins interact
…in metabolic pathways
…with different effects depending on variability
Over 20 million single mutations are known in genes

The Human Interactome in STRING
22,937 proteins and 1,482,533 interactions
http://string.embl.de
STRING 8 - a global view on proteins and their functional interactions in 630 organisms. Jensen et al., 2009, Nucleic Acids Research, Vol 37.

One problem of the “omic era”: Protein functional annotation

The Protein Data Bank
No of Proteins with known structure: 57529
http://www.rcsb.org/pdb/home/home.do

SCOP: Structural Classification of Proteins
Domains are hierarchically classified:
class
fold: proteins with secondary structures in the same arrangement with the same topological connections
superfamily: structures and functional features suggest a common evolutionary origin
family: proteins with identities ≥30%; or with identities <30% but with similar structures and functions

From the Protein Sequence to the Structure and Function space Lesk A., 2004

From the Protein Sequence to the Structure space
[Figure: PDB; axis: Sequence Identity (%), marked at 0%, 30%, 100%; labels: Sequence comparison; Fold recognition; Threading; Machine-learning aided alignment; New Folds; Ab initio and de novo modelling; Machine-learning prediction of structural features]

From the Protein Sequence to the Structure and Function space: What is protein function?

What is a function? For enzymes: function can be defined on the basis of the catalysed molecular reaction. e.g. aspartic aminotransferase (AST)

In biochemistry, a transaminase or an aminotransferase is an enzyme that catalyzes a type of reaction between an amino acid and an α-keto acid. Specifically, this reaction (transamination) involves removing the amino group from the amino acid, leaving behind an α-keto acid, and transferring it to the reactant α-keto acid, converting it into an amino acid. The enzymes are important in the production of various amino acids, and measuring the concentrations of various transaminases in the blood is important in diagnosing and tracking many diseases. Transaminases require the coenzyme pyridoxal-phosphate, which is converted into pyridoxamine in the first phase of the reaction, when an amino acid is converted into a keto acid. Enzyme-bound pyridoxamine in turn reacts with pyruvate, oxaloacetate, or alpha-ketoglutarate, giving alanine, aspartic acid, or glutamic acid, respectively. The presence of elevated transaminases can be an indicator of liver damage.

Enzyme Commission (E.C.) classification A hierarchical classification for enzymes

EC 2.6 Transferring nitrogenous groups
EC 2.6.1 Transaminases
EC 2.6.1.1 Aspartate transaminase
Other name(s): glutamic-oxaloacetic transaminase; glutamic-aspartic transaminase; transaminase A; AAT; AspT; 2-oxoglutarate-glutamate aminotransferase; aspartate α-ketoglutarate transaminase; aspartate aminotransferase; aspartate-2-oxoglutarate transaminase; aspartic acid aminotransferase; aspartic aminotransferase; aspartyl aminotransferase; AST; glutamate-oxalacetate aminotransferase; glutamate-oxalate transaminase; glutamic-aspartic aminotransferase; glutamic-oxalacetic transaminase; glutamic oxalic transaminase; GOT (enzyme); L-aspartate transaminase; L-aspartate-α-ketoglutarate transaminase; L-aspartate-2-ketoglutarate aminotransferase; L-aspartate-2-oxoglutarate aminotransferase; L-aspartate-2-oxoglutarate-transaminase; L-aspartic aminotransferase; oxaloacetate-aspartate aminotransferase; oxaloacetate transferase; aspartate:2-oxoglutarate aminotransferase; glutamate oxaloacetate transaminase
Systematic name: L-aspartate:2-oxoglutarate aminotransferase
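The hierarchy on this slide (EC 2.6, then EC 2.6.1, then EC 2.6.1.1) can be read directly off the dotted number. As a rough sketch, the following Python snippet splits an EC number into its up-to-four levels and checks whether one entry falls under another. The names `ECNumber`, `parse_ec` and `is_within` are illustrative and not taken from the presentation; partial numbers written with dashes (such as "2.6.1.-") are not handled.

```python
from typing import NamedTuple, Optional


class ECNumber(NamedTuple):
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(ec: str) -> ECNumber:
    """Parse 'EC 2.6.1.1' or '2.6' into its hierarchical levels."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"not an EC number: {ec!r}")
    levels = [int(p) for p in parts]
    # Pad missing (less specific) levels with None
    levels += [None] * (4 - len(levels))
    return ECNumber(*levels)


def is_within(child: str, parent: str) -> bool:
    """True if `child` is the same as, or more specific than, `parent`."""
    for child_level, parent_level in zip(parse_ec(child), parse_ec(parent)):
        if parent_level is None:
            return True  # parent is unspecified from this level on
        if child_level != parent_level:
            return False
    return True


ast = parse_ec("EC 2.6.1.1")
print(ast)
print(is_within("EC 2.6.1.1", "EC 2.6"))
```

Here `is_within("EC 2.6.1.1", "EC 2.6")` asks whether aspartate transaminase belongs to the group of enzymes transferring nitrogenous groups.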

Problems:
Isoforms, e.g. how to differentiate the function of the cytoplasmic aspartate aminotransferase from that of the mitochondrial isoform?
Non-enzymatic proteins

GO function vocabulary: http://www.geneontology.org/
The Ontologies:
Cellular component
Biological process
Molecular function

Gene Ontology classification: The human cytoplasmic aspartate transaminase GO:0004069 GO:0005829 GO:0006533
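The three identifiers on this slide can be grouped by the three ontologies named on the previous one. The Python sketch below does that for the human cytoplasmic aspartate transaminase. Note the assumptions: the slide gives only the GO identifiers, so the term names and the assignment of each identifier to an ontology are filled in here from the public GO vocabulary and should be checked against http://www.geneontology.org/. The function name `group_by_ontology` is illustrative.

```python
import re

# GO identifiers have the form "GO:" followed by seven digits
GO_ID = re.compile(r"^GO:\d{7}$")

# Identifiers from the slide. Names and ontologies are NOT on the slide;
# they are assumptions to be verified against the GO website.
annotations = {
    "GO:0004069": ("molecular_function",
                   "L-aspartate:2-oxoglutarate aminotransferase activity"),
    "GO:0005829": ("cellular_component", "cytosol"),
    "GO:0006533": ("biological_process", "aspartate catabolic process"),
}


def group_by_ontology(terms):
    """Group {go_id: (ontology, name)} into {ontology: [(go_id, name), ...]}."""
    grouped = {}
    for go_id, (ontology, name) in terms.items():
        if not GO_ID.match(go_id):
            raise ValueError(f"malformed GO identifier: {go_id!r}")
        grouped.setdefault(ontology, []).append((go_id, name))
    return grouped


grouped = group_by_ontology(annotations)
for ontology, terms in grouped.items():
    for go_id, name in terms:
        print(f"{ontology:20s} {go_id}  {name}")
```

One term per ontology is the simplest possible case; a real UniProt entry typically carries several terms in each.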

One BIG problem of the “omic era”: Protein functional annotation

Functional annotation in silico by homology search
ADH1_SULSO ----------MRAVRLVEIGKP--LSLQEIGVPKPKGPQVLIKVEAAGVCHSDVHMRQGRFGNLRIVE
ADH_CLOBE  ----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------
ADH_THEBR  ----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------
ADH1_SOLTU MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH2_LYCES MSTTVGQVIRCKAAVAWEAGKP--LVMEEVDVAPPQKMEVRLKILYTSLCHTDVYFWEAKG-------
ADH1_ASPFL ----MSIPEMQWAQVAEQKGGP--LIYKQIPVPKPGPDEILVKVRYSGVCHTDLHALKGDW-------
Sequence comparison is performed with alignment programs
Sequence identity  40 %
Similar structure and function (??)
Methods for similarity searches:
BLAST, Psi-BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) sequence
Altschul et al., (1990) J Mol Biol 215:403-410
Altschul et al., (1997) Nucleic Acids Res. 25:3389-3402
Pfam (http://pfam.wustl.edu/hmmsearch.shtml) sequence/structure
Bateman et al., (2000) Nucleic Acids Research 28:263-266
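The sequence identity referred to on this slide can be computed directly from two rows of an alignment. The Python sketch below does this for the ADH_CLOBE and ADH_THEBR rows shown above. It rests on one assumption that the slide does not state: identity is taken as identical residues divided by the number of columns where neither row has a gap. Other denominators (alignment length, shorter sequence length) are also in common use and give different values. The function name `percent_identity` is illustrative.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity between two rows of the same alignment.

    Assumption: columns where either row has a gap are ignored, so the
    denominator is the number of residue-vs-residue columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Two rows copied from the alignment block on the slide
adh_clobe = "----------MKGFAMLGINKLG---WIEKERPVAGSYDAIVRPLAVSPCTSDIHTVFEGA-------"
adh_thebr = "----------MKGFAMLSIGKVG---WIEKEKPAPGPFDAIVRPLAVAPCTSDIHTVFEGA-------"

identity = percent_identity(adh_clobe, adh_thebr)
print(f"ADH_CLOBE vs ADH_THEBR: {identity:.1f}% identity over the aligned block")
```

The value only describes the short N-terminal block shown on the slide, not the full-length proteins, so it should not be compared directly with a whole-sequence threshold.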

Transfer by inheritance: Function annotation transfer from sequence through homology

http://www.uniprot.org/

PDB The annotation process at UniProt

Open problems of “inheritance through homology”
Not all UniProt files are GO annotated
The optimal threshold value of sequence identity for function transfer is not known
Proteins contain multiple domains
Proteins can share common domains and not necessarily the same function
In proteins, different combinations of shared domains lead to different biological roles
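The open problems listed on this slide can be read as conditions under which a naive annotation transfer should be refused. The Python sketch below makes that concrete. Everything in it is hypothetical: the `Protein` record, the placeholder domain names, the default threshold of 40% (borrowed from the earlier homology slide, which itself marks it with "??"), and the rule that the whole domain architecture must match. It is an illustration of the caveats, not the method used by UniProt or by the author.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    name: str
    domains: tuple               # ordered domain architecture (hypothetical names)
    go_terms: frozenset = frozenset()


def transfer_annotation(query, hit, identity, threshold=40.0):
    """Return (go_terms, reason) for a candidate homology-based transfer.

    The threshold is an assumption; the slide stresses that the optimal
    value is not known.
    """
    if identity < threshold:
        return frozenset(), "identity below threshold"
    if not hit.go_terms:
        return frozenset(), "hit has no GO annotation"
    if query.domains != hit.domains:
        # Shared domains alone do not imply shared function
        return frozenset(), "domain architectures differ"
    return hit.go_terms, "transferred"


# Hypothetical proteins with placeholder domain names
annotated = Protein("hit", ("domain_A", "domain_B"), frozenset({"GO:0004069"}))
same_arch = Protein("query1", ("domain_A", "domain_B"))
other_arch = Protein("query2", ("domain_A", "domain_C"))

print(transfer_annotation(same_arch, annotated, identity=62.0))
print(transfer_annotation(other_arch, annotated, identity=62.0))
print(transfer_annotation(same_arch, annotated, identity=25.0))
```

Requiring an identical architecture is deliberately strict; a softer rule would have to decide which domains carry the function, which is exactly the difficulty the slide points to.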