1 Bio-Trac 25 (Proteomics: Principles and Methods) October 3, 2008 Zhang-Zhi Hu, M.D. Research Associate Professor Protein Information Resource, Department.

1 Bio-Trac 25 (Proteomics: Principles and Methods), October 3, 2008. Tutorial: Bioinformatics Resources. Zhang-Zhi Hu, M.D., Research Associate Professor, Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology, Georgetown University Medical Center

2 What is Bioinformatics? computer (information) + mouse (biology) = bioinformatics. NIH Biomedical Information Science and Technology Initiative (BISTI) working definition (2000): research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

3 Molecular Biology Database Collection (ent/full/36/suppl_1/D2): key databases in 14 categories

4 Database Collection in Nucleic Acids Res.

5 Online Access to Database Collection

6 Overview: Database Contents, Search and Retrieval
I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases
Lab sessions

7 Entrez Text Searches: integrated one-stop search (Lab)

8 PubMed Literature Database (Lab): literature mining; PMID:

9 iProLINK: Protein Literature Mining Resource (Lab). RLIMS-P: text mining for protein phosphorylation. BioThesaurus: gene/protein name thesaurus (synonyms, ambiguous names…).

10 BioThesaurus: gene/protein name searches for synonyms, ambiguous names… (Lab). Synonyms of CRYAA: crystallin, alpha A; CRYA1; HSPB4…
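The name-lookup idea behind a thesaurus such as BioThesaurus can be illustrated with a small inverted index from names to gene symbols. This is a minimal sketch, not BioThesaurus's implementation: the CRYAA synonyms come from the slide, while `GENE_A`, `GENE_B` and "shared name" are hypothetical entries added only to show how an ambiguous name (one name, several genes) surfaces.

```python
from collections import defaultdict

def build_index(thesaurus: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lower-cased name (including the symbol itself) to the symbols it may refer to."""
    index: dict[str, set[str]] = defaultdict(set)
    for symbol, names in thesaurus.items():
        for name in [symbol, *names]:
            index[name.lower()].add(symbol)
    return dict(index)

def lookup(index: dict[str, set[str]], query: str) -> list[str]:
    """Return matching symbols; more than one result means the name is ambiguous."""
    return sorted(index.get(query.lower(), set()))

THESAURUS = {
    "CRYAA": ["crystallin, alpha A", "CRYA1", "HSPB4"],  # synonyms from the slide
    "GENE_A": ["shared name"],  # hypothetical entry
    "GENE_B": ["shared name"],  # hypothetical entry
}

index = build_index(THESAURUS)
print(lookup(index, "hspb4"))        # ['CRYAA']
print(lookup(index, "shared name"))  # ['GENE_A', 'GENE_B'], an ambiguous name
```

Case-folding the keys lets "HSPB4" and "hspb4" resolve to the same entry; a real thesaurus would also need to normalize punctuation and spacing.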

11 RLIMS-P: text mining for protein phosphorylation (Lab)
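RLIMS-P is a rule-based system. The sketch below is a deliberately simple stand-in to show what "information extraction" means here, not RLIMS-P's actual rules. It assumes that a phosphorylation site is any Ser/Thr/Tyr residue mention (for example "Ser-45" or "serine 10") that appears in a sentence also containing a form of "phosphorylate". The example sentences are made up.

```python
import re

# Residue mentions such as "Ser-45", "Thr 122" or "serine 10" (toy pattern)
RESIDUE = re.compile(r"\b(Ser|Thr|Tyr|serine|threonine|tyrosine)[-\s]?(\d+)\b", re.IGNORECASE)
# Any word containing "phosphorylat" (phosphorylated, autophosphorylation, ...)
TRIGGER = re.compile(r"phosphorylat\w*", re.IGNORECASE)
ONE_LETTER = {"ser": "S", "serine": "S", "thr": "T", "threonine": "T", "tyr": "Y", "tyrosine": "Y"}

def extract_sites(text: str) -> list[str]:
    """Return sites like 'S45' found in sentences that also mention phosphorylation."""
    sites = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if TRIGGER.search(sentence):
            for residue, position in RESIDUE.findall(sentence):
                sites.append(ONE_LETTER[residue.lower()] + position)
    return sites

print(extract_sites("Protein X is phosphorylated at Ser-45 and Thr 122. It binds Tyr-7 weakly."))
# ['S45', 'T122']; Tyr-7 is skipped because its sentence has no trigger word
```

Sentence-level co-occurrence is the crudest possible rule; it cannot tell which protein or kinase a site belongs to, which is exactly the kind of relation a dedicated system has to resolve.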

12 Google-type search vs. PIR Text Search (I) (Lab)
http://pir.georgetown.edu/pirwww/search/textsearch.html
Boolean searches: AND, OR, NOT
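A Boolean search with AND, OR and NOT can be expressed as set operations over an inverted index (term to record IDs). The sketch below illustrates the logic only and does not reproduce how the PIR search engine is implemented. The record IDs are taken from this tutorial, but their one-line descriptions are simplified placeholders, and the `search` function with its `must`/`any_of`/`exclude` parameters is hypothetical.

```python
from collections import defaultdict

# Placeholder one-line descriptions for records mentioned in this tutorial
DOCS = {
    "CRYAA_RABIT": "alpha crystallin A chain rabbit",
    "ARLY2_ANAPL": "delta crystallin II argininosuccinate lyase",
    "CRGE_RAT": "gamma crystallin E rat",
}

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lower-cased term -> set of record IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, all_ids, must=(), any_of=(), exclude=()) -> list[str]:
    """AND over `must`, OR over `any_of`, NOT over `exclude`."""
    result = set(all_ids)
    for term in must:                      # AND: intersect
        result &= index.get(term, set())
    if any_of:                             # OR: union, then intersect with the rest
        result &= set().union(*(index.get(t, set()) for t in any_of))
    for term in exclude:                   # NOT: subtract
        result -= index.get(term, set())
    return sorted(result)

index = build_index(DOCS)
print(search(index, DOCS, must=["crystallin"], exclude=["lyase"]))  # crystallin NOT lyase
print(search(index, DOCS, any_of=["rabbit", "rat"]))                # rabbit OR rat
```

A Google-type search, by contrast, ranks records by relevance instead of returning an exact set, which is why explicit Boolean operators give more predictable results for database queries.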

13 PIR Text Search (II): search for synonyms (Lab). Search: which alpha crystallin A chains are in protein families? (null = field absent; not null = field present)

14 PIR Text Search (III) (Lab). Search: which crystallins are enzymes, and which families do they belong to? Can you find which crystallins have a determined 3D structure? Example: argininosuccinate lyase (EC )

15 UniProt Text Search (Lab): find proteins related to diabetes that have a determined 3D structure.

16 Search continues… Lab

17 II. Sequence & Genomics Databases
NCBI Resources
–GenBank: an annotated collection of all publicly available nucleotide sequences and their protein translations.
–RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
–Entrez Gene: gene-centered information at NCBI.
–UniGene: unified clusters of ESTs and full-length mRNA sequences.
–OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
UniProt Consortium Database: Universal Protein Resource, a central repository of protein sequence and function.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database (dbSNP); International HapMap Project
Genes associated with human diseases

18 UniProt Consortium Databases (Universal Protein Resource), since October 2002. New! UUW 6.6 million, since July 2008

19 UniProt Report (I): sections of the record (Lab). Entry View: Sequence & Annotation

20 UniProt Report (II) – sequence and features Lab

21 UniProt Report (III) – UniRef90
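UniRef90 groups UniProt sequences that share at least 90% sequence identity, so each cluster can be browsed through one representative entry. The sketch below shows the general idea of greedy identity-threshold clustering (longest sequence first, each later sequence joins the first seed it matches). It is not UniProt's actual pipeline, which relies on proper alignment-based clustering; here the sequences are assumed to be pre-aligned to equal length, identity is a naive position-by-position count, and the IDs and sequences are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical, non-gap positions in two pre-aligned, equal-length sequences (toy measure)."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    same = sum(x == y and x != "-" for x, y in zip(a, b))
    return same / len(a)

def greedy_cluster(seqs: dict[str, str], threshold: float = 0.9) -> dict[str, list[str]]:
    """Longest (ungapped) sequences become seeds; others join the first seed at or above the threshold."""
    order = sorted(seqs, key=lambda k: len(seqs[k].replace("-", "")), reverse=True)
    clusters: dict[str, list[str]] = {}
    for sid in order:
        for seed in clusters:
            if identity(seqs[sid], seqs[seed]) >= threshold:
                clusters[seed].append(sid)
                break
        else:  # no seed was close enough: start a new cluster
            clusters[sid] = [sid]
    return clusters

# Hypothetical sequences: s2 differs from s1 at 1 of 10 positions (90% identity)
SEQS = {"s1": "MDIAIHHPWI", "s2": "MDIAIHHPWL", "s3": "MDVTLHHPFL"}
print(greedy_cluster(SEQS))  # {'s1': ['s1', 's2'], 's3': ['s3']}
```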

22 Entrez Gene – gene-centric information

23 OMIM: Online Mendelian Inheritance in Man. Examples: juvenile cataract of Down syndrome; autosomal recessive congenital progressive cataract

24 III. Protein Family Databases
Whole Proteins
–PIRSF: Nonoverlapping Classification of Full-Length Proteins Based on Evolutionary Relationship
–COG: Clusters of Orthologous Groups of Complete Genomes
–PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
–ProtoNet: Automatic Hierarchical Classification of Proteins
Protein Domains
–Pfam: Alignments and HMM Models of Protein Domains
–SMART: Protein Domain Identification and Annotation
–CDD: Conserved Domain Database
Protein Motifs
–PROSITE: Protein Patterns and Profiles
–BLOCKS: Protein Sequence Motifs and Alignments
–PRINTS: Compendium of Protein Fingerprints (groups of conserved motifs)
Integrated Family Databases
–InterPro: integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY…

25 Protein Clustering: COGs (nih.gov/COG/). Initial version; the new version includes eukaryotic clusters (KOGs).
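COGs are built from genome-specific best hits: a protein's best match in another genome, made consistent across three or more genomes. A simplified two-genome version of that idea is the reciprocal best hit (RBH): proteins a and b are paired when each is the other's top-scoring match. The sketch below is that simplified RBH step only, not the COG construction procedure, and the protein names and scores are hypothetical.

```python
def reciprocal_best_hits(scores: dict[tuple[str, str], float]) -> list[tuple[str, str]]:
    """scores[(a, b)] is a similarity score between protein a (genome A) and protein b (genome B)."""
    best_in_b: dict[str, tuple[float, str]] = {}  # a -> (score, best b)
    best_in_a: dict[str, tuple[float, str]] = {}  # b -> (score, best a)
    for (a, b), s in scores.items():
        if a not in best_in_b or s > best_in_b[a][0]:
            best_in_b[a] = (s, b)
        if b not in best_in_a or s > best_in_a[b][0]:
            best_in_a[b] = (s, a)
    # Keep pairs where a's best hit is b AND b's best hit is a
    return sorted((a, b) for a, (_, b) in best_in_b.items() if best_in_a[b][1] == a)

# Hypothetical scores between proteins of genome A (a1..a3) and genome B (b1, b2)
SCORES = {("a1", "b1"): 250, ("a1", "b2"): 40, ("a2", "b1"): 90, ("a2", "b2"): 60, ("a3", "b2"): 75}
print(reciprocal_best_hits(SCORES))  # [('a1', 'b1'), ('a3', 'b2')]
```

Here a2's best hit is b1, but b1 prefers a1, so a2 is left unpaired; with more genomes, the COG approach uses such consistency across several genomes to separate orthologs from paralogs.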

26 PIRSF: Full-Length Classification; iProClass Family Report (Lab)

27 Domain Classification – Pfam Domain (bin/ipcEntry?id=P02493) (bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

28 Pfam Domain (bin/Pfam/getacc?PF00525)

29 Protein Motifs: PROSITE – a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
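PROSITE patterns use a compact syntax: elements separated by `-`, `x` for any residue, `[ST]` for "S or T", `{P}` for "anything but P", `(n)` or `(n,m)` for repeats, and `<`/`>` for the N- and C-terminus, with a final `.`. The sketch below translates that syntax into a Python regular expression so a pattern can be scanned against a sequence. It covers only the elements listed here (for instance, not `>` inside brackets), the function name is hypothetical, and the test sequences are made up; the example pattern is PROSITE's N-glycosylation site pattern (PS00001).

```python
import re

# One PROSITE element: optional '<', residue spec, optional (n) or (n,m), optional '>'
ELEMENT = re.compile(r"(<)?(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?(>)?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, spec, repeat, c_term = m.groups()
        if spec == "x":                    # any residue
            spec = "."
        elif spec.startswith("{"):         # excluded residues
            spec = "[^" + spec[1:-1] + "]"
        part = ("^" if n_term else "") + spec
        if repeat:
            part += "{" + repeat + "}"
        if c_term:
            part += "$"
        parts.append(part)
    return "".join(parts)

glyco = prosite_to_regex("N-{P}-[ST]-{P}.")
print(glyco)                                # N[^P][ST][^P]
print(bool(re.search(glyco, "AANGTA")))     # True: N-G-T-A
print(bool(re.search(glyco, "ANPTA")))      # False: P follows N
```

Note that `re.search` reports only the first match; a full scanner would collect every (possibly overlapping) hit.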

30 Integrated Family Classification: InterPro, an integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs and PIRSF (interpro/search.html). Mapping of families

31 IV. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
–Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
–KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
–LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
–EcoCyc: Encyclopedia of E. coli Genes and Metabolism
–MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
–BRENDA: Enzyme Database
–UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Inter-Molecular Interactions and Regulatory Pathways
–IntAct: Protein interaction data from literature and user submission
–BIND: Descriptions of interactions, molecular complexes and pathways
–DIP: Catalogs experimentally determined interactions between proteins
–Reactome: A curated knowledgebase of biological pathways
–BioCarta: Biological pathways of human and mouse
–GO: Gene Ontology Consortium Database
Pathway Resources: Pathguide
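Enzyme Classification (EC) numbers have four dot-separated levels: class, subclass, sub-subclass and serial number, with `-` marking a level that is not specified. The sketch below splits an EC number into those levels and names the top-level class. The function name and returned keys are hypothetical, the example number EC 1.1.1.1 is only an illustration, and preliminary serial numbers (such as `n1`) are not handled.

```python
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases",
    4: "Lyases", 5: "Isomerases", 6: "Ligases",
    7: "Translocases",  # added to the EC list in 2018, after this tutorial
}

def parse_ec(ec: str) -> dict:
    """Split 'EC 1.1.1.1' or '3.4.-.-' into its four levels; '-' becomes None (unassigned)."""
    parts = ec.removeprefix("EC").strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    levels = [None if p == "-" else int(p) for p in parts]
    return {
        "class": levels[0],
        "class_name": EC_CLASSES.get(levels[0]),
        "subclass": levels[1],
        "sub_subclass": levels[2],
        "serial": levels[3],
    }

print(parse_ec("EC 1.1.1.1")["class_name"])  # Oxidoreductases
print(parse_ec("3.4.-.-"))                   # a partially specified hydrolase number
```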

32 Biological Pathway Resource Collection: protein-protein interactions; metabolic pathways; signaling pathways; pathway diagrams; transcription factors / gene regulatory networks; protein-compound interactions; genetic interaction networks

33 Pathway Commons: search across multiple pathway databases; common format for global analysis

34 KEGG Metabolic & Regulatory Pathways (bin/show_pathway?hsa) (Lab). KEGG is a suite of databases and associated software integrating current knowledge of molecular interaction networks, genes and proteins, and chemical compounds and reactions.

35 BioCyc: EcoCyc/MetaCyc Metabolic Pathways. The BioCyc Knowledge Library is a collection of Pathway/Genome Databases.

36 BioCarta Cellular Pathways

37 Reactome: a collaboration of CSHL, EBI and the GO Consortium
–Curated resource of core pathways and reactions in human biology
–Authored by biological researchers who are experts in the field
–Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…
–Inferred orthologous events in 22 non-human species (mouse, rat…)

38 Reactome: events and objects (including modified forms and complexes)
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
…
(bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)

39 Protein-Protein Interaction Database: IntAct

40 Gene Ontology (GO): Molecular Function, Biological Process, Cellular Component
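Within each of the three GO namespaces, terms form a directed acyclic graph in which a term can have several parents, and annotating a gene to a term implies annotation to all of that term's ancestors (the "true path rule"). The sketch below propagates annotations up such a graph. The term IDs `T1` to `T4`, the gene `geneX` and the function names are hypothetical, not real GO identifiers or a GO library API.

```python
def ancestors(term: str, parents: dict[str, list[str]]) -> set[str]:
    """All terms reachable by following parent links (e.g. is_a / part_of) upward from `term`."""
    seen: set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, []))
    return seen

def propagate(annotations: dict[str, set[str]], parents: dict[str, list[str]]) -> dict[str, set[str]]:
    """Add every ancestor of each directly annotated term (true path rule)."""
    return {
        gene: terms | set().union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }

# Hypothetical DAG: T4 has two parents (T2, T3), which share the root T1
PARENTS = {"T4": ["T2", "T3"], "T2": ["T1"], "T3": ["T1"], "T1": []}
print(sorted(propagate({"geneX": {"T4"}}, PARENTS)["geneX"]))  # ['T1', 'T2', 'T3', 'T4']
```

The `seen` set keeps shared ancestors (here `T1`) from being visited twice, which matters because GO is a graph rather than a tree.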

41 V. Databases of Protein Structures
Protein Structure
–PDB: Structures Determined by X-ray Crystallography and NMR
–PDBsum: Summaries and analyses of PDB structures
–MMDB: NCBI's database of 3D structures, part of NCBI Entrez
–SWISS-MODEL Repository: Database of annotated protein 3D models
–ModBase: Annotated comparative protein structure models
Structure Classification
–CATH: Hierarchical Classification of Protein Domain Structures
–SCOP: Familial and Structural Protein Relationships
–FSSP: Protein Fold Classification Based on Structure-Structure Alignment

42 PDB: Experimental 3D Structure Repository (Lab). Example: rat gamma-crystallin (chains A, B). Can you find this protein (CRGE_RAT) with a text search at PIR?

43 PDBsum: a pictorial database providing summaries and analyses of PDB entries (n-srv/databases/pdbsum/). Search; 3-D structure summary; 2-D structure summary

44 Protein Structural Classification (1): CATH, a hierarchical domain classification of protein structures

45 Protein Structural Classification (2): SCOP, a comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

46 SWISS-MODEL Repository: a database of annotated three-dimensional comparative protein structure models (r_ac=CRBA1_MOUSE&job=2)

47 VI. Proteomic Resources
–GELBANK: 2D-gel patterns of species with completed genomes
–SWISS-2DPAGE: index of 2D gels
–PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
–Integr8: a browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
–PRIDE: PRoteomics IDEntifications database; expression profiling databases
–GPMdb (http://gpmdb.thegpm.org/): mass spectrometry proteomics database
–PeptideAtlas: compendium of peptides identified in a large set of tandem mass spectrometry proteomic experiments
–HUPO: Human Proteome Organization, fostering international proteomics initiatives

48 2D-Gel Image Databases (Lab): part of WORLD-2DPAGE, an index to 2-D PAGE databases and services

49 GPMdb: MS Data Search. Craig, et al., J Proteome Res. 2004, 3:

50 PRIDE: a centralized, standards-compliant, public data repository for proteomics data; HUPO Plasma Proteome Project

51 Protein Examples
–Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)
–Delta crystallin II (argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)
–Any additional proteins of interest for search and retrieval
Lab:
I. Text search / Information retrieval
1. Literature search and text mining
–Finding synonyms (BioThesaurus)
–Information extraction (e.g., protein phosphorylation sites)
2. Find the sequence of the rabbit alpha crystallin A chain
3. Find all alpha crystallin A chains classified in protein families
4. Search for crystallins that have enzymatic activity
5. Find crystallins that have determined 3D structures
II. Database contents (reports)
1. Sequence & genomics databases (UniProt)
2. Protein family databases (PIRSF)
3. Databases of protein functions (KEGG)
4. Databases of protein structures (PDB)
5. Proteomics databases (SWISS-2DPAGE)
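Lab task I.2 (the sequence of the rabbit alpha crystallin A chain, P02493) can also be scripted. The sketch below assumes the present-day UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta`, which is not part of this 2008 tutorial and may change; the function names are hypothetical, and the FASTA parser is a plain reader for `>`-header records.

```python
import urllib.request

def uniprot_fasta_url(accession: str) -> str:
    # Assumption: present-day UniProt REST endpoint, not the 2008-era interface
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"

def parse_fasta(text: str) -> dict[str, str]:
    """Map each FASTA header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records

def fetch_sequence(accession: str) -> dict[str, str]:
    """Download one UniProtKB entry as FASTA and parse it (requires network access)."""
    with urllib.request.urlopen(uniprot_fasta_url(accession), timeout=30) as resp:
        return parse_fasta(resp.read().decode("utf-8"))
```

Usage, with network access: `fetch_sequence("P02493")` returns a one-item dict whose key is the UniProtKB FASTA header and whose value is the sequence; compare it with what the text searches in the lab return.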