Bio-Trac 25 (Proteomics: Principles and Methods) March 24, 2006 Zhang-Zhi Hu, M.D. Senior Bioinformatics Scientist, Protein Information Resource Research.

Presentation transcript:

Bio-Trac 25 (Proteomics: Principles and Methods), March 24, 2006. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource; Research Assistant Professor, Department of Biochemistry and Molecular Biology, Georgetown University Medical Center. Tutorial: Bioinformatics Resources

2 What is Bioinformatics? computer (information) + mouse (biology) = bioinformatics. NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

3 Molecular Biology Database Collection: key databases of 15 categories ( /full/34/suppl_1/D3/DC1)

4 Database Collection in Nucleic Acids Res.

5 Online Access to Database Collection

6 Overview: Database Contents, Search and Retrieval
I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases

7 Text Searches: Entrez Text Searches

8 PubMed Literature Database

9 UniProt Text Search ( org/cgi-bin/textSearch) Google-type search vs. Boolean searches: AND, OR, NOT
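To make the contrast between a free-text query and a Boolean query concrete, here is a minimal sketch in Python. The three records and the names `RECORDS`, `tokens` and `boolean_search` are invented for illustration; nothing here reflects how the UniProt or PIR search engines are actually implemented.

```python
import re

# Hypothetical mini-database: record ID -> free-text description.
RECORDS = {
    "rec1": "Alpha crystallin A chain, eye lens protein",
    "rec2": "Delta crystallin II, argininosuccinate lyase, enzyme",
    "rec3": "Argininosuccinate lyase, urea cycle enzyme",
}

def tokens(text):
    """Return the set of lower-cased words in a description."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def boolean_search(all_of=(), any_of=(), none_of=()):
    """Return sorted record IDs that contain every AND term,
    at least one OR term (if any are given), and no NOT term."""
    hits = []
    for rec_id, text in RECORDS.items():
        words = tokens(text)
        if not all(t.lower() in words for t in all_of):
            continue  # an AND term is missing
        if any_of and not any(t.lower() in words for t in any_of):
            continue  # none of the OR terms is present
        if any(t.lower() in words for t in none_of):
            continue  # a NOT term is present
        hits.append(rec_id)
    return sorted(hits)

# crystallin AND enzyme
print(boolean_search(all_of=["crystallin", "enzyme"]))
# enzyme NOT crystallin
print(boolean_search(all_of=["enzyme"], none_of=["crystallin"]))
# alpha OR delta
print(boolean_search(any_of=["alpha", "delta"]))
```

A Google-type search would typically rank records by how well they match the whole phrase, whereas the Boolean form above returns exactly the records that satisfy the stated logic, with no ranking.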

10 PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html) Search: Alpha crystallin A chain and protein family?

11 PIR Text Search (II) Can you find which crystallin has a 3D structure determined? Search: Crystallins that are enzymes?

12 II. Sequence & Genomics Databases
GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
Entrez Gene: Gene-centered information at NCBI.
UniGene: Unified clusters of ESTs and full-length mRNA sequences.
OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: Integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database

13 UniProt Consortium Databases: Universal Protein Resource (UniProtKB, UniRef, UniParc)

14 UniProt Sequence Report (I) ( bin/unipEntry?id=CRYAA_RABIT) What’s the difference between CRYAA_RABIT & CYRBAA?
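UniProtKB entry names such as CRYAA_RABIT join a protein mnemonic and a species code with an underscore. The sketch below splits an entry name into those two parts; the function name `split_entry_name` is made up for this example and is not part of any UniProt tool.

```python
def split_entry_name(entry_name):
    """Split a UniProtKB-style entry name into (protein mnemonic, species code).

    Raises ValueError if the name is not of the form MNEMONIC_SPECIES.
    """
    mnemonic, sep, species = entry_name.partition("_")
    if not sep or not mnemonic or not species or "_" in species:
        raise ValueError(f"not of the form MNEMONIC_SPECIES: {entry_name!r}")
    return mnemonic, species

# Both entry names below are taken from the slides.
for name in ["CRYAA_RABIT", "ARLY2_ANAPL"]:
    mnemonic, species = split_entry_name(name)
    print(f"{name}: protein mnemonic={mnemonic}, species code={species}")
```

Reading the name this way shows that the same protein in another organism would keep the mnemonic but change the species code.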

15 UniProt Sequence Report (II) ( prot.org/cgi-bin/unipEntry?id=UniRef90_P02489)

16 Entrez Gene =Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq

17 OMIM: Online Mendelian Inheritance in Man

18 III. Protein Family Databases
Whole Proteins
  PIRSF: A Network Classification System of Protein Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families
  CDD: Conserved Domain Database
Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures
Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY

19 Protein Clustering COGs: ( nih.gov/COG/)

20 KOGs: Eukaryotic Clusters ( gov/COG/new/shokog.cgi?KOG3591)

21 Domain Classification ( bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

22 Pfam Domain ( bin/Pfam/getacc?PF00525)

23 Integrated Family Classification InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. ( uk/interpro/search.html)

24 PIRSF: Full-Length Classification. iProClass Family Report

25 Protein Motifs: PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
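A PROSITE pattern is a compact notation that can be translated into a regular expression. The sketch below handles only the core syntax: residues separated by `-`, `x` for any residue, `[...]` for allowed residues, `{...}` for excluded residues, `(n)` or `(n,m)` for repeats, and `<` / `>` as terminal anchors. The function name `prosite_to_regex` and the test sequence are invented for illustration; the example pattern N-{P}-[ST]-{P} is the widely quoted N-glycosylation site pattern.

```python
import re

# One pattern element: x, a residue, [ABC] or {ABC}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # pattern anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # pattern anchored at the C-terminus
            suffix, element = "$", element[:-1]
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, lo, hi = m.groups()
        if core == "x":               # any residue
            core = "."
        elif core.startswith("{"):    # any residue except those listed
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if lo is not None:
            repeat = "{" + lo + ("," + hi if hi is not None else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)

regex = prosite_to_regex("N-{P}-[ST]-{P}")
print(regex)

# Made-up test sequence; prints 0-based start positions of the matches.
sequence = "MKNGSAANPTL"
print([m.start() for m in re.finditer(regex, sequence)])
```

Note that `re.finditer` reports non-overlapping matches only, which is a simplification compared with a full motif scanner.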

26 IV. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  BioCarta: Biological pathways of human and mouse
  GO: Gene Ontology Consortium Database
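The EC-IUBMB enzyme classification assigns each enzyme a four-level number, and the first level names the enzyme class. The following sketch parses an EC number and looks up its top-level class. The function name `parse_ec` is hypothetical; the example number EC 4.3.2.1 is the one commonly given for argininosuccinate lyase, the enzyme named in the lab slide.

```python
# Top-level EC classes. Class 7 was added by IUBMB in 2018, after this tutorial.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}

def parse_ec(ec_number):
    """Parse an EC number such as 'EC 4.3.2.1' or a partial one such as '4.3.-.-'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = text.split(".")
    if len(levels) != 4 or not levels[0].isdigit():
        raise ValueError(f"not a four-level EC number: {ec_number!r}")
    top = int(levels[0])
    return {
        "levels": levels,                               # the four levels as strings
        "class_name": EC_CLASSES.get(top, "unknown"),   # name of the top-level class
    }

info = parse_ec("EC 4.3.2.1")
print(info["levels"], info["class_name"])
```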

27 KEGG Metabolic & Regulatory Pathways ( bin/show_pathway?hsa ) KEGG is a suite of databases and associated software, integrating current knowledge on molecular interaction networks, information on genes and proteins, and information on chemical compounds and reactions.

28 BioCyc (EcoCyc/MetaCyc Metabolic Pathways) The BioCyc Knowledge Library is a collection of Pathway/Genome Databases.

29 BioCarta Cellular Pathways

30 Protein-Protein Interaction: BIND

31 Gene Ontology. Three GOs: Molecular Function, Biological Process, Cellular Component
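Each GO annotation belongs to exactly one of the three ontologies, so a protein's annotations can be sorted into three groups. The sketch below does this for a hypothetical protein: the GO IDs and term names are real GO terms, but the assignment of these terms to one protein is invented for illustration, as are the names `ANNOTATIONS` and `group_by_ontology`.

```python
from collections import defaultdict

# Root term of each of the three ontologies.
GO_ROOTS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}

# Illustrative annotations for a hypothetical protein: (GO ID, term name, ontology).
ANNOTATIONS = [
    ("GO:0005515", "protein binding", "molecular_function"),
    ("GO:0006915", "apoptotic process", "biological_process"),
    ("GO:0005737", "cytoplasm", "cellular_component"),
    ("GO:0005634", "nucleus", "cellular_component"),
]

def group_by_ontology(annotations):
    """Group (GO ID, name, ontology) tuples by ontology."""
    grouped = defaultdict(list)
    for go_id, name, ontology in annotations:
        if ontology not in GO_ROOTS:
            raise ValueError(f"unknown ontology: {ontology!r}")
        grouped[ontology].append((go_id, name))
    return dict(grouped)

for ontology, terms in group_by_ontology(ANNOTATIONS).items():
    print(ontology, GO_ROOTS[ontology], terms)
```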

32 V. Databases of Protein Structures
Protein Structure
  PDB: Structures Determined by X-ray Crystallography and NMR
  PDBsum: Summaries and analyses of PDB structures
  MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
  SWISS-MODEL Repository: Database of annotated protein 3D models
  ModBase: Annotated comparative protein structure models
Structure Classification
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Classification Based on Structure-Structure Alignment

33 PDB: Experimental 3D Structure Repository. Rat gamma-crystallin, chains A and B. Can you do a text search at PIR to find this?

34 PDBsum: Summary and Analysis ( srv/databases/pdbsum/) Search, 3-D structure summary, 2-D structure

35 Protein Structural Classification (1) CATH: Hierarchical domain classification of protein structures ( ucl.ac.uk/bsm/cath_new/)

36 Protein Structural Classification (2) SCOP: Comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

37 SWISS-MODEL Repository: A database of annotated three-dimensional comparative protein structure models ( mr.php?sptr_ac=CRGE_RAT&job=2)

38 VI. Proteomic Resources
GELBANK: 2D-gel patterns from completed genomes; SWISS-2DPAGE
PEP ( pep/): Predictions for Entire Proteomes, summarized analyses of protein sequences
Integr8: A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
PRIDE: PRoteomics IDEntifications database
Expression Profiling databases
GPMdb: Mass Spec Proteomics Databases

39 2D-Gel Image Databases (1)

40 2D-Gel Image Databases (2)

41 GPMdb MS Data Search Craig, et al., J Proteome Res. 2004, 3:

42 iProLINK: Protein Literature Mining Resource. Text mining of protein phosphorylation. Gene/protein name thesaurus: synonyms, ambiguous names…

43 Lab: Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to. Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058); Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)