Bio-Trac 25 (Proteomics: Principles and Methods) March 24, 2006 Zhang-Zhi Hu, M.D. Senior Bioinformatics Scientist, Protein Information Resource Research.

1 Bio-Trac 25 (Proteomics: Principles and Methods), March 24, 2006. Zhang-Zhi Hu, M.D., Senior Bioinformatics Scientist, Protein Information Resource; Research Assistant Professor, Department of Biochemistry and Molecular Biology, Georgetown University Medical Center. Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/~huz/class/bioinfo_resource.html)

2 What is Bioinformatics? computer (information) + mouse (biology) = bioinformatics. NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

3 Molecular Biology Database Collection: 858 key databases in 15 categories (http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)

4 Database Collection in Nucleic Acids Res.

5 Online Access to Database Collection: http://pir.georgetown.edu/~huz/class/2005_database_update.html and http://www.oxfordjournals.org/nar/database/cap/ (2006)

6 Overview: Database Contents, Search and Retrieval
I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Databases of protein functions
V. Databases of protein structures
VI. Proteomics databases

7 Text Searches: Entrez Text Searches (http://www.ncbi.nlm.nih.gov/Entrez/)

8 PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)

9 UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch) Google-type search vs. Boolean searches: AND, OR, NOT
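The Boolean operators named on this slide can be illustrated with a small in-memory search. This is only a sketch of the AND / OR / NOT logic, not how the UniProt or PIR search engines are implemented; the record titles and the `search` helper are hypothetical.

```python
# Toy record titles, for illustration only (not real database entries).
RECORDS = [
    "alpha crystallin A chain",
    "alpha crystallin B chain",
    "delta crystallin II (argininosuccinate lyase)",
    "gamma crystallin E",
]


def words(text):
    """Lower-case a title and split it into a set of word tokens."""
    cleaned = text.lower().replace("(", " ").replace(")", " ")
    return set(cleaned.split())


def search(records, all_of=(), any_of=(), none_of=()):
    """Keep records that contain every term in all_of (AND),
    at least one term in any_of (OR, if given),
    and no term in none_of (NOT)."""
    hits = []
    for rec in records:
        w = words(rec)
        if not all(term in w for term in all_of):
            continue
        if any_of and not any(term in w for term in any_of):
            continue
        if any(term in w for term in none_of):
            continue
        hits.append(rec)
    return hits


and_hits = search(RECORDS, all_of=["alpha", "crystallin"])            # alpha AND crystallin
or_hits = search(RECORDS, any_of=["lyase", "gamma"])                  # lyase OR gamma
not_hits = search(RECORDS, all_of=["crystallin"], none_of=["alpha"])  # crystallin NOT alpha
```

A Google-type search usually treats a list of words as an implicit AND; the explicit operators make it possible to widen (OR) or narrow (NOT) the result set.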

10 PIR Text Search (I) (http://pir.georgetown.edu/pirwww/search/textsearch.html) Search: Alpha crystallin A chain and protein family?

11 PIR Text Search (II) Search: Crystallins that are enzymes? Can you find which crystallin has had its 3D structure determined?

12 II. Sequence & Genomics Databases
GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
Entrez Gene: Gene-centered information at NCBI.
UniGene: Unified clusters of ESTs and full-length mRNA sequences.
OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: Integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database

13 UniProt Consortium Databases (http://www.uniprot.org) 2.85 million; Universal Protein Resource: UniProtKB, UniRef, UniParc

14 UniProt Sequence Report (I) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT) What’s the difference between CRYAA_RABIT & CYRBAA?

15 UniProt Sequence Report (II) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489) (http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
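The two report URLs on this slide differ only in the UniRef cluster identifier. As a rough sketch, such an identifier can be split into its sequence-identity level and the accession of the representative member. The function name and the regular expression are assumptions made for illustration; they cover the UniRef100, UniRef90 and UniRef50 levels with a simple accession-like suffix and are not an official parser.

```python
import re

# UniRef cluster IDs look like "UniRef<level>_<representative accession>".
# The pattern below is an illustrative assumption, not a UniProt specification.
UNIREF_ID = re.compile(r"^UniRef(100|90|50)_([A-Z0-9]+(?:-\d+)?)$")


def parse_uniref_id(cluster_id):
    """Return (identity_level, representative_accession) for a UniRef cluster ID."""
    match = UNIREF_ID.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    return int(match.group(1)), match.group(2)


level_100 = parse_uniref_id("UniRef100_P02489")
level_90 = parse_uniref_id("UniRef90_P02489")
```

Both identifiers on the slide share the representative accession P02489; only the identity threshold used for clustering (100% vs. 90%) differs.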

16 Entrez Gene (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq)

17 OMIM: Online Mendelian Inheritance in Man (http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

18 III. Protein Family Databases
Whole Proteins
  PIRSF: A Network Classification System of Protein Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins
Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families
  CDD: Conserved Domain Database
Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures
Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY

19 Protein Clustering: COGs (http://www.ncbi.nlm.nih.gov/COG/)

20 KOGs: Eukaryotic Clusters (http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)

21 Domain Classification (http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT) (http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

22 Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

23 Integrated Family Classification. InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

24 PIRSF: Full Length Classification. iProClass Family Report (http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

25 Protein Motifs: PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://us.expasy.org/prosite/)
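To make the idea of a PROSITE pattern concrete, the sketch below translates the basic pattern syntax into a Python regular expression and scans a sequence with it. It handles only the common elements (`x`, `[..]`, `{..}`, repeat counts, and the `<` / `>` terminus anchors) and is not PROSITE's own scanning software; the function name and the test sequences are made up for illustration. The example pattern is the well-known N-glycosylation site, N-{P}-[ST]-{P} (PS00001).

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression.

    Supported: 'x' (any residue), '[ABC]' (any of), '{ABC}' (none of),
    repeats such as x(2) or x(2,4), and '<' / '>' terminus anchors.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


glyco_regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(glyco_regex, "AANGSAA")     # made-up sequence containing N-G-S-A
no_hit = re.search(glyco_regex, "AANPSAA")  # proline after N blocks the match
```

Profiles, the other PROSITE descriptor type mentioned on the slide, are weight matrices and cannot be expressed as a regular expression in this way.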

26 IV. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  BioCarta: Biological pathways of human and mouse
  GO: Gene Ontology Consortium Database

27 KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1) KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
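The query string of the pathway URL on this slide, `hsa00220+4.3.2.1`, packs several pieces of information together. The sketch below takes it apart under two assumptions that are not stated on the slide: that the part before `+` is an organism code followed by a five-digit map number, and that the part after `+` names the object to highlight (here an EC number). The helper names are hypothetical, and nothing here calls KEGG itself.

```python
import re
from urllib.parse import urlsplit

# Top-level EC classes; the first field of an EC number selects one of these.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
    7: "translocases",
}


def parse_kegg_pathway_url(url):
    """Split a show_pathway URL into organism code, map number and highlighted objects."""
    query = urlsplit(url).query              # e.g. "hsa00220+4.3.2.1"
    map_id, *highlighted = query.split("+")
    match = re.fullmatch(r"([a-z]{2,4})(\d{5})", map_id)
    if match is None:
        raise ValueError(f"unexpected pathway identifier: {map_id!r}")
    return {
        "organism": match.group(1),
        "pathway": match.group(2),
        "highlighted": highlighted,
    }


def ec_class(ec_number):
    """Return the top-level enzyme class for an EC number such as '4.3.2.1'."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


info = parse_kegg_pathway_url(
    "http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1"
)
enzyme_class = ec_class(info["highlighted"][0])
```

Here `hsa` is the KEGG organism code for Homo sapiens, and EC 4.3.2.1 is argininosuccinate lyase, the same enzyme activity listed for delta crystallin II in the lab exercise.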

28 BioCyc (EcoCyc/MetaCyc Metabolic Pathways): The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

29 BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

30 Protein-Protein Interaction: BIND (http://www.bind.ca/)

31 Gene Ontology (http://www.geneontology.org/) Three GOs: Molecular Function, Biological Process, Cellular Component

32 V. Databases of Protein Structures
Protein Structure
  PDB: Structures Determined by X-ray Crystallography and NMR
  PDBsum: Summaries and analyses of PDB structures
  MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
  SWISS-MODEL Repository: Database of annotated protein 3D models
  ModBase: Annotated comparative protein structure models
Structure Classification
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Classification Based on Structure-Structure Alignment

33 PDB: Experimental 3D Structure Repository (http://www.rcsb.org/pdb/) Rat gamma-crystallin, chain A, B. Can you do a text search at PIR to find this?

34 PDBsum: Summary and Analysis (http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/) Search, 3-D structure summary, 2-D structure

35 Protein Structural Classification (1) CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

36 Protein Structural Classification (2) SCOP: A comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known. (http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

37 SWISS-MODEL Repository: A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

38 VI. Proteomic Resources
GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database
Expression Profiling databases
GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

39 2D-Gel Image Databases (1) (http://us.expasy.org/ch2d/2d-index.html) (http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)

40 2D-Gel Image Databases (2) (http://gelbank.anl.gov/2dgels/index.asp)

41 GPMdb MS Data Search (http://gpmdb.thegpm.org/) Craig, et al., J Proteome Res. 2004, 3:1234-42.

42 iProLINK: Protein Literature Mining Resource (http://pir.georgetown.edu/iprolink/) Text mining of protein phosphorylation. Gene/protein name thesaurus: synonyms, ambiguous names…

43 Lab: Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to. Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058); Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)

