BIO-TRAC 25 (Proteomics: Principles and Methods) March 28, 2003 NIH, Bethesda, MD Zhang-Zhi Hu, M.D. Bioinformatics Scientist, Protein Information Resource.


1 BIO-TRAC 25 (Proteomics: Principles and Methods), March 28, 2003, NIH, Bethesda, MD
  Zhang-Zhi Hu, M.D., Bioinformatics Scientist, Protein Information Resource, National Biomedical Research Foundation
  Tutorial: Bioinformatics Resources

2 What is Bioinformatics?
  NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
  Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

3 Bioinformatics Resources
  The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources, 2003 update: http://www3.oup.co.uk/nar/database/
  Nucleic Acids Research Database Issues (annually in January; 2003: http://nar.oupjournals.org/content/vol31/issue1/)
  DBcat: A Catalog of > 500 Biological Databases: http://www.infobiogen.fr/services/dbcat/

4 Molecular Biology Database Collection (http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)

5 The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)
  An online resource of 386 key databases in 18 categories:
    Major sequence repositories
    Comparative Genomics
    Gene Expression
    Gene Identification and Structure
    Genetic and Physical Maps
    Genomic Databases
    Intermolecular Interactions
    Metabolic Pathways and Cellular Regulation
    Mutation Databases
    Pathology
    Protein Sequence Motifs
    Proteome Resources
    Retrieval Systems and Database Structure
    RNA Sequences
    Structure
    Transgenics
    Varied Biomedical Content

6 Overview
  Protein Sequence Analysis
    I. Sequence Similarity Search and Alignment
    II. Family Classification Methods
    III. Structure Prediction Methods
  Molecular Biology Databases
    IV. Protein Family Databases
    V. Databases of Protein Functions
    VI. Databases of Protein Structures
  Proteomic Resources
    VII. 2D-gel databases
    VIII. Proteomic analyses

7 I. Sequence Similarity Search
  Find a protein sequence: text search
  Based on Pair-Wise Comparisons
    BLOSUM scoring matrix
    PAM scoring matrix
  Dynamic Programming Algorithms
    Global Similarity: Needleman-Wunsch (GAP/BestFit)
    Local Similarity: Smith-Waterman (SSEARCH)
  Heuristic Algorithms (Sequence Database Searching)
    FASTA: Based on K-Tuples (2-Amino Acid)
    BLAST: Triples of Conserved Amino Acids
    Gapped-BLAST: Allow Gaps in Segment Pairs (NREF)
    PHI-BLAST: Pattern-Hit Initiated Search (NCBI)
    PSI-BLAST: Iterative Search (NCBI)

8 Sequence Search by Text or Unique ID (http://www.ncbi.nlm.nih.gov/Entrez/) (http://pir.georgetown.edu/pirwww/search/textsearch.html)

9 Pair-Wise Comparisons
  Scoring matrix
  Global and Local Similarity: Dynamic Programming (Needleman-Wunsch, Smith-Waterman) (http://www.ebi.ac.uk/emboss/align/)
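The dynamic programming idea behind the global (Needleman-Wunsch) and local (Smith-Waterman) comparisons can be sketched in a few lines. This is only an illustration: the function name `align_score` and the toy match/mismatch/gap values are assumptions made for the example, whereas real searches would score residue pairs with a BLOSUM or PAM matrix and usually use affine gap penalties.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False gives a Needleman-Wunsch (global) score,
    local=True gives a Smith-Waterman (local) score.
    The scoring values are toy assumptions, not BLOSUM or PAM.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Global alignment charges for leading gaps; local alignment does not.
    if not local:
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                score[i - 1][j - 1] + pair,  # align the two residues
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            if local:
                cell = max(cell, 0)          # a local alignment may restart
                best = max(best, cell)
            score[i][j] = cell

    # Global: score of aligning both full sequences; local: best cell anywhere.
    return best if local else score[-1][-1]


print(align_score("XXHEAGYY", "HEAG", local=True))
print(align_score("XXHEAGYY", "HEAG"))
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read from, which is why both are usually presented together.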

10 FASTA Search (http://www.ebi.ac.uk/fasta33/) (http://pir.georgetown.edu/pirwww/search/fasta.html)
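The k-tuple step that FASTA-style heuristics start from can be illustrated as follows. This is a simplified sketch, not the FASTA program itself: the helper names `ktuple_index` and `diagonal_hits` are hypothetical, and the later FASTA stages (rescoring the best diagonals, joining regions, and a final banded alignment) are left out.

```python
from collections import defaultdict


def ktuple_index(seq, k=2):
    """Map every k-tuple (word of length k) in seq to its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def diagonal_hits(query, subject, k=2):
    """Count shared k-tuples per diagonal (subject offset minus query offset).

    Diagonals with many hits mark regions that are worth aligning in detail.
    """
    index = ktuple_index(subject, k)
    diagonals = defaultdict(int)
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            diagonals[s - q] += 1
    return dict(diagonals)


print(diagonal_hits("HEAG", "XXHEAGYY"))
```

With k = 2 for proteins, as on the slide, the lookup is fast because only exact word matches are considered before any alignment is attempted.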

11 Gapped-BLAST Search (http://pir.georgetown.edu/pirwww/search/pirnref.shtml) (http://www.ncbi.nlm.nih.gov/BLAST/)

12 PSI-BLAST Iterative Search (http://www.ncbi.nlm.nih.gov/BLAST/)

13 PSI-BLAST

14 II. Family Classification Methods
  Multiple Sequence Alignment and Phylogenetic Analysis
    ClustalW Multiple Sequence Alignment
    Alignment Editor & Phylogenetic Trees
  Based on Family Information
    PROSITE Pattern Search
    Motif and Profile Search
    Hidden Markov Models (HMMs)

15 Multiple Sequence Alignment: ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)

16 Alignment Editor (Jalview) (http://www.ebi.ac.uk/clustalw/)

17 Alignment Editor (GeneDoc) (http://www.psc.edu/biomed/genedoc/)

18 Phylogenetic Analysis
  Tree Programs: (http://evolution.genetics.washington.edu/phylip.html)
  Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)

19 PROSITE Pattern Search (http://pir.georgetown.edu/pirwww/search/patmatch.html)
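A PROSITE pattern search amounts to matching a small pattern language against a sequence, and one way to see this is to translate a pattern into a regular expression. The sketch below is an illustration under stated assumptions: `prosite_to_regex` and `find_pattern` are hypothetical helper names, only the common syntax is handled (`x`, `[..]`, `{..}`, repeat counts, and the `<` and `>` anchors outside brackets), and it is not the code behind the server listed above. The example pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern into a Python regular expression string.

    Simplified: anchors placed inside square brackets are not supported.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P".
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "two to four of any residue".
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- and C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Return the 0-based start positions of all (possibly overlapping) matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_pattern("N-{P}-[ST]-{P}.", "MNGSANQTW"))
```

The lookahead wrapper in `find_pattern` is a design choice so that overlapping sites are all reported rather than only the first of each overlapping group.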

20 Profile Search (http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
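The general idea of a profile search, scoring each position of a query against column-specific residue preferences derived from an aligned family, can be sketched as below. Everything here is an assumption made for illustration: the function names, the uniform background frequency of 1/20, the pseudocount of 1, and the gap-free toy alignment. It is not the method used by the server listed above, and real profiles also model gaps.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Build log-odds scores per column from equal-length, gap-free sequences."""
    profile = []
    for col in range(len(aligned[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in aligned:
            counts[seq[col]] += 1
        total = sum(counts.values())
        # Log-odds against a uniform background of 1/20 (an assumption).
        profile.append(
            {aa: math.log2((counts[aa] / total) / (1 / 20)) for aa in AMINO_ACIDS}
        )
    return profile


def best_window(profile, sequence):
    """Slide the profile along the sequence; return (best score, start position)."""
    width = len(profile)
    scored = [
        (sum(profile[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scored)


family = ["HEAG", "HEAG", "HDAG"]
score, start = best_window(build_profile(family), "KKHEAGKK")
print(start, round(score, 2))
```

Unlike a fixed pattern, the profile tolerates substitutions in proportion to how often they occur in the family, which is the property that iterative methods such as PSI-BLAST also build on.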

21 Hidden Markov Model Search (http://www.sanger.ac.uk/Software/Pfam/search.shtml) (http://smart.embl-heidelberg.de)

22 III. Structural Prediction Methods
  Signal Peptide (e.g. http://www.cbs.dtu.dk/services/)
  Transmembrane Helix (e.g. http://www.cbs.dtu.dk/services/)
  2D Prediction (e.g. http://cubic.bioc.columbia.edu/predictprotein/, http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)
  3D Modeling (e.g. http://guitar.rockefeller.edu/modeller/modeller.html)

23 Structure Prediction: A Guide (www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html)

24 Protein Prediction Server (http://www.cbs.dtu.dk/services/)

25 Signal Peptide Prediction (http://www.stepc.gr/~synaptic/sigfind.html) (http://www.cbs.dtu.dk/services/SignalP)

26 Transmembrane Helix (http://www.cbs.dtu.dk/services/TMHMM/)
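TMHMM itself is based on a hidden Markov model, which is too involved to reproduce here. As a much simpler stand-in that conveys why transmembrane helices are predictable from sequence, the sketch below computes a sliding-window hydropathy average using the Kyte-Doolittle scale. The function names are hypothetical, and the window of 19 residues and cutoff of 1.6 are commonly quoted values that should be treated as assumptions rather than tuned parameters.

```python
# Kyte-Doolittle hydropathy values per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every window of the given length."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]


def candidate_segments(sequence, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to suggest a membrane span."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


# Toy sequence: a 19-residue leucine stretch flanked by charged residues.
toy = "D" * 10 + "L" * 19 + "D" * 10
print(candidate_segments(toy))
```

A hydropathy plot flags hydrophobic stretches only; it does not model helix topology or distinguish signal peptides from membrane spans, which is part of what the HMM-based servers add.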

27 Protein Structure Prediction (http://cmgm.stanford.edu/WWW/www_predict.html) (http://restools.sdsc.edu/biotools/biotools9.html)

28 Structure Prediction Server (http://cubic.bioc.columbia.edu/predictprotein/) (http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

29 3D-Modelling (http://guitar.rockefeller.edu/modeller/modeller.html) (http://www.expasy.ch/swissmod/SWISS-MODEL.html)

30 IV. Protein Family Databases
  Whole Proteins
    PIR: Superfamilies and Families
    COG (Clusters of Orthologous Groups) of Complete Genomes
    ProtoNet: Automated Hierarchical Classification of Proteins
  Protein Domains
    Pfam: Alignments and HMM Models of Protein Domains
    SMART: Protein Domain Families
  Protein Motifs
    PROSITE: Protein Patterns and Profiles
    BLOCKS: Protein Sequence Motifs and Alignments
    PRINTS: Protein Sequence Motifs and Signatures
  Integrated Family Databases
    iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
    InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART

31 Protein Clustering (http://www.ncbi.nlm.nih.gov/COG/)

32 Protein Domains
  Pfam (http://www.sanger.ac.uk/Software/Pfam/)
  SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)

33 Protein Motifs
  PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)

34 Integrated Family Classification
  InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, and TIGRFAMs. (http://www.ebi.ac.uk/interpro/search.html)

35 V. Databases of Protein Functions
  Metabolic Pathways, Enzymes, and Compounds
    Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
    KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
    LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
    EcoCyc: Encyclopedia of E. coli Genes and Metabolism
    MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
    WIT: Functional Curation and Metabolic Models
    BRENDA: Enzyme Database
    UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
    Klotho: Collection and Categorization of Biological Compounds
  Cellular Regulation and Gene Networks
    EpoDB: Genes Expressed during Human Erythropoiesis
    BIND: Descriptions of interactions, molecular complexes and pathways
    DIP: Catalogs experimentally determined interactions between proteins
    RegulonDB: Escherichia coli Pathways and Regulation

36 KEGG Metabolic & Regulatory Pathways (http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)
  KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)

37 BioCyc (EcoCyc/MetaCyc Metabolic Pathways)
  The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

38 Protein-Protein Interactions: DIP (http://dip.doe-mbi.ucla.edu/)

39 Protein-Protein Interaction: BIND (http://www.bind.ca/)

40 BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

41 VI. Databases of Protein Structures
  Protein Structure and Classification
    PDB: Structure Determined by X-ray Crystallography and NMR
    CATH: Hierarchical Classification of Protein Domain Structures
    SCOP: Familial and Structural Protein Relationships
    FSSP: Protein Fold Family Database
  Protein Sequence-Structure Relationship
    PIR-NRL3D: Protein Sequence-Structure Database
    PIR-RESID: Protein Structure/Post-Translational Modifications
    HSSP: Families and Alignments of Structurally-Conserved Regions

42 PDB Structure Data (http://www.rcsb.org/pdb/)

43 PDBsum: Summary and Analysis (http://www.biochem.ucl.ac.uk/bsm/pdbsum)

44 Protein Structural Classification
  CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

45 Protein Structural Classification (http://scop.mrc-lmb.cam.ac.uk/scop/)
  The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.

46 Proteomic Resources
  GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
  PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences
  Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
  Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
  Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)

47 VII. 2D-Gel Image Databases (http://www-lecb.ncifcrf.gov/2dwgDB) (http://gelbank.anl.gov/2dgels/index.asp) (2D-gel of human ventricle proteins)

48 VIII. Proteome Analysis (http://www.ebi.ac.uk/proteome)

49 Expression Profiling
  Human and Mouse Transcriptome (http://expression.gnf.org/cgi-bin/index.cgi)
  (http://genome-www.stanford.edu/serum/)

50 Lab: Visit selected websites and analyze some protein sequences of your own choice.
  The list of bioinformatics resources for this tutorial is available at: http://pir.georgetown.edu/~huz/bioinfo_resource.html
  Try some of the following sequences for analysis:
    1) well characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1)
    2) less characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH)
    3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7

