
1 Other biological databases

2 Biological systems
- Taxonomic data
- Literature
- Protein folding and 3D structure
- Small molecules
- Pathways and networks
- Protein families and domains
- Whole genome data
- Sequence data
- Ontologies (GO)

3 Other Biological Databases
- Transcription factor binding sites: TRANSFAC
- Protein structure databases: PDB, SCOP, CATH
- Protein family databases: Pfam, PRINTS, PROSITE, etc.
- Chemicals and small molecules: ChEBI
- Gene expression databases: GEO, ArrayExpress
- Metabolic pathways: Reactome, KEGG
- Genome databases: Ensembl, FlyBase, WormBase, etc.
- Human genetics-related databases: HapMap, dbSNP

4 Transcription factor binding sites
- TRANSFAC: database of eukaryotic transcription factors: http://www.gene-regulation.com/pub/databases.html#transfac
- TESS (Transcription Element Search System): predicts transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess
- TFsearch: searches for transcription factor binding sites: http://www.cbrc.jp/research/db/TFSEARCH.html

5 Protein structure databases
- Main resource is the Protein Data Bank (PDB): http://www.rcsb.org/pdb/
- Contains the spatial coordinates of the atoms of macromolecules whose 3D structure has been determined by X-ray crystallography or NMR
- Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)
- Can search by PDB code
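To make "spatial coordinates of atoms" concrete, here is a minimal Python sketch that reads ATOM/HETATM records from PDB-format text using the format's fixed column positions. The `Atom` type, the `parse_atoms` helper and the two sample lines are illustrative assumptions; the coordinates are made up and not taken from a real PDB entry.

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


# Two made-up records laid out in the fixed-width PDB column format.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
])


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract atom names, residues and x/y/z coordinates from PDB text."""
    atoms = []
    for line in pdb_text.splitlines():
        # Only coordinate records carry atom positions.
        if line.startswith(("ATOM", "HETATM")):
            atoms.append(Atom(
                name=line[12:16].strip(),        # columns 13-16
                residue=line[17:20].strip(),     # columns 18-20
                chain=line[21],                  # column 22
                residue_number=int(line[22:26]), # columns 23-26
                x=float(line[30:38]),            # columns 31-38
                y=float(line[38:46]),            # columns 39-46
                z=float(line[46:54]),            # columns 47-54
            ))
    return atoms


atoms = parse_atoms(SAMPLE_PDB)
for atom in atoms:
    print(atom.name, atom.residue, atom.chain, atom.x, atom.y, atom.z)
```

For real work an established parser (for example the one in Biopython) is usually a better choice than slicing columns by hand.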

6 Searching MSD (Macromolecular Structure Database)
- http://www.ebi.ac.uk/msd
- Search by PDB code

7 Protein structure-related databases
- Structural family databases based on PDB: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)
- Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

8 Protein family databases
- Databases that produce signatures for identifying protein families or domains
- Used for functional classification of proteins
- E.g. Pfam, PROSITE, PRINTS, SMART, TIGRFAMs, etc.
- Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)
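One kind of signature is a PROSITE-style pattern. The sketch below, in Python, translates the core of the PROSITE pattern syntax into a regular expression and scans a sequence with it. The helper name `prosite_to_regex`, the C2H2-zinc-finger-style pattern and the test sequence are illustrative assumptions, and less common syntax (such as anchors inside brackets) is not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means "any residue except A, B or C".
        element = element.replace("{", "[^").replace("}", "]")
        # x means "any residue".
        element = element.replace("x", ".")
        # (n) and (n,m) are repeat counts.
        element = element.replace("(", "{").replace(")", "}")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


# Illustrative pattern in PROSITE syntax (assumed for this example).
PATTERN = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(PATTERN)

# Made-up sequence containing one match that starts at index 2.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "K"
match = re.search(regex, sequence)
print(regex)
print(match.start() if match else None)
```

Profile- and HMM-based signatures (as used by Pfam, for instance) need dedicated tools; a regular expression only covers simple patterns of this kind.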

9 InterProScan sequence search
- Stand-alone version available

10 InterPro text search
- Search by keyword, protein accession or InterPro accession

11 Results for protein accession

12 Example InterPro entry

13 Chemicals and small molecules
- Chemical Abstracts: http://www.cas.org/
- ChEBI: http://www.ebi.ac.uk/chebi
- KEGG (part of it covers chemicals): http://www.genome.jp/kegg
- ChemIDplus (chemicals cited in NLM databases): http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
- MSD-Chem: ligands and chemicals in MSD

14 ChEBI example entry

15 Hierarchy for chemicals

16 Gene expression databases
- NCBI Gene Expression Omnibus (GEO): http://www.ncbi.nlm.nih.gov/geo/
- ArrayExpress: http://www.ebi.ac.uk/arrayexpress/
- Stanford Microarray Database: http://genome-www5.stanford.edu/
- Can usually search for experiments or particular expression profiles

17 GEO search page

18 Profiles search results

19 Specific entry and experiment info

20 ArrayExpress search results

21 What does the data look like?
- Info on experiment, array used, etc.
- Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples
- Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
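A tab-delimited intensity file of this kind can be read with a few lines of Python. The sketch below is only an illustration: the column names (`spot_id`, `cy5`, `cy3`), the sample values and the `log_ratios` helper are assumptions, and real files from GEO or ArrayExpress use their own layouts. The channel order of the ratio is a convention, so it is exposed as parameters here.

```python
import csv
import io
import math
from typing import Dict

# Made-up two-channel intensities for three spots (tab-delimited).
SAMPLE_TSV = (
    "spot_id\tcy5\tcy3\n"
    "spot_1\t2000\t500\n"
    "spot_2\t300\t1200\n"
    "spot_3\t800\t800\n"
)


def log_ratios(tsv_text: str,
               numerator: str = "cy5",
               denominator: str = "cy3") -> Dict[str, float]:
    """Return the log2 intensity ratio for every spot in the file."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["spot_id"]: math.log2(float(row[numerator]) / float(row[denominator]))
        for row in reader
    }


ratios = log_ratios(SAMPLE_TSV)
for spot, value in ratios.items():
    print(spot, value)
```

Log ratios are used so that a fourfold increase and a fourfold decrease come out symmetric (+2 and -2) around zero.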

22 Proteomics: SWISS-2DPAGE

23 Enzymes and metabolic pathways
- Contain information describing enzymes, biochemical reactions and metabolic pathways
- ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions
- IntEnz: Integrated relational Enzyme database

24 Enzyme nomenclature
- E.C. (Enzyme Commission) numbers are assigned to enzymes based on the reactions they catalyze
- Hierarchy, high-level groups:
  - EC 1: Oxidoreductases
  - EC 2: Transferases
  - EC 3: Hydrolases
  - EC 4: Lyases
  - EC 5: Isomerases
  - EC 6: Ligases
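Because the first field of an EC number encodes the top-level class, the class can be looked up directly. A minimal Python sketch, assuming only the six groups listed on this slide (the `ec_class` helper is hypothetical):

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number: str) -> str:
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'."""
    cleaned = ec_number.upper().replace("EC", "").strip()
    top_level = int(cleaned.split(".")[0])
    if top_level not in EC_CLASSES:
        raise ValueError(f"Unknown top-level EC class: {top_level}")
    return EC_CLASSES[top_level]


print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
print(ec_class("3.4.21.4"))    # trypsin
```

The remaining three fields narrow the classification further (subclass, sub-subclass, serial number), so partial numbers such as 2.7.-.- still resolve to a class.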

25 EC example

26 Metabolic pathway databases
- Pathguide: lists more than 200 pathway resources
- KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg
  - Includes a database of chemicals, genes and networks (metabolic, regulatory, etc.)
  - Well curated and quite specific
- EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org
  - Curation of the entire genome
- Reactome: curated biological pathways: http://www.reactome.org/
- GenMAPP: pathways contributed by users

27 http://www.genome.ad.jp/kegg
- Pathways differ between species, which allows comparison

28 Pathway in Reactome

29 Example of a pathway in BioCyc

30 Protein-protein interaction databases
- Store pairwise interactions or complexes
- A single publication can report anywhere from 1 to more than 20,000 interactions
- IntAct: http://www.ebi.ac.uk/intact
- DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/
- BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/
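Pairwise interactions map naturally onto an undirected graph. The Python sketch below builds one from a list of pairs; the protein labels are placeholders and `build_network` is a hypothetical helper, not part of any of the databases above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_network(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected interaction network from pairwise interactions."""
    network: Dict[str, Set[str]] = defaultdict(set)
    for first, second in pairs:
        # An interaction is symmetric, so record it in both directions.
        network[first].add(second)
        network[second].add(first)
    return network


# Placeholder interaction pairs (not real data).
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
network = build_network(interactions)

# Number of interaction partners per protein.
degrees = {protein: len(partners) for protein, partners in network.items()}
print(degrees)
```

Storing partners in sets means a pair reported by several publications is counted once.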

31 Protein-protein interactions in IntAct

32 Integrated functional interactions in STRING

33 Genome browsers
- Integrate sequence and functional data for a genome
- Ensembl: genome browser for major eukaryotic genomes, e.g. human, mouse, etc.: http://www.ensembl.org
- UCSC browser: http://genome.ucsc.edu/
- FlyBase: Drosophila genome database: http://www.ebi.ac.uk/flybase
- WormBase: C. elegans: http://www.wormbase.org
- PlasmoDB: Plasmodium (malaria): http://plasmodb.org
- Etc.

34 Ensembl genome browser

35 Ensembl gene view 1

36 Ensembl gene view 2

37 Gene within context on chromosome

38 Human genetics databases
- GeneCards (http://www.genecards.org/)
- HapMap (http://hapmap.ncbi.nlm.nih.gov/)
- OMIM (http://www.ncbi.nlm.nih.gov/omim)
- HGDP, Human Genome Diversity Project (http://hagsc.org/hgdp/files.html)

39 Mutation/polymorphism databases
- Most of these databases are disease- or gene-centric, e.g. p53

40 dbSNP
- http://www.ncbi.nlm.nih.gov/SNP/
- Repository of known sequence variations, mainly single nucleotide polymorphisms (human and other organisms)

41 Where to find the databases
- Table of addresses for major databases and tools
- Nucleic Acids Research Database issue: January each year
- Nucleic Acids Research Software issue: new
- ExPASy list of tools: http://ca.expasy.org/links.html

42 Large scale data retrieval
- Programmatic access to many databases
- MySQL access to some
- BioMart access: public and private
- FTP sites: large data downloads
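As a rough illustration of programmatic bulk retrieval, the Python sketch below builds download URLs for a list of PDB codes and streams each file to disk. The URL pattern (`https://files.rcsb.org/download/<CODE>.pdb`) is an assumption that does not come from these slides, and the helper names are hypothetical; each database documents its own access methods, which should be checked before downloading at scale.

```python
import re
import shutil
import urllib.request
from pathlib import Path
from typing import Iterable, List

# Classic PDB codes: one digit followed by three letters or digits.
PDB_CODE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def pdb_file_url(code: str) -> str:
    """Build the download URL for one PDB code (URL pattern is an assumption)."""
    if not PDB_CODE.match(code):
        raise ValueError(f"Not a valid PDB code: {code!r}")
    return f"https://files.rcsb.org/download/{code.upper()}.pdb"


def download_all(codes: Iterable[str], target_dir: str) -> List[Path]:
    """Download each entry into target_dir and return the saved paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for code in codes:
        destination = target / f"{code.upper()}.pdb"
        # Stream the response to disk instead of holding it in memory.
        with urllib.request.urlopen(pdb_file_url(code)) as response, \
                open(destination, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
        saved.append(destination)
    return saved


print(pdb_file_url("1tup"))
```

Calling `download_all(["1tup"], "structures")` would fetch the file over the network. For very large downloads, the FTP sites or bulk archives a database provides are usually the better route.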

43 Other tutorials
- http://www.ensembl.org/info/website/tutorials/index.html
- http://www.ebi.ac.uk/training/online/
- http://www.ebi.ac.uk/2can/home.html

