Genome Related Biological Databases. Content DNA Sequence databases Protein databases Gene prediction Accession numbers NCBI website Ensembl website.

1 Genome Related Biological Databases

2 Content
  DNA sequence databases
  Protein databases
  Gene prediction
  Accession numbers
  NCBI website
  Ensembl website

3 Nucleotide databases
  GenBank: housed at NCBI (National Center for Biotechnology Information), www.ncbi.nlm.nih.gov/Genbank/
  EMBL: housed at EBI (European Bioinformatics Institute), www.ebi.ac.uk/embl/
  DDBJ: housed in Japan, www.ddbj.nig.ac.jp/Welcome-e.html
  The underlying raw DNA sequences are identical in all three

4 >100,000 species are represented in GenBank
  all species   196,538
  viruses         5,214
  bacteria       14,258
  archaea           500
  eukaryota     171,843

6 NCBI nucleotide databases
  GenBank
    Individual submissions
    Bulk submissions (genome centers)
      High-throughput sequencing (DNA)
      Expressed Sequence Tags (mRNA)
  RefSeq
    Curated subset of GenBank
    "Reference" sequence
    Single sequence per locus / molecule

7 Protein databases
  NCBI: RefSeq and Protein
  EBI: Swiss-Prot, PIR and TrEMBL → UniProt
    TrEMBL: translated from nucleotide sequence
    Swiss-Prot: curated
    UniProt: combined

8 UniProt versus GenBank and RefSeq
  UniProt
    Produced by SIB, EBI & Georgetown U.
    Protein data only
    Curated in Swiss-Prot, not in TrEMBL
  GenBank/RefSeq
    Produced by INSDC and NCBI
    Protein and nucleotide data
    Curated in RefSeq, not in GenBank

9 Accession numbers
  Label to unambiguously identify a sequence
  Examples (all for retinol-binding protein, RBP4; protein, DNA and RNA records):
    X02775      GenBank genomic DNA sequence
    NT_030059   Genomic contig
    rs7079946   dbSNP (single nucleotide polymorphism)
    RBP4        HUGO gene name
    N91759.1    An expressed sequence tag (1 of 170)
    NM_006744   RefSeq DNA sequence (from a transcript)
    NP_007635   RefSeq protein
    AAC02945    GenBank protein
    Q28369      UniProt protein
    1KT7        Protein Data Bank structure record
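
The accession formats on this slide can often be told apart by their prefix or shape. The following is a minimal sketch, assuming only a handful of well-known patterns (RefSeq NM_/NP_/NT_ prefixes, dbSNP rs numbers, four-character PDB codes); the function name `classify_accession` and the pattern table are illustrative and not part of the original slides, and real accession formats have many more variants.

```python
import re

# Illustrative patterns only (an assumption, not an exhaustive specification).
PATTERNS = [
    ("RefSeq mRNA", re.compile(r"^NM_\d+(\.\d+)?$")),
    ("RefSeq protein", re.compile(r"^NP_\d+(\.\d+)?$")),
    ("RefSeq genomic contig", re.compile(r"^NT_\d+(\.\d+)?$")),
    ("dbSNP", re.compile(r"^rs\d+$", re.IGNORECASE)),
    # PDB codes: one digit followed by three alphanumeric characters.
    ("PDB structure", re.compile(r"^\d[A-Za-z0-9]{3}$")),
]


def classify_accession(acc: str) -> str:
    """Return a rough database label for an accession string."""
    for label, pattern in PATTERNS:
        if pattern.match(acc):
            return label
    # GenBank, UniProt and gene symbols are not distinguished by this sketch.
    return "unrecognized (may be GenBank, UniProt, or a gene symbol)"


if __name__ == "__main__":
    for acc in ["NM_006744", "NP_007635", "NT_030059", "rs7079946", "1KT7", "X02775"]:
        print(acc, "->", classify_accession(acc))
```

Identifiers such as X02775, AAC02945 or Q28369 fall through to the "unrecognized" branch on purpose: telling them apart reliably needs the databases' full format rules rather than a short regular expression.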

10 From Sequence to Genes
  Gene prediction
    Extrinsic: search for genes based on observed mRNA / protein sequences
      UniGene
    Ab initio: predict genes based on genomic sequence alone
      Promoter sequence, poly(A) tail binding sites, CpG islands, splicing sites
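
As an illustration of one ab initio signal named on this slide, the sketch below computes two simple statistics often used to flag candidate CpG islands: GC fraction and the observed/expected CpG ratio. The thresholds (length of at least 200 bp, GC above 50%, ratio above 0.6) are commonly cited defaults and an assumption here, not values taken from the slides; the function names are hypothetical.

```python
def cpg_stats(seq: str) -> tuple:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # observed CpG dinucleotides
    gc_fraction = (c + g) / n
    expected = c * g / n           # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed thresholds to a single window of sequence."""
    gc_fraction, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and obs_exp > min_ratio


if __name__ == "__main__":
    print(looks_like_cpg_island("CG" * 100))   # CpG-rich toy sequence
    print(looks_like_cpg_island("AT" * 100))   # no C or G at all
```

A real gene finder would slide such a window along the genome and combine the result with other signals (promoters, splice sites); this sketch scores one window only.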

11 UniGene
  Predict genes based on ESTs
  EST: DNA sequence corresponding to mRNA from an expressed gene
    ~500 base pairs long
    Sequenced from a cDNA library
  Cluster ESTs from many cDNA libraries to predict distinct genes

12 EST clusters
  A gene with 1 EST associated has a cluster size of 1
  A gene with 10 ESTs associated has a cluster size of 10
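
The idea of grouping ESTs into clusters, with the cluster size being the number of ESTs in the group, can be sketched as follows. This is a toy illustration, not the UniGene procedure: it assumes two ESTs belong together whenever they share an exact k-mer, ignores reverse complements and sequencing errors, and uses made-up EST names and sequences.

```python
from collections import defaultdict


def cluster_ests(ests: dict, k: int = 12) -> list:
    """Group EST names into clusters linked by at least one shared k-mer."""
    parent = {name: name for name in ests}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}  # k-mer -> name of the first EST containing it
    for name, seq in ests.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in first_seen:
                union(first_seen[kmer], name)
            else:
                first_seen[kmer] = name

    clusters = defaultdict(set)
    for name in ests:
        clusters[find(name)].add(name)
    # Largest clusters first; cluster size = number of ESTs in the set.
    return sorted(clusters.values(), key=len, reverse=True)


if __name__ == "__main__":
    toy = {
        "est1": "ACGTACGTTAGCCGATAGCT",      # hypothetical sequences
        "est2": "TAGCCGATAGCTGGATCCAA",      # overlaps est1 by 12 bases
        "est3": "TTTTTTTTTTGGGGGGGGGGCCCC",  # unrelated
    }
    for cluster in cluster_ests(toy):
        print(len(cluster), sorted(cluster))
```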

13 Likely to be real genes

14 Gene databases
  Ensembl (EBI)
    Automatic annotation: mRNA and protein sequence
    Curated annotation: Vega project
  Entrez Gene (NCBI)
    Links RefSeq sequences to external annotations

15 Web sites for biological databases
  NCBI: www.ncbi.nlm.nih.gov
  EBI: www.ebi.ac.uk
  Ensembl: www.ensembl.org (hosted at EBI)

16 NCBI website

26 PubMed

27 Ensembl website

28 Ensembl structure
  Gene: ENSG…
  Transcript: ENST…
  Protein: ENSP…
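
The gene / transcript / protein hierarchy can be pictured as a small data model. The sketch below is an illustration under stated assumptions, not the Ensembl API: the class and field names are hypothetical, the stable IDs are made-up placeholders, and it assumes that a gene may have several transcripts of which only the protein-coding ones carry an ENSP identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Transcript:
    stable_id: str                     # ENST... identifier
    protein_id: Optional[str] = None   # ENSP... identifier, None if non-coding


@dataclass
class Gene:
    stable_id: str                     # ENSG... identifier
    transcripts: List[Transcript] = field(default_factory=list)

    def protein_ids(self) -> List[str]:
        """Collect the protein IDs of all coding transcripts."""
        return [t.protein_id for t in self.transcripts if t.protein_id]


if __name__ == "__main__":
    # Placeholder IDs for illustration only; they do not refer to real records.
    gene = Gene(
        "ENSG00000000001",
        [
            Transcript("ENST00000000001", "ENSP00000000001"),
            Transcript("ENST00000000002"),  # non-coding transcript
        ],
    )
    print(gene.stable_id, gene.protein_ids())
```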

29 Ensembl search

30 OTTHUMGXXX (Curated) ENSGXXXX (Predicted)

31 Vega gene page

32 Ensembl gene page

34 Ensembl transcript page

35 Ensembl protein page

