1
GENBANK, SWISSPROT AND OTHERS
As Problem Sources for CSE 549
Andriy Tovkach
Genetics
2
GENBANK OVERVIEW
Part of a three-way data exchange between EMBL, NCBI and DDBJ
Started 10 years ago
Exponential growth (graph)
On Saturday the 7th: 20.2 billion bases
3
FILE FORMAT
Header
Features
Sequence
(see files)
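The three parts of a flat-file record can be separated with a few lines of Python. This is a minimal sketch, not a full parser: it assumes the usual GenBank layout in which the feature table starts at a line beginning with FEATURES, the sequence starts at ORIGIN, and the record ends with //. The record below is a shortened, hypothetical example, and the function names are illustrative only.

```python
from typing import Dict, List


def split_flat_file(text: str) -> Dict[str, List[str]]:
    """Split one flat-file record into header, features and sequence lines."""
    sections: Dict[str, List[str]] = {"header": [], "features": [], "sequence": []}
    current = "header"
    for line in text.splitlines():
        if line.startswith("FEATURES"):
            current = "features"
        elif line.startswith("ORIGIN"):
            current = "sequence"
        elif line.startswith("//"):
            break  # end-of-record marker
        sections[current].append(line)
    return sections


def extract_sequence(sequence_lines: List[str]) -> str:
    """Drop the ORIGIN line, position numbers and spaces; keep only letters."""
    return "".join(ch for line in sequence_lines[1:] for ch in line if ch.isalpha())


# Hypothetical, heavily shortened record used only for illustration.
record = """\
LOCUS       EXAMPLE01      20 bp    DNA     linear   UNA 01-JAN-2000
DEFINITION  Hypothetical example record.
FEATURES             Location/Qualifiers
     source          1..20
ORIGIN
        1 acgtacgtac gtacgtacgt
//
"""

sections = split_flat_file(record)
sequence = extract_sequence(sections["sequence"])
print(len(sections["header"]), len(sections["features"]), sequence)
```

Real records carry many more header fields and nested feature qualifiers, so a maintained parser is the better choice for anything beyond a quick look at the structure.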
4
FASTA FORMAT
Single-line description beginning with >
Followed by sequence data
Can hold either protein or DNA
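The FASTA rules above are simple enough to read with a short Python function. This is a sketch under stated assumptions: the two records are made up, the function names are illustrative, and the protein/DNA check is a crude heuristic (a sequence using only the letters A, C, G, T, U, N is treated as nucleotide), not something the format itself defines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each description line (without '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequence may span several lines
    return records


def guess_alphabet(sequence: str) -> str:
    """Heuristic only: nucleotide letters alone suggest DNA/RNA."""
    return "nucleotide" if set(sequence.upper()) <= set("ACGTUN") else "protein"


# Hypothetical records used only for illustration.
example = """>seq1 hypothetical DNA record
ACGTACGT
TTGACA
>seq2 hypothetical protein record
MKVLAAGIW
"""

records = parse_fasta(example.splitlines())
for description, seq in records.items():
    print(description, len(seq), guess_alphabet(seq))
```

Because the file itself does not say whether it holds protein or DNA, any program reading FASTA has to be told the type or infer it as above.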
5
ENTREZ as RETRIEVAL SYSTEM
PubMed – 12 million citations from life science journals
Nucleotide – collection of DNA sequences
Protein – protein sequences from SwissProt
Genome – genomes of over 800 organisms
Also Structure, PopSet, Taxonomy, OMIM
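Entrez can also be queried from a program through NCBI's E-utilities web interface, which is not covered in the slides. The sketch below only builds an efetch request URL and does not contact the server; the record identifier is a placeholder, the helper name is hypothetical, and the parameter values shown (rettype fasta, retmode text) are one common combination rather than the only one.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, record_id: str, rettype: str = "fasta") -> str:
    """Build an efetch URL for one record in one Entrez database."""
    query = urlencode(
        {"db": db, "id": record_id, "rettype": rettype, "retmode": "text"}
    )
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


# "EXAMPLE_ID" is a placeholder, not a real accession number.
nucleotide_url = efetch_url("nucleotide", "EXAMPLE_ID")
protein_url = efetch_url("protein", "EXAMPLE_ID")
print(nucleotide_url)
print(protein_url)
```

Passing the resulting URL to urllib.request.urlopen with a real accession would return the record as text; NCBI asks automated clients to limit their request rate.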
6
PROTEIN DATABASES
SWISS-PROT
EBI – TREMBL
NCBI – GENPEPT (already in history)
7
GENOME DATABASES
SGD: homepage, example 1.1, example 1.2
Wormbase
Ensembl Human Genome Browser
8
CONCLUSIONS
Sequencing projects produce a lot of data
At a minimum, these data have to be structured in databases
Ideally, all sequences need high-quality human annotation
That is why computer scientists are welcome in biology
9
LITERATURE
Genebank presentation by Manpreet Katari (CSE 549, Fall 2000)
Thomas Lengauer (Ed.), Bioinformatics – From Genomes to Drugs
Entrez website
Google