© Wiley Publishing. 2007. All Rights Reserved. Using Nucleotide Sequence Databases.

1 © Wiley Publishing. 2007. All Rights Reserved. Using Nucleotide Sequence Databases

2 Learning Objectives
Distinguish the structure of eukaryotic and prokaryotic genes
Make sense of a GenBank entry
Understand the difference between GenBank and a gene-centric resource
Browse whole-genome databases

3 Outline
1. Reminder on genes and genomes
2. Searching GenBank (the DNA database)
3. Using gene-centric databases
4. Analyzing microbial genomes
5. Browsing the human genome

4 Typical Prokaryotic Genome
Prokaryotes are microscopic organisms
They have a circular genome
Its length is a few million bp (0.6 to 10 Mb)
Prokaryotes have about 1 gene per kb
70% of their genome is coding for proteins
Their genes generally do not overlap

5 Typical Prokaryotic Protein-Coding Gene
The gene has an uninterrupted sequence
Prokaryotic mRNA contains
  The Ribosome Binding Site (RBS)
  The Open Reading Frame (ORF) in one piece
In operons, the RNA can contain several ORFs
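
Because a prokaryotic ORF comes in one piece, it can be located by scanning for a start codon followed, in the same reading frame, by a stop codon. The sketch below illustrates that idea only; it assumes the forward strand, ATG as the sole start codon, and the three standard stop codons. The function name `find_orfs`, the `min_codons` threshold, and the example sequence are made up for illustration.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end) pairs, 0-based with end exclusive, for ORFs
    on the forward strand. An ORF here runs from an ATG to the first
    in-frame stop codon, stop included."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

example = "CCATGAAATTTTAGGG"  # made-up sequence
for begin, end in find_orfs(example):
    print(begin, end, example[begin:end])
```

Real gene finders also scan the reverse strand, accept alternative start codons, and look for an RBS upstream of the start; none of that is attempted here.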

6 Typical Eukaryotic Genome
Eukaryotes can be small (yeast) or big (whales)
Genomes are made of linear pieces of DNA called chromosomes
One chromosome: 10 to 700 Mb
The Human Genome
  Contains 22 + 1 pairs of chromosomes (22 autosome pairs and 1 sex-chromosome pair)
  Is 3 Gb long
  One gene every 100 kb (human)
  5% of the genome is coding for proteins

7 Typical Eukaryotic Protein-Coding Gene
The coding sequences are made of coding exons separated by introns
Introns are spliced out and exons glued together to make the ORF
One gene can code for several alternative proteins: alternative splicing

8 Prokaryotes vs. Eukaryotes
Prokaryotes
  Genome = one large circular chromosome + a few small circular chromosomes (plasmids)
  0.5 to 8 Mb / chromosome
  Genes in one piece
  70% of the genome is coding
  1 gene / kb
Eukaryotes
  Genome = many large linear chromosomes
  10 to 700 Mb / chromosome
  Genes split into exons and introns
  5% of the genome is coding
  1 gene / 100 kb (human)
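
The round figures on this slide translate into rough gene counts and coding lengths with simple arithmetic. The sketch below takes the slide's densities at face value; the 4 Mb bacterial genome is a hypothetical example size, and the function names are made up for illustration.

```python
def estimated_gene_count(genome_bp, bp_per_gene):
    """Rough gene count from genome length and average gene spacing."""
    return genome_bp // bp_per_gene

def coding_bp(genome_bp, coding_percent):
    """Number of base pairs that code for proteins."""
    return genome_bp * coding_percent // 100

bacterium_bp = 4_000_000       # hypothetical 4 Mb prokaryotic genome
human_bp = 3_000_000_000       # 3 Gb, as stated on the slides

# Slide figures: 1 gene/kb and 70% coding for prokaryotes,
# 1 gene/100 kb and 5% coding for human.
print(estimated_gene_count(bacterium_bp, 1_000), coding_bp(bacterium_bp, 70))
print(estimated_gene_count(human_bp, 100_000), coding_bp(human_bp, 5))
```

With these inputs, a genome 750 times larger carries fewer than 10 times as many genes.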

9 GenBank
Housed by the National Center for Biotechnology Information (NCBI)
GenBank is the memory of biological science
  Contains EVERY DNA sequence ever published
GenBank is the original information source for most biological databases
GenBank is more complicated to use than gene-centric databases

10 Reading a Prokaryotic GenBank Entry
ACCESSION is the accession number
  Unique to each entry
  Permanent
LOCUS contains information on the size of the sequence
ORGANISM defines the organism containing the gene
REFERENCE indicates who produced the sequence
FEATURES lists some functional features of the gene
GenBank entries can contain more than one gene
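
The header fields listed on this slide can be pulled out of a flat-file entry with a few lines of code. The sketch below assumes the usual layout in which keywords sit in the first 12 columns and continuation lines leave those columns blank. The record text is entirely made up (it is not a real database entry), and `parse_header` is a hypothetical helper, not part of any library.

```python
# A made-up, minimal GenBank-style header (not a real database entry).
RECORD = """\
LOCUS       EXAMPLE01               1200 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical example gene.
ACCESSION   X00000
SOURCE      Escherichia coli
  ORGANISM  Escherichia coli
            Bacteria; Proteobacteria.
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Doe,J.
FEATURES             Location/Qualifiers
"""

def parse_header(record_text):
    """Map each header keyword to the text on its first line."""
    fields = {}
    for line in record_text.splitlines():
        if not line[:12].strip():
            continue  # continuation line: the keyword columns are blank
        keyword, _, value = line.strip().partition(" ")
        if keyword == "FEATURES":
            break  # the feature table needs its own parser
        fields.setdefault(keyword, value.strip())
    return fields

header = parse_header(RECORD)
print(header["ACCESSION"])
print(header["ORGANISM"])
print(header["LOCUS"].split()[1], "bp")
```

For real work an established parser such as the one in Biopython is a safer choice than hand-rolled code like this.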

11 FEATURES Section of a GenBank Entry
Promoter
  Gives the precise coordinates of the promoter
  There can be more than one promoter
RBS gives the coordinates of the Ribosome Binding Site
CDS gives all the properties of the CoDing Sequence that codes for the protein

12 Reading a Eukaryotic GenBank Entry
The sections are the same as in a prokaryotic entry
SOURCE contains a map section that indicates the chromosome containing the gene
GENE gives the information needed to reconstruct the CDS from the gene
Remember: Eukaryotic genes are interrupted by introns

13 Assembling CDSs from a GenBank Entry The gene, mRNA, and CDS sections tell you which segments of which entry must be joined to reconstruct the gene, the mRNA, or the CDS
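
In the feature table, the segments to be joined are written as a location string such as `join(3..8,15..20)`, with 1-based, inclusive coordinates. The sketch below shows how such a string can be turned into a spliced sequence. It is a simplification: it assumes every segment lies on the forward strand of the same entry, so `complement(...)` and references to other entries are not handled. The function names and the example sequence are made up.

```python
import re

def parse_join(location):
    """Turn 'join(3..8,15..20)' or '3..8' into a list of
    1-based inclusive (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def assemble(sequence, location):
    """Concatenate the segments named by the location string."""
    return "".join(sequence[start - 1:end] for start, end in parse_join(location))

# Made-up entry: two exons (uppercase) separated by one intron (lowercase).
entry = "ccATGAAAgtaaagTTTTAGcc"
cds = assemble(entry, "join(3..8,15..20)")
print(cds)
```

The same function can rebuild the gene or the mRNA, since only the location string changes.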

14 Assembling CDSs from a GenBank Entry
A gene can code for several alternative mRNAs
Example: The dUTPase gene codes for
  Mitochondrial dUTPase
  Nuclear dUTPase

15 Limitations of GenBank
GenBank entries can contain
  Entire genes
  Portions of genes
  Many genes
GenBank entries can be of uneven quality
  Entries can be duplicated and/or inaccurate
The database is not a selection center
  All data is treated equally
GenBank entries are not the final word on particular genes
  They have no authoritative biological meaning
  They merely keep track of what was done
Gene-centric databases are needed to compile everything that is known on a given gene and to correct potential errors

16 Using Gene-centric Databases: Entrez Gene
Entrez Gene can be accessed from the NCBI
In GenBank, each entry is one sequence from one publication
In Entrez Gene, each entry is one gene
Entrez Gene is built with GenBank data
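
The difference between a sequence-centric and a gene-centric view can be pictured as a regrouping of records. The toy sketch below collects sequence entries under the gene they describe. The accession numbers and gene names are invented, and this is only a picture of the idea, not a description of how Entrez Gene is actually compiled.

```python
from collections import defaultdict

# Invented sequence-centric records: one entry per submitted sequence.
ENTRIES = [
    {"accession": "X00001", "gene": "geneA"},
    {"accession": "X00002", "gene": "geneB"},
    {"accession": "X00003", "gene": "geneA"},
]

def group_by_gene(entries):
    """Build a gene-centric view: one key per gene, listing its sequence entries."""
    by_gene = defaultdict(list)
    for entry in entries:
        by_gene[entry["gene"]].append(entry["accession"])
    return dict(by_gene)

for gene, accessions in group_by_gene(ENTRIES).items():
    print(gene, accessions)
```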

17 Whole-Genome Databases
Entrez Genome provides access to whole-genome databases
Use whole-genome sites to explore complete genomes of
  Viruses
  Prokaryotes
  Eukaryotes
A genome browser lets you get the details or the big picture
  Zoom in on a precise gene
  Zoom out to a portion of the genome
  Visualize positions
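
Zooming in a genome browser amounts to asking which annotated features overlap a coordinate window. The sketch below shows that interval test on invented data; the feature names, coordinates, and the function `features_in_window` are hypothetical and do not correspond to any particular browser's API.

```python
# Invented annotations: (name, start, end), 1-based inclusive coordinates.
ANNOTATIONS = [
    ("geneA", 100, 900),
    ("geneB", 1500, 2400),
    ("geneC", 5000, 7000),
]

def features_in_window(features, start, end):
    """Return names of features that overlap the window [start, end]."""
    return [name for name, f_start, f_end in features
            if f_start <= end and f_end >= start]

print(features_in_window(ANNOTATIONS, 1400, 2500))  # zoomed in on one gene
print(features_in_window(ANNOTATIONS, 1, 6000))     # zoomed out
```

A linear scan is fine for a handful of features; browsers that serve whole genomes typically rely on indexed interval structures instead.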

18 Visualizing a Viral Genome at the NCBI
Go to
Select viruses on the left side
Type HIV1
The browser displays a map of the virus and links to information relevant to the virus and its proteins

19 Exploring the Human Genome with ENSEMBL
Accessible at
ENSEMBL is a database of eukaryotic genomes
  Annotated entries
  Wide range of examples: human, mouse, dog, and so on
ENSEMBL annotation is mostly automated
ENSEMBL contains tools to
  Browse the complete genome
  Search the complete genome with BLAST
  Visualize the position of a gene
  Visualize all experimental information on this gene (transcripts)

20 Visualizing Human Chromosomes on ENSEMBL

21 Visualizing Human Chromosomes on ENSEMBL (cont’d.)
By pointing at a chromosome region, you can zoom in on the chromosome
All genes are cross-indexed with databases so you can find all related experimental information

22 Going Farther
The TIGR Institute: TIGR = The Institute for Genomic Research
  Specializes in prokaryotes
The DoE Joint Genome Institute: DoE = Department of Energy (U.S. government agency)
  Focuses on environmentally important prokaryotes
University of California at Santa Cruz:
  A very good alternative to ENSEMBL
