The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner


1 The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner pevsner@jhmi.edu

2 Copyright notice: Many of the images in this PowerPoint presentation are from Bioinformatics and Functional Genomics by J. Pevsner (ISBN 0-471-21004-8). Copyright © 2003 by Wiley. These images and materials may not be used without permission from the publisher. Visit http://www.bioinfbook.org

3 Announcements -- Today: human genome -- Friday, Nov. 7: computer lab -- Monday, Nov. 10: human disease (West Lecture Hall) -- Wednesday, Nov. 12: final exam (in class); find-a-gene project due

4 Final exam on November 12 Format: -- closed book -- one hour, in-class (ok to take longer) -- to practice, do the self-test quizzes at the ends of chapters 15-18. Some of the questions will be based on the recent article on human chromosome 6: Mungall AJ et al., The DNA sequence and analysis of human chromosome 6. Nature 425, 805-811, 23 October 2003. See also the accompanying News & Views: Grimwood J and Schmutz J, Six is seventh, Nature 425, 775-776, 23 October 2003.

5 Outline of today’s lecture: 1. Summary of major findings of the Human Genome Project 2. Web resources for the human genome 3. We will follow the outline of the February 2001 Nature paper describing the human genome. Page 607

6 Main conclusions of the human genome project Page 608

7 Main web sites for the human genome (genome hubs): -- National Human Genome Research Institute (NHGRI): http://www.genome.gov/ -- NCBI Genome Central: www.ncbi.nlm.nih.gov/genome/central -- Ensembl: http://www.ensembl.org/genome/central/ Page 608


9 1. There are about 30,000 to 40,000 human genes. This number is far smaller than earlier estimates. The public consortium estimated 31,000, while Celera estimated 38,500. But note: -- many predicted genes are unique to each group -- there are many transcripts of unknown function. Current estimates (2003) are ~30,000 genes. Page 608 Main conclusions of human genome project


11 1. We have about the same number of genes as fish and plants, and not that many more genes than worms and flies: -- Fugu rubripes (pufferfish): 31,000 to 38,000 -- Arabidopsis thaliana (thale cress): 26,000 -- Caenorhabditis elegans (worm): 19,000 -- Drosophila melanogaster (fly): 13,000 Page 608 Main conclusions of human genome project
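To put "not that many more genes" into numbers, the human total can be divided by each count listed above. A minimal sketch, assuming ~30,000 human genes (the 2003 estimate) and using the midpoint of the Fugu range; both are illustrative choices, not published figures.

```python
# Approximate gene counts quoted on the slide. The human value uses the ~30,000
# (2003) estimate; the Fugu range of 31,000 to 38,000 is reduced to its midpoint,
# which is an illustrative choice rather than a published figure.
GENE_COUNTS: dict[str, int] = {
    "Homo sapiens": 30_000,
    "Fugu rubripes": (31_000 + 38_000) // 2,
    "Arabidopsis thaliana": 26_000,
    "Caenorhabditis elegans": 19_000,
    "Drosophila melanogaster": 13_000,
}


def ratio_to_human(counts: dict[str, int], reference: str = "Homo sapiens") -> dict[str, float]:
    """Return (reference gene count / species gene count) for every other species."""
    ref = counts[reference]
    return {species: ref / n for species, n in counts.items() if species != reference}


if __name__ == "__main__":
    for species, ratio in ratio_to_human(GENE_COUNTS).items():
        print(f"{species}: human/species gene ratio = {ratio:.1f}")
```

Run as a script, it prints ratios of about 0.9 (Fugu), 1.2 (Arabidopsis), 1.6 (C. elegans), and 2.3 (Drosophila).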


13 2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes. Vertebrates have a more complex mixture of protein domain architectures. Additionally, the human genome displays greater complexity in its processing of mRNA transcripts by alternative splicing. Page 608 Main conclusions of human genome project


15 3. Hundreds of human genes were acquired from bacteria by lateral gene transfer (LGT), according to the initial report. Evidence: the proteomes of human, fly, worm, yeast, Arabidopsis, eukaryotic parasites, and all completed prokaryotic genomes were compared, and some genes were found to be shared exclusively by humans and bacteria. However, according to TIGR, only about 40 of these genes (possibly fewer) were acquired by LGT (see Salzberg et al., Science 292:1903, 2001). Reasons for artifactually high estimates include: -- gene loss in other lineages -- small sample size of species Page 608 Main conclusions of human genome project
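The screening logic behind the LGT claim, and why a small species sample inflates it, can be sketched as a set filter. The presence/absence table and gene names below are hypothetical; a real analysis would derive them from proteome-wide similarity searches.

```python
# Hypothetical presence/absence data: gene -> lineages in which a homolog was found.
# Gene names and hits are invented for illustration only.
SAMPLED_EUKARYOTES = {"fly", "worm", "yeast", "Arabidopsis", "parasites"}

EXAMPLE_HITS = {
    "geneA": {"human", "bacteria"},
    "geneB": {"human", "bacteria", "fly"},
    "geneC": {"human", "bacteria", "fish"},  # homolog exists, but in an unsampled lineage
}


def lgt_candidates(hits: dict[str, set[str]], eukaryotes: set[str]) -> list[str]:
    """Genes found in human and bacteria but in none of the sampled non-human eukaryotes."""
    return sorted(
        gene
        for gene, lineages in hits.items()
        if {"human", "bacteria"} <= lineages and not (lineages & eukaryotes)
    )


if __name__ == "__main__":
    print(lgt_candidates(EXAMPLE_HITS, SAMPLED_EUKARYOTES))             # ['geneA', 'geneC']
    print(lgt_candidates(EXAMPLE_HITS, SAMPLED_EUKARYOTES | {"fish"}))  # ['geneA']
```

Adding one more sampled lineage removes geneC from the candidate list, which is the small-sample artifact in miniature; a gene lost from the sampled lineages would produce the same false signal as geneA.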

18 4. 98% of the genome does not code for proteins. More than 50% of the genome consists of repetitive DNA; repeats derived from transposable elements (also called interspersed repeats) account for about 45%: -- LINEs (20%) -- SINEs (13%) -- LTR retrotransposons (8%) -- DNA transposons (3%) There has been a decline in the activity of some of these elements in the human lineage. Page 608 Main conclusions of human genome project
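Adding up the four listed classes is a quick consistency check: they total 44% of the genome, somewhat below the >50% figure for repetitive DNA overall, so the remainder comes from other repeat types (for example simple sequence repeats and segmental duplications). A sketch, assuming a haploid genome of about 3.1 Gb for the megabase conversion:

```python
# Percent of the genome occupied by each transposable-element class, as listed on the slide.
TE_PERCENT = {
    "LINEs": 20,
    "SINEs": 13,
    "LTR retrotransposons": 8,
    "DNA transposons": 3,
}

ASSUMED_GENOME_BP = 3.1e9  # assumption: approximate haploid human genome size


def total_percent(percent_by_class: dict[str, float]) -> float:
    """Sum the per-class percentages (classes are assumed not to overlap)."""
    return sum(percent_by_class.values())


def percent_to_mb(percent: float, genome_bp: float = ASSUMED_GENOME_BP) -> float:
    """Convert a genome percentage into megabases."""
    return percent / 100 * genome_bp / 1e6


if __name__ == "__main__":
    total = total_percent(TE_PERCENT)
    print(f"Listed TE classes: {total}% of the genome, about {percent_to_mb(total):.0f} Mb")
```

Because the slide's percentages are rounded, the total is only approximate.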

19 5. Segmental duplication is a frequent occurrence in the human genome. -- tandem duplications (rare) -- retrotransposition (intronless paralogs) -- segmental duplications (common) Page 608 Main conclusions of human genome project

20 6. There are about 1.1 million Alu repeats in the human genome. Each is about 300 base pairs long and contains a recognition site for the restriction enzyme AluI. Together they occupy about 10% of the genome. We saw an example of an Alu repeat in Chapter 16. Their distribution is non-random: they are retained in GC-rich regions and may confer some benefit. Page 608 Main conclusions of human genome project
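The AluI site that gives these repeats their name (AGCT, cut as AG^CT) can be found with a plain pattern scan, and a copy-number claim can be sanity-checked as copies × element length / genome size. A minimal sketch, assuming a 3.1 Gb genome and full-length, non-overlapping copies; the example sequence is made up.

```python
import re

ALUI_SITE = "AGCT"  # AluI recognition sequence (cuts AG^CT, leaving blunt ends)


def alui_sites(seq: str) -> list[int]:
    """0-based start positions of AluI sites in a DNA sequence (case-insensitive)."""
    return [m.start() for m in re.finditer(ALUI_SITE, seq.upper())]


def genome_fraction(copies: int, element_bp: int, genome_bp: float = 3.1e9) -> float:
    """Fraction of the genome covered by `copies` elements of `element_bp` each.

    Assumes full-length, non-overlapping copies; real repeat annotations include
    truncated copies, so this is only an order-of-magnitude check.
    """
    return copies * element_bp / genome_bp


if __name__ == "__main__":
    print(alui_sites("ttAGCTaaggagctt"))  # [2, 10]
    print(f"{genome_fraction(1_000_000, 300):.1%}")
```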

21 7. The mutation rate is about twice as high in male meiosis as in female meiosis, so most mutations probably occur in males. Page 609 Main conclusions of human genome project
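The step from "twice the rate" to "most mutations occur in males" is simple arithmetic. A sketch, assuming autosomal loci (one copy inherited from each parent) and writing alpha for the male:female mutation-rate ratio:

```python
def paternal_share(alpha: float) -> float:
    """Expected share of new autosomal mutations that arise in the father.

    alpha = male germline mutation rate / female germline mutation rate.
    A child receives one copy of each autosome from each parent, so new
    mutations arrive at relative rates alpha (paternal) and 1 (maternal).
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return alpha / (alpha + 1)


if __name__ == "__main__":
    for alpha in (1.0, 2.0):
        print(f"alpha = {alpha}: {paternal_share(alpha):.0%} of new mutations are paternal")
```

With alpha = 2, about two thirds of new mutations are expected to come from the father.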

22 8. More than 1.4 million single nucleotide polymorphisms (SNPs; single base pair changes) were identified. Celera initially identified 2.1 million SNPs. Currently, dbSNP at NCBI (build 118) has about 5.8 million human SNPs (2.4 million validated). A SNP occurs every 100 to 300 base pairs. A random pair of haploid genomes differs at a rate of 1 base pair every 1250, on average (Celera). Fewer than 1% of SNPs alter protein sequence. Page 609 Main conclusions of human genome project
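The per-base figures above translate directly into genome-wide counts. A sketch, assuming a round haploid genome size of 3.0 Gb (an assumption, not a figure from the slide), plus a helper that computes average SNP spacing from a list of hypothetical coordinates:

```python
ASSUMED_HAPLOID_BP = 3_000_000_000  # assumption: round figure for the haploid genome


def expected_differences(genome_bp: int, bp_per_difference: int) -> int:
    """Expected number of single-base differences between two haploid genomes."""
    return genome_bp // bp_per_difference


def mean_spacing(positions: list[int]) -> float:
    """Average distance between consecutive SNPs, given their coordinates."""
    ordered = sorted(positions)
    if len(ordered) < 2:
        raise ValueError("need at least two SNP positions")
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


if __name__ == "__main__":
    print(expected_differences(ASSUMED_HAPLOID_BP, 1250))  # 2400000
    print(mean_spacing([600, 100, 350]))                    # 250.0
```

At one difference per 1,250 bp, two haploid genomes would differ at roughly 2.4 million positions.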

25 Three gateways to access the human genome: -- NCBI Map Viewer: www.ncbi.nlm.nih.gov -- Ensembl Project (EBI/Sanger Institute): www.ensembl.org -- UCSC (Golden Path): genome.ucsc.edu Each of these three sites provides essential resources to study the human genome (and other genomes). Page 609
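These gateways are designed mainly for interactive browsing. Ensembl now also offers a REST service (which postdates this 2003 lecture); the sketch below assumes its `lookup/symbol` endpoint at rest.ensembl.org, builds the query URL for a gene symbol such as RBP4, and fetches the JSON record using only the standard library. The fetch requires network access.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"  # assumption: current public Ensembl REST host


def ensembl_lookup_url(symbol: str, species: str = "homo_sapiens") -> str:
    """Build a lookup-by-gene-symbol URL for the Ensembl REST API."""
    path = f"/lookup/symbol/{urllib.parse.quote(species)}/{urllib.parse.quote(symbol)}"
    return f"{ENSEMBL_REST}{path}?content-type=application/json"


def ensembl_lookup(symbol: str, species: str = "homo_sapiens", timeout: float = 30) -> dict:
    """Fetch and decode the JSON record for a gene symbol (network access required)."""
    with urllib.request.urlopen(ensembl_lookup_url(symbol, species), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    record = ensembl_lookup("RBP4")
    print(record.get("seq_region_name"), record.get("start"), record.get("end"))
```

The returned record should name the chromosome in its `seq_region_name` field, which can be compared with the chromosome 10 placement shown in the Map Viewer slides.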

26 Fig. 17.1 Page 610 NCBI offers a human map viewer

27 Fig. 17.2 Page 611 Map viewer: RBP4 on chromosome 10 Click to customize the tracks on this map


29 [Figure: Map Viewer links for a gene: LocusLink, DNA (contig), OMIM, sequence viewer, protein, evidence viewer, Model Maker, HomoloGene; callouts mark the confirmed gene model and its orientation]

30 Fig. 17.3 Page 613 NCBI’s evidence viewer provides data on gene models (e.g. mapping ESTs to genomic DNA)


32 Fig. 17.3 Page 613 NCBI evidence viewer: gene structures Evidence for a discrepancy (e.g. sequencing error or polymorphism)
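At its core, flagging a discrepancy like the one highlighted here means comparing aligned transcript and genomic bases. A minimal sketch, assuming the EST or mRNA has already been aligned to a single exon with no gaps (the spliced alignment the evidence viewer performs is not reproduced). Deciding whether a mismatch is a sequencing error or a polymorphism needs further evidence, such as how many independent transcripts share it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Discrepancy:
    position: int    # 0-based offset within the aligned, gap-free block
    genomic: str     # base in the genomic DNA
    transcript: str  # base in the mRNA/EST


def find_discrepancies(genomic: str, transcript: str) -> list[Discrepancy]:
    """List mismatched bases between two already aligned, equal-length sequences."""
    if len(genomic) != len(transcript):
        raise ValueError("sequences must be aligned to the same length without gaps")
    return [
        Discrepancy(i, g, t)
        for i, (g, t) in enumerate(zip(genomic.upper(), transcript.upper()))
        if g != t
    ]


if __name__ == "__main__":
    print(find_discrepancies("ACGTTGCA", "acgtagca"))
    # [Discrepancy(position=4, genomic='T', transcript='A')]
```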

33 The Ensembl project currently includes genome browsers for nine organisms: human, mouse, zebrafish, Fugu, mosquito, fruit fly, C. elegans, C. briggsae, and rat. Visit http://www.ensembl.org Ensembl Page 610

34 Fig. 17.4 Page 614 Ensembl human genome browser

35 Fig. 17.5 Page 615 Ensembl: GeneView for RBP4

36 Fig. 17.6 Page 616 Ensembl: GeneView for RBP4

37 Fig. 17.7 Page 617 Ensembl human genome browser: ContigView


39 Fig. 17.8 Page 618 Ensembl human genome browser: TransView

40 Fig. 17.9 Page 619 Ensembl: ProteinView for RBP4

41 Fig. 17.10 Page 620 Ensembl: MapView for chromosome 10

42 Fig. 17.11 Page 621 Ensembl: SyntenyView for chromosome 10

43 The UCSC human genome browser: The University of California at Santa Cruz (UCSC) offers a genome browser with the “golden path” annotation of the human genome. The browser supports searches by keyword, gene name, or other text. UCSC also offers BLAT, a very fast BLAST-like alignment tool (see Chapter 5). A key feature of this browser is its customizable annotation tracks; about half of these tracks are contributed by users of the site around the world. Visit http://genome.ucsc.edu Page 614
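Much of BLAT's speed comes from indexing the genome once: as described by its author, it keeps an index of non-overlapping k-mers of the genome in memory (11-mers by default for DNA) and looks up every overlapping k-mer of the query. The simplified sketch below shows only that seeding step plus a crude vote for the best diagonal (genome offset); BLAT itself also handles near-perfect seeds, clusters hits, extends them into alignments, and stitches exons, none of which is reproduced here.

```python
from collections import Counter, defaultdict
from typing import Optional


def build_index(genome: str, k: int = 11) -> dict[str, list[int]]:
    """Map each non-overlapping k-mer of the genome (step size k) to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(query: str, index: dict[str, list[int]], k: int = 11) -> list[tuple[int, int]]:
    """(query_offset, genome_position) for every overlapping query k-mer found in the index."""
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], []):
            hits.append((q, g))
    return hits


def best_diagonal(hits: list[tuple[int, int]]) -> Optional[int]:
    """Genome offset (genome_position - query_offset) supported by the most seed hits."""
    if not hits:
        return None
    diagonals = Counter(g - q for q, g in hits)
    return diagonals.most_common(1)[0][0]


if __name__ == "__main__":
    idx = build_index("ACGTACGGTTCA", k=4)  # indexes ACGT@0, ACGG@4, TTCA@8
    hits = seed_hits("GTACGGTT", idx, k=4)
    print(hits, best_diagonal(hits))         # [(2, 4)] 2
```

The query in the usage example begins at genome position 2, which is the diagonal the single seed hit votes for. Indexing non-overlapping k-mers keeps the index small, while scanning every overlapping query k-mer ensures that any sufficiently long exact match still contains at least one indexed k-mer.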

44 Fig. 17.12 Page 622

45 Fig. 17.13 Page 623

46 This lecture continues with part 2 of 2…

