The Human Genome Project Main reference: Nature (2001) 409, 860-921

1 The Human Genome Project. Main reference: Nature (2001) 409, 860-921. Whole issue also available from Nature Genome Gateway. Describes the publicly funded project; Celera's private HGP was published in Science.

2 Main points: Basic genome statistics; Genome browsers, e.g. UCSC, Ensembl; Genomic landscape; Repeated DNA as a fossil record; Number of genes; Polymorphism; Applications.

3 The Strategy. The genome sequence was a multinational collaboration involving hundreds of scientists, millions of dollars, and many countries. The strategy was top-down, using methods developed on small genomes (e.g. yeast). See Figure 2 in the Nature paper.

4 Genome statistics. Total size = 3290 Mb, including 212 Mb of heterochromatin. Chromosomes range from 279 Mb (#1) to 45 Mb (#21) (fig 9, table 8 in paper). Total raw sequence = 23,000 Mb. Number of genes = about 31,000. About 30% of the genome is transcribed. About 1.5% of the genome is protein coding.
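The figures on this slide can be turned into a few derived quantities. The following is a minimal sketch using only the slide's numbers; it assumes the percentages apply to the total 3290 Mb size and that the non-heterochromatic remainder is simply the difference of the two sizes, neither of which the slide states explicitly. The function and key names are illustrative.

```python
# Figures copied from the slide; all sizes are in megabases (Mb).
GENOME_MB = 3290
HETEROCHROMATIN_MB = 212
RAW_SEQUENCE_MB = 23_000
GENE_COUNT = 31_000
TRANSCRIBED_FRACTION = 0.30
CODING_FRACTION = 0.015


def genome_summary():
    """Derive simple quantities from the slide's headline numbers.

    Assumption: the 30% and 1.5% figures are fractions of the total
    genome size, not of the non-heterochromatic part only.
    """
    coding_mb = GENOME_MB * CODING_FRACTION
    return {
        # Remainder after subtracting heterochromatin (assumed definition).
        "non_heterochromatin_mb": GENOME_MB - HETEROCHROMATIN_MB,
        # How many times over the genome the raw sequence covers it.
        "raw_fold_redundancy": RAW_SEQUENCE_MB / GENOME_MB,
        "transcribed_mb": GENOME_MB * TRANSCRIBED_FRACTION,
        "coding_mb": coding_mb,
        # Average protein-coding sequence per gene, in kilobases.
        "coding_kb_per_gene": coding_mb * 1000 / GENE_COUNT,
    }


if __name__ == "__main__":
    for name, value in genome_summary().items():
        print(f"{name}: {value:.2f}")
```

On these numbers the raw sequence covers the genome roughly seven times over, and the protein-coding portion amounts to about 50 Mb.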

5 Repeat DNA fossils. Genomes are full of repeated DNA sequences of various kinds (table 11/12). Each type of repeat has a single origin and has replicated many times within the genome, transposing to new sites and accumulating mutations. By comparing copies of a repeat to see how much they have diverged, one can estimate how old the repeat is (fig 18).
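The dating idea on this slide can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used in the paper: it compares one repeat copy against a consensus sequence, corrects the observed divergence for multiple hits with the Jukes-Cantor model, and divides by a neutral substitution rate. The sequences and the rate below are hypothetical placeholders.

```python
import math


def fraction_diverged(copy: str, consensus: str) -> float:
    """Fraction of aligned A/C/G/T positions at which the two sequences differ."""
    if len(copy) != len(consensus):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(copy.upper(), consensus.upper())
        if a in "ACGT" and b in "ACGT"  # skip gaps and ambiguous bases
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Substitutions per site implied by an observed divergence p."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor is only defined for 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def estimated_age_years(p: float, rate_per_site_per_year: float) -> float:
    """Age of a copy, assuming it diverged from the consensus at a constant rate."""
    return jukes_cantor_distance(p) / rate_per_site_per_year


if __name__ == "__main__":
    # Hypothetical toy sequences: one mismatch in ten positions.
    p = fraction_diverged("ACGTACGTAC", "ACGTACGTAA")
    # Hypothetical neutral rate; substitute a value appropriate to the lineage.
    assumed_rate = 2e-9
    print(f"divergence = {p:.3f}")
    print(f"estimated age = {estimated_age_years(p, assumed_rate):.3g} years")
```

The estimate is only as good as the assumed rate and the consensus used as a stand-in for the ancestral sequence.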

6 Humans versus worms and flies. Humans have only about twice as many genes as worms or flies (table 23). But human genes are subject to more alternative splicing (60% vs 22%; average 3 different transcripts per gene). So humans probably have about 5 times as many proteins as worms or flies. Complexity is not proportional to numbers of genes or proteins, but to the number of interactions they can have.
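The step from "twice as many genes" to "about 5 times as many proteins" is a simple product, sketched below. It assumes one protein per distinct transcript, which is a simplification; the figure for transcripts per worm or fly gene is not given on the slide, so the code only back-calculates the value the slide's numbers imply.

```python
# Numbers taken from the slide.
GENE_RATIO = 2.0                   # human genes relative to worm or fly
PROTEIN_RATIO = 5.0                # human proteins relative to worm or fly
HUMAN_TRANSCRIPTS_PER_GENE = 3.0   # average stated on the slide


def protein_ratio(gene_ratio: float, transcripts_ratio: float) -> float:
    """Protein-count ratio, assuming one protein per distinct transcript."""
    return gene_ratio * transcripts_ratio


def implied_transcripts_per_gene() -> float:
    """Transcripts per worm/fly gene implied by the slide's own ratios."""
    transcripts_ratio = PROTEIN_RATIO / GENE_RATIO
    return HUMAN_TRANSCRIPTS_PER_GENE / transcripts_ratio


if __name__ == "__main__":
    print(f"implied transcripts per worm/fly gene: {implied_transcripts_per_gene():.2f}")
```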

7 Index of human genes and proteins. 3 basic methods to predict genes from the genomic DNA: (1) comparison with ESTs and mRNAs; (2) homology with other known genes/proteins; (3) purely computational methods based on Hidden Markov Models (HMMs). Started with predictions by Ensembl, combined with other information…
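To give a feel for the third method, here is a toy Hidden Markov Model decoded with the Viterbi algorithm. It is a sketch only: the two states, the GC-rich "coding" emissions, and every probability are invented for illustration, and real gene finders use far richer models (codon structure, splice sites, and so on).

```python
import math

# Hypothetical two-state model; all probabilities are made up for illustration.
STATES = ("noncoding", "coding")
START = {"noncoding": 0.5, "coding": 0.5}
TRANS = {
    "noncoding": {"noncoding": 0.9, "coding": 0.1},
    "coding": {"noncoding": 0.1, "coding": 0.9},
}
EMIT = {
    "noncoding": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
    "coding": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}


def viterbi(seq: str) -> list:
    """Return the most probable state path for a DNA sequence (log space)."""
    seq = seq.upper()
    if not seq:
        return []
    # Best log-probability of any path ending in each state at position 0.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[i][s] = best predecessor of state s at position i + 1
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (
                score[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    for base, state in zip("AAAAAAGCGCGCGC", viterbi("AAAAAAGCGCGCGC")):
        print(base, state)
```

The model labels each base with the state that best explains the whole sequence, which is why a single stray base does not flip the label on its own.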

8 The Human Proteome. The key database is InterPro, which combines information on all known protein domains. Only 94 of the 1262 InterPro types (7%) are vertebrate-specific, so most domains are older than the common ancestor of all animals; new ones are not invented very often. Many of the vertebrate-specific domains are concerned with defence/immunity and the nervous system. Most novelty is generated by new protein architectures, combining old domains in new ways (fig 42/45).

9 Genome History. Mouse and human diverged about 100 Mya, so there is 200 My of evolution between them. Chromosome translocations are involved in the formation of new species. By comparing the genomic locations of homologous genes, one can define regions of synteny (fig 46). Breakage seems to occur randomly, but tends to be in gene-poor regions. No convincing evidence for whole-genome duplications.

10 Polymorphism. More than a million SNPs (single nucleotide polymorphisms) were found. Average 1 SNP per 1.9 kb, or 15 SNPs per gene. Combinations of closely linked SNP alleles form haplotypes. Not all possible haplotypes are found in the population, e.g. about 4-5 per gene (theoretically there could be 2^15 = 32,768, i.e. about 32,000). HapMap: the haplotype mapping project. A paper (Trends in Genetics) on the subject of haplotype blocks.
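The haplotype arithmetic on this slide is easy to check. The sketch below assumes every SNP is biallelic, so n linked SNPs allow 2^n combinations; the list of observed haplotypes is hypothetical toy data, used only to show how the observed count compares with the theoretical maximum.

```python
def possible_haplotypes(n_snps: int) -> int:
    """Theoretical number of haplotypes for n biallelic SNPs."""
    if n_snps < 0:
        raise ValueError("number of SNPs cannot be negative")
    return 2 ** n_snps


def observed_haplotypes(samples) -> int:
    """Number of distinct haplotypes among the sampled chromosomes."""
    return len(set(samples))


if __name__ == "__main__":
    snps_per_gene = 15  # from the slide
    print("possible:", possible_haplotypes(snps_per_gene))

    # Hypothetical sample: each string is one chromosome's alleles at 15 SNPs
    # (0 = reference allele, 1 = alternative allele).
    samples = [
        "000000000000000",
        "000000000000000",
        "111100001111000",
        "111100001111000",
        "000011110000111",
        "101010101010101",
    ]
    print("observed:", observed_haplotypes(samples))
```

The gap between the two counts reflects linkage: neighbouring SNP alleles tend to be inherited together, so most combinations never appear.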

11 Applications in medicine. Having the genome sequence, and databases of genes, makes it much easier to find disease genes by positional cloning (e.g. BRCA2 for breast cancer). The sequence also reveals new drug targets, e.g. a new type of serotonin receptor, predicted from sequence, shown to be a candidate for treating mood disorders and schizophrenia.

12 Latest - the Y chromosome Nature paper
