Human Genome Project

Basic Strategy
How to determine the sequence of the roughly 3 billion base pairs of the human genome.
Formally launched in 1990.
Various side projects: genetic diseases, variation between individuals, ethnic variation, comparison to other species.
Strategy:
1. Physical map relating specific DNA markers to their chromosomal positions.
2. Overlapping sets of cloned DNAs (contigs).
3. Sequencing and assembly.
4. Finding the genes in the sequence.
5. Annotation of gene function.

Physical Maps
A genetic map uses recombination (crossing over during meiosis) to determine how frequently two genes (or markers) are inherited together.
A physical map determines where a given DNA marker is located on the DNA of the chromosome.
Genetic and physical maps are (supposed to be) colinear: all the genes appear in the same order in both maps.
But the distances are quite different: there is very little recombination near the centromeres, so large physical distances there correspond to very short recombination distances.
Genetic maps built with microsatellite (SSR) markers were used to develop the physical maps: each SSR site was expected to be found on the corresponding cloned DNA.

Sequence Tagged Sites
A sequence tagged site (STS) is a short sequence that is unique in the genome. You obtain the sequence information from cloned DNA, and then locate it in the genome.
Using PCR, it is then possible to determine whether your STS is present in any other clone or cell line.
Obtaining STSs: sequence the ends of large cloned DNAs (BACs or YACs, for example).
Uniqueness: use the cloned DNA from the STS as a probe on a Southern blot of genomic DNA. If the STS is unique, only 1 band will hybridize. Repetitive DNA is very common in the human genome, and many DNA sequences are not unique. A good source of unique DNA is EST clones: cDNA made from messenger RNA.
Size: a DNA sequencing run will usually give several hundred bp of good, reliable sequence information. On the other hand, consider the size of the genome: 3 x 10^9 bp. Each base is one of 4 choices, so a given 16 bp sequence will appear by chance about once in 4^16 = 4.3 x 10^9 bp. In practice, 20 bp is about the minimum size for good PCR amplification, and 24 bp is about the minimum that will give a good BLAST hit.
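
The size arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: it assumes random DNA with equal base frequencies, which real genomes (full of repeats) do not satisfy, and the function names are made up for illustration.

```python
def expected_spacing(k: int) -> int:
    """Expected distance in bp between chance occurrences of one specific
    k-base sequence in random DNA with equal base frequencies: 4**k."""
    return 4 ** k


def min_unique_length(genome_size: int) -> int:
    """Smallest k for which a given k-base sequence is expected to occur
    less than once by chance in a random genome of genome_size bp."""
    k = 1
    while expected_spacing(k) <= genome_size:
        k += 1
    return k


HUMAN_GENOME_BP = 3 * 10**9  # figure used on the slide

print(expected_spacing(16))                # 4294967296, about 4.3 x 10^9
print(min_unique_length(HUMAN_GENOME_BP))  # 16
```

The 20 bp and 24 bp practical minimums quoted on the slide sit comfortably above this theoretical floor of 16 bp.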

Somatic Cell Hybrids
Human and mouse (or hamster) cultured cells can be fused together using polyethylene glycol.
– The resulting fused cell is a heterokaryon: it has 2 nuclei from different species.
– If the heterokaryon undergoes mitosis, the nuclei fuse.
– Human chromosomes are unstable in a mixed nucleus, and most of them are randomly lost. The mouse chromosomes all stay.
– Different cell lines can be established that contain different combinations of human chromosomes.
– You can identify which human chromosomes remain using chromosome banding techniques.
A good way to determine which chromosome a DNA sequence is on. Sometimes also used for gene products or phenotypes.
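
How a hybrid panel assigns a sequence to a chromosome can be sketched in code: the marker should be present in exactly those cell lines that retain its chromosome. The panel below, the cell line names, and the function name `assign_chromosome` are all invented for illustration; a real panel covers every human chromosome and has to tolerate scoring errors.

```python
def assign_chromosome(panel, marker_hits):
    """Return the human chromosomes whose presence/absence pattern across
    the hybrid cell lines matches the marker's pattern in every line.

    panel:       dict mapping cell line -> set of human chromosomes retained
    marker_hits: dict mapping cell line -> True if the marker was detected
    """
    chromosomes = set().union(*panel.values())
    return sorted(
        chrom for chrom in chromosomes
        if all((chrom in panel[line]) == marker_hits[line] for line in panel)
    )


# Hypothetical four-line panel (toy data).
panel = {
    "line_A": {"1", "7", "X"},
    "line_B": {"2", "7"},
    "line_C": {"1", "2"},
    "line_D": {"X"},
}
# Marker detected (e.g. by PCR) in lines A and B only.
hits = {"line_A": True, "line_B": True, "line_C": False, "line_D": False}

print(assign_chromosome(panel, hits))  # ['7']
```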

Radiation Hybrids
Standard somatic cell fusions contain entire human chromosomes. To locate a gene more closely, you need to use chromosome fragments.
Start by irradiating human cells with a controlled dose of X-rays: the chromosomes break up.
Then fuse the cells to mouse cells. The human chromosome fragments get integrated into the mouse chromosomes.
Create a panel of mouse/human hybrid cell lines.
– The current standard panels contain about 100 cell lines.
– Each line contains about 32% of the human genome.
– Average size of human genome fragment = 25 kbp.
– More radiation = smaller fragments.
Mapping: the hybrid cell lines contain random human chromosome fragments, but closely linked sites are usually in the same cell line (same basic principle as recombination mapping).
– Until you have located some of the markers on the chromosomes, radiation hybrid mapping only tells you whether any two sequences are close together on the chromosome.
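
The mapping principle (closely linked sites are usually retained together) can be illustrated by comparing retention patterns across a panel. This is a deliberately simplified sketch: it just counts the lines in which exactly one of two markers is present, whereas real radiation hybrid analysis estimates a breakage frequency that also corrects for the retention rate. The eight-line panel and marker names are invented.

```python
def discordance(pattern_a, pattern_b):
    """Fraction of hybrid cell lines in which exactly one of the two
    markers is retained. Low values suggest the markers are close together."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same cell lines")
    differing = sum(a != b for a, b in zip(pattern_a, pattern_b))
    return differing / len(pattern_a)


# 1 = marker detected in that cell line, 0 = not detected (toy data).
marker_1 = [1, 1, 0, 0, 1, 0, 1, 0]
marker_2 = [1, 1, 0, 0, 1, 0, 0, 0]   # nearly always co-retained with marker_1
marker_3 = [0, 1, 1, 0, 0, 1, 1, 0]   # retained independently of marker_1

print(discordance(marker_1, marker_2))  # 0.125
print(discordance(marker_1, marker_3))  # 0.5
```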

Contigs
A contig is a set of partially overlapping clones: a contiguous set of clones, with no gaps between them.
Contigs allow you to build up the sequence of the chromosome over much larger regions than any single clone.
The first reasonably complete physical map of the human genome involved contigs generated from YACs (yeast artificial chromosomes).
Initially, you have a collection of clones with no information about how they are ordered on the chromosome.
Contigs are built up by using PCR to identify unique sequences (STS or EST) on each clone, and then looking for overlaps between the clones.
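
The contig-building idea can be sketched as a grouping problem: treat two clones as overlapping if they share at least one STS, then collect clones that are connected through chains of overlaps. The clone and STS names below are invented, and the sketch only groups clones; ordering them within a contig and coping with false-positive PCR results are left out.

```python
from itertools import combinations


def build_contigs(clone_sts):
    """Group clones into contigs.

    clone_sts: dict mapping clone name -> set of STS names found on it.
    Two clones are treated as overlapping if they share at least one STS.
    Returns a list of contigs, each a sorted list of clone names.
    """
    # Record which clones overlap which.
    neighbors = {clone: set() for clone in clone_sts}
    for a, b in combinations(clone_sts, 2):
        if clone_sts[a] & clone_sts[b]:
            neighbors[a].add(b)
            neighbors[b].add(a)

    # Collect connected groups of overlapping clones.
    contigs, seen = [], set()
    for start in clone_sts:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            clone = stack.pop()
            if clone in group:
                continue
            group.add(clone)
            stack.extend(neighbors[clone] - group)
        seen |= group
        contigs.append(sorted(group))
    return contigs


# Hypothetical STS content determined by PCR (toy data).
clones = {
    "BAC1": {"sts1", "sts2"},
    "BAC2": {"sts2", "sts3"},
    "BAC3": {"sts3", "sts4"},
    "BAC4": {"sts7"},
    "BAC5": {"sts7", "sts8"},
}

print(build_contigs(clones))  # [['BAC1', 'BAC2', 'BAC3'], ['BAC4', 'BAC5']]
```

BAC1 and BAC3 share no STS directly, but both overlap BAC2, so all three land in the same contig.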

Sequencing Strategy
Once a contig map of the genome was obtained, it was necessary to sequence each individual clone.
Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are typically about 100-200 kbp long.
Large clones are generally sequenced by shotgun sequencing:
– The large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb).
– These fragments are cloned and sequenced.
– A computer program then assembles them based on overlaps between the sequences of the subclones.
To ensure that every bit has been covered, you need to sequence random subclones until you have covered each spot 5-10 times on average.
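
Why 5-10 times coverage is needed can be seen from a simple Poisson model of random sampling (the Lander-Waterman approximation, brought in here as an assumption; the slide does not name it). If reads land uniformly at random, the chance that a given base is missed at average coverage c is about e^-c. The 150 kbp clone size and 500 bp read length below are illustrative values, not figures from the slide.

```python
import math


def reads_needed(clone_bp, read_bp, coverage):
    """Number of reads required to reach the given average coverage."""
    return math.ceil(coverage * clone_bp / read_bp)


def fraction_uncovered(coverage):
    """Expected fraction of bases hit by no read, assuming reads start at
    uniformly random positions (Poisson approximation, ignoring end effects)."""
    return math.exp(-coverage)


CLONE_BP = 150_000  # assumed BAC insert size
READ_BP = 500       # assumed usable read length

for c in (1, 5, 8, 10):
    print(c, reads_needed(CLONE_BP, READ_BP, c), f"{fraction_uncovered(c):.5f}")
```

At 1x coverage roughly 37% of the clone is still unsequenced; at 5x the expected gap fraction drops below 1%, and at 10x it is below 0.01%.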

Whole Genome Shotgun Sequencing
Why bother with creating a large-scale physical map: all that YAC and BAC cloning, radiation hybrids, STS comparisons, etc.? Why not just fragment the whole genome into 1 kb pieces, sequence them all, and let the computer assemble the whole genome?
In practice, the genome is cloned into large fragments first, and then each large fragment is broken up for shotgun sequencing. But the large fragments are not ordered: no physical map or set of contigs is created.
Requires a lot of overlapping coverage. Also requires good software.
Very successful for prokaryotic genomes (10 Mbp or less).
– But the human genome is 300 times larger.
Big problem: repeat sequence DNA, which is everywhere, and especially near the centromere. To find overlaps between clones, you need unique regions.
It remains unclear whether whole genome shotgun sequencing will work if there is no other information available to provide order. It has not been widely adopted for eukaryotic projects (so far).

Gene Detection
The best evidence that a given DNA sequence is expressed is to find an EST (cDNA copy of mRNA) that matches it. Large numbers of EST libraries have been constructed and sequenced.
– The primary result of this was to determine that many genes have several different intron splicing patterns: sequences are exons in some tissues but introns in others.
Homology searches, using BLAST, are a good way to find genes. If a DNA sequence closely matches a sequence from another organism, it has been evolutionarily conserved, and that usually means that it is an expressed gene.
Exon prediction: exons need to be open reading frames (no stop codons), and they display patterns of nucleotide usage different from random DNA. Several different programs exist, and they give somewhat varying results.
“Hypothetical genes” are genes whose existence has been predicted by computer but which lack any experimental or cross-species data to confirm them.
– A “conserved hypothetical gene” is a sequence that matches other species even though there is no EST or other experimental evidence for its expression.
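
The open-reading-frame criterion used in exon prediction can be sketched as a scan for stop-free stretches. This is only the first ingredient: it looks at the forward strand only, ignores splice sites and nucleotide-usage statistics, and is not how any particular gene-finding program works. The function name, the `min_codons` threshold, and the test sequence are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq, min_codons=3):
    """Find stop-free stretches in the three forward reading frames.

    Returns a list of (frame, start, end) tuples, where seq[start:end]
    contains at least min_codons codons and no stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i))
                start = i + 3  # next open stretch begins after the stop
        # Stretch running to the last complete codon of this frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            orfs.append((frame, start, end))
    return orfs


for frame, start, end in open_reading_frames("ATGAAACCCGGGTAAATG"):
    print(frame, start, end, "ATGAAACCCGGGTAAATG"[start:end])
```

Short stop-free stretches turn up in every frame of even a tiny sequence, which is why exon prediction also relies on nucleotide-usage patterns and why different programs give somewhat different answers.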

Gene Annotation
Computer predictions of gene function are mediocre at best. Humans, especially those who are experts in the field, do a much better job of evaluating evidence and deciding what a given gene’s function is.
There is a big problem of too much information that is not uniformly coded or maintained. The scientific literature contains numerous examples of the same gene or protein with several different names, and getting common definitions of functions is even harder.
To counter this, the Gene Ontology Consortium (GO) has created a controlled vocabulary of about 11,000 terms.
Every gene product (protein) can be annotated in three general categories:
– molecular function: what the protein actually does, such as “kinase activity”
– biological process: what cellular process the protein participates in, such as “signal transduction”
– cellular component: where the protein is found in the cell, such as “integral to the plasma membrane”
Each gene product can have multiple descriptive terms.
The terms are hierarchical: more specific terms are contained within less specific terms. But a given term can have more than one parent and more than one child term.
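
Because a term can have more than one parent, the vocabulary is a directed graph rather than a simple tree. The sketch below shows how annotating a protein with a specific term implies all of its less specific ancestors. The parent links here are a toy hierarchy invented for illustration, not copied from the actual Gene Ontology, and `ancestors` is a hypothetical helper name.

```python
def ancestors(term, parents):
    """Return every less specific term reachable from `term` by following
    parent links. Terms may have several parents, so each is visited once."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Toy parent links (illustrative only, not real GO relationships).
parents = {
    "protein kinase activity": ["kinase activity", "protein-modifying activity"],
    "kinase activity": ["catalytic activity"],
    "protein-modifying activity": ["catalytic activity"],
    "catalytic activity": ["molecular function"],
}

print(sorted(ancestors("protein kinase activity", parents)))
```

"catalytic activity" is reached by two different routes but reported once, which is the practical consequence of allowing multiple parents.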

GO Example