Human Genome
Human Genome Contents: 3200 Mb
- Genes and gene-related sequences: 1200 Mb
  - Genes: 48 Mb
  - Related: 1152 Mb (pseudogenes, gene fragments, introns)
- Intergenic DNA: 2000 Mb
  - Interspersed repeats: 1400 Mb
  - Microsatellites (short tandem repeats): 90 Mb
- Telomeres: end sequences
- Centromeres
- Single nucleotide polymorphisms
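The sizes on this slide are easier to compare as fractions of the whole genome. The sketch below only redoes that arithmetic; the dictionary keys and the helper name `percent_of_genome` are illustrative, and the numbers are copied from the slide.

```python
# Sizes in megabases (Mb), copied from the slide; labels are illustrative.
GENOME_MB = 3200

components_mb = {
    "genes and gene-related sequences": 1200,
    "genes": 48,
    "gene-related (pseudogenes, gene fragments, introns)": 1152,
    "intergenic DNA": 2000,
    "interspersed repeats": 1400,
    "microsatellites": 90,
}


def percent_of_genome(size_mb, genome_mb=GENOME_MB):
    """Return a component size as a percentage of the whole genome."""
    return 100.0 * size_mb / genome_mb


for name, size_mb in components_mb.items():
    print(f"{name}: {size_mb} Mb = {percent_of_genome(size_mb):.1f}% of the genome")
```

On these figures the genes themselves account for 1.5% of the genome, while interspersed repeats alone account for almost 44%.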
Chromosomes
- Shorter than the DNA they contain
- Histones: DNA-binding proteins
- Two copies held together by centromeres
- Telomere: terminal region
- Two humans differ by 0.1%
Donors
- HGP: opportunity advertised near labs; first come, first taken; 5-10 samples for every one used; no link between donor and sample
- Celera: 5 subjects (three men, two women): one Asian, one African-American, one Hispanic, two Caucasians; Craig Venter
Basic Technology
- Physical mapping
- Cloning
- Shotgun sequencing
- Computational sequence reassembly
STS
- High resolution, rapid, simple
- 100-500 bp
- Collection of overlapping fragments
- Each point represented multiple times in random fragments
- Sequence must be known
- Unique in chromosome under study
Physical Mapping: a set of clone fragments whose positions relative to each other are known
- Restriction maps: relative locations of restriction sites
- Fluorescent in situ hybridization (FISH): marker locations mapped by hybridizing a probe to chromosomes
- Sequence tagged sites (STS): positions of short sequences mapped by PCR or hybridization analysis of genome fragments
- Expressed sequence tags (EST): short sequences from cDNA clones
Genome cut into fragments; cloned as a library in vector (red)
Hybridisation mapping:
1. Pick clones into a grid
2. Hybridise to probe 1
3. Hybridise to probe 2
4. Build contigs
In this case, two clones hybridised to both probes and thus they are predicted to overlap. Those hybridising to only one probe are predicted to extend out to the left or right.
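The contig-building step can be sketched as grouping clones that share at least one probe. This is a minimal illustration, not the software used by any mapping project: the clone names, probe names and the function `build_contigs` are hypothetical, and real data would also need to handle false-positive and false-negative hybridisation signals.

```python
from collections import defaultdict


def build_contigs(hits):
    """Group clones into contigs; `hits` maps clone name -> set of probes it hybridised to.

    Clones that share a probe are predicted to overlap, so contigs are the
    connected components of the "shares a probe" relation (union-find).
    """
    parent = {clone: clone for clone in hits}

    def find(clone):
        # Follow parent links to the representative, compressing the path.
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]
            clone = parent[clone]
        return clone

    # Collect the clones hit by each probe.
    clones_by_probe = defaultdict(set)
    for clone, probes in hits.items():
        for probe in probes:
            clones_by_probe[probe].add(clone)

    # Join all clones hit by the same probe.
    for clones in clones_by_probe.values():
        ordered = sorted(clones)
        for other in ordered[1:]:
            parent[find(other)] = find(ordered[0])

    contigs = defaultdict(list)
    for clone in hits:
        contigs[find(clone)].append(clone)
    return sorted(sorted(members) for members in contigs.values())


# Hypothetical grid result: B and C light up with both probes.
hits = {
    "A": {"probe1"},
    "B": {"probe1", "probe2"},
    "C": {"probe1", "probe2"},
    "D": {"probe2"},
    "E": set(),
}
print(build_contigs(hits))  # [['A', 'B', 'C', 'D'], ['E']]
```

Within the contig, A (probe 1 only) and D (probe 2 only) are the clones predicted to extend out to either side of B and C.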
Fingerprinting: digest clones and run on gel; overlap inferred from shared bands
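Overlap by shared bands can be sketched as counting fragment sizes that match between two digests within a measurement tolerance. The 2% tolerance, the `min_shared` cutoff and the band sizes below are assumptions chosen for illustration; they are not taken from the slides.

```python
def shared_bands(bands_a, bands_b, tolerance=0.02):
    """Count bands of A that match a not-yet-used band of B.

    Two bands match when their sizes (in bp) differ by at most `tolerance`
    as a fraction of the larger size. Each band of B is used at most once.
    """
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[index]  # consume the matched band
                break
    return shared


def likely_overlap(bands_a, bands_b, min_shared=3, tolerance=0.02):
    """Call two clones overlapping when they share at least `min_shared` bands."""
    return shared_bands(bands_a, bands_b, tolerance) >= min_shared


# Hypothetical fingerprints (fragment sizes in bp).
clone_a = [500, 1200, 3100, 4800, 9000]
clone_b = [1210, 3100, 4790, 7000, 15000]
clone_c = [600, 2000, 5000]

print(shared_bands(clone_a, clone_b), likely_overlap(clone_a, clone_b))
print(shared_bands(clone_a, clone_c), likely_overlap(clone_a, clone_c))
```

A fixed band count is the simplest possible rule; a score that accounts for how many bands each clone has would be less prone to chance matches.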
Assembly of Contiguous DNA Sequence: Shotgun Approach
- Contigs: result of joining overlapping sequences
- Scaffold: result of connecting contigs by filling in gaps
- BAC: bacterial artificial chromosome vector; inserts of 100-200 kb
Regional mapping: minimal tiling path selected for sequencing.
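Once clone positions on the map are known, choosing a minimal tiling path is an interval-covering problem. The sketch below uses the standard greedy rule (among clones that start inside the covered region, take the one reaching furthest); the clone coordinates and the function name `minimal_tiling_path` are hypothetical, and a real selection would also require enough overlap between neighbours to verify the join.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Pick the fewest clones covering [region_start, region_end].

    `clones` is a list of (name, start, end) map positions.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones, key=lambda clone: clone[1])
    path = []
    covered = region_start
    index = 0
    while covered < region_end:
        best = None
        # Consider every clone that starts within the covered stretch.
        while index < len(ordered) and ordered[index][1] <= covered:
            if best is None or ordered[index][2] > best[2]:
                best = ordered[index]
            index += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best[0])
        covered = best[2]
    return path


# Hypothetical clone positions in kb along a 500 kb region.
clones = [
    ("A", 0, 150), ("B", 50, 200), ("C", 100, 320),
    ("D", 180, 330), ("E", 300, 450), ("F", 310, 500),
]
print(minimal_tiling_path(clones, 0, 500))  # ['A', 'C', 'F']
```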
Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- Hind III digest
- 1% agarose
- Molecular weight marker every 5th lane
- Gel size range shown: ~300 bp to >20 kbp
Contig assembly: FPC*
- Overlap identification by restriction pattern similarities
- Facilitated contig assembly
- All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
- Figure: clones A-G, showing insert fragments and vector fragments
*Sanger Centre: C. Soderlund, I. Longden and R. Mott
BCM- HGSC
Shotgun Sequencing I: RANDOM PHASE
- BAC clone: 100-200 kb
- Sheared DNA: 1.0-2.0 kb
- Sequencing templates
- Random reads
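How many random reads a BAC needs can be estimated from fold coverage. The sketch below assumes reads land uniformly at random and uses the Lander-Waterman approximation that a fraction e^(-c) of bases is missed at coverage c; the 500 bp read length, the 150 kb BAC and the read count are illustrative values, not figures from the slides.

```python
import math
import random


def fold_coverage(n_reads, read_len, target_len):
    """Average number of reads covering each base: c = N * L / G."""
    return n_reads * read_len / target_len


def expected_uncovered_fraction(coverage):
    """Lander-Waterman approximation: fraction of bases hit by no read."""
    return math.exp(-coverage)


def sample_reads(sequence, n_reads, read_len, seed=0):
    """Draw random fixed-length reads from `sequence` (uniform start positions)."""
    rng = random.Random(seed)
    last_start = len(sequence) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [sequence[start:start + read_len] for start in starts]


# Assumed numbers: a 150 kb BAC sequenced with 2400 reads of 500 bp.
coverage = fold_coverage(n_reads=2400, read_len=500, target_len=150_000)
print(f"coverage: {coverage:.1f}x")
print(f"expected uncovered fraction: {expected_uncovered_fraction(coverage):.2e}")

# Tiny demonstration of the sampling itself on a made-up sequence.
toy_bac = "ACGT" * 250
reads = sample_reads(toy_bac, n_reads=5, read_len=20)
print(reads[0])
```

Even at high coverage the expected uncovered fraction is small but not zero, which is why a separate finishing phase is needed.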
Shotgun Sequencing II: ASSEMBLY
- Consensus
- Low base quality
- Single-stranded region
- Mis-assembly (inverted)
- Sequence gap
Shotgun Sequencing III: FINISHING
- Problems in the consensus are resolved in turn: low base quality, single-stranded region, sequence gap, mis-assembly (inverted)
- High accuracy sequence: < 1 error/10,000 bases
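The finishing target can be restated on the Phred quality scale, Q = -10 log10(p), where p is the per-base error probability. The Phred scale is not mentioned on the slide; it is brought in here only as a familiar way to read the number, and the 150 kb clone length is an assumed example.

```python
import math


def phred_quality(error_probability):
    """Convert a per-base error probability to a Phred quality score."""
    return -10.0 * math.log10(error_probability)


def expected_errors(length_bp, error_probability):
    """Expected number of wrong bases in a sequence of the given length."""
    return length_bp * error_probability


finishing_error_rate = 1 / 10_000  # upper bound stated on the slide
print(f"Phred quality at that rate: Q{phred_quality(finishing_error_rate):.0f}")
print(f"Errors expected in a 150 kb clone: at most about "
      f"{expected_errors(150_000, finishing_error_rate):.0f}")
```

One error per 10,000 bases corresponds to Q40, so a finished clone of 150 kb would be expected to carry no more than about 15 wrong bases.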
Whole Genome Shotgun Sequencing
- Whole genome: 3,000 Mb
- Sheared DNA: 1.0-2.0 kb
- Sequencing templates
- Random reads
Whole Genome Shotgun Sequencing: Assembly
- Consensus with low base quality, single-stranded regions, mis-assemblies (inverted) and sequence gaps
- Sequence gaps and low base quality remain in the consensus
Random fragmentation of genome produces good sampling of its sequence space. Overlaps are identified, and subassembly of sequence takes place after cloning into universal vector.
Digested into Random Fragments
Cloned into Vector
Sequenced from known ends of plasmid (vector)
Assembled into contigs. Gaps and single-stranded regions identified for further study and targeted for new sequencing. Double-barreled: each insert is sequenced from both ends, giving reads on both strands.
In the gaps:
Whole-Genome Shotgun Sequencing
- Speed-up: avoid up-front mapping
- Assembled correctly?
  - Huge amount of computer time to identify overlaps
  - Have to reference a map
  - Repeats are a problem: leave out sequence between repeats
  - Missing reference end sequence means error
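Why repeats can make an assembler leave out the sequence between them shows up even in a toy case. In the sketch below a made-up genome has the layout X-R-Y-R-Z with two copies of a repeat R; the sequences, the 7-base minimum overlap and the helper names are all hypothetical. A read ending in R overlaps equally well with both reads that start with R, so joining it to the wrong one skips Y.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def successors(read, reads, min_len):
    """Reads that could follow `read` according to suffix-prefix overlap."""
    return [other for other in reads
            if other != read and overlap(read, other, min_len) > 0]


# Toy genome X-R-Y-R-Z with a repeated element R.
X, R, Y, Z = "CCGC", "GATTACA", "TTGGTT", "AGAG"
genome = X + R + Y + R + Z

# Error-free reads that each span one junction.
reads = [X + R, R + Y, Y + R, R + Z]

candidates = successors(X + R, reads, min_len=len(R))
print(candidates)  # both R+Y and R+Z look like valid continuations

# Joining X+R directly to R+Z collapses the two repeat copies.
collapsed = (X + R) + (R + Z)[len(R):]
print(collapsed, len(genome) - len(collapsed), "bases left out")
```

Without a map position or a paired end read to say which copy of R a read belongs to, the overlaps alone cannot distinguish the correct join from the collapsed one.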
HGP
- Isolate large fragments in BACs with framework of landmark-based physical map
- Sequence on clone-by-clone basis
- Time-consuming subcloning of random fragments and physical mapping
Sequence Reassembly
- Phrap
- Shortest covering superstring
- Map assembly
- Overlap: finding overlapping fragments
- Layout: ordering fragments
- Consensus: sequences from layout
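The overlap, layout and consensus steps can be sketched with a greedy approximation to the shortest superstring problem: repeatedly merge the pair of fragments with the longest overlap. This is a toy under strong assumptions (error-free reads, one strand, no repeats) and is not how Phrap works internally; the fragments and function names are made up for illustration.

```python
def overlap_len(a, b, min_len):
    """Longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than `min_len`."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def greedy_superstring(fragments, min_len=3):
    """Greedy assembly: merge the best-overlapping pair until none overlap.

    Returns the remaining contigs (a single string when everything joins).
    """
    # Drop duplicates and fragments contained inside another fragment.
    unique = list(dict.fromkeys(fragments))
    contigs = [f for f in unique
               if not any(f != g and f in g for g in unique)]

    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        # Overlap: score every ordered pair of contigs.
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i != j:
                    length = overlap_len(left, right, min_len)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # nothing overlaps: the remaining contigs are separated by gaps
        # Layout and consensus: with error-free reads the merge is a plain join.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]
    return contigs


fragments = ["ATGGCGT", "GCGTGCA", "TGCAATC"]
print(greedy_superstring(fragments))  # ['ATGGCGTGCAATC']
```

With real reads the consensus step has to vote among aligned bases of differing quality, and repeats can mislead the greedy choice, which is where the map and the finishing phase come in.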