Genomic Duplications, Structural Variation and Disease. Evan Eichler, Howard Hughes Medical Institute, University of Washington. April 3rd, 2006, Frontiers in Genomics
Genomic Variation Single base-pair changes – point mutations Small insertions/deletions– frameshift, microsatellite, minisatellite Mobile elements—retroelement insertions (300bp -10 kb in size) Large-scale genomic variation (>10 kb) – Large-scale Deletions – Segmental Duplications Chromosomal variation—translocations, inversions, fusions. Mutational mechanisms underlying genetic variation? Cytogenetics Sequence
Global Analysis of Segmental Duplications Approaches: Computational a) Whole genome assembly comparison b) Whole genome shotgun sequence detection strategies Experimental Comparative sequence analysis, array comparative genomic hybridization, comparative FISH Interchromosomal Intrachromosomal Segmental Duplications Question: What is the organization, mechanism and impact of recent human segmental duplications? >90% and > 1kb in length
Human Chimpanzee 24.8 Mb+ new 6.6 Mb+ shared 21.7 Mb+ new 7.2 Mb+ shared 16.0 Mb+ shared Chimp hyperexpansion Polymorphism 15-20% Summary: Segmental Duplication Asymmetry 76.3 Mb of Differentially Duplicated Euchromatic Material
Hyperexpansion of a Chimpanzee Segmental Duplication. 4>>>>>400 copies Cheng, Z et al., (2005), Nature
Human Segmental Duplications Properties Large (>10 kb) Recent (>95% identity) Interspersed (60% are separated by more than 1 Mb) Modular (duplicon architecture) ~389 acceptor regions 2.7% Genetic Difference, human vs. chimpanzee What impact in terms of human variation?
Models of Disease: Rare Duplication-Mediated Structural Variation; Common Fine-Scale Structural Variation
Genomic Disorders TEL ABC ABC Aberrant Recombination TEL ABC ABC Human Disease GAMETES Triplosensitive, Haploinsufficient and Imprinted Genes Hypothesis: Mechanism underlying Uncharacterized Mental Retardation?
Duplication Map of Human Genome: 130 candidate regions (298 Mb), 23 associated with genetic disease. Target patients: array CGH. Bailey et al. (2002), Science 297:1003-1007
Array Comparative Genomic Hybridization: high-throughput detection of large-scale variation (>50 kb); LCV or CNP = deletions and duplications (Iafrate et al., 2004; Sebat et al., 2004). [Figure: 12 mm array of human BAC clones hybridized with a normal human DNA sample and a disease individual DNA sample; Cy3 channel, Cy5 channel, and merged image.]
Duplication Microarray: Experimental Design. 130 regions of the human genome (dist: >50 kb, <5 Mb; prop: 95% identity, 10 kb); 2178 BACs, or on average ~10-12 BACs per region. Perform array CGH with reciprocal dye-swap experiments. Strategy: identify normal variation, then search for variation observed only in disease patients.
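The reciprocal dye-swap design above can be illustrated with a minimal sketch. It assumes the test sample is labeled with Cy5 in the forward hybridization and with Cy3 in the swap (the slides do not say which dye was assigned to which sample), and the function names and the 0.3 log2 threshold are hypothetical, chosen only for illustration. The idea is that a real copy-number change should appear in both hybridizations once the swap is sign-corrected, while a dye-specific artifact should not.

```python
import math

def log2_ratio(cy5, cy3):
    """log2 of the Cy5/Cy3 intensity ratio for one BAC spot."""
    return math.log2(cy5 / cy3)

def call_bac(forward, swap, threshold=0.3):
    """Call gain/loss for one BAC from a reciprocal dye-swap pair.

    forward: (cy5, cy3) with test DNA in Cy5 and reference in Cy3 (assumed).
    swap:    (cy5, cy3) with the labels exchanged.
    threshold: hypothetical log2 cutoff, not taken from the talk.
    """
    fwd = log2_ratio(*forward)   # already test/reference
    rev = -log2_ratio(*swap)     # negate so it is also test/reference
    if fwd >= threshold and rev >= threshold:
        return "gain"
    if fwd <= -threshold and rev <= -threshold:
        return "loss"
    return "no call"             # the two hybridizations disagree or show no change

gain = call_bac((2000, 1000), (1000, 2000))
loss = call_bac((500, 1000), (1000, 500))
dye_bias = call_bac((2000, 1000), (2000, 1000))
```

In the last call the spot is bright in Cy5 in both hybridizations, which is what a dye artifact rather than a copy-number change would look like, so no call is made.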
~3.0 Mb deletion observed in IMR26 (=common VCF 22q11 deletion) VCF Deletion detected in IMR26
CNP detected by Seg Dup array and Iafrate et al. CNPs detected by Seg Dup array in HapMap samples Novel ~2.5Mb deletion only observed in IMR Novel LCV/CNP Detected in IMR43 Sharp et al., unpublished
Variation in IMR: New Genomic Disorder Candidates. 291 IMR samples (Oxford Cohort) screened to date. 23 novel sites of variation (n=31 patients) defined by >2 BACs. 5 are seen in more than one unrelated patient. 7/9 events are de novo.
Problems: 1. Array CGH has a lower size limit for detecting deletions (~30 kb); oligo-based approaches effectively sample a small fraction of the genome and extrapolate size indirectly. 2. Neither can identify subtle (5-30 kb) variation. 3. Neither approach can detect inversions. 4. The precise location and structure of the rearrangement are unknown.
Models of Disease Rare Duplication-mediated Structural Variation Common Fine-Scale Structural Variation
Comparing Human Genomes by Paired-End Sequence. ~1.1 million fosmid paired-ends were sequenced by MIT to facilitate gap closure during the final phases of the HGP. Fosmid insert size is tightly distributed around the mean (40 +/- 2.6 kb); low copy = stability; capillary sequencing = low mispairing rate. Derived from a single female donor PDR cell line. Approach: optimal placement of fosmid ends against the human genome could theoretically detect rearrangements. [Figure: fosmid end placements against Build35, classed as concordant, insertion, deletion, or inversion.] Fosmid dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 BEST fosmid pairs (8.8X genome coverage).
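The coverage figures quoted for the fosmid dataset are consistent with clone (physical) coverage, i.e. number of pairs times mean insert size divided by genome size. The sketch below reproduces that arithmetic. The genome size of 2.9 Gb is an assumption (the slide does not state the denominator), and the function name is hypothetical.

```python
def clone_coverage(n_pairs, mean_insert_bp, genome_bp):
    """Physical coverage: total bases spanned by inserts / genome size."""
    return n_pairs * mean_insert_bp / genome_bp

GENOME_BP = 2.9e9        # assumption: approximate genome size, not from the slide
MEAN_INSERT_BP = 40_000  # mean fosmid insert size given on the slide

all_pairs = clone_coverage(1_122_408, MEAN_INSERT_BP, GENOME_BP)
best_pairs = clone_coverage(639_204, MEAN_INSERT_BP, GENOME_BP)
print(f"all pairs: {all_pairs:.1f}X, BEST pairs: {best_pairs:.1f}X")
```

Under that assumed genome size the two values round to the 15.5X and 8.8X shown on the slide.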
Genome-wide Detection of Structural Variation (>8 kb). <32 kb: putative insertion; >48 kb: putative deletion. [Figure: browser view with tracks for fosmids discordant by orientation (yellow/gold), fosmids discordant by size (red), and duplications; panels a) insertion, b) deletion, c) inversion.] Structural polymorphisms?
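The size and orientation rules on this slide can be sketched as a small classifier. The 32 kb and 48 kb cutoffs come from the slide (roughly the 40 kb mean insert plus or minus three standard deviations of 2.6 kb). Everything else is an assumption for illustration: the function name, the input representation, and the convention that a concordant pair maps with the left end on the "+" strand and the right end on the "-" strand of the same chromosome.

```python
def classify_fosmid_pair(left_pos, left_strand, right_pos, right_strand,
                         min_span=32_000, max_span=48_000):
    """Classify one fosmid end pair placed on the reference assembly.

    left_pos / right_pos: outer coordinates of the two end reads (bp),
    both on the same chromosome (assumed).
    """
    # Ends that do not face each other are discordant by orientation.
    if (left_strand, right_strand) != ("+", "-"):
        return "discordant orientation (putative inversion)"
    span = right_pos - left_pos
    if span < min_span:
        # Ends map too close together: the fosmid carries extra sequence.
        return "putative insertion"
    if span > max_span:
        # Ends map too far apart: the fosmid lacks reference sequence.
        return "putative deletion"
    return "concordant"

examples = [
    classify_fosmid_pair(1_000_000, "+", 1_040_000, "-"),
    classify_fosmid_pair(1_000_000, "+", 1_025_000, "-"),
    classify_fosmid_pair(1_000_000, "+", 1_060_000, "-"),
    classify_fosmid_pair(1_000_000, "+", 1_040_000, "+"),
]
```

Insertions and deletions here are relative to the reference assembly: a fosmid whose ends map too close together implies sequence present in the donor but absent from the reference.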
Validated Structural Polymorphisms. GSTM1: ~20 kb deletion; minspread 28 kb (9 fosmids); 50% of Caucasians/Saudis are -/- for the 18 kb gene (predisposition to cancer); +++ ultrarapid GSTM1 activity. CYP2D6: ~5-10 kb insertion; minspread 17 kb (7 fosmids); alternate haplotype support; 1-29% of Caucasians/Japanese have multiple copies (entire gene ~5 kb); associated with resistance to antipsychotic tricyclic antidepressants.
Summary: 6/16 of common polymorphisms detected Tuzun et al. (2005) Nat. Genet
Fine-Scale Structural Variation Map: ( build35 vs. Fosmids ) 1.3% Discordant Fosmids Identify 295 clusters (2 or more) 246 supported by second haplotype 147 inserts, 93 deletions, 57 inverts 18 putative L1 events—10 deletions and 8 insertions (6 kb insertion) 89 locate within gene regions. 138 unique regions of the genome 159 duplicated regions of the genome “Heterochromatic” regions Deletion Inversions Insertion(Fosmid) “Duplicated” regions
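The requirement that a site be supported by a cluster of two or more discordant fosmids can be sketched as interval clustering. This is an illustrative sketch, not the published procedure: grouping by chromosome and variant type and merging on simple coordinate overlap are assumptions, and the function name and tuple layout are hypothetical.

```python
from collections import defaultdict

def cluster_discordant(pairs, min_support=2):
    """Group discordant fosmids into candidate sites.

    pairs: iterable of (chrom, start, end, kind) tuples, one per
    discordant fosmid. Returns (chrom, start, end, kind, n_fosmids)
    for every cluster with at least min_support members.
    """
    by_key = defaultdict(list)
    for chrom, start, end, kind in pairs:
        by_key[(chrom, kind)].append((start, end))

    clusters = []
    for (chrom, kind), spans in sorted(by_key.items()):
        spans.sort()
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start <= cur_end:          # overlaps the growing cluster
                cur_end = max(cur_end, end)
                count += 1
            else:                         # gap: close the current cluster
                if count >= min_support:
                    clusters.append((chrom, cur_start, cur_end, kind, count))
                cur_start, cur_end, count = start, end, 1
        if count >= min_support:
            clusters.append((chrom, cur_start, cur_end, kind, count))
    return clusters

sites = cluster_discordant([
    ("chr1", 100_000, 160_000, "deletion"),
    ("chr1", 120_000, 175_000, "deletion"),
    ("chr1", 900_000, 955_000, "deletion"),   # singleton, dropped
    ("chr2", 100_000, 130_000, "insertion"),  # singleton, dropped
])
```

Requiring at least two independent fosmids filters out isolated discordant clones, which are more likely to be cloning or mapping artifacts than real variants.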
PCR Breakpoint Genotyping Assays for Structural Variation Tested 11 structural variants (5 insertions, 4 deletions, and 2 inversions) 7 successful assays (6 >20% minor allele frequency)
Illumina Golden-Gate Genotyping Assays for Structural Variation
Human Genome Structural Variation Project. Populations: CEPH, Yoruba, Japanese and Chinese. 2 scientific meetings (2005); 2 working groups (AHG, MSWG) (12/05); Coordinating Committee (1/06); NIH Council (2/06); Press Release (3/15/06). Goal: Complete Characterization of Structural Variation in 48 HapMap Samples
Summary. Humans are relatively unique in the size, proportion and architecture of interspersed segmental duplications. Large-Scale Variation: Normals: identified 257 CNPs using a microarray targeted to duplicated regions. IMR: identified 23 sites (>2 BACs) unique to patients (n=291 probands); 5 are recurrent and 7 are confirmed de novo (novel genomic disorders). Fine-Scale Variation: developed an approach to map and sequence common fine-scale variation within the human population; estimate ~200-300 differences >8 kb between 2 individuals.
Models of Human “Genetic” Disease: 1) Simple Mendelian: one gene-one disease, familial, highly penetrant, small fraction of the population, e.g. cystic fibrosis. 2) Chromosome Disease: large chromosomal regions, non-familial, sporadic, relatively high frequency, e.g. Turner Syndrome. 3) Genomic Disease: familial and/or recurrent, deletion or duplication of a large number of genes, dosage effects, e.g. Prader-Willi Syndrome. 4) Complex Traits: multiple genes plus environment, familial, variably penetrant, large fraction of the population, susceptibility genes, e.g. hypertension.
Acknowledgements CWRU/UChicago Stuart Schwartz Laurie Christ Eichler Lab Eray Tuzun Andy Sharp Devin Locke Matthew Johnson Zhaoshi Jiang Jon Bleyhl Sean McGrath Tera Newman Jeff Bailey Anne Morrison Lisa Pertz Ze Cheng Xinwei She James Sprague UWGSC Maynard Olson Rajinder Kaul Hillary Hayden Eric Haugen Agencourt Doug Smith Oxford Jonathan Flint Samantha Knight NHGRI Jim Mullikin UCSF Dan Pinkel Donna Albertson UW Debbie Nickerson Mark Rieder Chris Carlson Josh Smith
Kaul et al, unpublished Sequence of Traversing Fosmid Fills Gaps
Singleton Fosmids Extend into Gaps Kaul et al, unpublished
Fosmid Pairs that Fail to Map to build35. 4773 fosmid paired-end sequences fail to map to build35. 1613 have 150 bp >Q30 at either end and >100 bp of unique sequence. 1416 of these have no hit to HTGS BAC sequence. 1503 have a BLAST hit to chimpanzee WGS, but only 403 fall within the chimp assembly. Estimated to represent ~10-20 Mb. 1503 of these were selected for fingerprinting with four independent restriction enzymes (EcoR I, Hind III, Bgl II and Nsi I). Contigs were constructed from 1376 clones (95% success rate) using the Composite Mutual Overlap Statistic (CMOS).
FISH Summary of Orphan Fosmids 52 contigs tested by FISH 15 subtelomeric, 5 acrocentric and 5 pericentromeric 22 interstitial euchromatin (9 corresponding to known gaps) 10 contigs =no signals observed against 2 individuals (6/10 largest)