Genomic Duplications, Structural Variation and Disease. Evan Eichler, Howard Hughes Medical Institute, University of Washington. April 3rd, 2006, Frontiers.

1 Genomic Duplications, Structural Variation and Disease. Evan Eichler, Howard Hughes Medical Institute, University of Washington. April 3rd, 2006, Frontiers in Genomics

2 Genomic Variation
- Single base-pair changes: point mutations
- Small insertions/deletions: frameshift, microsatellite, minisatellite
- Mobile elements: retroelement insertions (300 bp - 10 kb in size)
- Large-scale genomic variation (>10 kb): large-scale deletions, segmental duplications
- Chromosomal variation: translocations, inversions, fusions
Question: Mutational mechanisms underlying genetic variation?
[Figure labels: Cytogenetics, Sequence]

3 Global Analysis of Segmental Duplications
Question: What is the organization, mechanism and impact of recent human segmental duplications (>90% identity and >1 kb in length)?
Approaches:
- Computational: a) whole-genome assembly comparison; b) whole-genome shotgun sequence detection strategies
- Experimental: comparative sequence analysis, array comparative genomic hybridization, comparative FISH
Segmental duplications are classed as interchromosomal or intrachromosomal.
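The >90% identity and >1 kb thresholds on this slide can be applied to a set of pairwise alignments roughly as follows. This is a minimal sketch, not the pipeline used in the talk: the `Alignment` record and its field names are hypothetical, and only the two thresholds and the inter/intrachromosomal distinction come from the slide.

```python
from dataclasses import dataclass
from typing import Optional

MIN_IDENTITY = 0.90    # from the slide: >90% identity
MIN_LENGTH_BP = 1_000  # from the slide: >1 kb in length


@dataclass(frozen=True)
class Alignment:
    """Hypothetical pairwise alignment record (field names are illustrative)."""
    chrom_a: str
    chrom_b: str
    length_bp: int
    identity: float  # fraction of matching bases, 0.0 to 1.0


def classify_duplication(aln: Alignment) -> Optional[str]:
    """Return the duplication class, or None if the thresholds are not met."""
    if aln.identity <= MIN_IDENTITY or aln.length_bp <= MIN_LENGTH_BP:
        return None
    if aln.chrom_a == aln.chrom_b:
        return "intrachromosomal"
    return "interchromosomal"
```

For example, `classify_duplication(Alignment("chr7", "chr12", 5000, 0.93))` yields `"interchromosomal"`, while an 800 bp alignment is rejected whatever its identity.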

4 Recent Duplication Architecture of the Human Genome (build34, >90%, >1 kb)
- Total: 5.26% (150.8 Mb); interchromosomal: 2.36% (67.6 Mb); intrachromosomal: 3.87% (111.1 Mb)
- Non-random distribution: 5.3-fold bias to pericentromere
- 389 regions > 100 kb nexi
[Figure: duplications plotted along chromosomes 1-22, X and Y (scale 10-250 Mb), with "heterochromatic" regions and alpha satellite marked. Labelled loci: 4p16.1, 4p16.3, 7q36, 2p22, 11p15, 10q26, 12q24, Xq28, 4q24, 22q12, 12p11, 11q14, 21q21, 2p11 (700 kb).]

5 Human Genome Segmental Duplication Pattern: ~4% duplication (>20 kb, >95%); ~4 average # duplicates; 59.5% pairwise (> 1 Mb). [Figure: duplication pattern across chr1-chr22, chrX and chrY.] She, X et al. (2004), Nature

6 Mouse Segmental Duplication Pattern (July 2004, mmu5): 1-2% duplication (>20 kb, >95%); 2-3 average # duplicates. She, X, in press

7 Percent Similarity of Human Segmental Duplications. Whole-genome analysis (2,865 Mb), Build 34, July 2003, 25.8 K alignments. [Figure: sum of aligned bases (kb) against percent identity (90-100%) for interchromosomal and intrachromosomal duplications; annotations: 5 My, 12 My, 25 My, 49 Mb.]

8 Summary: Segmental Duplication Asymmetry. 76.3 Mb of differentially duplicated euchromatic material. [Figure: human versus chimpanzee comparison with segments labelled 24.8 Mb+ new, 6.6 Mb+ shared, 21.7 Mb+ new, 7.2 Mb+ shared and 16.0 Mb+ shared; chimp hyperexpansion; polymorphism 15-20%.]

9 Hyperexpansion of a Chimpanzee Segmental Duplication: 4 to 400 copies. Cheng, Z et al. (2005), Nature

10 Human Segmental Duplications: Properties
- Large (>10 kb)
- Recent (>95% identity)
- Interspersed (60% are separated by more than 1 Mb)
- Modular (duplicon architecture)
- ~389 acceptor regions
- 2.7% genetic difference, human vs. chimpanzee
What impact in terms of human variation?

11 Models of Disease: Rare Duplication-Mediated Structural Variation; Common Fine-Scale Structural Variation (highlighted: Rare Duplication-Mediated Structural Variation)

12 Genomic Disorders. [Figure: aberrant recombination between duplicated blocks (TEL, A B C) gives rearranged gametes and human disease through triplosensitive, haploinsufficient and imprinted genes.] Hypothesis: mechanism underlying uncharacterized mental retardation?

13 Duplication-Mediated Disease
Genomic Disorder | Brain | Congenital Anomalies | Locus | Interval (kb) | LCR size (kb) | Duplicon | % identity | Incidence (%) | Incidence (MR)
Williams-Beuren syndrome | Severe MR | craniofacial, heart disease | 7q11.23 | 1,600 | >320 | PMS2/GTFI2 | 96-99 | 0.01 | 0.5
Prader-Willi syndrome | Severe MR | small hands, feet, hypotonia, obesity, short stature | 15q11.2-q13 | 3500 | 400 | HERC2 | 92-99 | 0.007 | 0.35
Angelman syndrome | Severe MR | microcephaly, hypotonia, seizures | 15q11.2-q13 | 3500 | 400 | HERC2 | 92-99 | 0.007 | 0.35
Smith-Magenis syndrome | Severe MR | craniofacial, peripheral neuropathy | 17p11.2 | 4000 | 200 | SMSREP | 98.2-99 | 0.004 | 0.2
dup17p11.2 | Mild MR | peripheral neuropathy | 17p11.2 | 4000 | 200 | SMSREP | 98.2-99 | 0.001 | 0.05
Velocardiofacial syndrome | Mild MR | cardiac, craniofacial defects | 22q11.2 | ~3000 | ~300 | LCR22 | 98-99 | 0.03 | 0.7
Cat Eye Syndrome | Severe MR | craniofacial, coloboma | 22q11 | 3000 | 400 | LCR22 | 98-99 | 0.003 | 0.15
Inv dup(15) | Mild/Severe | mild facial, seizures | 15q11/q14 | 4000 | 400 | HERC2 | 98 | 0.01 | 0.5
Neurofibromatosis | Mild MR | fibromatous tumours, visual defects | 17q11.2 | 1500 | 85 | NF1REP | 98.4 | 0.003 | 0.03
CMT1A | No MR | peripheral neuropathy | 17p12 | 1400 | 24 | CMT1A-REP | 98.7 | 0.01 | NA
HNPP | No MR | peripheral neuropathy | 17p12 | 1400 | 24 | CMT1A-REP | 98.7 | 0.001 | NA
Total (as given on the slide) | | | | | | | | 0.089 | 2.80%

14 Duplication Map of Human Genome: 130 candidate regions (298 Mb), 23 associated with genetic disease. Target patients (array CGH). Bailey et al. (2002), Science 297:1003-1007

15 Array Comparative Genomic Hybridization: high-throughput detection of large-scale variation (>50 kb). LCV or CNP = deletions and duplications (Iafrate et al., 2004; Sebat et al., 2004). [Figure: 12 mm array of human BAC clones hybridized with a normal human DNA sample and a disease-individual DNA sample; Cy3 channel, Cy5 channel and merged image.]

16 Duplication Microarray: Experimental Design
- 130 regions of the human genome (dist: >50 kb, <5 Mb; prop: 95% identity, 10 kb)
- 2178 BACs, or on average ~10-12 BACs per region
- Perform array CGH: reciprocal dye-swap experiments
- Strategy: identify normal variation, then search for variation observed only in disease patients
[Figure labels: TEL, BACs]
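The logic of a reciprocal dye swap can be sketched as follows: a real copy-number change should flip sign when the dyes are exchanged, whereas a dye-specific artifact keeps the same sign. This is only an illustration under stated assumptions. The function name, the 0.3 log2 cutoff and the convention that the test sample is labelled with Cy5 in the forward experiment are all hypothetical and are not taken from the talk.

```python
import math

LOG2_THRESHOLD = 0.3  # hypothetical cutoff, not a value from the talk


def call_bac(forward_ratio: float, swap_ratio: float,
             threshold: float = LOG2_THRESHOLD) -> str:
    """Call one BAC from two Cy5/Cy3 intensity ratios.

    forward_ratio: Cy5/Cy3 with the test sample in Cy5 (assumed labelling).
    swap_ratio:    Cy5/Cy3 after the dye swap, with the test sample in Cy3.
    """
    fwd = math.log2(forward_ratio)
    rev = math.log2(swap_ratio)
    # A true gain is high in the forward run and low in the swapped run.
    if fwd > threshold and rev < -threshold:
        return "gain"
    # A true loss shows the opposite pattern.
    if fwd < -threshold and rev > threshold:
        return "loss"
    # Same-sign or weak signals are treated as unsupported.
    return "no call"
```

A BAC with ratios 1.5 and 1/1.5 is called a gain, while a BAC reading 1.5 in both experiments is left uncalled because the signal does not reverse with the dyes.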

17 Hybridization. [Figure: log2 relative hybridization intensity (roughly -2 to 2) across BAC probes (0-20) for samples R921, D3767 and R1080; probe groups 1-3, 4-5, 6, 7-14, 15, 16-20.]

18 Study Populations
- Normal unaffected (diversity panel and HapMap samples): target = 800 samples; completed 75 + 269 = 344 samples; identified an additional 257 CNPs.
- Idiopathic mental retardation: target = 900 samples (400 Flint samples, 500 CWRU samples); 291 complete.

19 Normal Large-Scale Genomic Structural Variation. Based on our analysis of ~568 chromosomes, ~40/130 hotspots show no variation: NAHR resistant or selection?

20 Validation using NimbleGen Arrays. [Figure panels: Duplication, Deletion.] Locke et al., unpublished

21 Deletion Variants Appear Less Common

22 Study Populations
- Normal unaffected (diversity panel and HapMap samples): target = 800 samples; completed 75 + 269 = 344 samples; identified an additional 257 CNPs.
- Idiopathic mental retardation: target = 900 samples (400 Flint samples, 500 CWRU samples); 291 complete.

23 VCF Deletion Detected in IMR26: ~3.0 Mb deletion observed in IMR26 (= common VCF 22q11 deletion)

24 Novel LCV/CNP Detected in IMR43. [Figure annotations: CNP detected by Seg Dup array and Iafrate et al.; CNPs detected by Seg Dup array in HapMap samples; novel ~2.5 Mb deletion only observed in IMR.] Sharp et al., unpublished

25 Novel 2.5Mb Chr1 deletion in IMR43

26 Variation in IMR: New Genomic Disorder Candidates
- 291 IMR samples (Oxford cohort) screened to date
- 23 novel sites of variation (n=31 patients) defined by >2 BACs
- 5 are seen in more than one unrelated patient
- 7/9 events are de novo

27 Problems
- Array CGH has a lower limit to detect deletions (~30 kb).
- Oligo-based approaches effectively sample a small fraction of the genome and extrapolate size indirectly.
1. Precise location of the rearrangement is unknown.
2. Neither can identify subtle (5-30 kb) variation.
3. Neither approach can detect inversions.
4. Location and structure of the change unknown.

28 Models of Disease: Rare Duplication-Mediated Structural Variation; Common Fine-Scale Structural Variation

29 Intermediate-Size Structural Variation (ISV) and Inversions. Adapted from Buckland, Ann Med
Phenotype | Dup | Size | Locus | Freq. | Type | Gene
SMA susceptibility | 88.7/99.8% | >100 kb | 5q13 | 50% | +++/- Duplication | SMN2
nicotine metabolism | 24 kb/96.2% | 7 kb | 19q13.2 | 1.3% | +/- Duplication | CYP2A6
congenital adrenal hyperplasia | 0 | 35 kb | 6p21.3 | 1.6% | +/- Duplication | CYP21A2
antidepressant resistance | 5.4 kb/91-97% | 5 kb | 22q13.1 | 1-29% | +++ Duplication | CYP2D6
toxin resistance, cancer susceptibility | 24 kb/95.6% | 18 kb | 1p13.3 | 50% | -/- Deletion | GSTM1
immune response | 91-97% | Variable | 14q32.3 | 4-15% | +/- Deletion/Dup | IGVH26
none | 48 kb/99% | 219 kb | Xq28 | 33% | -/+ Inversion | EMD/FLN
heart defect susceptibility | 400 kb/98.9% | 5 Mb | 8p23 | 26% | -/+ Inversion | DEF3A-OR
halothane/epoxide sensitivity | 17 kb/94% | 54.3 kb | 22q11.2 | 20% | -/- Deletion | GSTT1

30 Comparing Human Genomes by Paired-End Sequence
- ~1.1 million fosmid paired-ends were sequenced by MIT to facilitate gap closure during the final phases of the HGP.
- Fosmid insert size is tightly distributed around the mean (40 +/- 2.6 kb); low copy = stability; capillary sequencing = low mispairing rate.
- Derived from a single female donor PDR cell line.
- Approach: optimal placement of fosmid ends against the human genome (Build35) could theoretically detect rearrangements. End-orientation patterns: inversion <<, insertion ><, deletion ><, concordant ><.
- Fosmid dataset: 1,122,408 fosmid pairs preprocessed (15.5X genome coverage); 639,204 BEST fosmid pairs (8.8X genome coverage).
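The coverage figures on this slide can be checked with simple arithmetic: clone (physical) coverage is the number of fosmid pairs times the mean insert size, divided by the genome size. The sketch below assumes a genome size of about 2.9 Gb, which the slide does not state. It is chosen here only because it reproduces the quoted 15.5X and 8.8X values.

```python
FOSMID_INSERT_BP = 40_000       # mean insert size from the slide (40 kb)
ASSUMED_GENOME_BP = 2.9e9       # assumption: not stated on the slide


def physical_coverage(n_pairs: int,
                      insert_bp: float = FOSMID_INSERT_BP,
                      genome_bp: float = ASSUMED_GENOME_BP) -> float:
    """Fold coverage of the genome by cloned inserts (not by sequence reads)."""
    return n_pairs * insert_bp / genome_bp


all_pairs = physical_coverage(1_122_408)   # preprocessed fosmid pairs
best_pairs = physical_coverage(639_204)    # BEST fosmid pairs
print(f"{all_pairs:.1f}X, {best_pairs:.1f}X")
```

Note that this is coverage by cloned inserts. Sequence coverage from the end reads alone would be far lower, since only the two ends of each 40 kb insert are sequenced.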

31 Genome-wide Detection of Structural Variation (>8 kb). [Figure: panels a) insertion, b) deletion, c) inversion. Fosmid pairs spanning < 32 kb mark a putative insertion and pairs spanning > 48 kb a putative deletion (discordant by size, red); pairs discordant by orientation are shown in yellow/gold; duplication track.] Structural polymorphisms?
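The size and orientation rules on this slide can be expressed as a small classifier. When a fosmid carries extra sequence that the reference lacks, its ends map closer together than expected (putative insertion). When the fosmid lacks sequence present in the reference, its ends map farther apart (putative deletion). The sketch below is illustrative only: the function name, the tuple layout and the use of single mapped coordinates per end are assumptions, and only the 32 kb and 48 kb thresholds and the orientation rule come from the slides.

```python
MIN_SPAN_BP = 32_000  # slide: pairs spanning < 32 kb suggest an insertion
MAX_SPAN_BP = 48_000  # slide: pairs spanning > 48 kb suggest a deletion


def classify_fosmid_pair(chrom1, pos1, strand1, chrom2, pos2, strand2):
    """Classify one end pair from its placement on the reference assembly.

    Strands are "+" or "-". A concordant pair has its left end on "+" and
    its right end on "-", i.e. the two reads face each other.
    """
    if chrom1 != chrom2:
        return "unclassified"  # case not covered by the slide
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == right_strand:
        return "inversion"     # discordant by orientation
    if (left_strand, right_strand) != ("+", "-"):
        return "unclassified"  # ends point away from each other
    span = right_pos - left_pos
    if span < MIN_SPAN_BP:
        return "insertion"     # ends too close on the reference
    if span > MAX_SPAN_BP:
        return "deletion"      # ends too far apart on the reference
    return "concordant"
```

The 32 kb and 48 kb bounds sit roughly three standard deviations either side of the 40 +/- 2.6 kb insert size quoted on the previous slide, which would explain why pairs outside that window are flagged.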

32 Validated Structural Polymorphisms
- GSTM1: ~20 kb deletion; minspread 28 kb (9 fosmids); 50% of Caucasians/Saudis are -/- for the 18 kb gene (predisposition to cancer); +++ ultrarapid GSTM1 activity
- CYP2D6: ~5-10 kb insertion; minspread 17 kb (7 fosmids); alternate haplotype support; 1-29% of Caucasians/Japanese have multiple copies (entire gene ~5 kb); associated with resistance to antipsychotic tricyclic antidepressants

33 Summary: 6/16 of common polymorphisms detected. Tuzun et al. (2005), Nat. Genet.

34 ……Sequence the Structural Variation

35 Putative Insertion (8,384 bp). [Figure: build34 compared with fosmid sequence.]

36 Putative Deletion (14,055 bp). [Figure: build34 compared with fosmid sequence.]

37 Sequencing Genic Structural Variation. [Figure: six panels, a) to f), each comparing b35 with a fosmid sequence; genes labelled: SIGLEC5A, LSP1, TNNT3, KCNJ2, KCNJ16, GSST2, DDT, MEGF11.]

38 Gene Families and Structural Variants
- Drug detoxification: glutathione-S-transferase, cytochrome P450, carboxylesterases
- Immune response and inflammation: leukocyte immunoglobulin-like receptor, defensin, phorbolin
- Surface integrity genes: mucin, enamelin, late epidermal cornified envelope genes, galectin
- Surface antigens: melanoma antigen gene family, rhesus antigen
Environmental interaction genes.

39 Fine-Scale Structural Variation Map (build35 vs. fosmids)
- 1.3% discordant fosmids
- 295 clusters identified (2 or more fosmids); 246 supported by a second haplotype
- 147 insertions, 93 deletions, 57 inversions
- 18 putative L1 events: 10 deletions and 8 insertions (6 kb insertion)
- 89 located within gene regions
- 138 in unique regions of the genome; 159 in duplicated regions of the genome
[Figure legend: "heterochromatic" regions, "duplicated" regions, deletion, inversion, insertion (fosmid).]
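The requirement that a site be supported by a cluster of two or more discordant fosmids could be implemented along these lines. This is a sketch under stated assumptions: the slide gives only the minimum of two fosmids, so the rule used here (same chromosome, same variant type, overlapping reference spans) and all names are hypothetical.

```python
from collections import defaultdict


def cluster_discordant(fosmids, min_support=2):
    """Group overlapping discordant fosmids of the same kind.

    fosmids: iterable of (chrom, start, end, kind) tuples, where kind is
    for example "insertion", "deletion" or "inversion".
    Returns a list of (chrom, kind, start, end, n_fosmids) tuples.
    """
    by_key = defaultdict(list)
    for chrom, start, end, kind in fosmids:
        by_key[(chrom, kind)].append((start, end))

    clusters = []
    for (chrom, kind), spans in by_key.items():
        spans.sort()
        cur_start, cur_end = spans[0]
        count = 1
        for start, end in spans[1:]:
            if start <= cur_end:  # overlaps the running cluster
                cur_end = max(cur_end, end)
                count += 1
                continue
            if count >= min_support:
                clusters.append((chrom, kind, cur_start, cur_end, count))
            cur_start, cur_end, count = start, end, 1
        if count >= min_support:
            clusters.append((chrom, kind, cur_start, cur_end, count))
    return clusters
```

Requiring at least two independent clones filters out singletons, which are more likely to reflect chimeric clones or mapping errors than real variants.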

40 PCR Breakpoint Genotyping Assays for Structural Variation: tested 11 structural variants (5 insertions, 4 deletions and 2 inversions); 7 successful assays (6 with >20% minor allele frequency).

41 Illumina Golden-Gate Genotyping Assays for Structural Variation

42 Human Genome Structural Variation Project
- Goal: complete characterization of structural variation in 48 HapMap samples (CEPH, Yoruba, Japanese and Chinese)
- Timeline: 2 scientific meetings (2005); 2 working groups (AHG, MSWG) (12/05); Coordinating Committee (1/06); NIH Council (2/06); press release (3/15/06)

43 Detected Variants from Two Individuals.

44 Complementary Approaches: 1503 variants, 115 Mb, 800 genes structurally variant. Eichler (2006), Nat. Genet.

45 Summary
- Humans relatively unique in size, proportion and architecture of interspersed segmental duplications.
- Large-scale variation, normals: identified 257 CNPs using a microarray targeted to duplicated regions.
- Large-scale variation, IMR: identified 23 sites (>2 BACs) unique to patients (n=291 probands); 5 are recurrent and 7 are confirmed de novo. Novel genomic disorders.
- Fine-scale variation: developed an approach to map and sequence common fine-scale variation within the human population; estimate ~200-300 differences > 8 kb between 2 individuals.


47 Models of Human “Genetic” Disease
1) Simple Mendelian: one gene-one disease, familial, highly penetrant, small fraction of population, e.g. cystic fibrosis.
2) Chromosome disease: large chromosomal regions, non-familial, sporadic, relatively high frequency, e.g. Turner syndrome.
3) Genomic disease: familial and/or recurrent, deletion or duplication of a large number of genes, dosage effects, e.g. Prader-Willi syndrome.
4) Complex traits: multiple genes plus environment, familial, variably penetrant, large fraction of population, susceptibility genes, e.g. hypertension.

48 Acknowledgements
- CWRU/UChicago: Stuart Schwartz, Laurie Christ
- Eichler Lab: Eray Tuzun, Andy Sharp, Devin Locke, Matthew Johnson, Zhaoshi Jiang, Jon Bleyhl, Sean McGrath, Tera Newman, Jeff Bailey, Anne Morrison, Lisa Pertz, Ze Cheng, Xinwei She, James Sprague
- UWGSC: Maynard Olson, Rajinder Kaul, Hillary Hayden, Eric Haugen
- Agencourt: Doug Smith
- Oxford: Jonathan Flint, Samantha Knight
- NHGRI: Jim Mullikin
- UCSF: Dan Pinkel, Donna Albertson
- UW: Debbie Nickerson, Mark Rieder, Chris Carlson, Josh Smith

49 ……Finding Novel Human Sequence

50 Sequence of Traversing Fosmid Fills Gaps. Kaul et al., unpublished

51 Singleton Fosmids Extend into Gaps. Kaul et al., unpublished

52 Fosmid Pairs that Fail to Map to build35
- 4773 fosmid paired-end sequences fail to map to build 35.
- 1613 have 150 bp >Q30 at either end and have >100 bp unique sequence.
- 1416 of these have no hit to HTGS BAC sequence.
- 1503 have a BLAST hit to chimpanzee WGS, but only 403 within the chimp assembly.
- Estimate that this represents ~10-20 Mb.
- 1503 of these selected for fingerprinting with four independent restriction enzymes (EcoR I, Hind III, Bgl II and Nsi I).
- Contigs constructed from 1376 clones (95% success rate) using the Composite Mutual Overlap Statistic (CMOS).

53 FISH Summary of Orphan Fosmids
- 52 contigs tested by FISH
- 15 subtelomeric, 5 acrocentric and 5 pericentromeric
- 22 interstitial euchromatin (9 corresponding to known gaps)
- 10 contigs: no signals observed against 2 individuals (6/10 largest)
