The Ashkenazi Genome Project


1 The Ashkenazi Genome Project
Shai Carmi, Pe’er lab, Columbia University. Joint Group Meeting, November 2012

2 Recent History of Ashkenazi Jews
Mediterranean origin (?). Ca. 1000: small communities in N. France and the Rhineland. Migration east. Expansion: ~10M today, mostly in the US and Israel. Relative isolation.

3 Ashkenazi Jewish Genetics
Recently, AJ have been shown to be a genetically distinct group, close to Middle-Eastern and South-European populations. [Figure: SNP-array data from 300 Jewish individuals; groups labeled AJ, Jewish non-AJ, Europeans, Middle-Eastern.] Price et al., PLoS Genetics 2008. Olshen et al., BMC Genetics 2008. Need et al., Genome Biology 2009. Kopelman et al., BMC Genetics 2009. Behar et al., Nature 2010. Bray et al., PNAS 2010. Guha et al., Genome Biology 2012. Atzmon et al., AJHG 2010.

4 Recent Demography & IBD
In small populations, common ancestors are likely recent. [Figure: pedigree linking individuals A and B, generations g = 1, 2, 3.]

5 Recent Demography & IBD
In small populations, common ancestors are likely recent. For a common ancestor g generations back, the chance of IBD is ~$4^{-g}$, but the segment length is ~$1/g$ (Morgans). IBD is highly informative about recent history! [Figure: individuals A and B carrying a shared segment.] Recent common ancestors ⟹ many long haplotypes identical-by-descent (IBD).
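The length scaling can be checked with a small simulation. This is a minimal sketch, assuming Haldane's no-interference model (crossovers in each meiosis form a Poisson process of rate 1 per Morgan) and ignoring the ~$4^{-g}$ chance that both lineages inherit the same ancestral segment. Two haplotypes whose common ancestor lived g generations ago are separated by 2g meioses, so their pooled breakpoints form a Poisson process of rate 2g and the gaps average $1/(2g)$ Morgans: the same $1/g$ scaling as on the slide. `segment_lengths` and the 1000-Morgan synthetic chromosome are illustrative choices, not part of the original analysis.

```python
import random


def segment_lengths(g, chrom_len=1000.0, seed=0):
    """Distances (Morgans) between consecutive crossover breakpoints pooled over
    the 2g meioses separating two haplotypes with a common ancestor g generations
    ago. Assumes each meiosis places crossovers as an independent rate-1 Poisson
    process (Haldane's model). chrom_len is a long synthetic chromosome, chosen
    only to yield many segments."""
    rng = random.Random(seed)
    breakpoints = []
    for _ in range(2 * g):  # g meioses down each of the two lineages
        pos = rng.expovariate(1.0)
        while pos < chrom_len:
            breakpoints.append(pos)
            pos += rng.expovariate(1.0)
    breakpoints.sort()
    edges = [0.0] + breakpoints + [chrom_len]
    return [b - a for a, b in zip(edges, edges[1:])]


for g in (1, 2, 5, 10):
    lengths = segment_lengths(g)
    mean = sum(lengths) / len(lengths)
    print(f"g={g:2d}  mean segment = {mean:.4f} M  (1/(2g) = {1 / (2 * g):.4f})")
```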

6 Formal Inference Using IBD
Assume a population of historical size $N(t) = N_0\lambda(t)$. Expected total sharing in segments of length $\ell_1 < \ell < \ell_2$:

$$\int_0^\infty \frac{1}{\lambda(t)}\, e^{-\int_0^t \frac{dt'}{\lambda(t')}} \left[ e^{-2\ell_1 N_0 t}\left(1+2\ell_1 N_0 t\right) - e^{-2\ell_2 N_0 t}\left(1+2\ell_2 N_0 t\right) \right] dt$$

Detect IBD in a sample ⟹ infer the history $N(t)$ (Palamara et al., AJHG 2012). [Figure: A and B with a shared segment.] IBD sharing is abundant in AJ (Atzmon et al., AJHG 2012; Gusev et al., MBE 2011).
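The integral can be evaluated numerically for any size history. This is a minimal sketch, assuming $\ell_1, \ell_2$ are in Morgans and that $N_0 t$ counts generations (as the $2\ell N_0 t$ terms suggest); `expected_sharing`, the midpoint rule, and the cutoff `t_max` are illustrative choices, not the method of Palamara et al. For a constant size ($\lambda \equiv 1$) the integral has the closed form $\frac{1+2a}{(1+a)^2}-\frac{1+2b}{(1+b)^2}$ with $a = 2\ell_1 N_0$ and $b = 2\ell_2 N_0$, which gives a check.

```python
import math


def expected_sharing(lam, l1, l2, n0, t_max=50.0, steps=200_000):
    """Midpoint-rule evaluation of the expected sharing in segments l1 < l < l2.
    lam: callable lambda(t) with N(t) = n0 * lambda(t). The upper limit is
    truncated at t_max, assumed large enough that the coalescence density
    is negligible beyond it."""
    def tail(ell, t):
        # e^{-2 ell N0 t} (1 + 2 ell N0 t)
        x = 2.0 * ell * n0 * t
        return math.exp(-x) * (1.0 + x)

    dt = t_max / steps
    cum = 0.0    # integral of 1/lambda from 0 to the left edge of the current step
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        inv = 1.0 / lam(t)
        density = inv * math.exp(-(cum + 0.5 * dt * inv))  # pairwise coalescence density at t
        total += density * (tail(l1, t) - tail(l2, t)) * dt
        cum += dt * inv
    return total


# Constant size: a = 2*0.01*500 = 10, b = 2*0.05*500 = 50.
closed = (1 + 2 * 10) / (1 + 10) ** 2 - (1 + 2 * 50) / (1 + 50) ** 2
print(expected_sharing(lambda t: 1.0, 0.01, 0.05, 500), closed)
```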

7 Power of imputation by IBD
AJ genetic history: [Figure: effective size N(t) vs. years ago (2,300 and 800 years ago, present); sizes shown: 45,000; 270; 4,300,000. Expansion rate ≈34% per generation (Palamara et al., AJHG 2012).] High potential for genetic studies! [Figure: power of imputation by IBD, AJ vs. UK.]
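As a consistency check on the figure, read 270 as the bottleneck size and 4,300,000 as the present size (an assumption about how the figure's numbers map onto its curve). Constant growth of ≈34% per generation then needs about 33 generations, roughly 800 years at an assumed ~25 years per generation. `generations_to_grow` is a hypothetical helper.

```python
import math


def generations_to_grow(n_start, n_end, rate):
    """Generations of constant exponential growth, N_{g+1} = N_g * (1 + rate),
    needed to go from n_start to n_end."""
    return math.log(n_end / n_start) / math.log(1.0 + rate)


g = generations_to_grow(270, 4_300_000, 0.34)
print(f"{g:.1f} generations")  # about 33
```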

8 The Ashkenazi Genome Consortium
10 labs from the NY area and Israel. Goals: sequence hundreds of healthy AJ to high coverage; use them as a reference panel for association studies, imputation, and clinical interpretation; understand AJ population history; understand AJ functional genetic variation (negative/positive selection).

9 The Ashkenazi Genome Consortium
Phase I: 144 AJ personal genomes: ~60 years old, healthy controls; unrelated, PCA-validated AJ; selected to maximize sharing with the rest of the cohort. Technology: Complete Genomics. Sequenced so far: ~100 genomes. Data presented: 58 genomes. Phase II: hundreds of genomes (2013?); more collaborators.

10 Quality Measures
Property: Genome (exome)
Coverage: ~55x
Fraction called: 96.5±0.003% (98%)
Fraction with coverage > 20x: 92.4±0.018% (94.9%)
Concordance with SNP array: 99.87±0.1%
Ti/Tv ratio: 2.14±0.003 (3.05)
[Figure: Ti/Tv]
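The Ti/Tv ratio divides transitions (A↔G, C↔T) by transversions (all other single-base changes). This is a minimal sketch over (ref, alt) pairs; the input format and `ti_tv_ratio` are assumptions, not the TAGC pipeline.

```python
BASES = set("ACGT")
# Transitions keep the base class: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def ti_tv_ratio(snvs):
    """snvs: iterable of (ref, alt) single-base pairs, e.g. ("A", "G").
    Returns transitions / transversions (inf if there are no transversions)."""
    ti = tv = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref == alt or ref not in BASES or alt not in BASES:
            continue  # skip non-variant or ambiguous calls
        if frozenset((ref, alt)) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    return ti / tv if tv else float("inf")
```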

11 Variant Statistics
Statistic: Per genome (exome)
Total SNPs: 3.4M (22k)
Novel SNPs: 3.7% (4%)
Het/hom ratio: 1.64 (1.67)
Insertions: 223k (246)
Deletions: 237k (218)
Multi-nucleotide variants: 83k (374)
Synonymous SNPs: 10,525
Non-synonymous SNPs: 9,695
Nonsense SNPs: 71
Other disrupting: 241
CNVs: 336
SVs: 1,486
MEIs: 3,475

12 Comparison to Europeans
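[Figure: variant counts extrapolated to 100% of the genome, TAGC vs. Flemish; axes in millions (M) and thousands (k).] Similar results in 13 public European genomes sequenced by Complete Genomics (CG).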

13 Het/Hom Ratio
The difference is significant in comparison to both Flemish and HapMap EU. It was also observed in SNP arrays (Need et al., Genome Biology 2009). But didn't I just say that AJ have more IBD?
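The het/hom ratio counts heterozygous over homozygous non-reference genotypes. This is a minimal sketch, assuming VCF-style GT strings (e.g. `0/1`, `1|1`) for one individual; `het_hom_ratio` is hypothetical, and it skips no-calls and half-calls in the spirit of the QC filters on slide 17.

```python
def het_hom_ratio(genotypes):
    """genotypes: iterable of VCF-style GT strings for one individual.
    Counts heterozygous vs. homozygous-alternate calls; hom-ref calls,
    no-calls, half-calls and non-diploid entries are ignored."""
    het = hom_alt = 0
    for gt in genotypes:
        alleles = gt.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # no-call, half-call, or non-diploid entry
        a, b = alleles
        if a != b:
            het += 1
        elif a != "0":
            hom_alt += 1
    return het / hom_alt if hom_alt else float("inf")
```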

14 Het/Hom Ratio [Figure: AJ and EU population histories vs. time t (years ago to present), marking the period in which IBD is observed.]

15 Data Flow Pipeline
Steps: backup (3x); CGA tools testVariants; VCF; fix; Plink/Seq QC; compress, index; Plink; phase; distribute.

16 Quality Control
False-positive (FP) rate assessment by runs of homozygosity (ROH): assume hets inside high-confidence ROH are FP. [Figure: hets within an ROH on the paternal and maternal haplotypes.] High-confidence ROH only (>7.5 Mb, no gaps): 7 segments in 7 individuals (72 Mb total). Count het SNPs in the original files. Genome-wide extrapolation: ~20,000 FP per genome. ~3-5% FP rate for indels.
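The ROH-based estimate amounts to counting het calls inside the high-confidence ROH and scaling that rate to the whole genome. This is a minimal sketch, assuming sorted het positions per chromosome, non-overlapping half-open ROH intervals, and a uniform false-positive rate across the genome; the helper names are hypothetical, and the slide does not report the raw het count.

```python
from bisect import bisect_left


def count_hets_in_rohs(het_positions, rohs):
    """het_positions: sorted het SNP positions (bp) on one chromosome of one
    individual. rohs: non-overlapping half-open (start, end) ROH intervals in bp."""
    return sum(bisect_left(het_positions, end) - bisect_left(het_positions, start)
               for start, end in rohs)


def extrapolate_false_positives(het_count, roh_bp, genome_bp):
    """Per-genome FP estimate: the het rate inside the ROHs (all assumed false)
    applied uniformly to a genome of length genome_bp."""
    return het_count / roh_bp * genome_bp
```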

17 Quality Control
Remove: indels and MNPs; low-quality SNPs; multi-allelic SNPs; half-calls; SNPs with a high no-call rate; SNPs not in HWE; monomorphic reference SNPs; inbred individual.
⇒ FP after QC: ~5,000 per genome.
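Several of these filters are simple per-site tests. This is a minimal sketch of the no-call-rate and HWE filters, assuming a 1-df chi-square HWE test (an exact test is a common alternative) and placeholder thresholds; the slide does not state which test or cutoffs were used, and `passes_qc` is hypothetical.

```python
import math


def hwe_chisq_pvalue(n_hom_ref, n_het, n_hom_alt):
    """1-df chi-square test of Hardy-Weinberg equilibrium for a biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return 1.0
    p = (2 * n_hom_ref + n_het) / (2 * n)   # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: no HWE test possible
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square, 1 df


def passes_qc(counts, n_nocall, max_nocall_rate=0.05, min_hwe_p=1e-6):
    """counts: (hom_ref, het, hom_alt). Thresholds are illustrative placeholders."""
    n_total = sum(counts) + n_nocall
    nocall_rate = n_nocall / n_total if n_total else 1.0
    return nocall_rate <= max_nocall_rate and hwe_chisq_pvalue(*counts) >= min_hwe_p
```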

18 Applicability to Clinical Genomics
Variants of unknown significance: technical false positives; true variants without health impact. [Figure: novel variants per sample, not in TAGC.]
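Counting variants "not in TAGC" is a set operation against the panel. This is a minimal sketch with hypothetical helpers. `novel_per_sample` assumes a leave-one-out definition for panel members (a variant counts as novel if no other genome carries it). `novel_vs_panel` filters a new, external sample against the whole panel. Using variant keys such as `(chrom, pos, ref, alt)` tuples is an assumption.

```python
from collections import Counter


def novel_per_sample(sample_variants):
    """sample_variants: dict sample_id -> set of variant keys.
    Leave-one-out: a variant is novel for a panel member if no other member carries it."""
    carriers = Counter(v for variants in sample_variants.values() for v in variants)
    return {s: sum(1 for v in vs if carriers[v] == 1) for s, vs in sample_variants.items()}


def novel_vs_panel(patient_variants, sample_variants):
    """Variants of an external sample that no panel member carries."""
    panel = set().union(*sample_variants.values())
    return patient_variants - panel
```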

19 Phasing
Sequencing is in mate pairs: haplotype information is available for ~30-35% of hets. BEAGLE error rate: 3-4%. Seqphase: a new phasing tool, based on SHAPEIT, that incorporates reads; 18 hours on chromosome 1. [Figure: histogram of the distance between phased hets (frequency vs. distance).]
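The slide does not define the BEAGLE error rate; this sketch assumes the switch error rate, one common phasing metric. Each individual's phase is a 0/1 list over consecutive het sites, and a switch error is counted wherever the inferred phase flips relative to the truth, so flipping all sites at once costs nothing. `switch_error_rate` is a hypothetical helper.

```python
def switch_error_rate(true_phase, inferred_phase):
    """Phases are 0/1 lists over one individual's consecutive het sites, giving
    which haplotype carries the alternate allele. Returns switches / (hets - 1)."""
    if len(true_phase) != len(inferred_phase):
        raise ValueError("phase vectors must have equal length")
    if len(true_phase) < 2:
        return 0.0
    diff = [t ^ i for t, i in zip(true_phase, inferred_phase)]  # 1 where phase disagrees
    switches = sum(1 for a, b in zip(diff, diff[1:]) if a != b)
    return switches / (len(diff) - 1)
```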

20 Variant Discovery Number of non-reference variants.
Extrapolation using Gravel et al., PNAS 2011.

21 Variant Discovery
Number of segregating sites $S_n(t)$ and heterozygosity $H(t)$ (Zivkovic and Stephan, Theor. Pop. Biol):

$$S_n(t) = \theta \sum_{k=1}^{n/2} (4k-1)\,\frac{\binom{n}{2k}}{\binom{n+2k-1}{2k}} \int_{-\infty}^{t} \exp\!\left(-\binom{2k}{2}\int_s^t \frac{du}{\rho(u)}\right) ds$$

$$H(t) = \theta \int_{-\infty}^{t} \exp\!\left(-\int_s^t \frac{du}{\rho(u)}\right) ds$$

$N(t)$: # diploids at time $t$; $N = N(t=0)$; $\rho(t) = N(t)/N$; $n$: # diploid samples; $t$: # generations/$2N$; $\theta = 4N\mu$; $\mu$: mutation rate per generation.
Use the double expansion model of Palamara et al., AJHG 2012. Define $t = 0$ at the start of the first expansion. Match $H(t)$.
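Both expressions can be evaluated numerically for any $\rho(u)$. This is a minimal sketch, not the authors' code; the step sizes, the cutoff `tail_cut`, the helper names, and the example growth curve are illustrative assumptions. With constant size ($\rho \equiv 1$) the inner integrals equal $1/\binom{2k}{2}$ and $H = \theta$. For small $n$, $S_n$ then matches Watterson's $\theta\sum_{i=1}^{n-1}1/i$ when $n$ is read as the number of sampled sequences, which serves as a check.

```python
import math


def _decay_integral(c, rho, t, ds_max=1e-3, tail_cut=50.0, max_steps=10_000_000):
    """Approximate the integral over s < t of exp(-c * int_s^t du/rho(u)) with a
    midpoint rule, stepping s backward from t until the exponent exceeds tail_cut."""
    ds = min(ds_max, 0.01 / c)   # keep c*ds small so each step is accurate
    inner = 0.0                  # int_s^t du/rho(u) at the upper edge of the current step
    total = 0.0
    for i in range(max_steps):
        if c * inner > tail_cut:
            break
        s = t - (i + 0.5) * ds
        inv = 1.0 / rho(s)
        total += math.exp(-c * (inner + 0.5 * ds * inv)) * ds
        inner += ds * inv
    return total


def segregating_sites(n, t, rho, theta=1.0):
    """S_n(t) as written on the slide, summing k = 1..floor(n/2)."""
    return theta * sum(
        (4 * k - 1) * math.comb(n, 2 * k) / math.comb(n + 2 * k - 1, 2 * k)
        * _decay_integral(math.comb(2 * k, 2), rho, t)
        for k in range(1, n // 2 + 1)
    )


def heterozygosity(t, rho, theta=1.0):
    """H(t) = theta * integral over s < t of exp(-int_s^t du/rho(u))."""
    return theta * _decay_integral(1, rho, t)


def growth(u):
    """Hypothetical rho(u): constant before t = 0, exponential growth afterwards."""
    return 1.0 if u <= 0 else math.exp(50.0 * u)


print(segregating_sites(20, 0.05, growth), heterozygosity(0.05, growth))
```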

22 Variant Discovery ?

23 Allele Frequency Spectrum
[Figure: allele frequency spectrum; panels: counts and fractions, for all sites and population-specific sites.]
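Building the spectrum is a histogram of per-site allele counts. Folding merges counts $i$ and $n-i$, so the result does not depend on knowing which allele is ancestral. This is a minimal sketch, assuming `alt_counts` holds the number of non-reference alleles per site among $n$ sampled chromosomes; the helper names are hypothetical.

```python
def allele_frequency_spectrum(alt_counts, n):
    """Unfolded spectrum: fs[i] = number of sites with i alternate alleles
    among n sampled chromosomes, for i = 0..n."""
    fs = [0] * (n + 1)
    for c in alt_counts:
        if not 0 <= c <= n:
            raise ValueError(f"allele count {c} outside 0..{n}")
        fs[c] += 1
    return fs


def fold(fs):
    """Folded (minor-allele) spectrum for i = 1..floor(n/2): fs[i] + fs[n - i],
    or fs[n/2] alone when n is even. Monomorphic bins 0 and n are dropped."""
    n = len(fs) - 1
    return [fs[i] if i == n - i else fs[i] + fs[n - i] for i in range(1, n // 2 + 1)]
```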

24 Demographic Inference
Folded allele frequency spectrum + coalescent simulations. Model: double expansion + ancient AJ founding bottleneck. Find the maximum-likelihood solution (Gutenkunst et al., PLoS Genet. 2009): average over simulations to obtain the expected spectrum; assume each mutation's frequency is drawn according to the expected spectrum; approximate the multinomial probability as Poisson. [Figure: folded spectrum, % sites on a log scale from 0.1 to 100.]
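Under the Poisson approximation, each spectrum bin is treated as an independent Poisson count whose mean is the (scaled) expected value. This is a minimal sketch of that composite log-likelihood; the helper name is hypothetical, and it is not the code of Gutenkunst et al. Rescaling the expected spectrum to the observed total is the maximum-likelihood choice of the overall scale for this likelihood: setting $\partial_\theta \sum_i (-\theta e_i + d_i \log \theta e_i) = 0$ gives $\hat\theta = \sum_i d_i / \sum_i e_i$.

```python
import math


def poisson_log_likelihood(observed, expected, rescale=True):
    """Composite log-likelihood of an observed spectrum given an expected one,
    treating each bin as an independent Poisson count (the Poisson approximation
    to the multinomial). If rescale, the expected spectrum is first scaled so
    its total matches the observed total."""
    if len(observed) != len(expected):
        raise ValueError("spectra must have the same number of bins")
    if rescale and sum(expected) > 0:
        scale = sum(observed) / sum(expected)
        expected = [e * scale for e in expected]
    ll = 0.0
    for d, m in zip(observed, expected):
        if m <= 0:
            if d > 0:
                return float("-inf")  # observed sites in a bin the model rules out
            continue
        ll += -m + d * math.log(m) - math.lgamma(d + 1)
    return ll
```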

25 Demographic Inference
Similar to Palamara et al., with somewhat larger population sizes. To do: gene flow from EU; better inference tools. [Figure: inferred effective size N vs. time t (years ago to present); values shown: 3,000; 5,000; 90,000; 875; 500; 7,500,000.]

26 Ongoing Analysis
Exome analysis: genes with AJ-specific high mutation load. Mobile element insertions: common insertion frequencies correlated with 1KG. AJ disease genes (Ostrer & Skorecki, Human Genetics 2012): some carriers detected. 276 non-synonymous mutations, >65 known. 60 loss-of-function.

27 Summary
AJ bottleneck and expansion reveal potential for genetic studies. High-quality genomes sequenced by TAGC indicate utility in a clinical setting. Complete variant discovery improves demographic inference; subtle differences from Europeans. Future directions: imputation power using TAGC vs. 1000 Genomes; local ancestry inference; effect of natural selection.

28 Thank you!
TAGC consortium members:
Columbia University Computer Science: Itsik Pe’er, Pier Francesco Palamara; undergrads: Fillan Grady, Ethan Kochav, James Xue; IT: Shlomo Hershkop
Long Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha
Columbia University Medical Center: Lorraine Clark, Xinmin Liu
Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer
Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius
Memorial Sloan Kettering Cancer Center: Ken Offit, Vijai Joseph
Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen
The Hebrew University of Jerusalem: Ariel Darvasi
VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance
Complete Genomics: Jason Laramie
Funding: Human Frontier Science Program.

