Lessons learnt from the 1000 Genomes Project about sequencing in populations. Gil McVean, Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford


1 Lessons learnt from the 1000 Genomes Project about sequencing in populations Gil McVean Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford

2 Some questions: What has the 1000 Genomes Project told us about how to sequence (in) populations? What has the 1000 Genomes Project told us about populations?

3 Samples for the 1000 Genomes Project: major population groups, each comprising subpopulations of c. 100 samples: GBR FIN TSI IBS CEU JPT CHB CHS CDX KHV GWB GHN YRI MAB LWK MXL CLM ASW AJM ACB PEL PUR; samples from S. Asia

4 The role of the 1000G Project in medical genetics: a catalogue of variants (95% of variants at 1% frequency in populations of interest); a representation of ‘normal’ variation; a set of haplotypes for imputation into GWAS; a training ground for sequencing/statistical/computational technologies

5 Samples for the 1000 Genomes Project: Pilot: TSI* CEU JPT CHB CHS* YRI LWK* (*Exon pilot only)

6 Population-scale genome sequencing [figure labels: Haplotypes; 2x; 10x]

7

8 What has the project generated?

9 >15 million SNPs, >50% of them novel; dbSNP entries increased by 70%

10 A huge increase in the set of structural variants

11 A robust and modular pipeline for analysis of population-scale sequence data

12 An efficient format for storing aligned reads and a set of tools to manipulate and view the files: SAM/BAM format for storing (aligned) reads. Bioinformatics (2009). http://samtools.sourceforge.net
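To make the SAM format concrete, here is a minimal sketch of reading one alignment record in plain Python. It relies only on the 11 mandatory tab-separated columns and two flag bits from the SAM specification. The names `SamRecord`, `parse_sam_line`, `is_unmapped` and `is_reverse_strand` are illustrative and not part of samtools, and the example read is made up.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in file order."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tag columns are ignored."""
    if line.startswith("@"):
        raise ValueError("header line, not an alignment record")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_unmapped(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x4)    # flag bit 0x4: segment unmapped


def is_reverse_strand(rec: SamRecord) -> bool:
    return bool(rec.flag & 0x10)   # flag bit 0x10: read on reverse strand


# Made-up record for illustration only.
example = "read1\t16\tchr1\t100\t60\t5M\t*\t0\t0\tACGTA\tIIIII"
rec = parse_sam_line(example)
print(rec.rname, rec.pos, is_reverse_strand(rec))
```

In practice BAM, the compressed binary form, is read through samtools or a wrapper library rather than parsed by hand; the sketch only shows what each column holds.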

13 An information-rich format for storing generic haplotype/genotype data and tools for manipulating the files: http://vcftools.sourceforge.net
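The format referred to here is VCF. As a rough illustration of what "information-rich" means, the sketch below parses a single VCF data line in plain Python: eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), then an optional FORMAT column and one column per sample. The function names and the example line are hypothetical and are not part of vcftools.

```python
import re


def parse_info(info: str) -> dict:
    """Split the INFO column into key/value pairs; bare keys become True."""
    if info == ".":
        return {}
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_vcf_line(line: str) -> dict:
    """Parse one VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("expected at least 8 tab-separated fields")
    record = {
        "chrom": fields[0],
        "pos": int(fields[1]),
        "id": fields[2],
        "ref": fields[3],
        "alt": fields[4].split(","),
        "qual": fields[5],
        "filter": fields[6],
        "info": parse_info(fields[7]),
        "samples": [],
    }
    if len(fields) > 9:
        keys = fields[8].split(":")  # FORMAT column, e.g. GT:DP
        record["samples"] = [dict(zip(keys, s.split(":"))) for s in fields[9:]]
    return record


def alt_allele_count(record: dict) -> int:
    """Count non-reference alleles across all sample genotypes."""
    count = 0
    for sample in record["samples"]:
        # GT separates alleles with '|' (phased) or '/' (unphased).
        for allele in re.split(r"[/|]", sample.get("GT", ".")):
            if allele not in (".", "0"):
                count += 1
    return count


# Made-up line for illustration only.
example = "1\t12345\t.\tA\tG\t50\tPASS\tAC=3;DB\tGT:DP\t0|1:4\t1/1:2"
rec = parse_vcf_line(example)
print(rec["pos"], rec["alt"], alt_allele_count(rec))
```

The `|` separator in a genotype marks phased alleles, which is how the same file can carry haplotypes as well as genotypes.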

14 An understanding of the ‘rare functional variant load’ carried by individuals: c. 250 LOF variants / person; c. 75 HGMD DM

15 USH2A: mutations cause Usher syndrome. 66 missense variants in dbSNP, 2/3 of them detected in the 1000 Genomes Pilot. One HGMD ‘disease-causing’ variant is homozygous in 3 YRI individuals; other reports indicate this is not a real disease-causing variant

16 Samples for the 1000 Genomes Project: Phase 1: GBR FIN TSI CEU JPT CHB CHS YRI LWK MXL CLM ASW PUR

17 Lessons learnt about sequencing in populations

18 Lesson 1. The low-coverage model works for variant discovery

19 A near-complete record of common variants (CEU)

20 Lesson 2. The low-coverage model works for SNP genotyping

21 A set of accurate genotypes/haplotypes (CEU)

22

23 Lesson 3. The genome has a large grey area where variant calling is hard

24

25 Lesson 4. Joint calling of different variant types substantially improves the quality of calls

26

27 Lesson 5. Managing uncertainty is important
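One way to see why uncertainty matters at low coverage is to compute genotype likelihoods from a handful of reads. The sketch below is an illustration under stated assumptions, not the project's calling pipeline: a biallelic site, independent reads, a single per-base error rate (`error`, hypothetical default 0.01) and a Hardy-Weinberg prior from an assumed alternate allele frequency (`alt_freq`).

```python
def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> list:
    """P(reads | genotype) for 0, 1 and 2 copies of the alternate allele.

    The binomial coefficient is omitted because it is the same for all
    three genotypes and cancels when posteriors are normalised.
    """
    likelihoods = []
    for copies in (0, 1, 2):
        # Chance that a single read shows the alternate allele.
        p_alt = (copies / 2) * (1 - error) + (1 - copies / 2) * error
        likelihoods.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return likelihoods


def genotype_posteriors(n_ref: int, n_alt: int, alt_freq: float,
                        error: float = 0.01) -> list:
    """Posterior over genotypes using a Hardy-Weinberg prior (an assumption)."""
    priors = [(1 - alt_freq) ** 2,
              2 * alt_freq * (1 - alt_freq),
              alt_freq ** 2]
    joint = [l * p for l, p in zip(genotype_likelihoods(n_ref, n_alt, error), priors)]
    total = sum(joint)
    return [j / total for j in joint]


# Two reads, both showing the alternate allele, at a site assumed to have
# a 1% alternate allele frequency.
post = genotype_posteriors(n_ref=0, n_alt=2, alt_freq=0.01)
print([round(p, 3) for p in post])
```

Under these assumptions, two alternate reads and no reference reads at a rare site still favour the heterozygous genotype over the homozygous alternate one, because the prior outweighs so little data. Carrying the full posterior forward, rather than a hard call, keeps that uncertainty visible to downstream analyses.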

28

29 Lesson 6. Data visualisation is key

30

31 Lessons learnt about populations

32

33 Closely related populations can have substantially different rare variants

34

35 Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants (Iain Mathieson)

36 Thanks to the many... Steering committee: co-chairs Richard Durbin and David Altshuler. Samples and ELSI Committee: co-chairs Aravinda Chakravarti and Leena Peltonen. Data Production Group: co-chairs Elaine Mardis and Stacey Gabriel. Analysis Group: co-chairs Gil McVean and Goncalo Abecasis; subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski). Structural Variation Group: co-chairs Matt Hurles, Charles Lee and Evan Eichler. DCC: co-chairs Paul Flicek and Steve Sherry.

