The 1000 Genomes Project Gil McVean Department of Statistics, Oxford.

What is the 1000 Genomes Project?
A catalogue of all types of genetic variation, including rare variants (c. 1% frequency), obtained by sequencing at least 1000 individuals from geographic centres of major medical genetics interest
A large international collaboration
– UK, USA, China, Germany
An exploration of the use of next-generation technologies for population-scale genome sequencing
A resource for accelerating the identification of disease mechanisms in the follow-up to disease-association studies

Samples for the main project
Major population groups made up of subpopulations of c. 100 each
[Map labels: UK, FIN, TSI, ESP, CEU, JPT, CHB, CHS, DAI, KVT, GMB, GHN, YRI, MLW, LWK, MXL, ASW, CMB, PRO. Legend: new]

Population-scale genome sequencing
[Figure labels: Haplotypes, 2x, 10x]

Pilot experiments
Pilot 1: Low-coverage (2x-4x) on 60 unrelated individuals from each of CEU, YRI and CHB+JPT
Pilot 2: High-coverage (20x diploid) on 2 trios (one from CEU, one from YRI)
Pilot 3: Exons from 1000 genes to 20x in c. 1000 samples (largely European)
Complete!

The 1000G Low Coverage Pilot
185 individuals from 4 populations: CEU (63), CHB (30), JPT (30), YRI (62)

Population  Technology  N individuals  Mapped bases (billions)  Mean coverage / individual
CEU         SLX         52             482                      3.09
CEU         SOLiD       30             240                      2.66
CEU         454         18             132                      2.45
CHB         SLX         30             234                      2.60
JPT         SLX         28             227                      2.70
JPT         454         2              9.6                      1.60
YRI         SLX         60             594                      3.30
YRI         SOLiD       5              20.6                     1.38
YRI         454         2              10.8                     1.80
Combined                185            1,884                    3.52
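The per-technology coverage figures in the table above can be approximated as mapped bases divided by (number of individuals × genome size). The sketch below is illustrative only: the 3.0 Gb genome size and the function name mean_coverage are assumptions, not stated in the slides. Under that assumption it comes within about 0.01x of the listed per-technology values for the rows shown.

```python
# Assumed haploid genome size in base pairs (an assumption; the slides do not state it).
GENOME_SIZE_BP = 3.0e9


def mean_coverage(mapped_bases_billions, n_individuals, genome_size_bp=GENOME_SIZE_BP):
    """Mean per-individual coverage = mapped bases / (individuals * genome size)."""
    total_bases = mapped_bases_billions * 1e9
    return total_bases / (n_individuals * genome_size_bp)


# Rows copied from the pilot table: (population, technology, N, mapped bases in billions).
rows = [
    ("CEU", "SLX", 52, 482),
    ("CEU", "SOLiD", 30, 240),
    ("YRI", "SLX", 60, 594),
]

for pop, tech, n, bases in rows:
    print(f"{pop} {tech}: {mean_coverage(bases, n):.2f}x")
```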

Even so, a lot of data isn’t much
In the Pilot 1 sample, 1 tera-basepair leaves the CEU with…
– 6% of genotypes with 0 reads
– 16% of genotypes with < 2 reads
– 29% of genotypes with < 3 reads
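To see why low mean coverage leaves many genotypes with few or no reads, one can compare against an idealized model in which per-site read depth is Poisson-distributed. This is a simplifying assumption introduced here for illustration, not the project's own calculation; the percentages on the slide are empirical and need not match it. The mean depth of 3.09 is taken from the CEU SLX row of the pilot table, and the function name is hypothetical.

```python
import math


def poisson_prob_depth_below(k, mean_depth):
    """P(depth < k) when read depth is Poisson-distributed with the given mean."""
    return sum(
        math.exp(-mean_depth) * mean_depth**i / math.factorial(i)
        for i in range(k)
    )


mean_depth = 3.09  # CEU SLX mean coverage per individual, from the pilot table

for k in (1, 2, 3):
    p = poisson_prob_depth_below(k, mean_depth)
    print(f"P(depth < {k}) = {p:.3f}")
```

Real sequencing depth is usually more variable than Poisson, so the idealized figures are only a rough guide.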

ftp.1000genomes.ebi.ac.uk
www.1000genomes.org
ftp-trace.ncbi.nih.gov/1000genomes/ftp
Pilot release expected Nov/Dec 2009

What has the project already generated?

Over 9 million novel SNPs
Total 17.2M SNPs called
Previously ~12M SNPs “known” (dbSNP 129)
– 7.9M confirmed
– 9.2M novel
[Figure: total SNPs and novel SNPs across CEU, YRI and CHB+JPT. Values as given; assignment to regions not recoverable. Total SNPs: 4.84, 1.09, 0.78, 0.48, 2.80, 5.65, 1.54. Novel SNPs: 0.50, 0.38, 0.29, 0.26, 2.20, 4.38, 1.35]
Le Quang

A near-complete record of common SNPs
Durbin, Le Quang

A set of accurate genotypes
This is about where simulations suggest we should be with 2-4x on 60 samples
Note this quality is much, much better than if calls were made marginally
Durbin, Le Quang

Many novel indels and larger structural variants

Novel sequence from de novo assembly
Up to 50kb
Zam Iqbal

Some interesting biology - variation in SNP density

Some more interesting biology – high Fst SNPs
Ryan Hernandez, Adam Auton

Even more interesting biology – loss of function mutations
Daniel MacArthur

A robust and modular pipeline for analysis of population-scale sequence data

An efficient format for storing aligned reads and a set of tools to manipulate and view the files
SAM/BAM format for storing (aligned) reads
Bioinformatics (2009)
http://samtools.sourceforge.net
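As a rough illustration of what a SAM alignment record holds, the sketch below splits one tab-separated line into the 11 mandatory fields, using the field names from the current SAM specification. The example read, the SamRecord type and the parse_sam_line helper are hypothetical and written for this note; they are not part of samtools, and real files should be read with the project's own tools.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM alignment fields (optional tags are ignored here)."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII-encoded)

    @property
    def is_unmapped(self) -> bool:
        # Flag bit 0x4 marks an unmapped read.
        return bool(self.flag & 0x4)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one non-header SAM line into its mandatory fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


# Hypothetical record, made up for illustration.
example = "read1\t0\tchr1\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
record = parse_sam_line(example)
print(record.rname, record.pos, record.cigar, record.is_unmapped)
```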

An information-rich format for storing generic haplotype/genotype data and tools for manipulating the files
www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcfv3.2

Using the 1000G data now

IMPUTE
[Figure: imputation schematic]
Reference panel (1000G)
… 11101010101011 …
… 00111110000111 …
… 11110000011101 …
… 00101011100101 …
Genotypes in additional samples from standard product
… 1.2..1.0.0..22 …
Imputed genotypes
… 11220110200122 …

Imputation performance across SNP types from P1 (CEU) from Affy 500k

Annotation               # SNPs          Info measure
All                      414,321         0.780
MAF < 5%                 102,000         0.543
MAF > 5%                 312,321         0.857
UCSC Genes               6,628           0.736
Depth < 100              3,153 (0.7%)    0.611
SimpRpts                 25,625          0.607
SimpRpts + Depth < 100   1,652 (6.5%)    0.671
SegDups                  24,301          0.686
SegDups + Depth < 100    665 (2.7%)      0.388

Jonathan Marchini

Looking forward...
Already have data generated for c. 200 more Europeans
– Data generation largely complete by mid 2010
Much work still to be done on accurate inference of all types of variation from NGS data
Data already proven useful for a number of projects: please use it

Thanks to the many...
