Sequencing of the South Asian Genome
Lamri Amel, Postdoctoral fellow

2. Human genome sequencing projects (I)
- 2001 to 2004: first draft (2001) and final version (2004) of the Caucasian genome
- 2008: first Yoruba (African) genome and first Han Chinese (East Asian) genome sequenced
- 2009: first Korean (East Asian) genome sequenced
- 2010: first two Southern African genomes sequenced
Remark: all males.

3. Human genome sequencing projects (II)
- 2010: 1000G pilot project, N = 179 (Caucasians, Africans and East Asians)
- 2012: 1000G Phase 1, N = 1,092 (Caucasians, Africans, East Asians and admixed Americans); first South Asian (Indian) genome sequenced
- 2014: South Asian genome sequencing studies (N = 168 and N = 38)
- 2015: 1000G Phase 3, N = 2,504 individuals from 26 populations (Caucasians, Africans, East Asians, admixed Americans and South Asians (~500))
12 years later... What about the other ethnic groups?

5. Population sampling
- 1000 Genomes Project: N = 489
- Chambers et al.: N = 168
- Wong et al.: N = 38
(Map labels: UK, USA)

6. Sequencing and validation
The three studies differ in sample size and in sequencing depth. What is sequencing depth?

Whole-genome sequencing   Wong et al.   Chambers et al.   1000 Genomes Project
N                         38            168               489
Depth                     30×           4.3×              8×

7. Sequencing
DNA: AAATCTGTTCAACCATGCACAGTAATCGATTGACT
DNA sequencing produces short, overlapping reads:
TGTTCAACCATGC
AACCATGCACAGTA
CACAGTAATCGAT
TAATCGATTGAC
Reconstruction from the overlaps (contig): TGTTCAACCATGCACAGTAATCGATTGAC
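The reconstruction step can be illustrated with a minimal greedy overlap assembler. This is a sketch under simplifying assumptions (error-free reads, exact suffix/prefix overlaps, a hypothetical minimum overlap of 3 bases); real assemblers are far more sophisticated, and resequencing projects such as 1000G align reads to a reference genome rather than assembling them from scratch. The function names are illustrative.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if below min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap until none overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best = (0, -1, -1)  # (overlap length, index of left sequence, index of right sequence)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best[0]:
                        best = (olen, i, j)
        olen, i, j = best
        if olen == 0:
            break  # no overlaps left: keep the remaining contigs separate
        merged = contigs[i] + contigs[j][olen:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


reads = ["TGTTCAACCATGC", "AACCATGCACAGTA", "CACAGTAATCGAT", "TAATCGATTGAC"]
print(greedy_assemble(reads))  # ['TGTTCAACCATGCACAGTAATCGATTGAC']
```

Greedy merging is the simplest strategy to explain; it can go wrong on repetitive sequence, which is one reason real tools use graph-based methods.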

8. Sequencing depth
Sequencing depth is the number of times a base pair is covered by sequencing reads.
- 4× (low coverage): less precise genotypes, because sequencing errors are harder to tell apart from true variants; cheaper, so more samples can be genotyped for a fixed budget.
- 60× (high coverage): higher genotype accuracy; more expensive, so fewer samples can be genotyped for a fixed budget.
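A toy depth calculation, reusing the DNA string and reads from slide 7. Reads are placed by exact string matching (an assumption; real pipelines use read aligners that tolerate mismatches). The budget helper is hypothetical and assumes cost grows linearly with depth, which ignores fixed per-sample costs.

```python
def per_base_depth(reference: str, reads: list[str]) -> list[int]:
    """Depth at each reference position, placing each read at its first exact match."""
    depth = [0] * len(reference)
    for read in reads:
        start = reference.find(read)
        if start == -1:
            continue  # no exact match (e.g. a sequencing error): skipped in this toy
        for pos in range(start, start + len(read)):
            depth[pos] += 1
    return depth


def mean_depth(read_lengths: list[int], genome_length: int) -> float:
    """Average depth = total sequenced bases / genome length."""
    return sum(read_lengths) / genome_length


def samples_for_budget(total_depth_budget: float, depth_per_sample: float) -> int:
    """Hypothetical: samples affordable if cost is proportional to depth."""
    return int(total_depth_budget // depth_per_sample)


reference = "AAATCTGTTCAACCATGCACAGTAATCGATTGACT"
reads = ["TGTTCAACCATGC", "AACCATGCACAGTA", "CACAGTAATCGAT", "TAATCGATTGAC"]
print(per_base_depth(reference, reads))
print(round(mean_depth([len(r) for r in reads], len(reference)), 2))  # 52 / 35 ≈ 1.49
print(samples_for_budget(240, 4), samples_for_budget(240, 60))  # 60 4
```

With the same (hypothetical) total of 240× of sequencing, the trade-off on the slide becomes concrete: 60 samples at 4× or 4 samples at 60×.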

10. Sequencing and validation
                                     Wong et al.      Chambers et al.    1000 Genomes Project
                                     Depth    N       Depth    N         Depth    N
Whole-genome sequencing              30×      38      4.3×     168       8×       489
Whole-exome sequencing               -        -       20.6×    147       65.7×    489
High-density genotyping microarray   -        -       -        168       -        489

11. Whole-genome sequencing / whole-exome sequencing / genotyping arrays
Whole-genome sequencing (genes and the regions between them):
- Gathers all the genetic information
- Allows the identification of new variants
- More expensive, so usually low depth and/or fewer samples
Whole-exome sequencing (coding regions only):
- Loses important genetic information (non-coding regions are not sequenced)
- Allows the identification of new exonic variants only
- Cheaper, so more samples and/or greater depth
Genotyping array (predefined SNPs, e.g. A/T, G/T, C/A, G/A, G/C):
- Loses important genetic information (+++)
- No identification of new variants
- Cheaper (+++), so many more samples can be genotyped (+++)

13. Sequencing and validation
                                     Wong et al.      Chambers et al.    1000 Genomes Project
                                     Depth    N       Depth    N         Depth    N
Whole-genome sequencing              30×      38      4.3×     168       8×       489
Whole-exome sequencing               -        -       20.6×    147       65.7×    489
High-density genotyping microarray   -        -       -        168       -        489
Sequencing of relatives              -        -       -        -         47×      141 (129 trios, 12 duos)

15. 1000 Genomes Project populations
Population code   Population description                                     Super-population code
CHB               Han Chinese in Beijing, China                              EAS
JPT               Japanese in Tokyo, Japan                                   EAS
CHS               Southern Han Chinese                                       EAS
CDX               Chinese Dai in Xishuangbanna, China                        EAS
KHV               Kinh in Ho Chi Minh City, Vietnam                          EAS
CEU               Utah Residents with Northern and Western European Ancestry EUR
TSI               Toscani in Italia                                          EUR
FIN               Finnish in Finland                                         EUR
GBR               British in England and Scotland                            EUR
IBS               Iberian Population in Spain                                EUR
YRI               Yoruba in Ibadan, Nigeria                                  AFR
LWK               Luhya in Webuye, Kenya                                     AFR
GWD               Gambian in Western Divisions in the Gambia                 AFR
MSL               Mende in Sierra Leone                                      AFR
ESN               Esan in Nigeria                                            AFR
ASW               Americans of African Ancestry in SW USA                    AFR
ACB               African Caribbeans in Barbados                             AFR
MXL               Mexican Ancestry from Los Angeles USA                      AMR
PUR               Puerto Ricans from Puerto Rico                             AMR
CLM               Colombians from Medellin, Colombia                         AMR
PEL               Peruvians from Lima, Peru                                  AMR
GIH               Gujarati Indian from Houston, Texas                        SAS
PJL               Punjabi from Lahore, Pakistan                              SAS
BEB               Bengali from Bangladesh                                    SAS
STU               Sri Lankan Tamil from the UK                               SAS
ITU               Indian Telugu from the UK                                  SAS
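For analyses restricted to one super-population (for example the five South Asian populations), the table can be held as a simple lookup. A minimal sketch; the dictionary is transcribed from the table, and the helper name `populations_in` is hypothetical.

```python
from collections import Counter

# Population code -> super-population code, transcribed from the 1000G population table.
SUPER_POPULATION = {
    "CHB": "EAS", "JPT": "EAS", "CHS": "EAS", "CDX": "EAS", "KHV": "EAS",
    "CEU": "EUR", "TSI": "EUR", "FIN": "EUR", "GBR": "EUR", "IBS": "EUR",
    "YRI": "AFR", "LWK": "AFR", "GWD": "AFR", "MSL": "AFR", "ESN": "AFR",
    "ASW": "AFR", "ACB": "AFR",
    "MXL": "AMR", "PUR": "AMR", "CLM": "AMR", "PEL": "AMR",
    "GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "STU": "SAS", "ITU": "SAS",
}


def populations_in(super_pop: str) -> list[str]:
    """Sorted population codes belonging to one super-population."""
    return sorted(code for code, sp in SUPER_POPULATION.items() if sp == super_pop)


print(len(SUPER_POPULATION))             # 26
print(Counter(SUPER_POPULATION.values()))
print(populations_in("SAS"))             # ['BEB', 'GIH', 'ITU', 'PJL', 'STU']
```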

16. A typical South Asian genome has between 4.0 and 4.2 million variants; only about 2% of these variants are rare (frequency < 0.5%).
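The "rare" cut-off is a simple frequency threshold. The sketch applies it to a toy list of hypothetical allele frequencies (assumed to be measured in the reference sample) and works through the slide's arithmetic: 2% of 4.0 to 4.2 million variants is roughly 80,000 to 84,000 rare variants per genome.

```python
RARE_THRESHOLD = 0.005  # "rare" = frequency below 0.5%, as on the slide


def fraction_rare(frequencies: list[float], threshold: float = RARE_THRESHOLD) -> float:
    """Fraction of variants whose frequency is strictly below `threshold`."""
    if not frequencies:
        return 0.0
    return sum(f < threshold for f in frequencies) / len(frequencies)


# Slide arithmetic: about 2% of 4.0 to 4.2 million variants per genome.
for total in (4_000_000, 4_200_000):
    print(total, round(0.02 * total))  # 80000, then 84000 rare variants

print(fraction_rare([0.001, 0.2, 0.45, 0.003]))  # 0.5 (toy, hypothetical frequencies)
```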

17. In a typical South Asian genome, nonsynonymous and regulatory variants account for less than 0.5% of all variants (values shown in the figure: 0.37% and 0.002%).

19. Shared rare variants: the rarest variants are most often shared with other populations of the same super-population.
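One way to quantify this kind of sharing is to take variants seen in exactly two individuals (doubletons) and ask where the two carriers come from. A minimal sketch with invented toy input; the partial population mapping and the function name are hypothetical, and this is not necessarily how the figure was produced.

```python
from collections import Counter

# Hypothetical subset of the population -> super-population mapping.
SUPER_POPULATION = {"GIH": "SAS", "PJL": "SAS", "BEB": "SAS", "CEU": "EUR", "YRI": "AFR"}


def classify_sharing(carrier_pairs: list[tuple[str, str]]) -> Counter:
    """Classify doubletons by the populations of their two carriers."""
    counts = Counter()
    for pop_a, pop_b in carrier_pairs:
        if pop_a == pop_b:
            counts["same population"] += 1
        elif SUPER_POPULATION[pop_a] == SUPER_POPULATION[pop_b]:
            counts["same super-population"] += 1
        else:
            counts["different super-populations"] += 1
    return counts


# Toy input: populations of the two carriers of each doubleton (invented for illustration).
pairs = [("GIH", "GIH"), ("GIH", "PJL"), ("BEB", "PJL"), ("PJL", "CEU"), ("YRI", "YRI")]
print(classify_sharing(pairs))
```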

20. Population differentiation (figure from Auton et al., Nature 526, 68-74 (2015), doi:10.1038/nature15393)
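Population differentiation is commonly summarised with F_ST. The sketch computes Nei's G_ST, one textbook F_ST estimator, for a single biallelic SNP from per-population allele frequencies, with all populations weighted equally (an assumption). The figure may rely on a different estimator, so treat this only as an illustration of the idea.

```python
def nei_fst(freqs: list[float]) -> float:
    """Nei's G_ST for one biallelic SNP: (H_T - H_S) / H_T.

    freqs: frequency of the same allele in each population (equal weights assumed).
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)                            # heterozygosity of the pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)   # mean within-population heterozygosity
    if h_t == 0:
        return 0.0  # allele fixed or absent everywhere: no differentiation
    return (h_t - h_s) / h_t


print(round(nei_fst([0.1, 0.9]), 2))  # 0.64: strongly differentiated SNP
print(nei_fst([0.3, 0.3]))            # 0.0: identical frequencies
```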

21. Figure 4. Enrichment for stratified genetic variants at genetic loci associated with the respective phenotypes in genome-wide association studies.
Source: Chambers JC, Abbott J, Zhang W, Turro E, Scott WR, et al. (2014) The South Asian Genome. PLoS ONE 9(8): e102645. doi:10.1371/journal.pone.0102645 http://journals.plos.org/plosone/article?id=info:doi/10.1371/journal.pone.0102645

22. Why would anyone pay $120 million to sequence 2,500 human genomes?
Enthusiastic genetic researcher: "Wow, look at these amazing 1000G results! Did you know that the average South Asian genome has 4M variants?"
Greedy businessman: "And you wasted $120M just to come up with that table?! Can we make money out of that?"

23. Whole-genome sequencing applications: population history and evolution (demography, migration, admixture, selection)

24. Demography
- Bottleneck: all humans share a common demographic history more than 150,000 to 200,000 years ago. European, Asian and American populations shared strong and sustained bottlenecks between 15,000 and 20,000 years ago.
- Growth: these bottlenecks were followed by extremely rapid inferred population growth, especially in Bangladesh. Reason?

25. Population structure
Estimation of the proportion of each genome derived from several (7) putative "ancestral populations", using a maximum-likelihood approach. Each color represents a different ancestral population.
Figure labels: east, west; USA and Barbados; UK, USA; north, south.

32. Population structure
Conclusion: South Asians derive from one common ancestral population (admixture ignored).
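The maximum-likelihood idea behind these ancestry plots can be sketched for a single individual with an EM algorithm. Simplification: the ancestral allele frequencies are assumed known here, whereas the analysis on the slides estimates both the allele frequencies of the 7 putative ancestral populations and each individual's proportions. Function and variable names are hypothetical.

```python
def estimate_ancestry(genotypes: list[int], anc_freqs: list[list[float]],
                      n_iter: int = 200) -> list[float]:
    """EM estimate of one individual's ancestry proportions.

    genotypes[j]:    copies (0, 1 or 2) of the counted allele at SNP j
    anc_freqs[k][j]: frequency of that allele in ancestral population k (assumed known)
    """
    n_pop = len(anc_freqs)
    q = [1.0 / n_pop] * n_pop  # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * n_pop  # expected allele copies attributed to each population
        for j, g in enumerate(genotypes):
            # E-step: posterior origin of a counted (alt) or uncounted (ref) allele copy.
            alt = [q[k] * anc_freqs[k][j] for k in range(n_pop)]
            ref = [q[k] * (1.0 - anc_freqs[k][j]) for k in range(n_pop)]
            alt_sum, ref_sum = sum(alt), sum(ref)
            for k in range(n_pop):
                if g > 0 and alt_sum > 0:
                    expected[k] += g * alt[k] / alt_sum
                if g < 2 and ref_sum > 0:
                    expected[k] += (2 - g) * ref[k] / ref_sum
        # M-step: new proportions = share of all allele copies attributed to each population.
        total = sum(expected)
        q = [x / total for x in expected]
    return q


anc = [[0.9] * 50, [0.1] * 50]  # hypothetical ancestral frequencies for 50 SNPs
print([round(x, 3) for x in estimate_ancestry([2] * 50, anc)])  # [1.0, 0.0]
print([round(x, 3) for x in estimate_ancestry([1] * 50, anc)])  # [0.5, 0.5]
```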

33. Population structure: the Out of Africa model of human evolution

34. Contribution of genetic studies to historical discoveries in South Asia
- Genetics: Ancestral North Indians and Ancestral South Indians
- Language: Indo-European and Dravidian
- Admixture: 1,900 to 4,200 years ago

35. Contribution of genetic studies to historical discoveries in South Asia (continued)
- From about 3,000 years ago to now: a striking reduction in gene flow; admixture was replaced by strong endogamy.
- These observations are supported by written texts suggesting that the caste system was established during the same period.

36. Sources: Am J Hum Genet 2013; Nature 2009; Am J Hum Genet 2011. These studies did not use 1000G sequencing data.

37. Whole-genome sequencing applications: imputation
Toy example (SNP1 to SNP5):
Reference haplotypes:   A T C G A   /   G C G C C
Your genotypes:         A ? ? G ?   /   A T ? G A
Imputation fills in the missing (?) genotypes using the reference.
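A toy version of the imputation diagram: each target haplotype copies its missing alleles from the reference haplotype that agrees with it at the most observed SNPs. This nearest-haplotype rule is an illustrative simplification (and the function name is hypothetical); production imputation tools model haplotypes probabilistically, typically with hidden Markov models, and output genotype probabilities.

```python
def impute(target: str, reference_haplotypes: list[str], missing: str = "?") -> str:
    """Fill missing alleles from the reference haplotype that best matches the observed ones."""
    def matches(ref: str) -> int:
        # Count agreements at observed (non-missing) positions only.
        return sum(t == r for t, r in zip(target, ref) if t != missing)

    best = max(reference_haplotypes, key=matches)  # ties: the first haplotype in the list wins
    return "".join(r if t == missing else t for t, r in zip(target, best))


reference = ["ATCGA", "GCGCC"]  # SNP1..SNP5 on two reference haplotypes (toy example)
print(impute("A??G?", reference))  # ATCGA
print(impute("AT?GA", reference))  # ATCGA
```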

38. Whole-genome sequencing applications: imputation
- Imputation increases the amount of genotype data available in a study genotyped with an array.
- It increases genome coverage and hence the chance of detecting an association signal in a GWAS.
- Imputation is common practice in GWAS and typically uses 1000G data as the reference panel (thank you, 1000G!).

39. Take-home message
- The South Asian genome has both unique features and features shared with other genomes.
- Sequencing human genomes provides an invaluable source of data, with huge applications in health and research!

40. Thank you for your attention

41. References
- 1000 Genomes Project Consortium (2015). Nature.
- Chambers et al. (2014). PLoS ONE.
- Wong et al. (2014). PLoS Genetics.
- Moorjani et al. (2013). Am J Hum Genet.
- Reich et al. (2009). Nature.

