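Human Population Genetics: Basic Principles and Analysis Methods. CAS-MPG Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, graduate course: Human Population Genetics. Xu Shuhua, Jin Li.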

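1 Human Population Genetics: Basic Principles and Analysis Methods. CAS-MPG Partner Institute for Computational Biology; Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, graduate course: Human Population Genetics. Xu Shuhua, Jin Li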

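2 Course schedule for "Analysis Methods in Human Population Genetics" (《人类群体遗传学分析方法》), second semester of the 2008-2009 academic year
Time: every Thursday morning, 10:00-11:50. Location: Classroom 7, Room 403, 4th floor, Zhongke Building (中科大厦).
No. | Date | Topic | Instructor
1 | Feb 26 | Hardy-Weinberg equilibrium test: principles and applications | Xu Shuhua
2 | Mar 5 | Statistics of genetic polymorphism | Xu Shuhua
3 | Mar 12 | Phylogenetic tree construction: methods and applications | Xu Shuhua
4 | Mar 19 | Coalescence: principles and applications | Li Haipeng
5 | Mar 26 | Genetic drift and estimation of effective population size | Xu Shuhua
6 | Apr 2 | Population genetic structure analysis (I) | Xu Shuhua
7 | Apr 9 | Haplotype estimation and linkage disequilibrium analysis | Xu Shuhua
8 | Apr 16 | Population genetic structure analysis (II) | Xu Shuhua
9 | Apr 23 | Association analysis in gene mapping (I) | He Yungang
10 | Apr 30 | Association analysis in gene mapping (II) | Xu Shuhua
11 | May 7 | Linkage disequilibrium patterns in the human genome and tag SNP selection | Xu Shuhua
12 | May 14 | Methods for analyzing gene expression data | Yan Jun
13 | May 21 | Genetic studies of human population history | Xu Shuhua
- | May 28 | Dragon Boat Festival | -
14 | Jun 4 | Forensic testing and analysis methods | Li Shilin
15 | Jun 11 | Tests for natural selection: principles and methods | Xu Shuhua
16 | Jun 18 | Methods for detecting positive selection in genome-wide genotype data | Xu Shuhua
17 | Jun 25 | Course examination | Education Base (教育基地)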

3 Lecture 9: Linkage disequilibrium patterns in the human genome and tag SNP selection

4 SNP association with disease [Figure: a gene carrying a disease allele, flanked by marker SNPs] How closely must SNPs be spaced? 30,000 to 1,000,000 SNPs to span the genome? Numbers will depend on local haplotype structure and the amount of LD. ~30 kb “blocks”: 100,000 “independent” SNPs to span the genome.

5 Problems of Using SNPs for Association Studies ► The number of SNPs is still too large to be used for association studies. ▪ There are millions of SNPs in the human genome. ▪ To reduce the SNP genotyping cost, we wish to use as few SNPs as possible for association studies.

6 How many common SNPs are there in the human genome? ► Common SNPs: minor allele frequency (MAF) > 0.05; ► Suppose we have 50 samples each from African, European, and Asian populations; ► Theta = 1.2/kb for the African population; ► Theta = 0.8/kb for the European and Asian populations; ► Autosome length (L) = 2.68 billion bp; ► We expect 9.8 million common SNPs in 50 African samples; ► We expect 6.5 million common SNPs in 50 European samples; ► We expect 6.5 million common SNPs in 50 Asian samples; where [formula image not preserved in the transcript]
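The formula on this slide did not survive in the transcript. As a rough sketch, the quoted figures can be reproduced under the standard neutral model, in which the expected number of sites with derived-allele count i in a sample is theta * L / i. The assumptions here, none of which are stated on the slide, are: 50 diploid samples give 100 chromosomes, and allele counts 5 through 95 are kept (that is, MAF of at least 0.05, boundary included). The function and variable names are illustrative only.

```python
def expected_common_snps(theta_per_bp, length_bp, n_chromosomes, min_allele_count):
    """Expected number of SNPs with minor allele count >= min_allele_count.

    Assumes the standard neutral site frequency spectrum:
    E[number of sites with derived-allele count i] = theta * L / i.
    """
    max_allele_count = n_chromosomes - min_allele_count
    # Sum 1/i over every derived-allele count that counts as "common".
    harmonic_part = sum(1.0 / i for i in range(min_allele_count, max_allele_count + 1))
    return theta_per_bp * length_bp * harmonic_part


AUTOSOME_LENGTH_BP = 2.68e9   # L from the slide
N_CHROMOSOMES = 2 * 50        # assumption: 50 diploid individuals
MIN_COUNT = 5                 # assumption: MAF >= 0.05 in 100 chromosomes

african = expected_common_snps(1.2e-3, AUTOSOME_LENGTH_BP, N_CHROMOSOMES, MIN_COUNT)
european_or_asian = expected_common_snps(0.8e-3, AUTOSOME_LENGTH_BP, N_CHROMOSOMES, MIN_COUNT)

print(f"African:           {african / 1e6:.1f} million common SNPs")
print(f"European or Asian: {european_or_asian / 1e6:.1f} million common SNPs")
```

With these assumptions the two results round to the 9.8 million and 6.5 million quoted on the slide; excluding the boundary count (strict MAF > 0.05) gives slightly smaller numbers.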

7 [Figure] ThetaK = 1.2/kb

8 [Figure] ThetaK = 0.8/kb

9 LD pattern in the human genome ► Genetic variation is organised in relatively short blocks of strong LD; ► Each block contains only a few common haplotypes; blocks are separated by boundary points, typically recombination hotspots; ► Across recombination hotspots, little association remains.

10 LD structure in human genome

11 LD decreases the number of haplotypes ► Suppose we have a region with 100 SNPs ► Expected haplotypes without LD = 2^100 = 1.27 x 10^30 ► Observed haplotypes with moderate LD = 18

12 Haplotype blocks on 5q31 (Daly et al., Nature Genetics, Oct. 2001) [Figure: haplotype blocks 1 to 11]

13 The haplotype patterns for 20 independent globally diverse chromosomes defined by 147 common human chromosome 21 SNPs. The 147 SNPs span 106 kb of genomic DNA sequence and are divided into 18 blocks. Patil N, et al., Science, Nov. 2001. [Figure label: htSNP]

14 The “Yin Yang” phenomenon [Figure: haplotypes 1 to 5 across sites 1 to 10]

15 The proportion of sequence contained in haplotype blocks of various sizes a | European-American sample. b | African-American sample. c | East Asian sample. d | Sub-Saharan African sample. e | Environmental Genome Project (EGP) single nucleotide polymorphism (SNP) study. f | Seattle SNP study.

16 80% of all recombination occurs in 10% of the sequence (HapMap Phase II)

17 Tag SNPs ► Tag SNPs are a small subset of SNPs that is sufficient for performing association studies without losing the power of using all SNPs.

18 Examples of Tag SNPs [Figure: haplotype patterns P1 to P4 across SNP loci S1 to S12] ► In fact, it is not necessary to genotype all SNPs. ► SNPs S3, S4, and S5 can form a set of tag SNPs. [Figure: patterns P1 to P4 at S3, S4, S5]

19 Examples of Wrong Tag SNPs [Figure: haplotype patterns P1 to P4 across SNP loci S1 to S12] ► SNPs S1, S2, and S3 cannot form a set of tag SNPs because P1 and P4 will be ambiguous. [Figure: patterns P1 to P4 at S1, S2, S3]
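A minimal sketch of the ambiguity check described here: a candidate SNP set fails as a tag set when two haplotype patterns look identical at those SNPs. The slide's actual haplotype matrix is in an image and is not recoverable, so the 0/1 matrix below is hypothetical; it was only constructed to behave like the examples (S1, S2, S3 confuse P1 and P4, while S3, S4, S5 do not). The function names are illustrative.

```python
from itertools import combinations

# Hypothetical haplotype patterns over SNP loci S1..S12 (0/1 = the two alleles).
# This is NOT the matrix from the slide, which was not preserved.
HAPLOTYPES = {
    "P1": [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0],
    "P2": [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],
    "P3": [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1],
    "P4": [0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1],
}


def project(haplotype, snp_indices):
    """Alleles of one haplotype at the chosen SNPs (0-based indices)."""
    return tuple(haplotype[i] for i in snp_indices)


def ambiguous_pairs(haplotypes, snp_indices):
    """Pairs of patterns that cannot be told apart using only snp_indices."""
    return [
        (a, b)
        for a, b in combinations(haplotypes, 2)
        if project(haplotypes[a], snp_indices) == project(haplotypes[b], snp_indices)
    ]


# S1, S2, S3 are indices 0, 1, 2; S3, S4, S5 are indices 2, 3, 4.
print("S1,S2,S3 ambiguous pairs:", ambiguous_pairs(HAPLOTYPES, [0, 1, 2]))
print("S3,S4,S5 ambiguous pairs:", ambiguous_pairs(HAPLOTYPES, [2, 3, 4]))
```

An empty list means the chosen SNPs separate every pattern, which is the property a tag set needs under this criterion.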

20 Examples of Tag SNPs [Figure: haplotype patterns P1 to P4 across SNP loci S1 to S12] ► SNPs S1 and S12 can form a set of tag SNPs. ► This set of SNPs is the minimum solution in this example. [Figure: patterns P1 to P4 at S1, S12]

21 Pairwise LD Based Methods ► Find the minimum tagSNP set such that every SNP is either a tagSNP or is in LD with a tagSNP. ► Pairwise r^2 is directly related to sample size and power of association studies. Pritchard and Przeworski 2001. ► Greedy approach: keep selecting the untagged SNP that is in LD with the most remaining untagged SNPs. Carlson et al. 2004.
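The greedy approach can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: haplotypes are phased and coded 0/1, "in LD" means pairwise r^2 at or above a threshold (0.8 is used here as an assumed default), and ties go to the lowest SNP index. The toy data and function names are hypothetical.

```python
def r_squared(x, y):
    """Pairwise r^2 between two biallelic SNPs given as 0/1 allele lists."""
    n = len(x)
    p_a = sum(x) / n
    p_b = sum(y) / n
    p_ab = sum(1 for a, b in zip(x, y) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:          # a monomorphic SNP carries no LD information
        return 0.0
    d = p_ab - p_a * p_b    # coefficient of linkage disequilibrium D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Greedy tag SNP selection on a list of phased 0/1 haplotypes."""
    n_snps = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(n_snps)]
    # neighbours[j]: SNPs (including j itself) with r^2 >= threshold to SNP j.
    neighbours = [
        {k for k in range(n_snps)
         if k == j or r_squared(columns[j], columns[k]) >= threshold}
        for j in range(n_snps)
    ]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Pick the untagged SNP that is in LD with the most untagged SNPs.
        best = max(sorted(untagged), key=lambda j: len(neighbours[j] & untagged))
        tags.append(best)
        untagged -= neighbours[best]
    return tags


# Hypothetical data: SNPs 0-2 are identical, SNPs 3-4 are identical, SNP 5 is alone.
HAPLOTYPES = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1],
]

tags = greedy_tag_snps(HAPLOTYPES)
print("tag SNP indices:", tags)
```

On this toy matrix one tag is chosen per group of perfectly correlated SNPs. On real data the greedy choice is fast but is not guaranteed to give the smallest possible tag set.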

22 Greedy May Not Be Optimal

27 Exhaustive Search Achieves Optimality, But … ► Go through all k-SNP combinations. ▪ Start from k = 1. ▪ If not successful, k ← k + 1. ► Guaranteed to find the optimal tagSNP set. ► Becomes computationally prohibitive as k increases.
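A small sketch of this search loop. The slide does not say which success test is applied to each k-SNP combination; the code assumes the criterion from the earlier tag SNP examples (the chosen SNPs must distinguish every haplotype pattern). The haplotype matrix is hypothetical and the function names are illustrative.

```python
from itertools import combinations

# Hypothetical patterns P1..P4 over SNP loci S1..S12 (not the slide's data).
HAPLOTYPES = [
    [0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 0],  # P1
    [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0],  # P2
    [1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 1, 1],  # P3
    [0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1],  # P4
]


def is_tag_set(haplotypes, snp_indices):
    """True if every haplotype is unique when restricted to snp_indices."""
    projected = {tuple(h[i] for i in snp_indices) for h in haplotypes}
    return len(projected) == len(haplotypes)


def minimum_tag_sets(haplotypes):
    """Try k = 1, 2, ... and return all tag sets of the first size that works."""
    n_snps = len(haplotypes[0])
    for k in range(1, n_snps + 1):
        found = [combo for combo in combinations(range(n_snps), k)
                 if is_tag_set(haplotypes, combo)]
        if found:
            return found
    return []  # only reached if two haplotypes are identical at every SNP


minimal_sets = minimum_tag_sets(HAPLOTYPES)
print("minimum tag set size:", len(minimal_sets[0]))
print("S1 and S12 is one solution:", (0, 11) in minimal_sets)
```

The number of combinations tested at size k is C(n, k) for n SNPs, which is why the approach stops being practical as k grows.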

28 A Different Strategy ► SNPs naturally fall into subsets (“precincts”) due to the blocky LD structure of the genome. ► SNPs in different precincts are not in strong LD. [Figure: Haploview]

29 International HapMap Project http://www.hapmap.org

30 Southern Genome Center (南方基因组中心)
Country | Percent | Chromosomes
Canada | 10.0% | 2, 4p
China | 10.0% | 3, 8p, 21
Japan | 25.1% | 5, 11, 14, 15, 16, 17, 19
United Kingdom | 24.0% | 1, 6, 10, 13, 20
United States | 30.9% | 4q, 7, 8q, 9, 18, 22, X, Y

31 Background ► Human Genome Project. ▪ Complete genome sequenced. Human genomes are 99.9% identical. ► Link between DNA sequence and disease. ► Linkage studies are limited for complex diseases. ► The “indirect” approach. ► Officially launched October 27-29, 2002. ► “China Chapter” (10%) launched March 2003. http://www.hapmap.org

32 Goal ► A freely available public resource to increase the power and efficiency of genetic association studies of medical traits. ► To determine the common patterns of DNA sequence variation in the human genome.

33 Benefits ► High-density genotyping across the genome provides information about: ▪ SNP validation. ▪ SNP frequency. ▪ Assay condition. ▪ Correlation structure of alleles in the genome.

34 Samples ► Most of the common haplotypes occur in all human populations; however, their frequencies differ among populations. Therefore, data from several populations are needed to reveal the pattern.

35 HapMap samples ► 90 Yoruba individuals (30 parent-parent-offspring trios) from Ibadan, Nigeria (YRI) ► 90 individuals (30 trios) of European descent from Utah (CEU) ► 45 Han Chinese individuals from Beijing (CHB) ► 45 Japanese individuals from Tokyo (JPT)

36 Goal ► Samples will be genotyped for at least 1 million SNPs across the genome. ► First pass of 600,000 equally spaced SNPs (~1 SNP per 5 kb). ► Additional SNPs genotyped in low-LD regions to help define haplotypes. ► Genotyping performed by 10 centres worldwide, using five different technologies; quality control assessed across all centres on a test set of SNPs.

37 Result ► PHASE I ▪ 1,000,000 SNPs successfully typed in all 270 HapMap samples ▪ ENCODE variation reference resource available ► PHASE II ▪ >3,500,000 SNPs typed in total!

38 ENCODE project ► Ten “typical” 500 kb regions ▪ across a range of chromosomes ▪ with different recombination rates and gene density. ► Strategy: fully resequence 16 CEPH individuals, 16 Nigerians, 8 Chinese and 8 Japanese from HapMap samples; genotype SNPs in the complete HapMap sample, initially SNPs in dbSNP, then additional SNPs discovered via resequencing. ► Currently, 1 SNP every 279 bp.

39 Goal of ENCODE project ► Investigate the effects of marker density on LD structure, recombination rate estimates, tag SNP selection: the “ hidden SNP ” problem.

40 Data analysis www.hapmap.org ► Genotype data freely available for download. ► Basic summary statistics: genotype frequencies, minor allele frequencies, etc. ► LD summary statistics via HAPLOVIEW software: basic LD measures, blocks, common haplotypes and their frequencies. ► Software to select tag SNPs …

41 What can HapMap data tell us? ► LD pattern. ▪ r^2, D', etc. ► Recombination rates. ► Haplotype pattern. ► Selecting tagSNPs.
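For reference, the two LD measures named on this slide can be computed from two-locus haplotype counts using their standard definitions: D = p(AB) - p(A)p(B), D' = |D| / Dmax, and r^2 = D^2 / (p(A)(1 - p(A))p(B)(1 - p(B))). The sketch below is illustrative; the function name and the example counts are made up and do not come from HapMap.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, |D'|, r^2) from counts of the four two-locus haplotypes."""
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at locus 1
    p_B = (n_AB + n_aB) / total   # frequency of allele B at locus 2
    d = p_AB - p_A * p_B
    # Dmax is the largest |D| possible given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Made-up counts: moderate LD with all four haplotypes present.
d1, d_prime1, r2_1 = ld_measures(40, 10, 10, 40)
# Made-up counts: one haplotype (Ab) absent, so D' = 1 even though r^2 is low.
d2, d_prime2, r2_2 = ld_measures(50, 0, 25, 25)

print(f"case 1: D={d1:.3f}  D'={d_prime1:.2f}  r^2={r2_1:.2f}")
print(f"case 2: D={d2:.3f}  D'={d_prime2:.2f}  r^2={r2_2:.2f}")
```

The second case shows why the two measures are reported together: D' reaches 1 whenever one of the four haplotypes is missing, while r^2 reaches 1 only when the two SNPs are perfect proxies for each other.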

42 HapMap applications ► Study design. ▪ tagSNP selection. ► Study interpretation. ▪ Comparison of multiple studies. ▪ Connection to genes/genomic features. ▪ Integration with other functional data. ► Others.

43 Tagging from HapMap Since HapMap describes the majority of common variation in the genome, choosing non-redundant sets of SNPs from HapMap offers considerable efficiency without power loss in association studies.

48 [Screenshot annotations] Select population. Select tagging algorithm and parameters. [Optional] Upload list of SNPs to be included, excluded, or design scores. 9b: Press “Configure” to save changes.

53 http://www.broad.mit.edu/mpg/tagger/server.html

55 Parameters for SNP Selection ► Allele Frequency ► Putative Function (cSNPs) ► Genomic Context (Unique vs. Repeat) ► Patterns of Linkage Disequilibrium (tag SNPs)

56 Commonly used software ► Haploview ▪ http://www.broad.mit.edu/mpg/haploview/

57 Exercise ► Select tag SNPs using HapMap data; ▪ http://www.hapmap.org

