Analyzing human population genetic history through the study of genetic variation Mark Mata Mentor: Eleazar Eskin UCLA Zar Lab SoCalBSI 2009.


1 Analyzing human population genetic history through the study of genetic variation Mark Mata Mentor: Eleazar Eskin UCLA Zar Lab SoCalBSI 2009

2 Background
- To study human population genetic history is to study parts of human evolution
- Human evolution is one of the fundamental questions in science
- We ask ourselves many questions like:
  - Where do we come from?
  - Why are we all different?
  - How are we all different?

3 Background
- The ZarLab studies the most recent events in human evolution:
  - Now that we have modern humans, what variations have occurred in our genes since our ancient African ancestors?
- To answer this question, our group is looking at human variation to produce a genetic history of these changes

4 Why do we care?
- Many diseases are caused by variations that have occurred in our genetic history
- A better understanding of our genetic history and human variation may eventually lead to better treatment plans
- Personalized medicine:
  - “The right drug, in the right dose, to the right person, at the right time.”
Source: PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps

5 Human Variation
- Modern humans share 99.9% of their DNA
- The remaining 0.1% accounts for the variation between humans
- Of this variation, 80% is the result of SNPs
  - SNP (single-nucleotide polymorphism): a position in the genome where two different bases are present in the population. The base at a SNP on a chromosome is referred to as the “allele”
  - A haplotype is the sequence of alleles on a genome
- The other 20% comes from deletions or insertions in the genome
Source: PerkinElmer website: http://las.perkinelmer.com/content/snps/genotyping.asp#snps

6 Human Variation
- We are studying the 80% of the variation that comes in the form of SNPs
- The alleles at these SNPs are compiled into lists, which are called haplotypes
- Deletions and insertions are “ignored” because of the limitations of the microarrays from which the data are generated

7 International HapMap Project
- Study done by the International HapMap Consortium
  - “…create a public, genome-wide database of common human sequence variation…”
- Identified SNPs and compiled the SNP alleles into a database of haplotypes for four different populations (Phase 1)
- The population used was a group of 60 Mormons in Utah
  - Have been widely studied in the past
  - Western and Northern European descent
  - Have very detailed records
- Used their chromosome 19
Source: “A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005

8 My Project Goals
- Reconstruct human genetic history
  - This is a very difficult problem
- Sub-problem: Identify recent genetic events
  - Make the assumption that these new genetic events are rare or very few in number
  - Easier to classify and identify relationships when compared to older, more common haplotypes
- These new events are important because they identify shared recent ancestry
- Disease-causing variations could be from recent events

9 Identifying Recent Genetic Events
1. Select a region in a haplotype and find the frequency of variation
2. Group variations into common and rare
3. Find recent point mutations
4. Find recent recombination events

10 Workflow
[Figure: three-column diagram. Individual’s Haplotypes -> Frequency of Variation -> Identify Events]
Frequency of variation (first 10 positions of each haplotype):
  Common  AAAAAAAAAA  49%
  Common  TTTTTTTTTT  48%
  Rare    AAAAAAAAAT  1%
  Rare    AATTTTTTTT  1%
  Rare    TTTTTTATTT  1%
Identified events:
  AAAAAAAAAT*   point mutation of AAAAAAAAAA
  AA|TTTTTTTT   recombination of AAAAAAAAAA and TTTTTTTTTT
  TTTTTTA*TTT   point mutation of TTTTTTTTTT

11 Choosing a region size
- Need to pick a region size that is large enough to pick up many different variations but small enough to see what caused the variations
- Tests started with a region of 20 nucleotides and used progressively smaller regions; a region size of 10 nucleotides was determined to be the best

12 Region Size 20

13 Region Size 10

14 Frequency of Variation
[Figure: Individual’s Haplotype -> Region -> How Many]
  Region      How many
  AAAAAAAAAA  59
  TTTTTTTTTT  58
  AAAAAAAAAT  1
  AATTTTTTTT  1
  TTTTTTATTT  1

15 Frequency of Variation
[Figure: Individual’s Haplotype -> How Many -> Frequency of Variation]
  Region      How many  Frequency
  AAAAAAAAAA  59/120    ~49%
  TTTTTTTTTT  58/120    ~48%
  AAAAAAAAAT  1/120     ~1%
  AATTTTTTTT  1/120     ~1%
  TTTTTTATTT  1/120     ~1%
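The counting step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's program: the function name `region_frequencies`, its parameters, and the toy data (built to mirror the 59/58/1/1/1 counts shown above) are all assumptions.

```python
from collections import Counter

def region_frequencies(haplotypes, start=0, size=10):
    """Count each distinct subsequence found in positions [start, start+size).

    Returns {subsequence: (count, fraction of all haplotypes)}.
    """
    regions = [h[start:start + size] for h in haplotypes]
    counts = Counter(regions)
    total = len(regions)
    return {seq: (n, n / total) for seq, n in counts.items()}

# Hypothetical toy data: 120 haplotypes of 15 SNPs each, mirroring the slide.
haplotypes = (
    ["A" * 15] * 59
    + ["T" * 15] * 58
    + ["A" * 9 + "T" * 6,        # AAAAAAAAATTTTTT
       "AA" + "T" * 13,          # AATTTTTTTTTTTTT
       "T" * 6 + "A" + "T" * 8]  # TTTTTTATTTTTTTT
)

freqs = region_frequencies(haplotypes, start=0, size=10)
for seq, (n, frac) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{seq}  {n}/{len(haplotypes)}  ~{frac:.0%}")
```

Using a `Counter` keeps the step to a single pass over the haplotypes; the region start and size are parameters so the same function can be reused for other windows.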

16 Grouping Variations
Classified as either common or rare haplotypes
- Make the assumption that new genetic events are rare or very few in number
- A cutoff of 5% frequency was used: subsequences at 5% or higher are common, and subsequences below 5% are rare
- The 5% figure comes from the International HapMap Consortium study
Source: “A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005

17 Grouping Variations
[Figure: Individual’s Genes -> Frequency of Variation -> Group]
  Common: AAAAAAAAAA (49%), TTTTTTTTTT (48%)
  Rare:   AAAAAAAAAT (1%), AATTTTTTTT (1%), TTTTTTATTT (1%)
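The grouping rule can be written as a simple threshold test. The sketch below assumes frequencies are held in a dictionary keyed by subsequence; the name `group_by_frequency` and the `cutoff` parameter are illustrative, and only the 5% value comes from the slides.

```python
def group_by_frequency(freqs, cutoff=0.05):
    """Split subsequences into (common, rare) lists using a frequency cutoff.

    A subsequence with frequency >= cutoff is common; anything below is rare.
    """
    common = sorted(seq for seq, f in freqs.items() if f >= cutoff)
    rare = sorted(seq for seq, f in freqs.items() if f < cutoff)
    return common, rare

# Frequencies from the example slide (counts out of 120 haplotypes).
freqs = {
    "A" * 10: 59 / 120,               # AAAAAAAAAA
    "T" * 10: 58 / 120,               # TTTTTTTTTT
    "A" * 9 + "T": 1 / 120,           # AAAAAAAAAT
    "AA" + "T" * 8: 1 / 120,          # AATTTTTTTT
    "T" * 6 + "A" + "T" * 3: 1 / 120  # TTTTTTATTT
}

common, rare = group_by_frequency(freqs)
print("Common:", common)
print("Rare:  ", rare)
```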

18 Recent Events
- Make comparisons to identify two forms of variation:
  - Point mutations
  - Recombination events
Common: AAAAAAAAAA, TTTTTTTTTT
Rare: AAAAAAAAAT, AATTTTTTTT, TTTTTTATTT

19 Point Mutations
[Figure: workflow diagram from slide 10, repeated]

20 Point Mutations
[Figure: workflow diagram showing only the point mutation TTTTTTTTTT -> TTTTTTA*TTT]

21 Recent Events: Point mutations
- Found by comparing a common haplotype with a rare haplotype
- A difference at exactly one position shows that the rare haplotype is a point mutation of the common haplotype
- Marked by a “*” next to the point mutation
Common: TTTTTTTTTT
Rare:   TTTTTTATTT
Marked: TTTTTTA*TTT
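The comparison described here amounts to looking for a rare haplotype that differs from a common one at exactly one position. The following sketch shows one way to do that; the function name `find_point_mutations` and the tuple layout of its results are assumptions made for illustration.

```python
def find_point_mutations(common, rare):
    """Return (rare, common, position, marked) for each rare haplotype that
    differs from a common haplotype at exactly one position.

    `marked` is the rare haplotype with "*" placed right after the changed base.
    """
    events = []
    for r in rare:
        for c in common:
            if len(c) != len(r):
                continue
            diffs = [i for i, (a, b) in enumerate(zip(c, r)) if a != b]
            if len(diffs) == 1:
                i = diffs[0]
                events.append((r, c, i, r[:i + 1] + "*" + r[i + 1:]))
    return events

common = ["A" * 10, "T" * 10]
rare = ["A" * 9 + "T",            # AAAAAAAAAT
        "AA" + "T" * 8,           # AATTTTTTTT
        "T" * 6 + "A" + "T" * 3]  # TTTTTTATTT

for r, c, pos, marked in find_point_mutations(common, rare):
    print(f"{marked}  (from {c}, 0-based position {pos})")
```

On the slide's example this flags AAAAAAAAAT and TTTTTTATTT, while AATTTTTTTT is left for the recombination step because it differs from each common haplotype at more than one position.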

22 Recombination
[Figure: workflow diagram from slide 10, repeated]

23 Recombination
[Figure: workflow diagram showing only the recombination event AA|TTTTTTTT]

24 Recent Events: Recombination
- Combine portions of two common haplotypes and see if they form a rare haplotype
Common: AAAAAAAAAA, TTTTTTTTTT
Possible recombinations:
  AA|TTTTTTTT
  AAA|TTTTTTT
  AAAA|TTTTTT
  AAAAA|TTTTT
  AAAAAA|TTTT
  AAAAAAA|TTT
  AAAAAAAA|TT

25 Rare Mutations
- Marked by a “|” at the border between one haplotype and another haplotype
Possible recombinations:
  AA|TTTTTTTT
  AAA|TTTTTTT
  AAAA|TTTTTT
  AAAAA|TTTTT
  AAAAAA|TTTT
  AAAAAAA|TTT
  AAAAAAAA|TT
Actual recombinations:
  AA|TTTTTTTT
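The search over possible recombinations can be sketched as trying every breakpoint between an ordered pair of common haplotypes and checking whether the result matches a rare haplotype. The function name `find_recombinations` is hypothetical, and restricting breakpoints to leave at least two bases on each side is an assumption chosen to reproduce the candidate list on the slides (AA|TTTTTTTT through AAAAAAAA|TT).

```python
def find_recombinations(common, rare, min_side=2):
    """Return (rare, left, right, breakpoint, marked) for each rare haplotype
    that equals a prefix of one common haplotype joined to the suffix of another.

    min_side is an assumed limit: each side of the breakpoint keeps at least
    this many bases, matching the candidates listed on the slides.
    """
    events = []
    for r in rare:
        for left in common:
            for right in common:
                if left == right or len(left) != len(r) or len(right) != len(r):
                    continue
                for k in range(min_side, len(r) - min_side + 1):
                    if left[:k] + right[k:] == r:
                        events.append((r, left, right, k, r[:k] + "|" + r[k:]))
    return events

common = ["A" * 10, "T" * 10]
rare = ["A" * 9 + "T",            # AAAAAAAAAT
        "AA" + "T" * 8,           # AATTTTTTTT
        "T" * 6 + "A" + "T" * 3]  # TTTTTTATTT

for r, left, right, k, marked in find_recombinations(common, rare):
    print(f"{marked}  (prefix of {left}, suffix of {right})")
```

With the slide's data only AATTTTTTTT is reported, marked as AA|TTTTTTTT. Both orderings of each pair are tried, since a rare haplotype could begin with either parent.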

26 Sample input and output
chr-haplotypes.txt:
  Indv1 TTTTTTTTTTTTTTT
  Indv1 AAAAAAAAATTTTTT
  Indv2 AATTTTTTTTTTTTT
  Indv2 TTTTTTATTTTTTTT
new_chr-haplotypes.txt:
  Indv1 T T T T T T T T T T
  Indv1 A A A A A A A A A T*
  Indv2 A A|T T T T T T T T
  Indv2 T T T T T T A*T T T

27 Visualization Tool

28 Expanding to the Whole Chromosome
- Now that we have a way to look for variations in regions of a chromosome, we can expand the technique to look for variations in a whole chromosome
- We used a technique of overlapping windows
[Figure: a window over part of a longer haplotype]
  AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
  |AAAAAAAAAA|

29 Overlapping Windows
[Figure: workflow diagram from slide 10, repeated]

30 Overlapping Windows
[Figure: workflow diagram showing only the event AAAAAAAAAA -> AAAAAAAAAT*]

31 Overlapping
- Recombination events that looked like point mutations
Full haplotypes:
  Common: AAAAAAAAAAAAAAA
  Common: TTTTTTTTTTTTTTT
  Rare:   AAAAAAAAATTTTTT
First 10:
  Common: AAAAAAAAAA
  Rare:   AAAAAAAAAT*
Slide over 5 and next 10:
  Common: AAAAAAAAAA
  Common: TTTTTTTTTT
  Rare:   AAAA|TTTTTT
Result: AAAAAAAAA|T*TTTTT -> AAAAAAAAA|TTTTTT
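The window layout in this example (a window of 10, then slide over 5 and take the next 10) can be generated with a short helper. This is only a sketch: `overlapping_windows` is a hypothetical name, and dropping a final partial window is an assumption, since the slides do not say how the chromosome end is handled.

```python
def overlapping_windows(length, size=10, step=5):
    """Return (start, end) index pairs for windows of `size` positions,
    each shifted by `step` from the previous one.

    Assumption: a trailing window shorter than `size` is dropped.
    """
    return [(s, s + size) for s in range(0, length - size + 1, step)]

rare = "A" * 9 + "T" * 6  # AAAAAAAAATTTTTT, the rare haplotype from the slide

for start, end in overlapping_windows(len(rare)):
    print(f"positions {start}-{end - 1}: {rare[start:end]}")
```

For the 15-SNP example this yields two windows. The first sees AAAAAAAAAT, which on its own looks like a point mutation, and the second sees AAAATTTTTT, which exposes the recombination breakpoint.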

32 Applying to a Population’s Chromosome
- Now that we have a technique to look for new variations in a whole chromosome, we can apply it to a population and identify regions where recent genetic events took place

33 Identified Recent Genetic Events
In chromosome 19:
  Unique point mutations = 13723
  Unique recombination events = 4065
  Total unique events = 15697
  Total point mutations = 46072
  Total recombination events = 11381
  Total number of events = 57453
  Average point mutations per individual = 383
  Average recombination events per individual = 94
  Average events per individual = 478
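As a quick consistency check on the totals and averages above, the sketch below confirms that the two totals sum to the overall total, and that the reported averages match each total divided by 120 and rounded down. The divisor 120 is an assumption borrowed from the 59/120 counts on the frequency slides; the slide itself does not say how the averages were computed.

```python
# Figures copied from the slide.
totals = {"point": 46072, "recombination": 11381, "all": 57453}
reported_avg = {"point": 383, "recombination": 94, "all": 478}

# Assumption: 120 haplotypes, taken from the 59/120 counts shown earlier.
N_HAPLOTYPES = 120

# The two event totals add up to the overall total.
sum_matches = totals["point"] + totals["recombination"] == totals["all"]

# Integer division (rounding down) reproduces the reported averages.
computed_avg = {kind: n // N_HAPLOTYPES for kind, n in totals.items()}

print("totals consistent:", sum_matches)
print("computed averages:", computed_avg)
```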

34 Point Mutations
[Figure: plot with labels “Number of Events” and “SNP Position in the Haplotype”]

35 Recombination Events
[Figure: plot with labels “Haplotype”, “Number of Events” and “SNP Position in the Haplotype”]

36 Point Mutations and Recombination Events
[Figure: plot with labels “Number of Events”, “Haplotype” and “SNP Position in the Haplotype”]

37 Conclusion
- We have developed an algorithm for identifying recent genetic events in an individual
- More point mutations were identified than recombination events
- Certain regions in the genome have many recent genetic events, and other regions have few

38 Future Work
- Run the algorithm over the whole genome
- Extend the algorithm to multiple populations
  - Identify recent events that are unique to a population vs. ones that are shared
- Identify genetic relations between common haplotypes
- Create a chronological order of recent events in an individual
- Adapt the algorithm for high-throughput sequencing data

39 UCLA ZarLab
- Dr. Eleazar Eskin
- All the lab people
SoCalBSI
- Dr. Jamil Momand
- Dr. Sandra Sharp
- Dr. Nancy Warter-Perez
- Dr. Wendie Johnston
- Dr. Beverly Krilowicz
- Dr. Silvia Heubach
- Dr. Jennifer Faust
- Ronnie Cheng
- SoCalBSI 2009 Interns
Funded By:

40 Determining ancestors
- The other ancestors are determined through SNP differences of 2 or more

41 My Project
- Red line: point mutation
- Blue line: ancestor-to-common relationship
- Black dashed line: haplotype resulting from a crossover

42 Graph
- The graph is generated by Graphviz, a graph visualization program
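Graphviz reads graph descriptions written in its DOT language, so a graph like this one can be produced by writing the events out as DOT text. The sketch below is an illustration only: the function name `to_dot`, the edge direction (from the common haplotype to the rare one), and the input format are assumptions, while the red and black dashed styles follow the legend on slide 41.

```python
def to_dot(point_mutations, recombinations):
    """Build DOT text for a haplotype graph.

    point_mutations: list of (common, rare) pairs, drawn as red edges.
    recombinations:  list of (parent, rare) pairs, drawn as black dashed edges.
    """
    lines = ["digraph haplotypes {"]
    for common, rare in point_mutations:
        lines.append(f'  "{common}" -> "{rare}" [color=red];')
    for parent, rare in recombinations:
        lines.append(f'  "{parent}" -> "{rare}" [color=black, style=dashed];')
    lines.append("}")
    return "\n".join(lines)

dot_text = to_dot(
    point_mutations=[("TTTTTTTTTT", "TTTTTTATTT"), ("AAAAAAAAAA", "AAAAAAAAAT")],
    recombinations=[("AAAAAAAAAA", "AATTTTTTTT"), ("TTTTTTTTTT", "AATTTTTTTT")],
)
print(dot_text)
```

Saving the printed text to a file and running the Graphviz `dot` command on it (for example `dot -Tpng graph.dot -o graph.png`) renders the picture.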

43 Graph

