Analyzing human population genetic history through the study of genetic variation Mark Mata Mentor: Eleazar Eskin UCLA Zar Lab SoCalBSI 2009.



Background
- To study human population genetic history is to study part of human evolution
- Human evolution is one of the fundamental questions in science
- We ask ourselves questions such as:
  - Where do we come from?
  - Why are we all different?
  - How are we all different?

Background
- The ZarLab studies the most recent events in human evolution:
  - Now that we have modern humans, what variations have arisen in our genes since our ancient African ancestors?
- To answer this question, our group looks at human variation to produce a genetic history of these changes

Why do we care?
- Many diseases are caused by variations that arose during our genetic history
- A better understanding of our genetic history and of human variation may eventually lead to better treatment plans
- Personalized medicine:
  - “The right drug, in the right dose, to the right person, at the right time.”
PerkinElmer website:

Human Variation
- Modern humans share 99.9% of their DNA
- The remaining 0.1% accounts for the variation between humans
- Of this variation, 80% is the result of SNPs
  - SNP (single-nucleotide polymorphism): a position in the genome where two different bases are present in the population. The base at a SNP on a chromosome is referred to as the “allele”
  - A haplotype is the sequence of alleles along a chromosome
- The other 20% comes from deletions or insertions in the genome
PerkinElmer website:

Human Variation
- We are studying the 80% of the variation that comes in the form of SNPs
- The alleles at these SNPs are compiled into lists, which are called haplotypes
- Deletions and insertions are “ignored” because of the limitations of the microarrays from which the data are generated

International HapMap Project
- Study done by the International HapMap Consortium
- “…create a public, genome-wide database of common human sequence variation…”
- Identified SNPs and compiled the SNP alleles into a database of haplotypes for four different populations (Phase 1)
- The population used was a group of 60 Mormons in Utah
  - Have been widely studied in the past
  - Western and Northern European descent
  - Have very detailed records
- Used their chromosome 19
“A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005

My Project Goals
- Reconstruct human genetic history
  - This is a very difficult problem
- Sub-problem: identify recent genetic events
  - Assume that these new genetic events are rare, or very few in number
  - They are easier to classify and relate to one another than older, more common haplotypes
  - These new events are important because they identify shared recent ancestry
  - Disease-causing variations could come from recent events

Identifying Recent Genetic Events
1. Select a region in a haplotype and find the frequency of variation
2. Group variations into common and rare
3. Find recent point mutations
4. Find recent recombinations

Workflow

Individual’s Haplotypes:
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAAAAAAAA
TTTTTTTTTTTTTTT
AAAAAAAAATTTTTT
AATTTTTTTTTTTTT
TTTTTTATTTTTTTT
AAAAAAAAAAAAAAA
AAAAAAAAAAAAAAA

Frequency of Variation:
Common
AAAAAAAAAA – 49%
TTTTTTTTTT – 48%
Rare
AAAAAAAAAT – 1%
AATTTTTTTT – 1%
TTTTTTATTT – 1%

Identify Events:
AAAAAAAAAA and AAAAAAAAAT give AAAAAAAAAT* (point mutation)
AAAAAAAAAA and TTTTTTTTTT give AA|TTTTTTTT (recombination)
TTTTTTTTTT and TTTTTTATTT give TTTTTTA*TTT (point mutation)

Choosing a region size
- The region must be large enough to pick up many different variations but small enough to show what caused them
- Tests started with a region of 20 nucleotides and used progressively smaller regions; a region size of 10 nucleotides proved to be the best

Region Size 20

Region Size 10

Frequency of Variation

Individual’s Haplotype    Region
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAAAAAAAA           AAAAAAAAAA
TTTTTTTTTTTTTTT           TTTTTTTTTT
AAAAAAAAATTTTTT           AAAAAAAAAT
AATTTTTTTTTTTTT           AATTTTTTTT
TTTTTTATTTTTTTT           TTTTTTATTT
AAAAAAAAAAAAAAA           AAAAAAAAAA

How Many:
AAAAAAAAAA - 59
TTTTTTTTTT - 58
AAAAAAAAAT - 1
AATTTTTTTT - 1
TTTTTTATTT - 1

Frequency of Variation

Individual’s Haplotype (the “|” marks the end of the 10-position region):
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAT|TTTTT
AATTTTTTTT|TTTTT
TTTTTTATTT|TTTTT
AAAAAAAAAA|AAAAA

Region        How Many    Frequency of Variation
AAAAAAAAAA    59/120      ~49%
TTTTTTTTTT    58/120      ~48%
AAAAAAAAAT    1/120       ~1%
AATTTTTTTT    1/120       ~1%
TTTTTTATTT    1/120       ~1%
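The counting step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `region_frequencies` and the toy list of 120 haplotypes are made up here to reproduce the counts on the slide; they are not the project's actual code or data.

```python
from collections import Counter


def region_frequencies(haplotypes, start=0, size=10):
    """Return {subsequence: (count, frequency)} for one region of every haplotype."""
    regions = [h[start:start + size] for h in haplotypes]
    counts = Counter(regions)
    total = len(regions)
    return {seq: (n, n / total) for seq, n in counts.items()}


# Hypothetical toy data shaped like the slide: 120 haplotypes of 15 SNPs each.
haplotypes = (
    ["A" * 15] * 59
    + ["T" * 15] * 58
    + ["AAAAAAAAATTTTTT", "AATTTTTTTTTTTTT", "TTTTTTATTTTTTTT"]
)

freqs = region_frequencies(haplotypes, start=0, size=10)
for seq, (n, f) in sorted(freqs.items(), key=lambda kv: -kv[1][0]):
    print(f"{seq} - {n}/{len(haplotypes)} ~ {f:.0%}")
```

Keeping both the raw count and the fraction makes it easy to show the "How Many" and "Frequency of Variation" columns side by side.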

Grouping Variations
Classified as either common or rare haplotypes
- Assume that new genetic events are rare, or very few in number
- Subsequences with a frequency of 5% or higher were classified as common; all others were classified as rare
- The 5% cutoff comes from the International HapMap Consortium study
“A haplotype map of the human genome” by The International HapMap Consortium. Nature. Published 27 October 2005
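As a rough sketch of the 5% rule, the grouping could look like the following. The name `group_by_frequency` and the hard-coded frequency table are assumptions for illustration; only the 5% cutoff itself comes from the slide.

```python
def group_by_frequency(freqs, cutoff=0.05):
    """Split {subsequence: frequency} into (common, rare) lists at the cutoff."""
    common = sorted(s for s, f in freqs.items() if f >= cutoff)
    rare = sorted(s for s, f in freqs.items() if f < cutoff)
    return common, rare


# Hypothetical frequencies matching the running example (counts out of 120).
freqs = {
    "AAAAAAAAAA": 59 / 120,
    "TTTTTTTTTT": 58 / 120,
    "AAAAAAAAAT": 1 / 120,
    "AATTTTTTTT": 1 / 120,
    "TTTTTTATTT": 1 / 120,
}

common, rare = group_by_frequency(freqs)
print("Common:", common)
print("Rare:", rare)
```

A subsequence sitting exactly at 5% is treated as common here, following the "5% frequency or higher" wording.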

Grouping Variations

Individual’s Genes:
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAA|AAAAA
TTTTTTTTTT|TTTTT
AAAAAAAAAT|TTTTT
AATTTTTTTT|TTTTT
TTTTTTATTT|TTTTT
AAAAAAAAAA|AAAAA
AAAAAAAAAA|AAAAA

Frequency of Variation:
AAAAAAAAAA – 49%
TTTTTTTTTT – 48%
AAAAAAAAAT – 1%
AATTTTTTTT – 1%
TTTTTTATTT – 1%

Group:
Common: AAAAAAAAAA, TTTTTTTTTT
Rare: AAAAAAAAAT, AATTTTTTTT, TTTTTTATTT

Recent Events
- Make comparisons to identify two forms of variation:
  - Point mutations
  - Recombination events

Common:       Rare:
AAAAAAAAAA    AAAAAAAAAT
TTTTTTTTTT    AATTTTTTTT
              TTTTTTATTT

Point Mutations
[Diagram: the workflow columns (Individual’s Haplotypes, Frequency of Variation, Identify Events) with the identified events AAAAAAAAAT*, AA|TTTTTTTT and TTTTTTA*TTT]

Point Mutations
[Diagram: the same workflow columns, highlighting the point mutation that turns the common TTTTTTTTTT into the rare TTTTTTA*TTT]

Recent Events
Point mutations
- Found by comparing a common haplotype with a rare haplotype
- A difference at exactly one position shows that the rare haplotype is a point mutation of the common haplotype
- Marked by a “*” next to the point mutation

Common: TTTTTTTTTT
Rare:   TTTTTTATTT
Marked: TTTTTTA*TTT
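The comparison described here amounts to looking for a Hamming distance of one between a rare and a common subsequence. The sketch below is a minimal illustration, not the project's implementation; `hamming` and `find_point_mutations` are hypothetical helper names, and taking the first matching common haplotype as the parent is an assumption.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_point_mutations(common, rare):
    """Map each rare sequence that is one change away from a common one
    to (common parent, 0-based position, annotated string)."""
    found = {}
    for r in rare:
        for c in common:
            if hamming(r, c) == 1:
                pos = next(i for i, (x, y) in enumerate(zip(r, c)) if x != y)
                # Put the "*" right after the mutated base, as on the slide.
                found[r] = (c, pos, r[:pos + 1] + "*" + r[pos + 1:])
                break  # assumption: the first matching common haplotype is the parent
    return found


common = ["AAAAAAAAAA", "TTTTTTTTTT"]
rare = ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]
for seq, (parent, pos, marked) in find_point_mutations(common, rare).items():
    print(parent, "->", marked)
```

In this toy example AATTTTTTTT is not reported, because it differs from both common haplotypes at more than one position.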

Recombination
[Diagram: the workflow columns (Individual’s Haplotypes, Frequency of Variation, Identify Events) with the identified events AAAAAAAAAT*, AA|TTTTTTTT and TTTTTTA*TTT]

Recombination
[Diagram: the same workflow columns, highlighting the recombination of the common AAAAAAAAAA and TTTTTTTTTT into the rare AA|TTTTTTTT]

Recent Events
Recombination
- Combine portions of two common haplotypes and see whether they form a rare haplotype

Common:       Possible Recombinations:
AAAAAAAAAA    AA|TTTTTTTT
TTTTTTTTTT    AAA|TTTTTTT
              AAAA|TTTTTT
              AAAAA|TTTTT
              AAAAAA|TTTT
              AAAAAAA|TTT
              AAAAAAAA|TT

Rare Mutations
- Marked by a “|” at the border between one haplotype and another haplotype

Possible Recombinations:    Actual Recombinations:
AA|TTTTTTTT                 AA|TTTTTTTT
AAA|TTTTTTT
AAAA|TTTTTT
AAAAA|TTTTT
AAAAAA|TTTT
AAAAAAA|TTT
AAAAAAAA|TT
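One way to sketch this search is to splice a prefix of one common haplotype onto the suffix of another at every breakpoint and keep the candidates that match a rare haplotype. The function name `find_recombinations` and the `min_segment=2` parameter are assumptions; the latter simply mirrors the candidate list on the slides, which runs from AA|TTTTTTTT to AAAAAAAA|TT.

```python
from itertools import permutations


def find_recombinations(common, rare, min_segment=2):
    """Map each rare sequence that splices two common ones together
    to (left parent, right parent, breakpoint, annotated string)."""
    rare_set = set(rare)
    found = {}
    for left, right in permutations(common, 2):
        # Breakpoint k keeps the first k positions of `left` and the rest of `right`.
        for k in range(min_segment, len(left) - min_segment + 1):
            candidate = left[:k] + right[k:]
            if candidate in rare_set and candidate not in found:
                found[candidate] = (left, right, k, candidate[:k] + "|" + candidate[k:])
    return found


common = ["AAAAAAAAAA", "TTTTTTTTTT"]
rare = ["AAAAAAAAAT", "AATTTTTTTT", "TTTTTTATTT"]
for seq, (left, right, k, marked) in find_recombinations(common, rare).items():
    print(left, "+", right, "->", marked)
```

With a minimum segment of two, AAAAAAAAAT is left to the point-mutation step rather than being reported as a breakpoint at position nine.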

Sample input and output

Individual    chr-haplotypes.txt    new_chr-haplotypes.txt
Indv1         TTTTTTTTTTTTTTT       T T T T T T T T T T
Indv1         AAAAAAAAATTTTTT       A A A A A A A A A T*
Indv2         AATTTTTTTTTTTTT       A A|T T T T T T T T
Indv2         TTTTTTATTTTTTTT       T T T T T T A*T T T
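The spaced output shown for new_chr-haplotypes.txt can be reproduced from an annotated region with a small formatter. This is a guess at the layout based only on the four sample rows; `format_annotated` is a hypothetical name, and the real tool may format its output differently.

```python
def format_annotated(region):
    """Separate alleles with spaces; a '*' or '|' marker replaces the space after its allele."""
    out = []
    for ch in region:
        if ch in "*|" and out:
            out[-1] = ch  # the marker takes the place of the separating space
        else:
            out.append(ch)
            out.append(" ")
    return "".join(out).rstrip()


for annotated in ["TTTTTTTTTT", "AAAAAAAAAT*", "AA|TTTTTTTT", "TTTTTTA*TTT"]:
    print(format_annotated(annotated))
```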

Visualization Tool

Expanding to the Whole Chromosome
- Now that we have a way to look for variations in regions of a chromosome, we can extend the technique to look for variations in a whole chromosome
- We used a technique of overlapping windows

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
|AAAAAAAAAA|
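A minimal sketch of how overlapping windows might be laid out along a haplotype follows. The window size of 10 is the region size chosen earlier in the deck and the step of 5 follows its "slide over 5" example; the function name `overlapping_windows` and the handling of a leftover tail are assumptions.

```python
def overlapping_windows(length, size=10, step=5):
    """Return (start, end) index pairs of overlapping windows covering `length` positions."""
    if length <= size:
        return [(0, length)]
    starts = list(range(0, length - size + 1, step))
    # Assumption: add one final window so the last positions are always covered.
    if starts[-1] + size < length:
        starts.append(length - size)
    return [(s, s + size) for s in starts]


haplotype = "A" * 30
for start, end in overlapping_windows(len(haplotype)):
    print(start, end, haplotype[start:end])
```

Each window can then be passed through the frequency, grouping, point-mutation and recombination steps in turn.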

Overlapping Windows
[Diagram: the workflow columns (Individual’s Haplotypes, Frequency of Variation, Identify Events) with the identified events AAAAAAAAAT*, AA|TTTTTTTT and TTTTTTA*TTT]

Overlapping Windows
[Diagram: the same workflow columns, highlighting the event in which the common AAAAAAAAAA appears as the rare AAAAAAAAAT*]

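Overlapping
- Recombination events that looked like point mutations

Common: AAAAAAAAAAAAAAA
        TTTTTTTTTTTTTTT
Rare:   AAAAAAAAATTTTTT

First 10:
Common: AAAAAAAAAA
Rare:   AAAAAAAAAT*

Slide over 5 and next 10:
Common: AAAAAAAAAA
        TTTTTTTTTT
Rare:   AAAA|TTTTTT

Merged: AAAAAAAAA|T*TTTTT becomes AAAAAAAAA|TTTTTT
- The apparent point mutation in the first window sits at the recombination breakpoint found in the second window, so only the recombination is kept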
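The reconciliation shown on this slide could be sketched as a filter over the events collected from all windows. Everything here is an assumption made for illustration: the `(kind, position)` event format, the name `resolve_overlaps`, and the rule that a point mutation directly beside a breakpoint is dropped in favour of the recombination.

```python
def resolve_overlaps(events):
    """Drop point mutations that sit at a recombination breakpoint.

    `events` is a list of (kind, position) pairs with kind "point" or "recomb".
    Positions are 0-based along the whole haplotype; a breakpoint b means that
    positions 0..b-1 come from one parent and positions b onward from the other.
    """
    breaks = {pos for kind, pos in events if kind == "recomb"}
    kept = set()
    for kind, pos in events:
        # A lone mismatched base on either side of a breakpoint is explained by it.
        if kind == "point" and (pos in breaks or pos + 1 in breaks):
            continue
        kept.add((kind, pos))  # the set also removes duplicates from overlapping windows
    return sorted(kept, key=lambda e: (e[1], e[0]))


# Window 0-10 reports a point mutation at position 9 (AAAAAAAAAT*);
# window 5-15 reports AAAA|TTTTTT, i.e. a breakpoint at 5 + 4 = 9.
events = [("point", 9), ("recomb", 9), ("point", 21), ("point", 21)]
print(resolve_overlaps(events))
```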

Applying to a Population’s Chromosome
- We now have a technique to look for new variations in a whole chromosome
- We can apply it to a population and identify regions where recent genetic events took place

Identified Recent Genetic Events
In chromosome 19:
Unique point mutations =
Unique recombination events = 4065
Total unique events =
Total point mutations =
Total recombination events =
Total number of events =
Average point mutations per individual = 383
Average recombination events per individual = 94
Average events per individual = 478

Point Mutations
[Plot; axis labels: Number of Events, SNP Position in the Haplotype]

Recombination Events
[Plot; axis labels: Number of Events, SNP Position in the Haplotype]

Point Mutations and Recombination Events
[Plot; axis labels: Number of Events, SNP Position in the Haplotype]

Conclusion
- We have developed an algorithm for identifying recent genetic events in an individual
- More point mutations were identified than recombination events
- Some regions of the genome contain many recent genetic events, while other regions contain few

Future Work
- Run the algorithm over the whole genome
- Extend the algorithm to multiple populations
  - Identify recent events that are unique to a population vs. ones that are shared
- Identify genetic relations between common haplotypes
- Create a chronological order of recent events in an individual
- Adapt the algorithm for high-throughput sequencing data

UCLA ZarLab
- Dr. Eleazar Eskin
- All the lab people
SoCalBSI
- Dr. Jamil Momand
- Dr. Sandra Sharp
- Dr. Nancy Warter-Perez
- Dr. Wendie Johnston
- Dr. Beverly Krilowicz
- Dr. Silvia Heubach
- Dr. Jennifer Faust
- Ronnie Cheng
- SoCalBSI 2009 Interns
Funded By:

Determining ancestors
- The other ancestors are determined through SNP differences of 2 or more

My Project
- Red line: point mutation
- Blue line: ancestor-to-common relationship
- Black dashed line: haplotype that resulted from a crossover

Graph
- The graph is generated with Graphviz, a graph visualization program
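For illustration, a graph of this kind can be written out as Graphviz DOT text from a list of edges. The edge list, the `ancestor_1` node and the function name `to_dot` are hypothetical; the colours follow the legend on the "My Project" slide (red for point mutations, blue for ancestor-to-common relationships, black dashed for crossovers).

```python
def to_dot(edges):
    """Build Graphviz DOT text from (source, target, kind) edges."""
    styles = {
        "point_mutation": "color=red",
        "ancestor": "color=blue",
        "crossover": "color=black, style=dashed",
    }
    lines = ["digraph haplotypes {"]
    for src, dst, kind in edges:
        lines.append(f'    "{src}" -> "{dst}" [{styles[kind]}];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical edges built from the running example.
edges = [
    ("ancestor_1", "AAAAAAAAAA", "ancestor"),
    ("TTTTTTTTTT", "TTTTTTATTT", "point_mutation"),
    ("AAAAAAAAAA", "AATTTTTTTT", "crossover"),
    ("TTTTTTTTTT", "AATTTTTTTT", "crossover"),
]

dot_text = to_dot(edges)
print(dot_text)
```

Saving the text to a file such as graph.dot and running `dot -Tpng graph.dot -o graph.png` renders it as an image.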

Graph