Linkage. Announcements 23andme genotyping. 23andme will genotype in ~3 weeks. You need to deliver finished spit kit by Friday NOON.


Linkage

Announcements
23andme genotyping: 23andme will genotype in ~3 weeks. You need to deliver your finished spit kit by Friday NOON.
Problem set 1 is available for download. Due April 17.
Class videos are available from a link on the schedule web page, and at 6

Personalized Medicine blog
Write a 750-word essay on one of the 10 reasons why the human genome matters in medicine. The essay counts as a class project (e.g. instead of a SNPedia write-up) or it can count as extra credit for the course (up to 10%). The essay is due April 11th.
Besides course credit, you can also enter your essay into a contest run by 23andme. The contest entry is also due April 11th. Everyone will receive a free T-shirt for entering. Winners will get a $100 Amazon gift card and a 23andme kit. The class will get $300 for a class social event.
The essay is now posted on the course requirements page for the class. Contact Stuart Kim if you have questions or comments.

Terminology
Genotype frequency: If the SNPs segregate randomly, you can calculate the expected frequency of a combined genotype by multiplying the frequencies of its alleles.
Linkage equilibrium: If the SNPs segregate randomly, they are said to be in linkage equilibrium. If they do not segregate randomly, they are in linkage disequilibrium.
Haplotype: a set of markers that co-segregate with each other, e.g. abc/abc, abc/ABC, or ABC/ABC.
Phase: refers to whether the alleles are in cis or in trans, e.g. ab/AB (cis) or aB/Ab (trans).

Scenario 1
[Figure: alleles on Chrom 1 and Chrom 2 after the first polymorphism and after the second polymorphism.]

Scenario 2

Data 1
rs        AA    5 AG    20 GG
rs        CC    9 CT    18 TT
rs    ___ A alleles    ___ G alleles    ___ total
rs    ___ C alleles    ___ T alleles    ___ total
rs    ___ A freq.    ___ G freq.
rs    ___ C freq.    ___ T freq.

What can we say about rs and rs ?
rs , rs haplotype    expected    observed
AA CC                ___         0
AA CT                ___         .04
AA TT                ___         .07
AG CC                ___         0
AG CT                ___         .04
AG TT                ___         .14
GG CC                ___         .04
GG CT                ___         .25
GG TT                ___         .43

Genetic Linkage 1 rs rs Chr. 4 Chr. 12

Data 2
rs        CC    12 CG    6 GG
rs        AA    12 AG    9 GG
rs    ___ C alleles    ___ G alleles    ___ total
rs    ___ A alleles    ___ G alleles    ___ total
rs    ___ C freq.    ___ G freq.
rs    ___ A freq.    ___ G freq.

What can we say about rs and rs ?
rs , rs haplotype    expected    observed
CC AA                ___         0
CC AG                ___         0
CC GG                ___         .33
CG AA                ___         0
CG AG                ___         .44
CG GG                ___         0
GG AA                ___         .22
GG AG                ___         0
GG GG                ___         0
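The "expected" column can be filled in mechanically. The following is a minimal sketch, not the course's official method: it assumes Hardy-Weinberg proportions within each SNP and independence between the two SNPs. The function names are hypothetical. The first genotype count of each SNP is not legible in the transcript, so the values 9 (CC) and 6 (AA) used here are assumptions, chosen to be consistent with the observed frequencies .33, .44 and .22 in a sample of 27.

```python
def allele_freqs(n_hom_a, n_het, n_hom_b):
    """Return (freq of allele a, freq of allele b) from three genotype counts."""
    n_alleles = 2 * (n_hom_a + n_het + n_hom_b)
    freq_a = (2 * n_hom_a + n_het) / n_alleles
    return freq_a, 1.0 - freq_a


def genotype_freqs(freq_a, freq_b, a, b):
    """Single-SNP genotype frequencies, assuming Hardy-Weinberg proportions."""
    return {a + a: freq_a ** 2, a + b: 2 * freq_a * freq_b, b + b: freq_b ** 2}


def expected_joint(geno1, geno2):
    """Two-SNP genotype frequencies expected if the SNPs segregate independently."""
    return {g1 + g2: f1 * f2
            for g1, f1 in geno1.items()
            for g2, f2 in geno2.items()}


# Data 2 counts. ASSUMPTION: the first count of each SNP (9 CC, 6 AA) is inferred.
snp1 = genotype_freqs(*allele_freqs(9, 12, 6), "C", "G")
snp2 = genotype_freqs(*allele_freqs(6, 12, 9), "A", "G")
expected = expected_joint(snp1, snp2)

for combo, freq in expected.items():
    print(combo, round(freq, 3))
```

Comparing these values with the observed column shows how far the pair of SNPs departs from random segregation.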

Genetic Linkage 2 rs rs Chr kb R^2 = .901

Data 3
rs        GG    7 GA    5 AA
rs        CC    5 CT    9 TT
rs    ___ G alleles    ___ A alleles    ___ total
rs    ___ C alleles    ___ T alleles    ___ total
rs    ___ G freq.    ___ A freq.
rs    ___ C freq.    ___ T freq.

What can we say about rs and rs ?
rs , rs haplotype    expected    observed
GG CC                ___         .09
GG CT                ___         .09
GG TT                ___         .3
GA CC                ___         .17
GA CT                ___         .09
GA TT                ___         .04
AA CC                ___         .13
AA CT                ___         .04
AA TT                ___         .04

Genetic Linkage 3
Chr. 2, Chr. 26
rs , rs
Ear wax: TT -> dry earwax
Lactase: GG -> lactose intolerance

Sequence APOA2 in 72 people Look at patterns of polymorphisms

Find polymorphisms at these positions. Reference sequence is listed.

Sequence of the first chromosome. Circle is same as reference.

slide created by Gonçalo Abecasis

             2818 C          2818 T
3027 T                                       .87 T alleles
3027 C                                       .13 C alleles
             .92 C allele    .08 T allele

Expected haplotype frequencies if unlinked
             2818 C             2818 T
3027 T       .87 x .92 = .80    .87 x .08 = .07    .87 T alleles
3027 C       .13 x .92 = .12    .13 x .08 = .01    .13 C alleles
             .92 C allele       .08 T allele
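The expected haplotype frequencies are the products of the allele frequencies at the two positions. A short sketch of that arithmetic, using the frequencies from the slide (the variable names are illustrative only):

```python
# Allele frequencies at the two APOA2 positions, as given on the slide.
freq_3027 = {"T": 0.87, "C": 0.13}
freq_2818 = {"C": 0.92, "T": 0.08}

# If the two sites are unlinked, each haplotype frequency is the product
# of the frequencies of the two alleles it carries.
expected = {
    (a3027, a2818): f3027 * f2818
    for a3027, f3027 in freq_3027.items()
    for a2818, f2818 in freq_2818.items()
}

for (a3027, a2818), freq in expected.items():
    print(f"3027 {a3027} / 2818 {a2818}: {freq:.2f}")
```

The four values sum to 1, which is a quick check that no haplotype was left out.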

Expected if unlinked vs. observed
             2818 C          2818 T
3027 T                                       T alleles
3027 C                                       C alleles
             .92 C allele    .08 T allele

R: correlation coefficient
$$R = \frac{P_{AB} - P_A P_B}{\sqrt{P_A \, P_a \, P_B \, P_b}}$$

Calculate R
$$R = \frac{.86 - (.87)(.92)}{\sqrt{.87 \times .13 \times .92 \times .08}} = \frac{.06}{\sqrt{8.3 \times 10^{-3}}} = \frac{.06}{.0912} = .658$$

slide created by Gonçalo Abecasis

$$R^2 = (.658)^2 = .433$$
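The correlation coefficient can be checked with a few lines of code. This is a minimal sketch that takes the observed haplotype frequency (.86) and the two allele frequencies (.87 and .92) from the slides; the function name `ld_r` is hypothetical. Because it keeps unrounded intermediate values, the result can differ slightly from a hand calculation.

```python
import math


def ld_r(p_ab, p_a, p_b):
    """Correlation coefficient R between two biallelic sites.

    p_ab: observed frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # observed minus expected-if-unlinked
    return d / math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


r = ld_r(p_ab=0.86, p_a=0.87, p_b=0.92)
r_squared = r ** 2
print(f"R = {r:.3f}, R^2 = {r_squared:.3f}")
```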

Haplotype blocks

slide created by Gonçalo Abecasis

Published Genome-Wide Associations through 07/2012
Published GWA at p ≤ 5 x 10^-8 for 18 trait categories
NHGRI GWA Catalog

Genome Wide Association Studies
[Figure: genotype of SNPxxx in two groups; the first group carries mostly G, the second mostly A.]
G is risk, A is protective

Colorectal cancer 1057 cases 960 controls 550K SNPs

Colorectal cancer data from rs
1027 colorectal cancer, 960 controls
Cancer: 0.57 G, 0.43 T
Controls: 0.49 G, 0.51 T

Cancer: 0.57 G, 0.43 T
Controls: 0.49 G, 0.51 T
Are these different? Chi squared

Chi squared

Chi squared = 31
P value = 10^-7
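A chi-squared test on allele counts can be sketched as follows. The slides give only rounded allele frequencies, so the allele counts below are reconstructed from them, and the statistic will not exactly match the value on the slide. The case count is an assumption: the slides give both 1057 and 1027, and this sketch uses 1057. The function name is hypothetical, and the test shown is the standard Pearson chi-squared for a 2 x 2 table without continuity correction, which may differ from what the paper used.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Upper-tail probability of a chi-squared variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# ASSUMPTION: 1057 cases and 960 controls, two alleles per person.
n_case_alleles = 2 * 1057
n_control_alleles = 2 * 960
case_g = round(n_case_alleles * 0.57)
control_g = round(n_control_alleles * 0.49)

stat, p_value = chi_squared_2x2(
    case_g, n_case_alleles - case_g,
    control_g, n_control_alleles - control_g,
)
print(f"chi squared = {stat:.1f}, p = {p_value:.2e}")
```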

Stuart's genotype: homozygous for the bad allele

Other models
Dominant: Assume G is dominant. GG or GT vs TT
            GG or GT    TT
Cases
Controls    706         254

Other models
Recessive: Assume G is recessive. GG vs GT or TT
            GG     GT or TT
Cases
Controls    235    725

Other models
Additive: GG > GT > TT
Do linear regression
3 genotypes x 2 groups

[Plot: % cancer versus genotype (TT, GT, GG).]
$$\%\,\text{cancer} = \beta \, (\text{genotype}) + \epsilon$$

Allelic odds ratio: the allele ratio in the cases divided by the allele ratio in the controls
How different is this SNP in the cases versus the controls?
Cancer: .57 G / .43 T = 1.33
Control: .49 G / .51 T = 0.96
Allelic odds ratio = 1.33 / 0.96 ≈ 1.38
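The same calculation as a small function, using the G allele frequencies from the colorectal cancer slide. The function name is illustrative.

```python
def allelic_odds_ratio(case_freq, control_freq):
    """Odds of the allele in cases divided by odds of the allele in controls."""
    case_odds = case_freq / (1 - case_freq)           # e.g. G : T in cases
    control_odds = control_freq / (1 - control_freq)  # e.g. G : T in controls
    return case_odds / control_odds


odds_ratio = allelic_odds_ratio(case_freq=0.57, control_freq=0.49)
print(f"allelic odds ratio = {odds_ratio:.2f}")
```

An odds ratio of 1 would mean the allele is equally common in cases and controls.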

Allelic odds ratio*: ratio of the allele ratios in the cases divided by the allele ratio in the entire population (need allele ratio from entire population to do this) How different is this SNP in the cases versus everyone?

Likelihood ratio: What is the likelihood of seeing a genotype given the disease compared to the likelihood of seeing the genotype given no disease? (Need data from the entire population to do this. We can do this in the class GWAS. For cancer vs controls, the two groups were separate and so we do not know the genotype frequencies of the population as a whole.)

Increased Risk: What is the likelihood of seeing a trait given a genotype compared to the overall likelihood of seeing the trait in the population? (Need data from the entire population to do this. We can do this in the class GWAS. For cancer vs controls, the two groups were separate and so we do not know the genotype frequencies of the population as a whole.)

Multiple hypothesis testing
P = .05 means that there is a 5% chance of seeing a result this strong by chance alone.
If you try 100 times, you will get about 5 hits.
If you try 547,647 times, you should expect 547,647 x .05 = 27,382 hits.
So 27,673 (observed) is about the same as one would randomly expect.
“Of the 547,647 polymorphic tag SNPs, 27,673 showed an association with disease at P < .05.”

Multiple hypothesis testing
Here, have 547,647 SNPs = # hypotheses
Corrected p value = q = p x # hypotheses. This is called the Bonferroni correction. (Strictly, Bonferroni limits the chance of any false positive across all tests, which is stricter than a false discovery rate.)
Want q = .05. This means there is at most a .05 chance that a positive SNP arises by chance.
At q = .05, p = .05 / 547,647 = .91 x 10^-7
This is the p value cutoff used in the paper.
“Of the 547,647 polymorphic tag SNPs, 27,673 showed an association with disease at P < .05.”
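The two numbers on these slides, the hits expected by chance and the Bonferroni cutoff, follow directly from the number of SNPs tested. A minimal sketch of the arithmetic:

```python
n_tests = 547_647  # polymorphic tag SNPs tested
alpha = 0.05       # desired overall significance level

# With no correction, this many SNPs are expected to pass P < .05 by chance.
expected_by_chance = n_tests * alpha

# Bonferroni: divide the overall level by the number of tests.
bonferroni_cutoff = alpha / n_tests

print(f"expected hits by chance: {expected_by_chance:,.0f}")
print(f"per-SNP p value cutoff:  {bonferroni_cutoff:.2e}")
```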

Multiple hypothesis testing
The Bonferroni correction is too conservative. It assumes that all of the tests are independent. But the SNPs are linked in haplotype blocks, so there really are fewer independent hypotheses than SNPs.
Another way to correct is to permute the data many times, and see how many times a SNP comes up in the permuted data at a particular threshold.
“Of the 547,647 polymorphic tag SNPs, 27,673 showed an association with disease at P < .05.”
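The permutation idea can be sketched in code. This is one common variant (a max-statistic permutation test), not necessarily the procedure used in the paper: case/control labels are shuffled many times, and the strongest signal across all SNPs in each shuffle is compared with the strongest signal in the real data. The genotype data below are synthetic and exist only to make the example runnable; the function names and the simple difference-in-means statistic are illustrative assumptions.

```python
import random


def max_abs_diff(genotypes, labels):
    """Largest case-vs-control difference in mean risk-allele count over all SNPs."""
    cases = [g for g, is_case in zip(genotypes, labels) if is_case]
    controls = [g for g, is_case in zip(genotypes, labels) if not is_case]
    diffs = []
    for j in range(len(genotypes[0])):
        case_mean = sum(g[j] for g in cases) / len(cases)
        control_mean = sum(g[j] for g in controls) / len(controls)
        diffs.append(abs(case_mean - control_mean))
    return max(diffs)


def permutation_p_value(genotypes, labels, n_permutations=1000, seed=0):
    """Fraction of label shuffles whose best SNP is at least as strong as the real best SNP."""
    rng = random.Random(seed)
    observed = max_abs_diff(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if max_abs_diff(genotypes, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# SYNTHETIC data: 20 cases and 20 controls, 5 SNPs coded as 0, 1 or 2 risk alleles.
data_rng = random.Random(1)
labels = [True] * 20 + [False] * 20
genotypes = [[data_rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(40)]
for i in range(40):
    genotypes[i][0] = 2 if labels[i] else 0  # SNP 0 is made perfectly associated

p = permutation_p_value(genotypes, labels)
print(f"permutation p value = {p:.4f}")
```

Because the labels are shuffled while the genotypes stay intact, the correlation between linked SNPs is preserved in every permutation, which is what makes this less conservative than Bonferroni.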

SNPedia
The SNPedia website
A thank you from SNPedia
Class website for SNPedia
List of last year's write-ups
How to write up a SNPedia entry

SNPedia
Summarize the trait
Summarize the study
How large was the cohort?
How strong was the p-value?
What was the OR, likelihood ratio or increased risk?
Which population?
What is known about the SNP? Associated genes? Protein coding? Allele frequency?
Does knowledge of the SNP affect diagnosis or treatment?

Class GWAS
Go to genotation.stanford.edu
Go to “traits”, then “GWAS”
Look up your SNPs
Fill out the table
Submit information