Forensics and DNA Statistics Harry R Erwin, PhD CIS308 Faculty of Applied Sciences University of Sunderland.

References
Goodwin, Linacre, and Hadi (2007) An Introduction to Forensic Genetics, Wiley.
Butler (2005) Forensic DNA Typing, 2nd edition, Elsevier, chapters

Statistics and DNA According to Butler, “Statistical genetic information is often more difficult for DNA analysts to grasp than the technology and biology issues…because of its heavy use of mathematics particularly algebra. The concepts of probabilities can be challenging to forensic scientists schooled in biology rather than mathematics.” The implication is that you may need to provide the necessary expertise. 8(

Lecture Plan
Review
STR population database analyses (ch. 20)
Profile frequency estimates, likelihood ratios, and source attribution (ch. 21)
Approaches to statistical analysis of mixtures and degraded DNA (ch. 22)
Kinship and parentage testing (ch. 23)

Review: What to Remember
Probability
– Laws of probability
– Likelihood ratios
– Bayesian statistics
Statistics
– Hypothesis testing
– Chi-square test
– Confidence intervals
– Randomization tests

Introduction
There are three possible outcomes of a DNA test:
1. No match
2. Inconclusive
3. Match
Only a match requires statistics to provide meaning. Which statistics to apply is debatable.

Laws of probability
Probability: number of times an event occurs divided by the number of opportunities for it to occur.
Three laws of probability to remember
1. Probabilities range between 0.0 and 1.0.
2. If two events are mutually exclusive, the probability of either taking place is the sum of their probabilities.
3. If two events are independent, the probability of both occurring is the product of their individual probabilities.

Likelihood ratios
A Likelihood Ratio (LR) is the comparison of the probabilities of the evidence, E, under two alternative (mutually exclusive) hypotheses.
– The Null Hypothesis, and
– The Alternative Hypothesis.
These hypotheses should cover all possible cases.
– LR = Pr(E|Hp)/Pr(E|Hd), where Hp is the prosecution hypothesis and Hd is the defence hypothesis

Bayesian statistics
Posterior odds = (Likelihood ratio)*(Prior odds)
– Pr(Hp|E)/Pr(Hd|E) = LR*Pr(Hp)/Pr(Hd)
Verbal terminology for likelihood ratios
Likelihood ratio 1-10: Limited support for the prosecution hypothesis
Successively larger likelihood ratios, in increasing order:
– Moderate support for the prosecution hypothesis
– Moderately strong support for the prosecution hypothesis
– Strong support for the prosecution hypothesis
– Very strong support for the prosecution hypothesis
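As a rough illustration of the rule that posterior odds are the likelihood ratio times the prior odds, the sketch below converts a likelihood ratio and a prior into a posterior probability. The numbers (an LR of 1000 and 10,000 equally likely possible sources) are hypothetical and chosen only to show the arithmetic.

```python
def posterior_odds(likelihood_ratio: float, prior_odds: float) -> float:
    """Posterior odds = likelihood ratio * prior odds."""
    return likelihood_ratio * prior_odds


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis into a probability."""
    return odds / (1.0 + odds)


# Hypothetical inputs, not taken from the lecture.
lr = 1000.0            # evidence is 1000 times more probable under Hp than Hd
prior = 1.0 / 9999.0   # one suspect among 10,000 equally likely sources
post = posterior_odds(lr, prior)
prob_hp = odds_to_probability(post)
print(f"posterior odds = {post:.4f}, Pr(Hp|E) = {prob_hp:.4f}")
```

Even with a large likelihood ratio, a small prior keeps the posterior probability modest, which is one way to see why the likelihood ratio alone should not be read as the probability of guilt.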

Fallacies to avoid Prosecutor’s fallacy Defendant’s fallacy

Statistics Statistics measures uncertainty and reliability. A population is the set of objects of interest. A sample is an observable subset of a population. A statistic is some observable property of a sample.

Hypothesis testing
1. Select appropriate statistical model.
2. Choose two alternative hypotheses, H0 and H1.
3. Specify the level of significance and its critical value, C.
4. Collect data and calculate statistic.
5. Check region of rejection for statistic.
6. Reject H0? Yes: accept H1. No: accept H0.

A Weakness of Hypothesis Testing— The Base Rate Fallacy In a scientific research area where there is a lot of activity, most interesting uninvestigated hypotheses are false and most published results are accidental. Why? Because the Neyman-Pearson protocol only controls the probability of false alarms. Implication: test DNA against as small a group of possible matches as possible to avoid false alarms.
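The base rate argument can be made concrete with a short calculation. The sketch below assumes a hypothetical base rate of true hypotheses, a significance level, and a test power (none of these values come from the lecture), and computes the fraction of "significant" results that are real.

```python
def positive_predictive_value(base_rate: float, alpha: float, power: float) -> float:
    """Fraction of positive test results that correspond to true effects.

    base_rate: proportion of tested hypotheses that are actually true
    alpha:     false alarm probability controlled by the test
    power:     probability of detecting a true effect
    """
    true_positives = base_rate * power
    false_positives = (1.0 - base_rate) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical values: 1 in 100 candidates is a true match,
# tested at the 5% level with 80% power.
ppv = positive_predictive_value(base_rate=0.01, alpha=0.05, power=0.8)
print(f"Share of positives that are real: {ppv:.3f}")
```

With a low base rate most positives are false alarms, which is the reasoning behind testing against as small a group of possible matches as possible.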

Chi-square test
A goodness-of-fit test. Answers “How close do the observations come to the expected results?”
The χ^2 statistic is parameterised by degrees of freedom, df, and large values indicate there’s a significant deviation from theory.
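A minimal sketch of the goodness-of-fit statistic, computed as the sum of (observed - expected)^2 / expected over categories. The counts are hypothetical; the 5% critical value for 2 degrees of freedom (5.991) is the standard tabulated value.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts for three categories with fully specified expectations.
observed = [30, 50, 20]
expected = [25, 50, 25]
chi2 = chi_square_statistic(observed, expected)

# Three categories with fixed expected counts give df = 2.
CRITICAL_5_PERCENT_DF2 = 5.991
significant = chi2 > CRITICAL_5_PERCENT_DF2
print(f"chi-square = {chi2:.3f}, significant at 5%: {significant}")
```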

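Confidence intervals
Usually the sample mean plus and minus two standard deviations. For roughly normally distributed data, about 95% of observations fall inside that interval, so an observation outside it has only about a 5% chance of occurring. (For an interval on the mean itself, use two standard errors rather than two standard deviations.)
Other confidence intervals can be defined.
These are used to help visualise measurements against a population.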

Randomization tests
These explore whether collecting the data differently would affect the results.
The analysis usually starts by treating the collected data as representative of the population, and permuting it, leaving samples out, or randomly ‘resampling’ it multiple times to see the range of descriptive statistics.
Get a computational statistician involved if these questions come up.
The tools are available in R to do these kinds of analyses.
Keywords: resampling, bootstrap, jackknife
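As an illustration of resampling, here is a small bootstrap sketch. The lecture points to R for this kind of work; Python's standard library is used here purely for illustration, and the data values, the number of resamples, and the percentile method for the interval are all assumptions.

```python
import random
import statistics


def bootstrap_means(data, n_resamples=2000, seed=0):
    """Resample the data with replacement and record the mean of each resample."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)]


def percentile_interval(values, lower=0.025, upper=0.975):
    """Simple percentile interval taken from the sorted resampled statistics."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]


# Hypothetical measurements.
sample = [12.1, 11.8, 12.6, 13.0, 11.5, 12.2, 12.9, 11.9, 12.4, 12.7]
means = bootstrap_means(sample)
low, high = percentile_interval(means)
print(f"bootstrap 95% interval for the mean: ({low:.2f}, {high:.2f})")
```

A jackknife would follow the same pattern but leave one observation out at a time instead of sampling with replacement.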

Principles of Population Genetics Laws of genetics Number of alleles and number of possible genotypes

Populations What is a population? – A group of people sharing common ancestry. – Usually defined broadly Hardy-Weinberg Equilibrium – Within a randomly mating population, the genotype frequencies at any single genetic locus will remain constant. This allows genotype frequencies to be predicted from allele frequencies. (See Punnett Square.) – All human populations deviate (mildly) from HWE and your statistics will require (mild) corrections.

Punnett Square
                 Mother: A (p)    Mother: a (q)
Father: A (p)    AA (p^2)         aA (qp)
Father: a (q)    Aa (pq)          aa (q^2)
Genotype frequencies: AA = p^2, Aa = 2pq, aa = q^2
Note the following:
p + q = 1.0
The fitness of the alleles (A and a) must be equal in the population. This usually is the result of hybrid vigor, where the heterozygote has an advantage over both homozygotes.
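The Punnett square translates directly into a few lines of code. This sketch computes the expected Hardy-Weinberg genotype frequencies for a two-allele locus from the frequency p of allele A; the value 0.7 is hypothetical.

```python
def hwe_genotype_frequencies(p: float) -> dict:
    """Expected genotype frequencies under Hardy-Weinberg equilibrium.

    p is the frequency of allele A; q = 1 - p is the frequency of allele a.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2.0 * p * q, "aa": q * q}


# Hypothetical allele frequency.
freqs = hwe_genotype_frequencies(0.7)
for genotype, frequency in freqs.items():
    print(f"{genotype}: {frequency:.2f}")
```

The three frequencies always sum to 1 because (p + q)^2 = 1.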

Deviations from HWE in Human Populations Finite populations produce random genetic drift—not an issue for populations larger than a small town. Non-random mating is not believed to affect the STR loci. Migration effects disappear over a period of several generations. Natural selection is not believed to affect the STR loci. Mutation rates at ~0.2%/generation are not likely to affect allelic frequencies.

STR Population Database Analyses
Chapter 20 of Butler, 2nd edition.
Population DNA databases
Statistical tests on DNA databases
Practical considerations
“As population databases increase in numbers, virtually all populations will show some statistically significant departures from random mating proportions. Although statistically significant, many of the differences will be small enough to be practically unimportant.”

Creating a Population DNA Database Not for amateurs! Need >100 samples per local population group Often uses anonymous samples from a blood bank—watch for sampling effects Analysis—use appropriate STR kits Determine allele frequencies at each locus—note sampling bias issues. (Suppose there was a large local population that never gave blood samples for religious reasons.) Check HWE. Note the potential existence of non-interbreeding populations Exploratory sampling from existing databases suggests an unexpectedly high probability of random match on 9 or more loci.

Figure 20.1, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
Steps in creating and using a population DNA database:
1. Decide on number of samples and ethnic/racial grouping. Usually >100 per group (see Table 20.1).
2. Gather samples. Get IRB approval. Often anonymous samples from a blood bank.
3. Analyze samples at desired genetic loci. See Chapter 5 (STR kits available) and Chapter 15 (STR typing/interpretation).
4. Summarize DNA types for each ethnic/racial group (Group 1, Group 2).
5. Determine allele frequencies for each locus. See Table 20.2 and Appendix II.
6. Perform statistical tests on data: Hardy-Weinberg equilibrium for allele independence, linkage equilibrium for locus independence, and examination of genetic distance between populations.
7. Use database(s) to estimate an observed DNA profile frequency. See Chapter 21.

Statistical tests on DNA databases There are a number of computer programmes available to evaluate the usefulness of a DNA database. Consider using DNATYPE first of all Need to test for independence of alleles at each genetic locus and between loci Unfortunately, independence testing does not validate the product rule 8( Compare to other population data sets Watch for population substructure

Figure 20.2, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Programmes Available DNATYPE PowerStats GDA GENEPOP DNA-VIEW ARLEQUIN PowerMarker PopStats TFPGA

Practical considerations
Watch these journals for population data:
– For the Record articles in Journal of Forensic Sciences
– Announcements of Population Data in Forensic Science International
Understand the numbers reported.
Understand why the markers in use have been chosen.
Understand what the most common and rarest genotypes are for the DNA markers in use.

Figure 20.3, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

How Statistical Calculations are Made
Generate data with set(s) of samples from desired population group(s)
– Generally only samples are needed to obtain reliable allele frequency estimates
Determine allele frequencies at each locus
– Count number of each allele seen
Allele frequency information is used to estimate the rarity of a particular DNA profile
– Homozygotes (p^2), Heterozygotes (2pq)
– Product rule used (multiply locus frequency estimates)
For more information, see Chapters 20 and 21 in Forensic DNA Typing, 2nd Edition
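The counting and product-rule steps can be sketched as follows. The locus names ("L1", "L2"), the tiny four-person database, and the profile are all hypothetical; a real database would hold far more samples, and the sketch ignores corrections for substructure, relatives, and rare alleles.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Count every allele seen (two per person) and convert counts to frequencies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def genotype_frequency(genotype, freqs):
    """p^2 for a homozygote, 2pq for a heterozygote."""
    a, b = genotype
    if a == b:
        return freqs[a] ** 2
    return 2.0 * freqs[a] * freqs[b]


def profile_frequency(profile, freq_tables):
    """Product rule: multiply the genotype frequency estimates across loci."""
    result = 1.0
    for locus, genotype in profile.items():
        result *= genotype_frequency(genotype, freq_tables[locus])
    return result


# Hypothetical database of four people typed at two loci.
database = {
    "L1": [(11, 12), (11, 11), (12, 13), (11, 12)],
    "L2": [(8, 8), (8, 9), (9, 9), (8, 9)],
}
tables = {locus: allele_frequencies(g) for locus, g in database.items()}
profile = {"L1": (11, 12), "L2": (8, 8)}
estimate = profile_frequency(profile, tables)
print(f"estimated profile frequency: {estimate}")
```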

Assumptions with Hardy-Weinberg Equilibrium
None of these assumptions are really true…
Table 20.6, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Individual Genotypes Are Summarized and Converted into Allele Frequencies
The 11,12 genotype was seen 54 times in 302 samples (604 examined chromosomes)
Table 20.2, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Allele Frequency Tables
D3S1358 allele frequencies for Caucasian and African American samples, with the most common allele marked, from two studies: Butler et al. (2003) JFS 48(4) and Einum et al. (2004) JFS 49(6).
Allele frequencies denoted with an asterisk (*) are below the 5/2N minimum allele threshold recommended by the National Research Council report (NRCII), The Evaluation of Forensic DNA Evidence.
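The 5/2N threshold is simple to compute. The sketch below uses N = 302, the sample count quoted for Table 20.2; the allele labels and frequencies are hypothetical, and raising any frequency below the threshold up to the threshold is shown here as an assumption about how the minimum is applied.

```python
def minimum_allele_frequency(n_individuals: int) -> float:
    """Minimum allele frequency 5/(2N), where 2N is the number of chromosomes."""
    return 5.0 / (2.0 * n_individuals)


def apply_minimum(freqs: dict, n_individuals: int) -> dict:
    """Replace any allele frequency below the threshold with the threshold."""
    floor = minimum_allele_frequency(n_individuals)
    return {allele: max(f, floor) for allele, f in freqs.items()}


floor = minimum_allele_frequency(302)   # 5 / 604 chromosomes
# Hypothetical frequencies; allele "17" is rare enough to fall below the threshold.
adjusted = apply_minimum({"11": 0.30, "12": 0.35, "17": 0.002}, 302)
print(f"threshold = {floor:.5f}, adjusted = {adjusted}")
```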

Frequency Estimates, Likelihood Ratios, and Source Attribution
Chapter 21 of Butler, 2nd edition.
Frequency estimate calculations
Likelihood ratio
Source attribution
Other topics

Frequency Estimate Calculations Work through a frequency estimate calculation. Take a DNA profile and use the allele frequencies in a population database. A random match probability is not the probability that someone is guilty or that someone else left the biological material. Understand how rare alleles and tri-allelic patterns are handled. Understand the product rule Understand the differences between population databases. Understand the impact of population structure Understand the impact of relatives.

Likelihood ratio Practice quantifying the evidentiary value of a match between a reference sample, K, and a questioned sample, Q Explore likelihood ratios.

Source attribution
When p_x is the random match probability for a profile X, (1 - p_x)^N is the probability of not observing the particular profile in a sample of N unrelated individuals.
When this probability is greater than or equal to a confidence level 1 - a, then (1 - p_x)^N >= 1 - a, or p_x <= 1 - (1 - a)^(1/N).
In the American population, a random match probability (RMP) of 3.35 x will confer a 99% confidence that the profile is unique in the population. For the UK, the RMP is 2.01 x
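The threshold p_x <= 1 - (1 - a)^(1/N) can be evaluated directly. In the sketch below the population size of 300 million is an assumption made for illustration (the slide does not state the N it used), and the log1p/expm1 form is used only to avoid rounding error when N is very large.

```python
import math


def uniqueness_threshold(population_size: int, a: float = 0.01) -> float:
    """Largest random match probability p_x with (1 - p_x)^N >= 1 - a.

    Computes 1 - (1 - a)**(1/N) in a numerically stable way.
    """
    return -math.expm1(math.log1p(-a) / population_size)


# Assumed population size, for illustration only.
N = 300_000_000
p_max = uniqueness_threshold(N, a=0.01)   # 99% confidence
print(f"RMP must be at most {p_max:.3e}")

# Check: the probability of not seeing the profile in N people is 1 - a.
not_observed = math.exp(N * math.log1p(-p_max))
```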

Other topics DNA database searches—multiply the RMP by the number of persons in the database to adjust for the possibility of matching that many people. The more persons in your database, the more likely the match will occur randomly. For lineage markers—use the count of the profile in the database as an estimate of its underlying probability in the population and do a frequency estimate with a confidence interval based on that.
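Both adjustments are easy to sketch. The first function multiplies the RMP by the database size (capped at 1); the second estimates a lineage-marker profile frequency from its count in the database with a normal-approximation 95% interval. The normal approximation is one common choice and an assumption here, as are all the input numbers.

```python
import math


def database_match_probability(rmp: float, database_size: int) -> float:
    """RMP multiplied by the number of people searched, capped at 1."""
    return min(1.0, rmp * database_size)


def counting_method_interval(count: int, database_size: int, z: float = 1.96):
    """Frequency estimate count/N with a normal-approximation interval."""
    p = count / database_size
    half_width = z * math.sqrt(p * (1.0 - p) / database_size)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical inputs.
adjusted = database_match_probability(rmp=1e-9, database_size=1_000_000)
low, high = counting_method_interval(count=2, database_size=1000)
print(f"adjusted match probability: {adjusted:.1e}")
print(f"haplotype frequency interval: ({low:.4f}, {high:.4f})")
```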

DNA Profile Frequency with all 13 CODIS STR loci
AmpFlSTR® Identifiler™ (Applied Biosystems); loci shown: AMEL, D3, TH01, TPOX, D2, D19, FGA, D21, D18, CSF, D16, D7, D13, D5, VWA, D8
Locus: genotype
D3: 16,17
VWA: 17,18
FGA: 21,22
D8: 12,14
D21: 28,30
D18: 14,16
D5: 12,13
D13: 11,14
D7: 9,9
D16: 9,11
TH01: 6,6
TPOX: 8,8
CSF1PO: 10,10
The per-locus frequency estimates are combined using the product rule.
The Random Match Probability for this profile in the U.S. Caucasian population is 1 in 837 trillion (10^12)
What would be entered into a DNA database for searching: 16,17-17,18-21,22-12,14-28,30-14,16-12,13-11,14-9,9-9,11-6,6-8,8-10,10

The Same 13 Locus STR Profile in Different Populations
1 in 0.84 quadrillion (10^15) in U.S. Caucasian population (NIST), i.e. 1 in 837 trillion
1 in 2.46 quadrillion (10^15) in U.S. Caucasian population (FBI)*
1 in 1.86 quadrillion (10^15) in Canadian Caucasian population*
1 in 16.6 quadrillion (10^15) in African American population (NIST)
1 in 17.6 quadrillion (10^15) in African American population (FBI)*
1 in 18.0 quadrillion (10^15) in U.S. Hispanic population (NIST)
These values are for unrelated individuals assuming no population substructure (using only p^2 and 2pq)
NIST study: Butler, J.M., et al. (2003) Allele frequencies for 15 autosomal STR loci on U.S. Caucasian, African American, and Hispanic populations. J. Forensic Sci. 48(4).

STR Cumulative Profile Frequency with Multiple Population Databases
D.N.A. Box 21.1, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Example Calculations with Population Substructure Adjustments

Example Calculations with Corrections for Relatives

Statistical Analysis of Mixtures and Degraded DNA
Chapter 22 of Butler, 2nd edition.
Mixture interpretation
Partial DNA profiles

Mixture interpretation
This is nasty, but any truth is better than indefinite doubt.
The most conservative approach is to judge whether the suspect might be represented by the mixture found in the sample.
Sometimes you can pull apart the alleles, one known person at a time. (In radar/sonar, this is known as adaptive beam forming.) Duplicate alleles among the persons in the mixture are then a problem.
When contributions of donors are about equal, you have a serious problem.

Figure 22.1, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
D13S317 evidence shows possible alleles A1, A2, A3 (11, 12, and 14).
Suspect included: genotypes 11,14; 11,12; 11,11.
Suspect excluded: genotype 11,13. Allele 13 is not possible based on evidence showing only alleles 11, 12, and 14.

Exclusion Probabilities
Use the combined probability of exclusion (CPE). This is an estimate of the proportion of the population that has at least one allele not observed in the mixture.
The CPE assumes independence between loci: multiply the included population proportion (1 - PE) across the loci and subtract the product from 1.
Vulnerable to non-detection of alleles.
Provides a conservative estimate.

Figure 22.2, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
D13S317 evidence with possible alleles A1, A2, A3 (alleles 11, 12, and 14; allele frequencies from Table 20.2)
Let p = f(A1) + f(A2) + f(A3) = f(11) + f(12) + f(14), with q = 1 - p
PE = 2pq + q^2
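A short sketch of the probability of exclusion calculation, PE = 2pq + q^2 with p the summed frequency of the alleles seen in the evidence. The allele frequencies below are hypothetical, not the Table 20.2 values. Loci are combined here as 1 minus the product of the per-locus inclusion proportions (1 - PE), which is the usual way of applying the independence assumption.

```python
def probability_of_exclusion(observed_allele_freqs):
    """PE = 2pq + q^2, where p is the total frequency of the observed alleles."""
    p = sum(observed_allele_freqs)
    q = 1.0 - p
    return 2.0 * p * q + q * q


def combined_probability_of_exclusion(per_locus_pe):
    """CPE = 1 - product over loci of (1 - PE), assuming independent loci."""
    included = 1.0
    for pe in per_locus_pe:
        included *= 1.0 - pe
    return 1.0 - included


# Hypothetical frequencies for the alleles observed at two loci.
pe_locus_1 = probability_of_exclusion([0.3, 0.3, 0.1])   # p = 0.7
pe_locus_2 = probability_of_exclusion([0.2, 0.3])        # p = 0.5
cpe = combined_probability_of_exclusion([pe_locus_1, pe_locus_2])
print(f"PE1 = {pe_locus_1:.2f}, PE2 = {pe_locus_2:.2f}, CPE = {cpe:.4f}")
```

Since 2pq + q^2 = 1 - p^2, each PE is simply the chance that a random person carries at least one allele outside the observed set.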

Likelihood Ratio
Set up two competing hypotheses.
The problem is that defining the hypotheses is not straightforward.
Uses the evidence better than the exclusion method.

Mixtures
Complicated to interpret.
Basic approach is to identify the alleles from known contributors. Any detected alleles outside that set had to come from unknowns (one or more…)
When the mixture results are affected by low-copy-number stochastic limits, degradation, or PCR inhibition, so that alleles are missing, all bets are off.

Figure 22.3 (FGA), J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Partial DNA profiles Only loci with results can be interpreted. Degraded samples or low-copy-number samples will cause PCR to fail. Interpret only the detected alleles Any data are better than none at all.

Kinship and Parentage Testing
Chapter 23 of Butler, 2nd edition.
When DNA samples being compared are from related individuals, the assumption of independence is violated, and different statistical equations must be applied.
Parentage testing
– Statistical calculations
– Impact of mutational events
– Reference samples
Reverse parentage testing
– Data from both parents are often not available

Figure 23.1, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
Rules of Inheritance
1) Child has two alleles for each autosomal marker (one from mother and one from biological father)
2) Child will have mother’s mitochondrial DNA haplotype (barring mutation)
3) Child, if a son, will have father’s Y-chromosome haplotype (barring mutation)
(A) Parentage (Paternity) Testing: child, mother (known parent), alleged father, random man
(B) Reverse Parentage Testing (Missing Persons Investigation): missing child, alleged mother, alleged father

Figure 23.2, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press
(A) Mendelian Inheritance: mother, father, and child genotypes (A,B; C,D; B,C), showing the obligate paternal allele
(B) Example: parental genotypes 11,14 and 8,12; child genotypes 11,12; 8,14; 12,14; 8,11
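The first rule of inheritance (one allele from each parent) can be sketched in code. Which parent carries 8,12 and which carries 11,14 is an assumption made here for illustration, and the function names are hypothetical; mutation is ignored.

```python
from itertools import product


def possible_child_genotypes(mother, father):
    """All genotypes formed by taking one allele from each parent."""
    return {tuple(sorted(pair)) for pair in product(mother, father)}


def obligate_paternal_alleles(mother, child):
    """Alleles in the child that must have come from the biological father.

    If one child allele can be explained by the mother, the other one is
    the candidate paternal allele. An empty set means the mother cannot
    explain either allele.
    """
    a, b = child
    candidates = set()
    if a in mother:
        candidates.add(b)
    if b in mother:
        candidates.add(a)
    return candidates


# Assumed assignment of the example genotypes to the parents.
mother = (8, 12)
father = (11, 14)
children = possible_child_genotypes(mother, father)
paternal = obligate_paternal_alleles(mother, (12, 14))
print(sorted(children), paternal)
```

An alleged father who carries none of the obligate paternal alleles at a locus is inconsistent with paternity at that locus, barring mutation.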

Conclusions
Unfortunately, you’re likely to be the expert.
If you have the opportunity, study this on your own or do a forensics qualification (post-graduate or subject-area).
You know where to find help.
– Michael Oakes
– Peter Dunne
– Malcolm Farrow
Be honest about your level of skill.
More statistics won’t hurt you.