1 Random Match Probability Statistics: From single source to three-person mixtures with allelic drop out
2 Statistics
"There are three kinds of lies: lies, damned lies, and statistics." - Benjamin Disraeli, British Prime Minister, as popularized by Mark Twain
18.7% of all statistics are made up.
My introduction to forensic statistics... it had been a loooooong time since sophomore genetics.
3 Heterozygote
Alleles P and Q: the genotype could be PQ or it could be QP.
So the frequency is $2pq$, where $p$ is the frequency of P and $q$ is the frequency of Q.
If $p = 0.2$ and $q = 0.15$, then $2(0.2)(0.15) = 0.06$.
Most of us understood this pretty quickly.
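As a minimal Python sketch of the heterozygote calculation (the function name `heterozygote_freq` is our own illustration, not something from the presentation), the slide's example works out like this:

```python
def heterozygote_freq(p: float, q: float) -> float:
    """Frequency of heterozygote PQ under Hardy-Weinberg: 2pq.

    The factor of 2 covers both orderings (P from one parent and Q from
    the other, or the reverse).
    """
    return 2 * p * q


# The slide's example: p = 0.2, q = 0.15
print(round(heterozygote_freq(0.2, 0.15), 4))  # 0.06
```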
4 Homozygote
Allele P, above the stochastic threshold.
So the frequency is $p \times p$, or $p^2$.
But there's that $\theta$ business.
Most of us understood this pretty quickly too.
5 Homozygote
You don't use $p^2$; you use $p^2 + p(1-p)\theta$.
I understood that you don't use $p^2$. I didn't understand where $\theta$ came from.
"It's the inbreeding coefficient."
6 Homozygote
OK, but where did $p^2 + p(1-p)\theta$ come from?
"It's the correction factor for inbreeding." Not so helpful.
Why isn't it just $p^2 - \theta$?
7 Homozygote
We start with what we thought: $p^2$.
But some percentage is from inbreeding: $\theta p$.
Correct $p^2$ for that amount of inbreeding: $(1-\theta)p^2$.
Combine them: $\theta p + (1-\theta)p^2$.
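8 Homozygote
Now it's algebra:
$$
\begin{aligned}
&\theta p + (1-\theta)p^2 &&\text{(inbred } p \text{ + non-inbred } p^2\text{)}\\
&\theta p + p^2 - \theta p^2 &&\text{(expand the terms)}\\
&p^2 + \theta p - \theta p^2 &&\text{(we like to see the } p^2 \text{ term first)}\\
&p^2 + p(\theta - \theta p) &&\text{(pull out } p\text{)}\\
&p^2 + p(1-p)\theta &&\text{(pull out } \theta \text{ to get the final form)}
\end{aligned}
$$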
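The algebra can also be checked mechanically. This sketch (function names are ours; the $\theta$ values 0, 0.01 and 0.03 are illustrative assumptions, not values taken from the slides) uses exact fractions to confirm that the starting form $\theta p + (1-\theta)p^2$ and the final form $p^2 + p(1-p)\theta$ agree:

```python
from fractions import Fraction


def homozygote_freq(p, theta):
    """Homozygote PP frequency with the theta correction: p^2 + p(1-p)*theta."""
    return p * p + p * (1 - p) * theta


def homozygote_starting_form(p, theta):
    """The starting form of the derivation: theta*p (inbred) + (1-theta)*p^2."""
    return theta * p + (1 - theta) * p * p


# Exact rational arithmetic, so equality is not blurred by floating-point rounding.
for p in (Fraction(1, 10), Fraction(3054, 10000), Fraction(1, 2)):
    for theta in (Fraction(0), Fraction(1, 100), Fraction(3, 100)):
        assert homozygote_freq(p, theta) == homozygote_starting_form(p, theta)

# With theta = 0 the correction vanishes and we are back to plain p^2.
print(homozygote_freq(Fraction(1, 5), 0))  # 1/25
print(float(homozygote_freq(Fraction(1, 5), Fraction(1, 100))))  # 0.0416
```

The $\theta = 0$ case is a useful sanity check: the formula collapses to the Hardy-Weinberg $p^2$.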
9 Single source stat
Do the $2pq$ calculation at each heterozygous locus.
Do $p^2 + p(1-p)\theta$ at each homozygous locus.
Then multiply the results for all loci.
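A sketch of the single source stat under stated assumptions: the locus names `LocusA`/`LocusB`, the allele frequencies, and $\theta = 0.01$ are all hypothetical, and the helper names are ours. It applies $2pq$ or $p^2 + p(1-p)\theta$ per locus, then multiplies across loci:

```python
from math import prod


def locus_freq(genotype, freqs, theta):
    """Genotype frequency at one locus: 2pq for heterozygotes, p^2 + p(1-p)theta for homozygotes."""
    a, b = genotype
    p = freqs[a]
    if a == b:
        return p * p + p * (1 - p) * theta
    return 2 * p * freqs[b]


def single_source_rmp(profile, freq_table, theta):
    """Multiply the per-locus genotype frequencies across all loci (product rule)."""
    return prod(locus_freq(genotype, freq_table[locus], theta)
                for locus, genotype in profile.items())


# Hypothetical loci and allele frequencies, for illustration only.
freq_table = {
    "LocusA": {"13": 0.3, "14": 0.2},
    "LocusB": {"9.3": 0.3},
}
profile = {"LocusA": ("13", "14"), "LocusB": ("9.3", "9.3")}

rmp = single_source_rmp(profile, freq_table, theta=0.01)
print(f"RMP = {rmp:.6f} (about 1 in {1 / rmp:.0f})")
```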
10 Partial single source stat
What if you don't detect everything from a single contributor?
The profile is consistent with one contributor, but it is obvious there is a lot of drop out.
11 Partial single source stat
[Figure: partial profile with loci annotated "No result", "Drop out??", and "Drop out"]
12 With a sample like this, would you:
- Inconclusive data
- Exclude only
- Exclude or "inc a person"
- Exclude/include, no stat
- Exclude/include, stat for 2-allele loci
- Exclude/include for all loci with something detected
- Other
13 Partial single source stat
Heterozygous loci are still $2pq$.
14 Partial single source stat
What about loci that you don't know about?
15 Partial single source stat
Any person who has a 9.3 could be the source.
How do you calculate "9.3, Any"?
16 Partial single source stat
The 9.3 could be a homozygote, so $p^2 + p(1-p)\theta$ covers that.
But the 9.3 could be a heterozygote with any other allele, so $2pq$. But what is $q$?
17 Partial single source stat
You could go to the ladder and compute $2pq$ with $p = f_{9.3}$ for each possible $q$:
$q = f_4$, so $2(f_{9.3})(f_4)$
$q = f_5$, so $2(f_{9.3})(f_5)$
$q = f_6$, so $2(f_{9.3})(f_6)$
...
$q = f_{13.3}$, so $2(f_{9.3})(f_{13.3})$
Then add them up.
But what about off-ladder alleles, microvariants, etc.? How do you do $2pq$ for those?
18 Partial single source stat
Instead: if $p$ is what you see (or detect), then $q$ must be what you don't see (or detect).
Since this is a binary system:
(what you see/detect) + (what you don't) = 1.0
(what you don't see) = 1 - (what you see/detect)
So $q = 1 - p$.
Therefore $2pq$ becomes $2p(1-p)$.
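To see why the shortcut works, this sketch compares the "go to the ladder" sum with $2p(1-p)$. The allele list and frequencies are hypothetical, and the `"other"` bucket is our stand-in for off-ladder alleles and microvariants. The two agree only because the frequencies sum to 1.0, which is exactly the binary "see / don't see" argument:

```python
import math

# Hypothetical allele frequencies at one locus. "other" stands in for
# off-ladder alleles and microvariants; the frequencies must sum to 1.0.
freqs = {"6": 0.23, "7": 0.19, "8": 0.12, "9": 0.15, "9.3": 0.30, "other": 0.01}


def het_any_by_ladder(allele, freqs):
    """Add up 2pq for every other allele on the list (the 'go to the ladder' approach)."""
    p = freqs[allele]
    return sum(2 * p * q for other, q in freqs.items() if other != allele)


def het_any_shortcut(p):
    """The binary shortcut: q = 1 - p, so the sum of 2pq becomes 2p(1 - p)."""
    return 2 * p * (1 - p)


assert math.isclose(sum(freqs.values()), 1.0)
print(het_any_by_ladder("9.3", freqs), het_any_shortcut(freqs["9.3"]))
```

The shortcut needs only $p$, so it never has to enumerate alleles you may not have a frequency for.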
19 Partial single source stat
Now just combine the homozygote and heterozygote options ($p = f_{9.3}$):
$[p^2 + p(1-p)\theta] + [2p(1-p)]$ for anyone with a 9.3
20 Partial single source stat
What about loci that look like homozygotes?
Use your PHR and stochastic threshold studies.
If you treat a locus as a homozygote, you had better be above your stochastic threshold.
When in doubt, use "Allele, Any"; you're covered.
At USACIL, "Allele, Any" = "modified" RMP.
21 Partial single source stat
The "2p" rule
Section – SWGDAM
"For single-allele profiles where the zygosity is in question (e.g., it falls below the stochastic threshold): The formula $2p$, as described in recommendation 4.1 of NRC II, may be applied to this result. Instead of using $2p$, the algebraically identical formulae $2p - p^2$ and $p^2 + 2p(1-p)$ may be used to address this situation without double-counting the proportion of homozygotes in the population."
22 Partial single source stat
$2p$ is an extremely conservative approximation.
There is a better way: $2p - p^2$ or $p^2 + 2p(1-p)$.
But this is even better: $p^2 + p(1-p)\theta + 2p(1-p)$
(computers can calculate anything)
23 Partial single source stat
"Algebraically identical formulae", with $f_{9.3} = 0.3054$:
$2p - p^2 = 2(0.3054) - (0.3054)^2$
$p^2 + 2p(1-p) = (0.3054)^2 + 2(0.3054)(0.6946)$
24 Partial single source stat
So for "9.3, Any":
$2p =$
$2p - p^2 =$
$p^2 + 2p(1-p) =$
$p^2 + p(1-p)\theta + 2p(1-p) =$
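The four "9.3, Any" options can be computed side by side. This is a sketch: $f_{9.3} = 0.3054$ comes from the slide, but $\theta = 0.01$ is an assumed value for illustration (use your own lab's $\theta$), and `allele_any_options` is a hypothetical helper name:

```python
def allele_any_options(p, theta):
    """The four 'Allele, Any' formulas from the slides, for an allele with frequency p."""
    het_any = 2 * p * (1 - p)          # heterozygote with any other allele
    return {
        "2p": 2 * p,
        "2p - p^2": 2 * p - p * p,
        "p^2 + 2p(1-p)": p * p + het_any,
        "p^2 + p(1-p)theta + 2p(1-p)": p * p + p * (1 - p) * theta + het_any,
    }


f_9_3 = 0.3054   # frequency of 9.3 from the slide
THETA = 0.01     # assumed value for illustration; use your lab's theta
for name, value in allele_any_options(f_9_3, THETA).items():
    print(f"{name:<30} {value:.4f}")
```

With these inputs, $2p$ comes out at about 0.611 while $2p - p^2$ and $p^2 + 2p(1-p)$ both come out at about 0.518. The gap is exactly $p^2$: the homozygote share that $2p$ counts twice.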
26 When the minor is probative, would you:
- Inconclusive data
- Exclude only
- Exclude or "inc a person"
- Exclude/include, no stat
- Exclude/include, stat for some allele loci
- Exclude/include for all loci
- Other
27 Minor contributor stat
For our purposes, it is an intimate sample from a known female contributor.
The female is the major contributor.
The major would have a single source stat, but it isn't probative.
Focus on the minor (or foreign) contributor.
28 Minor contributor stat
Situations you need to be able to calculate:
- When you know the minor type
- When you are concerned about drop out
- When you are not concerned about drop out, but you don't know the minor type (masking/sharing)
- When you do not see any minor alleles, but still think the minor contributor is represented
We haven't discussed the last two yet.
29 Minor contributor stat
When you know the minor type, use $2pq$:
10, 11: $2(f_{10})(f_{11})$
6, 9.3: $2(f_6)(f_{9.3})$
30 Minor contributor stat
When you are concerned about drop out:
24, Any: $p^2 + p(1-p)\theta + 2p(1-p) = (f_{24})^2 + f_{24}(1-f_{24})\theta + 2f_{24}(1-f_{24})$
31 Minor contributor stat
When you are not concerned about drop out, but don't know the minor type:
What types are possible? 9, 9; 8, 9; 9, 11
"Combo stat"
32 Minor contributor stat
"Combo stat" (9 is above the stochastic threshold)
Possible types: 9, 9; 8, 9; 9, 11. Add them up, with $p = f_9$, $q = f_8$, $r = f_{11}$:
$p^2 + p(1-p)\theta + 2pq + 2pr$
$(f_9)^2 + f_9(1-f_9)\theta + 2(f_8)(f_9) + 2(f_9)(f_{11})$
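The "combo stat" is just a sum of genotype frequencies over the types the minor could have. A minimal sketch, assuming hypothetical frequencies $f_8 = 0.1$, $f_9 = 0.2$, $f_{11} = 0.3$ and an assumed $\theta = 0.01$; the helper names are ours:

```python
def genotype_freq(a, b, freqs, theta):
    """2pq for a heterozygote, p^2 + p(1-p)theta for a homozygote."""
    p = freqs[a]
    if a == b:
        return p * p + p * (1 - p) * theta
    return 2 * p * freqs[b]


def combo_stat(genotypes, freqs, theta):
    """Sum of genotype frequencies for all genotypes the minor contributor could have."""
    return sum(genotype_freq(a, b, freqs, theta) for a, b in genotypes)


freqs = {"8": 0.1, "9": 0.2, "11": 0.3}   # hypothetical frequencies
possible = [("9", "9"), ("8", "9"), ("9", "11")]
print(round(combo_stat(possible, freqs, theta=0.01), 6))
```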
33 Minor contributor stat
Section SWGDAM
"When the interpretation is conditioned upon the assumption of a particular number of contributors greater than one, the RMP is the sum of the individual frequencies for the genotypes included following a mixture deconvolution. Examples are provided below. In a sperm fraction mixture (at a locus having alleles P, Q, and R) assumed to be from two contributors, one of whom is the victim (having genotype QR), the sperm contributor genotypes included post-deconvolution might be PP, PQ, and PR. In this case, the RMP for the sperm DNA contributor could be calculated as $[p^2 + p(1-p)\theta] + 2pq + 2pr$."
35 Minor contributor stat
No minor alleles are present, but you know the minor is contributing: every other locus has minor alleles.
Did the enzyme just get lazy?
"Just inc the locus for stats."
That doesn't make any more sense than throwing out any other locus.
You just need the right calculator.
36 Minor contributor stat
Two scenarios to consider:
- No stochastic concerns
- Stochastic concerns
Two slightly different stats, but we can deal with both.
37 Minor contributor stat
No stochastic concerns
In some cases, PHR and P may help: 17, 17 or possibly 16, 17, but maybe not 16, 16.
But you know the minor must be 16, 16; 16, 17; or 17, 17. With $p = f_{16}$ and $q = f_{17}$:
$p^2 + p(1-p)\theta + 2pq + q^2 + q(1-q)\theta$
This is the "combo" stat.
38 Minor contributor stat
A couple more definitions:
"Unrestricted" RMP: the "combo" stat where we used all possibilities (16, 16 and 16, 17 and 17, 17 from the previous slide).
"Restricted" RMP: the "combo" stat where we chose not to use one (or more) possible types based on what fits peak heights, peak height ratios, or proportions of contributors (17, 17 or 16, 17, but not 16, 16 from the previous slide).
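Unrestricted versus restricted is simply a choice of which genotypes go into the sum. A sketch, assuming hypothetical frequencies $f_{16} = 0.25$, $f_{17} = 0.20$ and an assumed $\theta = 0.01$ (helper names are ours):

```python
def genotype_freq(a, b, freqs, theta):
    """2pq for a heterozygote, p^2 + p(1-p)theta for a homozygote."""
    p = freqs[a]
    if a == b:
        return p * p + p * (1 - p) * theta
    return 2 * p * freqs[b]


def combo_stat(genotypes, freqs, theta):
    """Sum the frequencies of the genotypes kept for the minor contributor."""
    return sum(genotype_freq(a, b, freqs, theta) for a, b in genotypes)


freqs = {"16": 0.25, "17": 0.20}   # hypothetical frequencies
THETA = 0.01                       # assumed value for illustration

# Unrestricted: every type the minor could be.
unrestricted = combo_stat([("16", "16"), ("16", "17"), ("17", "17")], freqs, THETA)
# Restricted: 16, 16 dropped because it does not fit the peak height data.
restricted = combo_stat([("16", "17"), ("17", "17")], freqs, THETA)
print(f"unrestricted = {unrestricted:.6f}, restricted = {restricted:.6f}")
```

Dropping a type can only make the sum smaller, so the restricted RMP is never larger than the unrestricted one; the decision to drop must come from the peak height evidence, not the math.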
39 Minor contributor stat
What if there are stochastic concerns?
You would take anyone with 16, Any or 17, Any.
But that counts the 16, 17 twice, so subtract 16, 17, but only once!
$[p^2 + p(1-p)\theta + 2p(1-p)] + [q^2 + q(1-q)\theta + 2q(1-q)] - 2pq$
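The "subtract 16, 17 once" step can be verified by brute force. This sketch uses a hypothetical five-allele locus whose frequencies sum to 1.0, an assumed $\theta = 0.01$, and helper names of our own. It lists every genotype carrying a 16 or a 17 and compares the total with the formula:

```python
from itertools import combinations_with_replacement


def allele_any(p, theta):
    """p^2 + p(1-p)theta + 2p(1-p): everyone carrying the allele."""
    return p * p + p * (1 - p) * theta + 2 * p * (1 - p)


def double_any(p, q, theta):
    """16, Any plus 17, Any, minus the 16,17 heterozygote that was counted twice."""
    return allele_any(p, theta) + allele_any(q, theta) - 2 * p * q


def genotype_freq(a, b, freqs, theta):
    p = freqs[a]
    if a == b:
        return p * p + p * (1 - p) * theta
    return 2 * p * freqs[b]


def brute_force(targets, freqs, theta):
    """Check by listing every genotype that carries at least one target allele."""
    return sum(genotype_freq(a, b, freqs, theta)
               for a, b in combinations_with_replacement(freqs, 2)
               if a in targets or b in targets)


# Hypothetical locus; the frequencies must sum to 1.0 for the check to match.
freqs = {"15": 0.20, "16": 0.25, "17": 0.20, "18": 0.25, "19": 0.10}
print(double_any(freqs["16"], freqs["17"], 0.01))
print(brute_force({"16", "17"}, freqs, 0.01))
```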
40 Modified random match probability
Let's look at this "double any" calculation.
Simplify by removing $\theta$:
$[p^2 + p(1-p)\theta + 2p(1-p)] + [q^2 + q(1-q)\theta + 2q(1-q)] - 2pq$
becomes
$p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq$
This is the basis for dealing with any number of "Allele, Any" contributors.
USACIL calls this a modified RMP because "Anys" are involved.
41 Modified random match probability
Let's say we've got a two-contributor mixture with signs that both contributors are having stochastic issues.
But what you see is consistent with two contributors.
Remember "Take a stand on the stand...": validation studies, interpretation guidelines, your experience, and Tech Review all agree.
42 Modified random match probability
We'll start with this same pattern, but with stochastic concerns.
Homozygote threshold, mixture interpretation threshold, stochastic threshold, drop out threshold: let's just call it the "Danger Zone".
Why do I always think of "Top Gun" when I have low peak heights?
[Figure: peaks at alleles 16 and 17, labeled 230 and 260]
(We're not suggesting that you MUST do this, only that you can calculate it.)
43 Modified random match probability
Remember the "Allele, Any": $2pq = 2p(1-p)$ = 2 × (what you do see) × (what you don't see).
(We used it for a single allele below the stochastic threshold for a partial or minor contributor.)
Because we have two contributors: 16, Any; 17, Any; or both.
44 Modified random match probability
Also, remember the "combo stat" for the combinations you can see: $p^2 + 2pq + q^2$
We'll rearrange this in a minute.
45 Modified random match probability
"Allele, Any" for $p$ (16): 2 × (what you see) × (what you don't) = $2p(1-?)$
You "see" two alleles now: both $p$ and $q$ (16 and 17).
Stick with "1 - what you see" for what you don't see:
$2p(1-(p+q))$ for $p$ (16)
Same thing for $q$ (17): $2q(1-(p+q))$
46 Modified random match probability
So, the obvious combinations:
"Combo" for the visible alleles: $p^2 + 2pq + q^2$
The "Allele, Any" combinations: "Allele, Any" for the 16, $2p(1-(p+q))$; "Allele, Any" for the 17, $2q(1-(p+q))$
Add them up: $p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))$
47 Modified random match probability
Here is the formula for multiple "Allele, Any":
$p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))$
Now we rearrange that first part:
$p^2 + 2pq + q^2 = (p+q)(p+q) = (p+q)^2$
That last line should look familiar.
48 Modified random match probability
Remember back in the good old days? The CPI stat:
For two alleles: $(p+q)^2$
For three alleles: $(p+q+r)^2$
...
For nine alleles: $(p+q+r+s+t+u+v+w+x)^2$
49 Modified random match probability
Two ways to think about "Allele, Any":
The way we derived it for that minor contributor (remember we dropped $\theta$ for this one):
$[p^2 + 2p(1-p)] + [q^2 + 2q(1-q)] - 2pq$
The way that works for as many contributors as we may need (CPI math is the foundation for this one, and it doesn't use $\theta$):
$(p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))$
They are equivalent.
50 Modified random match probability
Expand this one ("double" Allele, Any, the one with the duplicate):
$p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq$
To get:
$p^2 + 2p - 2p^2 + q^2 + 2q - 2q^2 - 2pq$
Rearrange the terms:
$p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$
51 Modified random match probability
Now expand the other one (multiple Allele, Any):
$(p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))$
To get:
$p^2 + 2pq + q^2 + 2p - 2p^2 - 2pq + 2q - 2q^2 - 2pq$
Rearrange the terms:
$p^2 + q^2 + 2p + 2q + 2pq - 2pq - 2pq - 2p^2 - 2q^2$
Condense the $2pq$ terms:
$p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$
52 Modified random match probability
Now compare them:
This was the "single source" one (2 slides ago): $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$
This is the "generic" form for multiple contributors (previous slide): $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$
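The hand expansion can be double-checked with exact arithmetic. This sketch (function names are ours) evaluates the "double Allele, Any" form with $\theta$ dropped, the CPI-based "multiple Allele, Any" form, and the common expanded form on a grid of hypothetical frequencies with $p + q \le 1$:

```python
from fractions import Fraction


def double_any_no_theta(p, q):
    """The minor-contributor form with theta dropped: [p^2 + 2p(1-p)] + [q^2 + 2q(1-q)] - 2pq."""
    return (p * p + 2 * p * (1 - p)) + (q * q + 2 * q * (1 - q)) - 2 * p * q


def multiple_any(p, q):
    """The CPI-based form: (p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))."""
    s = p + q
    return s * s + 2 * p * (1 - s) + 2 * q * (1 - s)


def expanded(p, q):
    """The common expanded form: p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2."""
    return p * p + q * q + 2 * p + 2 * q - 2 * p * q - 2 * p * p - 2 * q * q


# Exact fractions on a grid of frequencies (p + q <= 1).
grid = [Fraction(n, 20) for n in range(21)]
for p in grid:
    for q in grid:
        if p + q <= 1:
            assert double_any_no_theta(p, q) == multiple_any(p, q) == expanded(p, q)
print("all three forms agree")
```

A grid check is not a proof, but it catches sign slips in the expansion quickly.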
53 Modified random match probability
Section SWGDAM
"In a mixture having at a locus alleles P, Q, and R, assumed to be from two contributors, where all three alleles are below the stochastic threshold, the interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case, the RMP must account for all heterozygotes and homozygotes represented by these three alleles, but also all heterozygotes that include one of the detected alleles. The RMP for this interpretation could be calculated as $(2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr$. Since $2p$ includes $2pq$ and $2pr$, $2q$ includes $2pq$ and $2qr$, and $2r$ includes $2pr$ and $2rq$, the formula in subtracts $2pq$, $2pr$, and $2qr$ to avoid double-counting these genotype frequencies."
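SWGDAM's three-allele formula and the generic "multiple Allele, Any" form should give the same number. This sketch checks that with exact fractions; the frequencies 0.10, 0.20 and 0.15 are hypothetical, and the function names are ours. Both expressions reduce to $2S - S^2$, where $S$ is the sum of the detected allele frequencies:

```python
from fractions import Fraction


def swgdam_three_allele(p, q, r):
    """SWGDAM's example: (2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr."""
    return ((2 * p - p * p) + (2 * q - q * q) + (2 * r - r * r)
            - 2 * p * q - 2 * p * r - 2 * q * r)


def modified_rmp(freqs):
    """Generic 'multiple Allele, Any': (sum)^2 plus 2a(1 - sum) for every detected allele a."""
    s = sum(freqs)
    return s * s + sum(2 * a * (1 - s) for a in freqs)


p, q, r = Fraction(1, 10), Fraction(1, 5), Fraction(3, 20)   # hypothetical frequencies
assert swgdam_three_allele(p, q, r) == modified_rmp([p, q, r])
print(float(modified_rmp([p, q, r])))
```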
54 Modified random match probability
To use RMP, you must state the number of contributors: validation studies, experience, yadda, yadda.
Now that we know how to deal with drop out via "Allele, Any", we can use RMP more often.
Modified RMP ("modified" denotes "Anys"): this is the language we use at our lab.
55 CPI compared to RMP
But CPI is NOT the same as RMP.
CPI is used when you are unsure about the number of contributors.
Consequently, you have problems when you have alleles in the stochastic range (the "Danger Zone"): if you don't know how many contributors you have, you don't know how many alleles are missing.
56 CPI compared to RMP
But we can use the CPI math in our RMP stat.
We must make two changes to the "base" CPI formula that we use in the RMP:
1. We must correct for situations that change the number of contributors.
2. We must account for allelic drop out.
We've been through the second, so let's deal with the first.
57 CPI compared to RMP
Consider a four-allele pattern. We interpret the overall profile as having two contributors.
CPI considers all possible "visible" combinations of contributors: $(p+q+r+s)^2$
This includes P, P; Q, Q; R, R; and S, S types.
58 CPI compared to RMP
But if you think you could have a P, P contributor, that leaves three alleles unaccounted for.
We stated that there were only 2 contributors.
If Contributor #1 is P, P, Contributor #2 cannot account for the Q, R, and S alleles.
Having a homozygote changes the assumption of the number of contributors.
59 CPI compared to RMP
So all we need to do is subtract the homozygotes, but only when the presence of a homozygote changes the number of contributors:
- 2 contributors and 4 alleles detected
- 3 contributors and 6 alleles detected
60 CPI compared to RMP
Easy to do with a friendly computer.
USACIL defines this as an "Unrestricted" RMP.
We kind of think of it as a CPI stat corrected for a defined number of contributors:
$(p+q+r+s)^2 - p^2 - q^2 - r^2 - s^2$
$(p+q+r+s+t+u)^2 - p^2 - q^2 - r^2 - s^2 - t^2 - u^2$
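With a "friendly computer", the correction is a one-line subtraction. A minimal sketch, assuming hypothetical P, Q, R, S frequencies; `cpi` and `unrestricted_rmp` are our own names. Following the slides, homozygotes are subtracted only when the allele count equals twice the number of contributors:

```python
import math
from itertools import combinations


def cpi(freqs):
    """Combined probability of inclusion: (sum of detected allele frequencies)^2."""
    s = sum(freqs)
    return s * s


def unrestricted_rmp(freqs, n_contributors):
    """CPI with homozygotes subtracted when they would change the number of contributors.

    Per the slides, that happens when the allele count equals 2 x contributors
    (2 people and 4 alleles, 3 people and 6 alleles).
    """
    total = cpi(freqs)
    if len(freqs) == 2 * n_contributors:
        total -= sum(a * a for a in freqs)
    return total


freqs = [0.10, 0.20, 0.15, 0.05]       # hypothetical P, Q, R, S frequencies
print(cpi(freqs), unrestricted_rmp(freqs, n_contributors=2))

# Cross-check: with 4 alleles and 2 people, only the heterozygotes remain.
hets_only = sum(2 * a * b for a, b in combinations(freqs, 2))
assert math.isclose(unrestricted_rmp(freqs, 2), hets_only)
```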
61 Unrestricted RMP
Section 188.8.131.52 - SWGDAM
"The unrestricted RMP might be calculated for mixtures that display no indications of allelic dropout. The formulae include an assumption of the number of contributors, but relative peak height information is not utilized. For two-person mixtures, the formulae for loci displaying one, two, or three alleles are identical to the CPI calculation discussed in section 5.3. For loci displaying four alleles (P, Q, R, and S), homozygous genotypes would not typically be included. The unrestricted RMP in this case would require the subtraction for homozygote genotype frequencies, e.g., $(p+q+r+s)^2 - p^2 - q^2 - r^2 - s^2$."
62 Modified random match probability
Same thing for our "Allele, Any" situation: there is no need to consider an "Allele, Any" if it changes the number of contributors.
It doesn't matter how many alleles are below your stochastic threshold.
If you say there are 2 contributors and you detect 4 alleles, by definition there are no alleles missing.
The same is true for 3 contributors with 6 alleles detected.
63 Modified random match probability
About as bad as it can get: 3 contributors, and all alleles are in the Danger Zone.
Each allele could be missing its sister allele:
$(p+q+r+s+t)^2 + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) + 2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))$
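One way to code the worst case is sketched below. It is our own combination of the rules on the last few slides, not a USACIL implementation: every detected allele gets an "Allele, Any" term, unless the allele count already equals twice the number of contributors. In that case nothing can be missing, so the homozygote-subtracted CPI is used instead. The frequencies are hypothetical:

```python
import math


def modified_rmp(freqs, n_contributors):
    """Modified RMP when every detected allele is in the 'Danger Zone'.

    (sum)^2 covers the visible combinations; 2a(1 - sum) is 'Allele, Any' for each
    allele a. If the allele count already equals 2 x contributors, nothing can be
    missing, so no 'Any' terms are added and homozygotes are subtracted instead.
    """
    if len(freqs) > 2 * n_contributors:
        raise ValueError("more alleles than the stated number of contributors can carry")
    s = sum(freqs)
    if len(freqs) == 2 * n_contributors:
        return s * s - sum(a * a for a in freqs)
    return s * s + sum(2 * a * (1 - s) for a in freqs)


# Hypothetical frequencies for the five alleles P, Q, R, S, T with 3 contributors.
five = [0.10, 0.12, 0.08, 0.20, 0.05]
print(round(modified_rmp(five, n_contributors=3), 4))
```

The `ValueError` guard reflects the point that the stated number of contributors must be consistent with the data before any of this math applies.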
64 Modified random match probability
GIANT DISCLAIMER!!
We are not saying that you can charge ahead and use any profile of any number of people with any number of alleles dropping out just because you use a modified RMP calculation.
Bad data is bad data.
It's science, not Voodoo.