Random Match Probability Statistics From single source to three person mixtures with allelic drop out
Statistics: “There are three kinds of lies: lies, damned lies, and statistics.” – Benjamin Disraeli, British Prime Minister, as popularized by Mark Twain. 18.7% of all statistics are made up. My introduction to forensic statistics… It had been a loooooong time since sophomore genetics.
Heterozygote: Alleles P and Q. The genotype could be PQ or it could be QP, so the frequency is 2pq
– where p is the frequency of P
– and q is the frequency of Q
If p = 0.2 and q = 0.15, then 2(0.2)(0.15) = 0.06. Most of us understood this pretty quickly.
Homozygote: Allele P, above the stochastic threshold. So… p × p, or p². But there’s that Θ business. Most of us understood this pretty quickly too.
Homozygote: You don’t use p² (but I understood that one). Use p² + p(1-p)Θ (I didn’t understand this one). Where did Θ come from? “It’s the inbreeding coefficient.”
Homozygote: OK, but where did p² + p(1-p)Θ come from? “It’s the correction factor for inbreeding.”
– Not so helpful
– Why isn’t it just p² - Θ?
Homozygote
– We start with what we thought: p²
– But some percentage is from inbreeding: Θp
– Correct for that amount of inbreeding: (1-Θ)p²
– Combine them: Θp + (1-Θ)p²
Homozygote: Now it’s algebra
– Θp + (1-Θ)p² (inbred p + non-inbred p²)
– Θp + p² - Θp² (expand the terms)
– p² + Θp - Θp² (we like to see the p² term first)
– p² + p(Θ - Θp) (pull out p)
– p² + p(1-p)Θ (pull out Θ to get the final form)
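If you don't trust the algebra, a quick numeric check helps. The sketch below compares the starting expression with the final form for one pair of values. The function names and the values p = 0.2 and Θ = 0.01 are illustrative assumptions, not from the slides.

```python
def homozygote_two_part(p, theta):
    """Starting point: inbred share (theta * p) plus non-inbred share ((1 - theta) * p^2)."""
    return theta * p + (1 - theta) * p ** 2


def homozygote_final(p, theta):
    """Final form after the algebra: p^2 + p(1 - p) * theta."""
    return p ** 2 + p * (1 - p) * theta


# Illustrative values only (assumed, not taken from the slides).
p, theta = 0.2, 0.01
print(homozygote_two_part(p, theta), homozygote_final(p, theta))
```

Both calls should print the same number (up to floating-point rounding), which is all the algebra claims.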
Single source stat
– Do the 2pq calculation at each heterozygous locus
– Do p² + p(1-p)Θ at each homozygous locus
– Then multiply the results for all loci
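Here is a minimal sketch of that recipe. The profile layout (one tuple of allele frequencies per locus), the function names, the frequencies, and Θ = 0.01 are all assumptions made for illustration.

```python
from math import prod


def heterozygote(p, q):
    """2pq for a locus with two different alleles."""
    return 2 * p * q


def homozygote(p, theta):
    """p^2 + p(1 - p) * theta for a locus with one allele above the stochastic threshold."""
    return p ** 2 + p * (1 - p) * theta


def single_source_rmp(loci, theta):
    """loci: list of tuples, (p,) for a homozygous locus and (p, q) for a heterozygous one."""
    per_locus = [
        homozygote(freqs[0], theta) if len(freqs) == 1 else heterozygote(*freqs)
        for freqs in loci
    ]
    # Multiply the per-locus results together.
    return prod(per_locus)


# Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
profile = [(0.2, 0.15), (0.1,), (0.3, 0.05)]
rmp = single_source_rmp(profile, theta=0.01)
print(rmp)
```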
Partial single source stat What if you don’t detect everything from a single contributor? Consistent with one contributor, but obvious there is a lot of drop out
Partial single source stat [Figure: partial profile with loci annotated “No result”, “Drop out”, and “??”]
With a sample like this, would you:
1. Inconclusive data
2. Exclude only
3. Exclude or “inc a person”
4. Exclude/include, no stat
5. Exclude/include, stat for 2 allele loci
6. Exclude/include for all loci with something detected
7. Other
Partial single source stat Heterozygous loci still 2pq
Partial single source stat What about loci that you don’t know about?
Partial single source stat: Any person who is a 9.3 could be the source. How do you calculate “9.3, Any”?
Partial single source stat: The 9.3 could be a homozygote, so p² + p(1-p)Θ covers that. But the 9.3 could also be a heterozygote with any other allele. So 2pq, but what is q?
Partial single source stat: You could go to the ladder and do 2(p)(q) with p = 9.3
– q = 4, so 2(f_9.3)(f_4)
– q = 5, so 2(f_9.3)(f_5)
– q = 6, so 2(f_9.3)(f_6)
– …
– q = 13.3, so 2(f_9.3)(f_13.3)
– Then add them up
But what about off-ladder alleles, microvariants, etc.? How do you do 2pq for those?
Partial single source stat: Instead, if p is what you see (or detect), then q must be what you don’t see (or detect). Since this is a binary system:
– (what you see/detect) + (what you don’t) = 1.0
– (what you don’t see) = 1 - (what you see/detect)
So q = (1-p). Therefore 2pq becomes 2p(1-p).
Partial single source stat: Now just combine the homozygote and heterozygote options (p = f_9.3): [p² + p(1-p)Θ] + [2p(1-p)] for anyone with a 9.3.
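The shortcut q = 1 - p can be sanity-checked against the "walk the ladder" approach. In the sketch below, the allele frequency table is entirely made up; the only requirement is that it sums to 1, with an "other" bucket standing in for off-ladder alleles and microvariants. Θ = 0.01 is likewise an assumed value.

```python
def allele_any(p, theta):
    """'Allele, Any': homozygote option plus heterozygote-with-anything option."""
    return (p ** 2 + p * (1 - p) * theta) + 2 * p * (1 - p)


# Hypothetical frequencies that sum to 1.0; "other" covers everything not listed.
freqs = {"6": 0.25, "7": 0.15, "9": 0.20, "9.3": 0.30, "other": 0.10}
p = freqs["9.3"]

# Ladder approach: add up 2pq over every allele that is not the 9.3.
ladder_sum = sum(2 * p * f for allele, f in freqs.items() if allele != "9.3")

# Shortcut: q is simply "everything you don't see", i.e. 1 - p.
shortcut = 2 * p * (1 - p)

print(ladder_sum, shortcut, allele_any(p, theta=0.01))
```

The two heterozygote totals agree because the frequencies of everything other than the 9.3 must add up to 1 - p, whether or not each allele is on the ladder.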
Partial single source stat: What about loci that look like homozygotes? Use your PHR and stochastic threshold studies.
– If you treat a locus as a homozygote, you had better be above your stochastic threshold
– When in doubt, use Allele, Any and you’re covered
– At USACIL, Allele, Any = “modified” RMP
Partial single source stat: The “2p” rule (SWGDAM). For single-allele profiles where the zygosity is in question (e.g., it falls below the stochastic threshold): The formula 2p, as described in recommendation 4.1 of NRCII, may be applied to this result. Instead of using 2p, the algebraically identical formulae 2p - p² and p² + 2p(1-p) may be used to address this situation without double-counting the proportion of homozygotes in the population.
Partial single source stat: 2p is an extremely conservative approximation. There is a better way:
– 2p - p²
– p² + 2p(1-p)
But this is even better:
– p² + p(1-p)Θ + 2p(1-p)
– (computers can calculate anything)
Partial single source stat: “Algebraically identical formulae”, with f_9.3 = p = 0.3054
– 2p - p² = 2(0.3054) - (0.3054)²
– p² + 2p(1-p) = (0.3054)² + 2(0.3054)(1 - 0.3054) = (0.3054)² + 2(0.3054)(0.6946)
Partial single source stat: So for 9.3, Any
– 2p =
– 2p - p² =
– p² + 2p(1-p) =
– p² + p(1-p)Θ + 2p(1-p) =
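The four options can be recomputed in a few lines. The sketch below uses the slide's frequency for the 9.3 allele (0.3054). The value of Θ is an assumption (0.01 here), since the transcript does not say what value was used.

```python
p = 0.3054    # frequency of the 9.3 allele, as used on the slide
theta = 0.01  # ASSUMED for illustration; the slides do not state the value used

two_p = 2 * p
two_p_minus_p2 = 2 * p - p ** 2
p2_plus_het = p ** 2 + 2 * p * (1 - p)
with_theta = p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

for label, value in [
    ("2p", two_p),
    ("2p - p^2", two_p_minus_p2),
    ("p^2 + 2p(1-p)", p2_plus_het),
    ("p^2 + p(1-p)theta + 2p(1-p)", with_theta),
]:
    print(f"{label:30s} {value:.4f}")
```

The middle two lines should match each other, which is the "algebraically identical" point, and both come out smaller than 2p. The Θ version lands slightly above them, by exactly p(1-p)Θ.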
Minor contributor stat
When the minor is probative, would you:
1. Inconclusive data
2. Exclude only
3. Exclude or “inc a person”
4. Exclude/include, no stat
5. Exclude/include, stat for some allele loci
6. Exclude/include for all loci
7. Other
Minor contributor stat: For our purposes, it is an intimate sample from a known female contributor. The female is the major.
– The major would have a single source stat
– But it isn’t probative
Focus on the minor (or foreign) contributor.
Minor contributor stat: Situations you need to be able to calculate
– When you know the minor type
– When you are concerned about drop out
– When you are not concerned about drop out, but you don’t know the minor type (masking/sharing)
– When you do not see any minor alleles, but still think the minor contributor is represented
We haven’t discussed the last two yet.
Minor contributor stat: When you know the minor type
– 10, 11: 2pq = 2(f_10)(f_11)
– 6, 9.3: 2pq = 2(f_6)(f_9.3)
Minor contributor stat: When you are concerned about drop out
– 24, Any: p² + p(1-p)Θ + 2p(1-p) = (f_24)² + (f_24)(1 - f_24)Θ + 2(f_24)(1 - f_24)
Minor contributor stat: When you are not concerned about drop out, but don’t know the minor type. What types are possible?
– 9, 9
– 8, 9
– 9, 11
This is the “combo stat”.
Minor contributor stat: “Combo stat”. The 9 is above the stochastic threshold.
– 9, 9: p² + p(1-p)Θ
– 8, 9: 2pq
– 9, 11: 2pr
Add them up: [p² + p(1-p)Θ] + 2pq + 2pr = (f_9)² + (f_9)(1 - f_9)Θ + 2(f_8)(f_9) + 2(f_9)(f_11)
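As a rough sketch, the combo stat is one homozygote term plus one 2pq term for each possible partner allele. The function name, the allele frequencies, and Θ = 0.01 are invented for illustration.

```python
def combo_stat(p, partner_freqs, theta):
    """Homozygote for the allele with frequency p, plus p paired with each listed partner."""
    homozygote = p ** 2 + p * (1 - p) * theta
    heterozygotes = sum(2 * p * q for q in partner_freqs)
    return homozygote + heterozygotes


# Hypothetical frequencies for alleles 8, 9 and 11 (not real database values).
f8, f9, f11 = 0.10, 0.20, 0.15
stat = combo_stat(f9, [f8, f11], theta=0.01)  # covers 9,9 + 8,9 + 9,11
print(stat)
```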
Minor contributor stat (SWGDAM): When the interpretation is conditioned upon the assumption of a particular number of contributors greater than one, the RMP is the sum of the individual frequencies for the genotypes included following a mixture deconvolution. Examples are provided below. In a sperm fraction mixture (at a locus having alleles P, Q, and R) assumed to be from two contributors, one of whom is the victim (having genotype QR), the sperm contributor genotypes included post-deconvolution might be PP, PQ, and PR. In this case, the RMP for the sperm DNA contributor could be calculated as [p² + p(1-p)Θ] + 2pq + 2pr.
Minor contributor stat
No minor alleles present, but you know the minor is contributing. Every other locus has minor alleles. Did the enzyme just get lazy? “Just inc the locus for stats”
– That doesn’t make any more sense than throwing out any other locus
– You just need the right calculator
Minor contributor stat Two scenarios to consider – No stochastic concerns – Stochastic concerns Two slightly different stats, but can deal with both
Minor contributor stat: No stochastic concerns. In some cases, PHR and P may help
– 17, 17 or possibly 16, 17
– Maybe not 16, 16
But you know the minor must be one of the following (p for the 16, q for the 17):
– 16, 16: p² + p(1-p)Θ
– 16, 17: 2pq
– 17, 17: q² + q(1-q)Θ
Add them up: [p² + p(1-p)Θ] + 2pq + [q² + q(1-q)Θ]. This is the “combo” stat.
Minor contributor stat: A couple more definitions
– “Unrestricted” RMP: the “combo” stat where we used all possibilities (16,16 and 16,17 and 17,17 from the previous slide)
– “Restricted” RMP: the “combo” stat where we chose not to use one (or more) possible types based on what fits peak heights, peak height ratios, or proportions of contributors (17,17 or 16,17 but not 16,16 from the previous slide)
Minor contributor stat: What if there are stochastic concerns? You would take anyone with
– 16, Any
– 17, Any
But that has the 16, 17 counted twice
– Subtract 16, 17
– But only once!
[p² + p(1-p)Θ + 2p(1-p)] + [q² + q(1-q)Θ + 2q(1-q)] - 2pq
Modified random match probability: Let’s look at this “double any” calculation
– [p² + p(1-p)Θ + 2p(1-p)] + [q² + q(1-q)Θ + 2q(1-q)] - 2pq
Simplify by removing Θ
– [p² + 2p(1-p)] + [q² + 2q(1-q)] - 2pq
This is the basis for dealing with any number of “Allele, Any” contributors. USACIL calls this a modified RMP because “Anys” are involved.
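A small sketch of the “double any” stat, with and without Θ. The function names and the frequencies for the 16 and 17 alleles are assumptions for illustration. Leaving Θ at 0.0 reproduces the simplified form.

```python
def allele_any(p, theta=0.0):
    """'Allele, Any' for one detected allele: homozygote plus heterozygote-with-anything."""
    return p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)


def double_any(p, q, theta=0.0):
    """'16, Any' plus '17, Any', minus the 16,17 genotype that was counted twice."""
    return allele_any(p, theta) + allele_any(q, theta) - 2 * p * q


# Hypothetical frequencies for the 16 and 17 alleles.
f16, f17 = 0.2, 0.1
print(double_any(f16, f17))              # simplified form, theta removed
print(double_any(f16, f17, theta=0.01))  # full form with an assumed theta
```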
Modified random match probability Let’s say we’ve got a two contributor mixture with signs that both contributors are having stochastic issues. But what you see is consistent with two contributors – Remember “Take a stand on the stand….” – Validation studies, interpretation guidelines, your experience, Tech Review agrees…
Modified random match probability: We’ll start with this same pattern, but with stochastic concerns
– Homozygote threshold
– Mixture interpretation threshold
– Stochastic threshold
– Drop out threshold
Let’s just call it the “Danger Zone”
– Why do I always think of “Top Gun” when I have low peak heights?
(We’re not suggesting that you MUST do this, only that you can calculate it.)
Modified random match probability: Remember the “Allele, Any”
– 2pq = 2p(1-p)
– 2 × (what you do see) × (what you don’t see)
– (We used it for a single allele below the stochastic threshold for a partial or minor contributor)
Because we have two contributors:
– 16, Any
– 17, Any
– Or both
Modified random match probability: Also, remember the “combo stat” for the combinations you can see
– p² + 2pq + q²
– We’ll rearrange this in a minute
Modified random match probability: Allele, Any for p (16)
– 2(what you see)(what you don’t)
– 2p(1-?)
You “see” two alleles now, both p and q (16 and 17)
– Stick with “1 - what you see” for what you don’t see
– 2p(1-(p+q)) for p (16)
Same thing for q (17)
– 2q(1-(p+q))
Modified random match probability: So, the obvious combinations:
– “Combo” for visible: p² + 2pq + q²
The “Allele, Any” combinations:
– Allele, Any for the 16: 2p(1-(p+q))
– Allele, Any for the 17: 2q(1-(p+q))
Add them up: p² + 2pq + q² + 2p(1-(p+q)) + 2q(1-(p+q))
Modified random match probability: Here is the formula for multiple Allele, Any
– p² + 2pq + q² + 2p(1-(p+q)) + 2q(1-(p+q))
Now we rearrange that first part
– p² + 2pq + q²
– (p + q) × (p + q)
– (p + q)²
That last line should look familiar.
Modified random match probability: Remember back in the good old days? The CPI stat
– For two alleles: (p + q)²
– For three alleles: (p + q + r)²
– …
– For nine alleles: (p + q + r + s + t + u + v + w + x)²
Modified random match probability: Two ways to think about Allele, Any
– The way we derived it for that minor contributor: [p² + 2p(1-p)] + [q² + 2q(1-q)] - 2pq
– The way that works for as many contributors as we may need: (p + q)² + 2p(1-(p+q)) + 2q(1-(p+q))
They are equivalent
– (Remember we dropped Θ for the top one)
– (CPI math is the foundation for the bottom one, and doesn’t use Θ)
Modified random match probability: Expand this one (“Double” Allele, Any, minus the duplicate)
– p² + 2p(1-p) + q² + 2q(1-q) - 2pq
To get
– p² + 2p - 2p² + q² + 2q - 2q² - 2pq
Rearrange the terms
– p² + q² + 2p + 2q - 2pq - 2p² - 2q²
Modified random match probability: Now expand the other one (Multiple Allele, Any)
– (p + q)² + 2p(1-(p+q)) + 2q(1-(p+q))
To get
– p² + 2pq + q² + 2p - 2p² - 2pq + 2q - 2q² - 2pq
Rearrange the terms
– p² + q² + 2p + 2q + 2pq - 2pq - 2pq - 2p² - 2q²
Condense the 2pq terms
– p² + q² + 2p + 2q - 2pq - 2p² - 2q²
Modified random match probability: Now compare them
– This was the “single source” one (2 slides ago): p² + q² + 2p + 2q - 2pq - 2p² - 2q²
– This is the “generic” form for multiple contributors (previous slide): p² + q² + 2p + 2q - 2pq - 2p² - 2q²
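For anyone who would rather let a computer do the comparison, the sketch below evaluates both starting forms and the common expanded form over a grid of frequency pairs. The function names and the grid are illustrative assumptions; Θ is omitted, as on the slides.

```python
def double_any_form(p, q):
    """Derived for the minor contributor: two 'Allele, Any' terms minus the duplicate 2pq."""
    return (p ** 2 + 2 * p * (1 - p)) + (q ** 2 + 2 * q * (1 - q)) - 2 * p * q


def cpi_based_form(p, q):
    """CPI-style core (p + q)^2 plus an 'Any' term for each detected allele."""
    seen = p + q
    return seen ** 2 + 2 * p * (1 - seen) + 2 * q * (1 - seen)


def expanded_form(p, q):
    """The common expansion both forms reduce to."""
    return p ** 2 + q ** 2 + 2 * p + 2 * q - 2 * p * q - 2 * p ** 2 - 2 * q ** 2


# Every (p, q) pair on a 0.05 grid where the two frequencies sum to at most 1.
grid = [(i / 20, j / 20) for i in range(1, 20) for j in range(1, 20) if i + j <= 20]

max_gap = max(
    max(abs(double_any_form(p, q) - cpi_based_form(p, q)),
        abs(double_any_form(p, q) - expanded_form(p, q)))
    for p, q in grid
)
print(max_gap)  # should be zero up to floating-point rounding
```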
Modified random match probability (SWGDAM): In a mixture having at a locus alleles P, Q, and R, assumed to be from two contributors, where all three alleles are below the stochastic threshold, the interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case, the RMP must account for all heterozygotes and homozygotes represented by these three alleles, but also all heterozygotes that include one of the detected alleles. The RMP for this interpretation could be calculated as (2p - p²) + (2q - q²) + (2r - r²) - 2pq - 2pr - 2qr. Since 2p includes 2pq and 2pr, 2q includes 2pq and 2qr, and 2r includes 2pr and 2qr, the formula subtracts 2pq, 2pr, and 2qr to avoid double-counting these genotype frequencies.
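The quoted formula generalizes naturally to any list of detected alleles. The sketch below is one way to write it, not SWGDAM's own code; the function name and frequencies are assumptions. As a cross-check, the same total can be written as 2S - S², where S is the sum of the detected allele frequencies, which is simple algebra on the quoted formula.

```python
from itertools import combinations


def all_below_threshold_rmp(freqs):
    """Sum of (2f - f^2) over detected alleles, minus 2ab for each pair counted twice."""
    singles = sum(2 * f - f ** 2 for f in freqs)
    double_counted = sum(2 * a * b for a, b in combinations(freqs, 2))
    return singles - double_counted


# Hypothetical frequencies for alleles P, Q and R.
p, q, r = 0.1, 0.2, 0.3
stat = all_below_threshold_rmp([p, q, r])

# Cross-check against the compact form 2S - S^2.
s = p + q + r
print(stat, 2 * s - s ** 2)
```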
Modified random match probability To use RMP you must state the number of contributors – Validation studies – Experience – Yadda, yadda Now that we know how to deal with drop out via Allele, Any, we can use RMP more often Modified RMP (modified denotes “Anys”) – This is the language we use at our lab
CPI compared to RMP But CPI is NOT the same as RMP – CPI is used when you are unsure about the number of contributors – Consequently, you have problems when you have alleles in the stochastic range – “Danger Zone” – If you don’t know how many contributors you have, you don’t know how many alleles are missing
CPI compared to RMP: But we can use the CPI math in our RMP stat. We must make two changes to the “base” CPI formula that we use in the RMP:
– 1. We must correct for situations that change the number of contributors
– 2. We must account for allelic drop out
We’ve been through the second, so let’s deal with the first.
CPI compared to RMP: Consider a four allele pattern. We interpret the overall profile as having two contributors. CPI considers all possible “visible” combinations of contributors
– (p + q + r + s)²
– This includes P, P and Q, Q and R, R and S, S types
CPI compared to RMP: But if you think you could have a P, P contributor, that leaves three alleles unaccounted for. We stated that there were only 2 contributors
– If Contributor #1 is P, P
– Contributor #2 cannot account for the Q, R and S alleles
Having a homozygote changes the assumption of the number of contributors.
CPI compared to RMP So all we need to do is subtract the homozygotes – but only when the presence of a homozygote changes the number of contributors – 2 contributors and 4 alleles detected – 3 contributors and 6 alleles detected
CPI compared to RMP: Easy to do with a friendly computer
– (p + q + r + s)² - p² - q² - r² - s²
– (p + q + r + s + t + u)² - p² - q² - r² - s² - t² - u²
USACIL defines this as an “Unrestricted” RMP. We kind of think of it as a CPI stat corrected for a defined number of contributors.
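A minimal sketch of that correction for a locus showing the maximum number of alleles (two per contributor). The function name and the four frequencies are made up. The cross-check shows that subtracting the homozygote terms leaves only the heterozygote pairs.

```python
from itertools import combinations


def unrestricted_rmp_full_locus(freqs):
    """CPI-style (sum)^2 minus every homozygote term, for a locus with 2 alleles per contributor."""
    total = sum(freqs)
    return total ** 2 - sum(f ** 2 for f in freqs)


# Hypothetical frequencies for alleles P, Q, R and S (two contributors, four alleles).
freqs = [0.10, 0.20, 0.15, 0.05]
stat = unrestricted_rmp_full_locus(freqs)

# What remains is the sum of 2ab over every pair of different alleles.
heterozygotes_only = sum(2 * a * b for a, b in combinations(freqs, 2))
print(stat, heterozygotes_only)
```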
Unrestricted RMP (SWGDAM): The unrestricted RMP might be calculated for mixtures that display no indications of allelic dropout. The formulae include an assumption of the number of contributors, but relative peak height information is not utilized. For two-person mixtures, the formulae for loci displaying one, two, or three alleles are identical to the CPI calculation discussed in section 5.3. For loci displaying four alleles (P, Q, R, and S), homozygous genotypes would not typically be included. The unrestricted RMP in this case would require the subtraction for homozygote genotype frequencies, e.g., (p + q + r + s)² - p² - q² - r² - s².
Modified random match probability Same thing for our “Allele, Any” situation No need to consider an “Allele, Any” if it changes the number of contributors It doesn’t matter how many alleles are below your stochastic threshold – If you say there are 2 contributors and you detect 4 alleles, by definition there are no alleles missing – Similar for 3 contributors and 6 alleles detected
Modified random match probability: About as bad as it can get
– 3 contributors
– All alleles are in the Danger Zone
– Each allele could be missing its sister allele
(p+q+r+s+t)² + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) + 2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))
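The same pattern works for any number of detected alleles, so it is easy to hand to a computer. The sketch below is one way to write it: the function name and the five frequencies are assumptions, and Θ is left out because the CPI-based form does not use it.

```python
def modified_rmp(freqs):
    """CPI-style core (sum of detected)^2 plus an 'Allele, Any' term for each detected allele."""
    seen = sum(freqs)   # what you detect
    unseen = 1 - seen   # everything you don't detect
    return seen ** 2 + sum(2 * f * unseen for f in freqs)


# Hypothetical frequencies for five detected alleles p, q, r, s, t.
five_alleles = [0.10, 0.05, 0.20, 0.15, 0.10]
print(modified_rmp(five_alleles))

# With two alleles this reduces to (p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q)).
print(modified_rmp([0.2, 0.1]))
```

Note that the calculation only makes sense when the detected frequencies sum to no more than 1, and, per the slides, only when drop out is actually possible given the stated number of contributors.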
Modified random match probability GIANT DISCLAIMER!! We are not saying that you can charge ahead and now use any profile of any number of people with any number of alleles dropping out if you just use a modified RMP calculation Bad data is bad data It’s science, not Voodoo