# Random Match Probability Statistics

Random Match Probability Statistics
From single source to three person mixtures with allelic drop out

Statistics. “There are three kinds of lies: lies, damned lies, and statistics.” (Benjamin Disraeli, British Prime Minister, as popularized by Mark Twain.) 18.7% of all statistics are made up. My introduction to forensic statistics: it had been a loooooong time since sophomore genetics.

Heterozygote: alleles P and Q. The genotype could be PQ or it could be QP, so the frequency is $2pq$,
where $p$ is the frequency of P and $q$ is the frequency of Q. If $p = 0.2$ and $q = 0.15$, then $2(0.2)(0.15) = 0.06$. Most of us understood this pretty quickly.

Homozygote: allele P, above the stochastic threshold. So the frequency is $p \times p$, or $p^2$.
Most of us understood this pretty quickly too. But there’s that $\Theta$ business.

Homozygote: you don’t use $p^2$; that much I understood. You use $p^2 + p(1-p)\Theta$; that part I didn’t understand. Where did $\Theta$ come from? “It’s the inbreeding coefficient.”

Homozygote: OK, but where did $p^2 + p(1-p)\Theta$ come from?
“It’s the correction factor for inbreeding.” Not so helpful. Why isn’t it just $p^2 - \Theta$?

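But some percentage is from inbreeding, so correct for that amount of inbreeding. The inbred fraction $\Theta$ contributes $\Theta p$, the non-inbred fraction $(1-\Theta)$ contributes $(1-\Theta)p^2$, and you combine them: $\Theta p + (1-\Theta)p^2$.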

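Homozygote: now it’s algebra. Start from $\Theta p + (1-\Theta)p^2$ (inbred $p$ + non-inbred $p^2$):

$$
\begin{aligned}
\Theta p + (1-\Theta)p^2 &= \Theta p + p^2 - \Theta p^2 && \text{(expand the terms)} \\
&= p^2 + \Theta p - \Theta p^2 && \text{(we like to see the } p^2 \text{ term first)} \\
&= p^2 + p(\Theta - \Theta p) && \text{(pull out } p\text{)} \\
&= p^2 + p(1-p)\Theta && \text{(pull out } \Theta \text{ to get the final form)}
\end{aligned}
$$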
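The rearrangement above can be checked numerically. This is only a minimal sketch: the function names and the sample values of $p$ and $\Theta$ are assumptions chosen for illustration, not values from the presentation.

```python
import math

def homozygote_mixture_form(p: float, theta: float) -> float:
    """Starting form: the inbred share contributes p, the rest contributes p^2."""
    return theta * p + (1 - theta) * p ** 2

def homozygote_final_form(p: float, theta: float) -> float:
    """Rearranged form from the slide: p^2 + p(1 - p) * theta."""
    return p ** 2 + p * (1 - p) * theta

# Assumed sample values; both forms should agree for any p and theta in [0, 1].
for p in (0.05, 0.2, 0.3054):
    for theta in (0.0, 0.01, 0.03):
        assert math.isclose(homozygote_mixture_form(p, theta),
                            homozygote_final_form(p, theta))
```

Two sanity checks fall out of either form: with $\Theta = 0$ the result is plain $p^2$, and with $\Theta = 1$ it is $p$.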

Single source stat: do the $2pq$ calculation at each heterozygous locus.
Do $p^2 + p(1-p)\Theta$ at each homozygous locus. Then multiply the results for all loci.
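The single source recipe can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the function names, the tuple-per-locus input format, the allele frequencies, and the $\Theta$ value of 0.01 are hypothetical and do not come from the presentation or from any real database.

```python
from math import prod

def heterozygote_freq(p: float, q: float) -> float:
    """2pq for a locus with two different alleles."""
    return 2 * p * q

def homozygote_freq(p: float, theta: float) -> float:
    """p^2 + p(1 - p) * theta for a locus with one allele."""
    return p ** 2 + p * (1 - p) * theta

def single_source_rmp(loci, theta: float) -> float:
    """Multiply the per-locus genotype frequencies.

    `loci` holds one tuple of allele frequencies per locus:
    (p, q) for a heterozygote, (p,) for a homozygote.
    """
    per_locus = []
    for freqs in loci:
        if len(freqs) == 2:
            per_locus.append(heterozygote_freq(*freqs))
        elif len(freqs) == 1:
            per_locus.append(homozygote_freq(freqs[0], theta))
        else:
            raise ValueError("expected one or two allele frequencies per locus")
    return prod(per_locus)

# Hypothetical three-locus profile: heterozygote, homozygote, heterozygote.
profile = [(0.2, 0.15), (0.1,), (0.25, 0.05)]
rmp = single_source_rmp(profile, theta=0.01)
print(f"RMP = {rmp:.3e}, or about 1 in {1 / rmp:,.0f}")
```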

Partial single source stat
What if you don’t detect everything from a single contributor? The profile is consistent with one contributor, but it is obvious there is a lot of drop out.

Partial single source stat
[Figure: partial single source profile with loci annotated “No result”, “Drop out”, and “??”.]

With a sample like this, would you: call it inconclusive data; exclude only; exclude or “inc a person”; exclude/include with no stat; exclude/include with a stat for 2 allele loci; exclude/include for all loci with something detected; or other?

Partial single source stat
Heterozygous loci are still $2pq$.

Partial single source stat
Any person that is a 9.3 could be the source. How do you calculate 9.3, Any?

Partial single source stat
The 9.3 could be a homozygote, so $p^2 + p(1-p)\Theta$ covers that. But the 9.3 could be a heterozygote with any other allele, so $2pq$, but what is $q$?

Partial single source stat
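You could go to the ladder and compute $2pq$ with $p = f_{9.3}$ for each ladder allele in turn: $q = f_4$, so $2(f_{9.3})(f_4)$; $q = f_5$, so $2(f_{9.3})(f_5)$; $q = f_6$, so $2(f_{9.3})(f_6)$; …; $q = f_{13.3}$, so $2(f_{9.3})(f_{13.3})$. Then add them up. But what about off ladder alleles, microvariants, etc.? How do you do $2pq$ for those?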

Partial single source stat
Instead, if $p$ is what you see (or detect), then $q$ must be what you don’t see (or detect). Since this is a binary system, (what you see/detect) + (what you don’t) = 1.0, so (what you don’t see) = 1 − (what you see/detect). So $q = (1-p)$, and therefore $2pq$ becomes $2p(1-p)$.

Partial single source stat
Now just combine the homozygote and heterozygote options ($p = f_{9.3}$): $[p^2 + p(1-p)\Theta] + [2p(1-p)]$ for anyone with a 9.3.
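The combined “Allele, Any” formula fits in a small helper. This is a minimal sketch: the name `allele_any_freq` and the $\Theta$ value of 0.01 are assumptions for illustration, and 0.3054 is the 9.3 frequency that appears in a later slide of this presentation.

```python
def allele_any_freq(p: float, theta: float) -> float:
    """Frequency of anyone carrying the detected allele.

    Homozygote option:   p^2 + p(1 - p) * theta
    Heterozygote option: 2p(1 - p), where (1 - p) stands for every
    allele that was not detected.
    """
    homozygote = p ** 2 + p * (1 - p) * theta
    heterozygote_with_anything_else = 2 * p * (1 - p)
    return homozygote + heterozygote_with_anything_else

# 9.3, Any with an assumed theta of 0.01.
print(round(allele_any_freq(0.3054, theta=0.01), 4))
```

Using $(1-p)$ for the unseen partner sidesteps the ladder-by-ladder sum, so off ladder alleles and microvariants are covered without being enumerated.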

Partial single source stat
What about loci that look like homozygotes? Use your peak height ratio (PHR) and stochastic threshold studies. If you treat a locus as a homozygote, you had better be above your stochastic threshold. When in doubt, use Allele, Any; you’re covered. At USACIL, Allele, Any = “modified” RMP.

Partial single source stat
The “2p” rule (Section – SWGDAM). For single-allele profiles where the zygosity is in question (e.g., it falls below the stochastic threshold): The formula $2p$, as described in recommendation 4.1 of NRCII, may be applied to this result. Instead of using $2p$, the algebraically identical formulae $2p - p^2$ and $p^2 + 2p(1-p)$ may be used to address this situation without double-counting the proportion of homozygotes in the population.

Partial single source stat
$2p$ is an extremely conservative approximation. There is a better way: $2p - p^2$, or equivalently $p^2 + 2p(1-p)$. But this is even better: $p^2 + p(1-p)\Theta + 2p(1-p)$ (computers can calculate anything).

Partial single source stat
“Algebraically identical formulae” with $f_{9.3} = 0.3054$: $2p - p^2 = 2(0.3054) - (0.3054)^2$, and $p^2 + 2p(1-p) = (0.3054)^2 + 2(0.3054)(1 - 0.3054) = (0.3054)^2 + 2(0.3054)(0.6946)$. Both come to about 0.5175.

Partial single source stat
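So for 9.3, Any, with $p = 0.3054$: $2p = 0.6108$; $2p - p^2 \approx 0.5175$; $p^2 + 2p(1-p) \approx 0.5175$; and $p^2 + p(1-p)\Theta + 2p(1-p) \approx 0.5175 + 0.2121\,\Theta$.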
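The four candidate formulas for 9.3, Any can be compared side by side. This is a sketch under stated assumptions: $p = 0.3054$ is the frequency used in the slides, but the $\Theta$ value of 0.01 and the function name `allele_any_variants` are illustrative choices, not the presentation’s.

```python
def allele_any_variants(p: float, theta: float) -> dict:
    """Return each single-allele formula discussed above, keyed by name."""
    return {
        "2p": 2 * p,
        "2p - p^2": 2 * p - p ** 2,
        "p^2 + 2p(1-p)": p ** 2 + 2 * p * (1 - p),
        "p^2 + p(1-p)theta + 2p(1-p)": p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p),
    }

# theta = 0.01 is an assumed value for illustration only.
for name, value in allele_any_variants(0.3054, theta=0.01).items():
    print(f"{name:<30} {value:.4f}")
```

The two middle formulas always agree, $2p$ is always the largest, and the $\Theta$ version sits between them because it adds back only the small $p(1-p)\Theta$ term.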

Minor contributor stat

When the minor is probative, would you: call it inconclusive data; exclude only; exclude or “inc a person”; exclude/include with no stat; exclude/include with a stat for some allele loci; exclude/include for all loci; or other?

Minor contributor stat
For our purposes, it is an intimate sample from a known female contributor. The female is the major. The major would have a single source stat, but it isn’t probative. Focus on the minor (or foreign) contributor.

Minor contributor stat
Situations you need to be able to calculate: when you know the minor type; when you are concerned about drop out; when you are not concerned about drop out, but you don’t know the minor type (masking/sharing); and when you do not see any minor alleles, but still think the minor contributor is represented. We haven’t discussed the last two yet.

Minor contributor stat
When you know the minor type, use $2pq$: for a 10, 11 minor, $2(f_{10})(f_{11})$; for a 6, 9.3 minor, $2(f_6)(f_{9.3})$.

Minor contributor stat
When you are concerned about drop out: for 24, Any, use $p^2 + p(1-p)\Theta + 2p(1-p)$, that is, $(f_{24})^2 + (f_{24})(1-f_{24})\Theta + 2(f_{24})(1-f_{24})$.

Minor contributor stat
When you are not concerned about drop out, but don’t know the minor type: what types are possible? 9, 9; 8, 9; or 9, 11. This is the “combo stat”.

Minor contributor stat
“Combo stat”: the 9 is above the stochastic threshold, so the possible minor types are 9, 9; 8, 9; and 9, 11. Add them up: $[p^2 + p(1-p)\Theta] + 2pq + 2pr$, that is, $(f_9)^2 + (f_9)(1-f_9)\Theta + 2(f_8)(f_9) + 2(f_9)(f_{11})$.
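The combo stat is just a sum of genotype frequencies over the minor types you decide to include. The sketch below assumes a hypothetical input format (allele labels as strings, frequencies in a dictionary) and made-up frequencies; the function names and the $\Theta$ value are illustrative, not the presenters’.

```python
def genotype_freq(a: str, b: str, freqs: dict, theta: float) -> float:
    """Frequency of genotype (a, b) from per-allele frequencies."""
    if a == b:
        p = freqs[a]
        return p ** 2 + p * (1 - p) * theta  # homozygote, with theta
    return 2 * freqs[a] * freqs[b]           # heterozygote, 2pq

def combo_stat(genotypes, freqs: dict, theta: float) -> float:
    """Sum the frequencies of every genotype the minor could be."""
    return sum(genotype_freq(a, b, freqs, theta) for a, b in genotypes)

# Hypothetical allele frequencies for the 8, 9 and 11 alleles.
freqs = {"8": 0.10, "9": 0.20, "11": 0.30}
minor_types = [("9", "9"), ("8", "9"), ("9", "11")]
print(round(combo_stat(minor_types, freqs, theta=0.01), 4))
```

Dropping a genotype from `minor_types` because it does not fit the peak heights is all it takes to turn this into a more restricted version of the same stat.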

Minor contributor stat
Section SWGDAM When the interpretation is conditioned upon the assumption of a particular number of contributors greater than one, the RMP is the sum of the individual frequencies for the genotypes included following a mixture deconvolution. Examples are provided below. In a sperm fraction mixture (at a locus having alleles P, Q, and R) assumed to be from two contributors, one of whom is the victim (having genotype QR), the sperm contributor genotypes included post-deconvolution might be PP, PQ, and PR. In this case, the RMP for the sperm DNA contributor could be calculated as $[p^2 + p(1-p)\Theta] + 2pq + 2pr$.

Minor contributor stat
No minor alleles present, but you know the minor is contributing: every other locus has minor alleles. Did the enzyme just get lazy? “Just inc the locus for stats.” That doesn’t make any more sense than throwing out any other locus. You just need the right calculator.

Minor contributor stat
Two scenarios to consider: no stochastic concerns, and stochastic concerns. These give two slightly different stats, but we can deal with both.

Minor contributor stat
No stochastic concerns: in some cases, PHR and P may help (17, 17 or possibly 16, 17; maybe not 16, 16). But you know the minor must be 16, 16; 16, 17; or 17, 17. This is the “combo” stat: $[p^2 + p(1-p)\Theta] + 2pq + [q^2 + q(1-q)\Theta]$.

Minor contributor stat
A couple more definitions. “Unrestricted” RMP: the “combo” stat where we used all possibilities (16, 16 and 16, 17 and 17, 17 from the previous slide). “Restricted” RMP: the “combo” stat where we chose not to use one (or more) possible types based on what fits peak heights, peak height ratios, or proportions of contributors (17, 17 or 16, 17 but not 16, 16 from the previous slide).

Minor contributor stat
What if there are stochastic concerns? You would take anyone with 16, Any or 17, Any. But that has the 16, 17 counted twice, so subtract 16, 17. But only once! $[(p^2 + p(1-p)\Theta) + 2p(1-p)] + [(q^2 + q(1-q)\Theta) + 2q(1-q)] - 2pq$.
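The “two Anys minus the duplicate” calculation can be sketched as follows. The function names, the allele frequencies for the 16 and 17, and the $\Theta$ value are all hypothetical, chosen only to make the arithmetic visible.

```python
def allele_any(p: float, theta: float) -> float:
    """Anyone carrying the allele: homozygote plus heterozygote with anything."""
    return p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

def double_allele_any(p: float, q: float, theta: float) -> float:
    """16, Any plus 17, Any, with the shared 16, 17 genotype removed once."""
    duplicate = 2 * p * q  # the 16, 17 heterozygote is inside both Any terms
    return allele_any(p, theta) + allele_any(q, theta) - duplicate

# Hypothetical frequencies for the 16 and 17 alleles.
f16, f17 = 0.10, 0.20
print(round(double_allele_any(f16, f17, theta=0.01), 4))
```

Subtracting `2 * p * q` twice would remove the 16, 17 genotype entirely, which is why the slide insists on “only once”.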

Modified random match probability
Let’s look at this “double any” calculation: $(p^2 + p(1-p)\Theta) + 2p(1-p) + (q^2 + q(1-q)\Theta) + 2q(1-q) - 2pq$. Simplify by removing $\Theta$: $p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq$. This is the basis for dealing with any number of “Allele, Any” contributors. USACIL calls this a modified RMP because “Anys” are involved.

Modified random match probability
Let’s say we’ve got a two contributor mixture with signs that both contributors are having stochastic issues, but what you see is consistent with two contributors. Remember: “Take a stand on the stand….” Validation studies, interpretation guidelines, your experience, Tech Review agrees…

Modified random match probability
We’ll start with this same pattern, but with stochastic concerns: allele 16 at a peak height of 230 and allele 17 at a peak height of 260. Homozygote threshold, mixture interpretation threshold, stochastic threshold, drop out threshold: let’s just call it the “Danger Zone”. Why do I always think of “Top Gun” when I have low peak heights? (We’re not suggesting that you MUST do this, only that you can calculate it.)

Modified random match probability
Remember the “Allele, Any”: $2pq = 2p(1-p)$, that is, 2 × (what you do see) × (what you don’t see). (We used it for a single allele below the stochastic threshold for a partial or minor contributor.) Because we have two contributors, it could be 16, Any or 17, Any, or both.

Modified random match probability
Also, remember the “combo stat” for the combinations you can see: $p^2 + 2pq + q^2$. We’ll rearrange this in a minute.

Modified random match probability
Allele, Any for $p$ (16): 2(what you see)(what you don’t), so $2p(1-?)$. You “see” two alleles now, both $p$ and $q$ (16 and 17). Stick with “1 − what you see” for what you don’t see: $2p(1-(p+q))$ for $p$ (16). Same thing for $q$ (17): $2q(1-(p+q))$.

Modified random match probability
So, add up the obvious combinations (the “combo” for the visible alleles, $p^2 + 2pq + q^2$) and the “Allele, Any” combinations (Allele, Any for the 16, $2p(1-(p+q))$, and Allele, Any for the 17, $2q(1-(p+q))$): $p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))$.

Modified random match probability
Here is the formula for multiple Allele, Any: $p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))$. Now we rearrange that first part: $p^2 + 2pq + q^2 = (p+q) \times (p+q) = (p+q)^2$. That last line should look familiar.

Modified random match probability
Remember back in the good old days? The CPI stat: for two alleles, $(p+q)^2$; for three alleles, $(p+q+r)^2$; for nine alleles, $(p+q+r+s+t+u+v+w+x)^2$.

Modified random match probability
Two ways to think about Allele, Any. The way we derived it for that minor contributor: $[p^2 + 2p(1-p)] + [q^2 + 2q(1-q)] - 2pq$ (remember we dropped $\Theta$ for this one). The way that works for as many contributors as we may need: $(p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))$ (CPI math is the foundation for this one, and it doesn’t use $\Theta$). They are equivalent.

Modified random match probability
Expand this one (“Double” Allele, Any minus the duplicate): $p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq$. To get: $p^2 + 2p - 2p^2 + q^2 + 2q - 2q^2 - 2pq$. Rearrange the terms: $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$.

Modified random match probability
Now expand the other one (Multiple Allele, Any): $(p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))$. To get: $p^2 + 2pq + q^2 + 2p - 2p^2 - 2pq + 2q - 2q^2 - 2pq$. Rearrange the terms: $p^2 + q^2 + 2p + 2q + 2pq - 2pq - 2pq - 2p^2 - 2q^2$. Condense the $2pq$ terms: $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$.

Modified random match probability
Now compare them. This was the “single source” one (2 slides ago): $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$. This is the “generic” form for multiple contributors (previous slide): $p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2$.
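The equivalence of the two forms can also be confirmed numerically rather than by expansion. This is a minimal sketch with assumed function names and an arbitrary grid of test frequencies; $\Theta$ is omitted, as it is in the slides’ comparison.

```python
import math

def duplicate_removed_form(p: float, q: float) -> float:
    """[p^2 + 2p(1-p)] + [q^2 + 2q(1-q)] - 2pq, with theta dropped."""
    return p ** 2 + 2 * p * (1 - p) + q ** 2 + 2 * q * (1 - q) - 2 * p * q

def cpi_based_form(p: float, q: float) -> float:
    """(p+q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))."""
    seen = p + q
    return seen ** 2 + 2 * p * (1 - seen) + 2 * q * (1 - seen)

# Arbitrary (assumed) frequency pairs; each pair keeps p + q below 1.
for p in (0.05, 0.10, 0.25, 0.40):
    for q in (0.02, 0.15, 0.30):
        assert math.isclose(duplicate_removed_form(p, q), cpi_based_form(p, q))
print("both forms agree on every test pair")
```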

Modified random match probability
Section SWGDAM In a mixture having at a locus alleles P, Q, and R, assumed to be from two contributors, where all three alleles are below the stochastic threshold, the interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case, the RMP must account for all heterozygotes and homozygotes represented by these three alleles, but also all heterozygotes that include one of the detected alleles. The RMP for this interpretation could be calculated as $(2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr$. Since $2p$ includes $2pq$ and $2pr$, $2q$ includes $2pq$ and $2qr$, and $2r$ includes $2pr$ and $2rq$, the formula subtracts $2pq$, $2pr$, and $2qr$ to avoid double-counting these genotype frequencies.
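The three-allele formula quoted above gives the same number as the CPI-based “multiple Allele, Any” form from the earlier slides extended to three alleles. The check below is a sketch: the function names and the three allele frequencies are hypothetical.

```python
import math

def swgdam_three_allele(p: float, q: float, r: float) -> float:
    """(2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr."""
    return ((2 * p - p ** 2) + (2 * q - q ** 2) + (2 * r - r ** 2)
            - 2 * p * q - 2 * p * r - 2 * q * r)

def cpi_plus_any(freqs) -> float:
    """(sum of detected)^2 plus one Allele, Any term per detected allele."""
    seen = sum(freqs)
    return seen ** 2 + sum(2 * f * (1 - seen) for f in freqs)

# Hypothetical frequencies for alleles P, Q and R.
p, q, r = 0.10, 0.20, 0.30
assert math.isclose(swgdam_three_allele(p, q, r), cpi_plus_any([p, q, r]))
print(round(swgdam_three_allele(p, q, r), 4))
```

Both expressions reduce to $2S - S^2$ with $S = p + q + r$, the chance that a random person carries at least one detected allele.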

Modified random match probability
To use RMP you must state the number of contributors (validation studies, experience, yadda, yadda). Now that we know how to deal with drop out via Allele, Any, we can use RMP more often. Modified RMP (modified denotes “Anys”): this is the language we use at our lab.

CPI compared to RMP: but CPI is NOT the same as RMP.
CPI is used when you are unsure about the number of contributors. Consequently, you have problems when you have alleles in the stochastic range, the “Danger Zone”. If you don’t know how many contributors you have, you don’t know how many alleles are missing.

CPI compared to RMP: but we can use the CPI math in our RMP stat.
We must make two changes to the “base” CPI formula that we use in the RMP: (1) we must correct for situations that change the number of contributors, and (2) we must account for allelic drop out. We’ve been through the second, so let’s deal with the first.

CPI compared to RMP: consider a four allele pattern.
We interpret the overall profile as having two contributors. CPI considers all possible “visible” combinations of contributors: $(p+q+r+s)^2$. This includes the P, P; Q, Q; R, R; and S, S types.

CPI compared to RMP: but if you think you could have a P, P contributor, that leaves three alleles. We stated that there were only 2 contributors. If Contributor #1 is P, P, Contributor #2 cannot account for the Q, R and S alleles. Having a homozygote changes the assumption of the number of contributors.

CPI compared to RMP: so all we need to do is subtract the homozygotes, but only when the presence of a homozygote changes the number of contributors: 2 contributors and 4 alleles detected, or 3 contributors and 6 alleles detected.

CPI compared to RMP: easy to do with a friendly computer.
USACIL defines this as an “Unrestricted” RMP. We kind of think of it as a CPI stat corrected for a defined number of contributors. For 2 contributors and 4 alleles: $(p+q+r+s)^2 - p^2 - q^2 - r^2 - s^2$. For 3 contributors and 6 alleles: $(p+q+r+s+t+u)^2 - p^2 - q^2 - r^2 - s^2 - t^2 - u^2$.

Unrestricted RMP Section 5.2.2.6 - SWGDAM
The unrestricted RMP might be calculated for mixtures that display no indications of allelic dropout. The formulae include an assumption of the number of contributors, but relative peak height information is not utilized. For two-person mixtures, the formulae for loci displaying one, two, or three alleles are identical to the CPI calculation discussed in section 5.3. For loci displaying four alleles (P, Q, R, and S), homozygous genotypes would not typically be included. The unrestricted RMP in this case would require the subtraction for homozygote genotype frequencies, e.g., $(p+q+r+s)^2 - p^2 - q^2 - r^2 - s^2$.
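A sketch of the unrestricted RMP at one locus, following the rule quoted above: use the CPI square, and subtract the homozygotes only when the number of detected alleles is twice the number of contributors. The function name, the argument layout, and the frequencies are assumptions for illustration.

```python
def unrestricted_rmp(freqs, n_contributors: int) -> float:
    """Unrestricted RMP for one locus with no sign of drop out.

    `freqs` are the frequencies of the detected alleles. When every
    contributor must be a heterozygote (alleles == 2 * contributors),
    the homozygote frequencies are subtracted from the CPI square.
    """
    max_alleles = 2 * n_contributors
    if len(freqs) > max_alleles:
        raise ValueError("too many alleles for the stated number of contributors")
    cpi = sum(freqs) ** 2
    if len(freqs) == max_alleles:
        return cpi - sum(f ** 2 for f in freqs)
    return cpi

# Hypothetical frequencies: four alleles, two contributors.
print(round(unrestricted_rmp([0.10, 0.10, 0.10, 0.10], n_contributors=2), 4))
```

What remains after the subtraction is exactly the sum of $2 f_i f_j$ over every pair of detected alleles, that is, every heterozygote that can be built from them.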

Modified random match probability
Same thing for our “Allele, Any” situation: no need to consider an “Allele, Any” if it changes the number of contributors. It doesn’t matter how many alleles are below your stochastic threshold: if you say there are 2 contributors and you detect 4 alleles, by definition there are no alleles missing. Similar for 3 contributors and 6 alleles detected.

Modified random match probability
About as bad as it can get: 3 contributors, and all alleles are in the Danger Zone. Each allele could be missing its sister allele: $(p+q+r+s+t)^2 + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) + 2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))$.
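The slides’ rules for the modified RMP can be gathered into one per-locus helper. This is a sketch of how they might be combined, not USACIL’s actual calculator: the function name, the way the rules are merged, and the frequencies are assumptions, and $\Theta$ is left out as it is in the CPI-based formulas above.

```python
def modified_rmp(freqs, n_contributors: int) -> float:
    """Modified RMP for one locus where every detected allele may have a
    dropped-out partner.

    If the allele count already equals 2 * contributors, nothing can be
    missing and nobody can be a homozygote, so the result falls back to
    the CPI square minus the homozygotes.
    """
    max_alleles = 2 * n_contributors
    if len(freqs) > max_alleles:
        raise ValueError("too many alleles for the stated number of contributors")
    seen = sum(freqs)
    if len(freqs) == max_alleles:
        return seen ** 2 - sum(f ** 2 for f in freqs)
    # CPI part for visible genotypes plus one Allele, Any term per allele.
    return seen ** 2 + sum(2 * f * (1 - seen) for f in freqs)

# Hypothetical: five alleles at frequency 0.10 each, three contributors.
print(round(modified_rmp([0.10] * 5, n_contributors=3), 4))
```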

Modified random match probability
GIANT DISCLAIMER!! We are not saying that you can charge ahead and now use any profile of any number of people with any number of alleles dropping out if you just use a modified RMP calculation. Bad data is bad data. It’s science, not Voodoo.