Random Match Probability Statistics

Name: Random Match Probability Statistics
Uploaded: 2017-07-28T14:00:33+00:00
Duration: PTM27S6
Channel: Kathryn Meggitt
Description: Random Match Probability Statistics

Random Match Probability Statistics
From single source to three person mixtures with allelic drop out

Statistics
“There are three kinds of lies: lies, damned lies, and statistics.” - Benjamin Disraeli, British Prime Minister, as popularized by Mark Twain
18.7% of all statistics are made up
My introduction to forensic statistics: it had been a loooooong time since sophomore genetics

Heterozygote
Alleles P and Q
Could be PQ or it could be QP
So… 2pq
Where p is the frequency of P and q is the frequency of Q
If p = 0.2 and q = 0.15, then 2(0.2)(0.15) = 0.06
Most of us understood this pretty quickly
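The heterozygote arithmetic above can be sketched in a few lines of Python. The function name `heterozygote_freq` is made up for illustration; the numbers are the ones from the slide.

```python
def heterozygote_freq(p: float, q: float) -> float:
    """Expected frequency of a PQ heterozygote: 2pq (PQ or QP)."""
    return 2 * p * q

# Values from the slide: p = 0.2, q = 0.15
print(round(heterozygote_freq(0.2, 0.15), 4))  # 0.06
```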

Homozygote
Allele P
Above stochastic threshold
So… p x p or p^2
Most of us understood this pretty quickly too
But there’s that Θ business

Homozygote
You don’t use p^2 (but I understood that)
Use p^2 + p(1-p)Θ (I didn’t understand this)
Where did Θ come from?
“It’s the inbreeding coefficient.”

Homozygote
OK, but where did p^2 + p(1-p)Θ come from?
“It’s the correction factor for inbreeding.”
Not so helpful
Why isn’t it just p^2 - Θ?

Homozygote
We start with what we thought: p^2
But some percentage is from inbreeding: Θp
Correct for that amount of inbreeding: (1 - Θ)p^2
Combine them: Θp + (1 - Θ)p^2

Homozygote
Now it’s algebra
Θp + (1 - Θ)p^2 (inbred p + non-inbred p^2)
Θp + p^2 - Θp^2 (expand the terms)
p^2 + Θp - Θp^2 (we like to see the p^2 term first)
p^2 + p(Θ - Θp) (pull out p)
p^2 + p(1 - p)Θ (pull out Θ to get the final form)
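If the algebra is hard to trust by eye, a quick numeric check helps. The sketch below compares the starting form and the final form for a few values of p and Θ. The function names and the Θ values 0.01 and 0.03 are assumptions for illustration, not values from the slides.

```python
def homozygote_start_form(p: float, theta: float) -> float:
    # Inbred share (theta * p) plus non-inbred share ((1 - theta) * p^2)
    return theta * p + (1 - theta) * p ** 2

def homozygote_final_form(p: float, theta: float) -> float:
    # Final form from the slide: p^2 + p(1 - p) * theta
    return p ** 2 + p * (1 - p) * theta

# theta values here are illustrative assumptions
for p in (0.05, 0.2, 0.3054):
    for theta in (0.0, 0.01, 0.03):
        start = homozygote_start_form(p, theta)
        final = homozygote_final_form(p, theta)
        assert abs(start - final) < 1e-12
        print(f"p={p:<7} theta={theta:<5} frequency={final:.6f}")
```

With Θ = 0 both forms collapse to p^2, which is the uncorrected homozygote frequency.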

Single source stat
Do the 2pq calculation at each heterozygous locus
Do p^2 + p(1 - p)Θ at each homozygous locus
Then multiply the results for all loci
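A minimal sketch of the single source calculation, assuming a profile stored as a dictionary of genotypes. The locus names, allele frequencies, Θ = 0.01, and all function names are hypothetical placeholders; a real calculation would use the laboratory's validated frequency database and Θ.

```python
from math import prod

THETA = 0.01  # assumed value for illustration only

def locus_frequency(genotype, freqs, theta=THETA):
    """2pq for a heterozygote, p^2 + p(1 - p)*theta for a homozygote."""
    a, b = genotype
    if a == b:
        p = freqs[a]
        return p ** 2 + p * (1 - p) * theta
    return 2 * freqs[a] * freqs[b]

def single_source_rmp(profile, allele_freqs, theta=THETA):
    """Multiply the per-locus genotype frequencies across all loci."""
    return prod(
        locus_frequency(genotype, allele_freqs[locus], theta)
        for locus, genotype in profile.items()
    )

# Hypothetical two-locus example
allele_freqs = {
    "LocusA": {"P": 0.2, "Q": 0.15},
    "LocusB": {"P": 0.3},
}
profile = {"LocusA": ("P", "Q"), "LocusB": ("P", "P")}

rmp = single_source_rmp(profile, allele_freqs)
print(f"RMP = {rmp:.6f} (about 1 in {1 / rmp:,.0f})")
```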

Partial single source stat
What if you don’t detect everything from a single contributor?
Consistent with one contributor, but it is obvious there is a lot of drop out

[Figure: partial single source profile with loci annotated “No result”, “Drop out”, and “??”]

With a sample like this, would you report:
Inconclusive data
Exclude only
Exclude or “inc a person”
Exclude/include, no stat
Exclude/include, stat for 2 allele loci
Exclude/include, stat for all loci with something detected
Other

Heterozygous loci still 2pq

What about loci that you don’t know about?

Any person that is a 9.3 could be the source
How to calculate 9.3, Any?

The 9.3 could be a homozygote
So p^2 + p(1-p)Θ covers that
But the 9.3 could be a heterozygote with any other allele
So 2pq, but what is q?

You could go to the ladder
2(p)(q) with p = 9.3
q = 4, so 2(f9.3)(f4)
q = 5, so 2(f9.3)(f5)
q = 6, so 2(f9.3)(f6)
…..
q = 13.3, so 2(f9.3)(f13.3)
Then add them up
But what about off ladder alleles, microvariants, etc.? How do you do 2pq for those?

Instead, if p is what you see (or detect), then q must be what you don’t see (or detect)
Since this is a binary system:
(what you see/detect) + (what you don’t) = 1.0
(what you don’t see) = 1 - (what you see/detect)
So q = (1 - p)
Therefore 2pq becomes 2p(1 - p)

Now just combine the homozygote and heterozygote options (p = f9.3)
[p^2 + p(1-p)Θ] + [2p(1-p)] for anyone with 9.3
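The combined “Allele, Any” formula is easy to wrap in a small function. This is a sketch only: the name `allele_any`, the example frequency 0.2, and Θ = 0.01 are assumptions for illustration.

```python
def allele_any(p: float, theta: float = 0.0) -> float:
    """Frequency of anyone carrying the detected allele (frequency p).

    Homozygote option:   p^2 + p(1 - p)*theta
    Heterozygote option: 2p(1 - p), where (1 - p) stands for every allele not detected
    """
    homozygote = p ** 2 + p * (1 - p) * theta
    heterozygote_with_any_other = 2 * p * (1 - p)
    return homozygote + heterozygote_with_any_other

# Illustrative frequency and theta, not casework values
print(f"{allele_any(0.2):.4f}")               # theta dropped
print(f"{allele_any(0.2, theta=0.01):.4f}")   # with an assumed theta of 0.01
```

Using (1 - p) for “everything you don’t see” means off-ladder alleles and microvariants are covered without listing them.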

What about loci that look like homozygotes?
Use your PHR and stochastic threshold studies
If you treat a locus as a homozygote, you’d better be above your stochastic threshold
When in doubt, use Allele, Any: you’re covered
At USACIL, Allele, Any = “modified” RMP

The “2p” rule (SWGDAM)
For single-allele profiles where the zygosity is in question (e.g., it falls below the stochastic threshold): The formula 2p, as described in recommendation 4.1 of NRCII, may be applied to this result. Instead of using 2p, the algebraically identical formulae 2p - p^2 and p^2 + 2p(1-p) may be used to address this situation without double-counting the proportion of homozygotes in the population.

2p is an extremely conservative approximation
There is a better way: 2p - p^2, or equivalently p^2 + 2p(1-p)
But this is even better: p^2 + p(1-p)Θ + 2p(1-p)
(computers can calculate anything)

“Algebraically identical formulae”
f9.3 = 0.3054
2p - p^2 = 2(0.3054) - (0.3054)^2 = 0.5175
p^2 + 2p(1-p) = (0.3054)^2 + 2(0.3054)(0.6946) = 0.5175

So for 9.3, Any (p = 0.3054)
2p = 0.6108
2p - p^2 = 0.5175
p^2 + 2p(1-p) = 0.5175
p^2 + p(1-p)Θ + 2p(1-p) = 0.5175 + 0.2121Θ
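To see the four options side by side, the sketch below evaluates each one for the 9.3 frequency used on the slide (p = 0.3054). The Θ value of 0.01 is an assumption for illustration; the slides do not state which Θ was used.

```python
p = 0.3054     # frequency of the 9.3 allele from the slide
theta = 0.01   # assumed for illustration; not given in the slides

two_p = 2 * p
two_p_minus_p2 = 2 * p - p ** 2
p2_plus_2p_not_p = p ** 2 + 2 * p * (1 - p)
with_theta = p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

rows = [
    ("2p", two_p),
    ("2p - p^2", two_p_minus_p2),
    ("p^2 + 2p(1-p)", p2_plus_2p_not_p),
    ("p^2 + p(1-p)theta + 2p(1-p)", with_theta),
]
for label, value in rows:
    print(f"{label:<30} {value:.4f}")
```

The middle two rows agree, which is the “algebraically identical” point. The Θ version sits slightly above them and well below 2p.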

Minor contributor stat

When the minor is probative, would you report:
Inconclusive data
Exclude only
Exclude or “inc a person”
Exclude/include, no stat
Exclude/include, stat for some allele loci
Exclude/include, stat for all loci
Other

For our purposes, it is an intimate sample from a known female contributor
Female is major
Major would have a single source stat, but isn’t probative
Focus on the minor (or foreign) contributor

Situations you need to be able to calculate:
When you know the minor type
When you are concerned about drop out
When you are not concerned about drop out, but you don’t know the minor type (masking/sharing)
When you do not see any minor alleles, but still think the minor contributor is represented
We haven’t discussed the last two yet

When you know the minor type
Minor is 10, 11: 2pq = 2(f10)(f11)
Minor is 6, 9.3: 2pq = 2(f6)(f9.3)

When you are concerned about drop out
Minor is 24, Any: p^2 + p(1-p)Θ + 2p(1-p)
(f24)^2 + (f24)(1 - f24)Θ + 2(f24)(1 - f24)

When you are not concerned about drop out, but don’t know the minor type
What types are possible?
9, 9
8, 9
9, 11
“Combo stat”

“Combo stat”
9 is above stochastic threshold
9, 9: p^2 + p(1-p)Θ
8, 9: 2pq
9, 11: 2pr
Add them up (p = f9, q = f8, r = f11):
(f9)^2 + (f9)(1 - f9)Θ + 2(f8)(f9) + 2(f9)(f11)
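The “combo stat” is just a sum over the genotypes you decide to include. Here is a minimal sketch, assuming made-up frequencies for alleles 8, 9, and 11 and Θ = 0.01; the function names are hypothetical.

```python
def homozygote(p: float, theta: float) -> float:
    return p ** 2 + p * (1 - p) * theta

def heterozygote(p: float, q: float) -> float:
    return 2 * p * q

def combo_stat(genotypes, freqs, theta=0.01):
    """Sum the frequencies of every genotype included for the minor contributor."""
    total = 0.0
    for a, b in genotypes:
        if a == b:
            total += homozygote(freqs[a], theta)
        else:
            total += heterozygote(freqs[a], freqs[b])
    return total

# Hypothetical allele frequencies, for illustration only
freqs = {"8": 0.10, "9": 0.15, "11": 0.20}
included = [("9", "9"), ("8", "9"), ("9", "11")]

print(f"{combo_stat(included, freqs):.6f}")
```

Dropping a genotype from `included` (for example, because it does not fit the peak heights) lowers the total.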

SWGDAM
When the interpretation is conditioned upon the assumption of a particular number of contributors greater than one, the RMP is the sum of the individual frequencies for the genotypes included following a mixture deconvolution. Examples are provided below.
In a sperm fraction mixture (at a locus having alleles P, Q, and R) assumed to be from two contributors, one of whom is the victim (having genotype QR), the sperm contributor genotypes included post-deconvolution might be PP, PQ, and PR. In this case, the RMP for the sperm DNA contributor could be calculated as [p^2 + p(1-p)Θ] + 2pq + 2pr.

No minor alleles present, but you know the minor is contributing
Every other locus has minor alleles
Did the enzyme just get lazy?
“Just inc the locus for stats”
That doesn’t make any more sense than throwing out any other locus
You just need the right calculator

Two scenarios to consider:
No stochastic concerns
Stochastic concerns
Two slightly different stats, but we can deal with both

No stochastic concerns
In some cases, PHR and P may help: 17, 17 or possibly 16, 17; maybe not 16, 16
But you know the minor must be one of (p = f16, q = f17):
16, 16: p^2 + p(1-p)Θ
16, 17: 2pq
17, 17: q^2 + q(1-q)Θ
This is the “combo” stat: [p^2 + p(1-p)Θ] + 2pq + [q^2 + q(1-q)Θ]

Couple more definitions:
“Unrestricted” RMP: the “combo” stat where we used all possibilities (16,16 and 16,17 and 17,17 from the previous slide)
“Restricted” RMP: the “combo” stat where we chose not to use one (or more) possible types based on what fits peak heights, peak height ratios, or proportions of contributors (17,17 or 16,17 but not 16,16 from the previous slide)

What if stochastic concerns?
You would take anyone with 16, Any or 17, Any
But that has the 16, 17 counted twice
Subtract 16, 17, but only once!
(p^2 + p(1-p)Θ) + 2p(1-p) + (q^2 + q(1-q)Θ) + 2q(1-q) - 2pq
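The “16, Any plus 17, Any, minus 16, 17 once” logic can be sketched as below. The frequencies for 16 and 17 and the Θ of 0.01 are hypothetical, as are the function names.

```python
def allele_any(p: float, theta: float) -> float:
    # Homozygote option plus heterozygote with any undetected allele
    return p ** 2 + p * (1 - p) * theta + 2 * p * (1 - p)

def double_allele_any(p: float, q: float, theta: float = 0.01) -> float:
    """16, Any plus 17, Any, with the doubly counted 16, 17 heterozygote removed once."""
    return allele_any(p, theta) + allele_any(q, theta) - 2 * p * q

# Hypothetical frequencies for alleles 16 and 17
f16, f17 = 0.10, 0.20
print(f"{double_allele_any(f16, f17):.4f}")
```

The subtraction is needed because 2p(1-p) already contains 2pq, and so does 2q(1-q).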

Modified random match probability
Let’s look at this “double any” calculation:
(p^2 + p(1-p)Θ) + 2p(1-p) + (q^2 + q(1-q)Θ) + 2q(1-q) - 2pq
Simplify by removing Θ:
p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
This is the basis for dealing with any number of “Allele, Any” contributors
USACIL calls this a modified RMP because “Anys” are involved

Let’s say we’ve got a two contributor mixture with signs that both contributors are having stochastic issues
But what you see is consistent with two contributors
Remember “Take a stand on the stand….”
Validation studies, interpretation guidelines, your experience, Tech Review agrees…

We’ll start with this same pattern, but with stochastic concerns
[Electropherogram: allele 16 at peak height 230, allele 17 at peak height 260]
Homozygote threshold, mixture interpretation threshold, stochastic threshold, drop out threshold: let’s just call it the “Danger Zone”
Why do I always think of “Top Gun” when I have low peak heights?
(We’re not suggesting that you MUST do this, only that you can calculate it.)

Remember the “Allele, Any”
2pq = 2p(1-p), that is, 2 x (what you do see) x (what you don’t see)
(We used it for a single allele below stochastic threshold for a partial or minor contributor)
Because we have two contributors: 16, Any or 17, Any or both
[Electropherogram: allele 16 at peak height 230, allele 17 at peak height 260]

Also, remember the “combo stat” for the combinations you can see:
p^2 + 2pq + q^2
We’ll rearrange this in a minute
[Electropherogram: allele 16 at peak height 230, allele 17 at peak height 260]

Allele, Any for p (16)
2(what you see)(what you don’t): 2p(1 - ?)
You “see” two alleles now, both p and q (16 and 17)
Stick with “1 - what you see” for what you don’t see
2p(1-(p+q)) for p (16)
Same thing for q (17): 2q(1-(p+q))
[Electropherogram: allele 16 at peak height 230, allele 17 at peak height 260]

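So, the obvious combinations:
“Combo” for visible: p^2 + 2pq + q^2
The “Allele, Any” combinations:
Allele, Any for the 16: 2p(1-(p+q))
Allele, Any for the 17: 2q(1-(p+q))
Add them up: p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))
[Electropherogram: allele 16 at peak height 230, allele 17 at peak height 260]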

Here is the formula for multiple Allele, Any:
p^2 + 2pq + q^2 + 2p(1-(p+q)) + 2q(1-(p+q))
Now we rearrange that first part:
p^2 + 2pq + q^2
(p + q) x (p + q)
(p + q)^2
That last line should look familiar

Remember back in the good old days? CPI stat
For two alleles: (p + q)^2
For three alleles: (p + q + r)^2
…
For nine alleles: (p + q + r + s + t + u + v + w + x)^2

Two ways to think about Allele, Any
The way we derived it for that minor contributor:
[p^2 + 2p(1-p)] + [q^2 + 2q(1-q)] - 2pq
(Remember we dropped Θ for this one)
The way that works for as many contributors as we may need:
(p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))
(CPI math is the foundation for this one, and it doesn’t use Θ)
They are equivalent

Expand this one (“Double” Allele, Any, minus the duplicate):
p^2 + 2p(1-p) + q^2 + 2q(1-q) - 2pq
To get:
p^2 + 2p - 2p^2 + q^2 + 2q - 2q^2 - 2pq
Rearrange the terms:
p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2

Now expand the other one (Multiple Allele, Any):
(p + q)^2 + 2p(1-(p+q)) + 2q(1-(p+q))
To get:
p^2 + 2pq + q^2 + 2p - 2p^2 - 2pq + 2q - 2q^2 - 2pq
Rearrange the terms:
p^2 + q^2 + 2p + 2q + 2pq - 2pq - 2pq - 2p^2 - 2q^2
Condense the 2pq terms:
p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2

Now compare them:
This was the “single source” one (2 slides ago):
p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
This is the “generic” form for multiple contributors (previous slide):
p^2 + q^2 + 2p + 2q - 2pq - 2p^2 - 2q^2
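For anyone who would rather not redo the algebra, the equivalence can be spot-checked numerically. This sketch evaluates both forms (without Θ) over a grid of allele frequencies; the function names are made up for illustration.

```python
def double_any(p: float, q: float) -> float:
    # Form derived for the minor contributor, with theta dropped
    return p ** 2 + 2 * p * (1 - p) + q ** 2 + 2 * q * (1 - q) - 2 * p * q

def multiple_any(p: float, q: float) -> float:
    # Generic form built on the CPI-style (p + q)^2 term
    seen = p + q
    return seen ** 2 + 2 * p * (1 - seen) + 2 * q * (1 - seen)

# Check every pair of frequencies on a 0.01 grid whose sum does not exceed 1
checked = 0
for i in range(1, 100):
    for j in range(1, 100):
        p, q = i / 100, j / 100
        if p + q <= 1:
            assert abs(double_any(p, q) - multiple_any(p, q)) < 1e-12
            checked += 1
print(f"{checked} frequency pairs agree")
```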

SWGDAM
In a mixture having at a locus alleles P, Q, and R, assumed to be from two contributors, where all three alleles are below the stochastic threshold, the interpretation may be that the two contributors could be a heterozygote-homozygote pairing where all alleles were detected, a heterozygote-heterozygote pairing where all alleles were detected, or a heterozygote-heterozygote pairing where a fourth allele might have dropped out. In this case, the RMP must account for all heterozygotes and homozygotes represented by these three alleles, but also all heterozygotes that include one of the detected alleles. The RMP for this interpretation could be calculated as (2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr.
Since 2p includes 2pq and 2pr, 2q includes 2pq and 2qr, and 2r includes 2pr and 2rq, the formula subtracts 2pq, 2pr, and 2qr to avoid double-counting these genotype frequencies.
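The SWGDAM three-allele formula and the generic “multiple Allele, Any” form from the earlier slides should give the same number. The sketch below writes the generic form for any list of detected allele frequencies and compares it with the SWGDAM expression. The function names and the frequencies for P, Q, and R are hypothetical, and Θ is left out as on the slides.

```python
def modified_rmp(freqs):
    """Generic form: (sum of detected)^2 + sum of 2a(1 - sum of detected)."""
    seen = sum(freqs)
    return seen ** 2 + sum(2 * a * (1 - seen) for a in freqs)

def swgdam_three_allele(p: float, q: float, r: float) -> float:
    """(2p - p^2) + (2q - q^2) + (2r - r^2) - 2pq - 2pr - 2qr"""
    return (
        (2 * p - p ** 2) + (2 * q - q ** 2) + (2 * r - r ** 2)
        - 2 * p * q - 2 * p * r - 2 * q * r
    )

# Hypothetical frequencies for alleles P, Q, and R
p, q, r = 0.10, 0.15, 0.20
generic = modified_rmp([p, q, r])
swgdam = swgdam_three_allele(p, q, r)
print(f"generic = {generic:.4f}, SWGDAM = {swgdam:.4f}")
```

Both expressions reduce to 2S - S^2, where S is the sum of the detected allele frequencies.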

To use RMP you must state the number of contributors
Validation studies
Experience
Yadda, yadda
Now that we know how to deal with drop out via Allele, Any, we can use RMP more often
Modified RMP (modified denotes “Anys”)
This is the language we use at our lab

CPI compared to RMP
But CPI is NOT the same as RMP
CPI is used when you are unsure about the number of contributors
Consequently, you have problems when you have alleles in the stochastic range (the “Danger Zone”)
If you don’t know how many contributors you have, you don’t know how many alleles are missing

CPI compared to RMP
But we can use the CPI math in our RMP stat
We must make two changes to the “base” CPI formula that we use in the RMP:
1. We must correct for situations that change the number of contributors
2. We must account for allelic drop out
We’ve been through the second, so let’s deal with the first

CPI compared to RMP
Consider a four allele pattern
We interpret the overall profile as having two contributors
CPI considers all possible “visible” combinations of contributors: (p + q + r + s)^2
This includes the P, P and Q, Q and R, R and S, S types

CPI compared to RMP
But if you think you could have a P, P contributor, that leaves three alleles
We stated that there were only 2 contributors
If Contributor #1 is P, P, Contributor #2 cannot account for the Q, R, and S alleles
Having a homozygote changes the assumption of the number of contributors

CPI compared to RMP
So all we need to do is subtract the homozygotes, but only when the presence of a homozygote changes the number of contributors:
2 contributors and 4 alleles detected
3 contributors and 6 alleles detected

CPI compared to RMP
Easy to do with a friendly computer
USACIL defines this as an “Unrestricted” RMP
We kind of think of it as a CPI stat corrected for a defined number of contributors
2 contributors, 4 alleles: (p + q + r + s)^2 - p^2 - q^2 - r^2 - s^2
3 contributors, 6 alleles: (p + q + r + s + t + u)^2 - p^2 - q^2 - r^2 - s^2 - t^2 - u^2
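A sketch of what the “friendly computer” might do for the unrestricted RMP: start from the CPI-style square and subtract the homozygote terms only when the allele count equals twice the assumed number of contributors. The function name, the error handling, and the allele frequencies are assumptions for illustration, not a description of any laboratory’s software.

```python
def unrestricted_rmp(freqs, n_contributors: int) -> float:
    """CPI-style (sum)^2, corrected for a stated number of contributors.

    Homozygotes are subtracted only when every contributor must be a
    heterozygote, i.e. when the allele count is 2 * n_contributors.
    """
    max_alleles = 2 * n_contributors
    if len(freqs) > max_alleles:
        raise ValueError("more alleles than the stated contributors can explain")
    value = sum(freqs) ** 2
    if len(freqs) == max_alleles:
        value -= sum(f ** 2 for f in freqs)
    return value

# Hypothetical allele frequencies
four_alleles = [0.10, 0.15, 0.20, 0.25]
three_alleles = [0.10, 0.15, 0.20]

print(f"{unrestricted_rmp(four_alleles, 2):.4f}")   # homozygotes subtracted
print(f"{unrestricted_rmp(three_alleles, 2):.4f}")  # same as the CPI calculation
```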

Unrestricted RMP Section 5.2.2.6 - SWGDAM
The unrestricted RMP might be calculated for mixtures that display no indications of allelic dropout. The formulae include an assumption of the number of contributors, but relative peak height information is not utilized. For two-person mixtures, the formulae for loci displaying one, two, or three alleles are identical to the CPI calculation discussed in section 5.3. For loci displaying four alleles (P, Q, R, and S), homozygous genotypes would not typically be included. The unrestricted RMP in this case would require the subtraction for homozygote genotype frequencies, e.g., (p + q + r + s)^2 - p^2 - q^2 - r^2 - s^2.

Same thing for our “Allele, Any” situation
No need to consider an “Allele, Any” if it changes the number of contributors
It doesn’t matter how many alleles are below your stochastic threshold
If you say there are 2 contributors and you detect 4 alleles, by definition there are no alleles missing
Similar for 3 contributors and 6 alleles detected

About as bad as it can get
3 contributors
All alleles are in the Danger Zone
Each allele could be missing its sister allele
(p+q+r+s+t)^2 + 2p(1-(p+q+r+s+t)) + 2q(1-(p+q+r+s+t)) + 2r(1-(p+q+r+s+t)) + 2s(1-(p+q+r+s+t)) + 2t(1-(p+q+r+s+t))

GIANT DISCLAIMER!!
We are not saying that you can charge ahead and now use any profile of any number of people with any number of alleles dropping out if you just use a modified RMP calculation
Bad data is bad data
It’s science, not Voodoo
