# Bayesian Statistics: Asking the Right Questions

Michael L. Raymer, Ph.D.



8/29/03 M. Raymer – WSU, FBS

Statistical Games: The defendant's DNA is consistent with the evidentiary sample, and the defendant's DNA type occurs with a frequency of one in 10,000,000,000. Only about 0.1% of wife batterers actually murder their wives; therefore, evidence of abuse and battering should not be admissible in a murder trial.

The Question: Given the evidentiary DNA type and the defendant's DNA type, what is the probability that the evidence sample contains the defendant's DNA? Information available: how common each allele is in a particular population, and summary statistics such as the combined probability of inclusion (CPI) and the random match probability (RMP).

An Example Problem: Suppose the rate of breast cancer is 1%. Mammograms detect breast cancer in 80% of cases where it is present. In 10% of healthy patients, a mammogram will indicate breast cancer. If a woman has a positive mammogram result, what is the probability that she has breast cancer?

Results (number of responses for each answer):

- 75%: 3
- 50%: 1
- 25%: 2
- <10%: a lot

Determining Probabilities: Count all possible outcomes. If you flip a coin 4 times, what is the probability that you will get exactly two heads? The 16 equally likely outcomes are TTTT, THTT, HTTT, HHTT, TTTH, THTH, HTTH, HHTH, TTHT, THHT, HTHT, HHHT, TTHH, THHH, HTHH, HHHH. Six of them (HHTT, THTH, HTTH, THHT, HTHT, TTHH) contain exactly two heads, so P(2 heads) = 6/16 = 0.375.
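The counting argument can be checked mechanically. This sketch enumerates every sequence of four fair flips and counts those with exactly two heads; the binomial coefficient line is an added cross-check, not something from the slide.

```python
from itertools import product
from math import comb

# Enumerate all 2**4 = 16 equally likely sequences of four coin flips.
outcomes = ["".join(seq) for seq in product("HT", repeat=4)]

# Keep the sequences that contain exactly two heads.
two_heads = [s for s in outcomes if s.count("H") == 2]
p_two_heads = len(two_heads) / len(outcomes)

# Cross-check: the same count is C(4, 2) = 6 out of 2**4 = 16.
p_formula = comb(4, 2) / 2**4

print(len(outcomes), len(two_heads), p_two_heads, p_formula)  # 16 6 0.375 0.375
```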

Statistical Preliminaries: Frequency and Probability. We can estimate probabilities by counting frequencies; for a fair coin, P(heads) = 0.5. The law of large numbers: the more samples we take, the closer the observed frequency of heads will tend to get to 0.5.
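A small simulation makes the law of large numbers concrete. This is a sketch that assumes a fair coin simulated with Python's `random` module; the seed value is arbitrary and only makes the run repeatable.

```python
import random


def heads_frequency(n_flips: int, rng: random.Random) -> float:
    """Fraction of heads in n_flips simulated fair-coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(2003)  # arbitrary fixed seed so the output is repeatable
for n in (10, 100, 1_000, 10_000, 100_000):
    freq = heads_frequency(n, rng)
    # The distance from 0.5 tends to shrink as n grows (roughly like 1/sqrt(n)).
    print(f"{n:>7} flips: frequency = {freq:.4f}, error = {abs(freq - 0.5):.4f}")
```

Small runs can land far from 0.5; only the trend toward 0.5 as the sample grows is guaranteed in the probabilistic sense.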

Distributions: Counting frequencies gives us distributions, for example the binomial distribution (discrete) and the Gaussian distribution (continuous).
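To illustrate the discrete/continuous distinction, this sketch evaluates a binomial distribution (reusing the four-flip coin setting) and a Gaussian from the standard library's `statistics.NormalDist`. The Gaussian's mean and standard deviation are arbitrary illustrative values.

```python
from math import comb
from statistics import NormalDist


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Discrete: each count of heads in 4 fair flips has a probability; they sum to 1.
pmf = [binomial_pmf(k, 4, 0.5) for k in range(5)]
print(pmf)  # [0.0625, 0.25, 0.375, 0.25, 0.0625]

# Continuous: a Gaussian assigns a density, and probabilities are areas under it.
g = NormalDist(mu=0.0, sigma=1.0)  # illustrative parameters
print(g.pdf(0.0))                  # a density value, not a probability
print(g.cdf(1.0) - g.cdf(-1.0))    # P(-1 < X < 1), about 0.683
```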

Density Estimation: Parametric: assume a distribution (e.g., Gaussian) and estimate its parameters ($\mu$, $\sigma$). Non-parametric: histogram sampling, where the bin size is critical; Gaussian smoothing can help.
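Here is a minimal sketch comparing the three approaches on one synthetic sample. The data (500 draws from a Gaussian with mean 4.0 and standard deviation 0.5), the bin width, and the smoothing bandwidth are all illustrative assumptions; the "Gaussian smoothing" is implemented as a plain kernel density estimate.

```python
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(0)
# Hypothetical sample: 500 draws from a Gaussian with mean 4.0 and sd 0.5.
sample = [rng.gauss(4.0, 0.5) for _ in range(500)]

# Parametric: assume a Gaussian and estimate its two parameters from the data.
mu_hat, sigma_hat = fmean(sample), stdev(sample)
parametric = NormalDist(mu_hat, sigma_hat)


def histogram_density(x, data, bin_width):
    """Histogram estimate: bins start at min(data); the bin width is the critical choice."""
    lo = min(data)
    idx = int((x - lo) // bin_width)
    count = sum(1 for d in data if int((d - lo) // bin_width) == idx)
    return count / (len(data) * bin_width)


def gaussian_kde(x, data, bandwidth):
    """Gaussian smoothing: average a Gaussian bump centred on every data point."""
    kernel = NormalDist(0.0, bandwidth)
    return sum(kernel.pdf(x - d) for d in data) / len(data)


for x in (3.0, 4.0, 5.0):
    print(x,
          round(parametric.pdf(x), 3),
          round(histogram_density(x, sample, 0.25), 3),
          round(gaussian_kde(x, sample, 0.2), 3))
```

Re-running with a much smaller bin width (say 0.02) shows why bin size matters: the histogram becomes spiky, while the smoothed estimate changes far less.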

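Combining Probabilities: For non-overlapping (mutually exclusive) outcomes, $P(A \text{ or } B) = P(A) + P(B)$. When outcomes can overlap, $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$. For independent events, the product rule gives $P(A \text{ and } B) = P(A)\,P(B)$.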
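The three rules can be verified exactly on a small sample space. The fair six-sided die below is an illustrative choice, not from the slides; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair six-sided die.
OMEGA = frozenset(range(1, 7))


def P(event):
    """Probability of an event (a set of die faces) under equally likely outcomes."""
    return Fraction(len(event & OMEGA), len(OMEGA))


one, two = {1}, {2}                # cannot both happen
even, high = {2, 4, 6}, {4, 5, 6}  # can both happen (a 4 or a 6)
low = {1, 2}                       # independent of "even" on a fair die

print(P(one | two), P(one) + P(two))                       # 1/3 1/3
print(P(even | high), P(even) + P(high) - P(even & high))  # 2/3 2/3
print(P(even & low), P(even) * P(low))                     # 1/6 1/6
```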

Product Rule Example: P(engine > 200 hp) = 0.2 and P(color = red) = 0.3. Assuming independence, P(red and fast) = 0.2 × 0.3 = 0.06. Likewise, 1/4 × 1/10 × 1/6 × 1/8 × 1/5 = 1/9,600, or roughly 1/10,000.
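The same arithmetic in code, as a sketch that simply assumes the listed frequencies are independent (the product rule is only valid under that assumption):

```python
from fractions import Fraction
from math import prod

p_fast = 0.2  # P(engine > 200 hp)
p_red = 0.3   # P(color = red)
p_red_and_fast = p_fast * p_red  # valid only if the two features are independent
print(round(p_red_and_fast, 4))  # 0.06

# Five independent frequencies, multiplied exactly.
freqs = [Fraction(1, d) for d in (4, 10, 6, 8, 5)]
combined = prod(freqs)
print(combined, float(combined))  # 1/9600, about 0.000104 (roughly 1 in 10,000)
```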

Statistical Decision Making: One variable. A ring was found at the scene of the crime. The ring is size 11, and the defendant's ring size is also 11. If a random ring were left at the crime scene, what is the probability that it would have been size 11?

Multiple Variables: The ring is size 11 and is also made of platinum. Assuming independence, $P(\text{size 11 and platinum}) = P(\text{size 11}) \times P(\text{platinum})$. Note what happens to the significant digits!

Which Question? If a fruit has a diameter of 4, how likely is it to be an apple?

[Figure: apples and fruit with diameter 4]

Inverting the Question: Given an apple, what is the probability that it will have a diameter of 4? Given a fruit with a diameter of 4, what is the probability that it is an apple?
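The two questions generally have different answers. This sketch uses a hypothetical basket of fruit (all counts invented for illustration) to compute both conditional probabilities from the same table.

```python
from fractions import Fraction

# Hypothetical counts of fruit by (type, diameter); invented for illustration.
counts = {
    ("apple", 3): 60, ("apple", 4): 30,
    ("pear", 3): 10, ("pear", 4): 20,
}

apples = sum(n for (kind, _), n in counts.items() if kind == "apple")  # 90 apples
diam4 = sum(n for (_, d), n in counts.items() if d == 4)               # 50 fruit of diameter 4

# "Given an apple, how likely is diameter 4?" divides by the number of apples.
p_d4_given_apple = Fraction(counts[("apple", 4)], apples)  # 30/90
# "Given diameter 4, how likely is an apple?" divides by the number of diameter-4 fruit.
p_apple_given_d4 = Fraction(counts[("apple", 4)], diam4)   # 30/50

print(p_d4_given_apple, p_apple_given_d4)  # 1/3 3/5
```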

Forensic DNA Evidence: Given alleles (17, 17), (19, 21), (14, 15.1), what is the probability that a DNA sample belongs to Bob? One way to ask it: find all individuals with the (17, 17), (19, 21), (14, 15.1) profile; how many of them are Bob? Another: how common are alleles 17, 19, 21, 14, and 15.1 in the population?

Conditional Probabilities: For related events, we can express probability conditionally: $P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$. Statistical independence: $P(A \mid B) = P(A)$, or equivalently $P(A \text{ and } B) = P(A)\,P(B)$.

Bayesian Decision Making Terminology: We have an object, and we want to decide whether it belongs to a class. Is this fruit a type of apple? Does this DNA come from a Caucasian American? Is this car a sports car? We measure features of the object (the evidence), such as size, weight, and color, or the alleles at various loci.

Bayesian Notation: Feature/evidence vector $\mathbf{x} = (x_1, x_2, \ldots, x_d)$. Classes $\omega_1, \ldots, \omega_c$, each with posterior probability $P(\omega_j \mid \mathbf{x})$.

A Simple Example: You are given a fruit with a diameter of 4. Is it a pear or an apple? To begin, we need to know the distributions of diameters for pears and for apples.

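Maximum Likelihood: choose the class whose class-conditional distribution $p(x \mid \omega_j)$ is largest at the observed diameter.

[Figure: class-conditional distributions $p(x \mid \omega_j)$ of fruit diameter $x$, axis from 1 to 6]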
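A minimal sketch of the maximum-likelihood decision for the fruit example. It assumes Gaussian class-conditional densities whose parameters are invented for illustration (chosen so that a diameter of 4 is more typical of pears); priors are deliberately ignored.

```python
from statistics import NormalDist

# Hypothetical class-conditional densities p(diameter | class); parameters invented.
class_conditionals = {
    "apple": NormalDist(mu=3.0, sigma=0.7),
    "pear": NormalDist(mu=4.2, sigma=0.5),
}


def max_likelihood_class(x: float) -> str:
    """Pick the class whose class-conditional density is largest at x (no priors)."""
    return max(class_conditionals, key=lambda c: class_conditionals[c].pdf(x))


print(max_likelihood_class(4.0))  # pear
print(max_likelihood_class(2.5))  # apple
```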

A Key Problem: We based this decision on the class-conditional probability $p(x \mid \omega_j)$. What we really want to use is the posterior probability $P(\omega_j \mid x)$. What if we found the fruit in a pear orchard? We need to know the prior probability of finding an apple or a pear!

Prior Probabilities: Prior probability + evidence → posterior probability. Without evidence, what is the prior probability that a fruit is an apple? What is the prior probability that a DNA sample comes from the defendant?

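The Heart of It All: Bayes' Rule. For a class $\omega_j$ and evidence $\mathbf{x}$:

$$P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\,P(\omega_j)}{p(\mathbf{x})}$$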

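Bayes' Rule, with the evidence term $p(\mathbf{x})$ expanded over all classes $\omega_i$:

$$P(\omega_j \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_j)\,P(\omega_j)}{\sum_{i} p(\mathbf{x} \mid \omega_i)\,P(\omega_i)}$$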

Example Revisited: Is it an ordinary apple or an uncommon pear?

Bayes Rule Example
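A sketch of how Bayes' rule can settle the apple-or-pear question. The class-conditional densities (a diameter of 4 is more typical of pears) and the priors (apples common, pears uncommon) are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical class-conditional densities p(diameter | class); parameters invented.
class_conditionals = {
    "apple": NormalDist(mu=3.0, sigma=0.7),
    "pear": NormalDist(mu=4.2, sigma=0.5),
}
# Hypothetical priors: an ordinary apple versus an uncommon pear.
priors = {"apple": 0.9, "pear": 0.1}


def bayes_posteriors(x: float) -> dict:
    """Posterior P(class | x) = p(x | class) P(class) / p(x)."""
    joint = {c: class_conditionals[c].pdf(x) * priors[c] for c in priors}
    evidence = sum(joint.values())  # p(x): the same denominator for every class
    return {c: j / evidence for c, j in joint.items()}


post = bayes_posteriors(4.0)
print(max(post, key=post.get), {c: round(p, 3) for c, p in post.items()})
```

With these numbers the pear has the higher likelihood at a diameter of 4, but the much larger apple prior makes "apple" the more probable class (posterior of roughly 0.72).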


Posing the Question:

1. What are the classes?
2. What is the evidence?
3. What is the prior probability?
4. What is the class-conditional probability?


Practice Problem Revisited: Classes: healthy, cancer. Evidence: positive mammogram (pos), negative mammogram (neg). If a woman has a positive mammogram result, what is the probability that she has breast cancer?

A Counting Argument: Suppose we have 1000 women. 10 will have breast cancer, and 8 of these will have a positive mammogram. 990 will not have breast cancer, and 99 of these will have a positive mammogram. Of the 107 women with a positive mammogram, 8 have breast cancer: 8/107 ≈ 0.075 = 7.5%.
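The counting argument and Bayes' rule give exactly the same answer. This sketch uses the problem's own numbers (1% prevalence, 80% detection, 10% false positives) with exact fractions.

```python
from fractions import Fraction

prior_cancer = Fraction(1, 100)  # rate of breast cancer
sensitivity = Fraction(8, 10)    # P(pos | cancer)
false_pos = Fraction(1, 10)      # P(pos | healthy)

# Natural-frequency version: imagine 1000 women.
women = 1000
with_cancer = women * prior_cancer          # 10
true_pos = with_cancer * sensitivity        # 8
healthy = women - with_cancer               # 990
false_pos_count = healthy * false_pos       # 99
by_counting = true_pos / (true_pos + false_pos_count)

# Bayes' rule version with the same inputs.
by_bayes = (sensitivity * prior_cancer) / (
    sensitivity * prior_cancer + false_pos * (1 - prior_cancer)
)

print(by_counting, by_bayes, round(float(by_bayes), 4))  # 8/107 8/107 0.0748
```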

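Solution:

$$P(\text{cancer} \mid \text{pos}) = \frac{P(\text{pos} \mid \text{cancer})\,P(\text{cancer})}{P(\text{pos} \mid \text{cancer})\,P(\text{cancer}) + P(\text{pos} \mid \text{healthy})\,P(\text{healthy})} = \frac{0.8 \times 0.01}{0.8 \times 0.01 + 0.1 \times 0.99} = \frac{0.008}{0.107} \approx 0.075$$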

An Example Problem: Suppose the chance of a randomly chosen person being guilty is 0.001. When a person is guilty, a DNA sample will match that individual 99% of the time. With probability 0.0001, a DNA test will show a false match for an innocent individual. If a DNA test demonstrates a match, what is the probability of guilt?

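Solution:

$$P(\text{guilty} \mid \text{match}) = \frac{P(\text{match} \mid \text{guilty})\,P(\text{guilty})}{P(\text{match} \mid \text{guilty})\,P(\text{guilty}) + P(\text{match} \mid \text{innocent})\,P(\text{innocent})} = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.0001 \times 0.999} = \frac{0.00099}{0.0010899} \approx 0.908$$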

Marginal Distributions

Combining Marginals: Assuming independent features, $p(\mathbf{x} \mid \omega_j) = \prod_{k=1}^{d} p(x_k \mid \omega_j)$. If we assume independence and use Bayes' rule, we have a naïve Bayes decision maker (classifier).
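A minimal naïve Bayes sketch: the class-conditional probability is the product of per-feature marginals, and Bayes' rule turns it into a posterior. The two features (diameter and weight), their Gaussian marginals, and the priors are all hypothetical values invented for illustration.

```python
from statistics import NormalDist

# Hypothetical per-class marginals for two features: (diameter, weight). Invented numbers.
marginals = {
    "apple": [NormalDist(3.0, 0.7), NormalDist(150, 30)],
    "pear": [NormalDist(4.2, 0.5), NormalDist(180, 25)],
}
priors = {"apple": 0.9, "pear": 0.1}  # hypothetical


def naive_bayes_posteriors(x):
    """Posterior for each class, assuming features are independent given the class."""
    joint = {}
    for cls, dists in marginals.items():
        likelihood = 1.0
        for value, dist in zip(x, dists):
            likelihood *= dist.pdf(value)  # independence: multiply the marginals
        joint[cls] = likelihood * priors[cls]
    evidence = sum(joint.values())
    return {cls: j / evidence for cls, j in joint.items()}


post = naive_bayes_posteriors((4.0, 200.0))
print(max(post, key=post.get), {c: round(p, 3) for c, p in post.items()})
```

With many features, products of small densities can underflow; summing log-densities is the usual workaround.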

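Bayes Decision Rule: Choosing the class with the largest posterior probability minimizes the probability of a wrong decision when the priors and class-conditional distributions are known. A naïve Bayes classifier built from Gaussian marginals attains this optimum when its assumptions hold, i.e., when the features (evidence) really do follow Gaussian distributions and are independent.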

Forensic DNA: Classes: DNA from the defendant, DNA not from the defendant. Evidence: allele matches at various loci, under an assumption of independence. Prior probabilities? Assumed equal (0.5). What is the true prior probability that an evidence sample came from a particular individual?

The Importance of Priors
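To see how much the prior matters, this sketch reuses the DNA example's likelihoods (a 0.99 match rate for the guilty, a 0.0001 false-match rate for the innocent) and varies only the prior probability of guilt. The list of priors is an arbitrary illustrative sweep.

```python
def p_guilty_given_match(prior, p_match_guilty=0.99, p_match_innocent=0.0001):
    """Bayes' rule for the two-class DNA example, as a function of the prior."""
    numerator = p_match_guilty * prior
    return numerator / (numerator + p_match_innocent * (1 - prior))


# Same evidence, different priors: the posterior moves a great deal.
for prior in (0.5, 0.01, 0.001, 0.0001, 0.000001):
    print(f"prior {prior:<8g} -> posterior {p_guilty_given_match(prior):.4f}")
```

An equal prior of 0.5 makes the match look nearly conclusive, while a prior of one in a million leaves guilt unlikely despite the match.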

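Likelihood Ratios: When deciding between two possibilities, we don't need the exact probabilities; we only need to know which one is greater. The denominator $p(\mathbf{x})$ is the same for all classes, so it can be eliminated:

$$\frac{P(\omega_1 \mid \mathbf{x})}{P(\omega_2 \mid \mathbf{x})} = \frac{p(\mathbf{x} \mid \omega_1)}{p(\mathbf{x} \mid \omega_2)} \cdot \frac{P(\omega_1)}{P(\omega_2)}$$

This is useful when there are many possible classes.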

Likelihood Ratio Example
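As an illustration (not necessarily the slide's own example), the DNA problem can be worked in odds form: the likelihood ratio multiplies the prior odds, and the evidence term never has to be computed. The inputs are the DNA example's numbers.

```python
p_match_guilty = 0.99      # P(match | guilty)
p_match_innocent = 0.0001  # P(match | innocent)
prior = 0.001              # P(guilty) before seeing the evidence

likelihood_ratio = p_match_guilty / p_match_innocent  # about 9900
prior_odds = prior / (1 - prior)
posterior_odds = likelihood_ratio * prior_odds         # p(match) has cancelled out

# Posterior odds > 1 means "guilty" is the more probable class.
decision = "guilty" if posterior_odds > 1 else "innocent"
posterior_prob = posterior_odds / (1 + posterior_odds)  # back to a probability if needed

print(round(likelihood_ratio), round(posterior_odds, 3), decision, round(posterior_prob, 4))
```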


From alleles to identity: It is relatively easy to find the allele frequencies in the population. From there: marginal probability distributions → (independence assumption) → class-conditional probabilities → (equal prior probabilities) → Bayesian posterior probability estimate.
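A sketch of that pipeline for the (17, 17), (19, 21), (14, 15.1) profile. Several pieces are assumptions not stated on the slides and are labeled as such: the allele frequencies and locus names are hypothetical, genotype frequencies use Hardy-Weinberg proportions, and a true source is assumed to match with probability 1.

```python
from math import prod

# HYPOTHETICAL allele frequencies and locus names, invented for illustration only.
allele_freq = {
    "locus_A": {"17": 0.20},
    "locus_B": {"19": 0.10, "21": 0.15},
    "locus_C": {"14": 0.05, "15.1": 0.08},
}
profile = {"locus_A": ("17", "17"), "locus_B": ("19", "21"), "locus_C": ("14", "15.1")}


def genotype_freq(locus, a, b):
    """Assumes Hardy-Weinberg proportions: p^2 for homozygotes, 2pq for heterozygotes."""
    p, q = allele_freq[locus][a], allele_freq[locus][b]
    return p * p if a == b else 2 * p * q


# Independence assumption: multiply per-locus frequencies to get P(match | not the source).
random_match_prob = prod(genotype_freq(locus, *alleles) for locus, alleles in profile.items())

# Equal priors (0.5 each), and P(match | source) = 1 (assumed):
# P(source | match) = 0.5 / (0.5 + 0.5 * random_match_prob)
posterior_source = 1.0 / (1.0 + random_match_prob)

print(random_match_prob, posterior_source)
```

Replacing the equal priors with a realistic prior for "this particular individual" can change the posterior dramatically, which is the point of the priors discussion above.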

Thank you.

A Key Advantage: The oldest citation: T. Bayes, "An essay towards solving a problem in the doctrine of chances," Phil. Trans. Roy. Soc., 53, 1763.