Comparing Distributions I: DMAIC and Fisher's Exact. By Peter Woolf, University of Michigan. Michigan Chemical Process Dynamics and Controls.

Presentation transcript:

Comparing Distributions I: DMAIC and Fisher's Exact. By Peter Woolf, University of Michigan. Michigan Chemical Process Dynamics and Controls Open Textbook, version 1.0. Creative Commons.

Scenario: You run a small plastic factory described in an earlier lecture. You have already developed the P&ID, designed the control architecture, and parameterized your controllers. The system runs well most of the time, but not always. The yield is generally 30%, but if it goes above 32% or below 28% the batch can't be sold.
How do you tell if the system is out of control?
What do you do if it is out of control?
What strategies can you adopt to maintain tighter control?

DMAIC: Define, measure, analyze, improve, and control
Define: the goal is a consistent yield
Measure: measure the yield
Analyze: control charts, detective work
Improve and control: change the system and/or policies

How do you tell if the system is out of control? 1) Make some measurements

How do you tell if the system is out of control? 2) Construct a control chart. The process is statistically out of control because run 9 exceeds the upper control limit (UCL). Now what?
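The control-chart check can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own chart: the yield values are invented for illustration, the function names are hypothetical, and the limits are assumed to be the center line plus or minus three sample standard deviations of a baseline period (real charts often estimate the spread from moving ranges instead).

```python
from statistics import mean, stdev


def control_limits(baseline, n_sigma=3.0):
    """Return (LCL, center line, UCL) estimated from in-control baseline runs."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - n_sigma * spread, center, center + n_sigma * spread


def out_of_control(values, lcl, ucl):
    """Return the 1-based run numbers whose value falls outside [lcl, ucl]."""
    return [run for run, y in enumerate(values, start=1) if y < lcl or y > ucl]


# Hypothetical yields in percent, invented for illustration only.
baseline_yields = [30.1, 29.8, 30.3, 29.9, 30.0, 30.2, 29.7, 30.1]
new_yields = [30.0, 29.9, 30.2, 29.8, 30.1, 30.0, 29.9, 30.2, 32.5, 30.1]

lcl, center, ucl = control_limits(baseline_yields)
flagged = out_of_control(new_yields, lcl, ucl)
print(f"LCL={lcl:.2f}  center={center:.2f}  UCL={ucl:.2f}  flagged runs={flagged}")
```

With these made-up numbers only run 9 is flagged, mirroring the situation on the slide.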

What if you are out of control? Passive solution: Log it and do nothing. Wait for it to happen again before taking action.
- Note the lost opportunity to improve the process, and the possible safety risk.

Semi-passive solutions:
Resample to make sure it is not an error.
- Odd that this is not done when things are okay.
Adjust the calculated mean up or down to fit the new situation.
- Treats the symptom, not the cause.
- Lost opportunity to learn about the process.

What if you are out of control? Active solution: Look for a special cause and remove or enhance it.
- Not all changes are bad; some may actually improve the process.

Look for a special cause
Possible sources of information:
1) Patterns in the data
2) Association with unmeasured events
3) Known physical effects
4) Operators
Field observation: “The feed for run 9 seemed unusually runny; maybe that is the reason?”

Hypothesis: Runny feed causes the product to go out of our desirable range.
1) Gather data
2) Evaluate hypothesis
3) Make a model of the relationship
Questions: (1) Is this significant? (2) What causes the feed to be runny? (3) Can we develop strategies to cope with this?
Data from 25 runs:
                 Normal feed   Runny feed
Bad product           1             5
Good product         18             1

Marginal results: sums on the side that count over one of the states.
                 Normal feed   Runny feed   Totals
Bad product           1             5          6
Good product         18             1         19
Totals               19             6         25
Is this significant? --> What are the odds? There are 2 answers, depending on the question:
(1) What are the odds of choosing 25 random samples with this particular configuration?
(2) What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?

What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?
Break down the problem: for the 6 bad products, what are the odds that 5 had runny feed and 1 had normal feed?
Restate as an urn problem: an urn holds 25 balls, of which 6 are white (the runny-feed runs) and 19 are black (the normal-feed runs). Remove 6 balls (the bad products). What are the odds of drawing 6 balls of which 5 are white and 1 is black?

Urn problem: with 25 balls, of which 6 are white and 19 are black, remove 6 balls. The odds of drawing 5 white and 1 black are

$$\text{Odds of this draw} = \frac{\binom{6}{5}\binom{19}{1}}{\binom{25}{6}}$$

where $\binom{6}{5}$ is the number of ways of choosing 5 out of 6 of the white balls, $\binom{19}{1}$ is the number of ways of choosing 1 out of 19 of the black balls, and $\binom{25}{6}$ is the number of ways of choosing 6 out of 25 balls.

The odds of this draw follow the hypergeometric distribution: the probability of sampling exactly k special items in a sample of n from an urn containing N items of which m are special.

$$P(k) = \frac{\binom{m}{k}\binom{N-m}{n-k}}{\binom{N}{n}}, \qquad \text{where } \binom{a}{b} = \frac{a!}{b!\,(a-b)!} \text{ reads “a choose b”}$$

For the urn above, k = 5, n = 6, m = 6, and N = 25.
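As a quick check of the urn calculation, the hypergeometric probability can be evaluated with Python's standard library. This is a minimal sketch; the function name `hypergeom_pmf` is chosen here for illustration and does not come from the lecture.

```python
from math import comb


def hypergeom_pmf(k, n, m, N):
    """P(exactly k special items in a sample of n from N items, m of them special)."""
    return comb(m, k) * comb(N - m, n - k) / comb(N, n)


# Urn from the slide: 25 balls, 6 white (special), draw 6, want exactly 5 white.
ways_white = comb(6, 5)    # ways to choose 5 of the 6 white balls
ways_black = comb(19, 1)   # ways to choose 1 of the 19 black balls
ways_total = comb(25, 6)   # ways to choose any 6 of the 25 balls

p_draw = hypergeom_pmf(k=5, n=6, m=6, N=25)
print(ways_white, ways_black, ways_total, p_draw)
```

The three counts are 6, 19, and 177100, so the probability is 114/177100, roughly 0.00064.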

What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?
Analogous arguments can be made for:
- 1 in 19 of the good products having runny feed
- 1 in 6 of the runny-feed products being good product
- 1 in 19 of the normal feeds being bad product
The composite probability can be calculated using Fisher's exact test.

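Fisher's exact is the probability of sampling a particular configuration of a 2 by 2 table with constrained marginals.
                 Normal feed   Runny feed   Totals
Bad product           a             b         a+b
Good product          c             d         c+d
Totals               a+c           b+d      a+b+c+d

$$p_{\text{Fisher}} = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{(a+b+c+d)!\;a!\,b!\,c!\,d!}$$

The numerator is the number of ways the marginals can be arranged. The denominator is the number of ways the total can be arranged, $(a+b+c+d)!$, times the number of ways each observation can be arranged, $a!\,b!\,c!\,d!$.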
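The factorial form of Fisher's exact probability, (a+b)!(c+d)!(a+c)!(b+d)! divided by (a+b+c+d)! a! b! c! d!, can be sketched as below. The function name is hypothetical, the assignment of a, b, c, d to the cells (a = normal feed and bad product, b = runny feed and bad product, c = normal feed and good product, d = runny feed and good product) is an assumption made for this example, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from math import factorial


def fisher_table_probability(a, b, c, d):
    """Probability of one specific 2x2 table, given its row and column totals."""
    # Ways the four marginal totals can be arranged.
    marginals = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    # Ways the grand total and each individual cell can be arranged.
    cells = (factorial(a + b + c + d) * factorial(a) * factorial(b)
             * factorial(c) * factorial(d))
    return Fraction(marginals, cells)


# Observed data: bad product (1 normal, 5 runny), good product (18 normal, 1 runny).
p_observed = fisher_table_probability(1, 5, 18, 1)
print(p_observed, float(p_observed))
```

Because the formula is symmetric in rows and columns, swapping the feed columns or the product rows gives the same probability, and the result matches the hypergeometric urn calculation (114/177100).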

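What are the odds of choosing 25 samples with these marginals in this configuration? Evaluating Fisher's exact for the observed table (bad product: 1 normal feed, 5 runny feed; good product: 18 normal feed, 1 runny feed), as done in Mathematica on the slide:

$$p_{\text{Fisher}} = \frac{6!\,19!\,19!\,6!}{25!\;1!\,5!\,18!\,1!} \approx 0.000644$$

But this is for this configuration alone! Is this one of many bad configurations?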

What are the odds of choosing 25 samples with these marginals in this configuration? Fisher's exact gives the probability estimate at one particular value. A one-tailed test instead estimates the probability at that value or further, that is, for the observed configuration plus all more extreme values.

What are the odds of choosing 25 samples with these marginals in this configuration or more extreme?
Observed table:
                 Normal feed   Runny feed   Totals
Bad product           1             5          6
Good product         18             1         19
Totals               19             6         25
P_Fisher ≈ 0.000644
A more extreme case with the same marginals:
                 Normal feed   Runny feed   Totals
Bad product           0             6          6
Good product         19             0         19
Totals               19             6         25
P_Fisher ≈ 0.0000056
P-value = 0.000644 + 0.0000056 ≈ 0.00065
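The one-tailed sum can be sketched by holding the marginals fixed and adding the probability of every table at least as extreme as the observed one. This is a minimal sketch under stated assumptions: "more extreme" is taken to mean more bad product coming from runny feed, and the function name and argument order are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_one_tailed(runny_bad, bad_total, runny_total, n_total):
    """Sum P(table) over tables with at least `runny_bad` runny-feed bad products.

    Marginals stay fixed: `bad_total` bad products and `runny_total` runny-feed
    runs out of `n_total` runs in all.
    """
    normal_total = n_total - runny_total
    p_value = Fraction(0)
    largest = min(bad_total, runny_total)  # cell cannot exceed either marginal
    for x in range(runny_bad, largest + 1):
        if bad_total - x > normal_total:
            continue  # not enough normal-feed runs to fill the rest of the row
        # Hypergeometric probability of x runny-feed runs among the bad products.
        ways = comb(runny_total, x) * comb(normal_total, bad_total - x)
        p_value += Fraction(ways, comb(n_total, bad_total))
    return p_value


# Observed: 5 of the 6 bad products had runny feed; 6 runny-feed runs out of 25.
p_value = fisher_one_tailed(runny_bad=5, bad_total=6, runny_total=6, n_total=25)
print(float(p_value))
```

Only two tables contribute here (5 or 6 runny-feed bad products), giving 115/177100, about 0.00065. Statistical libraries offer ready-made versions of this test, which would be the usual choice outside a teaching example.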

P-values
A p-value is the probability of observing a result at least as extreme as the one measured, assuming the null hypothesis is true. A small p-value therefore counts as evidence against the null hypothesis; it is not the probability that the null hypothesis itself is true.
Null hypothesis: most commonly interpreted as a completely random event, sometimes with constraints.
Examples of null hypotheses:
- Runny feed has no impact on product quality
- Points on a control chart are all drawn from the same distribution
- Two shipments of feed are statistically the same
P-values are often considered significant if they are less than 0.05 or 0.001, but these limits are not guaranteed to be appropriate in all cases.

Look for a special cause
1) Data:
                 Normal feed   Runny feed   Totals
Bad product           1             5          6
Good product         18             1         19
Totals               19             6         25
2) Analysis: p-value ≈ 0.00065 < 0.05
3) Conclusion: runny feed significantly impacts product quality
Note: Runny feed does not fully determine quality, as we sometimes get good product from runny feed and bad product from normal feed.

Look for a special cause
3) Conclusion: runny feed is likely to impact product quality
What next?
- Look for root causes: what causes runny feed? Supplier? Temperature? Storage conditions? Lot number? Storage time? This is very process dependent.
- Develop a method to detect runny feed before it goes into the process

Take Home Messages
- After you identify that a system is out of control, take appropriate action
- Associations between variables can be identified using Fisher's exact test and its associated p-value
- Once the cause of a disturbance is found, find a way to eliminate it