# Introductory Statistics By Peter Woolf University of Michigan Michigan Chemical Process Dynamics and Controls Open Textbook version.


Introductory Statistics By Peter Woolf (pwoolf@umich.edu) University of Michigan Michigan Chemical Process Dynamics and Controls Open Textbook version 1.0 Creative commons

“A foolish consistency is the hobgoblin of little minds” R. W. Emerson But is this always true??

Consistency: why might we want consistency?

- Integration of products within a larger system
- Examples: we want parts to fit together, consistent chemical feeds, consistent material properties, consistent energy content, consistent flavor

Consistency: what can be the downsides of consistency?

- You can make something that is consistent, but consistently bad.
- Sometimes people sacrifice quality to get consistency; this is not the goal.
- Example: fast food vs. homemade food (depends on the cook)

Measures of “quality” (or lack thereof):

- Six Sigma: “Number of defects per million opportunities”
- Genichi Taguchi: “Uniformity around a target value” or “The loss a product imposes on society after it is shipped”

Process control is a central tool for reducing variability by adjusting and correcting for variations.

Key questions: How can we know if our control system is working well enough? How can we measure variability?

Process-specific questions:

1. Do recent data indicate that the process is broken or changed?
2. Is the process “out of control”?
3. What are the odds that two samples come from the same distribution?
4. What factors influence this outcome?

Detecting if a process has changed Scenario: You are a small Acai juice vendor trying to expand to a world market with a consistent product.

Acai juice production (figure): acai berries in the market → berry crusher → juice

Acai juice production: A key selling point of your acai juice is that it contains a large concentration of antioxidants. With your berry crusher you get a good quality product most of the time, but not always. You don’t want to waste berries if your crusher is hurting your product, but how can you know if it is not working right? How can you test this?

How can you test this? 1) Gather many samples from your current process and measure the antioxidant concentration. N sample values: 40.1, 41.3, 44.3, 39.3, 38.6, … How do we summarize this?

We summarize the N sample values with two numbers.

Average: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$

Deviation from the average (standard deviation): $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$

Deviation from the average (standard deviation). Interpretation: roughly the average distance from the mean, or the width of the dispersion around the mean. Problem: What if I have only one sample (N = 1)? Then σ = 0! Does this mean that the underlying process has no variation, or that I have not sampled it sufficiently? Result: When N is small, the standard deviation will underestimate the true variation. Solution: the sample standard deviation.

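Population standard deviation (“real deviation”), where $\mu$ is the population mean: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}$

Sample standard deviation (“observed deviation”), where $\bar{x}$ is the sample average: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \bar{x}\right)^2}$

With a measure of the mean and standard deviation, you have enough information to define a Gaussian distribution: a bell-curve shape based on a model of a large number of random, uncorrelated changes.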
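As a rough illustration of the difference between the two deviations, the sketch below computes both for the five sample values quoted earlier (the remaining samples are elided in the slides, so this is not the full data set). The function names are illustrative, not from the original lecture.

```python
import math

# The five values quoted in the slides; the remaining samples are not given.
samples = [40.1, 41.3, 44.3, 39.3, 38.6]

def mean(values):
    return sum(values) / len(values)

def population_std(values):
    """Divide by N: returns 0 for a single sample."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def sample_std(values):
    """Divide by N - 1: undefined for a single sample."""
    if len(values) < 2:
        raise ValueError("sample standard deviation needs at least two values")
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

avg = mean(samples)
sigma = population_std(samples)
s = sample_std(samples)
print(f"mean = {avg:.2f}, population std = {sigma:.2f}, sample std = {s:.2f}")
```

The N - 1 form is always the larger of the two, which compensates for the underestimate at small N; with a single sample it refuses to give an answer rather than reporting zero variation.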

Gaussian or normal distribution. From the previous lecture on noise: approximate a Gaussian distribution in Excel with =RAND()+RAND()+RAND()-RAND()-RAND()-RAND(). The approximation gets better as more add/subtract pairs are used. The Gaussian distribution is the basis of much of statistical quality control, Six Sigma, and quality engineering in general. (Figure: bell curve with the horizontal axis marked at ±2σ and ±3σ; the ±3σ window is 6σ wide.)

How do we mathematically define a normal distribution? For mean $\mu$ and standard deviation $\sigma$, the density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
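The Excel trick can be mimicked in Python to see how close it gets. This is a minimal sketch, assuming Python's `random.random()` plays the role of `RAND()`; the function name and the fixed seed are choices made here for reproducibility.

```python
import math
import random
import statistics

def approx_gaussian(pairs, rng):
    """Sum of `pairs` uniform draws minus `pairs` uniform draws, like the Excel formula."""
    added = sum(rng.random() for _ in range(pairs))
    subtracted = sum(rng.random() for _ in range(pairs))
    return added - subtracted

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [approx_gaussian(3, rng) for _ in range(100_000)]

# Each uniform(0, 1) draw has variance 1/12, so six of them give variance 0.5.
expected_std = math.sqrt(6 / 12)
observed_mean = statistics.mean(draws)
observed_std = statistics.pstdev(draws)

# Fraction of draws within one standard deviation; about 0.68 for a true Gaussian.
within_one_sigma = sum(abs(d) <= expected_std for d in draws) / len(draws)
print(observed_mean, observed_std, within_one_sigma)
```

Increasing `pairs` pushes the tails closer to a true Gaussian, at the cost of more random draws per value.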

How can we calculate this?

Excel: the error function is ERF(), so the probability of a value at or below z, for mean m and standard deviation s, can be expressed as =1/2*(1+ERF((z-m)/(s*SQRT(2))))

Mathematica:

- General numerical integration: NIntegrate[f[x], {x, start, end}]
- Analytical solution with the error function: N[1/2*(1+Erf[(z-m)/(s*Sqrt[2])])]
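The two routes (numerical integration and the error-function formula) can be compared directly. The sketch below uses a hand-written composite Simpson's rule as a stand-in for a general numerical integrator; it is an illustration written for this transcript, not the lecture's Mathematica code, and all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2.0 * std ** 2)) / (std * math.sqrt(2.0 * math.pi))

def integrate_simpson(f, start, end, intervals=1000):
    """Composite Simpson's rule; `intervals` must be even."""
    if intervals % 2:
        raise ValueError("intervals must be even")
    h = (end - start) / intervals
    total = f(start) + f(end)
    for i in range(1, intervals):
        total += (4 if i % 2 else 2) * f(start + i * h)
    return total * h / 3.0

def normal_cdf(z, mean, std):
    """Same expression as the Excel formula: 1/2*(1+erf((z-m)/(s*sqrt(2))))."""
    return 0.5 * (1.0 + math.erf((z - mean) / (std * math.sqrt(2.0))))

# Probability of landing within one standard deviation of the mean, both ways.
m, s = 0.0, 1.0
numeric = integrate_simpson(lambda x: gaussian_pdf(x, m, s), m - s, m + s)
analytic = normal_cdf(m + s, m, s) - normal_cdf(m - s, m, s)
print(numeric, analytic)
```

The error-function route is exact and cheap; numerical integration is the fallback when the distribution has no closed-form integral.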

Acai juice problem revisited. From 100 samples of the current process we calculate the following: mean = 40 units, standard deviation = 2 units. From these data, what are the odds that the next batch will have an antioxidant value of 37.5 or less?

With mean = 40 units and standard deviation = 2 units, the odds that the next batch has an antioxidant value of 37.5 or less are given by the area under the normal distribution up to 37.5. Evaluated in Mathematica (using shorthand notation), the answer is that we expect this situation ~10% of the time.
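The same number can be checked in Python. This sketch assumes the standard library's `statistics.NormalDist` (Python 3.8+) as a substitute for the Mathematica call, which is not preserved in the transcript.

```python
from statistics import NormalDist

# Process statistics quoted in the slides.
process = NormalDist(mu=40.0, sigma=2.0)

# Probability that the next batch comes in at 37.5 units or less.
p_low_batch = process.cdf(37.5)
print(f"P(value <= 37.5) = {p_low_batch:.4f}")
```

The threshold 37.5 sits 1.25 standard deviations below the mean, which is why the result lands near 10%.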

Acai juice production as a function of time (figure: antioxidant value vs. time). Is this process “out of control”? Yes: “It is unusual to see so many batches with such a high value. This is strange and suggests something has changed.” No: “This is just normal variation. Nothing is fundamentally different.” Key question: How do we define “unusual”?

One definition: variation outside of the six sigma window (±3σ around the mean, 6σ wide in total) is unusual. What are the odds of finding something that falls outside this bound by chance? Find by integration! For both tails the probability is ~0.0027, or 1 in 370. Common confusion: the “Six Sigma” process defines unusual as 3.4 defects out of 1 million, which does not correspond to falling outside a window 6 standard deviations wide; for two tails it is more like a window about 9.3 standard deviations wide (±4.65σ).
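The tail integration can be sketched with the standard library. The helper name `two_tail_prob` is made up for this example, and the last step simply inverts the two-tailed probability for a 3.4-per-million rate.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_tail_prob(k):
    """Probability of falling more than k standard deviations from the mean."""
    return 2.0 * std_normal.cdf(-k)

p_outside_3sigma = two_tail_prob(3.0)
odds_3sigma = 1.0 / p_outside_3sigma

# Half-width (in standard deviations) whose two tails together hold 3.4 per million.
k_for_3_4_ppm = -std_normal.inv_cdf(3.4e-6 / 2.0)

print(p_outside_3sigma, odds_3sigma, k_for_3_4_ppm)
```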

Acai juice production as a function of time (figure: antioxidant value vs. time). Is this process “out of control”? Translation: if we assume that variation outside of the six sigma window is “unusual”, is this pattern expected to happen in fewer than 1 in 370 of our samples? Solution: Control charts!


Image from Wikipedia (Western Electric rules). Control charts determine if a process is behaving in an unusual way. UCL = upper control limit, X-bar = average, LCL = lower control limit.

Rule 1 (a point outside the control limits): What are the odds? If each dot is a single measurement and the UCL is at +3 sigma, then for both tails the probability is ~0.0027, or 1 in 370.

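Rule 2 (two out of three consecutive points in zone A or beyond, on the same side): What are the odds? This can be done using probability theory. Assuming each sample is independent, the total probability is

$$2\,[\,P_1(\text{out}+)P_2(\text{out}+)P_3(\text{out}+) + P_1(\text{out}+)P_2(\text{out}+)P_3(\text{in}) + P_1(\text{out}+)P_2(\text{in})P_3(\text{out}+) + P_1(\text{in})P_2(\text{out}+)P_3(\text{out}+)\,] = 0.00305$$

or 1 in 326, where P(out+) is the probability that a single point falls beyond the zone A boundary on the upper side and P(in) = 1 − P(out+). The factor of 2 accounts for the lower side. For comparison: Rule 1 is 1 in 370; Rule 2 is 1 in 326.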
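The Rule 2 sum can be evaluated numerically. This sketch assumes that “out+” means beyond +2 standard deviations (the zone A boundary used later in the lecture); the variable names are illustrative.

```python
from statistics import NormalDist

# Probability that one point lies beyond +2 sigma (assumed zone A boundary).
p_out = 1.0 - NormalDist().cdf(2.0)
p_in = 1.0 - p_out

# Three of three out, plus the three ways of getting exactly two of three out.
one_side = p_out**3 + 3.0 * p_out**2 * p_in

# Double it to cover the lower side of the chart as well.
p_rule2 = 2.0 * one_side
odds_rule2 = 1.0 / p_rule2
print(p_rule2, odds_rule2)
```

The result is just under 1 in 327, which matches the slide's 1 in 326 after truncation.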

What are the odds? Alternative solution by sampling. Approach: generate thousands of samples and test how many satisfy the rule. Rule 1: the sampled result is similar to 1 in 370.

Rule 2: the sampled result is similar to 1 in 326. Message: Many complex decision processes can be evaluated numerically with good accuracy (see the Mathematica code on the website under Lecture 21.nb).
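A sampling check along these lines can be sketched in Python. This is not the lecture's Lecture 21.nb code; it is a minimal stand-in that assumes standard-normal samples, a ±3σ limit for Rule 1, and a ±2σ zone A boundary for Rule 2. The trial count and seed are arbitrary.

```python
import random

def rule1(window):
    """The point lies beyond +/- 3 sigma."""
    return abs(window[-1]) > 3.0

def rule2(window):
    """At least two of three points lie beyond 2 sigma on the same side."""
    high = sum(x > 2.0 for x in window)
    low = sum(x < -2.0 for x in window)
    return high >= 2 or low >= 2

def estimate(rule, width, trials, seed=0):
    """Fraction of random windows of `width` points that trigger `rule`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        window = [rng.gauss(0.0, 1.0) for _ in range(width)]
        hits += rule(window)
    return hits / trials

p_rule1 = estimate(rule1, width=1, trials=200_000)
p_rule2 = estimate(rule2, width=3, trials=200_000)
print(f"Rule 1: about 1 in {1 / p_rule1:.0f}")
print(f"Rule 2: about 1 in {1 / p_rule2:.0f}")
```

The estimates wobble from run to run with different seeds; more trials tighten them toward the analytical odds.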

Odds of each rule being triggered by chance: 1 in 370, 1 in 326, 1 in 256, and 1 in 180. In all cases these represent somewhat “rare” cases in a statistical sense, but they are not all equally rare. The rules are not based on statistics alone, though. For example, what are the odds of finding 15 consecutive samples in zone C? Odds: 1 in 306. Is this system therefore “out of control”? Yes, but in a good way.
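These odds can be reproduced under some assumptions about the rule definitions, since the slide's figure is not preserved: zone C is taken as within ±1σ, the 1-in-180 figure is taken as four of five consecutive points beyond 1σ on one side, and the 1-in-256 figure as nine consecutive points on one side of the center line. These readings match the numbers, but they are guesses about which rule each number belongs to.

```python
from statistics import NormalDist

z = NormalDist()

# 15 consecutive points inside zone C (assumed to be within +/- 1 sigma).
p_zone_c = z.cdf(1.0) - z.cdf(-1.0)
p_15_in_c = p_zone_c ** 15

# Assumed rule: at least four of five consecutive points beyond 1 sigma, same side.
p_beyond_1 = 1.0 - z.cdf(1.0)
p_4_of_5 = 2.0 * (p_beyond_1**5 + 5.0 * p_beyond_1**4 * (1.0 - p_beyond_1))

# Assumed rule: nine consecutive points on the same side of the center line.
p_9_same_side = 2.0 * 0.5**9

print(1 / p_15_in_c, 1 / p_4_of_5, 1 / p_9_same_side)
```

A run of 15 points hugging the center line is about as improbable as a Rule 2 violation, which is why it is flagged even though it signals less variation than expected.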

Acai juice problem revisited. What if you know that each batch of berries has some variation, but you are unsure if the machine is behaving strangely? Can you still use your control charts? Solution: Take samples from each batch, average them, and plot these average values and statistics on a control chart. Problem: The process of averaging different samples will change your odds, because averaging reduces variation.

- Day 1: 40.36, 39.36, 38.43, 39.67
- Day 2: 39.96, 40.32, 39.88, 39.75
- …
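To make the averaging step concrete, the sketch below computes the per-day mean and sample standard deviation for the two days quoted above, and shows the standard result that the average of n independent samples has standard deviation σ/√n. The helper `std_of_mean` is a name chosen for this example.

```python
import math
import statistics

# The two days of samples quoted in the slides (four samples per day).
batches = {
    "Day 1": [40.36, 39.36, 38.43, 39.67],
    "Day 2": [39.96, 40.32, 39.88, 39.75],
}

day_means = {day: statistics.mean(values) for day, values in batches.items()}
day_stdevs = {day: statistics.stdev(values) for day, values in batches.items()}  # N - 1 form

def std_of_mean(sigma, n):
    """Standard deviation of an average of n independent samples."""
    return sigma / math.sqrt(n)

for day in batches:
    print(f"{day}: mean = {day_means[day]:.3f}, stdev = {day_stdevs[day]:.3f}")

# With four samples per batch, the averages vary half as much as single samples.
print(std_of_mean(2.0, 4))
```

This shrinkage is why control limits for averaged data must be narrower than limits for individual measurements.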

Acai process control using X-bar charts. Raw data: plotting the raw data, it is hard to say if anything is going on.

Acai process control using X-bar charts. To get something like an X-bar chart, we need the UCL and LCL. The data are in the Excel example online, Lecture.21.xls.


To get something like this, we need the UCL and LCL:

UCL = grand avg + A3*(avg stdev) = 39.86 + 1.628*0.55 = 40.76

Note: If you use A2, you use the average R (range). The result is 40.77, nearly the same.

LCL = grand avg − A3*(avg stdev) = 38.96

The UCL is 3 standard deviations away from the mean, so the line between zones A and B is 2 standard deviations away:

A/B line = grand avg + A3*(avg stdev)*(2/3) = 40.46
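The limit arithmetic can be wrapped in a small helper. The numbers are the ones quoted above; the function name and the returned dictionary keys are illustrative, and the B/C line at one third of the distance is an extension of the same reasoning rather than something stated in the slides.

```python
def xbar_limits(grand_avg, avg_stdev, a3):
    """Control limits and zone lines for an X-bar chart built from subgroup stdevs."""
    three_sigma = a3 * avg_stdev  # distance from the center line to the UCL
    return {
        "UCL": grand_avg + three_sigma,
        "A/B upper": grand_avg + three_sigma * 2.0 / 3.0,
        "B/C upper": grand_avg + three_sigma / 3.0,  # assumed by analogy
        "center": grand_avg,
        "B/C lower": grand_avg - three_sigma / 3.0,
        "A/B lower": grand_avg - three_sigma * 2.0 / 3.0,
        "LCL": grand_avg - three_sigma,
    }

# Values from the slides: grand average 39.86, average stdev 0.55, A3 = 1.628.
limits = xbar_limits(grand_avg=39.86, avg_stdev=0.55, a3=1.628)
for name, value in limits.items():
    print(f"{name}: {value:.2f}")
```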

X-bar chart: is it “in control”?

- Rule 1: okay, no points outside of zone A
- Rule 2: fail, points 9 and 10 are in zone A
- Rules 3 and 4: okay

Conclusion: Not in statistical control.
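Checks like these can be automated. The sketch below tests Rules 1 and 2 only, using the center line and UCL from the lecture; the ten daily averages are hypothetical values invented here so that points 9 and 10 land in zone A, since the real data live in the spreadsheet and are not reproduced in the transcript.

```python
def rule1_violations(points, center, sigma):
    """1-based indices of points beyond the 3-sigma control limits."""
    return [i + 1 for i, x in enumerate(points) if abs(x - center) > 3.0 * sigma]

def rule2_violations(points, center, sigma):
    """1-based indices where two of the last three points are beyond 2 sigma, same side."""
    flagged = []
    for i in range(len(points)):
        window = points[max(0, i - 2): i + 1]
        high = sum(x > center + 2.0 * sigma for x in window)
        low = sum(x < center - 2.0 * sigma for x in window)
        if high >= 2 or low >= 2:
            flagged.append(i + 1)
    return flagged

center = 39.86
ucl = 39.86 + 1.628 * 0.55
sigma = (ucl - center) / 3.0  # one zone width on the X-bar chart

# Hypothetical daily averages, NOT the lecture's data.
daily_averages = [39.9, 39.6, 40.1, 39.8, 39.7, 40.0, 39.9, 40.2, 40.5, 40.6]

rule1_hits = rule1_violations(daily_averages, center, sigma)
rule2_hits = rule2_violations(daily_averages, center, sigma)
print("Rule 1 violations at points:", rule1_hits)
print("Rule 2 violations at points:", rule2_hits)
```

The Rule 2 check reports the point at which the pattern completes, so two zone A points at positions 9 and 10 are flagged at point 10.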

Take-home messages:

- Statistical process control is a method for systematically identifying inconsistencies.
- Probabilities are often based on a Gaussian process.
- Control charts provide a systematic method for evaluating if a process is under control.

“A foolish consistency is the hobgoblin of little minds” --R. W. Emerson “An intelligent consistency is a virtue in an integrated global economy” --Anonymous
