Six Sigma Green Belt Training Presented by Harry H Holdorf

Analyze
Basic Stats
Intro to Hypothesis Testing
Lean Sigma
Multi-Vari Studies
Identifying Root Cause

Basic Statistics Objectives
Purpose of Statistics
Types of Data
Normal Distribution
Z Values
Measures of Location
Graphing Data

Purpose of Basic Statistics
The purpose of Basic Statistics is to:
Provide a numerical summary of the data being analyzed.
Provide the basis for making inferences about the future.
Provide the foundation for assessing process capability.
Provide a common language to be used throughout an organization to describe processes.
Data (n): Factual information organized for analysis; numerical or other information represented in a form suitable for processing by computer; values from scientific experiments.
Relax... it won’t be that bad!

What is statistics? Numbers, data and statistics
Numbers: 100, 140, 213, 230, 180, 211, 120, 160, 200, 110, 260, 235, 280, 180, 300
Data: Human weight (pounds): 100, 140, 213, 230, 180, 211, 120, 160, 200, 110, 260, 235, 280, 180, 300
Statistics: Mean = 194.6 pounds, Minimum = 100 pounds, Maximum = 300 pounds, Range = 200 pounds
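The step from data to statistics can be reproduced in a few lines. This is a minimal sketch using the weight values from the slide; the variable names are illustrative only.

```python
# Human weight data (pounds) from the slide
weights_lb = [100, 140, 213, 230, 180, 211, 120, 160, 200, 110,
              260, 235, 280, 180, 300]

# Summary statistics: each one condenses the 15 values into a single number
mean_lb = sum(weights_lb) / len(weights_lb)
min_lb = min(weights_lb)
max_lb = max(weights_lb)
range_lb = max_lb - min_lb

print(f"Mean    = {mean_lb:.1f} lb")
print(f"Minimum = {min_lb} lb")
print(f"Maximum = {max_lb} lb")
print(f"Range   = {range_lb} lb")
```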

Statistical Notation – Cheat Sheet
Σ: Summation
s: The Standard Deviation of sample data
σ: The Standard Deviation of population data
s²: The variance of sample data
σ²: The variance of population data
R: The range of data
R-bar: The average range of data
k: Multi-purpose notation, i.e. # of subgroups, # of classes
|x|: The absolute value of some term
>, <: Greater than, less than
≥, ≤: Greater than or equal to, less than or equal to
X_i: An individual value, an observation
X_1: A particular (1st) individual value
For each, all, individual values
X-bar: The Mean, average of sample data
X-double-bar: The grand Mean, grand average
µ: The Mean of population data
p: A proportion of sample data
P: A proportion of population data
n: Sample size
N: Population size

Parameters vs. Statistics
Population: All the items that have the “property of interest” under study.
Sample: A significantly smaller subset of the population used to make an inference.
Population Parameters: Arithmetic descriptions of a population: µ, σ, P, σ², N
Sample Statistics: Arithmetic descriptions of a sample: X-bar, s, p, s², n

Two Types of Data
Discrete Data: Values can vary only by whole units.
Examples:
Binary: pass/fail, on-time/late
Count: number of students, number of questions on a test
Attribute: light is red / yellow / green
Continuous Data: Values can be measured to any degree of precision.
Examples: miles, time, weight, dollar amount on invoice
The Performance Standard can be expressed by either type of data.

Discrete Variables
Discrete Variable: Possible Values for the Variable
The number of defective needles in boxes of 100 diabetic syringes: 0, 1, 2, …, 100
The number of individuals in groups of 30 with a Type A personality: 0, 1, 2, …, 30
The number of surveys returned out of 300 mailed in a customer satisfaction study: 0, 1, 2, …, 300
The number of employees in 100 having finished high school or obtained a GED: 0, 1, 2, …, 100
The number of times you need to flip a coin before a head appears for the first time: 1, 2, 3, … (note, there is no upper limit because you might need to flip forever before the first head appears)

Continuous Variables
Continuous Variable: Possible Values for the Variable
The length of prison time served for individuals convicted of first degree murder: All the real numbers between a and b, where a is the smallest amount of time served and b is the largest.
The household income for households with incomes less than or equal to $30,000: All the real numbers between a and $30,000, where a is the smallest household income in the population.
The blood glucose reading for those individuals having glucose readings equal to or greater than 200: All real numbers between 200 and b, where b is the largest glucose reading in all such individuals.

Definitions of Scaled Data
Understanding the nature of data and how to represent it can affect the types of statistical tests possible.
Nominal Scale: data consists of names, labels, or categories. It cannot be arranged in an ordering scheme, and no arithmetic operations are performed on nominal data.
Ordinal Scale: data is arranged in some order, but differences between data values either cannot be determined or are meaningless.
Interval Scale: data can be arranged in an ordering scheme, and differences between data values are meaningful and can be interpreted.
Ratio Scale: data can be ranked, and all arithmetic operations including division can be performed (division by zero is of course excluded). Ratio level data has an absolute zero, and a value of zero indicates a complete absence of the characteristic of interest.

Nominal Scale
Qualitative Variable: Possible nominal level data values for the variable
Blood Types: A, B, AB, O
State of Residence: Alabama, …, Wyoming
Country of Birth: United States, China, other

Ordinal Scale
Qualitative Variable: Possible Ordinal level data values
Automobile Sizes: Subcompact, compact, intermediate, full size, luxury
Product rating: Poor, good, excellent
Baseball team classification: Class A, Class AA, Class AAA, Major League

Interval Scale
Interval Variable: Possible Scores
IQ scores of students in Black Belt Training: 100, … (the difference between scores is measurable and has meaning, but a score of 120 does not indicate that a student is 1.2 times more intelligent than one who scores 100)
Weight of a Person: 100 lbs, 115 lbs, 150 lbs, 175 lbs, 210 lbs (differences are meaningful; because weight also has an absolute zero, it additionally meets the Ratio Scale definition)

Ratio Scale
Ratio Variable: Possible Scores
Grams of fat consumed per adult in the United States: 0, … (If person A consumes 25 grams of fat and person B consumes 50 grams, we can say that person B consumes twice as much fat as person A. If a person C consumes zero grams of fat per day, we can say there is a complete absence of fat consumed on that day. Note that a ratio is interpretable and an absolute zero exists.)

Converting Attribute Data to Continuous Data
Continuous Data is generally more desirable for statistical analysis because each observation carries more information.
In many cases Attribute Data can be converted to Continuous Data.
Which is more useful?
15 scratches, or a total scratch length of 9.25”?
22 foreign materials, or 2.5 fm/square inch?
200 defects, or 25 defects/hour?

Normal Distribution
The Normal Distribution is the most recognized distribution in statistics.
What are the characteristics of a Normal Distribution?
Only random error is present
Process free of assignable cause
Process free of drifts and shifts
So what is present when the data is Non-normal?

The Normal Curve The Normal Curve is a smooth, symmetrical, bell-shaped curve, generated by the density function. It is the most useful continuous probability model as many naturally occurring measurements such as heights, weights, etc. are approximately Normally Distributed.

The Normal (Z) Distribution
Characteristics of Normal Distribution (Gaussian curve) are:
It is considered to be the most important distribution in statistics.
The total area under the curve is equal to 1.
The distribution is mounded and symmetric; it extends indefinitely in both directions, approaching but never touching the horizontal axis.
All processes will exhibit a normal curve shape if you have pure random variation (white noise).
The Z distribution has a Mean of 0 and a Standard Deviation of 1.
The Mean divides the area in half, 50% on one side and 50% on the other side.
The Mean, Median and Mode are at the same data point.
(Figure: Normal curve with the horizontal axis marked from -6 to +6 Standard Deviations.)

Standard Normal Distribution
Each combination of Mean and Standard Deviation generates a unique Normal curve.
The “Standard” Normal Distribution has μ = 0 and σ = 1.
Data from any Normal Distribution can be made to fit the Standard Normal by converting raw scores to standard scores.
Z-scores measure how many Standard Deviations from the Mean a particular data value lies.
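Converting raw scores to standard scores can be sketched as follows. The example reuses the weight data from the earlier slide and assumes the sample Standard Deviation (n - 1 in the denominator) is the appropriate estimate; the function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw values to standard scores: z = (x - mean) / standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in values]

# Weight data (pounds) from the earlier slide
sample = [100, 140, 213, 230, 180, 211, 120, 160, 200, 110,
          260, 235, 280, 180, 300]
z = z_scores(sample)

# After standardizing, the scores have a mean of 0 and a standard deviation of 1
for raw, score in zip(sample, z):
    print(f"{raw:4d} lb -> z = {score:+.2f}")
```

A z-score of +1.00 means the value lies one Standard Deviation above the Mean; a negative score lies below it.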

Distribution
Measurements vary from each other, but they form a pattern that, if stable, can be described as a distribution.
Distributions can differ in:
Location
Spread
Shape
For a normal distribution we care about the mean μ and the standard deviation σ.

Non-Normal Distributions
1 Skewed
2 Kurtosis
3 Multi-Modal
4 Granularity

The Statistical Problem
From a statistical perspective, there are only two problems (imperfections):
Centering: the process is not on target (mean shift). Remedy: center the process.
Spread: the process variation is too large (σ). Remedy: reduce the spread.
(Figures show each case relative to the Target, LSL and USL.)

Levels of Standard Deviation
The figure shows three processes, A, B and C, with standard deviations σ = 0.04, σ = 0.41 and σ = 0.81.
Which process is the BEST? WHY?
σ = standard deviation

Distribution & DPMO
DPMO = defects per million opportunities = proportion of observations outside spec × 1,000,000
The figure compares distributions A, B and C against the Lower spec and Upper spec; the area outside the spec limits represents defects.
As the standard deviation increases, DPMO increases.
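The link between spread and DPMO can be illustrated numerically. This is a rough sketch that assumes a Normal process centered on target; the spec limits of -1 and +1 are hypothetical, while the three standard deviations are the values quoted on the "Levels of Standard Deviation" slide.

```python
from statistics import NormalDist

def dpmo(mean, sd, lsl, usl):
    """Proportion of a Normal(mean, sd) process outside [lsl, usl], times 1,000,000."""
    dist = NormalDist(mu=mean, sigma=sd)
    below_lsl = dist.cdf(lsl)
    above_usl = 1.0 - dist.cdf(usl)
    return (below_lsl + above_usl) * 1_000_000

# Hypothetical spec limits; the process is assumed to be centered at 0
LSL, USL = -1.0, 1.0

results = [dpmo(0.0, sd, LSL, USL) for sd in (0.04, 0.41, 0.81)]
for sd, value in zip((0.04, 0.41, 0.81), results):
    print(f"sd = {sd:.2f} -> DPMO = {value:,.0f}")
```

With the limits held fixed, each increase in standard deviation pushes more of the curve past the spec limits, so the DPMO grows.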

More on Normal Distribution
Normal Distribution and Defects (1 Sigma process, Z = 1)
The Lower Customer Specification Limit (LSL) sits 1σ below the Mean and the Upper Customer Specification Limit (USL) sits 1σ above it; anything outside these limits is a defect.
1σ (pronounced one sigma) = one standard deviation
What percentage of defects would you expect?

Z predicts the defect level
General formula for Z: $Z = \dfrac{SL - \bar{X}}{\sigma}$
SL = Spec Limit (USL or LSL)
$\bar{X}$ = Mean (targeting)
$\sigma$ = standard deviation (spread), long or short term
Percent of area under the Normal (Z) curve:
Between $\mu - 1\sigma$ and $\mu + 1\sigma$: 68.26% (about 68%)
Between $\mu - 2\sigma$ and $\mu + 2\sigma$: 95.44% (about 95%)
Between $\mu - 3\sigma$ and $\mu + 3\sigma$: 99.73% (about 99.7%)
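The areas under the Normal curve can be checked directly, which also answers the question of how many defects a 1 Sigma process produces. This is a small sketch using Python's standard library; the helper names and the example numbers passed to `z_value` are hypothetical. Computed to full precision the areas are 68.27%, 95.45% and 99.73%; figures such as 68.26 and 95.44 come from doubling four-decimal Z-table entries.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def z_value(spec_limit, mean, sd):
    """Distance from the mean to a spec limit, in standard deviations."""
    return (spec_limit - mean) / sd

def percent_within(k):
    """Percent of a Normal distribution lying within +/- k standard deviations."""
    return (STD_NORMAL.cdf(k) - STD_NORMAL.cdf(-k)) * 100.0

for k in (1, 2, 3):
    inside = percent_within(k)
    print(f"within +/-{k} sigma: {inside:.2f}%   outside (defects): {100 - inside:.2f}%")

# Hypothetical process: mean 10.0, standard deviation 1.5, upper spec limit 13.0
print("Z for the hypothetical process:", z_value(13.0, 10.0, 1.5))
```

For the 1 Sigma process, roughly 31.7% of the output falls outside the two spec limits.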

Six Sigma Capability & Mean Shifts
A 1.5 Sigma Shift is typical over time.
With a 6 Sigma Process Capability, performance remains acceptable even after such a shift.
In the Real World “SHIFT Happens”
(Figure: shifted distributions shown between the LSL and USL.)
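The effect of a mean shift can be sketched numerically. The example assumes a Normal process whose spec limits sit a given number of standard deviations from the target and whose mean drifts toward the upper limit; the function name `dpmo_with_shift` is hypothetical.

```python
from statistics import NormalDist

def dpmo_with_shift(sigma_level, shift=1.5):
    """DPMO for spec limits at +/- sigma_level when the mean drifts by `shift` sigma."""
    std = NormalDist()
    # After the drift, the upper limit is closer and the lower limit farther away
    p_above_usl = 1.0 - std.cdf(sigma_level - shift)
    p_below_lsl = std.cdf(-(sigma_level + shift))
    return (p_above_usl + p_below_lsl) * 1_000_000

for level in (3, 4, 5, 6):
    print(f"{level} sigma: no shift = {dpmo_with_shift(level, 0.0):12.4f} DPMO, "
          f"1.5 sigma shift = {dpmo_with_shift(level):12.4f} DPMO")
```

Under these assumptions a 6 Sigma process still yields only about 3.4 DPMO after a 1.5 Sigma shift, while a 3 Sigma process degrades to roughly 66,800 DPMO.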

The Empirical Rule

Why Assess Normality? While many processes in nature behave according to the Normal Distribution, many processes in business, particularly in the areas of service and transactions, do not. There are many types of distributions, and many statistical tools assume Normal Distribution properties in their calculations. So understanding just how “Normal” the data are will impact how we look at the data.

Tools for Assessing Normality
The shape of any Normal curve can be calculated based on the Normal Probability density function. Tests for Normality basically compare the shape of the calculated curve to the actual distribution of your data points. So how do we assess for normality? Watch that curve!

Goodness-of-Fit
The Anderson-Darling test measures departure of the actual data from the expected Normal Distribution. The Anderson-Darling Goodness-of-Fit test assesses the magnitude of these departures using an Observed minus Expected formula.
P-value: if the P-value is less than 0.05, conclude that your data are non-normal; if it is 0.05 or greater, there is not enough evidence to reject Normality.
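The slides rely on MINITAB™ for this test, so the following is only a rough sketch of how an Anderson-Darling check could be computed by hand. It assumes the Mean and Standard Deviation are estimated from the sample, and it uses the commonly cited D'Agostino and Stephens approximation for the P-value; results may differ slightly from what a statistical package reports. The function name and the two test samples are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def anderson_darling_normal(data):
    """Return (A-squared, approximate P-value) for a Normality check."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))  # Normal curve fitted to the sample
    cdf = [fitted.cdf(v) for v in x]
    total = sum((2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
                for i in range(n))
    a2 = -n - total / n
    a2_adj = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample adjustment
    # Assumption: piecewise P-value approximation (D'Agostino and Stephens)
    if a2_adj >= 0.6:
        p = math.exp(1.2937 - 5.709 * a2_adj + 0.0186 * a2_adj ** 2)
    elif a2_adj > 0.34:
        p = math.exp(0.9177 - 4.279 * a2_adj - 1.38 * a2_adj ** 2)
    elif a2_adj > 0.2:
        p = 1 - math.exp(-8.318 + 42.796 * a2_adj - 59.938 * a2_adj ** 2)
    else:
        p = 1 - math.exp(-13.436 + 101.14 * a2_adj - 223.73 * a2_adj ** 2)
    return a2, p

# Hypothetical samples: evenly spaced Normal quantiles and exponential quantiles
normal_like = [NormalDist().inv_cdf((i + 0.5) / 50) for i in range(50)]
skewed = [-math.log(1 - (i + 0.5) / 50) for i in range(50)]

for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
    a2, p = anderson_darling_normal(sample)
    verdict = "non-normal" if p < 0.05 else "no evidence against Normality"
    print(f"{name}: A2 = {a2:.3f}, P = {p:.4f} -> {verdict}")
```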

If the Data Are Not Normal, Don’t Panic!
Normal Data are not common in the transactional world. There are lots of meaningful statistical tools you can use to analyze your data (more on that later). It just means you may have to think about your data in a slightly different way. Don’t touch that button!

Normality Exercise Exercise objective: To demonstrate how to test for Normality. Generate the graphical summary using the “Descriptive Statistics.MTW” file. Use only the columns Dist A and Dist D.

Descriptive Statistics
Measures of Location (central tendency): Mean, Trimmed Mean, Median, Mode
Measures of Variation (dispersion): Range, Interquartile Range, Standard Deviation, Variance

Descriptive Statistics
Open the MINITAB™ Project “Measure Data Sets.mpj” and select the worksheet “basicstatistics.mtw”

Non-Normal Right (Positive) Skewed
Moment coefficient of Skewness will be close to zero for symmetric distributions, negative for left Skewed and positive for right Skewed.
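The sign behavior of the moment coefficient of Skewness can be seen in a short sketch. It uses the plain moment definition (third central moment divided by the second central moment raised to the power 1.5); statistical packages may apply a small-sample correction, so their reported values can differ slightly. The data sets are hypothetical.

```python
def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using central moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]            # long tail toward large values
left_skewed = [-x for x in right_skewed]      # mirror image: long tail toward small values

for name, data in (("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)):
    print(f"{name}: skewness = {moment_skewness(data):+.3f}")
```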

Bimodal Distributions
2 Different Distributions 2 different machines 2 different operators 2 different administrators

Extreme Bi-Modal (Outliers)

Bi-Modal – Multiple Outliers

Measures of location

Measures of Location Mean is:
Commonly referred to as the average.
The arithmetic balance point of a distribution of data.
Population: $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N}$
Sample: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$

Measures of Location Calculating Mean in Minitab

Measures of Location Median is:
The mid-point, or 50th percentile, of a distribution of data.
Arrange the data from low to high, or high to low.
It is the single middle value in the ordered list if there is an odd number of observations.
It is the average of the two middle values in the ordered list if there is an even number of observations.

Measures of Location Trimmed Mean is a:
Compromise between the Mean and Median.
The Trimmed Mean is calculated by eliminating a specified percentage of the smallest and largest observations from the data set and then calculating the average of the remaining observations.
Useful for data with potential extreme values.
Stat>Basic Statistics>Display Descriptive Statistics…>Statistics…> Trimmed Mean
MINITAB™ output (Descriptive Statistics: Data) columns: Variable, N, N*, Mean, SE Mean, TrMean, StDev, Minimum, Q1, Median, Q3, Maximum
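The trimming calculation described above can be sketched as follows. The 5% default and the rounding of the trim count to the nearest whole observation are assumptions made for illustration, and the data set is hypothetical.

```python
def trimmed_mean(values, trim_pct=5.0):
    """Mean after dropping trim_pct percent of the observations from each end."""
    ordered = sorted(values)
    # Observations dropped per end, rounded to the nearest integer
    # (Python rounds exact halves to the nearest even number)
    k = round(len(ordered) * trim_pct / 100)
    if 2 * k >= len(ordered):
        raise ValueError("trim percentage removes every observation")
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Hypothetical data with one extreme value at each end
data = [1, 20, 21, 22, 23, 24, 25, 26, 27, 300]

print("Mean:              ", sum(data) / len(data))
print("10% Trimmed Mean:  ", trimmed_mean(data, 10))
```

Dropping the single smallest and largest values moves the result from 48.9 to 23.5, much closer to the bulk of the data.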

Measures of Location Mode is:
The most frequently occurring value in a distribution of data. Mode = 5
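The Mean, Median and Mode can be compared side by side with Python's standard library. The two small data sets below are hypothetical and were chosen so that one has an odd and the other an even number of observations.

```python
from statistics import mean, median, multimode

odd_count = [3, 5, 5, 7, 9]         # 5 observations: the median is the middle value
even_count = [3, 5, 5, 7, 9, 11]    # 6 observations: the median averages the two middle values

for name, data in (("odd", odd_count), ("even", even_count)):
    print(f"{name}: mean = {mean(data)}, median = {median(data)}, "
          f"mode(s) = {multimode(data)}")
```

`multimode` returns a list, so a bimodal data set would report both modes instead of hiding one.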

Measure of variation

Measures of Variation Range and Interquartile Range
Range is the: Difference between the largest observation and the smallest observation in the data set. A small range would indicate a small amount of variability and a large range a large amount of variability.
Interquartile Range is the: Difference between the 75th percentile and the 25th percentile.
MINITAB™ output (Descriptive Statistics: Data) columns: Variable, N, N*, Mean, SE Mean, StDev, Minimum, Q1, Median, Q3, Maximum
Use Range or Interquartile Range when the data distribution is Skewed.
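Both measures can be computed in a few lines. This sketch reuses the weight data from the earlier slide and assumes quartiles are located with the (n + 1) position rule, which is what `statistics.quantiles` does with `method="exclusive"`; other quartile conventions give slightly different values.

```python
from statistics import quantiles

def range_and_iqr(values):
    """Return (range, interquartile range) for a data set."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")  # 25th, 50th, 75th percentiles
    return max(values) - min(values), q3 - q1

# Weight data (pounds) from the earlier slide
weights_lb = [100, 140, 213, 230, 180, 211, 120, 160, 200, 110,
              260, 235, 280, 180, 300]

data_range, iqr = range_and_iqr(weights_lb)
print(f"Range = {data_range} lb, Interquartile Range = {iqr} lb")
```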

Measures of Variation Standard Deviation is:
Equivalent of the average deviation of values from the Mean for a distribution of data.
A “unit of measure” for distances from the Mean.
Use when data are symmetrical.
Population: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
Cannot calculate population Standard Deviation because this is sample data.

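Measures of Variation Variance is the:
Average squared deviation of each individual data point from the Mean.
Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$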
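The difference between the sample and population versions of the Variance and Standard Deviation comes down to the divisor. The sketch below spells out the calculation and cross-checks it against Python's `statistics` module; the function name `variance_of` is hypothetical and the data are the weights from the earlier slide.

```python
import math
import statistics

def variance_of(values, population=False):
    """Average squared deviation from the mean (n - 1 divisor unless population=True)."""
    n = len(values)
    m = sum(values) / n
    sum_sq = sum((x - m) ** 2 for x in values)
    return sum_sq / (n if population else n - 1)

weights_lb = [100, 140, 213, 230, 180, 211, 120, 160, 200, 110,
              260, 235, 280, 180, 300]

sample_var = variance_of(weights_lb)
population_var = variance_of(weights_lb, population=True)

print(f"Sample:     variance = {sample_var:.1f}, standard deviation = {math.sqrt(sample_var):.2f}")
print(f"Population: variance = {population_var:.1f}, standard deviation = {math.sqrt(population_var):.2f}")
print("Library check:", statistics.variance(weights_lb), statistics.pvariance(weights_lb))
```

The Standard Deviation is the square root of the Variance, which returns the measure to the original units (pounds here).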

Graphing Data

Introduction to Graphing
The purpose of Graphing is to:
Identify potential relationships between variables.
Identify risk in meeting the critical needs of the Customer, Business and People.
Provide insight into the nature of the X’s which may or may not control Y.
Show the results of passive data collection.
In this section we will cover:
Box Plots
Scatter Plots
Dot Plots
Time Series Plots
Histograms

Data Sources
Data sources are suggested by many of the tools that have been covered so far:
Process Map
X-Y Matrix
FMEA
Fishbone Diagrams
Examples are:
1. Time: Shift, Day of the week, Week of the month, Season of the year
2. Location/position: Facility, Region, Office
3. Operator: Training, Experience, Skill, Adherence to procedures
4. Any other sources?

Graphical Concepts The characteristics of a good graph include:
Variety of data Selection of Variables Graph Range Information to interpret relationships Explore quantitative relationships

The Histogram A Histogram displays data that have been summarized into intervals. It can be used to assess the symmetry or Skewness of the data. To construct a Histogram, the horizontal axis is divided into equal intervals and a vertical bar is drawn at each interval to represent its frequency (the number of values that fall within the interval).
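The construction described above (equal intervals, one count per interval) can be sketched without any plotting library. The function name, the choice of four intervals and the text-based bars are illustrative assumptions; the data are the weights from the earlier slide.

```python
def histogram_counts(values, bins):
    """Split the data range into equal intervals and count the values in each."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; no intervals to build")
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        # The largest value sits on the final edge, so place it in the last interval
        idx = min(int((x - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts

weights_lb = [100, 140, 213, 230, 180, 211, 120, 160, 200, 110,
              260, 235, 280, 180, 300]

edges, counts = histogram_counts(weights_lb, bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.0f} to {right:5.0f} | {'#' * count} ({count})")
```

Changing `bins` changes the apparent shape, which is one reason a Histogram alone should not be used to judge Normality.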

Histogram Caveat All the Histograms below were generated using random samples of the data from the worksheet “Graphing Data.mtw”. Be careful not to determine Normality simply from a Histogram plot: if the sample size is low, even data drawn from a Normal Distribution may not look very Normal.

Variation on a Histogram
Using the worksheet “Graphing Data.mtw” create a simple Histogram for the data column called granular.

Dot Plot The Dot Plot can be a useful alternative to the Histogram especially if you want to see individual values or you want to brush the data.

Box Plot
Box Plots summarize data about the shape, dispersion and center of the data and also help spot outliers.
Box Plots require that one of the variables, X or Y, be categorical or Discrete and the other be Continuous.
A minimum of 10 observations should be included in generating the Box Plot.
Labels on the Box Plot figure: Maximum Value, Outliers, 75th Percentile, 50th Percentile (Median), Mean, 25th Percentile, Middle 50% of Data, min(1.5 x Interquartile Range or minimum value)

Box Plot Anatomy
Box: spans Q1 to Q3
Q3: 75th Percentile
Q2: Median, 50th Percentile
Q1: 25th Percentile
Upper Limit: Q3 + 1.5(Q3 - Q1)
Lower Limit: Q1 - 1.5(Q3 - Q1)
Upper Whisker and Lower Whisker: extend from the box to the most extreme observations within the limits
Outlier (*): an observation beyond the Upper or Lower Limit
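The limits and outliers of a Box Plot can be worked out numerically. This is a minimal sketch assuming the (n + 1) quartile position rule (`method="exclusive"` in Python's `statistics.quantiles`); the function name and the data set are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, 1.5 x IQR limits, whisker ends and outliers for a Box Plot."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in values if lower_limit <= x <= upper_limit]
    return {
        "q1": q1, "median": q2, "q3": q3,
        "lower_limit": lower_limit, "upper_limit": upper_limit,
        # Whiskers stop at the most extreme observations inside the limits
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": [x for x in values if x < lower_limit or x > upper_limit],
    }

# Hypothetical data with one unusually large observation
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 11, 30]

summary = box_plot_summary(data)
for key, value in summary.items():
    print(f"{key:>13}: {value}")
```

Here the value 30 lies above the upper limit of 19, so it would be drawn as an outlier (*) and the upper whisker would stop at 11.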

Box Plot Examples What can you tell about the data expressed in a Box Plot? Eat this – then check the Box Plot!

Box Plot Example

Individual Value Plot Enhancement

Attribute Y Box Plot Box Plot with an Attribute Y (pass/fail) and a Continuous X Graph> Box Plot…One Y, With Groups…Scale…Transpose value and category scales

Attribute Y Box Plot

Individual Value Plot
The Individual Value Plot when used with a Categorical X or Y enhances the information provided in the Box Plot:
Recall the inherent problem with the Box Plot when a bimodal distribution exists (the Box Plot looks perfectly symmetrical).
The Individual Value Plot will highlight the problem.
Stat>ANOVA>One-Way (Unstacked)>Graphs…Individual value plot, Box Plots of data

Jitter Example Once your graph is created, click once on any of the data points (that action should select all the data points). Then go to MINITAB™ menu path: “Editor> Edit Individual Symbols>Identical Points>Jitter…” Increase the Jitter in the x-direction to .075, click OK, then click anywhere on the graph except on the data points to see the results of the change.

Time Series Plot
Time Series Plots allow you to examine data over time. Depending on the shape and frequency of patterns in the plot, several X’s can be found as critical or eliminated.
Graph> Time Series Plot> Simple...
Use: Continuous data; Logical time order
Analysis: Find shifts, trends, outliers; See natural spread of process

Time Series Example Looking at the Time Series Plot below, the response appears to be very dynamic. What other characteristic is present?

Summary
At this point, you should be able to:
Understand the different types of data
Describe a normal distribution and check for normality in Minitab
Check for measures of location and variation in Minitab
Create different graphs in Minitab
