Statistics and ANOVA ME 470 Fall 2013.


Here are some interesting on-the-spot designs from the past and from this class. Winner, Spring 2010: 15” tall, $8 cost, 0.533 cost/height. Fall 2009: 0.27 cost/height.

Fall 2011: Height = 12, Cost = 6, Cost/Height = 0.5. Fall 2011: Height = 24, Cost = 12, Cost/Height = 0.5.

I really enjoy the on-the-spot design. What did you learn about the design process?

There are many challenges in product development:
- Trade-offs
- Dynamics
- Details
- Time pressure
- Economics

Why do I love product development?
- Getting something to work
- Satisfying societal needs
- Team diversity
- Team spirit

Design is a process that requires making decisions.

Product Development Phases: Planning, Concept Development, System-Level Design, Detail Design, Testing and Refinement, Production Ramp-Up.

Concept Development Process: the process runs from the Mission Statement to the Development Plan through these steps:
- Identify Customer Needs
- Establish Target Specifications
- Generate Product Concepts
- Select Product Concept(s)
- Test Product Concept(s)
- Set Final Specifications
- Plan Downstream Development

Supporting activities: Perform Economic Analysis, Benchmark Competitive Products, Build and Test Models and Prototypes.

You will practice the entire concept development process with your group project.

We are at the beginning, and this slide differentiates between Product Development and Concept Development. With their projects, students are going to accomplish the entire concept development process.

We will use statistics to make good design decisions! We will characterize populations by their mean and standard deviation, and use control charts to determine if a process is in control. We may be forced to run experiments to characterize our system. We will use valid statistical tools such as Linear Regression, DOE, and Robust Design methods to help us make those characterizations.

5.9L High Output Cummins Engine. Cummins asked a capstone group to investigate improvements for turbocharger lubrication sealing.

Cummins Inc. was dissatisfied with the integrity of their turbocharger oil sealing capabilities.

Here are pictures of oil leakage: oil leakage into the compressor housing, and oil leakage on the impeller plate. The students developed four prototypes for testing. After testing, they wanted to know which solution to present to Cummins. You will analyze their data to make a suggestion.

How can we use statistics to make sense of data that we are getting? Quiz for the day: What can we say about our M&Ms? We will look at the results first and then you can do the analysis on your own.

Statistics can help us examine the data and draw justified conclusions. What does the data look like? What is the mean, the standard deviation? What are the extreme points? Is the data normal? Is there a difference between years? Did one class get more M&Ms than another? If you were packaging the M&Ms, are you doing a good job? If you are the designer, what factors might cause the variation?
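The first few questions above (mean, standard deviation, extreme points) can be answered with a few lines of code. This is a minimal sketch using Python's standard `statistics` module instead of Minitab; the `totals` list is invented for illustration and is not the class data, and `describe` is a hypothetical helper name.

```python
import statistics

# Hypothetical bag totals, invented for illustration (not the class data).
totals = [8, 7, 9, 8, 6, 8, 7, 10, 8, 9]

def describe(values):
    """Return the basic summary numbers for one sample."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (n - 1)
        "min": min(values),                 # extreme points
        "max": max(values),
    }

summary = describe(totals)
for name, value in summary.items():
    print(name, value)
```

The sample standard deviation (dividing by n - 1) is used here because the bags are treated as a sample from the production line, not the whole population.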

Why would we care about this data in design? If we designed the manufacturing line, we might be interested in the variation in the number of M&Ms per bag. If we were designing the bags, we would want to know the maximum number of M&Ms that we might hold in a bag. There are lots of reasons that we might be interested as designers in the number of M&Ms in a bag.

If I am a plant manager, do I like one distribution better than another? Boxplots give us an idea of the variability of the data. Notice the outliers that we have each year.

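How do we interpret the boxplot? The box spans the middle 50% of the data, from the first quartile (Q1) to the third quartile (Q3), with a line at the median (Q2). The whiskers extend to the largest and smallest values excluding outliers. Outliers are marked as ‘*’. Minitab marks a value as an outlier when it lies more than 1.5 times the height of the box (the interquartile range, Q3 - Q1) beyond the edge of the box.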
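The boxplot quantities can be computed directly. The sketch below assumes the common 1.5 x IQR rule for flagging outliers and uses the default (exclusive) method of Python's `statistics.quantiles`; Minitab's quartile interpolation may differ slightly for some sample sizes. The data and the `boxplot_summary` helper are hypothetical.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Quartiles, whisker ends, and outliers using the k * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value excluding outliers
        "whisker_high": max(inside),   # largest value excluding outliers
        "outliers": outliers,          # the points drawn as '*'
    }

# Hypothetical bag totals with one unusually full bag.
bags = [6, 7, 7, 8, 8, 8, 8, 9, 9, 10, 14]
box = boxplot_summary(bags)
print(box)
```

In this made-up sample the box runs from 7 to 9, so the fences sit at 4 and 12 and only the bag with 14 is flagged.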

This is a density description of the data. Some people prefer this arrangement. I am weird in that I look at everything. I don’t omit anything.

The Anderson-Darling normality test is used to determine if data follow a normal distribution. If the p-value is lower than the pre-determined level of significance, we reject the hypothesis that the data follow a normal distribution.

Anderson-Darling Normality Test

The test measures the area between the fitted line (based on the chosen distribution) and the nonparametric step function (based on the plot points). The statistic is a squared distance that is weighted more heavily in the tails of the distribution. Smaller Anderson-Darling values indicate that the distribution fits the data better.

The hypotheses for the Anderson-Darling normality test are:
H0: The data follow a normal distribution.
Ha: The data do not follow a normal distribution.

Another quantitative measure for reporting the result of the normality test is the p-value. A small p-value is evidence against the null hypothesis. (Remember: If p is low, H0 must go.) P-values are often used in hypothesis tests, where you either reject or fail to reject a null hypothesis. The p-value is the probability, computed assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed. It is not the probability that the null hypothesis is true, and it is not by itself the probability of making a Type I error (rejecting the null hypothesis when it is true); that probability is set by the significance level α chosen before the test. The smaller the p-value, the stronger the evidence against the null hypothesis. It is customary to call the test statistic (and the data) significant when the null hypothesis H0 is rejected, so we may think of the p-value as the smallest level α at which the data are significant.
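To make the "weighted squared distance" concrete, here is a sketch of the Anderson-Darling statistic for a normal distribution fitted with the sample mean and sample standard deviation. It uses the standard textbook formula and the usual small-sample adjustment; it is not Minitab's code, it does not compute the p-value, and the two test samples are generated for illustration only.

```python
import math
import statistics

def anderson_darling_normal(values):
    """Return (A2, adjusted A2) for a normal fit with estimated mean and sd."""
    n = len(values)
    xs = sorted(values)
    fitted = statistics.NormalDist(statistics.mean(xs), statistics.stdev(xs))
    cdf = [fitted.cdf(x) for x in xs]
    # The log terms grow quickly near 0 and 1, which is what weights the tails.
    total = sum(
        (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1 - cdf[n - i]))
        for i in range(1, n + 1)
    )
    a2 = -n - total / n
    a2_adjusted = a2 * (1 + 0.75 / n + 2.25 / n ** 2)  # small-sample correction
    return a2, a2_adjusted

# Illustrative samples: evenly spaced normal quantiles vs. exponential quantiles.
normal_like = [statistics.NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [-math.log(1 - (i - 0.5) / 20) for i in range(1, 21)]

a2_normal, a2_normal_adj = anderson_darling_normal(normal_like)
a2_skewed, a2_skewed_adj = anderson_darling_normal(skewed)
print("normal-like:", a2_normal)
print("skewed:", a2_skewed)
```

The skewed sample should give the larger statistic, matching the rule that smaller Anderson-Darling values mean a better fit.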

You can use the “fat pencil” test in addition to the p-value. Note that our p-value is quite low, which makes us consider rejecting the hypothesis that the data are normal. However, in assessing the closeness of the points to the straight line, “imagine a fat pencil lying along the line. If all the points are covered by this imaginary pencil, a normal distribution adequately describes the data.” (Montgomery, Design and Analysis of Experiments, 6th Edition, p. 39.) If you are confused about whether or not to consider the data normal, it is always best if you can consult a statistician. The author has observed statisticians feeling quite happy with assuming very fat lines are normal. For more on normality and the fat pencil, see http://www.statit.com/support/quality_practice_tips/normal_probability_plot_interpre.shtml

Walter Shewhart, developer of control charts in the late 1920s. (Adapted from M. Hinckley, Quality by Design, 1996.)

A caveat about this definition: We do not use errors and mistakes as synonyms. However, in this presentation we draw on the work of many people, and some authors will use the word mistake as a synonym for error. Where one of these is quoted, we have not changed their words. We do indicate their less precise use of the word mistake by italicizing it on the slide.

You did control charts in DFM. There the emphasis was on tolerances. Here the emphasis is on determining if a process is in control. If the process is in control, we want to know the capability.

www.york.ac.uk/.../ histstat/people/welcome.htm

What does the data tell us about our process? SPC is a continuous improvement tool which minimizes tampering or unnecessary adjustments (which increase variability) by distinguishing between special cause and common cause sources of variation.

Control charts have two basic uses:
- Give evidence whether a process is operating in a state of statistical control, and highlight the presence of special causes of variation so that corrective action can take place.
- Maintain the state of statistical control by extending the statistical limits as a basis for real-time decisions.

If a process is in a state of statistical control, then capability studies may be undertaken. (But not before! If a process is not in a state of statistical control, you must bring it under control.)

SPC applies to design activities in that we use data from manufacturing to predict the capability of a manufacturing system. Knowing the capability of the manufacturing system plays a crucial role in selecting the concepts.

Voice of the Process Control limits are not spec limits. Control limits define the amount of fluctuation that a process with only common cause variation will have. Control limits are calculated from the process data. Any fluctuations within the limits are simply due to the common cause variation of the process. Anything outside of the limits would indicate a special cause (or change) in the process has occurred. Control limits are the voice of the process.
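Since control limits are calculated from the process data, a short example may help. This sketch uses the common individuals-chart convention of estimating sigma from the average moving range of consecutive points (divided by the constant 1.128); that convention is an assumption here, not something stated on the slide, and the data are invented.

```python
import statistics

def individuals_chart_limits(values):
    """Lower limit, center line, and upper limit for an individuals (I) chart."""
    center = statistics.mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = statistics.mean(moving_ranges)
    sigma_hat = mr_bar / 1.128  # d2 constant for moving ranges of span 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat

# Hypothetical bag totals in production order (order matters for an I chart).
totals = [8, 7, 9, 8, 6, 8, 7, 10, 8, 9]
lcl, center, ucl = individuals_chart_limits(totals)
print(f"LCL = {lcl:.2f}, center = {center:.2f}, UCL = {ucl:.2f}")
```

Note that no specification limit appears anywhere in the calculation: the limits come only from the data, which is the sense in which they are the voice of the process.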

The capability index depends on the spec limits and the process standard deviation: Cp = (allowable range)/(6s) = (USL - LSL)/(6s), where USL and LSL are the upper and lower specification limits and s is the process standard deviation. The specification limits are not the same as the upper and lower control limits (UCL and LCL). http://lorien.ncl.ac.uk/ming/spc/spc9.htm
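As a quick worked example of the Cp formula, the sketch below plugs in made-up numbers. The specification limits and standard deviation are hypothetical; they are not M&M specifications and they are not the control limits from the chart.

```python
def process_capability(usl, lsl, s):
    """Cp = (USL - LSL) / (6 * s), the allowable range over the process spread."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (usl - lsl) / (6 * s)

# Hypothetical spec limits and process standard deviation.
cp = process_capability(usl=10.0, lsl=6.0, s=0.5)
print(f"Cp = {cp:.3f}")
```

A Cp above 1 means the allowable range is wider than six standard deviations of the process; Cp says nothing about whether the process is centered between the limits.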

If there is no difference in this year, students should be getting between 6 and 10 M&Ms, with the average being 8. The chart marks the upper and lower control limits for 2008.

Minitab prints results in the Session window that lists any failures.

Test Results for I Chart of StackedTotals by C4

TEST 1. One point more than 3.00 standard deviations from center line.
Test Failed at points: 129

TEST 2. 9 points in a row on same side of center line.
Test Failed at points: 15, 110, 111, 112, 113

TEST 5. 2 out of 3 points more than 2 standard deviations from center line (on one side of CL).
Test Failed at points: 52, 66, 119, 160, 161

TEST 6. 4 out of 5 points more than 1 standard deviation from center line (on one side of CL).
Test Failed at points: 91, 97

TEST 7. 15 points within 1 standard deviation of center line (above and below CL).
Test Failed at points: 193, 194, 195, 196, 197, 198, 199, 200

This chart is extremely helpful for deciding what statistical technique to use.

Single Y, discrete:
- Single X, discrete: Chi-Square
- Single X, continuous: Logistic Regression
- Multiple Xs, discrete: Multiple Logistic Regression
- Multiple Xs, continuous: Multiple Logistic Regression

Single Y, continuous:
- Single X, discrete: One-sample t-test, Two-sample t-test, ANOVA
- Single X, continuous: Simple Linear Regression
- Multiple Xs, discrete: ANOVA
- Multiple Xs, continuous: Multiple Linear Regression

The chart also has a row for multiple Ys.

Note: our Y data is really not continuous. You should not use this analysis without consulting a statistician. I actually talked to Dr. DeVasher and he said that, when you have a lot of data, it is ok to use this as a first pass.

When to use ANOVA

The use of ANOVA is appropriate when:
- The dependent variable is continuous
- The independent variable is discrete, i.e. categorical
- The independent variable has 2 or more levels under study
- We are interested in the mean value
- There is one independent variable or more

We will first consider just one independent variable. Ok, the dependent variable is not really continuous. We normally don’t have portions of M&Ms. However, it is ok for this in-class example, AND we let you eat M&Ms.

ANOVA (Analysis of Variance) is used to determine the effects of categorical independent variables on the average response of a continuous variable.

Choices in MINITAB:
- One-way ANOVA: use with one factor, varied over multiple levels
- Two-way ANOVA: use with two factors, varied over multiple levels
- Balanced ANOVA: use with two or more factors and equal sample sizes in each cell
- General Linear Model: use anytime!

I always use the General Linear Model because the software then figures out if you have any of the special cases.

Practical Applications
- Determine if our brake pedal sticks more than other companies’
- Compare 3 different suppliers of the same component
- Compare 6 combustion recipes through simulation
- Determine the variation in the crush force
- Compare 3 distributions of M&M’s
- And MANY more …

The null hypothesis for ANOVA is that there is no difference between years.

General Linear Model: StackedTotals versus C4

Factor  Type   Levels  Values
C4      fixed       3  2008, 2010, 2011

Analysis of Variance for StackedTotals, using Adjusted SS for Tests

Source   DF    Seq SS    Adj SS  Adj MS     F      P
C4        2    6.6747    6.6747  3.3374  4.71  0.010
Error   203  143.8559  143.8559  0.7086
Total   205  150.5306

S = 0.841813   R-Sq = 4.43%   R-Sq(adj) = 3.49%

This p-value (0.010) is below the 0.05 significance level, so we reject the assumption that there is no difference between years!
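To see where the numbers in an ANOVA table come from, here is a from-scratch sketch of a one-way ANOVA. The three small groups are invented for illustration (they are not the class data), `one_way_anova` is a hypothetical helper, and the p-value lookup is left out because the F distribution is not in Python's standard library.

```python
import statistics

def one_way_anova(groups):
    """Sums of squares, degrees of freedom, mean squares, and F for one factor."""
    all_values = [v for g in groups.values() for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-group variation: how far each group mean is from the grand mean.
    ss_factor = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-group variation: scatter of each point around its own group mean.
    ss_error = sum(
        (v - statistics.mean(g)) ** 2 for g in groups.values() for v in g
    )
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "ss_factor": ss_factor, "ss_error": ss_error,
        "df_factor": df_factor, "df_error": df_error,
        "ms_factor": ms_factor, "ms_error": ms_error,
        "F": ms_factor / ms_error,
    }

# Hypothetical totals per year, four bags each.
sample = {
    "2008": [8, 7, 8, 9],
    "2010": [8, 9, 8, 9],
    "2011": [7, 7, 8, 6],
}
table = one_way_anova(sample)
print(table)
```

The F value is the ratio of the between-year mean square to the within-year mean square. A large F (and therefore a small p-value) says the year means differ by more than the bag-to-bag scatter would explain.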

What are some conclusions that you can reach? Ok, we can see that there is one data point that doesn’t fit anything! It also looks like we have one year with lower variation.

Is there a statistical difference between years? Wow – there looks like there is a big difference until you read the numbers – then there isn’t much difference between years.

The p value indicates that there is a difference between the years. The Tukey printout tells us which years are different.

Grouping Information Using Tukey Method and 95.0% Confidence

C4     N  Mean  Grouping
2010  57   7.9  A
2008  86   7.7  A B
2011  63   7.4    B

Means that do not share a letter are significantly different. The averages for 2010 and 2008 are not statistically different. The averages for 2008 and 2011 are not statistically different. The averages for 2010 and 2011 do not share a letter, so they are statistically different.

This is what Minitab should look like when students open it. Column C1 has the individual totals from students in 2008. Column C2 has the totals from 2010. C3 has the totals from spring 2011. There are previous years in other columns.

Command: >Stat>Basic Statistics>Display Descriptive Statistics Have students open the student supplied file for today. It is a Minitab file called M&Mtotals, student file.


>Stat>Basic Statistics>Normality Test Select 2008 This test is for normality of all of the data. This is because, at this point, I don’t know whether the years are different or not. If I have reason to suspect that the data are different, I would do a normality test on each year. This will be taken care of later in the ANOVA test.


Command: >Stat>Control Charts>Variable Charts for Individuals>Individuals Note: This is assuming that the packages are coming in sequence off of the manufacturing line. If this is actually true it is a coincidence.

When doing control charts for ME470, select all tests. It may be hard to see, but highlight the “tests” tab. Why do we care about 15 points in a row within 1 standard deviation of center line? It isn’t normal! At Honda, there was a suspension problem: the car always pulled left. They were always up near the upper spec limit.


Command: >Stat>ANOVA>General Linear Model


Here is a useful reference if you feel that you need to do more reading. http://www.StatisticalPractice.com This recommendation is thanks to Dr. DeVasher. You can also use the help in Minitab for more information.

Let’s look at what happened with plain M&M’s. We are going to look at these for fun. See if students can get these same results on their own.

What do you see with the boxplot? The boxplot makes me think that 2009 is lower and that maybe 2004 is higher.

Do we see anything that looks unusual? Except for that strange outlier, I really don’t see anything weird at all.

Both 2004 and 2005 look remarkably in control. 2006 looks awful; 2009 isn’t too bad.

General Linear Model: stackedTotal versus StackedYear

Factor       Type   Levels  Values
StackedYear  fixed       4  2004, 2005, 2006, 2009

Analysis of Variance for stackedTotal, using Adjusted SS for Tests

Source        DF   Seq SS   Adj SS  Adj MS       F      P
StackedYear    3  1165.33  1165.33  388.44  149.39  0.000
Error        266   691.63   691.63    2.60
Total        269  1856.96

Look at the low P-value!

S = 1.61249   R-Sq = 62.75%   R-Sq(adj) = 62.33%

Unusual Observations for stackedTotal

Obs  stackedTotal      Fit  SE Fit  Residual  St Resid
 25       27.0000  23.4667  0.2082    3.5333    2.21 R
 34       20.0000  23.4667  0.2082   -3.4667   -2.17 R
209       40.0000  21.7917  0.1700   18.2083   11.36 R
215       21.0000  17.4917  0.2082    3.5083    2.19 R

R denotes an observation with a large standardized residual.
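The columns of this table are tied together by simple arithmetic, which can be checked from the printed sums of squares and degrees of freedom. The sketch below uses only the values reported above; small differences in the last digit are expected because the printout is rounded.

```python
import math

# Values copied from the ANOVA table for stackedTotal versus StackedYear.
ss_factor, df_factor = 1165.33, 3
ss_error, df_error = 691.63, 266
ss_total, df_total = 1856.96, 269

ms_factor = ss_factor / df_factor           # Adj MS for StackedYear
ms_error = ss_error / df_error              # Adj MS for Error
f_stat = ms_factor / ms_error               # F
s = math.sqrt(ms_error)                     # S, the pooled standard deviation
r_sq = ss_factor / ss_total                 # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)  # R-Sq(adj)

print(f"MS factor = {ms_factor:.2f}, MS error = {ms_error:.2f}")
print(f"F = {f_stat:.2f}, S = {s:.5f}")
print(f"R-Sq = {100 * r_sq:.2f}%, R-Sq(adj) = {100 * r_sq_adj:.2f}%")
```

R-Sq is the share of the total variation explained by the year, which is why it is so much higher here (about 63%) than in the earlier table (about 4%), even though both p-values are small.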

Grouping Information Using Tukey Method and 95.0% Confidence

StackedYear   N  Mean  Grouping
2004         60  23.5  A
2006         90  21.8  B
2005         60  20.7  C
2009         60  17.5  D

Means that do not share a letter are significantly different.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable stackedTotal
All Pairwise Comparisons among Levels of StackedYear

StackedYear = 2004 subtracted from:

StackedYear   Lower  Center   Upper
2005         -3.531  -2.775  -2.019
2006         -2.365  -1.675  -0.985
2009         -6.731  -5.975  -5.219

Zero is not contained in the intervals. Each year is statistically different. (2004 got the most!)

StackedYear = 2005 subtracted from:

StackedYear   Lower  Center   Upper
2006          0.410   1.100   1.790
2009         -3.956  -3.200  -2.444

StackedYear = 2006 subtracted from:

StackedYear   Lower  Center   Upper
2009         -4.990  -4.300  -3.610
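The rule for reading the Tukey intervals (a pair of years differs when the interval for their difference does not contain zero) is easy to apply mechanically. This sketch uses the six intervals reported in the printout; the `excludes_zero` helper name is hypothetical.

```python
# (year subtracted, year compared) -> (lower, upper), copied from the printout.
intervals = {
    ("2004", "2005"): (-3.531, -2.019),
    ("2004", "2006"): (-2.365, -0.985),
    ("2004", "2009"): (-6.731, -5.219),
    ("2005", "2006"): (0.410, 1.790),
    ("2005", "2009"): (-3.956, -2.444),
    ("2006", "2009"): (-4.990, -3.610),
}

def excludes_zero(lower, upper):
    """True when the interval for the difference in means does not contain 0."""
    return not (lower <= 0.0 <= upper)

significant = {pair: excludes_zero(*ci) for pair, ci in intervals.items()}
for (base, other), differs in significant.items():
    verdict = "different" if differs else "not different"
    print(f"{other} vs {base}: {verdict}")
```

All six intervals exclude zero, which agrees with the grouping table giving each year its own letter.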

Implications for design Is there a difference in production performance between the plain and peanut M&Ms? What do you think? The plain M&Ms have different means, but maybe this is simply because we have a narrower resolution.

Individual Quiz

Name:____________ Section No:__________ CM:_______

You will be given a bag of M&M’s. Do NOT eat the M&M’s. Count the number of M&M’s in your bag. Record the number of each color, and the overall total. You may approximate if you get a piece of an M&M. When finished, you may eat the M&M’s. Note: You are not required to eat the M&M’s.

Color    Number    %
Brown
Yellow
Red
Orange
Green
Blue
Other
Total

Instructions for Minitab Installation

Minitab on DFS:

Let’s Look at Toyota Recalls
- Nov 02, 2009 – US: 3.8 million Toyota and Lexus vehicles again recalled due to floor mat problem, this time for all driver's side mats.[5]
- Nov 26, 2009 – US: floor mat recall amended to include brake override[32] and increased to 4.2 million vehicles.
- Jan 21, 2010 – US: 2.3 million Toyota vehicles recalled due to faulty accelerator pedals[6] (of those, 2.1 million already involved in floor mat recall).[3]
- Jan 27, 2010 – US: 1.1 million Toyotas added to amended floor mat recall.[33]
- Jan 29, 2010 – Europe, China: 1.8 million Toyotas added to faulty accelerator pedal recall.[7]

Let’s consider the Toyota problem. What was the first clue that there was a problem? Starting in 2003, NHTSA received information regarding reports of accelerator pedals that were operating improperly. How many reports cause the manufacturer to suspect a problem? To issue a recall, NHTSA would need to prove that a substantial number of failures attributable to the defect have occurred or are likely to occur in consumers’ use of the vehicle or equipment, and that the failures pose an unreasonable risk to motor vehicle safety. ODI conducted a VOQ-based assessment of UA rates on the subject Lexus in comparison to two peer vehicles and concluded the Lexus LS400t vehicles were not overrepresented in the VOQ database. How might we look at two populations and decide this?

VOQ: Vehicle Owner Questionnaire; UA: unintended acceleration; ODI: Office of Defects Investigation.
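One possible way to compare two populations of complaint rates is a two-proportion z-test. The sketch below is only an illustration of that idea: it is not the method ODI used, the complaint counts and fleet sizes are invented, and the normal approximation assumes reasonably large counts in both groups.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical numbers: complaints out of vehicles on the road.
z, p_value = two_proportion_z_test(x1=30, n1=100_000, x2=20, n2=100_000)
print(f"z = {z:.3f}, p-value = {p_value:.3f}")
```

With these made-up counts the p-value is well above 0.05, so the difference in complaint rates would not be called significant, which is the kind of reasoning behind a statement that a vehicle is "not overrepresented" in a database.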