DATA ACQUISITION: FOCUSING ON THE CHALLENGE Gerald J. Hahn and Necip Doganaksoy Adjunct Faculty, RPI GE Global Research Presentation to 2003 Quality &

1 DATA ACQUISITION: FOCUSING ON THE CHALLENGE
Gerald J. Hahn and Necip Doganaksoy
Adjunct Faculty, RPI
GE Global Research
Presentation to 2003 Quality & Productivity Research Conference
IBM Watson Research Center
May 2003

2 THE OBVIOUS, THE EXPECTATION, THE REALITY AND THE CHALLENGES
The Obvious
– Statistical analyses are based upon data (and assumptions about sampled populations, etc.)
– Such analyses are only as good as the data upon which they are based
– Bad data lead to more complex, less powerful or invalid analyses
The Expectation: Much attention is given to the data acquisition process in training practitioners and statisticians
The Reality: Little attention is generally given to the data acquisition process in training practitioners and statisticians
The Challenge: Move data acquisition to front burner
– Understand limitations of observational data
– Develop disciplined process for data acquisition
– Emphasize data acquisition at all levels of training

3 PROBLEMS WITH OBSERVATIONAL DATA
Problems with available databases
– Data obtained for purposes other than statistical analysis
– Data reside in different databases
– Data purging
Problems with observational data
– Missing variables, values and events
– Unrepresentative (non-random) observations
– Loss of traceability
– Loss of timeliness
– Inconsistent or imprecise measurements
– Correlated variables and limited variability
Observation from the trenches (Kati Illouz, GE): Data owners tend to be overly optimistic about their data
Key point (not always recognized by practitioners): The quality, rather than the quantity, of the data is what counts
Data inadequacies help define future information needs
Two types of situations
– Routine operations (e.g., process monitoring)
– Special investigations (e.g., process optimization)

4 PROCESS FOR DATA ACQUISITION (DEUPM) (in spirit of Six Sigma)
Proposed process:
– D: Define the problem
– E: Evaluate the available data
– U: Understand data acquisition opportunities and limitations
– P: Plan data acquisition and implement
– M: Monitor, clean data, analyze and validate
Example: Demonstrate, in 6 months, ten-year reliability for new washing machine design
Basic idea: Disciplined, targeted plan for data acquisition

5 D: DEFINE THE PROBLEM
Steps: Define
– Specific questions to be answered and resulting actions
– Population or process of interest
– Data we would ideally like to have (Wayne Nelson)
Washing machine design example:
– Stated objective: Show within 6 months and with 95% confidence that following goals can be met:
    – 95% reliability in first year of operation
    – 90% reliability after five years
    – 80% reliability after ten years
  (reliability defined as no repair or servicing need)
– Resulting actions: Proceed with full scale production (if validated)
– Further question: How can we improve?
– Population: 6 million machines to be built in next 5 years
– Ideal data: Field repair and servicing needs for 6 million future machines

6 E: EVALUATE THE AVAILABLE DATA
Steps:
– Understand the process and its physical basis
– Analyze existing data
– Ask: Are available data sufficient (and if not, what is needed)?
Washing machine design example
– Participate in design reviews, FMEAs (Failure Mode and Effects Analysis), etc.
– Analyze
    – In-house test results on previous designs
    – Field data on previous designs
    – Component and sub-assembly test results (e.g., motor testing)
– Conclusions
    – Previous design does not meet current reliability goals
    – Proposed new design corrects many past problems
    – Possible concern: Introduction of new failure modes
    – Component and sub-assembly test results look promising
    – No information about system performance in realistic environment: need such information

7 U: UNDERSTAND DATA ACQUISITION OPPORTUNITIES AND LIMITATIONS
Steps: Gain understanding of
– Data acquisition process, measurement error, etc.
– Limitations in data acquisition
– Limitations in inferencing
Washing machine design example
– Data acquisition: Conduct in-house accelerated cycling of washing machines
    – Simulate 3.5 years of operation per month
    – Evaluate weekly for failures
    – Take apart at end of test and measure degradation
– Limitations in data acquisition
    – 6 months of testing
    – 36 available test stands
    – 3 prototype lots
– Limitations in inferencing
    – Assume prototype lots are from same population as high volume production
    – Assume failures, etc. are cycle dependent
    – Assume realistic simulation of field environment
Conclusion: This is analytic (not enumerative) study; statistical confidence bounds only partially capture uncertainty
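The acceleration figures on this slide can be turned into field-equivalent exposure with a few lines of arithmetic. The sketch below uses only the numbers stated on the slide (3.5 simulated years per test month, 6 months, 36 test stands); the function name is hypothetical, and the calculation rests on the slide's own assumption that failures depend on cycles alone.

```python
# Figures taken from the slide; the helper name is an illustrative assumption.
YEARS_PER_TEST_MONTH = 3.5   # simulated field years per month of accelerated cycling
TEST_MONTHS = 6              # available testing time
TEST_STANDS = 36             # units that can be on test at once


def simulated_years(test_months: float) -> float:
    """Field-equivalent years, assuming wear depends only on cycle count."""
    return YEARS_PER_TEST_MONTH * test_months


full_test = simulated_years(TEST_MONTHS)      # exposure of a unit run for the whole test
half_test = simulated_years(TEST_MONTHS / 2)  # exposure of a unit run for 3 months
machine_years = TEST_STANDS * full_test       # upper limit if every stand runs 6 months

print(f"6 months on test = {full_test} simulated years")
print(f"3 months on test = {half_test} simulated years")
print(f"Maximum exposure = {machine_years} simulated machine-years")
```

Under these assumptions a unit run for the full test sees roughly twice the ten-year horizon, and even a unit run for three months passes the ten-year mark.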

8 P: PLAN DATA ACQUISITION AND IMPLEMENT
Steps: Develop and evaluate specific strategy, including
– Testing conditions or operational environment
– Sample size and selection process
– Assessment of sampling plan
– Testing protocol
– Pilot study
Washing machine example
– Testing conditions: Run washing machines with full load of soiled towels, mixed with sand, wrapped in plastic bag
– Sample size: 12 units each from 3 prototype lots
– After 3 months
    – Remove 4 units from each of 3 lots and measure degradation
    – Replace with 12 units from 4th lot
– After 6 months: To have 95% probability of demonstrating 80% reliability after 10 years in field with 95% confidence requires actual reliability to be 95%, or sample size of 96 if actual reliability is 90% (assuming Weibull distribution with shape parameter of 2.5)
– Specify protocol, including high-precision measurements, definition of failure, data recording requirements, replacements of failed units, etc.
– Pilot study: Three washing machines run for one week
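To see what the assumed Weibull shape parameter of 2.5 implies, one can solve for the scale parameter at which ten-year reliability is exactly 80% and then evaluate reliability at the one-year and five-year goals from the Define slide. This is a minimal sketch of that relationship only; it is not the sampling-plan assessment behind the slide's figures, and the function names are hypothetical.

```python
import math

SHAPE = 2.5  # Weibull shape parameter assumed on the slide


def weibull_scale(reliability: float, years: float, shape: float = SHAPE) -> float:
    """Scale (characteristic life) for which R(years) equals `reliability`."""
    return years / (-math.log(reliability)) ** (1.0 / shape)


def weibull_reliability(years: float, scale: float, shape: float = SHAPE) -> float:
    """Weibull survival function R(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((years / scale) ** shape))


# Scale of a design that only just meets the ten-year goal of 80%.
eta = weibull_scale(0.80, 10.0)

# Compare the implied reliability with each stated goal.
for years, goal in [(1, 0.95), (5, 0.90), (10, 0.80)]:
    implied = weibull_reliability(years, eta)
    print(f"{years:>2} years: implied reliability {implied:.4f}, goal {goal:.2f}")
```

With this shape, a design that just meets the ten-year goal comfortably exceeds the one-year and five-year goals, which suggests the ten-year requirement is the binding one under the stated Weibull assumption.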

9 M: MONITOR, CLEAN DATA, ANALYZE AND VALIDATE
Steps:
– Clean data as gathered
– Monitor to ensure that process is being followed
– Conduct preliminary analyses; determine whether process needs to be changed
– Conduct final analysis
– Validate: Propose appropriate validation testing
Washing machine design example
– Clean data: Develop proactive checks for missing or inconsistent data that automatically query data provider
– Monitor: Continued involvement
– Analyze failure data after 1 week, 1 month and 3 months; identify problems for correction
– Do final analyses after 6 months (failure and degradation data)
– Validate: Propose added programs:
    – Continue 6 of 36 units on test beyond 6 months
    – Beta test 100 machines with company employees and 60 in laundromats
    – Audit sample 6 production units each week: Test five for 1 week; one for 3 months
    – Develop system for capturing and analyzing field reliability data
DISCIPLINED, TARGETED DATA ACQUISITION PROCESS
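The "proactive checks for missing or inconsistent data" could look something like the sketch below. The record layout (unit ID, inspection week, cumulative cycle count, failure flag) and the specific checks are illustrative assumptions, not the protocol used in the study; each flagged item is returned as a query that could be routed back to the data provider.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InspectionRecord:
    """Hypothetical weekly inspection record for one unit on test."""
    unit_id: str
    week: int
    cycles: Optional[int]    # cumulative cycles; None means not recorded
    failed: Optional[bool]   # failure status; None means not recorded


def find_queries(records: List[InspectionRecord]) -> List[Tuple[str, int, str]]:
    """Return (unit_id, week, message) for each record needing a follow-up."""
    queries = []
    last_cycles = {}  # most recent recorded cycle count per unit
    for rec in sorted(records, key=lambda r: (r.unit_id, r.week)):
        if rec.cycles is None:
            queries.append((rec.unit_id, rec.week, "cycle count missing"))
        else:
            previous = last_cycles.get(rec.unit_id)
            # A cumulative counter should never go down between inspections.
            if previous is not None and rec.cycles < previous:
                queries.append((rec.unit_id, rec.week, "cycle count decreased"))
            last_cycles[rec.unit_id] = rec.cycles
        if rec.failed is None:
            queries.append((rec.unit_id, rec.week, "failure status missing"))
    return queries


# Made-up sample records, for illustration only.
records = [
    InspectionRecord("A1", 1, 1000, False),
    InspectionRecord("A1", 2, 900, False),   # inconsistent: counter went down
    InspectionRecord("B2", 1, None, False),  # missing cycle count
    InspectionRecord("B2", 2, 2000, None),   # missing failure status
]
queries = find_queries(records)
for unit_id, week, message in queries:
    print(f"Query for unit {unit_id}, week {week}: {message}")
```

Running such checks as each weekly batch arrives, rather than at the end of the test, matches the slide's point that data should be cleaned as gathered.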

10 TEACHING DATA ACQUISITION: PROPOSAL
Preferred: Course in data acquisition as second required course in statistics for practitioners and aspiring statisticians
Compromise: Devote one third of one-semester introductory course to data acquisition
Industrial: Devote one third of short courses to data acquisition
In addition: Discuss data acquisition process and challenges in all data analysis examples
P.S. Most courses on design of experiments and survey sampling cover only tip of iceberg and are offered to limited audience

11 PROPOSED COURSE IN DATA ACQUISITION: OUTLINE
Motivation: Need for good data and limitations of observational studies
Key concepts
– Populations, sampling frames, processes, random (and other) samples
– Analytic versus enumerative studies
– Measurement error
Disciplined, targeted process for data acquisition (and examples)
Some formal approaches:
– Design of experiments (including factorial, fractional factorial, response surface)
– Survey sampling (including questionnaire construction, non-response problems)
– Data acquisition systems (e.g., for SPC, field reliability, student performance assessment)
– Some special studies and situations (e.g., life testing, dosage studies, attribute ys)
Data acquisition as a learning process (Box et al.)
Graphical data analyses
Sample size determination: Analytical and simulation approaches
In-process data cleaning
Statistics in the news: Data acquisition considerations (Source: Laurie Snell Chance)
Student-generated studies and critiques (Source: Bill Hunter, 1977 American Statistician article "Some Ideas about Teaching Design of Experiments")
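As an illustration of the outline item on analytical and simulation approaches to sample size, the sketch below uses a textbook zero-failure ("success run") demonstration plan: test n units to the target life and pass only if none fail. This is a swapped-in teaching example, not the plan used in the washing machine study; the function names, the true reliability of 0.95 and the simulation settings are assumptions.

```python
import math
import random


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Analytical approach: smallest n with reliability ** n <= 1 - confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def simulate_pass_probability(n: int, true_reliability: float,
                              trials: int = 20000, seed: int = 1) -> float:
    """Simulation approach: fraction of simulated tests in which all n units survive."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        # Each unit survives independently with probability true_reliability.
        if all(rng.random() < true_reliability for _ in range(n)):
            passes += 1
    return passes / trials


# Units needed to demonstrate 80% reliability with 95% confidence, zero failures allowed.
n = success_run_sample_size(0.80, 0.95)

# Chance of passing that test if the true reliability is 0.95 (assumed value).
estimated = simulate_pass_probability(n, 0.95)
exact = 0.95 ** n

print(f"Units required: {n}")
print(f"Simulated pass probability: {estimated:.3f} (exact: {exact:.3f})")
```

Comparing the simulated pass probability with the closed-form value shows students how the two approaches check each other, and how a plan that guarantees the stated confidence can still have a modest chance of passing.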

12 ELEVATOR SPEECH
We need to put the horse (data acquisition) before the CART (data analysis)
Specific proposals
– Formal process for data acquisition
– High focus on training, including required course on data acquisition
To analyze data is human; to plan to gather the right data is divine
P.S. For copies of slides, contact
Comments based upon chapter from Statistics in the Corporate World: Connecting the Dots (tentative title) to be published in 2004 (we hope) by Wiley; your inputs invited!
