Slide 1: Machine Learning and Failure Prediction in Hard Disk Drives
Dr. Amit Chattopadhyay, Director of Technology, Advanced Reliability Engineering, Western Digital, San Jose
HEPiX, October 2015
© 2013 Western Digital Technologies, Inc. All rights reserved.

Slide 2: Acknowledgement: work done in collaboration with Sourav Chatterjee at Western Digital, San Jose.

Slide 3: What is Machine Learning?
- Algorithms that:
  - improve their performance P
  - at some task T
  - with experience E
- Task: predict an event in the future
  - Will this hard drive fail?
- Training data provides the experience
  - Reliability test
- Performance: how many mistakes does it make?
- In principle, similar to how a child learns from experience

Slide 4: A simple task: can you identify the object?
- Task T: identify this object (Apple, Pear, Tomato, Cow, Dog, Horse)
- Experience E: training examples
- Performance: how many does it get right? (other measures are possible)
- The attributes are often not simple to code manually; compare to an explicit, parameter-based algorithm:
  if (shape == round) then apple else { if (four legged) ... }
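The slide's rule-based fragment, written out as a minimal runnable sketch; the attribute names and the extra rule are illustrative assumptions, not from the deck:

```python
# Toy, hand-coded "parameter based" classifier over manually chosen attributes,
# contrasted with learning from examples. Attributes and rules are made up.
def classify(shape, legs):
    if shape == "round":
        return "apple"
    if legs == 4:
        return "four-legged animal"  # telling cow/dog/horse apart needs yet more hand-written rules
    return "unknown"

print(classify(shape="round", legs=0))    # apple
print(classify(shape="oblong", legs=4))   # four-legged animal
```

The point of the slide is that each new distinction requires another hand-written rule, whereas a learned model only requires more labelled examples.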

Slide 5: Typical Workflow
[diagram: training (training images + training labels → image features → learned model); testing (test image → image features → learned model → prediction)]
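A minimal sketch of this workflow in scikit-learn terms; the deck names no library, and the synthetic features below stand in for image features:

```python
# Hypothetical end-to-end workflow: features + labels -> learned model -> predictions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)  # stand-in training features + labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # the "learned model"
predictions = model.predict(X_test)                              # prediction on the test examples
```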

Slide 6: Where is machine learning used?
- Recommendation engines
- Spam filters
- Self-driving cars
- Smart machines such as the NEST thermostat

Slide 7: A more sophisticated example: the self-driving car
- Can learn much harder tasks; not limited by simple parametric, attribute-based decisions
- Training: images from the windshield camera; labels: the driver's action (e.g. a training image labelled "Turn Left")
- Feature: each pixel
- Lesson: sophisticated algorithms can often solve problems better than human beings

Slide 8: Kinds of Machine Learning problems
- Classification: predict a category, e.g. turn LEFT or RIGHT
- Regression: predict a number, e.g. how many degrees to the left? (30.5 degrees)
- Unsupervised learning: no training examples

Slide 9: Kinds of Machine Learning problems [diagram]

Slide 10: Classification Problems
- Functional approximation: define a function that separates positives from negatives
  - Linear: easier and more robust (e.g. Naïve Bayes, Logistic Regression)
  - Non-linear: harder (e.g. Support Vector Machine, Neural Network)
- Training examples: points in 2D space; label: Red/Green
[figure: linear decision boundary vs. nonlinear decision boundary]
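An illustrative sketch, not from the deck, of a linear and a non-linear classifier on 2D Red/Green points; the ring-shaped data is an assumption chosen so that only the non-linear boundary fits well:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # points in 2D space
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # 0 = "Green", 1 = "Red": not linearly separable

linear = LogisticRegression().fit(X, y)            # linear decision boundary
nonlinear = SVC(kernel="rbf").fit(X, y)            # nonlinear decision boundary

print("linear accuracy:   ", linear.score(X, y))
print("nonlinear accuracy:", nonlinear.score(X, y))
```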

Slide 11: Issues in categorization
- Can we generalize?
  - A model may show very good accuracy on the training data
  - However, not so much on the test data
- Think of the polynomial regression example
  - Model complexity: the order of the polynomial
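A small illustrative sketch of this generalization issue, assuming the usual polynomial-regression setup (the data and the orders are made up): training error keeps falling as model complexity grows, while test error eventually rises.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 60)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=60)        # noisy ground truth
x_tr, y_tr, x_te, y_te = x[:40], y[:40], x[40:], y[40:]   # simple holdout split

for order in (1, 3, 9, 15):
    coeffs = np.polyfit(x_tr, y_tr, deg=order)             # fit polynomial of this order
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"order {order:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```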

Slide 12: Confusion matrix and ROC curves
- Confusion matrix
  - With 98 positives and 2 negatives, 98% accuracy is not meaningful
  - Does the model predict the true negatives?
- ROC curves
  - Tradeoff between false positives and true positives
  - How many false positives are we willing to accept?
[ROC plot: true positive rate vs. false positive rate; the area under this curve defines the success rate]
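A hedged sketch of both measures using scikit-learn conventions; the 98/2 class imbalance is simulated here to mirror the slide's example:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_curve, roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.98, 0.02], random_state=0)  # 98% / 2% imbalance
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
scores = clf.predict_proba(X_te)[:, 1]

print(confusion_matrix(y_te, clf.predict(X_te)))  # rows: actual class, columns: predicted class
fpr, tpr, _ = roc_curve(y_te, scores)             # points of the false +ve rate vs true +ve rate curve
print("area under ROC curve:", roc_auc_score(y_te, scores))
```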

Slide 13: How is Machine Learning relevant to hard disk drives?
Disclaimer: this presentation is mostly about "what is possible".

Slide 14: Let us demonstrate with one specific example on product A
- Every week we produce a large number of drives of this product (several tens of thousands)
- Sample some ( ) drives from a 50,000-drive lot and run a stress test (ORT: Ongoing Reliability Test)
- Problem statement: ORT shows an 'excursion' for 3 weeks of production
- The entire production lot during this time is put on 'stop ship'
- We are sitting on a large inventory worth several $M that needs action

Slide 15: What can we do?
- Traditional solution, based on tribal knowledge / engineering insight:
  - Define "good" and "bad" explicitly, based on specific limit checks (if A..., then good)
  - Run reliability tests on 300 "good" drives to convince ourselves these are good
  - Lose 6 weeks, the duration of the reliability test
  - Trust that the rest of the "good" drives will behave this way
  - Ship and make $
- Machine Learning solution:
  - Use supervised machine learning on ORT data
  - Supervisor: the ORT outcome (passes the test: "good" drive; fails the test: "bad" drive)
  - Allow the algorithm to decide, and build a predictive model of the failure rate
  - Key benefit: predict the failure rate without having to run the test!

Slide 16: How is this done?
- Split the ORT population into "Training" and "Test" sets
- Each drive goes through detailed characterization (about 2000 variables) before getting to ORT
- On the "Training" population:
  - Use a classifier (χ², boosted forest, ...) to choose the top features from this variable list
  - Build a Logistic Regression model; others can be used as needed (SVM, Neural Nets, Naïve Bayes, etc.)
  - Calculate the failure probability of each drive
  - Based on the failure probabilities, generate a 'health hierarchy' / grading system for drives before they get shipped
- Validate the model on the "Test" set: it already has ORT data, so how do the calculated failure probabilities match up? (A sketch of this pipeline follows below.)
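One possible reading of this pipeline in scikit-learn terms; the deck gives no code, so the synthetic data, the 20-feature cut, and the 10-grade split are illustrative assumptions. In practice X would hold the ~2000 characterization variables per drive and y the ORT pass/fail label.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=3000, n_features=200, weights=[0.95, 0.05],
                           random_state=0)                       # stand-in for real drive data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

scaler = MinMaxScaler().fit(X_train)                             # chi2 needs non-negative features
selector = SelectKBest(chi2, k=20).fit(scaler.transform(X_train), y_train)  # top features
X_tr = selector.transform(scaler.transform(X_train))
X_te = selector.transform(scaler.transform(X_test))

model = LogisticRegression(max_iter=1000).fit(X_tr, y_train)     # could swap in SVM, Naive Bayes, ...
p_fail = model.predict_proba(X_te)[:, 1]                         # failure probability per drive

# Health grades 1-10 from the failure-probability deciles (an assumed grading scheme)
grades = np.digitize(p_fail, np.quantile(p_fail, np.linspace(0.1, 0.9, 9))) + 1
print("AUC on held-out test set:", roc_auc_score(y_test, p_fail))
```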

Slide 17: Failure rate: sensitivity to the key features chosen by the classifier [chart]

Slide 18: Model building on the first 50% (training set)
- These drives actually ran the test, so we can see how the model holds up against real data
- If the learning were ideal, the blue line would be the perfect classifier: all passers to the left, and all failures to the right

Slide 19: Model validation on the second 50% (test set)
- Then (green dotted line): the average actual failure rate of this population; this is all we had before, an average failure rate that looked bad
- Now: a lot more insight, with a 10x spread in failure rates across grades
- Grades 1-4: pretty good drives; grades 8-10: quite bad
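A self-contained toy illustration (simulated data, not the deck's results) of how this spread is checked: bucket drives by predicted failure probability and compare the observed failure rate per grade.

```python
import numpy as np

rng = np.random.default_rng(2)
p_fail = rng.uniform(0, 0.2, 5000)            # simulated predicted failure probabilities
failed = rng.uniform(size=5000) < p_fail      # simulated actual outcomes on the test half
grades = np.digitize(p_fail, np.quantile(p_fail, np.linspace(0.1, 0.9, 9))) + 1  # deciles -> grades 1-10

for g in range(1, 11):
    sel = grades == g
    print(f"grade {g:2d}: observed failure rate {failed[sel].mean():.3f}")
```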

Slide 20: Measures of success: ROC curve and TTF chart
- Now we can use this algorithm on the entire production and calculate the failure rate of each drive
- Appropriate business actions are taken accordingly
- For this analysis, the area under the ROC curve is ~85-87%
- The algorithm makes the right business call more than 8 times out of 10

Slide 21: Summary
- Drive failure prediction using initial (factory) conditions has always been tricky
- Conventional methods do not have a high success rate; they mostly under-predict
- Machine learning allows us to approach the problem in a different way
- Still early, with continuous progress being made
- This analytical technique, coupled with periodic in-field measurements, can provide a robust framework for fleet management

Slide 22: Thank you!