
1 Lessons Learned from Applications of Machine Learning
Robert C. Holte, University of Alberta

2 Source Material
Personal involvement in a commercial project to use ML to detect oil spills in satellite images
Other people's papers on specific applications
Other people's lessons learned
– e.g. discussions with Foster Provost

3 Lesson 1 – ML works
Numerous examples of machine learning being successfully applied in science and industry:
– saving time or money
– doing something that would not have been possible otherwise
– sometimes superior to human performance
Corollary: it would be beneficial to have an on-line repository of success stories

4 Example – D. Michie
American Express (UK) loan applications were automatically categorized by a statistical method:
– definitely accept
– definitely reject
– refer to a human expert
Human experts were 50% accurate at predicting loan defaults; the learned rules were 70% accurate.

5 Oil Spill project – the task
In a continuous stream of satellite radar images, identify the images that are likely to contain one or more oil slicks, highlight the suspected region(s), and forward the selected, annotated images to a human expert for a final decision and action.
– Macdonald Dettwiler Associates

6 [figure: satellite radar image, with an oil slick labelled]

7 Oil Spill project – the team
MDA – satellite image processing experts
Canada Centre for Remote Sensing – human expert in recognizing oil slicks in radar images; attempts to build a classifier by hand had failed
me, Stan Matwin, Miroslav Kubat (1995–97)
see Machine Learning, vol. 30, February 1998

8 Lesson 2 – Research Spinoffs
Many new, general research issues arose during the oil spill project, but could not be properly investigated within the scope of the project. A great deal of follow-on research is needed.
Corollary: when you write up an application, look for general techniques, issues, and phenomena

9 Research Issues (1)
hand-picked data (purchased)
– not a "representative" sample
small data sets (9 images, 937 dark regions)
– risk of overtuning
task formulation
– classifying images, regions, or pixels?
– subcategories of non-slicks?

10 Research Issues (2)
imbalanced data sets (41 oil slicks, 896 non-slicks)
– accuracy is an inappropriate performance measure
– standard learners optimize accuracy and tend to classify everything as "not an oil slick"
data is in distinct batches
– "leave one batch out" (LOBO) testing method
– how to learn from batched data?
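Both points on this slide can be made concrete. On the project's own numbers (41 slicks, 896 non-slicks), a classifier that never predicts "slick" already scores about 95.6% accuracy, which is why accuracy is the wrong measure; and LOBO testing keeps regions from the same image out of both train and test at once. A minimal sketch, assuming one batch per satellite image:

```python
# Accuracy is misleading on imbalanced data: a classifier that always
# predicts "not an oil slick" looks excellent by accuracy alone.
slicks, non_slicks = 41, 896
accuracy_of_trivial = non_slicks / (slicks + non_slicks)
print(f"'always non-slick' accuracy: {accuracy_of_trivial:.1%}")  # ~95.6%

# Leave-one-batch-out (LOBO): hold one whole image's regions out for
# testing and train on all the rest, so regions from the same image
# never appear in both the training and test sets.
def lobo_splits(batches):
    """Yield (train, test) pairs where test is one entire batch."""
    for i, test in enumerate(batches):
        train = [x for j, b in enumerate(batches) if j != i for x in b]
        yield train, test
```

The function names here are illustrative; modern libraries offer the same split under names like "leave one group out".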

11 Research Issues (3)
feature engineering
– image processing parameter settings affect learning in two ways:
which regions are extracted from the image
which features of each region are calculated and then fed into the learning algorithm
– the best settings for one were not the best for the other

12 [figure: oil slick – Good Classification, Poor Region]

13 Lesson 3 – Need Version Control
Over the course of the project we had a vast variety of data sets:
– images from three different types of satellite
– a growing set of images for each type
– a different data set for every different setting of the image processing parameters
and many variations on the learning algorithms, experimental method, etc.

14 Lesson 4 – Understand the Deployment Context
What is the task? classification, filtering, control, diagnosis, …
non-uniform misclassification costs
– costs vary with user and time, and are not known during learning
some tasks require explanations in addition to classifications, or classifiers that can be understood by domain experts
Corollary: your experiments and performance measure should reflect how the system will be used and judged after deployment
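Non-uniform costs change the optimal decision rule, not just the evaluation. A minimal sketch of cost-sensitive classification (the 10:1 cost ratio is illustrative, not a figure from the oil spill project):

```python
def min_cost_label(p_slick, cost_fn=10.0, cost_fp=1.0):
    """Pick the label with the lower expected cost.

    Saying "non-slick" risks a missed slick: expected cost p_slick * cost_fn.
    Saying "slick" risks a false alarm: expected cost (1 - p_slick) * cost_fp.
    """
    if p_slick * cost_fn > (1 - p_slick) * cost_fp:
        return "slick"
    return "non-slick"

# With a 10:1 cost ratio the decision threshold drops to 1/11 (~0.09),
# so even a 10%-probable slick is worth flagging for the human expert.
```

Because the slide notes that costs vary with user and time, the costs belong as run-time parameters like this rather than baked into the learned classifier.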

15 Example – Evans & Fisher
Printing press "banding" problem: ML built a decision tree to predict whether banding would occur.
Some features were exogenous (e.g. humidity); others were controllable (e.g. ink viscosity).
In actual use, the tree was used to set the controllable variables given the values of the exogenous ones.
But different variables were under the control of different craftsmen, who would not necessarily co-operate with each other.

16 Lesson 5 – Expect Skepticism
It will be very hard to convince a decision-maker to actually deploy something new.
It will help if the learned system is in a form that the decision-maker is familiar with or can easily comprehend, and is consistent with all available background knowledge.

17 Counterexample – Evans & Fisher
One of the learned rules flatly contradicted the advice of an expert consultant, and the expert's advice was the more intuitive of the two. Upon further analysis by the local engineers, the learned rule was adopted.

18 Lesson 6 – Exploit Human Experts
Capture as much expertise as you can
Involve the expert in the induction process
– e.g. interactive induction (Evans & Fisher, PROTOS)
– e.g. Structured Induction (Alen Shapiro)

19 Lesson 7 – Start Simple
1R, Naïve Bayes, Perceptron, 1-NN:
– often work surprisingly well
– provide a performance baseline
– their successes and failures inform you about your data
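1R, Holte's own one-rule learner, is simple enough to sketch in full: it builds a one-feature rule per feature (each feature value maps to the majority class seen with it) and keeps the feature whose rule makes the fewest training errors. A minimal version for discrete features:

```python
from collections import Counter

def one_r(rows, labels):
    """Holte's 1R: return (training errors, feature index, value->class rule)
    for the single best one-feature rule."""
    best = None
    for f in range(len(rows[0])):
        # Count class frequencies for each value of feature f.
        by_value = {}
        for row, y in zip(rows, labels):
            by_value.setdefault(row[f], Counter())[y] += 1
        # Each value predicts its majority class.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(y != rule[row[f]] for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    return best
```

On a toy weather-style data set, `one_r` picks the feature that separates the classes best, giving exactly the kind of baseline the slide recommends establishing before trying anything elaborate.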

20 Lesson 8 – Visualize
Visualize your data
– e.g. project onto 1 or 2 dimensions
Visualize your classifier's performance
– e.g. with ROC or "cost" curves
– e.g. in instance space (which examples are problematic?)
– e.g. systematic error
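An ROC curve of the kind the slide mentions is cheap to compute from classifier scores: sweep a threshold over the scores and record the false-positive and true-positive rates at each step. A minimal sketch (tie handling and the (0,0)/(1,1) endpoints are omitted for brevity):

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs swept over score thresholds.

    labels: 1 for positive (e.g. oil slick), 0 for negative.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pts = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        pts.append((fp / neg, tp / pos))
    return pts
```

Plotting these points (FPR on x, TPR on y) shows performance across all thresholds at once, which matters precisely because, as Lesson 4 notes, the misclassification costs are not known during learning.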

21 Lessons
1. ML works
2. Applications spin off research issues
3. Need version control for experiments
4. Understand the deployment context
5. Expect skepticism from decision-makers
6. Exploit human experts
7. Start simple
8. Visualize your data and your classifier

