End-User Debugging of Machine Learning Systems Weng-Keen Wong Oregon State University School of Electrical Engineering and Computer Science


1 End-User Debugging of Machine Learning Systems Weng-Keen Wong Oregon State University School of Electrical Engineering and Computer Science

2 Collaborators
Margaret Burnett, Simone Stumpf, Tom Dietterich, Jon Herlocker, Erin Fitzhenry, Lida Li, Ian Oberst, Vidya Rajaram, Russell Drummond, Erin Sullivan
(Grouped on the slide as: Faculty, Grad Students, Undergrads)

3 Papers
Stumpf, S., Rajaram, V., Li, L., Burnett, M., Dietterich, T., Sullivan, E., Drummond, R., Herlocker, J. (2007). Toward Harnessing User Feedback For Machine Learning. In Proceedings of IUI 2007.
Stumpf, S., Rajaram, V., Li, L., Wong, W.-K., Burnett, M., Dietterich, T., Sullivan, E., Herlocker, J. (2008). Interacting Meaningfully with Machine Learning Systems: Three Experiments. (Submitted to IJHCS.)
Stumpf, S., Sullivan, E., Fitzhenry, E., Oberst, I., Wong, W.-K., Burnett, M. (2008). Integrating Rich User Feedback into Intelligent User Interfaces. In Proceedings of IUI 2008.

4 Motivation
Date: Mon, 28 Apr :59:00 (PST)
From: John Doe
To: Weng-Keen Wong
Subject: CS 162 Assignment
I can’t get my Java assignment to work! It just won’t compile and it prints out lots of error messages! Please help!
    public class MyFrame extends JFrame {
        private AsciiFrameManager reader;
        private JPanel displayPanel;
        public MyFrame(String filename) throws Exception {
            reader = new AsciiFrameManager(filename);
            displayPanel = new JPanel();
            ...
CS 162 / John Doe / Trash / ?
Machine learning tool adapts to end user
Similar situation in recommender systems, smart desktops, etc.

5 Motivation
Date: Mon, 28 Apr :51:00 (PST)
From: Bella Bose
To: Weng-Keen Wong
Subject: Teaching Assignments
I’ve compiled the teaching preferences for all the faculty. Here are the teaching assignments for next year:
Fall Quarter
CS 160 (Computer Science Orientation) – Paul Paulson
CS 161 (Introduction to Programming I) – Chris Wallace
CS 162 (Introduction to Programming II) – Weng-Keen Wong
...
Trash
Machine learning systems are great when they work correctly, aggravating when they don’t
The end user is the only person at the computer
Can we let end users correct machine learning systems?

6 Motivation
Learn to correct behavior quickly
Sparse data on start
Concept drift
Rich end-user knowledge
Effects of user feedback on accuracy?
Effects on users?

7 Overview
[Diagram labels: Explanation, End user feedback, End-User, Machine Learning Algorithm]

8 Related Work
Explanation
  Expert Systems (Swartout 83, Wick and Thompson 92)
  TREPAN (Craven and Shavlik 95)
  Description Logics (McGuinness 96)
  Bayesian networks (LaCave and Diez 00)
  Additive classifiers (Poulin et al. 06)
  Others (Crawford et al. 02, Herlocker et al. 00)
End user interaction
  Active Learning (Cohn et al. 96, many others)
  Constraints (Altendorf et al. 05, Huang and Mitchell 06)
  Ranks (Radlinski and Joachims 05)
  Feature Selection (Raghavan et al. 06)
  Crayons (Fails and Olsen 03)
  Programming by Demonstration (Cypher 93, Lau and Weld 99, Lieberman 01)

9 Outline
1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)

10 What Types of Explanations do End Users Understand?
Think-aloud study with 13 participants
Classify Enron emails
Explanation systems: rule-based, keyword-based, similarity-based
Findings:
  Rule-based best, but not a clear winner
  Evidence indicates multiple explanation paradigms are needed

11 What types of corrective feedback could end users provide?
Suggested corrective feedback in response to explanations:
1. Adjust importance of word
2. Add/remove word from consideration
3. Parse / extract text in a different way
4. Word combinations
5. Relationships between messages/people

12 Outline
1. What types of explanations do end users understand? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)

13 Incorporating Feedback into ML Algorithms
Two approaches:
1. Constraint-based
2. User co-training

14 Constraint-based approach
Constraints:
1. If weight on word reduced or word removed, remove the word as a feature
2. If weight of word increased, word assumed to be important for that folder
3. If weight of word increased, word is a better predictor for that folder than other words
Estimate parameters for Naive Bayes using MLE with these constraints
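A minimal sketch of constraint 1 only, under stated assumptions: words the user removed (or down-weighted) are dropped as features before the per-folder word likelihoods are estimated by plain maximum likelihood (relative frequencies). Constraints 2 and 3 need a constrained optimizer and are not covered here. The function name, the `removed_words` parameter, and the token-list message format are hypothetical and do not come from the slides.

```python
from collections import Counter, defaultdict


def estimate_word_likelihoods(messages, removed_words):
    """Return {folder: {word: P(word | folder)}} by relative frequency.

    messages: list of (tokens, folder) pairs, tokens being a list of words.
    removed_words: set of words the user removed; they are ignored as features.
    """
    counts = defaultdict(Counter)
    for tokens, folder in messages:
        # Constraint 1: a removed word never contributes to any folder's counts.
        counts[folder].update(t for t in tokens if t not in removed_words)

    likelihoods = {}
    for folder, counter in counts.items():
        total = sum(counter.values())
        # Unsmoothed MLE; a folder left with no features gets an empty table.
        likelihoods[folder] = (
            {word: c / total for word, c in counter.items()} if total else {}
        )
    return likelihoods
```

A real classifier would normally add smoothing so unseen words do not get zero probability; it is left out here to keep the estimate a plain MLE.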

15 Standard Co-training
Create classifiers C1 and C2 based on the two independent feature sets.
Repeat i times:
  Add most confidently classified messages by any classifier to training data
  Rebuild C1 and C2 with the new training data
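The loop above can be sketched roughly as follows. This is an illustration under assumptions, not the authors' code: `make_classifier` is a hypothetical factory, and the `fit` / `predict_with_confidence` methods are an assumed interface rather than any particular library's API. Each example carries two feature views, one per classifier, and `per_round` (also an assumed parameter) is how many messages are promoted in each iteration.

```python
def co_train(labeled, unlabeled, make_classifier, rounds, per_round):
    """Sketch of standard co-training.

    labeled:   list of ((view1, view2), label) pairs.
    unlabeled: list of (view1, view2) pairs.
    Returns the two rebuilt classifiers and the grown training set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)

    def build():
        # One classifier per feature view, trained on the current labeled set.
        classifiers = []
        for view in (0, 1):
            clf = make_classifier()
            clf.fit([x[view] for x, _ in labeled], [y for _, y in labeled])
            classifiers.append(clf)
        return classifiers

    c1, c2 = build()
    for _ in range(rounds):
        if not pool:
            break
        candidates = []
        for i, x in enumerate(pool):
            # Keep whichever classifier is more confident about this message.
            label, conf = max(
                (c1.predict_with_confidence(x[0]),
                 c2.predict_with_confidence(x[1])),
                key=lambda prediction: prediction[1],
            )
            candidates.append((conf, i, label))
        candidates.sort(key=lambda c: c[0], reverse=True)
        chosen = candidates[:per_round]
        for _, i, label in chosen:
            labeled.append((pool[i], label))
        chosen_idx = {i for _, i, _ in chosen}
        pool = [x for i, x in enumerate(pool) if i not in chosen_idx]
        c1, c2 = build()  # rebuild both classifiers on the grown training set
    return c1, c2, labeled
```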

16 User Co-training
C_USER = “Classifier” based on user feedback
C_ML = Machine learning algorithm
For each “session” of user feedback:
  Add most confidently classified messages by C_USER to training data
  Rebuild C_ML with the new training data

17 User Co-training
C_USER = “Classifier” based on user feedback
C_ML = Machine learning algorithm
For each “session” of user feedback:
  Add most confidently classified messages by C_USER to training data
  Rebuild C_ML with the new training data
We’ll expand the inner loop on the next slide

18 User Co-training
For each folder f, let vector v_f = words with weights increased by the user
For each message m in the unlabeled set:
  For each folder f:
    Compute Prob_f from the machine learning classifier
    Score_f = (# of words in v_f appearing in the message) * Prob_f
  Score_m = Score_fmax - Score_other
Sort Score_m for all messages in decreasing order
Select the top k messages to add to the training set along with their folder label f_max
Rebuild C_ML with the new training data
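The selection step above might look like the sketch below. Two readings of the slide are assumptions on my part: "# of words in v_f appearing in the message" is taken as the number of distinct words of v_f present in the message, and Score_other is taken as the best score among the remaining folders. `predict_proba` is a hypothetical callable standing in for the machine learning classifier; it maps a token list to a `{folder: probability}` dictionary.

```python
def select_for_training(unlabeled, boosted_words, predict_proba, k):
    """Pick the k unlabeled messages the user-feedback classifier is surest about.

    unlabeled:     list of messages, each a list of tokens.
    boosted_words: {folder: set of words whose weight the user increased} (v_f).
    predict_proba: hypothetical callable, tokens -> {folder: probability}.
    Returns a list of (message_index, f_max) pairs, best margin first.
    """
    if not boosted_words:
        return []

    ranked = []
    for idx, tokens in enumerate(unlabeled):
        present = set(tokens)
        probs = predict_proba(tokens)
        # Score_f = (# of v_f words present in the message) * Prob_f
        scores = {
            folder: len(words & present) * probs.get(folder, 0.0)
            for folder, words in boosted_words.items()
        }
        f_max = max(scores, key=scores.get)
        others = [s for folder, s in scores.items() if folder != f_max]
        # Score_m = Score_fmax - Score_other (assumed: best competing folder)
        margin = scores[f_max] - (max(others) if others else 0.0)
        ranked.append((margin, idx, f_max))

    ranked.sort(key=lambda item: item[0], reverse=True)
    return [(idx, f_max) for _, idx, f_max in ranked[:k]]
```

The returned pairs would then be appended to the training set with f_max as the label before C_ML is rebuilt.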

19 Constraint-based vs User co-training
Constraint-based
  Difficult to set “hardness” of constraint
  Constraints often already satisfied
  End user can over-constrain the learning algorithm
  Slow
User co-training
  Requires unlabeled emails in inbox
  Better accuracy than constraint-based

20 Results
[Plots: feedback from keyword-based paradigm; feedback from similarity-based paradigm]

21 Outline
1. What types of explanations work for end users? What types of corrective feedback could end users provide? (IUI 2007)
2. How do we incorporate this feedback into a ML algorithm? (IJHCS 2008)
3. What happens when we put this together? (IUI 2008)

22 Experiment: Email program

23 Experiment: Procedure
Intelligent email system to classify emails into folders
43 English-speaking, non-CS students
Background questionnaire
Tutorial (email program and folders)
Experiment task on feedback set
  Correct folders. Add, remove, change weight on keywords.
  30 interaction logs
Post-session questionnaire

24 Experiment: Data
Enron data set
9 folders
50 training messages
  10 each for 5 folders with folder labels
50 feedback messages
  For use in experiment
  Same for each participant
1051 test messages
  For evaluation after experiment

25 Experiment: Classification algorithm
“User co-training”
Two classifiers: User, Naïve Bayes
Slight modification on user classifier:
  Score_f = sum of weights in v_f appearing in the message
Weights can be modified interactively by user
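As a rough illustration of the modified user-classifier score, the sketch below sums the user-set weights of the v_f words that occur in a message. It assumes each word is counted once per message regardless of how often it repeats, and that the score is used as stated on the slide, without a probability factor. The function and parameter names are hypothetical.

```python
def user_score(tokens, weights):
    """Score_f for one folder: sum of weights of v_f words present in the message.

    tokens:  the message as a list of words.
    weights: {word: weight} for the folder's keywords (v_f), editable by the user.
    """
    present = set(tokens)
    return sum(weight for word, weight in weights.items() if word in present)


def user_classify(tokens, folder_weights):
    """Return the folder with the highest user score for this message."""
    return max(folder_weights, key=lambda f: user_score(tokens, folder_weights[f]))
```

Because the weights live in an ordinary dictionary, an interactive change by the user amounts to updating one entry before the next scoring pass.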

26 Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ over folder feedback, by subject]
Rich feedback: participant folder labels and keyword changes
Folder feedback: participant folder labels

27 Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ over baseline, by subject]
Rich feedback: participant folder labels and keyword changes
Baseline: original Enron labels

28 Results: Accuracy improvements of rich feedback
[Chart: accuracy Δ, by subject]
Rich feedback: participant folder labels and keyword changes
Baseline: original Enron labels
Folder feedback: participant folder labels

29 Results: Accuracy summary
60% of participants saw accuracy improvements, some very substantial
Some dramatic decreases
More time between filing emails or more folder assignments → higher accuracy

30 Interesting bits
1. Need to communicate the effects of the user’s corrective feedback
2. Unstable classifier period
  With sparse training data, a single new training example can dramatically change the classifier’s decision boundaries
  Wild fluctuations in classifier’s predictions frustrate end users
  Causes “wall of red”

31 Interesting bits: Unstable classifier period
Moved test emails into training set to look for effect on accuracy (Baseline, participant 101)

32 Interesting bits
3. “Unlearning” important, especially to correct undesirable changes
4. Gender differences
  Females took longer to complete
  Females added twice as many keywords
Comment more on unlearning

33 Interesting directions for HCI
1. Gender differences
2. More directed debugging
3. Other forms of feedback
4. Communicating effects of corrective feedback
  Users need to detect the system is listening to their feedback
5. Explanations
  Form
  Fidelity

34 Interesting directions for Machine Learning
1. Algorithms for learning from corrective feedback
2. Modeling reliability of user feedback
3. Explanations
4. Incorporating new features

35 Future work
ML Whyline (with Andy Ko)

36 For more information

