Improving intelligent assistants for desktop activities Simone Stumpf, Margaret Burnett, Thomas Dietterich Oregon State University School of Electrical Engineering and Computer Science
2 Overview: Background; Activity switching problems; How to improve activity prediction (Reducing interruptions, Improving accuracy); Conclusion
3 Background: TaskTracer System Intelligent PIM system: the user organizes everyday life into different activities, each with a set of resources, e.g., “teach cs534”, “iui-07 paper”, etc. How it works: the user indicates the current activity; TaskTracer tracks events (file open, etc.); TaskTracer automatically associates resources with the current activity; TaskTracer provides useful information-finding services through intelligent assistants.
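The "how it works" loop on this slide can be illustrated with a minimal sketch. It is not TaskTracer's implementation: the class name, method names, event kinds, and file names below are all hypothetical, chosen only to show how a declared activity and tracked events could yield a per-activity resource list.

```python
from collections import defaultdict


class ActivityTracker:
    """Toy model of activity-resource association (all names hypothetical)."""

    def __init__(self):
        self.current = None                 # activity the user last declared
        self.resources = defaultdict(set)   # activity name -> set of resources

    def set_activity(self, name):
        # The user indicates the current activity.
        self.current = name

    def on_event(self, kind, resource):
        # A tracked event (e.g. a file open) touches a resource; associate it
        # with the declared activity. The event kind is ignored in this sketch.
        if self.current is not None:
            self.resources[self.current].add(resource)

    def resources_for(self, activity):
        # What a service could list for the given activity.
        return sorted(self.resources.get(activity, set()))


tracker = ActivityTracker()
tracker.set_activity("teach cs534")
tracker.on_event("file_open", "lecture3.ppt")   # hypothetical file name
tracker.set_activity("iui-07 paper")
tracker.on_event("file_open", "draft.doc")      # hypothetical file name
tracker.on_event("file_save", "draft.doc")
```

The sketch also shows why the approach depends on the user declaring switches: any event that arrives before `set_activity` is called for the new activity is filed under the old one.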
4 Example TaskTracer services TaskExplorer: presents a list of resources for each activity for easier access. FolderPredictor: predicts the location of resources useful for the current activity.
5 Activity switching problems To provide services, TaskTracer assumes that the user switches activities, so that the data is not too noisy. TaskPredictor assists by predicting the activity, based on resource use. [Figure: resources labelled AAAI web page, IL local folder, IL netw, IL DOC, AAAI PPT] Physical cost (mouse clicks, key presses); cognitive cost (deciding to switch).
6 TaskPredictor Window-document segment (WDS) = unbroken time period in which a window in focus is showing a single document. Assumptions: a prediction is only necessary when the WDS changes; a prediction is only made if the predictor is confident enough. (Shen et al., IUI 2006) Source of features: words in window titles, file pathnames, website URLs, (document content). Hybrid approach: Naïve Bayes and SVM. Accuracy: 80% on 10% coverage.
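The relationship between "confident enough" and the accuracy/coverage figures can be made concrete with a small sketch. It assumes the predictor produces a probability per activity for each WDS change. The function names, the threshold, and the toy probabilities are illustrative assumptions, not values from Shen et al.

```python
def selective_predict(probs, threshold):
    """Return the most probable activity, or None to abstain."""
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p >= threshold else None


def coverage_and_accuracy(predictions, truths):
    """Coverage = share of WDS changes with a prediction;
    accuracy = share of those predictions that were correct."""
    made = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(made) / len(truths) if truths else 0.0
    accuracy = sum(p == t for p, t in made) / len(made) if made else 0.0
    return coverage, accuracy


# Hypothetical per-WDS probabilities and true activities.
posteriors = [
    {"teach cs534": 0.95, "iui-07 paper": 0.05},
    {"teach cs534": 0.55, "iui-07 paper": 0.45},
    {"teach cs534": 0.10, "iui-07 paper": 0.90},
    {"teach cs534": 0.60, "iui-07 paper": 0.40},
]
truths = ["teach cs534", "iui-07 paper", "teach cs534", "teach cs534"]

predictions = [selective_predict(p, threshold=0.9) for p in posteriors]
coverage, accuracy = coverage_and_accuracy(predictions, truths)
```

Raising the threshold generally trades coverage for accuracy, which is the kind of operating point the slide's "80% on 10% coverage" describes.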
7 Reducing interruptions…
8 Problems in activity prediction The number of potential notifications is still high: wait to see if the user stays on the WDS to reduce the number of notifications. Physical cost to interact (mouse clicks, key presses). Cognitive cost to interact (deciding to switch).
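The "wait to see if the user stays on the WDS" idea amounts to a dwell-time filter. The sketch below is one possible reading, with a hypothetical `WDS` record, hypothetical document names, and an arbitrary 10-second minimum dwell. The slides do not state the actual waiting rule.

```python
from dataclasses import dataclass


@dataclass
class WDS:
    document: str   # document shown in the focused window
    start: float    # segment start time, in seconds
    end: float      # segment end time, in seconds


def segments_worth_notifying(segments, min_dwell_s):
    """Keep only segments the user stayed on for at least min_dwell_s seconds."""
    return [s for s in segments if s.end - s.start >= min_dwell_s]


# Hypothetical trace: two brief glances and two longer stays.
trace = [
    WDS("inbox", 0.0, 3.0),
    WDS("draft.doc", 3.0, 43.0),
    WDS("calendar", 43.0, 48.0),
    WDS("lecture3.ppt", 48.0, 168.0),
]
kept = segments_worth_notifying(trace, min_dwell_s=10.0)
```

Here four WDS changes yield only two candidate notifications, at the price of delaying each notification by the dwell time.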
9 Activity boundaries (Iqbal et al., CHI 2005, 2006) Interruption costs are lower on boundaries; costs are high within a unit. So what happens if the user does stay on the WDS? [Figure: example activity “Prepare IL paper” with units labelled Download latest version, Edit document, Save document, Upload latest version, Open document]
10 Reducing interruptions Move from single-window prediction to multiple-window prediction (Shen et al., IJCAI 2007). Identify user costs to make a prediction. Determine opportunities intelligently: trade-off of user cost/benefit. Make predictions at boundaries, then commit changes on user feedback.
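One way to read the cost/benefit trade-off is as a simple expected-value rule: notify only when the expected benefit of a correct prediction exceeds the cost of interrupting, with a lower interruption cost at activity boundaries. The sketch below rests entirely on that assumption. The function name and all numeric costs are invented for illustration and do not come from the cited work.

```python
def should_notify(p_correct, at_boundary,
                  benefit=1.0, cost_within_unit=0.9, cost_at_boundary=0.2):
    """Notify only if the expected benefit outweighs the interruption cost.

    All costs and benefits are hypothetical, unitless values.
    """
    interruption_cost = cost_at_boundary if at_boundary else cost_within_unit
    expected_benefit = p_correct * benefit
    return expected_benefit > interruption_cost


# The same prediction confidence leads to different decisions
# depending on where the user is in the activity.
notify_at_boundary = should_notify(0.8, at_boundary=True)
notify_mid_unit = should_notify(0.8, at_boundary=False)
```

Under these made-up numbers, a prediction with 0.8 confidence is shown at a boundary but held back in the middle of a unit.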
11 Improving accuracy…
12 Why improve accuracy? 100% accuracy is rare: TaskPredictor and other predictors may make wrong predictions. Limited feedback: only labels. Users know more: can we harness it? How can learning systems explain their reasoning to the user? What is the users’ feedback to the learning system? (Stumpf et al., IUI 2007)
13 Pre-study explanation generation [Figure: pipeline from emails through Ripper and NB to the explanation types] Data: Enron farmer-d, 122 emails, 4 folders (Bankrupt, Enron News, Personal, Resume). Explanations: Rule-based, Keyword-based, Similarity-based; concrete, and simplified but faithful.
14 Classification Standard Weka implementations; stratified 5-fold cross-validation; stop words and stemming. Features: sender, set of recipients, words in Subject and Body. Ripper generates an ordered set of rules; NB learns weights on words.
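Stratified k-fold cross-validation keeps each folder's share roughly equal across folds. The study used Weka's implementation. The plain-Python sketch below is only an illustration of the splitting idea, with a hypothetical function name and made-up label counts.

```python
from collections import defaultdict


def stratified_folds(labels, k=5):
    """Split example indices into k folds, dealing each class round-robin
    so every fold gets a near-equal share of every class."""
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    folds = [[] for _ in range(k)]
    for y in sorted(by_label):
        for j, i in enumerate(by_label[y]):
            folds[j % k].append(i)
    return folds


# Hypothetical label counts, not the study's folder distribution.
labels = ["Bankrupt"] * 10 + ["Personal"] * 5
folds = stratified_folds(labels, k=5)
per_fold = [sorted(labels[i] for i in fold) for fold in folds]
```

Each fold serves once as the test set while the other four are used for training. A real implementation would normally shuffle within each class first, which this sketch omits.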
15 Rule-based
16 Keyword-based The 5 words in the email having the highest positive weight; the 5 words in the email having the most negative weight.
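Selecting the keywords for such an explanation can be sketched as follows, assuming the classifier exposes one weight per word for the predicted folder. The function name, the weights, and the example email are hypothetical.

```python
def keyword_explanation(email_words, weights, n=5):
    """Return (positive, negative): up to n words from the email with the
    highest positive weights and up to n with the most negative weights."""
    present = {w for w in email_words if w in weights}
    # Highest weight first; ties broken alphabetically for determinism.
    ranked = sorted(present, key=lambda w: (-weights[w], w))
    positive = [w for w in ranked if weights[w] > 0][:n]
    negative = [w for w in reversed(ranked) if weights[w] < 0][:n]
    return positive, negative


# Hypothetical word weights for the 'Resume' folder.
weights = {"resume": 2.1, "qualifications": 1.7, "experience": 1.2,
           "houston": -1.5, "meeting": -0.8}
email = "my resume and qualifications are attached before the houston meeting".split()
positive, negative = keyword_explanation(email, weights)
```

Words with no learned weight (here "my", "and", and so on) never appear in the explanation.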
17 Similarity-based Similar email: the training email that would cause the largest decrease in the prediction score if removed from the training set. Up to 5 words in both emails having the highest weights.
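The leave-one-out idea behind this explanation can be sketched with a stand-in score. The study scored emails with Naïve Bayes. To stay self-contained, the sketch below swaps in a much simpler score (the fraction of the target email's words seen in the training emails), so only the "remove each training email and measure the drop" logic carries over. All names and data are hypothetical.

```python
def coverage_score(target, training):
    """Stand-in score: fraction of the target's words seen in training emails."""
    seen = set().union(*training)
    return len(target & seen) / len(target)


def most_influential(target, training):
    """Index of the training email whose removal lowers the score the most."""
    base = coverage_score(target, training)

    def drop(i):
        return base - coverage_score(target, training[:i] + training[i + 1:])

    return max(range(len(training)), key=drop)


def shared_top_words(target, other, weights, n=5):
    """Up to n words present in both emails, highest weight first."""
    shared = target & other
    return sorted(shared, key=lambda w: (-weights.get(w, 0.0), w))[:n]


# Hypothetical emails as word sets, plus hypothetical word weights.
target = {"enron", "bankrupt", "filing", "chapter"}
training = [{"enron", "stock"},
            {"bankrupt", "filing", "chapter", "court"},
            {"lunch", "friday"}]
weights = {"bankrupt": 2.0, "filing": 1.5, "chapter": 1.0}

idx = most_influential(target, training)
words = shared_top_words(target, training[idx], weights)
```

Recomputing the score once per training email is affordable for a data set of this size, but a larger corpus would call for an incremental update.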
19 Giving feedback Participants were asked to provide feedback to improve the predictions. No restrictions on form of feedback.
20 Responses to explanations Negative comments (20%): “…those are arbitrary words.” Confusion (8%): “I don’t understand why there is a second email.” Positive comments (19%): “The Resume rules are good.” Understanding (17%): “I see why it used ‘Houston’ as negative.” Correcting or suggesting changes (32%): “Different words could have been found in common, like ‘Agreement’, ‘Ken Lay.’”
21 Understanding explanations Rule-based best, then Keyword-based; serious problems with Similarity-based. Factors: General idea of the algorithm: “I guess it went in here because it was similar to another email I had already put in that folder.” Keyword-based explanations’ negative keyword list: “I guess I really don’t understand what it’s doing here. If those words weren’t in the message?” Word choices’ topical appropriateness: “‘Day’, ‘soon’, and ‘listed’ are incredibly arbitrary keywords.”
22 Preferring explanations Preference trend follows understanding. Factors: Perceived reasoning soundness and accuracy: “I think this is a really good filter…” Clear communication of reasoning: “I like this because it shows relationships between other messages in the same folder rather than just spitting out a bunch of rules with no reason behind it.” Informal wording: “This is funny... (laughs)... This seems more personable. Seems like a narration rather than just straight rules. It’s almost like a conversation.”
23 The user explains back Select different features (53%): “It should put email in ‘Enron News’ if it has the keywords ‘changes’ and ‘policy’.” Adjust weights (12%): “The second set of words should be given more importance.” Parse/extract in different way (10%): “I think that it should look for typos in the punctuation for indicators toward ‘Personal’.” Employ feature combinations (5%): “I think it would be better if it recognized a last and a first name together.” Use relational features (4%): “This message should be in ‘EnronNews’ since it is from the chairman of the company.”
24 Underlying knowledge sources Commonsense (36%): “‘Qualifications’ would seem like a really good Resume word, I wonder why that’s not down here.” English (30%): “Does the computer know the difference between ‘resumé’ and ‘resume’?” Domain (15%): “Different words could have been found in common like … ‘Ken Lay’.”
25 Current work More than 50% of suggestions could be easily incorporated. New algorithms to handle changes to weights and keywords: user feedback as constraints on MLE (maximum likelihood estimation) of the parameters; co-training. Investigate effects on accuracy using study data: constraints do not hurt accuracy but do not improve it much either; the co-training approach is better.
26 Conclusion User costs important; higher accuracy; timing of prediction notifications; usefulness of predictions; explanations of why a prediction was made.