Framework for Inferring Ongoing Activities of Workstation Users Yifen Huang, Sophie Wang and Tom Mitchell School of Computer Science Carnegie Mellon University.


1 Framework for Inferring Ongoing Activities of Workstation Users Yifen Huang, Sophie Wang and Tom Mitchell School of Computer Science Carnegie Mellon University

2 Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
ActivityCluster4 (105 emails)
Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4)
UserActivityFraction: 105/1448 = .072 of total emails
IntensityOfUserInvolvement: created 37% of traffic (default 31%)
ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18 (16)
ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
RequestEmails: …
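The two ratio fields in the frame above can be reproduced from the counts on the slide. The following is a minimal sketch, assuming that IntensityOfUserInvolvement is the share of the cluster's emails sent by the user; the slide does not define the field, but 39/105 agrees with the reported 37%. The function names are illustrative and do not come from the presentation.

```python
def activity_fraction(cluster_size: int, corpus_size: int) -> float:
    """Share of the whole corpus that falls into one activity cluster."""
    return cluster_size / corpus_size


def user_involvement(sent_by_user: int, cluster_size: int) -> float:
    """Assumed definition: share of the cluster's emails sent by the user."""
    return sent_by_user / cluster_size


# Counts taken from the slide: 105 emails in the cluster, 1448 in the corpus,
# and 39 of the cluster's emails sent by Mitchell.
fraction = activity_fraction(105, 1448)
involvement = user_involvement(39, 105)

print(f"UserActivityFraction: {fraction:.4f}")
print(f"IntensityOfUserInvolvement: {involvement:.0%}")
```

The slide reports the first value truncated to three decimals (.072).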

5 Activity Example: Learned Activity Frame from TM email corpus [1448 msgs, Feb 2004]
ActivityCluster5 (105 emails)
Keywords: CALO, TFC, SRI, examples, heads, labeled, Leslie, HMM, contacts, email, task, estimates, zero, reschedule, baseline, Rebecca
PrimarySenders: Mitchell(39), Kaelbling(7), McCallum(6), Perrault(4)
UserActivityFraction: 105/1448 = .072 of total email
IntensityOfUserInvolvement: created 37% of traffic (default 31%)
ExtractedNames: Leslie(23), Rebecca(21), Carlos(12), Ray(10), Stuart(9), William(9), April(9), …
ExtractedDates: Wed(39), Tues(33), Fri(25), Mon(23), Thurs(20), … Feb 18 (16)
ExtractedTimes: 5pm(24), noon(14), morning(8), 8am(7), before 5pm(7), ...
RequestEmails: …
"I need to get to DARPA by COB tomorrow a list of CALO participants who need access to the IPTO booth. It seems to me we should ask for this for any of you who is likely to be there. Could you let me know asap if you *might* be there? No big deal if you end up not going. Thanks, --r"

6 Content
Inferring on-going activities by clustering, social network filtering and information extraction
Getting information from the whole workstation
Accepting user’s feedback
Future work

7 Inferring Activities Using Emails
Clustering
Social network filtering
Information extraction
Activity clusters and descriptions

8 Unsupervised Learning of Activities
1. Cluster emails
(Text) Represent each email as a bag of words from its subject and body. We use a multinomial Naïve Bayes model and refine the clusters by applying the EM algorithm.
(Social network) Subdivide each cluster based on the graph of email co-recipients, making each clique of co-recipients a subcluster.
2. For each cluster, extract information from the email text and headers.
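The text-clustering step can be illustrated with a small multinomial mixture fitted by EM. This is a minimal sketch under stated assumptions, not the authors' implementation: the round-robin initialization, the Laplace smoothing constant `alpha`, the fixed iteration count, and the function name `em_cluster` are all choices made for the example. The social-network subdivision is not shown.

```python
import math
from collections import Counter


def em_cluster(docs, k, iters=20, alpha=1.0):
    """Soft-cluster tokenized emails with a multinomial mixture (unsupervised Naive Bayes)."""
    vocab = sorted({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    # Assumed initialization: email i starts in cluster i mod k.
    resp = [[1.0 if i % k == c else 0.0 for c in range(k)] for i in range(len(docs))]
    for _ in range(iters):
        # M-step: smoothed cluster priors and per-cluster word distributions.
        prior = [(sum(r[c] for r in resp) + alpha) / (len(docs) + k * alpha)
                 for c in range(k)]
        word_prob = []
        for c in range(k):
            weighted = Counter()
            for r, cnt in zip(resp, counts):
                for w, n in cnt.items():
                    weighted[w] += r[c] * n
            total = sum(weighted.values()) + alpha * len(vocab)
            word_prob.append({w: (weighted[w] + alpha) / total for w in vocab})
        # E-step: posterior p(cluster | email), computed in log space for stability.
        for i, cnt in enumerate(counts):
            logp = [math.log(prior[c])
                    + sum(n * math.log(word_prob[c][w]) for w, n in cnt.items())
                    for c in range(k)]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            resp[i] = [u / z for u in unnorm]
    return resp


# Toy emails (invented for illustration), already tokenized.
emails = [
    "fmri meeting paper deadline".split(),
    "calo darpa booth access".split(),
    "fmri paper draft attached".split(),
    "calo participants darpa list".split(),
]
posteriors = em_cluster(emails, k=2)
labels = [max(range(2), key=row.__getitem__) for row in posteriors]
print(labels)
```

Each row of `posteriors` is p(cluster|email) for one email; taking the arg max gives a hard cluster label.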

9 Email
To: Bill@cmu.edu Subj: fMRI meeting
We need to meet soon to discuss the paper deadline.
To: Sue@cmu.edu Subj: Re: fMRI meeting
Ok, I suggest Wednesday at 4pm.
To: Bill@cmu.edu Subj: Re: fMRI meeting
See you then. Attached is the current draft.
Calendar
Directories
Web
Activity: fMRI paper writing
People: Sue, Bill
Document:
Meetings: Aug 24,
Emails: 1423, 1644,
Leader: Bill
Deadline: Jan 15

11 Getting Information from the Whole Workstation
Bag-of-words features for any query, using Google Desktop Search
We can produce feature vectors for meetings, person names, and project keywords.
–Cluster initialization using project keywords
–Co-clustering meetings and emails
–Inferring activities for any query
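The idea of turning a query into a bag-of-words vector and relating it to activities can be sketched as follows. The `search` function here is a hypothetical in-memory stand-in for a desktop search call, not the Google Desktop Search API, and matching a query to an activity by cosine similarity is an assumption made for illustration. The documents and activity names are invented.

```python
import math
from collections import Counter


def search(index, query):
    """Hypothetical stand-in for desktop search: documents containing every query term."""
    terms = query.lower().split()
    return [doc for doc in index if all(t in doc.lower().split() for t in terms)]


def bag_of_words(docs):
    """Sum word counts over all retrieved documents."""
    bag = Counter()
    for doc in docs:
        bag.update(doc.lower().split())
    return bag


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def nearest_activity(query_vec, activity_vecs):
    """Name of the activity whose vector is most similar to the query vector."""
    return max(activity_vecs, key=lambda name: cosine(query_vec, activity_vecs[name]))


index = [
    "fmri meeting to discuss the paper deadline",
    "current draft of the fmri paper",
    "calo participants who need booth access",
]
activities = {
    "fMRI paper writing": bag_of_words(index[:2]),
    "CALO": bag_of_words(index[2:]),
}
query_vec = bag_of_words(search(index, "fmri"))
best = nearest_activity(query_vec, activities)
print(best)
```

The same vectors could also seed cluster initialization from user-provided project keywords, as the slide suggests.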

12 Cluster Initialization Using Bag of Features of Project Keywords
From YH email corpus [623 msgs, 2004]
DI: an improved version of random initialization (0.46)
GI: bag of features from Google desktop search for user-provided keywords (0.44)

13 Content
Inferring on-going activities by clustering, social network filtering and information extraction
Getting information from the whole workstation
Accepting user’s feedback
Future work

14 Collecting User’s Feedback

15 Speclustering Model
Splits specific topics from general topics.
[Figure: graphical model with nodes G, X, W, M, N, β, S, ξ, π; label: Activity]
1. Each document has a cluster label S.
2. For each word in a document, there is a hidden variable X that indicates whether the word is generated by the cluster-specific topic S or by the general topic G.
3. Parameters can be estimated using the EM algorithm.
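The role of the hidden variable X can be illustrated with the per-word posterior that an EM step would compute. This is a sketch under assumptions: the mixing weight `p_specific` and the dictionaries `specific` and `general` are names chosen for the example, their values are invented, and no mapping to the slide's symbols β, ξ, π is claimed.

```python
def specific_posterior(word, cluster, specific, general, p_specific=0.5):
    """Posterior probability that `word` came from the cluster-specific topic
    rather than the general topic, given the document's cluster label."""
    spec = p_specific * specific[cluster].get(word, 0.0)
    gen = (1.0 - p_specific) * general.get(word, 0.0)
    total = spec + gen
    return spec / total if total > 0 else 0.0


# Invented word distributions for illustration.
general = {"the": 0.5, "meeting": 0.3, "fmri": 0.1, "calo": 0.1}
specific = {
    "fmri_paper": {"fmri": 0.6, "meeting": 0.3, "the": 0.1},
}

p_fmri = specific_posterior("fmri", "fmri_paper", specific, general)
p_the = specific_posterior("the", "fmri_paper", specific, general)
print(round(p_fmri, 3), round(p_the, 3))
```

A topical word such as "fmri" is attributed mostly to the cluster-specific topic, while a common word such as "the" is attributed mostly to the general topic, which is the split the model is meant to produce.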

16 EM Modification with User’s Feedback
Email-cluster association
–Re-assign the posterior probability p(cluster|email) according to the user’s approval or disapproval.
Keyword-cluster association
–Re-assign [expression missing] if the keyword is confirmed by the user and [expression missing] if the keyword is removed by the user.
[Figure: graphical model with nodes G, X, W, M, N, β, S, ξ, π]
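One way to fold email-cluster feedback into the E-step is to overwrite the affected posteriors before the next M-step. The sketch below assumes that approval clamps p(cluster|email) to 1 and disapproval clamps it to 0 with the remaining mass renormalized; the slide does not give the exact values, so this rule and the function name `apply_email_feedback` are illustrative only.

```python
def apply_email_feedback(posterior, feedback):
    """Return a copy of the posteriors with user feedback applied.

    posterior: list of rows, row[c] = p(cluster c | email)
    feedback: dict mapping (email_index, cluster_index) -> True (approved) or False (disapproved)
    """
    adjusted = [row[:] for row in posterior]
    for (i, c), approved in feedback.items():
        n = len(adjusted[i])
        if approved:
            # Assumed rule: an approved pairing gets all of the probability mass.
            adjusted[i] = [1.0 if j == c else 0.0 for j in range(n)]
        else:
            # Assumed rule: a disapproved pairing gets zero mass; renormalize the rest.
            rest = [0.0 if j == c else adjusted[i][j] for j in range(n)]
            z = sum(rest)
            if z > 0:
                adjusted[i] = [p / z for p in rest]
            elif n > 1:
                adjusted[i] = [0.0 if j == c else 1.0 / (n - 1) for j in range(n)]
            else:
                adjusted[i] = rest
    return adjusted


posterior = [[0.5, 0.3, 0.2], [0.6, 0.3, 0.1]]
feedback = {(0, 1): True, (1, 0): False}
adjusted = apply_email_feedback(posterior, feedback)
print(adjusted)
```

Applying the function after every E-step keeps the user's decisions fixed while the remaining emails are still free to move between clusters.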

17 Folder Reconstruction Accuracy Using Speclustering Algorithm
149 feedback entries (76 keyword-cluster pairs and 73 email-cluster pairs)
[Figure: accuracy by iteration]
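The slide does not say how folder reconstruction accuracy is computed. A common choice, assumed here purely for illustration, is to map each cluster to the folder that most of its emails belong to and count the emails that agree with that mapping. The function name and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def folder_reconstruction_accuracy(cluster_labels, folder_labels):
    """Fraction of emails whose folder matches the majority folder of their cluster."""
    if not folder_labels:
        return 0.0
    by_cluster = defaultdict(Counter)
    for cluster, folder in zip(cluster_labels, folder_labels):
        by_cluster[cluster][folder] += 1
    # Each cluster contributes the size of its largest folder group.
    correct = sum(max(counts.values()) for counts in by_cluster.values())
    return correct / len(folder_labels)


clusters = [0, 0, 0, 1, 1, 1]
folders = ["fmri", "fmri", "calo", "calo", "calo", "calo"]
accuracy = folder_reconstruction_accuracy(clusters, folders)
print(f"{accuracy:.3f}")
```

Evaluating this after each feedback iteration would give a curve of accuracy against iteration.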

18 Future Work
Jointly cluster meetings, people, files and other interesting entities.
–Preliminary results of jointly clustering emails and meetings:
Found a good match between emails and meetings
Didn’t visibly improve cluster quality
Allow richer user feedback.
Move from bag of features to structural data.

