Ranking Users for Intelligent Message Addressing


1 Ranking Users for Intelligent Message Addressing
Vitor R. Carvalho and William Cohen, Carnegie Mellon University. Glasgow, April 2nd, 2008.

2 Outline
Intelligent Message Addressing
Models
Data & Experiments
Auto-completion
Mozilla Thunderbird Extension*
Learning to Rank Results*

3 [Screenshot: composing a message in an email client; image not in transcript]

4-11 [Demo screenshots: as the message is composed and recipients are added, the system suggests likely contacts, e.g. "Ramesh Nallapati <ramesh@cs.cmu.edu> [Add]", "einat <einat@cs.cmu.edu> [Add]", "Tom Mitchell <tom@cs.cmu.edu> [Add]", each followed by further ranked suggestions; the suggestion list updates after every addition.]

12 The Task: Intelligent Message Addressing
Predicting likely recipients of a message, given:
(1) the contents of the message being composed;
(2) the other recipients already specified;
(3) a few initial letters of the intended recipient's address (intelligent auto-completion).

13 What for?
Prevent high-cost addressing errors: people simply forget to add important recipients, particularly in large corporations; catching this prevents costly misunderstandings, communication delays, and missed opportunities [Dom et al., 03; Campbell et al., 03].
Identify people related to specific topics (or with specific relevant skills). Relation to Expert Finding: message ↔ (long) query; addresses ↔ experts.
Improve address auto-completion.

14 How Frequent are These Errors?
Grep for "forgot", "sorry" or "accident" in the Enron corpus, half a million real messages from a large corporation:
"Sorry, I forgot to CC you his final offer"
"Oops, I forgot to send it to Vince."
"Adding John to the discussion…..(sorry John)"
"Sorry....missed your name on the cc: list!"
More frequent than expected:
At least 9.27% of the users forgot to add a desired recipient.
At least 20.52% of the users were left out as recipients of at least one message intended for them.
(Both figures are lower bounds.)
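A rough way to reproduce this kind of count (a sketch only; it assumes the corpus is unpacked as one plain-text file per message, and this naive keyword match still needs the manual filtering implied by the quotes above):

```python
import os
import re

# Naive indicator phrases for a forgotten recipient (case-insensitive).
SLIP = re.compile(r"\b(forgot|sorry|accident)", re.IGNORECASE)

def count_candidate_slips(root):
    """Count messages under `root` whose text matches a slip phrase."""
    hits = total = 0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            total += 1
            with open(os.path.join(dirpath, name), errors="ignore") as f:
                if SLIP.search(f.read()):
                    hits += 1
    return hits, total

# Hypothetical path to the unpacked Enron corpus.
print(count_candidate_slips("enron_mail/maildir"))
```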

15 Two Ranking Tasks
TO+CC+BCC prediction: rank likely recipients given only the message contents.
CC+BCC prediction: rank likely additional recipients given the message contents and the recipients already specified.

16 Models
Non-textual models: frequency only; recency only.
Expert finding models [Balog et al., 2006]: M1 (Candidate Model); M2 (Document Model).
Rocchio (TFIDF).
K-Nearest Neighbors (KNN).
Rank aggregation of the above.

17 Non-Textual Models
Frequency model: rank candidates by the total number of messages addressed to them in the training set.
Recency model: exponential decay over the chronologically ordered messages, so recently addressed candidates rank higher.
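A minimal sketch of both baselines, assuming each training message carries its recipient list and a timestamp; the exact decay parameterization is not recoverable from the transcript, so the exp(-i/b) form below is one plausible reading, with b the parameter tuned on slide 20:

```python
import math
from collections import defaultdict

def frequency_scores(train_msgs):
    """Score each candidate by how many training messages it received."""
    scores = defaultdict(float)
    for msg in train_msgs:
        for ca in msg["recipients"]:
            scores[ca] += 1.0
    return scores

def recency_scores(train_msgs, b=100.0):
    """Exponential decay over chronologically ordered messages:
    recipients of the i-th most recent message get weight exp(-i/b)."""
    scores = defaultdict(float)
    newest_first = sorted(train_msgs, key=lambda m: m["time"], reverse=True)
    for i, msg in enumerate(newest_first):
        for ca in msg["recipients"]:
            scores[ca] += math.exp(-i / b)
    return scores
```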

18 Expert Search Models
M1: Candidate Model [Balog et al., 2006]
M2: Document Model [Balog et al., 2006]
The document-candidate association f(doc, ca) is estimated as user-centric (UC) or document-centric (DC).
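The model formulas are images in the original slide; the standard (unsmoothed) forms from Balog et al. (2006), with messages playing the role of documents and candidate recipients the role of experts, are:

```latex
% Model 1 (candidate model): build a term model per candidate ca
% from its associated documents, then score the query q against it.
P(q \mid ca) = \prod_{t \in q} \sum_{d} P(t \mid \theta_d)\, f(d, ca)

% Model 2 (document model): score each document against q,
% then transfer that score to the candidates associated with it.
P(q \mid ca) = \sum_{d} P(q \mid \theta_d)\, f(d, ca)
```

Here f(d, ca) is the document-candidate association weight that the slide says is estimated user-centrically (UC) or document-centrically (DC).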

19 Other Models
Rocchio (TFIDF) [Joachims, 1997; Salton & Buckley, 1988]
K-Nearest Neighbors [Yang & Liu, 1999]
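A sketch of one natural reading of the KNN ranker here (the scoring rule and the choice of cosine similarity over TFIDF vectors are assumptions): retrieve the K training messages most similar to the draft, then score each candidate by the summed similarity of the neighbors addressed to it.

```python
from collections import defaultdict

def knn_rank(draft_vec, train_msgs, similarity, k=30):
    """train_msgs: list of (tfidf_vector, recipient_set) pairs;
    `similarity` is e.g. cosine similarity between TFIDF vectors."""
    scored = [(similarity(draft_vec, vec), recips)
              for vec, recips in train_msgs]
    scored.sort(key=lambda x: x[0], reverse=True)
    candidate = defaultdict(float)
    for sim, recips in scored[:k]:      # K nearest training messages
        for ca in recips:
            candidate[ca] += sim        # accumulate neighbor similarity
    return sorted(candidate.items(), key=lambda kv: kv[1], reverse=True)
```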

20 Model Parameters
Chosen from preliminary tests:
Recency: b ∈ {10, 20, 50, 100, 200, 500}
KNN: K ∈ {3, 5, 10, 20, 30, 40, 50, 100}
Rocchio: b ∈ {0, 0.1, 0.25, 0.5}

21 Data: Enron Email Collection
Some good reasons to use it:
Large: half a million messages.
Natural work-related email, not mailing lists.
Public and free.
Different roles: managers, assistants, etc.
Drawbacks:
No clear message thread information.
No complete address book information: no first/last/full names for many recipients.

22 Enron Data Preprocessing
A realistic temporal setup (per user): for each user, the 10% most recent sent messages are used as the test set.
36 users; all users had their Address Books (AB) extracted.
[Table: per-task message counts for TOCCBCC and CCBCC; not in transcript]

23 Enron Data Preprocessing
Bag-of-words representation: each message is the union of the bag-of-words of its body and of its subject.
Removed inconsistencies and repeated messages.
Disambiguated several Enron email addresses.
Stop words removed; no stemming.
Self-addressed messages were removed.
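A sketch of that representation (the tokenizer and the stop list here are placeholders):

```python
import re

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in"}  # placeholder list

def bag_of_words(text):
    # Lowercase word tokens, stop words removed, no stemming.
    return {w for w in re.findall(r"[a-z0-9']+", text.lower())
            if w not in STOPWORDS}

def message_representation(subject, body):
    # Union of the subject and body bags-of-words, per the slide.
    return bag_of_words(subject) | bag_of_words(body)
```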

24 Threading
No explicit thread information in Enron, so we try to reconstruct it: build the "Message Thread Set" MTS(msg), the set of messages with the same subject as the current one.
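A sketch of that reconstruction; normalizing away "Re:"/"Fw:" prefixes is my assumption, since the slide only says "same subject":

```python
import re
from collections import defaultdict

def normalize_subject(subject):
    # Strip reply/forward prefixes so "Re: budget" joins the "budget" thread.
    return re.sub(r"^((re|fwd?):\s*)+", "", subject.strip().lower())

def message_thread_sets(messages):
    """MTS(msg): all messages sharing msg's (normalized) subject."""
    by_subject = defaultdict(list)
    for msg in messages:
        by_subject[normalize_subject(msg["subject"])].append(msg)
    return {m["id"]: by_subject[normalize_subject(m["subject"])]
            for m in messages}
```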

25-27 Results
[Charts: retrieval performance (MAP) of the models on the two tasks; images not in transcript]

28 Rank Aggregation
Rankings are combined by reciprocal rank:
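The combination formula is an image in the original; a standard reciprocal-rank fusion score consistent with the slide would be:

```latex
% Sum of reciprocal ranks across the base rankers.
\mathrm{score}(ca) = \sum_{r \,\in\, \text{base rankings}} \frac{1}{\mathrm{rank}_r(ca)}
```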

29 Rank Aggregation Results

30 Observations
'Threading' improves MAP for all models.
KNN seems to be the best choice overall: a document model with its focus on a few top documents.
The data-fusion method for rank aggregation improved performance significantly; the base systems make different types of mistakes.

31-32 Intelligent Email Auto-completion
[Charts: auto-completion performance on the TOCCBCC and CCBCC tasks; images not in transcript]

33 Mozilla Thunderbird extension (Cut Once)
Suggestions: Click to add

34 Mozilla Thunderbird extension (Cut Once)
Interested? Just google "mozilla extension carnegie mellon".
User study using Cut Once: users shifted toward a write-then-address behavior instead.

35 Can we do better ranking?
Learning to Rank: machine learning to improve ranking with a feature-based ranking function.
Many recently proposed methods:
RankSVM [Joachims, KDD-02]
ListNet [Cao et al., ICML-07]
RankBoost [Freund et al., 2003]
Perceptron variations (online, scalable) [Elsas, Carvalho & Carbonell, WSDM-08]

36 Learning to Rank Recipients
Combine the textual score with other "network" features, using the ranking scores as features (a sketch of a learnable ranker over these features follows below):
Textual feature: KNN score.
Network features: frequency score, recency score, co-occurrence features.
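A minimal sketch of an online pairwise ranking perceptron over such feature vectors, in the spirit of the perceptron variations cited on slide 35; the generic pairwise update below is an illustration, not necessarily the exact variant used:

```python
import numpy as np

def train_ranking_perceptron(train_data, n_features, epochs=5, lr=1.0):
    """train_data: iterable of (positives, negatives) per message, where
    positives are feature vectors (np.ndarray) of true recipients and
    negatives are vectors of the other address-book candidates."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for positives, negatives in train_data:
            for pos in positives:
                for neg in negatives:
                    if w @ pos <= w @ neg:      # true recipient ranked too low
                        w += lr * (pos - neg)   # pairwise perceptron update
    return w

# At addressing time, rank every address-book candidate ca
# by the learned score w @ features(ca, msg).
```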

37 Learning to Rank Recipients: Results

38 Conclusions
Problem: predicting the recipients of email messages.
Useful for auto-completion, finding related people, and preventing addressing errors; evidence of such errors found in a large collection.
2 subtasks: TOCCBCC and CCBCC.
Various models: KNN is the best model in general; rank aggregation improved performance further.
Improvements in email auto-completion.
Thunderbird extension (Cut Once)*
Promising results on learning to rank recipients*

39 Thank you

41 Comments (Thanks, reviewers!)
No account taken of structural information (body ≠ subject ≠ quoted text).
Identifying named entities ("Dear Mr. X", etc.): we do this implicitly, but could do better; Enron did not provide many first/last names.
Is f(doc, ca) estimated fairly on email? This might explain the weaker performance of the M2 models.

