Desktop Search Evaluation. Sergey Chernov and Gianluca Demartini. TREC 2006, 16th November 2006, Pre-Track Workshop.


1 Desktop Search Evaluation. Sergey Chernov and Gianluca Demartini. TREC 2006, 16th November 2006, Pre-Track Workshop.

2 Outline
- Why do we need a Desktop Track?
- What are the settings?
- Does it solve THE Privacy Problem?
- What do we do next?

3 Microsoft, Copernicus, Beagle, Google, Yahoo, and roughly 20 more… How do we compare their performance?

4 Proposed Track: Building the DataSet
- Personal documents
- Include activity logs containing the history of each file, query logs, email and clipboard usage, instant messenger history, ...
Activity logs and metadata should substitute for the missing hyperlink structure on a desktop.

5 Main problem: the Privacy Issue. The track will not run in 2007. It can be proposed for 2008.

6 Questions
1. How to build the collection? (Desktops from participants?)
2. How to protect privacy?
3. Data? (Text docs, mails, pics, audio)
4. Tasks?
5. Topics?
6. Evaluation measures? Binary or multi-graded relevance?
7. Logged information? Logged applications?

7 “Permanent” Information to Log
Permanent information (applied to):
- URL (HTML)
- Author (All files)
- Recipients (Email messages)
- Metadata tags (MP3)
- Has/is attachment (Emails and attachments)
- Saved picture's URL and saving time (Graphic files)

8 “Timeline” Information to Log
Timeline information (applied to):
- Time of being in focus (All files)
- Time of being opened (All files)
- Being edited (All files)
- History of moving/renaming (All files)
- Request type: bookmark, clicked link, typed URL (HTML)
- Adding/editing an entry in calendar and tasks (Outlook Journal)
- Being printed (All files)
- Search queries in Google/MSN Search/Yahoo!/etc. (Browser search field)
- Clicked links (HTML)
- Text selections from the clipboard: text pieces within a file and the filename (Text files)
- Bookmarking time (Browser bookmarks)
- Instant Messenger status, contacts' statuses, sent filenames and links (IM History)
- Running applications (Task queue)
- IP address: user's address and addresses the user connects to
- Email status: change between received/read (Email client)
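The timeline entries above could be stored as simple timestamped records. The following is only a minimal sketch under assumptions: the `LogEvent` class, its field names, the action labels, and the sample file names are hypothetical and are not a format defined in the presentation.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class LogEvent:
    """One timeline entry for a desktop item (hypothetical schema)."""
    path: str             # file or message the event applies to
    action: str           # e.g. "opened", "edited", "printed"
    timestamp: datetime   # when the event happened
    details: Dict[str, str] = field(default_factory=dict)  # optional extras


def history_for(events: List[LogEvent], path: str) -> List[LogEvent]:
    """Return the events recorded for one item, oldest first."""
    return sorted((e for e in events if e.path == path),
                  key=lambda e: e.timestamp)


# Made-up sample data, for illustration only.
events = [
    LogEvent("report.doc", "edited", datetime(2006, 6, 12, 10, 30)),
    LogEvent("report.doc", "opened", datetime(2006, 6, 12, 10, 0)),
    LogEvent("notes.txt", "opened", datetime(2006, 6, 13, 9, 0)),
    LogEvent("report.doc", "printed", datetime(2006, 6, 14, 16, 45)),
]
actions = [e.action for e in history_for(events, "report.doc")]
print(actions)  # ['opened', 'edited', 'printed']
```

A per-item history like this is one way the logs could stand in for link structure: items touched close together in time can be treated as related.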

9 Data Gathering
- Data is not publicly available
- Data format is known
- Retrieval systems can be run on the data by the track coordinator, and the results are sent back (see Spam Track)

10 Collection Structure
- Text documents, emails and instant messages: yes
- Images: ???
- Audio: only metadata would be extracted
- Video: no
- What else?

11 Proposed Tasks
- AdHoc Retrieval Task: find several documents containing pieces of necessary information
- Known-Item Retrieval Task: find a single specific document
- Folder Retrieval Task: find the folders with the relevant information
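For the Folder Retrieval Task, one possible baseline is to roll document scores up to their parent folders. The sketch below is an assumption and not a method proposed in the slides: the function name `rank_folders`, the choice of summing scores, and the sample paths and scores are all hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rank_folders(doc_scores):
    """Sum per-document retrieval scores by parent folder, best folder first."""
    folder_scores = defaultdict(float)
    for path, score in doc_scores.items():
        folder = str(PurePosixPath(path).parent)
        folder_scores[folder] += score
    return sorted(folder_scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores from some document-level retrieval run.
doc_scores = {
    "projects/eleonet/d1.doc": 0.9,
    "projects/eleonet/d2.doc": 0.4,
    "mail/inbox/msg7.eml": 0.7,
}
ranked = rank_folders(doc_scores)
print([folder for folder, _ in ranked])
```

Summing favours folders with many matching documents; taking the maximum instead would favour folders holding a single strong match.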

12 Topic Format
title: Eleonet project deliverable June
metadata: date:June topic:Eleonet project type:deliverable
task description: I am combining a new deliverable for the Eleonet project.
narrative: I am combining a new deliverable for the Eleonet project and I am looking for the last deliverable of the same type. I remember that the main contribution to this document has been done in June 2006.
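The metadata field of the example topic is a run of `key:value` pairs whose values may contain spaces. A minimal parsing sketch, assuming keys are single words and values contain no colons (the function name `parse_metadata` is hypothetical):

```python
import re


def parse_metadata(text):
    """Split 'key:value key:value ...' into a dict; values may contain spaces."""
    # With one capture group, re.split returns ['', key1, value1, key2, value2, ...].
    parts = re.split(r"(\w+):", text)
    keys = parts[1::2]
    values = [value.strip() for value in parts[2::2]]
    return dict(zip(keys, values))


meta = parse_metadata("date:June topic:Eleonet project type:deliverable")
print(meta)
# {'date': 'June', 'topic': 'Eleonet project', 'type': 'deliverable'}
```

Values that themselves contain a colon, such as URLs, would need a stricter format than this sketch handles.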

13 Relevance & Evaluation Measures
- Use trec_eval to compute a set of common metrics
- Binary relevance assessments or 3 levels?
- Ranking is important: MAP; Gain & Discount Metrics (DCG, nDCG, AWP, AGR, Q-m)
- Incomplete assessments: Bpref (/Rpref)
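To make the listed measures concrete, here is a simplified sketch of average precision (the per-topic value that MAP averages), nDCG with graded relevance, and bpref for incomplete judgments. It is not the trec_eval implementation. The document IDs and judgments are made up, and the bpref variant shown divides by min(R, N), which is an assumption about which formulation is intended.

```python
import math


def average_precision(ranking, relevant):
    """Mean of precision values at the ranks where relevant documents appear."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, grades):
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    gains = [grades.get(doc, 0) for doc in ranking]
    ideal = sorted(grades.values(), reverse=True)[:len(ranking)]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0


def bpref(ranking, relevant, nonrelevant):
    """Bpref: penalise judged non-relevant documents ranked above relevant ones.

    Unjudged documents are ignored entirely.
    """
    r, n = len(relevant), len(nonrelevant)
    if r == 0:
        return 0.0
    nonrel_seen, total = 0, 0.0
    for doc in ranking:
        if doc in nonrelevant:
            nonrel_seen += 1
        elif doc in relevant:
            penalty = min(nonrel_seen, r) / min(r, n) if n > 0 else 0.0
            total += 1.0 - penalty
    return total / r


# Made-up run: d3 and d5 are unjudged.
ranking = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d4"}
nonrelevant = {"d2"}
grades = {"d1": 2, "d4": 1, "d2": 0}  # three relevance levels

ap = average_precision(ranking, relevant)
bp = bpref(ranking, relevant, nonrelevant)
nd = ndcg(ranking, grades)
print(ap, bp, round(nd, 3))  # 0.75 0.5 0.924
```

In this run the unjudged documents d3 and d5 count as non-relevant for average precision but are skipped by bpref, which is why bpref is listed for incomplete assessments.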

14 Logged Applications: Acrobat Reader, MS Word, MS Excel, MS Powerpoint, MS Internet Explorer, MS Outlook, Mozilla Firefox, Mozilla Thunderbird

15 The same questions again
1. How to build the collection? (Desktops from participants?)
2. How to protect privacy?
3. Data? (Text docs, mails, pics, audio)
4. Tasks?
5. Topics?
6. Evaluation measures?
7. Logged information? Logged applications?

16 Desktop Search Workshop Summary
- Strong interest: about 20 participants
- Main novelty: activity logs
- Privacy is still an issue
- We need a clear task definition (suggestion: “Find all documents related to a project”?)
- We are planning a workshop to discuss it further
- A mailing list is available. To subscribe, visit https://info.l3s.uni-hannover.de/mailman/listinfo/personal-activity-search

