Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research.

1 Sackler – May 11, 2003 Organizing Search Results Susan Dumais Microsoft Research

2 Organizing Search Results
- Algorithms and interfaces that improve the effectiveness of search
  - Beyond ranked lists
- Main goal to support search
  - Also information analysis and discovery
- Example applications
  - SWISH, results classification
  - GridViz, results summarization
  - SIS, personal landmarks for context

3 Searching with Information Structured Hierarchically (SWISH)
- Collaborators
  - Edward Cutrell, Hao Chen (Berkeley)
- Key Themes
  - Going beyond long lists of results
  - Classification algorithms
  - UI techniques
- More about it
  - http://research.microsoft.com/~sdumais

4 Organizing Search Results
Query: “jaguar”
- List Organization
- SWISH Category Organization
  - => Shopping
  - => Automotive
  - => Computers

5 LookSmart Directory Structure
- LookSmart Directory Structure
  - ~400k pages; 17k categories; 7 levels
  - 13 top-level categories; 150 second-level categories
- Top-level Categories (Web Directory)
  - Automotive
  - Business & Finance
  - Computers & Internet
  - Entertainment & Media
  - Health & Fitness
  - Hobbies & Interests
  - Home & Family
  - People & Chat
  - Reference & Education
  - Shopping & Services
  - Society & Politics
  - Sports & Recreation
  - Travel & Vacations
- Example second-level categories (Automotive)
  - Buy or Sell a Car
  - Chat
  - Finance & Insurance
  - Magazines & Books
  - Maintenance & Repair
  - Makes, Models & Clubs
  - Motorcycles
  - New Car Showrooms
  - Off-Road, 4X4 & RVs
  - Other Auto Interests
  - Shows & Museums
  - Trucks & Tractors
  - Vintage & Classic

6 SWISH System
- Combines the advantages of
  - Directories: manually crafted structure but small
  - Search engines: broad coverage but limited metadata
- Project search engine results to category structure
- Two main components
  - Text classification models
  - UI for integrating search results and structure
    - Context (category structure) plus focus (search results)
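
Projecting results onto a category structure can be sketched roughly as below. This is only an illustration under stated assumptions: `group_by_category`, `toy_classify`, the keyword rules, and the sample results are all made up for the example and are not the SWISH code or data. In the real system the classifier would be a trained text model rather than keyword matching.

```python
from collections import defaultdict


def group_by_category(results, classify, top_k_categories=3):
    """Group ranked search results under their predicted categories.

    `results` is a list of (title, snippet) pairs in ranked order and
    `classify` maps a snippet to a category name. Both are illustrative
    stand-ins, not the SWISH interfaces.
    """
    groups = defaultdict(list)
    for title, snippet in results:
        groups[classify(snippet)].append(title)
    # Categories with the most results come first; ties keep first-seen order
    # because sorted() is stable and dicts preserve insertion order.
    ordered = sorted(groups.items(), key=lambda kv: -len(kv[1]))
    return ordered[:top_k_categories]


def toy_classify(snippet):
    """Hypothetical keyword classifier standing in for a trained model."""
    text = snippet.lower()
    if "car" in text or "dealer" in text:
        return "Automotive"
    if "mac" in text or "software" in text:
        return "Computers & Internet"
    return "Other"


# Made-up results for the query "jaguar".
results = [
    ("Jaguar Cars", "Official site for Jaguar car models"),
    ("Mac OS X Jaguar", "Apple software update for the Mac"),
    ("Jaguar dealers", "Find a dealer near you"),
]
grouped = group_by_category(results, toy_classify)
for category, titles in grouped:
    print(category, titles)
```

Within each category the original ranked order is preserved, so the category structure supplies context while the ranked results stay in focus.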

7 SWISH Architecture
- Train (offline): manually classified web pages -> SVM model
- Classify (online): web search results, local search results...

8 Learning & Classification
- Support Vector Machine (SVM)
  - Accurate and efficient for text classification (Dumais et al., Joachims)
  - Model = weighted vector of words
    - “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …
    - “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads...
- Hierarchical models for LS directory
  - 1 model for top level; N models for second
  - Very useful in conjunction w/ user interaction
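
To make "model = weighted vector of words" and the two-level hierarchy concrete, here is a minimal sketch. The weights, the threshold, and the helper names (`score`, `best`, `classify_hierarchical`) are assumptions invented for illustration; a trained linear SVM would supply the real weights and bias, and the slides do not say how SWISH combines the two levels.

```python
# Hypothetical word weights; a trained linear SVM would supply real values.
TOP_LEVEL = {
    "Automotive": {"car": 1.2, "auto": 1.0, "honda": 0.8, "motorcycle": 0.9},
    "Computers & Internet": {"software": 1.1, "windows": 0.9, "pc": 0.8, "hosting": 0.7},
}
# One set of second-level models per top-level category (only one shown here).
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.5, "harley": 1.3},
        "Maintenance & Repair": {"parts": 1.0, "repair": 1.4},
    },
}


def score(weights, text, bias=0.0):
    """Linear decision value: bias plus the weights of the words present."""
    words = set(text.lower().split())
    return bias + sum(w for term, w in weights.items() if term in words)


def best(models, text, threshold=0.5):
    """Return the best-scoring category, or None if nothing clears the threshold."""
    if not models:
        return None
    scored = ((name, score(weights, text)) for name, weights in models.items())
    name, value = max(scored, key=lambda pair: pair[1])
    return name if value >= threshold else None


def classify_hierarchical(text):
    """Pick a top-level category first, then a second-level category under it."""
    top = best(TOP_LEVEL, text)
    if top is None:
        return (None, None)
    return (top, best(SECOND_LEVEL.get(top, {}), text))


print(classify_hierarchical("harley motorcycle parts for your car"))
print(classify_hierarchical("windows pc software"))
```

Because each model is just a sparse weight vector, scoring a short result snippet is a handful of lookups, which is what makes classifying results online plausible.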

9 User Interface Experiments
- List Organization
- Category Organization

10 [Figure: interface conditions. Labels: Hover, Inline, No Cat Names, Browse, Hover, Inline, + Cat Names; Group Interface, List Interface]

11 Effect of Query Difficulty
[Chart: Group vs. List, for hard and easy queries]
- Easy queries are faster (p<0.01)
- Group faster than List (p<0.01)
- Benefit is larger for hard queries (p<0.06)

12 SWISH: Summary and Design Implications
- Text Classification
  - Learn accurate category models
  - Classify new web pages on-the-fly
  - Organize search results
- User Interface
  - Tightly couple search results with category structure
  - User manipulation of presentation of category structure

13 Organizing Search Results, other examples

14 GridViz
- Collaborators
  - George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)
- Key Themes
  - Abstract beyond individual results
  - Highly interactive interface to support understanding of trends and relationships
- More about it
  - http://research.microsoft.com/~sdumais

15 GridViz
- Summarize the results of a search
- Grid-based design
  - Axes represent topic, time, people
  - Cells encode frequency, recency
- Supports activities like:
  - What newsgroups are active (on topic x)?
  - What people are active, authoritative (on topic x)?
  - When did I last interact w/ people?
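
The grid summary described above could be computed along these lines. This is a sketch only: `build_grid`, the field names, and the sample posts are assumptions for illustration, not the GridViz implementation.

```python
from collections import defaultdict
from datetime import date


def build_grid(items, row_key, col_key):
    """Summarize items into cells keyed by (row value, column value).

    Each cell records how many items fall in it (frequency) and the date
    of the newest one (recency). Field names are illustrative.
    """
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for item in items:
        cell = grid[(item[row_key], item[col_key])]
        cell["count"] += 1
        if cell["latest"] is None or item["date"] > cell["latest"]:
            cell["latest"] = item["date"]
    return dict(grid)


# Made-up search results: who posted about which topic, and when.
posts = [
    {"person": "alice", "topic": "svm", "date": date(2003, 4, 2)},
    {"person": "alice", "topic": "svm", "date": date(2003, 5, 1)},
    {"person": "bob", "topic": "ui", "date": date(2003, 3, 15)},
]
grid = build_grid(posts, row_key="person", col_key="topic")
for (person, topic), cell in grid.items():
    print(person, topic, cell["count"], cell["latest"])
```

Swapping `row_key` and `col_key` for other attributes (for example a month bucket instead of a topic) gives the other axis combinations the slide mentions.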

16 GridViz Demo

17 User Interface Experiments
- List View
- GridViz

18 GridViz Summary
- Abstracting beyond individual results
- Highly interactive interface
- Grid-based design
  - Axes represent people, topic, time
  - Cells encode frequency, recency
- Preliminary but promising

19 Stuff I’ve Seen (SIS)
- Collaborators
  - Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)
- Key Themes
  - Your content
  - Information re-use
  - Integration across sources
- More about it
  - … internal for now

20 Search Today …
- Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)
- Often slow

21 Search with SIS
- Unified index of stuff you’ve seen
  - Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc.
  - Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)
  - Automatic and immediate update of index
  - Rich UI possibilities, since it’s your content
- Architecture
  - Client side indexing and storage
  - Built using MS Search components
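
A unified index of full text plus metadata can be illustrated with a toy in-memory version. Everything here is an assumption for the sake of the example: the `UnifiedIndex` class, its methods, and the sample items are hypothetical and say nothing about how SIS or the MS Search components actually work.

```python
from collections import defaultdict
from datetime import date


class UnifiedIndex:
    """Toy in-memory index over items from several sources."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of items containing it
        self.items = {}                   # id -> metadata (plus the item's words)

    def add(self, item_id, text, **metadata):
        """Index an item; re-adding an id replaces it, so updates are searchable at once."""
        self.remove(item_id)
        words = set(text.lower().split())
        self.items[item_id] = dict(metadata, _words=words)
        for word in words:
            self.postings[word].add(item_id)

    def remove(self, item_id):
        old = self.items.pop(item_id, None)
        if old is not None:
            for word in old["_words"]:
                self.postings[word].discard(item_id)

    def search(self, query, sort_by="created", **filters):
        """Return ids matching all query words and filters, sorted descending by `sort_by`."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        hits = [i for i in ids
                if all(self.items[i].get(k) == v for k, v in filters.items())]
        return sorted(hits, key=lambda i: self.items[i][sort_by], reverse=True)


# Made-up items from three sources.
index = UnifiedIndex()
index.add("mail-1", "search results demo notes", source="mail", author="ed", created=date(2003, 4, 1))
index.add("web-7", "organizing search results", source="web", author="susan", created=date(2003, 5, 1))
index.add("file-3", "budget spreadsheet", source="file", author="ed", created=date(2003, 2, 1))

print(index.search("search results"))      # sorted by date, newest first
print(index.search("search", author="ed"))  # metadata filter on author
```

Sorting by date is the default in this sketch; any other metadata field stored with the items can be passed as `sort_by`.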

22 SIS Demo

23 SIS Alpha Observations
- 800+ internal users
- Usage logs (incl different interfaces), survey data
- File types opened
  - 76% Email
  - 14% Web pages
  - 10% Files
- Age of items accessed
  - 7% today
  - 22% within the last week
  - 46% within the last month

24 SIS Alpha Observations
- Use of other search tools
  - Non-SIS search for web, email, and files decreases
- Importance of people
  - 25% of the queries involve people’s names
- Importance of time
  - Date by far the most popular sort field, followed by rank, author, title
  - Even when rank is the default

25 SIS UI Innovations Timeline w/ Landmarks
- Importance of time
- Timeline interface
- Contextualize results using important landmarks as pointers into human memory
  - General: holidays, world events
  - Personal: important photos, appointments
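
One way to picture a timeline with landmarks is as a merge of two dated lists. The sketch below assumes that framing; `timeline_with_landmarks` and the sample results and landmarks are hypothetical and not taken from SIS.

```python
from datetime import date


def timeline_with_landmarks(results, landmarks):
    """Merge dated results and landmarks into one newest-first timeline.

    Both inputs are lists of (date, label) pairs. The returned entries are
    (date, kind, label) tuples, where kind is "landmark" or "result".
    """
    entries = [(d, "landmark", label) for d, label in landmarks]
    entries += [(d, "result", label) for d, label in results]
    # Newest first; on the same day the landmark sorts before the results
    # because "landmark" < "result" alphabetically.
    return sorted(entries, key=lambda e: (-e[0].toordinal(), e[1]))


# Made-up search results and landmarks (one general, one personal).
results = [
    (date(2003, 5, 2), "email: demo plan"),
    (date(2003, 4, 20), "file: offsite agenda"),
    (date(2002, 12, 15), "web page: SVM tutorial"),
]
landmarks = [
    (date(2003, 1, 1), "New Year's Day"),
    (date(2003, 4, 20), "Team offsite (calendar)"),
]
timeline = timeline_with_landmarks(results, landmarks)
for day, kind, label in timeline:
    marker = "==" if kind == "landmark" else "  "
    print(marker, day.isoformat(), label)
```

Interleaving the landmarks this way lets a reader locate a result relative to an event they remember, rather than by its exact date.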

26 Milestones in Time Demo

27 Milestones in Timeline

28 SIS Summary
- Unified index of stuff you’ve seen
  - Fast access to full-text and metadata, from heterogeneous sources
  - Automatic and immediate update of index
  - Rich UI possibilities
- Next steps
  - Better support for tagging -> “flatland”
  - Implicit queries for finding related info, and identifying “Stuff I Should See”
  - Integration with richer activity-based info, Eve

29 Organizing Search Results
- Algorithms and interfaces to improve search
  - Use structure and context
- Examples and key themes
  - SWISH … grouping
  - GridViz … abstraction
  - SIS … personal content and landmarks
- Also
  - Important attributes: People, topics, time
  - Interaction
  - Evaluation
- More information
  - http://research.microsoft.com/~sdumais
  - sdumais@microsoft.com
  - Christopher Lee of (SIG)IR … http://www.cdvp.dcu.ie/SIGIR/index.html

