A metadata-based approach Marti Hearst Associate Professor BT Visit August 18, 2005.


1 Search@SIMS: A metadata-based approach
Marti Hearst, Associate Professor
BT Visit, August 18, 2005

2 BT Visit, Aug 18, 2005. Marti Hearst: UC Berkeley SIMS
The Problem: How to help people navigate and organize the world’s information?

3 The SIMS Solution: Focus on METADATA
System Support for Structured Search: Cheshire
Search User Interfaces: Flamenco
Community-based Metadata Creation: MMM
Content Analysis for Metadata Creation: Mamba

4 Example: Search and Navigation of Large Collections
Image Collections
E-Government Sites
Example: the University of California Library Catalog
Shopping Sites
Digital Libraries


8 What do we want done differently?
Organization of results
Hints of where to go next
Flexible ways to move around
…
How to structure the information?

9 How to Structure Information for Search and Browsing?
Hierarchy is too rigid
KL-One is too complex
Hierarchical faceted metadata:
–A useful middle ground

10 What are facets?
Sets of categories, each of which describes a different aspect of the objects in the collection.
Each of these can be hierarchical.
(Not necessarily mutually exclusive nor exhaustive, but often that is a goal.)
Example facets: Time/Date, Topic, Role, GeoRegion

11 Facet example: Recipes
Course: Main Course
Cooking Method: Stir-fry
Cuisine: Thai
Ingredient: Red Bell Pepper, Curry, Chicken
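The recipe example above can be read as a small data structure: each item carries several facet values, and each value is a path within its facet's hierarchy. The following is a minimal sketch under that reading, not the Flamenco implementation. The names `Item`, `matches`, and `filter_items`, the recipe titles, and the second recipe are all hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass, field

# A facet value is a path from the facet root down its hierarchy,
# e.g. ("Cuisine", "Thai"). Deeper hierarchies just use longer paths.
FacetPath = tuple[str, ...]


@dataclass
class Item:
    title: str
    facets: set[FacetPath] = field(default_factory=set)


def matches(item: Item, selection: FacetPath) -> bool:
    """True if any of the item's facet paths lies at or below `selection`."""
    return any(path[:len(selection)] == selection for path in item.facets)


def filter_items(items, selections):
    """Keep items that satisfy every selected facet path (a conjunction)."""
    return [it for it in items if all(matches(it, s) for s in selections)]


# Hypothetical collection; the first item mirrors the slide's facet values.
recipes = [
    Item("Thai red curry chicken", {
        ("Course", "Main Course"),
        ("Cooking Method", "Stir-fry"),
        ("Cuisine", "Thai"),
        ("Ingredient", "Red Bell Pepper"),
        ("Ingredient", "Curry"),
        ("Ingredient", "Chicken"),
    }),
    Item("Mango sticky rice", {
        ("Course", "Dessert"),
        ("Cuisine", "Thai"),
        ("Ingredient", "Mango"),
    }),
]

thai_mains = filter_items(recipes, [("Cuisine", "Thai"), ("Course", "Main Course")])
print([it.title for it in thai_mains])
```

Because selections are prefixes, choosing a category higher in a hierarchy also matches every item filed under its descendants.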

12 How to Put In an Interface?
Some Challenges:
Users don’t like new search interfaces.
How to show lots of information without overwhelming or confusing?

13 A Solution (The Flamenco Project)
Use proper HCI methods.
Organize search results according to the faceted metadata so navigation looks similar throughout
–Easy to see where to go next, where you’ve been
–Avoids empty result sets
–Integrates seamlessly with keyword search
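One common way to get the "avoids empty result sets" property is to offer only those refinements that occur in the current results, each with its count. The sketch below illustrates that idea on hypothetical data; the function names `apply_selection` and `refinement_counts` are assumptions for this example and are not taken from Flamenco.

```python
from collections import Counter

# Hypothetical items: each maps a facet name to the values assigned to it.
items = [
    {"Cuisine": {"Thai"}, "Course": {"Main Course"}, "Ingredient": {"Chicken", "Curry"}},
    {"Cuisine": {"Thai"}, "Course": {"Dessert"}, "Ingredient": {"Mango"}},
    {"Cuisine": {"Italian"}, "Course": {"Main Course"}, "Ingredient": {"Chicken"}},
]


def apply_selection(items, selection):
    """Keep items carrying every (facet, value) pair chosen so far."""
    return [it for it in items
            if all(value in it.get(facet, set()) for facet, value in selection)]


def refinement_counts(results):
    """Count how many current results carry each (facet, value) pair."""
    counts = Counter()
    for it in results:
        for facet, values in it.items():
            for value in values:
                counts[(facet, value)] += 1
    return counts


selection = [("Cuisine", "Thai")]
results = apply_selection(items, selection)
counts = refinement_counts(results)

# Only pairs present in `counts` are offered as next steps, so any link the
# user follows is guaranteed to return at least one item.
for (facet, value), n in sorted(counts.items()):
    print(f"{facet}: {value} ({n})")
```

In this sketch "Italian" never appears as an option once "Thai" is selected, because no remaining item carries it.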

14 Art History Images Collection


28 Usability Studies
Usability studies done on 3 collections:
–Recipes: 13,000 items
–Architecture Images: 40,000 items
–Fine Arts Images: 35,000 items
Conclusions:
–Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
–Very positive results, in contrast with studies on earlier iterations.

29 Post-Test Comparison
Columns: Faceted, Baseline
Overall Assessment: More useful for your tasks; Easiest to use; Most flexible; More likely to result in dead ends; Helped you learn more; Overall preference
Which Interface Preferable For: Find images of roses; Find all works from a given period; Find pictures by 2 artists in same media
Counts as extracted (row and column alignment lost): 1516 230 129 428 823 624 283 131 229

30 Cheshire: System Support for Metadata-based Search
Cheshire is an XML/SGML Information Retrieval system using probabilistic relevance ranking
Cheshire3 includes Grid-based data storage and processing support, permitting very large-scale databases and high efficiency while providing effective relevance ranked results

31 Cheshire
The system is currently in production use for many JISC-funded national information services and projects in the UK including:
–The Archives Hub
–MerseyLibraries
–Resource Discovery Network (RDN)
–National Centre for Text Mining (NaCTeM)

32 Mamba: Creating Classifications from Data
Most approaches are associational
–AKA clustering, LSA, LDA, etc.
–This leads to poor results when applied to text
To derive facets, need a different angle
–We have a simple approach based on WordNet

33 Example: Recipes (3500 docs)

34 Stoica & Hearst ’04 WordNet-based

35 Stoica & Hearst ’04 WordNet-based

36 Stoica & Hearst ’04 WordNet-based

37 Our Approach
Leverage the structure of WordNet
Inputs: Documents, WordNet
Steps: Select terms; Get hypernym paths; Build tree; Compress tree
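The steps on this slide can be sketched end to end on toy data. This is an illustration under stated assumptions, not the Stoica & Hearst algorithm: the `HYPERNYM` table is a hand-written stand-in for WordNet (its entries are illustrative, not WordNet output), the document-frequency rule in `select_terms` and the single-child rule in `compress` are assumed for the example, and the step ordering is one plausible reading of the slide's diagram.

```python
from collections import Counter

# Toy stand-in for WordNet: each term maps to its hypernym (parent).
# Illustrative entries only; a real system would query WordNet itself.
HYPERNYM = {
    "basil": "herb", "thyme": "herb",
    "herb": "vascular plant", "vascular plant": "plant", "plant": "organism",
    "salmon": "fish", "fish": "vertebrate",
    "vertebrate": "animal", "animal": "organism",
}


def select_terms(docs, min_docs=2):
    """Assumed rule: keep known terms that occur in at least `min_docs` documents."""
    df = Counter(w for doc in docs for w in set(doc.lower().split()))
    return sorted(w for w, n in df.items() if n >= min_docs and w in HYPERNYM)


def hypernym_path(term):
    """Walk up the toy hierarchy and return the path from root to term."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


def build_tree(paths):
    """Merge root-to-term paths into one nested dict."""
    tree = {}
    for path in paths:
        node = tree
        for name in path:
            node = node.setdefault(name, {})
    return tree


def compress(tree, keep):
    """Assumed rule: splice out nodes with exactly one child unless selected."""
    out = {}
    for name, children in tree.items():
        children = compress(children, keep)
        if len(children) == 1 and name not in keep:
            out.update(children)  # promote the only child
        else:
            out[name] = children
    return out


docs = [
    "baked salmon with basil and thyme",
    "grilled salmon with basil",
    "thyme roasted potatoes",
]
terms = select_terms(docs)
tree = build_tree(hypernym_path(t) for t in terms)
facets = compress(tree, keep=set(terms))
print(facets)
```

Compression removes the long single-child chains (such as "vascular plant" above "herb") that would otherwise add navigation levels without offering the user a choice.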

38 A New Opportunity
Tagging, folksonomies
–(Flickr, del.icio.us)
–People are creating facets in a decentralized manner
–They are assigning multiple facets to items
–This is done on a massive scale
–This leads naturally to meaningful associations
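One simple way to surface associations from tagged items is to count how often two tags are assigned to the same item. The slide does not say how associations are derived, so co-occurrence counting is an assumption here, and the photo identifiers and tags are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagged items, in the style of a photo-sharing site.
tagged_items = {
    "photo1": {"sunset", "beach", "california"},
    "photo2": {"sunset", "beach"},
    "photo3": {"beach", "surfing", "california"},
}

# Count each unordered tag pair once per item that carries both tags.
cooccur = Counter()
for tags in tagged_items.values():
    for a, b in combinations(sorted(tags), 2):
        cooccur[(a, b)] += 1

# Pairs seen on more than one item are candidate associations.
strong = {pair: n for pair, n in cooccur.items() if n > 1}
print(strong)
```

At the scale the slide describes, raw counts would usually be normalized by how common each tag is, so that very frequent tags do not dominate.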


40 Recap
Organizing and Navigating Information is a huge IT opportunity
Several research projects at SIMS tackle this with a special perspective: METADATA
–System support for efficient search over structured information
–User interfaces using hierarchical faceted metadata
–Community-based metadata creation
–Automated analysis algorithms for metadata creation
Thank you!

