Semi-Automated Creation of Facet Hierarchies
Marti Hearst, School of Information, UC Berkeley
Joint work with Dr. Emilia Stoica

Marti Hearst, Taxonomy Bootcamp ‘06
Outline
- Faceted Metadata
  - Definition
  - Advantages
- Flamenco: Search Interface Design using Faceted Metadata
- Castanet: (Semi-)Automated Tool for Creation of Category Systems
- Comparison to State-of-the-Art Alternatives
- Conclusions

Focus: Search and Navigation of Large Collections
- Image Collections
- E-Government Sites
- Shopping Sites
- Digital Libraries

Problems with Site Search
- Study by Vividence in 2001 on 69 sites
  - 70% eCommerce, 31% Service, 21% Content, 2% Community
- Poorly organized search results
  - Frustration and wasted time
- Poor information architecture
  - Confusion
  - Dead ends
  - "Back and forthing"
  - Forced to search

What We Want to Achieve
- Integrate browsing and searching seamlessly
- Support exploration and learning
- Avoid dead ends, “pogo’ing”, and “lostness”

Main Idea
- Use hierarchical faceted metadata
- Design the interface to:
  - Allow flexible navigation
  - Provide previews of next steps
  - Organize results in a meaningful way
  - Support both expanding and refining the search

The Problem With Hierarchy
- Most things can be classified in more than one way.
- Most organizational systems do not handle this well.
- Example: Animal Classification
[Figure: the same animals (otter, penguin, robin, salmon, wolf, cobra, bat, seal) grouped three different ways: by Skin Covering, by Locomotion, and by Diet.]

The Problem with Hierarchy
- Inflexible
  - Forces the user to start with a particular category
  - What if I don’t know the animal’s diet, but the interface makes me start with that category?
- Wasteful
  - Has to repeat combinations of categories
  - Makes for extra clicking and extra coding
- Difficult to modify
  - To add a new category type, must duplicate it everywhere or change things everywhere

The Problem With Hierarchy
[Figure: a single tree rooted at “start” that branches by locomotion (swim, fly, run, slither), by skin covering (fur, scales, feathers), and by diet (fish, rodents, insects), so the lower-level labels are repeated under every branch; example leaves include salmon, bat, robin, wolf.]
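Counting the labels in this example (four kinds of locomotion, three skin coverings, three diets, as read from the figure) makes the waste concrete. This is a back-of-the-envelope count, assuming the single tree spells out every combination:

```latex
\begin{align*}
\text{single hierarchy, bottom level} &: 4 \times 3 \times 3 = 36 \text{ category nodes} \\
\text{three independent facets} &: 4 + 3 + 3 = 10 \text{ labels}
\end{align*}
```

In the single tree each diet label appears 4 × 3 = 12 times, once per locomotion/skin combination, whereas the faceted version stores it once.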

Marti Hearst, Taxonomy Bootcamp ‘06 The Idea of Facets  Facets are a way of labeling data  A kind of Metadata (data about data)  Can be thought of as properties of items  Facets vs. Categories  Items are placed INTO a category system  Multiple facet labels are ASSIGNED TO items

Marti Hearst, Taxonomy Bootcamp ‘06 The Idea of Facets  Create INDEPENDENT categories (facets)  Each facet has labels (sometimes arranged in a hierarchy)  Assign labels from the facets to every item  Example: recipe collection Course Main Course Cooking Method Stir-fry Cuisine Thai Ingredient Bell Pepper Curry Chicken

Marti Hearst, Taxonomy Bootcamp ‘06 The Idea of Facets  Break out all the important concepts into their own facets  Sometimes the facets are hierarchical  Assign labels to items from any level of the hierarchy Preparation Method Fry Saute Boil Bake Broil Freeze Desserts Cakes Cookies Dairy Ice Cream Sorbet Flan Fruits Cherries Berries Blueberries Strawberries Bananas Pineapple

Marti Hearst, Taxonomy Bootcamp ‘06 Using Facets  Now there are multiple ways to get to each item Preparation Method Fry Saute Boil Bake Broil Freeze Desserts Cakes Cookies Dairy Ice Cream Sherbet Flan Fruits Cherries Berries Blueberries Strawberries Bananas Pineapple Fruit > Pineapple Dessert > Cake Preparation > Bake Dessert > Dairy > Sherbet Fruit > Berries > Strawberries Preparation > Freeze

Example: Nobel Prize Winners Collection (Before and After Facets)

Only One Way to View Laureates

First, Choose Prize Type

Next, View the List!
The user must first choose an award type (literature), then browse through the laureates in chronological order. No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.

Flamenco Interface: Using Hierarchical Faceted Metadata

Opening View: Select literature from PRIZE facet

Group results by YEAR facet

Select 1920’s from YEAR facet

Current query is PRIZE > literature AND YEAR: 1920’s. Now remove PRIZE > literature.

Now Group By YEAR > 1920’s

Hierarchy Traversal: Group By YEAR > 1920’s, and drill down to 1921

Select an individual item

Use Endgame to expand out

Or use “More like this” to find similar items

Start a new search using keyword “California”

Note that the category structure remains after the keyword search

The query is now a keyword ANDed with a facet subhierarchy

Marti Hearst, Taxonomy Bootcamp ‘06 Using Facets  The system only shows the labels that correspond to the current set of items  Start with all items and all facets  The user then selects a label within a facet  This reduces the set of items (only those that have been assigned to the subcategory label are displayed)  This also eliminates some subcategories from the view.

Marti Hearst, Taxonomy Bootcamp ‘06 Advantages of Facets  Can’t end up with empty results sets  (except with keyword search)  Helps avoid feelings of being lost.  Easier to explore the collection.  Helps users infer what kinds of things are in the collection.  Evokes a feeling of “browsing the shelves”  Is preferred over standard search for collection browsing in usability studies.  (Interface must be designed properly)

Marti Hearst, Taxonomy Bootcamp ‘06 Advantages of Facets  Seamless to add new facets and subcategories  Seamless to add new items.  Helps with “categorization wars”  Don’t have to agree exactly where to place something  Interaction can be implemented using a standard relational database.  May be easier for automatic categorization

Information Previews
- Use the metadata to show where to go next
  - More flexible than canned hyperlinks
  - Less complex than full search
- Help users see and return to previous steps
- Reduces mental work
  - Recognition over recall
  - Suggests alternatives
- More clicks are OK only if (J. Spool):
  - The “scent” of the target does not weaken
  - Users feel they are going toward, rather than away from, their target

Facets vs. Hierarchy
- Early Flamenco studies compared allowing multiple hierarchical facets vs. just one facet.
- Multiple facets were preferred and more successful.

Limitation of Facets
- Do not naturally capture MAIN THEMES
- Facets do not show RELATIONS explicitly
  - Example labels for one photo: Aquamarine, Red, Orange, Door, Doorway, Wall
  - Which color is associated with which object?
(Photo by J. Hearst, jhearst.typepad.com)

Terminology Clarification
- Facets vs. Attributes
  - Facets are shown independently in the interface
  - Attributes are just associated with individual items (e.g., ID number, Source, Affiliation)
  - However, an attribute can always be converted to a facet
- Facets vs. Labels
  - Labels are the names used within facets
  - These are organized into subhierarchies
- Synonyms
  - There should be alternate names for the category labels
  - Currently (in Flamenco) this is done with subcategories
  - E.g., Deer has subcategories “stag”, “fawn”, “doe”

Usability Study Results

Flamenco Usability Studies
- Usability studies done on 3 collections:
  - Recipes (epicurious): 13,000 items
  - Architecture Images: 40,000 items
  - Fine Arts Images: 35,000 items
- Conclusions:
  - Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
  - Very positive results, in contrast with studies on earlier iterations

Most Recent Usability Study
- Participants & Collection
  - 32 Art History students
  - ~35,000 images from SF Fine Arts Museum
- Study Design
  - Within-subjects
    - Each participant sees both interfaces
    - Balanced in terms of order and tasks
  - Participants assess each interface after use
  - Afterwards they compare them directly
- Data recorded in behavior logs, server logs, and paper surveys; one or two experienced testers at each trial
- Used 9-point Likert scales
- Session took about 1.5 hours; pay was $15/hour

Post-Interface Assessments
All significant at p < .05 except “simple” and “overwhelming”

Post-Test Comparison (Faceted vs. Baseline)
[Chart: participants' choice between the Faceted and Baseline interfaces.
Overall assessment: More useful for your tasks; Easiest to use; Most flexible; More likely to result in dead ends; Helped you learn more; Overall preference.
Which interface preferable for: Find images of roses; Find all works from a given period; Find pictures by 2 artists in same media.]

How to Create Facet Hierarchies? Our Approach: Castanet

Example: Recipes (3500 docs)

Castanet Output (shown in Flamenco)

Our Approach: Leverage the structure of WordNet

Marti Hearst, Taxonomy Bootcamp ‘06 Our Approach  Leverage the structure of WordNet Documents WordNet Get hypernym paths Select terms Build tree Compress tree Divide into facets

1. Select Terms
- Select well-distributed terms from the collection
- Eliminate stopwords
- Retain only those terms with a distribution higher than a threshold (default: top 10%)
[Pipeline: Documents + WordNet > Select terms > Build core tree > Augment core tree > Compress tree > Remove top-level categories]
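Step 1 might look like the following Python sketch. It assumes that a term's "distribution" is its document frequency and that "top 10%" means the top tenth of distinct non-stopword terms ranked by that count; the tokenizer, the tiny stopword list, and the name `select_terms` are placeholders, not Castanet's own:

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real run would use a full list.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "in", "to", "for"}


def select_terms(docs, top_fraction=0.10):
    """Return (term, document frequency) pairs for the top `top_fraction` of
    non-stopword terms, ranked by how many documents contain them
    (ties broken alphabetically)."""
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(re.findall(r"[a-z]+", doc.lower())) - STOPWORDS)
    keep = max(1, math.ceil(len(df) * top_fraction))
    return sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))[:keep]
```

For instance, over the titles "Chocolate sundae with cherries", "Lemon sherbet", "Strawberry sundae and the sherbet", and "Sundae of the day", the default threshold keeps only `("sundae", 3)`, since 10% of the 7 remaining distinct terms rounds up to 1.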

Marti Hearst, Taxonomy Bootcamp ‘06 2. Build Core Tree  Get hypernym path if term: - has only one sense, or - matches a pre-selected WordNet domain  Adding a new term increases a count at each node on its path by # of docs with the term. frozen dessert sundae entity substance,matter nutriment dessert ice cream sundae frozen dessert entity substance,matter nutriment dessert sherbet,sorbet sherbet  Build a “backbone”  Create paths from unambiguous terms only  Bias the structure towards appropriate senses of words Documents WordNet Select terms Build core tree Comp. tree Remove top level categ. Augm. core tree

2. Build Core Tree (cont.)
- Merge hypernym paths to build a tree:
  entity > substance, matter > nutriment > dessert > frozen dessert, which then branches into ice cream sundae > sundae and sherbet, sorbet > sherbet
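Step 2's core-tree construction can be sketched in Python as follows. Real hypernym paths would come from WordNet; here a small hand-written table, `HYPERNYM_PATHS` (abbreviated from the slides), stands in for it, and the domain condition is left out. All names are hypothetical. Unambiguous terms are merged into a shared tree, each node's count grows by the term's document count, and terms with several senses are set aside for the next step:

```python
# Hypothetical stand-in for WordNet: each term maps to its hypernym paths
# (root first), one per sense. Paths are abbreviated from the slides.
HYPERNYM_PATHS = {
    "sundae": [["entity", "substance,matter", "nutriment", "dessert",
                "frozen dessert", "ice cream sundae"]],
    "sherbet": [["entity", "substance,matter", "nutriment", "dessert",
                 "frozen dessert", "sherbet,sorbet"]],
    "date": [["entity", "abstraction", "time period", "calendar day"],
             ["entity", "substance,matter", "food,nutrient", "edible fruit"]],
}


def new_node():
    return {"count": 0, "children": {}}


def add_path(tree, path, n_docs):
    """Merge a root-first path into the tree, adding n_docs to every node on it."""
    node = tree
    for name in path:
        node = node["children"].setdefault(name, new_node())
        node["count"] += n_docs


def build_core_tree(term_df, hypernym_paths=HYPERNYM_PATHS):
    """Add unambiguous terms (exactly one path) as leaves; defer ambiguous ones.
    Terms with no entry in the table are skipped."""
    tree, deferred = new_node(), []
    for term, n_docs in term_df.items():
        paths = hypernym_paths.get(term, [])
        if len(paths) == 1:
            add_path(tree, paths[0] + [term], n_docs)
        elif paths:
            deferred.append(term)
    return tree, deferred


def node_at(tree, path):
    """Follow a root-first path of names down from the unnamed top node."""
    for name in path:
        tree = tree["children"][name]
    return tree
```

With document counts `{"sundae": 3, "sherbet": 2, "date": 5}`, the frozen dessert node ends up with a count of 5 and two children, while `date`, which has two senses, is returned in the deferred list.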

3. Augment Core Tree
- Attach to the core tree the terms with more than one sense
- Favor the more common path over other alternatives

Augment Core Tree (cont.)
- Two candidate paths for “date”:
  - Date (p1): entity > abstraction > measure, quantity > fundamental quality > time period > calendar day (18) > date
  - Date (p2): entity > substance, matter > food, nutrient > nutriment > food > edible fruit (78) > date
- Choose the edible fruit path (p2), since it has more items assigned
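Step 3 can be sketched in Python under a stated simplification: a candidate path's "support" is taken to be the count at the deepest node of that path already present in the core tree. The paths for "date" are abbreviated, and the names `support` and `augment` are hypothetical. With 18 items under calendar day and 78 under edible fruit, as on the slide, the fruit path wins:

```python
def new_node():
    return {"count": 0, "children": {}}


def add_path(tree, path, n_docs):
    """Merge a root-first path into the tree, adding n_docs to every node on it."""
    node = tree
    for name in path:
        node = node["children"].setdefault(name, new_node())
        node["count"] += n_docs


def support(tree, path):
    """Count at the deepest node of `path` that already exists in the tree (0 if none)."""
    node, best = tree, 0
    for name in path:
        if name not in node["children"]:
            break
        node = node["children"][name]
        best = node["count"]
    return best


def augment(tree, term, n_docs, candidate_paths):
    """Attach an ambiguous term under the candidate path the tree supports most
    (the first candidate wins ties) and return the chosen path."""
    chosen = max(candidate_paths, key=lambda p: support(tree, p))
    add_path(tree, chosen + [term], n_docs)
    return chosen


# Abbreviated paths for the two senses of "date".
CALENDAR = ["entity", "abstraction", "time period", "calendar day"]
FRUIT = ["entity", "substance,matter", "food,nutrient", "edible fruit"]
```

Starting from a tree where `add_path(tree, CALENDAR, 18)` and `add_path(tree, FRUIT, 78)` stand for counts contributed by other terms, `augment(tree, "date", 5, [CALENDAR, FRUIT])` returns `FRUIT`. One weakness of this scoring is that a path whose lower nodes are missing from the core tree inherits the count of a shared general ancestor such as entity; a fuller version would compare counts only below the point where the candidate paths diverge.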

Marti Hearst, Taxonomy Bootcamp ‘06 4. Compress Tree  Rule 1: Eliminate a parent with fewer than k children unless it is the root or its distribution is larger than 0.1*max dist ice cream sundae dessert sundae frozen dessert sherbet,sorbet sherbet parfait dessert frozen dessert sundae parfait sherbet abstraction Documents WordNet Select terms Build core tree Comp. tree Remove top level categ. Augm. core tree

4. Compress Tree (cont.)
- Rule 2: Eliminate a child whose name appears within the parent’s name
  - Before: dessert > frozen dessert > {sundae; parfait; sherbet}
  - After: dessert > {sundae; parfait; sherbet}
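The two compression rules can be sketched in Python as below. Several details are assumptions: k defaults to 2, "max dist" is read as the largest count of any node, a removed node's children are reattached to its parent, and Rule 2 is applied the way the slide's example applies it (an internal child is removed when the parent's name appears as whole words inside the child's name, as with frozen dessert under dessert). The counts and the cake branch in `EXAMPLE` are made up:

```python
def max_count(node):
    return max([node["count"]] + [max_count(c) for c in node["children"].values()])


def _compress_children(parent_name, node, limit, k):
    """Compress the subtree below `node` bottom-up and return its new children.
    Assumes spliced-up names do not collide with existing sibling names."""
    result = {}
    for name, child in node["children"].items():
        child["children"] = _compress_children(name, child, limit, k)
        kids = child["children"]
        # Rule 1: a parent with fewer than k children and a small count.
        rule1 = 0 < len(kids) < k and child["count"] <= limit
        # Rule 2, following the slide's example; leaves are kept so no term is lost.
        rule2 = bool(kids) and f" {parent_name} " in f" {name} "
        if rule1 or rule2:
            result.update(kids)  # splice: grandchildren move up one level
        else:
            result[name] = child
    return result


def compress(root_name, root, k=2):
    """Apply both rules in place; the root itself is never removed."""
    limit = 0.1 * max_count(root)
    root["children"] = _compress_children(root_name, root, limit, k)
    return root


def leaf(n):
    return {"count": n, "children": {}}


# Toy tree rooted at "dessert": the frozen-dessert branch follows the slide;
# the counts and the cake branch are hypothetical.
EXAMPLE = {"count": 100, "children": {
    "frozen dessert": {"count": 12, "children": {
        "ice cream sundae": {"count": 5, "children": {"sundae": leaf(5)}},
        "sherbet,sorbet": {"count": 4, "children": {"sherbet": leaf(4)}},
        "parfait": leaf(3),
    }},
    "cake": {"count": 88, "children": {"cheesecake": leaf(40), "layer cake": leaf(48)}},
}}
```

Calling `compress("dessert", EXAMPLE)` removes ice cream sundae and sherbet, sorbet by Rule 1 (one child each, count below 10) and frozen dessert by Rule 2, leaving sundae, sherbet, parfait, and cake directly under dessert. The cake node survives because it has two children and its name does not contain "dessert".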

5. Divide into Facets

5. Divide into Facets (Remove top levels)
[Figure: a hierarchy running from entity through substance, matter; food, nutriment; ingredient, fixings; and foodstuff, food product down to sweetening, flavorer, and herb, over the terms sugar, syrup, parsley, and oregano. After the top levels are removed, only sweetening, herb, and flavorer remain above those terms.]
- Rule 1: Eliminate very general categories (e.g., entity, abstraction).
  - If no paths are longer than threshold t, then done.
  - Else: divide into facets using Rule 2.
- Rule 2: Undo the first step. Then eliminate all top levels until the maximum length of any path in the resulting hierarchy is t.
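The two rules for removing top levels can be sketched in Python as follows. Assumptions: path length is counted in nodes, the set of "very general" categories is just the slide's two examples, a top-level leaf is never stripped (so no term is lost), and the nesting in `EXAMPLE` is a hypothetical arrangement of the slide's labels. Whatever remains at the top level after trimming is treated as the set of facets:

```python
GENERAL = {"entity", "abstraction"}  # the slide's examples of very general categories


def node(children=None):
    return {"count": 0, "children": dict(children or {})}


def height(n):
    """Length, in nodes, of the longest downward path starting at `n`."""
    return 1 + max((height(c) for c in n["children"].values()), default=0)


def longest(forest):
    return max((height(n) for n in forest.values()), default=0)


def drop_general(forest):
    """Rule 1: replace very general top-level categories by their children, repeatedly."""
    while any(name in GENERAL for name in forest):
        nxt = {}
        for name, n in forest.items():
            nxt.update(n["children"] if name in GENERAL else {name: n})
        forest = nxt
    return forest


def strip_top(forest):
    """Remove one whole top level; a top-level leaf is kept as is."""
    nxt = {}
    for name, n in forest.items():
        nxt.update(n["children"] or {name: n})
    return nxt


def divide_into_facets(forest, t):
    """Return the top-level categories (the facets) after removing top levels."""
    if t < 1:
        raise ValueError("t must be at least 1")
    trimmed = drop_general(forest)
    if longest(trimmed) <= t:
        return trimmed
    # Rule 2: undo Rule 1, then strip whole top levels until every path fits in t.
    facets = forest
    while longest(facets) > t:
        facets = strip_top(facets)
    return facets


# Hypothetical nesting of the slide's labels (path length 8 from entity to sugar).
EXAMPLE = {"entity": node({"substance,matter": node({"food,nutriment": node({
    "foodstuff,food product": node({"ingredient,fixings": node({"flavorer": node({
        "sweetening": node({"sugar": node(), "syrup": node()}),
        "herb": node({"parsley": node(), "oregano": node()}),
    })})})})})})}
```

With `t = 7`, Rule 1 alone suffices and the single top category is substance, matter. With `t = 3`, Rule 2 strips levels until flavorer is the only facet; with `t = 2` it goes one level further and yields two facets, sweetening and herb.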

Disambiguation
- Ambiguity in:
  - Word senses
  - Paths up the hypernym tree
- Sense 1 for word “tuna”: organism, being => plant, flora => vascular plant => succulent => cactus => tuna
- Sense 2 for word “tuna”: organism, being => fish => food fish => tuna, and also organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna
(2 paths for the same word; 2 paths for the same sense)

How to Select the Right Senses and Paths?
- First: build the core tree
  (1) Create paths for words with only one sense
  (2) Use domains
    - WordNet has 212 domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
    - Automatically scan the collection to see which domains apply
    - The user selects which of the suggested domains to use or may add their own
    - Paths for terms that match the selected domains are added to the core tree
- Then: add the remaining terms to the core tree

Optional Step: Domains
- To disambiguate, use domains
  - WordNet has 212 domains: medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
  - A better collection has been developed by Magnini (2000), which assigns a domain to every noun synset
- Automatically scan the collection to see which domains apply
- The user selects which of the suggested domains to use or may add their own
- Paths for terms that match the selected domains are added to the core tree

Using Domains
- Glosses for “dip”:
  - Sense 1: A depression in an otherwise level surface
  - Sense 2: The angle that a magnetic needle makes with the horizon
  - Sense 3: Tasty mixture into which bite-size foods are dipped
- Hypernyms for “dip”:
  - Sense 1: solid => concave shape => depression
  - Sense 2: shape, form => space => angle
  - Sense 3: food => ingredient, fixings => flavorer
- Given domain “food”, choose sense 3
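Domain-based sense selection can be sketched in Python, with one simplification flagged: the sketch treats a domain as matching a sense when the domain label appears on that sense's hypernym path, whereas WordNet Domains-style resources attach domain labels to synsets directly. The sense data for "dip" is copied from the slide; `SENSES` and `pick_sense` are hypothetical names:

```python
# Sense inventory for "dip" as shown on the slide: gloss plus hypernym path.
SENSES = {
    "dip": [
        {"gloss": "a depression in an otherwise level surface",
         "path": ["solid", "concave shape", "depression"]},
        {"gloss": "the angle that a magnetic needle makes with the horizon",
         "path": ["shape, form", "space", "angle"]},
        {"gloss": "tasty mixture into which bite-size foods are dipped",
         "path": ["food", "ingredient, fixings", "flavorer"]},
    ],
}


def pick_sense(term, domains, senses=SENSES):
    """Return the 1-based number of the first sense whose hypernym path
    touches one of the selected domains, or None if no sense does."""
    for number, sense in enumerate(senses.get(term, []), start=1):
        if domains.intersection(sense["path"]):
            return number
    return None
```

`pick_sense("dip", {"food"})` returns 3, matching the slide. When no sense touches a selected domain the function returns `None`, and the term would then be left for step 3, which favors the more common path.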

Castanet Evaluation

Castanet Evaluation
- This is a tool for information architects, so people of this type did the evaluation
- We compared output on:
  - Recipes
  - Biomedical journal titles
- We compared to two state-of-the-art algorithms:
  - LDA (Blei et al. ’04)
  - Subsumption (Sanderson & Croft ’99)

Subsumption Output (shown in Flamenco)

LDA Output (shown in Flamenco)

Evaluation Method
- Information architects assessed the category systems
- For each of 2 systems’ output, they:
  - Examined and commented on the top level
  - Examined and commented on two sub-levels
  - Then commented on overall properties: Meaningful? Systematic? Likely to use in your work?

Evaluation Results
- Results on recipes collection for “Would you use this system in your work?”
  - Yes in some cases or yes definitely:
    - Pine (Castanet): 29/34
    - Oak (LDA): 0/18
    - Birch (Subsumption): 6/16
- Results on quality of categories:

Opportunities for Tagging
- New opportunity: tagging, folksonomies (flickr, del.icio.us)
- People are creating facets in a decentralized manner
- They are assigning multiple facets to items
- This is done on a massive scale
- This leads naturally to meaningful associations

Conclusions
- Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
  - Midway in complexity between simple hierarchies and deep knowledge representation.
  - Currently in use on e-commerce sites; spreading to other domains
- Systems are needed to help create faceted metadata structures
  - Our WordNet-based algorithm, while not perfect, seems like it will be a useful tool for information architects.

Acknowledgements
- Flamenco Team: Brycen Chun, Ame Elliott, Jennifer English, Kevin Li, Rashmi Sinha, Emilia Stoica, Kirsten Swearingen, Ka-Ping Yee
- Castanet: Emilia Stoica
- Funding: This work supported in part by NSF (IIS )

For more information: flamenco.berkeley.edu Thank you! Marti Hearst