Automatic Facets: Faceted Navigation and Entity Extraction Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services.

Presentation transcript:

Automatic Facets: Faceted Navigation and Entity Extraction Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services

2 Agenda
• Introduction: Elements – Facets, Taxonomies, Software, People
• 3 Environments – E-Commerce, Enterprise, Internet
• Design Issues – Facets and Entities
• Conclusion – Integrated Solution

3 KAPS Group: General
• Knowledge Architecture Professional Services
• Virtual Company: Network of consultants –
• Partners – Inxight, FAST, etc.
• Consulting, Strategy, Knowledge architecture audit
• Taxonomies: Enterprise, Marketing, Insurance, etc.
• Services:
  – Taxonomy development, consulting, customization
  – Technology Consulting – Search, CMS, Portals, etc.
  – Metadata standards and implementation
  – Knowledge Management: Collaboration, Expertise, e-learning
  – Applied Theory – Faceted taxonomies, complexity theory, natural categories

4 Elements
• Facet – orthogonal dimension of metadata
• Entity / Noun Phrase – metadata value of a facet
• Entity extraction – feeds facets, signature, ontologies
• Taxonomy and categorization rules
• Auto-categorization – aboutness, subject facets
• People – tagging, evaluating tags, fine tune rules and taxonomy

5 Essentials of Facets
• Facets are not categories
  – Categories are what a document is about – limited number
  – Entities are contained within a document – any number
• Facets are orthogonal – mutually exclusive – dimensions
  – An event is not a person is not a document is not a place.
• Facets – variety – of units, of structure
  – Numerical range (price), Location – big to small
  – Alphabetical, Hierarchical – taxonomic
• Facets are designed to be used in combination
  – Wine where color = red, price = excessive, location = California, and sentiment = snotty
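The wine example on this slide can be sketched as a filter that applies several independent facets at once. This is only an illustrative sketch: the catalog entries, the `Wine` class, and the `facet_filter` function are hypothetical, "excessive" price is assumed to mean a minimum price threshold, and the sentiment facet is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wine:
    name: str
    color: str
    price: float
    location: str


# Hypothetical catalog, invented for illustration only.
CATALOG = [
    Wine("Wine A", "red", 95.0, "California"),
    Wine("Wine B", "red", 12.0, "California"),
    Wine("Wine C", "white", 80.0, "Oregon"),
    Wine("Wine D", "red", 110.0, "Oregon"),
]


def facet_filter(items, color=None, min_price=None, location=None):
    """Keep items matching every facet that was supplied (logical AND)."""
    result = []
    for item in items:
        if color is not None and item.color != color:
            continue
        if min_price is not None and item.price < min_price:
            continue
        if location is not None and item.location != location:
            continue
        result.append(item)
    return result


matches = facet_filter(CATALOG, color="red", min_price=50.0, location="California")
print([w.name for w in matches])  # ['Wine A']
```

Each facet is optional and independent of the others, which is what lets a user combine them in any order.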

6 Advantages of Faceted Navigation
• More intuitive – easy to guess what is behind each door
  – Simplicity of internal organization
  – 20 questions – we know and use
• Dynamic selection of categories
  – Allow multiple perspectives
  – Ability to Handle Compound Subjects
• Systematic Advantages – fewer elements
  – 4 facets of 10 nodes = 10,000 node taxonomy
  – Ability to Handle Compound Subjects
• Flexible – can be combined with other navigation elements
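The "4 facets of 10 nodes = 10,000 node taxonomy" claim is simple arithmetic: 10 × 10 × 10 × 10 combinations from only 40 maintained values. A small sketch, using hypothetical facet names and placeholder values, makes the comparison concrete.

```python
from itertools import product

# Four hypothetical facets with ten placeholder values each.
facets = {
    name: [f"{name}-{i}" for i in range(10)]
    for name in ("color", "price", "location", "sentiment")
}

# Values that have to be defined and maintained across all facets.
elements_to_maintain = sum(len(values) for values in facets.values())

# Distinct combinations a single pre-coordinated taxonomy would need as nodes.
combinations = sum(1 for _ in product(*facets.values()))

print(elements_to_maintain)  # 40
print(combinations)          # 10000
```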

7 Essentials of Taxonomies: Internal Organization
• Formal Taxonomy – parent – child relationship
  – Is-A-Kind-Of: Animal – Mammal – Zebra
  – Partonomy – Is-A-Part-Of: US – California – Oakland
• Browse Classification – cluster of related concepts
  – Food and Dining – Catering – Restaurants
• Taxonomies deal with complex, not compound
  – Conceptual relationships – category membership
  – Contextual relationships – Computers & Software
• Taxonomies deal with semantics & documents
  – Multiple meanings and purposes
  – Essential attributes of documents are not single value

8 Developing Facets: Tools and Techniques – Software Tools
• Text Analytics
  – Taxonomy management, entity extraction, categorization, sentiment
• Search
  – Integrated features, at index, Internet sources
• CM
  – Enterprise environment, taggers and policy
• Programmable Rules
  – Business and Subject matter expertise
  – Auto-populate variety of metadata – author, title, date, etc.
  – Relevance – best bets to weights and classes of documents
• People – refine, monitor – it’s not automatic

9 Developing Facets: Tools and Techniques – Software Tools – Auto-categorization
• Auto-categorization
  – Training sets – Bayesian, Vector Machine
  – Terms – literal strings, stemming, dictionary of related terms
  – Rules – simple – position in text (Title, body, url)
  – Advanced – saved search queries (full search syntax)
  – NEAR, SENTENCE, PARAGRAPH
  – Boolean – X NEAR Y and Not-Z
• Advanced Features
  – Facts / ontologies / Semantic Web – RDF +
  – Sentiment Analysis – positive, negative, neutral
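A rule of the form "X NEAR Y and Not-Z" can be approximated with a token-window check. The sketch below is not the syntax of any particular categorization product: the tokenizer, the default window of five tokens, and the function names are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(tokens, x, y, window=5):
    """True if x and y occur within `window` tokens of each other."""
    x_pos = [i for i, t in enumerate(tokens) if t == x]
    y_pos = [i for i, t in enumerate(tokens) if t == y]
    return any(abs(i - j) <= window for i in x_pos for j in y_pos)


def rule_matches(text, x, y, z, window=5):
    """Evaluate the rule: X NEAR Y AND NOT Z."""
    tokens = tokenize(text)
    return near(tokens, x, y, window) and z not in tokens


doc1 = "The exchange rate for the currency fell sharply today."
doc2 = "The currency exchange booth is next to the airport gate."
print(rule_matches(doc1, "currency", "exchange", "airport"))  # True
print(rule_matches(doc2, "currency", "exchange", "airport"))  # False
```

The second document fails only because of the excluded term, which is the role the NOT clause plays in narrowing a category.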

10 Developing Facets: Tools and Techniques – Software Tools – Entity Extraction
• Dictionaries – variety of entities, coverage, specialty
  – Cost of update – service or in-house
  – Inxight – 50+ predefined entity types
  – Nstein – 800,000 people, 700,000 locations, 400,000 organizations
• Rules
  – Capitalization, text – Mr., Inc.
  – Advanced – proximity and frequency of actions, associations
  – Need people to continually refine the rules
• Entities and Categorization
  – Total number and pattern of entities = a type of aboutness of the document
  – Bar Code, Fingerprint
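The simple rule type mentioned here (capitalization plus marker text such as "Mr." or "Inc.") can be sketched with two regular expressions. The patterns, the title and suffix lists, and the sample sentence are assumptions for illustration; real extraction tools use far richer dictionaries and rules, and patterns like these need the continual human refinement the slide calls for.

```python
import re

# A title followed by one or more capitalized words, e.g. "Mr. Ford".
PERSON = re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\. (?:[A-Z][a-z]+)(?: [A-Z][a-z]+)*")

# One or more capitalized words followed by a company suffix, e.g. "Acme Widgets Inc."
ORGANIZATION = re.compile(r"\b(?:[A-Z][A-Za-z]+ )+(?:Inc|Corp|Ltd)\.")


def extract_entities(text):
    """Return entity strings grouped by a (hypothetical) facet name."""
    return {
        "person": PERSON.findall(text),
        "organization": ORGANIZATION.findall(text),
    }


sample = "Mr. Ford met Dr. Jane Smith at Acme Widgets Inc. on Monday."
print(extract_entities(sample))
# {'person': ['Mr. Ford', 'Dr. Jane Smith'], 'organization': ['Acme Widgets Inc.']}
```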

11 Elements: People
• Programmers, Librarians, Taxonomists, Metadata specialists
  – Integrate, design, develop rules, monitor activity & quality
• Authors, Subject Matter Experts
  – Input into design (important facets), rules, activity meaning
• Users – Web 2.0
  – Feedback – quality and usability
  – Suggestions – missing terms, bad categorization & entity
  – Tag clouds & folksonomy – for social networking features, not for information retrieval

12 Three Environments
• E-Commerce
  – Catalogs, small uniform collections of entities
  – Uniform behavior – buy this
• Enterprise
  – More content, more types of content
  – Enterprise Tools – Search, ECM
  – Publishing Process – tagging, metadata standards
• Internet
  – Wildly different amount and type of content, no taggers
  – General Purpose – Flickr, Yahoo
  – Vertical Portal – selected content, no taggers

13 Three Environments: E-Commerce

14 Three Environments: E-Commerce

15 Enterprise Environment – When and how to add metadata
• Enterprise Content – different world than eCommerce
  – More Content, more kinds, more unstructured
  – Not a catalog to start – less metadata and structured content
  – Complexity – not just content but variety of users and activities
• Combination of human and automatic metadata
  – ECM – Software aided – suggestions, entities, ontologies
• Enterprise – Question of Balance / strategy
  – More facets = more findability (up to a point)
  – Fewer facets = lower cost to tag documents
• Issues
  – Not enough facets
  – Wrong set of facets – business not information
  – Ill-defined facets – too complex internal structure

16 Facets and Taxonomies: Enterprise Environment – Case One – Taxonomy, 7 facets
• Taxonomy of Subjects / Disciplines:
  – Science > Marine Science > Marine microbiology > Marine toxins
• Facets:
  – Organization > Division > Group
  – Clients > Federal > EPA
  – Instruments > Environmental Testing > Ocean Analysis > Vehicle
  – Facilities > Division > Location > Building X
  – Methods > Social > Population Study
  – Materials > Compounds > Chemicals
  – Content Type – Knowledge Asset > Proposals
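Hierarchical facet values such as "Clients > Federal > EPA" lend themselves to prefix matching: selecting a node should also select everything beneath it. The sketch below assumes tags are stored as ">"-separated paths. The document IDs and their tag assignments are hypothetical; only the path strings come from the slide.

```python
def parse_path(path):
    """Turn 'A > B > C' into the tuple ('A', 'B', 'C')."""
    return tuple(part.strip() for part in path.split(">"))


def under(path, ancestor):
    """True if `path` equals `ancestor` or sits below it in the hierarchy."""
    return path[:len(ancestor)] == ancestor


# Hypothetical documents tagged with facet paths taken from the slide.
DOC_TAGS = {
    "doc-1": [parse_path("Clients > Federal > EPA"),
              parse_path("Methods > Social > Population Study")],
    "doc-2": [parse_path("Materials > Compounds > Chemicals")],
    "doc-3": [parse_path("Clients > Federal > EPA"),
              parse_path("Materials > Compounds > Chemicals")],
}


def select(doc_tags, *selections):
    """Return doc IDs that have a tag under every selected facet node."""
    wanted = [parse_path(s) for s in selections]
    return sorted(
        doc_id
        for doc_id, tags in doc_tags.items()
        if all(any(under(tag, node) for tag in tags) for node in wanted)
    )


print(select(DOC_TAGS, "Clients > Federal"))               # ['doc-1', 'doc-3']
print(select(DOC_TAGS, "Clients > Federal", "Materials"))  # ['doc-3']
```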

17 External Environment – Text Mining, Vertical Portals
• Internet Content
  – Scale – impacts design and technology – speed of indexing
  – Limited control – Association of publishers to selection of content to none
  – Major subtypes – different rules – metadata and results
• Complex queries and alerts
  – Terrorism taxonomy + geography + people + organizations
• Text Mining
  – General or specific content and facets and categories
  – Dedicated tools or component of Portal – internal or external
• Vertical Portal
  – Relatively homogeneous content and users
  – General range of questions

18 Internet Design
• Subject Matter taxonomy – Business Topics
  – Finance > Currency > Exchange Rates
• Facets
  – Location > Western World > United States
  – People – Alphabetical and/or Topical – Organization
  – Organization > Corporation > Car Manufacturing > Ford
  – Date – Absolute or range ( to , last 30 days)
  – Publisher – Alphabetical and/or Topical – Organization
  – Content Type – list – newspapers, financial reports, etc.

22 Integrated Facet Application: Design Issues – General
• What is the right combination of elements?
  – Faceted navigation, metadata, browse, search, categorized search results, file plan
• What is the right balance of elements?
  – Dominant dimension or equal facets
  – Browse topics and filter by facet
• When to combine search, topics, and facets?
  – Search first and then filter by topics / facet
  – Browse/facet front end with a search box

23 Integrated Facet Application: Design Issues – General
• Homogeneity of Audience and Content
• Model of the Domain – broad
  – How many facets do you need?
  – More facets and let users decide
  – Allow for customization – can’t define a single set
• User Analysis – tasks, labeling, communities
  – Issue – labels that people use to describe their business and labels that they use to find information
• Match the structure to domain and task
  – Users can understand different structures

24 Automatic Facets – Special Issues
• Scale requires more automated solutions
  – More sophisticated rules
• Rules to find and populate existing metadata
  – Variety of types of existing metadata – Publisher, title, date
  – Multiple implementation Standards – Last Name, First / First Name, Last
• Issue of disambiguation:
  – Same person, different name – Henry Ford, Mr. Ford, Henry X. Ford
  – Same word, different entity – Ford and Ford
• Number of entities and thresholds per results set / document
  – Usability, audience needs
• Relevance Ranking – number of entities, rank of facets
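The two name problems on this slide (mixed "Last Name, First" and "First Name, Last" conventions, and one person appearing under several name variants) can be sketched with a small normalizer and a matching heuristic. Everything here is an assumption made for illustration: the title list, the function names, and the rule that a missing first name counts as compatible. It does not solve the "Ford and Ford" case, where the same word names a person and a company; that needs context beyond the name string.

```python
TITLES = {"mr", "mrs", "ms", "dr"}


def normalize_name(raw):
    """Return (first, last) from 'Last, First' or '[Title] First [Middle] Last'.

    `first` is None when only a surname is available, as in 'Mr. Ford'.
    """
    raw = raw.strip()
    if "," in raw:
        # 'Last, First' convention: move the surname to the end.
        last, first = (part.strip() for part in raw.split(",", 1))
        words = first.split() + [last]
    else:
        words = raw.split()
    words = [w.rstrip(".") for w in words]
    if words and words[0].lower() in TITLES:
        words = words[1:]
    if not words:
        raise ValueError("no name found")
    if len(words) == 1:
        return (None, words[0])
    return (words[0], words[-1])


def same_person(a, b):
    """Heuristic: surnames must match; first names must match if both are known."""
    first_a, last_a = normalize_name(a)
    first_b, last_b = normalize_name(b)
    if last_a.lower() != last_b.lower():
        return False
    return first_a is None or first_b is None or first_a.lower() == first_b.lower()


print(normalize_name("Ford, Henry"))                # ('Henry', 'Ford')
print(same_person("Henry Ford", "Henry X. Ford"))   # True
print(same_person("Henry Ford", "Mr. Ford"))        # True
print(same_person("Henry Ford", "Gerald Ford"))     # False
```

Treating "Mr. Ford" as compatible with any first name raises recall at the cost of false matches, which is one reason such rules need thresholds and human review.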

25 Putting it all together – Infrastructure Solution
• Facets, Taxonomies, Software, People
• Combine formal power with ability to support multiple user perspectives
• Facet System – interdependent, map of domain
• Entity extraction – feeds facets, signatures, ontologies
• Taxonomy & Auto-categorization – aboutness, subject
• People – tagging, evaluating tags, fine tune rules and taxonomy
• The future is the combination of simple facets with rich taxonomies with complex semantics / ontologies

Questions? Tom Reamy KAPS Group Knowledge Architecture Professional Services