Text Analytics for Search Applications Workshop Tom Reamy Chief Knowledge Architect KAPS Group Knowledge Architecture Professional Services



2 Agenda
• Introduction – Text Analytics & Infrastructure Platform
  – Text Analytics Features
  – Semantic Infrastructure – Taxonomy, Metadata, Technology
  – Value of Text Analytics
  – Getting Started with Text Analytics
• Development – Taxonomy, Categorization, Faceted Metadata
• Text Analytics Applications
  – Integration with Search and ECM
  – Platform for Information Applications
• Questions / Discussions

3 KAPS Group: General
• Knowledge Architecture Professional Services – Network of Consultants
• Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching
  – Attensity, Clarabridge, Lexalytics
• Strategy – IM & KM - Text Analytics, Social Media, Integration
• Services:
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text based applications – design & development
• Clients:
  – Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, etc.
• Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
• Presentations, Articles, White Papers –

4 Agenda – Introduction: Text Analytics & Semantic Infrastructure
• Text Analytics Features
  – Categorization & Extraction
• Semantic Infrastructure
  – Taxonomy, Metadata, Technology
• Value of Text Analytics
  – Enterprise Search that works
• Getting Started with Text Analytics
  – Text Analytics Strategy & Vision
  – Text Analytics Evaluation / Quick Start

5 Introduction to Text Analytics: Text Analytics Features
• Noun Phrase Extraction / Fact Extraction
  – Catalogs with variants, rule based dynamic
  – Relationships of entities – people-organizations-activities
• Sentiment Analysis
  – Objects and phrases – statistics & rules – Positive and Negative
• Summarization – replace snippets
• Auto-categorization – built on a taxonomy
  – Training sets, Terms, Semantic Networks
  – Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
• Auto-categorization as Foundation
  – Disambiguation - Identification of objects, events, context
  – Build rules based, not simply Bag of Individual Words
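The rule operators named on this slide (AND, OR, NOT, DIST, SENTENCE) can be illustrated with a small sketch. This is not any vendor's rule language: the function names, the naive tokenizer and sentence splitter, and the sample "interest rates" rule are all assumptions made for illustration. It shows why a rule is more precise than a bag of individual words: "interest" and "rate" only count when they occur close together in the same sentence.

```python
import re

def tokens(text):
    """Lowercase word tokens, in document order."""
    return re.findall(r"[a-z0-9']+", text.lower())

def sentences(text):
    """Naive split on ., ! and ? (an assumption, not a real sentence splitter)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

# Each rule is a function text -> bool, so rules can be nested freely.
def TERM(word):
    return lambda text: word in tokens(text)

def AND(*rules):
    return lambda text: all(r(text) for r in rules)

def OR(*rules):
    return lambda text: any(r(text) for r in rules)

def NOT(rule):
    return lambda text: not rule(text)

def DIST(n, a, b):
    """True if words a and b occur within n tokens of each other."""
    def rule(text):
        toks = tokens(text)
        pos_a = [i for i, t in enumerate(toks) if t == a]
        pos_b = [i for i, t in enumerate(toks) if t == b]
        return any(abs(i - j) <= n for i in pos_a for j in pos_b)
    return rule

def SENTENCE(rule):
    """True if the inner rule holds inside at least one single sentence."""
    return lambda text: any(rule(s) for s in sentences(text))

# Hypothetical category rule: financial "interest rate", not "interest" as a hobby.
interest_rates = AND(
    SENTENCE(DIST(3, "interest", "rate")),
    OR(TERM("bank"), TERM("loan")),
    NOT(TERM("hobby")),
)

doc1 = "The bank raised its interest rate. Loan demand fell."
doc2 = "My main interest is chess. I rate it my top hobby."
print(interest_rates(doc1))  # True
print(interest_rates(doc2))  # False
```

Both documents contain "interest" and "rate", so a plain keyword match would tag both; only the rule separates them.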

6 Case Study – Categorization & Sentiment


9 Introduction to Text Analytics: Taxonomy & Metadata
• Thesauri, Controlled Vocabulary, Glossaries, Product Catalogs
  – Resources to build on
• SharePoint – Managed Metadata Services
  – Term stores – corporate taxonomies
  – Enterprise Keywords (Folksonomy)
• Metadata standards
  – Dublin Core - Mostly syntactic not semantic
  – Semantic – keywords – very poor performance, no structure
• Facets – classes of metadata
  – Standard - People, Organization, Document type-purpose
  – Requires huge amounts of metadata

10 Introduction to Text Analytics: TA & Taxonomy – Complementary Information Platform
• Taxonomy provides a consistent and common vocabulary
  – Enterprise resource – integrated not centralized
• Text Analytics provides consistent tagging
  – Human indexing is subject to inter- and intra-individual variation
• Taxonomy provides the basic structure for categorization
  – And candidate terms
• Text Analytics provides the power to apply the taxonomy
  – And metadata of all kinds
• Text Analytics and Taxonomy Together – Platform
  – Consistent in every dimension
  – Powerful and economic

11 Introduction to Text Analytics: Taxonomy and Text Analytics
• Standard Taxonomies = starter categorization rules
  – Example – MeSH – bottom 5 layers are terms
• Categorization taxonomy structure
  – Tradeoff of depth and complexity of rules
  – Easier to maintain taxonomy, but need to refine rules
• Analysis of taxonomy – suitable for categorization
  – Structure – not too flat, not too large
  – Orthogonal categories
• Smaller modular taxonomies
  – More flexible relationships – not just Is-A-Kind/Child-Of
• Different kinds of taxonomies
  – Sentiment – products and features
  – Taxonomy of Sentiment, Emotion
  – Expertise – process
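The "not too flat, not too large" check can be made concrete with a few structural measures. The sketch below assumes a taxonomy held as nested Python dicts. The toy taxonomy and the function names are hypothetical, and the slide gives no thresholds, so none are applied here.

```python
# Toy taxonomy as nested dicts (hypothetical); leaf nodes are empty dicts.
TAXONOMY = {
    "Products": {
        "Hardware": {"Laptops": {}, "Phones": {}},
        "Software": {},
    },
    "Services": {"Consulting": {}, "Training": {}},
}

def node_count(tree):
    """Total number of nodes: a rough measure of 'too large'."""
    return sum(1 + node_count(children) for children in tree.values())

def max_depth(tree):
    """Number of levels on the longest path: a rough measure of 'too flat'."""
    if not tree:
        return 0
    return 1 + max(max_depth(children) for children in tree.values())

def widest_level(tree):
    """Largest number of siblings under any single parent (top level included)."""
    return max([len(tree)] + [widest_level(c) for c in tree.values()])

print(node_count(TAXONOMY), max_depth(TAXONOMY), widest_level(TAXONOMY))
```

A very wide, shallow tree pushes the work into many competing sibling rules, while a deep tree needs fewer rules per level but more levels to maintain. That is the depth versus rule-complexity tradeoff the slide mentions.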

12 Introduction to Text Analytics: Metadata - Tagging
• How do you bridge the gap – taxonomy to documents?
• Tagging documents with taxonomy nodes is tough
  – And expensive – central or distributed
• Library staff – experts in categorization not subject matter
  – Too limited, narrow bottleneck
  – Often don’t understand business processes and business uses
• Authors – Experts in the subject matter, terrible at categorization
  – Intra and Inter inconsistency, “intertwingleness”
  – Choosing tags from taxonomy – complex task
  – Folksonomy – almost as complex, wildly inconsistent
  – Resistance – not their job, cognitively difficult = non-compliance
• Text Analytics is the answer(s)!

13 Introduction to Text Analytics: Content Management – SharePoint
• Mind the Gap – Manual, Automatic, Hybrid
• All require human effort – issue of where and how effective
• Manual - human effort is tagging (difficult, inconsistent)
• Automatic and Hybrid - human effort is prior to tagging
  – Build on expertise – librarians on categorization, SME’s on subject terms
• Hybrid Model
  – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
  – Cognitive task is simple -> react to a suggestion instead of select from head or a complex taxonomy
  – Feedback – if author overrides -> suggestion for new category
  – Facets – Requires a lot of Metadata - Entity Extraction feeds facets
• Hybrid – Automatic is really a spectrum – depends on context
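The hybrid model's flow (publish, analyze, suggest, author reacts, feedback) can be sketched in a few lines. Everything named here is hypothetical: the `CATEGORY_TERMS` table stands in for real categorization rules, and `suggest` and `author_review` are illustrative helpers, not part of SharePoint or any text analytics product.

```python
from dataclasses import dataclass, field

# Hypothetical category -> trigger terms; a real system would use
# categorization rules or training sets instead of bare keywords.
CATEGORY_TERMS = {
    "Finance": {"budget", "invoice", "revenue"},
    "HR": {"hiring", "benefits", "payroll"},
}

@dataclass
class Review:
    accepted: list = field(default_factory=list)
    new_category_candidates: list = field(default_factory=list)

def suggest(text):
    """Text analytics step: propose categories whose terms appear in the text."""
    words = set(text.lower().split())
    return sorted(c for c, terms in CATEGORY_TERMS.items() if words & terms)

def author_review(suggestions, author_choice=None):
    """Author step: accept the suggestions, or override with another tag.

    An override that is not an existing category is recorded as feedback,
    that is, a candidate new category for the taxonomy team.
    """
    review = Review()
    if author_choice is None:
        review.accepted = list(suggestions)
    else:
        review.accepted = [author_choice]
        if author_choice not in CATEGORY_TERMS:
            review.new_category_candidates.append(author_choice)
    return review

suggestions = suggest("Q3 budget and revenue forecast")
print(suggestions)
print(author_review(suggestions))
print(author_review(suggestions, author_choice="Legal"))
```

The author only reacts to a short list, which is the simple cognitive task the slide describes, and overrides flow back as taxonomy feedback.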

14 Introduction to Text Analytics: Benefits of Text Analytics
• Why Text Analytics?
  – Enterprise search has failed to live up to its potential
  – Enterprise Content management has failed to live up to its potential
  – Taxonomy has failed to live up to its potential
  – Adding metadata, especially keywords has not worked
• What is missing?
  – Intelligence – human level categorization, conceptualization
  – Infrastructure – Integrated solutions not technology, software
• Text Analytics can be the foundation that (finally) drives success – search, content management, and much more

15 Text Analytics Platform – Benefits: IDC White Paper
• Time Wasted
  – Reformat information - $5.7 million per 1,000 per year
  – Not finding information - $5.3 million per 1,000
  – Recreating content - $4.5 million per 1,000
• Small Percent Gain = large savings
  – 1% - $10 million
  – 5% - $50 million
  – 10% - $100 million
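The arithmetic behind these figures can be worked through directly. The three cost lines sum to $15.5 million per 1,000 per year, read here as per 1,000 employees, which is an assumption. The savings tiers (1% = $10 million) imply a base of about $1 billion a year. The slide does not say how that base was reached, so the headcount in the sketch below is a made-up input, not a figure from the deck.

```python
# Figures from the slide, in USD millions per 1,000 employees per year
# ("per 1,000 employees" is an assumed reading of "per 1,000").
WASTE_PER_1000 = {
    "reformatting information": 5.7,
    "not finding information": 5.3,
    "recreating content": 4.5,
}

def annual_waste(employees):
    """Estimated wasted cost in USD millions per year for a given headcount."""
    per_1000 = sum(WASTE_PER_1000.values())
    return per_1000 * employees / 1000

def savings(employees, gain):
    """Savings in USD millions per year if `gain` (e.g. 0.05) of waste is recovered."""
    return annual_waste(employees) * gain

headcount = 10_000  # hypothetical organization size
print(f"waste: ${annual_waste(headcount):.1f}M per year")
for gain in (0.01, 0.05, 0.10):
    print(f"{gain:.0%} gain: ${savings(headcount, gain):.2f}M per year")
```

Even at this smaller hypothetical scale the point of the slide holds: a gain of a few percent is worth millions a year.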

16 Text Analytics Platform – Benefits
• Findability within and outside the enterprise
  – Savings per year - $millions
• Rescue enterprise search and ECM projects
  – Add semantics to search
• Clean up enterprise content
  – Duplication and accurate categorization
• Improve the quality of information access
  – Finding the right information can save millions
• Build smarter applications
  – Social networking, locate expertise within the enterprise

17 Text Analytics Platform – Benefits
• Understand your customers
  – What they are talking about and how they feel about it
• Empower your employees
  – Not only more time, but they work smarter
• Understand your competitors
  – What they are working on, talking about
  – Combine unstructured content and rich data sources – more intelligent analysis

18 Text Analytics Platform – Dangers
• Text Analytics as a software project
• Not enough resources – to develop, to maintain-refine
• Wrong resources – SME’s, IT, Library
  – Need all of the above and taxonomists+
• Bad Design:
  – Start with bad taxonomy
  – Wrong taxonomy – too big or too flat
• Bad Categorization / Entity Extraction
  – Right kind of experience

19 Getting Started with Text Analytics: Text Analytics Vision & Strategy
• Strategic Questions – why, what value from the text analytics, how are you going to use it
  – Platform or Applications?
• What are the basic capabilities of Text Analytics?
• What can Text Analytics do for Search?
  – After 10 years of failure – get search to work?
• What can you do with smart search based applications?
  – RM, PII, Social
• ROI for effective search – difficulty of believing
  – Problems with metadata, taxonomy

20 Getting Started with Text Analytics: Text Analytics Vision & Strategy
• Simple Subject Taxonomy structure
  – Easy to develop and maintain
• Combined with categorization capabilities
  – Added power and intelligence
• Combined with people tagging, refining tags
• Combined with Faceted Metadata
  – Dynamic selection of simple categories
  – Allow multiple user perspectives
      Can’t predict all the ways people think
      Monkey, Banana, Panda
• Combined with ontologies and semantic data
  – Multiple applications – Text mining to Search
  – Combine search and browse
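"Dynamic selection of simple categories" is the core of faceted navigation: each facet is a small, simple vocabulary, and users combine them in whatever order suits their own perspective. A minimal sketch, with hypothetical documents, facet names, and helper functions:

```python
from collections import Counter

# Hypothetical documents, each tagged on three simple facets.
DOCS = [
    {"title": "Q3 report", "topic": "Finance", "doc_type": "Report", "org": "Acme"},
    {"title": "Hiring plan", "topic": "HR", "doc_type": "Plan", "org": "Acme"},
    {"title": "Budget memo", "topic": "Finance", "doc_type": "Memo", "org": "Globex"},
]

def select(docs, **facets):
    """Keep documents that match every chosen facet value."""
    return [d for d in docs if all(d.get(k) == v for k, v in facets.items())]

def facet_counts(docs, facet):
    """Counts shown next to each facet value for the current result set."""
    return Counter(d[facet] for d in docs)

# One user starts from topic, another from organization; both reach the same document.
by_topic = select(DOCS, topic="Finance")
print(facet_counts(by_topic, "doc_type"))
print([d["title"] for d in select(DOCS, org="Acme", topic="Finance")])
```

No single hierarchy has to anticipate every path, which is the point of "can't predict all the ways people think".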

21 Step 1: TA Information Audit – Start with Self Knowledge
• Info Problems – what, how severe
• Formal Process - KA audit – content, users, technology, business and information behaviors, applications - Or informal for smaller organizations
• Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
• Category modeling – Cognitive Science – how people think
• Natural level categories mapped to communities, activities
      Novices prefer higher levels
      Balance of informativeness and distinctiveness
• Text Analytics Strategy/Model – forms, technology, people

22 Step 1: TA Information Audit – Start with Self Knowledge
• Ideas – Content and Content Structure
  – Map of Content – Tribal language silos
  – Structure – articulate and integrate
  – Taxonomic resources
• People – Producers & Consumers
  – Communities, Users, Central Team
• Activities – Business processes and procedures
  – Semantics, information needs and behaviors
  – Information Governance Policy
• Technology
  – CMS, Search, portals, text analytics
  – Applications – BI, CI, Semantic Web, Text Mining

23 Step 2: TA Evaluation – Varieties of Taxonomy / Text Analytics Software
• Taxonomy Management - extraction
• Full Platform
  – SAS, SAP, Smart Logic, Concept Searching, Expert System, IBM, Linguamatics, GATE
• Embedded – Search or Content Management
  – FAST, Autonomy, Endeca, Vivisimo, NLP, etc.
  – Interwoven, Documentum, etc.
• Specialty / Ontology (other semantic)
  – Sentiment Analysis – Attensity, Lexalytics, Clarabridge, Lots
  – Ontology – extraction, plus ontology

24 Step 2: Text Analytics Evaluation – Different Kind of Software Evaluation
• Traditional Software Evaluation - Start
  – Filter One - Ask Experts - reputation, research – Gartner, etc.
      Market strength of vendor, platforms, etc.
      Feature scorecard – minimum, must have, filter to top 6
  – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
  – Filter Three – In-Depth Demo – 3-6 vendors
• Reduce to 1-3 vendors
• Vendors have different strengths in multiple environments
  – Millions of short, badly typed documents, Build application
  – Library 200 page PDF, enterprise & public search
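The feature scorecard in Filter One can be sketched as a two-stage filter: drop any vendor missing a must-have feature, then rank the rest by weighted nice-to-have features and keep the top six. The vendor names, feature lists, and weights below are invented for illustration and do not describe any real product.

```python
# Hypothetical must-have features and weighted nice-to-haves.
MUST_HAVE = {"auto-categorization", "entity extraction"}
WEIGHTS = {"sentiment": 2, "summarization": 1, "taxonomy management": 3}

def shortlist(vendors, top_n=6):
    """vendors: dict of vendor name -> set of supported features."""
    # Stage 1: must-have features act as a hard filter.
    qualified = {n: f for n, f in vendors.items() if MUST_HAVE <= f}
    # Stage 2: score the survivors on weighted optional features.
    scores = {
        n: sum(w for feat, w in WEIGHTS.items() if feat in f)
        for n, f in qualified.items()
    }
    # Highest score first; ties broken by name for a stable result.
    return sorted(scores, key=lambda n: (-scores[n], n))[:top_n]

vendors = {
    "Vendor A": {"auto-categorization", "entity extraction", "sentiment"},
    "Vendor B": {"auto-categorization", "sentiment", "taxonomy management"},
    "Vendor C": {"auto-categorization", "entity extraction",
                 "taxonomy management", "summarization"},
}
print(shortlist(vendors))
```

The scorecard only narrows the field. As the slide says, it is a filter, not a focus, and the later demo and proof-of-concept stages carry the real decision.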

25 Design of the Text Analytics Selection Team
Traditional Candidates – IT, Business, Library
• IT - Experience with software purchases, needs assessment, budget
  – Search/Categorization is unlike other software, deeper look
• Business - understand business, focus on business value
• They can get executive sponsorship, support, and budget
  – But don’t understand information behavior, semantic focus
• Library, KM - Understand information structure
• Experts in search experience and categorization
  – But don’t understand business or technology

26 Design of the Text Analytics Selection Team
• Interdisciplinary Team, headed by Information Professionals
• Relative Contributions
  – IT – Set necessary conditions, support tests
  – Business – provide input into requirements, support project
  – Library – provide input into requirements, add understanding of search semantics and functionality
• Much more likely to make a good decision
• Create the foundation for implementation

27 Step 3: Proof of Concept / Pilot Project
• 4 weeks POC – bake off / or short pilot
• Real life scenarios, categorization with your content
• 2 rounds of development, test, refine / Not OOB
• Need SME’s as test evaluators – also to do an initial categorization of content
• Measurable Quality of results is the essential factor
• Majority of time is on auto-categorization
• Need to balance uniformity of results with vendor unique capabilities – have to determine at POC time
• Taxonomy Developers – expert consultants plus internal taxonomists
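"Measurable quality of results" is commonly expressed as precision and recall against the SMEs' initial categorization. The slide does not name a metric, so treating precision, recall, and F1 as the measure is an assumption here, and the documents and tags are invented.

```python
def score(gold, predicted):
    """Micro-averaged precision, recall and F1.

    gold, predicted: dict of doc_id -> set of category labels.
    gold is the SMEs' categorization; predicted is the tool's output.
    """
    tp = fp = fn = 0
    for doc_id, gold_tags in gold.items():
        pred_tags = predicted.get(doc_id, set())
        tp += len(gold_tags & pred_tags)   # correct tags
        fp += len(pred_tags - gold_tags)   # tags the tool added wrongly
        fn += len(gold_tags - pred_tags)   # tags the tool missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = {"d1": {"Finance"}, "d2": {"HR", "Legal"}, "d3": {"Finance"}}
predicted = {"d1": {"Finance"}, "d2": {"HR"}, "d3": {"HR"}}

p, r, f = score(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
# precision=0.67 recall=0.50 f1=0.57
```

Running the same scoring after each of the two development rounds, on the same test set for every vendor, gives the uniform comparison the POC needs.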

Questions? Tom Reamy KAPS Group Knowledge Architecture Professional Services

29 Resources
• Conferences:
  – Text Analytics World – All aspects of text analytics
      Text Analytics World Call for Speakers – Oct 3-4 Boston
  – Text Analytics Summit – social media focus
      Text Analytics Summit
• LinkedIn Groups:
  – Text Analytics World
  – Text Analytics Group
  – Data and Text Professionals
  – Sentiment Analysis
  – Metadata Management
  – Semantic Technologies

30 Resources
• Books
  – Women, Fire, and Dangerous Things – George Lakoff
  – Knowledge, Concepts, and Categories – Koen Lamberts and David Shanks
  – The Stuff of Thought – Steven Pinker
• Journals
  – Academic – Cognitive Science, Linguistics, NLP
  – Applied – Scientific American Mind, New Scientist