Text Analytics Mini-Workshop Quick Start Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture Professional.

Presentation transcript:

Text Analytics Mini-Workshop: Quick Start
Tom Reamy
Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services

Agenda
• Introduction – Elements of Text Analytics
• Two Paths
  – Infrastructure to Projects
  – Initial Project to Infrastructure
• Quick Start for Text Analytics
  – Knowledge Architecture Audit
  – Software Evaluation
  – Proof of Concept / Pilot
• Platform for Text Analytics Applications
• Questions / Discussions

Introduction: Elements of Text Analytics
• Text Mining – NLP, statistical, predictive, machine learning
  – Different skills, mind set, Math not language
• Semantic Technology – ontology, fact extraction
• Extraction – entities – known and unknown, concepts, events
  – Catalogs with variants, rule based
• Sentiment Analysis
  – Objects and phrases – statistics & rules – Positive and Negative
• Summarization
  – Dynamic – based on a search query term
  – Generic – based on primary topics, position in document
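To make "catalogs with variants" concrete, here is a minimal sketch of catalog-based entity extraction. The catalog contents, the function name `extract_entities`, and the matching policy (longest variant wins, whole-word matches only) are illustrative assumptions, not taken from the slides or from any vendor product.

```python
import re

# Hypothetical catalog: canonical entity name -> known surface variants.
CATALOG = {
    "International Business Machines": ["IBM", "I.B.M.", "International Business Machines"],
    "SAS Institute": ["SAS", "SAS Institute"],
}


def extract_entities(text, catalog=CATALOG):
    """Return (canonical_name, matched_variant, start_offset) tuples, in text order."""
    hits = []
    for canonical, variants in catalog.items():
        # Try longer variants first so "SAS Institute" wins over the bare "SAS".
        for variant in sorted(variants, key=len, reverse=True):
            # Lookarounds keep "SAS" from matching inside a longer word.
            pattern = r"(?<!\w)" + re.escape(variant) + r"(?!\w)"
            for match in re.finditer(pattern, text):
                # Skip a match that starts inside a hit already recorded.
                inside_existing = any(
                    start <= match.start() < start + len(found)
                    for _, found, start in hits
                )
                if not inside_existing:
                    hits.append((canonical, variant, match.start()))
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    for hit in extract_entities("IBM and SAS Institute both sell text analytics."):
        print(hit)
```

A catalog like this only covers known entities. The "unknown" entities mentioned on the slide need rules or statistical models, which this sketch does not attempt.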

Introduction: Elements of Text Analytics
• Auto-categorization
  – Training sets – Bayesian, Vector space
  – Terms – literal strings, stemming, dictionary of related terms
  – Rules – simple – position in text (Title, body, url)
  – Semantic Network – Predefined relationships, sets of rules
  – Boolean – Full search syntax – AND, OR, NOT
  – Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
• Platform for multiple features – Sentiment, Extraction
  – Disambiguation – Identification of objects, events, context
  – Distinguish Major-Minor mentions
  – Model more subtle sentiment
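The Boolean and proximity operators listed above can be sketched as plain functions. The semantics here are an assumption for illustration: `dist` means "both terms within n tokens, any order" and `orddist` means "first term followed by the second within n tokens". Real categorization products define their own operator syntax and edge cases, and the `billing_complaint` rule is a made-up example category.

```python
import re


def tokenize(text):
    """Lowercase word tokens; a deliberately naive tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def positions(tokens, term):
    """Indexes at which a term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def dist(tokens, a, b, n):
    """True if terms a and b occur within n tokens of each other, in any order."""
    return any(abs(i - j) <= n for i in positions(tokens, a) for j in positions(tokens, b))


def orddist(tokens, a, b, n):
    """True if term a is followed by term b within n tokens (order matters)."""
    return any(0 < j - i <= n for i in positions(tokens, a) for j in positions(tokens, b))


def billing_complaint(text):
    """Hypothetical category rule: DIST(5, bill, wrong) AND NOT refund."""
    tokens = tokenize(text)
    return dist(tokens, "bill", "wrong", 5) and "refund" not in tokens


if __name__ == "__main__":
    print(billing_complaint("My bill was completely wrong this month"))
    print(billing_complaint("The bill was wrong but I got a refund"))
```

Rules of this kind match literal tokens only. Stemming and dictionaries of related terms, also listed on the slide, would be applied before the rule runs.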

Case Study – Categorization & Sentiment


Text Analytics Workshop
Two Paths: The Way Up and Down is the Same
• Initial Project – Compliance, Customer Relations, Social
• Plus:
  – Immediate value and learn by doing
  – Easier to get Management Buy-In
• Minus:
  – Lack of strategic vision, platform
  – Potentially less value to organization in long run
  – Difficulty of growing to the enterprise

Two Paths: The Way Up and Down is the Same
• Enterprise Infrastructure – Search, Search-based
• Plus:
  – Establish infrastructure – faster project development
  – Avoid expensive mistakes – dead end technology, wrong implementation
• Minus:
  – Longer to develop, delayed gratification
  – First project is more expensive
• Quick Start – Strategic Vision – Software Evaluation – POC / Pilot
  – Infrastructure and/or good sneak path – speak enterprise

Need for a Quick Start
• Text Analytics is weird, a bit academic, and not very practical
  – It involves language and thinking and really messy stuff
• On the other hand, it is really difficult to do right (Rocket Science)
• Organizations don’t know what text analytics is and what it is for
• Text Analytics is Infrastructure – enabling platform
• Need immediate value – Difference of US and Europe/Asia
• Approach – depends on who is driving the interest in text analytics
• Start small and sneak enterprise perspective or sell strategic vision or ?

Need for a Quick Start
• False Model – all you need is our software and your SMEs
• Categorization is not a skill that SMEs have
• Rule Building is more esoteric – part library science, part business analysis, part cognitive science
  – Experience taking taxonomy starters and customizing, rules
• Interdisciplinary team – need experience putting together
• Need quick win to build support AND need strategic vision
• Answer = Quick Start – KA Audit, Evaluation, POC/Pilot
• Either Infrastructure or Initial Application
  – Difference is the depth and description

Quick Start – an adaptable methodology
• Scale of Process:
  – 2 days – 1 week as preliminary to an initial project
  – 3-4 months deep research to create new foundation
• Description of the process in your enterprise-speak
• KA Audit:
  – 2 people for 2 days and a whiteboard
  – 2 month research project
• Evaluation
  – Already have TA (still check current market)
  – 4 week project
• POC / Pilot / Initial Application
  – 4 weeks to 6 months to years

The start and foundation: Knowledge Architecture Audit
• Knowledge Map – Understand what you have, what you are, what you want
  – The foundation of the foundation
• Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
• Category modeling – “Intertwingledness” – learning new categories influenced by other, related categories
  – Monkey, Panda, Banana
• Natural level categories mapped to communities, activities
  – Novices prefer higher levels
  – Balance of informativeness and distinctiveness
• 4 Dimensions – Content, People, Technology, Activities

Knowledge Audit: Contextual Interviews
• Organizational Context – Free Form
  – Management, enterprise wide function
  – What is the size and makeup of the organizational units that will be impacted by this project?
  – Are there special constituencies that have to be taken into account?
  – What is the level of political support for this project? Any opposition?
  – What are your major information or knowledge access issues?
• These determine approach and effort for each area

Knowledge Audit: Information Interviews
• Structured, feed survey – list options
  – Could you describe the kinds of information activities that you and your group engage in? (types of content, search, write proposals, research?) How often?
  – How do they carry out these activities?
• Qualitative Research
  – What are your major information or knowledge access issues – examples?
  – In an ideal world, how would information access work at your organization?
  – What is right and what’s wrong with today’s methods?
• Output = map of information communities, activities

Knowledge Audit: Map of Information Technology
• Content Management – ability to integrate text analytics
• Search – Integration of text analytics – Beyond XML
  – Metadata – facets
• Existing Text Analytics – Underutilization?
  – Text Mining – often separate silo, how integrate?
• Taxonomy Management, Databases, portals
  – Semantic Technologies, wikis
• Visualization software
  – Applications – business intelligence, customer support, etc.
• Map – often reveals multiple redundancies, technology silos

Knowledge Audit: Content Analysis
• Content Map – size, format, audience, purpose, priority, special features, data and text, etc.
• Content Creation – content management workflow and real life workflow, publishing process – policy
  – Integrate external content – little control, massive scale
• Content Structure – taxonomies, vocabularies, metadata standards
• Drill Down, theme discovery
  – Search log analysis
  – Folksonomy if available
  – Text Mining, categorization exploration, clustering

Knowledge Audit – Output
• Strategic Vision and Change Management
  – Format – reports, enterprise ontology
  – Political / People and technology requirements
• Business Benefits and ROI
  – Enterprise Text Analytics – information overload
  – IDC study: Per 1,000 people = $22.5 million a year
  – 30% improvement = $6.75 million a year
  – Add own stories – especially cost of bad information, cost cutting
• Strategic Project Plan and Road Map
  – Text Analytics support requirements – taxonomies, resources
  – Map of Initial Projects – and selection criteria
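The ROI arithmetic on this slide scales linearly, so it is easy to rerun for your own headcount. The sketch below assumes the slide's two figures ($22.5 million a year per 1,000 people, 30% improvement) as defaults. The function name and the linear scaling by headcount are assumptions made for illustration, not part of the IDC study.

```python
def annual_savings(headcount, cost_per_1000=22_500_000, improvement=0.30):
    """Scale the per-1,000-people cost figure and apply an improvement rate.

    Defaults are the figures quoted on the slide; linear scaling with
    headcount is an illustrative assumption.
    """
    if headcount < 0:
        raise ValueError("headcount must be non-negative")
    if not 0 <= improvement <= 1:
        raise ValueError("improvement must be between 0 and 1")
    return headcount / 1000 * cost_per_1000 * improvement


if __name__ == "__main__":
    for people in (1_000, 2_500, 10_000):
        print(f"{people:>6} people: ${annual_savings(people):,.0f} a year")
```

For 1,000 people this reproduces the slide's figure of about $6.75 million a year. A more conservative improvement rate can be passed in when the business case needs to survive scrutiny.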

Quick Start Step Two – Software Evaluation
Different Kind of software evaluation
• Traditional Software Evaluation – Start
  – Filter One – Ask Experts – reputation, research – Gartner, etc.
    Market strength of vendor, platforms, etc.
    Feature scorecard – minimum, must have, filter to top 6
  – Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
  – Filter Three – In-Depth Demo – 3-6 vendors
• Reduce to 1-3 vendors
• Vendors have different strengths in multiple environments
  – Millions of short, badly typed documents, Build application
  – Library 200 page PDF, enterprise & public search
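The feature scorecard in Filter One can be kept as a small script rather than a spreadsheet. This is a sketch under assumptions: the vendor records, feature names, and weights are placeholders, and ranking by a simple weighted sum after a must-have filter is one reasonable reading of "minimum, must have, filter to top 6", not a prescribed method.

```python
def shortlist(vendors, must_have, weights, top_n=6):
    """Drop vendors missing any must-have feature, then rank by weighted score.

    vendors:   list of {"name": str, "features": set of str}
    must_have: set of features every shortlisted vendor needs
    weights:   {feature: points} for the nice-to-have features
    """
    qualified = [v for v in vendors if must_have <= v["features"]]

    def score(vendor):
        return sum(points for feat, points in weights.items() if feat in vendor["features"])

    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(qualified, key=lambda v: (-score(v), v["name"]))
    return [(v["name"], score(v)) for v in ranked[:top_n]]


if __name__ == "__main__":
    # Placeholder vendors, not real products.
    vendors = [
        {"name": "Vendor A", "features": {"categorization", "extraction", "sentiment"}},
        {"name": "Vendor B", "features": {"categorization", "sentiment"}},
        {"name": "Vendor C", "features": {"extraction", "sentiment", "summarization"}},
    ]
    weights = {"extraction": 3, "sentiment": 2, "summarization": 1}
    print(shortlist(vendors, {"categorization"}, weights))
```

The scorecard is only a filter. As the slide notes, the later filters and the in-depth demos are where vendors with similar scores separate.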

Current State of Text Analytics: Vendor Space
• Taxonomy Management – SchemaLogic, Pool Party
• From Taxonomy to Text Analytics – Data Harmony, Multi-Tes
• Extraction and Analytics – Linguamatics (Pharma), Temis, whole range of companies
• Business Intelligence – Clear Forest, Inxight
• Sentiment Analysis – Attensity, Lexalytics, Clarabridge
• Open Source – GATE
• Stand alone text analytics platforms – IBM, SAS, SAP, Smart Logic, Expert System, Basis, Open Text, Megaputer, Temis, Concept Searching
• Embedded in Content Management, Search – Autonomy, FAST, Endeca, Exalead, etc.

Quick Start Step Two – Software Evaluation
Design of the Text Analytics Selection Team
• IT – Experience with software purchases, needs assessment, budget
  – Search/Categorization is unlike other software, deeper look
• Business – understand business, focus on business value
• They can get executive sponsorship, support, and budget
  – But don’t understand information behavior, semantic focus
• Library, KM – Understand information structure
• Experts in search experience and categorization
  – But don’t understand business or technology
• Interdisciplinary Team, headed by Information Professionals
• Much more likely to make a good decision
• Create the foundation for implementation

Quick Start Step Three – Proof of Concept / Pilot Project
• POC use cases – basic features needed for initial projects
• Design – Real life scenarios, categorization with your content
• Preparation:
  – Preliminary analysis of content and users’ information needs
    Training & test sets of content, search terms & scenarios
  – Train taxonomist(s) on software(s)
  – Develop taxonomy if none available
• Four week POC – 2 rounds of develop, test, refine / Not OOB
• Need SMEs as test evaluators – also to do an initial categorization of content
• Majority of time is on auto-categorization

POC Design: Evaluation Criteria & Issues
• Basic Test Design – categorize test set
  – Score – by file name, human testers
• Categorization & Sentiment – Accuracy 80-90%
  – Effort Level per accuracy level
• Combination of scores and report
• Operators (DIST, etc.), relevancy scores, markup
• Development Environment – Usability, Integration
• Issues:
  – Quality of content & initial human categorization
  – Normalize among different test evaluators
  – Quality of taxonomy – structure, overlapping categories
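Scoring by file name against human testers, and checking how much the human testers agree with each other, can be sketched as follows. The data layout (a dict of file name to category) and the function names are assumptions. Cohen's kappa is swapped in here as one common way to quantify evaluator agreement beyond chance; the slides do not name a specific normalization method.

```python
from collections import Counter


def accuracy(predicted, gold):
    """Share of files whose predicted category matches the human label.

    Both arguments map file name -> category; only shared files are scored.
    """
    shared = predicted.keys() & gold.keys()
    if not shared:
        raise ValueError("no files in common")
    return sum(predicted[name] == gold[name] for name in shared) / len(shared)


def cohen_kappa(rater_a, rater_b):
    """Agreement between two human evaluators, corrected for chance agreement."""
    files = sorted(rater_a.keys() & rater_b.keys())
    if not files:
        raise ValueError("no files in common")
    n = len(files)
    observed = sum(rater_a[f] == rater_b[f] for f in files) / n
    count_a = Counter(rater_a[f] for f in files)
    count_b = Counter(rater_b[f] for f in files)
    # Chance agreement: probability both raters pick the same category at random.
    expected = sum(count_a[cat] * count_b[cat] for cat in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    software = {"a.txt": "HR", "b.txt": "IT", "c.txt": "HR", "d.txt": "Legal", "e.txt": "IT"}
    human = {"a.txt": "HR", "b.txt": "IT", "c.txt": "IT", "d.txt": "Legal", "e.txt": "IT"}
    print(f"accuracy vs. human labels: {accuracy(software, human):.0%}")
```

Measuring evaluator agreement first gives the 80-90% accuracy target some context: if two humans only agree on a given share of files, software scored against either of them is unlikely to do much better.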

Quick Start for Text Analytics
Proof of Concept – Value of POC
• Selection of best product(s)
• Identification and development of infrastructure elements – taxonomies, metadata – standards and publishing process
• Training by doing – SMEs learning categorization, Library/taxonomist learning business language
• Understand effort level for categorization, application
• Test suitability of existing taxonomies for range of applications
• Explore application issues – example – how accurate does categorization need to be for that application – 80-90%
• Develop resources – categorization taxonomies, entity extraction catalogs/rules

POC and Early Development: Risks and Issues
• CTO Problem – This is not a regular software process
• Semantics is messy not just complex
  – 30% accuracy isn’t 30% done – could be 90%
• Variability of human categorization
• Categorization is iterative, not “the program works”
  – Need realistic budget and flexible project plan
• Anyone can do categorization
  – Librarians often overdo, SMEs often get lost (keywords)
• Meta-language issues – understanding the results
  – Need to educate IT and business in their language

Quick Start Conclusions
• Foundation for Multiple Projects
  – Text Analytics is a platform for multiple applications: Search, BI, CI, Social Media, content enrichment, etc.
• If initial project is done right, it can create a new foundation
  – Initial project might be 10% more, all subsequent projects will be 50% less
• Apps need to learn to speak enterprise; Enterprise Text Analytics needs to learn to speak business ROI (not just productivity ROI)
• Stealth text analytics is better than none, but badly done initial project can poison the well for years
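The "10% more now, 50% less later" claim can be checked with a few lines of arithmetic. The sketch below assumes every project has the same baseline cost and that the two percentages apply exactly as stated on the slide. The function names are illustrative.

```python
def cost_with_foundation(n_projects, base_cost=1.0, first_premium=0.10, later_discount=0.50):
    """Cumulative cost when the first project also builds the platform.

    The first project costs base_cost * (1 + first_premium); every later
    project costs base_cost * (1 - later_discount). Equal baseline cost per
    project is an assumption made for illustration.
    """
    if n_projects <= 0:
        return 0.0
    first = base_cost * (1 + first_premium)
    later = base_cost * (1 - later_discount) * (n_projects - 1)
    return first + later


def cost_without_foundation(n_projects, base_cost=1.0):
    """Cumulative cost when every project starts from scratch."""
    return base_cost * max(n_projects, 0)


if __name__ == "__main__":
    for n in range(1, 6):
        with_f = cost_with_foundation(n)
        without_f = cost_without_foundation(n)
        print(f"{n} project(s): with foundation {with_f:.1f}, without {without_f:.1f}")
```

Under these assumptions the foundation approach costs more for a single project and is already cheaper in total by the second one (1.6 versus 2.0 baseline units).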

Questions?
Tom Reamy
KAPS Group
Upcoming:
• Workshop on Text Analytics: Enterprise Search Summit – New York, May
• Taxonomy Boot Camp, ESS, KMWorld – DC, Nov 4-7
• Fall Announcement!