Copyright © 2012, SAS Institute Inc. All rights reserved. #analytics2012 Quick Start for Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group.


1 Copyright © 2012, SAS Institute Inc. All rights reserved. #analytics2012 Quick Start for Text Analytics Tom Reamy Chief Knowledge Architect KAPS Group http://www.kapsgroup.com Program Chair – Text Analytics World

2 Agenda
• Introduction – State of Text Analytics
• What is Text Analytics?
• Why need a Quick Start Process?
• Quick Start Process – Foundation
• Knowledge Audit
• Text Analytics Software Evaluation
• Proof of Concept / Pilot
• Building on the Foundation
• From POC to Development
• Conclusions

3 KAPS Group: General
• Knowledge Architecture Professional Services – Network of Consultants
• Program Chair – Text Analytics World – April 17-18, San Francisco
• Partners – SAS, SAP, IBM, FAST, Smart Logic, Concept Searching, Attensity, Clarabridge, Lexalytics
• Services:
  » Strategy – IM & KM – Text Analytics, Social Media, Integration
  » Taxonomy/Text Analytics development, consulting, customization
  » Text Analytics Quick Start – Audit, Evaluation, Pilot
  » Social Media: Text based applications – design & development
• Clients:
  » Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
• Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies

4 Current State of Text Analytics
• Big Data
• Big Text is bigger than Big Data / 90% of information
• Text Analytics as pre-processing – text into data, new variables and structure for predictive analytics
• Text Mining as pre-processing for text analytics – discover patterns
• New models, Watson – ensemble methods, modules – puns
• Social Media / Sentiment Analysis
• Next stage – beyond simple sentiment, business value
• Importance of context
• New emotion taxonomies
• Analysis of conversations, speaker relationships, strength of social ties
• Expertise Analysis, crowd sourcing, political

5 Current State of Text Analytics
• Enterprise Text Analytics (ETA)
• ETA is the platform for unstructured text applications
• Enterprise Content Management and Search – failed
• Taxonomies and metadata – failed (Mind the Gap)
• Wide Range of InfoApps – BI, CI, Fraud detection, social media
• Has Text Analytics Arrived?
• Survey – 28% just getting started, 11% not yet, 17.5% ETA
• What is holding it back?
• Lack of clarity about business value, what it is – 55%
• Lack of strategic vision, real examples
• Gartner – new report on text analytics

6 Introduction: Future Directions
What is Text Analytics Good For?

7 What is Text Analytics?
• Text Mining – NLP, statistical, predictive, machine learning
• Semantic Technology – ontology, fact extraction
• Extraction – entities – known and unknown, concepts, events
• Catalogs with variants, rule based
• Sentiment Analysis
• Objects and phrases – statistics & rules – Positive and Negative
• Auto-categorization
• Training sets, Terms, Semantic Networks
• Rules: Boolean – AND, OR, NOT
• Advanced – DIST(#), ORDDIST#, PARAGRAPH, SENTENCE
• Disambiguation – Identification of objects, events, context
• Build rule-based logic, not simply a bag of individual words
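The difference between Boolean and proximity operators can be illustrated with a minimal Python sketch. The semantics assumed here (DIST as "within n tokens in either order", ORDDIST as "second term follows the first within n tokens", SENTENCE as "both terms in one sentence") are a plain reading of the operator names on the slide, not the vendor's definition, and all function names are hypothetical.

```python
import re

def tokens(text):
    # Lowercase word tokens; a deliberately simple tokenizer for illustration.
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(toks, term):
    return [i for i, t in enumerate(toks) if t == term]

def bool_and(text, a, b):
    # Boolean AND: both terms occur anywhere in the text (bag of words).
    toks = tokens(text)
    return a in toks and b in toks

def dist(text, a, b, n):
    # Assumed DIST semantics: a and b within n tokens of each other, any order.
    toks = tokens(text)
    return any(abs(i - j) <= n
               for i in positions(toks, a) for j in positions(toks, b))

def orddist(text, a, b, n):
    # Assumed ORDDIST semantics: b follows a within n tokens.
    toks = tokens(text)
    return any(0 < j - i <= n
               for i in positions(toks, a) for j in positions(toks, b))

def sentence(text, a, b):
    # Assumed SENTENCE semantics: a and b occur in the same sentence.
    return any(bool_and(s, a, b) for s in re.split(r"[.!?]+", text))

text = "The bank raised rates. We walked along the river bank."
print(bool_and(text, "rates", "river"))    # both words appear somewhere
print(sentence(text, "rates", "river"))    # but never in the same sentence
print(orddist(text, "bank", "rates", 3))   # "rates" shortly after "bank"
```

The bag-of-words test accepts "rates" and "river" together even though they belong to different senses of "bank"; the scoped operators are what make disambiguation possible.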

8 Case Study – Categorization & Sentiment

9 Case Study – Categorization & Sentiment

10 Case Study – Taxonomy Development

11 Case Study – Taxonomy Development

12 Need for a Quick Start
• Text Analytics is weird, a bit academic, and not very practical
  » It involves language and thinking and really messy stuff
• On the other hand, it is really difficult to do right (Rocket Science)
• Organizations don’t know what text analytics is and what it is for
• Survey shows – need two things:
  » Strategic vision of text analytics in the enterprise
  » Business value, problems solved, information overload
  » Text Analytics as platform for information access
  » Real life functioning program showing value and demonstrating an understanding of what it is and does
• Text Analytics is more than Text Mining
• Enterprise TA or One Application – same process, different scale

13 The Start and Foundation: Knowledge Architecture Audit
• Knowledge Map – Understand what you have, what you are, what you want
• The foundation of the foundation
• Contextual interviews, content analysis, surveys, focus groups, ethnographic studies, Text Mining
• Category modeling – “Intertwingledness” – learning new categories influenced by other, related categories
• Monkey, Panda, Banana
• Natural level categories mapped to communities, activities
• Novices prefer higher levels
• Balance of informativeness and distinctiveness
• 4 Dimensions – Content, People, Technology, Activities

14 Knowledge Audit: Contextual Interviews
• Organizational Context – Free Form
• Management, enterprise wide function
• What is the size and makeup of the organizational units that will be impacted by this project?
• Are there special constituencies that have to be taken into account?
• What is the level of political support for this project? Any opposition?
• What are your major information or knowledge access issues?
• These determine approach and effort for each area
  » Content Map often the most complex and time-consuming

15 Knowledge Audit: Information Interviews
• Structured, feed survey – list options
• Could you describe the kinds of information activities that you and your group engage in? (types of content, search, write proposals, research?) How often?
• How do they carry out these activities?
• Qualitative Research
• What are your major information or knowledge access issues – examples?
• In an ideal world, how would information access work at your organization?
• What is right and what is wrong with today’s methods?
• Output = map of information communities, activities

16 Knowledge Audit: Map of Information Technology
• Content Management – ability to integrate text analytics
• Search – Integration of text analytics – Beyond XML
• Metadata – facets
• Existing Text Analytics – Underutilization?
• Text Mining – often separate silo, how to integrate?
• Taxonomy Management, Databases, portals
• Semantic Technologies, Wikis
• Visualization software
• Applications – business intelligence, customer support, etc.
• Map – often reveals multiple redundancies, technology silos

17 Knowledge Audit: Content Analysis
• Content Map – size, format, audience, purpose, priority, special features, data and text, etc.
• Content Creation – content management workflow and real life workflow, publishing process – policy
• Integrate external content – little control, massive scale
• Content Structure – taxonomies, vocabularies, metadata standards
• Drill Down, theme discovery
• Search log analysis
• Folksonomy if available
• Text Mining, categorization exploration, clustering

18 Knowledge Audit – Output
• Strategic Vision and Change Management
  » Format – reports, enterprise ontology
  » Political / People and technology requirements
• Business Benefits and ROI
  » Enterprise Text Analytics – information overload – IDC study:
  » Per 1,000 people = $22.5 million a year
  » 30% improvement = $6.75 million a year
  » Add own stories – especially cost of bad information
• Strategic Project Plan and Road Map
  » Text Analytics support requirements – taxonomies, resources
  » Map of Initial Projects – and selection criteria
• Software Evaluation, Proof of Concept or Initial Project?
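The ROI figures on this slide are proportional arithmetic: 30% of $22.5 million is $6.75 million. A small Python sketch makes the calculation reusable for other headcounts and improvement rates. Scaling the per-1,000 figure linearly with headcount is an assumption made here for illustration, not something the slide states, and the function name is hypothetical.

```python
# Cost of information overload per 1,000 people per year, as quoted on the slide.
COST_PER_1000_PEOPLE = 22_500_000

def annual_savings(people, improvement=0.30):
    # Assumption: the cost scales linearly with headcount.
    overload_cost = people / 1000 * COST_PER_1000_PEOPLE
    return overload_cost * improvement

# Reproduces the slide's example: 1,000 people, 30% improvement.
print(f"${annual_savings(1000):,.0f} a year")
```

Substituting an organization's own headcount and a more conservative improvement rate gives a first-pass number to pair with the "own stories" the slide recommends.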

19 Evaluation Process & Methodology
• Build on Knowledge Audit
• Deep well articulated evaluation
• Standard Software Evaluation – if needed
• Filter One – Ask Experts – reputation, research – Gartner, etc.
  » Market strength of vendor, platforms, etc.
  » Feature scorecard – minimum, must have, filter to top 3-6
• Filter Two – Technology Filter – match to your overall scope and capabilities – Filter not a focus
• Filter Three – In-Depth Demo – 3-6 vendors
• Goal – Eliminate the unfit
• Selection of best 1-3 – preparation for POC
• Input into design of POC

20 Software Evaluation / Sole Source
• Amdocs
• Customer Support Notes – short, badly written, millions of documents
• Total Cost, multiple languages, Integration with their application
• Distributed expertise – SME, not classification
• Platform – resell full range of services, Sentiment Analysis
• Twenty vendors to four, to a POC of IBM vs. SAS, to SAS
• GAO
• Library of 200 page PDF formal documents, public web site
• People – library staff – 3-4 taxonomists – centralized expertise
• Enterprise search, general public
• Twenty Vendors to POC with SAS

21 Proof of Concept (POC) or Pilot
Test Cases from Knowledge Audit – Identify Critical Variables
• Measurable Quality of results is the essential factor
• Design – Real life scenarios, categorization with your content
• Preparation:
• Preliminary analysis of content and users’ information needs
  » Training & test sets of content, search terms & scenarios
• Train taxonomist(s) on software(s)
• Develop taxonomy if none available
• Four week POC – 2 rounds of develop, test, refine / Not OOB
• Need SMEs as test evaluators – also to do an initial categorization of content
• Majority of time is on auto-categorization
• POC use cases – basic features needed for initial projects
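One way to make "measurable quality" concrete is to score the software's categorization of the test set against the SMEs' initial categorization. The slide does not name a metric; the Python sketch below assumes per-category precision and recall with one category per document, and the document IDs and category names are made up for illustration.

```python
def precision_recall(gold, predicted, category):
    """Score one category; gold and predicted map document id -> category."""
    tp = sum(1 for d, c in gold.items()
             if c == category and predicted.get(d) == category)
    fp = sum(1 for d, c in predicted.items()
             if c == category and gold.get(d) != category)
    fn = sum(1 for d, c in gold.items()
             if c == category and predicted.get(d) != category)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical test set: SME labels vs. categorization engine output.
sme_labels = {"d1": "billing", "d2": "billing", "d3": "network", "d4": "billing"}
engine_out = {"d1": "billing", "d2": "network", "d3": "billing", "d4": "billing"}

p, r = precision_recall(sme_labels, engine_out, "billing")
print(f"billing: precision={p:.2f} recall={r:.2f}")
```

Running the same scoring after each of the two develop, test, refine rounds gives a comparable number per round and per vendor.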

22 Proof of Concept: Design of the Team
Traditional Candidates – IT, Business, Library
• IT – Experience with software purchases, needs assessment, budget
• Search/Categorization is unlike other software, deeper look
• Business – understand business, focus on business value
• They can get executive sponsorship, support, and budget
• But don’t understand information behavior, semantic focus
• Library, KM – Understand information structure
• Experts in search experience and categorization
• But don’t understand business or technology
• Interdisciplinary team headed up by info professionals
• Enterprise Architecture (CAO?)
• Make a better decision, foundation for development

23 Proof of Concept – Value of POC
• Selection of best product(s)
• Identification and development of infrastructure elements – taxonomies, metadata – standards and publishing process
• Training by doing – SMEs learning categorization, Library/taxonomist learning business language
• Understand effort level for categorization, application
• Test suitability of existing taxonomies for range of applications
• Explore application issues – example – how accurate does categorization need to be for that application – 80-90%
• Develop resources – categorization taxonomies, entity extraction catalogs/rules

24 POC and Early Development: Risks and Issues
• CTO Problem – This is not a regular software process
• Semantics is messy not just complex
• 30% accuracy isn’t 30% done – could be 90%
• Variability of human categorization
• Categorization is iterative, not “the program works”
• Need realistic budget and flexible project plan
• Anyone can do categorization
• Librarians often overdo, SMEs often get lost (keywords)
• Meta-language issues – understanding the results
• Need to educate IT and business in their language
• Need for a quick start win – deep understanding

25 Building on the Foundation
• Initial Projects – Processes to apply Text Analytics
• New Electronic Publishing Process
  » Use text analytics to tag, new hybrid workflow
• New Enterprise Search
  » Build faceted navigation on metadata, extraction
• Applications – Business Intelligence – Behavior Prediction
  » Combine with big data – free text in surveys, social media
  » Internet application – spider and categorize, extract
• Interdisciplinary Processes – Integrating the pieces
• Content management, search, InfoApps
• Social Media – adding intelligence to simple sentiment
• In depth project based on your real needs, not buzz

26 Beyond Simple Sentiment
• Beyond Good and Evil (positive and negative)
• Social Media is approaching next stage (growing up)
• Where is the value? How to get better results?
• Sentiment Analysis is easy to do – wrong
• Importance of Context – around positive and negative words
• Rhetorical reversals – “I was expecting to love it”
• Issues of sarcasm (“Really Great Product”), slanguage
• Limited value of Positive and Negative
• Early Categorization – Politics or Sports
• Degrees of intensity, complexity of emotions and documents
• Addition of focus on behaviors – why someone calls a support center – and likely outcomes
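The rhetorical-reversal point can be shown with a toy Python comparison between counting polarity words and looking at the context around them. The word lists and reversal cues below are invented for illustration, and flipping the whole score is a deliberately crude stand-in for real context rules.

```python
import re

# Hypothetical mini-lexicons; a real project would build these from its own content.
POSITIVE = {"love", "great", "good"}
NEGATIVE = {"hate", "bad", "awful"}
REVERSAL_CUES = ("was expecting to", "expected to", "wanted to")

def naive_polarity(text):
    # Bag-of-words score: positive word count minus negative word count.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def contextual_polarity(text):
    # Crude context rule: a reversal cue anywhere in the text flips the score.
    score = naive_polarity(text)
    if any(cue in text.lower() for cue in REVERSAL_CUES):
        return -score
    return score

review = "I was expecting to love it"
print(naive_polarity(review), contextual_polarity(review))
```

The naive scorer reads the slide's example as positive because of "love"; the context rule reads it as a disappointed expectation. Sarcasm and slang need further rules or training data that this sketch does not attempt.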


28 Behavior Prediction – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Analyze customer support notes
• General issues – creative spelling, second hand reports
• Develop categorization rules
• First – distinguish cancellation calls – not simple
• Second – distinguish cancel what – one line or all
• Third – distinguish real threats

29 Behavior Prediction – Telecom Customer Service
• Basic Rule
    (START_20, (AND,
      (DIST_7, "[cancel]", "[cancel-what-cust]"),
      (NOT, (DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  » customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  » cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
  » ask about the contract expiration date as she wanted to cxl teh acct
• Combine sophisticated rules with sentiment statistical training and Predictive Analytics and behavior monitoring
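To see how a rule of this shape behaves on the sample notes, here is a minimal Python sketch. It is an interpretation, not the SAS engine: it assumes START_20 restricts matching to the first 20 tokens, DIST_n means two concepts occur within n tokens of each other, and each bracketed name is a concept backed by a word list. The word lists themselves are hypothetical, since the slide does not show the concept definitions.

```python
import re

# Hypothetical concept lexicons, including the "creative spellings" in the notes.
CONCEPTS = {
    "cancel": {"cancel", "cancell", "cxl"},
    "cancel-what-cust": {"account", "acct", "act"},
    "one-line": {"line"},
    "restore": {"restore"},
    "if": {"if"},
}

def tokenize(text):
    return re.findall(r"[a-z0-9']+", text.lower())

def positions(toks, concept):
    return [i for i, t in enumerate(toks) if t in CONCEPTS[concept]]

def within(pos_a, pos_b, n):
    # True if any pair of positions is at most n tokens apart.
    return any(abs(a - b) <= n for a in pos_a for b in pos_b)

def likely_cancel(note):
    toks = tokenize(note)[:20]                      # assumed START_20
    cancel = positions(toks, "cancel")
    target = positions(toks, "cancel-what-cust")
    excluded = [p for c in ("one-line", "restore", "if")
                for p in positions(toks, c)]        # the OR branch
    return (within(cancel, target, 7)               # DIST_7
            and not within(cancel, excluded, 10))   # NOT DIST_10

print(likely_cancel("ask about the contract expiration date as she wanted to cxl teh acct"))
print(likely_cancel("customer called to say he will cancell his account if the does not "
                    "stop receiving a call from the ad agency."))
```

Under these assumptions the third note matches, while the first is rejected because the conditional "if" falls within ten tokens of the cancellation term, which is the "mere threat" case the rule is meant to screen out.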

30 Quick and Smart Start for Text Analytics: Conclusion
• Start with self-knowledge – what will you use it for?
• Knowledge audit – content, people, activities, technology
• Create Information Strategy / Vision – Text Analytics Focus
• Focus on business value and solutions not the technology
• POC / Pilot – your content, real world scenarios
• Foundation for development, experience with software, process
• Integration – need an integrated team on integrated platform
• Initial Development Projects – within strategic context, POC+
• Quick Win – real value, learn, and can be used to sell the vision
• Text Analytics is all about context – in enterprise and in text
• Think Big, Start Small, Scale Fast

31 Questions
Tom Reamy, Chief Knowledge Architect, KAPS Group
http://www.kapsgroup.com
Program Chair – Text Analytics World

