1 Improving Search for Discovery
Tom Reamy
Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com

2 Improving Search for Discovery and Everything Else
Tom Reamy
Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com

3 Agenda
• Introduction
• What is Wrong With Search?
• What Works?
  – Metadata & taxonomies
  – Infrastructure / Information Life Cycle
• Yes, But
  – Missing Link: Text Analytics
  – Search and Beyond
• Conclusion

4 Introduction: KAPS Group
• Knowledge Architecture Professional Services – Network of Consultants
• Applied Theory – Faceted taxonomies, complexity theory, natural categories, emotion taxonomies
• Services:
  – Strategy – IM & KM: Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics development, consulting, customization
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
  – Social Media: Text-based applications – design & development
• Partners: Smart Logic, Expert Systems, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, etc.
• Presentations, Articles, White Papers – www.kapsgroup.com
• Program Chair – Text Analytics World

5 Improving Search for Discovery
• They Won’t Work!

6 Improving Search for Discovery: Why Won’t It Work?
• Search Engines are Stupid!
  – (and people have better things to do)
• Documents deal in language BUT it’s all chicken scratches to Search
• Relevance – requires meaning
  – Imagine trying to understand what a document is about in a language you don’t know
• Mzndin agenpfre napae ponaoen afpenafpenae timtnoe.
  – Dictionary of chicken scratches (variants, related)
  – Count the number of chicken scratches = relevance? Not!
• Google = popularity of web sites and Best Bets
  – For documents in an enterprise – Counting and Weighting
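To make the "counting chicken scratches" point concrete, here is a minimal sketch of relevance as raw term counts. The function name, the query, and both sample documents are invented for illustration; real engines add weighting on top, but the basic signal is still a count of matching tokens, not an understanding of what the document is about.

```python
import re
from collections import Counter

def count_relevance(query, document):
    """Score a document by how often the query terms occur in it."""
    terms = re.findall(r"\w+", query.lower())
    counts = Counter(re.findall(r"\w+", document.lower()))
    return sum(counts[t] for t in terms)

# Hypothetical documents: the second one only denies a rumor,
# yet it repeats the query word more often.
docs = {
    "on_topic": "Our merger closed in May after regulators signed off.",
    "off_topic": "Merger rumors? No merger. The merger story was false; no merger happened.",
}

# Rank by count alone: the denial outranks the actual announcement.
ranked = sorted(docs, key=lambda name: count_relevance("merger", docs[name]), reverse=True)
print(ranked)
```

The scorer has no way to tell that the higher-ranked document says the event never happened; it only sees more matching tokens.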

7 Improving Search for Discovery: Why Won’t It Work?
• Option – Add metadata – good for archiving & indexing
• Keywords – don’t scale
  – Work only in pilots or with a small doc set and many authors
  – Folksonomies don’t really work
• Tagging – Governance – Thou Shalt Tag!
  – No, they won’t, or they will tag really badly
• Add taxonomies – beautiful to behold, but there is a gap between taxonomy and documents – and they are too complex for authors
• Power Search – statistical signature of a document – apply all kinds of math = Find Similar!
• Not trashing search, but just want to say:
  – Survey Says – Users Unhappy with Search
  – Text Analytics is (part of) the answer

8 Semantic Infrastructure: Text Analytics Features
• Text Mining – NLP, machine learning, complex statistics
• Noun Phrase Extraction – Feed facets
  – People, Organizations, Dates, Geographic, Methods, etc.
  – Catalogs with variants, rule-based dynamic
• Sentiment Analysis – Positive and Negative Phrases
  – Dictionaries & rules – “I hate your product”
• Summarization – replace snippets
• Ontologies – fact extraction + reasoning about relationships
• Auto-categorization – built on a taxonomy
  – Training sets, Terms, Semantic Networks
  – Rules: AND, OR, NOT, DIST, PARAGRAPH, SENTENCE
  – Foundation – subjects, disambiguation, add intelligence to all

9 Case Study – Categorization & Sentiment

10 Improving Search: Adding Meaning and Structure
• Text Analytics and Taxonomy Together
  – Text Analytics provides the power to apply the taxonomy
  – And metadata of all kinds
  – Consistent in every dimension, powerful and economic
• Hybrid Model
  – Publish Document -> Text Analytics analysis -> suggestions for categorization, entities, metadata -> present to author
  – Cognitive task is simple -> react to a suggestion instead of selecting from memory or from a complex taxonomy
  – Feedback – if author overrides -> suggestion for new category
  – Facets – Requires a lot of Metadata – Entity Extraction feeds facets
• Hybrid – Automatic is really a spectrum – depends on context
  – Automatic – adding structure at search results
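The hybrid publishing loop described on this slide can be sketched roughly as follows. Everything here is hypothetical: the two-category taxonomy, the term lists, and the function names are invented, and plain keyword overlap stands in for whatever text analytics engine actually produces the suggestions.

```python
# Hypothetical mini-taxonomy: category -> indicative terms.
TAXONOMY = {
    "Clinical Research": {"trial", "patients", "dosage"},
    "Finance": {"revenue", "budget", "invoice"},
}

def suggest_categories(text, taxonomy=TAXONOMY):
    """Return categories whose terms appear in the text, best match first."""
    words = set(text.lower().split())
    hits = {cat: len(words & terms) for cat, terms in taxonomy.items()}
    return [cat for cat, n in sorted(hits.items(), key=lambda kv: -kv[1]) if n > 0]

def review(suggestions, author_choice, feedback_log):
    """The author reacts to suggestions; an override is logged as feedback."""
    if author_choice not in suggestions:
        # Candidate for a new category or a rule that needs tuning.
        feedback_log.append(author_choice)
    return author_choice

log = []
suggested = suggest_categories("Quarterly budget and revenue report")
final = review(suggested, "Procurement", log)
print(suggested, final, log)
```

The author only has to accept or reject a short list, and each rejection becomes input for improving the taxonomy or the rules.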

11 Improving Search: Adding Meaning and Structure
• Documents are not unstructured – they have a variety of structures
• Categorization by page, sections (text markers) or even sentence or phrase
• Use generic components – like the level of generality of terms or concepts (general and context specific)
• Additional metadata – document types/purpose, authors
• Relevance – complex rules – based on structure (intelligent use of titles, headlines, sections + complex categorization)

12 Improving Search: Document Type Rules
• (START_2000, (AND, (OR, _/article:"[Abstract]", _/article:"[Methods]"), (OR, _/article:"clinical trial*", _/article:"humans",
• (NOT, (DIST_5, (OR, _/article:"approved", _/article:"safe", _/article:"use", _/article:"animals"),
• If the article has sections like Abstract or Methods
• AND has phrases around “clinical trials / Humans” and not words like “animals” within 5 words of “clinical trial” words – count it and add up a relevancy score
• Primary issue – major mentions, not every mention
  – Combination of noun phrase extraction and categorization
  – Results – virtually 100%
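The rule on this slide is only shown in part, so the following is a rough Python sketch of the logic as the bullets describe it, not a reconstruction of the vendor rule language. It assumes START_2000 means "look only at the first 2000 characters" and DIST_5 means "within 5 words"; the function name and sample texts are invented.

```python
import re

# Words that disqualify a nearby mention, taken from the NOT clause on the slide.
EXCLUDE = {"approved", "safe", "use", "animals"}

def score_human_trial(text, window=2000, dist=5):
    """Count 'clinical trial*' / 'humans' mentions with no excluded word nearby."""
    head = text[:window].lower()  # assumption: START_2000 = first 2000 characters
    # Require a section marker such as [Abstract] or [Methods].
    if "[abstract]" not in head and "[methods]" not in head:
        return 0
    words = re.findall(r"[a-z]+", head)
    score = 0
    for i, w in enumerate(words):
        is_trial = w == "clinical" and i + 1 < len(words) and words[i + 1].startswith("trial")
        if not (is_trial or w == "humans"):
            continue
        nearby = words[max(0, i - dist): i + dist + 1]  # assumption: DIST_5 = 5 words either side
        if EXCLUDE.isdisjoint(nearby):
            score += 1  # add up a relevancy score, one point per clean mention
    return score

human = "[Abstract] We report a randomized clinical trial in adult patients. [Methods] Healthy humans were enrolled."
animal = "[Abstract] The clinical trial in animals was approved."
print(score_human_trial(human), score_human_trial(animal))
```

Scoring mentions rather than returning a yes/no answer matches the slide's point that what matters is major mentions, not every mention.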

13 Need One More Piece: Smart Semantic Infrastructure
• Integrate entire information life cycle & environment
• Semantic Layer = Content, Taxonomies, Metadata, Vocabularies + Text Analytics
  – Integrated / Federated Search – all content
• Technology Layer – Search, Content Management, SharePoint, Intranets
• People – communities (formal and dynamic), business processes (embedded information needs and behaviors)
• Publishing process – Hybrid human/automatic structure (tagging)
• Feedback is essential – from direct user comments to deep analytics

14 Search Can Work!
• Simple Subject Taxonomy structure
  – Easy to develop and maintain
• Combined with categorization capabilities
  – Added power and intelligence
• Combined with Faceted Metadata
  – Dynamic selection of simple categories
  – Allow multiple user perspectives
    Can’t predict all the ways people think
    Monkey, Banana, Panda
• Combined with ontologies and semantic data
  – Multiple applications – Text mining to Search
• Combined with feedback before and after Search
• ROI is enormous – $7M per 1,000 employees a year

15 Enterprise Text Analytics: Building on the Foundation: Applications
• Focus on business value, cost cutting
• Enhancing information access is a means, not an end
  – Governance, Records Management, Doc duplication, Compliance
  – Business Intelligence, CI, Behavior Prediction
  – eDiscovery, litigation support, Risk Management
  – Productivity / Portals – KM communities & knowledge bases
• Sentiment Analysis, Social Media Analysis
  – Adding Search-based intelligence – context
  – New taxonomies – emotion, Appraisal

16 Beyond Search: Info Apps: Search-based Applications Plus
• Legal Review
  – Significant trend – computer-assisted review
  – TA – categorize and filter to smaller, more relevant set
  – Payoff is big – One firm with 1.6 M docs – saved $2M
• Expertise Location
  – Data (HR) plus text – authored documents – subject & level
• Financial Services
  – Combine structured data (what) and unstructured text (why)
  – Anti-Money Laundering

17 Beyond Search: Info Apps: Behavior Prediction – Telecom Customer Service
• Problem – distinguish customers likely to cancel from mere threats
• Basic Rule
  – (START_20, (AND, (DIST_7,"[cancel]", "[cancel-what-cust]"),
  – (NOT,(DIST_10, "[cancel]", (OR, "[one-line]", "[restore]", "[if]")))))
• Examples:
  – customer called to say he will cancell his account if the does not stop receiving a call from the ad agency.
  – cci and is upset that he has the asl charge and wants it off or her is going to cancel his act
• More sophisticated analysis of text and context in text
• Combine text analytics with Predictive Analytics and traditional behavior monitoring for new applications
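A rough Python sketch of how a proximity rule like the Basic Rule might be evaluated. The bracketed names in the rule refer to vocabularies that the slide does not list, so the word sets below are invented stand-ins, and "[one-line]" is left out for brevity. START_20 is assumed to mean the cancel word must occur in the first 20 words, and DIST_n is assumed to mean within n words.

```python
import re

# Hypothetical vocabularies standing in for the slide's bracketed categories.
CANCEL = {"cancel", "cancell", "cancels", "cancelling", "canceling"}
CANCEL_WHAT = {"account", "acct", "act", "service"}   # "[cancel-what-cust]"
QUALIFIERS = {"if", "restore"}                        # "[if]", "[restore]"

def near(words, i, vocab, dist):
    """True if any word from vocab lies within dist words of position i."""
    lo = max(0, i - dist)
    return any(w in vocab for w in words[lo:i + dist + 1])

def likely_to_cancel(note, start=20):
    """Flag a note when a cancel word is near its object and not near a qualifier."""
    words = re.findall(r"[a-z']+", note.lower())
    for i, w in enumerate(words[:start]):
        if w in CANCEL and near(words, i, CANCEL_WHAT, 7) and not near(words, i, QUALIFIERS, 10):
            return True
    return False

threat = ("customer called to say he will cancell his account if the does not "
          "stop receiving a call from the ad agency.")
intent = ("cci and is upset that he has the asl charge and wants it off or her "
          "is going to cancel his act")
print(likely_to_cancel(threat), likely_to_cancel(intent))
```

Under these assumed vocabularies the sketch treats the first example as a conditional threat, because "if" sits close to the cancel word, and flags the second. That describes this sketch only, not the results of the production rule.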

18 Beyond Search: Info Apps: Pronoun Analysis: Fraud Detection – Enron Emails
• Patterns of “Function” words reveal wide range of insights
• Function words = pronouns, articles, prepositions, conjunctions, etc.
  – Used at a high rate, short and hard to detect, very social, processed in the brain differently than content words
• Areas: sex, age, power-status, personality – individuals and groups
• Lying / Fraud detection: Documents with lies have
  – Fewer and shorter words, fewer conjunctions, more positive emotion words
  – More use of “if, any, those, he, she, they, you”, less “I”
  – More social and causal words, more discrepancy words
• Current research – 76% accuracy in some contexts
• Text Analytics can improve accuracy and utilize new sources
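As an illustration of the kind of features this slide lists, the sketch below computes a few function-word rates for a piece of text. It is only a feature extractor, not a lie detector: the function name and the sample sentence are invented, and turning these rates into a prediction would need a trained model that the slide does not describe.

```python
import re
from collections import Counter

# Marker words the slide says appear more often in documents containing lies.
MARKERS = {"if", "any", "those", "he", "she", "they", "you"}

def function_word_profile(text):
    """Return simple per-word rates for marker words, 'I', and word length."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return {
        "marker_rate": sum(counts[w] for w in MARKERS) / total,
        "first_person_rate": counts["i"] / total,
        "mean_word_length": sum(map(len, words)) / total,
    }

profile = function_word_profile(
    "I think they said you could go if any of those reports are ready"
)
print(profile)
```

Rates like these would be compared across many documents or authors; a single message says little on its own.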

19 Conclusions
• Traditional Search improvements – nice, but
• Relevance needs meaning; keywords and human tagging don’t work
• Search + Text Analytics + Semantic Infrastructure work
• Text Analytics is THE essential component of a multi-modal solution
• Semantic Infrastructure – Content, People, Technology, Processes
  – Integration of text analytics, search, content management
  – Hybrid Model of tagging – best of human & machine
• Smart Search as foundation for new universe of Apps
• = Success beyond your wildest dreams!

20 Conclusions
• Now You Believe!
• So, what next – how can you get started?
• Quick Start – software evaluation, Knowledge Map, POC or Pilot = Good choice and Learn by doing
• Fall – Attend ESS, TBC, KMWorld – latest ideas
• Or develop a time machine and go back to yesterday and take my workshop
• Fall 2014 – early 2015: New Book:
  – Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
  – Title might be shorter but it will cover all you need to know

21 Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

