Text Analytics A Tool for Taxonomy Development Tom Reamy Chief Knowledge Architect KAPS Group Program Chair – Text Analytics World Knowledge Architecture.

1 Text Analytics: A Tool for Taxonomy Development
Tom Reamy, Chief Knowledge Architect, KAPS Group
Program Chair – Text Analytics World
Knowledge Architecture Professional Services
http://www.kapsgroup.com

2 Agenda
• Introduction
• Project: Update ACM taxonomy – after 12+ years
• Information Environment
• Text Mining / Text Analytics
• Multiple Methods / Reports
• Conclusion

3 Introduction: KAPS Group
• Knowledge Architecture Professional Services – Network of Consultants
• Applied Theory – Faceted & emotion taxonomies, natural categories
• Services:
  – Strategy – IM & KM – Text Analytics, Social Media, Integration
  – Taxonomy/Text Analytics, Social Media development, consulting
  – Text Analytics Quick Start – Audit, Evaluation, Pilot
• Partners – Smartlogic, Expert System, SAS, SAP, IBM, FAST, Concept Searching, Attensity, Clarabridge, Lexalytics
• Clients: Genentech, Novartis, Northwestern Mutual Life, Financial Times, Hyatt, Home Depot, Harvard Business Library, British Parliament, Battelle, Amdocs, FDA, GAO, World Bank, Dept. of Transportation, etc.
• Program Chair – Text Analytics World – March 29–April 1 – SF
• Presentations, Articles, White Papers – www.kapsgroup.com
• Current – Book – Text Analytics: How to Conquer Information Overload, Get Real Value from Social Media, and Add Smart Text to Big Data

4 Introduction: Approach
• Is automatic taxonomy development here yet?
  – Not yet
  – But it is getting closer
• Hybrid:
  – Taxonomists, SMEs, database analysts, text analysts
  – Text mining software – basic text analysis – power
  – Text analytics software – brains
• New taxonomy terms & structure
  – Old = indexing, authors adding tags & keywords
  – New = auto-tagging, applications

5 Information Environment
• Existing taxonomy: Computing Classification System (CCS)
• Content:
  – Database export of Guide to the Computing Literature bibliographic records (.txt; approximately 7GB in 58 files)
  – Statistical distribution of CCS categories across the Digital Library and Guide to the Computing Literature (Excel; 4 files)
  – ACM Digital Library full-text files (PDFs and XML metadata, including CCS categories; approximately 170GB in 240,000 files)
  – Ralston Encyclopedia of Computer Science (PDFs and HTML of each article with XML metadata, including CCS categories; approximately 350MB in 1,850 files)

6 Text Analytics in Taxonomy Development
Case Study – Multiple Methods
• Text Mining – terms in documents – frequency, date, source, etc.
  – Text Preparation – create multiple filters
• Quality – important terms, co-occurring terms
• Time savings – only feasible way to scan documents
• Clustering – suggested categories, chunking for editors
  – Clustering within clusters – explore
• Entity Extraction – people, organizations, programming languages, hardware/devices, etc.
• Joint Work Sessions – interactive exploration
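The text-preparation and co-occurring-terms steps on this slide can be illustrated with a minimal Python sketch. It assumes each document is a plain string (for example, a title or abstract from the bibliographic export) and uses a small hypothetical noise-word list; the function names and the tokenizing regular expression are illustrative only, not the tooling used in the project.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical noise-word list; in practice it would be refined
# iteratively from working reports.
NOISE_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}


def prepare(text: str) -> list[str]:
    """Lowercase, split into word tokens (keeping forms like c++), drop noise words."""
    tokens = re.findall(r"[a-z][a-z0-9+#-]*", text.lower())
    return [t for t in tokens if t not in NOISE_WORDS]


def term_frequency(docs: list[str]) -> Counter:
    """Count how often each prepared term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(prepare(doc))
    return counts


def co_occurrence(docs: list[str]) -> Counter:
    """Count unordered term pairs that appear together in the same document."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(prepare(doc)))  # each pair counted once per document
        pairs.update(combinations(terms, 2))
    return pairs
```

Applied to titles such as "Parallel algorithms for graph partitioning" and "Graph partitioning with spectral methods", frequently co-occurring pairs like ("graph", "partitioning") surface as candidate related terms for a taxonomist to review; the counts themselves make no judgment about which terms belong in the taxonomy.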

7 Case Study – Taxonomy Development

11 Case Study – Taxonomy Development

12 Multiple Sets of Reports
• Keyword Frequency
  – First pass – 3,026
  – Total – 508,941 (get from big database)
  – Sub-totals: pre-1998, by year, by 5-year blocks
  – Map to other variables – journals, authors – basis for communities
• Keywords in abstract/title
• Cluster analysis of keyword-abstract-title
• Search terms in keyword-abstract-title
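The keyword sub-totals listed above (pre-1998, by year, by 5-year blocks) amount to grouping keyword counts under a period label. Below is a minimal sketch, assuming each bibliographic record has already been reduced to a hypothetical (year, keywords) pair, and assuming the 5-year blocks are anchored at 1998; the slide does not say where the blocks start.

```python
from collections import Counter, defaultdict


def period_label(year: int, mode: str) -> str:
    """Bucket a publication year as 'pre-1998', the exact year, or a 5-year block."""
    if year < 1998:
        return "pre-1998"
    if mode == "year":
        return str(year)
    if mode == "block5":
        # Assumption: blocks start at 1998 (1998-2002, 2003-2007, ...).
        start = 1998 + ((year - 1998) // 5) * 5
        return f"{start}-{start + 4}"
    raise ValueError(f"unknown mode: {mode}")


def keyword_subtotals(records, mode="block5"):
    """records: iterable of (year, [keywords]) -> {period label: Counter of keywords}."""
    table = defaultdict(Counter)
    for year, keywords in records:
        # Normalize author keywords so "Compilers" and "compilers " count together.
        table[period_label(year, mode)].update(k.strip().lower() for k in keywords)
    return dict(table)
```

Keeping the bucketing in a single function means the same records can be re-sliced by year or by block without rebuilding the report logic, which fits the idea of producing multiple report sets from one export.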

13 Entity Extraction – Company, Internet, Organization, Title
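One simple way to approximate this kind of entity extraction is a gazetteer (dictionary) lookup. This is a swapped-in illustration, not the method of any particular text analytics product; the entity types and names in GAZETTEER are hypothetical examples, and a real extractor would need more than exact-match lists (variant spellings, context rules).

```python
import re

# Hypothetical gazetteer: entity type -> known surface forms.
GAZETTEER = {
    "Organization": ["ACM", "IEEE", "World Wide Web Consortium"],
    "Company": ["IBM", "Microsoft", "Sun Microsystems"],
    "ProgrammingLanguage": ["Java", "C++", "Haskell"],
}


def build_patterns(gazetteer):
    """Compile one alternation per entity type, trying longer names first."""
    patterns = {}
    for etype, names in gazetteer.items():
        alts = [re.escape(n) for n in sorted(names, key=len, reverse=True)]
        # Lookarounds instead of \b so names ending in symbols (C++) still match.
        patterns[etype] = re.compile(r"(?<!\w)(?:" + "|".join(alts) + r")(?!\w)")
    return patterns


PATTERNS = build_patterns(GAZETTEER)


def extract_entities(text):
    """Return (entity_type, surface_form) pairs in order of appearance."""
    found = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((m.start(), etype, m.group(0)))
    return [(etype, surface) for _, etype, surface in sorted(found)]
```

Extracted entities can then be counted per document or per year and mapped against existing taxonomy terms in the same way as keywords.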

14 Multiple Methods – Reports
• Spreadsheets – static reports
• Database query reports – create multiple slices, views, filters
• Working reports – eliminate more noise words
• Multiple mapping – extractions, author tags & keywords
• Map – frequency in abstracts, titles, articles
• Search logs – terms and phrases
• Date ranges – trend reports – per term, new words
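The date-range trend reports (counts per term, new words) can be sketched as two passes over the same hypothetical (year, terms) records: one tallies each term by year, the other finds terms whose first appearance falls inside a chosen window. This is an illustrative sketch, not the report logic used in the project.

```python
from collections import Counter, defaultdict


def term_trends(records):
    """records: iterable of (year, [terms]) -> {term: Counter mapping year -> count}."""
    trends = defaultdict(Counter)
    for year, terms in records:
        for term in terms:
            trends[term.lower()][year] += 1
    return trends


def new_terms(records, start_year, end_year):
    """Sorted terms whose earliest appearance falls within [start_year, end_year]."""
    first_seen = {}
    for year, terms in records:
        for term in terms:
            t = term.lower()
            first_seen[t] = min(year, first_seen.get(t, year))
    return sorted(t for t, y in first_seen.items() if start_year <= y <= end_year)
```

Terms that appear only in recent windows are natural candidates for new taxonomy nodes, while terms whose counts fall off may point to categories that need merging or retiring.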

17 Conclusions
• Auto-taxonomy not here – yet
• Scale requires a semi-automated solution
• Human effort – initial design, text preparation
  – Now would add more auto-categorization
• Human effort – analysis & refinement of queries, text mining, and taxonomy
• Simple taxonomies are better – part of information ecosystem
  – Lower levels of terms – into auto-tagging rules
• Early 2015: New Book:
  – Text Analytics: Everything You Need to Know to Conquer Information Overload, Mine Social Media for Real Value, and Turn Big Text Into Big Data
  – Title might be shorter, but it will cover all you need to know
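The idea of pushing lower-level terms into auto-tagging rules can be sketched as follows: the taxonomy keeps only the upper-level categories, and each category's former narrower terms become match patterns that trigger the tag. The categories, terms, and the min_hits threshold below are hypothetical; real auto-categorization rules are usually richer (weights, proximity, exclusions).

```python
import re

# Hypothetical mini-taxonomy: category -> lower-level terms that are no longer
# browsable nodes but act as evidence for the category.
RULES = {
    "Software Engineering": ["unit testing", "refactoring", "code review"],
    "Computer Networks": ["routing protocol", "tcp", "packet loss"],
}


def compile_rules(rules):
    """Turn each lower-level term into a case-insensitive whole-phrase pattern."""
    return {
        category: [
            re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
            for term in terms
        ]
        for category, terms in rules.items()
    }


COMPILED = compile_rules(RULES)


def auto_tag(text, min_hits=1):
    """Assign every category whose rule terms match at least min_hits times in total."""
    tags = []
    for category, patterns in COMPILED.items():
        hits = sum(len(p.findall(text)) for p in patterns)
        if hits >= min_hits:
            tags.append(category)
    return tags
```

Raising min_hits trades recall for precision; this is the kind of refinement the slide assigns to ongoing human analysis rather than to the software alone.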

18 Questions?
Tom Reamy
tomr@kapsgroup.com
KAPS Group
Knowledge Architecture Professional Services
http://www.kapsgroup.com

