
1 © 2007 IBM Corporation
The Challenges of Building Enterprise Content Taxonomies and the Role of Classification Technologies in Maintaining Their Effectiveness
Reginald J. Twigg, Ph.D.
Capture, Classification and Taxonomy, IBM ECM

2 © 2008 IBM Corporation Information Management Software | Enterprise Content Management
Agenda
The Challenge of Unstructured Content
Key Concepts and Terms
Taxonomy, Classification and ECM Adoption
Classification Technologies for ECM

3 The Challenge of Managing Unstructured Content

4 80% of Enterprise Data is Unstructured
Databases; Billing statements; Claims images; Customer correspondence; Mortgage docs; Contracts; Signed BOLs; Healthcare EOBs; Marketing collateral; Website content; Voice authorizations; Signature cards; Credit enrollments; Material Safety Data Sheets; ISO 9000 docs; Plant schematics; Product images; Spec sheets; ...and much more!

5 What is Enterprise Content?

6 Where do I start?
Organizing the explosion of unstructured content becomes critical:
We've got 600 GB of content from basic content services all over the enterprise. How can we get this content efficiently mapped into our ECM taxonomy?
We've been managing our content without classifying it for a few years now. How can our users navigate amongst this existing content in a way that's intuitive for our business?
The lawyers have to review 400,000 electronic documents for their case. How can we make sure they don't waste their time?

7 Key Business Drivers: Business Value of Classification for ECM
In Process Classification: Increase worker productivity and automate content related decisions
–Ad Hoc Category Suggestion
–Content-Based Workflow Selection
–Content Based Decision Making
ECM Taxonomy and Classification: Increase accessibility of content under management
–Automated, High Scale Classification
–Classify at ingestion and/or re-classify over time
–Taxonomy Evolution Tools
–Enhanced Accessibility
–Taxonomy Proposer
Compliance, Records, Legal Discovery: Increase legal discovery review effectiveness while reducing risk
–Legal Discovery Prioritization and Workflow Assignment
–Records Classification and Exception Handling
–Storage and Retention Policy Assignment
Message Tagging, Classification and Monitoring: Reduce inquiry costs, automate message routing and increase customer satisfaction
–Email, Chat Routing
–Agent Response Suggestion
–Supervision and Monitoring
–Automatic Customer Response

8 Ability to Structure Content with Databases
[Chart: percent of corporate information value managed in traditional databases, plotted against data creation and demand, for structured data and unstructured data. Application types range from OLTP and BI (narrow scope) to Compliance, Competitive Intelligence (wide scope). Source: Gartner]

9 Multiple Repositories Make Access Difficult
[Chart: share of respondents by number of content repositories. Categories: 1 repository; 2-5 repositories; 6-10 repositories; [...] repositories; More than 15 repositories; Don't know. Values shown: 36%, 14%, 25%, 17%, 5%, 4%.]
Base: 81 North American decision-makers (multiple responses accepted)
Source: The Future of Content in the Enterprise, Connie Moore and Robert Markham

10 And Then There's SharePoint, File Shares and...

11 Key Concepts and Terms

12 Key Concepts
Metadata: a means of describing, locating, cataloging, and activating content as objects in a software ecosystem (literally, data about data).
Enterprise Catalog: a centralized and normalized metadata model for unstructured content for the purposes of providing consistent services across all ECM applications.
Taxonomy: a hierarchical structure of information components, any part of which can be used to classify a content item in relation to other items in the structure.
Classification: a coding of content items as members of a group for the purposes of cataloging them or associating them with a taxonomy.
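The taxonomy and classification definitions above can be pictured as a small data structure. The following is a minimal sketch, not anything taken from the presentation or from an IBM product: the `TaxonomyNode` class, the category names and the `is_classified` helper are all hypothetical and serve only to show a hierarchy whose nodes can be referenced from an item's metadata.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One category in a hierarchical taxonomy (hypothetical structure)."""
    name: str
    children: dict = field(default_factory=dict)

    def add_path(self, path):
        """Create (or reuse) the nodes along a list of category names."""
        node = self
        for part in path:
            node = node.children.setdefault(part, TaxonomyNode(part))
        return node

    def find(self, path):
        """Return the node at `path`, or None if it is not in the taxonomy."""
        node = self
        for part in path:
            node = node.children.get(part)
            if node is None:
                return None
        return node


# Illustrative categories only; they are assumptions, not the deck's taxonomy.
root = TaxonomyNode("Enterprise Content")
root.add_path(["Finance", "Billing statements"])
root.add_path(["Legal", "Contracts"])

# Metadata for one content item; classification stores a taxonomy path.
item = {"title": "Master services agreement", "category": ["Legal", "Contracts"]}


def is_classified(item, taxonomy):
    """True if the item's category path exists in the taxonomy."""
    return taxonomy.find(item["category"]) is not None


print(is_classified(item, root))
```

Storing the full path in the item's metadata keeps the item meaningful "in relation to other items in the structure": siblings and parents can be found by walking the same tree.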

13 Taxonomy Is...
Not turning animals into trophies
A system for organizing the corpus of business content

14 Taxonomy and Classification in ECM
Classification Examples:
–Document Classing
–Foldering
Taxonomy Examples:
–Enterprise Content Catalog
–Industry Standard Document Taxonomies (ISO, XMI)
Methods:
–Rules-Based: Applies predetermined if-then rules to classify text and properties
–Analytics-Based: Applies algorithms to interpret classes in order to apply classification rules to them
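As an illustration of the rules-based method, the sketch below applies if-then rules to a document's text and properties and returns the first matching class. The rule set, the property names (`department`, `extension`) and the class names are assumptions made up for this example; they do not come from the presentation.

```python
import re

# Hypothetical rules: (document class, predicate over text and properties).
# Rules are checked in order, so the first match wins.
RULES = [
    ("Invoice",
     lambda text, props: re.search(r"\binvoice\b", text, re.I) is not None),
    ("Contract",
     lambda text, props: props.get("department") == "Legal"
     and "agreement" in text.lower()),
    ("Spec sheet",
     lambda text, props: props.get("extension") == ".pdf"
     and "specification" in text.lower()),
]


def classify_by_rules(text, props):
    """Return the first document class whose rule fires, else None."""
    for doc_class, rule in RULES:
        if rule(text, props):
            return doc_class
    return None


print(classify_by_rules("Master agreement, rev 2", {"department": "Legal"}))
```

A document that matches no rule comes back as `None`, which is the gap that the analytics-based method is meant to cover.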

15 ECM Taxonomy Illustrated

16 Taxonomy, Classification and ECM Adoption

17 Drive New Business Value from Content
Content Classification Solutions
Improve Content Access
Organize Unstructured Content
Derive Business Insight

18 Business Drivers for ECM Taxonomy Management
Proliferating departmental solutions
–Content Management
–Collaboration (SharePoint, Quickr, Team Rooms, Wikis)
User-based classification and high workforce turnover
–Productivity declines as knowledge disappears
–Legal discovery is a secondary concern
Mergers and Acquisitions: need to reconcile disparate content management practices, repositories and processes

19 Classification is Hard Work
Key Business Challenges: ECM Taxonomy and Classification
Most organizations face content taxonomy pain, especially as they standardize around ECM
–Mapping content to taxonomy during ingestion
–Reclassifying content under management
–Evolving taxonomies as new types of content emerge
–Integrating folksonomies (SharePoint) into a master taxonomy
Increase accessibility of content under management
–Automated, High Scale Classification
–Classify at ingestion and/or re-classify over time
–Taxonomy Evolution Tools
–Enhanced Accessibility
–Taxonomy Proposer

20 Organization is the Root Cause
Most organizations face content taxonomy barriers, especially as they standardize around ECM
–Assigning categories en masse
–Reclassifying existing content as taxonomies evolve
–Merging taxonomies
–Integrating the wisdom of folksonomies

21 Challenges and Impacts of Merging Taxonomies
Misclassification: change is constant, and master taxonomies must manage multiple custom taxonomies for each content source
Folksonomies from departmental collaboration solutions are created by users and unmanaged by ECM standards
Impact:
–Unreliable Metadata: Inconsistencies lose or mislabel content
–Process Misfires: Poor metadata triggers incorrect events and workflows
Scale is the Challenge; Automation is Essential

22 LOB Systems Evolution Over Time: Classification Barriers to ECM Maturity
[Diagram: maturity levels plotted by Scale / Scope (from LOB and Departmental to Truly Enterprise Class Systems) against Technology & Capabilities Maturity]
Level 1, Chaos (Paper, Stand-alone, Shared Drive): Ad-hoc usage; No re-use; No workflow; End-user Driven; Undefined Responsibility; File system security
Level 2, Silos & Storage (Multiple disparate content repositories): Ad-hoc usage; Fragmented security; Little re-use; Application based, siloed workflow; End-user Driven; Loosely defined responsibility; No re-use across departments
Level 3, Search & Discovery (Unified Repository): Ad-hoc usage; Normalized security; Some re-use; Directory Server based security; End-user Driven; Loosely defined responsibility; Ability to find content across departments
Level 4, Federation & Activation (Content/Process Fusion): End-user and Process Driven; Normalized and federated security; Re-use enabled; Ability to find and use content across departments
Level 5, Federated Activation & Policy Based (Active enforcement of Compliance; Integrated Content, Process and Compliance capabilities; Available as a service; Federated): Federated Retention; Zero-click Policy based Compliance; Active Storage And Retrieval; End-user and Process Driven; Normalized and federated security; Re-use enabled; Ability to find and use content across departments
Classification hurdle #1: Ingestion
Classification hurdle #2: Standardization
Classification hurdle #3: Enforcement

23 Lessons Learned From ERP Adoption
Getting Classification Right: "Garbage in = garbage out" is often used in metadata management projects to describe the problem of building a metadata model on inconsistent sources.
Driving Process on Taxonomies: ERP systems depend on 3 master taxonomies: material, vendor and customer. These taxonomies drive events, workflow definition and the development of transaction-centric business process applications.
Mastering Metadata: The ability to deploy new enterprise applications depends upon the re-usability, scalability and integrity of the metadata model.
System of Record is Required for Standardization:
–Establishes an enterprise standard that can be audited
–Forms the foundation for building demonstrable best practices
–Enforces consistency of data capture and output

24 Customer Lessons for Mastering ECM Taxonomies
Master taxonomy of record required for
–Compliance
–Business process applications
Merged master taxonomies become large and unwieldy
–Multiple taxonomies require integration and translation
–Centralized, decentralized, or hybrid?
Intelligent Classification is increasingly used to manage:
–Taxonomy merging from multiple use cases
–Taxonomy/folksonomy translation from distributed content sources
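One simple way to picture taxonomy/folksonomy translation is a lookup that maps free-form user tags onto master taxonomy categories and sets aside anything it cannot map. The sketch below is only an assumption about what such a translation step could look like; the category names, the `SYNONYMS` table and the `translate_tags` function are hypothetical, and a real system would presumably learn these mappings rather than hard-code them.

```python
# Hypothetical master taxonomy categories.
MASTER_TAXONOMY = {"Legal/Contracts", "Finance/Invoices", "Marketing/Collateral"}

# Hypothetical mapping from user-created tags to master categories.
SYNONYMS = {
    "contract": "Legal/Contracts",
    "contracts": "Legal/Contracts",
    "msa": "Legal/Contracts",
    "invoice": "Finance/Invoices",
    "bills": "Finance/Invoices",
    "brochure": "Marketing/Collateral",
}


def translate_tags(tags):
    """Map folksonomy tags to master categories; collect tags with no mapping."""
    mapped, unmapped = set(), []
    for tag in tags:
        category = SYNONYMS.get(tag.strip().lower())
        if category in MASTER_TAXONOMY:
            mapped.add(category)
        else:
            unmapped.append(tag)  # candidates for review or for a new category
    return sorted(mapped), unmapped


print(translate_tags([" MSA", "Invoice", "q3-offsite"]))
```

The unmapped list matters as much as the mapped one: it is where new categories or missing synonyms surface as the folksonomy drifts.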

25 A Look at ECM Classification Technologies

26 State of Classification Management Technologies
ECM Classification/Taxonomy is an emerging discipline
–Industry standard taxonomies: focus on business function or transaction types; have not reached the enterprise level
–Classification best practices: content ingestion, application development, reclassification
Classification software focuses on content ingestion:
–Electronic content (email, Office documents, free-form text)
–Paper content (document images) requires OCR
Search is not enough: must drive value in the business process

27 Criteria For ECM Classification Management Solutions
Integrate with and support the ECM metadata model
Interpret a highly-federated content ecosystem
Go beyond search to catalog and manage content
Build on advanced analytic technologies; rules alone are not enough
–Interpret content to extract meaningful (meta)data
–Employ multiple methods (engines) for classification
–Integrate teaching/learning

28 Common Platform for Electronic Content Classification
Classification Platform
–Queue Classification and Monitoring
–In Process Classification
–ECM Taxonomy and Classification
–Compliance, Records, Legal Discovery

29 IBM Classification/Taxonomy Strategy for ECM
Enterprise Services for Active Content
Classification/Reclassification at Capture
–File Shares
–SharePoint, Quickr
–Federated Repositories
Taxonomy Management for
–Exposing P8 taxonomies (Enterprise Manager) for classifying enterprise content
–Extending taxonomies as enabling services for Content-Centric BPM Applications
Establish System of Record for Master Content Management

30 IBM Classification Module for Electronic Content
Organize your ECM content
Automated classification and filtering
Combines text analytics understanding with rules
Acquires domain specificity from your own content
Unique learning technology for adaptive classification
Suggests new categories or even seeds an entirely new taxonomy
Rectifies conflicting taxonomies
Market proven, scalable platform

31 Understanding Content with Text Analytics
[Diagram: a categorized corpus is used for training (teach) the Classification Engine; the engine returns a list of matching categories with relevancies (scores); audit and feedback flow back into training]
Corpus (Categorized):
–A: IP is essential
–A: Legal is currently requiring full approval
–B: Engineering requires clear requirements
–C: Strategy is Important to the marketing team
Feedback:
–C: The core market for this new product has been defined as such by IBM
Input text: The strategic value of this market is paramount to IBM
Matching categories and relevancies: C: 97%, B: 54%, A: 12%
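The teach, classify and feedback loop on this slide can be imitated with a very small bag-of-words classifier. The sketch below is a stand-in under stated assumptions, not the engine IBM describes: it scores categories by cosine similarity over word counts, the `ToyClassifier` class and the stopword list are invented for the example, and its scores will not match the percentages shown on the slide. It reuses the slide's example sentences to show how one piece of feedback changes the ranking.

```python
import math
import re
from collections import Counter, defaultdict

# Small illustrative stopword list (an assumption, not from the slide).
STOPWORDS = {"the", "is", "to", "of", "this", "for", "as", "by",
             "has", "been", "such"}


def tokens(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower())
            if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyClassifier:
    """Keeps one word-count profile per category."""

    def __init__(self):
        self.term_counts = defaultdict(Counter)

    def teach(self, category, text):
        """Add a categorized example (initial training or later feedback)."""
        self.term_counts[category].update(tokens(text))

    def classify(self, text):
        """Return (category, score) pairs, best match first."""
        query = Counter(tokens(text))
        scores = {c: cosine(query, counts)
                  for c, counts in self.term_counts.items()}
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


clf = ToyClassifier()
clf.teach("A", "IP is essential")
clf.teach("A", "Legal is currently requiring full approval")
clf.teach("B", "Engineering requires clear requirements")
clf.teach("C", "Strategy is Important to the marketing team")

text = "The strategic value of this market is paramount to IBM"
before = clf.classify(text)

# Feedback: a reviewer confirms one more example for category C.
clf.teach("C", "The core market for this new product has been defined as such by IBM")
after = clf.classify(text)
print(after[0][0])
```

Before the feedback example is taught, the input shares no content words with any category in this toy model; afterwards category C ranks first because it now contains "market" and "IBM".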

32 Classification Workflow: Accelerating Content Organization
[Diagram labels: File System; Basic Content Services; Existing Unclassified Managed Content; Classifier; Classification Review Tool; Filter out documents; Automatically categorize majority of content; Send to taxonomy proposer]
Reference: Integration Components
–Classifier (Runtime Application)
–Classification Review (UI)
–Taxonomy Proposer (UI)
–Content Extractor (training based on P8)

33 Components of the Solution for Text Classification
Classifier
–Automatically classifies and filters out documents
–Moves some documents for manual review
Classification Review Tool
–Allows user to manually review documents
Content Extractor
–Extracts content from the ECM system for training
Taxonomy Proposer
–User workflow to identify and name new categories or apply existing taxonomy from P8
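The division of labor between the Classifier and the Classification Review Tool can be sketched as a routing decision on the classifier's top score. This is an assumption about how such routing might work, not a description of the product: the `route` function, the threshold values, and the reading of "filter out" as "no category scores high enough" are all hypothetical.

```python
def route(ranked, accept=0.80, reject=0.20):
    """Decide what to do with one document.

    ranked: list of (category, score) pairs, best match first.
    accept / reject: hypothetical thresholds chosen for illustration.
    """
    if not ranked or ranked[0][1] < reject:
        return ("filter_out", None)          # nothing matches well enough
    category, score = ranked[0]
    if score >= accept:
        return ("auto_classify", category)   # confident: no human needed
    return ("manual_review", category)       # uncertain: suggest and review


print(route([("Contracts", 0.97), ("Invoices", 0.12)]))
print(route([("Contracts", 0.54), ("Invoices", 0.41)]))
print(route([("Contracts", 0.05)]))
```

Raising `accept` sends more documents to reviewers and lowers the risk of misclassification; lowering it automates more of the backlog.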

34 Classification for Paper Documents
Classification of paper documents occurs in the capture process
Use cases for paper document classification
–Recognition using OCR/ICR
–Classification to associate to folders or doc class
–Separation to reduce costs and improve process

35 Three Primary Types of Images: The Document Recognition Problem
[Diagram: Structured, Semi-Structured and Un-Structured images on a scale from Less Advanced to More Advanced]

36 The Document Separation Problem in Image Capture
Where does one document stop and the next begin?
Separation of documents is a significant expense for a high-volume capture system
–Typical structured recognition technologies are not applicable
–Manual insertion of separator sheets is the primary workaround today
–50% of document preparation labor is spent sorting documents and inserting separator pages (source: TAWPI)
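If every page in a scanned batch has already been labeled with a form type and a position (first, middle or last page), the separation question reduces to cutting the page stream at the right labels. The sketch below assumes exactly that input; the `separate` function and the tuple format are hypothetical, and producing the labels in the first place is the hard part that the classification methods address.

```python
def separate(pages):
    """Split a page stream into documents.

    pages: list of (form, position) tuples, where position is
    "first", "middle" or "last" (an assumed labeling scheme).
    """
    documents, current = [], []
    for form, position in pages:
        # A new document starts on a "first" page or when the form changes.
        starts_new = position == "first" or (current and current[-1][0] != form)
        if starts_new and current:
            documents.append(current)
            current = []
        current.append((form, position))
        if position == "last":
            documents.append(current)
            current = []
    if current:
        documents.append(current)  # batch ended mid-document
    return documents


batch = [("X", "first"), ("X", "middle"), ("X", "last"),
         ("Y", "first"), ("Y", "last"),
         ("Z", "first"), ("Z", "middle")]
print(len(separate(batch)))
```

No separator sheets are needed in this scheme; the cut points come entirely from the page labels.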

37 Classification Methods for Paper Content (Images)
Image Classification
–Based on the overall layout and structure of a document
–Includes lines, boxes, logos and placement of text
Text Classification
–Based on detailed analysis of the text content of a page
Rules-Based Classification
–Performed by searching for specific data or keywords
–Independent of layout
Templated Classification
–Determined by the presence of one or more marks, barcodes or items of text in pre-defined locations

38 Waterfall Approach to Classification and Separation
Two-pass system: 1st pass: Classification
–Optimizes performance by using fastest classification techniques first
–Advanced Text Classification final catch-all
[Diagram: per-page results for Image Classification, Rules Based, Text Classification and Barcode Recognition, each shown as N/A or unresolved (?); timing labels of 1 ms, 20 ms, 200 ms and 1000 ms; pages labeled First, Middle or Last page of Form X, Y or Z]
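The waterfall idea (try the fastest technique first, fall through to slower ones, and end with text classification as the catch-all) can be sketched as an ordered list of stages. Everything concrete below is an assumption for illustration: the stage order, the page fields (`barcode`, `layout`, `text`), the lookup tables and the form names are hypothetical, and the slide does not say which timing belongs to which technique.

```python
# Hypothetical lookup tables standing in for real recognizers.
KNOWN_LAYOUTS = {"grid-with-logo": "Form X"}
KEYWORDS = {"claim number": "Form Y"}


def barcode_stage(page):
    """Assumed cheapest stage: a decoded barcode names the form directly."""
    return page.get("barcode")


def image_stage(page):
    """Match the page's overall layout against known layouts."""
    return KNOWN_LAYOUTS.get(page.get("layout"))


def rules_stage(page):
    """Search the text for specific keywords, independent of layout."""
    text = page.get("text", "").lower()
    for keyword, form in KEYWORDS.items():
        if keyword in text:
            return form
    return None


def text_stage(page):
    """Stand-in for advanced text classification, the final catch-all."""
    return "Form Z" if page.get("text") else None


# Ordered from (assumed) fastest to slowest.
WATERFALL = [("barcode", barcode_stage), ("image", image_stage),
             ("rules", rules_stage), ("text", text_stage)]


def classify_page(page):
    """Return (label, stage name) from the first stage that resolves the page."""
    for name, stage in WATERFALL:
        label = stage(page)
        if label is not None:
            return label, name
    return "Unknown", "none"


print(classify_page({"barcode": "Form X", "text": "anything"}))
print(classify_page({"text": "Claim number 42"}))
```

Because each stage returns as soon as it has an answer, the slow stages only run on the pages the fast stages could not resolve.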

39 Why Invest in Automated Classification?
Accelerate the time to value in your investment in ECM
Free up your subject matter experts
Ensure more accurate content catalogs
Make your content easier to find and leverage

40 Summary
1. Accelerate ECM Standardization: Poor content classification undermines ECM value. Maximize your ECM potential and time-to-value with automated classification.
2. Automating Classification Always Pays: Typical employees spend 10 hours/week searching for information. Slash that time and increase productivity.
3. Classification Technologies Automate Classification to Drive Development of Best Practices: IBM Classification Module for IBM FileNet P8. Automatically organizing your content by understanding it.

41 Contact Reggie Twigg for more information or to arrange a

