© 2008 IBM Corporation ® eMail and Records Management with IBM Classification Module Jon Dellaria, IBM Certified ECM Information Technology Specialist.


1 © 2008 IBM Corporation ® eMail and Records Management with IBM Classification Module. Jon Dellaria, IBM Certified ECM Information Technology Specialist

2 © 2008 IBM Corporation Information Management software | Enterprise Content Management What is Classification? Definition: Class.i.fic.a.tion [klas-uh-fi-key-shuhn] – n – the act of assigning an element (a document for example) to a category.

3 IBM – Leadership in Text Analysis and Classification
• IBM has a 50+ year history in text analysis and discovery
– As early as 1957, IBM published pioneering research on text classification (and related topics such as text search and automatic creation of text abstracts)
• IBM invests ~$50M annually in research and development for search and text analytics
– 200 people actively engaged in R&D
– IBM holds over 200 patents in information access, with more each year

4 Options for Implementing the Classification Process
[Chart: classification methods compared on Cost Savings, Productivity, and Accuracy, each from Low to High. Methods shown: Manual Classification; Authoring Templates; Rules Based Classification (Simple Rules, Complex Policies); Context Based Classification; Multiple Methods. Annotation: Consistent Participation & Enforcement]

5 IBM Classification Module: Implementing the classification process in ECM & more
• Intelligent application of policies via automatic, advanced classification
• Combines the best automatic methods: context-sensitive and rule-based
• Flexible automation levels accelerate adoption and acceptance
• Incorporates user feedback in real time to improve understanding
• Integrated with the IBM ECM architecture or usable as a free-standing service
• 12 languages – and 3 more on the way!

6 Advanced Classification is Key to Compliant Information Management

7 Advanced Classification: The Facts
Facts
1. Unstructured content makes up 80% of the volume of information in the average enterprise, and that segment is growing 30% annually
2. Business users find forced manual classification “burdensome” and at least 50% will not participate
3. Humans provide, at best, marginally better accuracy in executing classification in controlled tests
4. Every manual classification forced on your users will cost your organization 17 cents in productivity
Implications
1. Deploying an archiving or records management initiative is an increasingly important, large-scale, and difficult problem
2. Results of human-reliant filing are inconsistent and inaccurate, resulting in an effective accuracy of 50% at best
3. Compliance professionals hold the incorrect assumption that humans are the best option for piece-by-piece decision-making
4. Widespread adoption of archiving or records management in your organization will lead to large, measurable productivity loss

8 Critical Dimensions of Classification
[Chart: Manual vs. Automated classification compared on Cost (per doc), Accuracy, and Consistency under Increasing Volume. Values shown, in slide order: 92%; 50 – 80%; $0.17; < $0.01; <50%; 100%; X; 46%]

9 Participation Impacts Accuracy
• National Archives and Records Administration study
– Electronic Records Management initiative focused on user-driven records declaration
– 6+ month study
– 60% drop-off in participation in the months after training
• End users frequently refuse outright to categorize content
• Manual classification with an emphasis on “user training” is outdated and provides inconsistent, inaccurate results
[Chart: Participation in Manual Filing, by Month]
Inconsistent participation from humans is the critical factor in evaluating different classification methods

10 Manual Classification
[Images: with paper; with rudimentary electronics; with today’s advanced electronics]

11 Rules-based Classification
Simple Rules:
• Does the body contain the phrase “sure thing”?
• Did the CFO send the email?
Metadata extraction:
• Does the body of the email have anything that matches the pattern “XXX-YY-ZZZZ”?
Complex Policies:
• Does the body contain the phrase “sure thing” in the same sentence as “stock”?
• Does the sender belong to the “broker” group, and was the email sent externally using the phrase “sure thing” in the body?
Sample email:
To: Bob Smith
From: Bill Roker
Subject: Market Movement
Bob, Hope you’re doing well. I’ve got a sure thing going with the stock we spoke about on the phone. I think it’s time to pull the trigger for my client. The client’s name is John Doe. His social is [omitted]. He’s totally on board and he’s excited to take advantage of this new offer. Talk to you tomorrow, Bill
Bill Roker Financial Advisors, Inc.
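The rule types on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how IBM Classification Module implements rules. The `Email` class, the function names, the broker group, and the internal domain are all hypothetical, and the pattern “XXX-YY-ZZZZ” is assumed to mean three digits, two digits, four digits.

```python
import re
from dataclasses import dataclass

# Hypothetical message model; field names are assumptions for illustration.
@dataclass
class Email:
    sender: str
    recipients: list
    body: str

# Metadata extraction: "XXX-YY-ZZZZ" read as a 3-2-4 digit pattern (assumption).
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Assumed group membership and internal domain; neither comes from the slides.
BROKER_GROUP = {"bill.roker@example.com"}
INTERNAL_DOMAIN = "example.com"

def sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def has_phrase(email, phrase="sure thing"):
    # Simple rule: does the body contain the phrase?
    return phrase in email.body.lower()

def extract_ssn_like(email):
    # Metadata extraction: return every substring matching the pattern.
    return SSN_LIKE.findall(email.body)

def phrase_and_word_in_same_sentence(email, phrase="sure thing", word="stock"):
    # Complex policy: the phrase and the word must share a sentence.
    word_re = re.compile(rf"\b{re.escape(word)}\b")
    return any(phrase in s.lower() and word_re.search(s.lower()) is not None
               for s in sentences(email.body))

def broker_sent_externally_with_phrase(email):
    # Complex policy: sender is in the broker group, at least one recipient
    # is outside the internal domain, and the body contains the phrase.
    external = any(not r.lower().endswith("@" + INTERNAL_DOMAIN)
                   for r in email.recipients)
    return email.sender.lower() in BROKER_GROUP and external and has_phrase(email)
```

Each function answers one of the yes/no questions above, so a policy is just a combination of them. The splitter is deliberately crude; abbreviations such as “Inc.” would be treated as sentence ends.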

12 Rule-based Classification’s Achilles’ Heel: Rule Maintenance, Accuracy and Cost
[Chart: Accuracy over Time. Annotations: Changes in business; Effort to adjust rules to new environment]

13 Context Sensitive Classification
[Diagram: Unclassified text; Statistic-Based Categorization; Category 1, Category 2, Category 3]

14 Context Sensitive Classification
Simple rules or keyword-based analysis can be too coarse to make fine distinctions between long-form texts with very different intent
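The slides do not say which statistical method IBM Classification Module uses. As a stand-in, the sketch below uses a multinomial naive Bayes classifier with add-one smoothing to show how unclassified text can be scored against categories learned from examples. The class name, method names, and any training text are assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase word tokens; deliberately simple.
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesSketch:
    """Stand-in statistical categorizer (not IBM's algorithm)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training documents
        self.vocabulary = set()

    def train(self, text, category):
        words = tokenize(text)
        self.word_counts[category].update(words)
        self.doc_counts[category] += 1
        self.vocabulary.update(words)

    def scores(self, text):
        # Returns a normalized score per category (scores sum to 1).
        # Assumes train() has been called at least once.
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        log_scores = {}
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            total_words = sum(counts.values())
            log_p = math.log(n_docs / total_docs)  # category prior
            for w in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_p += math.log((counts[w] + 1) / (total_words + vocab_size))
            log_scores[category] = log_p
        peak = max(log_scores.values())
        exp = {c: math.exp(s - peak) for c, s in log_scores.items()}
        norm = sum(exp.values())
        return {c: v / norm for c, v in exp.items()}

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)
```

Because every word in the text contributes to the score, two long messages that share a keyword can still land in different categories, which is the distinction this slide says simple keyword rules miss.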

15 Choosing the Right Classification Method
• Combined approaches provide the maximum accuracy from automation, at a slight productivity cost
• Automated methods slash costs
• Manual methods have high costs associated with them
• Manual methods suffer from lack of participation, hampering their overall viability
[Chart: classification methods compared on Cost Savings, Productivity, and Accuracy, each from Low to High. Methods shown: Manual Classification; Authoring Templates; Rules Based Classification (Simple Rules, Complex Policies); Context Based Classification; Multiple Methods. Annotation: Consistent Participation & Enforcement]

16 Enterprise Compliance Vision
[Diagram: IBM ECM. Components: Records Management; Electronic Discovery; Advanced Classification; Content Collection. Caption: Integrated Agile ECM Platform for Compliant Information Management]

17 Reclassification & Records Management
[Diagram elements: ECM Repository; IBM Classification Module; Review & Audit; Records Management with file plans for Legal, Marketing, Finance, Research & Development, ...]

18 US Army eMail and Records Manager Pilot
GOAL
• Provide a means to address the Army’s requirement for the successful records management of email
– Challenges faced:
Lack of records management follow-through from end users
Need to capture records and transactional activities from email
Need to capture email records without user intervention

19 US Army eMail and Records Manager Pilot
Success criteria for the pilot:
– Correctly capture and retrieve the email provided
– Ensure information is secure
– Determine whether email can be accurately auto-categorized by the IBM Categorization Module (ICM)
Goal of 90% or better accuracy
Show how ICM learns and improves accuracy over time
– Place categorized email records under the correct Army records disposition

20 Army Pilot Concept of Operations (CONOPS)

21 Concept of Operations
Tasks (tracked across Phase I, Phase II, and Phase III):
• Identification of records categories
• Delivery of .pst files
• Organization of .pst files to build the knowledge base
• Ingesting of emails – build corpus
• Ingesting of emails – auto cat runs
• Auditing complete

22 Pilot Phases
• Pre-Phase Activity
– Teach the system by building the knowledge base (Corpus)
• Phase I
– Process the first run of sample .pst files
– Review and audit the results
• Phase II (30 days later)
– Process the second run of sample .pst files
– Review and audit the results
• Phase III (30 days later)
– Process the third run of sample .pst files
– Review and audit the results

23 Knowledge Base (Corpus) Training
[Diagram elements: PST inboxes for User 1, User 2, ... User n; Army Records Managers; inboxes organized into record categories: Marketing, Legal, Finance, R&D, ...]

24 Outlook Configuration

25 Building the Knowledge Base for Categorization

26 Reports

27 Training Knowledge Base – The Results
[Charts: Raw Data; Adjusted Data]

28 Pilot Project Pre-Phase Activities
Build Categorization Knowledge Base
Work with Army Records Managers to define the most appropriate records categories and identify example emails for them
• Goal:
– Find examples of records for each of the record categories
– Find 15 – 20 examples for each category
• Results:
– 54 records categories were identified as being associated with the assigned offices
28 categories have 15 or more examples
26 categories have 14 or fewer examples
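The coverage check behind these results (which categories reached the 15-example floor and which did not) can be sketched as follows. This is a minimal illustration, not the pilot’s tooling; the function name, the `(text, category)` tuple format, and the sample data in the usage note are assumptions.

```python
from collections import Counter

MIN_EXAMPLES = 15  # lower bound of the 15 – 20 example target on the slide

def coverage(labeled_examples, categories, min_examples=MIN_EXAMPLES):
    """Split categories into those with enough training examples and those without.

    labeled_examples: iterable of (text, category) pairs (assumed format).
    categories: every category in the file plan, including ones with no examples.
    """
    counts = Counter(category for _, category in labeled_examples)
    ready = [c for c in categories if counts[c] >= min_examples]
    short = [c for c in categories if counts[c] < min_examples]
    return ready, short
```

For instance, `coverage([("memo", "Budget")] * 15, ["Budget", "Personnel"])` returns `(["Budget"], ["Personnel"])`. Passing the full category list matters because a category with no examples at all would otherwise go unreported.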

29 Army Pilot Phase I – III Auto Categorization Steps
[Diagram elements: .PST Files; P8 ‘InBox’ Folder; IBM Categorization Module; Army Records Manager; Review & Audit; IBM P8 Manager; Records Management; Search Engine; Archive]
Record categories shown:
• 690 (Personnel)
• 37 (Budget and Resource Management)
• 25-30y (Publication Reports)
• 1hh (Temporary Duty)
• ...
• Spam and Non Records (Retention: 90 days)

30 Pilot Project Phase I – III Activities
First Pass of Categorization (process .pst files)
• Take the knowledgebase created by Army Records Managers and apply it to the bulk of the email
• Measure the categorization results returned and begin the Audit and Review process
Audit and Review process
• Audit – Used to confirm the accuracy of categorization via a random sampling of categorized results. If necessary, the chosen category may be modified, which serves to retrain the knowledgebase for the future
• Review – Items that do not meet the defined thresholds for categorization are available for further analysis and categorization by records personnel
• The result of Audit and Review is improved accuracy of the knowledgebase and therefore improved categorization for future ingest
Post Audit/Review reprocessing of email to measure categorization improvements
• Measure results at the completion of each phase
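The Audit and Review routing described on this slide can be sketched as below. This is an illustration under stated assumptions, not the pilot’s implementation: the function names, the 0.8 threshold, the 5% audit rate, and the `(item_id, category, score)` tuple format are all hypothetical.

```python
import random

def route(results, threshold=0.8, audit_rate=0.05, seed=0):
    """Split scored results into categorized, audit-sample, and review lists.

    results: list of (item_id, category, score) tuples (assumed format).
    Items scoring below the threshold go to Review for records personnel.
    A random sample of the categorized items goes to Audit.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    categorized, review = [], []
    for item_id, category, score in results:
        if score >= threshold:
            categorized.append((item_id, category))
        else:
            review.append((item_id, category))
    sample_size = 0
    if categorized:
        # Audit at least one item whenever anything was categorized.
        sample_size = min(len(categorized),
                          max(1, round(len(categorized) * audit_rate)))
    audit = rng.sample(categorized, sample_size)
    return categorized, audit, review

def feedback(corrections, texts):
    """Turn auditor or reviewer decisions into new training examples.

    corrections: {item_id: corrected_category}; texts: {item_id: body}.
    The returned (text, category) pairs would be fed back to the knowledgebase.
    """
    return [(texts[item_id], category)
            for item_id, category in corrections.items()]
```

The two lists serve different purposes: Review handles items the classifier was unsure about, while Audit checks items it was confident about, and both produce corrections that retrain the knowledgebase.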

31 Pilot Project Activities
• Focus on email from 16 different offices across the Army
Demonstrate the ability to categorize emails across the Army enterprise
• PST files from 398 pre-selected users
581,634 emails in total in Phase I
581,256 emails in total in Phase II
735,333 emails in total in Phase III
1,898,232 total emails through Phase III
• PST files transferred to the pilot system via secure connection
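A quick arithmetic check of the volumes on this slide, using only the figures shown. The three per-phase counts sum to 1,898,223, which is 9 fewer than the stated total of 1,898,232. The slide does not indicate which figure is off, so both are kept as given.

```python
# Per-phase email counts and the stated total, as printed on the slide.
phase_counts = {"Phase I": 581_634, "Phase II": 581_256, "Phase III": 735_333}
stated_total = 1_898_232

computed_total = sum(phase_counts.values())
difference = stated_total - computed_total

print(f"Computed total: {computed_total:,}")  # Computed total: 1,898,223
print(f"Stated total:   {stated_total:,}")    # Stated total:   1,898,232
print(f"Difference:     {difference}")        # Difference:     9
```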

32 Categorization Results
Phase I Categorization Results
First Pass: Total Categorized 84.5%, Total Not Categorized 15.5%
Post Audit/Review: Total Categorized 98.8%, Total Not Categorized 1.2%
Phase II Categorization Results
First Pass: Total Categorized 99.01%, Total Not Categorized .9%
Post Audit/Review: Total Categorized 99.9%, Total Not Categorized .1%
Phase III Categorization Results
First Pass: Total Categorized 98.4%, Total Not Categorized 1.6%
Post Audit/Review: Total Categorized 99.9%, Total Not Categorized .1%

33 Army Records Manager Observations
• As a records manager with a 25-year background in federal and civilian records management, I believe the automatic categorization of information is the next logical evolution in managing the records of an organization.
• The classifier correctly identifies categories of records based on information from office file plans. Since office file plans are incorporated within an agency records manual, the initial input for the system is nominal. The office file plan becomes the document classifier.
• Because the classifier retains information on document retrieval activity, it may be appropriate for use in many other information management program areas, including the Freedom of Information and Privacy Act.

34 Demo

35 Thank You

36 IBM Records Manager with Army File Plan

