Big Data, Big Records NOVA ARMA NCC-AIIM

1 Big Data, Big Records NOVA ARMA NCC-AIIM
U.S. Department of the Interior, Office of the Chief Information Officer
John Montel, Policy Planning and Management
February 27, 2013
Carrie Mallen, IQ Business Group, eDiscovery Practice

2 Department of the Interior
Cabinet-level agency
14 Bureau Offices
Employs ~70,000 people / 280,000 volunteers
Manages $16.8B operating budget
Manages 500 million acres of surface land
Manages 479 dams and 348 reservoirs
Supplies 30% of the nation's energy production
Produces 55,000 different maps each year
Protects ~500 million recreational and cultural visitors
IT Transformation Goal: 500 Million by 2020
Goal: IT Consolidation
Goal: Information Management
Goal: Infrastructure Reduction
47 Lines of business
5 Strategic mission areas
Manages one-fifth of all U.S. land
2,400 office locations
90 Million visitors
$18.2 Billion in revenue collected from energy, mineral, grazing, timber, recreation, and land sales
1.7 billion acres of the Outer Continental Shelf

3 IT Transformation
Unified Messaging (BisonConnect): Google Apps for Government
Enterprise Information (eERDMS): Enterprise eArchive System, Enterprise Content System, Enterprise Forms System, Enterprise Dashboard System
eERDMS design objectives: XML, NIEM
Driving factors: “Managing Government Records” (November 28, 2011); “M-12-18” (August 24, 2012)

4 eERDMS Program Vision
Provide the Department of the Interior with a single, cohesive, integrated information management program designed to support and manage departmental records related to email, documents, and content in the Cloud.
eERDMS will provide the Department with a single enterprise records management solution for the capture, preservation, and management of electronic and paper-based records.
eERDMS will identify, capture, preserve, and accession records associated with all inbound and outbound email, forms, reports, documents, and other Departmental record assets, so that the Department will have effective access to, and management of, all records.

5 eERDMS Program Objectives
Capture all unified messaging journaled records
Capture all mobile content records
Capture all lines of business records
Capture all business system records
Develop a super bucket records schedule
Develop an online automated litigation hold process
Support Freedom of Information Act requests
Support litigation early case assessment needs
Support Congressional and Department inquiries

6 Program Capabilities
Records Management: DoD 5015 v3
Records, Document and Email Archiving/Journaling
Records and Document Auto Classification
Records and Document Content Management
Records and Document Imaging
Records and Document Management
Records and Document Scanning
Records and Document Workflow
Records and Document Collaborating Workspaces
Records and Document Auditing
Records and Document Advanced Early Case Assessment & Review
Records and Document Mobility Content Management
Section 508 Compliance out of the box
Optional: Advanced Legal Review, Social Media Capture, Management, National Shredding Program & National Digitization Program, Migration Services, and Support Staff Services.
Other areas, such as Fax, are still being evaluated.

7 OMB Directive M-12-18
Requires agencies, to the fullest extent possible, to eliminate paper and use electronic recordkeeping.
Expected benefits:
improved performance and promotion of openness and accountability
further identification and transfer to the National Archives and Records Administration (NARA) of permanently valuable historical records
minimizing costs and operating more efficiently
A driving factor of eERDMS: begins with records; combines RM systems; low risk, high value.

8 eERDMS Environment
Enterprise Content System, Enterprise Forms System, ERA, NIEM XML, Human Resources, Enterprise Records System, Enterprise Dashboard System, Contracts, Security, Personnel, Finance, Programs, Operations, Administration, Logistics, Enterprise Fax System, Enterprise Social System

9 Big Data, Big Business
600+ million emails a year
70 million in January 2013; 100 million estimated for February 2013
1.2B emails received
15.5M records produced a day
22 billion data points generated
5,500+ FOIA cases a year
200+ ongoing litigation cases
100+ million printed pages a year
4,100+ mobile devices
15,000 fax devices
Exabyte / Zettabyte of electronic content

10 Records Management Objectives
Provide the Department with:
a single, simplified, integrated Records Retention Schedule for managing Bureau/Office records
a Retention Schedule based on Lines of Business shared across Bureaus/Offices
a Retention Schedule that reduces the complexity of the existing schedules to allow the use of auto-classification tools for assigning retention periods to Department records
We are integrating knowledge for tomorrow's workforce.
(CM starts.) The information John just provided is background on the creation, management, and rollout of various segments of the program. Now we will change the focus to managing all of this data with the program's heavy-lifting tools: records management, auto-classification, and Technology Assisted Review.

11 Starting Point
14 Bureaus/Offices in DOI
200 existing Retention Schedules
2,330 retention instructions
Some Big Bucket schedules; some traditional schedules; some schedules in draft; some schedules at NARA awaiting approval
Schedule types shown: Traditional, Big Bucket, Simplified Schedule
To organize the vast amount of data in the environment, Carol had to reduce the number of retention instructions by simplifying the schedules. We had to reach a level of granularity in the schedules that would allow for auto-classification but still adequately capture the records.

12 Department Records Schedule (DRS) Strategy
Started with the existing DOI Retention Schedules
Identified the Department’s Lines of Business
Created crosswalks
Created summary worksheets
Drafted Super Bucket Retention Schedules, Ver. 1
Entered Super Bucket Retention Schedules, Ver. 1 in eERDMS, and then... Auto-Classification
This section of my presentation focuses on the super bucket retention schedules that my colleague Carol Brock has created and implemented. I am sure all of you have heard about them, and you certainly recognize Carol as one of the world's leading authorities in digital preservation. In fact, Carol is working on her doctorate in that area when she is not out here driving this segment of the program.

13 Policy Bucket
Controls and Oversight
Planning and Budgeting
Litigation and Judicial Activities
Regulatory Development
1st big bucket

14 Mission Bucket
Biological Resources
Culture & Heritage
Disaster Management
Education
Energy
Environmental Management
Financial Management
Geospatial Services
Grants & Cooperative Agreements
Intelligence Operations
Land & Marine Conservation
Land Management Planning
Land Use
Minerals
Public Health & Safety
Water
Water Quality
Wildland Fire
2nd big bucket

15 Administrative Bucket
Accounting
Administration/Housekeeping
Ultra Transitory?
Transitory: out of office, Amazon, eBay, Twitter, early dismissal, marketplace, Credit Union, advisory notices, holiday notices, Dept.-wide notices
Human Resources
Information and Technology
3rd big bucket
(Ask the audience.) Has anyone ever heard the expression ULTRA BUCKET? ULTRA: system-generated content.

16 Crosswalks
Mapped each schedule item in every schedule to the Department’s Lines of Business
Developed crosswalks
Vetted crosswalks with Bureau/Office Records Officers
Some Bureaus/Offices were very involved with the process
Diagram labels: Schedules, Lines of Business, Vetted Crosswalks
Carol cross-walked schedule items to the DOI Lines of Business by identifying the best fit of each schedule item to a line of business. Often more than one line of business would fit; when there were choices, she placed the item as close to the function of its office of origin as possible.
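The crosswalk rule described above (map each schedule item to one Line of Business and, when several fit, prefer the one closest to the function of the item's office of origin) can be sketched in Python. This is a minimal illustration under stated assumptions: the `ScheduleItem` fields, the `lob_functions` lookup table, and the sample item numbers are hypothetical, and in practice the best fit was a records officer's judgment, vetted with the Bureaus/Offices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleItem:
    # All field names are hypothetical, for illustration only.
    bureau: str             # issuing Bureau/Office
    item_id: str            # item number in the bureau's existing schedule
    office_function: str    # function of the item's office of origin
    candidate_lobs: tuple   # every Line of Business that plausibly fits


def crosswalk(items, lob_functions):
    """Map each schedule item to exactly one Line of Business (LOB).

    lob_functions: hypothetical dict of LOB name -> set of office functions it covers.
    When several LOBs fit, prefer one that covers the item's office-of-origin
    function; otherwise fall back to the first listed candidate.
    """
    mapping = {}
    for item in items:
        if not item.candidate_lobs:
            raise ValueError(f"{item.bureau} {item.item_id}: no candidate LOB")
        best = next(
            (lob for lob in item.candidate_lobs
             if item.office_function in lob_functions.get(lob, set())),
            item.candidate_lobs[0],
        )
        mapping[(item.bureau, item.item_id)] = best
    return mapping


lob_functions = {"Land Management Planning": {"planning"}, "Land Use": {"permitting"}}
items = [ScheduleItem("BLM", "4/1a", "permitting", ("Land Management Planning", "Land Use"))]
print(crosswalk(items, lob_functions))  # {('BLM', '4/1a'): 'Land Use'}
```

A fuller version would also record which items fell back to the first candidate, so those could be flagged for the vetting step with Records Officers.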

17 Super-Bucket Results
Former: 200 schedules / 2,330 retention periods
Super-Bucket: 1 schedule / 207 retention periods over 47 LOBs
Super-Bucket roadmap: items were combined when they appeared to support the same LOB.
From the 14 Bureaus/Offices, we took the 200 existing retention schedules with 2,330 retention periods and condensed them into one retention schedule with 47 items (lines of business) and 207 retention periods.
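The mechanical half of that consolidation (grouping crosswalked retention instructions by Line of Business and collapsing identical retention periods) can be sketched as follows. This is an illustration, not the DOI procedure: deciding that two differently worded instructions really support the same LOB was a records management judgment, and the retention labels below are hypothetical. The `reduction` helper only checks the slide's arithmetic: going from 2,330 to 207 retention periods removes about 91% of them.

```python
from collections import defaultdict


def condense(crosswalked):
    """Collapse crosswalked retention instructions into one schedule.

    crosswalked: iterable of (lob, retention_period) pairs, one per existing
    retention instruction. Returns {lob: sorted list of distinct periods}.
    """
    schedule = defaultdict(set)
    for lob, period in crosswalked:
        schedule[lob].add(period)  # identical periods under one LOB merge
    return {lob: sorted(periods) for lob, periods in schedule.items()}


def reduction(before, after):
    """Fraction of retention periods eliminated by the consolidation."""
    return 1 - after / before


# Hypothetical retention labels, for illustration only.
old = [("Water", "7 years"), ("Water", "7 years"), ("Water", "3 years"), ("Land Use", "Permanent")]
print(condense(old))  # {'Water': ['3 years', '7 years'], 'Land Use': ['Permanent']}
print(f"{reduction(2330, 207):.1%}")  # 91.1%
```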

18 Auto-Classification
Definition/How it Works
Exemplars/Why
Testing and Refinement
Training
Implementation
Legal Defensibility
Now let's focus on auto-classification.

19 Auto-Classification
Definition of auto-classification:
Tool that provides automatic identification, classification, retrieval, archival, and disposal capabilities for electronic records
Tool that uses a hybrid approach that combines machine learning, rules, and content analytics
Tool that uses a rules engine and scans content for words, phrases, tone, etc. to identify semantic relationships and assign records classifications and retention periods to content (Open Text)
Automates decisions for assigning retention rules to content
Tool that is based on statistically relevant sampling and quality control
To me, as an eDiscovery litigation-preparedness consultant, the most important aspect of AC is that it removes the classification burden from users and increases the accuracy of classification! Much like document review, human participation is always subjective. Let's leave classification and review to technology. The results are far more accurate, as noted in the TREC Legal Track and other controlled studies going back as far as 2006.
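The rules-engine part of that definition can be illustrated with a deliberately simple keyword classifier: each rule names a record class and a retention period, lists patterns that signal it, and a document goes to the rule with the most matching patterns. This is a sketch under loud assumptions: the `Rule` fields, the patterns, and the class and retention labels are hypothetical, and it does not attempt the tone or semantic analysis a commercial engine performs. Documents that match no rule return `None`, which in a real program would route them to human review.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    record_class: str   # hypothetical file node, e.g. a super-bucket item
    retention: str      # hypothetical retention period label
    patterns: list      # regular expressions that signal this class
    min_hits: int = 1   # distinct patterns that must match before the rule fires


def classify(text, rules):
    """Return (record_class, retention) for the best-matching rule, or None.

    A rule's score is the number of its patterns found in the text; ties go to
    the rule listed first.
    """
    best, best_hits = None, 0
    for rule in rules:
        hits = sum(1 for p in rule.patterns if re.search(p, text, re.IGNORECASE))
        if hits >= rule.min_hits and hits > best_hits:
            best, best_hits = rule, hits
    return (best.record_class, best.retention) if best else None


rules = [
    Rule("Administrative/Transitory", "transitory (hypothetical)",
         [r"\bout of (the )?office\b", r"\bholiday\b", r"\bearly dismissal\b"]),
    Rule("Policy/Litigation and Judicial Activities", "10 years (hypothetical)",
         [r"\blitigation\b", r"\bcomplaint\b", r"\bdiscovery\b"], min_hits=2),
]
print(classify("Reminder: early dismissal before the holiday", rules))
# ('Administrative/Transitory', 'transitory (hypothetical)')
```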

20 Auto-Classification
Transparency and defensibility: the words that have captured the attention of the bench, for good reason.
A recent ruling on using Rule 26(g) to control e-discovery abuses: a federal judge in Baltimore added teeth to Rule 26(g) with an opinion enforcing its mandatory sanctions provision, Branhaven LLC v. Beeftek, Inc. This is the first ruling of its kind that I know of. The ruling focuses on subsection (3), which imposes sanctions when Rule 26(g) is violated, that is, when counsel improperly certifies disclosures or discovery responses. This is a big game changer.
As you are probably aware, the volume of content we produce and have to manage continues to grow, and the only sane thing to do is to start getting rid of content that has no business value. To do this, we need to classify the content. This is where Auto-Classification is very important, because it allows us to classify low-touch content such as legacy content and social media. Auto-Classification is integrated with OpenText Records Management, meaning that you can apply your existing RM classifications and even make use of existing classified documents.
A couple of things make Auto-Classification special:
1 - We have taken the voodoo out of classifying content by providing a TRANSPARENT, step-by-step process created specifically for records managers to help them achieve their desired accuracy and thoroughness. This process is based on workbenches used for identifying exemplar documents and rules, testing and refining effectiveness, and quality assurance and sampling against a broader set of documents on an ongoing basis. The Testing and Optimize workbench performs automated testing, reports on effectiveness, and suggests how to improve it.
2 - Records managers have told us that they are uncomfortable trusting retention and disposition to a "black box" technology. That is why we have built DEFENSIBILITY directly into Auto-Classification. It includes a wizard to statistically sample content and a Review Workbench where records managers can quickly view documents to accept or reject the classification. This work on a small subset of documents can be used to further refine the accuracy of classification. The ability to demonstrate that you have tested, and continue to monitor, the effectiveness of Auto-Classification provides defensibility.
Transparency is critical to Auto-Classification; however, it is also good to know what is under the hood. Auto-Classification is powered by OpenText Content Analytics (formerly Nstein), a world-class semantic analysis and classification engine, which should make tuning the engine easier and faster and produce better results.
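The statistical-sampling step behind that defensibility argument can be sketched in Python. The wizard's actual sizing method is not described here, so this uses a common textbook choice as an assumption: Cochran's sample-size formula with a finite-population correction, n0 = z^2 p(1 - p) / e^2 and n = n0 / (1 + (n0 - 1) / N), at 95% confidence (z = 1.96), a 5% margin of error, and the conservative p = 0.5. Reviewer accept/reject decisions on the sample then give an observed accuracy. The function names and the fixed seed are hypothetical.

```python
import math
import random


def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Cochran's sample size with finite-population correction (an assumed method)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def draw_review_sample(doc_ids, seed=2013, **kwargs):
    """Pick a reproducible random sample of auto-classified documents for review."""
    doc_ids = list(doc_ids)
    if not doc_ids:
        return []
    n = min(len(doc_ids), sample_size(len(doc_ids), **kwargs))
    # A fixed seed lets an auditor re-draw exactly the same sample later.
    return random.Random(seed).sample(doc_ids, n)


def observed_accuracy(decisions):
    """decisions: True where the reviewer accepted the classification, False where rejected."""
    return sum(decisions) / len(decisions)


print(sample_size(10_000))  # 370
print(observed_accuracy([True, True, False, True]))  # 0.75
```

Re-drawing the identical sample on demand is one way to support the claim that the classification was tested, not just trusted.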

21 Auto-Classification Process
System uses exemplars of each file node to train the system to recognize patterns, tone, etc.
"Find like" (similar) feature used to gather additional exemplars
Use exemplars to create a model
Precision and recall need to be 75% or better
Refine the model with additional exemplars over time
Auto-classification is run on incoming content to assign retention periods
Back to the process of creating a defensible program. So far, AC is untested in the Federal sector (although it may already be in use in the intelligence community, they are not talking). DOI attorneys are discussing our approaches with DOJ to assure the legal defensibility of our processes. AC is measured by precision and recall. Because assigning record-ness is usually a user function with minimal compliance, precision and recall under normal practice are fairly low. Automated AC can raise those numbers to 75% or better with refinement. We consider this a great success!
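The exemplar-driven loop on this slide (train a model from exemplars for each file node, then test precision and recall against the 75% bar before running it on incoming content) can be sketched with a nearest-centroid classifier over word counts. This is a stand-in technique chosen for brevity, not the engine DOI uses: the tokenization, the cosine-similarity scoring, the function names, and the sample file nodes and texts are all hypothetical. The same cosine score is one plausible way a "find like" feature could rank candidate exemplars.

```python
import math
from collections import Counter

THRESHOLD = 0.75  # from the slide: precision and recall of 75% or better


def vectorize(text):
    """Bag-of-words term counts (hypothetical, deliberately naive tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train(exemplars):
    """exemplars: {file_node: [text, ...]} -> {file_node: summed term vector}.

    The summed vector points the same way as the mean, and cosine ignores length.
    """
    model = {}
    for node, texts in exemplars.items():
        centroid = Counter()
        for text in texts:
            centroid.update(vectorize(text))
        model[node] = centroid
    return model


def predict(model, text):
    vec = vectorize(text)
    return max(model, key=lambda node: cosine(vec, model[node]))


def precision_recall(predicted, actual, node):
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == node and a == node)
    fp = sum(1 for p, a in pairs if p == node and a != node)
    fn = sum(1 for p, a in pairs if p != node and a == node)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def ready_for_production(model, test_set):
    """test_set: [(text, true_file_node), ...]. True only if every node meets THRESHOLD.

    A node with no test examples scores 0.0, so every node needs test coverage.
    """
    texts, actual = zip(*test_set)
    predicted = [predict(model, t) for t in texts]
    return all(min(precision_recall(predicted, actual, node)) >= THRESHOLD for node in model)


exemplars = {
    "Water": ["water rights permit for reservoir", "reservoir release schedule water"],
    "Wildland Fire": ["fire crew deployment wildfire", "wildland fire fuel reduction"],
}
model = train(exemplars)
print(predict(model, "reservoir water levels"))  # Water
```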

22 Hold Options
Search-Based Holds
User-Based Holds
Location-Based Holds
Classification-Based Holds
Other considerations: journaling; "live" content; content at risk
On to ECA. ECA, to me, is determining the scope and material facts of the matter in order to gather the appropriate evidence and relevant data. Here are our choices:
Search-Based Holds: the standard option in Content Server.
User-Based Holds: the option of choice; best when the specifics of a case are not yet known.
Location-Based Holds: good when you know where the content is, or if there is mixed content in a location. Also effective for users' personal workspaces.
Classification-Based Holds: rare, but potentially an entire class of content needs to be placed on hold.
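The difference between these hold types comes down to how content is selected. A minimal sketch, assuming a simplified item model: search-, location-, and classification-based holds are each a predicate over items, and applying a hold tags every matching item. Content Server implements holds natively; the `Item` fields, function names, paths, and classification codes here are hypothetical. User-based holds, which also depend on date ranges, are left out of this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    # Hypothetical, simplified content item.
    item_id: str
    text: str
    location: str         # folder path, e.g. a personal workspace
    classification: str   # RM classification code
    holds: set = field(default_factory=set)


def search_hold(terms):
    """Search-based: item text contains any of the terms (case-insensitive)."""
    return lambda item: any(t.lower() in item.text.lower() for t in terms)


def location_hold(folder):
    """Location-based: item sits in the folder or anywhere beneath it."""
    folder = folder.rstrip("/")
    return lambda item: item.location == folder or item.location.startswith(folder + "/")


def classification_hold(code):
    """Classification-based: every item in one RM classification."""
    return lambda item: item.classification == code


def apply_hold(matter, predicate, items):
    """Tag matching items with the matter's hold name; return the held item IDs."""
    held = [item for item in items if predicate(item)]
    for item in held:
        item.holds.add(matter)
    return [item.item_id for item in held]


items = [
    Item("1", "Contract for dam repair", "/Projects/Dams", "FIN-1"),
    Item("2", "Holiday notice", "/Personal Workspace/jdoe", "ADM-9"),
    Item("3", "Dam inspection notes", "/Personal Workspace/jdoe2", "FIN-1"),
]
print(apply_hold("Matter-001", location_hold("/Personal Workspace/jdoe"), items))  # ['2']
```

Matching on the folder plus a trailing slash keeps a hold on one user's workspace from spilling into a similarly named one.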

23 Select Users to be on Hold - per Matter
Option for selecting the entire results set
We can search for users or groups of users to put on hold. Select the entire results set or individuals, then choose Add.
Copyright © 2010 Open Text Corporation. All rights reserved.

24 User-Based Holds
In addition to using keywords, folders, or RM classifications to determine what is on hold, you can choose users whose content is on hold. Each hold can have different users, with different time periods for placing their content on hold.

25 User-Based Holds: Date ranges can be applied
Applies a hold to all items created by, owned by, or with a version added by the users in the specified date range.
The last tab in the process shows how many users have not been processed yet, either because they have been newly added to or removed from the hold, or because the date range criteria have changed.
If you enter a date range, then upon processing, only content created by, owned by, or with a version added by the users in that time frame will be placed on hold. When you upload a document to the system, you are logged as the creator of that content, although someone else may be its owner. And if you upload a version to a document that was originally created by someone else, that document will also be put on hold if you are on the hold and the version falls within the date range. If you leave the date range blank, then all content in the system created by, owned by, or with a version added by the specified users will be put on hold; if the To Date is blank, any content the users add on a continuing basis will also be put on hold. So this is a good way to set up a hold in perpetuity, or a continual hold. Comments on a user-based hold are stored in the audit trail for the holds.
User-Based Holds: LPU-1871 US: FR: Place a user's content on Hold; LPU-2334 MA: Process\update user based holds (apply hold to managed items)
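The date-range behavior described for user-based holds reduces to a simple matching rule: an item is held if any hold user created it, owns it, or added a version to it within the range, and a blank bound is open-ended (a blank To Date makes the hold continuing). A minimal sketch, assuming each item carries a list of (user, action, date) events; the event model, the use of an ownership-assignment date, and all names are hypothetical, not the Content Server data model.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Item:
    item_id: str
    # (user, action, when); action is "created", "owner", or "version" (hypothetical model)
    events: list = field(default_factory=list)


def user_hold_matches(item, users, from_date: Optional[date] = None, to_date: Optional[date] = None):
    """True if a hold user created, owns, or versioned the item within the date range.

    A None bound is open-ended: with both bounds None, all of the users' content
    is held; with only to_date None, content the users add later is also caught
    (a continuing hold).
    """
    for user, action, when in item.events:
        if user not in users or action not in {"created", "owner", "version"}:
            continue
        if from_date is not None and when < from_date:
            continue
        if to_date is not None and when > to_date:
            continue
        return True
    return False


doc = Item("42", [("alice", "created", date(2012, 5, 1)), ("bob", "version", date(2013, 1, 15))])
print(user_hold_matches(doc, {"bob"}, date(2013, 1, 1), date(2013, 2, 1)))  # True: bob's version is in range
```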

26 Users Can Be Removed
View users assigned to this hold, or remove them. Filter the results list by search. See the status of the hold per user.
To me, targeted holds, custodian interviews, and targeted fact gathering pay off in the end.

27 More Advanced Search
Here are some of the advanced search options (tick off from the list on the left).

28 eDiscovery Early Case Assessment
Live exploration: search and explore data before collection and preservation
Reduce involvement of IT in collection
Only the relevant ESI required for a hold is automatically collected to a central hold repository
Further cull and deduplicate prior to export of fully processed ESI
Remote collection from disparate enterprise data sources, including ECM Suite
Diagram labels: Desktops, SharePoint, File Servers, EES Suite, ECA, Culling & Deduplication, Processing, Any Review Platform
Finally, we move into the ECA/AECA portion of the program. (See slide points.) Predictive Coding addresses the fundamental shortcomings of linear document review by accelerating the review process. Starting with a small number of documents identified by a subject matter expert as a representative "seed set," Predictive Coding uses machine learning technology to identify and prioritize similar documents across an entire corpus, in the process literally "reviewing" all documents in a corpus, whether 10 megabytes or 10 terabytes. The result? A more thorough, more accurate, more defensible, and far more cost-effective document review regardless of corpus size. We have now addressed the lifecycle of Department of the Interior records and transitory content from A to Z.
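The "cull and deduplicate" step on this slide is commonly done by hashing content, so that exact duplicates collected from different custodians or sources are reviewed only once. A minimal sketch, assuming byte-for-byte duplicates and SHA-256 fingerprints; the function names and sample documents are hypothetical, and near-duplicate detection and email-thread handling, which review platforms typically add, are out of scope here.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """SHA-256 digest of the raw bytes; identical content gives identical digests."""
    return hashlib.sha256(content).hexdigest()


def deduplicate(documents):
    """documents: iterable of (doc_id, content_bytes).

    Keeps the first occurrence of each exact duplicate. Returns (kept_ids,
    duplicates), where duplicates maps each dropped doc_id to the kept doc_id,
    so the culling decision stays documented.
    """
    first_seen = {}
    kept, duplicates = [], {}
    for doc_id, content in documents:
        digest = fingerprint(content)
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            kept.append(doc_id)
    return kept, duplicates


docs = [("desktop/a.docx", b"FY13 budget"), ("sharepoint/b.docx", b"Reservoir report"),
        ("fileserver/a-copy.docx", b"FY13 budget")]
print(deduplicate(docs))
# (['desktop/a.docx', 'sharepoint/b.docx'], {'fileserver/a-copy.docx': 'desktop/a.docx'})
```

Keeping the dropped-to-kept mapping, rather than silently discarding copies, preserves an audit trail of which custodians held each document.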

29 Communication and Outreach
Shared vision and goals up, down, and across the organization
Bureau/Office Records Officers Work Group
Records Officer Task Force with leadership role
Staff dedicated to supporting the effort with the client
Pete Denholm is here. John will introduce him and the Outreach & Comms

30 Thank you
John Montel, eRecords Service Manager
Service Planning and Management, Department of the Interior, Office of the Chief Information Officer
1849 C Street, N.W., Room 7444, Washington, DC
Carrie Mallen, eDiscovery SME, IQ Business Group, Prime for eERDMS
Department of the Interior, Room 2012, Washington, DC 20040
