Electronic Record Challenges

Presentation on theme: "Electronic Record Challenges"— Presentation transcript:

1 Automated Records Management DocMan Electronic Records Management System (eRMS)

2 Electronic Record Challenges
Many agencies still have ineffective “print and file” policies in place that do not account for different types of media. Email poses significant challenges to meeting federal recordkeeping obligations.

3 Electronic Record Challenges
Records are typically stored on servers, public drives and on the end user's local drives. The result is replicated and redundant files stored across multiple locations. Without the ability to organize and categorize this massive amount of information, IT departments cannot delete any electronic files for fear of legal or regulatory repercussions. Lessons learned – user tagging does not work.

4 Presidential Memorandum
President Barack Obama signed the Memorandum on November 28, 2011 and said, “The current federal records management system is based on an outdated approach involving paper and filing cabinets. Today’s action will move the process into the digital age so the American public can have access to clear and accurate information about the decisions and actions of the Federal Government.”

5 NARA and OMB Directive Managing Government Records Directive
The Directive creates a records management framework to achieve the benefits outlined in the Presidential Memorandum. It requires that, to the fullest extent possible, agencies eliminate paper and use electronic recordkeeping. By 2016, Federal agencies must manage both permanent and temporary email records in an accessible electronic format. By 2019, Federal agencies must manage all permanent electronic records in an electronic format.

6 eRMS Solution – 3 Components
Content Management
Records Repository
Automatic Categorization
Watch the Process

7 Content Management

8 Records Repository

9 Automatic Categorization
[Diagram: Content passes through Automated Categorization (Rule-based and Training) and becomes Tagged Content, for example “Travel” and “Budget”.]

10 Component #1 Content Management
SharePoint 2010
Email – Utilize Exchange 2010 Journaling
SharePoint Sites and Content
Shared Drives (In process)
Local User Drives – Migrate content to SAN (Q1 2013)

11 Component #2 Electronic Records Repository
Email and electronic files are automatically routed to the SharePoint Records Center and categorized by Recommind. Simple for Records Managers to learn and use. Records Managers can configure multi-phase disposition schedules.

12 Component #3 Automatic Categorization
Allows thousands of newly created electronic records to be classified daily. Ensures that email-based information is properly tagged and categorized with no impact on busy professionals. Proactively reduces the costs and risks associated with eDiscovery and compliance by reducing the amount of information being stored.

13 Decisiv Categorization
Machine learning offers the highest potential for automatic categorization accuracy (as more records are added the accuracy increases). Recommind uses a patented algorithm known as Probabilistic Latent Semantic Analysis (PLSA). Decisiv identifies and structures relevant concepts and topics within record training sets. Identifies duplicates and near duplicates. Brains behind the solution. Uses both machine-based learning and rules-based categorization techniques to organize huge quantities of data automatically. System results are directly related to the level of effort put into it. Must create quality training sets!
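The combination of rules-based and machine-based categorization described on this slide can be illustrated with a small sketch. It is not Decisiv or PLSA: it swaps in a simple multinomial Naive Bayes classifier as a stand-in for the learned part, and the rule, category names, and training sentences are hypothetical. Rules are checked first, and the trained model handles everything the rules do not match.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Stand-in learner (multinomial Naive Bayes), not the PLSA algorithm."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # category -> word frequencies
        self.doc_counts = Counter()              # category -> training documents
        self.vocab = set()

    def train(self, text, category):
        tokens = tokenize(text)
        self.word_counts[category].update(tokens)
        self.doc_counts[category] += 1
        self.vocab.update(tokens)

    def classify(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            counts = self.word_counts[category]
            # Add-one smoothing so unseen words do not zero out a category.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best, best_score = category, score
        return best


# Hypothetical rule: an exact phrase that always maps to one category.
RULES = {"Travel": re.compile(r"\btravel authorization\b", re.IGNORECASE)}


def categorize(text, model):
    """Apply rules first; fall back to the trained model otherwise."""
    for category, pattern in RULES.items():
        if pattern.search(text):
            return category
    return model.classify(text)


# Hypothetical training set; real training sets come from reviewed records.
model = NaiveBayesCategorizer()
model.train("quarterly budget forecast and funding allocation", "Budget")
model.train("budget request for fiscal year spending", "Budget")
model.train("flight itinerary and hotel reservation for conference trip", "Travel")
model.train("trip reimbursement for airfare and hotel", "Travel")

by_rule = categorize("Please approve the attached travel authorization", model)
by_model = categorize("updated funding forecast for next fiscal year", model)
print(by_rule, by_model)
```

Even in this toy version, the result depends entirely on the training examples supplied, which is the point the slide makes about quality training sets.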

14 Categorization Training Process
PHASE I: Record Collection. PHASE II: Record Training. PHASE III: Ongoing Training and Testing.
PHASE I (Pilot Groups): Program Offices are set up to have their records automatically submitted. Records Managers review new records to identify training documents and create a taxonomy based on the DOE Disposition Schedules.
PHASE II: Taxonomy is created in Categorizer and training sets are crawled. Rules are created for categories when applicable. Accuracy and categorization testing.
PHASE III (Production): Ongoing training and testing as new records are identified through sampling.

15 Categorization Testing
CATEGORIZATION RATE: Percentage of total files categorized. Monitored on the production system.
CATEGORIZATION ACCURACY: Precision (share of the records placed in the “Budget” category that really are “Budget” records). Recall (share of all “Budget” records that reached the “Budget” category rather than being sent to other categories). Measured with controlled test groups on the sandbox.
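As a rough illustration of these three measures, the sketch below computes them from a small audited sample using the standard definitions of precision and recall. The sample labels are made up, and `None` is an assumed marker for a file the system left uncategorized.

```python
def categorization_rate(predicted):
    """Share of files that received any category (None = uncategorized)."""
    if not predicted:
        return 0.0
    return sum(p is not None for p in predicted) / len(predicted)


def precision_recall(actual, predicted, category):
    """Precision and recall for one category, from audited labels."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(a == category and p == category for a, p in pairs)
    false_pos = sum(a != category and p == category for a, p in pairs)
    false_neg = sum(a == category and p != category for a, p in pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical audit sample: the reviewer's label vs. the system's label.
actual = ["Budget", "Budget", "Budget", "Travel", "Travel", "Budget"]
predicted = ["Budget", "Budget", "Travel", "Budget", "Travel", None]

rate = categorization_rate(predicted)
precision, recall = precision_recall(actual, predicted, "Budget")
print(f"rate={rate:.2f} precision={precision:.2f} recall={recall:.2f}")
```

An uncategorized “Budget” record lowers both the categorization rate and the recall for “Budget”, which is why the two measures are tracked separately.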

16 Email Categorization Accuracy
Average accuracy: 87%. [Chart label: Rule-Based.] Accuracy testing is used to identify categories that need additional training and to monitor the results of ongoing re-training efforts.

17 Email Categorization Rate


19 2012 Categorized Email
Total categorized volume: 6.24 M
Records: 1.5 M
Non-Records & Transitory
Total yearly volume: 8.1 M
Uncategorized volume: 1.62 M
Uncategorized records: 113,400

20 Shared Drive Categorization
Over 1 million files (1.6 TB). During first phase of training, 300,000 records were categorized and audited. Average accuracy was 82%. Legacy data is the biggest challenge. Phase I and II

21 Armies of Expensive Lawyers, Replaced by Cheaper Software
Friday, March 4, 2011. By John Markoff.
When five television studios became entangled in a Justice Department antitrust lawsuit against CBS, the cost was immense. As part of the obscure task of “discovery” (providing documents relevant to a lawsuit), the studios examined six million documents at a cost of more than $2.2 million, much of it to pay for a platoon of lawyers and paralegals who worked for months at high hourly rates. But that was in 1978. Now, thanks to advances in artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost. In January, for example, Blackstone Discovery of Palo Alto, Calif., helped analyze 1.5 million documents for less than $100,000.
Some programs go beyond just finding documents with relevant terms at computer speeds. They can extract relevant concepts, like documents relevant to social protest in the Middle East, even in the absence of specific terms, and deduce patterns of behavior that would have eluded lawyers examining millions of documents.
“From a legal staffing viewpoint, it means that a lot of people who used to be allocated to conduct document review are no longer able to be billed out,” said Bill Herr, who as a lawyer at a major chemical company used to muster auditoriums of lawyers to read documents for weeks on end. “People get bored, people get headaches. Computers don’t.”
Computers are getting better at mimicking human reasoning, as viewers of “Jeopardy!” found out when they saw Watson beat its human opponents, and they are claiming work once done by people in high-paying professions. The number of computer chip designers, for example, has largely stagnated because powerful software programs replace the work once done by legions of logic designers and draftsmen. Software is also making its way into tasks that were the exclusive province of human decision makers, like loan and mortgage officers and tax accountants.
These new forms of automation have renewed the debate over the economic consequences of technological progress. David H. Autor, an economics professor at the Massachusetts Institute of Technology, says the United States economy is being “hollowed out.” New jobs, he says, are coming at the bottom of the economic pyramid, jobs in the middle are being lost to automation and outsourcing, and now job growth at the top is slowing because of automation. “There is no reason to think that technology creates unemployment,” Professor Autor said.
“The computers seem to be good at their new jobs. Mr. Herr, the former chemical company lawyer, used e-discovery software to reanalyze work his company’s lawyers did in the 1980s and ’90s. His human colleagues had been only 60 percent accurate, he found.”
In 1978, five television studios were entangled in a Justice Department antitrust lawsuit against CBS. During the “discovery” process, the studios examined six million documents at a cost of more than $2.2 million. And what was their accuracy? Now, thanks to advances in artificial intelligence, “e-discovery” software can analyze documents in a fraction of the time for a fraction of the cost.

22 Current Status
1,054 mailboxes (850 users plus system accounts) at headquarters are submitting through the system. Up to 40,000 messages are categorized daily. Shared drive categorization is currently in progress.

23 Next Steps
Migrate local drives to SAN for categorization. Use Recommind Axcelerate for FOIAs and e-Discovery. Implement additional Recommind modules: file collection from multiple data sources. Convert paper records to electronic files (OCR scanning) for categorization. Categorize legacy email on Exchange servers (2 TB).

24 Next Steps - Governance
Official records are maintained in the SharePoint Records Center. Users may search the Records Center.
Email: Email stored on Exchange servers will be deleted after three years. Mailbox size to be limited.
Shared Drives and SharePoint Sites: Records on the shared drives will be transferred to the Records Center and categorized. SharePoint site content will be automatically routed to the Records Center and categorized.
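The three-year rule for mail held on Exchange servers can be expressed as a simple date check. This is only a sketch under stated assumptions: the slide does not say which date starts the clock, so the received date is assumed here, and the function names are hypothetical. It does not reflect how Exchange retention policies are actually configured.

```python
from datetime import date

RETENTION_YEARS = 3  # from the slide: deleted after three years


def deletion_date(received, years=RETENTION_YEARS):
    """Date on which a message received on `received` reaches retention age."""
    try:
        return received.replace(year=received.year + years)
    except ValueError:
        # Feb 29 has no counterpart in a non-leap year; fall back to Feb 28.
        return received.replace(year=received.year + years, day=28)


def is_eligible_for_deletion(received, today):
    """True once the message has been held for the full retention period."""
    return today >= deletion_date(received)


print(deletion_date(date(2010, 1, 15)))
print(is_eligible_for_deletion(date(2010, 1, 15), date(2013, 1, 14)))
```

Because official records are copied to the Records Center, a check like this would apply only to the mailbox copy, not to the record itself.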

25 Questions?
Steve VonVital, CIO
U.S. Department of Energy (Office of Energy Efficiency and Renewable Energy)
