The Search & Access ERA Instance for G. W. Bush Electronic Presidential Records
What Does the Base ERA Do?
Focus:
- Federal Records
- Nationwide records management program
- National Archives
Functions:
- Creation, review and approval of records schedules
- Manage transfer of physical and legal custody of all types of records
- Systematically collect, create, and manage lifecycle data about records
- Actual transfer, inspection, and archival storage of electronic records
What Does the Search & Access ERA Do?
Focus:
- Presidential Electronic Records
- George W. Bush Presidential Library
Functions:
- Rapid ingest of very large volumes of electronic records
- Automatic indexing on ingest
- Immediate searchability, based on index
- Creation of different versions to support structured search of priority records
- Basic case management for review and redaction of sensitive content
Search and Access Instance Development
- Achieved Initial Operating Capability December 8, 2008
- LMC proposed and received NARA and EOP agreement on an expedited method for transfer of electronic records.
- NARA has enjoyed excellent collaboration from the EOP.
- NARA implemented a contingency plan for access to high priority e-records, the finding aid for WH paper records and the database of digital photography, pending completion of processing into ERA.
EOP Transfer & Ingest Overview
[Figure: flow of EOP data sets on storage arrays (SAN A, SAN B 1, SAN B 2, Snap Server) into SASS Operations (Ingest), with SAN B returns and software drops 6.0, 7.0, 7.1 and 7.2. Dates shown: 12/5, 12/8 (IOC), 12/12, 12/15, 1/15, 1/20, 1/26, 1/30, 2/11, and a tentative 5/16.]
Data volumes shown: ARMS (PRA) = 1.9 TB; PDS = 0.0005 TB; WARDS = 0.018 TB; PDS (delta) = 0.0005 TB; WARDS (delta) = 0.001 TB; Merlin One = 36 TB; Non-Pri Types = 20 TB; RMS = 1.0 TB; Merlin One 2 = 36 TB; Non-Pri Types = 0.2 TB; Exchange = 57 TB; ARMS (FRA) = 5.1 TB.
G.W. Bush Presidential Electronic Records
Records | Number of objects | Gigabytes of data | Shipped to ERA Data Center | Status
Priority Records
Email (2000-2003) | 44,815,184 | 1,688 | 12/8/2008 | >99% available for search in ERA. There are technical problems with the remaining messages.
MS Exchange email (2003-2008) | 150,000,000 estimated | 16,500 estimated | Expected mid May | In temporary storage. Conversion to standard format, separation from federal records, and identification of responsible EOP component largely complete.
Presidential Diary | 682,193 | 1 | 12/8/2008 and 1/26/2009 | 100% available for search in ERA
Digital photography | 11,220,044 | 31,000 | 1/26/2009 | Problems require shipment of a second set, expected in mid May
Index to White House paper records | 313,850 | 583 | 1/26/2009 | 100% available for search in ERA, but about 6% of the records appear to be missing some pieces of data.
Visitor and worker access to EOP buildings | 28,922,988 | 14 | 12/8/2008 and 1/26/2009 | 100% available for search in ERA
Index to motion video | 30551/26/2009 | | | In ERA, being processed
Email from WH Counsel | 572,051 | 1,057 | 1/26/2009 | In ERA, being processed
Other Records | >12,000,000 | >5,450 | Partial shipment 1/26/2009 | Some in ERA, being processed. Remainder expected mid May
Processing Status - 1
- All Bush e-records have been transferred to NARA’s custody.
- Not all have been transferred to the ERA Data Center in WV.
- EOP is maintaining copies until NARA successfully completes ingest.
Archives Operational Issues
- Several sets of records were not transferred in the formats previously agreed by NARA and EOP
  o NARA required retransmission
- Some records exhibited anomalies
  o Some ARMS email records had binary data in the “To” field.
  o Some metadata in the digital photography system did not have corresponding images.
  o Some entries in the Records Management System are missing some fields.
  o MS Exchange email was not separated into presidential and federal records, was not associated with the responsible EOP component, and contained numerous duplicates. EOP is addressing these problems prior to transfer to ABL. EOP has converted the email from proprietary to standard format. NARA will preserve both the original files and the output of the EOP processing.
  o Encoding of date of birth in the Access system impeded searches on that field.
- Viruses have been found in a small percentage of files.
  o Infected files have been successfully quarantined. LMC & NARA are working to produce clean copies.
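The slide notes that the MS Exchange email contained numerous duplicates but does not say how EOP identified them. As a minimal sketch of one common approach, assuming duplicates differ only in transport headers, a message can be fingerprinted by hashing a few identifying headers plus its body. The function names and the choice of headers are hypothetical and are not taken from the EOP or ERA processing.

```python
import hashlib
from email import message_from_bytes


def message_fingerprint(raw: bytes) -> str:
    """Hash identifying headers and body parts, ignoring transport headers."""
    msg = message_from_bytes(raw)
    digest = hashlib.sha256()
    # Assumption: these four headers plus the body identify a message.
    for name in ("From", "To", "Date", "Subject"):
        value = str(msg.get(name, "")).strip()
        digest.update(value.encode("utf-8", "replace"))
        digest.update(b"\x00")  # separator so adjacent fields cannot run together
    for part in msg.walk():
        if not part.is_multipart():
            digest.update(part.get_payload(decode=True) or b"")
    return digest.hexdigest()


def deduplicate(messages: list[bytes]) -> tuple[list[bytes], int]:
    """Keep the first copy of each message; return (unique messages, number dropped)."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for raw in messages:
        key = message_fingerprint(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique, len(messages) - len(unique)
```

Keeping the first copy and counting the rest leaves an audit figure for how many duplicates were set aside, which matters when both the original files and the processed output are to be preserved.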
Processing Status - 2
Technical Issues
- Issues with COTS products:
  o Automatic indexing of a batch of records stops when errors are found in any of the records; e.g., binary data in headers of email.
  o Erroneous results returned in certain conditions
  o Incomplete search results returned in other cases.
  o LMC underestimated storage space needed for the index. Additional hardware has been ordered.
- Unanticipated software development needed to ensure complete and accurate mapping between ‘.eml’ email produced by the EOP and the original MS ‘.pst’ files
- NARA directed LMC to hire a subcontractor to perform actual ingest of records.
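The batch-indexing problem above (one bad record halting the whole batch) is often handled by validating each record and setting failures aside. The following is a minimal sketch of that pattern, not a description of the COTS product or of the fix LMC applied; the names `IndexResult`, `has_binary_data` and `index_batch`, and the toy whitespace tokenizer, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IndexResult:
    index: dict[str, set[str]] = field(default_factory=dict)   # term -> record ids
    quarantined: dict[str, str] = field(default_factory=dict)  # record id -> reason


def has_binary_data(value: str) -> bool:
    """True if a header value contains control or other non-printable characters."""
    return any(not ch.isprintable() and ch != "\t" for ch in value)


def index_batch(records: dict[str, dict[str, str]]) -> IndexResult:
    """Index every clean record; quarantine bad ones instead of stopping the batch."""
    result = IndexResult()
    for record_id, headers in records.items():
        bad = [name for name, value in headers.items() if has_binary_data(value)]
        if bad:
            result.quarantined[record_id] = "binary data in: " + ", ".join(bad)
            continue  # the rest of the batch is still processed
        for value in headers.values():
            for term in value.lower().split():
                result.index.setdefault(term, set()).add(record_id)
    return result
```

Recording the reason for each quarantined record gives a worklist for later repair, so clean records become searchable without waiting on the anomalous ones.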
Status of Requests for Bush Records
- 28 Requests for access as of March 17, 2009
- Primarily for paper records
- NARA has responded using data about the paper records in the Records Management System
- A few requests were for digital photographs.
- Most requests were addressed using the two systems NARA set up under the Contingency Plan because processing of the records had not been completed at the time the requests were received.
- Three requests fulfilled using records on temporary ERA storage.
What’s in Store for the Future?
Increment 2
- Preservation Framework
  o Introduction and use of a variety of tools for different preservation needs
- Public access
  o Information about all types of records
  o Online access to electronic records
- Initial system evolution
Increments 3 - 5
- Incremental enhancements in capability & capacity
- Continuing system evolution
- Governmentwide expansion
- Full Lifecycle Management Plans
- Appraisal case management and workflow
- Search Framework supporting different tools
- FOIA and other access case management
- Review and redaction of sensitive content
ERA Functional View: Current Status
[Figure: Shared Services (System Management, Help Desk, Network, Data Management) linked through the Enterprise Service Bus to the Base Instance and the EOP Instance. Users shown: White House, Agencies.]
ERA Functional View: Planned
[Figure: Shared Services (System Management, Preservation Framework, Public Access, Help Desk, Network, Data Management) linked through the Enterprise Service Bus to the Base Instance, EOP Instance, Congressional Instance and Records Center Instance. Users shown: White House, Committees, Agencies, Public. Legend: current capability has solid fill; future capability has hashed fill.]
ERA Instances
Base Instance (June 2008)
- Used by NARA and federal agencies
- For management of all federal records
- For transfer, inspection and management of federal electronic records
EOP Instance (December 2008)
- Used by NARA and Presidential Administrations
- For transfer, inspection, and management of presidential electronic records
Congressional Instance (future)
- Used by NARA for Congressional Committees
- For transfer, inspection, and management of congressional electronic records
Federal Records Center Instance (future)
- Used by NARA and other federal agencies
- For transfer and storage of temporary and permanent federal electronic records that remain under the control of the originating agency
ERA Shared Services
System Management (current)
- System operation and maintenance
- Security
- User account management
- Deployment of new & updated software
- Backup & other common services
Help Desk (current)
- Respond to technical questions and issues from users
Network
- Link to the Internet, NARANET (current)
- Interfaces with other systems (future)
Data Management
- Data about records and transactions related to them (current)
- Description of NARA holdings (Increment 2)
- Review and redaction of records with restricted content (future)
Preservation Framework (Increment 2)
- Tools to overcome obsolescence of different digital formats (future)
Public Access (Inc. 2 +)
- Search and retrieval of information about records, regardless of custody
- Search and access to electronic records in NARA’s custody
- Search and access to digitized records from NARA’s holdings
- Freedom of Information Act requests for restricted records in NARA’s custody
Advantages of the Instances & Shared Services Approach
Instances enable different business rules and processes for different mission requirements:
- Base Instance: Federal Records Act provisions on governmentwide records management and on the National Archives
- EOP Instance: Presidential Records Act
- Congressional Instance: House and Senate rules
- Federal Records Center Instance: Federal Records Act provisions on storage of temporary and permanent records under originating agencies’ authority
Advantages of the Instances & Shared Services Approach
- Shared services maximize utilization of resources, reduce redundancy and provide a stable foundation for system growth and evolution over time.
- Shared services deliver capabilities and capacity wherever needed, regardless of differences in mission and business needs.
- E.g., the Preservation Framework can be used to preserve any electronic records, regardless of whether they came from Congress, the White House or a federal agency.
- E.g., a citizen seeking access to information will be able to find it using a single web portal, regardless of whether
  o it is information about records or in the records,
  o the records are in NARA’s physical custody,
  o the records are electronic or hard copy,
  o they originated in the White House, Congress or an agency.
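The division of labor described on these two slides can be sketched in a few lines: each instance carries its own governing rules, while all instances hold a reference to the same shared service. This is only an illustration of the pattern under that reading. The class names, the `accept` method and the record identifiers are hypothetical and do not describe the actual ERA design.

```python
from dataclasses import dataclass


class PreservationFramework:
    """Shared service: a single object serves every instance."""

    def __init__(self) -> None:
        self.preserved: list[tuple[str, str]] = []  # (instance name, record id)

    def preserve(self, instance_name: str, record_id: str) -> None:
        self.preserved.append((instance_name, record_id))


@dataclass
class Instance:
    name: str
    governing_rules: str                 # differs per instance, as on the slide
    preservation: PreservationFramework  # shared, not duplicated per instance

    def accept(self, record_id: str) -> str:
        """Apply this instance's rules, then hand the record to the shared service."""
        self.preservation.preserve(self.name, record_id)
        return f"{record_id} accepted under {self.governing_rules}"


if __name__ == "__main__":
    shared = PreservationFramework()
    base = Instance("Base", "Federal Records Act", shared)
    eop = Instance("EOP", "Presidential Records Act", shared)
    print(base.accept("agency-record-1"))
    print(eop.accept("presidential-record-1"))
    print(shared.preserved)
```

Adding a Congressional or Federal Records Center instance in this sketch means constructing one more `Instance` with its own rules; the shared service is untouched.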
Preservation
The Preservation Framework supports the introduction and use of an arbitrary number and variety of processes under the control of archival requirements for authenticity.
[Figure: Electronic Record 1, Electronic Record 2, …, Electronic Record n enter the Preservation Framework, where Tool 1, Tool 2, …, Tool n operate under controls for Record Identity, Record Integrity and Original Order, yielding Electronic Record 1’, Electronic Record 2’, …, Electronic Record n’.]
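One way to read this slide is as a plug-in pipeline: any number of tools may transform a record, but the framework rejects a result that breaks identity, integrity or original order. The sketch below illustrates that reading only. The names are hypothetical, and modeling integrity as a retained digest of the content as received is an assumption, not something the slide specifies.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class ElectronicRecord:
    record_id: str      # record identity
    sequence: int       # position in the original order
    content: bytes      # current rendition of the record
    source_sha256: str  # digest of the content as originally received


Tool = Callable[[ElectronicRecord], ElectronicRecord]


def receive(record_id: str, sequence: int, content: bytes) -> ElectronicRecord:
    """Register a record and fix the digest of its original content."""
    return ElectronicRecord(record_id, sequence, content,
                            hashlib.sha256(content).hexdigest())


def apply_tools(records: list[ElectronicRecord], tools: list[Tool]) -> list[ElectronicRecord]:
    """Run every tool on every record, enforcing the three authenticity controls."""
    outputs = []
    for record in records:
        derived = record
        for tool in tools:
            derived = tool(derived)
        if derived.record_id != record.record_id:
            raise ValueError(f"identity changed for {record.record_id}")
        if derived.source_sha256 != record.source_sha256:
            raise ValueError(f"link to original content lost for {record.record_id}")
        if derived.sequence != record.sequence:
            raise ValueError(f"original order changed for {record.record_id}")
        outputs.append(derived)
    return outputs


def uppercase_tool(record: ElectronicRecord) -> ElectronicRecord:
    """Stand-in for a format conversion: changes content, nothing else."""
    return replace(record, content=record.content.upper())
```

Because the checks live in `apply_tools` rather than in each tool, a new tool can be introduced without re-implementing the authenticity controls.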
Public Access
- Information about all records, from
  o Records Schedules
  o Archival Descriptions
  o Other NARA information
- Online access to electronic records
- Online access to scanned versions of hard copy records
- Requests for copies of records
- Freedom of Information Act requests for restricted records
- Assistance from NARA staff
Increment 3 Work Status
- Authority to Proceed Issued for Early Analysis
  o Architectural Framework
  o Preservation examination and prototyping
  o Search Engine examination and selection
  o Open Access examination and selection
  o Enhancements to address authorized user defined changes and software defects not addressed at IOC
- Discussions begun on scope of work and technical details for full proposal
- Target date for award: 7/09
Governmentwide Expansion
- Initial Implementation: June 2008 – June 2009
  o Four collaborating agencies
  o NARA staff proxy for other agencies
- Invitational Phase: June 2009 – February 2010
  o Additional agencies by invitation
- Voluntary Phase: February 2010 – December 2010
  o Additional agencies who volunteer and meet criteria
- Mandatory Phase: January 2011
  o All agencies
The Development Timeline
[Figure: timeline with axis marks at 9/05, 9/06, 9/07, 9/08, 9/09, 9/10 and 9/11. Tracks: ERA Base; Search & Access ERA; Public Access & Preservation Framework; Enhancement; Operation & Maintenance. Milestones: Initial Operating Capability (6/08) and Full Operating Capability.]