Advanced Decision Support for Archival Processing of Presidential E-Records: Results and Demonstration. William Underwood, P.I., Georgia Tech Research Institute, Atlanta, Georgia. This research was sponsored by the Army Research Laboratory and NARA under Army Research Office Cooperative Agreement W911NF (Sept 22, 2006-Sept 21, 2009).
Overview: Document Type Recognition; Metadata Extraction; Item Description; Speech Act Recognition; Decision Support for Archival Review; File Format Identification; Demonstrations.
Document Types, Metadata and Archival Description. In responding to FOIA requests, archivists need to be able to search collections of records with high precision and recall. But at the time of responding to FOIA requests, archivists have not read all of the records, so they cannot index the records or search on such attributes as person, organization and location names, topics, dates, author and addressee names, and document types. Archivists cannot describe a collection until the collection has been manually read and reviewed. With increasing volumes of electronic records, it may be decades or even centuries before new acquisitions are described. Item descriptions are needed in the results of FOIA searches.
Method for Recognizing Document Types
1. Document Reader
2. English Tokenizer
3. Wordlist Lookup + enhanced wordlists
4. Sentence Splitter
5. Hepple POS Tagger + lexicon
6. Semantic Tagger + Named Entity Rules
7. Intellectual Element Annotator + Intellectual Element Rules (DER)
8. SUPPLE Parser/Interpreter + Document Type Grammars augmented with Semantics
9. Extract Metadata
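The staged structure of this method, in which each component annotates the document and passes it on, can be sketched in a few lines of Python. This is only a toy illustration of the first stages (tokenizer, wordlist lookup, sentence splitter); it is not the tooling used in the project, and the `WORDLIST` entries, function names and sample text are all hypothetical.

```python
import re

# Hypothetical wordlist mapping cue words to a semantic class (illustration only).
WORDLIST = {
    "MEMORANDUM": "doc_type_cue",
    "FROM": "author_cue",
    "TO": "addressee_cue",
}

def tokenize(doc):
    # Split the text into word tokens and single punctuation tokens.
    doc["tokens"] = re.findall(r"\w+|[^\w\s]", doc["text"])
    return doc

def lookup(doc):
    # Annotate every token that appears in the wordlist.
    doc["lookups"] = [
        (tok, WORDLIST[tok.upper()])
        for tok in doc["tokens"]
        if tok.upper() in WORDLIST
    ]
    return doc

def split_sentences(doc):
    # Crude splitter: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", doc["text"])
    doc["sentences"] = [p.strip() for p in parts if p.strip()]
    return doc

# Stages run in order; each one adds annotations to the shared document.
PIPELINE = [tokenize, lookup, split_sentences]

def run_pipeline(text):
    doc = {"text": text}
    for stage in PIPELINE:
        doc = stage(doc)
    return doc

result = run_pipeline("MEMORANDUM FROM: Ede Holiday TO: Sam Skinner. Please review.")
print(result["lookups"])
print(result["sentences"])
```

Later stages (POS tagging, semantic tagging, parsing with document type grammars) would be further functions appended to `PIPELINE`, each reading the annotations produced before it.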
Documentary Form: Intellectual Element Recognition
Grammar for Documentary Form of a Memorandum
Parse Tree and Semantics of the Document
Extracted Metadata and Item Description in Manifest
DOCTYPE = White House Memorandum
DATE = April 27, 1992
AUTHOR = EDE HOLIDAY
ADDRESSEE = SAM SKINNER
TOPIC = California Earthquake
DESCRIPTION = Memorandum dated April 27, 1992 from EDE HOLIDAY to SAM SKINNER regarding California Earthquake
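The DESCRIPTION field on this slide looks like a template filled from the other fields. A minimal sketch of that step follows, assuming the short form name ("Memorandum") is the last word of DOCTYPE; that rule and the function name `item_description` are assumptions made for illustration, not the project's documented method.

```python
def item_description(meta):
    # Assumption: the short form name is the last word of DOCTYPE,
    # e.g. "White House Memorandum" -> "Memorandum".
    form = meta["DOCTYPE"].split()[-1]
    return (
        f"{form} dated {meta['DATE']} from {meta['AUTHOR']} "
        f"to {meta['ADDRESSEE']} regarding {meta['TOPIC']}"
    )

# Metadata values copied from the slide.
meta = {
    "DOCTYPE": "White House Memorandum",
    "DATE": "April 27, 1992",
    "AUTHOR": "EDE HOLIDAY",
    "ADDRESSEE": "SAM SKINNER",
    "TOPIC": "California Earthquake",
}

description = item_description(meta)
print(description)
```

With the slide's values, this reproduces the DESCRIPTION string shown in the manifest.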
Speech Acts and Record Description. Actions are a part of item descriptions. Examples: Signature Memorandum from Boyden Gray to the President recommending the nomination of Ronald B. Leighton to be a US District Judge. Letter from President Bush to President Mikhail Gorbachev suggesting an informal meeting. Memorandum from President Bush to Boyden Gray requesting an analysis of the War Powers Resolution. Letter from Susan Black to President Bush expressing appreciation for nomination and commitment to serve.
Speech Acts and Archival Review. Archival review in response to FOIA requests requires recognition of the actions expressed in records. Presidential Records Act restriction on disclosure a(5), Confidential Advice: "confidential communications requesting or submitting advice, between the President and his advisors, or between his advisors." Example of an action expressing confidential advice: I further recommend that the President look for opportunities to speak at an appropriate event indicating his knowledge of and interest in this issue, …
Explicit & Implicit Speech Acts. Every complete sentence carries out a speech act. Performative sentences express explicit speech acts. A performative verb is a verb whose action is accomplished merely by saying it or writing it. Example: I recommend that you attend the conference. Declarative, imperative and interrogative sentences express implicit speech acts.
Declarative (state): You completed the report.
Imperative (request): Please, complete the report.
Interrogative (ask): Did you complete the report?
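The distinction between explicit and implicit speech acts can be illustrated with a deliberately crude classifier. It is a sketch under strong assumptions: the `PERFORMATIVE_VERBS` list is a small hypothetical sample, imperatives are detected only by a leading "please", and no parsing is done, whereas the method in this presentation works from parsed sentences.

```python
# Hypothetical sample of performative verbs (illustration only).
PERFORMATIVE_VERBS = {"recommend", "request", "suggest", "advise", "promise"}

def classify_speech_act(sentence):
    s = sentence.strip()
    words = [w.strip(",.?!").lower() for w in s.split()]
    # Explicit: first-person subject followed by a performative verb,
    # allowing one intervening word ("I further recommend ...").
    if words and words[0] == "i":
        for w in words[1:3]:
            if w in PERFORMATIVE_VERBS:
                return ("explicit", w)
    # Implicit acts are read off the sentence type.
    if s.endswith("?"):
        return ("implicit", "ask")
    if words and words[0] == "please":
        return ("implicit", "request")
    return ("implicit", "state")

examples = [
    "I recommend that you attend the conference.",
    "You completed the report.",
    "Please, complete the report.",
    "Did you complete the report?",
]
labels = [classify_speech_act(e) for e in examples]
for sentence, label in zip(examples, labels):
    print(label, sentence)
```

The four example sentences from the slide come out as explicit `recommend`, implicit `state`, implicit `request` and implicit `ask`, respectively.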
A Method for Recognizing Speech Acts in E-Records
Input: Textual document & metadata from the manifest
1. Read author and addressee metadata from the manifest
2. Information extraction
3. Parse sentences in the document
4. Speech Act Transducer: annotate explicit speech acts; annotate implicit speech acts; annotate speech acts indicated by text structure; annotate indirect speech acts; annotate the primary speech acts
Output: [document(e1), author(e1, S), addressee(e1, H), act(e1, F(P))]
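The output form on this slide can be modeled as a small record type. The sketch below assumes that S and H stand for the author and addressee, that F(P) is a speech act type F applied to content P, and that the two arguments of `act` are separated by a comma; the class name, field names and example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SpeechActRecord:
    event: str      # document/event identifier, e.g. "e1"
    author: str     # S in the slide's notation
    addressee: str  # H in the slide's notation
    force: str      # F: the type of speech act
    content: str    # P: the content the act applies to

    def render(self):
        # Produce the bracketed predicate list used on the slide.
        return (
            f"[document({self.event}), author({self.event}, {self.author}), "
            f"addressee({self.event}, {self.addressee}), "
            f"act({self.event}, {self.force}({self.content}))]"
        )

# Hypothetical values loosely based on the Boyden Gray example earlier in the deck.
record = SpeechActRecord(
    event="e1",
    author="boyden_gray",
    addressee="president",
    force="recommend",
    content="nominate(ronald_b_leighton)",
)
rendered = record.render()
print(rendered)
```

Keeping the fields separate, and rendering the predicate form only on output, would let a review tool query by author, addressee or act type without re-parsing strings.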
Decision Support for Archival Review. FOIA (and systematic) review of Presidential records for PRA and FOIA restrictions on disclosure requires page-by-page review of the records. Due to the increasing volume of records in all branches of Government, and especially EOP, decision support is needed to assist archivists in review.
Potential Benefits of Archival Review Assistant
Reducing the risk of opening a document or passage of a record whose access should be restricted.
A tutoring tool during training of review archivists.
A tool that novice reviewers could use to check their work.
Provision of additional evidence in cases where a reviewer's judgment was uncertain, or pointing out uncertainties where the reviewer thought the decision was certain.
Support for estimating FOIA review workload in terms of the number and types of restrictions likely to apply.
Support for reviews of Federal records for FOIA exemptions.
Extension of the technology to support declassification of security-classified records.
Components of an Archival Review Assistant
File Format Identification. A capability to identify file formats is needed by ERA for: ensuring compliance with the Record Transmittal Agreement; viewing/playing files; conversion to current or standard file formats; archive extraction; password recovery and decryption; repair of damaged files.
Linux File Command & Magic File
Extensions of File Command and Magic File
Magic for individual file formats.
Output of file command/magic file is a File Format ID.
Rewriting file command code for identifying characteristics of text files and document types.
Defined approx. 800 file format signatures.
Collected examples of approx. 500 of the file format types.
Created File Signature Database.
Verified that File Format Identifier with magic file correctly identifies approx. 500 file types.
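Signature-based identification of the kind the file command performs can be sketched as a lookup of leading bytes. The example below is only an illustration in Python, not the project's File Format Identifier: it checks a handful of well-known signatures at offset 0, whereas real magic files also test at other offsets and apply further rules. The `SIGNATURES` table and the returned labels are illustrative choices.

```python
# A few well-known leading-byte signatures (illustrative subset).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP archive"),
]

def identify_format(data):
    # Return the label of the first signature that matches the start of the data.
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"

def identify_file(path):
    # Read only the first bytes of the file; enough for the signatures above.
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))

print(identify_format(b"%PDF-1.4\n"))
print(identify_format(b"PK\x03\x04\x14\x00"))
print(identify_format(b"plain text"))
```

A signature database like the one described on the slide would replace the hard-coded `SIGNATURES` list with entries loaded from storage.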
Demonstrations
1. Document Type Recognition, Metadata Extraction & Item Description
2. Automatic Recognition and Interpretation of Performative Sentences
3. Decision Support for Archival Review
4. File Format Library & File Format Identifier
Additional Information
1. W. Underwood et al. Advanced Decision Support for Archival Processing of Presidential E-records, TR ITTL/CSITD 09-01, Georgia Tech Research Institute, Sept
2. W. Underwood & S. Laib. Automatic Recognition of Documentary Forms, Technical Report ITTL/CSITD 08-02, GTRI, May
3. W. Underwood. Recognizing Speech Acts in Presidential E-records, TR ITTL/CDITD 08-03, GTRI, Oct 2008