Automating Document Review Nathaniel Love CS 244n Final Project Presentation 6/14/2006.

1 Automating Document Review Nathaniel Love CS 244n Final Project Presentation 6/14/2006

2 Document Review
Context: litigation cases, government investigations.
Discovery process: a company involved in a case is compelled to produce documents (internal memos, financial statements, email) in response to a discovery request.
The company doesn’t want to release everything, only those documents that are:
  Responsive to the discovery request, and
  Not privileged, where privileged means subject to protection under attorney-client privilege.
The company’s attorneys must review all documents before they are produced.
In a large litigation case, this may be ~500,000 documents.

3 Classification Problem
500,000 emails to review.
Inspection by attorneys at ~100 documents/hr, $275/hr.
$1.375 million to pay for document review for 1 case.
Improving this process:
  Each email must be classified as:
    Responsive / non-responsive
    Privileged / non-privileged
  As attorneys review, train 2 MaxEnt classifiers.
  Organize documents classified by the partially trained classifiers.
  Present sorted documents to attorneys, with suggested classifications.
  Run the trained classifier on all previously reviewed documents to check for errors.
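The cost figure and the review loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the class name OnlineMaxEnt, the learning rate, the feature strings, and the toy documents are all assumptions made for the example. A binary MaxEnt model is equivalent to logistic regression, so the sketch uses a simple per-document gradient update; a second instance of the same class would handle the privileged / non-privileged decision.

```python
import math

def review_cost(num_docs, docs_per_hour, rate_per_hour):
    """Attorney cost of reviewing every document by hand."""
    return num_docs / docs_per_hour * rate_per_hour

class OnlineMaxEnt:
    """Binary MaxEnt (logistic regression) updated one reviewed document at a time.

    Hypothetical helper for illustration; features are plain strings.
    """

    def __init__(self, learning_rate=0.5):
        self.weights = {}
        self.learning_rate = learning_rate

    def prob(self, features):
        # Probability that a document with these features is in the positive class.
        score = sum(self.weights.get(f, 0.0) for f in features)
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, label):
        # One stochastic-gradient step on the log-likelihood.
        error = label - self.prob(features)
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + self.learning_rate * error

# Cost of full manual review, using the figures on the slide.
cost = review_cost(500_000, 100, 275)

# Toy attorney decisions (1 = responsive, 0 = non-responsive); invented data.
reviewed = [
    ({"sender=a@example.com", "subj:energy"}, 1),
    ({"sender=b@example.com", "subj:lunch"}, 0),
]
responsive = OnlineMaxEnt()
for _ in range(20):
    for feats, label in reviewed:
        responsive.update(feats, label)

# Sort the unreviewed documents so the likeliest responsive ones come first.
unreviewed = {"doc1": {"subj:lunch"}, "doc2": {"subj:energy"}}
queue = sorted(unreviewed, key=lambda d: responsive.prob(unreviewed[d]), reverse=True)
```

Re-running prob over the already reviewed documents and flagging those where the model disagrees with the attorney's tag would give the error check mentioned in the last bullet.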

4 Feature Selection / Data
Emails: sender, recipient, date, words/word pairs in subject, presence/type of attachments…
Hand-built features: added based on concepts relevant to the discovery request.
Enron Corpus: solid match for data seen in an actual document review process.
  Test and training data drawn from hand-tagged Enron emails (work done by Berkeley group).
  Mapped Berkeley categories into responsive/privileged categories based on the FERC investigation into Enron (concerning manipulation of energy markets in the western U.S.).
Issues:
  Small data set overall (1,700 documents tagged out of over 600,000 in the corpus).
  Poor data for the privilege classifier: the tagged documents contain far fewer privileged emails than exist in the corpus overall.
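A rough sketch of how the email features listed on this slide might be extracted, using Python's standard email module. The feature-name prefixes, the function name extract_features, and the concept_terms argument (standing in for the hand-built features) are assumptions for illustration; the slides do not give the actual feature encoding.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def extract_features(raw_email, concept_terms=None):
    """Turn one raw email into a set of string features.

    concept_terms is a hypothetical mapping from a concept name to the
    subject words that signal it (a stand-in for hand-built features).
    """
    msg = message_from_string(raw_email)
    features = set()

    sender = (msg.get("From") or "").strip().lower()
    if sender:
        features.add("sender=" + sender)

    for rcpt in (msg.get("To") or "").split(","):
        rcpt = rcpt.strip().lower()
        if rcpt:
            features.add("recipient=" + rcpt)

    # Date reduced to year-month; unparseable or missing dates are skipped.
    try:
        sent = parsedate_to_datetime(msg.get("Date") or "")
        features.add("month=" + sent.strftime("%Y-%m"))
    except (TypeError, ValueError):
        pass

    # Words and adjacent word pairs in the subject line.
    words = (msg.get("Subject") or "").lower().split()
    for word in words:
        features.add("subj_word=" + word)
    for first, second in zip(words, words[1:]):
        features.add("subj_pair=" + first + "_" + second)

    # Presence and type (file extension) of attachments.
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            features.add("has_attachment")
            ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else "none"
            features.add("attachment_type=" + ext)

    # Hand-built concept features triggered by subject words.
    for concept, terms in (concept_terms or {}).items():
        if terms & set(words):
            features.add("concept=" + concept)

    return features

# Invented sample message, not taken from the Enron corpus.
sample = (
    "From: alice@example.com\n"
    "To: bob@example.com, carol@example.com\n"
    "Date: Mon, 14 May 2001 10:00:00 -0700\n"
    "Subject: California power prices\n"
    "\n"
    "Body text.\n"
)
features = extract_features(sample, {"energy_market": {"power", "prices"}})
```

String-valued features like these can be fed directly to a MaxEnt classifier that keeps one weight per feature.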

5 Results
Accuracy: 75% (responsive), 93% (privileged).
Accuracy improved with more training.
Positive feedback from attorneys on use of the system, especially on the organization and presentation of documents by the classifier as it trains.
Weights on features (responsive classifier):
  david.parquet@enron.com (high positive weight)
  nicholas.oday@enron.com (high negative weight)
David Parquet was Enron’s Vice President for project development in the western U.S. Nicholas O’Day was Vice President at Enron Japan.
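For readers who want to reproduce this kind of analysis, the two measurements on this slide (classification accuracy and the most heavily weighted features) can be computed as sketched below. The function names, the placeholder feature names, and the weight values are invented for illustration; they are not the project's actual weights or data.

```python
def accuracy(predicted, actual):
    """Fraction of documents where the classifier agrees with the attorney's tag."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

def strongest_features(weights, k=1):
    """Return the k most positive and the k most negative (feature, weight) pairs."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k], ranked[-k:][::-1]

# Toy labels and weights (assumed values, for illustration only).
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
toy_weights = {
    "sender=a@example.com": 1.8,
    "subj_word=meeting": 0.1,
    "sender=b@example.com": -2.3,
}
most_positive, most_negative = strongest_features(toy_weights)
```

In a sender-weighted model like the one described here, a high positive weight means mail from that address pushes a document toward "responsive", and a high negative weight pushes it away.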

