Daniel G. Bobrow, Research Fellow, Palo Alto Research Center Inc., with Tracy King and Lawrence Lee. June 4, 2007. Enhancing Legal Discovery with Linguistic Processing.


1 Enhancing Legal Discovery with Linguistic Processing
Daniel G. Bobrow, Research Fellow, Palo Alto Research Center Inc.
with Tracy King and Lawrence Lee
June 4, 2007

2 The problems in Legal Discovery
• Recall
  – Nothing relevant left behind
• Precision
  – Very little irrelevant to ignore
• Scalability
  – Need to handle more and more
• Privacy
  – What they see is only what they should get

3 Today: negotiated keyword search protocol
• All documents discussing or referencing scientific research on the effects of secondhand smoking published prior to 1985.
  Defendant’s Initial Proposal: “secondhand smok!” and (finding or science or or research) and (1985 or 1984 or 1983 or 1982 or 1981 or 1980 or 197! or 196! or 195!)
  Plaintiffs’ Rejoinder: ((find! or result! or effect!) w/page (secondhand or “second hand”)) or (other! w/5 smok!)
• All documents relating to destruction of records under defendants’ records retention policies and practices.
  Defendant’s Initial Proposal: “records” and “destruction”
  Plaintiffs’ Counterproposal: destr! or elim! or dispos! or purg! or recycl! or retain! or reten!
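The operators in these negotiated queries can be illustrated with a small sketch. It assumes that a trailing "!" means "any word beginning with this prefix" and that "w/5" means "within five words of", which is how such operators are commonly read; the slide does not define them. The function names term_to_regex, matched_words and within are hypothetical, and the sample sentences are invented for illustration.

```python
import re

def term_to_regex(term: str) -> str:
    """Translate one search term; a trailing '!' is read as 'any suffix' (assumption)."""
    if term.endswith("!"):
        return r"\b" + re.escape(term[:-1]) + r"\w*"
    return r"\b" + re.escape(term) + r"\b"

def matched_words(terms, text):
    """Return the distinct words in text that match any of the terms."""
    pattern = re.compile("|".join(term_to_regex(t) for t in terms), re.IGNORECASE)
    return sorted({m.group(0).lower() for m in pattern.finditer(text)})

def within(term_a, term_b, n, text):
    """True if a match of term_a and a match of term_b are at most n words apart."""
    tokens = re.findall(r"\w+", text)

    def positions(term):
        rx = re.compile(term_to_regex(term), re.IGNORECASE)
        return [i for i, tok in enumerate(tokens) if rx.fullmatch(tok)]

    return any(abs(i - j) <= n for i in positions(term_a) for j in positions(term_b))

# Plaintiffs' counterproposal from the slide
counterproposal = ["destr!", "elim!", "dispos!", "purg!", "recycl!", "retain!", "reten!"]
sample = "Records were destroyed or purged under the retention policy; the plant retained nothing."
print(matched_words(counterproposal, sample))
print(within("other!", "smok!", 5, "the effect on others of smoking indoors"))
```

Prefix truncation is blunt: under the same reading "purg!" would also match "purgatory", which is one reason each side keeps counter-proposing term lists.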

4 Linguistic enhancement of keyword queries
• Inflexional morphology – forms of verbs
  – destroy → destroys, destroyed, destroying, …
  – comply → complies, complied, complying
• Derivational morphology – verbs → nouns
  – destroy → destruction, destroyer, …
  – comply → compliance, …
  – retain → retention, …
• Word taxonomy (e.g. WordNet)
  – result → consequence, effect, outcome, result, event, issue, upshot
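A minimal sketch of this kind of query expansion follows. The three lookup tables are hand-built from the examples on the slide and stand in for a real morphological analyzer and for WordNet; the names expand and expand_query, and the "and"/"or" output syntax, are assumptions for illustration only.

```python
# Toy lexical resources copied from the slide's examples (not a real lexicon).
INFLECTIONS = {
    "destroy": ["destroys", "destroyed", "destroying"],
    "comply": ["complies", "complied", "complying"],
}
DERIVATIONS = {
    "destroy": ["destruction", "destroyer"],
    "comply": ["compliance"],
    "retain": ["retention"],
}
TAXONOMY = {
    "result": ["consequence", "effect", "outcome", "event", "issue", "upshot"],
}

def expand(term: str) -> list:
    """Return the term followed by its inflected, derived and related forms."""
    forms = [term]
    for table in (INFLECTIONS, DERIVATIONS, TAXONOMY):
        for form in table.get(term, []):
            if form not in forms:
                forms.append(form)
    return forms

def expand_query(terms) -> str:
    """Build a conjunction of disjunctions, one disjunction per original term."""
    return " and ".join("(" + " or ".join(expand(t)) + ")" for t in terms)

print(expand_query(["destroy", "retain"]))
```

Compared with prefix truncation, expansion through a lexicon reaches irregular forms ("retain" to "retention") without also admitting unrelated words that happen to share a prefix.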

5 Processing the collection rather than the queries
ASKER: A Semantically-indexed Knowledge Repository
[Diagram. Labels: Intelligence Source Documents; Text Passages; Normalize to AKR; Inference-sensitive lexical resources; ASKER Knowledge repository (Passages + AKR with semantic index); Query; Query AKR; Expand / Simplify; Query index terms; Retrieved passages + AKR; Passage, AKR + index terms; Entailment & Contradiction Detection; Filtered answers]

6 Normalize to Semantic Representation
• Syntactic Normalization
  – morphological:
    » bought → buy +past
  – structural:
    » the file was lost by Mary → Mary lost the file
  – derivational:
    » the destruction of the memo by the CEO → the CEO destroyed the memo
• Semantic normalization
  – word to list of WordNet synsets
    » buy → [buy, purchase, …] […]
  – Connect predicate and arguments
    » Pred: destroy Agent: CEO Theme: memo
  – Fill in implicit arguments
    » Ed was easy to please → Ed was pleased
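The effect of structural and derivational normalization can be sketched with a toy example: different surface forms should map to the same predicate-argument frame. The code below is not the ASKER parser; it is a pattern-based stand-in that handles only the slide's sentence shapes, and the names Frame, PAST_FORMS, NOMINALS and normalize are hypothetical.

```python
from typing import NamedTuple, Optional

class Frame(NamedTuple):
    pred: str
    agent: Optional[str]
    theme: str

# Tiny hand-built lexicon covering only the slide's examples.
PAST_FORMS = {"lost": "lose", "destroyed": "destroy", "bought": "buy"}
NOMINALS = {"destruction": "destroy"}

def _np(words):
    """Join a noun phrase, dropping a leading determiner."""
    if words and words[0].lower() in {"the", "a", "an"}:
        words = words[1:]
    return " ".join(words)

def normalize(clause: str) -> Optional[Frame]:
    words = clause.strip().rstrip(".").split()
    lower = [w.lower() for w in words]
    by = lower.index("by") if "by" in lower else None
    for i, w in enumerate(lower):
        # derivational: "<nominal> of <theme> by <agent>"
        if w in NOMINALS and lower[i + 1:i + 2] == ["of"] and by is not None and by > i + 1:
            return Frame(NOMINALS[w], _np(words[by + 1:]), _np(words[i + 2:by]))
        if w in PAST_FORMS:
            # structural: "<theme> was <verb> [by <agent>]"
            if i > 0 and lower[i - 1] == "was":
                agent = _np(words[by + 1:]) if by is not None and by > i else None
                return Frame(PAST_FORMS[w], agent, _np(words[:i - 1]))
            # active: "<agent> <verb> <theme>"
            return Frame(PAST_FORMS[w], _np(words[:i]), _np(words[i + 1:]))
    return None

print(normalize("the file was lost by Mary"))
print(normalize("the destruction of the memo by the CEO"))
```

Because the passive and the active sentence yield the same frame, a single index entry can serve queries phrased either way.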

7 Improved Recall (Google and Asker on Wikipedia)
Query: How many terrorists have died?
Google:
  In addition to the 19 hijackers, 2973 people died in the terrorist attack...
  Although there were security alerts at many locations, no other terrorist incidents occurred outside central London.
  This is a list of sportspeople who have died …
Asker:
  The encounter resulted in the deaths of two terrorists of the Al Omar Tanzeem
  In blazing gunfire, five of the insurgents perished…
  “…see to it that those terrorists die and are broken”

8 Improved Precision (Using argument roles for relevance test)
Query: What terrorists have been killed?
Google:
  … not include most people killed in big terrorist bombings
  … act of terrorism in which 93 innocent people have been killed or are missing in the ruins
Asker:
  During a two-hour gun battle in Mdantsane, police kill a terrorist or freedom fighter
  All the three terrorists killed in this incident have been identified as Pakistani Nationals.
  … the former Socialist government carried out a covert campaign in which 27 suspected Basque terrorists were killed.
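The difference between keyword matching and a relevance test on argument roles can be sketched as follows. The passages are quoted from the slide, but the frames attached to them are written by hand here as an assumption about what a semantic parser would produce; Frame, keyword_match and role_match are hypothetical names.

```python
from typing import NamedTuple, Optional

class Frame(NamedTuple):
    pred: str
    agent: Optional[str]
    theme: str

# (passage text, hand-assigned frame standing in for parser output)
PASSAGES = [
    ("police kill a terrorist or freedom fighter",
     Frame("kill", "police", "terrorist")),
    ("act of terrorism in which 93 innocent people have been killed",
     Frame("kill", None, "people")),
    ("27 suspected Basque terrorists were killed",
     Frame("kill", None, "terrorist")),
]

def keyword_match(text: str) -> bool:
    """Both query stems occur somewhere in the passage."""
    t = text.lower()
    return "terroris" in t and "kill" in t

def role_match(frame: Frame, query: Frame) -> bool:
    """The predicate matches and the query's theme fills the theme role."""
    return frame.pred == query.pred and frame.theme == query.theme

# "What terrorists have been killed?" read as: kill(agent=?, theme=terrorist)
query = Frame("kill", None, "terrorist")

keyword_hits = [text for text, _ in PASSAGES if keyword_match(text)]
role_hits = [text for text, frame in PASSAGES if role_match(frame, query)]
print(len(keyword_hits), len(role_hits))
```

The second passage contains both keywords, so a keyword search returns it, but the people killed are the victims; the role test excludes it.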

9 Scalability (Cost of doing linguistic processing at scale)
• Linguistic processing time: < 1 CPU sec/sentence
  – parsing, semantic normalization, indexing
• Assumptions:
  – Average collection size: 100 million documents
  – Document size: 25 sentences
  – 8-core processor: $6K, or $250/month (depreciated and housed for 3 years)
  – 2.5 million seconds/month ÷ 25 CPU sec/document = 100,000 documents/core/month
• Cost for handling 100 million documents/month
  – 1000 cores = 125 processors × $250 = $31,250, roughly $32,000
• Use human review: query costs are in the noise
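The cost estimate can be reproduced step by step. The sketch below uses only the figures stated on the slide and treats the "< 1 CPU sec/sentence" bound as exactly one second, so the result is an upper estimate; the function name monthly_cost is hypothetical.

```python
import math

def monthly_cost(docs_per_month: int,
                 sentences_per_doc: int = 25,
                 cpu_seconds_per_sentence: float = 1.0,   # slide gives "< 1", taken as 1
                 seconds_per_month: int = 2_500_000,
                 cores_per_processor: int = 8,
                 dollars_per_processor_month: int = 250) -> dict:
    """Reproduce the slide's back-of-the-envelope processing cost."""
    seconds_per_doc = sentences_per_doc * cpu_seconds_per_sentence
    docs_per_core = seconds_per_month / seconds_per_doc
    cores = math.ceil(docs_per_month / docs_per_core)
    processors = math.ceil(cores / cores_per_processor)
    return {
        "docs_per_core_per_month": docs_per_core,
        "cores": cores,
        "processors": processors,
        "dollars_per_month": processors * dollars_per_processor_month,
    }

estimate = monthly_cost(100_000_000)
print(estimate)
```

With the slide's inputs this gives 1000 cores, 125 processors and $31,250 per month, which the slide rounds to $32,000.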

10 Privacy
• Identify sensitive content by entity type and relationship (linguistic processing)
  – e.g. Phone numbers of people
• Encrypt content to make content unreadable (PARC security technology)
• Provide content-specific keys for those people with a need to know specific information
• Additional PARC security technologies can identify additional content to be redacted to mitigate inference channels
  – can redacted information be discovered based on what is available?
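The first step, identifying sensitive content by entity type, can be sketched for the phone-number example. This is only an illustration: a regular expression stands in for linguistic entity recognition, a placeholder token plus a lookup table stands in for encryption with content-specific keys, and none of it reflects the PARC security technology itself. The pattern covers only one North American number format, and the names PHONE and redact_phones and the sample numbers are hypothetical.

```python
import re

# Matches numbers such as 650-555-0100 or 650.555.0100 (assumed format).
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")

def redact_phones(text: str):
    """Replace each phone number with a token; return the text and the token table.

    The token table plays the role of the protected content: only a reader
    with a need to know would be given the entry behind a particular token.
    """
    vault = {}

    def replace(match):
        token = f"[PHONE_{len(vault) + 1}]"
        vault[token] = match.group(0)
        return token

    return PHONE.sub(replace, text), vault

redacted, vault = redact_phones("Call Ed at 650-555-0100 or 650-555-0199.")
print(redacted)
```

A reviewer without access to the table sees that a phone number was present and where, but not its value.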

11 Linguistic processing can be useful in legal discovery
With good Recall, Precision, Scalability, Privacy
Thank you

