The process of electronic discovery Herbert L. Roitblat, Ph.D. OrcaTec LLC.

1 The process of electronic discovery Herbert L. Roitblat, Ph.D. OrcaTec LLC

2 What is relevance? How do we explain the poor agreement among reviewers on document relevance? H1: There is no objective relevance, only subjective relevance (the post-modern view). H2: There is objective relevance, but it is difficult to measure reliably. Each hypothesis has implications for how we approach machine support for discovery.

3 Subjective relevance: There are no inherently wrong decisions. Relevance is determined by the project manager (e.g., the lead attorney). The reviewer’s job is not to be right but to represent the designated opinion. Conundrum: on what basis can one challenge the leader’s decision? The work cannot be replicated.

4 Objective relevance: Decision making is fallible. The key is to discern which document features make a document relevant. Judgments are limited by a lack of world and case knowledge and by the flexibility of human language. What level of quality is reasonable?

5 Measurement: Recall is nominally important, but it is difficult to measure; use sampling techniques to estimate it. Focus on the whole project: the right query terms? The right technology? The right selection methods? Measuring performance is more important than the technology used. Example request: "All documents constituting or reflecting discussions about unfair or discriminatory allocations of [Brand X] products or the fear of such unfair or discriminatory allocations."
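Estimating recall by sampling can be sketched as below. This is a minimal illustration under stated assumptions, not the presenter's procedure: it draws a simple random sample from the whole collection, treats `truly_relevant` as an assessor's judgment of each sampled document, and uses hypothetical names (`estimate_recall`, `called_relevant`, `truly_relevant`).

```python
import random
from typing import Sequence


def estimate_recall(called_relevant: Sequence[bool],
                    truly_relevant: Sequence[bool],
                    sample_size: int,
                    seed: int = 0) -> float:
    """Estimate recall from a simple random sample of the whole collection.

    called_relevant[i]: the review process's decision for document i.
    truly_relevant[i]:  the assessor's judgment for document i (in practice,
                        only the sampled documents would need to be assessed).
    """
    rng = random.Random(seed)
    sample = rng.sample(range(len(called_relevant)), sample_size)
    # Recall is conditioned on truly relevant documents, so keep only those.
    relevant_in_sample = [i for i in sample if truly_relevant[i]]
    if not relevant_in_sample:
        raise ValueError("sample contains no relevant documents; enlarge it")
    found = sum(1 for i in relevant_in_sample if called_relevant[i])
    return found / len(relevant_in_sample)
```

Because relevant documents are often rare, a sample of a few hundred documents may contain only a handful of relevant ones, so the estimate is noisy; that is one concrete reason recall is difficult to measure.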

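6 Elusion: the proportion of rejected documents that are relevant. Sampling for elusion leads directly to an “accept on zero” quality assessment: draw a random sample from the rejected documents, and accept the result only if the sample contains no relevant documents.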
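A minimal sketch of the accept-on-zero check, assuming the sample is a simple random sample of the rejected documents and that each sampled document has been judged by an assessor. The function name `accept_on_zero` is hypothetical. When no relevant documents turn up, it also reports the largest elusion rate still consistent with that result at the chosen confidence, using the binomial approximation (1 − p)^n = 1 − confidence.

```python
from typing import Optional, Sequence, Tuple


def accept_on_zero(sample_is_relevant: Sequence[bool],
                   confidence: float = 0.95) -> Tuple[float, bool, Optional[float]]:
    """Assess a random sample drawn from the *rejected* documents.

    Returns (observed elusion, accepted?, upper bound on elusion).
    The upper bound is reported only when the sample has zero relevant
    documents; it treats draws as independent (large-collection approximation).
    """
    n = len(sample_is_relevant)
    if n == 0:
        raise ValueError("sample must not be empty")
    relevant_found = sum(sample_is_relevant)
    observed_elusion = relevant_found / n
    accepted = relevant_found == 0
    # If true elusion were p, P(no relevant docs in n draws) = (1 - p) ** n.
    # Solving (1 - p) ** n = 1 - confidence gives the largest p still plausible.
    upper_bound = 1.0 - (1.0 - confidence) ** (1.0 / n) if accepted else None
    return observed_elusion, accepted, upper_bound
```

A single relevant document in the sample rejects the batch outright, which is what makes the assessment simple to communicate.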

7 Measures: contingency table

| | Truly relevant | Truly irrelevant | Total |
|---|---|---|---|
| Called relevant | A | B | C |
| Called irrelevant | D | E | F |
| Total | G | H | I |

Precision = A/C; Recall = A/G; Elusion = D/F
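The three measures can be computed directly from the cell counts of the table. This is a small sketch; the class name `ReviewOutcome` and the example counts are hypothetical, while the letters follow the slide's cell labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewOutcome:
    """Cell counts of the contingency table (hypothetical class name)."""
    a: int  # called relevant, truly relevant
    b: int  # called relevant, truly irrelevant
    d: int  # called irrelevant, truly relevant (missed documents)
    e: int  # called irrelevant, truly irrelevant

    @property
    def c(self) -> int:  # row total: all documents called relevant
        return self.a + self.b

    @property
    def f(self) -> int:  # row total: all documents called irrelevant
        return self.d + self.e

    @property
    def g(self) -> int:  # column total: all truly relevant documents
        return self.a + self.d

    @property
    def precision(self) -> float:  # A / C
        return self.a / self.c

    @property
    def recall(self) -> float:  # A / G
        return self.a / self.g

    @property
    def elusion(self) -> float:  # D / F
        return self.d / self.f
```

For example, `ReviewOutcome(a=80, b=20, d=20, e=880)` has precision 0.8 and recall 0.8, yet elusion is only about 0.022: when most rejected documents are irrelevant, a low elusion rate can coexist with modest recall.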

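8 Documents to review: the estimated number of documents to review to achieve specified levels of confidence and maximum acceptable prevalence rates (p_s). Rows give p_s; columns give the confidence level.

| p_s | 0.999 | 0.995 | 0.99 | 0.98 |
|---|---|---|---|---|
| 0.0001 | 69075 | 52981 | 46050 | 39119 |
| 0.0005 | 13813 | 10594 | 9209 | 7823 |
| 0.001 | 6905 | 5296 | 4603 | 3911 |
| 0.005 | 1379 | 1058 | 919 | 781 |
| 0.01 | 688 | 528 | 459 | 390 |
| 0.02 | 342 | 263 | 228 | 194 |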
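The figures in this table are consistent with the standard accept-on-zero sample-size formula n = ⌈ln(1 − confidence) / ln(1 − p_s)⌉, that is, the smallest sample for which finding zero relevant documents would be unlikely if prevalence were p_s or higher. The sketch below assumes independent draws (a binomial model); the function name `docs_to_review` is hypothetical. Rounded up, it reproduces each entry in the table.

```python
import math


def docs_to_review(confidence: float, max_prevalence: float) -> int:
    """Smallest n such that (1 - max_prevalence) ** n <= 1 - confidence."""
    # log1p(-p) computes ln(1 - p) accurately for small p.
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-max_prevalence))


if __name__ == "__main__":
    levels = (0.999, 0.995, 0.99, 0.98)
    for p_s in (0.0001, 0.0005, 0.001, 0.005, 0.01, 0.02):
        print(p_s, [docs_to_review(c, p_s) for c in levels])
```

Under this approximation the sample size depends only on the confidence level and p_s, not on the size of the collection; for instance, 459 documents with zero relevant found supports 99% confidence that prevalence is below 1%.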

9 Why is this important? Manual review has reached the end of its useful life: volume, volume, volume. We need powerful tools to augment human review (AI, statistical methods, neural networks). We need to be able to assess quality. We need to communicate with the audience in terms they care about.

10 What form will assistance take? Semantic web approach: an eDiscovery ontology; requires adaptation for each case; several services exploit a similar approach. Adaptationist approach: statistical (LSI, Bayes, language modeling), neural network, and other machine learning techniques. Syntactic.
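As one concrete instance of the "Bayes" entry under the adaptationist approach, here is a textbook multinomial naive Bayes classifier with add-one (Laplace) smoothing, trained on documents a reviewer has already coded. It is an illustrative sketch only: the class name, tokenizer, and toy training documents are hypothetical, and no eDiscovery product is claimed to work this way.

```python
import math
import string
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in words if w]


class NaiveBayesRelevance:
    """Multinomial naive Bayes with add-one smoothing (hypothetical helper)."""

    def __init__(self) -> None:
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts: Dict[bool, int] = {True: 0, False: 0}

    def train(self, labeled_docs: Iterable[Tuple[str, bool]]) -> None:
        for text, relevant in labeled_docs:
            self.doc_counts[relevant] += 1
            self.word_counts[relevant].update(tokenize(text))

    def _log_score(self, tokens: List[str], label: bool) -> float:
        vocab_size = len(set(self.word_counts[True]) | set(self.word_counts[False]))
        total_words = sum(self.word_counts[label].values())
        prior = self.doc_counts[label] / sum(self.doc_counts.values())
        score = math.log(prior)
        for w in tokens:
            # Add-one smoothing keeps unseen words from zeroing the probability.
            score += math.log((self.word_counts[label][w] + 1) / (total_words + vocab_size))
        return score

    def is_relevant(self, text: str) -> bool:
        if 0 in self.doc_counts.values():
            raise ValueError("train on at least one relevant and one irrelevant document")
        tokens = tokenize(text)
        return self._log_score(tokens, True) > self._log_score(tokens, False)
```

A classifier like this treats words as independent symbols, so its output still has to be checked with the sampling measures described earlier rather than trusted on the strength of the technology.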

11 Why is this so hard?

12 What properties does language have? Systematicity, atomicity, and semantic transparency: words are independent symbols. Compositionality and syntax: words can be combined according to rich rules. “There was a desert wind blowing that night. It was one of those hot, dry, Santa Anas that come down through the mountain passes and curl your hair and make your nerves jump and your skin itch. On nights like that, every booze party ends in a fight. Meek little wives feel the edge of the carving knife and study their husbands' necks. Anything can happen.” (Raymond Chandler, “Red Wind”)

13 Semantic transparency: How long did the Hundred Years’ War last? Which country makes Panama hats? What is a camel's hair brush made of? The Canary Islands in the Atlantic are named after what animal? Answers: 116 years, from 1337 to 1453; Ecuador; squirrel fur; dogs (Latin Insula Canaria, “Island of the Dogs”).

14 Real mother: I was adopted; I don’t know who my real mother was. I am not a nurturing person, so I don’t think that I could ever be a real mother to anyone. My real mother died when I was an embryo, and I was frozen and later implanted in the womb of a woman who gave birth to me. I had a genetic mother who contributed the egg that was implanted in the womb of my real mother, who gave birth to me. (Fun with systematicity.)

15 Systematicity & Atomicity (answer quickly) What is Mr. Baron’s first name? What currency is used in Italy? How many animals of each kind did Moses take on the Ark? What is the nationality of Thomas Edison, inventor of the telephone? In what biblical story was Job swallowed by a whale? What do cows drink?

16 Conclusion: Lawyers need eDiscovery help. Any approach that relies on the systematicity of language is limited, though it can work with a lot of human input. Human review is of unknown accuracy but inspires high subjective confidence. Measurement is essential to evaluating reasonableness.

