Scientists See Promise in Deep-Learning Programs Microsoft Seeks an Edge in Analyzing Big Data Jeff Hawkins Develops a Brainy Big Data Company Google Offers.


1 Scientists See Promise in Deep-Learning Programs Microsoft Seeks an Edge in Analyzing Big Data Jeff Hawkins Develops a Brainy Big Data Company Google Offers Big-Data Analytics The Age of Big Data How Big Data Became So Big Why Hire a Lawyer? Computers Are Cheaper Armies of Expensive Lawyers, Replaced by Cheaper Software

2 The total amount of digital data in the world is estimated to exceed 1.8 zettabytes (1.8 trillion gigabytes). The digital universe is doubling every 2 years. 85% of that data is owned or controlled by corporations at some point in its lifecycle. Source: International Data Corporation (IDC) Study, 2012
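As a rough illustration of the figures on this slide, the sketch below converts zettabytes to gigabytes and projects growth under a strict "doubles every 2 years" assumption. The function names are hypothetical, decimal (SI) units are assumed, and the projection is plain arithmetic, not an IDC forecast.

```python
# Assumption: SI units, so 1 ZB = 10**21 bytes and 1 GB = 10**9 bytes.
GB_PER_ZB = 10**12


def zettabytes_to_gigabytes(zb: float) -> float:
    """Convert zettabytes to gigabytes (decimal units)."""
    return zb * GB_PER_ZB


def projected_size(start_zb: float, years: float, doubling_period: float = 2.0) -> float:
    """Size after `years`, assuming it doubles every `doubling_period` years."""
    return start_zb * 2 ** (years / doubling_period)


# 1.8 ZB expressed in gigabytes (about 1.8 trillion).
print(zettabytes_to_gigabytes(1.8))
# Size in ZB after ten years of doubling every two years (five doublings).
print(projected_size(1.8, 10))
```

Five doublings multiply the starting volume by 32, which is why review approaches that scale linearly with document count come under pressure.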

3 Big Data is Here And it’s coming soon to a litigation near you… What’s changed?

4 The Great Commingling

5 Redefining scalability in eDiscovery. 1; 1,000; 1 × 10^12

6 Predictive Coding is a Form of Machine Learning What is Machine Learning?

7 It’s already a part of our lives...
voice recognition software, e.g., calling your bank or credit card company
handwriting, facial or fingerprint recognition
analyzing market trends and guiding investment decisions
making decisions on applications for credit or loans
modeling and predicting severe weather patterns
filtering spam in your email inbox
targeted marketing on the internet
robotics

8 KEY POINT: Predictive coding is just one part of a continuum of technology-assisted review (TAR) methods that we are already very familiar with in searching and analyzing data.
Key Words, Concept Clustering, Concept Search, Predictive Coding
Three supporting propositions:
1. Each successive approach incorporates the preceding approaches.
2. Each successive approach contains more supporting criteria.
3. All are ultimately based on the concept of pattern matching.

9 Key Words = Simple pattern matching External input: “wild,” “wolf,” “pet” dog cat rhino ferret goldfish cow wolf domestic wild pet
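A minimal sketch of keyword search as simple pattern matching, using the three external inputs from this slide ("wild," "wolf," "pet"). The sample documents and the function name `keyword_hits` are hypothetical and are not taken from any review platform.

```python
import re


def keyword_hits(documents, keywords):
    """Return ids of documents containing at least one keyword as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]


# Hypothetical four-document collection.
docs = {
    "doc1": "The wolf is a wild animal.",
    "doc2": "A goldfish is a common pet.",
    "doc3": "The rhino and the cow grazed nearby.",
    "doc4": "Wolves hunt in packs.",
}

hits = keyword_hits(docs, ["wild", "wolf", "pet"])
print(hits)
```

In this toy collection "doc4" is not returned, because "Wolves" does not literally contain the term "wolf". That kind of miss is the gap that the later approaches on the continuum try to close.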

10 Concept Clustering = Organization based on internal relationships
dog cat domesticated wild pet rhino ferret goldfish cow wolf tiger
01110111011010010110110001100100 (wild)
011001000110111101100111 (dog)
011100000110010101110100 (pet)
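The bit strings on this slide match the 8-bit ASCII codes of the words next to them. That reading is inferred from the values, since the slide does not name the encoding. A short sketch, with hypothetical helper names, that reproduces them:

```python
def to_bits(word: str) -> str:
    """Encode an ASCII word as a string of 8-bit binary codes."""
    return "".join(format(byte, "08b") for byte in word.encode("ascii"))


def from_bits(bits: str) -> str:
    """Decode a string of 8-bit binary codes back into an ASCII word."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode("ascii")


for word in ("wild", "dog", "pet"):
    print(to_bits(word), f"({word})")
```

To the software, each word is only a pattern of bits. Any notion of "concept" has to come from how those patterns co-occur across documents.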

11 Concept Searching = Key words + Concept organization
External input: “zoo,” “wild,” “domesticated”
dog cat rhino ferret goldfish cow wolf domestic wild pet
dog cat rhino ferret goldfish cow wolf domesticated wild pet tiger
farm zoo
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)

12 Predictive Coding = document-level input + probabilistic modeling
external input: human-coded documents
output: doc-level probability rankings
dog cat rhino ferret goldfish cow wolf domestic wild pet
dog cat rhino ferret goldfish cow wolf domesticated wild pet tiger
farm zoo
011110100110111101101111 (zoo)
01110111011010010110110001100100 (wild)
011001000110111101101101011001010111001101110100011010010110001101100001011101000110010101100100 (domesticated)

13 Infer. Step 1: sample documents from entire set.

14 Step 2: attorney review of sample documents to create training and control set.
In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.
Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.
Responsive Not Responsive
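Steps 1 and 2 can be sketched as a random draw followed by a split of the attorney-coded sample into a training set and a control set. The function names, the 25% control fraction, and the fixed seed are all assumptions for illustration; the slides do not specify sample sizes or split ratios.

```python
import random


def draw_sample(doc_ids, sample_size, seed=0):
    """Step 1: draw a simple random sample of document ids (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), sample_size)


def split_training_control(coded_sample, control_fraction=0.25, seed=0):
    """Step 2: split the coded sample into (training, control) lists."""
    rng = random.Random(seed)
    shuffled = list(coded_sample)
    rng.shuffle(shuffled)
    n_control = int(len(shuffled) * control_fraction)
    return shuffled[n_control:], shuffled[:n_control]


# Hypothetical collection of 1,000 document ids.
sample = draw_sample(range(1000), 100)
training, control = split_training_control(sample)
print(len(training), len(control))
```

The control set is kept out of model building so that it can later serve as an independent check on the model's predictions.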

15 Step 3: create model from human coded training set (responsive and not responsive).
In the European mind, wolves long stood as a symbol of baneful, uncontrollable nature. As far back as the time of Aesop in 500 BCE (Before the Common Era), wolves in literature are portrayed as wicked villains and long-fanged, terrible beasts. Before the Middle Ages, wolves were nearly always the greedy thief, criminal trickster, or cruel remorseless murderer. The wolf does not fare well in the European imagination.
Can the wolf be domesticated? The domesticated dog is descended from the wolf found in the wild. While some people have occasionally attempted to raise wolves as pets, their 2 ½ inch fangs and tendency to eat nearby small animals such as cats can create socially awkward situations with neighbors.
Word     Pos.   Neg.
wolf     .98    .08
dog      .56    .43
pet      .42    .28
raise    .61    .09
Word pair      Assoc %
wolf, pet      .73
dog, wolf      .43
pet, raise     .88
raise, wolf    .61
Other terms shown: wolves, wolf, pet, costner, dances, raise, werewolf
01100100 01101111 01100111
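One simple way to arrive at per-word "Pos." and "Neg." figures like those on this slide is to count how often each word appears in responsive versus non-responsive training documents. The slide does not say how its numbers were computed, so the document-frequency approach, the function names, and the shortened training texts below are assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def word_rates(coded_docs):
    """Map each word to (share of responsive docs, share of non-responsive docs) containing it."""
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = n_neg = 0
    for text, responsive in coded_docs:
        words = tokenize(text)
        if responsive:
            n_pos += 1
            pos_counts.update(words)
        else:
            n_neg += 1
            neg_counts.update(words)
    vocab = set(pos_counts) | set(neg_counts)
    return {
        w: (pos_counts[w] / max(n_pos, 1), neg_counts[w] / max(n_neg, 1))
        for w in vocab
    }


# Hypothetical, shortened training documents with attorney codings.
training_docs = [
    ("Can the wolf be domesticated as a pet?", True),
    ("The domesticated dog is descended from the wolf.", True),
    ("Wolves in literature are portrayed as villains.", False),
    ("The wolf does not fare well in the European imagination.", False),
]

rates = word_rates(training_docs)
print(rates["wolf"], rates["pet"], rates["villains"])
```

A word such as "wolf" that appears on both sides carries little signal by itself, which is why the slide also shows word-pair associations.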

16 Step 4: test model against sample (human coded) set.
"Dances With Wolves" has the makings of a great work, one that recalls a variety of literary antecedents, everything from "Robinson Crusoe" and "Walden" to "Tarzan of the Apes." Michael Blake's screenplay touches both on man alone in nature and on the 19th-century white man's assuming his burden among the less privileged.
Wolves are sometimes kept as exotic pets, and in some rarer occasions, as working animals. Although closely related to dogs (which are believed to have split from wolves between 10,000 and 100,000 years ago), wolves do not show the same tractability as dogs in living alongside humans. Wolves also need much more space than dogs, about 10-15 sq. miles.

17 Yes No Apply model to remainder of documents that have not been reviewed Responsive Non-responsive

18 Step 5: Apply model to entire set and rank documents.
[Figure: ranking scale running from 100% down to 0% in 10% steps]
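The ranking step can be sketched with a small naive Bayes classifier written from scratch. Naive Bayes is swapped in here purely for illustration, since the slides do not say which algorithm any predictive coding product uses. The class name, function names, and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and return its alphabetic words (with repeats)."""
    return re.findall(r"[a-z]+", text.lower())


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing; needs at least one document per class."""

    def fit(self, coded_docs):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()
        for text, responsive in coded_docs:
            self.doc_counts[responsive] += 1
            self.word_counts[responsive].update(tokenize(text))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, words, label):
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        score = math.log(self.doc_counts[label] / total_docs)
        for w in words:
            if w in self.vocab:  # words never seen in training are ignored
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
        return score

    def probability(self, text):
        """Estimated probability that the document is responsive."""
        words = tokenize(text)
        diff = self._log_score(words, False) - self._log_score(words, True)
        return 1.0 / (1.0 + math.exp(diff))


def rank(model, documents):
    """Return (doc_id, probability) pairs, most likely responsive first."""
    scored = [(doc_id, model.probability(text)) for doc_id, text in documents.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical human-coded training set and unreviewed documents.
coded = [
    ("wolves are sometimes kept as exotic pets", True),
    ("can the wolf be domesticated as a pet", True),
    ("dances with wolves is a great film", False),
    ("the screenplay recalls literary antecedents", False),
]
unreviewed = {
    "a": "raising a wolf as a pet",
    "b": "a film screenplay review",
}

ranking = rank(TinyNaiveBayes().fit(coded), unreviewed)
for doc_id, p in ranking:
    print(doc_id, f"{p:.0%}")
```

Reviewers can then work from the top of the ranking down, or set a cutoff percentage below which documents are only sampled.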

19 PREDICTIVE CODING AND BIG DATA NYLJ/Pangea3 Webinar April 15, 2013

20 OUTLINE
1. Mitigating Big Data in E-Discovery
2. Stakeholder Analysis
3. The New Reality of Predictive Coding
4. Long-Term Trends

21 MITIGATING BIG DATA IN E-DISCOVERY Predictive Coding and Big Data

22 BIG DATA IN E-DISCOVERY
Bigger haystack: more documents in general
Corporate data culture: more relevant documents
More sources: poses collection/preservation challenges

23 MITIGATING BIG DATA IN E-DISCOVERY
Some mitigating factors:
Principles of proportionality and cooperation
Information governance tools and document management
Technology-assisted review and predictive coding

24 STAKEHOLDER ANALYSIS Predictive Coding and Big Data

25 PREDICTIVE CODING STAKEHOLDER ANALYSIS
Judges: generally receptive
Clients: cost efficiencies vs. risk management
Lawyers: new model, building expertise

26 THE NEW REALITY OF PREDICTIVE CODING Predictive Coding and Big Data

27 NEW REALITY OF PREDICTIVE CODING
Reduced Data Volumes
Increased Complexity and Density
Focused, High-Stakes Human Review
Battle of Expertise
Predictive Coding

28 LONG-TERM TRENDS Predictive Coding and Big Data

29 LONG-TERM TRENDS
Over time, Big Data growth > predictive coding benefits
Some document-by-document human review necessary
Strategic nuances in a new discovery battleground

30 CONTACT PANGEA3
NEW YORK: Pangea3 LLC, 530 5th Avenue, 7th FL, New York, NY 10036. Tel. (US Main): +1-212-689-3819. Fax: +1-212-820-9784
MUMBAI: Pangea3 Legal Database Systems Pvt. Ltd., 102-B, Ground Floor, Leela Business Park, Andheri-Kurla Road, Andheri East, Mumbai 400 059, India. U.S. Line: +1-877-311-8528. Tel.: +91-22-6191-7500. Fax: +91-22-6191-7600
DALLAS: Pangea3 LLC, 2395 Midway Road, Carrollton, TX 75006. Tel. (US Main): +1-212-689-3819. Fax: +1-212-820-9784
DELHI: Pangea3 Legal Database Systems Pvt. Ltd., B-23, Sector 58, Noida UP 20 301, India. U.S. Line: +1-877-311-8528. Tel: +91-120-425-5210/14/16. Fax: +212-820-9783

31 SEARCH (1)
How do we search for discoverable ESI?
Manually?
With automated assistance?
Which is “better” and why?
– M.R. Grossman & G.V. Cormack, “The Grossman-Cormack Glossary of Technology-Assisted Review,” 7 Fed. Cts. Law R. 1 (2013)
– Maura R. Grossman & Gordon V. Cormack, “Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient than Exhaustive Manual Review,” XVII Rich. J.L. & Tech. 11 (2011) (available at http://jolt.richmond.edu/v17i3/article11.pdf)
– For a “shorter” discussion, see Efficient E-Discovery, ABA Journal 31 (Apr. 2012)

32 SEARCH (2)
Using search terms? How accurate are these? See In re National Ass’n of Music Merchants, Musical Instruments and Equipment Antitrust Litig., 2011 WL 6372826 (S.D. Cal. Dec. 19, 2011)

33 SEARCH (3)
Automated review or “predictive coding” as an alternative to the use of search terms. For decisions which address automated review, see:
EORHB, Inc. v. HOA Holdings LLC, C.A. No. 7409 (Del. Ct. Ch. Oct. 15, 2012)
In re Actos (Pioglitazone) Prod. Liability Litig., MDL No. 6:11-md-2299 (W.D. La. July 27, 2012)
Da Silva Moore v. Publicis Groupe SA, 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), aff’d, 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Apr. 26, 2012)
Global Aerospace Inc. v. Landow Aviation, L.P., Consol. Case No. CL 61040 (VA Cir. Ct. Apr. 23, 2012)

34 SEARCH (4)
WHAT LESSONS CAN BE DRAWN FROM THE DECISIONS?
Judge approved automated search at a “threshold” level. “Results” may be subject to challenge and later rulings.
Threshold superiority of automated vs. manual review recognized given volume of ESI and attorney review costs.
Large volumes of ESI in issue.
Party seeking to do automated review must offer “transparency of process” or something close to it.
“Reasonableness” of methodology is key.
Speculation by the opposing party is insufficient to defeat threshold approval.

35 SEARCH (5)
LET’S TAKE A DEEP BREATH AND RECAP WHERE WE ARE TODAY, VENDOR HYPE NOTWITHSTANDING:
We have yet to see a judicial analysis of process and results in a contested matter.
Safe to assume that the proponent of a process will bear the burden of proof (whatever that burden might be).
Safe to assume at least some transparency of process may/will be expected.
If “reasonableness” is the standard, how reasonable must the results be? Is “precision” of 80% enough? 90%? Remember, there are no agreed-on standards.
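For readers less familiar with the metric, “precision” is the share of documents flagged as responsive that truly are responsive, while “recall” is the share of truly responsive documents that were flagged. A minimal sketch with hypothetical codings for ten documents:

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for parallel lists of boolean codings."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)       # correctly flagged
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)   # flagged in error
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)   # missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model predictions vs. attorney codings on a control set.
predicted = [True, True, True, True, True, False, False, False, False, False]
actual = [True, True, True, True, False, True, True, False, False, False]

precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

In this example precision is 80% while a third of the responsive documents are still missed, so a precision figure quoted without recall says little about what a production left out.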

36 INTERLUDE
Assume a party makes production of ESI based on search terms proposed by an adversary. Assume further that the adversary suspects “something” is missing. Is suspicion enough to warrant direct access to the party’s databases by a consultant retained by the adversary? If not, what proofs should be required?
Will an attorney’s certification or affidavit suffice?
Will/should the attorney become a witness?
Will experts be needed?
Note, with regard to proofs, S2 Automation LLC v. Micron Technology, Inc., No. 11-0884 (D.N.M. Aug. 9, 2012), where the court, relying on Rule 26(g)(1), required a party to disclose its search methodology.

37 INTERLUDE
A collision between search and ethics?
Assume a party’s attorney knows that search terms proposed by adversary counsel, if applied to the party’s ESI, will not lead to the production of relevant (perhaps highly relevant) ESI.
Absent a lack of candor to adversary counsel or the court under RPC 3.4 (which implies, if not requires, some affirmative statement), does not RPC 1.6 require the party’s attorney to remain silent?
What if the “nonproduction” is learned later? If nothing else, will the party’s attorney suffer bad “PR”?
If the party’s attorney wants to advise the adversary, should the attorney secure her client’s informed consent? What if the client says, “no?”
(with thanks to the Hon. John M. Facciola)

38 INTERLUDE
AS WE THINK ABOUT SEARCH, THINK ABOUT THE ETHICS ISSUES THAT USE OF A NONPARTY VENDOR MAY LEAD TO!

