Speaker: Richard Cheng. Richard Cheng, CISSP, CISA, directs digital forensics and e-discovery cases and consults on IT audits, governance, and compliance. His experience includes the collection and processing of unique and/or proprietary ESI (Apple devices, mobile devices, collaboration sites, and the cloud). Richard has provided testimony as a neutral expert and technology authority. He has two M.S. degrees from the University of New Haven and a B.S. from MIT.
Speaker: Megan Bell. Megan Bell directs data analysis projects. She is experienced in the analysis of complex data sets, search and reporting technology, and the automation of workflows that increase efficiency and deliver better outcomes. Her case experience includes data/security breach, IP theft, insurance, and employment matters. She also has extensive experience in the development and launch of new product technologies. She has a degree in Chemical Engineering from WPI.
Overview: Boolean Search. Early eDiscovery famous moments: Martha Stewart voicemail; Lehman Brothers bankruptcy; Merrill Lynch analyst emails on junk investments. It's not just e-discovery.
Universe of Search. Types of Data Sources: Databases, Email, Files, SharePoint. Locations: Local computer, server, backup, mobile device. Search Technologies: dtSearch, Lucene, Grep, SQL, automated predictive methods/neural nets.
Why Boolean? Boolean search: character-based searching; a toolbox of relationship connectors and limiters to broaden or narrow a search. Benefits: identify important words/phrases and how they are used; research written-language context and relationships; easily vary the breadth and scope of a search; customizable search.
Overview of Boolean Search Construction. Boolean connectors: AND, OR, NOT
Overview of Boolean Search Construction. Other Boolean elements: proximity, stemming, fuzzy searching; parentheses; wildcards; numeric terms and ranges; fields (e.g., email address). Differences in Boolean connectors: AND versus proximity; stemming versus wildcard use.
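The difference between AND and proximity can be sketched in a few lines of Python. This is only an illustration, not how dtSearch or Lucene implement these operators: the helper names (`tokenize`, `matches_and`, `matches_within`) and the sample document are hypothetical, and `w/n` is assumed here to mean "within n words, in either order". Search terms are expected in lowercase.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_and(text, a, b):
    """AND: both terms occur somewhere in the document."""
    tokens = set(tokenize(text))
    return a in tokens and b in tokens

def matches_within(text, a, b, n):
    """Proximity (a w/n b): some occurrence of a is at most n words from some occurrence of b."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical document: both terms are present, but in unrelated sentences.
doc = ("The product team met Monday. Separately, the China office "
       "sent its rollout schedule.")

print(matches_and(doc, "product", "rollout"))        # True
print(matches_within(doc, "product", "rollout", 3))  # False
```

AND returns the document even though the two words are unrelated; the proximity test does not, which is why proximity is usually the narrower choice.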
Overview of Boolean Search Construction for Foreign Languages
Foreign Languages. How will you handle multiple foreign languages? Example: Chinese Dialects: Gan - 31 million; Guan (Mandarin) - 836 million; Hui - 3.2 million; Jin - 45 million; Kejia (Hakka) - 34 million; Min - 60 million; Wu - 77 million; Xiang - 36 million; Yue - 71 million; Unclassified - not determined.
Optimizing Boolean Search Statement Construction: 1. Invest time in identifying relevant search terms and phrases. 2. Determine which search terms to search in combination. 3. Use the most appropriate Boolean logic. 4. Adjust Boolean search statements to account for variations in search term wording, spellings, and abbreviations. 5. Modify Boolean search statements when special characters are present.
1. Capturing the Variation for a Word
Example: eDiscovery
Boolean: e-Discovery OR eDiscovery OR electronic discovery OR electronic w/1 discovery
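For readers testing the same idea outside a Boolean search tool, the variants above can be approximated with a regular expression. This is a sketch only: the pattern and the function name `mentions_ediscovery` are assumptions for illustration, and a regex is not equivalent to an indexed search engine's handling of hyphens and word breaks.

```python
import re

# Hypothetical pattern covering the variants on the slide. \s+ tolerates a
# line break or extra spaces between "electronic" and "discovery".
EDISCOVERY = re.compile(
    r"\b(?:e-?discovery|electronic\s+discovery)\b",
    re.IGNORECASE,
)

def mentions_ediscovery(text):
    """Return True if the text contains any of the spelling variants."""
    return EDISCOVERY.search(text) is not None

print(mentions_ediscovery("Our e-Discovery vendor"))   # True
print(mentions_ediscovery("rediscovery of the file"))  # False
```

The word boundary (`\b`) keeps unrelated words such as "rediscovery" out of the results.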
2. Searching for Unique Phrases
Example: Search for the ratio 1:1
Boolean: 1?1 AND (NOT(101 OR 111 OR 121 OR 131 OR 141 OR 151 OR 161 OR 171 OR 181 OR 191))
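The reason for the long NOT list is that `?` stands for any single character, so `1?1` also matches three-digit numbers such as 101. The sketch below shows that effect on a list of tokens. The helper names and the token list are hypothetical, and it assumes the index keeps "1:1" as a single token. It also filters token by token, whereas the Boolean statement above excludes whole documents that contain any of the listed numbers.

```python
import re

def wildcard_to_regex(term):
    """Translate a ?-wildcard term into a regex (? = exactly one character)."""
    return re.compile("".join("." if ch == "?" else re.escape(ch) for ch in term))

# The ten numbers excluded on the slide: 101, 111, ..., 191.
EXCLUDED = {f"1{d}1" for d in range(10)}

def ratio_hits(tokens):
    """Keep tokens that match 1?1 but are not plain three-digit numbers."""
    pattern = wildcard_to_regex("1?1")
    return [t for t in tokens if pattern.fullmatch(t) and t not in EXCLUDED]

tokens = ["1:1", "101", "1-1", "111", "1/1", "191", "1:10"]
print(ratio_hits(tokens))  # ['1:1', '1-1', '1/1']
```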
3. Simplifying Complex Compound Phrases
Example: (product rollout OR product release) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
Boolean (split into two statements):
(product release) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
(product rollout) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
4. When Dates Are Search Terms
Example: 1/6/11
Boolean: 1?6?11 OR !1?6?2011
Others?
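One way to answer "Others?" is to generate the numeric variants systematically instead of listing them by hand. The sketch below is an assumption-laden illustration: it reads 1/6/11 as January 6, 2011 (US month/day/year order), the function names are hypothetical, and it covers only numeric forms, not spelled-out months such as "Jan 6, 2011".

```python
from datetime import date

def date_search_terms(d):
    """Build ?-wildcard variants of a date in month/day/year order (assumed)."""
    months = {str(d.month), f"{d.month:02d}"}      # 1 and 01
    days = {str(d.day), f"{d.day:02d}"}            # 6 and 06
    years = {f"{d.year % 100:02d}", str(d.year)}   # 11 and 2011
    return sorted(f"{m}?{dd}?{y}" for m in months for dd in days for y in years)

def to_boolean(terms):
    """Join the variants into a single OR statement."""
    return " OR ".join(terms)

terms = date_search_terms(date(2011, 1, 6))
print(len(terms))  # 8
print(to_boolean(terms))
```

The eight generated terms include the two on the slide (`1?6?11` and `1?6?2011`) plus the zero-padded forms.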
5. Compound Words
Example: Watch-out
Boolean: Watchout OR Watch?out
watch out?
6. Noise Filter Issues
Example: The The
Boolean: The The
7. Improving Search Results for an Overused and Important Word
Example: When "confidential" is important as a search term and overused
Boolean:
confidential AND NOT (communication is confidential OR confidentiality notice OR confidential personal)
confidential AND NOT (confidential w/3 communication)
confidential AND NOT (confidential w/3 notice)
confidential AND NOT (confidential w/3 personal)
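The first statement above can be mimicked in Python to see which documents it keeps. This is a rough sketch with hypothetical names (`EXCLUDED_PHRASES`, `is_hit`), using plain substring tests rather than a search engine's phrase matching. Like the Boolean statement, it drops a document entirely when a boilerplate phrase is present, even if the word is also used meaningfully elsewhere in that document.

```python
import re

# Boilerplate phrases taken from the first Boolean statement on the slide.
EXCLUDED_PHRASES = [
    "communication is confidential",
    "confidentiality notice",
    "confidential personal",
]

def is_hit(text):
    """confidential AND NOT (any excluded boilerplate phrase)."""
    lowered = " ".join(text.lower().split())  # normalize case and whitespace
    if not re.search(r"\bconfidential\b", lowered):
        return False
    return not any(phrase in lowered for phrase in EXCLUDED_PHRASES)

print(is_hit("This pricing plan is confidential; do not forward."))      # True
print(is_hit("CONFIDENTIALITY NOTICE: This communication is confidential."))  # False
```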
Statistical Sampling: Recent court opinions suggest that sampling, as used in Assisted Review, is not only useful but may be required in certain cases. Several decisions in the past few years have penalized lawyers for not sampling documents before they were produced (waiver of privilege) and for not sampling the documents that were not produced (omission of responsive data). In two landmark decisions, U.S. Magistrate Judges John M. Facciola and Paul W. Grimm issued key rulings discussing sampling. Specifically, they criticized counsel who hoped to be excused for inadvertent waiver of privilege because they did not sample the documents produced after keyword searches. United States v. O'Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) (Judge Facciola); Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) (Judge Grimm).
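As an illustration of what sampling a document set can look like in practice, the sketch below sizes and draws a simple random sample. The parameters (95% confidence via z = 1.96, a 5% margin of error, p = 0.5) and the function names are assumptions chosen for the example; they are not taken from the opinions cited above, and the right parameters for a given matter are a judgment call.

```python
import math
import random

def sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))

def draw_sample(doc_ids, n, seed=None):
    """Pick n document IDs uniformly at random, without replacement."""
    return random.Random(seed).sample(list(doc_ids), n)

def estimated_rate(flags):
    """Fraction of reviewed sample documents that were flagged (e.g., privileged)."""
    return sum(flags) / len(flags)

n = sample_size(10_000)
print(n)  # 370
sample = draw_sample(range(10_000), n, seed=1)
```

Reviewing the sampled documents and passing the results to `estimated_rate` gives a point estimate of how often privileged or responsive material appears in the set the sample was drawn from.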
Smoking Gun: Even more recently, another court found waiver of privilege in a "smoking gun" attorney-client communication because counsel failed to sample. Mt. Hawley Ins. Co. v. Felman Prod., Inc., 2010 WL 1990555 (S.D. W. Va. May 18, 2010).