AI in the legal market Jan Puzicha, CTO
Recommind Proprietary and Confidential Page 2
Recommind Is The Leading Enterprise Search Vendor for Professional Services Organizations in Legal
– Recommind is an enterprise software company focused on building Enterprise Search, Categorization, & Intelligent Review solutions for global organizations with large amounts of structured and unstructured information
– Leading Enterprise Search vendor in the legal industry
– Over 25% of the top law firms are Recommind customers
– Headquartered in California with offices globally
  North America: San Francisco, New York, Boston, Chicago, Atlanta
  Europe: Bonn, Germany; London, UK
  Asia: Sydney, Australia (partner office)
– Founded 2000; privately held, profitable
Customer list
– Field Fisher Waterhouse
– Davies Arnold Cooper
– Eversheds
– Watson Farley & Williams
– Simmons & Simmons
– Novartis Corporate Legal
– Cleary Gottlieb
– Bryan Cave
– Luther Rechtsanwälte
– DLA Piper Rudnick Gray Cary
– Wilson Sonsini
– Homburger
– Paul Hastings
– Miller Canfield
– Pfizer Legal
– Morrison & Foerster
– Jackson Lewis
– Shearman & Sterling
– Cooley Godward Kronish
– Cravath, Swaine & Moore
– Bingham McCutchen
– Fasken Martineau
– Lewis Silkin
– Nixon Peabody
– O'Melveny & Myers
– Orrick, Herrington & Sutcliffe
– Shook, Hardy & Bacon
And many more
Concept search
Concept search = finding key 'concepts' in text
– noun phrase extraction
– useful for navigation and summarization
– useful for filtering
– search for key-word matches
Concept search = semantic query understanding
– understanding semantic relationships between words
– understanding the topical structure of a document
– understanding ambiguities
– search for semantic matches
manual: Ontologies, Semantic Web, …
automated: Probabilistic Latent Semantic Analysis (PLSA)
Noun phrase extraction and concept search examples
Probabilistic Latent Semantic Indexing (pLSA)
– Statistical inference
– Automated learning from context
– Extraction of topical structures
– Domain adaptive
[Diagram: accuracy vs. automation, positioning Search Engines, Ontologies, and pLSA; labels: concept-based representation, robustness, statistical inference, content retrieval]
Estimation via pLSA
[Diagram: documents, latent concepts, and terms; example concepts TRADE (terms: economic, imports, trade) and CHINA (terms: china, beijing)]
– Concept expression probabilities are estimated from all documents that deal with a concept.
– "Unmixing" of superimposed concepts is achieved by a statistical learning algorithm.
Conclusion: No prior knowledge about concepts is required; context and term co-occurrences are exploited.
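The "unmixing" described on this slide can be illustrated with the textbook pLSA expectation-maximization updates. This is a minimal sketch under stated assumptions, not Recommind's implementation: the function name `plsa`, the toy vocabulary, and the count matrix are hypothetical and chosen only to echo the TRADE / CHINA example.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit P(z|d) and P(w|z) to a document-term count matrix with EM."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random starting points, normalized so each row is a distribution.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
        # Shape: (docs, topics, words).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate both tables from expected counts n(d,w) * P(z|d,w).
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Hypothetical toy corpus: columns are term counts per document.
vocab = ["trade", "economic", "imports", "china", "beijing"]
counts = np.array([
    [4, 3, 2, 0, 0],   # document mostly about trade
    [0, 0, 0, 5, 4],   # document mostly about China
    [2, 1, 1, 2, 2],   # document mixing both concepts
], dtype=float)

p_z_d, p_w_z = plsa(counts, n_topics=2)
for z, row in enumerate(p_w_z):
    top_terms = [vocab[i] for i in np.argsort(row)[::-1][:3]]
    print(f"concept {z}: {top_terms}")
```

No concept labels are supplied: the two term distributions are learned purely from co-occurrence counts, which is the point the slide makes. Which index ends up as which concept depends on the random initialization.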
Why statistical NLP?
– Language independent
– Symbolic methods
– Solely tokenization required
– Learning from example
– Domain adaptive
– Tailored towards specific use-case
– Trained on specific corpus
– Language is too complex for rules-only
– Data-intensive, but no expert required
– More data is better
– Examples easier to provide than rules
Aspect Models for Conceptual Matching
10 out of 128 aspects, articles from Science
Recommind's Sophisticated Technology Automatically Extracts Concepts From Your Own Data
Aspect 3: miranda, confession, tape, identification, interview, interrogation, tapes, photographs, pornography, conversation, statements, entrapment, told, fbi, recording, statement, videotape, agent
Aspect 4: patent, infringement, uspto, invention, patents, copyright, software, specification, equivalents, art, copyrighted, uspq, patentee, works, inventor, pto, copying, patented, copyrights, infringing
Aspect 5: environmental, water, epa, waste, hazardous, pollution, disposal, cercla, clean, emissions, exxon, nuclear, cleanup, toxic, corps, contamination, asbestos, solid, sites, chemical
Categorization: What is the problem?
MindServer Categorization
[Diagram: Automatic Categorization; Content manager, librarian; Enterprise Content Assets; Enterprise Taxonomy]
Probabilistic Support Vector Machines
– Learning from examples
– Balancing simplicity against performance on training data
– Highest empirical performance for categorization
[Diagram: accuracy vs. automation, positioning Naive Bayes, pSVM, and Human Annotations; labels: learning efficiency, expert example based, content categorization]
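The trade-off "simplicity against performance on training data" can be made concrete with the standard soft-margin SVM objective, which adds a penalty on the weight norm (simplicity) to the hinge loss on the training examples (fit). This is a sketch of the generic linear SVM only; the slide does not describe how Recommind's probabilistic variant is built, and the function name, the parameter `C`, and the data below are illustrative assumptions.

```python
import numpy as np

def svm_objective(w, b, X, y, C=1.0):
    """Soft-margin linear SVM objective: 0.5 * ||w||^2 + C * sum of hinge losses.

    X: (n_samples, n_features) feature matrix
    y: labels in {-1, +1}
    C: weight of training error relative to the simplicity term
    """
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)  # zero once an example clears the margin
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Hypothetical two-point training set, one example per class.
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])

# A larger weight vector fits the data with full margin but pays a norm penalty.
fits_well = svm_objective(np.array([1.0, 0.0]), 0.0, X, y)
# A smaller weight vector is "simpler" but violates the margin on both points.
too_simple = svm_objective(np.array([0.25, 0.0]), 0.0, X, y)
print(fits_well, too_simple)
```

Raising `C` favors fitting the training examples; lowering it favors a smaller weight vector.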
MindServer Legal - Autofile
MindServer Categorization
Customer Case Study: MindServer categorization at ZDF
Background:
– ZDF, based in Germany, is Europe's largest television station
– Over 1000 categories, hierarchically structured into four layers
– Geography, People, Organizations
– Covers 2 languages: German and English
Results:
– Automated indexing and categorization tripled capacity
– All information across the organization available in a single search
Accuracy (precision / recall):
– Naïve Bayes: 42% / 71%
– Human: [value missing]% / 78%
– Recommind: [value missing]% / 94%
[Diagram: correct, false positive, and false negative results, illustrating precision and recall]
"Precision" is the percentage of category assignments that are correct; "recall" is the percentage of relevant documents that are actually assigned the category.
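The two measures follow directly from the counts of correct assignments, false positives, and false negatives shown on the slide. A minimal sketch, with a hypothetical function name and made-up counts that are not ZDF data:

```python
def precision_recall(true_positive, false_positive, false_negative):
    """Return (precision, recall) as fractions between 0 and 1."""
    assigned = true_positive + false_positive   # everything the system labeled
    relevant = true_positive + false_negative   # everything it should have labeled
    precision = true_positive / assigned if assigned else 0.0
    recall = true_positive / relevant if relevant else 0.0
    return precision, recall

# Hypothetical counts for one category: 30 correct, 10 wrongly assigned, 20 missed.
precision, recall = precision_recall(30, 10, 20)
print(f"precision={precision:.0%} recall={recall:.0%}")
```

A categorizer can trade one measure for the other: assigning a category more liberally tends to raise recall and lower precision.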
Case Study - Legal (Cleary Gottlieb)
Background:
– 800 attorneys; global, with 10 offices in 9 countries
– iManage, Lotus Notes, intranet and library file systems
– Multiple languages: English, French, German, Korean, Chinese
Solution:
– Universal Search: ties together multiple document management, practice management, and resource information sources across global offices
– Automate records management department: categorize doctype, flag drafts, extract title, involved parties, governing law, etc.
– Precision / recall (doctype): 76% / 95%
“Our strategy has always been to provide powerful tools that enable our lawyers to share and access information in the most efficient way possible. We were impressed with Recommind's technology, which delivers high-quality conceptual search matches, while seamlessly pulling information from a range of sources.”
- Brent Miller, Director of Knowledge Management, Cleary Gottlieb
Case Study – Cleary Gottlieb
[Table: Proposed doc type, overall stats. Columns: proposed doctype confidence %, agree, disagree, total, % correct. Rows: ten-point confidence bands from the top band (ending at 90.00%) down to 9.99%-0.00%, plus a TOTAL row. Cell values not preserved.]
[Table: Proposed doc type, agreements. Same columns and confidence bands. Only the two lowest bands are preserved: 19.99%-10.00% and 9.99%-0.00% each show 0 agree, 0 disagree, 0 total, 0% correct.]
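Tables of this shape can be produced by grouping reviewed documents into ten-point confidence bands and counting how often the reviewer agreed with the proposed doc type. The sketch below assumes a hypothetical input of (confidence, agreed) pairs; the function name and the sample reviews are illustrative, not Cleary Gottlieb data.

```python
from collections import defaultdict

def agreement_by_band(reviews, width=10):
    """Summarize reviewer agreement per confidence band.

    reviews: iterable of (confidence_percent, reviewer_agreed) pairs.
    Returns rows of (band_floor, agree, disagree, total, percent_correct),
    ordered from the highest band to the lowest. Empty bands are omitted.
    """
    bands = defaultdict(lambda: [0, 0])  # band floor -> [agree, disagree]
    for confidence, agreed in reviews:
        # A confidence of exactly 100 is folded into the top band.
        floor = min(int(confidence // width) * width, 100 - width)
        bands[floor][0 if agreed else 1] += 1
    rows = []
    for floor in sorted(bands, reverse=True):
        agree, disagree = bands[floor]
        total = agree + disagree
        rows.append((floor, agree, disagree, total, 100.0 * agree / total))
    return rows

# Hypothetical review log.
reviews = [(95.5, True), (92.0, True), (91.0, False), (45.0, True), (42.0, False)]
rows = agreement_by_band(reviews)
for floor, agree, disagree, total, pct in rows:
    print(f"{floor}%+: agree={agree} disagree={disagree} total={total} correct={pct:.1f}%")
```

If the classifier's confidence is well calibrated, the percentage correct should fall as the band floor falls, which is what such a table lets a records team check.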
The coding panel shows auto-populated issues, subjects, etc.
By selecting 'Energy Prices' from the Issues list, the highlighting of the document changes to show which text led the system to auto-categorise the document under this issue. At any stage, the document preview can be launched in a second window for reviewers using multiple (or large) screens.