www.recommind.com AI in the legal market Jan Puzicha, CTO.

Presentation transcript:

AI in the legal market Jan Puzicha, CTO

Recommind Proprietary and Confidential Page 2
Recommind Is The Leading Enterprise Search Vendor for Professional Services Organizations in Legal
- Recommind is an enterprise software company focused on building Enterprise Search, Categorization, & Intelligent Review solutions for global organizations with large amounts of structured and unstructured information
- Leading Enterprise Search vendor in the Legal industry
- Over 25% of the top law firms are Recommind customers
- Headquartered in California with offices globally
  - North America: San Francisco, New York, Boston, Chicago, Atlanta
  - Europe: Bonn, Germany; London, UK
  - Asia: Sydney, Australia (Partner office)
- Founded 2000; privately held, profitable

Customer list
- Field Fisher Waterhouse
- Davies Arnold Cooper
- Eversheds
- Watson, Farley & Williams
- Simmons & Simmons
- Novartis Corporate Legal
- Cleary Gottlieb
- Bryan Cave
- Luther Rechtsanwälte
- DLA Piper Rudnick Gray Cary
- Wilson Sonsini
- Homburger
- Paul Hastings
- Miller Canfield
- Pfizer Legal
- Morrison & Foerster
- Jackson Lewis
- Shearman & Sterling
- Cooley Godward Kronish
- Cravath, Swaine & Moore
- Bingham McCutchen
- Fasken Martineau
- Lewis Silkin
- Nixon Peabody
- O'Melveny & Myers
- Orrick, Herrington & Sutcliffe
- Shook, Hardy & Bacon
- And many more

Concept search
Concept search = finding key 'concepts' in text
- noun phrase extraction
- useful for navigation and summarization
- useful for filtering
- search for key-word matches
Concept search = semantic query understanding
- understanding semantic relationships between words
- understanding the topical structure of a document
- understanding ambiguities
- search for semantic matches
  - manual: Ontologies, Semantic Web, …
  - automated: Probabilistic Latent Semantic Analysis (PLSA)

Noun phrase extraction and concept search examples

Probabilistic Latent Semantic Indexing - pLSA
- Statistical inference
- Automated learning from context
- Extraction of topical structures
- Domain adaptive
[Diagram: accuracy vs. automation for Content Retrieval, positioning Search Engines, Ontologies, and pLSA; labels: concept-based representation, robustness, statistical inference]

Estimation via pLSA
[Diagram: Documents, Latent Concepts, Terms. Concept TRADE: economic, imports, trade. Concept CHINA: china, beijing]
- Concept expression probabilities are estimated based on all documents that deal with a concept.
- “Unmixing” of superimposed concepts is achieved by a statistical learning algorithm.
Conclusion: No prior knowledge about concepts is required; context and term co-occurrences are exploited.
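The estimation described on this slide can be sketched with the standard pLSA expectation-maximization updates. This is a minimal illustration, assuming a tiny term-count matrix; the function name `plsa`, the toy vocabulary, and the counts are made up for the example and are not taken from MindServer.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit pLSA by EM. counts[d][w] is how often term w occurs in document d.

    Returns (p_z_d, p_w_z): P(concept | document) and P(term | concept).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [value / total for value in row]

    # Random starting point; every row is a probability distribution.
    p_z_d = [normalized([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                for z in range(n_topics):
                    weight = n * joint[z] / norm
                    new_z_d[d][z] += weight
                    new_w_z[z][w] += weight
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
    return p_z_d, p_w_z


# Hypothetical toy corpus: rows are documents, columns follow `vocab`.
vocab = ["trade", "imports", "economic", "china", "beijing"]
counts = [
    [2, 1, 1, 0, 0],
    [1, 2, 1, 0, 0],
    [0, 0, 0, 2, 1],
    [1, 0, 1, 1, 2],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
for z, row in enumerate(p_w_z):
    top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
    print("concept", z, [vocab[w] for w in top])
```

No concept labels are supplied anywhere: the only input is which terms co-occur in which documents, which is the point the slide's conclusion makes.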

Why statistical NLP?
- Language independent
- Symbolic methods
- Solely tokenization required
- Learning from example
- Domain adaptive
- Tailored towards specific use-case
- Trained on specific corpus
- Language is too complex for rules-only
- Data-intensive, but no expert required
- More data is better
- Examples easier to provide than rules

Aspect Models for Conceptual Matching
10 out of 128 aspects, articles from Science

Recommind’s Sophisticated Technology Automatically Extracts Concepts From Your Own Data
Aspect 3: miranda, confession, tape, identification, interview, interrogation, tapes, photographs, pornography, conversation, statements, entrapment, told, fbi, recording, statement, videotape, agent
Aspect 4: patent, infringement, uspto, invention, patents, copyright, software, specification, equivalents, art, copyrighted, uspq, patentee, works, inventor, pto, copying, patented, copyrights, infringing
Aspect 5: environmental, water, epa, waste, hazardous, pollution, disposal, cercla, clean, emissions, exxon, nuclear, cleanup, toxic, corps, contamination, asbestos, solid, sites, chemical

Categorization: What is the problem?

MindServer Categorization
[Diagram: Automatic Categorization; Content manager, librarian; Enterprise Content Assets; Enterprise Taxonomy]

Probabilistic Support Vector Machines
- Learning from examples
- Balancing simplicity against performance on training data
- Highest empirical performance for categorization
[Diagram: accuracy vs. automation for Content Categorization, positioning Naive Bayes, pSVM, and Human Annotations; labels: learning efficiency, Expert, example based]
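The trade-off named on this slide, simplicity against performance on training data, is what the standard soft-margin SVM objective expresses. The slide gives no formula and does not say how the probabilistic output is obtained, so what follows is a textbook sketch rather than Recommind's pSVM. The weight vector w, bias b, trade-off constant C, and the sigmoid parameters A and B are assumed notation.

```latex
% Soft-margin linear SVM (textbook form; not taken from the slides).
% x_i: feature vector of training document i, y_i \in \{-1, +1\}: its label.
% The first term favors a simple model; the second penalizes training errors.
\min_{w,\,b} \;\; \frac{1}{2}\lVert w \rVert^{2}
  \;+\; C \sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\,(w^{\top} x_i + b)\bigr)

% One common way to map the score f(x) = w^{\top} x + b to a probability
% (Platt scaling, assumed here; A and B are fitted on held-out examples):
P(y = 1 \mid x) \;=\; \frac{1}{1 + \exp\bigl(A\, f(x) + B\bigr)}
```

A larger C weights the training errors more heavily, while a smaller C favors the simpler, wider-margin model.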

MindServer Legal - Autofile

MindServer Categorization

Customer Case Study: MindServer categorization at ZDF
Background:
- ZDF, based in Germany, is Europe’s largest television station
- Over 1000 categories, hierarchically structured into four layers
- Geography, People, Organizations
- Covers 2 languages: German and English
Results:
- Automated indexing and categorization tripled capacity
- All information across the organization available in a single search
Accuracy results (Precision / Recall):
- Naïve Bayes: 42% / 71%
- Human: % / 78%
- Recommind: % / 94%
[Chart legend: Correct, False Positive, False Negative, Precision, Recall]
“Precision” is the percentage of categorized documents that are categorized correctly; “Recall” is the percentage of relevant documents that are categorized.
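The two measures quoted here follow directly from the Correct, False Positive, and False Negative counts named on the slide. A minimal sketch, assuming those three counts are available per category; the function name and the example counts are hypothetical and are not the ZDF figures.

```python
def precision_recall(correct, false_positive, false_negative):
    """Return (precision, recall) as fractions between 0 and 1.

    correct:        documents assigned to the category that belong there
    false_positive: documents assigned to the category that do not belong
    false_negative: documents that belong to the category but were missed
    """
    assigned = correct + false_positive   # everything the system categorized
    relevant = correct + false_negative   # everything it should have found
    precision = correct / assigned if assigned else 0.0
    recall = correct / relevant if relevant else 0.0
    return precision, recall


# Made-up counts for illustration only.
precision, recall = precision_recall(correct=90, false_positive=10, false_negative=30)
print(f"precision={precision:.0%} recall={recall:.0%}")  # precision=90% recall=75%
```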

Case Study - Legal (Cleary Gottlieb)
Background
- 800 attorneys, global with 10 offices in 9 countries
- iManage, Lotus Notes, Intranet and library file systems
- Multiple languages: English, French, German, Korean, Chinese
Solution
- Universal Search: ties together multiple document management, practice management, and resource information sources across global offices
- Automate records management department: categorize doctype, flag drafts, extract title, involved parties, governing law, etc.
- Precision / recall (doctype): 76% / 95%
“Our strategy has always been to provide powerful tools that enable our lawyers to share and access information in the most efficient way possible. We were impressed with Recommind's technology, which delivers high-quality conceptual search matches, while seamlessly pulling information from a range of sources.” - Brent Miller, Director of Knowledge Management, Cleary Gottlieb

Case Study – Cleary Gottlieb

PROPOSED DOC TYPE OVERALL STATS
PROPOSED DOCTYPE CONFIDENCE %   AGREE   DISAGREE   TOTAL   % CORRECT
%-90.00%                                                   %
89.99%-80.00%                                              %
79.99%-70.00%                                              %
69.99%-60.00%                                              %
59.99%-50.00%                                              %
49.99%-40.00%                                              %
39.99%-30.00%                                              %
29.99%-20.00%                                              %
19.99%-10.00%                                              %
9.99%-0.00%                                                %
TOTAL                                                      %

PROPOSED DOC TYPE AGREEMENTS
PROPOSED DOCTYPE CONFIDENCE %   AGREE   DISAGREE   TOTAL   % CORRECT
%-90.00%                                                   %
89.99%-80.00%                                              %
79.99%-70.00%                                              %
69.99%-60.00%                                              %
59.99%-50.00%                                              %
49.99%-40.00%                                              %
39.99%-30.00%                                              %
29.99%-20.00%                                              %
19.99%-10.00%                   0       0          0       0%
9.99%-0.00%                     0       0          0       0%
TOTAL                                                      %
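A table of this shape can be produced by grouping each proposed doctype by its confidence score and counting how often a reviewer agreed. The sketch below assumes each proposal is recorded as a (confidence percentage, reviewer agreed) pair; that record layout, the function name, and the sample data are hypothetical and do not reproduce the Cleary Gottlieb figures.

```python
def band_stats(proposals, band_width=10):
    """Summarize reviewer agreement per confidence band.

    proposals: iterable of (confidence_percent, agreed) pairs, where
               confidence_percent is in [0, 100] and agreed is a bool.
    Returns rows of (band_lower_bound, agree, disagree, total, pct_correct),
    ordered from the highest-confidence band down to the lowest.
    """
    n_bands = 100 // band_width
    bands = [{"agree": 0, "disagree": 0} for _ in range(n_bands)]
    for confidence, agreed in proposals:
        # A confidence of exactly 100 falls into the top band.
        index = min(int(confidence // band_width), n_bands - 1)
        bands[index]["agree" if agreed else "disagree"] += 1

    rows = []
    for index in reversed(range(n_bands)):
        agree = bands[index]["agree"]
        disagree = bands[index]["disagree"]
        total = agree + disagree
        pct_correct = 100.0 * agree / total if total else 0.0
        rows.append((index * band_width, agree, disagree, total, pct_correct))
    return rows


# Made-up review outcomes for illustration only.
sample = [(95.0, True), (92.5, True), (91.0, False), (55.0, True), (12.0, False)]
for lower, agree, disagree, total, pct in band_stats(sample):
    print(f"{lower:>3}%+  agree={agree} disagree={disagree} total={total} correct={pct:.2f}%")
```

Reading the rows from the top down shows whether high-confidence proposals are in fact the ones reviewers agree with, which is what makes a confidence threshold usable for automatic filing.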

The coding panel shows auto-populated Issues, subjects etc.

By selecting ‘Energy Prices’ from the Issues List, the highlighting of the document changes to show what text led the system to auto-categorise the document to this issue. At any stage the document preview can be launched in a second window for reviewers using multiple (or large) screens.