Iuriservice II Ontology Development


1 Iuriservice II Ontology Development
Núria Casellas, Denny Vrandečić, Joan Josep Vallbé, Aleks Jakulin, Mercedes Blázquez
Workshop on Artificial Intelligence and Law, XXII World Congress of Philosophy of Law and Social Philosophy
Granada, May 2005

2 Agenda
- Introduction to SEKT Project and Legal Case Study
- Methodology
- OPJK
- Improving knowledge discovery on the competency questions
- Architecture
May 25, 2005

3 The inSEKTs
Vrije Universiteit Amsterdam, Empolis, University of Sheffield, Universität Karlsruhe, BT, Ontoprise, Kea-pro, Universität Innsbruck, iSOCO, Sirma AI, Universitat Autònoma de Barcelona, Jozef Stefan Institute

4 SEKT
Main goals of SEKT:
- European leadership in Semantic Technologies
- Core research
- Combine Human Language Technologies, Knowledge Discovery and Ontology Technologies
- Provide intelligent knowledge access

5 Description of the Problem: Legal Domain
Speaker notes: The application is motivated by complaints about the slowness of the Spanish justice system: judges have more work than they can handle. The problem is most acute for inexperienced judges, who leave the Judicial School with a great deal of theoretical knowledge that they do not yet know how to apply in practice, so they turn to other judges when making decisions, which slows their work further. To alleviate this, we aim to build an intelligent system that captures the expert knowledge judges lack in their first appointment, in the form of a frequently-asked-questions system enhanced with Semantic Web technologies.
In general:
- Complaints about the speed of the legal administration.
- Judges are overworked.
In particular, new judges:
- Have a lot of theoretical knowledge but little practical knowledge.
- When on duty, are confronted with situations in which they are not sure what to do.
- "Disturb" experienced judges with typical questions, usually their former tutor (preparador).
Existing technology, legal databases:
- Essential in judges' daily work.
- Based on keywords and Boolean operators.
- A search retrieves a huge number of hits.

6 Description of the Problem: Legal Domain
Solution: design an intelligent system to help new judges with their typical problems.
- Extended FAQ system using Semantic Web technologies.
- Connect the FAQ system with the existing jurisprudence.
- Search jurisprudence using Semantic Web technologies.

7 State of the Art in Legal Ontologies
- LLD [Language for Legal Discourse, L.T. McCarty, 1989]: Atomic formulae, Rules and Modalities.
- NOR [Norma, R.K. Stamper, 1991, 1996]: Agents, Behavioral invariants, Realizations.
- LFU [Functional Ontology for Law, A. Valente, 1995]: Normative knowledge, World knowledge, Responsibility knowledge, Reactive knowledge and Creative knowledge.
- FBO [Frame-Based Ontology of Law, R.W. van Kralingen; P.R.S. Visser, 1995]: Norms, Acts and Concept Descriptions.
- LRI-Core Legal Ontology [J. Breuker et al., 2002]: Objects, Processes, Physical entities, Mental entities, Agents, Communicative Acts.
- IKF-IF-LEX Ontology for Norm Comparison [A. Gangemi et al., 2001]: Agents, Institutive norms, Instrumental provisions, Regulative norms, Open-textured legal notions, Norm dynamics.

8 Conceptual distinctions
Professional Knowledge (PK)
Legal Knowledge (LK) → Legal Core Ontologies (LCO) [based on General Theories of Law]
Legal Professional Knowledge (LPK) → OPLK
Judicial Professional Knowledge (JPK) → OPJK

9 Ethnographic survey
[Figure: counts per Autonomous Community; region labels lost in extraction: 10, 6, 1, 16, 7, 8, 14, 8, 5, 10, 16, 12, 8, 29]
Total Autonomous Communities: 14 (out of 17)

10 Preliminary exploitation of data
- Statistical analysis of results
- Judicial units: heterogeneity
- Judge’s profile
- Protocols of analysis
- Literal transcripts
- Completed questionnaires
- List of extracted questions

11 OPJK Modeling
- Identification of possible concepts through ALCESTE’s results and TextToOnto conceptual distribution
- Domain detection
- Competency questions discussion and concept extraction

12 Intuitive ontological subdomains
CRIMINAL LAW; GENDER VIOLENCE; ON-DUTY; FAMILY ISSUES; ORDER OF PROTECTION / INJUNCTION; JUDGE; CONTRACT LAW; IMMIGRATION; COMMERCIAL LAW; REAL ESTATE; JUDICIAL CLERKS; PROCEEDINGS; DECISION-MAKING & JUDGMENTS

13 Term extraction using TextToOnto

14 Term extraction using TextToOnto and Spanish Gate

15 Identify important concepts that should be represented
Hierarchy construction
Identify relations between them
Redefine the ontology by repeating steps 1-4

16 Competency question discussion
Selecting (underlining) all the nouns (usually concepts) and adjectives (usually properties) contained in the competency questions. Example questions (translated from Spanish):
- How should complaints be handled that are manifestly implausible or that concern facts which evidently do not constitute a criminal offence?
- And what if it is a formal criminal complaint (querella) that meets all the other procedural requirements, but the facts it concerns lack criminal relevance or are manifestly false?
- What happens if a person appears in court wishing to report facts that are hardly credible and unrelated to one another, and the judge doubts the complainant's mental capacity?
- Before whom should the appeal for reconsideration (recurso de reforma) against detention be lodged: the on-duty judge or the judge who issued the detention order?

17 OPJK classes identified

18 OPJK and Proton Integration

19 Improving knowledge discovery on the competency questions

20 Data and Method
Data: 3 text corpora (judges’ questions):
- Corpus 1: Academic “on duty” questions (Spanish Judicial School; n = 99)
- Corpus 2: Practical “on duty” questions (n = 163) (field work)
- Corpus 3: All practical questions (n = 756) (field work)
Method:
- TEXT GARDEN (J. Stefan Institute, Ljubljana)
- ALCESTE: analysis of the co-occurring lexemes within the simple statements of a text [Reinert 2002, 2003]

21 Analysis of Text
The text needs to be represented in an appropriate way for statistical analysis:
- Breaking text into “units” (lines, sentences, …)
- Morphological categorization (adjectives, prepositions, …)
- Putting words into canonical form: lemmatization (is, was, are → be) or stemming (loved, loving → lov+)
Analysis:
- Clustering
- Latent semantic indexing
- Correspondence analysis
- Classification
- Visualization

22 Correspondence analysis
ALCESTE (Reinert, 1988); Folch & Habert (2000)
Diagram labels: Corpus; Segmented in chunks; Hierarchical descending clustering; Classes of related chunks; List of typical words related to each class; Correspondence analysis; Geometric representation

23 Example of Correspondence Analysis and Visualization
[Figure: correspondence-analysis plane produced with TEXT GARDEN and ALCESTE, plotting stemmed terms on two axes. Terms visible in the plot include separacion, orden, denunci+, medida+, alejamiento, violencia, victima+; ejecucion+, embarg+, finca+, pago+, quiebra, monitorio; guardia+, juez, policia, detenido+, libertad, forense.]

24 Example of Clustering
Class 1: Judicial unit
funcionar+ (21), juzgar (26), oficina (11), trabaj+ (13), decir (26), llam+ (16), mand+ (12), acudir (11), adjunto (4), busc+ (4), consult+ (4), dato (6), hablar (4), jurisprudencia (3), local+ (3), material (6), necesit+ (7), policia (14), prensa (4), sala (4), funerari+ (2), hurto (3), informacion (5), miedo (3), robo (3), servicio+ (7), sustitu+ (4), tecnico (2), venir (15)
Class 2: Family law
alejamiento (22), malo (22), medida (16), orden+ (23), proteccion (17), senor+ (13), trat+ (22), victima (11), mujer (11), padre (7), denunci+ (12), domestico (8), violencia (8), agresor (4), dict+ (10), madre (7), marido (6), nino (5), pension (4), psicolog+ (5), separacion (5), abus+ (5), alimento (3), ayud+ (4), casa (3), cautelar+ (3), divorcio (2), empresa (3), hijo (4), lesion+ (6)
Class 3: Proceedings
escrit+ (9), fiscal+ (13), instruccion (9), ordinario (5), seguir (11), acumular (5), audiencia-provincia (2), conform+ (2), contradictori+ (3), criterio+ (10), cuantia (5), falt+ (7), injusto (3), interpretacion (3), ley (6), motiv+ (3), pendiente (2), perito (5)
Class 4: Enforcement (judgment)
ejecucion (14), ejecut+ (15), embarg+ (11), finca+ (9), depositar+ (6), interes+ (6), pago (6), suspension (5), deposito (6), entreg+ (6), quiebra (5), sentencia (9), solicit+ (9), vehiculo (4), acreedor (3), administracion (4), cantidad (4), conden+ (4), cost+ (4), dinero (4), edicto (2), imposibilidad (3), multa (3), notificacion (4), pagar+ (4)

25 Stemming vs Lemmatization
Stemming: reduces words to the longest string of characters that is common to different word forms. For all the variants of ‘love’, but also for ‘lover’ (noun) and ‘lovely’ (adjective), it yields the single stem lov+.
Lemmatization respects the grammatical category and yields 3 different lemmas: love (verb), lover (noun), lovely (adjective).
Spanish and Catalan, like every Romance language, are highly inflected (about 60 forms per verb, not counting compound forms), so stemming would hide a lot of information.
EXAMPLES (stem → lemma; --- means no corresponding lemma):
- acumulacion → acumulación
- acumularse → acumular
- acumul+ → ---
- admision → admisión
- admit+ → admitir
- celebracion → celebración
- celebr+ → celebrar
- misma+ → mismo
- mismo+ → ---
- suspenderse → suspender
- suspend+ → ---
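The contrast can be sketched in a few lines of Python. This is only an illustration: the suffix list and the lemma table are hand-made assumptions, not the rules used by ALCESTE or Text Garden, and only the word forms are taken from the slide.

```python
# Toy comparison of stemming and lemmatization on Spanish forms.
# SUFFIXES and LEMMAS are hypothetical, hand-made for this sketch.

SUFFIXES = ["acion", "arse", "erse", "ar", "er", "a", "o"]

LEMMAS = {
    "acumulacion": "acumulación",  # noun
    "acumularse": "acumular",      # verb
    "suspenderse": "suspender",
    "misma": "mismo",
    "mismo": "mismo",
}


def stem(word: str) -> str:
    """Strip the longest matching suffix and mark the cut with '+'."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + "+"
    return word


def lemmatize(word: str) -> str:
    """Look the form up in the lemma table; unknown forms are returned unchanged."""
    return LEMMAS.get(word, word)


forms = ["acumulacion", "acumularse", "misma", "mismo"]
stems = [stem(w) for w in forms]
lemmas = [lemmatize(w) for w in forms]

print(sorted(set(stems)))
print(sorted(set(lemmas)))
```

In this toy run the noun and the verb collapse into one stem, while lemmatization keeps them apart, which is the loss of information the slide warns about.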

26 Quantitative Comparison
Measure                 Stemmed corpus   Lemmatized corpus
Num. different forms    3074             2064
Num. occurrences        19861            19946
Max. freq. of a form    1230             2208
Hapax                   1666             934
- The lemmatized corpus has fewer word forms than the stemmed version.
- LSI on the lemmatized corpus is able to reconstruct documents better, especially in few dimensions.
- The lemmatized corpus clustering is more detailed.
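The four figures in this comparison are simple frequency statistics. A minimal sketch of how they could be computed from a normalized token list follows; the function name and the mini-corpus are assumptions made for illustration and are not data from the study.

```python
from collections import Counter


def corpus_stats(tokens):
    """Return the four figures of the comparison for a list of normalized tokens."""
    counts = Counter(tokens)
    return {
        "different_forms": len(counts),                        # distinct stems or lemmas
        "occurrences": sum(counts.values()),                   # total tokens
        "max_frequency": max(counts.values()) if counts else 0,
        "hapax": sum(1 for c in counts.values() if c == 1),    # forms seen exactly once
    }


# Hypothetical mini-corpus, already stemmed or lemmatized.
tokens = ["juez", "guardia", "juez", "denuncia", "orden", "juez", "guardia"]
stats = corpus_stats(tokens)
print(stats)
```

Running the same function on the stemmed and on the lemmatized version of a corpus gives the two columns of the table.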

27 Comparison of Clustering Results
Clustering with the stemmed corpus offers 4 classes:
- ‘On-duty’ actions (mixed with Judicial Office) (54,06%)
- Proceedings and Trial (18,10%)
- Enforcement (judgements) (14,39%)
- Family Law (gender violence, divorce, separation…) (13,46%)
Clustering with the lemmatized corpus is more detailed and offers 6 classes:
- Judicial Office (20,11%)
- ‘On-duty’ actions (27,25%)
- Family Law (gender violence, divorce, separation…) (14,55%)
- Proceedings (15,61%)
- Trial (8,47%)
- Enforcement (judgements) (14,02%)

28 Take-Home Messages
Do text analysis of legal documents!
If you do that, do lemmatization!

29 Methodology

30 Initial Methodology
+ Based on 800 competency questions
+ Questions were clustered
+ Middle-out strategy
- Usage of ontology not considered
- Repetitive discussions
- Long discussions

31 Considering the “Why”
- No normative knowledge
- Stick to the questions as sources
- Model the questions, not the answers

32 Wiki visualization

33 Diligent Argumentation Ontology
- Argumentation ontology defined
- Based on case studies to identify the most effective types of arguments
- Argument type recognition based on RST (Rhetorical Structure Theory)

34 Methodology changes
Using DILIGENT made the ontology engineering:
- much faster
- amenable to distributed development
- better documented
- trackable
- more manageable
DILIGENT itself was also changed!

35 Outlook
- Better tool support: the off-the-shelf wiki had weaknesses
- Moderator support in discussions
- Competency question clustering
- Gathering further experience from legal and other case studies

36 Architecture

37 High Level Requirements
- Judges should not be bothered with a complex user interface. A simple natural language interface is probably the most appropriate.
- The decision as to whether a new question is similar to a stored question (with its corresponding answer) should be based on meaning rather than on simple word matching. An ontology can be used to perform this semantic matching of questions.
- The questions included in the system should be of high quality: they should be exhaustive and reflect the actual situation as closely as possible. An extensive survey of more than 250 experienced Spanish judges forms the basis for the questions.
- Justify the answer provided by the system with existing jurisprudence: jurisprudence databases; metadata and ontology processing of documents.
- Knowledge management at all levels.

38 Example Question-Answer
Speaker notes: This is an example of the kind of question that would be stored in the FAQ. Such questions are obtained through an interview process that is explained later.
Question: What problems can we foresee with the analysis of small amounts of drugs, where the identification test destroys the drugs?
Answer: This is an unrepeatable piece of evidence at the trial. In these cases, the Spanish Criminal Procedure Act states that the adversarial principle should be respected. While the trial proceedings are prepared, the judge must explain to all parties that they may choose an expert to perform these tests.

39 Example of judgment: parts
- Court and docket number
- Grounds of Decision
- Names of the magistrates
- Date and place
- Prefatory statement
- History of the Case

40 Relations between the Question/Answer & Judgment
Diagram labels: Judgment (Summary, Case History, Decision Grounds, Ruling); Question; Answer; FAQ; OPJK; Practical Knowledge; Instances
Correspondences between parts of: FAQs and judgments; judgments and judgments

41 Architecture
Diagram labels: Questions-Answers; Expert Knowledge; Semantic Matching; DB 1; Decisions; DB N; Ontology Learning & feeding; Ontology Merging; Jurisprudence; Ontology Alignment; Web browser; Natural Language

42 Expert Knowledge Retrieval
Speaker notes, key aspects:
- The design is a compromise between efficiency and accuracy, which are often incompatible, and it is built to balance the two.
- It is a multistage search system (a plugin chain or pipeline). A subset of FAQs is the input of each search stage and a reduced subset is its output; the output of one stage is the input of the next.
Advantages of this design:
- It is customizable: chain links can be removed or added when necessary to increase the system's efficiency and accuracy.
- Search depth can be customized to fit a user's needs.
Underlying technologies:
- Natural language processing, to gain a deep understanding of the user question.
- Ontology processing, to locate the user question in the domain knowledge in order to compare it with all FAQ questions.
Technologies included to improve efficiency:
- Memory caching of pre-calculated objects.
- A persistence subsystem, to avoid recalculating data unnecessarily.
Slide: Design, technological considerations
- iFAQ System: Multistage Searching Subsystem (Accuracy, Efficiency)
- Stages: Ontology Domain Detection, Keyword Matching, Ontology Graph Path Matching
- Technologies: Natural Language Processing, Ontology Technology, Caching subsystem, Persistence subsystem

43 Expert Knowledge Retrieval
Speaker notes: Describe the chain of search stages, which progressively reduces the candidate subset of FAQs. Describe also how a search factory is used to produce the required search engine. Different search engines can be built and used, depending on the system configuration.
Plugged search stages (Chain of Responsibility pattern):
User Question → Ontology Domain Detection → Keyword/synonym matching stage → Ontology graph path matching → FAQ Candidates
Search Factory: iFAQ Search Engine, other search engines ...
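The chain of stages and the search factory can be sketched as follows. This is a minimal illustration under stated assumptions, not the iFAQ implementation: the stage functions, the trigger words, the configuration names "fast" and "full", and the sample FAQs are all hypothetical, and the ontology graph path matching stage is left out.

```python
from typing import Callable, Dict, List

Faq = Dict[str, str]
Stage = Callable[[str, List[Faq]], List[Faq]]

# Hypothetical trigger words per ontology domain.
DOMAIN_TRIGGERS = {"family": {"divorce", "custody"}, "criminal": {"arrest", "drugs"}}


def words(text: str) -> set:
    """Lower-cased word set of a text (toy tokenizer, no stop-word removal)."""
    return set(text.lower().replace("?", " ").split())


def domain_stage(question: str, faqs: List[Faq]) -> List[Faq]:
    """Keep FAQs from the domains whose trigger words occur in the question."""
    domains = {d for d, trig in DOMAIN_TRIGGERS.items() if trig & words(question)}
    return [f for f in faqs if f["domain"] in domains] if domains else faqs


def keyword_stage(question: str, faqs: List[Faq]) -> List[Faq]:
    """Keep FAQs that share at least one word with the question."""
    return [f for f in faqs if words(f["question"]) & words(question)]


class SearchEngine:
    """Runs the stages in order; each stage receives the previous stage's output."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages

    def search(self, question: str, faqs: List[Faq]) -> List[Faq]:
        candidates = faqs
        for stage in self.stages:
            candidates = stage(question, candidates)
        return candidates


def search_factory(config: str) -> SearchEngine:
    """Build an engine from a configuration name (the names are assumptions)."""
    chains = {"fast": [domain_stage], "full": [domain_stage, keyword_stage]}
    return SearchEngine(chains[config])


faqs = [
    {"question": "Who decides custody after divorce?", "domain": "family"},
    {"question": "How are seized drugs analysed?", "domain": "criminal"},
    {"question": "When can an arrest be extended?", "domain": "criminal"},
]
fast = search_factory("fast").search("What about drugs analysis?", faqs)
full = search_factory("full").search("What about drugs analysis?", faqs)
print(len(fast), len(full))
```

Because every stage has the same signature, a further stage can be added to or removed from a chain without touching the engine, which is the customization advantage named in the notes.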

44 Expert Knowledge Retrieval
Semantic Similarity: main steps
Diagram labels: NL query; NLP; POS list (lemmas); Ontology Linking; Term Coverage Calculation between queries; Ontology Semantic Distance Calculation; Semantic distance between queries; Best match of stored queries

45 Expert Knowledge Retrieval
Semantic Similarity
The semantic distance is based on the weighted navigation distance between terms in the ontology. Navigating the ontology means moving from one concept to another via one of its relations or attributes, for example: Is a, Follows, Actor, etc.
The task of associating distance costs:
- is domain specific;
- needs to be performed by a legal expert.
Diagram labels: Ontology; Accuse; Actions; Follow; Denounce; Mother; Son
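A weighted navigation distance of this kind can be sketched as a cheapest-path search over the ontology graph. The sketch below uses Dijkstra's algorithm as one possible choice; the slides do not say which algorithm the system uses. The relation costs, the edges between the concepts and the assumption that relations can be navigated in both directions are all hypothetical.

```python
import heapq
from math import inf

# Hypothetical relation costs; per the slide, real costs must come from a legal expert.
RELATION_COST = {"is_a": 1.0, "follows": 2.0, "actor": 3.0}

# Hypothetical ontology fragment: (concept, relation, concept).
EDGES = [
    ("Accuse", "is_a", "Actions"),
    ("Denounce", "is_a", "Actions"),
    ("Accuse", "follows", "Denounce"),
    ("Mother", "actor", "Denounce"),
    ("Son", "actor", "Accuse"),
]


def build_graph(edges):
    """Adjacency list with one weighted edge per relation, navigable both ways (assumption)."""
    graph = {}
    for a, rel, b in edges:
        cost = RELATION_COST[rel]
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def semantic_distance(graph, source, target):
    """Cost of the cheapest weighted path between two concepts; inf if unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, inf):
            continue  # stale queue entry
        for neighbour, cost in graph.get(node, []):
            new = dist + cost
            if new < best.get(neighbour, inf):
                best[neighbour] = new
                heapq.heappush(queue, (new, neighbour))
    return inf


graph = build_graph(EDGES)
print(semantic_distance(graph, "Accuse", "Denounce"))
print(semantic_distance(graph, "Mother", "Son"))
```

Changing a single entry in RELATION_COST changes every distance that uses that relation, which is why the assignment of costs is left to a domain expert.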

46 Conclusions
Speaker notes: In short, the aim is to build a decision support system for inexperienced judges, using the technology of the SEKT project to capture the knowledge of experts, share the knowledge of a few judges with all the others, and provide that knowledge at the moment decisions are made.
Decision support system for inexperienced judges
Using Semantic Web technology for handling knowledge:
- Provide knowledge for the decision-making process
- Capture knowledge from experts
- Share knowledge among all users
Extended understanding capacities:
- Background knowledge: Professional Legal Ontology
- Decision explanation
- Improved knowledge acquisition

47 Expert Knowledge Retrieval
Term Coverage
Terms of the input question are filtered by their part-of-speech category: nouns, verbs, adjectives and adverbs.
Each term is linked to the ontology if possible.
The algorithm constructs a semantic path from each input term to the terms of the stored query.
Terms that are linked to the ontology:
- If a term of the user question is linked to the ontology but no corresponding term can be found in the stored question by ontology navigation, the semantic distance is infinitely large.
Terms that cannot be linked to the ontology:
- If the term has a corresponding one in the stored question (same lemma), the distance is zero.
- If there is no corresponding lemma in the stored question, the distance is infinite.
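The distance rules above can be written down directly. The sketch below is an assumption-laden illustration: the function names, the lookup-table distance and the definition of coverage as the fraction of input terms with a finite distance are not taken from the slides.

```python
from math import inf


def term_distance(term, stored_terms, ontology_terms, distance):
    """Distance from one input lemma to the closest term of a stored question."""
    if term in ontology_terms:
        # Linked term: navigate the ontology to the linked terms of the stored question.
        linked = [t for t in stored_terms if t in ontology_terms]
        return min((distance(term, t) for t in linked), default=inf)
    # Unlinked term: zero if the same lemma occurs in the stored question, else infinite.
    return 0.0 if term in stored_terms else inf


def coverage(input_terms, stored_terms, ontology_terms, distance):
    """Fraction of input lemmas at finite distance from the stored question (assumed definition)."""
    if not input_terms:
        return 0.0
    reached = sum(
        1 for t in input_terms
        if term_distance(t, stored_terms, ontology_terms, distance) < inf
    )
    return reached / len(input_terms)


# Hypothetical ontology distances, stored as a symmetric lookup table.
TABLE = {("denounce", "accuse"): 2.0}


def toy_distance(a, b):
    if a == b:
        return 0.0
    return TABLE.get((a, b), TABLE.get((b, a), inf))


ontology_terms = {"denounce", "accuse", "mother"}
user = ["denounce", "urgent", "mother"]   # lemmas left after part-of-speech filtering
stored = ["accuse", "urgent"]
print(coverage(user, stored, ontology_terms, toy_distance))
```

In a full system the distance argument would be the ontology navigation distance rather than a lookup table.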

