Eurovoc and parliamentary documents: a semi-automatic classification experience at the Camera dei deputati. Calogero Salamone. Luxembourg, 19 November 2010.

General: Establishing techniques that give citizens access to legal information is of primary importance to the fundamentals of public service. Classification of parliamentary and legal resources provides important support for research.

History: In 1969/1970, Italy's Chamber of Deputies and Senate began to consider the classification of laws, in the context of the Parliament's early automation projects. An automatic machine dictionary of the Italian language (Camera 72) was planned for use in the information retrieval of legal texts.

History: The project was to include a search system based on storing the full text of laws, decrees, treaties, etc. dating back to 1848. An accurate legal-linguistic analysis was to establish a classification system able to identify and resolve problems of homographs, polysemy and shifts of meaning. This project was abandoned.

History: In 1992 the thesaurus TESEO (TEsauro Senato per l'Organizzazione dei documenti parlamentari) was adopted for the classification of the bills database managed by the Senate. The same thesaurus was adopted for the database of parliamentary oversight (Sindacato ispettivo) managed by the Chamber of Deputies (questions to the government, motions and resolutions).

History: TESEO includes 3650 terms grouped into 45 thematic areas (Top Terms), derived from an older in-house classification system and arranged according to the logical structure of the Universal Decimal Classification (UDC). Only 358 language-equivalent terms (non-descriptors) are available for cross-referencing.

From TESEO to EUROVOC: Overall, the use of TESEO at the Chamber of Deputies was satisfactory. Difficulties sometimes arose in certain areas because appropriate descriptors were vague or missing. These problems led to the creation of a supplementary list of additional descriptors.

From TESEO to EUROVOC: In 2005 the Chamber began to consider whether to switch from TESEO to EUROVOC. We considered, inter alia, the advantages of multilingual classification, including the possibility of connecting different legal and social phenomena under a single system of categorization.

From TESEO to EUROVOC: We also considered the larger number of descriptors available and the even larger number of language-equivalent terms (non-descriptors) available for the Italian language. However, some areas are arranged from an EU perspective and can be difficult to use in a national context.

From TESEO to EUROVOC: We hope to gradually extend classification with the Eurovoc thesaurus from policy-setting and oversight documents to the whole information system. That is why we developed a map that matches and links the descriptors of Eurovoc to those of TESEO.
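
The descriptor map described on this slide can be pictured as a lookup table. What follows is only a minimal sketch, assuming a one-to-many mapping from TESEO to Eurovoc; the name `TESEO_TO_EUROVOC`, the function `to_eurovoc` and every label in the table are hypothetical placeholders, not entries from the Chamber's actual map.

```python
# Hypothetical sketch of a TESEO -> Eurovoc descriptor map.
# All labels are illustrative placeholders, not real map entries.
TESEO_TO_EUROVOC = {
    "teseo term A": ["eurovoc descriptor 1"],
    "teseo term B": ["eurovoc descriptor 2", "eurovoc descriptor 3"],
}


def to_eurovoc(teseo_terms):
    """Translate TESEO descriptors into Eurovoc descriptors.

    Returns a pair (mapped, unmapped): the Eurovoc descriptors in
    first-seen order without duplicates, and the TESEO terms for
    which the map has no entry.
    """
    mapped, unmapped = [], []
    for term in teseo_terms:
        targets = TESEO_TO_EUROVOC.get(term)
        if targets is None:
            unmapped.append(term)
            continue
        for target in targets:
            if target not in mapped:
                mapped.append(target)
    return mapped, unmapped
```

Returning the unmapped terms separately keeps gaps in the map visible, so that a human indexer could handle them instead of losing them silently.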

Automatic indexing: We know that automatic classification does not achieve the same quality as human indexing. It can nevertheless be efficient enough for specific purposes, e.g. to automatically index documents that would otherwise not be indexed at all, or to support the process of human indexing.

Automatic indexing: The Chamber of Deputies chose to test automatic indexing on policy-setting and oversight documents. These are texts written in everyday language, and their length is usually limited.

Automatic indexing: Applying automatic indexing to the classification of legislative texts is probably more difficult. Legislative texts use more formalized language, and the documentary units to be indexed (down to the level of individual paragraphs) may be too short for automated tools.

Automatic indexing: The Chamber of Deputies' decision to use an automated classification system was finalised in 2005. In an initial phase we tested automatic classification with TESEO descriptors. In a second phase, started in 2006, the program was set up for automatic classification with the Eurovoc thesaurus.

Automatic indexing: In 2008, with the beginning of the 16th Parliament, the Eurovoc classification of the policy-setting and oversight documents of the Chamber of Deputies and the Senate was launched.

Automatic indexing: We selected a semantic technology solution (COGITO by Expert System), which automatically suggests a set of descriptors to be applied to each document. Each document is analyzed and interpreted so that it can be archived quickly in the corresponding category.

Automatic indexing: The categorizer automatically analyzes each document and suggests a list of descriptors that could be used. This list is checked, modified and validated by a professional operator.

Automatic indexing: The current procedure is in fact semi-automatic. Automatic suggestions are amended and supplemented. The operator is responsible for the selection and for the final results.
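
The semi-automatic procedure (automatic suggestions, then amendment, supplementing and validation by an operator) can be sketched as a small data structure. This is an illustration under stated assumptions, not the Chamber's implementation and not the Cogito API: the class `Review` and its fields are hypothetical names, and the suggestions are taken as a plain list of descriptor strings.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """Hypothetical record of one operator review of a document."""

    suggested: list                              # descriptors proposed automatically
    removed: set = field(default_factory=set)    # suggestions rejected by the operator
    added: list = field(default_factory=list)    # descriptors supplied by the operator
    validated: bool = False                      # True once the operator confirms

    def final_descriptors(self):
        """Kept suggestions followed by operator additions, without duplicates."""
        result = []
        for descriptor in list(self.suggested) + list(self.added):
            if descriptor in self.removed or descriptor in result:
                continue
            result.append(descriptor)
        return result

    def validate(self):
        """Mark the review as confirmed and return the final descriptor list."""
        self.validated = True
        return self.final_descriptors()
```

For example, with suggestions `["a", "b", "c"]`, a rejected `"b"` and an added `"d"`, `validate()` returns `["a", "c", "d"]`. Nothing counts as final until `validate()` is called, which mirrors the operator's responsibility for the result.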

Automatic indexing: So far, the classification suggested by the Cogito categorizer has been transferred manually to another application in order to record the Eurovoc descriptors in the database used for searching.

Automatic indexing

History: A new integrated application is now available. It enables the automatic Cogito categorizer to analyse all the texts, and then allows the suggested Eurovoc descriptors to be revised, validated and recorded.

History: The new application is a Web application created to manage the automatic classification of policy-setting and oversight documents. It also allows the management of the various stages of classification and of its history.

History: The new application is developed entirely in an open source environment using a three-tier architecture. The application infrastructure is divided into three modules, dedicated respectively to the user interface (View), the functional logic, also called business logic (Model), and data persistence management (Controller).

Automatic indexing Main functionalities: Sampling of new texts needing to be classified

Automatic indexing

Main functionalities: Display lists of documents automatically classified, divided by classification status
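
Dividing the automatically classified documents by classification status amounts to a simple grouping step. The sketch below assumes each document is represented as a `(document_id, status)` pair; the function name `group_by_status` and the status values used in the usage note are hypothetical, since the slides do not name the actual statuses.

```python
from collections import defaultdict


def group_by_status(documents):
    """Group document ids by classification status.

    `documents` is an iterable of (document_id, status) pairs.
    Returns a dict mapping each status to its document ids, in input order.
    """
    groups = defaultdict(list)
    for doc_id, status in documents:
        groups[status].append(doc_id)
    return dict(groups)
```

With the input `[("d1", "suggested"), ("d2", "validated"), ("d3", "suggested")]`, the function returns one list per status: `["d1", "d3"]` for `"suggested"` and `["d2"]` for `"validated"`.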

Automatic indexing

Main functionalities: Viewing and editing the automatic classification of a document; confirmation and subsequent storage of the final classification

Automatic indexing

Future developments include a phase of extensive, in-depth fine-tuning. The aim is to check whether the system can ultimately reach a level of quality high enough for its output to be considered acceptable, even temporarily, without human intervention.

Automatic indexing: If the results are positive, we can consider publishing the automatic classification before it is revised. Users would be warned of this by a message such as "Classification to be reviewed".
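
The idea of publishing an unrevised classification with a warning can be sketched as a display rule. This is a hedged illustration only: the function `display_classification`, the constant `REVIEW_NOTICE` and the output layout are assumptions; only the wording of the notice comes from the slide.

```python
# Notice text taken from the slide; everything else here is hypothetical.
REVIEW_NOTICE = "Classification to be reviewed"


def display_classification(descriptors, validated):
    """Format descriptors for display, flagging unvalidated classifications."""
    text = "; ".join(descriptors)
    if validated:
        return text
    # Unvalidated output carries the warning so users can tell it apart.
    return f"{text} [{REVIEW_NOTICE}]"
```

Keeping the flag next to the descriptors, rather than on a separate page, means a user never sees an automatic classification without the warning.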

Questions to: