Managing legislative information in Parliaments: new frontiers Prof. Fabio Vitali Department of Computer Science University of Bologna.

2/34 Purpose of this talk
To assert that parliamentary processes and citizens’ access to parliamentary records and documents can be improved by:
- Adopting the best technologies for document management (mainly XML and related standards).
- Adopting standard formats for naming and electronic representation of documents, possibly a common, multilingual, multinational standard.
- Fostering the creation and adoption of many different software tools that support these standards.

3/34 Summary
- My background
- Computer support for parliamentary activities
- Functionalities
- Advantages
- Key discussion points
- Data/metadata
- Different views of the idea of document
- Content, structure and presentation
- Metadata and ontologies
- Naming mechanism

4/34 Norme In Rete
Norms on the Net
An Italy-wide initiative sponsored by the Ministry of Justice ([...] present) to develop:
- An XML-based data format for national, regional and local norms.
- A naming schema to identify all relevant documents, both available and unavailable, both existing and potential.
- A distributed, federated architecture allowing for multiple storage centers with overlapping competencies, official and unofficial, unified by a single search engine.
A national standard, adopted by a large number of institutions at both the national and the local level.
A major source of inspiration for LexML (Brazil).

5/34 Akoma Ntoso
Sponsored by the UN Department of Economic and Social Affairs (UNDESA), born in 2004 and now adopted by Kenya, Nigeria, South Africa, Cameroon, etc.
Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies:
- Describing structures for legislative documents in XML.
- Referencing documents within and across countries using URIs.
- Adding systematic metadata to documents using ontologically sound approaches based on OWL, FRBR, etc.
Intended for describing and managing legislative documents and Parliamentary workflow documentation needs in Africa.
Easy to implement, easy to understand, easy to use, yet complete, precise and reliable.

6/34 CEN Metalex
A CEN-sponsored initiative for an XML-based interchange format for Europe-wide legislative systems.
Born in [...]. Still ongoing.
Output for ongoing European projects.
Not an actual format, but rather a meta-format that allows individual formats to recognize each other.
Basic idea: identify similar structures through roles rather than vocabulary; an article is an article regardless of what it is called.
Naming, workflow and references are also managed, to support functionality without giving up generality.

7/34 Computer support for parliamentary activities
- Support for document generation: drafting activities, record keeping, translation into national languages, etc.
- Support for workflow: management of documents across their lifecycle, storage, security, timely involvement of relevant individuals and offices.
- Support for citizens’ access: multi-channel publication (on paper and on the web), search, classification, identification.
- Further activities: consolidation, version comparison, language synchronization, etc.

8/34 Standard Applications, Architectures or Formats?
Applications rely on concrete technologies (e.g., programming languages, operating systems, programming libraries, etc.) and provide actual support for users' processes and experience.
Architectures describe processes, actors and roles, and the characteristics of the tools that support them.
Data formats describe the kind of information that is exchanged by tools and kept over time.
- Standardizing applications forces common architectures and data formats, but also forces uniformity in users' processes and experience, and is the approach most fragile to technological advances.
- Standardizing architectures is less fragile, but still forces uniformity in processes and experience.
- Standardizing formats first provides solutions that do not depend on technological advances, and fosters the later generation of architectural and application standards as a result, rather than as a prerequisite.

9/34 HTML, PDF
HTML is just a publishing medium. It helped make the Web a big success, but it was constrained by its own simplicity:
- Excessive reliance on typographic rather than semantic description.
- Few rules, and not even strongly enforced.
PDF is a commercial, opaque data format aimed at guaranteeing the visual aspect of documents:
- Appropriate when the important characteristic to be maintained is the visual aspect.
- No support for structure, homogeneity or semantic awareness.
A different format is needed, one that provides:
- Clear differentiation between visual aspect and actual meaning.
- Strong syntactic rules, strictly enforced, to guarantee uniformity, homogeneity and sophisticated applications.

10/34 XML
XML (Extensible Markup Language) is a very widely adopted W3C standard.
XML is pure syntax, without predefined semantics. This allows document designers to provide their own semantics.
Thanks to the associated languages (DTD, XSLT, RDF), we can create sophisticated applications with great flexibility of use.
XML makes it possible to create markup languages that are readable, generic, structured and hierarchical.

11/34 Parliamentary documents and XML
XML is ideal for representing parliamentary documents (especially bills and acts):
- They have a well-defined structure, which is systematic and standardized.
- There are required and optional parts, according to rules and tradition.
- There are containment constraints that determine the global correctness of the document.
- There are references to other texts (schedules, other acts, etc.) that can fruitfully be used to create a hypertext network.
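As a rough illustration of these properties, the sketch below parses a tiny act with Python's standard xml.etree.ElementTree module. The element and attribute names (act, section, heading, article, clause, ref, href) and the clause text are assumptions made up for this example; they are not taken from Norme In Rete, Akoma Ntoso or CEN Metalex.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified markup: names and text are illustrative only.
ACT_XML = """
<act number="137" year="2004">
  <section id="sec1">
    <heading>Initial definitions</heading>
    <article id="art1">
      <clause>Definitions used in this Act.</clause>
      <clause>See also <ref href="act/136/2005">Act 136/2005</ref>.</clause>
    </article>
  </section>
</act>
"""

root = ET.fromstring(ACT_XML)

# The hierarchy is explicit, so structural parts can be found by name.
headings = [h.text for h in root.iter("heading")]

# References to other texts are explicit too: raw material for hypertext links.
refs = [r.get("href") for r in root.iter("ref")]

print(headings)
print(refs)
```

Because structure and references are marked up explicitly, a tool does not have to guess them from typography.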

12/34 Why is XML good?
[Figure: energy/information diagram, with the labels “Conversion is difficult” and “Conversion is very easy”]

13/34 What to look for
- Simple, standards-based data formats: to facilitate usage and understanding, relying on all the relevant W3C and ISO standards.
- Long-term feasibility and evolution (backward and forward): to support documents being drafted now as well as those drafted and enacted a long time ago, and to support a useful lifespan of the system and of the documents of tens and possibly hundreds of years.
- Self-explaining formats: documents need to provide all the information about their use and meaning through simple examination, even without the aid of specialized software.
- Tools need to be easy to create, to provide automatic and semi-automatic aid to data markup and document description. Manual markup or fine-tuning remains an option for exceptions.

14/34 Approaches
- Extensibility: it must be possible to allow local customizations of the data model, and to extend the reach of the language towards more countries, more document types and larger vocabularies of fragment qualification.
- Format-induced homogeneity: documents produced by different tools and individuals need to be, as much as possible, identical; so do documents produced by hand and by tools.
- Multiple uses: display on a PC screen, display on a cell phone, display on a Braille terminal, print on paper, print on paper with a different paper size, cataloguing, searching, workflow management (during drafting and active lifecycle), automatic consolidation, textual analysis, semantic analysis, provision analysis, cross-country comparison, synchronized translation, etc.

15/34 Understanding the data/metadata dichotomy (1)
- Data: the actual content (text, structure, images, schemas), exactly as provided by the author of the document.
- Metadata: any consideration, comment or additional information that can be expressed about the content and the document. Metadata is generated either by human intervention or through automated processes.
- Ontology (in short): a formalized representation of the conceptual model that shapes all metadata associated with a document.

16/34 Understanding the data/metadata dichotomy (2)
- Authors’ contribution: data. The words, punctuation and breaks, exactly as written and accepted by the original author (in the case of legislation, the legislative body).
- Editors’ contribution: metadata. Publication data, lifecycle information, footnotes, analysis of provisions.
Metadata is useless unless it follows a precise conceptual model, called an ontology.
In a way, editors are the authors of the metadata.
Put another way, metadata is information about a document that was not provided by its authors.

17/34 Different views on the idea of document (1)
Different concepts:
- Italian Act 137/2004.
- The current consolidated version of the Italian Act 137/2004.
- An XML representation of the current consolidated version of the Italian Act 137/2004.
- The file stored as “act xml” in a specific folder of my computer.
Different properties:
- What is the name of the document?
- Who is the author of the document?
- What is the creation date of the document?
The IFLA FRBR hierarchy…
- Work: a distinct intellectual creation.
- Expression: the specific form in which a work is realized.
- Manifestation: the representation of an expression according to the requirements of a medium.
- Item: a single exemplar (an instance) of a manifestation.
… provides different answers:
- E.g.: a different name for each level.
- E.g.: the legislator, the editor, the publisher, the data provider.
- E.g.: the enactment date, the consolidation date, the generation date, the copy date.
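The four FRBR levels can be sketched as four linked record types, each carrying its own answer to "name", "author" and "creation date". This is only an illustration in plain Python: the class layout, the field names, the file name and all dates below are placeholders invented for the example, not real data about Act 137/2004.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Work:                      # a distinct intellectual creation
    name: str
    author: str                  # the legislator
    enacted: date


@dataclass(frozen=True)
class Expression:                # a specific form of the work
    work: Work
    name: str
    author: str                  # the editor
    consolidated: date


@dataclass(frozen=True)
class Manifestation:             # an expression in a given medium
    expression: Expression
    name: str
    author: str                  # the publisher
    generated: date


@dataclass(frozen=True)
class Item:                      # one concrete copy of a manifestation
    manifestation: Manifestation
    name: str
    author: str                  # the data provider
    copied: date


def creation_dates(item: Item) -> dict:
    """Return the 'creation date' at each of the four levels."""
    m = item.manifestation
    e = m.expression
    return {
        "work": e.work.enacted,
        "expression": e.consolidated,
        "manifestation": m.generated,
        "item": item.copied,
    }


# Placeholder values only.
work = Work("Act 137/2004", "the legislator", date(2004, 1, 1))
expr = Expression(work, "Act 137/2004, consolidated", "the editor", date(2006, 1, 1))
manif = Manifestation(expr, "Act 137/2004, consolidated, XML", "the publisher", date(2006, 2, 1))
item = Item(manif, "local-copy.xml", "the data provider", date(2006, 3, 1))

print(creation_dates(item))
```

The same question ("what is the creation date?") yields four different, equally legitimate answers, one per level.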

18/34 Different views on the idea of document (2)
Different processes. E.g.:
- A repeal is really a process on the work.
- An amendment is a process on an expression, generating a new one.
- Markup is a process on an expression, generating a manifestation.
- A copy is a process on an item, generating another item.
Different peculiarities:
- A work has no content. The content of an expression is a set of words and drawings. The content of a manifestation is computer data.
- Works are eternal and created by Authors. Expressions are stable and created either by Authors or by Editors with domain expertise (consider amendment acts that do not specify the resulting consolidated text). Manifestations are created by computer tools used by secretaries or low-level operatives.

19/34 Content, structure and presentation (1)
- Content: what exactly was written in the document.
- Structure: how the content is organized.
- Presentation: the typographical choices made to present a document on screen or on paper.

20/34 Content, structure and presentation (2)
- The structure adds meaning to pieces of content. The words “Initial definitions” acquire meaning once we know they are the title of section 1 of the Italian Act 137/2004.
- The structure connects the presentation to the content. Once we know that the text “Initial definitions” is the heading of a section, we can apply the typographical choices associated with section headings.
- The structure can be used to test and validate the correctness of a document. We can deduce that a document is incorrect if there is no heading associated with a section.
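A minimal sketch of the last two points, using Python's standard xml.etree.ElementTree. The element names (act, section, heading, clause) and the "upper-case headings, indented clauses" style are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def render_text(root):
    """Presentation driven by structure: headings in capitals, clauses indented."""
    lines = []
    for section in root.iter("section"):
        heading = section.find("heading")
        if heading is not None:
            lines.append(heading.text.upper())
        for clause in section.iter("clause"):
            lines.append("  " + clause.text)
    return lines


def sections_without_heading(root):
    """Validation driven by structure: every section should have a heading."""
    return [s.get("id") for s in root.iter("section") if s.find("heading") is None]


GOOD = ET.fromstring(
    '<act><section id="sec1"><heading>Initial definitions</heading>'
    "<clause>Text of the clause.</clause></section></act>"
)
BAD = ET.fromstring(
    '<act><section id="sec1"><clause>Text of the clause.</clause></section></act>'
)

print(render_text(GOOD))
print(sections_without_heading(BAD))
```

The content itself is never touched: the same marked-up text could be rendered with a different style sheet, or rejected by a stricter check.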

21/34 Descriptive vs. prescriptive approach
Descriptive schemas: a very loose set of constraints providing a full vocabulary of elements and little or no check on their presence and order. They are meant to:
- Describe a set of documents while allowing many exceptions to the basic rule.
- Describe an existing (and thus non-modifiable) set of documents.
- Describe a set of documents created by a higher authority than the XML coder.
Prescriptive schemas: a more restricted set of constraints providing the same full vocabulary plus tight checks on presence and order. They are meant to:
- Impose adherence to drafting guidelines, and reject non-compliant documents.
- Impose homogeneity on the work of multiple different authors.
- Allow applications to expect certain characteristics of the documents to be present.
Akoma Ntoso, for instance, provides two tiers, allowing the full potential of both approaches to be expressed.
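The difference between the two kinds of schema can be sketched with two checks over the same vocabulary. This is plain Python rather than a real schema language, and the vocabulary and the "heading must come first" rule are assumptions invented for the example; they do not reproduce the actual Akoma Ntoso schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared vocabulary of element names.
VOCABULARY = {"act", "section", "heading", "article", "clause"}


def check_descriptive(root):
    """Loose check: only the vocabulary is enforced, not presence or order."""
    return [el.tag for el in root.iter() if el.tag not in VOCABULARY]


def check_prescriptive(root):
    """Tight check: same vocabulary, plus rules on presence and order."""
    problems = check_descriptive(root)
    for section in root.iter("section"):
        children = list(section)
        if not children or children[0].tag != "heading":
            problems.append(f"section {section.get('id')}: heading must come first")
    return problems


# An older document that does not follow current drafting guidelines.
LEGACY = ET.fromstring(
    '<act><section id="s1"><article><clause>Old text.</clause></article></section></act>'
)

print(check_descriptive(LEGACY))
print(check_prescriptive(LEGACY))
```

The legacy document is accepted by the descriptive check, which suits existing texts that cannot be modified, and rejected by the prescriptive one, which suits new drafts.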

22/34 Metadata (and ontologies) (1)
Documents’ content does not include all that is interesting about them. A metadata schema is necessary to associate with documents all the data that is not in their content.
Some metadata schemas are flat, i.e., metadata are simply text values referring to the document; e.g., Dublin Core, MARC 21, etc. This prevents tools from:
- differentiating between the different ideas of document;
- identifying more precisely the classes of concepts associated with documents, such as actors (persons and organizations), events, provisions, places, terms, etc.
An ontology expressed using Semantic Web concepts and languages (e.g., OWL and/or Topic Maps) offers all the advantages of metadata schemas, and in addition makes it possible to:
- associate appropriate properties with the different ideas of document (e.g., author, creation date, title, etc.);
- make assertions about abstract concepts rather than plain strings.
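The contrast can be sketched in plain Python, without any actual OWL or Topic Maps tooling. The record layout, the Organization and Assertion classes and all values below are assumptions invented for the example.

```python
from dataclasses import dataclass

# Flat metadata: every value is a plain string about "the document".
flat = {"title": "Act 137/2004", "creator": "Parliament", "date": "2004"}


@dataclass(frozen=True)
class Organization:
    """A typed concept (an actor), as opposed to a string that names one."""
    label: str


@dataclass(frozen=True)
class Assertion:
    level: str      # which idea of document: "work", "expression", ...
    prop: str
    value: object   # a typed concept or a literal


parliament = Organization("Parliament")
publisher = Organization("Official publisher")

assertions = [
    Assertion("work", "author", parliament),
    Assertion("work", "title", "Act 137/2004"),
    Assertion("manifestation", "author", publisher),
]


def authors_of(level):
    """The 'author' depends on which idea of document is meant."""
    return [a.value for a in assertions if a.level == level and a.prop == "author"]


def about_concept(concept):
    """Find every assertion that involves a given concept, not a matching string."""
    return [(a.level, a.prop) for a in assertions if a.value == concept]


print(authors_of("work"), authors_of("manifestation"))
print(about_concept(parliament))
```

With the flat record there is a single "creator" string and no way to say which level it describes; with typed assertions each level has its own author, and actors can be queried as concepts.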

23/34 Metadata issues
- Authorship of metadata: the generation of metadata is itself an authoring process and needs to be controlled, dated, signed and clearly identified.
- Versioning of metadata: metadata may change over time, and actually more often than the document content. How should these changes be dealt with?
- Relationships between metadata and IFLA FRBR document levels: each piece of metadata refers to one idea of document and not the others. We need to make sure that these associations are unambiguous and agreed upon.
- Location of metadata, internal or external? Internal location guarantees co-maintenance of content and metadata, but makes it difficult to allow for multiple views of the same content. External location allows multiple metadata sets to coexist for the same document, but complicates the correct association of data and metadata.

24/34 Metadata terminology
- Objective: a piece of information for which no reasonable doubt can exist. E.g., the title of article 15, the publication date.
- Subjective: a piece of information that requires active interpretation by a human, who may be wrong, or about which different opinions exist. E.g., resolution of implicit citations, classification of provisions.
- Low competence: a piece of information whose determination requires only the kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience. E.g., where article 1 ends and article 2 starts.
- High competence: a piece of information whose determination requires the kind of competence one may expect from specialized jurists, who come to their results after careful and painstaking reasoning. E.g., dates and times in norms.

25/34 Workflow management
An important aspect of metadata sophistication is the support for workflow:
- Explicit management of document evolution.
- Identification of sources of authority (e.g., legislative bodies), sources of changes (e.g., amending acts) and time of changes (the timing of acts is an extremely complex discipline).
- Reliable identification of actors and content (through digital signatures).

26/34 Consolidation and side-by-side comparison
Only possible when structure, content and presentation of documents are explicitly separated.
Traditional approaches are labour-intensive and manual, and require both legislative and typographic competences.
Explicit recording of structure, and its independence from presentation, allows:
- Consolidation as a semi-automatic process, based on explicit structural references in amendments and modification laws.
- Side-by-side comparison as a fully automatic process, based on different presentation patterns for the differences between an original and a modified text.
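Both processes can be sketched in a few lines once clauses carry explicit identifiers. The markup, the clause texts and the shape of the amendment record (a mapping from clause id to replacement text) are assumptions made for this example; real amendments are far richer.

```python
import xml.etree.ElementTree as ET

ORIGINAL = (
    '<act><clause id="c1">Fees are 10 euros.</clause>'
    '<clause id="c2">This Act applies nationwide.</clause></act>'
)

# Hypothetical amendment: an explicit structural reference plus the new text.
AMENDMENTS = {"c1": "Fees are 12 euros."}


def consolidate(xml_text, amendments):
    """Apply each amendment to the clause it references; return the new text."""
    root = ET.fromstring(xml_text)
    for clause in root.iter("clause"):
        if clause.get("id") in amendments:
            clause.text = amendments[clause.get("id")]
    return ET.tostring(root, encoding="unicode")


def side_by_side(old_xml, new_xml):
    """Pair clauses by id: (id, old text, new text, changed?)."""
    old = {c.get("id"): c.text for c in ET.fromstring(old_xml).iter("clause")}
    new = {c.get("id"): c.text for c in ET.fromstring(new_xml).iter("clause")}
    return [
        (cid, old.get(cid), new.get(cid), old.get(cid) != new.get(cid))
        for cid in sorted(old.keys() | new.keys())
    ]


CONSOLIDATED = consolidate(ORIGINAL, AMENDMENTS)
ROWS = side_by_side(ORIGINAL, CONSOLIDATED)
for row in ROWS:
    print(row)
```

In practice a human would still have to turn the wording of an amending act into such explicit references, which is why consolidation is described as semi-automatic while the comparison can be fully automatic.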

27/34 Naming documents and fragments
Uniform Resource Identifiers (URIs): these are used throughout the World Wide Web to identify resources. The best known are URLs (Uniform Resource Locators), which are used to navigate the web.

Next: Naming documents and fragments (3)28/34 Naming documents and fragments (2) With legislative documents, the situation is more complex. Works, expressions and manifestations are not physical resources, but abstract entities. Only items are physical resources. Yet, references are rarely (or never) to items. So works, expressions and manifestations must have their own URI, This URI will not be a URL (i.e., it will not correspond to a physical address on a computer) The act of finding out what is the URL of the item that best represents the manifestation that we are looking for is called URI resolution.

Next: The basic features of a good national standard29/34 Naming documents and fragments (3) Naming schema must guarantee a few properties: Complete: all relevant documents (in all their levels) must be contemplated Global: all legislative bodies (ideally even across countries) must be able to use and clearly identify their documents. Meaningful: names need to mean something.  Make assumption about the kind, freshness and relevance of a citation by looking only at the reference’s name Memorizable: names need to be easy to jot down, easy to remember, easy to correct if something was written down wrongly. Guessable: given a reference to act 136/2005, it should be easy to deduce what is the form for act 76/2006, etc.

Next: Why bother?30/34 The basic features of a good national standard Compatibility with CEN Metalex Systematically use W3C standards (esp. XML, XML Schema, Namespace, semantic web languuages, etc.) Separate: Structure Normative content Presentation Metadata Strong naming policies (a future extension of CEN Metalex will provide guidelines) Allow for exceptions, extensions and customization

Next: Inventing, adopting, or… ?31/34 Why bother? An open standard for data format allows for easier, more cost- effective distribution of legislative content An open standard for data format allows for long-term preservation of investments and supports ease of maintenance An open standard for data format allows for a thriving competing market of tools An open standard for data format allows integration of authoritative content providers and added-value content providers (esp. Private publishers and academics) An open standard for data format allows comparative studies to be performed with greater ease

Next: Conclusions (1)32/34 Inventing, adopting, or… ? As long as fundamental compatibility is maintained In terms of basic structures (CEN Metalex) Naming policies (URI-based) It is not relevant that you adopt existing standards… E.g. Akoma Ntoso … or invent your own national new one But do behave fairly, and allow for international interoperability.

Next: Conclusions (2)33/34 Conclusions (1) A successful system is built on three key factors: Precise and sophisticated content structure Complete metadata model (with precise time-awareness) Sophisticated and easy to use naming mechanism NormeInRete, Akoma Ntoso and (increasingly) CEN Metalex share these properties. Also it is important to remember that we are discovering new interesting ways to store and use information in this very moment. So casting in stone design decisions that prevent future evolution of document formats, tools architecture and overall functionalities is wrong and doomed.

34/34 Conclusions (2)
Adopting an international standard (e.g., Akoma Ntoso) is a first step in the right direction:
- It is open to local customization, yet international.
- It allows immediate adoption of existing architectures and tools, yet allows for local developments and extensions.
Sharing knowledge and experiences with colleagues from other countries increases the chance of success of local initiatives.
Opportunities for training and capacity building exist:
- Cf. the Summer School on Legislative Informatics in Florence (September 2007, June 2008)…
- … but also local initiatives specific to regional and national needs (e.g., African legislative school, Kenya, January 2008).