
1 Managing legislative information in Parliaments: new frontiers
Prof. Fabio Vitali, Department of Computer Science, University of Bologna

2 Purpose of this talk
To assert that parliamentary processes and citizens’ access to parliamentary records and documents can be improved by:
- Adopting the best technologies for document management (mainly, XML and related standards)
- Adopting standard formats for naming and electronic representation of documents, possibly a common, multilingual, multi-national standard
- Fostering the creation and adoption of many different software tools to be made available to support these standards

3 Summary
- My background
- Computer support for parliamentary activities
  - Functionalities
  - Advantages
- Key discussion points
  - Data/metadata
  - Different views of the idea of document
  - Content, structure and presentation
  - Metadata and ontologies
  - Naming mechanism

4 Norme In Rete (Norms on the Net)
Italy-wide initiative sponsored by the Ministry of Justice (1999 - present) to develop:
- An XML-based data format for national, regional and local norms
- A naming schema to identify all relevant documents, both available and unavailable, both existing and potential
- A distributed, federated architecture allowing for multiple storage centers with overlapping competencies, official and not official, unified by a single search engine
A national standard, adopted by a large number of institutions at both the national and local level, and a major source of inspiration for LexML (Brazil).

5 Akoma Ntoso
Sponsored by the UN Department of Economic and Social Affairs (UNDESA), born in 2004 and now adopted by Kenya, Nigeria, South Africa, Cameroon, etc.
Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies:
- Describing structures for legislative documents in XML
- Referencing documents within and across countries using URIs
- Adding systematic metadata to documents using ontologically sound approaches based on OWL, FRBR, etc., for describing and managing legislative documents and parliamentary workflow documentation needs in Africa
Easy to implement, easy to understand, easy to use, yet complete, precise and reliable.
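
To make this concrete, here is a minimal sketch of what a marked-up act could look like, loosely in the spirit of Akoma Ntoso; the element and attribute names are illustrative only and have not been checked against the official schema. The fragment is wrapped in a short Python script (standard library only) so that its well-formedness can be verified.

    # Hypothetical, simplified fragment loosely inspired by Akoma Ntoso.
    # Element names are illustrative and not validated against the real schema.
    import xml.etree.ElementTree as ET

    SAMPLE_ACT = """
    <akomaNtoso>
      <act>
        <meta>
          <identification source="#editor">
            <FRBRWork>
              <FRBRuri value="/ke/act/2005/136"/>  <!-- hypothetical work-level URI -->
            </FRBRWork>
          </identification>
        </meta>
        <body>
          <section id="sec1">
            <num>1.</num>
            <heading>Initial definitions</heading>
            <content>
              <p>In this Act, unless the context otherwise requires...</p>
            </content>
          </section>
        </body>
      </act>
    </akomaNtoso>
    """

    # Parsing proves the fragment is well-formed XML and exposes its structure.
    root = ET.fromstring(SAMPLE_ACT)
    print(root.find("./act/body/section/heading").text)  # -> Initial definitions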

6 CEN Metalex
CEN-sponsored initiative for an XML-based interchange format for European-wide legislative systems. Born in 2006, still ongoing; an output of ongoing European projects.
Not an actual format, but rather a meta-format allowing individual formats to recognize each other.
Basic idea: identify similar structures through roles rather than vocabulary: an article is an article regardless of what it is called. Naming, workflow and references are also managed, to support functionality without giving up generality.

7 Computer support for parliamentary activities
- Support for document generation: drafting activities, record keeping, translation into national languages, etc.
- Support for workflow: management of documents across their lifecycle, storage, security, timely involvement of relevant individuals and offices
- Support for citizens’ access: multi-channel publication (on paper and on the web), search, classification, identification
- Further activities: consolidation, version comparison, language synchronization, etc.

8 Standard Applications, Architectures or Formats?
- Applications rely on concrete technologies (e.g., programming languages, operating systems, programming libraries, etc.) and provide actual support for users' processes and experience.
- Architectures describe processes, actors and roles, and describe the characteristics of the tools that support them.
- Data formats describe the kind of information that is exchanged by tools and that is kept over time.
- Standardizing applications forces common architectures and data formats, but also forces uniformity in users' processes and experience, and is the most fragile to technological advances.
- Standardizing architectures is less fragile, but forces uniformity in processes and experience.
- Standardizing formats first provides solutions that are not dependent on technological advances, and fosters the further generation of architectural and applicative standards as a result, rather than as a prerequisite.

9 HTML, PDF
HTML, although just a publishing medium, helped make the Web a big success, but it is constrained by its own simplicity:
- Excessive reliance on typographic rather than semantic description
- Few rules, and not even strongly enforced
PDF is a commercial, opaque data format aimed at guaranteeing the visual aspect of documents:
- Appropriate when the important characteristic to be maintained is the visual aspect
- No support for structure, homogeneity, semantic awareness
A different format is appropriate, one that provides:
- A clear differentiation between visual aspect and actual meaning
- Strong syntactic rules, strictly enforced, to guarantee uniformity, homogeneity and sophisticated applications

10 XML
XML (Extensible Markup Language) is a W3C standard with incredibly widespread diffusion. XML is pure syntax, without pre-defined semantics: this allows document designers to provide their own semantics. Thanks to the associated languages (DTD, XSLT, RDF) we can create sophisticated applications with great flexibility of use. XML allows the creation of markup languages that are readable, generic, structured and hierarchical.

11 Parliamentary documents and XML
XML is ideal for representing parliamentary documents (and especially bills and acts):
- They have a well-defined structure, which is systematic and standardized
- There are required and optional parts according to rules and tradition
- There are containment constraints that determine the global correctness of the document
- There are references to other texts (schedules, other acts, etc.) that can fruitfully be used to create a hypertext network
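
As an illustration of the last point, the short sketch below (plain Python, standard library only; the element and attribute names, such as ref/@href, are invented for the example) extracts the cross-references from a marked-up fragment, which is the first step towards turning a collection of acts into a hypertext network.

    # Hypothetical example: collecting cross-references from a marked-up fragment.
    # Element and attribute names (ref/@href) are invented for illustration.
    import xml.etree.ElementTree as ET

    FRAGMENT = """
    <section id="sec2">
      <num>2.</num>
      <heading>Amendments</heading>
      <content>
        <p>Section 5 of <ref href="/ke/act/1998/12">the Banking Act, 1998</ref>
           is amended as specified in <ref href="#schedule1">Schedule 1</ref>.</p>
      </content>
    </section>
    """

    section = ET.fromstring(FRAGMENT)
    # Every <ref> becomes a link in the hypertext network: (link text, target).
    links = [(ref.text, ref.get("href")) for ref in section.iter("ref")]
    for text, target in links:
        print(f"{text!r} -> {target}")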

12 Why is XML good?
[Slide diagram with labels: “Energy / Information”, “Conversion is difficult”, “Conversion is very easy”.]

13 What to look for
- Simple, standards-based data formats, to facilitate usage and understanding, relying on all the relevant W3C and ISO standards.
- Long-term feasibility and evolution (backward and forward): to support documents being drafted now as well as those already drafted and enacted a long time ago, and to support a useful lifespan of the system and the documents in the tens and possibly hundreds of years.
- Self-explaining formats: documents need to be able to provide all information for their use and meaning through a simple examination, even without the aid of specialized software.
- Tools that can be created with ease, to provide automatic and semi-automatic aid to data markup and document description; manual markup or fine tuning remains a possible option for exceptions.

14 Approaches
- Extensibility: it must be possible to allow local customizations of the data model, and to extend the reach of the language towards more countries, more document types, and larger vocabularies of fragment qualification.
- Format-induced homogeneity: documents produced by different tools and individuals need to be, as much as possible, identical; documents produced by hand and by tools need to be, as much as possible, identical.
- Multiple uses: display on a PC screen, display on a cell phone, display on a Braille terminal, print on paper, print on paper with a different paper size, cataloguing, searching, workflow management (during drafting and active lifecycle), automatic consolidation, textual analysis, semantic analysis, provision analysis, cross-country comparison, synchronized translation, etc.

15 Understanding the data/metadata dichotomy (1)
- Data: the actual content (text, structure, images, schemas), exactly as provided by the author of the document.
- Metadata: any consideration, comment or additional information that can be expressed about the content and the document. Metadata is generated either by human intervention or through automated processes.
- Ontology (in short): a formalized representation of the conceptual model that shapes all metadata associated with a document.

16 Understanding the data/metadata dichotomy (2)
- Authors’ contribution: data. The words, punctuation and breaks, exactly as written and accepted by the original author (in the case of legislation, the legislative body).
- Editors’ contribution: metadata. Publication data, lifecycle information, footnotes, analysis of provisions.
Metadata is useless unless it is provided following a precise conceptual model, called an ontology. In a way, editors are the authors of the metadata. Put another way, metadata is information about a document that was not provided by its authors.

17 Different views on the idea of document (1)
Different concepts:
- Italian Act 137/2004
- The current consolidated version of Italian Act 137/2004
- An XML representation of the current consolidated version of Italian Act 137/2004
- The file “act137-2004.xml” stored in a specific folder of my computer
Different properties:
- What is the name of the document?
- Who is the author of the document?
- What is the creation date of the document?
The IFLA FRBR hierarchy…
- Work: a distinct intellectual creation
- Expression: the specific form in which a work is realized
- Manifestation: the representation of an expression according to the requirements of a medium
- Item: a single exemplar (an instance) of a manifestation
… provides different answers:
- Name: a different name for each level
- Author: the legislator, the editor, the publisher, the data provider
- Date: the enactment date, the consolidation date, the generation date, the copy date

18 Different views on the idea of document (2)
Different processes, e.g.:
- A repeal is really a process on the work
- An amendment is a process on an expression, generating a new one
- The markup is a process on an expression, generating a manifestation
- The copy is a process on an item, generating another item
Different peculiarities:
- A work has no content. The content of an expression is a set of words and drawings. The content of a manifestation is computer data.
- Works are eternal and created by authors. Expressions are stable and created either by authors or by editors with domain expertise (consider amendment acts that do not specify the resulting consolidated text). Manifestations are created by computer tools used by secretaries or low-level operatives.
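
The sketch below recaps the four FRBR levels and the processes acting on them as plain Python data structures; the class and field names are invented only to make the distinctions of the last two slides concrete, and the dates are placeholders. It is an illustration of the model, not a normative API.

    # Illustrative model of the IFLA FRBR levels and of processes acting on them.
    # Names, fields and dates are invented for the example.
    from dataclasses import dataclass

    @dataclass
    class Work:                  # eternal, created by the legislator
        title: str               # e.g. "Italian Act 137/2004"

    @dataclass
    class Expression:            # a specific textual form of the work
        work: Work
        text: str                # words and punctuation, exactly as enacted/consolidated
        version_date: str

    @dataclass
    class Manifestation:         # computer data representing an expression
        expression: Expression
        data_format: str         # e.g. "XML", "PDF"

    @dataclass
    class Item:                  # one stored copy of a manifestation
        manifestation: Manifestation
        location: str            # e.g. a file path or a server

    def amend(expr: Expression, new_text: str, date: str) -> Expression:
        """An amendment is a process on an expression generating a new expression."""
        return Expression(work=expr.work, text=new_text, version_date=date)

    def markup(expr: Expression) -> Manifestation:
        """Markup is a process on an expression generating a manifestation."""
        return Manifestation(expression=expr, data_format="XML")

    def copy_item(item: Item, new_location: str) -> Item:
        """A copy is a process on an item generating another item."""
        return Item(manifestation=item.manifestation, location=new_location)

    act = Work(title="Italian Act 137/2004")
    original = Expression(act, "Art. 1 ...", "2004-01-01")          # placeholder date
    consolidated = amend(original, "Art. 1 (as amended) ...", "2006-01-01")
    xml_version = markup(consolidated)
    my_file = Item(xml_version, "/home/me/acts/act137-2004.xml")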

19 Content, structure and presentation (1)
- Content: what exactly was written in the document.
- Structure: how the content is organized.
- Presentation: the typographical choices made to present a document on screen or on paper.

20 Content, structure and presentation (2)
- The structure adds meaning to pieces of content. The words “Initial definitions” assume meaning once we know they are the title of section 1 of Italian Act 137/2004.
- The structure connects the presentation to the content. Once we know that the text “Initial definitions” is the heading of a section, we can apply the typographical choices associated with section headings.
- The structure can be used to test and validate the correctness of a document. We can deduce that a document is incorrect if there is no heading associated with a section.
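
A minimal sketch of the last two points, using an invented <section>/<heading> vocabulary: the same structural information drives both a presentation rule and a correctness check.

    # Using structure both to drive presentation and to validate correctness.
    # The vocabulary (section/heading) is invented for the example.
    import xml.etree.ElementTree as ET

    DOC = """
    <act>
      <section id="sec1">
        <heading>Initial definitions</heading>
      </section>
      <section id="sec2"/>  <!-- faulty: no heading -->
    </act>
    """

    root = ET.fromstring(DOC)

    # Presentation: apply the typographical choice associated with section headings.
    for heading in root.iter("heading"):
        print(heading.text.upper())          # e.g. render headings in capitals

    # Validation: a document is incorrect if a section has no heading.
    for section in root.iter("section"):
        if section.find("heading") is None:
            print(f"invalid: section {section.get('id')!r} has no heading")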

21 Descriptive vs. prescriptive approach
Descriptive schemas: a very loose set of constraints providing a full vocabulary of elements and little or no check on their presence and order. They are meant to:
- Describe a set of documents with many allowable exceptions to the basic rules
- Describe an existing (and thus non-modifiable) set of documents
- Describe a set of documents created by a higher authority than the XML coder
Prescriptive schemas: a more restricted set of constraints providing the same full vocabulary plus tight checks on presence and order. They are meant to:
- Impose adherence to drafting guidelines, and reject non-compliant documents
- Impose homogeneity on the work of multiple different authors
- Allow applications to expect certain characteristics of the documents to be present
Akoma Ntoso, for instance, provides a two-tiered set of schemas allowing the full potential of both approaches to be expressed.
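
The contrast can be sketched with two hand-written checks (the rules are invented; real projects would express them in a schema language such as DTD, XML Schema or RELAX NG): the descriptive check only verifies that the vocabulary is known, while the prescriptive check also imposes presence and order.

    # Descriptive vs. prescriptive validation, sketched with hand-written checks.
    # In practice these rules live in a schema, not in code.
    import xml.etree.ElementTree as ET

    VOCABULARY = {"section", "num", "heading", "content", "p"}

    def descriptive_ok(section) -> bool:
        """Loose check: every element must belong to the shared vocabulary."""
        return all(child.tag in VOCABULARY for child in section.iter())

    def prescriptive_ok(section) -> bool:
        """Tight check: a section must contain num, heading, content, in this order."""
        children = [child.tag for child in list(section)]
        return descriptive_ok(section) and children == ["num", "heading", "content"]

    legacy = ET.fromstring("<section><heading>Old act</heading><p>text</p></section>")
    drafted = ET.fromstring(
        "<section><num>1.</num><heading>Definitions</heading>"
        "<content><p>text</p></content></section>"
    )

    print(descriptive_ok(legacy), prescriptive_ok(legacy))     # True False
    print(descriptive_ok(drafted), prescriptive_ok(drafted))   # True True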

22 Metadata (and ontologies) (1)
Documents’ content does not include all that is interesting about them: a metadata schema is necessary to associate with documents all the data that is not in their content.
Some metadata schemas are flat, i.e., metadata are simply text values referring to the document (e.g., Dublin Core, MARC 21). This prevents tools from differentiating between the different ideas of document, and from identifying more precisely the classes of concepts associated with documents, such as actors (persons and organizations), events, provisions, places, terms, etc.
An ontology expressed using Semantic Web concepts and languages (e.g., OWL and/or Topic Maps) offers all the advantages of metadata schemas, and in addition allows us to:
- Associate appropriate properties with the different ideas of document (e.g., author, creation date, title, etc.)
- Make assertions about abstract concepts rather than plain strings
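
A tiny illustration of the difference, with invented names and placeholder values: flat metadata records plain strings, while ontology-backed metadata points at typed entities about which further assertions can be made, attached to a specific idea of the document.

    # Flat metadata vs. ontology-backed metadata (illustrative only).
    from dataclasses import dataclass

    # Flat, Dublin-Core-like metadata: just strings attached to the document.
    flat_metadata = {
        "dc:creator": "Parliament of Kenya",
        "dc:date": "2005-12-01",   # which date? enactment? publication? unclear
    }

    # Ontology-backed metadata: the creator is a typed entity (an Organization),
    # and the date is explicitly attached to one FRBR level (the expression).
    @dataclass
    class Organization:
        name: str
        country: str

    @dataclass
    class ExpressionMetadata:
        author: Organization
        consolidation_date: str

    creator = Organization(name="Parliament of Kenya", country="KE")
    meta = ExpressionMetadata(author=creator, consolidation_date="2005-12-01")

    # Assertions can now be made about the entity itself, not about a string.
    print(meta.author.country)   # -> KE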

23 Metadata issues
- Authorship of metadata: the generation of metadata is itself an authoring process and needs to be controlled, dated, signed and clearly identified.
- Versioning of metadata: metadata may change over time, and actually more often than the document content. How do we deal with these changes?
- Relationship between metadata and the IFLA FRBR document levels: each piece of metadata refers to one idea of document and not the others. We need to make sure that these associations are unambiguous and agreed upon.
- Location of metadata, internal or external? Internal location guarantees co-maintenance of content and metadata, but makes it difficult to allow multiple views of the same content. External location allows multiple metadata sets to coexist for the same document, but complicates the correct association of data and metadata.
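
Schematically, the two options for metadata location look like this (element names, URIs and dates are placeholders invented for the example): internally, the metadata travels inside the document's own metadata block; externally, a separate record points back at the document through its URI.

    # Internal vs. external location of metadata (schematic, invented names).

    # Internal: metadata lives inside the document, next to the content it describes.
    internal = """
    <act>
      <meta>
        <lifecycle event="enactment" date="2004-01-01"/>  <!-- placeholder date -->
      </meta>
      <body>...</body>
    </act>
    """

    # External: a separate metadata record refers to the document by its URI,
    # so several independent metadata sets can coexist for the same content.
    external = {
        "about": "/it/act/2004/137",   # hypothetical document URI
        "lifecycle": [{"event": "enactment", "date": "2004-01-01"}],
        "provided_by": "publisher-A",
    }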

24 Metadata terminology
- Objective: a piece of information for which no reasonable doubt can exist. E.g., the title of article 15, the publication date.
- Subjective: a piece of information that requires active interpretation from a human, that may be wrong, or for which different opinions exist. E.g., resolution of implicit citations, classification of provisions.
- Low competence: the kind of competence one may expect from a non-specialized employee, such as a secretary, armed with just common sense and some topical experience. E.g., where does article 1 end and article 2 start?
- High competence: a piece of information whose determination requires the kind of competence one may expect from specialized jurists who come to their results after careful and painful reasoning. E.g., dates and times in norms.

25 Workflow management
An important bit of metadata sophistication is the support for workflow:
- Explicit management of document evolution
- Identification of sources of authority (e.g., legislative bodies), sources of changes (e.g., amending acts), and time of changes (the time of acts is an extremely complex discipline)
- Reliable identification of actors and content (through digital signature)

26 Consolidation and side-by-side comparison
Only possible when structure, content and presentation of documents are explicitly separated. Traditional approaches are labour-intensive and manual, requiring both legislative and typographic competences. Explicit recording of structure, and independence from presentation, allows:
- Consolidation as a semi-automatic process based on explicit structural references in amendments and modification laws
- Side-by-side comparison as a fully automatic process based on different presentation patterns for the differences between an original and a modified text
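
A minimal sketch of semi-automatic consolidation, with invented markup and reference conventions: because the amendment points explicitly at the structural unit it modifies, a program can apply it to the XML of the original act without any typographic work.

    # Semi-automatic consolidation driven by explicit structural references.
    # Markup and reference conventions are invented for the example.
    import xml.etree.ElementTree as ET

    ORIGINAL = """
    <act id="act-136-2005">
      <section id="sec3"><heading>Fees</heading><p>The fee is ten shillings.</p></section>
    </act>
    """

    # The amending act says: replace the text of section "sec3" of act 136/2005.
    AMENDMENT = {"target_act": "act-136-2005",
                 "target_section": "sec3",
                 "new_text": "The fee is twenty shillings."}

    act = ET.fromstring(ORIGINAL)
    for section in act.iter("section"):
        if section.get("id") == AMENDMENT["target_section"]:
            section.find("p").text = AMENDMENT["new_text"]   # apply the modification

    print(ET.tostring(act, encoding="unicode"))   # the consolidated text

Side-by-side comparison can then be produced automatically by comparing the original and consolidated trees and presenting the differences with a dedicated style.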

27 Naming documents and fragments
Uniform Resource Identifiers (URIs) are used throughout the World Wide Web to indicate resources. The best known are URLs (Uniform Resource Locators), which are used to navigate the web, e.g. http://www.akomantoso.org/09-examples.html

28 Naming documents and fragments (2)
With legislative documents the situation is more complex. Works, expressions and manifestations are not physical resources but abstract entities; only items are physical resources. Yet references are rarely (or never) to items. So works, expressions and manifestations must have their own URIs. These URIs will not be URLs (i.e., they will not correspond to a physical address on a computer). The act of finding out the URL of the item that best represents the manifestation we are looking for is called URI resolution.
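
A sketch of the idea with invented URI patterns (they echo, but do not claim to reproduce, any official naming convention): work, expression and manifestation each receive an abstract URI built from the same template, which also makes names guessable, and a toy resolver maps an abstract name onto the URL of a concrete item.

    # Hypothetical URI templates and a toy resolver; the patterns are invented
    # for illustration and do not reproduce an official naming convention.

    def work_uri(country, year, number):
        return f"/{country}/act/{year}/{number}"                                # the work

    def expression_uri(country, year, number, lang, version_date):
        return work_uri(country, year, number) + f"/{lang}@{version_date}"     # the expression

    def manifestation_uri(country, year, number, lang, version_date, fmt):
        return expression_uri(country, year, number, lang, version_date) + f".{fmt}"

    # Guessability: act 136/2005 and act 76/2006 follow the same pattern.
    print(work_uri("ke", 2005, 136))   # /ke/act/2005/136
    print(work_uri("ke", 2006, 76))    # /ke/act/2006/76

    # A toy resolver: map an abstract manifestation URI onto the URL of a stored item.
    CATALOGUE = {
        "/ke/act/2005/136/eng@2006-01-01.xml":
            "http://repository.example.org/files/ke-act-2005-136-eng.xml",
    }

    def resolve(uri):
        """URI resolution: find the URL of the item that best represents the request."""
        return CATALOGUE.get(uri, "no stored item found")

    print(resolve(manifestation_uri("ke", 2005, 136, "eng", "2006-01-01", "xml")))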

29 Naming documents and fragments (3)
A naming schema must guarantee a few properties:
- Complete: all relevant documents (at all their levels) must be contemplated
- Global: all legislative bodies (ideally even across countries) must be able to use it and clearly identify their documents
- Meaningful: names need to mean something; one should be able to make assumptions about the kind, freshness and relevance of a citation by looking only at the reference’s name
- Memorizable: names need to be easy to jot down, easy to remember, easy to correct if something was written down wrongly
- Guessable: given a reference to act 136/2005, it should be easy to deduce the form for act 76/2006, etc.

30 The basic features of a good national standard
- Compatibility with CEN Metalex
- Systematic use of W3C standards (esp. XML, XML Schema, Namespaces, Semantic Web languages, etc.)
- Separation of structure, normative content, presentation and metadata
- Strong naming policies (a future extension of CEN Metalex will provide guidelines)
- Allowance for exceptions, extensions and customization

31 Why bother?
- An open standard for the data format allows for easier, more cost-effective distribution of legislative content
- An open standard for the data format allows for long-term preservation of investments and supports ease of maintenance
- An open standard for the data format allows for a thriving, competitive market of tools
- An open standard for the data format allows integration of authoritative content providers and added-value content providers (esp. private publishers and academics)
- An open standard for the data format allows comparative studies to be performed with greater ease

32 Inventing, adopting, or… ?
As long as fundamental compatibility is maintained, in terms of basic structures (CEN Metalex) and naming policies (URI-based), it does not matter whether you adopt an existing standard (e.g. Akoma Ntoso) or invent your own new national one. But do behave fairly, and allow for international interoperability.

33 Conclusions (1)
A successful system is built on three key factors:
- Precise and sophisticated content structure
- A complete metadata model (with precise time-awareness)
- A sophisticated and easy-to-use naming mechanism
NormeInRete, Akoma Ntoso and (increasingly) CEN Metalex share these properties.
It is also important to remember that we are discovering new and interesting ways to store and use information at this very moment, so casting in stone design decisions that prevent future evolution of document formats, tool architectures and overall functionality is wrong and doomed.

34 Conclusions (2)
- Adopting an international standard (e.g. Akoma Ntoso) is a first step in the right direction: open to local customization, yet international; it allows immediate adoption of existing architectures and tools, yet allows for local developments and extensions.
- Sharing knowledge and experiences with colleagues from other countries increases the chances of success of local initiatives.
- Chances for training and capacity building exist. Cf. the Summer School on Legislative Informatics in Florence (September 2007, June 2008)… but also local initiatives specific to regional and national needs (e.g. the African legislative school, Kenya, January 2008).


Download ppt "Managing legislative information in Parliaments: new frontiers Prof. Fabio Vitali Department of Computer Science University of Bologna."

Similar presentations


Ads by Google