
Mind the lexical gap - Eurovoc. Luxembourg, 18-19 November 2010. Automatic Eurovoc indexing of parliamentary documentation: live demonstration. Victoria Fernández.


1 Mind the lexical gap - Eurovoc. Luxembourg, 18-19 November 2010. Automatic Eurovoc indexing of parliamentary documentation: live demonstration. Victoria Fernández Mera, Congreso de los Diputados, victoria.fernandez@congreso.es, http://www.congreso.es

2 Joint Research Centre automatic indexing software at the Congress of Deputies. The JRC tool was retrained on more than 80,000 Spanish parliamentary texts (short abstracts manually indexed with Eurovoc versions 3 and 3.1). On 5 June 2005, the European Community and the Congress of Deputies signed a Software License Agreement granting a free-of-charge licence for the software. It has been the main indexing tool since November 2005. It is available from any computer with a web browser inside the Congress of Deputies; access requires a login and password.
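The slide does not describe how the JRC software learns from the manually indexed abstracts. As a rough illustration of what "retraining on indexed texts" can mean, here is a minimal sketch of a profile-based ranker: each descriptor gets a term-frequency profile built from the abstracts indexed with it, and a new text is ranked against every profile by cosine similarity. This is a simplified, swapped-in technique, not the JRC algorithm; the function names and training pairs are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens (a real system would also lemmatize and drop stop words)."""
    return re.findall(r"\w+", text.lower())


def train_profiles(training_docs):
    """Build one term-frequency profile per descriptor.

    training_docs: iterable of (abstract, [descriptor, ...]) pairs,
    i.e. texts that human indexers have already indexed.
    """
    profiles = defaultdict(Counter)
    for abstract, descriptors in training_docs:
        terms = Counter(tokenize(abstract))
        for descriptor in descriptors:
            profiles[descriptor].update(terms)
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_descriptors(profiles, text, top_n=30):
    """Return up to top_n (descriptor, score) pairs, highest score first."""
    doc = Counter(tokenize(text))
    scored = [(descriptor, cosine(doc, profile)) for descriptor, profile in profiles.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical training pairs standing in for manually indexed abstracts.
training = [
    ("Bill on the general State budget", ["State budget"]),
    ("Question on water pollution in rivers", ["water pollution"]),
]
profiles = train_profiles(training)
print(rank_descriptors(profiles, "Draft State budget for 2011")[0][0])  # → State budget
```

Adding newly indexed abstracts to `training` and rebuilding the profiles is the simplest form of continuous retraining in this sketch.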

3 How does the system work?
- User: a simple computer with a web browser, working through the web interface.
- Server (Linux Fedora), eic server nogal.congreso.es, with:
  - Perl 5.8.5 installed
  - Oracle Client 9.0.1
  - Apache server 2.0.55
- Argo database: the server gets the texts from it and stores the indexation in it.
From Bruno Pouliquen. Technical documentation, overview of the tool: global architecture and requirements (unpublished information brochure). 17 p.
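The architecture can be read as a simple request cycle. The sketch below models that data flow in Python under stated assumptions: `ArgoStub` is a hypothetical in-memory stand-in for the Oracle database, and `suggest` and `validate` are placeholder callables for the indexing engine and the human indexer. The real system runs in Perl under Apache, so this only illustrates the flow, not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ArgoStub:
    """Hypothetical in-memory stand-in for the Oracle-based Argo database."""
    texts: dict = field(default_factory=dict)       # text code -> abstract
    indexation: dict = field(default_factory=dict)  # text code -> validated descriptors

    def get_text(self, code):
        return self.texts[code]

    def store_indexation(self, code, descriptors):
        self.indexation[code] = list(descriptors)


def index_request(db, suggest, validate, code):
    """One round trip between the browser, the server and the database."""
    abstract = db.get_text(code)          # server "gets texts" from Argo
    suggestions = suggest(abstract)       # indexing engine proposes ranked descriptors
    accepted = validate(suggestions)      # the indexer validates them in the browser
    db.store_indexation(code, accepted)   # server "stores indexation" back in Argo
    return accepted


# Hypothetical text code and descriptors, for illustration only.
db = ArgoStub(texts={"121/000001": "Government bill on the State budget"})
index_request(
    db,
    suggest=lambda text: ["State budget", "public finance", "taxation"],
    validate=lambda ranked: ranked[:2],
    code="121/000001",
)
print(db.indexation)  # → {'121/000001': ['State budget', 'public finance']}
```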

4 Argo, the Congress of Deputies parliamentary activities information system. It runs on an Oracle database and gathers information on any and all written communications submitted to the Congress of Deputies. The information is organized in text, numerical and data fields.

5 Types and numerical codes of parliamentary texts
Legislative initiatives:
- Government bills (121)
- Private Members' bills (122, 123, 124)
- Decree-laws (130)
- International treaties (110, 111, 112)
Control of the Executive:
- Granting and withdrawal of confidence:
  - Investiture of the Government (80)
  - Censure motions (82)
  - Question of confidence (81)
- Checking on the Government's performance:
  - Interpellations and motions (161, 162, 170, 171, 172, 173)
  - Oral and written questions (180, 181, 184)
  - Attendances: members of the Government (210, 213, 214) and other authorities (212, 219)
  - Government communications, programmes, plans and other reports
- Nominations and appointments of high-ranking officials to certain State bodies
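The code table above is essentially a lookup from numeric prefix to text type. A minimal Python sketch, assuming (hypothetically) that a full text code has the form `type/number`, such as `121/000001`; the labels paraphrase the slide and only the codes it lists are included:

```python
# Type codes from the slide, grouped under the slide's own headings.
_GROUPS = {
    "Government bill": (121,),
    "Private Members' bill": (122, 123, 124),
    "Decree-law": (130,),
    "International treaty": (110, 111, 112),
    "Investiture of the Government": (80,),
    "Question of confidence": (81,),
    "Censure motion": (82,),
    "Interpellation or motion": (161, 162, 170, 171, 172, 173),
    "Oral or written question": (180, 181, 184),
    "Attendance: member of the Government": (210, 213, 214),
    "Attendance: other authority": (212, 219),
}

# Flatten to code -> label for constant-time lookup.
TEXT_TYPES = {code: label for label, codes in _GROUPS.items() for code in codes}


def text_type(code: str) -> str:
    """Return the type label for a code like '121/000001' (hypothetical format: type/number)."""
    prefix = int(code.split("/", 1)[0])
    return TEXT_TYPES.get(prefix, "unknown type")


print(text_type("184/000123"))  # → Oral or written question
```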

6 Eurovoc at the Argo database. Eurovoc has been the main indexing language since 1987 (the official edition, plus a Spanish geographical application). Short abstracts or titles are indexed. Descriptors are assigned only to the document that starts the procedure in the House, with an average of three descriptors per document.

7 Welcome page

8 Clicking on "Index a Congreso text" opens the indexing page.

9 Document indexing page (where the numerical codes of the texts to be indexed are typed in)

10 Indexing interface. The system always displays all the texts that have not yet been indexed. Clicking on the "ready to index" box opens the validation interface.
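Listing "all the texts that have not been indexed yet" amounts to a filtered query on the database. A minimal sketch, using Python's built-in `sqlite3` as a stand-in for the Oracle database; the `texts` table, its columns and the sample rows are hypothetical, since the presentation does not describe the Argo schema:

```python
import sqlite3

# Hypothetical schema: the real Argo tables are not described in the presentation.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE texts (code TEXT PRIMARY KEY, abstract TEXT, is_indexed INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO texts (code, abstract, is_indexed) VALUES (?, ?, ?)",
    [
        ("121/000001", "Government bill on the State budget", 1),
        ("184/000002", "Written question on water pollution", 0),
        ("162/000003", "Motion on rural development", 0),
    ],
)


def pending_texts(conn):
    """Every text that has not been indexed yet, ordered by code."""
    cursor = conn.execute("SELECT code, abstract FROM texts WHERE is_indexed = 0 ORDER BY code")
    return cursor.fetchall()


for code, abstract in pending_texts(conn):
    print(code, abstract)
```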

11 Validation interface. It displays a list of 30 suggested descriptors, ranked by score. The indexer ticks the corresponding box to select the correct descriptors. Clicking on the link in the Id column shows all the thesaurus relations of that descriptor.
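The validation step combines two simple operations: sort the suggestions by score and keep the top 30, then keep only the ones the indexer ticked. A minimal Python sketch; the function names and sample scores are hypothetical:

```python
def top_candidates(scores, n=30):
    """Sort (descriptor, score) pairs by descending score and keep the first n."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]


def validate(candidates, ticked):
    """Keep only the candidates whose box the indexer ticked, preserving rank order."""
    return [descriptor for descriptor, _ in candidates if descriptor in ticked]


# Hypothetical scores for one text.
scores = {"State budget": 0.82, "taxation": 0.41, "public finance": 0.67, "fishing": 0.05}
candidates = top_candidates(scores, n=3)
print(validate(candidates, {"taxation", "State budget"}))  # → ['State budget', 'taxation']
```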

12 Thesaurus relations of a descriptor
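Eurovoc, like other standard thesauri, links each descriptor to broader (BT), narrower (NT) and related (RT) terms, and lists the non-descriptors it is used for (UF). A minimal Python sketch of such an entry and of a relations lookup; the ID `1001` and its relations are hypothetical sample data, not taken from Eurovoc:

```python
from dataclasses import dataclass, field


@dataclass
class Descriptor:
    """One thesaurus entry with its semantic relations."""
    label: str
    broader: list = field(default_factory=list)   # BT: broader terms
    narrower: list = field(default_factory=list)  # NT: narrower terms
    related: list = field(default_factory=list)   # RT: related terms
    used_for: list = field(default_factory=list)  # UF: non-descriptors pointing here


def relations(thesaurus, descriptor_id):
    """Return every relation of one descriptor, keyed by the usual abbreviation."""
    entry = thesaurus[descriptor_id]
    return {"BT": entry.broader, "NT": entry.narrower, "RT": entry.related, "UF": entry.used_for}


# Hypothetical ID and relations, for illustration only.
thesaurus = {
    "1001": Descriptor("water pollution", broader=["pollution"],
                       related=["water protection"], used_for=["river pollution"]),
}
print(relations(thesaurus, "1001"))
```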

13 Looking for new descriptors: in the "Add a new descriptor" box, type a Eurovoc descriptor code, if known, or plain text.
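The "Add a new descriptor" box accepts either a code or plain text, so the lookup has to branch on the input. A minimal Python sketch, assuming that an all-digit input is a descriptor ID and anything else is a case-insensitive label search; the IDs and labels are hypothetical:

```python
def find_descriptors(thesaurus, query):
    """Treat an all-digit query as a descriptor ID; otherwise search the labels."""
    query = query.strip()
    if query.isdigit():
        return [(query, thesaurus[query])] if query in thesaurus else []
    needle = query.casefold()
    return [(id_, label) for id_, label in thesaurus.items() if needle in label.casefold()]


# Hypothetical IDs and labels, for illustration only.
thesaurus = {"1001": "water pollution", "1002": "air pollution", "1003": "State budget"}
print(find_descriptors(thesaurus, "Pollution"))  # → [('1001', 'water pollution'), ('1002', 'air pollution')]
```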

14 Looking for new descriptors: the "search for …. in Eurovoc" box lets the indexer look for new descriptors and browse the thesaurus online.

15 To display geographical descriptors, click on the "show INE" button.

16 Clicking on "some additional administrative tools here" opens a new interface that performs several functions.

17 Clicking on "Add documents" lets the user plan text indexation.

18 Planning indexation (this interface summarizes the codes to be indexed)
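The slide does not say how the planning summary is organized. Assuming, hypothetically, that codes have the form `type/number` and that the summary counts the planned texts per type, the idea could be sketched in Python as:

```python
from collections import Counter


def plan_summary(codes):
    """Count the planned texts per type prefix (the part before '/'), lowest prefix first."""
    counts = Counter(code.split("/", 1)[0] for code in codes)
    return sorted(counts.items(), key=lambda item: int(item[0]))


# Hypothetical planned codes.
planned = ["184/000010", "121/000002", "184/000011", "162/000001"]
print(plan_summary(planned))  # → [('121', 1), ('162', 1), ('184', 2)]
```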

19 Conclusions. The software is able to assign keywords from a controlled vocabulary. On average, a high proportion of the first 10 descriptors it assigns are correct. It can be retrained continuously to learn the assignment of new descriptors. It is a reliable system. It produces a list of Eurovoc descriptors that must be validated by human indexers. It can therefore be described as a good automatic assignment tool to help and support the indexers' work. victoria.fernandez@congreso.es
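"A high average of correct descriptors among the first 10" corresponds to what is usually called precision at 10, averaged over documents. The presentation gives no figures or exact definition, so this Python sketch only shows how such a measure could be computed from validation results; the sample data are hypothetical:

```python
def precision_at_k(ranked, correct, k=10):
    """Share of the first k suggested descriptors that the indexer accepted."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for descriptor in ranked[:k] if descriptor in correct)
    return hits / k


def mean_precision_at_k(results, k=10):
    """Average precision@k over (ranked suggestions, accepted descriptors) pairs."""
    scores = [precision_at_k(ranked, correct, k) for ranked, correct in results]
    return sum(scores) / len(scores)


# Hypothetical validation outcome for two documents.
ranked = [f"d{i}" for i in range(1, 31)]   # 30 suggestions, best first
results = [
    (ranked, {"d1", "d3", "d4"}),           # 3 of the top 10 accepted
    (ranked, {"d2", "d7", "d25"}),          # d25 lies outside the top 10
]
print(mean_precision_at_k(results))
```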

