Quality Control of Language Resources at ELRA Henk van den Heuvel a, Khalid Choukri b, Harald Höge c, Bente Maegaard d, Jan Odijk e, Valerie Mapelli b.

Presentation transcript:

Quality Control of Language Resources at ELRA

Henk van den Heuvel (a), Khalid Choukri (b), Harald Höge (c), Bente Maegaard (d), Jan Odijk (e), Valerie Mapelli (b)
(a) SPEX, Nijmegen, the Netherlands; (b) ELRA/ELDA, Paris, France; (c) SIEMENS AG, Munich, Germany; (d) CST, Copenhagen, Denmark; (e) ScanSoft Belgium & Utrecht University, the Netherlands

OUTLINE

The European Language Resources Association (ELRA) has installed a Validation Committee (VCOM) to promote quality control of its language resources (LR). This poster addresses:
1. Organisation of the VCOM
2. Validation
3. Standards
4. Bug Reports & Patches
5. Dissemination of Results
6. Future Work

1. Organisation of the VCOM

Tasks VCOM:
- Define / supervise tasks of operational units
- Define validation criteria
- Exploit bug reports
- Disseminate info via the web
- Report to the board of ELRA

Operational units:
- Validation centres (SPEX, CST)
- ELDA

Tasks validation centres:
- Produce validation manuals
- Promote standards and best practices
- Describe the quality of existing LR
- Improve the quality of existing LR
- Maintain the LR validation portals

Tasks ELDA:
- Communicate with users and producers of LR
- Maintain the ELRA web pages

2. Validation

Validation = quality check of a LR against its specifications.

Checks include formal and content evaluation of:
- Documentation
- Formats
- Design and completeness
- Speech files
- Lexicon
- Speakers
- Recording environments
- Orthographical transcriptions

A full validation is time-consuming and costly. Therefore, the VCOM introduced a Quick Quality Check (QQC). A QQC should take about 5 hours.

Original approach to QQC: check database content against documentation.
- Paradox 1: there are no criteria for the documentation itself
- Paradox 2: missing documentation on a topic hinders proper validation of that topic

New approach to QQC: check database content against (objective) minimal quality requirements as defined by the VCOM. QQC procedures are now available for speech databases and (phonetic) lexicons.

At present 64 (spoken) LR in ELRA’s catalogue have been validated and 12 others have had a QQC.

3. Standards

The VCOM promotes standards both for production and validation of LR. Adherence to standards during production also facilitates validation, and it contributes to LR that are of better quality and easier to use. The starting point for promoting standards is the collection of best practices and guidelines developed in successful projects.

4. Bug Reports & Patches

LR users play an important role in detecting remaining errors in a LR. Therefore, the VCOM launched a bug report service on ELRA’s web pages. Verified bugs are collected in a Formal Error List for each LR. These lists can be inspected via the web. A procedure was developed to correct bugs and release patches. At regular intervals the best bug report is selected and awarded an attractive prize (PDA, digital camera, etc.).

5. Dissemination

At ELRA’s website click “Services around LRs” > “Validation”.

[Figure: structure of ELRA’s pages on VCOM work]

Public pages of validation centres contain:
- Bug report forms
- Formal Error Lists
- QQC reports

Further, ELRA’s newsletter is used to promote the validation activities of the VCOM.

6. Plans

Spoken LRs (SPEX):
- More QQCs for new and existing LRs in ELRA’s catalogue

Written LR (CST):
- Edit validation manual (draft now exists!)
- Test validation procedure
- Install bug report service

CONCLUSION

We have presented ELRA’s VCOM and its activities. Only with the joint effort of users and providers can ELRA improve the quality of its LRs.