Quality Control of Language Resources at ELRA Henk van den Heuvel a, Khalid Choukri b, Harald Höge c, Bente Maegaard d, Jan Odijk e, Valerie Mapelli b.


Quality Control of Language Resources at ELRA
Henk van den Heuvel a, Khalid Choukri b, Harald Höge c, Bente Maegaard d, Jan Odijk e, Valerie Mapelli b
a SPEX, Nijmegen, the Netherlands; b ELRA/ELDA, Paris, France; c SIEMENS AG, Munich, Germany; d CST, Copenhagen, Denmark; e ScanSoft Belgium & Utrecht University, the Netherlands

OUTLINE
The European Language Resources Association (ELRA) has installed a Validation Committee (VCOM) to promote quality control of its language resources (LR). This poster addresses:
1. Organisation of the VCOM
2. Validation
3. Standards
4. Bug Reports & Patches
5. Dissemination of Results
6. Future Work

1. Organisation of the VCOM
Tasks VCOM:
- Define / supervise tasks of operational units
- Define validation criteria
- Exploit bug reports
- Disseminate info via the web
- Report to the board of ELRA
Operational units:
- Validation centres (SPEX, CST)
- ELDA
Tasks validation centres:
- Produce validation manuals
- Promote standards and best practices
- Describe the quality of existing LR
- Improve the quality of existing LR
- Maintain the LR validation portals
Tasks ELDA:
- Communicate with users and producers of LR
- Maintain the ELRA web pages

2. Validation
Validation = quality check of an LR against its specifications.
Checks include formal and content evaluation of:
- Documentation
- Formats
- Design and completeness
- Speech files
- Lexicon
- Speakers
- Recording environments
- Orthographical transcriptions
A full validation is time-consuming and costly. Therefore, the VCOM introduced a Quick Quality Check (QQC). A QQC should take about 5 hours.
Original approach to QQC: check database content against documentation.
- Paradox 1: no criteria for the documentation itself
- Paradox 2: missing documentation on a topic hinders proper validation
New approach to QQC: check database content against (objective) minimal quality requirements as defined by the VCOM.
QQC procedures are now available for speech databases and (phonetic) lexicons.
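The new QQC approach, checking content against a fixed list of minimal requirements, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the poster does not list the VCOM's actual criteria, so the `Resource` fields, the three requirements, and the function name `quick_quality_check` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    """Hypothetical summary of a speech database (not an ELRA format)."""
    has_documentation: bool
    speech_files: list[str]
    transcribed_files: set[str]


@dataclass
class Requirement:
    """One minimal quality requirement: a name plus a pass/fail check."""
    name: str
    check: Callable[[Resource], bool]


# Illustrative requirements only; the real VCOM criteria are not given here.
MINIMAL_REQUIREMENTS = [
    Requirement("documentation present", lambda r: r.has_documentation),
    Requirement("at least one speech file", lambda r: len(r.speech_files) > 0),
    Requirement(
        "every speech file transcribed",
        lambda r: all(f in r.transcribed_files for f in r.speech_files),
    ),
]


def quick_quality_check(resource, requirements=MINIMAL_REQUIREMENTS):
    """Return the names of the requirements that the resource fails."""
    return [req.name for req in requirements if not req.check(resource)]


db = Resource(
    has_documentation=False,
    speech_files=["a.wav", "b.wav"],
    transcribed_files={"a.wav"},
)
failures = quick_quality_check(db)
print(failures)
```

Because the requirements are defined independently of the resource's own documentation, missing documentation shows up as one failed requirement and does not prevent the remaining checks from running.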
At present, 64 (spoken) LRs in ELRA's catalogue have been validated and 12 others have had a QQC.

3. Standards
The VCOM promotes standards both for production and validation of LR. Adherence to standards during production also facilitates validation, and it contributes to LRs that are of better quality and easier to use.
The starting point for promoting standards is the collection of best practices and guidelines developed in successful projects.

4. Bug Reports & Patches
LR users play an important role in detecting remaining errors in an LR. Therefore, the VCOM launched a bug report service on ELRA's web pages.
Verified bugs are collected in a Formal Error List for each LR. These lists can be inspected via the web.
A procedure was developed to correct bugs and release patches.
At regular intervals the best bug report is selected and awarded an attractive prize (PDA, digital camera, etc.).

5. Dissemination
At ELRA's website click "Services around LRs" > "Validation".
[Figure: structure of ELRA's pages on VCOM work]
Public pages of validation centres contain:
- Bug report forms
- Formal Error Lists
- QQC reports
Further, ELRA's newsletter is used to promote the validation activities of the VCOM.

6. Plans
Spoken LRs (SPEX):
- More QQCs for new and existing LRs in ELRA's catalogue
Written LR (CST):
- Edit validation manual (draft now exists!)
- Test validation procedure
- Install bug report service

CONCLUSION
We have presented ELRA's VCOM and its activities. Only with the joint effort of users and providers can ELRA improve the quality of its LRs.

