Catia Cucchiarini, Walter Daelemans and Helmer Strik: Strengthening the Dutch Language and Speech Technology Infrastructure


1 Catia Cucchiarini, Walter Daelemans and Helmer Strik: Strengthening the Dutch Language and Speech Technology Infrastructure

2 Dutch HLT Platform: aim to contribute to the further development of an adequate language and speech technology infrastructure for Dutch

3 Dutch Language Union (NTU)
– Intergovernmental organisation based on the Language Union Treaty between the Netherlands and Belgium
– Mission: fostering integration between the Netherlands and Flanders in the field of language and literature
– Policy: Dutch and Flemish ministers of culture and education
– NTU already active in HLT:
  – Spoken Dutch Corpus: NTU is owner, responsible for exploitation
  – NL-Translex: NTU is coordinator, owner

4 Other participants
– the Ministry of the Flemish Community
– the Flemish Institute for the Promotion of Scientific-technological Research in Industry
– the Fund for Scientific Research – Flanders
– the Dutch Ministry of Education, Culture and Sciences
– the Dutch Ministry of Economic Affairs
– the Netherlands Organisation for Scientific Research
– Senter (an agency of the Dutch Ministry of Economic Affairs)

5 Objectives
– strengthening the position of Dutch in HLT
– establishing the proper conditions for successful management and maintenance of basic HLT resources developed through governmental funding
– stimulating co-operation between academia and industry in the field of HLT
– contributing to the realisation of European co-operation in HLT-relevant areas
– establishing a network that brings together demand and supply of knowledge, products and services

6 Action line A: ‘broking and linking’ function
– encouraging co-operation between industry, academia and policy institutions
– raising awareness of, and giving publicity to, the results of HLT research

7 Action line B: strengthening the digital language infrastructure
– defining the so-called BLARK (Basic LAnguage Resources Kit) for Dutch
– carrying out a survey to determine what is needed to complete this BLARK and what costs are associated with developing the missing material
– drawing up a priority list with cost estimates which can serve as a policy guideline

8 Action line C: working out standards and evaluation criteria
– drawing up a set of standards and criteria for the evaluation of the basic materials contained in the BLARK and for the assessment of project results

9 Action line D: management, maintenance and distribution plan
– defining a blueprint for management (including intellectual property rights), maintenance, and distribution of HLT resources

10 Action lines B and C
Steering committee
– to draw up plan of activities
– to develop initial survey framework
– to define BLARK
– to supervise survey
Field researchers
– to refine framework
– to conduct survey
– to write report

11 Survey instruments
– Applications: classes of applications rather than specific applications or products.
– Modules (or semi-products): the basic software components of HLT applications.
– Data: sets of language data and descriptions in machine-readable form, to be used in building, improving or evaluating natural language and speech processing systems.

12 BLARK language technology
Modules
– Robust modular text preprocessing
– Morphological analysis and morphosyntactic disambiguation / unknown words
– Robust syntactic analysis
– Aspects of semantic analysis (word meaning and reference)
Data
– Monolingual lexicon
– Annotated corpus of written Dutch
– Benchmarks for evaluation

13 BLARK speech technology
Modules
– Automatic speech recognition (module)
– Speech synthesis system (module)
– Tools for annotation of speech corpora
– Confidence measures and utterance verification
– Identification (speaker, language, dialect)
– Evaluation of speech technology tools and applications
Data
– Monolingual speech corpora for specific applications
– Multilingual speech corpora
– Multimodal/medial speech corpora
– Richly annotated speech corpora
– Pronunciation lexicons

14 Modules-applications: LT

15 Modules-applications: ST

16 Data-modules: LT

17 Data-modules: ST

18 Further survey instruments Table containing information on availability of modules and data
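The availability table mentioned on this slide could be modelled roughly as follows. This is only a minimal sketch: the item names are copied from the BLARK slides, but the class names, the `missing_items` helper and all availability flags are hypothetical placeholders, not results of the survey.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    MODULE = "module"
    DATA = "data"


@dataclass(frozen=True)
class BlarkItem:
    name: str
    kind: Kind
    field: str  # "LT" (language technology) or "ST" (speech technology)


# A few item names copied from the BLARK slides.
LEXICON = BlarkItem("Monolingual lexicon", Kind.DATA, "LT")
SYNTAX = BlarkItem("Robust syntactic analysis", Kind.MODULE, "LT")
ASR = BlarkItem("Automatic speech recognition", Kind.MODULE, "ST")
PRON_LEX = BlarkItem("Pronunciation lexicons", Kind.DATA, "ST")

# Hypothetical availability flags: placeholders, NOT survey results.
availability = {LEXICON: True, SYNTAX: False, ASR: True, PRON_LEX: False}


def missing_items(table, field=None):
    """Return the names of items marked unavailable, optionally for one field."""
    return sorted(
        item.name
        for item, available in table.items()
        if not available and (field is None or item.field == field)
    )
```

Calling `missing_items(availability, "ST")` would list the speech technology items flagged as unavailable in this placeholder table; the real table would be filled in from the survey.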

19 Survey results
– Preliminary priority list that will be submitted to the whole HLT field
– Comments from the HLT field on priority list
– Final priority list

