Published by Garey Riley. Modified over 9 years ago.
1
Why should we invest in DWF?
Peter Wittenburg
The Language Archive – Max Planck Institute for Psycholinguistics, Nijmegen, The Netherlands
CLARIN Research Infrastructure (www.clarin.eu)
EUDAT Data Infrastructure (www.eudat.eu)
2
Things that keep us busy (I)
- Understanding language roots
  - feature matrices extracted from many cross-disciplinary and cross-country resources
  - phylogenetic algorithms to compute dependency trees
  - but we can't easily access the required resources
- Understanding the language machine
  - so many institutes are creating brain image data
  - do we know about them and their recording contexts?
  - can we access them easily?
3
Things that keep us busy (II)
- Automatic language processing
  - recognizing speech and body movement (gesture, signing, facial expressions, etc.) is hard
  - no single stochastic recognizer will do
  - there is so much technology out there worldwide, with components from different disciplines
  - do we know about them?
  - can we easily access them?
4
In CLARIN we are so good...
- we developed a flexible component model that allows users to create metadata profiles
- we established an open Data Category Registry (ISOcat) based on ISO 12620 (compliant with ISO 11179)
- we have a professional tool set that lets users create, register and share components and profiles, and create metadata (MD) descriptions efficiently
5
In CLARIN we are so good...
- Virtual Language Observatory
6
In CLARIN we are so good...
- we have a distributed SOA (service-oriented architecture) domain with many language and speech tools integrated or being integrated
- we use metadata profile matching to find appropriate tools when chaining services
7
but...
- there is so much data (and software) out there that no one knows of, or that no one is able to access
- among the roughly 200 linguistic departments creating data, there are fewer than a handful of centers in the EU that have a proper repository, do archiving and curation, give access, allow computation and enrichment, are audited, etc.
- currently there is no way for machines to access most of the resources blindly; the common way is to download and squeeze each individual resource/collection
- proper metadata at high granularity is still unpopular
- there is only some harmonization at the international level
- discipline-crossing chats happen only incidentally
8
Cross-disciplinary aspect
- a large number of discipline-specific centers with access services
- all disciplines are in a similar situation
- should we all do LTA (long-term archiving), offer capacity computing, run PID (persistent identifier) services, etc.?
- a network of strong data and compute hubs: let them provide COMMON services such as LTP (long-term preservation), data staging, PID, AAI (authentication and authorization infrastructure), etc.
- network of large data hubs
- network of discipline hubs
9
but...
- do we know what the common services are, and do we accept them?
- do we understand the data organizations of the communities well enough to design services?
- do we have agreed mechanisms for working on large and complex data sets securely in a federation?
- do we agree on the same essential building blocks for a common data infrastructure?
AND many communities are organized worldwide.
Thus we need a GLOBAL forum to agree on some essentials that will make data-driven research more efficient and foster new insights.