CLARIN: Common Language Resources and Technology Infrastructure for the Social Sciences and Humanities Steven Krauwer Utrecht institute of Linguistics.

1 CLARIN: Common Language Resources and Technology Infrastructure for the Social Sciences and Humanities
Steven Krauwer
Utrecht institute of Linguistics UiL-OTS (NL)
INFuture, Zagreb, Nov 7 2007

2 Overview
Problem & Mission
Some why-questions
Approach
How we work and who we are
Why this talk
Summing up

3 The problem
Much data in digital archives is language based
Many archives are known only to local insiders and are mostly unconnected
Every archive has its own standards for storage and access, normally offering only simple retrieval of files (text, audio or video documents)
Social sciences and humanities researchers are often not aware of the potential benefits of using language and speech technology tools, and these tools are hard to use for non-specialists

4 The CLARIN Mission
What:
Create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially social sciences and humanities (SSH)
How:
Unite existing digital archives into a federation of connected archives with unified web access
Provide language and speech technology tools as web services operating on language data in archives
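The two "How" points can be illustrated with a minimal sketch. It is not CLARIN's design: the classes `Archive` and `Federation`, the function `count_words` and the sample documents are all hypothetical, chosen only to show one query being passed to several archives and one tool being applied to data where it is stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Archive:
    """Hypothetical stand-in for one existing digital archive."""
    name: str
    documents: Dict[str, str] = field(default_factory=dict)

    def find(self, term: str) -> List[str]:
        # Local retrieval: ids of documents whose text contains the term.
        needle = term.lower()
        return [doc_id for doc_id, text in self.documents.items()
                if needle in text.lower()]


@dataclass
class Federation:
    """Hypothetical single access point over several archives."""
    archives: List[Archive] = field(default_factory=list)

    def find(self, term: str) -> List[Tuple[str, str]]:
        # Unified access: one query is passed on to every member archive.
        return [(archive.name, doc_id)
                for archive in self.archives
                for doc_id in archive.find(term)]

    def run_tool(self, tool: Callable[[str], object],
                 archive_name: str, doc_id: str) -> object:
        # A tool offered as a service: a function applied to an archived document.
        for archive in self.archives:
            if archive.name == archive_name:
                return tool(archive.documents[doc_id])
        raise KeyError(archive_name)


def count_words(text: str) -> int:
    # Toy language tool; a real service might tag, parse or transcribe.
    return len(text.split())


# Made-up sample data, for illustration only.
federation = Federation([
    Archive("archive_a", {"a1": "Spoken Dutch corpus sample"}),
    Archive("archive_b", {"b1": "Croatian newspaper text",
                          "b2": "Dutch dialect notes"}),
])
hits = federation.find("dutch")
```

Here `hits` lists matching documents from both archives, and `federation.run_tool(count_words, "archive_a", "a1")` applies the toy tool to one of them without the user needing to know how either archive stores its files.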

5 Why a European infrastructure?
too much fragmentation
lack of coordination
lack of visibility
lack of interoperability
lack of sustainability
expertise exists but not in all countries
language independent tools can be shared
language dependent tools can often be ported
most countries not able to bear the cost

6 Why now?
Exponential growth of digital data
Maturity of language and speech technology:
– allows for high speed processing
– allows for large volumes
– allows for new research questions
Growing interest at EU level in research infrastructures (RI) for the ERA
ESFRI RI Roadmap published in 2006 includes 34 proposals for RIs
all of them will get EC funding for a 1-3 year preparatory phase

7 Overall plan for CLARIN
Preparatory phase 2008 – 2010:
Put everything in place to get started for real
Build prototype
Budget in preparatory phase
– 4.1 M€ from EC
– ??? M€ from participating countries
Construction phase 2011 – 2015:
Build and populate with tools and resources
Exploitation phase 2016 - ….
CLARIN in full service
Overall budget 2008 - 2020: ca 200 M€

8 4-dimensional approach for the prep phase
The technical dimension
The language dimension
The user dimension
The governance and legal dimension

9 Technical
Technical specification of the infrastructure
Construction of a prototype
Validation on rich variety of
– languages (>20)
– resources
– services
– based on existing resources and tools (i.e. not a digitization or tools creation project)
Strong focus on interoperability standards
Conversion of existing resources
Encapsulation of existing tools

10 Strong sustainable centers

11 Languages
Intention to cover all languages spoken or studied in participating countries
Representational and descriptive standards should be adequate and validated for all languages
Same minimal coverage of basic resources and tools for all languages is to be defined (and implemented if additional funds are available)

12 Language activities
Survey of resources and tools, including:
– encoding and annotation data
– quality indicators
agreeing on taxonomies and ontologies
agreeing on common standards
Focus on integration of tools, interoperability, usage scenarios
if possible creation of missing essential resources
validating specifications and prototype

13 User
Users are SSH scholars
Do WE know what they need?
Do THEY know what they need?
Actions:
analyze past and ongoing SSH projects
user consultation
launch typical example projects to show potential
create expertise centers
awareness actions

14 Governance, funding and legal issues
Agree on e.g.:
Who is going to pay for the construction and exploitation of the infrastructure
How will the costs be shared
How will it be managed
How will it be coordinated with national policies
Actions:
Analyse best practice in funding and management of transnational projects
Prepare agreement between (now) 22 countries about long term joint funding of CLARIN
Set up IPR framework

15 How we work
Most tasks executed in Working Groups
WGs consist of project partners & other experts (CLARIN is open for contributions by others!)
Some WGs do work (e.g. build prototype), others create consensus
Participation by others essential as e.g. standards cannot be imposed by a small group
Unfortunately no funding available for WG participation by others – only influence!

16 Who we are
The CLARIN consortium has 32 partners from 22 EU and associated countries, including Croatia (FFZG)
The CLARIN community has 92 members in 32 countries (Nov 07)
Leading partners are:
Utrecht University (Steven Krauwer coordinator)
Max Planck Institute Nijmegen (Peter Wittenburg)
Hungarian Academy of Sciences (Tamas Varadi)

17 National vs EC funding
EC funds managed by consortium, will pay for
– generic tasks (e.g. research, prototyping, coordination, dissemination)
– participation by a single national coordination point in every country (in HR: FFZG Zagreb)
National funds to be managed nationally, will pay for
– participation by other sites in the country
– taking care of own language and priorities (standards & validation, adaptation of tools & resources)
– carrying out example humanities projects
– (hopefully) participating in Working Groups

18 Why this talk?
Invitation to join CLARIN:
– We need user involvement
– We need archives willing to join the federation
– We need experts for our centers of expertise
– We need example humanities projects for the preparatory phase

19 Summing up (1)
CLARIN is about to embark on its 3 year Preparatory Phase project aimed at designing and building an LRT infrastructure for the SSH
It can only work with support from the whole SSH community, both inside and outside the EU
Please join us if you feel you can and want to contribute. We don’t pay you but don’t charge you either – it’s free!
Contact:, steven.krauwer@let.uu.nl or your national contact point

20 Summing up (2)
One day any SSH scholar should be able to ask without any difficulty:
“List all uses of enthusiasm in 19th century English novels written by women”
“Find all video clips of Tony Blair on BBC in 2007”
“Summarize Le Monde of October 7th 2007 – in Croatian”
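The first of these questions combines a metadata filter (period, language, author) with a search in the text itself. A rough sketch follows, assuming the novels are already available as records with such metadata; the `Novel` record, the function `uses_of` and the sample texts are invented for illustration and do not come from any real archive.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Novel:
    """Hypothetical catalogue record with the metadata the query needs."""
    title: str
    year: int
    language: str
    author_gender: str
    text: str


def uses_of(word: str, novels: List[Novel], first_year: int, last_year: int,
            language: str, author_gender: str) -> List[Tuple[str, str]]:
    """Return (title, sentence) pairs where the word occurs in matching novels."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    results = []
    for novel in novels:
        # Step 1: filter on metadata.
        if not (first_year <= novel.year <= last_year):
            continue
        if novel.language != language or novel.author_gender != author_gender:
            continue
        # Step 2: naive sentence split, then search inside each sentence.
        for sentence in re.split(r"(?<=[.!?])\s+", novel.text):
            if pattern.search(sentence):
                results.append((novel.title, sentence))
    return results


# Made-up sample records, for illustration only.
novels = [
    Novel("Sample Novel A", 1847, "en", "female",
          "She spoke with enthusiasm. He said nothing."),
    Novel("Sample Novel B", 1860, "en", "male",
          "His enthusiasm was plain."),
    Novel("Sample Novel C", 1925, "en", "female",
          "Enthusiasm had faded."),
]
matches = uses_of("enthusiasm", novels, 1801, 1900, "en", "female")
```

Only the first sample record passes both the metadata filter and the text search. In practice the hard part is what the sketch takes for granted: that archives expose comparable metadata and text in an interoperable form.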
