CLARIN: Goals and Structure of the Project Steven Krauwer CLARIN Coordinator Utrecht institute of Linguistics UiL-OTS (NL)

1 CLARIN: Goals and Structure of the Project
Steven Krauwer
CLARIN Coordinator
Utrecht institute of Linguistics UiL-OTS (NL)

2 Overview (Steven Krauwer, CLARIN - Riga, 03-11-2008)
Problem & Mission
Some why-questions
Some who-questions
Overall plan
– Technical dimension
– Language dimension
– User dimension
– Governance and legal dimension
What CLARIN is NOT about
How we work
Funding
Structure
To conclude

3 The problem
Much of the data in digital archives is language based
Only known to insiders
Archives mostly unconnected
Every archive has its own standards for storage and access
Normally only simple retrieval of files (text, audio or video documents)
Social sciences and humanities researchers are not language or speech technologists
They are often not aware of the potential benefits of using language and speech technology
Available tools are hard to use for non-specialists

4 The CLARIN Mission
What:
Create an infrastructure that makes language resources and technology (LRT) available to scholars of all disciplines, especially social sciences and humanities (SSH)
How:
Unite existing digital archives into a federation of archives with unified web access
Provide existing language and speech technology tools as web services operating on language data in archives
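
The "How" part of the mission can be pictured with a minimal sketch. Everything in it is an assumption made for illustration and not part of the CLARIN design: the `Archive` and `Federation` classes, the substring search, and the `token_count` stand-in for a language tool are all hypothetical. It only shows the idea of one access point that forwards a query to several independent archives and then runs a tool on whatever comes back.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Archive:
    """Hypothetical archive holding its own records (record id -> text)."""
    name: str
    records: Dict[str, str] = field(default_factory=dict)

    def search(self, term: str) -> List[str]:
        # A real archive would use its own access method; here it is a
        # case-insensitive substring match over the stored texts.
        needle = term.lower()
        return [rid for rid, text in self.records.items() if needle in text.lower()]


@dataclass
class Federation:
    """Hypothetical single entry point that forwards a query to every archive."""
    archives: List[Archive]

    def search(self, term: str) -> List[Tuple[str, str]]:
        hits = []
        for archive in self.archives:
            for rid in archive.search(term):
                hits.append((archive.name, rid))
        return hits


def token_count(text: str) -> int:
    """Stand-in for a language tool offered as a service: counts tokens."""
    return len(text.split())


def run_tool_on_hits(
    federation: Federation, term: str, tool: Callable[[str], int]
) -> Dict[Tuple[str, str], int]:
    """Apply a tool to every record that the federated search returns."""
    by_name = {archive.name: archive for archive in federation.archives}
    return {
        (name, rid): tool(by_name[name].records[rid])
        for name, rid in federation.search(term)
    }


if __name__ == "__main__":
    archive_a = Archive("archive_a", {"a1": "Folk songs recorded in 1930", "a2": "Parliament debates"})
    archive_b = Archive("archive_b", {"b1": "Dialect folk tales"})
    federation = Federation([archive_a, archive_b])
    print(federation.search("folk"))
    print(run_tool_on_hits(federation, "folk", token_count))
```

The scholar in this sketch talks only to `Federation` and never needs to know how each archive stores or retrieves its files, which is the point of unified access.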

5 Why a European infrastructure?
too much fragmentation
lack of coordination across countries
lack of visibility
lack of interoperability
lack of sustainability
expertise exists but not in all countries
language independent tools can be shared
language dependent tools can often be ported
most countries not able to bear the cost

6 Why now?
Exponential growth of digital data
Increasing maturity of language and speech technology:
– high speed
– large volumes
– new research questions
Growing interest at EU level in research infrastructures (RI)
RI Roadmap published in 2006 by ESFRI includes 35 accepted proposals for RIs
CLARIN is one of them
all of them will get funding for a 1-3 year preparatory phase

7 Who we are and where we come from
The CLARIN consortium now has 32 partners from 22 EU and associated countries (and more on the waiting list)
The CLARIN community has 142 members in 32 countries (Oct 2008)
CLARIN is based on 4 earlier initiatives with many participants:
– LangWeb
– EARL
– TELRI
– (and later) DAM-LR

8 Who else do we need?
Both our membership and our consortium are quite unbalanced:
– Speech & multimodality underrepresented
– Humanities other than linguistics underrepresented
– Social sciences underrepresented
– Some countries still missing
There is no money to extend the consortium but we have to fill these gaps

9 Overall plan for CLARIN
Preparatory phase: 2008-2010
Put everything in place
Construction phase: 2011-2015
Build and populate with tools and resources
Exploitation phase: 2016-…
CLARIN in full service
Budget:
Prep phase
– 4.1 M€ from EC
– ??? from countries
Estimated budget until 2020: ca 200 M€

10 4-dimensional approach in the preparatory phase
First 3 years dedicated to the design:
The technical dimension
The language dimension
The user dimension
The governance and legal dimension

11 Technical
Technical specification of the infrastructure
Construction of a prototype
Validation on rich variety of
– languages (>20)
– resources
– services
Federation of existing archives
Based on existing resources, tools
Strong focus on interoperability standards
Conversion of existing resources
Encapsulation of existing tools

12 Languages
Cover all languages spoken or studied in participating countries
Representational and descriptive standards should be adequate and validated for all languages
Same minimal coverage of basic resources and tools for all languages
BLARK (Basic Language Resource Kit) to be defined and implemented (funds from other sources needed)

13 Language activities
Survey of resources and tools, including: –encoding and annotation data –quality indicators taxonomies and ontologies agreeing on common standards Focus on integration of tools interoperability usage scenarios creating missing essential resources validating specifications and prototype

14 User
Users are SSH scholars (including linguists, translation experts)
Do WE know what they need?
Do THEY know what they need?
Actions:
– analyze past and ongoing SSH projects
– user consultation
– launch typical example projects to show potential
– expertise centers
– awareness actions

15 Legal
IPR issues
aim at open source, but IPR for existing and future non-open resources must be accommodated
federation of archives requires authentication, authorization and trust between archives
aim at limited number of template license agreements for most common cases
respect national legislation
address ethical issues
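
A rough sketch of how a small set of template licenses could combine with authentication and trust between archives. All names here are hypothetical and chosen for illustration only: the three license templates, the `User` and `Resource` records, and the `may_access` rule are assumptions, not CLARIN's actual licensing scheme or access policy.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional, Set


class License(Enum):
    """Hypothetical template licenses covering the most common cases."""
    OPEN = "open"              # anyone may access, no login needed
    ACADEMIC = "academic"      # authenticated academic users only
    RESTRICTED = "restricted"  # explicit per-user permission required


@dataclass(frozen=True)
class User:
    user_id: str
    home_institution: str  # the institution that authenticated this user
    is_academic: bool


@dataclass(frozen=True)
class Resource:
    resource_id: str
    license_type: License
    allowed_users: FrozenSet[str] = frozenset()


def may_access(user: Optional[User], resource: Resource, trusted_institutions: Set[str]) -> bool:
    """Decide access from the resource's license template and the user's identity."""
    if resource.license_type is License.OPEN:
        return True
    # Non-open resources need a user vouched for by a trusted institution.
    if user is None or user.home_institution not in trusted_institutions:
        return False
    if resource.license_type is License.ACADEMIC:
        return user.is_academic
    # RESTRICTED: the user must be listed explicitly on the resource.
    return user.user_id in resource.allowed_users
```

In this sketch an archive does not authenticate visitors itself; it only checks that the user's home institution is one it trusts, which is one way to read "trust between archives".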

16 Governance and Funding
Agree on e.g.:
Who is going to pay for the construction and exploitation of the infrastructure
How will it be managed
How will it be coordinated with national policies
Actions:
Analyse best practice in funding and management of transnational projects
Prepare agreement between (now) 22 countries about long term joint funding of CLARIN

17 What CLARIN is NOT about
building the infrastructure – we are just preparing it
creating new resources – at this stage we want to use what is there and adapt it if necessary
creating new applications – except maybe some essential tools or demonstrators
focusing on the big languages – we find all languages equally important
strengthening European industry – our target audience are SSH researchers, but we don’t want to exclude anyone

18 How we work (1)
Work packages:
WP1: Management and coordination
WP2: Designing the infrastructure and building the prototype
WP3: Humanities overview
WP5: Language resources and technology overview
WP6: Dissemination
WP7: IPR and business models
WP8: Construction and exploitation agreement

19 How we work (2)
[Diagram of the work packages, numbered 1 to 8. Labels: WP8 Org&Legal Framework; WP7 IPR, A&A, licensing; WP5 LRT Exploration; WP2 Infrastructure Prototype; WP3 Humanities Projects]

20 How we work (3)
Most tasks executed in Working Groups
WGs consist of project partners & other experts (CLARIN is open!)
Some WGs do work (e.g. build prototype), others create consensus
Participation by others essential as e.g. standards cannot be imposed by a small group
Unfortunately no EC funding available for WG participation – only reward is influence!

21 Funding & what to use it for
From EC: 4.1 M€, used for generic, language independent tasks
From countries: ??? M€, to be used for preparing CLARIN at the national level in every country:
– build and organize local national CLARIN community
– support for participation in working groups (e.g. travel)
– validation tasks for own language(s)
– creation or adaptation of essential resources
– pilots and demonstrators & humanities projects
– (co-)organisation of local or international events
– preparing for future role (expertise centers, repositories)

22 Structure
Executive Board, consisting of the 7 WP leaders plus a special representative to liaise with the humanities community (among others through the DARIAH sister project)
Boards:
– Scientific Board
– Strategic Coordination Board
– International Advisory Board
Meetings (virtual or face to face):
– Consortium meetings
– Member meetings
– Working group meetings

23 More info
CLARIN Website: http://www.clarin.eu
CLARIN Office:
CLARIN Newsletter:
CLARIN Members:
