


Resource and Service Centers as the Backbone for a Sustainable Infrastructure
Peter Wittenburg, CLARIN Research Infrastructure
Co-Authors: Nuria Bel, Lars Borin, Gerhard Budin, Nicoletta Calzolari, Eva Hajicova, Kimmo Koskenniemi, Lothar Lemnitzer, Bente Maegaard, Maciej Piasecki, Jean-Marie Pierrel, Stelios Piperidis, Inguna Skadina, Dan Tufis, Remco van Veenendaal, Tamas Varadi, Martin Wynne

Which Scenario are we aiming at?
Let's first say which researchers we have in mind:
- primarily the typical researcher in the humanities and social sciences, but probably not limited to them
- small research departments
- little or no technically minded support staff
- little knowledge about standards (why should they?)
- lacking knowledge about computer-based methods etc.
Increasingly often they are excluded from data-driven research.
"Even" at an institute such as MPI, many research questions cannot be dealt with due to the effort needed to find and operate on resources.
Only little fits together, as we all know.

Which Scenario are we aiming at?
- everyone is relying on Google to search for all sorts of web information, i.e. the web-based paradigm is widely accepted
- ~100% available, robust, simple, critical mass of information, etc.
- when it comes to research work, people still apply the "download first" paradigm and "manage their own creative data backyard"
- only my theory is relevant and papers count
- my creative data backyard is private
Wall of Silence

Which Scenario are we aiming at?
Download first vs. cyberinfrastructure
- the download-first paradigm does not seem to be efficient but has some advantages
- it will remain, but we need another dimension
- a network of centers offering data and services: make data explicit, set up services
- this may facilitate working with language resources and tools
- many communities are working towards the same goals (life sciences, bioinformatics, geosciences, etc.)
- funders are changing their rules (NL, recently NSF)

What is required?
Trust of the researchers, which has many facets:
- availability and ease of use of services
- security of services and workspaces
- persistence of services
- scalability of services (not just for a few users)
- added functionality such as virtual collection and workflow building
AND, as James Pustejovsky put it recently: we are talking about international collaboration, which we will only manage when we agree on standards.
Are we mature enough?
- recently a joint roadmap document for working towards standards (Nuria Bel, Jonas Beskow, Lou Boves, Gerhard Budin, Nicoletta Calzolari, Khalid Choukri, Erhard Hinrichs, Steven Krauwer, Lothar Lemnitzer, Stelios Piperidis, Adam Przepiorkowski, Laurent Romary, Florian Schiel, Helmut Schmidt, Hans Uszkoreit, Peter Wittenburg)
- in the meantime adopted by CLARIN

How can we ensure all this?
There are many ingredients, of course. One is establishing a network of service centers fulfilling requirements:
- be ready for deposits and take full responsibility for all deposited resources
- a proper repository system guaranteeing availability, persistence and authenticity of stored objects
- in the case of services, requirements are not as obvious
- adhere to CLARIN standards and provide high-quality metadata
- regular quality assessment according to TRAC or DSA
- support dynamic and flexible research workflows
- participation in the national identity federation and in the CLARIN service provider federation to establish a TRUST domain
- explicitness about IPR, licenses, ethical issues etc.
- probably linguistic/technical staff are required to manage all this and to support users

What is the state?
CLARIN:
- > 180 members
- ~ 25 centre candidates
- setup at different speeds

State of federations?
- initial SPF: Finland, Germany, Netherlands
- all documents with IdPs were signed
- more than 1 million potential users for single identity and single sign-on
- now quick extension in the EU

Can they do everything?
- what about long-term preservation?
- what about workspaces and execution spaces (compute time)?
- collaboration with big EU computer/storage centers on a data service infrastructure
[Diagram labels]
- User Communities: Data Generation, Virtual Research Environments
- Community Centers: Data Curation, Community Access Services
- Data Centers: Data Preservation, Generic Data Services
- RI domain: CLARIN (our domain), LifeWatch (biodiversity), ELIXIR (biogenetics), METAFOR (climate), open slot "general user"
- data centers domain: SARA, CSC, RZG, FZJ, CENECA, BSCC, etc.
Already an open deposit offer is in place, together with two centers, with a 50 years guarantee.

Do we have concrete examples?
[Diagram labels: User 1, User x, department server, archive, other archives, domain of data centers, service deployment, data replication]

Can users rely on information?
- CGN (12.000)
- OLAC (40.000)
- End.Lang. (35.000)
- MPI (33.000)
- BAS (7.400)
- AILLA (1.800)
- LRT Inventory (800/137)
- DFKI Tool Registry (292)
- ELDA (60)
- others
IMDI Domain
GIS overlay, Facetted Browser, Catalogue
hard problem:
- mapping
- granularity
- curation
Indexes
OAI PMH harvesting and transformation
Virtual Language Observatory with objects, but...

Summarizing
- we need stable and powerful service centers to convince researchers to deposit their data (and thus make it explicit) and to rely on web-based services
- we know that this will take a while and also requires some pressure (see NSF, NWO, ...)
- there are some major ingredients for continuing on this road:
  - establish trust along various dimensions (availability, security, persistence, scalability, ...)
  - move stepwise towards standards (as discussed the other 2 days) (hide complexity by tools!)
  - carry out regular quality assessment and performance monitoring
  - support dynamic research workflows
  - participate in European trust federations
THIS IS ALREADY HAPPENING - BUT NOT YET SYSTEMATICALLY

Can we achieve something?
Falls nicht to end in Babylonish scenario nous avons still algo time om sistemas te improve.
Thanks for your attention.
Roberto's key question: how many infrastructures? But...