N. Calzolari, 2nd KYOTO Workshop, Gifu, Japan, January 2011. Nicoletta Calzolari, Istituto di Linguistica Computazionale – CNR – Pisa

Presentation transcript:

The Future of KYOTO … with some historical notes to show a path along an evolving vision
Language Resources in today's EU context: META-SHARE, ...
Nicoletta Calzolari, Istituto di Linguistica Computazionale – CNR – Pisa

Why do we still lack them?? (Old slide with Antonio Zampolli, ’80s/early ’90s)
Why are such needed LRs lacking after 30 years of R&D in the field?
• 1) Because the main trend until the mid-’80s was to privilege the processing of so-called “critical” phenomena, studied by the dominating linguistic theories, rather than focusing on the deep analysis of the real uses of a language. As a result, CL was focusing on:
  • few examples, often artificially built
  • lexicons made of few entries (toy lexicons)
  • grammars with poor coverage
• 2) Because large-scale LRs are costly & their production requires a big organizing effort

Historical notes: … back from the early ’80s
Automatic acquisition of lexical information from MRDs (pioneering research)
• Was my first research & became central in the Pisa group (ACQUILEX)
• And also Amsler, Briscoe, Boguraev, Wilks’ group, IBM, then Japanese groups, …
• The trend was: “large-scale computational methods for the transformation of machine readable dictionaries into machine tractable dictionaries”, instead of relying on linguists’ introspection
It became evident that:
• Part of the results of meaning extraction, e.g. many meaning distinctions, which could be generalised over lexicographic definitions and automatically captured, were unmanageable at the formal representation level, and had to be blurred into unique features and values
• Unfortunately, it is still difficult today to constrain word-meanings within a rigorously defined organization: by their very nature they tend to evade any strict boundaries

Historical notes: … back from the late ’80s
After acquisition from MRDs, automatic acquisition of info from texts:
• This trend has become today a consolidated & pervasive fact
• From acquisition of “linguistic information” to acquisition of “general knowledge”, with more data-intensive, robust, reliable methods
Lessons learned:
• Need of adequate models to handle actual usage of language
• (In-)adequacy of (current) lexicons
• Going from core sets to large coverage has implications not just in quantitative terms but, more interestingly, in terms of changes to the models and the strategies of processes

After the “Grosseto Workshop” (1985): a turning point
MultiLex, GeneLex, AcquiLex, Xxx-Lex
A. Zampolli: Let’s be coherent: Xxx-Lex

ISO LMF: Lexical Markup Framework
• Structural skeleton, with the basic hierarchy of information in a lexical entry, + various extensions
• Modular framework
• LMF specs comply with UML modelling principles
• An XML DTD allows implementation
Builds on EAGLES/ISLE. The field is mature.
New initiatives: NEDO Asian Languages, NICT Language-Grid Service Ontology, ICT KYOTO, LIRICS, LexInfo, …

KYOTO: the “resource” perspective
• A search environment using semantic technologies: a “compass” for the Web 2.0
• Interdisciplinarity: scientific community (LRT, web technologies, knowledge engineers), companies, domain experts
• Multilingualism: 7 languages (2 Asian languages)
• Needs to share lexical/knowledge bases & tools, both general & domain-related, in the form of lexical/ontological & software repositories
• The KYOTO Core System is open & free

KYOTO Annotation Format (KAF) (from Piek Vossen)
Multi-level annotation format: stand-off annotation, uniform representation for 7 languages
• Shared through the languages:
  • Text: tokenisation, sentences, paragraphs, with reference to the sources
  • Terms: words & multi-words, parts-of-speech, etc.
  • Chunks: constituents & syntagmatic realization
  • Dependencies: grammatical functions
• L1 – Semantic modules: Multiword tagging, Sense Tagging, Named Entity Recognition, OntoTagging
• L2 – Semantic module: event/fact extraction
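
The layering idea behind a stand-off, multi-level format can be illustrated with a minimal Python sketch. This is not the KAF schema: the class names, field names and identifier patterns (`w1`, `t1`, `c1`) are assumptions made up for illustration. The only point shown is that each layer refers to the layer below by identifier instead of copying the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WordForm:
    wid: str    # identifier that upper layers point to (hypothetical pattern)
    text: str   # surface token
    sent: int   # sentence number


@dataclass
class Term:
    tid: str
    lemma: str
    pos: str
    span: List[str]  # ids of the word forms this term covers


@dataclass
class Chunk:
    cid: str
    phrase: str
    span: List[str]  # ids of the terms this chunk covers


def tokenise(sentence: str, sent: int = 1) -> List[WordForm]:
    """Text layer: naive whitespace tokenisation, one id per token."""
    return [WordForm(f"w{i}", tok, sent)
            for i, tok in enumerate(sentence.split(), start=1)]


def chunk_text(chunk: Chunk, terms: Dict[str, Term],
               words: Dict[str, WordForm]) -> str:
    """Resolve a chunk to surface text by following id references downwards."""
    word_ids = [wid for tid in chunk.span for wid in terms[tid].span]
    return " ".join(words[wid].text for wid in word_ids)


# Toy document: the term and chunk layers never duplicate the text itself.
words = {w.wid: w for w in tokenise("Water pollution harms fish")}
terms = {
    "t1": Term("t1", "water pollution", "N", ["w1", "w2"]),  # multi-word term
    "t2": Term("t2", "harm", "V", ["w3"]),
    "t3": Term("t3", "fish", "N", ["w4"]),
}
chunks = [Chunk("c1", "NP", ["t1"]), Chunk("c2", "VP", ["t2", "t3"])]
```

Because upper layers only hold references, a semantic module could add its own layer over the term ids without touching the tokenised text, which is the property that makes a uniform representation across languages practical.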

KYOTO System & Adoption of Standards (from Piek Vossen)
[Architecture diagram. Recoverable labels: Source Documents; annotation formats (linear and generic): MAF/SYNAF, SEMAF, TMF, FACTAF; Term extraction (Tybot); Semantic annotation; Fact extraction (Kybot); Domain editing (Wikyoto); Wordnet, Domain Wordnet (LMF API); Ontology, Domain ontology (OWL API); Concept User; Fact User]
Could be at the basis of a new standard?

A common representation format for WordNets
Wn IT, Wn EN, Wn EU, Wn NL, Wn JP, Wn CH, Wn ES
• Endow WordNet with a representation format allowing easy access, integration & interoperability among resources

[UML class diagram, from Monica Monachini. Classes: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, Synset, Definition, Statement, SynsetRelations, SynsetRelation, MonolingualExternalRefs, MonolingualExternalRef, SenseAxes, SenseAxis, InterlingualExternalRefs, InterlingualExternalRef, Meta (0..1), Data Categories]
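
As a rough illustration of how the class names in the diagram might nest in an XML serialisation, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The element names come from the diagram; the nesting shown, every attribute name (`writtenForm`, `synset`, `gloss`, `target`, ...) and all values are assumptions for illustration, not the actual WordNet-LMF DTD.

```python
import xml.etree.ElementTree as ET


def build_resource() -> ET.Element:
    """Build a toy resource; attribute names and nesting are hypothetical."""
    root = ET.Element("LexicalResource")
    ET.SubElement(root, "GlobalInformation", label="toy example")

    lexicon = ET.SubElement(root, "Lexicon", language="en")

    # A lexical entry: one lemma, one sense pointing at a synset by id.
    entry = ET.SubElement(lexicon, "LexicalEntry", id="le_river")
    ET.SubElement(entry, "Lemma", writtenForm="river", partOfSpeech="n")
    ET.SubElement(entry, "Sense", id="river_1", synset="syn_river")

    # The synset: definition with a statement, plus relations to other synsets.
    synset = ET.SubElement(lexicon, "Synset", id="syn_river")
    definition = ET.SubElement(synset, "Definition",
                               gloss="a large natural stream of water")
    ET.SubElement(definition, "Statement", example="the river was navigable")
    relations = ET.SubElement(synset, "SynsetRelations")
    ET.SubElement(relations, "SynsetRelation",
                  target="syn_stream", relType="has_hyperonym")
    return root


resource = build_resource()
xml_text = ET.tostring(resource, encoding="unicode")
```

Keeping senses and synsets as separate elements linked by identifier is what lets several lexical entries share one synset, and lets cross-lingual links attach to synsets rather than to individual words.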

A list of 85 semantic relations as a result of a mapping of the KYOTO WordNet grid (Inter-WN, Intra-WN)

WordNet-LMF: Multilingual level – Cross-lingual Relations (from Monica Monachini)
<!ATTLIST SenseAxis
    id      ID    #REQUIRED
    relType CDATA #REQUIRED>
<!ATTLIST Target
    ID CDATA #REQUIRED>
<!ATTLIST InterlingualExternalRef
    externalSystem    CDATA #REQUIRED
    externalReference CDATA #REQUIRED
    relType (at|plus|equal) #IMPLIED>
Notes on the slide:
• groups monolingual synsets corresponding to each other and sharing the same relations to English
• link to ontology/(ies)
• specifies the type of correspondence
[Diagram labels: SWN n, IWN n, WN n]
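
To make the attribute declarations concrete, the following Python sketch reads a small instance document and groups the synsets on each sense axis. The attribute names match the ATTLIST declarations on the slide; the element nesting, the synset identifiers, the `eq_synonym` relation type and the `ExampleOntology` system name are assumptions, since the slide shows no ELEMENT declarations or sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance; only the attribute names are taken from the slide.
SAMPLE = """
<SenseAxes>
  <SenseAxis id="sa_1" relType="eq_synonym">
    <Target ID="ita-fiume-n-1"/>
    <Target ID="spa-rio-n-1"/>
    <InterlingualExternalRefs>
      <InterlingualExternalRef externalSystem="ExampleOntology"
                               externalReference="River" relType="at"/>
    </InterlingualExternalRefs>
  </SenseAxis>
</SenseAxes>
"""

# Enumerated values declared for InterlingualExternalRef/@relType.
ALLOWED_REF_TYPES = {"at", "plus", "equal"}


def read_axes(xml_text: str) -> dict:
    """Return {axis id: relation type, target synset ids, ontology links}."""
    axes = {}
    for axis in ET.fromstring(xml_text.strip()).iter("SenseAxis"):
        refs = []
        for ref in axis.iter("InterlingualExternalRef"):
            rel = ref.get("relType")  # #IMPLIED, so it may be absent
            if rel is not None and rel not in ALLOWED_REF_TYPES:
                raise ValueError(f"unexpected relType: {rel}")
            refs.append((ref.get("externalSystem"),
                         ref.get("externalReference"), rel))
        axes[axis.get("id")] = {
            "relType": axis.get("relType"),
            "targets": [t.get("ID") for t in axis.iter("Target")],
            "refs": refs,
        }
    return axes


axes = read_axes(SAMPLE)
```

A non-validating parser such as ElementTree does not enforce the DTD, which is why the enumerated `relType` values are checked by hand here.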

ISO SC 4/WG 4 – Lexicon-Ontology relations. New work item: PWI
Complex picture! Is there anything we need to do for Interoperability?
Work within ISO:
• LMF: abstract meta-model for lexical representation
• Ontology Group or more Groups?
• Language Resource Ontologies: ontology of data categories
Real life:
• Lexicons (e.g. WordNets) that are called Ontologies
• Lexicons linked to Ontologies: to be used in applications, in multilingual systems, domains, …
• Work on “ontologising” Lexicons: to allow exploiting various relations, to make inferences, …
• Semantic Lexicons, with many types of relations among semantic units: these are often of “conceptual/world-knowledge” nature. Do we want DCs for these?
KYOTO can contribute

To explore the need of doing something within ISO about the relations between Lexicon and Ontology
Do we/ISO need to address another (lexical) layer?
• How lexicons and ontologies are linked and information mapped from one to the other
• The ontological layer in a/connected to a lexicon
Possible issues/questions:
• Is LMF enough to represent Ontological links?
• How to connect work being done in the ISO Lexical group and the ISO Ontology groups?
• Lexicon and Ontologies: separation? or lexicalised ontologies? or ontologies lexicons?
• Lexicon, Ontologies and Domains
• On a very different dimension: Ontology of lexical/semantic/conceptual categories? Standardised semantic categories, ontology labels?
• Relation to multilinguality
• ...
KYOTO can contribute

Input to Multilingual Web
• The MultilingualWeb project is exploring standards and best practices that support the creation, localization and use of multilingual web-based information
• It aims to raise the visibility of existing best practices and standards and identify gaps
• The core vehicle for this is a series of four workshops, for networking across communities that span the various aspects involved
• Next workshop on best practices aimed at development of Content for the Web, including creation of content ranging from personal authoring for blogs and social networking sites to development of large corporate or organizational enterprises: “Content on the Multilingual Web”, 4-5 April 2011, Pisa, Italy
KYOTO can contribute

A new paradigm of R&D in LRs & LT
For a few years now: open & distributed linguistic infrastructures for LRs & LT
• Adopting the paradigm of accumulation of knowledge, so successful in more mature disciplines, based on sharing LRs, tools & results
• Ability to build on each other's achievements, allowing controlled & effective cooperation of many groups on common tasks (see Human Genome Project), e.g. initiatives to achieve international consensus on annotation guidelines
• Emerging concept of collective intelligence
• Emphasize interoperability among LRs & LT

Some steps for a “new generation” of LRs
• From huge efforts building static, large-scale, general-purpose LRs
  To dynamic LRs rapidly built on demand, tailored to specific user needs
• From closed, locally developed and centralized resources
  To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them
• From Language Resources
  To Language Services
Need of an infrastructure that makes this vision operational

Lexical WEB
As a critical step for semantic mark-up in the Semantic Web
[Diagram: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y, BioLexicon, Global WordNet GRID, SIMPLE-WEB, with intelligent agents]
Standards for Content Interoperability: enough??

(Distributed) Language Services
• content interoperability standards
• supra-national cooperation
• architectures enabling accessibility
Collaborative & collective/social development & validation, cross-resource integration & exchange of information:
• Create new resources on the basis of existing ones
• Exchange & integrate information across repositories
• Compose new services on demand
Can KYOTO contribute?

Today: Which Communities? Many dimensions
• Communities: Language Resources, Language Technologies, Standardisation, Content/Ontologies, System developers, Integrators, SSH, EC, National funding agencies, Industry
• Many applications/domains: MT, CLIR, …, e-government, content industry, intelligence, e-culture, e-health, domotics, …
• Many LRs & LTs exist, but a global vision, policy & strategy is needed
• Initiatives: CLARIN for SSH, FLaReNet Network, META-NET NoE
• Core EU Forum with focus on cooperation
• Need to consider together technical, organisational, strategic, economic, social, cultural, legal and political issues wrt LRs & LTs

FLaReNet at a glance: Fostering Language Resources Network
An international Forum to facilitate interaction, to:
• Overcome the fragmentation in LR & LT & recreate a community
• Anticipate the needs of new types of LR & LT & Language Infrastructures
• Create a shared policy for the next years
• Foster a European strategy for consolidating the sector
Community mobilisation: essential (also to prepare the ground for a RI)
Institutional Members from 33 countries; 351 Individual Subscribers
A “roadmap”: a plan of actions as input to policy development
A (EU) model for the LRs/LTs area of the next years
Ambitious!

Results from Vienna & Barcelona Forum: Shaping the Future of the Multilingual Digital Europe
• Standards, Interoperability & Metadata are topics to be approached in cooperation
• Access to LRs is critical & should involve all the community
• Need to create the means to plug together different LR & LT, in a web-based resource and technology “grid”
• Create a shared repository of data formats, annotations, etc. as a major help to achieve standardisation
• Common repositories for tools & language data should be established that are universally and easily accessible by everyone
• Coordinate input to ISO/W3C standardisation work
For a new world-wide language infrastructure

2nd Blueprint
Result of a permanent and cyclical consultation:
• Inside the community it represents
• Outside it, through connections with neighbouring projects, associations, initiatives, funding agencies
Organised along three main “directions”:
• Infrastructural Aspects
• Research and Development
• Political and Strategic Issues
These reflect three major development factors that can boost or hinder the growth of the field of LRT
Provide feedback!

Sources: many meetings
[Diagram labels: Operational Interoperability; Asian Collaboration Workshop; FL-SILT Workshop; Lexicon/Ontology Standards; NEERI; 2nd FLaReNet Forum; Less-resourced Languages; Automatic Acquisition; Legal Issues; Standards; International Cooperation]

3rd FLaReNet Forum
The European Language Resources and Technologies Forum: important role in defining recommendations
• In Barcelona: 120 participants from 22 countries
• Define final recommendations
• Blueprint will be discussed, also for adoption & endorsement by FLaReNet Institutional Members
• Previous Proceedings & Reports on the web

Infrastructural Aspects (Issue / Challenge / Recommended Actions)
Issue: Metadata
• Challenge: Interoperability of metadata sets. Action: Set up a global infrastructure of common and uniform and/or interoperable metadata sets
• Challenge: Metadata usable both by humans and by machines. Action: Create machine-understandable metadata with formal syntax and clear semantics
• Challenge: Automate the process of metadata creation. Action: Develop structured metadata
Issue: Documentation
• Challenge: Reliable documentation of LRs according to common best practices. Actions: Collect all possible and existing LR documentation; devise and adopt a widely agreed standard documentation template for all types of resources
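
What "metadata with formal syntax", usable by both humans and machines, could look like in practice is sketched below in Python. The record fields, the list of required fields and the JSON serialisation are all assumptions chosen for illustration; they are not a FLaReNet, META-SHARE or ISO metadata schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Hypothetical minimal set of fields a record must fill in.
REQUIRED = ("name", "resource_type", "languages", "licence")


@dataclass
class ResourceRecord:
    name: str
    resource_type: str            # e.g. "corpus", "lexicon", "ontology"
    languages: List[str] = field(default_factory=list)
    licence: str = ""
    documentation_url: str = ""   # optional pointer to human-readable docs


def missing_fields(record: dict) -> List[str]:
    """Machine check: list the required fields that are absent or empty."""
    return [name for name in REQUIRED if not record.get(name)]


def to_json(record: ResourceRecord) -> str:
    """Serialise with stable key order so records can be diffed and harvested."""
    return json.dumps(asdict(record), sort_keys=True, ensure_ascii=False)


record = ResourceRecord("Toy WordNet", "lexicon", ["it"], "CC-BY")
serialised = to_json(record)
```

A structured record like this can be rendered as a documentation page for people and validated or harvested automatically, so one description serves both audiences.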

Political and Strategic dimensions (Issue / Challenge / Recommended Actions)
Issue: Funding Agencies policies
• Devise models to allow different types of players easy access to resources
• Ensure that publicly funded resources are publicly available either free of charge or at a small distribution cost
• Encourage/enforce use of best practices or standards in LR production projects
• Make sustainability and sharing/distribution plans mandatory in projects concerning LR production
Issue: LR citation
• Challenge: Appropriate citation of Language Resources, like traditional publications. Action: Develop a standard protocol for citing language resources
KYOTO can be an example

LRE Map: Why??
Problem: lack of information & documentation about resources is, in the e-science paradigm, a very critical issue. Non-documented resources don’t exist!!
The Map as an answer to start to fill this gap, but also:
• To encourage the needed “change in culture”
• A collective enterprise: each researcher must become aware of the importance of his/her personal engagement in documenting resources
• A task as important as creating new resources, and not an accessory to be disregarded
• As the necessary service to the whole community
• Will become an essential instrument to monitor the field

How many LRs & Types at LREC?
Submissions: 1288 LR forms:
• Corpora: 785
• Lexicons: 289
• Tagger/Parser: 181
• Annotation tool: 134
• Ontology: 73
• Evaluation data: 40
• Annotation Guidelines:
How many LRs & Types at COLING?
Submissions: 880 LR forms: 735
• Corpora: %
• Tagger/Parser: %
• Lexicons: %
• Evaluation data: %
• Ontology, Annotation tool, Evaluation tool, Tokenizer, NER < %

Languages: But obviously … !!
[Word cloud of languages; image courtesy of Wordle]

Availability
Freely available! The vast majority of resources are freely available
[Chart comparing LREC and COLING; visible values: 3%, 15%, 25%]

The Project META-NET
• META-NET is a Network of Excellence (coord. Hans Uszkoreit) dedicated to fostering the technological foundations of the European multilingual information society
Objectives:
• Prepare the ground for a large-scale concerted effort by building a strategic alliance of national and international research programmes, corporate users and commercial technology providers and language communities
• Strengthen the European research community through research networking and by creating new schemes and structures for sharing resources and efforts
• Build bridges by approaching open problems in collaboration with other research fields such as machine learning, social computing, cognitive systems, knowledge technologies and multimedia content
Final goal: META – The Multilingual Europe Technology Alliance

The META Alliance
[Diagram of participating communities: language communities; policy makers and funding bodies; user industries; provider industries; language technology community; machine learning community; semantic technologies community; cognitive systems community; multimedia content technologies]

Founding Members
• Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Germany
• Barcelona Media – Centre d'Innovació, Spain
• Consiglio Nazionale delle Ricerche – Istituto di Linguistica Computazionale “Antonio Zampolli”, Italy
• Institute for Language and Speech Processing, R.C. “Athena”, Greece
• Charles University in Prague, Czech Republic
• Centre National de la Recherche Scientifique – Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur, France
• Universiteit Utrecht, The Netherlands
• Aalto University, Finland
• Fondazione Bruno Kessler, Italy
• Dublin City University, Ireland
• Rheinisch-Westfälische Technische Hochschule Aachen, Germany
• Jožef Stefan Institute, Slovenia
• Evaluations and Language Resources Distribution Agency, France

Three Lines of Action
• The META-NET objectives translate into three lines of action:

The Process
• communication within META-NET (META-VISION)
• communication in the wider LT community and among other stakeholders
• communication to policy makers, funding bodies, public

• Data has become a key factor in LT R&D
• A few indicators:
  • Increasing size & importance of LREC conference, corpora mailing list, etc.
  • Citation ranks of publications on language resources
• Language research and language technology belong to the Data Intensive Sciences
• Expensive data become valuable through sharing
• However, the long demanded and well-contemplated instruments for managing and sharing this data are still missing

META-SHARE: Key Features
• META-SHARE is an open, integrated, secure, interoperable exchange infrastructure (resp. Stelios Piperidis) for language data & tools for the Human Language Technologies domain
  • ever-evolving, scalable, including free and for-a-fee LRs/LTs and services
  • including legacy, contemporary and emerging datasets, tools and technologies
• A marketplace where language data & tools are documented, uploaded and stored in repositories, catalogued and announced, downloaded, exchanged, aiming to support a data economy (includes free and for-a-fee LRs/LTs and also services)
• Standards-compliant, overcoming format, terminological and semantic differences
• Based on distributed networked repositories accessible through common interfaces

What we’re offering
• A channel to share and distribute language data and tools
• Technical solutions for building your own repositories
• Protocols and mechanisms for making the descriptions of your resources (and the actual resources) harvestable
• Guidelines and recommendations on standards used in the LR production and documentation processes
• Recommendations on data and tools licensing issues
• Access to large catalogues of documented, high-quality resources, as well as the actual data and tools
KYOTO can be among the first

Features
• Single Sign-On
• Easy Administration
• Metadata Harvesting
• Persistent Identifiers (PIDs)
• Intuitive Search
• Open Source
• Service-Oriented
• Distributed
• Replication/Backup
• Reporting & Statistics

v0 architecture

On the communication/mobilisation side: Challenges (slide from N. Calzolari, Multilingual Web, Madrid, 2010)
• A change of culture: convincing arguments that data assets and their value do not necessarily grow if locked in the drawer
• Incentives and models that can convince data holders that there is life after the announcement of data existence and/or sharing (sharing does not necessarily mean for free, nor for unbridled use)
• Interoperability, common metadata, formats, etc.
• In other words, we need to create/reinforce a data economy based on widely agreed principles and rules, mutual understanding, sustainable and adaptive models, simplified copyright rules and licensing models
• The present time window seems appropriate
KYOTO can be a “model” for other projects to follow

Collaborative Resources: a paradigm shift
• LR building as collaborative “common shared task”: a new methodology of work
• Assemble a comprehensive “map of language data and mechanisms” for the planet’s languages (→ LRE Map)
• Interoperability acquires even more value
• Needs consensual planning of common strategies towards shared objectives
• Not just the sum of many individual efforts, but an organised, well-structured, collective enterprise
• Similar to more mature sciences: physicists’/astronomers’ experiments … of X,000 people working on the same big enterprise
META-SHARE is a big step that needs a real paradigm shift

Preamble / Vision
We wanted more & more data... Have we been too successful?!?
BUT
Main Statement: Where do we (try to) encode what we know about language properties? In annotations

Strategy: a Multilingual Annotation Plan, as a Very Large International Initiative
• Collaborative Resources: a new paradigm for a big language map
• Means a change of mentality: going beyond “individual” research interests
• From “my approach” to some “compromise” allowing to go for big amounts/integration/building on each other/…

From no infrastructure... to many infrastructures/networks
• We were complaining there was no infrastructure... Have we been too successful??
• Now many infrastructural/networking initiatives
• Very good opportunity, but only if we are able to act in a coordinated & coherent way
• Otherwise we spoil & confuse the field