N. Calzolari, PhD course (Dottorato), Pisa, May 2009. Nicoletta Calzolari, Istituto di Linguistica Computazionale - CNR - Pisa. Language Resources (Risorse Linguistiche).

1 N. Calzolari, PhD course (Dottorato), Pisa, May 2009. Nicoletta Calzolari, Istituto di Linguistica Computazionale - CNR - Pisa. Language Resources (lexicons, corpora, ontologies, …): standards and language technologies (cont.), and projects. With many others at ILC …

2 SIMPLE Model for a BioLexicon
Design a representational model for a BioLexicon, a comprehensive lexical resource
- able to integrate terminological, lexical and ontological info
- compatible with HLT international standards (i.e. ISO)
- able to meet the domain-specific requirements
Implement a BioLexicon database, a container with lexical objects to be filled with data provided by "populators" (EBI, UoM & CNR-ILC)
- able to be automatically incremented with new terms and linguistic info extracted from texts
(from Valeria Quochi)

3 BioLexicon building cycle (diagram, from Valeria Quochi)
- Term Repository: gather terms (EBI)
- Bio-Lexicon population: variants; syntactic info of terms (UoM)
- Bio-events: extraction of bio-events (ILC)
- Terminology to Ontology (Jena/Rennes/EBI)
- Bio-Lexicon: conceptual model and physical DB (ILC)

4 The BioLexicon: where from (diagram, from Simonetta Montemagni)
- Existing repositories: chemical compounds, species names, diseases, enzymes; genes/proteins (subclustering of term variants)
- MEDLINE, incremental population process: Named Entity Recognition; term mapping by normalisation; new gene/protein names; manual curation
- Verbs, nouns, adjectives, adverbs (variants, inflected forms, derivative relations, ...): linguistic pre-processing; subcat extraction
- Manual annotation of a bio-event corpus; bio-event extraction; syn-sem mapping

5 BioLexicon Model: high-level lexical objects and Data Categories (diagram: Syntax, Semantics)

6 GeneRegOnto - BioLex: Concepts to Predicates (from Valeria Quochi)

7 (Diagram, from Valeria Quochi) From ontology concepts to lexicon predicates, for the example "NF-AT positively regulates IL2"
- bio event concepts: Regulation, PositiveProteinRegulation, NegativeProteinRegulation
- bio entity concepts: Transcription Factor, Protein
- bio relations: regulates, isregulatedby
- bio semantic entries, linked by bio-specific qualia relations: regulate, regulation, regulates, regulator, regulatee
- predicative argument structure with bio semantic roles: PredRegulate, Arg0Regulate, Arg1Regulate
- instances: NF-AT, IL2

8 BioLexicon (diagram)
- SynBehaviour Lesion1; SubcatFrame pp-of; SynArg Arg0 pp-of
- Sense Lesion1; Predicate LESION; SemArg Arg0 Pat; Activity; Protein
The pattern "lesion of PROTEIN" is not in the lexicon, but it can be calculated by accessing info scattered over various lexical objects (i.e. the syntactic unit lesion heads a pp-of corresponding to the patient argument, restricted by the ontological node PROTEIN).
All lexical items labelled as PROTEIN can be candidates to fill this argument slot. "Lesion of OmpC", "lesion of OmpR", etc. are all admitted instances/sentences of this "predicate"/pattern.
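The derivation of "lesion of PROTEIN" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the three dictionaries and the "SPECIES" label are hypothetical stand-ins for BioLexicon objects, not the actual database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynArg:
    label: str        # argument label, e.g. "Arg0"
    realization: str  # syntactic realization, e.g. "pp-of"


@dataclass(frozen=True)
class SemArg:
    label: str        # argument label, e.g. "Arg0"
    role: str         # semantic role, e.g. "Pat" (patient)
    restriction: str  # ontological node restricting the slot


# Hypothetical miniature lexicon: none of these tables is the real BioLexicon.
SUBCAT_FRAMES = {"lesion": [SynArg("Arg0", "pp-of")]}
PREDICATES = {"lesion": ("LESION", [SemArg("Arg0", "Pat", "PROTEIN")])}
ONTOLOGY_LABELS = {"OmpC": "PROTEIN", "OmpR": "PROTEIN", "mouse": "SPECIES"}


def derive_patterns(lemma):
    """Join syntactic and semantic arguments that share the same label."""
    _, sem_args = PREDICATES[lemma]
    sem_by_label = {arg.label: arg for arg in sem_args}
    patterns = []
    for syn in SUBCAT_FRAMES[lemma]:
        sem = sem_by_label.get(syn.label)
        if sem is None:
            continue
        prep = syn.realization.split("-", 1)[1]  # "pp-of" -> "of"
        patterns.append((f"{lemma} {prep} {sem.restriction}", sem))
    return patterns


def instances(lemma):
    """Fill each derived pattern with every term labelled by the restricting node."""
    out = []
    for pattern, sem in derive_patterns(lemma):
        prefix = pattern.rsplit(" ", 1)[0]  # drop the ontological node
        for term, node in ONTOLOGY_LABELS.items():
            if node == sem.restriction:
                out.append(f"{prefix} {term}")
    return out


print([p for p, _ in derive_patterns("lesion")])
print(instances("lesion"))
```

The pattern itself is never stored: it is computed by joining the subcategorization frame and the predicate on the shared argument label, which is the point the slide makes.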

9 Good mapping of Relations: OBO Relations -> Relations from Extended Qualia Structure (Constitutive, Telic, Formal, Agentive)
("?" and "…" are as shown on the slide)
- derivesFrom -> derived_from
- precededBy -> ?
- participatesIn -> ?
- hasParticipant -> ?
- agentOf -> …
- hasAgent -> ?
- functionOf -> is_the_activity_of
- hasFunction -> …
- instanceOf -> …
- isA -> is_a
- partOf -> is_a_part_of
- hasPart -> has_as_part
- GrainOf -> …
- hasGrain -> …
- componentOf -> …
- hasComponent -> …
- properPartOf -> …
- hasProperPart -> …
- locatedIn -> …
- locationOf -> …
- containedIn -> …
- contains
- adjacentTo -> ?

10 Enhancing Semantic Relations (from Valeria Quochi)
Source_Sense | Rel Type | Target_Sense
Phosphoglycolate | BelongsToSpecies | Mouse
(Diagram: phosphoglycolate -> BelongsToSpecies -> mouse)

11 Place(s) of Semantics in BootStrep: how to link Bio-Ontology and Bio-Lexicon
- Bio-Ontology holds domain-specific as well as general semantics (in terms of classes and relations between classes)
- Lexicon model comes with a semantic layer based on a linguistic ontology (SIMPLE-CLIPS Ontology)
Questions:
- What is the relation between bio-ontology and linguistic ontology?
- Do they overlap? What is the overlap/intersection? The difference?
- Is a mapping possible? What could a mapping look like?
Aim: bringing lexical semantics and ontological semantics together

12 The BioLexicon Model & Standards
- The Bio-Lexicon is based on the MILE metamodel and the more recent ISO proposal of a Lexical Markup Framework (LMF)
- Data Categories are drawn as far as possible from already existing repositories and standards (e.g. morphosyntactic data categories)
- There is a need, however, to define a set of Data Categories specific to the biology domain (e.g. semantic roles and relations)

13 ISO Meta-model & Data Categories (from Monica Monachini)
An ISO standard for NLP lexica: definition of the Lexical Markup Framework, a general & abstract meta-model & a set of structural nodes relevant for linguistic description
Objectives:
- Design of the abstract lexical meta-model
- Definition of the common set of related Data Categories
The field is mature

14 ISO - LMF
Specifically designed to accommodate as many models of lexical representation as possible
Its pros:
- Meta-model: a high-level specification (ISO 24613)
- Data Category Registry: low-level specifications (ISO 12620)
- Not a monolithic model, rather a modular framework
- LMF library provides the hierarchy of lexical objects (with structural relations among them)
- Data Category Registry provides a library of descriptors to encode linguistic information associated with lexical objects (N.B. Data Categories can also be user-defined)

15 ISO LMF - Lexical Markup Framework
- Structural skeleton, with the basic hierarchy of information in a lexical entry + various extensions
- LMF specs comply with UML modelling principles
- An XML DTD allows implementation
- Builds also on EAGLES/ISLE
(Diagram labels: NEDO Asian Lang.; NICT Language-Grid Service Ontology; ICT KYOTO; LIRICS)

16 LMF: NLP Extension for Semantics

17 Lexical Entry (diagram)
- Lexical Entry: LE_protein
- Lemma: L_protein
- SyntacticBehaviour: SB_protein
- Representation Frame: RF_protein
- DC: writtenForm = protein
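As a rough illustration, the lexical objects on this slide could be serialized to XML with Python's standard library. This is a sketch under assumptions, not the LMF DTD: the nesting of the Representation Frame under the Lemma, the element spellings, and the use of a generic feat element with att/val attributes for the Data Category are choices made here for the example.

```python
import xml.etree.ElementTree as ET


def add_data_category(parent, att, val):
    """Attach a Data Category as a generic attribute/value element (assumed layout)."""
    return ET.SubElement(parent, "feat", {"att": att, "val": val})


def build_entry():
    """Build the 'protein' entry from the slide; the nesting is an assumption."""
    entry = ET.Element("LexicalEntry", {"id": "LE_protein"})
    lemma = ET.SubElement(entry, "Lemma", {"id": "L_protein"})
    frame = ET.SubElement(lemma, "RepresentationFrame", {"id": "RF_protein"})
    add_data_category(frame, "writtenForm", "protein")
    ET.SubElement(entry, "SyntacticBehaviour", {"id": "SB_protein"})
    return entry


xml_text = ET.tostring(build_entry(), encoding="unicode")
print(xml_text)
```

Keeping the structural nodes (elements) separate from the Data Categories (attribute/value pairs) mirrors the split between the meta-model and the Data Category Registry described earlier.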

18 Event Representation through SemanticPredicate (diagram)
- SemanticPredicate: SP_regulate
- SemanticArgument: SP_TF_protein (DC: role = agent)
- SemanticArgument: SP_Target Gene (DC: role = patient)

19 Sense Representation (diagram)
- Sense: activate_2
- Synset: activate
- PredicativeRepresentation
- SemanticFeature: SF_chemistry, SF_process
- Collocation
- SemanticRelation: is_a: [SenseID]; Typical_of: [SenseID] S_protein

20 Example of Semantic Relation (diagram)
Sense S_cox15 -> SemanticRelation Is_in -> Sense S_chromosome19

21 Example: how to encode WordNet-type info in LMF

22 XML-based Abstract Lexicon Interchange Format: mapping exercise (from Monica Monachini)
Major best practices:
- OLIF
- PAROLE/SIMPLE
- LC-Star
- WordNet, EuroWordNet
- FrameNet
- BDef: formal database of lexicographic definitions derived from the Explanatory Dictionary of Contemporary French
- … others on the way …
Entries from existing lexicons have been mapped to LMF to prove that the model is able to represent many best practices and achieve unification

23 Lexical WEB & Content Interoperability
"Standards" as a critical step for semantic mark-up in the SemWeb
(Diagram: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y connected through LMF, with intelligent agents)
Standards for interoperability: enough?

24 Need of tools to make this vision operational & concrete
New prototype "LeXFlow":
- web-based collaborative environment for semi-automatic management/integration of lexical resources
- enabling interoperability of distributed lexical resources
- accessed by different types of agents
- addressing semi-automatic integration of computational lexicons, with focus on linking and cross-lingual enrichment of distributed LRs
Case study: cross-fertilization between Italian and Chinese WordNets
From Language Resources to Language Services

25 (figure only; no text recoverable)

26 Our WN case study
- ItalWordNet (Roventini et al., 2003)
- Academia Sinica Bilingual Ontological WordNet (Sinica BOW, Huang et al., 2004)
- Both connected to Princeton WordNet (although to different versions)
- Same set of semantic relations (EWN ones)

27 Architecture for cooperative integration of lexicons (diagram)
- Resources: Italian Simple, Italian Wordnet, Chinese Wordnet
- Components: ILI Mapper; Relation Mapper; MultiWordnet Relation Calculator; Simple-Wordnet Relation Calculator; web service interfaces
- Agents: Agent Role1, Agent Role2, Agent Role3, Agent Role4
- Layers: Coordination, Application, Data

28 Basic assumptions behind MWN …
Interlingual level:
- Interlingua provides an indirect linkage between different WordNets: the Interlingual Index (ILI), an unstructured version of WordNet used in EuroWordNet
- Each synset in a WordNet A is linked to at least one record of the ILI by means of a set of relations (eq_synonym, eq_near_synonym, …)
Synset correspondence:
- If a synset S_A and a synset S_B point to the same ILI record, they are correspondent
Relation correspondence:
- If there are two synsets in WN A and a relation between them, the same relation holds between the corresponding synsets in WN B
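The two correspondence assumptions can be sketched as a small projection procedure. This is a minimal illustration, not the LeXFlow implementation: the synset identifiers, the ILI record names and the single relation used below are invented for the example.

```python
from collections import defaultdict


def correspondences(ili_a, ili_b):
    """Synset correspondence: S_A and S_B correspond if they share an ILI record.

    ili_a and ili_b map a synset id to the set of ILI records it is linked to.
    """
    by_record = defaultdict(set)
    for syn_b, records in ili_b.items():
        for record in records:
            by_record[record].add(syn_b)
    corr = defaultdict(set)
    for syn_a, records in ili_a.items():
        for record in records:
            corr[syn_a] |= by_record.get(record, set())
    return corr


def project_relations(relations_a, corr, relations_b):
    """Relation correspondence: carry each WN A relation over to WN B.

    Returns (proposed, confirmed): triples new to WN B, and triples WN B
    already contains (which the projection reinforces).
    """
    proposed, confirmed = set(), set()
    for source, relation, target in relations_a:
        for s_b in corr.get(source, ()):
            for t_b in corr.get(target, ()):
                triple = (s_b, relation, t_b)
                (confirmed if triple in relations_b else proposed).add(triple)
    return proposed, confirmed


# Hypothetical data: ids and ILI record names are made up for illustration.
ili_it = {"it:strada": {"ili:road"}, "it:carreggiata": {"ili:roadway"}}
ili_zh = {"zh:dao_lu": {"ili:road"}, "zh:che_dao": {"ili:roadway"}}
rels_it = {("it:carreggiata", "has_hyperonym", "it:strada")}
rels_zh = set()

proposed, confirmed = project_relations(
    rels_it, correspondences(ili_it, ili_zh), rels_zh
)
print(proposed, confirmed)
```

Separating proposed from confirmed triples reflects the two uses of such a projection: enriching the target wordnet with new relations, and validating relations it already has.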

29 (Diagram) Italian and Chinese synsets aligned through ILI records; most ILI and Chinese synset identifiers are not recoverable
- Italian synsets: passaggio, strada, via (N#1290); parte, tratto (N#12348); carreggiata (N#21225); curvatura, svolta, curva (N#20944)
- Chinese synsets: che_dao (車道); tong_dao (通道); dao_lu, dao, lu (道路, 道, 路); wan (彎)
- ILI records: road, route; bend, crook, turn; passage; stretch; roadway
- Relation labels: iperonimia/HYP (hypernymy); iponimia/HPO (hyponymy); meronimy/MPT; Synonym; 上位(泛稱)詞_為/HYP; 下位(特指)詞_為/HPO; 部件_部份詞_為/MPT
- Annotations: a new proposed mero relation; reinforcement & validity; derived

30 (Diagram) Second alignment example; most ILI identifiers are not recoverable
- ILI records (verbs): absorb, assimilate; ingest, take_in; receive, have; imbibe; acquire_knowledge
- ILI record (adjective): respective, several, various
- Italian synsets: V#32080 (assimilare_5, assorbire_3, accettare_2, recepire_1); V#39802 (prendere_3); V#32925 (studiare_3, imparare_1, apprendere_2); AG#42011 (relativo_)
- Chinese synset: 吸 (verb)
- Relation labels: eq_syn; eq_near_syn; has_hyperonym/HYP; has_hyponym/HPO; causes/CAU; derived

31 For a Global WordNet Grid
This architecture for making distributed wordnets interoperable lends itself to different applications in LR processing:
- Enrichment of existing lexical resources
- Creation of new resources
- Validation of existing resources
It can provide a platform for cooperative & collective creation & management of LRs, by providing a web-based environment for the collaboration & interaction of distributed agents and resources
It can be seen as the prototype of a web application supporting the GlobalWordNet Grid initiative, i.e. a shared multilingual knowledge base for cross-lingual processing based on distributed resources over the Grid
New project: KYOTO

32 (KYOTO overview diagram, from Piek Vossen)
- Ontology: Top (Abstract, Physical, Process, Substance); Middle; Domain (H2O, CO2, CO2 Emission, H2O Pollution, Greenhouse Gas)
- Wordnets
- Tybot: term yielding robot; Kybot: knowledge yielding robot
- Environmental organizations; distributed, diverse & dynamic data
Numbered steps, in slide order:
1 Capture text: "Sudden increase of CO2 emissions in 2008 in Europe"
2 CO2 emission
3 Wikyoto: maintain terms & concepts
4 Index facts: Process: Emission; Involves: CO2; Property: increase, sudden; When: 2008; Where: Europe
5 Text & Fact Index; Semantic Search
6 Citizens, Governments, Companies
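The "index facts" step can be pictured as storing each extracted fact under its slot values, so that a semantic search becomes a lookup on slots rather than on words. The sketch below is only an illustration of that idea: the Fact class, the slot names and the search function are assumptions made here and do not reproduce the KYOTO fact format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    process: str
    involves: str
    properties: tuple
    when: str
    where: str
    source: str  # the text the fact was extracted from


def build_index(facts):
    """Index every fact under each (slot, value) pair it contains."""
    index = defaultdict(list)
    for fact in facts:
        slots = [
            ("process", fact.process),
            ("involves", fact.involves),
            ("when", fact.when),
            ("where", fact.where),
        ]
        slots += [("property", prop) for prop in fact.properties]
        for slot, value in slots:
            index[(slot, value.lower())].append(fact)
    return index


def search(index, **constraints):
    """Return the facts that satisfy every slot=value constraint."""
    result = None
    for slot, value in constraints.items():
        hits = index.get((slot, value.lower()), [])
        result = hits if result is None else [f for f in result if f in hits]
    return result or []


# The fact shown on the slide, entered by hand for the example.
fact = Fact(
    process="Emission",
    involves="CO2",
    properties=("increase", "sudden"),
    when="2008",
    where="Europe",
    source="Sudden increase of CO2 emissions in 2008 in Europe",
)
index = build_index([fact])
print(search(index, involves="CO2", where="Europe"))
```

A query such as search(index, involves="CO2", where="Europe") matches the fact even though the source sentence could have been written in any language, provided the extraction step fills the same slots.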

33 (KYOTO annotation layers diagram, from Piek Vossen)
TEXT
- Linear DAF: discourse annotation
- Linear MAF: morphological annotation
- Linear SYNAF: syntactic annotation
- Linear SEMAF: semantic annotation
- Term extraction (Tybot): Generic TMF; domain terms
- Fact extraction (Kybot): Linear Generic FACTAF
- Resources: Wordnet, domain Wordnet (LMF API); ontology, domain ontology (OWL API)
- Scope labels: language specific; language neutral & specific; language neutral

34 System components (from Piek Vossen)
- Wikyoto = wiki environment for a social group:
  - to model the terms and concepts of a domain and agree on their meaning, within the group, across languages and cultures
  - to define the types of knowledge and facts of interest
- Tybots = Term extraction robots, extract term data from a text corpus
- Kybots = Knowledge yielding robots, extract facts from a text corpus
- Linguistic processors:
  - tokenizers, segmentizers, taggers, grammars
  - named entity recognition
  - word sense disambiguation
  - generate a layered text annotation in Kyoto Annotation Format (KAF)

35 KYOTO SYSTEM (diagram, from Piek Vossen)
- Linear SYNAF/SEMAF; Linear SEMAF: semantic annotation
- Term extraction (Tybot): Generic TMF
- Fact extraction (Kybot): Linear Generic FACTAF
- Domain editing (Wikyoto)
- Resources: Wordnet, domain Wordnet (LMF API); ontology, domain ontology (OWL API)
- Users: Concept User, Fact User

36 Fact mining by Kybots (diagram, from Piek Vossen)
Source documents -> linguistic processors -> morpho-syntactic analysis:
  [[the emission]NP [of greenhouse gases]PP [in agricultural areas]PP]NP
Ontology: generic (Abstract, Physical, Process, Substance, Chemical Reaction); domain (H2O, CO2, CO2 emission, water pollution)
Wordnets & linguistic expressions; logical expressions
Fact analysis:
  [the emission]NP -> Process: e1
  [of greenhouse gases]PP -> Patient: s2
  [in agricultural areas]PP -> Location: a3
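The step from chunks to fact slots can be illustrated with a tiny rule table. This is a hedged sketch, not the Kybot rule language: the TYPES lookup stands in for the wordnet and ontology, the rules are invented for this one example, and taking the last token of a chunk as its head is a deliberate simplification.

```python
# Hypothetical head-word -> ontology type lookup (stand-in for wordnet + ontology).
TYPES = {"emission": "Process", "gases": "Substance", "areas": "Location"}

# Hypothetical rules: (chunk label, preposition or None, ontology type) -> fact slot.
RULES = [
    ("NP", None, "Process", "Process"),
    ("PP", "of", "Substance", "Patient"),
    ("PP", "in", "Location", "Location"),
]


def mine_fact(chunks):
    """Map each chunk to a fact slot; chunks is a list of (label, tokens)."""
    fact = {}
    for label, tokens in chunks:
        prep = tokens[0] if label == "PP" else None
        head_type = TYPES.get(tokens[-1])  # simplification: last token is the head
        for rule_label, rule_prep, rule_type, slot in RULES:
            if (label, prep, head_type) == (rule_label, rule_prep, rule_type):
                filler = tokens[1:] if label == "PP" else tokens
                fact[slot] = " ".join(filler)
    return fact


chunks = [
    ("NP", ["the", "emission"]),
    ("PP", ["of", "greenhouse", "gases"]),
    ("PP", ["in", "agricultural", "areas"]),
]
print(mine_fact(chunks))
```

The rules refer to ontology types rather than to words, so the same rule would also fire for any other chunk whose head is typed as a Substance or a Location.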

37 Contribution of KYOTO (from Piek Vossen)
- Hundreds of thousands of sources (html, xls, pdf) in the environment domain, in many different languages, spread all over the world, changing every day
- TYBOT: KYOTO learns terms and concepts from text documents, stored as structures that people and computers understand (Wordnets: environment terms; Ontology: environment concepts)
- WIKYOTO: KYOTO delivers a Web 2.0 environment for community-based control; connects people across languages and cultures; establishes consensus and knowledge transition
- KYBOT: KYOTO enables semantic search and fact extraction (environment facts); software can partially understand language and exploit web 1 data; understanding is helped by the terms and concepts defined for each language

38 A common representation format: WordNet-LMF (class diagram, from Monica Monachini)
- LexicalResource; GlobalInformation
- Lexicon; LexicalEntry; Lemma; Sense; Monolingual ExternalRefs / Monolingual ExternalRef
- Synset; Definition; Statement; SynsetRelations / SynsetRelation; Monolingual ExternalRefs / Monolingual ExternalRef
- SenseAxes / SenseAxis; Interlingual ExternalRefs / Interlingual ExternalRef
- Meta (0..1) attached to several classes
- Data Categories

39 Centralized WordNet DC Registry (from Monica Monachini)
A list of 85 semantic relations (inter-WN and intra-WN), as a result of a mapping of the KYOTO WordNet grid

40 WordNet-LMF multilingual level: cross-lingual synset relations (diagram, from Monica Monachini)
- Synsets shown: SWN n, IWN n, WN n
- Groups monolingual synsets corresponding to each other and sharing the same relations to English
- Link to ontology/(ies)
- Specifies the type of correspondence

41 Ultimate goal (from Piek Vossen)
Global standardization and anchoring of meaning such that:
- Machines can start to approach text understanding -> semantic web connects to the current web
- Communities can dynamically maintain knowledge, concepts and their terms in an easy-to-use system
- Cross-linguistic and cross-cultural sharing and communication of knowledge is enabled
- Comparable to a formalization of Wikipedia for humans AND machines across languages

42 Some steps for a "new generation" of LRs
- From huge efforts in building static, large-scale, general-purpose LRs to non-static LRs rapidly built on demand, tailored to specific user needs
- From closed, locally developed and centralized resources to LRs residing over distributed places, accessible on the web, choreographed by agents acting over them
- From Language Resources to Language Services

43 Distributed Language Services
A long-term scenario implying
- content interoperability standards,
- supra-national cooperation and
- development of architectures enabling accessibility
Create new resources on the basis of existing ones; exchange and integrate information across repositories; compose new services on demand
Collaborative & collective/social development and validation, cross-resource integration and exchange of information
(Diagram labels: Language Grid, Wiki)

44 Natural convergence with HLT:
- multilingual semantic processing
- ontologies
- semantic-syntactic computational lexicons
In the "Semantic Web" vision … need to tackle the twofold challenge of
- content availability &
- multilinguality

45 (Diagram) Semantic Web; LT & LRs; Content; interoperable LRs & LT; Language Tech … & … Knowledge, Content; Knowledge Markup
Ready?? How to cooperate??

46 LR and the future of LT or Content Tech
- The need for ever growing and richer LRs for effective multilingual content processing requires a change in the paradigm, & the design of a new generation of LRs, based on open content interoperability standards
- The Semantic Web notion may be used to shape the LRs of the future, in the vision of an open space of sharable knowledge available on the Web for processing
- The effort of making available millions of "richly annotated words" for dozens of languages is not affordable by any single group
- This objective can only be achieved by creating integrated Open and Distributed Linguistic Infrastructures
- Not only linguistic experts can participate in these, but also designers, developers, users of content encoding practices, etc., in wiki mode
- Is the LR/LT field mature enough to broaden and open itself to the concept of a cooperative effort of different sets of communities?
- Could a sort of large "Language Genome" initiative be effective? Storing lots of (annotated) facts

47 Today, many vitality & success signs … for LRs
- In Spoken, Written, Multimodal areas … in new emerging areas
- Statistical approaches …
- Different dimensions & layers: Content (Ontologies), Emotion, Time, …
- For Evaluation, for Training …
- LREC (> 900 submissions); many LRs at COLING and even at ACL!!
- ELRA (self-sustaining) & LDC
- LRE (new Journal: N. Ide & NC)
- ISO-TC37-SC4/WG4 (International Standards for LRs)
- AFNLP …
- FLaReNet
- ESFRI - CLARIN (also political & strategic role)
- New calls or initiatives in EU, US, Asia, on LRs, interoperability, cooperation, …

48 BUT … an important point
In the '90s there was a global vision of the field & its main components:
- Standards
- Creation of LRs
- Distribution (ELRA, LDC)
Then:
- Automatic acquisition
… towards the Infrastructure of LRs & LT
While today:
- There is an ever increasing set of initiatives for new LRs, basic robust technologies, models??, algorithms, …
- We have a LR community culture, BUT sort of scattered, opportunistic, not much coherence

49 Today …
The wealth of data & of basic technologies is such that we should reflect again on the field as a whole & ask if
- Standards
- Creation of LRs
- Automatic acquisition
- Distribution
are still "the" important components, or how they have changed/must change …
Which new challenges towards a new & more mature infrastructure of LRs & LTs??
- Dynamic LRs
- Sharing
- Collaborative creation & management
- Content interoperability

50 These dimensions could be at the basis of a new Paradigm for LRs & LT & of a new Infrastructure ??
- Dynamic LRs
- Sharing
- Collaborative creation & management
- Content interoperability
+ Distributed architectures/infrastructures
(Annotations: "Need more"; "Technology exists")

51 Many dimensions around the notion of language
- Cultural issues: language … and cultural identity; language … and the Humanities
- Economic, social issues: applications, services
- Technical issues: interdisciplinarity & multidisciplinarity
- Political issues: multilingualism; e.g. a commonly agreed list of minimal requirements for "national" LRs: BLARK
Need of bodies for a broad research agenda & strategic actions for LT & LRs (W/S/MM) based on all the dimensions
We need to put together technical, organisational, strategic, economic, political issues of LRs
Finally, two new European Infrastructural & Networking Initiatives

52 Which Communities?
- Language Resources, Language Technologies, Standardisation, Grid, Semantic Web, Ontologists, ICT, …
- Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …
- Many application domains (eculture, egovernment, ehealth, …)
Core: multilinguality. Focus on cooperation.
CLARIN (Research Infrastructure): enabling infrastructure. FLaReNet: Network.
Technologies exist, but the infrastructure that puts them together and sustains them is still missing

53 CLARIN: Common Language Resources and Technologies Infrastructure for the Humanities & Social Sciences (ESFRI Research Infrastructures)
- Large-scale pan-European collaborative effort (31+ countries)
- Make LRs & LTs available & readily usable to scholars of humanities & social sciences (& all disciplines)
- Need to overcome the present fragmented situation by harmonising structural and terminological differences
- Basis is a Grid-type infrastructure and Semantic Web technology
- The benefits of computer-enhanced language processing become available only when a critical mass of coordinated effort is invested in building an enabling infrastructure, which can provide services in the form of provision of tools & resources as well as training & counseling across a wide span of domains
- The infrastructure will be based on a number of resource, service and expertise centres

54 CLARIN Mission
- Create a comprehensive and free-to-use distributed archive of LRs & LTs covering not only the languages of all member states, but also other languages studied and used in Europe
- Because the tools & resources will be interoperable across languages & domains, contribute to preserving and supporting the multilingual & multicultural European heritage
- An operational open infrastructure of web services will introduce a new paradigm of distributed collaborative development
- Allow many contributors to add all kinds of new services based on existing ones, thus ensuring reusability and allowing scaling up to suit individual needs

55 How can we tackle these challenges? (from P. Wittenburg)
J. Taylor: "eScience is about global collaboration in key areas of science and the next generation of infrastructures that will enable it"
Need to build new types of platforms
- to allow researchers to combine existing resources easily into new ones to tackle the big challenges
- to increase the productivity of all interested researchers, since currently too much time is wasted on preparatory work

56 eScience Vision (from P. Wittenburg)
CLARIN establishes such a new generation of extended infrastructure
Thus CLARIN is not about creating and building new language resources and technology, but
- making them available and accessible
- as services
- in a stable and persistent infrastructure
to allow tackling the great challenges
CLARIN: http://www.clarin.eu
Grid Project: http://www.mpi.nl/dam-lr
ISO TC37/SC4: http://www.tc37sc4.org
Standards Project: http://lirics.loria.fr/

57 We have still a long path …
FLaReNet: in an eContentplus Call for a "Thematic Network on Language Resources"
- To provide common recommendations (to the EC) for future actions
- To give priorities
- Need of "visions" & also a "new project"
- In a global context, in cooperation with CLARIN & also with non-EU members

58 Which Communities?
- Language Resources, Language Technologies, Standardisation, Ontologists, Content, EC, Funding agencies, …
- Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …
- Many application domains (eculture, egovernment, ehealth, intelligence, domotics, content industry, …)
Core: multilinguality. Focus on cooperation.
FLaReNet (Network): EU Forum. CLARIN: Research Infrastructure.
LRs & LTs exist, but a global vision, policy and strategy is still missing

59 FLaReNet: Fostering Language Resources Network
A new European Network for Language Resources (eContentplus), Nicoletta Calzolari (coord.)

60 FLaReNet: Fostering Language Resources Network
- A European forum to facilitate interaction among LR stakeholders
- The Network structure considers that LRs present various dimensions and must be approached from many perspectives: technical, but also organisational, economic, legal, political
- Addresses also multicultural and multilingual aspects, essential when facing access and use of digital content in today's Europe

61 A layered structure, with leading experts & groups (national and European institutions, SMEs, large companies) for all relevant LR areas (about 40 partners), in collaboration with CLARIN to ensure coherence of LR-related efforts in Europe
FLaReNet will
- consolidate existing knowledge, presenting it analytically and visibly
- contribute to structuring the area of LRs of the future by discussing new strategies to:
  - convert existing and experimental technologies related to LRs into useful economic and societal benefits
  - integrate so far partial solutions into broader infrastructures
  - consolidate areas mature enough for recommendation of best practices
  - anticipate the needs of new types of LRs
Organised in Thematic Working Groups

62 Thematic Areas
- The Chart for the area of LRs in its different dimensions
- Methods and models for LR building, reuse, interlinking and maintenance
- Harmonisation of formats and standards
- Definition of evaluation protocols and evaluation procedures
- Methods for the automatic construction and processing of LRs
To build together: Evolving RoadMap; Blueprint of actions and infrastructures

63 Objectives & expected results (Ambitious!)
- The largest Network of LR and HLT players, with diverse approaches, efforts and technologies
- Enable progress toward community consensus
- Give an extended picture of LRs & recast its definition in the light of recent scientific, methodological, technological, social developments
- Consolidate methods & approaches, common practices, frameworks and architectures
- A "roadmap" identifying areas where consensus has been achieved or is emerging vs. areas where additional discussion and testing is required, together with an indication of priorities
- Recommendations in the form of a plan of coherent actions for the EU and national organizations
- A European model for the LRs of the next years

64 Outcomes of FLaReNet
The outcomes will be of a directive nature
- to help the EC and national funding agencies identify priority areas of LRs of major interest for the public that need public funding to develop or improve
A blueprint of actions will constitute input to policy development both at EU and national level
- for identifying new language policies that support linguistic diversity in Europe
- in combination with strengthening the language product market, e.g. for new products & innovative services, especially for less technologically advanced languages

65 These Initiatives, … together
Call for international cooperation also outside Europe; they will be relevant for setting up a global worldwide Forum of Language Resources and Language Technologies

