
Nicoletta Calzolari - Emerging technologies for Digital Libraries, Poland, November 2006


1 Nicoletta Calzolari, Istituto di Linguistica Computazionale - CNR - Pisa
An Infrastructure of Language Resources & Language Technologies: Why do we need it? Priorities & Challenges

2 What have we (LT & LR) been assembling for many years?
- Lexicons & their Ontologies: Written, Spoken, ItalWordNets, PAROLE/SIMPLE, …
- Annotated corpora/Treebanks
- Basic Tools
- Integrated Architecture for:
  - Annotation at various levels (from morph. to conceptual)
  - Acquisition/learning
  - Classification
  - Ontology creation
  - …
- Methodologies
- Know-how & expertise
- Infrastructural bodies (on which to build)
- Standards
… a very large infrastructure of LRs & LT

3 History: some international LR initiatives
ACQUILEX [since 88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet, MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, …, SENSEVAL, WRITE, Forum TAL (Italian), …, LIRICS, ISO, ELRA, LREC, LRE Journal, NEDO, …
- Essential role of the EC in starting a basic Infrastructure
- EU at the forefront in the areas of LRs and standards in the 90s
- Established a model

4 Today: a broad potential Infrastructure
Initiatives at EU, international and national level: RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, …, LIRICS, ELRA, BLARK, Unified Lexicon (W/S), LREC, LRE journal, …, ERANET-LangNet, …, LDC & others, ISO, COCOSDA/WRITE, US Cyberinfrastructure, Japan COE21, NEDO, …
- Cooperative initiatives, links to … CLARIN (ESFRI proposal)
- Vitality & success signs … for LRs

5 WordNets: synsets linked by semantic relations
Example synset {casa, abitazione, dimora}:
- Hyperonym: {edificio, ..}
- Hyponym: {villetta} {catapecchia, bicocca, ..} {cottage} {bungalow}
- Role_location: {stare, abitare, ...}
- Role_target_direction: {rincasare}
- Role_patient: {affitto, locazione}
- Mero_part: {vestibolo} {stanza}
- Holo_part: {casale} {frazione} {caseggiato}
- Linked English synsets: {home, domicile, ..} {house}
- TOP Concepts: Object, Artifact, Building
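The slide models a wordnet as synsets connected by typed semantic relations. The following is a minimal Python sketch of that data structure, not the ItalWordNet storage format: the class `WordNetGraph`, the `INVERSE` table and the lower-case relation names are hypothetical, and only a few links from the casa example are encoded. Each relation with a known inverse is stored in both directions, so a hyperonym link automatically yields the matching hyponym link.

```python
from collections import defaultdict

# Hypothetical inverse pairs assumed for this sketch (hyperonymy, meronymy).
INVERSE = {
    "hyperonym": "hyponym",
    "hyponym": "hyperonym",
    "mero_part": "holo_part",
    "holo_part": "mero_part",
}


class WordNetGraph:
    """Synsets (frozensets of lemmas) linked by labelled semantic relations."""

    def __init__(self):
        # edges[synset][relation] -> set of target synsets
        self.edges = defaultdict(lambda: defaultdict(set))

    def link(self, source, relation, target):
        self.edges[source][relation].add(target)
        inverse = INVERSE.get(relation)
        if inverse:  # keep the reverse direction in sync when one is defined
            self.edges[target][inverse].add(source)

    def related(self, synset, relation):
        return self.edges[synset][relation]


def synset(*lemmas):
    return frozenset(lemmas)


casa = synset("casa", "abitazione", "dimora")
edificio = synset("edificio")
villetta = synset("villetta")
stanza = synset("stanza")

wn = WordNetGraph()
wn.link(casa, "hyperonym", edificio)
wn.link(casa, "hyponym", villetta)
wn.link(casa, "mero_part", stanza)  # stanza is a part of casa
wn.link(casa, "role_patient", synset("affitto", "locazione"))  # no inverse stored

print(sorted(next(iter(wn.related(edificio, "hyponym")))))
# ['abitazione', 'casa', 'dimora']
```

Storing frozensets keeps synsets hashable, so they can serve both as graph keys and as relation targets.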

6 Terminological Wordnets: e.g. Jur-WordNet
- Extension of ItalWordNet for the juridical domain (with ITTIG-CNR - Istituto di Teoria e Tecniche dell'Informazione Giuridica)
- Knowledge base for multilingual access to sources of legal information
- Source of metadata for semantic markup of legal texts
- To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

7 PAROLE-SIMPLE-CLIPS Lexicon: … a harmonised model for 12 European languages

8 Top: Formal, Constitutive, Agentive, Telic
- Formal: Is_a
- Constitutive: Is_a_part_of, Property, Contains, …
- Agentive: Created_by, Agentive_cause, …
- Telic: Indirect_telic, Purpose, Instrumental, Is_the_habit_of, Used_for, Used_as, …
The targets of relations identify:
- prototypical semantic information associated with a SemU
- elements of dictionary definitions of SemUs
- typical corpus collocates of the SemU
100 relations (…, Activity, …)
For a BioLexicon

9 mangiare: Domain - Semantic class

10 mangiare: Domain - Semantic class (network of related semantic units)
- Used_for: tavola (FURNITURE), forchetta, posata (INSTRUMENT), ristorante (BUILDING)
- Object_of_the_activity (TELIC): coniglio, carne, mela, carota, arrosto (ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD, +edible); zucchero, alloro, tartufo (VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE)
- Also linked: mestolo, pentola (CONTAINER) with cucinare, cuocere; friggitrice with friggere; bollitore with bollire; pesciera with pesce
- Is_the_activity_of: cuoco (PROFESSION), cucinare
- Created_by (AGENTIVE): cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …
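Read as data, the network above is a set of (source, relation, target) links between semantic units, each unit carrying a semantic type. A minimal sketch, assuming a flat triple list hand-copied from a few of the slide's links (the `TRIPLES` and `SEMANTIC_TYPE` tables and both functions are hypothetical, not the SIMPLE/CLIPS database schema), shows how such a lexicon can answer a query like "which things are used for eating, grouped by type?":

```python
# A few links from the mangiare network as (source, relation, target) triples.
# Hand-made illustration; not the SIMPLE/CLIPS storage format.
TRIPLES = [
    ("forchetta", "Used_for", "mangiare"),
    ("tavola", "Used_for", "mangiare"),
    ("ristorante", "Used_for", "mangiare"),
    ("pentola", "Used_for", "cucinare"),
    ("arrosto", "Object_of_the_activity", "mangiare"),
    ("arrosto", "Created_by", "arrostire"),
]

# Semantic types taken from the slide (hypothetical lookup table).
SEMANTIC_TYPE = {
    "forchetta": "INSTRUMENT",
    "tavola": "FURNITURE",
    "ristorante": "BUILDING",
    "pentola": "CONTAINER",
    "arrosto": "ARTIFACT_FOOD",
}


def sources(relation, target):
    """All units linked to `target` through `relation`, sorted alphabetically."""
    return sorted(s for s, r, t in TRIPLES if r == relation and t == target)


def by_type(words):
    """Group words by semantic type; unknown words go under UNTYPED."""
    groups = {}
    for w in words:
        groups.setdefault(SEMANTIC_TYPE.get(w, "UNTYPED"), []).append(w)
    return groups


print(by_type(sources("Used_for", "mangiare")))
# {'INSTRUMENT': ['forchetta'], 'BUILDING': ['ristorante'], 'FURNITURE': ['tavola']}
```

A linear scan is enough at this size; a real lexicon would index triples by relation and target.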

11 These dimensions could be at the basis of a new Paradigm for LRs & LT & of a new Infrastructure:
- Dynamic LRs
- Sharing
- Collaborative creation & management
- Content interoperability
- + Distributed architectures
Tools are needed; the technology exists.
In the 90s there was a global vision of the field & its main components: Standards, Creation of LRs, Automatic acquisition, Distribution.
Today the wealth of data & basic technology is such that we should reflect again on the field as a whole & ask whether these are still the important components, or how they have changed/must change …
Which new challenges for a mature infrastructure of LRs & LT?

12 Challenges & Priorities for LRs, with technological and/or organisational/political aspects
- Basic LR coverage for all languages (BLARK/ELARK)
- Specific (new) types of LRs: opinion, sentiment, emotion, subjectivity
- Example-based, context-sensitive LRs; Lexicon & Corpus together, dynamically created; new ways to extract value from large linguistic repositories: the Web exploited as a multilingual corpus
- Tools to quickly develop LRs (acquisition, annotation, porting between domains/languages); coordinate the development of LTs & LRs (also across languages)
- Knowledge transfer across languages; maintenance of LRs
- Cooperation between the HLT and Semantic Web/Ontologist communities
- 'Open Source' concept for LRs & LT; open & distributed architectures for LRs and LT; wiki-mode? Collaborative Infrastructures; Interoperability & Standards
- GRID technology
- …
Multilinguality; Unifying frameworks

13 LT & new topics
- Subjectivity, opinion, sentiment, emotion: an orthogonal issue wrt objective content. Detecting and separating subjective from objective content, opinion mining and extracting positive & negative perceptions have an obvious and big impact in many applications, e.g. business intelligence
- Commonsense understanding, with major implications:
  - allows commonsense reasoning/inference (plausible vs logical) for fail-soft applications
  - can be pursued in a distributed and collaborative fashion by the community as a whole
  - relates to how an agent might put together SW services to accomplish high-level goals for the user
- Temporal structure, for which de facto standards are emerging (TimeML)
- Multimodality: integration of text, speech and gesture
- Strategies for handling miscommunication
- Hybrid approaches, interdisciplinary approaches
- …

14 Natural convergence with HLT:
- multilingual semantic processing
- ontologies
- semantic-syntactic computational lexicons
In the Semantic Web vision … we need to tackle the twofold challenge of content availability & multilinguality

15 Issues in the LR & LT research agenda converging with Semantic Web needs
From LT:
- Meaning & content, Knowledge
- Semantic markup: concept-based text representation
- Semantic lexicons/Terminologies/Ontologies
to create a web of metadata, i.e. to add meaning to Web data & make it usable for processing and mining, to add spatial & temporal metadata, …
Vice versa, from SW:
- LRs as web services
- Ontologies for LRs & LT
- Collaborative & distributed infrastructure; open access
- Interoperability & standards

16 Computational Lexicons: challenges from the Semantic Web
The Semantic Web vision: turning the WWW into a machine-understandable knowledge base
Components: Ontologies, Knowledge Markup, Intelligent Agents, Applications, Documents, Databases, Computational Lexicons, Linguistic Markup

17 Language/s, Ontologies and Computational Lexicons
Concept Space, Ontology, Computational Lexicon (Semantics, Syntax, Morphology), Multilinguality; polysemy, context-sensitiveness, etc.

18 Ontology Learning (TL+ML)
- Term extraction from text: {museo, quadro, pinacoteca, biblioteca, sito_archeologico, museo_archeologico, museo_etrusco, scultura, affresco, …}
- Conceptual clustering of terms:
  C_MUSEO: {museo, pinacoteca, …}
  C_MUSEO_ARCHEOLOGICO: {museo_archeologico, museo_etrusco, …}
  C_OPERA_ARTISTICA: {quadro, scultura, affresco, …}
- Concept structuring: C_MUSEO_ARCHEOLOGICO is_a C_MUSEO → Ontology
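The last step, concept structuring, can be approximated by a simple lexical heuristic: an Italian multiword term such as museo_archeologico is head-initial, so its head (museo) points to a more general concept. The sketch below takes the clusters shown on the slide as given input and derives is_a links from head words only; it is an illustrative stand-in, not the method of the system described, and the function names are hypothetical.

```python
# Clusters as listed on the slide; the clustering step itself is not reimplemented.
CLUSTERS = {
    "C_MUSEO": ["museo", "pinacoteca"],
    "C_MUSEO_ARCHEOLOGICO": ["museo_archeologico", "museo_etrusco"],
    "C_OPERA_ARTISTICA": ["quadro", "scultura", "affresco"],
}


def head(term):
    """Head word of an underscore-joined Italian term (head-initial: noun first)."""
    return term.split("_")[0]


def structure_concepts(clusters):
    """Propose (child, 'is_a', parent) when a multiword term in `child`
    has a head word that is itself a term of a different concept `parent`."""
    term_to_concept = {t: c for c, terms in clusters.items() for t in terms}
    links = set()
    for concept, terms in clusters.items():
        for term in terms:
            if "_" not in term:
                continue  # single words carry no head/modifier split
            parent = term_to_concept.get(head(term))
            if parent is not None and parent != concept:
                links.add((concept, "is_a", parent))
    return sorted(links)


print(structure_concepts(CLUSTERS))
# [('C_MUSEO_ARCHEOLOGICO', 'is_a', 'C_MUSEO')]
```

The heuristic only finds links signalled by the term's form; C_OPERA_ARTISTICA, whose terms are single words, gets no parent, which is why real ontology learning combines such cues with corpus evidence.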

19 Ontology Learning in T2K: from thesaurus to conceptual map
Identification of horizontal relations among terms through the events which best characterise them (events: situations involving domain entities)
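One way to picture event-based horizontal relations: given clauses in which two domain terms appear as arguments of the same verb, label the pair with the event that links them most often. The sketch below assumes hand-written (verb, argument, argument) triples in place of real parser output, and the raw-frequency scoring is an assumption, not T2K's documented procedure; `horizontal_relations` is a hypothetical name.

```python
from collections import Counter, defaultdict

# Hypothetical parser output: (event verb, first argument, second argument).
EVENTS = [
    ("ospitare", "museo", "scultura"),
    ("ospitare", "museo", "affresco"),
    ("esporre", "museo", "scultura"),
    ("ospitare", "museo", "scultura"),
    ("restaurare", "restauratore", "affresco"),
]

DOMAIN_TERMS = {"museo", "scultura", "affresco", "restauratore"}


def horizontal_relations(events, terms):
    """For each ordered pair of domain terms, keep the most frequent linking event."""
    counts = defaultdict(Counter)
    for verb, a, b in events:
        if a in terms and b in terms:
            counts[(a, b)][verb] += 1
    return {pair: c.most_common(1)[0][0] for pair, c in counts.items()}


for (a, b), verb in sorted(horizontal_relations(EVENTS, DOMAIN_TERMS).items()):
    print(f"{a} --{verb}--> {b}")
```

For museo and scultura, ospitare (2 occurrences) wins over esporre (1), so the pair is labelled with the event that characterises it best in this toy corpus.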

20 For Applications: Semantic/Conceptual Annotation of Texts
Reference Lexical Resources, Tools for terminology extraction, Tools for annotation of the logical structure, Module of analysis of Italian → Structured Knowledge, LOGICAL FORM

21 Semantic Web and LT & LRs: Content; Interoperable LRs & LT; Language Tech … & … Knowledge, Content; Knowledge Markup
Ready? How to cooperate? Hum&SS (Humanities & Social Sciences)

22 A new paradigm of R&D in LRs & LT
- Open & distributed linguistic infrastructures for LRs & LT
- adopting the paradigm of accumulation of knowledge, so successful in more mature disciplines, based on sharing LRs & tools
- ability to build on each other's achievements, with results accessible to various systems, allowing controlled & effective cooperation of many groups on common tasks (see HGP, HLP)
- Emerging concept of collective intelligence
- Emphasize interoperability among LRs, LT & knowledge bases
- e.g. initiatives aimed at achieving international consensus on annotation guidelines, to merge annotation efforts and produce coherent, comprehensive linguistic annotations that can be readily disseminated throughout the community

23 ISO & LIRICS: Meta-model & Data Categories
E.g. a proposal for an ISO standard for NLP lexica:
- Define a Lexical Markup Framework, a general & abstract meta-model & a set of structural nodes relevant for linguistic description
- Define a flexible environment enabling specific implementations of user-defined mark-up languages (called LML) on the basis of common DCs (Data Categories)
Objectives:
- Design of the abstract lexical meta-model
- Definition of the common set of related Data Categories
The field is mature. It also builds on EAGLES/ISLE.
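To make the two objectives concrete, here is a minimal sketch of an abstract lexical meta-model (nested lexicon, entry and sense containers) in which every attribute/value pair must come from a shared set of Data Categories. The class names echo those of the Lexical Markup Framework as later standardised in ISO 24613, but the registry contents, the validation rule and the example entry are invented for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical shared Data Category registry: name -> allowed values
# (None means any string value is accepted).
DATA_CATEGORIES = {
    "writtenForm": None,
    "partOfSpeech": {"noun", "verb", "adjective"},
    "grammaticalGender": {"masculine", "feminine"},
    "definition": None,
}


def check_features(feats):
    """Reject attributes or values not declared in the shared registry."""
    for att, val in feats.items():
        if att not in DATA_CATEGORIES:
            raise ValueError(f"unknown data category: {att}")
        allowed = DATA_CATEGORIES[att]
        if allowed is not None and val not in allowed:
            raise ValueError(f"invalid value {val!r} for {att}")
    return feats


@dataclass
class Sense:
    feats: dict

    def __post_init__(self):
        check_features(self.feats)


@dataclass
class LexicalEntry:
    feats: dict
    senses: list = field(default_factory=list)

    def __post_init__(self):
        check_features(self.feats)


@dataclass
class Lexicon:
    language: str
    entries: list = field(default_factory=list)


casa = LexicalEntry(
    {"writtenForm": "casa", "partOfSpeech": "noun", "grammaticalGender": "feminine"},
    senses=[Sense({"definition": "edificio destinato ad abitazione"})],
)
lexicon = Lexicon("it", [casa])
```

The structural classes stay generic while all linguistic content flows through the registry, so two user-defined mark-up languages built on the same registry remain comparable.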

24 MILE Lexical Model: Data Categories for Content Interoperability
Components: MILE Entry Schema, MILE Lexical Classes, User Defined, MDC Registry, RDF/S Descriptions, Monolingual/Multilingual Lexicon, Multilingual ISLE Lexical Entry
Related: ISO TC37 SC4/WG4, LIRICS, NEDO

25 Beyond MILE: towards an open & distributed Lexicon Infrastructure … towards the Semantic Web
Monolingual/Multilingual Lexicons linked by URI to shared resources:
- Lex_object: semFeature URI = …; Semantic Lexicon URI = …
- Lex_object: syntagmaNT URI = …; Syntactic Constructions URI = …
- Ontology URI = …
- Corpora/Web, Language Knowledge
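The core idea of the slide is that lexicon objects refer to shared semantic, syntactic and ontological resources by URI instead of copying them. A minimal sketch of that indirection follows; the slide leaves the actual URIs unspecified, so the `example.org` base URIs, the registry contents, the `mela` entry and the `resolve` function are all hypothetical.

```python
# Hypothetical shared resources, each published under its own base URI.
# example.org stands in for the URIs the slide leaves unspecified.
REGISTRIES = {
    "http://example.org/semlex/": {"semFeature/edible": {"label": "+edible"}},
    "http://example.org/syntax/": {"syntagmaNT/NP_obj": {"label": "NP object"}},
    "http://example.org/ontology/": {"Food": {"label": "FOOD"}},
}


def resolve(uri):
    """Find the registry whose base URI prefixes `uri` and look up the rest."""
    for base, objects in REGISTRIES.items():
        if uri.startswith(base):
            return objects.get(uri[len(base):])
    return None  # no registry publishes this URI


# A lexicon entry stores only URIs, not copies of the shared objects.
mela_entry = {
    "lemma": "mela",
    "semFeature": "http://example.org/semlex/semFeature/edible",
    "ontologyClass": "http://example.org/ontology/Food",
}

for slot in ("semFeature", "ontologyClass"):
    print(slot, resolve(mela_entry[slot]))
```

Because entries hold references, an update to a shared feature or ontology class is visible to every lexicon that points to it, which is what makes a distributed lexicon infrastructure maintainable.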

26 Lexical WEB & Standards for Content Interoperability … still open, as a critical step for semantic mark-up in the SemWeb
ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y … linked through MILE, with intelligent agents
Standards for Interoperability

27 Open distributed architectures for LRs and LT, interoperability, GRID technology, … & standards
Towards: large online open-source collaborative projects
e-Science:
- GRID technology for large-scale distributed collaborative processing of huge quantities of facts & their relations (development of large-scale annotated LRs, linking them across different sources, …)
- the problem of how to coordinate different information sources
- new ways of extending large-scale LRs and knowledge bases relying on volunteer labour, wiki-mode?
- interoperability

28 Tools are needed to make this vision operational & concrete
E.g. a new prototype built in Pisa: LeXFlow, a web-based collaborative environment for semi-automatic management of lexical resources. It is intended to fulfil the requirements posed by innovative types of LRs by supporting:
- Dynamic language resources, integrating tools for automatic acquisition of information from corpora and cross-fertilization of lexicons
- Content interoperability of resources, by supporting ISLE/ISO standards
- Cooperative & collective creation and management of LRs, by providing a web-based environment for the collaboration and interaction of distributed agents and resources

29 Why an infrastructure of LRs?
Because what is special in language data, compared with the hard sciences, is also what makes it more difficult: language and its ambiguity.
Already in the ENABLER Mission: availability of LRs is also a sensitive issue, touching the sphere of linguistic & cultural identity, but also with economic & political implications.
Putting together technical, organisational, strategic, political issues of LRs

30 Why an infrastructure of LRs? Many dimensions around the notion of language
- Cultural issues: Language … and cultural identity; Language … and the Humanities
- Economic, social issues: Applications, Services
- Technical issues: Interdisciplinarity & Multidisciplinarity
- Political issues: e.g. a commonly agreed list of minimal requirements for national LRs (BLARK)
- Multilingualism
Bodies are needed for a broad research agenda & strategic actions for LT & LRs (W/S/MM).
Putting together technical, organisational, strategic, political issues of LRs

31 Which Communities?
- Core: Language Resources, Language Technology, Standardisation, Grid, Semantic Web, Ontologists, ICT, …
- For: Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …, many application domains (eCulture, eGovernment, eHealth, …)
An enabling infrastructure, with Multilinguality at its core and a focus on cooperation.
Technologies exist, but the infrastructure that puts them together and sustains them is still missing.

