Nicoletta Calzolari, Istituto di Linguistica Computazionale - CNR - Pisa. Nijmegen, August 2010.

1 Nicoletta Calzolari, Istituto di Linguistica Computazionale - CNR - Pisa. Nijmegen, August 2010

2 After the “Grosseto Workshop” (1985): a turning point
MultiLex, GeneLex, AcquiLex, Xxx-Lex
A. Zampolli: Let’s be coherent: Xxx-Lex

3 Reusability
Reusability as key concept → true also today
  To avoid duplication of efforts, costs, etc.
  To allow synergies, integration, exchange of data, ...
  To provide a model for new data creation & acquisition
Decide on “feasible” areas & state priorities → this is changing over time
The feasibility of formulation of consensual standards as a strong sign of maturity in the field → we can’t propose standards if there are not enough results on which to base them
EAGLES was launched in ‘93
Key issues: Do conditions exist for standardisation effort?

4 Main Results in Lexicon & Corpus WGs First Phase (
Standard for morphosyntactic encoding of lexical entries, in a multi-layered structure, with applications for all the EU languages
Standard for subcategorisation in the lexicon: a set of standardised basic notions using a frame-based structure
Proposal for a basic set of notions in lexical semantics: focus on requirements of Information Systems and MT
Corpus Encoding Standard (CES) from TEI
Standard for morphosyntactic annotation of corpora, to ensure compatibility/interchangeability of concrete annotation schemata
Preliminary recommendations for syntactic annotation of corpora
Dialogue annotation, for integration of written and spoken annotation

5 Content vs. Format/Representation
In LMF: on the abstract meta-model

6 Flexibility in the Recommendations, e.g. Morphosyntax

Level   Information Type                          Recommendation
L-0     Part-of-Speech                            Obligatory
L-1     Morphosyntactic agreement features        Recommended
L-2     Language-specific (or refined) features   Optional
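The three recommendation levels can be read as a validation policy: a missing L-0 feature is an error, a missing L-1 feature is only a warning, and L-2 features are never required. The following is a minimal sketch of that reading. The feature names and their assignment to levels (`gender`, `number`, `case`, `definiteness`) are assumptions made for illustration, not taken from the EAGLES recommendations.

```python
from enum import IntEnum


class Level(IntEnum):
    """Recommendation levels named on the slide."""
    L0_OBLIGATORY = 0    # Part-of-Speech
    L1_RECOMMENDED = 1   # morphosyntactic agreement features
    L2_OPTIONAL = 2      # language-specific (or refined) features


# Hypothetical feature-to-level assignment, for illustration only.
FEATURE_LEVEL = {
    "pos": Level.L0_OBLIGATORY,
    "gender": Level.L1_RECOMMENDED,
    "number": Level.L1_RECOMMENDED,
    "case": Level.L1_RECOMMENDED,
    "definiteness": Level.L2_OPTIONAL,
}


def check_entry(entry):
    """Return (errors, warnings) for one entry given as a feature dict."""
    errors, warnings = [], []
    for feature, level in FEATURE_LEVEL.items():
        if feature in entry:
            continue
        if level is Level.L0_OBLIGATORY:
            errors.append(f"missing obligatory feature: {feature}")
        elif level is Level.L1_RECOMMENDED:
            warnings.append(f"missing recommended feature: {feature}")
        # L-2 features are optional, so their absence is never reported.
    # Features outside the scheme are flagged but do not invalidate the entry.
    unknown = sorted(set(entry) - set(FEATURE_LEVEL))
    warnings.extend(f"feature not in scheme: {name}" for name in unknown)
    return errors, warnings


errors, warnings = check_entry({"pos": "noun", "gender": "f", "number": "sg"})
print(errors, warnings)
```

Treating only L-0 as a hard requirement is what lets a concrete annotation scheme stay conformant while covering different amounts of detail.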

7 MERITS → Strengths (from EAGLES-ISLE)

8 Why Standards for Language Resources? (from EAGLES-ISLE)
→ important for workflows
→ essential for a LR Infrastructure
→ for evaluation campaigns

9 Applications: requirements for systems & enabling technologies
Machine Translation, Information Extraction, Information Retrieval, Summarisation, Natural Language Generation, Word Clustering, Multiword Recognition + Extraction, Word Sense Disambiguation, Proper Noun Recognition, Parsing, Coreference, …

10 The Multilingual ISLE Lexical Entry (MILE)

11 MILE – Modularity: The building-block model
[Diagram labels: Lexical Objects (syntactic frame, phrase, slot, Syn feature, Sem feature); Lexical entry 1, Lexical entry 2, Lexical entry 3]
Allow to express different dimensions of lexical entries
Enable modular specification of lexical entries
Create ready-to-use packages to be combined in different ways
Lexical Classes as the main building blocks of the lexical architecture
→ Done in LMF
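The building-block idea can be sketched as shared lexical objects that several entries point to instead of copying. This is a minimal illustration only: the class names `Slot`, `SyntacticFrame` and `LexicalEntry`, their fields, and the sample lemmas are assumptions, not the MILE object inventory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Slot:
    function: str   # e.g. "subject"
    phrase: str     # e.g. "NP"


@dataclass(frozen=True)
class SyntacticFrame:
    frame_id: str
    slots: tuple    # tuple of Slot objects


@dataclass
class LexicalEntry:
    lemma: str
    syn_features: dict = field(default_factory=dict)
    sem_features: dict = field(default_factory=dict)
    frames: list = field(default_factory=list)


# One pre-built frame object, defined once and reused by several entries.
TRANSITIVE = SyntacticFrame(
    "frame_transitive",
    (Slot("subject", "NP"), Slot("object", "NP")),
)

lexicon = [
    LexicalEntry("eat", {"pos": "verb"}, {"type": "EVENT"}, [TRANSITIVE]),
    LexicalEntry("build", {"pos": "verb"}, {"type": "EVENT"}, [TRANSITIVE]),
    LexicalEntry("dog", {"pos": "noun"}, {"type": "ANIMAL"}),
]


def entries_using(frame, entries):
    """List the lemmas of all entries that reference the given frame."""
    return [entry.lemma for entry in entries if frame in entry.frames]


print(entries_using(TRANSITIVE, lexicon))
```

Because the frame is a single shared object, a correction to it reaches every entry that uses it, which is the practical point of ready-to-use packages combined in different ways.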

12 The MILE Data Categories: User-adaptability and extensibility
[Diagram labels: HUMAN, ARTIFACT, EVENT, ANIMAL, GROUP, AGE, MAMMAL; instance_of; Core; UserDefined; MLC:SemanticFeature]
→ OK in ISOCat

13 MILE Lexical Data Category Registry
A library of pre-instantiated objects → DC Selections
→ To be done … in ISOCat

14 ISO - LMF: Lexical Markup Framework

15 ISO LMF
Structural skeleton, with the basic hierarchy of information in a lexical entry + various extensions → Modular framework
LMF specs comply with modelling UML principles → an XML DTD allows implementation
Builds on EAGLES/ISLE
The field is mature
[Diagram labels: NEDO AsianLang; NICT Language-Grid Service Ontology; ICT KYOTO; LIRICS; LexInfo; New initiatives …]
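To make the "structural skeleton" concrete, the sketch below builds a very small LMF-style XML fragment with Python's standard library. It assumes the `feat att="…" val="…"` convention used in LMF XML serialisations; the attribute names (`language`, `partOfSpeech`, `writtenForm`), the sense identifiers and the sample word are illustrative assumptions, and the fragment is simplified rather than validated against the DTD.

```python
import xml.etree.ElementTree as ET


def feat(parent, att, val):
    """Attach one attribute-value pair as a <feat> child element."""
    return ET.SubElement(parent, "feat", att=att, val=val)


def build_entry(lemma, pos, sense_ids):
    """Build a LexicalEntry with its Lemma and one Sense per identifier."""
    entry = ET.Element("LexicalEntry")
    feat(entry, "partOfSpeech", pos)
    lemma_el = ET.SubElement(entry, "Lemma")
    feat(lemma_el, "writtenForm", lemma)
    for sense_id in sense_ids:
        ET.SubElement(entry, "Sense", id=sense_id)
    return entry


# Core hierarchy: LexicalResource > Lexicon > LexicalEntry > Lemma / Sense.
resource = ET.Element("LexicalResource")
lexicon = ET.SubElement(resource, "Lexicon")
feat(lexicon, "language", "en")
lexicon.append(build_entry("bank", "noun", ["bank_1", "bank_2"]))

xml_text = ET.tostring(resource, encoding="unicode")
print(xml_text)
```

Extensions (morphology, syntax, semantics, multilingual notations) would hang further elements off the same skeleton, which is what makes the framework modular.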

16 Barcelona, IEC, 7-8 July 2009, Monica Monachini
[Note on slide: insert a PAROLE entry in LMF-compliant XML]

17 DCR (from Monica Monachini, Barcelona, IEC, 7-8 July 2009)

18 Mapping experiment (from Monica Monachini)
Major best practices:
  OLIF
  PAROLE/SIMPLE
  LC-Star (Speech Lexicon)
  WordNet - EuroWordNet
  FrameNet
  BDef: formal database of lexicographic definitions derived from Explanatory Dictionary of Contemporary French

19 BioLexicon: SIMPLE model & ISO-LMF standard (from Monica Monachini)
A unique large-scale computational lexicon in the biomedical domain in terms of coverage & typology of information
Populated with info from available biomedical resources
Semi-automatically populated from corpora: population toolkit available
Including both domain-specific & general language words
Rich linguistic information ranging over different linguistic description levels
Conformant to international lexical representation standards
Designed to meet Bio-Text Mining requirements

20 The BioLexicon: why

21 KYOTO (ICT-211423): the lexical resource perspective

22 KYOTO SYSTEM (from Piek Vossen)
[Diagram labels: Source Documents; Linear MAF/SYNAF; Linear SEMAF; Linear Generic FACTAF; Generic TMF; Term extraction (Tybot); Semantic annotation; Fact extraction (Kybot); Domain editing (Wikyoto); Wordnet, Domain Wordnet (LMF API); Ontology, Domain ontology (OWL API); Concept User; Fact User]

23 A common representation format for WordNets
Wn IT, Wn EN, Wn EU, Wn NL, Wn JP, Wn CH, Wn ES
→ endow WordNet with a representation format allowing easy access, integration & interoperability among resources

24 [Class diagram, from Monica Monachini. Classes shown: LexicalResource, GlobalInformation, Lexicon, LexicalEntry, Lemma, Sense, Synset, Definition, Statement, SynsetRelations, SynsetRelation, MonolingualExternalRefs, MonolingualExternalRef, SenseAxes, SenseAxis, InterlingualExternalRefs, InterlingualExternalRef, Meta; Data Categories. Multiplicities shown on the links: 1..1, 0..1, 0..*, 1..*]
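One way to read the class diagram is as a containment hierarchy that an XML instance can be checked against. The sketch below does this with a parent-to-children map. The nesting in `ALLOWED_CHILDREN` is an assumption inferred from the class names on the slide (the transcript does not preserve which class contains which), so treat it as illustrative rather than as the WordNet-LMF DTD.

```python
import xml.etree.ElementTree as ET

# Assumed containment, inferred from the class names on the slide.
ALLOWED_CHILDREN = {
    "LexicalResource": {"GlobalInformation", "Lexicon", "SenseAxes"},
    "Lexicon": {"LexicalEntry", "Synset"},
    "LexicalEntry": {"Meta", "Lemma", "Sense"},
    "Sense": {"Meta", "MonolingualExternalRefs"},
    "Synset": {"Meta", "Definition", "SynsetRelations",
               "MonolingualExternalRefs"},
    "Definition": {"Statement"},
    "SynsetRelations": {"SynsetRelation"},
    "SynsetRelation": {"Meta"},
    "MonolingualExternalRefs": {"MonolingualExternalRef"},
    "SenseAxes": {"SenseAxis"},
    "SenseAxis": {"Meta", "InterlingualExternalRefs"},
    "InterlingualExternalRefs": {"InterlingualExternalRef"},
}


def nesting_errors(element, path=""):
    """Recursively collect child elements that the map does not allow."""
    here = f"{path}/{element.tag}"
    allowed = ALLOWED_CHILDREN.get(element.tag, set())
    errors = []
    for child in element:
        if child.tag not in allowed:
            errors.append(f"{child.tag} not expected under {here}")
        errors.extend(nesting_errors(child, here))
    return errors


sample = ET.fromstring(
    "<LexicalResource><GlobalInformation/><Lexicon>"
    "<LexicalEntry><Lemma/><Sense/></LexicalEntry>"
    "<Synset><Definition><Statement/></Definition></Synset>"
    "</Lexicon></LexicalResource>"
)
print(nesting_errors(sample))
```

The check covers nesting only; the multiplicities on the diagram (for example 1..* versus 0..1) would need a separate count per parent.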

25 A list of 85 semantic relations as a result of a mapping of the KYOTO WordNet grid: Inter-WN, Intra-WN

26 WordNet-LMF: Multilingual level - Cross-lingual Relations (from Monica Monachini)
[Diagram labels: SWN 09686541-n; IWN 00001251-n; WN3.0 13480848-n]
groups monolingual synsets corresponding to each other and sharing the same relations to English
specifies the type of correspondence
link to ontology/(ies)
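The cross-lingual level can be sketched as an axis object that groups corresponding synsets, records the type of correspondence and optionally links to an ontology. The class and field names below, the relation label `eq_synonym`, and the grouping of the three synset identifiers from the slide into a single axis are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SenseAxis:
    axis_id: str
    rel_type: str                                       # type of correspondence
    synsets: dict = field(default_factory=dict)         # wordnet label -> synset id
    ontology_refs: list = field(default_factory=list)   # links to ontology/(ies)


def counterparts(axes, wordnet, synset_id):
    """Return the synsets in other wordnets that share an axis with this one."""
    found = {}
    for axis in axes:
        if axis.synsets.get(wordnet) == synset_id:
            found.update(
                {wn: sid for wn, sid in axis.synsets.items() if wn != wordnet}
            )
    return found


# Hypothetical axis: grouping these three identifiers together is assumed.
axes = [
    SenseAxis(
        "sa_1",
        "eq_synonym",
        {"SWN": "09686541-n", "IWN": "00001251-n", "WN3.0": "13480848-n"},
    )
]

print(counterparts(axes, "IWN", "00001251-n"))
```

Keeping the correspondence on a separate axis leaves each monolingual wordnet untouched, so a new language can be added by extending axes rather than editing existing lexicons.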

27 Kyoto Knowledge Base
[Diagram labels: WnIT, WnEN, WnEU, WnNL, WnJP, WnCH, WnES, each paired with a Domain box; Ontology; Domain Ontology]

28 LMF and Named Entity Lexicon (from Monica Monachini)

29 Named Entity Lexicon (from Monica Monachini)
[Diagram labels: Wikip, LR, Onto]

30 LexInfo & Previous Models (from Paul Buitelaar)

31 LMF: ILC infrastructure

32 Desiderata for Semantic Roles (Martha Palmer)

33 Some steps for a “new generation” of LRs
From huge efforts in building static, large-scale, general-purpose LRs
To dynamic LRs rapidly built on-demand, tailored to specific user needs
From closed, locally developed and centralized resources
To LRs residing over distributed places, accessible on the web, choreographed by agents acting over them
→ From Language Resources To Language Services
BUT: Need of tools to make this vision operational & concrete

34 Lexical WEB & Content Interoperability
As a critical step for semantic mark-up in the SemWeb
[Diagram labels: ComLex, SIMPLE, WordNets, FrameNet, NomLex, Lex_x, Lex_y; LMF with intelligent agents; Global WordNet GRID; BioLexicon; SIMPLE-WEB]
Standards for Interoperability: Enough??

35 A new paradigm of R&D in LRs & LT: Distributed Language Services

36 A few Issues for discussion: “content”, guidelines, tools, priorities, ...
For Semantic Web & “content” interoperability: is the field ‘mature’ enough to converge also for the semantic/conceptual level (e.g. to automatically establish links among different languages)?
For the standards to have impact, ensure their usability & gain industry support focusing on requirements of industrial applications
To have Guidelines which are a “usable product” (to assist in creation or adaptation of lexicons, …)
Facilitate acceptance of the standards providing an open-source reference implementation platform & tools, related web services and test suites
Relation with Spoken language community
Define further steps necessary to converge on common priorities

37 Limits observed & needs of further work

38 Strengths

39 Future requirements & planning

40 FLaReNet Mission: structure the area of LR & LT of the future

41 Some results from FLaReNet Vienna Forum: International Cooperation
