MEANING: a Roadmap to Knowledge Technologies German Rigau. TALP Research Center. UPC. Barcelona. Bernardo Magnini. ITC-IRST. Povo-Trento.


1 MEANING: a Roadmap to Knowledge Technologies
German Rigau. TALP Research Center. UPC. Barcelona.
Bernardo Magnini. ITC-IRST. Povo-Trento.
Eneko Agirre. IXA group. EHU. Donostia.
Piek Vossen. Irion Technologies. Delft.
John Carroll. COGS. U. Sussex. Brighton.

2 Meaning Introduction
• Knowledge technologies (semantic web): make sense of petabytes of information
• Range of techniques to automate the knowledge lifecycle
  – Lexical KB (ontologies)
  – Text understanding (IE or other)
    → extract high-level meaning
    → represent and manage in a KB
• HLT to enable knowledge technologies

3 Meaning Introduction
• Building large and rich KBs by hand:
  – Expensive
    E.g. CYC, WordNet (EuroWordNet)
  – Introspection fails to reflect reality in texts and domains
    Is a "saint" an animate being? Not always: it can be an image.
  – Contradictions
  → Hamper applications of HLT and KT
• Richer KBs (ontologies)
  – Domain knowledge
  – Contradictory subsets
  → Semi-automatic means

4 Meaning Introduction
Crucial intermediate tasks
• Word Sense Disambiguation
  → From words to concepts (word sense ≈ concept in KB)
• Large-scale enrichment of (multilingual) Lexical KB
  → Enable semantic processing
Goal
→ Large-scale extraction of shallow meaning: relations among concepts

5 Meaning Shallow semantics
(Chirac) (invita) (al Dalai_Lama) (a un almuerzo oficial)
[Figure: concept graph with nodes Invite, s456, s378, s412 and s933, and edge labels act, source, object and destination]
(Chirac) (invites) (the Dalai_Lama) (to an official lunch)

6 Meaning Introduction
Crucial intermediate tasks
• Word Sense Disambiguation
• Large-scale enrichment of (multilingual) Lexical KB
Problems (research goals):
• Enriching LKBs, acquisition of linguistic knowledge:
  – Corpora need to be accurately tagged with concepts
• Accurate WSD needs:
  – Hand-tagged data OR a richer LKB
• Multilinguality:
  – Words in several languages linked to common concepts

7 Meaning Outline
• Major research goals
  – Knowledge acquisition into LKBs
  – WSD into LKB concepts
  – Multilingualism
• Meaning roadmap
• Overview of the project

8 Meaning Knowledge acquisition into LKBs
• Semi-automatic acquisition of linguistic knowledge from corpora is working
  – Subcategorization information
  – Selectional preferences
  – Thematic role assignments
  – Diathesis alternations
  – Domain information
  – Topic signatures
  – Rich lexico-semantic relations between words (dictionaries)
  – …
• Large bodies of text with (fast) shallow processors

9 Meaning Knowledge acquisition into LKBs
• Knowledge about words is not enough:
  – Verb senses have different selectional preferences, e.g. for the subject
    Example: "The car ate all the petrol" (WN)
  – Verb senses may have different subcat. frames
  – …
• Better to key into word senses: source corpora should be sense-tagged
  – Better reflect linguistic phenomena
  – Detect new senses
  – Cluster senses
  – Integrate easily into the multilingual LKB

10 Meaning WSD into LKB concepts
• Senseval-2 uses word senses (concepts) from WN 1.7
• No large-scale broad-coverage WSD system is available
• Accuracy around 60%-70% (V/A/N) when hand-tagged data is available
  – Use hand-tagged data to train ML systems
• Ng's estimate: 16 person-years (short)
• Promising research lines
  – Automatically create a training corpus using semantic relations in the LKB (WN)
  – Use untagged data to improve performance
  – Higher precision if more knowledgeable features are used (subcat, sel. preferences, domains)
  – Coarse grained: domain tagging / clusters of senses

11 Meaning Exploiting EWN Semantic Relations WSD

12 Meaning Exploiting EWN Semantic Relations
partido 1
  – Pero España puso al partido intensidad, ritmo y coraje.
  – El seleccionador cree que el partido de hoy contra Italia dará la medida de España.
  – El Racing no gana en su campo desde hace seis partidos.
partido 2
  – Todos los partidos piden reformas legales para TV3.
  – La derecha planea agruparse en un partido.
  – El diputado reiteró que ni él ni UDC, "como partido", han recibido dinero de Pellerols.

13 Meaning Exploiting EWN Semantic Relations
partido 1
  – Rivera pide el soporte de la afición para encarrilar las semifinales.
  – Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado.
  – El Racing ganó los cuartos de final en su campo.
partido 2
  – No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan.
  – Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político.
  – Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.
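The examples on slides 12 and 13 suggest how relations in the LKB can supply sense-tagged data: sentences that do not contain "partido" but do contain a related, less ambiguous expression can be filed under the matching sense (the match sense or the political-party sense). The following is a minimal Python sketch of that idea, not the project's tool. The `RELATIVES` table, the sense labels `partido_1` / `partido_2` and the function name are assumptions made for illustration and are not taken from EuroWordNet.

```python
import re
from collections import defaultdict

# Hypothetical LKB fragment: each sense of "partido" is linked to related
# expressions assumed to be unambiguous. Illustrative only, not EWN data.
RELATIVES = {
    "partido_1": ["semifinal", "semifinales", "cuartos de final"],
    "partido_2": ["partido político", "partidos políticos"],
}

SENTENCES = [
    "Rivera pide el soporte de la afición para encarrilar las semifinales.",
    "El Racing ganó los cuartos de final en su campo.",
    "Estas leyes fueron votadas gracias a un consenso general de los partidos políticos.",
    "Todos los partidos piden reformas legales para TV3.",  # no relative: left untagged
]

def collect_examples(sentences, relatives=RELATIVES):
    """File each sentence under every sense whose relatives it mentions."""
    examples = defaultdict(list)
    for sentence in sentences:
        lowered = sentence.lower()
        for sense, phrases in relatives.items():
            # Whole-word match so that "semifinal" does not fire inside other words.
            if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
                examples[sense].append(sentence)
    return dict(examples)

examples_by_sense = collect_examples(SENTENCES)
```

Sentences that mention no relative are simply left out, which trades coverage for cleaner training examples.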

14 Meaning Multilingualism
• Language diversity is a barrier
• Language diversity is helpful
  – Languages realize meaning in different ways
• Use the EuroWN multilingual architecture: the Interlingual Index (ILI) links translation equivalents via interlingual concepts:
  – head ↔ cabeza
  – head ↔ jefe
• Research on how linguistic knowledge behaves when ported to another language (e.g. subcat information)
  – Very important for resource-poor languages
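The role of the ILI can be illustrated with a small lookup table in which each interlingual concept lists the words that realize it in each language. This is only a sketch: the record identifiers (`ili-head-body`, `ili-head-chief`), the table layout and the function name are invented for the example and do not reflect the actual EuroWordNet data format.

```python
# Hypothetical interlingual index: record ids are invented for illustration.
ILI = {
    "ili-head-body": {"en": ["head"], "es": ["cabeza"]},
    "ili-head-chief": {"en": ["head"], "es": ["jefe"]},
}

def translation_equivalents(word, source, target, ili=ILI):
    """Return target-language words that share an ILI record with `word`,
    grouped by record so that different senses stay apart."""
    return {
        record_id: entry.get(target, [])
        for record_id, entry in ili.items()
        if word in entry.get(source, [])
    }

equivalents = translation_equivalents("head", "en", "es")
```

Grouping by record keeps the two senses of "head" separate, so a disambiguated occurrence maps to a single Spanish equivalent.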

15 Meaning Multilingualism
• Selectional preference for the object of the first sense of know:
  sense 1: know, cognize -- (be cognizant or aware of a fact or a specific piece of information; possess knowledge or information about;
  0,1128 0,1128 0,0615 0,0615 0,0535 0,0535 0,0389 0,0389 0,0307 0,0307
• In EuroWordNet (http://ixa.si.ehu.es)
  – antzeman_1, jakin_2 and ezagutu_1 in Basque
  – conocer_1 and saber_1 in Spanish
  – conèixer_1 and saber_1 in Catalan
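The slide implies that preferences acquired for English know_1 can be carried over to the linked senses in the other wordnets. A rough Python sketch of such porting follows. The sense names and the five weights come from the slide, but the class labels (`class_A` to `class_E`) are placeholders because the transcript does not preserve them, and the table layout and function name are assumptions.

```python
# Senses linked to English know_1 in the other wordnets, as listed on the slide.
ILI_LINKS = {
    "know_1": {
        "eu": ["antzeman_1", "jakin_2", "ezagutu_1"],
        "es": ["conocer_1", "saber_1"],
        "ca": ["conèixer_1", "saber_1"],
    }
}

# Weights from the slide; the class labels are PLACEHOLDERS (not in the source).
KNOW_1_OBJECT_PREFS = {
    "class_A": 0.1128,
    "class_B": 0.0615,
    "class_C": 0.0535,
    "class_D": 0.0389,
    "class_E": 0.0307,
}

def port_preferences(source_sense, prefs, links=ILI_LINKS):
    """Copy the preferences of `source_sense` to every linked (language, sense) pair."""
    ported = {}
    for lang, senses in links.get(source_sense, {}).items():
        for sense in senses:
            ported[(lang, sense)] = dict(prefs)  # independent copy per target sense
    return ported

ported = port_preferences("know_1", KNOW_1_OBJECT_PREFS)
```

Keys are (language, sense) pairs because Spanish and Catalan both have a sense named saber_1.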

16 Meaning MEANING roadmap
• Solutions have been tried in isolation, with relative success
• Combination for significant advances (which?)
• Web as corpus: the BNC (100 Mw) is small for many phenomena
• Incremental design:
  a) WSD using whatever knowledge is available at the time, for bootstrapping
  b) Acquisition of linguistic knowledge using the WSD available at the time (may discard low-accuracy examples)
  c) Integrating acquired knowledge into the Multilingual Central Repository and porting knowledge from one language to another
• Series of cycles: WSD0, WSD1, WSD2, ACQ0, ACQ1, ACQ2, PORT0, PORT1, PORT2
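The incremental design can be read as a loop in which each round runs steps a), b) and c) and feeds its output into the next round. The sketch below assumes the three steps are interleaved within each cycle (WSD0, ACQ0, PORT0, then WSD1, and so on), which the slide does not state explicitly. The `toy_*` functions, the confidence field and the 0.8 threshold are hypothetical stand-ins, not project components.

```python
def run_cycles(n_cycles, knowledge, wsd, acquire, port, min_confidence=0.8):
    """Run WSD -> ACQ -> PORT rounds, feeding each round's knowledge into the next."""
    log = []
    for i in range(n_cycles):
        tagged = wsd(knowledge)                      # a) WSD with current knowledge
        log.append(f"WSD{i}")
        # b) keep only reliable examples before acquiring knowledge from them
        reliable = [ex for ex in tagged if ex["confidence"] >= min_confidence]
        acquired = acquire(reliable)
        log.append(f"ACQ{i}")
        knowledge = port(knowledge, acquired)        # c) integrate and port
        log.append(f"PORT{i}")
    return knowledge, log

# Toy stand-ins (hypothetical) so the loop can be executed end to end.
def toy_wsd(knowledge):
    return [
        {"word": "partido", "sense": "partido_1", "confidence": 0.9},
        {"word": "partido", "sense": "partido_2", "confidence": 0.4},
    ]

def toy_acquire(examples):
    return {(ex["word"], ex["sense"]) for ex in examples}

def toy_port(knowledge, acquired):
    return knowledge | acquired

final_knowledge, steps = run_cycles(3, set(), toy_wsd, toy_acquire, toy_port)
```

Passing the three steps in as functions keeps the loop independent of any particular WSD or acquisition method, in line with the "whatever is available at the time" wording of the slide.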

17 Meaning Architecture
[Figure: the Multilingual Central Repository connected to the English, Spanish, Catalan, Basque and Italian EWN wordnets, each paired with a web corpus in its language; arrows labelled WSD, ACQ, UPLOAD and PORT link the components]

18 Meaning Project overview
• 3-year research project (started March 2002)
• M Euro
  – 2 contracted people per site
• Consortium
  – TALP, UPC (German Rigau)
  – ITC-IRST (Bernardo Magnini)
  – IXA, UPV/EHU (Eneko Agirre)
  – University of Sussex (John Carroll)
  – Irion Technologies (Piek Vossen)

19 Meaning Project results
• A tool set that uses the semantic knowledge in EWN to obtain automatically from the web large collections of examples for each word sense.
• A tool set for enriching EWN with knowledge acquired automatically from the web.
• A tool set for accurately selecting the senses of open-class words in the languages involved in the project.
• A Multilingual Central Repository to maintain compatibility between WordNets of different languages and versions, past and new.
• A semantically annotated corpus for each WordNet word sense, that is, a multilingual web corpus with semantically annotated corpora.
• Demonstration: CLIR, Q/A system.
The results of MEANING will be public and free for research.

20 Meaning Why now?
• Huge amounts of data: unreliable examples can be thrown out
• Syntactic dependencies with high enough accuracy
• Supervised WSD with high enough accuracy
  – Coarser grains, sense domain tagging
  – Bootstrapping
• Success coping with multilingualism:
  – Porting linguistic knowledge from one language to another using MT / comparable corpora
  – CLIR as good as monolingual IR


