Talp Research Center, UPC, Barcelona, Spain


Arabic WordNet: What has been done, what could we do, what should we do?
Horacio Rodríguez
Talp Research Center, UPC, Barcelona, Spain
horacio@lsi.upc.edu
http://lsi.upc.edu/horacio
NOOJ 2009

Index of the talk
  Introduction
    Ontologies
    Wordnets
  Arabic WordNet
    What has been done
    What could we do
    What should we do

Introduction
  Semantic components used in NLP applications:
    Ontologies
    Large-scale knowledge bases
  Need (or convenience) of developing wide-coverage, domain-independent lexico-conceptual ontologies:
    WordNet

Ontologies
  Ontologies represent static domain knowledge in a form that multiple knowledge agents can use efficiently.
  Acquiring domain knowledge for building ontologies is costly and time-consuming. For this reason, many methods and techniques have been developed to reduce that effort.

Ontologies
  What an ontology is:
    Studer et al, 1998
      An ontology is a formal explicit specification of a shared conceptualization.
    Gruber, 1993; Studer et al, 1998
      A conceptualization is an abstract, simplified view of the world represented for some purpose.
      An ontology is a description (formal specification) of a set of concepts and relationships for enabling knowledge sharing and reuse (to perform logical commitments).
      An ontology commitment is an agreement to use a vocabulary in a way that is consistent with respect to the theory specified by the ontology.

Lexico-Conceptual Ontologies

Ontologies
  The mapping between lexical items (words or multiwords) and concepts can be complex.
  Due to polysemy, most lexical items can be mapped to more than one concept.
  Due to synonymy, more than one word can be mapped to the same concept.
  The mapping is usually split into two steps: from words to word senses (i.e. different word meanings), and from word senses to concepts.
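The two-step mapping on this slide can be sketched in a few lines of Python. This is only an illustration: the words, sense numbers and concept names below are invented toy data, not entries from any real wordnet.

```python
# Hypothetical toy data: (word, sense number) -> concept identifier.
SENSE_TO_CONCEPT = {
    ("bank", 1): "financial_institution",
    ("bank", 2): "river_side",
    ("shore", 1): "river_side",
}


def senses_of(word):
    """Step 1: map a word to its word senses."""
    return sorted(key for key in SENSE_TO_CONCEPT if key[0] == word)


def concepts_of(word):
    """Step 2: map each word sense of the word to its concept."""
    return [SENSE_TO_CONCEPT[sense] for sense in senses_of(word)]


def words_for(concept):
    """Inverse view: all words that have a sense mapped to the concept."""
    return sorted({w for (w, _), c in SENSE_TO_CONCEPT.items() if c == concept})


print(concepts_of("bank"))      # polysemy: one word, two concepts
print(words_for("river_side"))  # synonymy: two words, one concept
```

Keeping the word sense as an explicit intermediate key is what lets polysemy and synonymy coexist in one table.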

Wordnets
  Princeton's English WordNet (Miller et al, 1990), (Fellbaum, 1998)
    Semantic information
    More than 123,000 words organised in 117,000 synsets (WN3.0)
    More than 235,000 relations between synsets
    Freely available: http://wordnet.princeton.edu/

Wordnets
  Princeton's English WordNet
    Lexicalised concepts (words, compounds, multiwords)
    Synset: synonym set (of words)
    Large semantic net connecting synsets: synonymy, antonymy, hyperonymy, hyponymy, meronymy, implication, causation ...
    Structure
      Noun hierarchy: depth ~12
      Verb hierarchy: depth ~3
      Adjectives/adverbs: not in a hierarchy, but in a star structure

Example of WN relations

Wordnets

Wordnets
  Beyond WN: EuroWordNet (Vossen 98)
    EU-funded project
    Integrated local wordnets in several languages:
      English: Sheffield
      Dutch: Amsterdam
      Italian: Pisa
      Spanish: UB, UPC, UNED
    http://www.hum.uva.nl/~ewn/

Wordnets

Wordnets
  EuroWordNet Architecture
    Core
      Inter-Lingual-Index (ILI)
      Top Concept Ontology (TCO)
      Domain Ontology (DO)
    Extensions
      Local wordnets
      Domain wordnets
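In EuroWordNet, local wordnets are connected to each other through the Inter-Lingual-Index rather than pairwise. A minimal sketch of that idea follows, assuming each local synset points to at most one ILI record. All synset and ILI identifiers are hypothetical.

```python
# Hypothetical identifiers: local synset id -> ILI record id, per language.
LOCAL_WORDNETS = {
    "en": {"en-dog-n-1": "ili-0001", "en-cat-n-1": "ili-0002"},
    "es": {"es-perro-n-1": "ili-0001", "es-gato-n-1": "ili-0002"},
    "nl": {"nl-hond-n-1": "ili-0001"},
}


def equivalents(synset_id, source, target):
    """Find target-language synsets sharing the source synset's ILI record."""
    ili_record = LOCAL_WORDNETS[source].get(synset_id)
    if ili_record is None:
        return []
    return sorted(
        local_id
        for local_id, record in LOCAL_WORDNETS[target].items()
        if record == ili_record
    )


print(equivalents("en-dog-n-1", "en", "es"))
print(equivalents("en-cat-n-1", "en", "nl"))  # no Dutch synset linked in this toy data
```

With this design, adding a new language only requires linking its synsets to the index, not to every other wordnet.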

Wordnets
  Beyond WN
    EWN2: German (GermaNet), French, Czech, Swedish, Estonian
    ITEM, CREL: Spanish, Catalan, Basque (UB, UPC)
    EuroTerm, Jur-Wordnet: extending EWN in particular domains
    Balkanet: extending EWN for the Balkan languages
    Hownet Chinese WN

Wordnets
  Macro ontologies based on WN:
    MCR
    Yago
    Omega

Arabic WordNet
  Funded by the USA REFLEX program (2005-2007)
  Partners:
    Universities: Princeton, Manchester, UPC (Barcelona), UB (Barcelona)
    Companies: Articulate Software, Irion

Arabic WordNet papers
  Introducing the Arabic WordNet Project (Black et al, 2006)
  Building a WordNet for Arabic (Elkateb et al, 2006)
  Arabic WordNet: Current State and Future Extensions (Rodríguez et al, 2008)
  Arabic WordNet: Semi-automatic Extensions using Bayesian Inference
  Automatically Extending NE coverage of Arabic WordNet using Wikipedia (AlKhalifa, Rodríguez, 2009)

Arabic WordNet
  Objectives
    10,000 synsets, including some amount of domain-specific data
    Linked to PWN 2.0, finally to PWN 3.0
    Linked to SUMO
    + 1,000 NE
    Manually built (or revised)
    Vowelized entries
    Including the root of each entry

Arabic WordNet
  Criteria for selecting synsets to be covered
    Connectivity
      As densely connected as possible
      Most of them connected to English WN counterparts
      The overall topology of both wordnets is expected to be similar
    Relevance
      Frequent and salient concepts
    Generality
      Synsets on the highest levels of WN

Arabic WordNet
  Approach described in 3rd GWC (Elkateb et al, 2006)
    Manually built
    2 lexicographic interfaces: Manchester, Barcelona
    Guided by automatically generated suggestions of <Arabic word, English synset> pairs coming from bilingual resources

Arabic WordNet
  Approach
    Covering EWN & Balkanet Base Concepts (BCs)
    Filling gaps
    Building Arabic-specific synsets
    Covering domain-specific synsets
    Adding NEs
    (Semi-)automatic extensions
      Heuristic-based
      Bayesian networks

Arabic WordNet
  Resources used
    LOGOS database of Arabic verbs: contains 944 fully conjugated Arabic verbs
    Bilingual (Arabic-English) dictionaries
      NMSU bilingual Arabic-English lexicon
      Salmoné
      University of Barcelona
      Effel
    Corpora
      Arabic GigaWord Corpus (from LDC)
      UN (2000-2002) bilingual Arabic-English Corpus (from LDC)

Arabic WordNet
  Representation
    Database (implemented in MySQL)
    Interchange format (XML)
  The database structure comprises four principal entity types: item, word, form and link.
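The slide names the four entity types but not their fields. The sketch below is one plausible reading, using Python dataclasses in place of MySQL tables: an item is a conceptual entity such as a synset, a word is a word sense attached to an item, a form carries lexical information such as the root, and a link is a typed relation between two items. Every field name, identifier and value here is an assumption made for illustration, not the actual AWN schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A conceptual entity, e.g. a synset (field names are assumed)."""
    item_id: str
    pos: str


@dataclass(frozen=True)
class Word:
    """A word sense attached to one item."""
    word_id: str
    item_id: str
    value: str


@dataclass(frozen=True)
class Form:
    """Lexical information about a word, e.g. its root."""
    word_id: str
    form_type: str
    value: str


@dataclass(frozen=True)
class Link:
    """A typed relation between two items."""
    link_type: str
    source_id: str
    target_id: str


# Hypothetical rows; the Arabic entry is given in a made-up transliteration.
items = [Item("awn-1", "n"), Item("pwn-1", "n")]
words = [Word("w-1", "awn-1", "kitab"), Word("w-2", "pwn-1", "book")]
forms = [Form("w-1", "root", "ktb")]
links = [Link("equivalence", "awn-1", "pwn-1")]


def words_in(item_id):
    """All word values attached to an item."""
    return [w.value for w in words if w.item_id == item_id]


def root_of(word_id):
    """The root recorded for a word, or None if there is none."""
    for form in forms:
        if form.word_id == word_id and form.form_type == "root":
            return form.value
    return None


def linked_items(item_id, link_type):
    """Targets of all links of the given type that start at item_id."""
    return [l.target_id for l in links
            if l.source_id == item_id and l.link_type == link_type]
```

For example, `words_in(linked_items("awn-1", "equivalence")[0])` follows the equivalence link from the Arabic item to its English counterpart and lists that item's words.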

AWN: What has been done
  Current (final? we hope not!) figures
  Up-to-date statistics: http://www.lsi.upc.edu/~mbertran/arabic/awn/query/sug_statistics.php
    Arabic synsets: 11270
    Arabic words: 23496
  DB content by POS:
    a: 661
    n: 7961
    r: 110
    v: 2538
  Named entities:
    Synsets that are named entities: 1142
    Synsets that are not named entities: 10028
    Words in synsets that are named entities: 1656
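A quick consistency check of the figures on this slide, written as a short Python sketch. The numbers are copied from the slide; the variable names are mine. The per-POS counts add up exactly to the synset total, while the named-entity split as transcribed adds up to 11170, which is 100 fewer. The slide gives no way to tell which figure is off.

```python
# Figures copied from the slide.
total_synsets = 11270
total_words = 23496
pos_counts = {"a": 661, "n": 7961, "r": 110, "v": 2538}
ne_synsets = 1142
non_ne_synsets = 10028
words_in_ne_synsets = 1656

# Derived sums and ratios.
pos_total = sum(pos_counts.values())
ne_split_total = ne_synsets + non_ne_synsets
words_per_synset = total_words / total_synsets
words_per_ne_synset = words_in_ne_synsets / ne_synsets
noun_share = pos_counts["n"] / total_synsets

print(pos_total == total_synsets)        # per-POS counts match the total
print(total_synsets - ne_split_total)    # gap in the named-entity split
print(round(words_per_synset, 2), round(words_per_ne_synset, 2))
print(round(100 * noun_share, 1))        # percentage of synsets that are nouns
```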

AWN: What has been done
  Software
    Lexicographer's Web Interface: http://www.lsi.upc.edu/~mbertran/arabic/awn/update/synset_browse.php
    User's Web Interface: http://www.lsi.upc.edu/~mbertran/arabic/awn/index.html
    The Arabic Word Spotter: http://www.lsi.upc.edu/~mbertran/arabic/wwwWn7/
    AWN browser: http://sourceforge.net/projects/awnbrowser/
    AWN to SUMO mapping, including automatic generation of Arabic paraphrases of SUMO formal axioms

AWN: What could we do
  AWN has relatively small coverage compared with PWN.
    But because of the way it was built, its coverage of the most important concepts is comparable.
  AWN has a lower density of relations than PWN.
    But many uses of PWN rely only on the hypernymy/hyponymy relation, and the coverage of this relation is similar in both WNs.

AWN: What could we do
  AWN is fully linked to PWN and, through PWN, to many other WNs.
  Existing WNs, especially PWN, are used today in almost all NLP tasks that need (or involve) semantic (lexical) knowledge.

AWN: What could we do
  So, the moral is: USE AWN FOR NLP TASKS
  For instance: look at Csomai's bibliography on the WN page
  NLP tasks: IR, IE, MT, WSD, Coreference Resolution, Summarization, Textual Entailment, WN mappings, NER, NERC, Language Models, Semantic distances

AWN: What should we do
  AWN is far from complete:
    Only 10,000 regular synsets
    Only 1,000 NE synsets
    Low density of relations
    Lack of appropriate APIs for interfacing with computer applications
  So, the moral is: IMPROVE AND EXTEND AWN

AWN: What should we do
  Lines of improvement:
    Extend AWN coverage
      Manual
      Semi-automatic
        Heuristic-based approach: GWC 2008 (Rodríguez et al, 2008a)
        Bayesian networks: LREC 2008 (Rodríguez et al, 2008b)
    Improve the relation density
      Finding new relations
      Manually revising relations existing for other languages (especially English)
      Using roots as a way of suggesting new relations

AWN: What should we do
  Lines of improvement:
    Building APIs to make the database easier to use (Perl, Python, Prolog, C, Java, ...), including the computation of semantic distances
    Extending the coverage of NEs
      E.g. using Wikipedia as a knowledge source: Citala 2009 (AlKhalifa, Rodríguez, 2009)
    Linking AWN with other already available resources: Wikipedia, CyC, Geonames, ...
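The slide asks for APIs that compute semantic distances but does not name a measure. As one possible starting point, the sketch below computes a simple path-length distance over the hypernymy relation, the relation that an earlier slide says is covered similarly in AWN and PWN. The hierarchy is invented toy data, and the sketch assumes each synset has at most one hypernym, which real wordnets do not guarantee.

```python
# Hypothetical toy hierarchy: synset -> its single hypernym.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
}


def ancestors(synset):
    """Return the chain from the synset up to the root, synset included."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_distance(a, b):
    """Count hypernymy edges between a and b via their lowest common ancestor.

    Returns None when the two synsets share no ancestor.
    """
    depth_from_a = {node: steps for steps, node in enumerate(ancestors(a))}
    for steps_from_b, node in enumerate(ancestors(b)):
        if node in depth_from_a:
            return depth_from_a[node] + steps_from_b
    return None


print(path_distance("dog", "cat"))
print(path_distance("dog", "canine"))
```

With multiple hypernyms per synset, the same idea would need a breadth-first search over the relation graph instead of a single ancestor chain.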

Thank you for your attention