Implementation of a QA system in a real context. Carlos Amaral (Priberam, Portugal), Dominique Laurent (Synapse Développement, France). Workshop TellMeMore, November 24, 2006.

Presentation transcript:

Implementation of a QA system in a real context Carlos Amaral (Priberam, Portugal) Dominique Laurent (Synapse Développement, France) Workshop TellMeMore, November 24, 2006, C.Amaral, D.Laurent

1. The Question-Answering system
What is a QA system? A system that extracts an answer (or several answers) to a request (a question) from a corpus.
The problem of « the type of the question »:
– one answer or several, possibly a list drawn from one or several documents;
– an answer of the Yes/No type;
– a corpus in one or several languages.
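The question-type problem can be sketched with a few surface rules. This is a minimal Python illustration, not the authors' method: the `AnswerType` categories, the cue lists and `classify_question` are hypothetical and English-only, and they stand in for the full linguistic analysis a real language module performs.

```python
import re
from enum import Enum


class AnswerType(Enum):
    YES_NO = "yes/no"
    LIST = "list"
    FACTOID = "factoid"
    OTHER = "other"


# Hypothetical English cue lists; a real language module would rely on
# syntactic and semantic analysis rather than surface cues.
YES_NO_STARTS = ("is ", "are ", "was ", "were ", "do ", "does ", "did ", "can ", "has ", "have ")
FACTOID_STARTS = ("who ", "when ", "where ", "how many ", "how much ", "what ", "which ")
# A list question asks for several items: "Name all...", "Which three...".
LIST_PATTERN = re.compile(r"^(name|list|which|what)\b.*\b(all|three|four|five|several)\b")


def classify_question(question: str) -> AnswerType:
    """Guess the expected answer type of an English question from surface cues."""
    q = question.strip().lower()
    if q.startswith(YES_NO_STARTS):
        return AnswerType.YES_NO
    if LIST_PATTERN.search(q):
        return AnswerType.LIST
    if q.startswith(FACTOID_STARTS):
        return AnswerType.FACTOID
    return AnswerType.OTHER
```

For example, "Who is the author of Hamlet?" is classified as a factoid, "Is Lisbon the capital of Portugal?" as Yes/No, and "Name all the plays written by Shakespeare." as a list question.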

1.1. QA and Language Processing
A QA system appears to be a language processing (LP) application « par excellence ».
However, some systems are based solely on pattern matching (cf. Soubotine & Soubotine, TREC 2003). These systems seem to have reached their limits: even if they can handle everything factual, complex questions and queries are far beyond their capabilities.
The best systems evaluated at TREC and CLEF are based on automatic language processing.
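To make the contrast concrete, here is a minimal sketch of pattern-matching answer extraction. It is not the Soubotine system, whose patterns are not given here; the patterns, the `extract_birth_year` helper and the voting over matches are illustrative assumptions. It handles a narrow factoid such as a birth year, but nothing in it generalizes to complex questions.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical surface patterns for "When was <X> born?". The {name} slot is
# filled with the regex-escaped focus of the question; {{4}} survives
# str.format as the regex quantifier {4}.
BIRTH_PATTERNS = [
    r"{name} was born in (\d{{4}})",
    r"{name} \((\d{{4}})-",
    r"{name}, born in (\d{{4}})",
]


def extract_birth_year(name: str, passages: list[str]) -> str | None:
    """Return the year matched most often by the patterns, or None."""
    votes: Counter[str] = Counter()
    escaped = re.escape(name)
    for template in BIRTH_PATTERNS:
        pattern = re.compile(template.format(name=escaped))
        for passage in passages:
            # Each pattern has one group, so findall returns the years.
            votes.update(pattern.findall(passage))
    if not votes:
        return None
    return votes.most_common(1)[0][0]
```

Each new question type needs its own hand-written patterns, which is why such systems stall once questions stop being simple factoids.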

1.2. Our QA System
– First developed ( ) within a French innovation project (Anvar)
– Then (until the end of 2003) within the European project TRUST (FP5)
– Currently (2005/06) within the European project M-CAST (FP6)
Main features: targets B2B and B2C; multilingual; NLP-based and NLP-intensive.

A modular design [architecture diagram: French, English, Italian, Portuguese, Polish and Czech language modules; indexing engine; text extraction engine; index; documents; visualization of results]
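The modular conception can be sketched as a language-independent core that delegates analysis to pluggable language modules. In this Python sketch, `LanguageModule`, `QAEngine` and the toy stop-word analysis are hypothetical; the real modules perform full morphological, syntactic and semantic analysis.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class LanguageModule(ABC):
    """One pluggable module per language (hypothetical interface)."""

    code: str

    @abstractmethod
    def analyze(self, text: str) -> list[str]:
        """Return normalized index terms for a text in this language."""


class EnglishModule(LanguageModule):
    code = "en"
    # Toy stop list standing in for real linguistic analysis.
    STOP = {"the", "of", "is", "a", "who"}

    def analyze(self, text: str) -> list[str]:
        words = [w.strip("?.,!").lower() for w in text.split()]
        return [w for w in words if w and w not in self.STOP]


class FrenchModule(LanguageModule):
    code = "fr"
    STOP = {"le", "la", "de", "est", "qui", "l"}

    def analyze(self, text: str) -> list[str]:
        words = [w.strip("?.,!").lower() for w in text.replace("'", " ").split()]
        return [w for w in words if w and w not in self.STOP]


class QAEngine:
    """Language-independent core: indexing and lookup delegate to modules."""

    def __init__(self) -> None:
        self.modules: dict[str, LanguageModule] = {}
        self.index: dict[str, set[int]] = {}

    def register(self, module: LanguageModule) -> None:
        self.modules[module.code] = module

    def add_document(self, doc_id: int, text: str, lang: str) -> None:
        for term in self.modules[lang].analyze(text):
            self.index.setdefault(term, set()).add(doc_id)

    def candidates(self, question: str, lang: str) -> set[int]:
        """Documents containing every term of the analyzed question."""
        terms = self.modules[lang].analyze(question)
        sets = [self.index.get(t, set()) for t in terms]
        return set.intersection(*sets) if sets else set()
```

Supporting Czech or Polish would then mean registering another `LanguageModule` subclass without touching the indexing code.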

1.3. Evaluations of the QA system
– Professional benchmarking contests and campaigns such as EQueR (2004) and CLEF (2005 and 2006)
– Evaluations of the French, English, Portuguese and Spanish language modules, in monolingual and multilingual runs
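Campaigns such as CLEF score each returned answer by human judgment. Assuming CLEF-style labels R (right), W (wrong), X (inexact) and U (unsupported), with one judged answer per question, the headline accuracy can be computed as in this sketch. The `accuracy` helper is hypothetical and the judgments in the usage below are made up.

```python
from __future__ import annotations

# Assumed CLEF-style judgment labels: R = right, W = wrong,
# X = inexact, U = unsupported by the returned document.
LABELS = {"R", "W", "X", "U"}


def accuracy(judgments: list[str]) -> float:
    """Fraction of questions whose judged answer is Right."""
    if not judgments:
        raise ValueError("no judged answers")
    unknown = set(judgments) - LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return judgments.count("R") / len(judgments)
```

With four hypothetical judgments `["R", "W", "R", "X"]`, `accuracy` returns 0.5: inexact answers count as not right.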

CLEF 2005

CLEF 2006

In CLEF 2005 and CLEF 2006, our systems were the best monolingual engines for Portuguese and French, and the best cross-language engines for English-French, Portuguese-French, Spanish-Portuguese and Portuguese-Spanish. Synapse Développement and Priberam are now partners in the Quaero project.

2. Implementation in the M-CAST Project
– Tests carried out on books in the Czech National Library and the Torun library in Poland
– Processing of several million digitized documents
– Management of metadata and UDC classification
– Questions and answers in English, French, Italian, Portuguese, Polish and Czech
– Implemented on both library portals

2.1. Adaptation to Digital Library Resources
– Scanned texts of poor quality –> a spell checker improves the quality of the documents.
– One book, many pages –> multi-part documents are managed during semantic analysis.
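Both adaptations can be sketched in a few lines of Python, under stated assumptions: the tiny `LEXICON`, the `correct_ocr` and `split_book` helpers and the 0.8 similarity cutoff are hypothetical, and `difflib.get_close_matches` stands in for a real spell checker. Each page becomes a `Part` that keeps a pointer to its book, so an answer found on one page can still be reported against the whole book.

```python
from __future__ import annotations

import difflib
from dataclasses import dataclass

# Hypothetical lexicon; a real spell checker would use the language module's dictionary.
LEXICON = ("library", "digitized", "question", "answer", "national", "books")


def correct_ocr(text: str, lexicon: tuple[str, ...] = LEXICON, cutoff: float = 0.8) -> str:
    """Replace unknown words by their closest lexicon entry, if close enough.

    Simplification: punctuation is not split off and replacements are lowercase.
    """
    known = set(lexicon)
    out = []
    for word in text.split():
        if word.lower() in known:
            out.append(word)
            continue
        match = difflib.get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        out.append(match[0] if match else word)
    return " ".join(out)


@dataclass
class Part:
    book_id: str  # the parent book, so answers can be attributed to it
    page: int
    text: str


def split_book(book_id: str, pages: list[str]) -> list[Part]:
    """Turn one scanned book into page-level parts, each spell-corrected."""
    return [Part(book_id, n, correct_ocr(p)) for n, p in enumerate(pages, start=1)]
```

For instance, `correct_ocr("the librarv has digitlzed books")` returns `"the library has digitized books"`.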

2.2. Integration of Dublin Core Document Attributes
– Storage of Dublin Core attributes as metadata
– QA: Who is the author of Hamlet?
– Adaptation of the system to search in the metadata
– Use of this metadata as filters
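Answering a question such as « Who is the author of Hamlet? » from metadata, and using metadata as a filter, might look like the following sketch. The record layout reuses Dublin Core element names (`identifier`, `title`, `creator`, `language`), but `RECORDS`, `author_of` and `filter_records` are hypothetical and not the M-CAST interface.

```python
from __future__ import annotations

# Hypothetical catalogue records keyed by Dublin Core element names.
RECORDS = [
    {"identifier": "doc-1", "title": "Hamlet", "creator": "William Shakespeare", "language": "en"},
    {"identifier": "doc-2", "title": "Hamlet", "creator": "William Shakespeare", "language": "pl"},
    {"identifier": "doc-3", "title": "Os Lusíadas", "creator": "Luís de Camões", "language": "pt"},
]


def author_of(title: str, records: list[dict], language: str | None = None) -> list[str]:
    """Answer 'Who is the author of <title>?' from creator, optionally filtered by language."""
    answers: list[str] = []
    for rec in records:
        if rec.get("title", "").casefold() != title.casefold():
            continue
        if language is not None and rec.get("language") != language:
            continue
        creator = rec.get("creator")
        if creator and creator not in answers:  # deduplicate across editions
            answers.append(creator)
    return answers


def filter_records(records: list[dict], **criteria: str) -> list[dict]:
    """Use metadata as a filter, e.g. language="pl", before full-text QA."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]
```

Here `author_of("hamlet", RECORDS)` yields `["William Shakespeare"]` once, even though two editions match.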

2.3. Universal Decimal Classification
– Storage of UDC codes for each document
– Search through UDC codes
– Filtering through UDC codes
– Semantic disambiguation through UDC codes
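UDC notation is hierarchical (for example, 5 covers mathematics and natural sciences, 52 astronomy and 54 chemistry), so filtering can be sketched as prefix matching. The document codes, the `SENSES` table and the helpers below are hypothetical, and the sketch ignores UDC auxiliaries such as language or place signs, which a real implementation would have to parse.

```python
from __future__ import annotations

# Hypothetical document -> UDC code assignments.
DOC_UDC: dict[str, list[str]] = {
    "doc-1": ["821.111"],  # English literature
    "doc-2": ["52"],  # Astronomy
    "doc-3": ["54"],  # Chemistry
}

# Hypothetical sense inventory: each sense of an ambiguous word is tied to a UDC class.
SENSES: dict[str, dict[str, str]] = {"mercury": {"planet": "52", "chemical element": "54"}}


def udc_matches(code: str, prefix: str) -> bool:
    """True if code falls under prefix; the dots in UDC only group digits."""
    return code.replace(".", "").startswith(prefix.replace(".", ""))


def filter_by_udc(doc_udc: dict[str, list[str]], prefix: str) -> list[str]:
    """Documents with at least one code under the given UDC class."""
    return sorted(d for d, codes in doc_udc.items() if any(udc_matches(c, prefix) for c in codes))


def disambiguate(word: str, doc_id: str, doc_udc: dict[str, list[str]] = DOC_UDC) -> str | None:
    """Pick the sense whose UDC class matches the document's codes, if any."""
    for sense, prefix in SENSES.get(word.lower(), {}).items():
        if any(udc_matches(c, prefix) for c in doc_udc.get(doc_id, [])):
            return sense
    return None
```

With these assumptions, "Mercury" in an astronomy document resolves to the planet, while in a chemistry document it resolves to the element.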

Technical architecture

End of presentation. I would appreciate your questions! Thank you! Merci!