M-CAST Multilingual Content Aggregation System based on TRUST Search Engine Borys Czerniejewski Sebastian Lisek Infovide S.A. (PL)

1 M-CAST Multilingual Content Aggregation System based on TRUST Search Engine Borys Czerniejewski Sebastian Lisek Infovide S.A. (PL)

2 The Project
- M-CAST: eContent project #22249
- project start: 1 January 2005
- project end: 31 December 2006

3 M-CAST Results
Expected Results:
- multilingual, full-text search engine (server version)
- Internet portals deployed in two libraries
- content aggregation facility
- business plan + IPRs/royalties fixed
- dissemination
Previous Projects:
- TRUST: Multilingual Semantic and Cognitive Search Engine for Text Retrieval Using Semantic Technologies (IST-1999-56416)
- ICONS: Intelligent Content Management System (IST-2001-32429)

4 Users & Business Case
Large Full-Text Data Collections:
- digital (Internet) libraries
- press agencies (press)
- publishers
- operators of scientific databases
- big companies (multinationals)
Languages:
- Czech
- English
- French*
- Italian*
- Polish*
- Portuguese*

5 Consortium
Language Technology:
- TiP sp. z o.o., Katowice, Poland
- Synapse Développement SARL, Toulouse, France
- Priberam Informática Lda., Lisbon, Portugal
- Expert System S.p.A., Modena, Italy
- Vysoká Škola Ekonomická v Praze, Prague, Czech Republic
Coordinator, integrator:
- Infovide-Matrix S.A., Poland
Users:
- The Nicholas Copernicus Provincial and Municipal Library, Toruń, Poland (operator of the Polish Internet Library)
- Národní Knihovna České Republiky, Prague, Czech Republic

6 Architecture objectives: end user perspective
- Performance: >1s response time
- Usability: a user should be able to learn with ease to operate the system, prepare inputs for it, and interpret its outputs
- Availability: high availability, 24x7

7 Architecture objectives: customer perspective
- Security:
  - the M-CAST system should be able to manage, protect, and distribute sensitive information
  - copyrights
- Interoperability: the M-CAST system should be able to use information exchanged with various systems (resources)
- Scalability: the M-CAST architecture should be easy to modify to fit performance and volume requirements

8 Architecture objectives: producer perspective
- Time span: the architecture and technology should be in use in 2007
- Portability: the M-CAST system should be easy to transfer from one hardware or software environment to another
- Flexibility: the M-CAST architecture should be easy to modify for use in applications or environments other than those for which it was specifically designed

9 Architectural decision: SOA
Service-oriented architecture is an approach to loosely coupled, protocol-independent, standards-based distributed computing in which software resources available on the network are treated as services.

10 Internal view - architecture
[Architecture diagram. Labels: Resources (library catalog system, digitalized resources); Integration layer; M-CAST; Linguistic Processor; Presentation layer; End users (M-CAST user, administrator); External systems (library portal)]

11 Internal view - resources
M-CAST Resource
- Metadata
  - Protocol: OAI-PMH
  - Formats: Qualified DublinCore
- Data
  - Protocol: ftp, http
  - Formats: txt, html, pdf, rtf
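
As a rough illustration of the metadata protocol named on this slide: an OAI-PMH request is a plain HTTP GET whose `verb` parameter selects the operation. The sketch below builds a `GetRecord` URL with Python's standard library. The base URL and the helper name `build_get_record_url` are hypothetical, and `oai_dc` is the unqualified Dublin Core prefix that OAI-PMH repositories are required to support; the prefix an M-CAST resource would use for Qualified DublinCore is not stated in the slides.

```python
from urllib.parse import urlencode


def build_get_record_url(base_url: str, identifier: str,
                         metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord request URL (hypothetical helper)."""
    params = {
        "verb": "GetRecord",                # OAI-PMH operation
        "identifier": identifier,           # record identifier in the repository
        "metadataPrefix": metadata_prefix,  # requested metadata format (assumed)
    }
    return f"{base_url}?{urlencode(params)}"


# Placeholder repository address, not a real M-CAST endpoint.
print(build_get_record_url("http://example.org/oaipmh", "30843"))
```

The response to such a request is an XML document carrying the record's metadata, which the aggregating system can then index alongside the full text fetched over ftp or http.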

12 Architectural decision: OAI-PMH & DublinCore
http://www.library.edu/oaipmh/OAIDataProvider?verb=GetRecord&identifier=30843
… Vměnj Křesťanské aneb Přjprawa k dobré Smrti... 09 …
UDC Filtering
- a polysemic word: ball
  - sense 1: ROUND OBJECT. Any object in the shape of a sphere, especially one used as a toy by children or in various sports such as tennis and football
  - sense 2: DANCE. A large formal occasion where people dance
- two texts:
  - D1: talks about football. UDC: 796
  - D2: Cinderella. UDC: 793
- a question: "Where did the ball take place?"
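
The filtering idea on this slide can be sketched in a few lines: once the question has been resolved to one sense of "ball", only documents whose UDC class matches that sense are kept. In UDC, class 796 covers sport and class 793 covers social entertainments including dance, so the sketch tags the football text with 796 and the Cinderella text with 793. The `Document` class, the `SENSE_TO_UDC` table, and `filter_by_sense` are hypothetical names for illustration; the slides do not show how M-CAST stores or applies the mapping.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    summary: str
    udc: str  # UDC class code taken from the document's metadata


# Hypothetical table linking a word sense to the UDC class that covers it.
SENSE_TO_UDC = {
    ("ball", "ROUND OBJECT"): "796",  # sport, games
    ("ball", "DANCE"): "793",         # social entertainments, dance
}


def filter_by_sense(docs: list[Document], word: str, sense: str) -> list[Document]:
    """Keep only documents classified under the UDC class of the given sense."""
    wanted = SENSE_TO_UDC[(word, sense)]
    return [d for d in docs if d.udc.startswith(wanted)]


docs = [
    Document("D1", "talks about football", "796"),
    Document("D2", "Cinderella", "793"),
]

# "Where did the ball take place?" asks about an event, i.e. the DANCE sense.
hits = filter_by_sense(docs, "ball", "DANCE")
print([d.doc_id for d in hits])  # ['D2']
```

Matching on a prefix rather than an exact code lets a finer subdivision (for example a code beginning with 793) still fall under its parent class.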

13 Architectural decision: OAI-PMH & DublinCore
…
Bellarmino, Robert Francesco Romolo
Při strahovském exempláři B Z VIII 28 rukopiná poznámka: Jacobus Colens S.J.
…
text/html
http://www.manuscriptorium.com?id=1184206
http://www.manuscriptorium.com?id=1184207
http://www.manuscriptorium.com?id_source=1184206
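
The XML markup of this record did not survive in the transcript, so the following is only a sketch of how such a Dublin Core record might be read. The namespace URIs are the standard `oai_dc` and Dublin Core element namespaces; which element each value belonged to (`dc:creator`, `dc:format`, `dc:identifier`) is an assumption, and `dc_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumed reconstruction: the element names are guesses, the values come from the slide.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:creator>Bellarmino, Robert Francesco Romolo</dc:creator>
  <dc:format>text/html</dc:format>
  <dc:identifier>http://www.manuscriptorium.com?id=1184206</dc:identifier>
  <dc:identifier>http://www.manuscriptorium.com?id=1184207</dc:identifier>
</oai_dc:dc>"""


def dc_fields(xml_text: str) -> dict[str, list[str]]:
    """Collect Dublin Core elements into a dict of name -> list of values."""
    root = ET.fromstring(xml_text)
    fields: dict[str, list[str]] = {}
    for element in root:
        # ElementTree writes qualified tags as "{namespace}localname".
        if element.tag.startswith("{" + DC_NS + "}"):
            name = element.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((element.text or "").strip())
    return fields


record = dc_fields(SAMPLE)
print(record["creator"])
print(record["identifier"])
```

Values are kept as lists because Dublin Core elements are repeatable, as the several URLs in this record suggest.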

14 Architecture: Linguistic Processor
- Language modules: PL, FR, PT, CZ, IT, EN
- Indexation engine
- Query engine
- Language recognizer
- Document type converters
- Index
- Documents

15 Linguistic Processor
Resources:
- General Ontology
- Documents
- Derived Forms dictionary
- Taxonomy of types of questions
Indexation:
- cutting blocks
- spelling correction
- parsing
- conceptual analysis
- anaphora resolution
Indexes:
- Keywords index
- Named entities index
- Heads of derivation index
- Concepts index
- Areas index
- Questions-answers types index
Question processing (input: Question):
- spelling correction
- parsing
- conceptual analysis
- extraction of keywords
- type of the question
- translation if multilingual
Search into the index:
- synonyms + converses
- selection of blocks
- ordering blocks
Extraction of blocks:
- spelling correction
- parsing
- conceptual analysis
- type of the answer
- keywords of the block
- anaphora resolution
- detection of metaphor
Answer extraction (output: Answer(s)):
- selection of sentence(s)
- sorting of sentences
- coherence, justification
- extraction of answer(s)

16 M-CAST search network
[Network diagram. Labels: End users (M-CAST Network user); M-CAST node with its own Documents and Index; further nodes M-CAST 1, M-CAST 2, ... M-CAST n, each with its own Documents and Index]
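
One way to read this slide is as a federated search: a query is sent to every M-CAST node, each node answers from its own index, and the results are merged. The slides do not say how M-CAST routes queries or ranks merged results, so the sketch below is purely illustrative; `Node`, `Hit`, `network_search`, and the score-based merge are all assumptions.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    node: str     # name of the node that returned the hit
    doc_id: str
    score: float  # relevance score (assumed comparable across nodes)


class Node:
    """A stand-in for one M-CAST installation with its own index."""

    def __init__(self, name: str, index: dict[str, dict[str, float]]):
        self.name = name
        self.index = index  # term -> {doc_id: score}

    def search(self, term: str) -> list[Hit]:
        return [Hit(self.name, doc_id, score)
                for doc_id, score in self.index.get(term, {}).items()]


def network_search(nodes: list[Node], term: str, limit: int = 10) -> list[Hit]:
    """Query every node and return the best hits overall, highest score first."""
    all_hits = [hit for node in nodes for hit in node.search(term)]
    return heapq.nlargest(limit, all_hits, key=lambda hit: hit.score)


nodes = [
    Node("M-CAST 1", {"ball": {"a": 0.9, "b": 0.2}}),
    Node("M-CAST 2", {"ball": {"c": 0.7}}),
]
for hit in network_search(nodes, "ball", limit=2):
    print(hit.node, hit.doc_id, hit.score)
```

A real deployment would also have to deal with slow or unreachable nodes and with scores that are not directly comparable between indexes.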

17 Thank you!
Borys Czerniejewski, Sebastian Lisek
Infovide S.A. (PL)
Information: m-cast@infovide.pl

