Broadening Horizons: Using Language Technologies in Publishing Systems. ICT PSP call identifier: CIP-ICT-PSP-2009-3 Theme 5: Multilingual.


Broadening Horizons: Using Language Technologies in Publishing Systems
ICT PSP call identifier: CIP-ICT-PSP
Theme 5: Multilingual Web
5.3 Multilingual Web content management - methods, tools and processes

The information today
• A flood of multilingual and heterogeneous information
• The challenge: the information has to be processed and analyzed so that it can be used more efficiently

The information today
• An increasing amount of multilingual and heterogeneous information

The Language Technologies (LT)
• Computers process information; humans understand it.
• Computers have limited capacity to understand information; humans have limited capacity to process it.
• NLP technologies improve how well computers understand information and thus increase human productivity.

Overview
• The NLP technologies by examples
• NLP in practice – the ATLAS project
• Conclusions
• Questions

NLP by examples (1)
• Divide and conquer
• Grouping the information:
  • By importance
  • Automatic text categorization
    • Politics (24)
    • Sports (5)
    • Entertainment (5)
    • Technologies (12)
    • Science (20)
    • Rumors (6)
    • Other (10)
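The per-category counts on this slide could come from any text classifier. As a rough illustration only, here is a minimal keyword-overlap sketch. The keyword lists and the names `CATEGORY_KEYWORDS`, `categorize` and `count_by_category` are assumptions made for this example; they are not taken from the presentation or from ATLAS, and a production system would normally learn its categories from labeled documents.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for illustration.
CATEGORY_KEYWORDS = {
    "Politics": {"election", "parliament", "minister", "government"},
    "Sports": {"match", "goal", "team", "league"},
    "Science": {"physics", "experiment", "research", "theorem"},
}


def categorize(text):
    """Return the category whose keywords occur most often, or "Other"."""
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {
        category: sum(1 for token in tokens if token in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "Other"


def count_by_category(documents):
    """Count how many documents fall into each category."""
    return Counter(categorize(doc) for doc in documents)
```

Calling `count_by_category` on a batch of headlines yields the kind of "Politics (24), Sports (5), ..." overview shown on the slide.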

NLP by examples (1)
• Divide and conquer
• Grouping the information:
  • By importance
  • Automatic categorization
  • Text clustering
    • Politics (24): International affairs (12), Conflicts (3), Terrorism (5), Nature and Environment (8), ...
    • Science (20): Math (2), Physics (5), Nature and Environment (3), NLP technologies (4), ...
    • Other (10): Money and Banks (3), Richard Branson (4), Learning materials (3), ...

NLP by examples (1)
• Temporal dynamics
• Before, now, tomorrow?
  • Politics (24 +3): International affairs (10 -2), Conflicts (3), Terrorism (6 +1), Nature and Environment (10 +2), ...
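The "+1" and "-2" annotations are differences between two snapshots of the cluster counts. A small sketch of that bookkeeping follows; the function names are hypothetical, and the sample numbers in the usage note are the ones shown on the clustering and temporal-dynamics slides.

```python
def category_deltas(before, now):
    """Map each category to (current count, change since the earlier snapshot)."""
    categories = sorted(set(before) | set(now))
    return {
        c: (now.get(c, 0), now.get(c, 0) - before.get(c, 0))
        for c in categories
    }


def format_delta(name, count, change):
    """Render a count the way the slide does, e.g. "Terrorism (6 +1)"."""
    if change == 0:
        return f"{name} ({count})"
    sign = "+" if change > 0 else "-"
    return f"{name} ({count} {sign}{abs(change)})"


before = {"International affairs": 12, "Conflicts": 3,
          "Terrorism": 5, "Nature and Environment": 8}
now = {"International affairs": 10, "Conflicts": 3,
       "Terrorism": 6, "Nature and Environment": 10}

for name, (count, change) in category_deltas(before, now).items():
    print(format_delta(name, count, change))
```

Predicting "tomorrow" would need a forecasting model on top of these deltas, which this sketch does not attempt.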

NLP by examples (2)
• We do value your opinion!
• Positive, negative or objective?
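One very simple way to answer "positive, negative or objective?" is to count opinion words from a lexicon. The sketch below assumes tiny hand-made word lists (`POSITIVE`, `NEGATIVE`) invented for this example, and it treats ties, including texts with no opinion words at all, as objective. That is a simplification and not a description of how any particular system works.

```python
import re

# Hypothetical mini-lexicons; real sentiment lexicons are far larger.
POSITIVE = {"good", "great", "excellent", "love", "helpful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "useless"}


def classify_opinion(text):
    """Label a text as "positive", "negative" or "objective" by word counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    if positive_hits > negative_hits:
        return "positive"
    if negative_hits > positive_hits:
        return "negative"
    return "objective"
```

Word counting ignores negation and irony ("not good" still counts as positive), which is one reason practical systems rely on trained models.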

NLP by examples (3)
• Salient excerpts
  • Persons: politicians, actors, scientists, fictional characters
  • Organizations and institutions: NATO, EU, BAS, Bank of England, Google, Apple, …
  • Geographical locations: Bulgaria, Sofia, EU, Western Europe, Tibet
  • Dates
• Example: Steven Paul Jobs [person] was born in San Francisco [city] on February 24, 1955 [date]
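The annotated sentence about Steve Jobs can be reproduced with a toy tagger that combines name lists (gazetteers) with a regular expression for dates. Everything here is an assumption made for illustration: the lists `PERSONS` and `LOCATIONS`, the single date pattern and the function `tag_entities` are hypothetical, and real named-entity recognizers use statistical models rather than fixed lists.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches dates written like "February 24, 1955" only.
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")

# Hypothetical gazetteers holding just enough names for the example.
PERSONS = ["Steven Paul Jobs"]
LOCATIONS = ["San Francisco", "Sofia", "Bulgaria"]


def tag_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for label, names in (("person", PERSONS), ("location", LOCATIONS)):
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), match.group(), label))
    for match in DATE_RE.finditer(text):
        found.append((match.start(), match.group(), "date"))
    return [(entity, label) for _, entity, label in sorted(found)]
```

Applied to the slide's sentence, `tag_entities` picks out the person, the place and the date in reading order.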

NLP by examples (3)
• Salient excerpts
• Jobs was a demanding perfectionist who always aspired to position his businesses and their products at the forefront of the information technology industry by foreseeing and setting trends, at least in innovation and style...
• As of October 9, 2011, Jobs is listed as primary inventor related to a range of technologies from actual computer and portable devices to user interfaces...

NLP by examples (4)
• You might also be interested in this and that …
• Suggestions for similar content
  • According to the textual information
  • According to the persons, locations and dates
  • According to the key concepts and ideas
  • According to the genre and fictional characters
• Cross-lingual information retrieval
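Suggestions "according to the textual information" are commonly computed by comparing word-count vectors. The following is a minimal sketch using cosine similarity over raw bag-of-words counts. The function names are hypothetical, and the sketch leaves out refinements such as TF-IDF weighting, stemming, and the entity- or concept-based matching listed on the slide.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase word occurrences in a text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, documents, top_n=2):
    """Return up to top_n documents that share vocabulary with the query."""
    query_vector = bag_of_words(query)
    scored = [(cosine(query_vector, bag_of_words(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_n] if score > 0]
```

Cross-lingual retrieval additionally requires mapping the query and the documents into a shared representation, for example through translation, which is outside the scope of this sketch.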

NLP by examples (5)
• Machine translation
• Text summarization
  • Of a single document
  • Of a collection of documents
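Single-document summarization is often introduced through extractive methods that keep the highest-scoring sentences. The sketch below scores each sentence by the average document-wide frequency of its non-stopword tokens. The stopword list and the function `summarize` are assumptions for this example, and the slide does not say which summarization method ATLAS uses.

```python
import re
from collections import Counter

# A deliberately short, hypothetical stopword list.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "with", "as", "by", "are", "was"}


def content_words(text):
    """Lowercase words of a text with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Keep the max_sentences best-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(frequency[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:max_sentences]))
```

Summarizing a collection of documents can reuse the same scoring over the concatenated texts, although redundancy between documents then has to be handled separately.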

NLP in practice – ATLAS project
• ATLAS: a multilingual content management system that harnesses NLP technologies
• Supported languages: Bulgarian, English, German, Polish, Romanian and Greek
• ATLAS extracts and provides:
  • Key phrases and named entities
  • A list of similar documents
  • Automatic categorization and a text summary
  • Machine translation
• Using ATLAS
  • Software-as-a-service: API for integration with third-party systems

The ATLAS project
• ICT PSP project
• ATLAS consortium:
  • Coordinator: Tetracom Interactive Solutions – Bulgaria
  • DFKI - Deutsches Forschungszentrum Fuer Kuenstliche Intelligenz GmbH – Germany
  • Atlantis Consulting SA – Greece
  • Institute for Bulgarian Language “Professor Lyubomir Andreychin” at the Bulgarian Academy of Sciences – Bulgaria
  • Instytut Podstaw Informatyki Polskiej Akademii Nauk – Poland
  • Universität Hamburg – Germany
  • Universitatea Alexandru Ioan Cuza – Romania
  • Sveučilište u Zadru – Croatia
  • ITD - Institute of Technologies and Development – Bulgaria
• Project duration: 3 years, starting 1 March 2010

Conclusion?
• What are the NLP technologies?
  • They provide a way to harness computing resources for a better understanding of information
• What can they be used for?
  • A more effective way to handle the increasing amount of multilingual information
• Who can use these technologies?
  • Libraries
  • Publishing houses
  • Media
  • Online bookstores
  • Lawyers
  • Banks, companies and organizations

Questions?