September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled.

Slides:



Advertisements
Similar presentations
Legacy code support for commercial production Grids G.Terstyanszky, T. Kiss, T. Delaitre, S. Winter School of Informatics, University.
Advertisements

Welcome to Middleware Joseph Amrithraj
Presentation by Priyanka Sawarkar
Database System Concepts and Architecture
Chapter 5: Introduction to Information Retrieval
Stefania Bergamasco, Cecilia Colasanti An integrated approach to turn statistics into knowledge combining data warehouse, controlled vocabularies and advanced.
T.Sharon - A.Frank 1 Internet Resources Discovery (IRD) Classic Information Retrieval (IR)
April 22, Text Mining: Finding Nuggets in Mountains of Textual Data Jochen Doerre, Peter Gerstl, Roland Seiffert IBM Germany, August 1999 Presenter:
1 SWE Introduction to Software Engineering Lecture 22 – Architectural Design (Chapter 13)
Information Retrieval Concerned with the: Representation of Storage of Organization of, and Access to Information items.
Mobile Web Search Personalization Kapil Goenka. Outline Introduction & Background Methodology Evaluation Future Work Conclusion.
Application architectures
Department of Computer Science and Engineering, CUHK 1 Final Year Project 2003/2004 LYU0302 PVCAIS – Personal Video Conference Archives Indexing System.
Information retrieval Finding relevant data using irrelevant keys Example: database of photographic images sorted by number, date. DBMS: Well structured.
An Agent-Oriented Approach to the Integration of Information Sources Michael Christoffel Institute for Program Structures and Data Organization, University.
Personalized Ontologies for Web Search and Caching Susan Gauch Information and Telecommunications Technology Center Electrical Engineering and Computer.
Knowledge Portals and Knowledge Management Tools
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Databases & Data Warehouses Chapter 3 Database Processing.
Digital Object: A Virtual Online Storage Solution 598C Course Project Huajing Li.
By N.Gopinath AP/CSE. Why a Data Warehouse Application – Business Perspectives  There are several reasons why organizations consider Data Warehousing.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
CISTI Source & SiteSearch OCLC User Meeting 2001 Danielle Langlois & Carol Serroul May 9, 2001.
Challenges in Information Retrieval and Language Modeling Michael Shepherd Dalhousie University Halifax, NS Canada.
Universität Stuttgart Universitätsbibliothek Information Retrieval on the Grid? Results and suggestions from Project GRACE Werner Stephan Stuttgart University.
GRACE Project IST EGAAP meeting – Den Haag, 25/11/2004 Giuseppe Sisto – Telecom Italia Lab.
CS621 : Seminar-2008 DEEP WEB Shubhangi Agrawal ( )‏ Jayalekshmy S. Nair ( )‏
1 Dr. Markus Hillenbrand, ICSY Lab, University of Kaiserslautern, Germany A Generic Database Web Service for the Venice Service Grid Michael Koch, Markus.
Avalanche Internet Data Management System. Presentation plan 1. The problem to be solved 2. Description of the software needed 3. The solution 4. Avalanche.
Using the SAS® Information Delivery Portal
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
Department of Computer Science and Engineering, CUHK 1 Final Year Project 2003/2004 LYU0302 PVCAIS – Personal Video Conference Archives Indexing System.
Introduction to Apache OODT Yang Li Mar 9, What is OODT Object Oriented Data Technology Science data management Archiving Systems that span scientific.
Web Searching Basics Dr. Dania Bilal IS 530 Fall 2009.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
CERN – Roberta Faggian Marque, Jan Fiete Grosse-Oetringhaus GRACE General Meeting, September 2004, Brussels 1 D6.1 Integration with the European DataGrid.
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Chapter 3 DECISION SUPPORT SYSTEMS CONCEPTS, METHODOLOGIES, AND TECHNOLOGIES: AN OVERVIEW Study sub-sections: , 3.12(p )
Design of a Search Engine for Metadata Search Based on Metalogy Ing-Xiang Chen, Che-Min Chen,and Cheng-Zen Yang Dept. of Computer Engineering and Science.
Introduction to dCache Zhenping (Jane) Liu ATLAS Computing Facility, Physics Department Brookhaven National Lab 09/12 – 09/13, 2005 USATLAS Tier-1 & Tier-2.
ICS (072)Database Systems: An Introduction & Review 1 ICS 424 Advanced Database Systems Dr. Muhammad Shafique.
GUIDED BY DR. A. J. AGRAWAL Search Engine By Chetan R. Rathod.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
Personalized Interaction With Semantic Information Portals Eric Schwarzkopf DFKI
What is SAM-Grid? Job Handling Data Handling Monitoring and Information.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
Create Content Capture Content Review Content Edit Content Version Content Version Content Translate Content Translate Content Format Content Transform.
AliEn AliEn at OSC The ALICE distributed computing environment by Bjørn S. Nilsen The Ohio State University.
Aneka Cloud ApplicationPlatform. Introduction Aneka consists of a scalable cloud middleware that can be deployed on top of heterogeneous computing resources.
Object storage and object interoperability
A Portrait of the Semantic Web in Action Jeff Heflin and James Hendler IEEE Intelligent Systems December 6, 2010 Hyewon Lim.
Refined Online Citation Matching and Adaptive Canonical Metadata Construction CSE 598B Course Project Report Huajing Li.
Developing GRID Applications GRACE Project
Efficient Opportunistic Sensing using Mobile Collaborative Platform MOSDEN.
General Architecture of Retrieval Systems 1Adrienn Skrop.
Indexing The World Wide Web: The Journey So Far Abhishek Das, Ankit Jain 2011 Paper Presentation : Abhishek Rangnekar 1.
Seminar on seminar on Presented By L.Nageswara Rao 09MA1A0546. Under the guidance of Ms.Y.Sushma(M.Tech) asst.prof.
Building a Data Warehouse
Designing Cross-Language Information Retrieval System using various Techniques of Query Expansion and Indexing for Improved Performance  Hello everyone,
CHAPTER 3 Architectures for Distributed Systems
OGSA Data Architecture Scenarios
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 2 Database System Concepts and Architecture.
Data, Databases, and DBMSs
ExaO: Software Defined Data Distribution for Exascale Sciences
AMGA Web Interface Vincenzo Milazzo
Chaitali Gupta, Madhusudhan Govindaraju
Information Retrieval and Web Design
Presentation transcript:

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled seArch and Categorization Engine

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The GRACE Project GRid enabled seArch and Categorization Engine GRACE proposes the development of a distributed search and categorization engine that will enable just ‑ in ‑ time, flexible allocation of data and computational resources. GRACE handles structured and unstructured textual information (text files, documents, Web pages, text stored in databases) in GRID environment. Project’s lifetime: September 2002-February 2005

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Why a Search Engine ? Search Engine User Query result list ? What’s behind this?

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research What is behind the Search Engine interface? Search Engine index There are differences in the ways various search engines work, but they all perform three basic tasks: They periodically search for information. They keep an index of the words they find, and where they find them. They allow users to look for words or combinations of words found in that index. art, travel, car, school, weather, … Builds a list of words and note where they were found Builds index based on its own system of weighting Encodes the data to save space and store them for user to access

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Which are the problems ? result list Search Engine index art, travel, car, school, weather, … art, travel, car, school, weather, … art, travel, car, school, weather, … art, travel, car, school, weather, … Can search engine regularly update its databases to detect modified, deleted and relocated information? Can search engine keep up with the expanding number of documents?

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Domain Specific Metadata Search: greater accuracy Project Objectives GRACE specifically addresses the situations in which a centralized index is simply unfeasible, and a distributed search ‑ and ‑ retrieval is necessitated. Centralised solution limitations GRACE solution Smaller indexes: faster update Personalized Interfaces according to user/organization profile, access rights, subscription, etc. Decentralized Index and Processing: documents are indexed locally in each Grid ‑ node. The resulting index will be also stored locally and will allow querying on ‑ demand from other nodes in the Grid Scalability: the amount of documents may overwhelm any centric search engine (limited by network bandwidth, disk storage, computational power) Frequency of Update: the rate of update cannot be too frequent, this may prove too slow for some content sources (dynamic data, “real time” information) Access and integration: not all content sources are accessible to an external crawler (local search, authentication, heterogeneous databases…) Accuracy: central indexing only approaches the least common denominator in documents, it cannot support accurate search

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The Categorization Engine By analyzing all of the text in real-time, the Categorization Engine autonomously infers the relevant key phrases (idioms) in a document. Classes are built on-the-fly. The Categorization Engine creates a "Real-Time" hierarchical Concept map associate pages to each class, and orders them. Virtual Self: e.g.: Categories, automatically generated from an external search engine results for ”theory of relativity”.

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research The User Interface Accurate research fields Personalization Search history Scheduled queries and alert “My Collection” of documents Interface support English, German, Italian, Swedish API and Web interface Selection on Content Sources Thesauri-based categorization User defined classification schema Search limit

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research GRACE Architecture Categorization Engine GRACE Search Hub GRACE node index The heterogeneous content sources are classified by knowledge domain

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Workload Management System Information & Monitoring Service Data Management Content Sources Data Miners Normal Form Pages Database Service CategorizerGRACE Search Application web service User local browser User Profile KD repository Login: get user settings/preferences Query: start a search Look for relevant Content Source Start the Categorization Process Get KD information: thesauri,… Start the Data Miners Query the Content Sources Send the result list NFPs preparation Save search history information Run categorization and integration Send concept map of the results Send categorized list of results GRACE Architecture File transfer Send back the result list

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Content Sources Content Sources integration Content sources share their contents through the local search engine. If not available, basic indexing & searching capabilities will be provided by GRACE. index The solution provided is based on Jakarta Lucene open-source project.

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Document Processor: it processes the input documents (from the result list) and their meta ‑ data. The output for each document is called NFP (Normal Form Page), which includes the digested document data, ready for the categorization algorithm. SEAL: The Search Engine Abstraction Layer sits on ‑ top of any query ‑ supporting module and translates the user query in API calls. Its output is a list of results. (Result-Set). Each result must contain a reference to the original document and a reference to the matching NFP file. The Data Miner This component query a single content source, returning the source’s flat result list as well as the documents in digested form for the categorization engine. It’s main components are: Content Sources Data Miners Normal Form Pages GRACE Search Result-Set

September 2003, 7 th EDG Conference, Heidelberg – Roberta Faggian, CERN/IT CERN – European Organization for Nuclear Research Workload Management System Information & Monitoring Service Data Management Content Sources Data Miners Normal Form Pages Database Service CategorizerGRACE Search Application web service User local browser User Profile KD repository User Requirements General Architecture Local Search Engine Multilingual abilities User Interface Development testbed installation (Turin&Milan) Integration on Grid middleware Testing of the components Finalization of the GRACE toolkit and testing Installation and testing on an extended testbed Final Validation and Evaluation Conclusions Project activities: Project’s lifetime: September 2002-February 2005