Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY

Presentation transcript:

Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY Future Generation Grids, Dagstuhl Seminar, November 2004

2 SUMMARY
- The use of computers is changing the way we make discoveries and is improving both the speed and the quality of the discovery process.
- In this scenario, the Grid can provide effective computational support for distributed knowledge discovery from large, distributed data sets. To this end we designed a system called the Knowledge Grid.
- This talk discusses how to design distributed knowledge discovery services according to the OGSA model by using the Knowledge Grid services: searching for Grid resources, composing software and data elements, and executing the resulting application on a Grid.

3 OUTLINE
- MOTIVATIONS
- TOWARDS KNOWLEDGE SERVICES
- THE KNOWLEDGE GRID
- OGSA SERVICES FOR KNOWLEDGE DISCOVERY
- A META-LEARNING EXAMPLE
- CONCLUSIONS

4 MOTIVATIONS
- Lots of data collected and warehoused.
- Data collected and stored at enormous speeds in local databases, from remote sources, or from the sky.
- Scientific simulations generating terabytes of data.
- Huge data sets are hard to understand.
- Traditional techniques are infeasible for raw data.
- Computational science is evolving toward data-intensive applications that include data analysis, information management, and knowledge discovery.

5 MOTIVATIONS
- Most data will never be examined by humans; it is analyzed and summarized by computers.
- Data analysis is becoming a key element in scientific discovery and in business processes.
- Data-intensive applications are those that explore, query, analyze, visualize and, in general, process very large-scale data sets.
- Data-intensive applications help scientists in hypothesis formation and help companies provide better, customized services and support decision making.

6 TOWARDS KNOWLEDGE SERVICES
Grid-aware Knowledge Discovery Systems
SCIENTIFIC OBJECTIVES
- To support the unification of data management and knowledge discovery systems with Grid technologies, in order to provide knowledge-based Grid services.
- This objective can be achieved through the development of techniques and tools for supporting data-intensive applications and through the integration of Data and Computation Grids with Information and Knowledge Grids.

7 THE KNOWLEDGE GRID (PAST)
- KNOWLEDGE GRID: a distributed knowledge discovery architecture that integrates data mining techniques and computational Grid resources.
- In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services and exploit Data Grid services.
- This approach benefits from "standard" Grid services and offers an open architecture that can be configured on top of generic Grid middleware.

8 KNOWLEDGE GRID ARCHITECTURE (PAST)
[Figure: the KNOWLEDGE GRID and the Generic and Data Grid Services]

9 THE KNOWLEDGE GRID
[Figure: Service Selection, PAST and FUTURE]

10 OGSA KNOWLEDGE GRID SERVICES (FUTURE)
- The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications.
- We are defining a set of Grid Services that export the functionality and operations of the KNOWLEDGE GRID.
- Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.

11 KNOWLEDGE SERVICES: A Meta-Learning Example
- A simple example of a meta-learning process over the KNOWLEDGE GRID.
- The example shows how the execution of a significant distributed data mining application can benefit from the Knowledge Grid services, provided through the OGSA model.
- Meta-learning aims to generate a number of independent classifiers by applying learning programs to a collection of distributed data sets in parallel.
- The classifiers computed by the learning programs are then collected and combined to obtain a global classifier.
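The meta-learning idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a learning algorithm or a combining strategy, so the toy one-dimensional threshold learner, the majority-vote combiner, and all function names below are hypothetical choices made for the example.

```python
from collections import Counter
from statistics import mean


def train_threshold_classifier(samples):
    """Toy learner (assumption): threshold halfway between the two class means."""
    mean0 = mean(x for x, label in samples if label == 0)
    mean1 = mean(x for x, label in samples if label == 1)
    threshold = (mean0 + mean1) / 2
    # Predict class 1 on the side of the threshold where class 1 lies.
    if mean1 >= mean0:
        return lambda x: 1 if x > threshold else 0
    return lambda x: 1 if x < threshold else 0


def combine_by_majority_vote(classifiers):
    """Combiner (assumption): the global classifier returns the most common vote."""
    def global_classifier(x):
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]
    return global_classifier


# Three "distributed" data sets, each a list of (feature, label) pairs.
partitions = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    [(0.5, 0), (3.0, 0), (7.0, 1), (10.0, 1)],
    [(1.5, 0), (2.5, 0), (6.0, 1), (9.5, 1)],
]

# Each learner sees only its own partition; the results are combined afterwards.
partial_classifiers = [train_threshold_classifier(p) for p in partitions]
global_classifier = combine_by_majority_vote(partial_classifiers)
```

Because each partial classifier depends only on its own partition, the training calls are independent and could run on different nodes, which is what makes the process suitable for a Grid.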

12 KNOWLEDGE SERVICES: A Meta-Learning Example

13 KNOWLEDGE SERVICES: A Meta-Learning Example
- A user application interacts with Knowledge Grid nodes to generate a classifier by combining the classifiers built from different subsets of a given data set.
- The scenario comprises five nodes:
  - NU, running the user application that builds the meta-learning application and visualizes the global classifier;
  - NS, which is used for resource discovery and for steering the execution of the meta-learning application;
  - NA, which hosts the original data set (DS) and provides a data partitioning service;
  - NC, providing learning services that are performed in parallel over a homogeneous cluster;
  - NZ, providing a combiner/tester service used to compute the global classifier.

14 RESOURCE DISCOVERY AND EXECUTION PLANNING
- The user application invokes the DAS and TAAS services on node Ns, specifying the required resources: two nodes providing services for the meta-learning process (a learner and a combiner/tester) and for resource reservation.
- The DAS and TAAS services of node Ns invoke the corresponding services on other Knowledge Grid nodes to obtain information about the needed resources. The contacted nodes reply to node Ns with meta-information.
- On node Ns, the meta-information about nodes Nc and Nz is analyzed, and these nodes are identified as candidates for the computation. The DAS and TAAS services on node Ns send this information to the user application.
- The application builds an execution plan for the meta-learning process, specifying strategies for data movement and algorithm execution. The execution plan is submitted to the EPMS of node Ns.
[Diagram: nodes NU, NS, NA, NC and NZ with their services: User Application, DAS, TAAS, EPMS, Database Service, and the Partitioner, Learner, Combiner, Storage Reservation and Resource Reservation factories]
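The discovery and planning steps can be pictured with a small data model. The sketch below is hypothetical: the slides do not show the format of the meta-information or of the execution plan, so `NodeMetadata`, `ExecutionPlan`, `find_candidates` and the service labels are names invented for illustration. The plan steps themselves follow the sequence described in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class NodeMetadata:
    """Hypothetical stand-in for the meta-information a node returns to Ns."""
    name: str
    services: set


@dataclass
class ExecutionPlan:
    """Hypothetical plan: an ordered list of (node, action) steps."""
    steps: list = field(default_factory=list)

    def add(self, node, action):
        self.steps.append((node, action))


def find_candidates(metadata, required_service):
    """Return the names of nodes whose metadata advertises the required service."""
    return [m.name for m in metadata if required_service in m.services]


# Meta-information as it might be collected on node Ns (assumed content).
metadata = [
    NodeMetadata("NA", {"partitioner", "database"}),
    NodeMetadata("NC", {"learner", "reservation"}),
    NodeMetadata("NZ", {"combiner/tester", "reservation"}),
]

learner_node = find_candidates(metadata, "learner")[0]
combiner_node = find_candidates(metadata, "combiner/tester")[0]

# The user application turns the candidates into a plan for the EPMS.
plan = ExecutionPlan()
plan.add("NA", "partition data set DS")
plan.add("NA", f"transfer training sets to {learner_node}")
plan.add("NA", f"transfer testing and validation sets to {combiner_node}")
plan.add(learner_node, "run learners in parallel")
plan.add(learner_node, f"transfer partial classifiers to {combiner_node}")
plan.add(combiner_node, "combine and test classifiers")
plan.add(combiner_node, "transfer global classifier to NU")
```

Keeping the plan as plain data separates what should happen (decided by the user application) from how it is carried out (the job of the EPMS).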

15 KDD APPLICATION EXECUTION
- The EPMS invokes the factories on Na, Nc and Nz, requesting the creation of a partitioner service on node Na and of two reservation services on Nc and Nz.
- On node Nc, computing cycles are reserved (on each computing element) to execute the learner programs, and storage space is reserved to hold the subsets extracted from DS and the partial classifiers.
- On node Nz, storage space is reserved to hold the partial and global classifiers.
[Diagram: nodes NU, NS, NA, NC and NZ with their services and factories]

16 KDD APPLICATION EXECUTION
The requests made by the EPMS result in the creation of the requested services.
[Diagram: nodes NU, NS, NA, NC and NZ, now also showing the new Partitioner Service and two Reservation Services]

17 KDD APPLICATION EXECUTION
The partitioner service interacts with the database service on the same node to extract the needed subsets from DS: n training sets, a testing set and a validation set.
[Diagram: nodes NU, NS, NA, NC and NZ with the Partitioner Service and Reservation Services]
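A partitioning step of this kind might look like the following sketch. The slides do not say how the subsets are chosen, so the shuffling, the 20% fractions, the disjoint round-robin split and the function name `partition_dataset` are all assumptions made for illustration.

```python
import random


def partition_dataset(records, n_training_sets, test_fraction=0.2,
                      validation_fraction=0.2, seed=0):
    """Split records into n training sets, a testing set and a validation set.

    Assumption: the subsets are disjoint and drawn from a seeded shuffle.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * validation_fraction)

    testing_set = shuffled[:n_test]
    validation_set = shuffled[n_test:n_test + n_val]
    remaining = shuffled[n_test + n_val:]

    # Round-robin split of the remaining records into n training sets.
    training_sets = [remaining[i::n_training_sets]
                     for i in range(n_training_sets)]
    return training_sets, testing_set, validation_set


training_sets, testing_set, validation_set = partition_dataset(
    range(100), n_training_sets=4)
```

With 100 records and the default fractions, this yields four training sets of 15 records each plus 20 testing and 20 validation records.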

18 KDD APPLICATION EXECUTION
The EPMS invokes (i) the DAS service on node Na, requesting the transfer of the training sets to node Nc and of the testing and validation sets to node Nz; (ii) the learner factory on Nc, requesting the creation of n learner service instances to be run on the same node.
[Diagram: nodes NU, NS, NA, NC and NZ with the Partitioner Service and Reservation Services]

19 KDD APPLICATION EXECUTION
- On node Nc, n learner service instances are created.
- On each computing element of node Nc, the learner service instances generate the partial classifiers.
- As soon as each partial classifier is obtained, a notification message is sent to the EPMS.
[Diagram: nodes NU, NS, NA, NC and NZ, now also showing the Learner Service instances]
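The pattern on this slide, n learners running in parallel with a notification sent as each one finishes, can be imitated locally with Python's standard `concurrent.futures` module. This is a sketch, not the Knowledge Grid mechanism: threads stand in for the computing elements of the cluster, a callback stands in for the notification message to the EPMS, and the toy majority-label learner is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def learner_service(training_set):
    """Toy learner (assumption): the 'partial classifier' is the majority label."""
    labels = [label for _, label in training_set]
    return max(set(labels), key=labels.count)


def run_learners(training_sets, notify):
    """Run one learner per training set and notify as each one completes."""
    partial_classifiers = {}
    with ThreadPoolExecutor(max_workers=len(training_sets)) as pool:
        futures = {pool.submit(learner_service, ts): index
                   for index, ts in enumerate(training_sets)}
        # as_completed yields each future as soon as its learner finishes.
        for future in as_completed(futures):
            index = futures[future]
            partial_classifiers[index] = future.result()
            notify(index)  # stand-in for the message sent to the EPMS
    return partial_classifiers


notifications = []
training_sets = [
    [("a", 1), ("b", 1), ("c", 0)],
    [("d", 0), ("e", 0), ("f", 1)],
    [("g", 1), ("h", 1), ("i", 1)],
]
partial_classifiers = run_learners(training_sets, notifications.append)
```

The order of the notifications depends on which learner finishes first, so a caller should not rely on it matching the order of the training sets.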

20 KDD APPLICATION EXECUTION
The EPMS invokes (i) the DAS service on node Nc, requesting the transfer of the generated classifiers to node Nz; (ii) the combiner/tester factory on Nz, requesting the creation of a combiner/tester service to be run on the same node.
[Diagram: nodes NU, NS, NA, NC and NZ with the Partitioner, Reservation and Learner Services]

21 KDD APPLICATION EXECUTION
On node Nz, a combiner/tester service is created to perform the combining and testing processes and generate the global classifier GC.
[Diagram: nodes NU, NS, NA, NC and NZ, now also showing the Combiner Service]
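One plausible way to use the validation and testing sets in a combiner/tester is sketched below. The slides do not describe the combining strategy, so accuracy-weighted voting is an assumption chosen for the example, as are the function names and the hand-written partial classifiers.

```python
def accuracy(classifier, labelled_set):
    """Fraction of (x, label) pairs that the classifier predicts correctly."""
    correct = sum(1 for x, label in labelled_set if classifier(x) == label)
    return correct / len(labelled_set)


def combine_weighted(classifiers, validation_set):
    """Combiner (assumption): weight each vote by validation accuracy."""
    weights = [accuracy(clf, validation_set) for clf in classifiers]

    def global_classifier(x):
        score = {}
        for clf, weight in zip(classifiers, weights):
            prediction = clf(x)
            score[prediction] = score.get(prediction, 0.0) + weight
        return max(score, key=score.get)

    return global_classifier


# Hand-written stand-ins for the partial classifiers received from Nc.
partial_classifiers = [
    lambda x: 1 if x > 5.0 else 0,
    lambda x: 1 if x > 4.0 else 0,
    lambda x: 1 if x > 8.0 else 0,
]
validation_set = [(1.0, 0), (4.5, 0), (6.0, 1), (9.0, 1)]
testing_set = [(2.0, 0), (5.5, 1), (7.0, 1), (3.0, 0)]

# Combining uses the validation set; testing uses the separate testing set.
gc = combine_weighted(partial_classifiers, validation_set)
gc_accuracy = accuracy(gc, testing_set)
```

Using separate sets for weighting and for the final accuracy estimate keeps the reported accuracy from being biased by the data used to tune the combination.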

22 KDD APPLICATION EXECUTION
The EPMS invokes the DAS service on node Nz, requesting the transfer of the generated global classifier to node Nu.
[Diagram: nodes NU, NS, NA, NC and NZ with the Partitioner, Reservation, Learner and Combiner Services]

23 OPEN ISSUES (FUTURE)
- Data privacy and security
- KDD process state management
- Complex processing patterns (Web Services are too simple to express distributed data mining processes and applications)
- KDD Grid Service standards (towards OGSA-KDAI?)
- KDD processes as G-Services Workflows
- Asynchronous services
- …

24 CONCLUSIONS
- The knowledge-building process in a distributed setting involves data and information collection, generation, and distribution, followed by the collective interpretation of processed information into knowledge.
- Next-generation Grids must be able to produce, use, and deploy knowledge as a basic element of advanced applications.
- Knowledge-based Grids can offer tools, components and services to support data analysis, inference, and discovery in scientific and business applications.
- OGSA-based services for distributed knowledge discovery are a key element for broad support of e-science and e-business.

25 CREDITS: M. Cannataro, C. Comito
THANKS