Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY


1 Designing Services for Grid-based Knowledge Discovery A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio DEIS University of Calabria ITALY Future Generation Grids, Dagstuhl Seminar, November 2004

2 SUMMARY
- The use of computers is changing the way we make discoveries and is improving both the speed and the quality of the discovery process.
- In this scenario, the Grid can provide effective computational support for distributed knowledge discovery from large, distributed data sets. For this purpose we designed a system called the Knowledge Grid.
- This talk discusses how to design distributed knowledge discovery services according to the OGSA model, using the Knowledge Grid services for searching Grid resources, composing software and data elements, and executing the resulting application on a Grid.

3 OUTLINE
- MOTIVATIONS
- TOWARDS KNOWLEDGE SERVICES
- THE KNOWLEDGE GRID
- OGSA SERVICES FOR KNOWLEDGE DISCOVERY
- A META-LEARNING EXAMPLE
- CONCLUSIONS

4 MOTIVATIONS
- Lots of data collected and warehoused.
- Data collected and stored at enormous speeds in local databases, from remote sources, or from the sky.
- Scientific simulations generating terabytes of data.
- Huge data sets are hard to understand.
- Traditional techniques are infeasible for raw data.
- Computational science is evolving toward data-intensive applications that include data analysis, information management, and knowledge discovery.

5 MOTIVATIONS
- Most data will never be examined by humans; it is analyzed and summarized by computers.
- Data analysis is becoming a key element in scientific discovery and in business processes.
- Data-intensive applications are those that explore, query, analyze, visualize, and in general process very large-scale data sets.
- Data-intensive applications help scientists in hypothesis formation, and help companies provide better, customized services and support decision making.

6 TOWARDS KNOWLEDGE SERVICES: Grid-aware Knowledge Discovery Systems
SCIENTIFIC OBJECTIVES
- To support the unification of data management and knowledge discovery systems with Grid technologies, in order to provide knowledge-based Grid services.
- This objective can be achieved through the development of techniques and tools for supporting data-intensive applications, and through the integration of Data and Computation Grids with Information and Knowledge Grids.

7 THE KNOWLEDGE GRID [PAST]
- KNOWLEDGE GRID: a distributed knowledge discovery architecture that integrates data mining techniques and computational Grid resources.
- In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services and exploit Data Grid services.
- This approach benefits from "standard" Grid services and offers an open architecture that can be configured on top of generic Grid middleware.

8 KNOWLEDGE GRID ARCHITECTURE [PAST]
[Figure: layered architecture with two labelled layers, KNOWLEDGE GRID and Generic and Data Grid Services]

9 THE KNOWLEDGE GRID: Service Selection [PAST, FUTURE]

10 OGSA KNOWLEDGE GRID SERVICES [FUTURE]
- The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications.
- We are defining a set of Grid Services that export the functionality and operations of the KNOWLEDGE GRID.
- Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.

11 KNOWLEDGE SERVICES: A Meta-Learning Example
- A simple example of a meta-learning process over the KNOWLEDGE GRID.
- The example shows how the execution of a significant distributed data mining application can benefit from the Knowledge Grid services provided through the OGSA model.
- Meta-learning aims to generate a number of independent classifiers by applying learning programs to a collection of distributed data sets in parallel.
- The classifiers computed by the learning programs are then collected and combined to obtain a global classifier.
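
The meta-learning idea on this slide can be sketched in a few lines of Python. This is an illustration only, not the Knowledge Grid implementation: the nearest-mean learner, the majority-vote combiner, and all function names are assumptions chosen to keep the example self-contained, since the slides do not say which learning programs or combining strategy are used.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def train_nearest_mean(samples):
    """Hypothetical learner: classify a 1-D value by the closer class mean."""
    zeros = [x for x, label in samples if label == 0]
    ones = [x for x, label in samples if label == 1]
    mean0 = sum(zeros) / len(zeros)
    mean1 = sum(ones) / len(ones)
    return lambda x: 1 if abs(x - mean1) < abs(x - mean0) else 0


def combine(classifiers):
    """Assumed combining strategy: majority vote over the partial classifiers."""
    def global_classifier(x):
        votes = Counter(clf(x) for clf in classifiers)
        return votes.most_common(1)[0][0]
    return global_classifier


def meta_learn(partitions):
    """Train one classifier per data set in parallel, then combine them."""
    with ThreadPoolExecutor() as pool:
        classifiers = list(pool.map(train_nearest_mean, partitions))
    return combine(classifiers)


# Three toy data sets of (value, label) pairs stand in for distributed data.
partitions = [
    [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)],
    [(0.5, 0), (1.5, 0), (7.0, 1), (9.5, 1)],
    [(2.5, 0), (3.0, 0), (8.5, 1), (10.0, 1)],
]
global_classifier = meta_learn(partitions)
```

Calling `global_classifier(x)` then returns the label chosen by most of the independently trained classifiers.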

12 KNOWLEDGE SERVICES: A Meta-Learning Example

13 KNOWLEDGE SERVICES: A Meta-Learning Example
- A user application interacts with Knowledge Grid nodes to generate a classifier by combining the classifiers built from different subsets of a given data set.
- The scenario comprises five nodes:
  - NU, running the user application that builds the meta-learning application and visualizes the global classifier;
  - NS, which is used for resource discovery and for steering the execution of the meta-learning application;
  - NA, which hosts the original data set and provides a data partitioning service;
  - NC, providing learning services that are performed in parallel over a homogeneous cluster;
  - NZ, providing a combiner/tester service used to compute the global classifier.

14 RESOURCE DISCOVERY AND EXECUTION PLANNING
- The user application invokes the DAS and TAAS services on node Ns, specifying the required resources: two nodes providing services for the meta-learning process (a learner and a combiner/tester) and for resource reservation.
- The DAS and TAAS services of node Ns invoke the corresponding services on other Knowledge Grid nodes in order to obtain information about the needed resources.
- Contacted nodes reply to node Ns by sending meta-information.
- On node Ns, the meta-information about nodes Nc and Nz is analyzed, and these nodes are identified as candidates for the computation.
- The DAS and TAAS services on node Ns send this information to the user application.
- The application builds an execution plan for the meta-learning process, specifying strategies for data movement and algorithm execution.
- The execution plan is submitted to the EPMS of node Ns.
[Diagram: nodes NU, NS, NA, NC, NZ. Labelled components: User Application, DAS, TAAS, EPMS, Database Service, Storage Reservation Factory, Partitioner Factory, Resource Reservation Factory, Learner Factory, Combiner Factory]
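
The discovery and planning steps on this slide can be approximated as a filter over node meta-information followed by the construction of an ordered plan. The sketch below is a rough illustration under stated assumptions: the `NodeMetadata` record, the service labels, and the `(action, source, target)` plan format are all hypothetical and do not reflect the actual DAS, TAAS, or EPMS interfaces. The order of the plan steps follows the execution slides of this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMetadata:
    """Hypothetical meta-information record returned by a contacted node."""
    name: str
    services: frozenset


def discover(nodes, required):
    """For each role, list the nodes offering every service that role needs."""
    return {
        role: [node.name for node in nodes if needed <= node.services]
        for role, needed in required.items()
    }


def build_plan(candidates, data_node="Na", user_node="Nu"):
    """Build an ordered plan of (action, source node, target node) steps."""
    learner_node = candidates["learner"][0]
    combiner_node = candidates["combiner/tester"][0]
    return [
        ("partition", data_node, data_node),
        ("transfer training sets", data_node, learner_node),
        ("transfer testing and validation sets", data_node, combiner_node),
        ("learn", learner_node, learner_node),
        ("transfer partial classifiers", learner_node, combiner_node),
        ("combine and test", combiner_node, combiner_node),
        ("transfer global classifier", combiner_node, user_node),
    ]


# Assumed meta-information; the service labels are illustrative only.
nodes = [
    NodeMetadata("Na", frozenset({"database", "partitioner"})),
    NodeMetadata("Nc", frozenset({"learner", "reservation"})),
    NodeMetadata("Nz", frozenset({"combiner/tester", "reservation"})),
]
required = {
    "learner": {"learner", "reservation"},
    "combiner/tester": {"combiner/tester", "reservation"},
}
candidates = discover(nodes, required)
plan = build_plan(candidates)
```

Keeping the plan as plain data separates deciding what to run from running it, which mirrors the slide's split between the user application that builds the plan and the EPMS that receives it.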

15 KDD APPLICATION EXECUTION
- The EPMS invokes the factories on Na, Nc and Nz, requesting the creation of a partitioner service on node Na and the creation of two reservation services on Nc and Nz.
- On node Nc, computing cycles are reserved (on each computing element) to execute the learner programs, and storage space is reserved to hold the subsets extracted from DS and the partial classifiers.
- On node Nz, storage space is reserved to hold the partial and global classifiers.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories]

16 KDD APPLICATION EXECUTION
- The requests made by the EPMS result in the creation of the requested services.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; new instances: Partitioner Service, two Reservation Services]

17 KDD APPLICATION EXECUTION
- The partitioner service interacts with the database service on the same node to extract the needed subsets from DS: n training sets, a testing set and a validation set.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services]
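
As a rough illustration of the partitioning step, the function below splits a data set into n training sets, one testing set and one validation set. The slide does not say how the subsets are sized or whether they are disjoint, so the split fractions, the disjoint round-robin layout, and the function name `partition` are assumptions made for this sketch.

```python
def partition(dataset, n, test_fraction=0.2, validation_fraction=0.2):
    """Split a list into n training sets, a testing set and a validation set.

    Assumption: the three kinds of subset are disjoint and the records are
    taken in their stored order (no shuffling).
    """
    n_test = int(len(dataset) * test_fraction)
    n_val = int(len(dataset) * validation_fraction)
    testing = dataset[:n_test]
    validation = dataset[n_test:n_test + n_val]
    training = dataset[n_test + n_val:]
    # Round-robin assignment keeps training-set sizes within one record.
    training_sets = [training[i::n] for i in range(n)]
    return training_sets, testing, validation


# A toy data set of 20 record identifiers stands in for DS.
ds = list(range(20))
training_sets, testing, validation = partition(ds, n=3)
```

With these toy inputs, 4 records go to testing, 4 to validation, and the remaining 12 are spread evenly over the 3 training sets.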

18 KDD APPLICATION EXECUTION
- The EPMS invokes (i) the DAS service on node Na, requesting the transfer of the training sets to node Nc and of the testing and validation sets to node Nz; and (ii) the learner factory on Nc, requesting the creation of n learner service instances to be run on the same node.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services]

19 KDD APPLICATION EXECUTION
- On node Nc, n learner service instances are created.
- On each computing element of node Nc, the learner service instances generate the partial classifiers.
- As soon as each partial classifier is obtained, a notification message is sent to the EPMS.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services, Learner Services]
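
The pattern on this slide, where learners run in parallel and each one reports as soon as it finishes, can be mimicked locally with Python's `concurrent.futures`. This is a sketch only: threads stand in for the computing elements of node Nc, a plain list stands in for the messages sent to the EPMS, and the stand-in `learner` simply averages its input instead of building a real classifier.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def learner(training_set):
    """Stand-in learner: the 'partial classifier' is the mean of its input."""
    return sum(training_set) / len(training_set)


def run_learners(training_sets):
    """Run one learner per training set and record a notification per result."""
    notifications = []  # stands in for messages sent to the EPMS
    classifiers = {}
    with ThreadPoolExecutor(max_workers=len(training_sets)) as pool:
        futures = {
            pool.submit(learner, ts): index
            for index, ts in enumerate(training_sets)
        }
        # as_completed yields each future as soon as its learner finishes,
        # so notifications are recorded in completion order.
        for future in as_completed(futures):
            index = futures[future]
            classifiers[index] = future.result()
            notifications.append(("partial-classifier-ready", index))
    return classifiers, notifications


classifiers, notifications = run_learners([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

The order of `notifications` depends on which learner finishes first, so a coordinator should key on the learner index rather than on arrival order.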

20 KDD APPLICATION EXECUTION
- The EPMS invokes (i) the DAS service on node Nc, requesting the transfer of the generated classifiers to node Nz; and (ii) the combiner/tester factory on Nz, requesting the creation of a combiner/tester service to be run on the same node.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services, Learner Services]

21 KDD APPLICATION EXECUTION
- On node Nz, a combiner/tester service is created to perform the combining and testing processes and generate the global classifier GC.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services, Learner Services, Combiner Service]

22 KDD APPLICATION EXECUTION
- The EPMS invokes the DAS service on node Nz, requesting the transfer of the generated global classifier to node Nu.
[Diagram: nodes NU, NS, NA, NC, NZ with their services and factories; instances: Partitioner Service, two Reservation Services, Learner Services, Combiner Service]

23 OPEN ISSUES [FUTURE]
- Data privacy and security
- KDD process state management
- Complex processing patterns (Web Services are too simple to express distributed data mining processes and applications)
- KDD Grid Service standards (towards OGSA-KDAI?)
- KDD processes as G-Services Workflows
- Asynchronous services
- ...

24 CONCLUSIONS
- The knowledge-building process in a distributed setting involves data and information collection, generation, and distribution, followed by the collective interpretation of processed information into knowledge.
- Next-generation Grids must be able to produce, use, and deploy knowledge as a basic element of advanced applications.
- Knowledge-based Grids can offer tools, components and services to support data analysis, inference, and discovery in scientific and business applications.
- OGSA-based services for distributed knowledge discovery are a key element for broad support of e-science and e-business.

25 CREDITS: M. Cannataro, C. Comito
THANKS

