gCube framework: the way to implement the D4Science vision Pasquale Pagano CNR-ISTI ICIS Requirements Gathering.

Presentation transcript:

gCube framework: the way to implement the D4Science vision Pasquale Pagano CNR-ISTI ICIS Requirements Gathering Meeting Rome, Italy, June 2008

2 The D4Science vision calls for the realization of scientific e-Infrastructures that remove heterogeneity, sustainability, scalability, and other technical concerns from the minds of scientists, hide the related complexities from them, and enable them to focus on their science and collaborate on common research challenges.
gCube is a framework for managing distributed e-Infrastructures in which dynamic Virtual Research Environments (VREs) can be defined, hosted, and maintained to satisfy the collaboration needs of distributed Virtual Organizations (VOs).

3 e-Infrastructure
An infrastructure is the set of basic physical and organizational structures and facilities (roads, power supplies, ...) needed for the operation of a society or enterprise.
An e-Infrastructure provides support for effective consumption of shared resources:
- hardware-bound resources (i.e. networks, storage, instruments, and computational resources),
- system-level software resources (i.e. basic middleware services),
- application-level software resources (i.e. data sources and services).
These e-Infrastructures offer mechanisms that concurrently exploit networks, grids, and data in a seamless fashion, and thus enable scientific communities to operate within a coherent model, regardless of the location of their research facilities.

4 Virtual Organization (VO)
A Virtual Organization (VO) models sets of users and resources, defining clearly and carefully
- what is shared,
- who is allowed to share,
- the conditions under which sharing occurs, usually based on authentication and authorization policies.
VOs may have a limited lifetime; they are created dynamically to satisfy transient needs of the constituent, potentially heterogeneous, real organizations.
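The three things a VO defines (what is shared, who may share, and under which conditions) can be pictured with a minimal sketch. This is not gCube code: the class `VirtualOrganization`, its fields, and the role names are hypothetical, and authentication is reduced to a membership lookup.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualOrganization:
    """Toy model of a VO: shared resources, members, and sharing conditions."""
    name: str
    shared_resources: set = field(default_factory=set)  # what is shared
    members: dict = field(default_factory=dict)         # who may share: user -> role
    policy: dict = field(default_factory=dict)          # conditions: resource -> allowed roles

    def can_access(self, user: str, resource: str) -> bool:
        # "Authentication" here is only a check that the user is a known member.
        role = self.members.get(user)
        if role is None or resource not in self.shared_resources:
            return False
        # Authorization: the member's role must be allowed for this resource.
        return role in self.policy.get(resource, set())


# Hypothetical VO with two resources and two members.
vo = VirtualOrganization(
    name="example-vo",
    shared_resources={"storage-1", "index-service"},
    members={"alice": "admin", "bob": "user"},
    policy={"storage-1": {"admin"}, "index-service": {"admin", "user"}},
)
```

Because a VO may be short-lived, keeping the whole definition in one object that can be created and discarded fits the dynamic creation described on the slide.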

5 Virtual Research Environment
A Virtual Research Environment (VRE) provides a framework of applications, services, and data sources dynamically identified to support the underlying processes of research, collaboration, and cooperation.
The purpose of a VRE is to help researchers* in all disciplines by managing the increasingly complex range of tasks involved in carrying out their activities.
* "Researcher" is meant in a broad sense: end users, decision makers, resource and data providers, etc.

6 gCube Infrastructure
[Diagram labels: gHN, RI, service RI, port-type RI, VO, VRE; roles: OWNER, ADMIN, DESIGNER, USERS]

VRE Highlights

8 VRE Advantages
gCube infrastructures create new opportunities to change the VRE development model used by distributed and dynamic organisations and communities.
Using gCube-empowered infrastructures, organisations and communities are able to set up their own environment:
- when, and for as long as, they need it
- exploiting existing Grid-based services
- accessing and handling distributed, multi-focused data and services
- orchestrating user-defined services with defined QoS (with respect to scalability and reliability)
- profiting from a shared set of storage and computational resources
- sharing data and services in a collaborative and efficient way

9-15 VRE Definition Steps

16 gCube Service Management: System Administrator Measured Effort

SCOPE           ACTION                                  TIME
Infrastructure  install collective layer                < 1 day
                install portal                          < 1 day
VO              install 1 DHN                           < 10 min
                register resources (DHN, data)          < 1 min
                approve resource (DHN, data)            < 1 min
                data publishing (metadata, indexes)     hours
                manage users                            < 10 min
VRE             define VRE                              < 10 min
                approve VRE                             < 1 min
                deploy VRE                              < 2 hours
                modify VRE                              < 1 hour

Content Management

18 Content & Storage Management: Challenges
Store large volumes of digital content in the Grid.
But there is much more:
- metadata for each object
- automatically extracted features per object (e.g., color histograms for images)
- storage properties (e.g., size, etc.)
- ... and all that highly interconnected
[Figure: sample metadata records for MERIS products (MER_RR__2P, ALGAL_1, ENVI...) with coverage terms World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal]

19 gCube Data Management
gCube Data Management supports researchers by providing means for
- persistent storage and physical structuring of content
- logical grouping of content in collections
- logical sharing of content among several collections
- sharing of collections within a VRE and among several VREs through a shared workspace
- management of complex content consisting of several parts and having multiple representations
- storage of structured and heterogeneous metadata compliant with different formats and schemas
- programmatic or manual annotation of content via text and images, e.g. data provenance
- content linking
- definition of 'composite document' templates

20 gCube Data Management [cont.]
gCube Data Management supports application designers and providers by providing means for
- bulk upload and update of data and metadata
- manipulation of metadata and data through a powerful Data Transformation Engine (gDTS):
  - metadata can be cleaned, enriched, and transformed into different formats by exploiting mapping schemas, controlled vocabularies, thesauri, and ontologies to facilitate data integration and discovery
  - data can be transformed to offer different views of the same goods
- replication and partitioning of data
- subscription and notification
- generation and publishing of new data through workflow definition, optimisation, and execution
gCube exploits the file-system-like functionality of Grid technology to manage data storage.
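The metadata cleaning and transformation idea can be illustrated with a small sketch. It is not the gDTS API: the function `transform_record`, the field map, and the vocabulary below are hypothetical, chosen only to show how a mapping schema and a controlled vocabulary might work together.

```python
def transform_record(record, field_map, vocabulary):
    """Map source fields to a target schema and normalize their values.

    record:     dict of source field -> raw string value
    field_map:  dict of source field -> target field (the "mapping schema")
    vocabulary: dict of lowercase variant -> preferred term (controlled vocabulary)
    """
    transformed = {}
    for source_field, raw_value in record.items():
        target_field = field_map.get(source_field)
        if target_field is None:
            continue  # cleaning: fields absent from the mapping are dropped
        value = raw_value.strip()
        # normalization: replace a known variant with its preferred term
        transformed[target_field] = vocabulary.get(value.lower(), value)
    return transformed


# Hypothetical source record and rules, for illustration only.
source_record = {"titel": " Algal bloom map ", "region": "med", "internal_id": "42"}
field_map = {"titel": "title", "region": "coverage"}
vocabulary = {"med": "Mediterranean"}

target_record = transform_record(source_record, field_map, vocabulary)
```

Keeping the rules as plain data, separate from the code that applies them, is one way to let each data source bring its own mapping without changing the engine.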

21 Document Model
All entities in a VRE are information objects: collections, documents, metadata.
Any information object can be stored or fetched independently of what it represents.
[Diagram labels: Information Object (Info_Object_ID); Storage_Property (Property_Name, Property_Type, Property_Value); Object Reference (Reference_Source, Reference_Target, Reference_Role, Reference_Propagation); comprise; BLOB object, File object (ISA)]
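A minimal sketch of this document model, assuming the grouping of attributes suggested by the diagram labels. The classes, the in-memory `ObjectStore`, and the role names are hypothetical and are not taken from gCube; the meaning of Reference_Propagation is not given on the slide, so it is kept as an opaque string.

```python
from dataclasses import dataclass, field


@dataclass
class StorageProperty:
    name: str
    type: str
    value: str


@dataclass
class ObjectReference:
    source: str            # ID of the referring information object
    target: str            # ID of the referenced information object
    role: str              # hypothetical roles, e.g. "is-member-of", "describes"
    propagation: str = ""  # semantics not stated on the slide


@dataclass
class InformationObject:
    object_id: str
    payload: bytes = b""
    properties: list = field(default_factory=list)


class ObjectStore:
    """In-memory store that treats every entity uniformly, by ID."""

    def __init__(self):
        self._objects = {}
        self._references = []

    def store(self, obj):
        self._objects[obj.object_id] = obj

    def fetch(self, object_id):
        return self._objects[object_id]

    def link(self, source, target, role):
        self._references.append(ObjectReference(source, target, role))

    def sources_of(self, target, role):
        # IDs of all objects that reference `target` with the given role.
        return [r.source for r in self._references
                if r.target == target and r.role == role]


# A collection, a document, and a metadata record are all information objects.
store = ObjectStore()
store.store(InformationObject("collection-1"))
store.store(InformationObject("doc-1", b"raw image bytes",
                              [StorageProperty("size", "int", "15")]))
store.store(InformationObject("meta-1", b"<metadata/>"))
store.link("doc-1", "collection-1", "is-member-of")
store.link("meta-1", "doc-1", "describes")
```

Collections, documents, and metadata differ only in the references that connect them, which is what lets any object be stored or fetched without knowing what it represents.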

22 Content example (Earth Observation)
[Diagram labels: Satellite Image; Storage Properties; Image Features; Metadata as XML Document (sample record: MER_RR__2P, MERIS, ALGAL_1, ENVI..., with coverage terms World, Europe, Bigger_Europe, Smaller_Europe, Mediterranean, Iberia, North_Atlantic, Africa, North_Africa, Middle_East, Portugal); Metadata Management; Content & Storage Management; Feature Extraction]

23 Data Management Architecture
[Diagram labels]
Layers: Content Management Layer, Storage Management Layer, Base Layer
Services: Content Management Service, Collection Management Service, Notification Service, Archive Import Service, Storage Management Service, Replication Management Service
Base components: any off-the-shelf DBMS, e.g. MySQL (JDBC), for properties & relations; any SRM-enabled SE, e.g. DPM (GFAL), for raw file content
APIs: Cataloguing API (LFC), Storage API (SRM)

24 gCube Storage Model
[Diagram labels]
Layers: Content Management Layer, Storage Management Layer, Base Layer (RDBMS, gLite SE)
Entities: Document, Information Object, Content GUID, BLOB object, File object (ISA)

25 Bridging Data Sources
Hosted on the infrastructure
Data sources are interfaced through ...
The bridges are managed by ... the infrastructure

26 Managing Data Source Heterogeneity
Mapping rules

27 gCube Infrastructure Managing Data Import
[Diagram labels: DS, import, MR, VRE 1, VRE 2, VRE 3, VRE 4]

Search Management

29 gCube Search Engine Features
An open, feature-rich, distributed search engine
- composed of diverse, autonomous, pluggable elements
- capturing complex application scenarios by combining information retrieval and data processing procedures
Maximization of the resources placed at the disposal of VRE managers and users
- ease of sharing of resources, avoiding mis-utilization and misuse
- reduction of the cost of ownership and use

30 Functional Overview
Search types
- structured data (fielded search / XML search)
- semi-structured data (XML search)
- geospatial / temporal data (R-Tree)
- content-based search
- full-text search
- image similarity search
Access
- XML-based query language
- Web user interface (portal / search portlets)
- command-line UI
Retrieval
- incremental result delivery
- automatic caching
- result persistence
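Two of the retrieval features listed on this slide, incremental result delivery and automatic caching, can be sketched together. The slides do not say how gCube implements either, so the `CachedSearch` class and its backend callable are assumptions made for illustration.

```python
class CachedSearch:
    """Delivers hits one at a time and caches complete result sets per query."""

    def __init__(self, backend):
        self._backend = backend  # hypothetical callable: query -> iterable of hits
        self._cache = {}
        self.backend_calls = 0

    def search(self, query):
        if query in self._cache:
            # Cached queries are answered without contacting the backend.
            yield from self._cache[query]
            return
        self.backend_calls += 1
        seen = []
        for hit in self._backend(query):
            seen.append(hit)
            yield hit  # incremental delivery: hand over each hit as it arrives
        # Reached only if the consumer read every hit, so partial
        # result sets are never cached.
        self._cache[query] = seen


# Hypothetical backend producing three hits per query.
engine = CachedSearch(lambda q: (f"{q}-{i}" for i in range(3)))
first_run = list(engine.search("algal"))
second_run = list(engine.search("algal"))  # served from the cache
```

A generator lets the caller start presenting results before the full list exists, which is the point of incremental delivery when sources are distributed and slow.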

31 Functional Overview [cont.]
Automatic description of the content sources
To cope with
- several content sources
- potentially thematically focused
- different ranking estimation
- different structures for managing metadata / information retrieval
- different content
By offering
- selection of the appropriate sources
- merging or fusing (re-ranking) of the results into meaningful lists
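Merging results from sources with different ranking estimations requires putting their scores on a common scale first. The sketch below uses min-max normalization followed by a keep-the-best-score merge. This is one common fusion scheme chosen for illustration; the slides do not state which method gCube uses, and the function names are hypothetical.

```python
def normalize(results):
    """Min-max normalize a list of (doc_id, score) pairs to the range [0, 1]."""
    if not results:
        return []
    scores = [score for _, score in results]
    low, high = min(scores), max(scores)
    if high == low:
        # All scores equal: treat every hit as equally good.
        return [(doc_id, 1.0) for doc_id, _ in results]
    return [(doc_id, (score - low) / (high - low)) for doc_id, score in results]


def fuse(result_lists):
    """Merge per-source result lists into one list ranked by normalized score."""
    best = {}
    for results in result_lists:
        for doc_id, score in normalize(results):
            # A document returned by several sources keeps its best score.
            best[doc_id] = max(score, best.get(doc_id, 0.0))
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(best.items(), key=lambda item: (-item[1], item[0]))


# Two hypothetical sources that score on different scales.
source_a = [("d1", 10.0), ("d2", 5.0), ("d3", 0.0)]
source_b = [("d2", 0.9), ("d4", 0.1)]
merged = fuse([source_a, source_b])
```

Without the normalization step, the source with the larger raw scores would dominate the merged list regardless of relevance.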

32-35 Search in action

Process Management

37 Process Management Rationale
VREs are large-scale collections of resources that provide dedicated services to manage and access digital data sources.
VRE applications need to integrate these (distributed) resources into a coherent whole: "programming in the large" (composition of services into processes).
[Diagram labels: Query Reformulation, Index Access (Query Execution), Result Filtering]

38 Service Process Management: Example
Search: similarity search over multimedia documents
1. Query
2. Extract Features
3. Query Index
5. Present Results
[Diagram labels: Index, Access Content & Create Result Set, Process]
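The similarity search process can be sketched as a chain of small service functions. Everything here is a toy stand-in: the histogram feature extractor, the in-memory index, and the function names are hypothetical, and placing "Access Content & Create Result Set" between querying the index and presenting results is an assumption based on the diagram labels.

```python
import math


def extract_features(pixels, bins=4):
    """Toy feature extractor: normalized histogram of values in 0..255."""
    histogram = [0] * bins
    for value in pixels:
        histogram[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels) or 1
    return [count / total for count in histogram]


def query_index(index, features, k):
    """Return the IDs of the k indexed documents closest to the features."""
    ranked = sorted(index.items(), key=lambda item: math.dist(features, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


def run_process(query_pixels, index, content, k=2):
    features = extract_features(query_pixels)    # 2. Extract Features
    doc_ids = query_index(index, features, k)    # 3. Query Index
    # Access content and create the result set (position in the flow assumed).
    return [(doc_id, content[doc_id]) for doc_id in doc_ids]


# Hypothetical content store and feature index.
samples = {
    "dark": [0, 10, 20, 30],
    "bright": [250, 240, 230, 220],
    "mixed": [0, 100, 150, 250],
}
content = {doc_id: f"{doc_id}.png" for doc_id in samples}
index = {doc_id: extract_features(pixels) for doc_id, pixels in samples.items()}

results = run_process([5, 15, 25, 200], index, content)  # 1. Query
```

Each function stands for a service that could run on a different node; the process is only the order in which they are composed.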

39 Processes in gCube
Following the SOA paradigm, processes are the first choice in gCube for defining and executing applications on the basis of available services.
gCube's approach to Process Management on the Grid consists of three main phases:
- Process Design and Verification
  - through a graphical user interface for the specification of processes
- Process Execution and Reliability
  - distributed process support in the infrastructure
  - dynamic allocation of resources
  - sophisticated failure handling
- Process Optimization
  - structural process modifications to maximise parallelism
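One way to picture "structural process modifications to maximise parallelism" is to group process steps into stages whose members have no dependencies on one another. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later). It illustrates the general idea only and is not gCube's optimizer; the step names are borrowed from the process management rationale slide, and the split into two index accesses is an assumption.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group steps into stages; all steps within a stage can run concurrently.

    dependencies: dict mapping each step to the set of steps it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step whose inputs are complete
        stages.append(ready)
        sorter.done(*ready)                 # mark the stage finished, unlock successors
    return stages


# Hypothetical search process querying two indexes.
process = {
    "query_reformulation": set(),
    "index_access_a": {"query_reformulation"},
    "index_access_b": {"query_reformulation"},
    "result_filtering": {"index_access_a", "index_access_b"},
}
stages = parallel_stages(process)
```

Here the two index accesses land in the same stage, so a process engine could dispatch them to different nodes at the same time.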

40 … Sample Application Process: Meris Global Vegetation Index (MGVI)

41 Process Monitoring
A monitoring interface keeps track of running process instances.

42 Conclusion
- gCube reduces the cost of managing any complex multi-domain e-Infrastructure.
- gCube offers a horizontal solution to manage and enrich VREs created on demand on complex e-Infrastructures.
- gCube is equipped with data and metadata management facilities that make heterogeneous data sources interoperable.
- gCube is compliant with consolidated and emerging standards.
- gCube offers an open family of frameworks that can be easily customised.

Thank you.