EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks A GRID based platform to host multiple repositories.


1 EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks
A GRID based platform to host multiple repositories for digital content
Antonio Calanducci (1), J.M. González (3), R. Ramos (2), M. Rubio (2), D. Tcaci (3)
(1) INFN Catania, (2) CETA-CIEMAT, (3) MAAT-G Knowledge
3rd EGEE User Forum, 11-14 February 2008 – Clermont-Ferrand (France)

2 3rd EGEE User Forum, 13 Feb 08, Clermont-Ferrand Enabling Grids for E-sciencE EGEE-II INFSO-RI-031688
Introduction
Need to offer a GRID based platform to host arbitrary repositories.
A digital repository is a set of annotated digitalized data offered to users in a structured manner.
Both the digitalized data and the annotations can vary greatly from one repository to another, but the following commonalities are acknowledged:
− There is a basic informational unit of digitalized data (a mammogram, a page of an ancient manuscript, a 3D model, ...)
− There is metadata around each unit of digitalized data (patient info, diagnoses, translation, historical context, physical properties, ...)
− Specific algorithms process the data (search for microcalcifications, automatic translation, ...)
− Users browse, search and update the repository, and launch algorithms (GRID WMS)
− Data is stored in a federated way: each institution owns and manages its content
− Metadata goes to a DB, digitalized data to an archive (GRID SE)

3 Goals of gLibrary/DRI
To host multiple repositories of arbitrary structure
On a GRID infrastructure (security, federation, …)
Reduce the “cost-to-deploy”, reach new communities
Open architecture
Easy to use platform, web based interface
Collaboration between INFN and CETA-CIEMAT
Builds on INFN gLibrary

4 INFN gLibrary
Created by the GILDA team at INFN Catania
Secure, robust, easy to use interface to handle digital assets stored in GRID SEs
Interface to browse entries and find files in SEs
– “à la iTunes” browsing allows searching by mouse clicks
Built on top of gLite GRID services: any SRM SE, LFC, AMGA, VOMS authorization
Authentication/Authorization
− Via applet, creating a proxy cert on the user’s PC
− Proxy used to interact directly with GRID elements (LFC, SE, AMGA)
Files are transferred directly from the SE to the applet and vice versa.

5 gLibrary screenshots

6 gLibrary/DRI
DRI: Digital Repositories Infrastructure
Extends gLibrary by:
− Making it multirepository
− Having no predefined repository content structure: each repository describes itself
− Decoupling navigation and management from repository specifics
A repository must provide:
− A description of its navigational structures (trees, filters) and a viewer
− A description of its data model
− A storage engine (for data model persistence)
− The DRI API specification describes HOW this is provided
A repository provider can:
− Make its own implementation of the specification
− Use (or extend) the default one provided

7 gLibrary/DRI web interface

8 DICOM viewer

9 gLibrary/DRI API specification
A repository has to provide:
Data Model:
– XML format description of the repository’s data
– Relational data models are supported
– Indication of which part of the data model is saved in the federated DB and which in the Storage System
Storage Module:
– Takes care of data persistence
– Load() and Save() methods have to be provided for loading and saving instances of the data model
User Interface Module:
– Definition of the navigational trees and filters
– Viewer for the specific repository

10 gLibrary/DRI API specification
Data Model: The repository provider describes the data of their repository in XML format. Relational data models are supported, so a parent node with dependent entries can be specified. The data model also defines which parts of the data are stored in the federated database and which in the storage elements. gLibrary/DRI defines a specification for the XML data model that the provider must follow.
Storage Module: This module takes care of data persistence. The provider supplies a set of classes for loading and saving instances of the data model. The save function inspects a given instance to decide which parts of it are stored in an SE and which in the federated database. The load function handles the same split when it reassembles an instance.
User Interface Module: In this module the provider specifies how to build the navigation trees and the filters that the web portal presents for quickly locating any element. The module also contains the repository viewer: the viewer function receives an instance of the data model and renders its data appropriately.

11 gLibrary/DRI API specification
Contract between the gLibrary/DRI platform and specific repository implementations
Each application must provide three Java modules implementing the following interfaces:
− DRIUIInterface for describing trees, filters and viewers
− DRIStorageInterface for storing and retrieving data
− DRINodeInterface for defining the repository data model
The gLibrary/DRI engine orchestrates API calls to the different interface implementations
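The slides do not show how the engine locates a provider's modules. One plausible approach, given that Java introspection is listed among the technologies, is to instantiate a module from its class name and check that it honours the contract. The sketch below is an assumption, not the DRI implementation: `ModuleLoader` is a hypothetical helper, and `DRIUIInterface` is reduced here to a single method with an assumed `Vector<String>` return type.

```java
import java.util.Vector;

// Reduced stand-in for the DRIUIInterface named on the slide (assumed shape).
interface DRIUIInterface {
    Vector<String> getFilterNameInstances();
}

// Hypothetical provider module with made-up filter names.
class MyRepositoryUI implements DRIUIInterface {
    public Vector<String> getFilterNameInstances() {
        Vector<String> filters = new Vector<>();
        filters.add("By author");
        filters.add("By date");
        return filters;
    }
}

// Hypothetical helper: loads a module by class name and verifies the contract.
class ModuleLoader {
    static <T> T load(String className, Class<T> contract) {
        try {
            Class<?> cls = Class.forName(className);
            if (!contract.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(
                        className + " does not implement " + contract.getName());
            }
            // Requires an accessible no-argument constructor on the module class.
            return contract.cast(cls.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load module " + className, e);
        }
    }
}
```

A caller would write `ModuleLoader.load(MyRepositoryUI.class.getName(), DRIUIInterface.class)`; a class that does not implement the requested interface is rejected before it is instantiated.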

12 gLibrary/DRI UI API extract

    import java.util.Arrays;
    import java.util.Vector;

    // Contract the engine uses to ask a repository about its navigation UI
    public interface DRIUIInterface {
        public Vector getRepositoryTrees (String repositoryName);
        public TreeHierarchy getTreeHierarchy (String treeName);
        public Vector getFilterNameInstances ();
        public Vector getFilterEntries (String filterName);
        public void loadViewer (String viewerClass);
    }

    public class MyRepositoryUI implements DRIUIInterface {
        public Vector getRepositoryTrees (String repositoryName) {
            // access repository config file/db/etc to get tree data
            …
            return new Vector(Arrays.asList(new Tree("By author"), new Tree("By date")));
        }
        …
    }

13 gLibrary/DRI Engine Orchestration
Registered repositories
MGUI.getRepositoryTrees(): what are your navigation trees?
MGUI.getFilterNameInstances(): what are your filters?
MGUI.loadViewer(): return an applet with the viewer application to display and manipulate the selected repository item
MGUI.getFilterEntries(): what are the possible values for the selected filter?
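The question-and-answer exchange between the engine and a repository's UI module can be sketched as follows. This is an illustration under assumptions: `UiContract` is a reduced stand-in for the UI interface with trees and filters modelled as plain strings, `EngineSketch` and `DemoRepositoryUi` are hypothetical, and the `Gender` filter with its values is made up for the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Reduced, hypothetical UI contract: trees and filters are plain strings here.
interface UiContract {
    Vector<String> getRepositoryTrees(String repositoryName);
    Vector<String> getFilterNameInstances();
    Vector<String> getFilterEntries(String filterName);
}

// Hypothetical repository module returning fixed demo data.
class DemoRepositoryUi implements UiContract {
    public Vector<String> getRepositoryTrees(String repositoryName) {
        return new Vector<>(Arrays.asList("alphabetical", "pathologies"));
    }
    public Vector<String> getFilterNameInstances() {
        return new Vector<>(Arrays.asList("Gender"));
    }
    public Vector<String> getFilterEntries(String filterName) {
        return new Vector<>(Arrays.asList("F", "M"));
    }
}

// Hypothetical engine step: asks the module for its trees, then for each
// filter and the values that filter can take.
class EngineSketch {
    static List<String> describe(String repositoryName, UiContract ui) {
        List<String> lines = new ArrayList<>();
        for (String tree : ui.getRepositoryTrees(repositoryName)) {
            lines.add("tree: " + tree);
        }
        for (String filter : ui.getFilterNameInstances()) {
            lines.add("filter: " + filter + " = " + ui.getFilterEntries(filter));
        }
        return lines;
    }
}
```

The engine only depends on the contract, so any registered repository that implements it can be queried the same way.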

14 gLibrary/DRI Storage API

    // Contract the engine uses to persist instances of a repository's data model
    public interface DRIStorageInterface {
        public DRIGenericNode Load (String Id);
        public void Remove (String Id);
        public void CreateNew (DRIGenericNode Node);
        public void Save (DRIGenericNode Node);
    }

    public class MyRepositoryStorage implements DRIStorageInterface {
        public MyRepositoryNode Load (String id) {
            // access db, GRID SE, etc. Assemble one instance of the data model
            …
            MyRepositoryNode node = new MyRepositoryNode (db, data, …);
            return node;
        }
        …
    }
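To make the four storage calls concrete, here is a minimal in-memory sketch of the storage contract. It is not the DRI default implementation: `DRIGenericNode` is reduced to a hypothetical id plus a field map, and the rule that `CreateNew` rejects an existing id while `Save` requires one is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal node; the real DRIGenericNode is not shown in the slides.
class DRIGenericNode {
    final String id;
    final Map<String, String> fields = new HashMap<>();
    DRIGenericNode(String id) { this.id = id; }
}

// Method names and signatures follow the slide.
interface DRIStorageInterface {
    DRIGenericNode Load(String Id);
    void Remove(String Id);
    void CreateNew(DRIGenericNode Node);
    void Save(DRIGenericNode Node);
}

// Hypothetical in-memory implementation; a real module would talk to a DB and SEs.
class InMemoryStorage implements DRIStorageInterface {
    private final Map<String, DRIGenericNode> nodes = new HashMap<>();

    // Returns null when no node with this id is stored.
    public DRIGenericNode Load(String Id) { return nodes.get(Id); }

    public void Remove(String Id) { nodes.remove(Id); }

    // Assumed semantics: creating a node whose id already exists is an error.
    public void CreateNew(DRIGenericNode Node) {
        if (nodes.containsKey(Node.id)) {
            throw new IllegalStateException("node already exists: " + Node.id);
        }
        nodes.put(Node.id, Node);
    }

    // Assumed semantics: only previously created nodes can be saved.
    public void Save(DRIGenericNode Node) {
        if (!nodes.containsKey(Node.id)) {
            throw new IllegalStateException("unknown node: " + Node.id);
        }
        nodes.put(Node.id, Node);
    }
}
```

Because the engine only sees `DRIStorageInterface`, a provider could swap this for an AMGA-backed or SE-backed module without touching the navigation code.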

15 gLibrary/DRI default implementation
We provide a default implementation for the UI and Storage APIs:
public class DRIUIModule implements DRIUIInterface
public class DRIStorageModule implements DRIStorageInterface
UI default implementation:
− Loads repository trees from AMGA
− Loads filter definitions from AMGA
− Loads field display definitions from AMGA
Storage default implementation:
− Reads the repository data model from an XML file
− Stores/loads the data model in AMGA and marked items in SEs

16 XML Data model def example
The DRI Storage module reads the data model from XML files:

    Field        Type    Column type / target entity
    PatientID    int
    PatientName  String  Varchar(80)
    PatientAge   Int
    studies      Entity  StorageStudy
    StorageID    int
    Diagnose     String  Varchar(255)
    Mammogram    LFN     Varchar(255)

DRIStorageModule stores specially marked fields in a GRID Storage Element and registers them in the File Catalog
DRIStorageModule stores regular fields in AMGA
public class MyRepStorageModule extends DRIStorageModule {}
public class MyRepNode extends DRIGenericNode
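The routing rule described on this slide (specially marked fields go to a Storage Element, regular fields go to AMGA) can be sketched as a small lookup. The sketch assumes that a declared type of `LFN` is what marks a field for the SE; the slide does not state the marker explicitly, and `FieldRouting` is a hypothetical helper rather than part of DRIStorageModule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the field routing rule.
class FieldRouting {
    // Field name -> declared type, copied from the slide's example fields.
    static Map<String, String> exampleModel() {
        Map<String, String> model = new LinkedHashMap<>();
        model.put("StorageID", "int");
        model.put("Diagnose", "String");
        model.put("Mammogram", "LFN");
        return model;
    }

    // Assumption: type LFN marks a field whose content lives in a Storage Element.
    static String targetFor(String declaredType) {
        return "LFN".equals(declaredType) ? "SE" : "AMGA";
    }

    // Maps every field of the model to the backend that would hold it.
    static Map<String, String> route(Map<String, String> model) {
        Map<String, String> targets = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : model.entrySet()) {
            targets.put(field.getKey(), targetFor(field.getValue()));
        }
        return targets;
    }
}
```

With the example fields, `FieldRouting.route(FieldRouting.exampleModel())` sends `Mammogram` to the SE and the two remaining fields to AMGA.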

17 Using UI default implementation
public class MyRepUIModule extends DRIUIModule {}   (extends the default module, does not implement DRIUIInterface directly)
Note the EMPTY implementation.

AMGA dump
Collection: /ceta/mgplus/config/trees
Content:
/ceta/mgplus/config/trees/alphabetical (Collection)
> ls
Query> getattr 0 tag parentid path filter fields
>> FromAtoD
>> FromEtoJ
>> FromKtoO
>> FromPtoU
>> FromVtoZ
/ceta/mgplus/config/trees/pathologies (Collection)
> ls
>> 0
Query> getattr 0 tag parentid path filter fields PathologyId
>> Benign
>> TumorMorphology
>> Spread
>> Microcalcifications
>> study
>> '/ceta/mgplus/data/patient/study:PathologyId=0 and /ceta/mgplus/data/patient:MGPlusPatientId=/ceta/mgplus/data/patient/study:MGPlusStudyId'
/ceta/mgplus/data/patient:MGPlusPatientId,PatientId,PatientName,Gender,AgeAtMenarche,AgeAtMenopause

Slide annotations:
− Where MGPLUS trees are stored
− Alphabetical patient tree definition
− Contents of the alphabetical tree
− Pathologies tree definition
− Contents of pathologies tree
− Filter definition for Microcalcification branch

18 Mammography repository example
Goals: a GRID based repository for mammograms, patient history and collaborative diagnoses
Uses the UI and Storage default implementations
Provides its own viewer, which accepts a MGPlusNode:
− Based on the Open Source TUDOR DICOM viewer
− Adapted to comply with the DRI API
− Converted into an applet
− Extended functionality (display of specific patient data, annotations directly on the mammograms, etc.)
− Save() method retrieves data files directly from SEs using direct GridFTP transfers

19 Repository specific viewer

20 gLibrary/DRI architecture

21 Technologies
Web 2.0 web interface (AJAX)
PHP 5 for the front-end engine
Java Servlets for the back-end DRI engine
Usage of a Java-PHP bridge
Applets
− For user authentication with their VO certificate
− For viewer implementations
Java Introspection
XML
gLite Java APIs: AMGA, LFC wrappers, JGlobus GridFTP client

22 Where we are
Engine deployed and working; API and default implementation working
MGPlus repository implemented on DRI
Current work:
− Interface to launch and manage jobs on the Grid WMS
− Generic uploader

23 Conclusions and future work
The APIs and the default implementation effectively reduce the cost to deploy.
New repository providers must:
− Provide empty implementations of UI and Storage (very easy)
− Describe their data model in XML (very easy)
− Adapt/make a viewer (difficult)
Provides:
− A generic multirepository platform, making GRID facilities easily accessible
− A way to attract new communities through ease of hosting
Future work:
− Making the platform SOA and JSR170 compliant
− Generic viewer and tree management interface (almost ZERO cost for repository providers)
− EELA-II Official Digital Library product

24 Contacts
Mailing list:
– glibrary@ct.infn.it
Authors:
– antonio.calanducci@ct.infn.it
– manuel.rubio@ciemat.es
– raul.ramos@ciemat.es
– dtcaci@maat-g.com
– jmgonzalez@maat-g.com
Prototypes:
– https://glibrary.ct.infn.it (INFN gLibrary platform)
– https://dri-dev.ceta-ciemat.es (gLibrary/DRI platform)

25 Questions?

