
1 Summary of the data access session
EGEE User Forum, March 3rd, 2006
Johan Montagnat, Birger Koblitz
Enabling Grids for E-sciencE (INFSO-RI-508833), www.eu-egee.org

2 Data access parallel session
About 60 persons attending (peak activity)
Talks were grouped in three different panels
– Metadata and databases access
– File access
– Applications
Three associated demonstrations

3 Agenda
Panel on metadata and databases access
GDSE: data source oriented computing element
– Dr. Giuliano Taffoni, INFN, CNAF
ATLAS metadata interface
– Thomas Doherty, University of Glasgow
The AMGA metadata service
– Dr. Birger Koblitz, CERN
Oracle on the grid
– Bjorn Engsing, Oracle

4 DSE: Data Source Engine
We define a new Grid component (G-DSE) that enables access to a Data Source Engine and Data Source, fully integrated with the Grid Monitoring and Discovery System and the Resource Broker.
A new Grid Element, the Query Element, can then be built on top of the G-DSE component.
It handles very long SQL queries just like a CE would handle jobs.

5 GDSE integration
[Architecture diagram. Labels on the GRAM side: gatekeeper; JobManager and QueryManager; JobProcess and QueryProcess; scheduler plug-in (PBS/LSF) and query plug-in; query; DB-specific driver; RDBMS. Labels on the GIS side: MDS; GRIS; LDAP/LDIF; RDBMS; Grid providers (SNMP).]

6 Features
Data source indexing, monitoring, management and recovery
GRAM or WS protocol
Transactions/queries specified through RSL/JDL
The grid WMS is used to support the execution
The grid IS is used to monitor the transactions
GSI and VOMS based access control
– Different roles (administrator, writer, selecter)
– Access control at table and row level
Connects to different RDBMS
Supports workflows of query jobs with inter-dependencies
Support for replication
Application to AstroDBs

7 ATLAS Metadata Interface
AMI is an application under development that stores and provides access to dataset metadata for the ATLAS experiment.
It meets the needs of many database-backed applications by offering a generic web service and servlet interface, through the use of self-describing databases.
It supports geographical distribution through web services, and secure access through grid certificates.

8 Adaptation of AMI Architecture for gLite Interfaces
[Diagram labels: Web Service client; gLite Interface method; gLite Interface Implementation; Controller class; result returned in XML format.]

9 Features
Supports Oracle, MySQL and SQLite DBs
gLite metadata interface
Web Service interface (AXIS container in Tomcat)
Authentication: based on certificate DN
Very fine-grained authorization
– Roles
– At project or record level
– Ad hoc control classes may be written
Secure and well-defined interface for providing access to metadata

10 AMGA: ARDA metadata interface
gLite 1.5 metadata catalog
Two modes
– With the LFC: bind metadata to files
– Standalone: general relational data
Front ends
– Web Service
– Proprietary TCP streaming protocol
Implements the gLite metadata interface
Versatile, provides both performance and security
Security components (optional)
– SSL connections
– Authentication based on passwords, X509 certificates or proxies
– POSIX ACLs and Unix permissions at table and row level
Applications: LHCb, Medical Data Management, gLibrary, UnoSat...

11 Performance
Comparison with LFC and FireMan catalogs

12 Replication & Federation modes

13 Oracle
Free Oracle software
– Express edition, limited to 1 CPU
Support for Linux on many distributions
Provides streams for replication

14 Discussion
Standards
– How does gLite commit to standards?
– A lot of GGF work invested in defining standards
– Difficult to endorse standards as they are evolving and the global picture is not so clear today
Security
– Common concern, different granularities
Replication
– Partially implemented in existing databases, different semantics
– Should this be implemented at a higher level?
Distribution
– Some work on information schemas for locating metadata
– What about queries on data whose location is not known a priori?
Grids of databases
– Let the grid pick the “best” database for you
There is room for more research activity!

15 Agenda
Panel on file access
gLite File Transfer Service
– Paolo Badino, CERN
Encrypted Data Storage in EGEE
– Akos Frohner, CERN
Storage Resource Manager Interface
– Maarten Litmaath, CERN

16 File Transfer Service channels
Logical unit of management
– Represents a directed network pipe between two sites
Mono-directional
Independently manageable
– State
– Number of streams
– Number of concurrent transfers
Inter-VO scheduling
– VO share
No routing
Between specific host pairs or groups of hosts
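The channel properties listed on this slide can be sketched as a small data structure. This is only an illustration, not the FTS implementation: the class `Channel`, the helper `find_channel`, the state name "Active", the site names and the proportional split of transfer slots between VOs are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical model of one directed (mono-directional) channel."""
    source: str
    destination: str
    state: str = "Active"            # assumed state name
    streams: int = 1                 # number of streams per transfer
    concurrent_transfers: int = 1    # number of concurrent transfers
    vo_shares: dict = field(default_factory=dict)  # VO name -> relative share

    def slots_for(self, vo: str) -> int:
        """Split the concurrent-transfer slots proportionally to the VO shares."""
        total = sum(self.vo_shares.values())
        if total == 0 or vo not in self.vo_shares:
            return 0
        return self.concurrent_transfers * self.vo_shares[vo] // total


def find_channel(channels, source, destination):
    """No routing: only an exact source -> destination channel matches."""
    for channel in channels:
        if (channel.source, channel.destination) == (source, destination):
            return channel
    return None


channels = [
    Channel("CERN", "RAL", streams=5, concurrent_transfers=10,
            vo_shares={"atlas": 60, "cms": 40}),
]
```

Because a channel is directed, looking up RAL to CERN in this list finds nothing: the reverse direction would need its own channel.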

17 Transfer Jobs and Files
Job
– Represents the transfer request
– Identified by a GUID
File
– A pair of source and destination file names
[State diagrams: Job States; File States]
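A minimal sketch of the job and file model described here, under stated assumptions: the class names, the state names ("Pending", "Active", "Done", "Failed") and the rule deriving a job state from its file states are hypothetical, since the state diagrams of the slide are not reproduced in this transcript.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class TransferFile:
    """One source/destination file name pair."""
    source: str
    destination: str
    state: str = "Pending"  # assumed state name


@dataclass
class TransferJob:
    """A transfer request, identified by a GUID."""
    files: list
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def state(self) -> str:
        """Derive a job state from the file states (assumed rule)."""
        if any(f.state in ("Pending", "Active") for f in self.files):
            return "Active"
        if all(f.state == "Done" for f in self.files):
            return "Done"
        return "Failed"


job = TransferJob([
    TransferFile("srm://site-a.example/data/f1", "srm://site-b.example/data/f1"),
    TransferFile("srm://site-a.example/data/f2", "srm://site-b.example/data/f2"),
])
```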

18 What SC achieved so far
SC3 Rerun (January 2006)
All sites achieved target rate
8/11 sites achieved nominal rate

19 Encryption/Decryption System
Designed to fulfill biomedical application needs
– Fine-grained access control
– Data encryption
– Anonymity
Based on gLiteIO, FiReMan and an SRM v1.1
Access control through gLiteIO

20 Encryption
Anonymity: patient data separated from files (stored in AMGA)
ACL access control on files (FiReMan)
File keys distributed among Hydra servers with ACL
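The slide does not say how a file key is distributed among the key servers. As one possible illustration, the sketch below splits a key into n shares with XOR so that all n shares are needed to rebuild it. This scheme and the function names `split_key` and `join_key` are assumptions for the example, not a description of the Hydra algorithm.

```python
import secrets


def split_key(key: bytes, n: int) -> list:
    """Split key into n shares; all n are required to rebuild it (XOR scheme)."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n - 1 shares are random; the last one is the key XORed with all of them.
    shares = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = key
    for share in shares:
        last = bytes(a ^ b for a, b in zip(last, share))
    return shares + [last]


def join_key(shares: list) -> bytes:
    """XOR all shares together to recover the original key."""
    key = bytes(len(shares[0]))  # all-zero start value
    for share in shares:
        key = bytes(a ^ b for a, b in zip(key, share))
    return key
```

Each share would be stored on a different server behind its own ACL, so a single compromised server reveals nothing about the key.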

21 And decryption
Key retrieved from the Hydra key server
Data decrypted block by block in memory (OpenSSL ciphers)
Encryption also works for output data

22 What is the SRM?
Client-server interface for Storage Resource Management
– De facto standard (see further on), GGF working group
  – http://sdm.lbl.gov/srm-wg/
– Secure web service
– Defines functions that allow storage resources to be managed from both client and server perspectives
  – Different requirements, optimizations, concerns
SRM collaboration institutes develop different implementations
– CERN + RAL + INFN (CASTOR-2)
– CERN/LCG (DPM)
– FNAL + DESY (dCache)
– JLAB (J-SRM)
– LBNL (DRM, HRM)
– EGRID/INFN/GridIt (StoRM)

23 Is the SRM a standard?
“The nice thing about standards is that there are so many to choose from.” - Andrew S. Tanenbaum
Version 1.1 in widespread use
– But implementations have subtle incompatibilities due to ambiguities in the “standard”
– Various basic functionalities not defined
Version 2.1 implemented to various extents by some projects
– Try to get a critical subset implemented on WLCG by autumn 2006
  – Use cases defined by LHC experiments, see next pages
– Still lacks some features
– Incompatible with version 1
  – Clients and servers need to support both versions during the transition period (may last a long time)
Version 3 definition many months away
– Again incompatible

24 What should the SRM do? (A. Shoshani, PPDG Review, 28 Apr 2003)
Manage space dynamically
– Any disk caches and Mass Storage Systems
– Space reservation and negotiation
– Manage “lifetime” of spaces
Manage files dynamically
– Pin files in storage till they are released
– Manage “lifetime” of files, and action when lifetime expires
Manage file sharing
– Policies on what to evict when space is needed
  – Currently always decided by back-end
Manage multi-file requests
– A brokering function: queue file requests, pre-stage files
– Invoke file transfer services
Permit site-SRM over multiple storage systems
Negotiate transfer protocols
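The pinning and lifetime items above can be illustrated with a few lines of bookkeeping. This is a sketch under assumptions, not the SRM interface: the class `PinTable` and its methods are hypothetical names, and a real SRM exposes these functions as web service calls with its own semantics.

```python
import time


class PinTable:
    """Hypothetical bookkeeping for pinned files with a lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable clock, which makes testing easy
        self._pins = {}       # file name -> expiry time

    def pin(self, name, lifetime):
        """Pin a file for `lifetime` seconds from now."""
        self._pins[name] = self._clock() + lifetime

    def release(self, name):
        """Release a pin explicitly, before its lifetime expires."""
        self._pins.pop(name, None)

    def is_pinned(self, name):
        expiry = self._pins.get(name)
        return expiry is not None and self._clock() < expiry

    def evictable(self, names):
        """Files that may be evicted when space is needed."""
        return [n for n in names if not self.is_pinned(n)]
```

A file becomes evictable either when it is released or when its pin lifetime expires, whichever comes first.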

25 Discussion
Connection between data management and job scheduling
– The file catalog holds the file location information used for scheduling
– Jobs are scheduled where data sits
– In some cases, data could move to where resources are available for computations
– Is this desirable?
Legacy code is common in scientific applications
– Transparent POSIX access
Data encryption
– Transparency
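The idea that jobs are scheduled where data sits can be sketched as follows. This is a simplified illustration, not the gLite workload management logic: the function `choose_site`, the catalog layout and the tie-breaking by site name are assumptions.

```python
def choose_site(input_files, replica_catalog, free_slots):
    """Pick the site with free slots that already holds the most input files.

    replica_catalog maps a logical file name to the set of sites holding a
    replica; free_slots maps a site name to its number of free job slots.
    """
    best_site, best_count = None, -1
    for site, slots in sorted(free_slots.items()):  # sorted: stable tie-break
        if slots <= 0:
            continue
        count = sum(1 for f in input_files
                    if site in replica_catalog.get(f, ()))
        if count > best_count:
            best_site, best_count = site, count
    return best_site


catalog = {"lfn:/a": {"CERN", "RAL"}, "lfn:/b": {"RAL"}}
site = choose_site(["lfn:/a", "lfn:/b"], catalog,
                   {"CERN": 5, "RAL": 2, "CNAF": 9})
```

When the best-placed site has no free slot, the sketch falls back to a site holding fewer replicas, which is where the question on the slide arises: the missing data would then have to move to the job.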

26 Agenda
Panel on applications
Space Physics Interactive Data Resource (SPIDR)
– Dr. Zhizhin, Russian Academy of Science
gLibrary: a multimedia contents manager system
– Dr. Tony Calanducci, INFN Catania

27 Space Physics Interactive Data Resource (SPIDR)
SPIDR is a de facto standard data source on solar-terrestrial physics, functioning within the framework of the ICSU World Data Centers.
It is a distributed database and application server network, built to select, visualize and model historical space weather data distributed across the Internet.
SPIDR can work as a fully functional web application (portal) or as a grid of web services, providing functions for other applications to access its data holdings.

28 SPIDR components
The SPIDR portal combines the central XML metadata repository with a set of distributed data web services and data file collections.
A user can search for data using the metadata inventory, use a persistent data basket to save the selection for the next session, and plot or download the selected data in parallel in different formats, including XML and NetCDF.

29 Real-time usage statistics for a given time interval
[Plots: user sessions per day; per-database requests for plot (red) and export (blue).]
Total: ~20 000 registered users

30 gLibrary usage scenarios
Example 1:
– Locate all theoretical (PPTType) PowerPoint (Type) presentations about FireMan (Keywords) given in 2005 (Date) by Uncle Sam (Speaker);
– Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast) produced in USA (Country) in 2004 (ReleaseDate); or all the acoustic (Genre) mp3 (Format) audio files (Type) of Alanis Morissette (Singer) that last more than 3 minutes (Runtime).
Example 2:
– A doctor is looking for brain (Keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age).
Example 3:
– A job can behave as a storage crawler: it scans pre-existing files in Storage Elements to extract relevant metadata that will be published on gLibrary for further data mining.
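Example 2 can be expressed as an attribute query over metadata entries. The sketch below uses plain Python dictionaries rather than the AMGA query language; the function `query`, the sample entries and their values are made up for illustration.

```python
def query(entries, **conditions):
    """Return the entries whose attributes satisfy all conditions.

    A condition is either a plain value (tested for equality) or a callable
    predicate applied to the attribute value.
    """
    def matches(entry):
        for attribute, condition in conditions.items():
            if attribute not in entry:
                return False
            value = entry[attribute]
            ok = condition(value) if callable(condition) else value == condition
            if not ok:
                return False
        return True

    return [entry for entry in entries if matches(entry)]


# Made-up sample entries, for illustration only.
entries = [
    {"Name": "img-001", "Type": "DICOM", "Keyword": "brain", "Gender": "male", "Age": 71},
    {"Name": "img-002", "Type": "DICOM", "Keyword": "brain", "Gender": "female", "Age": 80},
    {"Name": "img-003", "Type": "DICOM", "Keyword": "brain", "Gender": "male", "Age": 42},
    {"Name": "img-004", "Type": "DICOM", "Keyword": "knee", "Gender": "male", "Age": 69},
]

# Brain DICOM images of male patients older than 65.
result = query(entries, Type="DICOM", Keyword="brain", Gender="male",
               Age=lambda age: age > 65)
```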

31 Example of gLibrary collections
/gLTypesCollection (the Path attribute refers to a collection)
  Entry name: PowerPoint; Path: /gLPPT
  Entry name: EGEEDOC; Path: /EGEEPPT
  Entry name: Documents; Path: /gLDOC
  Entry name: Video; Path: /gLVideo
  Entry name: Image; Path: /gLImage
  Entry name: Audio; Path: /gLAudio
/EGEEPPTCollection
  Entry name: 00454dca-a269-4b93-8a45-c4012af05600
  Title: Information Systems
  Date: 2005-10-23
  Event: 4th EGEE Conference
  Speaker: Giuseppe La Rocca, Valeria Ardizzone
  Topic: R-GMA, BDII
  Author: Valeria Ardizzone, Giuseppe La Rocca
  Runtime: 00:30:00
  Type: Theoretical
/gLAudioCollection
  Entry name: 4ffaffc8-26e7-4826-b460-3d5bf08081a4
  SongTitle: Dedicato A Te
  Singer: Le Vibrazioni
  Format: MP3
  Album: Dedicato A Te
  Duration: 00:03:27
  Genre: Pop
/gLKeysCollection
  Entry name: 00454dca-a269-4b93-8a45-c4012af05600
  Passphrase: ardizzo
Caption: “additional features”

32 gLibrary Security
User requirements:
– A valid proxy with VOMS extensions
– VOMS Role and Group needed to be recognized by gLibrary as a contents manager
3 kinds of users:
– gLibraryManager: (s)he can create new content types and allow a generic VO user to become a gLibrarySubmitter
– gLibrarySubmitters: they can add new entries and define access rights on the entries they create
  – Fine-grained permission (reading, writing, listing, decrypting) settings on each entry: whole VO members, VO groups, list of DNs
– Generic VO users: browse and make queries (on entries they have access to)
Basic level of cryptography:
– New files saved on SEs can be encrypted beforehand with a symmetric passphrase that will be saved in /gLKeys. Only selected users (that have a specific DN in the subject of their VOMS proxy) can access the passphrase and decrypt the file.
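The per-entry permission settings described above (whole VO, VO groups, list of DNs) can be sketched as a small check. This is an illustrative model, not the gLibrary or AMGA implementation: the classes `User` and `EntryACL`, the principal strings, the VO name and the DNs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    dn: str                          # subject DN of the user's VOMS proxy
    vo: str
    groups: frozenset = frozenset()  # VO groups the user belongs to


@dataclass
class EntryACL:
    # permission -> set of principals: "vo:<name>", "group:<name>", "dn:<DN>"
    grants: dict = field(default_factory=dict)

    def allows(self, user: User, permission: str) -> bool:
        """True if any identity of the user is granted the permission."""
        principals = {f"vo:{user.vo}", f"dn:{user.dn}"}
        principals |= {f"group:{g}" for g in user.groups}
        return bool(principals & self.grants.get(permission, set()))


acl = EntryACL({
    "read": {"vo:myvo"},                         # every VO member may read
    "write": {"group:submitters"},               # only one VO group may write
    "decrypt": {"dn:/C=IT/O=Example/CN=Alice"},  # only listed DNs may decrypt
})
alice = User("/C=IT/O=Example/CN=Alice", "myvo", frozenset({"submitters"}))
bob = User("/C=IT/O=Example/CN=Bob", "myvo")
```

With these settings both users can read the entry, but only the first one can write it or retrieve the passphrase needed to decrypt the file.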

33 Features
Started as a use case to demonstrate AMGA features
Built on top of many gLite services
Considering collaboration and integration with the NA3 Document Digital Library System
Fast → thanks to AMGA
Secure → ACLs, encryption, and splitting
Easy to use → user-friendly Java GUI and portal soon available
Easily extensible to support any document type (medical images and files, invoices, proceedings, scientific publications, newspaper clips, …)

34 Discussion
SPIDR wants to use grids for
– Security and access control
– Asynchronous access to large amounts of data
gLibrary
– Flexibility of the schema to adapt to many document types
– Content analysis / indexing of documents
Very different needs for database access => room for many solutions:
– GDSE: time-consuming jobs on databases
– AMGA: fast access to small amounts of (returned) metadata
– SPIDR: asynchronous access to large amounts of metadata

