Security and Replication of Metadata with AMGA


1 Security and Replication of Metadata with AMGA
B. Koblitz, IT-PSS
with N. Santos, EPFL Lausanne
EGEE User Forum, Manchester, May 10th, 2007

2 Overview
What is AMGA? What is Metadata on the Grid?
Security in AMGA
EGEE applications using AMGA
Replication of Metadata with AMGA
Security infrastructure for replicated Metadata
Use Case: Health-e-Child
Conclusions

3 What is AMGA, what Metadata?
AMGA is the metadata catalogue of gLite 1.5, back in 3.1
Metadata is relationally structured data for grid jobs (it normally lives in databases)
AMGA works in 2 modes:
  Side-by-side with a file catalogue (LFC): file metadata
  Standalone: general relational data on the Grid
So why can I not access DBs directly on the Grid? You can, but what about:
  Authentication (Grid proxy certificates, VOMS)?
  Logging, tracing?
  Connection pooling?
AMGA brings the Grid idea to relational DBs
AMGA hides DB differences
AMGA allows replication and (some) federation of data
AMGA has fine-grained access control to entries based on ACLs

4 Basic Concepts
Schema (directory)
  Has a hierarchical name and a list of attributes, e.g. /prod/events
Attributes
  Have a name and a storage type
  The interface handles all types as strings
Entry
  Lives in a schema, assigns values to attributes
Query
  SELECT ... WHERE ... clause in an SQL-like query language
Examples
  createdir /jobs
  addattr /jobs jobStatus int
  addentry /jobs/job1 jobStatus 0
  updateattr /jobs jobStatus 1 jobID>100
  selectattr /DLibrary:FileName /DLAudio:Author /DLAudio:Album '/DLibrary:FILE=/DLAudio:FILE and like(/DLibrary:FileName, "%.mp3")'
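The concepts on this slide can be illustrated with a small in-memory model. This is only a sketch: the `Schema` class and its methods are hypothetical names chosen to mirror the commands above, not the AMGA client API, and the `jobID` attribute is added here so that the update condition has something to test.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Toy stand-in for an AMGA schema (directory); not the real client API."""
    name: str
    attributes: dict = field(default_factory=dict)  # attribute name -> storage type
    entries: dict = field(default_factory=dict)     # entry name -> {attribute: value}

    def addattr(self, attr, storage_type):
        self.attributes[attr] = storage_type

    def addentry(self, entry, **values):
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise KeyError(f"unknown attributes: {sorted(unknown)}")
        # The interface handles all values as strings, whatever the storage type.
        self.entries[entry] = {k: str(v) for k, v in values.items()}

    def updateattr(self, attr, value, condition):
        """Set attr to value on every entry whose values satisfy condition."""
        if attr not in self.attributes:
            raise KeyError(attr)
        for values in self.entries.values():
            if condition(values):
                values[attr] = str(value)

    def selectattr(self, attr, condition=lambda values: True):
        """Return attr for every entry that has it and satisfies condition."""
        return [v[attr] for v in self.entries.values()
                if attr in v and condition(v)]


jobs = Schema("/jobs")                      # createdir /jobs
jobs.addattr("jobStatus", "int")            # addattr /jobs jobStatus int
jobs.addattr("jobID", "int")                # assumed attribute, not on the slide
jobs.addentry("job1", jobStatus=0, jobID=42)
jobs.addentry("job2", jobStatus=0, jobID=101)
# updateattr /jobs jobStatus 1 jobID>100
jobs.updateattr("jobStatus", 1, lambda v: int(v["jobID"]) > 100)
```

Only `job2` matches the condition, so only its `jobStatus` changes; the values come back as strings because the model, like the interface described above, treats every type as a string.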

5 What can AMGA do?
AMGA features:
  SOAP and Text frontends
  Streamed bulk operations
  Supports single calls, sessions & connections
  SSL security with grid certs
  Own user & group management + VOMS
  PostgreSQL, Oracle, MySQL, SQLite backends
  Can access existing DBs
Query parser supports a good fraction of SQL:
  Supports complex queries: joins, math. functions (extensible)
  Abstracts DB data types
  Checks access permissions per directory/entry via ACLs

6 Security in AMGA
AMGA has built-in user and group management
SSL for secure connections (optional), with session management for performance
Grid proxy and VOMS supported (optional)
Credentials are mapped to an internal user
Allows per-entry or per-table (faster) ACLs
Views allow per-attribute restrictions
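The flow described on this slide (credential mapped to an internal user, then an ACL check per entry or per table) might be sketched as follows. All names, the example certificate subject and the rule that a per-entry ACL overrides the per-table ACL are assumptions made for illustration; the slides do not specify how AMGA combines the two.

```python
# Hypothetical mapping from grid credentials (certificate subjects) to internal users.
CREDENTIAL_MAP = {"/C=CH/O=Example/CN=Alice": "alice"}
GROUPS = {"alice": {"biomed"}}                     # internal user -> groups

# Per-table ACLs: one lookup covers every entry in the table.
TABLE_ACL = {"/prod/events": {"biomed": "r"}}
# Per-entry ACLs: finer grained, but one lookup per entry.
ENTRY_ACL = {("/prod/events", "evt1"): {"alice": "rw"}}


def internal_user(subject):
    """Map an external credential to an internal user, or None if unknown."""
    return CREDENTIAL_MAP.get(subject)


def allowed(subject, table, entry, right):
    """Return True if the credential grants `right` ('r' or 'w') on the entry."""
    user = internal_user(subject)
    if user is None:
        return False
    principals = {user} | GROUPS.get(user, set())
    # Assumption: an entry ACL, when present, takes precedence over the table ACL.
    acl = ENTRY_ACL.get((table, entry))
    if acl is None:
        acl = TABLE_ACL.get(table, {})
    return any(right in acl.get(p, "") for p in principals)
```

With this data, the example user may write `evt1` through its entry ACL but can only read other entries of `/prod/events` through the group's table ACL.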

7 EGEE experience with Metadata
[Diagram: the AMGA Metadata Catalogue surrounded by the application areas using it: Medical Data Management, Climate Research, High Energy Physics, Digital Library]

8 Geographic Metadata
UnoSat prototype uses AMGA to store GIS (Geographic Information System) metadata for images

9 HEP: LHCb L & B
LHCb uses AMGA to centrally store the entire file provenance information from jobs processing the data
100 million entries required (successfully tested!)
150GB data entries/day insert rate expected
10 entries/second read rate
Main challenges are reliability, performance and size
Uses an Oracle RAC server as backend
Production software access via Java (JSP)
User (read) access: Python (inc. browser)

10 Replication in AMGA
AMGA integrates replication of metadata
Asynchronous replication:
  Ideal for WAN
  DBs are consistent (transactions supported)
  However: not all DBs are necessarily in the same state
Replication makes use of the hierarchical table structure
  Global table tree
  Different masters for sub-trees
  Only one master per table! Writes are only allowed on the master.
  Top-level master controls users/groups and holds information about participating DBs
[Diagram: Top-Level Master, Master A, Master B]

11 Replication and Security
AMGA integrates replication of metadata with security:
Metadata and user/group configuration are replicated separately:
  Allows local users and different credentials
Metadata and ACLs are replicated separately:
  Allows replicated metadata to be owned by a single user

12 Replication Benchmarks
AMGA keeps logs for disconnected slaves
  Reconnected slaves are brought up to date automatically
  Fast recovery
Scalability test
  Setup: 1 master, 10 slaves
  Inserts at 90/s
  10% CPU overhead for 10 slaves
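Keeping a log for disconnected slaves and replaying it on reconnection can be sketched as below. This is a toy model under stated assumptions: the `Master` and `Slave` classes are hypothetical, the log is an in-memory list of key/value updates, and nothing here reflects AMGA's actual log format or transport.

```python
class Master:
    """Toy master: applies writes locally and appends them to an ordered log."""

    def __init__(self):
        self.data = {}
        self.log = []            # updates kept so that slaves can catch up later

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Toy slave: remembers how much of the master's log it has applied."""

    def __init__(self):
        self.data = {}
        self.applied = 0         # number of log records already applied

    def catch_up(self, master):
        """Replay every log record not yet seen; return how many were applied."""
        pending = master.log[self.applied:]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)
        return len(pending)


master, slave = Master(), Slave()
master.write("/jobs/job1", "0")
slave.catch_up(master)           # slave is connected and up to date
master.write("/jobs/job1", "1")  # slave is disconnected during these writes
master.write("/jobs/job2", "0")
replayed = slave.catch_up(master)  # on reconnection, only the missed records are replayed
```

Between the second write and the final `catch_up`, master and slave are each internally consistent but not in the same state, which is the behaviour asynchronous replication allows.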

13 Replication in Health-e-Child
Several dozens of hospitals providing case data
Central server with credentials for participating sites and users (replication mandatory)
Data replicated from site to site on demand
New sites need to be registered in the base AMGA server
'Automount' mechanism for joining sites
[Diagram: AMGA Base Config holds /configuration; a Site AMGA holds /siteA/patients, /siteB/patients, ..., combined into /common/patients as a Union View]
First prototype works well!
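The union view over the per-site patient tables can be pictured as a concatenation of the site tables with each row tagged by its origin. The sketch below is an assumption about what such a view returns; the table contents are made-up placeholder rows and `union_view` is a hypothetical helper, not an AMGA command.

```python
from itertools import chain

# Made-up placeholder rows standing in for the replicated per-site tables.
SITE_TABLES = {
    "/siteA/patients": [{"id": "a1", "age": "7"}],
    "/siteB/patients": [{"id": "b1", "age": "9"}, {"id": "b2", "age": "4"}],
}


def union_view(tables, site_tables=SITE_TABLES):
    """Concatenate the rows of several tables, recording where each row came from."""
    def rows(table):
        for row in site_tables.get(table, []):
            yield {**row, "source": table}
    return list(chain.from_iterable(rows(t) for t in tables))


# Plays the role of /common/patients in the diagram.
common_patients = union_view(["/siteA/patients", "/siteB/patients"])
```

A newly joined site would then only need its table added to the list passed to `union_view` for its rows to appear in the common view.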

14 Summary
AMGA metadata service in gLite 1.5, back in 3.1
AMGA provides a Grid layer to relational databases:
  Abstraction of different DB vendors
  Efficient LAN/WAN access
  Fast X509 Grid security, VOMS integration
Rich set of features:
  Transactions, Views, Sequences, Complex Joins....
Building block for distributed DBs on the Grid:
  Asynchronous replication
  Tight integration of Grid security and replication
Used by several EGEE applications
Health-e-Child prototype makes extensive use of replication
AMGA Web Site:

15 Replication & Federation Modes
AMGA replication makes use of the hierarchical concept:
  Partial replication
  Full replication
  Federation
  Proxy

16 Performance
Performance required to be comparable to direct DB access by HEP applications
Lean C++ implementation
Fast TCP text streaming protocol, very fast SSL sessions
[Figure: Throughput comparison between AMGA and direct access via JDBC reading the same table on a LAN. Axes: Throughput [entries/s] vs. # clients. Series: AMGA 1000 rows, JDBC 1000 rows, AMGA 1 row, JDBC 1 row]

17 WAN Performance
Comparison with FC protocols, connection from Taiwan:
  300ms latency dominates performance
  Reduce round trips with sessions or holding connections
  (Streamed) bulk operations vital for WAN performance
[Figure: two panels, Inserts and Queries. Axes: Throughput [entries/sec] vs. # clients. Series: AMGA Single, AMGA Single Entry, AMGA Bulk 100, LFC, FM Bulk 100, FM Bulk 1000]
LFC & FM measurements: C. Munro
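Why bulk operations matter at 300 ms latency follows from simple arithmetic: if every request costs one round trip, a single client cannot exceed (entries per request) / (round-trip time). The sketch below computes that bound. It is an idealised model, assuming one round trip per request and ignoring server processing, bandwidth and connection setup; it is not a reproduction of the measured curves.

```python
def latency_bound(entries_per_request, rtt_seconds=0.3):
    """Upper bound on entries/s for one client when each request costs one round trip."""
    return entries_per_request / rtt_seconds


# Per-client bounds at the 300 ms latency quoted above.
single_entry = latency_bound(1)      # one entry per round trip
bulk_100 = latency_bound(100)        # 100 entries per round trip
bulk_1000 = latency_bound(1000)      # 1000 entries per round trip
```

Under these assumptions a single-entry client is limited to roughly 3.3 entries/s, while batching 100 or 1000 entries per round trip raises the bound by the same factor, which is why reducing round trips is the main lever on a WAN.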

18 LAN Performance
Protocol comparison with LFC and FiReMan catalogues:
  Authentication with X509 certs, SSL connections
  LFN/GUID pairs inserted, query for GUID of LFN, Oracle DB
  AMGA scales very well up to 100 concurrent clients
  Streamed bulk inserts/queries are very fast!
[Figure: two panels, Inserts and Queries. Axes: Throughput [entries/sec] vs. # clients. Series: AMGA Single, AMGA Single Entry, AMGA Bulk 100, LFC, FM Bulk 100, FM Bulk 1000]
LFC & FM measurements: C. Munro
Measurements 2005

