The AMGA Metadata Catalogue


1 The AMGA Metadata Catalogue
E-science grid facility for Europe and Latin America
Riccardo Bruno, INFN Catania, EELA-2 NA2 Training

2 Metadata
Metadata is data about other data.
AMGA (ARDA Metadata Grid Application) is the ‘official’ Grid metadata service in gLite (gLite v3.1).
Since ‘data’ in gLite means files, AMGA was originally designed to manage metadata on Grid files, but it is not limited to that.
Example: movie trailer files stored on the Grid
Each movie file has its own associated metadata:
  Title
  Duration
  Genre (Action, Animation, Comic, Drama, etc.)
  Cast (list of actors)
Users can query the metadata to get back the matching movie files:
  Get trailer movie files with a Duration greater than 10 minutes
  Get trailer movie files with ‘Nicole Kidman’ in the Cast

3 Simplest metadata scenario
[Diagram: some SEs and an LFC on the Grid, plus an AMGA server. Labels: QUERY (all trailers having ‘Animation’ as Genre), List of LFNs, Selected Movie Files.]

4 LFC and AMGA
By design there is a close relationship between LFC and AMGA servers, so that metadata can be associated with files.
Metadata can then be organized hierarchically, like a file system.
LFC and AMGA show the same hierarchy:
  …/trailers/
    moviefile_1.avi
    moviefile_m.avi
    italian/
      ita_movie_1.avi
    spanish/
      es_movie_1.avi
In AMGA, an entry carries a list of attributes:
  Name: Madagascar
  Genre: Animation
  Duration:
  Cast: ….

5 AMGA – Metadata Terminology
Entries: list of entities having metadata associated
Attribute: a (key name, key type) pair
Schema: a set of attributes
Collection: a set of entries associated with a schema
Metadata: list of attributes (including their values) associated with entries
FS analogy: collections (collection_1/, collection_2/) behave like directories, entries (entry_1, entry_2) like files.
Attribute types shown in the example: Integer, Char, Date.

  Entries    Attribute 1           Attribute 2           Attribute n
  Entry 01   E01 Attrib. 1 value   E01 Attrib. 2 value   E01 Attrib. n value
  Entry 02   E02 Attrib. 1 value   E02 Attrib. 2 value   E02 Attrib. n value
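
The terminology can be pictured with a small in-memory model. This is only an illustrative Python sketch, not the AMGA API: the class and method names (Collection, add_attr, add_entry) are hypothetical, and mapping the type names to Python types is an assumption made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from attribute type names to Python types (illustration only).
TYPES = {"int": int, "varchar": str}


@dataclass
class Collection:
    """A set of entries that share one schema (a set of attributes)."""
    name: str
    schema: dict = field(default_factory=dict)   # attribute name -> type name
    entries: dict = field(default_factory=dict)  # entry name -> {attribute: value}

    def add_attr(self, attr_name, attr_type):
        # An attribute is a (key name, key type) pair.
        if attr_type not in TYPES:
            raise ValueError(f"unknown type: {attr_type}")
        self.schema[attr_name] = attr_type

    def add_entry(self, entry_name, **values):
        # Metadata = attribute values attached to an entry; check them against the schema.
        for attr, value in values.items():
            if attr not in self.schema:
                raise KeyError(f"{attr} is not in the schema of {self.name}")
            if not isinstance(value, TYPES[self.schema[attr]]):
                raise TypeError(f"{attr} expects {self.schema[attr]}")
        self.entries[entry_name] = dict(values)


trailers = Collection("/gilda/demo/trailers")
trailers.add_attr("Title", "varchar")
trailers.add_attr("Duration", "int")
trailers.add_entry("madagascar.avi", Title="Madagascar", Duration=12)
print(trailers.entries["madagascar.avi"])
```

Keeping the schema on the collection rather than on each entry mirrors the definition of a collection as a set of entries associated with one schema.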

6 Metadata Example
AMGA collection: /gilda/demo/trailers/
Collection attributes:
  >> Title >> varchar
  >> Duration >> int
  >> Genre
  >> Cast
Collection entries:
  /gilda/demo/trailers/
    madagascar.avi
    moulinrouge.avi
RDBMS view:
  Entry Name/RowId   Title          Duration   Genre       Cast
  madagascar.avi     Madagascar     12         Animation   Ben Stiller, …
  moulinrouge.avi    Moulin Rouge   14         Musical     Nicole Kidman, …
Attribute values:
  >> madagascar.avi
  >> madagascar
  >> 15
  >> animation
  >> Ben Stiller;Chris Rock;David Schwimmer;Jada Pinkett …
Note: schemas/attributes may be changed at any time.
Note: it is possible to create sequences, indexes and constraints.

7 Sub-Collections
AMGA collections may contain sub-collections (directory analogy from file systems).
AMGA sub-collections may or may not inherit the parent's attributes.
AMGA trailers’ sub-collections:
  /gilda/demo/trailers/
    madagascar.avi
    moulinrouge.avi
  /gilda/demo/trailers/italian
    madagascar_ita.avi
    moulinrouge_ita.avi
  /gilda/demo/trailers/user_remarks
    remark_0001
    remark_0002
A sub-collection created with inheritance (createdir italian inherits) lists the parent attributes plus its own:
  >> Title >> varchar
  >> Duration >> int
  >> Genre
  >> Cast
  >> DubbedCast
A sub-collection created without inheritance lists only its own attributes:
  >> Title >> varchar
  >> User
  >> Remark

8 AMGA as DB solution
Although AMGA was designed to serve as a Grid file metadata service, it can also be used as a database:
  Collection -> DB table
  Schema -> Table schema
  Attribute -> Schema column
  Entry -> Table row/record
Tables may be organized in a single directory (RDBMS style) or hierarchically (OODBMS style).

Collection/Table:
  Entry Name/RowId   Attr_1/Col_1   Attr_2/Col_2   Attr_n/Col_n
  GUID_1             RecVal(1,1)    RecVal(1,2)    RecVal(1,n)
  GUID_2             RecVal(2,1)    RecVal(2,2)    RecVal(2,n)
  GUID_m             RecVal(m,1)    RecVal(m,2)    RecVal(m,n)
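
The collection-to-table mapping can be tried out with any relational engine. The sketch below uses Python's built-in sqlite3 module as a stand-in (it does not connect to an AMGA server or back end); the column name entry_name is an assumption, and the two rows reuse values from the trailers example.

```python
import sqlite3

# In-memory database standing in for a back end; this does not talk to AMGA.
conn = sqlite3.connect(":memory:")

# Collection -> table, schema attributes -> columns, entry name -> row id.
conn.execute(
    "CREATE TABLE trailers ("
    " entry_name TEXT PRIMARY KEY,"
    " Title TEXT,"
    " Duration INTEGER,"
    " Genre TEXT)"
)

# Entries -> rows.
conn.executemany(
    "INSERT INTO trailers VALUES (?, ?, ?, ?)",
    [
        ("madagascar.avi", "madagascar", 15, "animation"),
        ("moulinrouge.avi", "moulin rouge!", 12, "Drama;Musical;Romance"),
    ],
)

# Querying metadata is then an ordinary SELECT on the table.
rows = conn.execute(
    "SELECT entry_name FROM trailers WHERE Duration > ? ORDER BY entry_name", (12,)
).fetchall()
print(rows)
```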

9 Interacting with AMGA
Users may interact with AMGA through two different front ends:
  Streaming front end (TCP) / amgad
    CLI interactive session: mdclient, mdjavaclient
    CLI single command: mdcli
    APIs (C++, Java, Python, Perl, PHP)
  SOAP front end (WSDL) / mdsoapserver

10 Attribute Data Types
Portable AMGA data types and the back-end equivalents listed for each (table columns: AMGA, PostgreSQL, MySQL, Oracle, SQLite, Python):
  int: integer, number(38)
  float: double precision
  varchar(n): character varying(n), varchar2(n), string
  timestamp: timestamp w/o TZ, datetime, timestamp(6), unsupported, time (unsupported)
  text: long
  numeric(p,s)
Using the above data types, you can be sure that your metadata can easily be moved between all supported AMGA back ends (DB migration).
If you do not care about DB portability you can use, in principle, any data type supported by the back end, even the more specific ones (such as the PostgreSQL network address or geometric types).
Binary types (BLOBs) of Oracle, MySQL and PostgreSQL are excluded.
A tested workaround uses uuencode/uudecode (sharutils) to convert binaries into Base64 text format.
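
Storing binaries as text amounts to an encode-on-write, decode-on-read round trip. The slide's tested solution uses the uuencode/uudecode command-line tools; the Python sketch below shows the same idea with the standard base64 module instead, and the helper names to_text and from_text are hypothetical.

```python
import base64


def to_text(blob: bytes) -> str:
    """Encode binary data as Base64 text, suitable for a text attribute."""
    return base64.b64encode(blob).decode("ascii")


def from_text(text: str) -> bytes:
    """Decode Base64 text back into the original binary data."""
    return base64.b64decode(text.encode("ascii"))


sample = bytes([0x89, 0x50, 0x4E, 0x47, 0x00, 0xFF])  # arbitrary sample bytes
encoded = to_text(sample)
assert from_text(encoded) == sample  # the round trip is lossless
print(encoded)
```

Base64 grows the data by about one third, which is worth keeping in mind when sizing a varchar(n) attribute.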

11 Security in AMGA
Client authentication based on:
  Username/password
  General X509 certificates (DN based)
  Grid-proxy certificates (DN based)
VOMS support:
  VO mapping to a defined AMGA user
  VOMS Role mapping to a defined AMGA user
  VOMS Group mapping to a defined AMGA group
Access rights:
  Entries and collections may have Unix-like permissions for mapped users and groups
  Definition of ACLs (per collection and entry)
Connection:
  Through secure client/server connections (SSL)

12 mdcli/mdclient
A configuration template file is available at /opt/glite/etc/mdclient.config
The template can be copied to:
  $PWD/mdclient.config
  $HOME/.mdclient.config
mdclient starts an interactive session (Query> prompt).
mdcli executes a single AMGA command.
  It saves a session file storing the current session status in /tmp (e.g. md_18968_amga.eela.ufrj.br_8822_0)

  ~]$ mdcli 'whoami'
  prod.vo.eu-eela.eu
  ~]$ mdclient
  Connecting to amga.eela.ufrj.br: ARDA Metadata Server 1.9.0
  Query>

13 mdcli/mdclient help
Help on mdcli/mdclient commands is available by typing help <command> or help <topic>.
Possible topics: help metadata metadata-optional directory replication constraints entry group acl index schema sequence user view site replicas ticket capabilities admin commands

  ~]$ mdclient
  Connecting to amga.ct.infn.it: ARDA Metadata Server
  Query> help
  >> help [topic]
  >> Displays help on a command or a topic.
  >> Valid topics are: help metadata metadata-optional directory replication constraints entry group acl index schema sequence user view site replicas ticket capabilities admin commands
  Query> help metadata
  >> setattr entry attribute value [attribute value]...
  >> Sets given attributes to specified values for all entries matching entry.
  >> addattr dir attribute type
  >> Adds a new attribute to a directory
  Query>

14 Simple metadata commands
Create a collection
  createdir <path>/<collection_name> [inherits]
Associate a schema with the collection
  addattr <path>/<collection_name> <attr_name> <attr_type> [<attr_name> <attr_type>] …
List attributes
  listattr <path>/<collection_name>
Remove attributes
  removeattr <path>/<collection_name> <attr_name>
Rename attributes
  renameattr <path>/<collection_name> <attr_name>
Add entries and attribute values
  addentry <path>/<entry_name> <attr_name> <attr_value> [<attr_name> <attr_value>] …
Set an attribute value
  setattr <path>/<entry_name> <attr_name> <attr_value> [<attr_name> <attr_value>] …
List entries
  listentries <path>/<collection_name>

15 Getting metadata
Three commands: getattr, find and selectattr
  getattr pattern attribute1 attribute2 …
  find pattern 'query'
Complex queries are possible through boolean operators, or through join queries among different collections.

  Query> getattr *.avi Title Duration Genre
  >> madagascar.avi
  >> madagascar
  >> 15
  >> animation
  >> moulinrouge.avi
  >> moulin rouge!
  >> 12
  >> Drama;Musical;Romance
  Query> find *.avi 'Duration > 10'
  >> madagascar.avi
  >> moulinrouge.avi
  Query> find *.avi 'Title=italian:Title'
  >> madagascar.avi
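
To make the difference between the first two commands concrete, here is a toy Python emulation: get_attr projects attributes of the entries whose names match a pattern, while find returns the names of matching entries that satisfy a condition. Both functions are hypothetical stand-ins that work on a plain dictionary, with a Python callable in place of AMGA's query string; the data are copied from the sample output.

```python
from fnmatch import fnmatch

# Entry name -> attribute values, copied from the slide's sample output.
TRAILERS = {
    "madagascar.avi": {"Title": "madagascar", "Duration": 15, "Genre": "animation"},
    "moulinrouge.avi": {"Title": "moulin rouge!", "Duration": 12,
                        "Genre": "Drama;Musical;Romance"},
}


def get_attr(entries, pattern, *attributes):
    """Return (entry name, [values]) for every entry name matching the pattern."""
    return [
        (name, [attrs[a] for a in attributes])
        for name, attrs in sorted(entries.items())
        if fnmatch(name, pattern)
    ]


def find(entries, pattern, condition):
    """Return names of entries matching the pattern whose attributes satisfy condition."""
    return [
        name
        for name, attrs in sorted(entries.items())
        if fnmatch(name, pattern) and condition(attrs)
    ]


print(get_attr(TRAILERS, "*.avi", "Title", "Duration"))
print(find(TRAILERS, "*.avi", lambda a: a["Duration"] > 10))
print(find(TRAILERS, "*.avi", lambda a: "Musical" in a["Genre"]))
```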

16 Getting metadata
selectattr returns attribute values for the given query:
  selectattr <attrib> … 'query'

  Query> selectattr trailers:Title trailers/italian:DubbedCast 'trailers:Title=trailers/italian:Title'
  >> madagascar
  >> Alessandro Besentini;Francesco Villa;Fabio De Luigi: Melman la giraffa;Michelle Hunziker;Chiara Colizzi;Oreste Baldini;Roberto Draghetti;Massimiliano Alto;Luigi Ferraro;Massimo Bitossi;Elena Magoia;Franco Mannella;Gerolamo Alchieri;Pasquale Anselmo;Roberto Pedicini;Marco Mete;Stefano De Sando;Emanuela Rossi
  Query> selectattr trailers:Title trailers:Duration 'like(trailers:Cast,"%Kidman%")'
  >> moulin rouge!
  >> 12

17 SQL Support
It is possible to issue SQL queries in AMGA.
Recognized SQL statements: SELECT, INSERT, UPDATE, DELETE (uppercase)
The INSERT statement automatically generates a unique ID as the entry name.

  Query> SELECT Title FROM trailers WHERE trailers.Duration > 10
  >> trailers.Title
  >> madagascar
  >> moulin rouge!
  Query> SELECT trailers:Title FROM trailers, trailers/italian WHERE trailers:Title=trailers/italian.Title;
  Query>

18 Users and Groups
AMGA maps users to configured AMGA users and groups according to:
  LOGIN name
  X509/GridProxy DN
  VOMS Groups and Roles
The main user is root.
Users and groups are shown and managed POSIX-like: d rwx rwx (user, group), plus user ownership.

  Query> ls -l
  >> drwxr-x gilda /gilda/demo/trailers
  Query> ls -l trailers
  >> drwxr-x gilda /gilda/trailers/italian
  >> drwxr-x gilda /gilda/demo/trailers/remark
  >> -rwxr-x gilda madagascar.avi
  >> -rwxr-x gilda moulinrouge.avi
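
The listing format can be read mechanically: one type character followed by an owner triplet and a group triplet. The Python sketch below parses such a string; it assumes exactly the 7-character layout that appears in the listing (for example drwxr-x), and the function names are hypothetical.

```python
def parse_mode(mode: str) -> dict:
    """Split a string such as 'drwxr-x' into type, owner and group permissions."""
    if len(mode) != 7:
        raise ValueError(f"expected 7 characters, got {mode!r}")
    return {
        "is_collection": mode[0] == "d",   # 'd' marks a collection, '-' an entry
        "owner": set(mode[1:4]) - {"-"},
        "group": set(mode[4:7]) - {"-"},
    }


def allowed(mode: str, who: str, right: str) -> bool:
    """Check whether 'owner' or 'group' holds the right 'r', 'w' or 'x'."""
    return right in parse_mode(mode)[who]


print(parse_mode("drwxr-x"))
print(allowed("-rwxr-x", "group", "w"))
```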

19 ACLs
AMGA allows users to define ACLs for:
  Collections
  Entries (MySQL5 and PostgreSQL; collection created with -acl)
Use acl_show or stat <collection|entry> to inspect them.
Since AMGA v2.0 the sudo command allows the root user to become any user.

  Query> acl_show trailers
  >> gilda rwx
  >> gilda:users rwx
  >> system:anyuser rx
  Query> stat madagascar.avi
  >> /gilda/demo/trailers/madagascar.avi
  >> entry
  >> rwx
  >> r-x
  >> gilda

20 AMGA Replication
AMGA provides replication/federation mechanisms.
Motivation
  Scalability: support hundreds/thousands of concurrent users
  Geographical distribution: hide network latency
  Reliability: no single point of failure
  DB-independent replication: heterogeneous DB systems
  Disconnected computing: off-line access (laptops)
Architecture
  Asynchronous replication
  Master-slave: writes are only allowed on the master
  Application-level replication: metadata is replicated with AMGA commands (dump)
  Partial replication: supports replication of only sub-trees of the metadata hierarchy
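
The architecture points can be illustrated with a toy model: the master keeps an ordered log of writes, and a slave pulls the log records it has not yet seen, optionally restricted to a sub-tree. This is a conceptual Python sketch with hypothetical class names, not AMGA's actual replication protocol.

```python
class Master:
    """Accepts writes and records each one in an ordered log."""

    def __init__(self):
        self.data = {}   # path -> metadata dict
        self.log = []    # ordered list of (path, metadata) writes

    def write(self, path, metadata):
        self.data[path] = dict(metadata)
        self.log.append((path, dict(metadata)))


class Slave:
    """Read-only copy that catches up with the master's log on demand."""

    def __init__(self, master, subtree="/"):
        self.master = master
        self.subtree = subtree   # partial replication: only paths under this prefix
        self.data = {}
        self.position = 0        # index of the next log record to apply

    def write(self, path, metadata):
        raise PermissionError("writes are only allowed on the master")

    def sync(self):
        # Asynchronous replication: the slave lags until it pulls new records.
        for path, metadata in self.master.log[self.position:]:
            if path.startswith(self.subtree):
                self.data[path] = dict(metadata)
        self.position = len(self.master.log)


master = Master()
slave = Slave(master, subtree="/gilda/demo/trailers/italian/")
master.write("/gilda/demo/trailers/madagascar.avi", {"Title": "madagascar"})
master.write("/gilda/demo/trailers/italian/madagascar_ita.avi", {"Title": "madagascar"})
print(slave.data)   # still empty: nothing has been pulled yet
slave.sync()
print(sorted(slave.data))
```

Replaying a log of commands, rather than copying database files, is what lets the master and the slave sit on different database systems.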

21 AMGA Replication types
[Diagram panels: Full Replication, Federation, Partial Replication, Proxy. Diagram label: commands are redirected.]

22 AMGA DB Import
Each AMGA server relies on a dedicated DB back end:
  Oracle, MySQL, PostgreSQL, mSQL, other (UnixODBC)
Database import: two possibilities
  Import tables from the DB into an AMGA DB back end
  Import the AMGA DB back end into the DB hosting the tables
Use the import command as root to “mount” your table into the AMGA collection hierarchy.

  Query> whoami
  >> root
  Query> createdir world
  Query> cd world
  Query> import world.City world/City
  Query> import world.Country world/Country
  Query> import world.CountryLanguage world/CountryLanguage
  Query> acl_add /world/ gilda:users rx
  Query> acl_show /world
  >> root rwx
  >> gilda:users rx
  >> system:anyuser rx

23 DB Access and Replication
Federation and DB Import
With the Federation and DB Import features it is possible to create huge federated metadata structures.
AMGA slave hierarchy:
  /
  /movie: /movie/info, /movie/title, /movie/aka_title
  /storage: /storage/LFN, /storage/SEs
  /actors: /actors/name, /actors/info
  /comments: /comments/info, /comments/users
AMGA masters and their back ends:
  AMGA master, MySQL DB: Movie Metadata
  AMGA master, PostgreSQL DB: Storage
  AMGA master, Oracle DB: Actors
  AMGA master, PostgreSQL DB: User Comments

25 Jobs with AMGA
Since AMGA supports Grid proxies, jobs may access any AMGA server (mdclient.config).
Normally the job pilot script uses the mdcli client application to get/set metadata.
EXAMPLE: a grid job that selects movies according to a given actor
  A pilot script queries the AMGA server, taking the actor name as a parameter, and identifies the LFN
  The file pointed to by the LFN is copied to the WN
  In the JDL, an mdclient.config file has to be specified in the InputSandbox

mdclient.config:
  Host = amga.ct.infn.it
  Port = 8822
  Login = NULL
  PermissionMask = rwx
  GroupMask = r-x
  Home = /home/gilda
  UseSSL = require
  AuthenticateWithCertificate = 1
  UseGridProxy = 1
  VerifyServerCert = 0
  TrustedCertDir = /etc/grid-security/certificates
  RequireDataEncryption = 1

Pilot script (amgajobdemo.sh):
  #!/bin/bash
  # $1 is the actor name passed through the JDL Arguments attribute
  echo "Looking for Actor: '$1'"
  MOVIE=$(mdcli "selectattr /gilda/demo/trailers:Title 'like(/gilda/demo/trailers:Cast,\"%${1}%\")'")
  echo "Selected Movie Title: '$MOVIE'"
  MOVIEFILE=$(mdcli "find /gilda/demo/trailers/*.avi 'Title = \"${MOVIE}\"'")
  echo "Selected Trailer avi file: '$MOVIEFILE'"
  MOVIESCD=$(mdcli "pwd")
  echo "Uploading LFN file '$MOVIESCD$MOVIEFILE'"
  lcg-cp "lfn:$MOVIESCD$MOVIEFILE" "file:$PWD/movie.avi"
  ...

JDL file (amgajobdemo.jdl):
  Type = "Job";
  JobType = "Normal";
  Executable = "amgajobdemo.sh";
  StdOutput = "amgajobdemo.out";
  StdError = "amgajobdemo.err";
  InputSandbox = {"mdclient.config", "amgajobdemo.sh"};
  OutputSandbox = {"amgajobdemo.out","amgajobdemo.err"};
  Arguments = "Kidman";
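
The same pilot logic can be driven from Python. The sketch below only builds the two query strings used by the script (selectattr and find), so it can be checked without a server; run_mdcli then hands a query to the mdcli executable through subprocess. The function names are hypothetical, and rejecting quote characters in the input is an added safety assumption, not something the original script does.

```python
import subprocess

COLLECTION = "/gilda/demo/trailers"


def _check(value: str) -> str:
    # Assumption: refuse quotes rather than guess how AMGA escapes them.
    if '"' in value or "'" in value:
        raise ValueError("quotes are not allowed in query values")
    return value


def title_query(actor: str) -> str:
    """Query for the titles of trailers whose Cast contains the actor."""
    return (f"selectattr {COLLECTION}:Title "
            f"'like({COLLECTION}:Cast,\"%{_check(actor)}%\")'")


def file_query(title: str) -> str:
    """Query for the .avi entries carrying the given Title."""
    return f"find {COLLECTION}/*.avi 'Title = \"{_check(title)}\"'"


def run_mdcli(query: str) -> str:
    """Run one AMGA command through mdcli (needs mdcli and an mdclient.config)."""
    result = subprocess.run(["mdcli", query], capture_output=True, text=True, check=True)
    return result.stdout.strip()


print(title_query("Kidman"))
print(file_query("moulin rouge!"))
```

Passing the query as a single list element to subprocess.run avoids a second layer of shell quoting, which is the most error-prone part of the bash version.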

26 Simple usage scenario: Grid Movie On Demand

27 gMOD: grid Movie On Demand
gMOD provides a Video-On-Demand service.
The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user’s workstation.
For each movie many details are stored, and users can search for a particular movie by querying on one or more attributes (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline).
Two kinds of users can interact with gMOD: TrailersManagers, who can administer the DB of movies, and GILDA VO users (guests), who can browse, search and choose a movie to be streamed.

28 gMOD under the hood
Built on top of gLite services:
  Storage Elements, sited in different places, physically contain the movie files
  LFC, the File Catalogue, keeps track of which Storage Element holds a particular movie
  AMGA is the repository of the detailed information for each movie, and makes queries on it possible
  The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users
  The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network down to the user’s desktop or laptop
  GENIUS allows users to interact with the above Grid services

29 gMOD interactions
[Diagram of the components involved: User, GENIUS Portal, VOMS, AMGA, LFC, SEs, WNs, CE and WMS. Labelled interactions: get Role, Job Request.]

30 gMOD screenshot

31 Usage scenarios
Grid file metadata
Gridified DB solution (platform independent)
Job/infrastructure monitoring system (GANGA/MonAMI)
Handle complex job workflows
  Producer/consumer job models
  Trivial parallelization management
  Partial/full output retrieval (Watchdog)
I/O sharing of data among different users and jobs
Share data among Grid users securely (sensitive data)
Easy back end to develop digital libraries (gLibrary)

32 Conclusion
AMGA: the metadata service of gLite
  Part of gLite 3.1
  Can be used with other middleware platforms
  Useful to realize simple relational schemas or to add metadata information to Grid files
  Fully integrated with the Grid environment (security)
Features:
  Replication/Federation (root)
  Importing existing databases (root)
  SQL support
  Security (SSL, X509, Grid proxies, VOMS, users/groups, ACLs)
  APIs / client applications
  SOAP
Tests have shown good performance/scalability.

34 Questions?

35 Let’s practice!
https://grid.ct.infn.it/twiki/bin/view/GILDA/AMGAHandsOn

36 CETA-CIEMAT AMGA mdclient.config
  # Connection options
  Host = amga-eela.ceta-ciemat.es
  Port = 8822
  # User settings
  Login = root
  PermissionMask = rwx
  GroupMask = r-x
  Home = /E2GRIS2
  # Security options
  UseSSL = no
  AuthenticateWithCertificate = 0
  UseGridProxy = 1
  TrustedCertDir = /etc/grid-security/certificates

37 UFRJ mdclient.config
  # Connection options
  Host = amga.eela.ufrj.br
  Port = 8822
  # User settings
  Login = school
  PermissionMask = rwx
  GroupMask = r-x
  Home = /schooldir
  # Security options
  UseSSL = require
  AuthenticateWithCertificate = 1
  # Use certificate to authenticate
  UseGridProxy = 1
  IgnoreCertificateNameMismatch = 1
  TrustedCertDir = /etc/grid-security/certificates
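
A client configuration in this format is simple enough to read with a few lines of code, for example to check which server a job will contact. The Python sketch below parses key = value lines and skips comments; it illustrates the file layout only and is not the parser used by mdclient, and treating '#' as starting a comment anywhere in a line is an assumption.

```python
def parse_config(text: str) -> dict:
    """Parse 'Key = value' lines, ignoring blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # assumption: '#' starts a comment
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key = value line: {raw!r}")
        options[key.strip()] = value.strip()
    return options


SAMPLE = """\
# Connection options
Host = amga.eela.ufrj.br
Port = 8822
# User settings
Login=school
Home = /schooldir
# Security options
UseSSL = require
AuthenticateWithCertificate = 1 # Use certificate to authenticate
UseGridProxy = 1
"""

config = parse_config(SAMPLE)
print(config["Host"], int(config["Port"]), config["UseSSL"])
```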

