Presentation is loading. Please wait.

Presentation is loading. Please wait.

IST-2006-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America The AMGA metadata catalog with use cases Domenico Vicinanza, CERN.

Similar presentations


Presentation on theme: "IST-2006-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America The AMGA metadata catalog with use cases Domenico Vicinanza, CERN."— Presentation transcript:

1 IST-2006-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America The AMGA metadata catalog with use cases Domenico Vicinanza, CERN EELA Tutorial, Santiago, September 2006

2 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 2 Background and Motivation for AMGA Interface, Architecture and Implementation Metadata Replication on AMGA Deployment Examples GILDA Use cases Contents

3 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 3 Introduction Data on the GRID are represented by millions of files spread over several sites To locate and find those of interest, users and applications need and efficient mechanism to query and discover information about their contents This is provided by –associating descriptive attributes (metadata) to the data –exposing this information in catalogues which can be queried to locate files (thanks to their attributes) Metadata are typically modeled as couples (key, value)

4 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 4 Accessing metadata Accessing metadata is conceptually similar to accessing databases but …having clients going directly to the database is not the most convenient (secure and effective) solution!!! A better solution is to have a simple interface for metadata access on the GRID Such a metadata interface –Should be defined in terms of metadata concepts (key, values) –It grants the access to the metadata hiding the DB structure and implementation (transparency) –It would be fully compatible with distributed sources of information (distributed data storages) …it would work as a simplified relational DB interface

5 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 5 Metadata services A metadata service to be used in a GRID environment should satisfy some specific requirements: It must expose a complete but simple interface, so non technical user can easily use it It should be flexible and support dynamic schemas (since there is no single schema which can support the whole application domain) It must support a hierarchical structure (related metadata can be grouped together and an isolated from other metadata) Security is required to provide different access levels to different users

6 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 6 Basic concepts The basic concepts of the metadata interface are Entries (the names of the data item or resource being described) Attributes (the (key, value) pair) Schema (the logical group of attributes) Entries are associated with one or more schemas and inherit the attributes defined in those schemas This is the only way of associating an attribute to an entry (it is not possible to have attributes associated directly with entries) Schemas are defined dynamically by the user

7 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 7 2004 - ARDA evaluated existing Metadata Services from HEP experiment: AMI (ATLAS), RefDB (CMS), Alien Metadata Catalogue (ALICE)..and proposed an interface for Metadata access on the GRID –Based on requirements of LHC experiments –Generic - not bound to a particular application domain –Designed jointly with the gLite/EGEE team –Incorporates feedback from GridPP Adopted as the official EGEE Metadata Interface –Endorsed by PTF (Project Technical Forum of EGEE) ARDA/gLite Metadata Interface

8 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 8 ARDA developed a P roject T ask F orce in order to develop: –AMGA – ARDA Metadata Grid Application It began as prototype to evaluate the Metadata Interface –Evaluated by community since the beginning (LHCb and Ganga) –Matured quickly thanks to users feedback AMGA is currently part of the gLite middleware –Official Metadata Service for EGEE –First release with gLite 1.5 –Also available as standalone component It is expanding to other user communities: –HEP, Biomed, UNOSAT… AMGA Implementation

9 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 9 Metadata organised as an hierarchy –Collections can contain sub-collections –Analogy to file system:  Collection  Directory; Entry  File Flexible Queries –SQL-like query language –Joins between schemas –Example QUERY EXAMPLE: > getattr /gilda/santiago/musica/*.mov Name Duration AMGA Features

10 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 AMGA Security Unix style permissions ACLs – per-collection or per-entry. Secure connections – SSL Client Authentication based on –Username/password –General X509 certificates –Grid-proxy certificates Access control via a Virtual Organization Management System (VOMS)

11 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 11 C++ multiprocess server –Runs on any Linux flavour Backends –Oracle, MySQL, PostgreSQL, SQLite Two frontends –TCP Streaming  High performance  Client API for: C++, Java, Python, Perl, Ruby –SOAP  Interoperability Also implemented as standalone Python library –Data stored on filesystem AMGA Implementation

12 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 12 Designed for scalability –Asynchronous operation  Reading from DB and sending data to client –Response sent to client in chunks  No limit on the maximum response size Example: TCP Streaming –Text based protocol (like SMTP, POP3,…) –Response streamed to client Client: listattr entry Server: 0 entry value1 value2 … Architecture

13 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 13 Full replication Partial replication FederationProxy Metadata management

14 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 14 LHCb-bookkeeping (keep additional information from executed jobs) –Migrated bookkeeping metadata to ARDA prototype  20M entries, 15 GB  Large amount of static metadata –Feedback valuable in improving interface and fixing bugs –AMGA showing good scalability Ganga –Job management system  Developed jointly by Atlas and LHCb –Uses AMGA for storing information about job status  Small amount of highly dynamic metadata Early adopters of AMGA

15 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 Medical Data Manager – MDM –Store and access medical images and associated metadata on the Grid –Built on top of gLite 1.5 data management system –Demonstrated at last EGEE conference (October 05, Pisa) Strong security requirements –Patient data is sensitive –Data must be encrypted –Metadata access must be restricted to authorized users AMGA used as metadata server –Demonstrates authentication and encrypted access –Used as a simplified DB More details at: – https://uimon.cern.ch/twiki/bin/view/EGEE/DMEncryp tedStorage Biomed

16 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 16 TCP Streaming Front-end –mdcli & mdclient and C++ API (md_cli.h, MD_Client.h) –Java Client API and command line mdjavaclient.sh & mdjavacli.sh (also under Windows !!) –Python Client API SOAP Frontend (WSDL) –C++ gSOAP –AXIS (Java) –ZSI (Python) Accessing AMGA

17 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 17 AMGA – Metadata Service of gLite Part of gLite (but still not certificed in gLite 3.0. it will be done with 3.1 release) –Useful for simplified DB access –Integrated on the Grid environment (Security) Replication/Federation features Tests show good performance/scalability Already deployed by several Grid Applications –LHCb, ATLAS, Biomed, … AMGA Web Site – http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/ Conclusion

18 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 18 gLibrary AMGA for geospatial metadata: GIS (Geographical Information System) gMOD GILDA Use cases

19 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 19 Huge amounts of data can be saved on SEs How can we easily find later a file that we need? –(if you have good memory, its GUID could be a solution) –File Catalogues just let us to arrange files in folders and subfolders, no way to query on their contents –Metadata Catalogues are a possible solution, but not always “affordable” especially for non expert users (powerful but complex to use) Our solution: a higher level application built on top of many gLite grid services: a Metadata Catalogue + File Catalogues + Storage Elements  gLibrary Requirements: easy to use, fast, secure, extensible gLibrary Motivations

20 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 20 gLibrary is a higher level application built on top of many gLite grid services: a Metadata Catalogue + File Catalogues + Storage Elements …with these requirements: easy to use, fast, secure, extensible gLibrary attempts to create a Multimedia Management System on the Grid –Examples of Multimedia Contents handled by gLibrary:  Images, Movies, Audio Files  Office Documents (Powerpoint, Word, Excel, OpenOffice)  E-Mails, PDFs, HTMLs  Customized versions of well-know document type (ex. EGEE PPTs) Keep track and organize in a uniform way all the additional details (metadata) of files saved in Storage Elements and registered in File Catalogues Provide users an easy way to locate and retrieve files based on their contents gLibrary goals

21 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 21 Examples (Office/Entertainment): Locate all theoretical (PPTType) PowerPoint (Type) presentations about FireMan (Keywords) given in 2005 (Date) by John.S (Speaker); Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast) produced in USA (Country) in 2004 (ReleaseDate) All the acoustic (Genre) mp3 (Format) audio files (Type) of Alanis Morissette (Singer) that last more than 3 minutes (Runtime). Usage Scenarios

22 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 22 Example 2 (Biomed): A doctor is looking for brain (keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age). Example 3 (Complex activities): A job can behave as a storage crawler: it scans pre- existing files in Storage Elements to extract relevant metadata that will be published on gLibrary for further data mining.

23 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 23 Files are saved on SEs and registered into file catalogues (LFC and/or FiReMan) The AMGA Metadata Catalogue is used to archive and organize metadata and to answer users’ queries. gLibrary is built using the following AMGA collections: –/gLibrary contains generic metadata for each entry –/gLAudio, /gLImage, /gLVideo, /gLPPT, /EGEEPPT, /gLDoc, … are examples of collections of “additional features” (shown later) –/gLTypes  keeps the associations between document types and the names of the collection that contains the “additional features”  is used by gLibrary to find out where it has to look when new document types are added into the system (extensibility) –/gLKeys is used to store Decryption Keys gLibrary prototype implementation

24 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 24 Collection/gLibrary Entry Names Attributes FileNamePathNameTypeSubmitter 4ffaffc8-26e7-4826-b460- 3d5bf08081a4 DedicatoAte.mp3/grid/gilda/calanducciAudioTony Calanducci 00454dca-a269-4b93-8a45- c4012af05600 ardizzonelarocca_is_231005.ppt.gpg/grid/gilda/calanducci/ EGEE EGEEDOCTony Calanducci /gLibrary (continuum) Attributes SubmissionDateEncryptionDescriptionKeywordsCreationDate 2006-01-05 00:00:00falseSong of the Italian Band “Le Vibrazioni” Vibrazioni2004-02-05 00:00:00 2005-01-05 16:44:22truegLite Information SystemR-GMA, RGMA, BDII, IS2005-10-05 23:40... Example of Entries

25 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 25 Collection/gLTypes Entry names Attributes Path (refers to a collection) Audio /gLAudio Image /gLImage Video /gLVideo Documents /gLDOC PowerPoint /gLPPT EGEEDOC /EGEEPPT Collection/EGEEPPT Entry names Attributes TitleRuntimeAuthorTypeDateEventSpeakerTopic 00454dca-a269- 4b93-8a45- c4012af05600 Information Systems 00:30:00Valeria Ardizzione, Giuseppe La Rocca Theorical2005-10-234 th EGEE Conferen ce Giuseppe La Rocca, Valeria Ardizzone R-GMA, BDII Collection/gLAudio Entry names Attributes SongTitleDurationAlbumGenreSingerFormat 4ffaffc8-26e7-4826- b460-3d5bf08081a4 Dedicato A Te00:03:27Dedicato A TePopLe VibrazioniMP3 Collection/gLKeys Entry names Attributes Passphrase 00454dca-a269-4b93-8a45- c4012af05600 ardizzo “additional features” Example of gLibrary collections

26 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 26 User Requirements: –a valid proxy with VOMS extensions –VOMS Role and Group needed to be recognized by gLibrary as a contents manager. 3 kinds of users: –gLibraryManager: (s)he can create new content type and allows a generic VO user to become gLibrarySubmitter –gLibrarySubmitters: they can add new entries and define access rights on the entries they create.  Fine-grained permission (reading, writing, listing, decrypting) settings on each entry: whole VO members, groups, list of DNs –generic VO users: browse and make queries (on entries they have access to) Basic level of cryptography: –New files saved on SEs can be encrypted beforehand with a symmetric passphrase that will be saved in /gLKeys. Only selected users (that have a specific DN in the subject of their VOMS proxy) can access the passphrase and decrypt the file. gLibrary Security

27 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 27 AMGA Datatypes –Using the above datatypes you are sure that your metadata can be easily moved to all supported back-ends –If you do not care about DB portability, you can use, in principle, as entry attribute type ALL the datatypes supported by the back- end, even the more esoteric ones (PostgreSQL Network Address type or Geometric ones) We played a little bit with GIS Datatype offered by MySQL 5 AMGA for GIS Datatype Metadata

28 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 28 Query> listattr /ESR/opera_nno >> Dataset >> varchar(30) >> File_Name >> varchar(50) >> Footprint >> multipolygon >> Lat >> numeric(8,2) >> Level >> varchar(5) >> Lon >> numeric(8,2) >> Orbit >> int(5) >> Proc_centre >> varchar(50) >> Proc_date >> timestamp >> Start_Date >> timestamp >> Stop_Date >> timestamp... We created a /ESR/opera_nno collection asking AMGA to use the MyISAM table engine We used insert command that evaluates all inserted values: insert sameEntryName Dataset "GOME" Level 2 Version "v1.1" Orbit 25421 File_Name "/grid/esr/gome/utv/2000/03/00301000.utv" Start_Date '"2000-02-29 00:01:00.0"' Stop_Date '"2000-02-29 00:58:00.0"' Footprint 'MPolyFromText("MULTIPOLYGON(((82.96 -59.12,75.95 -89.07,75.95 -89.07,76.46 - 94.77,76.84 -100.85,77.07 -107.21,77.13 -115.34,77.00 -121.80,76.72 -… … 150.74,85.47 -136.17,85.80 -117.93,85.57 -94.31,84.94 -78.84,84.03 - 67.39,82.96 -59.12)))")' Proc_centre "EGEE" Proc_date '"2005-10-14 13:20:00.0"' File_input "00301000.lv1" Proc_description '"Algorithm: utv"' Example with ESR data

29 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 29 As a summary, the following functions work : GeomFromText(), MPolyFromText(), Contains(), AsText() In principle PostgreSQL+PostGIS would also work but this is not fully tested. Query> selectattr /ESR/opera_nno:File_Name /ESR/opera_nno:Start_Date /ESR/opera_nno:Stop_Date 'Contains(/ESR/opera_nno:Footprint, GeomFromText("POINT(82.96 -59.12)"))' >> /grid/esr/gome/utv/2000/03/00301000.utv >> 2000-02-29 00:01:00 >> 2000-02-29 00:58:00 Query> selectattr /ESR/opera_nno:File_Name AsText(/ESR/opera_nno:Footprint) '' >> /grid/esr/gome/utv/2000/03/00301000.utv >> MULTIPOLYGON(((82.96 -59.12,75.95 -89.07,75.95 -89.07,76.46 -94.77,76.84 - 100.85,77.07 -107.21,77.13 -115.34,77 -121.8,76.72 -128.08,76.3 -134.03,75.74 - 139.59,75.07… Let’s check if the entry was properly inserted (we need to use AsText() to decode a MultiPolygon): We want to look for a Polygon that cointains a given point: Sample queries

30 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 30 gMOD provides a Video-On-Demand service User chooses among a list of video and the chosen one is streamed in real time to the video client of the user’s workstation For each movie a lot of details (Title, Runtime, Country, Release Date, Genre, Director, Case, Plot Outline) are stored and users can search a particular movie querying on one or more attributes Two kind of users can interact with gMOD: TrailersManagers that can administer the db of movies (uploading new ones and attaching metadata to them); GILDA VO users (guest) can browse, search and choose a movie to be streamed. gMOD: grid Movie On Demand

31 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 31 Built on top of gLite services + GENIUS web portal: Storage Elements, sited in different places, physically contain the movie files FireMan, the File Catalogue, keeps track in which Storage Element a particular movie is located AMGA is the repository of the detailed information for each movie, and makes possible queries on them The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users The Workload Management System (WMS) is responsible to retrieve the chosen movie from the right Storage Element and stream it over the network down to the user’s desktop or laptop gMOD under the hood

32 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 32 VOMS FireMan Catalogue Metadata Catalogue WNWN WN CE Storage Elements User Genius Portal Workload Management System get Role AMGA gMOD interactions

33 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 33 gMOD is accesible through the Genius Portal (https://glite-tutor.ct.infn.it) gMOD screenshot

34 IST-2006-026409 E-infrastructure shared between Europe and Latin America www.eu-eela.org Santiago, Chile, EELA Tutorial, 06-07.09.2006 34 Thanks to Riccardo Bruno who developed the first version of these slides


Download ppt "IST-2006-026409 www.eu-eela.org E-infrastructure shared between Europe and Latin America The AMGA metadata catalog with use cases Domenico Vicinanza, CERN."

Similar presentations


Ads by Google