Metadata (data about data) GridPP-15, Paul Millar
Contents
- Monitoring a metadata service
- Event-level metadata
- Work improving AMI
- Cataloguing ATLAS metadata
- Conclusions

Monitoring a metadata service
- A requirements document has been released.
- Why do people want to monitor metadata services?
- How can that be made to happen?
- What already exists out there?
- What still needs to be done?

JMX – servlet monitoring

Monitoring architecture

MonAMI architecture

MonAMI summary
- Follows the UNIX "do one thing well" philosophy.
- Takes data from a site and pushes it somewhere.
- Plugin architecture: easy to add extra targets.
- Easy to configure (really!).
- Multilingual: currently speaks Ganglia, but will (soon) speak Nagios, LEMON, R-GMA?, ...
- Monitors: Apache, Tomcat/JMX, MySQL, ...

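The plugin idea in the MonAMI summary (read from monitored services, push to reporting systems) can be illustrated with a small sketch. This is not MonAMI's code or configuration: the `Dispatcher` class, the plugin signatures, and the metric names and values are all hypothetical, chosen only to show how monitors and reporters can be added independently of each other.

```python
from typing import Callable, Dict, List

# Hypothetical plugin types (not taken from MonAMI): a monitor returns
# metric name -> value, a reporter is handed a source name and its metrics.
Monitor = Callable[[], Dict[str, float]]
Reporter = Callable[[str, Dict[str, float]], None]


class Dispatcher:
    """Collects metrics from registered monitors and pushes them to reporters."""

    def __init__(self) -> None:
        self.monitors: Dict[str, Monitor] = {}
        self.reporters: List[Reporter] = []

    def add_monitor(self, name: str, monitor: Monitor) -> None:
        self.monitors[name] = monitor

    def add_reporter(self, reporter: Reporter) -> None:
        self.reporters.append(reporter)

    def poll_once(self) -> None:
        # Read every monitor once and forward its metrics to every reporter.
        for name, monitor in self.monitors.items():
            metrics = monitor()
            for reporter in self.reporters:
                reporter(name, metrics)


def fake_mysql_monitor() -> Dict[str, float]:
    # Stand-in for a real MySQL plugin; the values are made up.
    return {"connections": 12.0, "slow_queries": 0.0}


received: List[str] = []


def list_reporter(source: str, metrics: Dict[str, float]) -> None:
    # Stand-in for a Ganglia or Nagios plugin: it only records what it was sent.
    for key, value in sorted(metrics.items()):
        received.append(f"{source}.{key}={value}")


dispatcher = Dispatcher()
dispatcher.add_monitor("mysql", fake_mysql_monitor)
dispatcher.add_reporter(list_reporter)
dispatcher.poll_once()
```

In this sketch, adding a new reporting system means writing one more reporter function and registering it; the monitors do not change.
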
ATLAS event-level metadata
- Event tag infrastructure is being developed to:
  - allow physicists to exclude uninteresting events from a data sample
  - extract samples of specific interest into a smaller fileset, for repeated running
  - provide a global view of the data, useful for data mining
- Event tag infrastructure is currently under review.
- Content and usage of event tags are under discussion with physics groups.
- Tags have been produced from Rome data for 2.3 million events and are stored at CERN:
  - https://uimon.cern.ch/twiki/bin/view/Atlas/RomeTagFAQ

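As a rough illustration of the first two uses of event tags (excluding uninteresting events and extracting a smaller fileset), the sketch below filters a handful of tags with a cut and lists the files that hold the surviving events. The `EventTag` fields, the cut, and the file names are assumptions made for this example; they are not the actual ATLAS tag schema.

```python
from typing import Callable, List, NamedTuple


class EventTag(NamedTuple):
    run: int
    event: int
    n_muons: int       # hypothetical tag attribute
    missing_et: float  # hypothetical tag attribute, in GeV
    file_name: str     # file that holds the full event


# A tiny made-up tag collection.
TAGS = [
    EventTag(1001, 1, 0, 12.5, "aod_001.root"),
    EventTag(1001, 2, 2, 48.0, "aod_001.root"),
    EventTag(1001, 3, 1, 75.2, "aod_002.root"),
    EventTag(1002, 1, 2, 90.1, "aod_003.root"),
]


def select_events(tags: List[EventTag],
                  cut: Callable[[EventTag], bool]) -> List[EventTag]:
    """Keep only the events whose tag passes the cut."""
    return [tag for tag in tags if cut(tag)]


def files_for(selected: List[EventTag]) -> List[str]:
    """Return the sorted, de-duplicated files needed to read the selected events."""
    return sorted({tag.file_name for tag in selected})


# Example cut: at least two muons and more than 40 GeV of missing ET.
dimuon = select_events(TAGS, lambda t: t.n_muons >= 2 and t.missing_et > 40.0)
dimuon_files = files_for(dimuon)
```

Only the files in `dimuon_files` would need to be opened for repeated running over this selection.
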
Integration with DDM
- For event tag infrastructure to be used successfully, it must be able to work with the ATLAS Distributed Data Management system (DQ2).
- DQ2 uses datasets as the basic units of data manipulation.
- Event tag tools currently refer only to files.
- Currently working on ways to solve this problem.
[Diagram: DQ2 holds Dataset 1 (File 1, File 2, File 3); the tag browser refers to File 1 and File 3.]

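The mismatch between file-level tags and dataset-level data management can be made concrete with a small sketch. It is not the solution the slide says is being worked on, and it does not use any DQ2 interface: the catalogue is a plain dictionary, and the function name is hypothetical. It only shows what resolving a tag selection back to datasets involves.

```python
from typing import Dict, List, Set


def datasets_for_files(catalogue: Dict[str, List[str]],
                       selected_files: Set[str]) -> Dict[str, List[str]]:
    """Map a file-level selection back to the datasets that contain those files.

    `catalogue` maps dataset name -> list of file names (a stand-in for a
    dataset catalogue lookup). Datasets with no selected file are omitted.
    """
    result: Dict[str, List[str]] = {}
    for dataset, files in catalogue.items():
        hits = [name for name in files if name in selected_files]
        if hits:
            result[dataset] = hits
    return result


# The situation drawn on the slide: one dataset with three files,
# of which the tag browser refers to two.
catalogue = {"Dataset 1": ["File 1", "File 2", "File 3"]}
from_tag_browser = {"File 1", "File 3"}

needed = datasets_for_files(catalogue, from_tag_browser)
```

Here the whole of Dataset 1 would have to be requested from a dataset-based system even though only two of its three files are wanted.
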
Cataloguing event tags
- Event tags can now be built and written to a master database.
  - This may be replicated to T1s and T2s.
  - Different physics groups may build their own tags.
- The data will be held at geographically diverse locations. How should these be catalogued?
  - Many open questions (e.g., should AOD and Tag be distributed together, and how?)
  - Aim to build on the prototype CollectionCatalog to design and implement an Event Tag Catalogue.

AMI and VOMS
- Completed a requirements analysis of the impact of VOMS on AMI.
- AMI will use the ATLAS-wide agreed VOMS groups and roles.
- The mapping of VOMS groups to AMI has been decided.
- Problem: getting the proxy certificate into AMI ... using your web browser ...
- Solutions? mod_gridsite or MyProxy.

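The slide does not give the agreed mapping, so the sketch below only illustrates the general shape of such a lookup. It parses a VOMS attribute in the usual FQAN form (`/group/subgroup/Role=.../Capability=...`) and looks the result up in a table. The group names, the role names, and the AMI-side labels in `AMI_ROLE_FOR` are invented for this example and are not the ATLAS mapping.

```python
from typing import Dict, Optional, Tuple


def parse_fqan(fqan: str) -> Tuple[str, Optional[str]]:
    """Split a VOMS FQAN into (group path, role); a NULL or absent role gives None."""
    group_parts = []
    role: Optional[str] = None
    for part in fqan.strip("/").split("/"):
        if part.startswith("Role="):
            value = part[len("Role="):]
            role = None if value == "NULL" else value
        elif part.startswith("Capability="):
            continue  # the capability field is ignored in this sketch
        else:
            group_parts.append(part)
    return "/" + "/".join(group_parts), role


# Hypothetical mapping table: (VOMS group, VOMS role) -> AMI-side role label.
AMI_ROLE_FOR: Dict[Tuple[str, Optional[str]], str] = {
    ("/atlas", None): "reader",
    ("/atlas", "production"): "writer",
}


def ami_role(fqan: str) -> Optional[str]:
    """Return the AMI-side role for an FQAN, or None if nothing is mapped."""
    return AMI_ROLE_FOR.get(parse_fqan(fqan))
```

For example, `ami_role("/atlas/Role=production/Capability=NULL")` looks up the key `("/atlas", "production")` in the made-up table.
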
AMI and SQLite

Cataloguing metadata
- Looking at ATLAS, there is metadata in:
  - DDM (several databases)
  - Production system (ProdDB)
  - Event-level (tag) database(s)
  - COOL (used for correlation of DAQ and run numbers)
  - AMI (catalogues datasets by physics metadata)
  - Detector Description (file catalogues, but should be able to ignore these)
- There is a risk of information being duplicated, unknown or impossible to get at.
- Implement easy navigation between the different metadata sources.

Summary
- The metadata collaboration's work is ongoing on various aspects of metadata.
- The monitoring requirements document has been released.
- There is an opportunity to work together on developing additional monitoring tools.
- AMI will soon support off-line analysis.
- How to implement event-level metadata is moving from the discussion stage into prototyping.