Bookkeeping Meta Data Catalogue: present status
Marianne Bargiotti, CERN
BK Workshop, CERN, 6/12/2007

Outline
 BK overview
 Logical data model and DB schema
 BK services and User Interface
 Conclusions
 Appendix A, B

LHCb Bookkeeping Meta Data Catalogue
 The Bookkeeping (BK) is the AMGA*-based system that manages the file metadata of data files. It contains information about jobs, files and their relations:
Job: application name, application version, application parameters, which files it has generated, etc.
File: size, events, filename, from which job it was generated, etc.
 The Bookkeeping DB is the main gateway for users to select the available data and datasets.
 Three main services are available:
Booking service: writes data to the bookkeeping.
Servlets service: provides BK web browsing and selection of data files.
AMGA server: for remote application use.
*: AMGA is the ARDA implementation of the ARDA/gLite Metadata Catalog Interface.

Logical data models
 The AMGA schema shows how the information is logically grouped:
 The logical model is built around two main entities: Jobs and Files.
The relation between them is of type input/output: a job can take one or more files as input and produce more than one file (usually a data file plus a couple of log files).
Around these two entities there is a full set of satellite entities (Fileparams, Jobparams, etc.) that keep extra information.
One or more attributes are associated with each entity: for instance, LFN is an attribute of Files and Program Name is an attribute of Jobparams.
The entities in the AMGA logical model are directories (see Appendix A). A small sketch of the Jobs/Files relation follows.
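To make the Jobs/Files relation concrete, here is a minimal Python sketch; the class names, attributes and LFN values are illustrative assumptions, not the actual AMGA schema:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class File:
    lfn: str                      # Logical File Name (an attribute of Files)
    size: int = 0
    file_type: str = ""
    produced_by: Optional["Job"] = None   # the job that generated this file

@dataclass
class Job:
    config_name: str
    config_version: str
    exec_date: str
    params: dict = field(default_factory=dict)         # Jobparams-like extra info
    input_files: List[File] = field(default_factory=list)
    output_files: List[File] = field(default_factory=list)

# A job takes one or more files as input and usually produces a data file
# plus a couple of log files:
job = Job("MC", "v1", "2007-12-06", params={"ProgramName": "Gauss"})
dst = File(lfn="/lhcb/MC/2007/DST/0001.dst", file_type="DST", produced_by=job)
log = File(lfn="/lhcb/MC/2007/LOG/0001.log", file_type="LOG", produced_by=job)
job.output_files += [dst, log]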

DB schema
 The database tables are logically grouped in two (plus one) sets based on their functionality:
The Warehouse tables:
 each AMGA directory/entity has an associated table in the database (see Appendix A)
The Views:
 The views summarise the information stored in the Warehouse database to best suit physicists' queries, providing good performance.
 Most important: the roottree and jobfileinfo views. Each row in roottree summarises the attributes associated with the data files stored in a JobFileInfoXXX table (there are as many JobFileInfoXXX tables as entries in the roottree table); a query sketch is given below.
Auxiliary tables
 The process that elaborates the Warehouse data to create or update the Views makes use of auxiliary tables (see Appendix B)
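The two-step lookup through roottree and a JobFileInfoXXX table could look like the following hedged Python sketch; only the table-naming scheme comes from the slides, while the column names (tree_id, config_name, lfn, ...) are assumptions:

def find_files(conn, config_name, event_type, file_type):
    cur = conn.cursor()
    # Step 1: the roottree row gives the id of the per-selection JobFileInfo table
    cur.execute(
        "SELECT tree_id FROM roottree "
        "WHERE config_name = ? AND event_type = ? AND file_type = ?",
        (config_name, event_type, file_type),
    )
    row = cur.fetchone()
    if row is None:
        return []
    # Step 2: read the data-file attributes from the matching JobFileInfo<id> table
    cur.execute(f"SELECT lfn, size FROM JobFileInfo{row[0]}")
    return cur.fetchall()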

BK service
 The bookkeeping service is made up of:
 an application: BkkManager
 two sub-services: BkkReceiver and Tomcat
 Plus two satellite services tightly related to the bookkeeping: FileCatalog and BkkMonitor.
 All these services are currently deployed on volhcb01.cern.ch

Booking of data
 The booking of data is how the information about jobs and files reaches the bookkeeping and how it is registered in the database.
 The information about jobs and files is sent in XML format and is stored in files (an illustrative payload is sketched below).
 Two central services are involved: BkkReceiver and BkkManager.
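As an illustration only, a job/file record might look like the payload below; the element and attribute names are assumptions for readability, the real tags are defined by the bookkeeping DTDs mentioned later:

import xml.etree.ElementTree as ET

payload = """
<Job ConfigName="MC" ConfigVersion="v1" Date="2007-12-06">
  <TypedParameter Name="ProgramName" Value="Gauss"/>
  <InputFile Name="/lhcb/MC/2007/SIM/0001.sim"/>
  <OutputFile Name="/lhcb/MC/2007/DST/0001.dst" TypeName="DST">
    <Parameter Name="FileSize" Value="123456"/>
  </OutputFile>
</Job>
"""

job = ET.fromstring(payload)
print(job.get("ConfigName"), [f.get("Name") for f in job.findall("OutputFile")])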

BkkReceiver
 BkkReceiver is responsible for receiving the XML files and storing them in a directory. The directory works as a queue where files are processed in FIFO order.
 The BkkReceiver service listens on port 8092 (on the deployment machine volhcb01), where jobs send the XML-formatted information about the files they have generated. A minimal sketch of this pattern follows.
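The sketch below is not the real BkkReceiver: it only illustrates the receive-and-spool pattern. The port 8092 comes from the slide; the spool directory and everything else are assumptions.

import socketserver, time, pathlib

SPOOL = pathlib.Path("/tmp/bkk-spool")   # hypothetical spool directory
SPOOL.mkdir(parents=True, exist_ok=True)

class XMLReceiver(socketserver.StreamRequestHandler):
    """Read one XML payload per connection and drop it into the spool
    directory; timestamped names preserve FIFO processing order."""
    def handle(self):
        data = self.rfile.read()
        (SPOOL / f"{time.time_ns()}.xml").write_bytes(data)

if __name__ == "__main__":
    with socketserver.TCPServer(("", 8092), XMLReceiver) as server:
        server.serve_forever()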

BkkManager
 BkkManager is responsible for reading the XML files, checking the correctness of their format and information, and uploading the new data into the database. Two DTD definition files are used:
 one defines the jobs and files tags (Book.xml)
 the second is used for the information on the replica (Replica.dtd)
A sketch of the DTD validation step is shown below.
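The DTD check could be done as in this hedged sketch, assuming the lxml library; the file names Book.xml and Replica.dtd come from the slide, the function itself is illustrative:

from lxml import etree

def validate(xml_path, dtd_path):
    # Parse the DTD and the incoming XML file, then validate one against the other
    dtd = etree.DTD(open(dtd_path, "rb"))
    tree = etree.parse(xml_path)
    if dtd.validate(tree):
        return True, None
    # On failure the bookkeeping saves an error report and skips the upload
    return False, dtd.error_log.filter_from_errors()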

NewConfirm servlet
 To accomplish this task, BkkManager deploys the NewConfirm servlet service. NewConfirm first checks the conformity of the XML files to their DTD, then checks that the information provided is correct. The information is inserted into the database only if all the checks pass. If one of the checks fails, an error message is saved to a file and no information is uploaded.

Night updates
 The BkkManager application takes care of selecting the XML files from the queue and asks NewConfirm to book them.
 Every night it makes a backup of all the XML files that have been successfully booked and runs the update-views script.
Before extracting the XML files from the queue, it checks the errors generated during the processing of the previous XML files to see if there are files that need to be reprocessed. A sketch of this nightly cycle is given below.
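A hedged sketch of the nightly cycle; all paths and the script name are assumptions, only the logic (retry failed files, book the queue in FIFO order, back up booked files, refresh the views) follows the slide:

import pathlib, subprocess

SPOOL  = pathlib.Path("/tmp/bkk-spool")    # hypothetical directories
BOOKED = pathlib.Path("/tmp/bkk-booked")
ERRORS = pathlib.Path("/tmp/bkk-errors")

def nightly_update(book):
    """`book(xml_path)` stands in for the NewConfirm call; returns True on success."""
    for d in (SPOOL, BOOKED, ERRORS):
        d.mkdir(parents=True, exist_ok=True)
    # Re-queue files that failed during the processing of previous XML files
    for failed in sorted(ERRORS.glob("*.xml")):
        failed.rename(SPOOL / failed.name)
    # Process the queue in FIFO order and back up successfully booked files
    for xml in sorted(SPOOL.glob("*.xml")):
        dest = BOOKED if book(xml) else ERRORS
        xml.rename(dest / xml.name)
    # Finally refresh the views (script name is hypothetical)
    subprocess.run(["./update_views.sh"], check=False)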

Tomcat & BkkMonitor
 Tomcat is the servlet container used by the bookkeeping; it listens on port 8080 on the deployment machine.
 BkkMonitor is a monitoring service which controls the FileCatalog and BkkReceiver servers.
 The service actively pings these two services at one-minute intervals. In case of problems (service not responding):
 a warning is sent to the BK operation manager in charge
 the server is restarted
A sketch of this monitoring loop follows.
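A minimal sketch of the ping/warn/restart loop; the FileCatalog port and the notify()/restart() callables are placeholders, only the one-minute interval, the BkkReceiver port and the host come from the slides:

import socket, time

SERVICES = {"BkkReceiver": ("volhcb01.cern.ch", 8092),
            "FileCatalog": ("volhcb01.cern.ch", 8093)}   # FileCatalog port is hypothetical

def is_alive(host, port, timeout=5):
    try:
        with socket.create_connection((host, port), timeout):
            return True
    except OSError:
        return False

def monitor(notify, restart):
    while True:
        for name, (host, port) in SERVICES.items():
            if not is_alive(host, port):
                notify(f"{name} not responding")   # warn the BK operation manager
                restart(name)                      # and restart the server
        time.sleep(60)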

User Interface: Bookkeeping Web Page
 The web page allows users to browse the bookkeeping contents and get information about files and their provenance.
 It is also used to generate the Gaudi card, the list of files to be processed by a job (a sketch is given below).
 The left frame links to many browsing options: File look-up, Job look-up, Production look-up, BK summary.
Dataset search: retrieves a list of files based on their provenance history.
 The result is a list of files matching the selection.
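A Gaudi card is essentially a list of input files written as job options. The sketch below uses the traditional EventSelector.Input syntax as an assumption; the exact format emitted by the web page may differ:

def gaudi_card(lfns):
    # Build one EventSelector.Input entry per LFN
    entries = ",\n".join(
        f'"DATAFILE=\'LFN:{lfn}\' TYP=\'POOL_ROOTTREE\' OPT=\'READ\'"' for lfn in lfns
    )
    return "EventSelector.Input = {\n%s\n};\n" % entries

print(gaudi_card(["/lhcb/MC/2007/DST/0001.dst", "/lhcb/MC/2007/DST/0002.dst"]))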

FileCatalog
 FileCatalog is the service used by the genCatalog script and by the bookkeeping to get the Physical File Name of a file and its ancestors. It is a frontend to the LFC and to the bookkeeping database. No security is required on this service since it provides a read-only API.
 It is accessible through the web page by selecting the 'Dataset Replicated at' section: the system first looks for LFNs in the bookkeeping database and then tries to get the physical location of each of them from the LFC.
The LFC search is expensive: it is always done on a limited number of files (bunches of 200), as in the sketch below.
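The bunched LFN-to-PFN lookup reduces to simple batching; in this sketch lookup_pfns() is a hypothetical stand-in for the real LFC call, and only the bunch size of 200 comes from the slide:

def replicas_for(lfns, lookup_pfns, bunch=200):
    replicas = {}
    for i in range(0, len(lfns), bunch):
        # one expensive LFC query per bunch of at most 200 LFNs
        replicas.update(lookup_pfns(lfns[i:i + bunch]))
    return replicas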

Conclusion
 Several issues raised:
By users, on the web interface:
 lack of functionality in the dataset search
 Java code to be replaced with Python
 necessity of having a defined structure embedded in DIRAC
 Forthcoming changes in the DB schema with data taking:
 necessity for a new, versatile tool able to match different requests

Appendix A
Description of each entity/directory:
 Jobs: each job has a Configuration Name and Version plus the date of its execution. These three attributes are always present. Extra information is kept in the Jobparams and Inputfiles entities.
 Jobparams: provides extra information about a job, such as the program name and version, the location where it was executed, etc. Some attributes may not be compulsory.
 Inputfiles: contains the list of input files used by each job. No entries are present for jobs that did not take any input file.
 Files: each file always has an associated Logical File Name, the job that generated it and the type of file.
 Fileparams: similar to Jobparams, it provides extra information about files, such as the file size, file GUID, etc. Some attributes may not be compulsory.
 Qualityparams: provides information on the quality of the files. It says for which group of physicists a file may be of interest.
 Eventtypes: keeps information about the event types, such as their description.
 Typeparams: extra information about the file type: name, description and version. The description may not be present.

Appendix B
 Auxiliary tables:
FileSummary: contains an entry for each file with all the related information.
JobSummary: contains an entry for each job with the related information.
JobHistory: contains, for each job, the information on its immediate ancestor, if any.
JobHistory2Level: contains, for each job, the information on its second-degree ancestor, if any.
Summary: contains an entry for each possible n-tuple (eventtype, config, filetype, dbversion, program0, inputfile1, program1, inputfile2, program2).
Jobs_FileSummary: is just the join of FileSummary and JobSummary on the column job_id.

AMGA server
 There is no direct access to the DB: clients talk directly to the AMGA server, which then takes care of contacting the DB and serving the information back to the client. The AMGA server comes with client APIs for C++, Python, Java and Perl. A hedged example using the Python client follows.
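A remote query through the AMGA server might look like the sketch below. It assumes the mdclient module shipped with the gLite AMGA client; the host, port, login, directory path and attribute names are placeholders, and the method names follow the AMGA client documentation but may differ in the deployed version:

import mdclient

# Connect to the AMGA server, never to the database directly
client = mdclient.MDClient("volhcb01.cern.ch", 8822, "lhcb_reader")

# Ask for a couple of attributes of entries under a (hypothetical) directory
client.getattr("/lhcb/bookkeeping/Files/*", ["LFN", "FileSize"])
while not client.eot():
    entry, values = client.getEntry()
    print(entry, values)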