Implementing Metadata Using RLS/LCG James Cunha Werner University of Manchester


1 Implementing Metadata Using RLS/LCG James Cunha Werner University of Manchester http://www.hep.man.ac.uk/u/jamwer/

2 Metadata Meeting - Grenoble 2005 James Werner jamwer2000@hotmail.com
BaBar Experiment
The BaBar experiment studies the differences between matter and antimatter, to throw light on the problem, posed by Sakharov, of how the matter-antimatter symmetric Big Bang can have given rise to today's matter-dominated universe.
High energy collisions between electrons and positrons produce other elementary particles, giving tracks and clusters which are recorded by several high granularity detectors and from which the properties of the short-lived particles can be deduced.

3 Each recorded collision, called an event, comprises a large volume of data, and thousands of millions of events are recorded, giving a total dataset size of hundreds of thousands of gigabytes (hundreds of terabytes).

4 Sources of Data in BaBar

5 Amount of data

                     # Files   Size (TB)   Events (Million)
Run1                   6,972         2.0                593
Run2                  11,527         6.3              1,925
Run3                   7,383         3.2                951
Run4                  16,671        12.2              3,999
Run5 (2xRun4) ???     32,000        24                    8
Run6 (2xRun5) ???     64,000        48                   16
Run7 (2xRun6) ???    128,000       100                   32

Rows marked ??? are projections; their event counts (8, 16, 32) are presumably in thousands of millions.

SuperBabar!
Systematic errors >>> statistical errors
Same amount of Monte Carlo generated data!

6 Data Structure
The user interface to the eventstore is the event "collection". Each collection represents an ordered series of N events, and a user can choose to read the events from the first one in the sequence or from any given offset into the sequence.
Data components:
– hdr - event header
– usr - user data
– tag - tag information
– cnd - candidate information
– aod - "analysis object data"
– tru - MC truth data (only in MC data)
– esd - "event summary data"
– sim - "sim" data from BgsApp or MooseApp like GHits/GVertices (only in MC data)
– raw - subset of raw data from xtc persisted in the Kanga eventstore

7 Data organisation
How data are stored (level of detail):
– micro = hdr + usr + tag + cnd + aod (+ tru)
– mini = micro + esd
Data access:
– collections - these are "logical" names that users use to configure their jobs. They are site-independent, so (assuming the site has imported the data) the same collection name should work at any site.
– logical file names (LFN) - these are site-independent names given to all files in the eventstore. Any references within the event data itself _must_ use LFNs so that they remain valid when the files are moved from site to site.
– physical file names (PFN) - these are file names that vary from site to site. In practice they are usually derived from the LFNs by adding a prefix that encapsulates how the data is accessed at that site.
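
The LFN-to-PFN derivation described above can be sketched in a few lines of shell. This is only an illustration under assumptions: the function name lfn_to_pfn, the two site prefixes and the sample LFN are hypothetical and are not taken from the BaBar software.

```shell
#!/bin/bash
# Sketch: derive a site-specific PFN from a site-independent LFN by
# prepending a site access prefix. All names and paths are hypothetical.
lfn_to_pfn() {
    local prefix="$1"   # how this site reaches its storage (assumed value)
    local lfn="$2"      # site-independent logical file name
    prefix="${prefix%/}"   # drop one trailing slash from the prefix, if any
    lfn="${lfn#/}"         # drop one leading slash from the LFN, if any
    printf '%s/%s\n' "$prefix" "$lfn"
}

# The same LFN resolves to different PFNs at two imaginary sites.
lfn_to_pfn /nfs/siteA/babar /store/PR/file001.root
lfn_to_pfn root://xrootd.example.org/babar /store/PR/file001.root
```

Because only the prefix changes, references stored inside the event data can stay as LFNs and remain valid after a move between sites.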

9 Feeding RLS with metadata
Generation of the basic metadata files, with file selection:

    #!/bin/bash
    # Save the dataset list; the first column of each line is used as the dataset name.
    BbkDatasetTcl --dbsite=local > MetaLista.txt
    # Write one BbkDatasetTcl call per dataset into the script geratcl, then run it.
    cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local --nolocal \""$1"\"";}' >> geratcl
    chmod 700 geratcl
    ./geratcl

Feeding RLS with the basic files:

    #!/bin/bash
    # Write one "edg-rm cr" command per .tcl file into the script alimrls, then run it.
    # The output of each command is saved in <name>.rlstok.
    ls *.tcl | awk '// {split($1,a,"."); print "edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/"$1 " -l lfn:"a[1] " > " a[1]".rlstok";}' >> alimrls
    chmod 700 alimrls
    ./alimrls
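
Both scripts above use the same pattern: awk prints command lines into a second script, which is then executed. A dry run makes the generated command visible without touching the catalogue. The sketch below reuses the awk program from the slide unchanged; the wrapper gen_register_cmd and the file name mydataset.tcl are made up for illustration, and edg-rm itself is never called.

```shell
#!/bin/bash
# Dry run: print the command line the generator would write into alimrls.
# "mydataset.tcl" is a hypothetical file name; edg-rm is not executed here.
gen_register_cmd() {
    echo "$1" | awk '// {split($1,a,"."); print "edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/"$1 " -l lfn:"a[1] " > " a[1]".rlstok";}'
}

gen_register_cmd mydataset.tcl
# prints: edg-rm --vo babar cr file:///home/jamwer/PgmCM2/MetaData/mydataset.tcl -l lfn:mydataset > mydataset.rlstok
```

The LFN is the file name up to its first dot, and the command's output goes to a .rlstok file of the same base name.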

10 Conformity CE catalogue
Run evaluation software to establish CE conformity and perform the catalogue update.

    #!/bin/bash
    # Ask the information system for every CE that accepts the babar VO.
    ldapsearch -x -H ldap://lcgbdii02.gridpp.rl.ac.uk:2170 -b 'Mds-vo-name=local,o=Grid' '(&(objectClass=GlueCE)(GlueCEAccessControlBaseRule=VO:babar))' | grep "GlueCEUniqueID:" > cenames.txt
    # Write one "./catal <CE>" call per CE into subload.sh, then run it.
    cat cenames.txt | awk '// {print "./catal "$2;}' > subload.sh
    chmod 700 subload.sh
    ./subload.sh
    # Append the submission log to the history file and save each job handle in <name>.tok.
    cat loadrlssubm >> $1.histo
    cat $1.histo | awk ' /Sub/ {FileName=$2} /https/ {HandleName=$2; print "echo " HandleName "> " FileName".tok " }' >> gridtok
    chmod 700 gridtok
    ./gridtok

11 Conformity validation
Verify whether the site follows the experiment standards:

    #!/bin/bash
    echo Hostname `/bin/hostname`
    echo Start time: `/bin/date`
    echo
    local=`pwd`
    echo "Babar initialisation"
    . $VO_BABAR_SW_DIR/babar-grid-setup-env.sh
    echo
    echo "Environment variables"
    printenv
    echo
    cd $local
    echo Files available: $local
    ls
    echo
    echo " - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "
    echo
    cd $BFDIST/releases/14.5.2
    srtpath 14.5.2 Linux24RH72_i386_gcc2953
    cd $local
    BbkDatasetTcl --dbsite=local > MetaLista.txt
    cat MetaLista.txt | awk '// {print "BbkDatasetTcl --site local \""$1"\"";}' >> geratcl
    chmod 700 geratcl
    ./geratcl
    export CE_NAME=$1
    # Pass the value of CE_NAME to awk (not the literal string "CE_NAME").
    ls *.tcl | awk -v site="$CE_NAME" '// {split($1,a,"."); print "edg-rm --vo babar addAlias `cat " $1"` lfn:"a[1]"."site ;}' >> alimrls
    chmod 700 alimrls
    ./alimrls
    echo
    echo " - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - "
    echo
    echo End time: `/bin/date`

12 Analysis Submission to Grid
Single command: ./easygrid dataset_name
Performs handler management and submission.
Configurable to achieve the user's requirements.
Software based on a state machine:
– Verify that skimData is available. If not, run BbkDatasetTcl to generate skimData. Each file will be a job.
– Verify whether there are handlers pending.
  If not: generate the scripts (gera.c) with edg-job-submit and ClassAds, then execute them. Nest for submission policy and optimisation.
  If yes: verify the job status. When all the jobs have ended, recover the results into the user's folder. (Prototype)
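
The decision step of the state machine above can be sketched as follows. This is not the easygrid code: it assumes, purely for illustration, that skimData is represented by .tcl files and pending handlers by .tok files in one working directory, and the state names and function names are invented.

```shell
#!/bin/bash
# Sketch of the "what to do next" decision of a submission state machine.
# Assumptions (hypothetical): *.tcl files = skimData, *.tok files = pending handlers.

has_files() {
    # True if at least one existing file in directory $1 ends with suffix $2.
    for f in "$1"/*"$2"; do
        [ -e "$f" ] && return 0
    done
    return 1
}

next_state() {
    if ! has_files "$1" .tcl; then
        echo GENERATE_SKIMDATA     # no skimData yet: it has to be generated first
    elif ! has_files "$1" .tok; then
        echo SUBMIT_JOBS           # skimData present, nothing submitted yet
    else
        echo CHECK_STATUS          # handlers pending: poll the job status
    fi
}

# Walk an empty scratch directory through the three states.
work=$(mktemp -d)
next_state "$work"
touch "$work/mydataset-1.tcl"
next_state "$work"
touch "$work/mydataset-1.tok"
next_state "$work"
rm -rf "$work"
```

Deriving the state from the files on disk means the same single command can be re-run at any time and will pick up where the previous run stopped.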

13 Job Submission system, metadata and data

14 Metadata/Event files and Computer elements
For each dataset there is a metadata file containing the names of the event files. These physical files are registered with the RLS, with several logical file names in the format datasetname_CEJobQueue assigned to them as aliases, showing the CEs which contain copies of that dataset. Searching all the aliases for a dataset name provides a list of CEs to which jobs can be submitted.
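
The alias search can be illustrated on a plain list of aliases. The sketch below does not query the RLS: it assumes the aliases have already been listed into a text stream, one per line, in the datasetname_CEJobQueue format described above. The function name ces_for_dataset, the dataset names and the CE names are hypothetical.

```shell
#!/bin/bash
# Sketch: given aliases of the form "lfn:<dataset>_<CE job queue>" on stdin,
# print the CE job queues that hold a copy of dataset $1.
ces_for_dataset() {
    awk -v ds="$1" '
        { sub(/^lfn:/, "") }                          # drop the lfn: scheme if present
        index($0, ds "_") == 1 { print substr($0, length(ds) + 2) }
    '
}

# Hypothetical alias list: two sites hold mydataset-Run4, one holds mydataset-Run3.
printf '%s\n' \
    'lfn:mydataset-Run4_ce01.example.org:2119/jobmanager-pbs-babar' \
    'lfn:mydataset-Run4_ce02.example.net:2119/jobmanager-pbs-long' \
    'lfn:mydataset-Run3_ce01.example.org:2119/jobmanager-pbs-babar' |
    ces_for_dataset mydataset-Run4
```

Matching on the dataset name plus the underscore as a prefix, rather than splitting on the first underscore, keeps the sketch working when the dataset name itself contains underscores.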

15 Managing large files in Grid
The analysis executable is stored on the SE and its logical file name (LFN) is also catalogued in the RLS, so any WN needs to download it only once.
Metadata is used not only for data, but to support other files as well.

16 Gera
Generation of all the information necessary to submit the jobs on the Grid:
– Job Description Language (JDL) files
– the script with all the tasks necessary to run the analysis remotely at a WN
– some grid-dependent analysis parameters.
The JDL files define the input sandbox with all necessary files to be transferred.
The WN load-balancing algorithm matches requirements to perform the task optimally.
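
A JDL file of the kind described above could be generated along these lines. This is a sketch, not the output of gera: the function make_jdl, the script name run_analysis.sh and the file and CE names are hypothetical. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox, Requirements) are standard JDL attributes, but the set actually written by gera is not shown in the slides.

```shell
#!/bin/bash
# Sketch: write one JDL file for one metadata (.tcl) file.
# run_analysis.sh and all file/CE names are hypothetical.
make_jdl() {
    local tcl="$1"            # metadata file listing the event files of this job
    local ce="$2"             # CE job queue the job should be matched to
    local base="${tcl%.tcl}"  # file name without the .tcl extension
    cat > "$base.jdl" <<EOF
Executable    = "run_analysis.sh";
Arguments     = "$tcl";
StdOutput     = "$base.out";
StdError      = "$base.err";
InputSandbox  = {"run_analysis.sh", "$tcl"};
OutputSandbox = {"$base.out", "$base.err"};
Requirements  = other.GlueCEUniqueID == "$ce";
EOF
}

# Generate and show a JDL file in a scratch directory.
cd "$(mktemp -d)" || exit 1
make_jdl mydataset-1.tcl ce01.example.org:2119/jobmanager-pbs-babar
cat mydataset-1.jdl
```

The input sandbox carries the job script and the metadata file to the WN, and the Requirements line is where a CE found through the RLS aliases would be pinned.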

17 Running analysis programs
When the task is delivered to the WN, scripts start running to initialise the specific BaBar environment, and the analysis software is downloaded.

18 Benchmarks
Behavior of particles in the BaBar Electromagnetic Calorimeter (EMC)
The different behavior of electrons, hadrons, and muons can be distinguished. Performing this analysis takes 7 days using one computer 24 hours a day. Using 10 CPUs in parallel, accessed via the Grid, it took only 8 hours.

19 Pi+- N Pi0 decays, with N = 1, 2, 3 and 4
Invariant masses of pairs of gammas, as measured by the EMC, from Pi0 decay produce a mass peak at 135 MeV (the peak in the plot). All other combinations are spread randomly over all energies (background). There were 81,700,000 events in the dataset and it took 4 days to run in production, with 26 jobs in parallel; running it on a single computer would take more than 3 months.
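
The "more than 3 months" estimate follows from the quoted figures under a simple assumption, which the slide does not state: all 26 jobs ran for the full 4 days and a single computer is as fast as one grid job slot.

```shell
#!/bin/bash
# Back-of-envelope check of the figures quoted above.
# Assumption: every job ran the full 4 days; one computer = one job slot.
jobs=26
days=4
cpu_days=$((jobs * days))    # job-days if the jobs ran one after another
echo "sequential estimate: $cpu_days days"
# prints: sequential estimate: 104 days
```

At about 30 days per month, 104 days is roughly three and a half months, consistent with the slide.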

20 Summary
Easygrid is working and provides the whole job submission structure using the LCG grid, RLS and metadata management.
Provides handler management transparent to the user.
Easy to use!
Configurable to achieve the user's requirements, and maybe usable for other experiments as well.
See the homepage http://www.hep.man.ac.uk/u/jamwer/ for more details.
Thanks for the opportunity!

