ARDA Prototypes Julia Andreeva/CERN On behalf of the ARDA team CERN.


1 ARDA Prototypes Julia Andreeva/CERN On behalf of the ARDA team CERN

2 LHCC Comprehensive Review 22.11.2004 Julia Andreeva, CERN
Overview
- Main directions of the ARDA activities
- Experience with the gLite middleware
- ARDA prototypes, status and plans
- Conclusions

3 ARDA and HEP experiments
ARDA is an LCG project whose main task is to enable LHC analysis on the GRID.
[Diagram labels: EGEE middleware (gLite); ARDA; LCG2]
- LHCb: Ganga, Dirac, Gaudi, DaVinci…
- Alice: ROOT, AliRoot, Proof…
- CMS: Cobra, Orca, OCTOPUS…
- Atlas: Dial, Ganga, Athena, Don Quijote

4 Middleware Prototype
- Available to us since May 18th
  - In the first month, many problems connected with the stability of the service and procedures
  - At that point just a few worker nodes available
  - Most important services are available: file catalog, authentication module, job queue, meta-data catalog, package manager, Grid access service
  - A second site (Madison) available since end of June
  - CASTOR access to the actual data store
- Currently 34 worker nodes are available at CERN
  - 10 nodes (RH7.3, PBS)
  - 20 nodes (low end, SLC, LSF)
  - 4 nodes (high end, SLC, LSF)
- 1 node is available in Wisconsin
- Number of CPUs will increase
- Number of sites will increase

5 Authentication and authorization
- gLite uses Globus 2.4 Grid certificates (X.509) to authenticate and authorize; the session is not encrypted
- VOMS is used for VO management
- Unfortunately, getting access to gLite as a new user is still often painful because of registration problems. It takes at least one day, but can take up to two weeks!

6 Accessing gLite
- Access through the gLite shell
  - User-friendly shell implemented in Perl
  - The shell provides a set of Unix-like commands and a set of gLite-specific commands
- Perl API
  - There is no API to compile against, but the Perl API is sufficient for tests, though it is poorly documented

7 Workload Management System
ARDA has been evaluating two WMSs:
- WMS derived from Alien: Task Queue (available since April)
  - pull model
  - integrated with gLite shell, file catalog and package manager
- WMS derived from EDG (available since middle of October)
  - currently push model (pull model not yet possible but foreseen)
  - not yet integrated with other gLite components (file catalogue, package manager, gLite shell)
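To make the pull model mentioned on this slide concrete, here is a minimal sketch in which worker nodes ask a central queue for a job they are able to run, instead of a broker pushing jobs to them. The class, method and field names (TaskQueue, pull, packages) are invented for illustration and are not the gLite or Alien interface.

```python
from collections import deque


class TaskQueue:
    """Toy central queue: jobs wait here until a suitable worker asks for one."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, job_id, packages):
        # 'packages' is the set of software packages the job needs (hypothetical field).
        self._jobs.append({"id": job_id, "packages": set(packages)})

    def pull(self, worker_packages):
        """Return the first queued job the calling worker can run, or None."""
        for job in self._jobs:
            if job["packages"] <= set(worker_packages):
                self._jobs.remove(job)
                return job
        return None


queue = TaskQueue()
queue.submit("job-1", ["ROOT"])
queue.submit("job-2", ["ORCA"])

# A worker that only offers ORCA skips job-1 and receives job-2.
first = queue.pull(["ORCA"])
# No remaining job matches ORCA, so the worker gets nothing this time.
second = queue.pull(["ORCA"])
# A worker offering both packages then picks up job-1.
third = queue.pull(["ROOT", "ORCA"])
```

In a push model the decision would sit on the other side: the broker would pick a worker for each job at submission time, which requires it to know the state of every worker in advance.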

8 WMS observations
- Integration of the WMS with the File Catalog is very useful
  - Definition of input and output data
  - Specification of input data by file name and metadata queries
  - Job splitting driven by data in the file catalog
- Integration of the WMS with Package Management is very useful
  - The service character of Package Management provides on-demand installation
- Full access to debugging information is very important
  - Stdout/stderr of executing jobs
  - System information
- Lightweight deployment of the client interface
  - The client has to be easy to install
  - The client should work behind firewalls and NAT routers
- Worker nodes should be shared between deployed WMSs
  - As long as several WMSs are deployed, their usage has to be transparent for the user (same JDL syntax, worker nodes should be accessible through both systems, they should provide the same functionality and need to be integrated with other gLite services)

9 Job submission
Steps required to submit a user job to gLite:
- Register the executable in the user bin directory
- Create a JDL file in which the executable, required packages, input and output files, and possibly some additional requirements are defined
- Run the submit command, providing the JDL file as input
Straightforward; we did not experience any problems (apart from system stability).
Advanced features for job submission tested by ARDA:
- Job splitting implemented by gLite is based on the gLite file catalogue LFN hierarchy
- This functionality is widely used in the ARDA prototypes
- Different job-splitting policies (at the file, directory or SE level) can be chosen by the user (file catalog snapshots from CMS and LHCb were used)
- An additional advantage is that a single master job ID is used to trace the processing of all sub-jobs belonging to the same master job. Output files of all sub-jobs are collected in the master job “proc” directory.
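The splitting policies and the JDL step described on this slide can be sketched roughly as follows. This is an illustration under assumptions, not the gLite implementation: the function names, the sample LFNs, the master job ID and the JDL attribute names (Executable, Packages, InputData, OutputFile) are chosen for the example, and the SE-level policy is left out because it would need replica locations.

```python
import posixpath
from collections import defaultdict


def split_by_policy(lfns, policy):
    """Group logical file names (LFNs) into the inputs of individual sub-jobs.

    "file": one sub-job per file.
    "directory": one sub-job per parent directory in the LFN hierarchy.
    """
    if policy == "file":
        return [[lfn] for lfn in sorted(lfns)]
    if policy == "directory":
        groups = defaultdict(list)
        for lfn in sorted(lfns):
            groups[posixpath.dirname(lfn)].append(lfn)
        return [groups[d] for d in sorted(groups)]
    raise ValueError(f"unknown split policy: {policy}")


def make_jdl(executable, packages, input_files, output_files):
    """Render a JDL-like text; the attribute names are illustrative assumptions."""
    def quoted(items):
        return "{" + ", ".join(f'"{item}"' for item in items) + "}"

    return "\n".join([
        f'Executable = "{executable}";',
        f"Packages = {quoted(packages)};",
        f"InputData = {quoted(input_files)};",
        f"OutputFile = {quoted(output_files)};",
    ])


lfns = [
    "/cms/run1/a.root",
    "/cms/run1/b.root",
    "/cms/run2/c.root",
]
master_job_id = "master-001"  # hypothetical ID; all sub-jobs are traced through it
sub_jobs = {
    f"{master_job_id}/{index}": make_jdl("analysis.sh", ["ORCA"], group, ["out.root"])
    for index, group in enumerate(split_by_policy(lfns, "directory"))
}
```

With the "directory" policy the three sample files yield two sub-jobs, one per run directory, both keyed under the same master job ID.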

10 Job Submission: Stability
- Job queues monitored at CERN every hour: 80% success rate (the jobs don't do anything real)
- In recent weeks general instability was observed; testbed support cannot be the responsibility of almost a single person

11 Data Management
ARDA has been evaluating two DMSs:
- gLite File Catalog (derived from Alien, deployed in April)
  - Allowed access to experiment data from CERN CASTOR and, with low efficiency, from the Wisconsin installation
  - Mainly using RFIO
  - The LFN name space is organized as a very intuitive hierarchical structure
  - MySQL backend
- Fireman File Catalogue (deployed in November)
  - Just delivered to us
  - gliteIO
  - Oracle backend

12 File catalogue performance tests
- Good performance due to streaming
- Finding 2500 matching entries in a 10000-entry directory: 80 concurrent queries, 0.35 s/query, 2.6 s startup time
- Fireman performance tests are currently ongoing
- The gLite catalogue performs well

13 gLiteIO
- We started to study gLiteIO as soon as it became available to us
  - ARDA contributed to gLiteIO development (support of AIOD integration)
- Some requirements:
  - gLiteIO has to be rock solid!
  - High performance!
  - Graceful error recovery!
  - No data corruption even under high load and high concurrency!
- Tests are currently ongoing

14 Package management
Multiple approaches exist for handling the experiment software and user private packages on the Grid. Two extremes:
- “Static”: The experiment software is pre-installed by a site manager, who then publishes the installed software. The installation resides “forever” in the shared area and can be removed only by a site manager. A job can run only on a site where the required package is pre-installed.
- “Dynamic”: Installation is done on demand at the worker node before the job assigned to that node starts execution. The installation can be removed as soon as job execution is over.
The current gLite package management implementation can handle “light-weight” installations, close to the second approach. The gLite package manager was tested by the ARDA team for this kind of installation. Clearly, more work has to be done to satisfy different use cases.
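A rough sketch of the “dynamic” approach described on this slide: the package is installed into a scratch area just before the job runs and removed when the job is over. Everything here is an assumption made for illustration (the helper names, the package name, and the use of an in-memory file map in place of a downloaded archive); it does not show how the gLite package manager works internally.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def installed_on_demand(package_name, files):
    """Install a package into a scratch area for one job, then remove it.

    'files' maps relative file names to their text content; a real package
    manager would fetch and unpack an archive instead (assumption).
    """
    install_dir = Path(tempfile.mkdtemp(prefix=f"{package_name}-"))
    try:
        for name, content in files.items():
            target = install_dir / name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        yield install_dir
    finally:
        # Dynamic approach: nothing stays behind once the job is over.
        shutil.rmtree(install_dir, ignore_errors=True)


def run_job(package_name, files):
    with installed_on_demand(package_name, files) as install_dir:
        # The "job" here only lists what was installed for it.
        seen = sorted(p.name for p in install_dir.rglob("*") if p.is_file())
    return seen, install_dir.exists()


seen_files, still_installed = run_job("userpkg", {"bin/run.sh": "echo hello\n"})
```

The “static” approach would correspond to skipping the cleanup and keeping the installation in a shared area that only a site manager can modify.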

15 gLite related ARDA activities: Metadata
- Modern file systems have metadata attached to the file/directory
- gLite has provided a prototype interface and implementation, mainly for the Biomed community
- The gLite file catalog has some metadata functionality (tested by ARDA)
  - Information containing file properties (file metadata attributes) can be defined in a tag attached to a directory in the file catalog. Any number of tag tables can be attached to the corresponding directory table.
  - Access to the metadata attributes is via the gLite shell or the Perl API
  - Knowledge of the schema is required
  - No schema evolution
- Can these limitations be overcome?
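The idea of a tag table attached to a directory table can be sketched with an in-memory SQLite database. The table names, column names and sample file names below are hypothetical and only illustrate why the user has to know the schema in order to query by metadata; they do not reproduce the gLite catalogue layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE directory_entries (      -- one row per file in a catalogue directory
        lfn TEXT PRIMARY KEY
    );
    CREATE TABLE tag_run_info (           -- hypothetical tag table attached to that directory
        lfn TEXT REFERENCES directory_entries(lfn),
        run_number INTEGER,
        detector TEXT
    );
""")

files = [
    ("/lhcb/data/f1.root", 100, "velo"),
    ("/lhcb/data/f2.root", 100, "rich"),
    ("/lhcb/data/f3.root", 101, "velo"),
]
conn.executemany("INSERT INTO directory_entries VALUES (?)", [(f[0],) for f in files])
conn.executemany("INSERT INTO tag_run_info VALUES (?, ?, ?)", files)


def find_files(run_number, detector):
    """Query by metadata; the caller must already know the column names (the schema)."""
    rows = conn.execute(
        "SELECT d.lfn FROM directory_entries d "
        "JOIN tag_run_info t ON t.lfn = d.lfn "
        "WHERE t.run_number = ? AND t.detector = ? ORDER BY d.lfn",
        (run_number, detector),
    )
    return [row[0] for row in rows]


matches = find_files(100, "velo")
```

Because the attribute names are fixed columns, adding a new attribute means altering the table, which is one way to read the "no schema evolution" limitation.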

16 gLite related ARDA activities: Metadata studies
- ARDA preparatory work
  - Stress testing of the existing experiment metadata catalogues was performed
  - The existing implementations turned out to share similar problems
- ARDA technology investigation
  - On the other hand, the usage of extended file attributes in modern systems (NTFS, NFS, EXT2/3 SCL3, ReiserFS, JFS, XFS) was analyzed: a sound POSIX standard exists!
  - Presentation in LCG-GAG and discussion with gLite
  - As a result of the metadata studies, a prototype metadata catalogue was developed

17 Metadata prototype performance tests
Comparing the performance of the metadata catalogue prototype and the gLite catalogue. Tested operations:
- querying the catalogue by meta attributes
- attaching meta attributes to the files

18 Prototypes overview
For each LHC experiment: main focus; basic prototype component; experiment analysis application framework; middleware prototype.
- LHCb: GUI to Grid; GANGA; DaVinci; gLite
- ALICE: Interactive analysis; PROOF, ROOT; AliROOT; gLite
- ATLAS: High level service; DIAL; Athena; gLite
- CMS: Use of maximum native gLite functionality; aligned with the APROM activity; ORCA; gLite

19 LHCb
Basic component of the prototype defined by the experiment: GANGA (Gaudi/Athena aNd Grid Alliance), a framework for creating, submitting and monitoring jobs.
[Diagram labels: GANGA GUI; JobOptions; Algorithms; GAUDI Program; Collective & Resource Grid Services; Experiment Book-keeping DB; Submitting jobs; Monitoring; Retrieving results]
ARDA contributions:
- GANGA: release management and software process (CVS, Savannah, …)
- GANGA: participating in the development driven by the GANGA team
- GANGA-gLite: integrating GANGA with gLite
  - Enabling job submission through GANGA to gLite
  - Job splitting and merging
  - Retrieving results
- GANGA-gLite-DaVinci: enabling real analysis jobs (DaVinci) to run on gLite using the GANGA framework
  - Running DaVinci jobs on gLite
  - Installing and managing LHCb software on gLite using the gLite package manager

20 LHCb
Current status:
- A GANGA job submission handler for gLite has been developed
- DaVinci jobs submitted through GANGA are running on gLite
- Submission of user jobs is working
- A command line interface (CLI) prototype for GANGA has been developed
- Jobs can be submitted using the gLite job-splitter
- A demonstration of the LHCb end-to-end analysis prototype was given at the 19th LHCb Software Week (two weeks ago)

21 LHCb
Related activities:
- GANGA-DIRAC (LHCb production system)
  - Convergence with GANGA/components/experience
  - Submitting jobs to DIRAC using GANGA
- GANGA-Condor
  - Enabling submission of jobs through GANGA to Condor
- Metadata catalog (Bookkeeping)
  - Performance tests
  - Collaboration going on
  - Interest in our prototype
Short-term plans:
- Involve a limited number of people from the LHCb physics community in testing, to get feedback from the user side
  - One person (PhD student) already involved
- Integrate LHCb software releases with the gLite package manager

22 ALICE
ALICE/ARDA is evolving the ALICE analysis system.
Basic components of the prototype defined by the experiment: ROOT and PROOF
Analysis approach:
- The ALICE experiment provides the UI and the analysis application (AliROOT)
- The GRID middleware gLite provides all the rest
[Diagram labels: USER SESSION; PROOF MASTER SERVER; PROOF SLAVES; Site A, Site B, Site C]

23 ALICE
The interactive analysis session was presented at Super Computing 2004

24 gLite related activities: C/C++ API
- The lack of a C/C++ API is a problem for the experiment prototypes: a C++ access library for gLite and a C library for POSIX-like IO are developed by ARDA
- Idea: create an interface that sends text commands to a server:
  - UUEncode strings
  - Send strings via gSOAP
  - Authentication via GSI (Globus TK3)
  - Encrypt with SSL (cache credentials at the service level to provide a stateful authenticated channel)
- High performance increase compared to SOAP calls with structures (multithreaded server with cached authentication)
- Protocol quite proprietary...
- Essential for the ALICE prototype (but generic enough to be interesting for anybody)
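The "text commands as UUencoded strings" idea can be sketched as below. Only the encoding and dispatch steps are shown; the transport (gSOAP), the GSI authentication and the SSL channel are left out entirely. The function names, the command table and the sample path are hypothetical, and the standard-library `binascii` module stands in for whatever encoder the ARDA library uses.

```python
import binascii

CHUNK = 45  # binascii.b2a_uu accepts at most 45 bytes per call


def encode_command(command):
    """UUencode a text command into a list of ASCII lines."""
    data = command.encode("utf-8")
    return [
        binascii.b2a_uu(data[i:i + CHUNK]).decode("ascii")
        for i in range(0, len(data), CHUNK)
    ]


def decode_command(lines):
    """Reverse encode_command on the receiving side."""
    return b"".join(binascii.a2b_uu(line.encode("ascii")) for line in lines).decode("utf-8")


def handle_request(lines, handlers):
    """Decode a command, dispatch on its first word, and encode the text reply."""
    name, *args = decode_command(lines).split()
    reply = handlers[name](*args)
    return encode_command(reply)


# Hypothetical command table; a real service would talk to the Grid catalogue.
handlers = {"ls": lambda path: f"listing of {path}"}

request = encode_command("ls /alice/sim/2004")
response = decode_command(handle_request(request, handlers))
```

Since both requests and replies are plain encoded strings, the server interface stays the same whichever Grid service sits behind a command, which is consistent with the slide's remark that the approach is generic.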

25 ALICE
Current status:
- Developed the gLite C++ API and API Service (providing a generic interface to any GRID service)
- The C++ API is integrated into ROOT (it will be added to the next ROOT release). As a result, job submission and job status queries for batch analysis can be done from inside ROOT.
- A Bash interface for gLite commands with catalogue expansion has been developed
- The first version of the interactive analysis prototype is ready
- The batch analysis model is improved:
  - submission and status query are integrated into ROOT
  - job splitting is based on XML query files
  - the application (Aliroot) reads files using xrootd without prestaging
Short-term plans:
- Create a generic API service accessible to all Alice users for batch analysis using the bash CLI for the Alice data challenge phase III
- Make the interactive prototype available to Alice users
- Create default XML datasets and default JDLs for analysis

26 ATLAS
Basic component of the prototype: DIAL (Distributed Analysis of Large datasets)
ARDA contributions:
- Integrating DIAL with gLite (main strategic line in the ATLAS distributed analysis)
- Enabling Atlas analysis jobs (Athena application) submitted through DIAL to run on gLite
- Integrating gLite with Atlas data management based on Don Quijote
- Tests on AMI (metadata catalogue)
- Contribution to the combined test beam
- Improvements to AtCom, the GUI for job definition (AMI), submission and monitoring

27 ATLAS
[Diagram labels: Don Quijote and gLite; DQ Client; DQ server; RLS; SE; Nordugrid; gLite; LCG; GRID3]
Current status:
- The DIAL server has been adapted to the CERN environment and installed at CERN
- A first implementation of the gLite scheduler for DIAL is available
- Still depending on a shared file system for inter-job communication
- ATHENA jobs submitted through DIAL are run on gLite middleware
- Integration of gLite with Atlas file management based on Don Quijote is in progress; a first prototype is ready
- Realistic ATHENA jobs executed on the gLite prototype by non-ARDA users (physicists). See next transparency.
Future plans:
- Evolve the ATLAS prototype to work directly with gLite middleware: authentication and seamless data access

28 ATLAS Combined Test Beam
Example: ATLAS TRT data analysis done by PNPI St Petersburg
- Real data processed on gLite
- Standard RecExTB
- Data from CASTOR
- Processed on a gLite worker node
[Plot: number of straw hits per layer]

29 CMS
- Ongoing development of the first end-to-end prototype for enabling CMS analysis jobs on gLite
- The main strategy is to use as much native middleware functionality as gLite can provide, and to develop something on top of the existing middleware only for very CMS-specific tasks
[Diagram: workflow planner with gLite back-end and command line UI, connected to RefDB, PubDB and gLite]
- Input: dataset and owner name defining a CMS data collection
- RefDB: points to the corresponding PubDB where the POOL catalog for a given data collection is published
- PubDB: POOL catalog and a set of COBRA META files
- Workflow planner: registers the required info in the gLite catalog; creates and submits jobs to gLite; queries their status; retrieves output

30 CMS
Using MonAlisa for user job monitoring. Demonstrated at Super Computing 2004

31 CMS
PhySh (Physicists’ Shell) should provide an entry point for CMS analysis and the handling of physics data. The idea behind PhySh is to combine information from different CMS DBs (RefDB, PubDB, SCRAM, PHEDEX) into a single virtual file system and to provide a file-handling-like interface to the user.
Related activities:
- Develop a job submission service for PhySh
- Integrate PhySh with gLite middleware components such as the file and metadata catalogues
Data management is vital for CMS:
- The evolution of PubDB from the experience of RefDB is of high interest for ARDA because it provides effective access to the data not only for a production system but also for individual users
- Participating in the development of PubDB (Publication DB), distributed databases for publishing information about the data collections available CMS-wide
- Participating in the redesign of RefDB (Reference DB), the CMS metadata catalog and production book-keeping database
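The "single virtual file system" idea behind PhySh can be illustrated with a toy namespace that mounts several back-ends under path prefixes and offers `ls` and `cat` style calls. This is only a sketch under assumptions: the class, its methods and the sample records are invented, and plain dictionaries stand in for the real RefDB and PubDB, whose schemas are not shown in the slides.

```python
class VirtualFileSystem:
    """Toy single namespace over several back-ends, addressed by path prefix."""

    def __init__(self):
        self._mounts = {}

    def mount(self, prefix, backend):
        # 'backend' is any mapping from dataset name to a record (stand-in for a DB).
        self._mounts[prefix.strip("/")] = backend

    def ls(self, path="/"):
        parts = [p for p in path.split("/") if p]
        if not parts:
            return sorted(self._mounts)
        return sorted(self._mounts[parts[0]])

    def cat(self, path):
        prefix, name = [p for p in path.split("/") if p]
        return self._mounts[prefix][name]


fs = VirtualFileSystem()
# Invented sample records for illustration only.
fs.mount("/refdb", {"dataset_a": "owner=owner_x"})
fs.mount("/pubdb", {"dataset_a": "catalog=pool_catalog.xml"})

top_level = fs.ls("/")
refdb_entries = fs.ls("/refdb")
record = fs.cat("/pubdb/dataset_a")
```

The user sees one tree and familiar file-like operations, while each prefix is answered by a different source of information.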

32 CMS
Current status:
- ORCA analysis jobs (real user code), generated by the CMS end-to-end prototype using gLite job-splitting functionality and instrumented for MonAlisa monitoring, ran successfully on the gLite testbed
- Work is under way to enable merging of the output files produced by the child sub-jobs belonging to the same parent master job
Future plans:
- Give a demonstration of the first working version of the CMS prototype at the next CMS week at the beginning of December
- Involve a limited number of CMS users in testing the first version of the prototype
- Use the new version of the gLite package manager (as soon as it is available) for handling the “heavy” CMS software distributions on gLite
- Depending on the CMS decision, either evolve this prototype according to user feedback, or integrate it with the tool(s) that CMS chooses for the ARDA prototype

33 Conclusions and outlook
- ARDA uses all components made available on the gLite prototype
  - Experience and feedback
- First versions of the analysis systems are being demonstrated
  - We look forward to having users!

34 BACKUP Transparencies: Metadata catalogue prototype

35 BACKUP Transparencies: ATLAS
Basic component of the prototype: DIAL (Distributed Analysis of Large datasets)
[Diagram labels: User analysis framework (ROOT, JAS, SEAL); Dataset (event data, summary data, tuples); Dataset1, Dataset2; Application (Athena, dialpaw, ROOT); Code; Task; Scheduler (does splitting, collects results); Job1, Job2; Result1, Result2; Result]

