CMS Applications: Towards Requirements for Data Processing and Analysis on the Open Science Grid
Greg Graham, FNAL CD/CMS
for OSG Deployment, 16-Dec-2004
CMS Applications - General Requirements
Access to large datasets at a few “central” sites
Access to small datasets at many distributed sites
Ability to move large datasets between sites
Ability to create jobs to run against these datasets
Ability to submit jobs and track progress
Ability to control/restrict access to sites/resources
Ability to look up information about datasets and jobs
Specific Application Examples
CMS Distributed Processing Environment (DPE)
  – VDT/Grid2003 based software package that provides CMS-specific software on top of Grid software.
Monte Carlo Production
  – MCRunjob (CMS tool) to create jobs, MOP (PPDG) to submit jobs using Condor-G, ConfMon to provide site parameters for MOP.
Large Scale Data Transfer
  – srmcp to transfer results of production from one site (transient) to another (permanent)
  – PhEDEx to transfer data with metadata (GridFTP)
Monte Carlo Processing Service
A Clarens based system for generating, processing, and analyzing Monte Carlo data.
  – Runjob, MOP, DAR software repository, and MOPDb deployed behind Clarens Web services
  – SC2004 demo: point and click MC generation and analysis (Root tuples also served by Clarens)
Currently deployed on top of DPE; it requires Clarens.
Status: Deployed now; need to groom CMS users
  – Needed: a parameter service to accept and store arbitrary job configuration parameters, and a context service
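The parameter service asked for on this slide can be pictured as a keyed store of job configuration dictionaries. The following is only a minimal sketch under that assumption: the class name `ParameterService`, its `put`/`get` methods, and the example parameter names are hypothetical and are not part of MCPS, Runjob, or Clarens.

```python
import copy
import uuid


class ParameterService:
    """Hypothetical in-memory store for arbitrary job configuration parameters."""

    def __init__(self):
        self._store = {}

    def put(self, params):
        """Store a private copy of the parameters and return a lookup handle."""
        handle = uuid.uuid4().hex
        self._store[handle] = copy.deepcopy(params)
        return handle

    def get(self, handle):
        """Return a copy of the parameters stored under the given handle."""
        return copy.deepcopy(self._store[handle])


# Example use with made-up parameter names.
service = ParameterService()
handle = service.put({"dataset": "example_dataset", "events": 1000})
print(service.get(handle))
```

Copying on both `put` and `get` keeps a stored configuration from changing after a job has been given its handle; a real service would also need persistence and access control, which this sketch leaves out.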
CMS History with Grid
Using Condor-G/Globus based technology to do real CMS MC production since
  – Shook out bugs and performance issues, used MOP
Using Grid2003 technology to do real production since
  – Stakeholder in security (SAZ), registration (VOMRS), and data transfer protocols
We plan to migrate to an OSG product based on the current Grid2003
  – It must meet requirements, and we are working to discover those
  – In the meantime, we assume it will work as it currently does for DPE running on top of the Grid2003 cache
Current CMS Deployment Activities for OSG
Within the DPE scope:
  – Moving to VDT to be consistent with Grid3-dev
  – Testing latest versions of SRM
      We are able to run MOP production with older versions of srm. Craig Prescott is investigating later versions with Timur. 12/13/04 OSGD milestone.
  – MCPS rollout on OSG
      3/1/05 OSGD milestone
  – Testing Condor-C and providing feedback to the Condor team; also testing VDT 1.3
      No milestone listed in the OSG deployment doc
      Keeping up will help us be ready for OSG “turn on”
Conclusion
The requirements for CMS applications running on OSG can be gleaned from the current requirements for running on Grid2003.
The requirements laid out here should be made concrete in two documents
  – CMS Requirements for OSG Deployment
      To track the current requirements
  – Impact of OSG Deployment on CMS Software
      To track evolution of the requirements
CMS has a lot of experience running on the Grid
  – Procedures are in place to deal with an evolving middleware environment.
Summary of Known Requirements for OSG Deployment from CMS
Infrastructure: MC Production
Support for MOP style job submission
  – Condor-G/Globus from VDT 1.14 or better
      But we are exploring use of Condor-C
  – Information service
      ConfMon: an MDS based hack of the Glue Schema that tells MOP where to find software remotely and where to deposit output files.
      But we are (hopefully) moving to GridCat
  – Space to drop in CMS application software and hold the output temporarily
  – Servers to move the data off of the remote site.
      srmcp is preferred; GridFTP is the default right now.
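To make the link between site information and job submission concrete, the sketch below renders a Condor-G submit description from a small dictionary of site parameters. It is an illustration only, not MOP's actual code. The dictionary keys (`gatekeeper`, `software_dir`, `output_dir`), the host name, and the wrapper script name are hypothetical stand-ins for what an information service such as ConfMon would publish. The `universe = globus` and `globusscheduler` commands are the submit-file syntax of Condor releases from this period; later releases use the grid universe with `grid_resource` instead.

```python
def condor_g_submit(site, executable):
    """Render a Condor-G submit description for one job at one site.

    `site` is a hypothetical dict of site parameters; the software and
    output directories are simply passed to the job as arguments.
    """
    lines = [
        "universe = globus",
        f"globusscheduler = {site['gatekeeper']}",
        f"executable = {executable}",
        f"arguments = {site['software_dir']} {site['output_dir']}",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        "queue",
    ]
    return "\n".join(lines) + "\n"


# Made-up site record for illustration.
example_site = {
    "gatekeeper": "gatekeeper.example.org/jobmanager-condor",
    "software_dir": "/grid/app/cms",
    "output_dir": "/grid/data/cms",
}
print(condor_g_submit(example_site, "run_production.sh"))
```

Keeping the site-specific values in one record means the same job template can be pointed at any site that publishes those three parameters.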
Infrastructure: Data Access
The requirements are less well known at the moment
Directory based lookup of data products
  – Since this is CMS based data, we would expect CMS clients to be used for lookup. Are there any common lookup operations? If so, CLARENS may be required on the client side.
Data movement from “large” central sites to/from “small” sites
  – srmcp and GridFTP clients are required.
Data movement between all sites and “push” from large sites
  – srmcp and GridFTP servers are required.
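One way to read the client and server requirements on this slide: a transfer between two sites can use whichever protocol both ends support, trying srmcp first and falling back to GridFTP. The sketch below assumes each site advertises a URL prefix per protocol. That dictionary layout, the host names, and the function name are hypothetical, and only the source and destination URLs are passed to each tool, with no extra options.

```python
# Protocols in order of preference, paired with the client tool for each.
PREFERENCE = [("srm", "srmcp"), ("gsiftp", "globus-url-copy")]


def transfer_command(source, dest, filename):
    """Build the argument list for copying `filename` from `source` to `dest`.

    `source` and `dest` are hypothetical dicts mapping a protocol name
    ("srm" or "gsiftp") to the URL prefix that site exposes for it.
    """
    for protocol, tool in PREFERENCE:
        if protocol in source and protocol in dest:
            return [
                tool,
                f"{source[protocol]}/{filename}",
                f"{dest[protocol]}/{filename}",
            ]
    raise ValueError("no transfer protocol common to both sites")


# Made-up endpoints: the small site offers only GridFTP.
large_site = {
    "srm": "srm://se.large.example.org:8443/cms",
    "gsiftp": "gsiftp://se.large.example.org/cms",
}
small_site = {"gsiftp": "gsiftp://se.small.example.org/cms"}
print(transfer_command(large_site, small_site, "events.root"))
```

The returned list could be handed to `subprocess.run` on a host where the chosen client is installed and a valid grid proxy exists.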
Infrastructure: Common
Security:
  – Middleware needs to have strong authentication
      Kerberos tickets or equivalent VO authentication
  – Middleware needs to support the callouts or other mechanisms used by the SAZ database and GUMS
      We are now dependent upon gridmapfiles, but I am not sure if this is required
  – Participating sites need to support the interfaces and provide information needed by VOMRS.
      Required to submit jobs to Fermilab, maybe not required to accept jobs from Fermilab ;-)
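For readers unfamiliar with the gridmapfile dependency mentioned on this slide: a grid-mapfile maps each certificate distinguished name (DN), written in double quotes, to one or more local account names. The sketch below parses that format. The function name and the sample entries are made up for illustration, and it is not how SAZ or GUMS make their authorization decisions.

```python
import shlex


def parse_gridmap(text):
    """Parse grid-mapfile text into a dict of DN -> list of local accounts.

    Each entry is expected to look like:  "<certificate DN>" account[,account...]
    Blank lines, comment lines, and lines that do not have exactly two
    fields are skipped.
    """
    mapping = {}
    for line in text.splitlines():
        # shlex handles the quoted DN (which may contain spaces) and
        # drops anything after an unquoted '#'.
        fields = shlex.split(line, comments=True)
        if len(fields) != 2:
            continue
        dn, accounts = fields
        mapping[dn] = accounts.split(",")
    return mapping


# Made-up entries for illustration.
sample = '''
# CMS production users
"/DC=org/DC=example/OU=People/CN=Jane Doe 12345" uscms01
"/DC=org/DC=example/OU=People/CN=John Roe 67890" uscms02,cmsprod
'''
print(parse_gridmap(sample))
```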
Infrastructure: Common
Information services
  – Real-time information about running jobs and resource usage
  – Historical information and accounting (soft requirement)
  – Remote viewing of selected logfiles would also be useful (soft requirement - satisfied by operations staff?)
Catalog services
  – CMS will initially come in with its own file and metadata catalogs. In the future we may rely on Globus RLS for file replicas. It is an open question whether common cataloging services would be useful.
Service Discovery
  – CMS will initially come in with its own service discovery method (i.e., the null one ;-). In the future, we may rely on CLARENS based services.
LCG Interoperability
We currently have a job creation and submission tool that can submit to either LCG or Grid2003 resources.
Interoperability at a lower level may also be required to satisfy simultaneously the needs of the CMS collaboration and the institutional needs of Fermilab.
  – This is currently under development and we are very interested in the results.