
INFSO-RI-508833 – Enabling Grids for E-sciencE – www.eu-egee.org
Planck Simulations: Status of the Application
C. Vuerli, G. Taffoni, A. Barisani, A. Zacchei, F. Pasian
INAF – Information Systems Unit and OA Trieste
Geneva, 1 March 2006

Outline
Description of the application and scientific goals
The Grid added value
Experiences and results in using the EGEE infrastructure
Future perspectives and issues for the use of Grid technology
Summing up the status

Description: The Planck Mission
Measure the cosmic microwave background (CMB)
–succeeds the COBE, Boomerang & WMAP missions
–aims at even higher resolution
Timeline
–launch August 2007
–start of observations 2008
–duration >1 year
Characteristics
–continuous data stream (TOD)
–large datasets (a TOD of ~7 TB for the whole LFI mission)
–changing calibration (parameter configuration)
–high-performance computing needed for data analysis

Description: COBE & Planck (image slide comparing COBE and Planck sky maps)

Description: Brief introduction
Goal: make possible N simulations of the whole Planck/LFI mission (~14 months), each time with different cosmological and instrumental parameters
Production of full-sky maps at frequencies from 30 to 857 GHz by means of two complete sky surveys
Sensitivity of a few μK per pixel of 0.3° in amplitude
22 channels for LFI, 48 for HFI
Data volume produced at the end of the mission: ~2 TB for LFI and ~15 TB for HFI
Computing requirements: ~100 Tflops for raw data reduction, foreground extraction and CMB map creation

Description: The Level-S
Purpose of the Level-S:
–ground checks (pre-launch phases);
–tuning of the DPC pipelines;
–control checks & corrections (operational phase).
The pipeline is chained but not parallel (43 executables and a few libraries)
Languages used are C/C++/Fortran/F90, with Shell/Perl for scripts
A typical application that benefits from distributed computing techniques
Porting of the Monte Carlo simulation code by Sam Leach
A Planck simulation is a set of 70 instances of the pipeline (22 for LFI and 48 for HFI)

Description: The Level-S (pipeline diagram)
CMB power spectrum (cmbfast) → CMB map (synfast) → mission simulation with foregrounds and beam patterns, instrumental noise and scanning strategy → TOD → data analysis → CMB maps
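To make the chained (non-parallel) structure of a single detector run concrete, here is a minimal shell sketch of how the stages above could be driven on a worker node. Only cmbfast and synfast are named in the slides; the remaining stage names, the single parameter file and the invocation style are hypothetical placeholders, not the actual Level-S interfaces.

```bash
#!/bin/sh
# Minimal sketch of one chained Single Detector Pipeline run.
# cmbfast and synfast appear in the slide; scan_mission and add_noise are
# hypothetical stand-ins for the remaining Level-S stages.
set -e                        # stop the chain as soon as one stage fails

PARFILE=$1                    # cosmological + instrumental parameters
WORKDIR=$(mktemp -d)          # scratch area on the worker node
cd "$WORKDIR"

cmbfast      "$PARFILE"       # CMB angular power spectrum
synfast      "$PARFILE"       # CMB sky map from the spectrum
scan_mission "$PARFILE"       # scanning strategy, beams, foregrounds
add_noise    "$PARFILE"       # instrumental noise -> final TOD

echo "TOD and maps written in $WORKDIR"
```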

Description: Our application in summary
Is it parallel? No, it runs concurrently (independent jobs).
Do we need MPI/parallelism? Yes, in a later phase for data analysis (16/32 CPUs within a site).
Do we produce data? Yes, data production is intensive and can exceed 1 PB.
How long does it run? From 6 h (short) up to 36 h (long).
Access to / exchange of data coming from other experiments (IVOA, MAGIC).

Grid added value: CPUs and data
CPU power:
–e-computing lab;
–production bursts;
–efficient CPU usage/sharing.
Data storing/sharing:
–distributed data for distributed users;
–replication and security;
–common interface to software and data.
Planck simulations are highly computing-demanding and produce a huge amount of data; such resources usually cannot be afforded by a single research institute, either in terms of computing power or of data storage space.

Grid added value: qualitative view
Native collaboration tool;
Common interface for the users;
Flexible environment;
New approach to data and S/W sharing;
Collaborative work for simulations and reduction:
–less time, less space, less frustration…
VObs view:
–sharing data over a shared environment;
Native authentication/authorization mechanism;
A federation of users within a VO fosters scientific collaboration;
Collaborative work between different applications.

Experiences and Results: First tests
First tests were performed on a workstation and aimed at identifying in detail the computational and storage needs of the simulation SW.

Per-radiometer run time and output volume (short / long simulation):
            LFI 30 GHz          LFI 44 GHz          LFI 70 GHz
            short    long       short    long       short    long
Time        12 m     389 m      13 m     623 m      17 m     834 m
Output      0.9 GB   34.2 GB    1.2 GB   45.3 GB    1.7 GB   75 GB

Totals for the whole LFI mission:
            long                short
Time        ~256 h              5.5 h
Output      1.3 TB              31 GB

Computational time on a dual-CPU 2.4 GHz workstation with 2 GB of RAM for the whole simulation of the LFI mission [4 radiometers at 30 GHz, 6 at 44 GHz and 12 at 70 GHz].
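As a cross-check of the table, the per-radiometer reading follows from the totals: with 4 radiometers at 30 GHz, 6 at 44 GHz and 12 at 70 GHz, the short case gives 4·12 + 6·13 + 12·17 = 330 min ≈ 5.5 h and 4·0.9 + 6·1.2 + 12·1.7 ≈ 31 GB, while the long case gives 4·389 + 6·623 + 12·834 ≈ 15 300 min ≈ 255 h and 4·34.2 + 6·45.3 + 12·75 ≈ 1.3 TB, in line with the workstation timings (330 m short, 15342 m long) quoted in the post-processing comparison later in the talk; the ~256 h total time above is reconstructed from that 15342 m figure.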

Experiences and Results: Grid environment
To allow users to run the Planck simulation SW we need to create some application-specific services on top of the general Grid environment.
They can be used to run either one or more SDPs (Single Detector Pipelines) or the whole MSJ (Mission Simulation Job).
They are modular and easy to integrate with new pipeline stages when an upgrade is needed (necessary if we want to allow users to develop new codes).

Experiences and Results: Grid environment
Step 1: deployment of the simulation code on the Grid as RPMs. In this first test, however, we used the Replica Manager to copy and register the SW.
Step 2: creation of an application-specific environment on top of the UI: a set of Perl scripts is available that allows a user to configure a pipeline and submit it to the Grid.
Step 3: implementation of a metadata description to identify the cosmological and instrumental parameters and to associate them with the GUID of a complex output file (TODs, maps, noise contributions, etc.) – important for post-processing analysis.
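As an illustration of what the step-2 layer does, here is a hedged shell sketch of configuring and submitting one SDP from the UI. The wrapper name run_sdp.sh, the file names and the VO name planck are assumptions; edg-job-submit and lcg-cr are the standard LCG-2 era commands for job submission and for copy-and-register onto a Storage Element.

```bash
#!/bin/sh
# Sketch: configure one Single Detector Pipeline (SDP) and submit it from the UI.
CHANNEL=$1                               # e.g. LFI_30GHz_rad1 (placeholder naming)
PARFILE=$2                               # cosmological/instrumental parameter file

cat > ${CHANNEL}.jdl <<EOF
Executable    = "run_sdp.sh";
Arguments     = "${CHANNEL}";
InputSandbox  = {"run_sdp.sh", "${PARFILE}"};
StdOutput     = "${CHANNEL}.out";
StdError      = "${CHANNEL}.err";
OutputSandbox = {"${CHANNEL}.out", "${CHANNEL}.err"};
EOF

edg-job-submit -o ${CHANNEL}.jobid ${CHANNEL}.jdl

# On the worker node, run_sdp.sh would end by registering the TOD on an SE, e.g.:
#   lcg-cr --vo planck -l lfn:/grid/planck/tod/${CHANNEL}.fits file://$PWD/tod.fits
```

The GUID printed by lcg-cr is what the step-3 metadata description would then associate with the parameter set used for the run.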

Experiences and Results: Grid environment (architecture diagram)

Experiences and Results: Simulations description
A number of test simulations of the MSJ were run with the same parameters used on the dual-CPU WS;
Initially we selected only the Grid sites equipped with Xeon WNs with a CPU speed of ~2400 MHz and with at least 1 free CPU (a JDL requirement sketched below);
Sets of MSJs were run with different degrees of parallelization;
Tests were repeated 30 times under different load conditions of the Grid, to verify the stability of both the submission tools and the Grid environment.
We noticed that the RB usually assigns each SDP to a different site, so the MSJ runs in a truly distributed environment. A few times, however, the jobs were assigned to the same site but to different WNs; as expected, no significant benefit or degradation in performance was noticed in those cases. A different Grid load also did not change the results significantly.
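In JDL terms the initial site selection could look like the fragment below; a sketch only, with the GLUE 1.x attribute names quoted from memory (clock speed in MHz, free CPUs as idle slots on the CE).

```bash
# JDL fragment for the initial site selection (sketch).
cat >> msj.jdl <<'EOF'
Requirements = ( other.GlueHostProcessorClockSpeed >= 2400 ) &&
               ( other.GlueCEStateFreeCPUs >= 1 );
Rank         = other.GlueCEStateFreeCPUs;
EOF
```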

Experiences and Results: Simulations description
A set of tests involving the whole computing power and data storage available to our VO (~5000 CPUs of different kinds).
Submission of 100 concurrent MSJs from different UIs, with the only requirement of finding enough free disk space to save the output (see the sketch below).
The tests span two years, starting from summer 2004, using different versions of the MW (2.2, 2.4 and 2.6) and within different VOs.
The whole test lasts ~3 days and was repeated several times under different load conditions of the Grid, with no significant change in the results.
Long simulations could require modifying CFITSIO to allow I/O directly on the Grid through GridFTP.
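A sketch of how such a burst could be launched from one UI. The template msj_template.jdl (with an @RUN@ placeholder and no Requirements line), the wrapper and the 100 GB threshold are assumptions, and the anyMatch() close-SE clause is the gLite JDL idiom for requiring space on a nearby Storage Element as I recall it, so treat it as an assumption as well (GlueSAStateAvailableSpace is expressed in kB).

```bash
#!/bin/sh
# Sketch: launch 100 concurrent Mission Simulation Jobs from one UI.
# msj_template.jdl holds the Executable/sandbox settings (as in the SDP
# example); @RUN@ selects the parameter set for each MSJ.
for i in $(seq 1 100); do
  sed "s/@RUN@/${i}/g" msj_template.jdl > msj_${i}.jdl
  cat >> msj_${i}.jdl <<'EOF'
Requirements = anyMatch(other.storage.CloseSEs,
                        target.GlueSAStateAvailableSpace > 100000000);
EOF
  edg-job-submit -o msj_jobids.txt msj_${i}.jdl
done
```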

Experiences and Results: Simulations workflow (1) (diagram)

Experiences and Results: Simulations workflow (2) (diagram)

Experiences and Results: Scalability on the Grid (plot)

Experiences and Results: Post-processing example
To verify the possibility of using the stored TODs for post-processing, we applied the destriping algorithm to the TODs produced during the short runs.
–Metadata are used to locate the files and to identify their GUIDs.
–The Planck configuration/submission tools are modified to create a JDL with the input-file option pointing to the TOD GUID (see the sketch below).
–The input-file option is used by the RB to force the job to run at the Grid site where the input data are stored; this optimizes the data transfer, which is restricted to the site LAN.
–The Grid "configurator" is modified to download any input data file specified in the input-file option before running the pipeline.
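In gLite JDL the input-file option can be expressed through the InputData and DataAccessProtocol attributes; a sketch of the JDL the modified tools might emit, where the GUID, the wrapper script and the file names are placeholders:

```bash
# Sketch of a data-driven destriping job; the GUID below is a placeholder
# retrieved from the metadata catalogue.
cat > destripe_30GHz.jdl <<'EOF'
Executable         = "run_destripe.sh";
InputSandbox       = {"run_destripe.sh"};
StdOutput          = "destripe.out";
StdError           = "destripe.err";
OutputSandbox      = {"destripe.out", "destripe.err"};
InputData          = {"guid:00000000-0000-0000-0000-000000000000"};
DataAccessProtocol = {"gsiftp"};
EOF
edg-job-submit destripe_30GHz.jdl
```

With InputData set, the RB only matches CEs close to an SE holding that GUID, which gives exactly the LAN-only data movement described above.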

Experiences and Results: Post-processing example
The "destriping" procedure runs from ~20 minutes for the 30 GHz channel up to ~40 minutes for the 70 GHz channel on a dual-AMD workstation.
On the Grid, the run time for a simulation set of 22 radiometers is ~55 minutes, a gain of a factor of ~10 in performance compared with the time required on the workstation.

Experiences and Results: Post-processing example
(Figure: post-processing workflow from the user node through the CE to worker nodes Node 1 … Node k, exchanging parameter files, maps and TODs; maps of the observed sky after de-striping and of the de-striping residuals.)

Dual-CPU 2.4 GHz WS with 2 GB of RAM vs. Grid:
           Workstation    Grid     Gain
short      330 m          25 m     13
long       15342 m        955 m    16

Future Perspectives and Issues: ProC and G-DSE
ProC is a scientific workflow engine developed in the framework of the IDIS (Integrated Data and Information System) collaboration.
–It executes "pipelines" of modules: workflows, i.e. directed acyclic graphs.
–It allows the assembly of pipelines from building-block modules; modules may be heterogeneous (FORTRAN, C, C++, Java, Python, GDL/IDL, ...) and may also be sub-pipelines.
–It is a data-driven, forward-chaining system.
–It has components for graphical editing of workflow layouts, checking for consistency & completeness, and execution.
The G-DSE makes databases a new embedded resource of the Grid: it enables a new Grid QE (Query Element) to access databases and use them for data analysis (see the presentation by G. Taffoni on Thu March 2nd, 2006, at 2:00 PM, "Data Access on the Grid" session).

Future Perspectives and Issues: Planck VO evolution
Users who joined the VO as members; a UI in each site; a Quantum-Grid in each site.

Regional Area       Current status       Future status
Italy               15 CPUs, 300 GB      1 TB (total); all INAF sites are now in the startup phase
Spain               30 CPUs, 240 GB      more
France              none                 6 CPUs, 360 GB (total)
UK                  none                 2 CPUs, 240 GB (total)
Germany             none                 2 CPUs, 240 GB (total)
The Netherlands     none                 2 CPUs, 240 GB (total)

The Planck VO may currently rely on ~5000 available CPUs.

Future Perspectives and Issues: Open issues
Technical and administrative management of the VO: OK
Basic gridification of the application: OK
Main issues met in 2005:
–slow startup process of the Planck VO:
 slow start-up of interactions between Planck VO site managers and national ROCs;
 some initial technical problems (e.g. VOMS);
 the management of the VO has proved more complex than we expected;
 heterogeneous VO;
–some problems with the WN environment;
–metadata = DSE (work in progress);
–Grid-FS complicated and not user-friendly;
–debugging.

Future Perspectives and Issues: Effects / corrective actions
…therefore:
–shared resources within the VO are still missing;
–the gridification of the Planck pipelines has still to be completed;
–up to now, extensive tests involving all nodes of the VO have not been possible.
Corrective actions:
–on-site (at VO sites) meetings and training events (involving the Planck and INAF VOs), addressed to site managers and users and scheduled for the next months, aiming at:
 a fast startup of VO nodes with new shared resources available;
 the gridification of new pipelines;
 extensive tests within the VO.
The future strictly depends on a number of factors: gLite (!?!), support (?), EGEE-2 (?)

Summing up status
VO setup: management; technical management; VO manager; site managers; RLS; VOMS; RB. Next: Planck user certificates; Planck sites setup; EGEE site support.
Application setup: basic gridification; first tests; INFN production Grid; benchmarks. Next: extended gridification; data & metadata (G-DSE!!!); ProC & DMC gridification; tests (runs; data).

End of Presentation. Thank you for your attention.