A Grid Approach to Geographically Distributed Data Analysis for Virgo
F. Barone, M. de Rosa, R. De Rosa, R. Esposito, P. Mastroserio, L. Milano, F. Taurino, G. Tortone – INFN Napoli, Università di Napoli “Federico II”, Università di Salerno
L. Brocco, S. Frasca, C. Palomba, F. Ricci – INFN Roma1, Università di Roma “La Sapienza”
GWADW 2002 – Isola d’Elba (Italy) – May

Outline
- scientific goals and requirements
- basic concepts of GRID
- what the Grid offers
- layout of the VIRGO Virtual Organisation
- application to gravitational wave data analysis
- conclusions

Scientific goals and requirements
The coalescing binaries and periodic sources analyses need large computing power:
- ~ 300 Gflops for the coalescing binaries search
- ~ 1000 Gflops for the periodic sources search
Computational grids allow us to use computing resources available in different laboratories/institutions.

GRID: a definition
GRID: an infrastructure that allows the sharing and coordinated use of resources within large, dynamic and multi-institutional communities.

Basic resources of DataGrid Middleware
DataGrid is a European Community project (3 years) to develop Grid middleware and a testbed infrastructure on a European scale:
- need to execute a program: Computing Element (CE)
- need to access data: Storage Element (SE)
- need to move data: network

Computing Element (CE)
A Grid resource that provides CPU cycles. Examples: clusters of PCs, supercomputers, ...

Storage Element (SE)
A Grid resource that provides disk space to store files. Examples: a simple disk pool, a big Mass Storage System, ...
Data is accessible to all processes running on CEs via multiple protocols.

Grid resource
A Grid resource provides a standard interface (protocol and API) that is common to that type of resource:
- all CEs talk the same protocol (CE protocol), independently of the underlying batch system;
- all SEs talk the same protocol (SE protocol), independently of the underlying Mass Storage System.
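To illustrate the point (this is not EDG code), a minimal Python sketch of a common CE interface with site-specific back ends hidden behind it; all class and method names are invented for the example.

```python
# Illustrative sketch only (not EDG code): a common "CE protocol" that hides
# the underlying batch system.  Class and method names are invented here.
from abc import ABC, abstractmethod

class ComputingElement(ABC):
    """Every CE exposes the same operations, whatever batch system runs underneath."""
    @abstractmethod
    def submit(self, executable, arguments):
        """Submit a job and return a job identifier."""
    @abstractmethod
    def status(self, job_id):
        """Return the job status ('queued', 'running', 'done', ...)."""

class PBSComputingElement(ComputingElement):
    """Back end for a PBS-managed cluster (stubbed for the example)."""
    def submit(self, executable, arguments):
        return "pbs-0001"            # in reality: qsub a job script
    def status(self, job_id):
        return "running"             # in reality: parse qstat output

class LSFComputingElement(ComputingElement):
    """Back end for an LSF-managed cluster (stubbed for the example)."""
    def submit(self, executable, arguments):
        return "lsf-0001"            # in reality: bsub a job script
    def status(self, job_id):
        return "queued"              # in reality: parse bjobs output

# A client (e.g. the Resource Broker) only ever uses the common interface:
def run_anywhere(ce):
    job_id = ce.submit("inspiral_search", ["--band", "60-1000"])
    return ce.status(job_id)

print(run_anywhere(PBSComputingElement()))
print(run_anywhere(LSFComputingElement()))
```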

What the Grid offers
- independence from execution location: the user doesn’t want to know where a job will run (on which CE);
- independence from data location: the user doesn’t want to know where the data are (on which SE);
- security: authentication, authorization.

Independence from execution location

Workload Management System
- Resource Broker (RB): the Resource Broker tries to find a good match between the job requirements and preferences and the available resources, in particular CEs;
- Job Submission Service (JSS): the Job Submission Service then guarantees reliable job submission and monitoring.
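A purely illustrative matchmaking sketch in Python (not the actual Resource Broker code): the broker keeps only the CEs that satisfy the job requirements and ranks the survivors according to the job preferences. The CE names and attributes are invented.

```python
# Toy matchmaking sketch, not the Resource Broker implementation.
# CE attributes and the requirement/rank expressions are invented for the example.
computing_elements = [
    {"name": "ce.cnaf.infn.it",  "free_cpus": 12, "os": "linux"},
    {"name": "ce.na.infn.it",    "free_cpus": 4,  "os": "linux"},
    {"name": "ce.roma1.infn.it", "free_cpus": 0,  "os": "linux"},
]

def requirements(ce):
    # job requirements: a free CPU and the right operating system
    return ce["free_cpus"] > 0 and ce["os"] == "linux"

def rank(ce):
    # job preferences: prefer the CE with the most free CPUs
    return ce["free_cpus"]

candidates = [ce for ce in computing_elements if requirements(ce)]
best = max(candidates, key=rank)
print("job sent to", best["name"])   # -> ce.cnaf.infn.it
```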

Scheduling criteria
1. authorization information
2. data availability
3. job requirements
4. job preferences
5. accounting

Monitoring/Information System
The Resource Broker needs some information:
- which resources are available?
- what is their status?
The Resource Broker queries the Monitoring/Information System to locate producers (CE, SE, ...) and then obtains data directly from the producers.
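A toy sketch of this two-step lookup, with invented registry contents and producer data (the real DataGrid information service is not shown here): the MIS only tells the broker where the producers are; detailed status is then fetched from each producer directly.

```python
# Toy sketch of the two-step lookup described above (invented data structures,
# not the DataGrid information service API).
registry = {   # the Monitoring/Information System knows *who* publishes what
    "CE": ["ce.cnaf.infn.it", "ce.na.infn.it"],
    "SE": ["se.cnaf.infn.it"],
}

producer_data = {   # each producer answers detailed queries directly
    "ce.cnaf.infn.it": {"free_cpus": 12, "queue_length": 3},
    "ce.na.infn.it":   {"free_cpus": 4,  "queue_length": 0},
    "se.cnaf.infn.it": {"free_space_gb": 500},
}

def query(resource_type):
    """Locate producers via the MIS, then fetch the current status from each one."""
    return {p: producer_data[p] for p in registry[resource_type]}

print(query("CE"))
```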

[Diagram: status updates are “pushed” to the MIS; detailed data are obtained directly from the CE.]

Logging and bookkeeping
The LB service is a database of events concerning jobs and the other services of the Workload Management System (RB and JSS); it provides status information for jobs and is designed to be highly reliable and available.

Independence from data location

Replica Catalogue (RC)
With the Replica Catalogue the same file (master) can exist in multiple copies (replicas):
- LFN – Logical File Name: name for a set of replicas; example: lfn://virgo.org/virgofile-1.dat
- PFN – Physical File Name: location of a replica; example: pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat
It is up to the RB to translate an LFN into a PFN and to locate the SE “close” to a CE.
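A toy sketch of the LFN-to-PFN resolution, reusing the file names from the slide; the second replica, the CE names and the “close SE” table are invented for the example (not the EDG catalogue API).

```python
# Toy Replica Catalogue lookup (invented data; not the EDG catalogue API).
replica_catalogue = {
    "lfn://virgo.org/virgofile-1.dat": [
        "pfn://virgo-se.na.infn.it/virgo/virgofile-1.dat",
        "pfn://grid-se.cnaf.infn.it/virgo/virgofile-1.dat",   # invented second replica
    ],
}

close_se = {          # which SE is "close" to which CE (invented mapping)
    "ce.na.infn.it":   "virgo-se.na.infn.it",
    "ce.cnaf.infn.it": "grid-se.cnaf.infn.it",
}

def resolve(lfn, ce):
    """Pick, among the replicas of an LFN, the one on the SE close to the chosen CE."""
    preferred = close_se[ce]
    for pfn in replica_catalogue[lfn]:
        if preferred in pfn:
            return pfn
    return replica_catalogue[lfn][0]     # fall back to any replica

print(resolve("lfn://virgo.org/virgofile-1.dat", "ce.cnaf.infn.it"))
```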

GridFTP
GridFTP is an efficient data transfer protocol. Features:
- GSI security;
- multiple data channels for parallel transfers;
- partial file transfers;
- third-party (direct server-to-server) transfers;
- interrupted transfer recovery.
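As a hedged usage example, such a transfer can be driven from a script through the Globus Toolkit client globus-url-copy; the host names and paths below are invented, and “-p 4” asks for four parallel data streams (one of the features listed above).

```python
# Hedged example: driving a GridFTP transfer from Python via the Globus Toolkit
# client 'globus-url-copy'.  Host names and paths are invented; '-p 4' requests
# four parallel data streams; gsiftp -> gsiftp is a third-party transfer.
import subprocess

src = "gsiftp://virgo-se.na.infn.it/virgo/virgofile-1.dat"
dst = "gsiftp://grid-se.cnaf.infn.it/virgo/virgofile-1.dat"

subprocess.run(["globus-url-copy", "-p", "4", src, dst], check=True)
```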

[Plot: GridFTP tests between INFN Napoli (34 Mbit/s link) and CNAF Bologna (98 Mbit/s link) over the test period; the transfers saturate the lowest-bandwidth link, compared with the “standard FTP” average bandwidth.]

Grid Approach to Geographically Distributed Data Analysis for Virgo

Layout of VIRGO Virtual Organisation
[Diagram of the VIRGO Virtual Organisation over the GARR network: CNAF-Bologna hosts the Resource Broker, Information Index, Replica Catalogue, a Computing Element with three Worker Nodes and a Storage Element; INFN Roma1 hosts a Computing Element with two Worker Nodes and a User Interface; INFN Napoli hosts a Computing Element with two Worker Nodes and a User Interface; the E0 run data reside on a Storage Element.]

Job submission mechanism
[Diagram: jobs submitted from the User Interface go through the Resource Broker (which consults the Information Service) to a Computing Element, where PBS dispatches them to the Worker Nodes; data are staged from/to the Storage Element.]

Job submission mechanism
The general scheme for distributed computation is the following:
- multiple jobs are submitted from the Rome UI;
- the Resource Broker interrogates the Information Index and submits each job to an available WN; the input data file is staged from the SE onto the WN;
- the output is sent back to the UI or published on the SE;
- the Resource Broker automatically distributes the jobs among the nodes (according to specifications in the JDL file), unless we decide to tie a given job to a particular node;
- job scheduling at the node level is done via PBS.
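As an illustration only (not the actual files used for these tests), a Python sketch of how the per-job JDL could be generated and submitted from the UI; the executable name, sandbox contents and the dg-job-submit command are assumptions based on the EDG user interface of that period.

```python
# Sketch of multiple-job submission from the User Interface.  The JDL attribute
# values, file names and the submission command are assumptions for illustration,
# not the exact setup used in the tests described here.
import subprocess

JDL_TEMPLATE = """\
Executable    = "inspiral_search.sh";
Arguments     = "--subspace {index}";
StdOutput     = "job_{index}.out";
StdError      = "job_{index}.err";
InputSandbox  = {{"inspiral_search.sh"}};
OutputSandbox = {{"job_{index}.out", "job_{index}.err", "events_{index}.dat"}};
"""

N_JOBS = 200                       # one job per template subspace

for i in range(N_JOBS):
    jdl_file = f"job_{i}.jdl"
    with open(jdl_file, "w") as f:
        f.write(JDL_TEMPLATE.format(index=i))
    # On the EDG UI the job is handed to the Resource Broker with a submission
    # command such as dg-job-submit (assumed here):
    subprocess.run(["dg-job-submit", jdl_file], check=True)
```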

Grid tests for coalescing binaries search 1/2
Algorithm: standard matched filters; templates generated at PN order 2 with Taylor approximants.
Data: VIRGO E0 run
- start GPS time:
- data length: 600 s
Conditions:
- raw data resampled at 2 kHz
- lower frequency: 60 Hz
- upper frequency: 1 kHz
- search space: 2 – 10 solar masses
- minimal match: 0.97
- number of templates: ~ 40000
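To make the “standard matched filter” step concrete, a minimal numpy sketch of filtering one 600 s data stretch against one template over the 60 Hz – 1 kHz band; the Gaussian placeholder data, the flat noise spectrum and the toy “template” are assumptions, while the real search uses ~40000 2PN templates and the measured VIRGO noise spectrum.

```python
# Minimal matched-filter sketch (one template, one data stretch).  The white-noise
# PSD and the toy "template" are placeholders; only the sampling rate, data length
# and search band come from the test conditions above.
import numpy as np

fs = 2000.0                          # data resampled at 2 kHz
T = 600.0                            # 600 s of data
n = int(fs * T)
rng = np.random.default_rng(0)

data = rng.normal(size=n)            # placeholder for the conditioned strain data
template = rng.normal(size=n)        # placeholder for a 2PN inspiral template

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
psd = np.ones_like(freqs)            # placeholder one-sided noise PSD

data_f = np.fft.rfft(data)
tmpl_f = np.fft.rfft(template)

# restrict the integration to the 60 Hz - 1 kHz search band
band = (freqs >= 60.0) & (freqs <= 1000.0)
weight = np.where(band, 1.0 / psd, 0.0)

# noise-weighted correlation of data and template, evaluated for every possible
# arrival time at once via the inverse FFT
filter_out = np.fft.irfft(data_f * np.conj(tmpl_f) * weight, n)
print("peak of the (unnormalised) filter output:", np.max(np.abs(filter_out)))
```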

Grid tests for coalescing binaries search 2/2
Step 1: The data were extracted from the CNAF-Bologna Mass Storage System. The extraction process reads the VIRGO standard frame format, performs a simple resampling and publishes the selected data file on the Storage Element.
Step 2: The search was performed by dividing the template space into 200 subspaces and submitting, from the Napoli User Interface, a job for each template subspace. Each job reads the selected data file from the Storage Element (located at CNAF-Bologna) and runs on the Worker Nodes selected by the Resource Broker in the VIRGO VO. Finally, the output data of each job were retrieved from the Napoli User Interface.
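A small bookkeeping sketch of Step 2: splitting the template bank into 200 subspaces, one grid job per subspace. The even split shown here is an assumption; only the totals (~40000 templates, 200 jobs) come from the text above.

```python
# Bookkeeping sketch for Step 2: divide the template bank into 200 subspaces,
# one grid job per subspace.  The even split is an assumption for illustration.
N_TEMPLATES = 40000
N_JOBS = 200

per_job = N_TEMPLATES // N_JOBS          # 200 templates per job

subspaces = [
    range(j * per_job, (j + 1) * per_job)   # template indices handled by job j
    for j in range(N_JOBS)
]

# job j reads the selected data file from the SE at CNAF-Bologna, filters the data
# against the templates in subspaces[j], and its output is retrieved from the UI
print(len(subspaces), "jobs,", per_job, "templates each")
```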

Grid tests for periodic sources search
The analysis for the periodic sources search is based on a hierarchical approach in which coherent steps, based on FFTs, and incoherent ones, based on the Hough Transform, alternate. At each iteration a more refined analysis is done on the selected candidates. This procedure fits very well in a geographically distributed computational scheme: the whole problem can be divided into a number of independent smaller tasks, each performed by a given computational node. E.g. each node can analyze a frequency band and/or a portion of the sky. We have performed some preliminary tests to evaluate the DataGrid software with respect to our analysis problem. For the GRID tests we have used the code for the Hough Transform. The source spin-down is not taken into account. The input of the code is given by a “peak map” in the time-frequency plane.

Grid tests for periodic sources search 1/2
The tests consist of two phases:
1. production of input data on the SE;
2. distributed computation.
We start from the raw data of the engineering run E1 (~ 5 hours) and the steps are the following:
- channel extraction;
- decimation at 1 kHz;
- generation of periodograms by computing interlaced and windowed FFTs (T_FFT = s);
- peak selection (above two times the average noise).
The produced time-frequency peak map covers 20 Hz in frequency (from 480 to 500 Hz).
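A minimal sketch of how such a peak map can be built from the decimated data, assuming toy Gaussian data and a placeholder T_FFT (the actual value is not reproduced on the slide); only the 1 kHz sampling, the interlaced windowed FFTs, the two-times-average-noise threshold and the 480–500 Hz band come from the list above.

```python
# Sketch of the peak-map construction on toy data.  T_FFT below is a placeholder;
# sampling rate, interlacing, windowing, threshold and band follow the slide.
import numpy as np

fs = 1000.0                  # data decimated to 1 kHz
t_fft = 64.0                 # placeholder: the actual T_FFT is not reproduced here
n_fft = int(fs * t_fft)
rng = np.random.default_rng(0)
data = rng.normal(size=int(fs * 5 * 3600))   # ~5 hours of toy data (E1 run length)

window = np.hanning(n_fft)
step = n_fft // 2                            # interlaced (50% overlapping) FFTs

freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
band = (freqs >= 480.0) & (freqs <= 500.0)   # the 20 Hz band used in the test

peak_map = []                                # list of (segment index, frequency) peaks
for seg, start in enumerate(range(0, len(data) - n_fft, step)):
    spectrum = np.abs(np.fft.rfft(window * data[start:start + n_fft])) ** 2
    threshold = 2.0 * np.mean(spectrum[band])          # peaks above 2x the average noise
    for k in np.nonzero(band & (spectrum > threshold))[0]:
        peak_map.append((seg, freqs[k]))

print(len(peak_map), "peaks in the 480-500 Hz time-frequency map")
```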

Grid tests for periodic sources search 2/2  Each computing node processes a subset of the whole frequency band. Each job runs according to this scheme:  reads its initial reference frequency and the velocity vector direction;  migrates on a worker node;  takes from the SE the input data corresponding to the frequency band associated to that job;  calculates the current frequency band of interest, i.e the Doppler band;  calculates the Hough Transform;  iterates on the reference frequency until the full band has been processed.  The output of each job would be a set of candidates which will be followed in the next coherent phase.

Conclusions
- we have successfully verified that multiple jobs can be submitted and the output retrieved with a small overhead time;
- computational grids seem very suitable for performing the data analysis for coalescing binaries and periodic sources searches.
Future plans:
- testing MPI job submission for the coalescing binaries search (a feature provided in the next DataGrid release);
- testing the whole data analysis chain for the periodic sources search;
- first tests of network analysis among interferometers.