Ideas for a virtual analysis facility
Stefano Bagnasco, INFN Torino
CAF & PROOF Workshop, CERN, Nov 29-30, 2007


Caveat!
- This is just an idea we're starting to work on in Torino
- We don't even have a prototype yet
- But Federico urged me to give a presentation anyway...
- ...so I have substituted facts with brightly coloured animated diagrams

Analysis in the tiered model
- At Tier-1s:
  - Large number of CPUs
  - Feasible to take some out of the Grid infrastructure to build a PROOF-based Analysis Facility
  - It may even be possible to "drain" jobs and switch to interactive mode quickly
- At Tier-3s:
  - Very small number of CPUs
  - Probably not a Grid site, at least not with gLite middleware
  - Use PROOF
- And Tier-2s?
  - Most resources are provided as Grid WNs
  - In the ALICE computing model, this is where user analysis runs

Virtual PROOF cluster
[Diagram: an LCG CE in front of several worker hosts; each Xen Dom0 hosts an LCG Worker Node VM next to a VM running a PROOF slave and an xrootd server]
- Xen can dynamically allocate resources to either virtual machine
  - Both memory and CPU scheduling priority!
  - Memory is the real issue; a CPU priority limit is enough
- Normal operation: PROOF slaves are "dormant" (minimal memory allocation, very low CPU priority)
- Interactive access: dynamically increase the resources of the PROOF instances, so the job on the WN slows down (see the sketch below)
- Alternatively, "wake up" more slaves
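This reallocation maps onto Xen's standard management commands: `xm mem-set` for memory ballooning and `xm sched-credit` for CPU weight under the credit scheduler. The following is a minimal sketch only, assuming a small Python wrapper running on Dom0; the domain names (lcg-wn, proof-slave) and all memory/weight figures are illustrative assumptions, not values from the talk.

```python
# Minimal sketch: shift memory and CPU weight between the two Xen domUs
# with the stock "xm" tool (Xen 3.x, credit scheduler). Domain names and
# sizes below are assumptions for illustration.
import subprocess

WN_DOM = "lcg-wn"          # assumed name of the domU running the LCG Worker Node
PROOF_DOM = "proof-slave"  # assumed name of the domU running the PROOF slave + xrootd

def xm(*args):
    """Run an xm command on the local Dom0."""
    subprocess.check_call(["xm"] + [str(a) for a in args])

def wake_proof():
    """Interactive access: give the PROOF domU memory and CPU weight,
    squeeze the WN domU so that its batch job only slows down."""
    xm("mem-set", WN_DOM, 1024)
    xm("mem-set", PROOF_DOM, 3072)
    xm("sched-credit", "-d", PROOF_DOM, "-w", 512)
    xm("sched-credit", "-d", WN_DOM, "-w", 64)

def sleep_proof():
    """Normal operation: keep the PROOF domU 'dormant' (minimal memory,
    very low CPU priority) and give everything back to the WN."""
    xm("mem-set", PROOF_DOM, 256)
    xm("sched-credit", "-d", PROOF_DOM, "-w", 32)
    xm("mem-set", WN_DOM, 3840)
    xm("sched-credit", "-d", WN_DOM, "-w", 256)
```

A higher-level component (the "Director" of the later slides) would then simply call wake_proof() or sleep_proof() when a PROOF session starts or ends on that box.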

Dynamic allocation
[Diagram: batch queries flow from the LCG CE to the WN VM, while PROOF queries flow from the PROOF master to the slave VM on the same host]
Advantages:
- The Grid batch job on the WN ideally never stops completely, it only slows down
- Non-CPU-intensive I/O operations can go on and do not time out
- Both environments are sandboxed and independent, with no interference
- No actual need for the batch VMs to be LCG WNs at all; they can be anything

Pros and cons
But...
- Needs well-stuffed boxes to be viable
- LCG deployments don't mix well with other stuff
- There is an issue with the advertised CPU power (e.g. in the ETT): is this acceptable in a multi-VO environment?
- It is clearly possible that some WN-side batch jobs will crash even if one provides a huge swap space. Will this be acceptable?
- ...
Advantages:
- The Grid batch job on the WN ideally never stops completely, it only slows down
- Non-CPU-intensive I/O operations can go on and do not time out
- Both environments are sandboxed and independent, with no interference
- No actual need to be LCG WNs at all; they can be anything
- We can quickly have a working prototype, and add advanced features later

Virtual Analysis Facility for ALICE
- "Director"
  - Globally manages the resource allocation
  - This is the missing piece, still to be developed (next slide; a sketch of its control loop follows below)
Shopping list:
- Xen: two (or maybe more) virtual machines per physical one
- LCG WN (or whatever) on one of the virtual machines
- PROOF + xrootd: one (or more) slaves per physical machine, plus one head node (master)
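To make the Director's role concrete, here is a minimal sketch of what its control loop might look like, assuming PROOF activity on each node can be polled somehow; the poll_proof_workers helper, the node names and the ssh-based reshaping are hypothetical illustrations, not part of the original design.

```python
# Hypothetical sketch of the "Director" control loop: poll each box for
# active PROOF workers and flip its Xen domUs between "batch" and
# "interactive" shape. Node names, domU names and the polling mechanism
# are assumptions for illustration only.
import subprocess
import time

NODES = ["vaf-node-01", "vaf-node-02"]   # hypothetical physical boxes

def poll_proof_workers(node):
    """Return the number of PROOF workers currently active on `node`.
    Hypothetical: e.g. obtained by querying the PROOF master."""
    return 0  # placeholder

def reshape(node, interactive):
    """Reallocate memory/CPU weight on the node's Dom0 via ssh."""
    mem_proof, mem_wn, w_proof, w_wn = (3072, 1024, 512, 64) if interactive else (256, 3840, 32, 256)
    for cmd in (["xm", "mem-set", "proof-slave", str(mem_proof)],
                ["xm", "mem-set", "lcg-wn", str(mem_wn)],
                ["xm", "sched-credit", "-d", "proof-slave", "-w", str(w_proof)],
                ["xm", "sched-credit", "-d", "lcg-wn", "-w", str(w_wn)]):
        subprocess.check_call(["ssh", "root@" + node] + cmd)

def director_loop(period=30):
    """Periodically reshape every node according to its PROOF load."""
    while True:
        for node in NODES:
            reshape(node, interactive=poll_proof_workers(node) > 0)
        time.sleep(period)
```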

The missing piece
- We can easily have a semi-static prototype
  - Or a completely static one, just setting CPU limits
  - Just a "slider" to move resources by hand (see the sketch below)
  - This is not very far off, essentially a deployment issue
- An idea by P. Buncic: use SmartDomains
  - Developed by X. Gréhant (HP Fellow at CERN openlab)
  - This application is not its primary use case
  - Not all the needed functionality is there yet
- The following step is a truly dynamic system
  - Coupled with the PROOF master
  - It measures the load and automatically starts more workers / assigns more resources as needed
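As one concrete reading of the "slider", the semi-static prototype could amount to a single hand-driven function that turns a fraction into a memory and CPU-weight split between the two domUs. This is a hypothetical sketch under assumed totals and domain names, not an existing tool.

```python
# Hypothetical "slider": share = 0.0 gives everything to the LCG WN,
# share = 1.0 gives everything to PROOF. Totals and domain names are
# illustrative assumptions.
import subprocess

TOTAL_MEM_MB = 4096   # assumed memory budget shared by the two domUs
TOTAL_WEIGHT = 512    # assumed total credit-scheduler weight

def set_slider(share):
    """Move resources by hand between the WN and PROOF domUs."""
    assert 0.0 <= share <= 1.0
    mem_proof = max(256, int(TOTAL_MEM_MB * share))      # keep a minimal footprint
    weight_proof = max(32, int(TOTAL_WEIGHT * share))
    for cmd in (["xm", "mem-set", "proof-slave", str(mem_proof)],
                ["xm", "mem-set", "lcg-wn", str(TOTAL_MEM_MB - mem_proof)],
                ["xm", "sched-credit", "-d", "proof-slave", "-w", str(weight_proof)],
                ["xm", "sched-credit", "-d", "lcg-wn", "-w", str(TOTAL_WEIGHT - weight_proof)]):
        subprocess.check_call(cmd)
```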

Deployment on multicore machines (1)
[Diagram: one Xen Dom0 hosting the LCG Worker Node VM and a single VM running the xrootd server and several PROOF slaves]
- One VM, several PROOF workers
- Assign more resources to the VM when starting a fresh worker (see the sketch below)
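In this layout, "assign more resources when starting a fresh worker" could reduce to scaling the single PROOF VM with the number of active workers. The per-worker quantum below is an assumed figure purely for illustration.

```python
# Hypothetical scaling rule for the single multi-worker PROOF domU:
# grow or shrink its memory and CPU weight with the number of workers.
import subprocess

BASE_MEM_MB, MEM_PER_WORKER_MB = 256, 768   # assumed figures
BASE_WEIGHT, WEIGHT_PER_WORKER = 32, 128    # assumed figures

def resize_proof_vm(n_workers):
    """Called whenever a PROOF worker is started or stopped on this box."""
    subprocess.check_call(["xm", "mem-set", "proof-slave",
                           str(BASE_MEM_MB + n_workers * MEM_PER_WORKER_MB)])
    subprocess.check_call(["xm", "sched-credit", "-d", "proof-slave",
                           "-w", str(BASE_WEIGHT + n_workers * WEIGHT_PER_WORKER)])
```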

Deployment on multicore machines (2)
[Diagram: one Xen Dom0 hosting the LCG Worker Node VM and one VM per PROOF slave, with the xrootd server alongside]
- One VM per PROOF worker
- Maybe running xrootd on Dom0?

Thanks!
Ideas, advice & suggestions, please! to.infn.it