P. Saiz: The future of AliEn. Experiment Support, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

Presentation transcript:

The future of AliEn
  P. Saiz, Experiment Support, CERN IT Department

CERN IT Department, CH-1211 Geneva 23, Switzerland. Pablo Saiz, ALICE offline week, 27 Mar 2013.

Table of contents
  Current status
  Ongoing work
  Future plans
  Summary

AliEn
  File Catalogue
    – LFN to PFN mapping
    – Metadata
    – 700 M entries
  TaskQueue
    – Job execution model
    – Package management
    – 50K concurrent jobs
  File transfers
  Used by ALICE and PANDA
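
To make the LFN to PFN mapping concrete, here is a minimal Python sketch of a toy catalogue. The class name, the example paths and the storage URLs are all hypothetical; this is an illustration of the idea, not the AliEn schema or API.

```python
from collections import defaultdict


class ToyCatalogue:
    """Toy catalogue: one logical file name (LFN) maps to many physical replicas (PFNs)."""

    def __init__(self):
        self._replicas = defaultdict(list)  # LFN -> list of PFNs
        self._metadata = defaultdict(dict)  # LFN -> key/value metadata

    def register(self, lfn, pfn, **metadata):
        """Record a replica of `lfn` at `pfn` and merge any metadata."""
        if pfn not in self._replicas[lfn]:
            self._replicas[lfn].append(pfn)
        self._metadata[lfn].update(metadata)

    def whereis(self, lfn):
        """Return all known PFNs for an LFN (empty list if unknown)."""
        return list(self._replicas.get(lfn, []))

    def metadata(self, lfn):
        """Return a copy of the metadata stored for an LFN."""
        return dict(self._metadata.get(lfn, {}))


# Hypothetical example: one logical file with two replicas.
catalogue = ToyCatalogue()
catalogue.register("/alice/sim/run1/hits.root",
                   "root://se1.example.org//data/01/abc", size=1024)
catalogue.register("/alice/sim/run1/hits.root",
                   "root://se2.example.org//data/07/abc")
replicas = catalogue.whereis("/alice/sim/run1/hits.root")
```

A client would ask for the LFN and receive the list of replicas, then pick one storage element to read from.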

AliEn versions
  v2-19 **: Current version of ALICE
    – With plenty of patches
  v2-20: Current version of PANDA
    – JSON, removal of PackMan, Catalogue layout
  v2-21: Development release
    – GUIDless catalogue
  After a release has been adopted, database changes go to the new release

AliEn improvements not yet used by ALICE
  Catalogue structure
    – InnoDB tables, foreign keys, numeric id
    – 2-day downtime, or creating a 1-week hybrid version
  Removal of PackMan service
    – Clients can handle package installation by themselves
  JSON communication
    – Backward incompatible. Full redeployment
  File popularity
    – Requires changes in the Central Services

Current work
  File Catalogue
  jAliEn
  Popularity
  Classads
  Trust Model
  Priority
  Price
  AliEn/PoD
  VO to VO

File Catalogue
  Investigate File Catalogue using file system
    – Using all features from real file system: user, quotas,
  Prototype of AliEn creating entries on FS:
    – 700M entries in the ALICE catalogue
    – Ext4 not up to the challenge → reiserfs
    – One entry per file → one entry per directory
  Locking, simultaneous clients, booking entries, backups
    – Prototype was discontinued
  File catalogue without GUID
    – See Miguel’s presentation

jAliEn
  Already used in production:
    – Managing productions
    – Data transfers
    – Data cleanup
  Server part for the web interface
  Need to:
    – Improve the ROOT plugin
    – Integrate into FITS

Other improvements
  TaskQueue improvements:
    – Store diffs between original and final JDL
    – Remove Classad library
    – Retrial mechanism
  Separation of price and priority
    – Priority: select user
    – Price: sort among the jobs of the same user
  More worker node platforms: SLC6, Fedora
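
The separation of price and priority can be sketched as a two-step selection. This is a minimal Python illustration under stated assumptions: the `Job` fields, the numeric scales and the "higher wins" convention are hypothetical, and the tie-breaking rules are arbitrary choices, not taken from the TaskQueue.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    user: str
    price: float


def next_job(waiting, user_priority):
    """Pick the next job: priority selects the user, price sorts that user's jobs."""
    if not waiting:
        return None
    users = {job.user for job in waiting}
    # Step 1: priority decides which user gets the slot (assumption: higher wins).
    best_user = max(users, key=lambda u: (user_priority.get(u, 0.0), u))
    # Step 2: among that user's jobs only, the highest price runs first;
    # ties go to the lowest job id.
    candidates = [job for job in waiting if job.user == best_user]
    return max(candidates, key=lambda j: (j.price, -j.job_id))


# Hypothetical queue: bob's job has the highest price, but alice has the
# higher priority, so price only orders alice's own jobs.
waiting = [Job(1, "alice", 1.0), Job(2, "bob", 5.0), Job(3, "alice", 3.0)]
chosen = next_job(waiting, {"alice": 10.0, "bob": 2.0})
```

In this sketch a user cannot overtake other users by raising the price of a job; price only reorders jobs of the same user.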

File Popularity
  Developed by A. Abramyan and N. Manukyan
  Requires patches in central services v2-19
  Frequency of file access:
    – Including errors
    – File types
  Identify:
    – In demand files → increase replicas
    – Other files → decrease replicas
    – Broken files
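
The identification step can be sketched as a simple classification over an access log. This is a minimal Python illustration, assuming a log of `(lfn, ok)` records; the function name, the thresholds and the log format are hypothetical and are not the popularity service's actual implementation.

```python
from collections import Counter


def classify_files(access_log, hot_threshold=3, max_error_ratio=0.5):
    """Label each accessed file as 'increase', 'decrease' or 'broken'.

    access_log is an iterable of (lfn, ok) pairs, where ok is False
    when the access ended in an error.
    """
    reads = Counter()
    errors = Counter()
    for lfn, ok in access_log:
        reads[lfn] += 1
        if not ok:
            errors[lfn] += 1

    decisions = {}
    for lfn, total in reads.items():
        if errors[lfn] / total > max_error_ratio:
            decisions[lfn] = "broken"      # mostly failing accesses
        elif total >= hot_threshold:
            decisions[lfn] = "increase"    # in demand: add replicas
        else:
            decisions[lfn] = "decrease"    # rarely used: remove replicas
    return decisions


# Hypothetical log: "a" is read often, "b" once, "c" fails most of the time.
log = [("a", True)] * 4 + [("b", True)] + [("c", False), ("c", False), ("c", True)]
decisions = classify_files(log)
```

Files that never appear in the log are not classified here; a real service would also have to consider them, since they are the strongest candidates for fewer replicas.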

Other contributions
  AliEn trust model
    – Define service/user trust, and schedule jobs/storage accordingly
    – Sergio Guinez, TALCA
  AliEn/PoD integration
    – Interactive analysis on the grid
    – Cinzia Luzzi
  VO to VO submission
    – Submit jobs from one VO to another, output visible in both
    – PANDA colleagues

PANDA GRID/AliEn developers
  Link

Future work
  Testing Framework
  Job Brokering
  User credentials
  Scaling up

Testing framework
  Create environment to test new approaches
  Up to now:
    – BITS & FITS (functionality tests)
    – PANDA (becoming a mature GRID)
    – Development VO: ALICE_TEST
  Set up and running for one year
  Used for some train analyses
  Users have different priorities

Development environment I
  [Diagram: FC, TQ, SE and CE components]

Environment I
  One way catalogue synchronization
    – Take snapshot of catalogue
  Duplicate small percentage of jobs
    – 5, 10% of TQ
  Jobs get executed twice
    – Easy to check output
    – Duplication of work
    – Setting new SE that will be erased
  Test of the full scale catalogue
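
One way to duplicate a small, stable percentage of jobs is to hash the job id and compare it to a threshold. The Python sketch below is only an illustration of that idea: the function names are hypothetical, and hash-based sampling is a technique chosen here for reproducibility, not something the slides specify.

```python
import hashlib


def should_duplicate(job_id, percent=5):
    """Deterministically select roughly `percent`% of job ids for duplication."""
    digest = hashlib.sha256(str(job_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return bucket < percent


def jobs_to_mirror(job_ids, percent=5):
    """Return the subset of jobs that would also run in the test environment."""
    return [job_id for job_id in job_ids if should_duplicate(job_id, percent)]


# Hypothetical queue of 100000 job ids, duplicating about 5% of them.
mirrored = jobs_to_mirror(range(100_000), percent=5)
fraction = len(mirrored) / 100_000
```

Because the choice depends only on the job id, the same job is always either duplicated or not, which makes it easy to pair the two outputs when checking results.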

Development Environment II
  [Diagram: FC, TQ, SE and CE components]

Using VO to VO submission
  – Once the plugin becomes available…
New VO with only CE
  – Easier to set up
  – Using same SE as ALICE
If jobs fail, reschedule them
Does not test the full catalogue

Alternative Job Brokering
  Two level broker:
    – Broker dispatches batches of jobs to CM
    – CM distributes among worker nodes
    – Bigger dependency on vobox
    – Reduce load on central services
  New job optimizer:
    – Groups jobs together
      Ideally, with the same input
    – Send group to the JobAgent
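
The job optimizer idea (group jobs that share the same input, then hand each group to a JobAgent) can be sketched as follows. This is a minimal Python illustration; the function name, the `(job_id, inputs)` representation and the `max_group_size` limit are hypothetical assumptions, not the AliEn optimizer.

```python
from collections import defaultdict


def group_by_input(jobs, max_group_size=10):
    """Group job ids that read exactly the same set of input files.

    jobs is an iterable of (job_id, input_files) pairs. Each returned
    group is (sorted_input_files, job_ids) with at most max_group_size jobs.
    """
    by_input = defaultdict(list)
    for job_id, inputs in jobs:
        # frozenset makes the grouping independent of the order of the inputs.
        by_input[frozenset(inputs)].append(job_id)

    groups = []
    for inputs, job_ids in by_input.items():
        for start in range(0, len(job_ids), max_group_size):
            groups.append((sorted(inputs), job_ids[start:start + max_group_size]))
    return groups


# Hypothetical queue: jobs 1, 2 and 4 read the same two files.
jobs = [(1, ["f1", "f2"]), (2, ["f2", "f1"]), (3, ["f3"]), (4, ["f1", "f2"])]
groups = group_by_input(jobs, max_group_size=2)
```

A JobAgent receiving one group would need to fetch the shared input only once, which is the motivation for grouping by input in the first place.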

User credentials
  Glexec
  Propagate user credentials to worker node
  Sign JDL and changes
    – Traceability
  As already presented by S. Schreiner
    http://indico.cern.ch/getFile.py/access?contribId=58&sessionId=9&resId=0&materialId=slides&confId=111325

Factor 1000 scale up…
  Number of sites: 80 →
    – SETI, BOINC, …
  Opportunistic sites (without vobox)
  Number of nodes: 50K jobs → 50M jobs
    – Amazon has 0.5M servers [1]
  Decentralized job brokering
  Amount of information: 30 PB → 30 EB
    – One tenth of the world’s info! [2]
  I/O bottleneck
  Number of files: 700M → 700B
    – Default ext4, max 4B
  [1] [2]
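
The scale-up numbers can be checked with a few lines of arithmetic. The Python sketch below assumes that the "max 4B" limit refers to ext4's 32-bit inode numbers (at most 2^32 inodes per file system) and that each catalogue entry would use one inode; both are assumptions made for this illustration.

```python
SCALE = 1000

# Current figures quoted in the talk.
current = {"concurrent_jobs": 50_000, "storage_pb": 30, "files": 700_000_000}
target = {name: value * SCALE for name, value in current.items()}

# Assumption: ext4 inode numbers are 32-bit, so one file system holds at
# most 2**32 (about 4.3 billion) inodes, one per file in this sketch.
EXT4_MAX_INODES = 2 ** 32

target_storage_eb = target["storage_pb"] / 1000          # 1 EB = 1000 PB
fits_today = current["files"] <= EXT4_MAX_INODES
# Ceiling division: how many such file systems the target file count needs.
filesystems_needed = -(-target["files"] // EXT4_MAX_INODES)
```

Under these assumptions, today's 700M entries fit within a single inode space, while 700B entries would have to be spread over more than a hundred file systems.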

Factor 1:1000 scale up
  It will require quite some tuning…
  Luckily, factor 10 is not even questioned
    – And that’s more than enough for the expected increase in resources

After more than 13 years…

Summary
  AliEn can handle current load
    – 80 sites, 50K concurrent jobs, 700 M files
  An increase by a factor of 10 should be easy
  Plenty of areas for research/improvement
    – Catalogue
    – Job distribution
    – jAliEn
  AliEn needs a new project leader
    – Thank you for the last 13 years!