CMS Analytics: a proposal for a pilot project
T. Wildish, Princeton, 8-Dec-15

Goals for this project
Learn how to run an analytics project in CMS
Find holes in our monitoring, and plug them
Improve the way we use computing resources in Run-2
First of a series; expand scope and ambition along with our experience

How can we make more efficient use of our computing resources?
CPU: CMS has a shortage of CPU
Disk: we have ~enough, but don't use it well
Network: we treat it as free and infinite
Budgets: flat, if we're lucky
Manpower: shrinking
Expectations for Run-2: very high!
=> We need a greater understanding of how we do things now, so we can improve them

How do we understand our current system better than we do now?
Simple monitoring is not enough
– Monitoring is aimed at near-time debugging; it is not good enough for the long term
– Monitoring of individual sub-systems is not well integrated, so it is hard to find correlations between them
  E.g. interference between user stage-out of files from batch jobs and scheduled transfers?
– But we have all that data, in principle!
=> We need to consolidate our monitoring and figure out how to use it!

How do we start?
Top-down approach:
– Lots of planning, meetings, targets, manpower…
– Long projects that are unlikely to deliver anything useful
– We don't even know exactly what we want yet!
Bottom-up approach:
– Pick a few things we want to learn about the system
– Start pilot projects to see how we can measure them
– Incremental improvements; learn as we go

Pilot project: predicting the popularity of new datasets before they are delivered
We have popularity data going back ~3 years
We have DDP (dynamic data placement), which can make more replicas of popular data or delete unpopular data
We don't know how many replicas to make of a dataset before it has been used for a while
Can we predict the popularity of a new dataset?
– We want to do this for both data and MC
– We want to do this before the data becomes available

Predicting popularity: inputs
Data type/content
– Physics triggers or MC parameters (leptons, jets…)
– Software processing steps (RECO, AOD…)
– Software version (look for behavioural interactions)
– For MC: who requested it (which physics group?)
– Number of unique users, their physics interests and past activities
– Sources: SiteDB, HN, DBS, dashboard, CRAB…
Popularity and replica information
– PopDB & PhEDEx (found a hole already!)
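To make the data-frame idea concrete, here is a minimal sketch in Python/pandas of how such sources might be joined into one row per dataset. The CSV file names and column names are illustrative assumptions, not the real PopDB/DBS/PhEDEx schemas; in practice the data would come from the services' own APIs or database dumps.

```python
# Illustrative sketch only: assumes the monitoring sources have been exported
# to flat CSV files with the hypothetical column names shown in the comments.
import pandas as pd

# Hypothetical per-dataset metadata dump (DBS-like): one row per dataset.
datasets = pd.read_csv("dbs_datasets.csv")      # dataset, data_tier, is_mc, sw_version, physics_group
# Hypothetical popularity dump (PopDB-like): accesses per dataset and week.
popularity = pd.read_csv("popdb_accesses.csv")  # dataset, week, n_accesses, n_users
# Hypothetical replica counts (PhEDEx-like) at the time the dataset appeared.
replicas = pd.read_csv("phedex_replicas.csv")   # dataset, n_initial_replicas

# Label: total accesses in the first few weeks after the dataset appears.
EARLY_WEEKS = 4
early = (popularity.sort_values(["dataset", "week"])
                   .groupby("dataset")
                   .head(EARLY_WEEKS)
                   .groupby("dataset", as_index=False)["n_accesses"].sum()
                   .rename(columns={"n_accesses": "early_accesses"}))

# One row per dataset: metadata features + replica info + early-popularity label.
frame = (datasets.merge(replicas, on="dataset", how="left")
                 .merge(early, on="dataset", how="left")
                 .fillna({"early_accesses": 0, "n_initial_replicas": 0}))
print(frame.head())
```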

Predicting popularity: outputs
Estimate of the number of replicas needed initially
– We don't need a long-term prediction; DDP takes care of that
– N.B. we only care about popular data, for which having too few initial replicas creates bottlenecks
Estimate of where to place the data
– Based on the pattern of user activity
– How much CPU or I/O is typically used on similar datasets?
– Site reliability, both in the past for similar data and now
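A minimal sketch of how a popularity prediction could feed the replica decision: scale the initial replica count with predicted demand and clamp it to a safe range. The thresholds and bounds below are illustrative assumptions, not CMS policy; DDP would still adjust replica counts once real usage is seen.

```python
# Sketch only: maps a predicted "early accesses" score to an initial replica
# count. The accesses_per_replica scale and the min/max bounds are assumptions.
import math

def initial_replicas(predicted_early_accesses: float,
                     accesses_per_replica: float = 500.0,
                     min_replicas: int = 1,
                     max_replicas: int = 5) -> int:
    """Scale replicas with predicted demand, clamped to a bounded range."""
    wanted = math.ceil(predicted_early_accesses / accesses_per_replica)
    return max(min_replicas, min(max_replicas, wanted))

print(initial_replicas(120))    # cold dataset -> 1 replica
print(initial_replicas(4000))   # hot dataset  -> 5 replicas (clamped from 8)
```

In practice the scale and bounds would be tuned against the historical popularity data; the point is only that the prediction drives a simple, bounded placement decision that DDP can later correct.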

Predicting popularity: method
Basics: data import & cleaning, define a data-frame
Try a few simple algorithms to start with:
– Clustering, decision trees…
– Recommender systems (collaborative filtering)
Measure performance
– On historical data and on new data produced today
This is the bulk of our learning curve
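A rough sketch of the "try a simple algorithm, measure on historical data" step, using a decision tree from scikit-learn. It assumes a prepared frame like the one sketched earlier; the column names, the popularity cut, and the 80/20 chronological split are illustrative assumptions, not the project's actual choices.

```python
# Sketch: train a decision tree on older datasets, test on the most recent ones,
# mimicking "predict before the data becomes available". Column names are hypothetical.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report

frame = pd.read_csv("popularity_frame.csv")          # hypothetical prepared frame (see earlier sketch)
frame = frame.sort_values("creation_week")           # hypothetical column: when the dataset appeared
frame["popular"] = frame["early_accesses"] > 1000    # illustrative cut on what counts as "popular"

# One-hot encode the categorical metadata features.
features = pd.get_dummies(
    frame[["data_tier", "is_mc", "sw_version", "physics_group"]])

# Chronological split: train on the older 80%, evaluate on the newest 20%.
split = int(0.8 * len(frame))
X_train, X_test = features.iloc[:split], features.iloc[split:]
y_train, y_test = frame["popular"].iloc[:split], frame["popular"].iloc[split:]

model = DecisionTreeClassifier(max_depth=5, random_state=0)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```

The recommender-system option mentioned on this slide would instead start from a user-by-dataset access matrix rather than per-dataset features, so the two approaches can be compared on the same historical split.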

Next steps…
Taking it further:
– Look for 'conference effects'
– Look for interactions between older and newer processings of the same data
– See what else we can do with that data
Other analytics projects:
– Data transfer latency, understanding network traffic, analysis job runtime, job-scheduler optimisation…

Open question: co-operation between IT and CMS?
What do we share, and how?
– Hardware? Can we use your toys, or should we get our own?
– Tools? It makes sense to converge where we can, and to understand why not where we can't. On the other hand, explore vs. exploit: it may be useful to share the work of trying alternatives
– Experience? Replicating mistakes is not useful
Larger projects?
– Can we do anything together that we couldn't do alone?
– Look forward to common projects: IT/CMS/ATLAS?

Conclusion
CMS is embarking on a series of analytics projects
– We want to learn how to run analytics projects, and what hardware, software and skills they take
– Start a few pilot projects to bootstrap this effort
Consolidate and improve our monitoring
– Plug gaps that will improve our analytics potential in the future
Improve the use of computing resources for Run-2 and beyond
– Aim for incremental improvement with proven ROI