Big Computing in High Energy Physics
David Toback
Department of Physics and Astronomy
Mitchell Institute for Fundamental Physics and Astronomy
College of Science Big Data Workshop, January 2016

Outline
Particle Physics and Big Computing
Big Collaborations and their needs:
– What makes it hard? What makes it easy?
– Individual experiments: Collider Physics/LHC/CMS, Dark Matter/CDMS Experiment, Phenomenology
How we get our needs met: the Brazos Cluster
Accomplishments and Status
Lessons learned and requirements for the future
Conclusions

Particle Physics and Big Computing
High Energy Physics = Particle Physics
All activities within the Mitchell Institute
– Includes both theory and experiment, as well as other areas such as String Theory and Astronomy
Big picture of the science: making discoveries at the interface of Astronomy, Cosmology and Particle Physics
Big Computing/Big Data needs come from four users/groups:
– CMS Experiment at the Large Hadron Collider (LHC)
– Dark Matter search using the CDMS Experiment
– High Energy Phenomenology
– Other (mostly the CDF experiment at Fermilab, and the Astronomy group)

What Makes It Easy? What Makes It Hard?
Advantages:
– Physics goals are well defined
– Algorithms are well defined, and much of the code is common; we already have the benefit of lots of high-quality interactions with scientists around the world
– Lots of worldwide infrastructure and brain power for data storage, data management, shipping and processing
– Massively parallel data (high throughput, not high performance)
Disadvantages:
– Need more disk space than we have (VERY big data)
– Need more computing than we have (VERY big computing)
– Need to move the data around the world for analysis, for thousands of scientists at hundreds of institutions, with the political and security issues that brings
– Need to support lots of code we didn't write
– Being able to bring, store and process lots of events locally is a major competitive science advantage: better students, more and higher-profile results, more funding, etc. National labs provide huge resources, but they aren't enough.
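
To make the "high throughput, not high performance" distinction concrete, here is a minimal sketch (not the experiments' actual code) of the basic pattern: a large list of input files is split into chunks that independent batch jobs can process without ever talking to each other. The file names, chunk size and manifest paths are hypothetical placeholders.

```python
# Minimal sketch of the high-throughput pattern: split a large list of input
# files into independent chunks, one per batch job, with no inter-job communication.
# File names, chunk size and manifest paths are placeholders.
DATA_FILES = [f"dataset/events_{i:05d}.root" for i in range(10000)]
FILES_PER_JOB = 50

def make_manifests(files, files_per_job):
    """Build one manifest per job; each job later reads only its own file list."""
    manifests = []
    for job_id, start in enumerate(range(0, len(files), files_per_job)):
        name = f"manifests/job_{job_id:04d}.txt"
        chunk = files[start:start + files_per_job]
        manifests.append((name, chunk))
    return manifests

for name, chunk in make_manifests(DATA_FILES, FILES_PER_JOB):
    # In a real setup each manifest would be written to disk and handed to a job.
    print(name, len(chunk), "files")
```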

CMS Experiment at the LHC (Discovery of the Higgs)
Collider physics at CERN/Fermilab has often been the big-computing driver in the world (it brought us the WWW and still drives Grid computing worldwide)
The experiments have a 3-tiered, distributed computing model on the Open Science Grid to handle the tens of petabytes and millions of CPU-hours each month
– Fermilab alone runs more than 50M CPU-hours/month
Taking data now
At A&M: we run jobs as one of many Grid sites as part of an international collaboration (we are a Tier 3)
– 4 faculty, with ~20 postdocs and students
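
As a hedged illustration of how a Tier-3 site might feed independent jobs to a batch system such as HTCondor (which underlies much Open Science Grid running), the sketch below generates a plain submit description for a batch of analysis jobs. The executable name, argument convention and paths are invented placeholders, not the collaboration's real submission machinery.

```python
# Hypothetical sketch: generate an HTCondor-style submit description for N
# independent analysis jobs, one input file per job. Names and paths are placeholders.
N_JOBS = 100

submit_text = """\
executable = run_analysis.sh
arguments  = input_$(Process).root output_$(Process).root
output     = logs/job_$(Process).out
error      = logs/job_$(Process).err
log        = logs/cluster.log
queue {n}
""".format(n=N_JOBS)

with open("analysis.sub", "w") as f:
    f.write(submit_text)

# The file would then be handed to the batch system, e.g. with `condor_submit analysis.sub`.
```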

Dark Matter Searches with the CDMS Experiment
Much smaller experiment (~100 scientists), but the same computing issues
– Only hundreds of TB, and just millions of CPU-hours/month
– 4 faculty, ~20 postdocs and students
Today: most computing is done at A&M, most data storage at the Stanford Linear Accelerator Center (SLAC)
– Big roles in computing project management
Future: will start taking data again in 2020
– The experiment will produce tens of TB/month
– Will share event processing with the national labs (SLAC, Fermilab, SNOLAB and PNNL)
– We will be a Tier 2 center

Particle Theory / Phenomenology
Particle phenomenologists at MIST do extensive theoretical calculations and VERY large simulations of experiments:
– Collider data, to see what can be discovered at the LHC
– Dark matter detectors, for coherent neutrino scattering experiments
– The interface between Astronomy, Cosmology and Particle Physics
They just need high throughput, a few GB of memory and a few tens of TB
– Jobs don't talk to each other
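
A minimal, hypothetical sketch of the kind of embarrassingly parallel workload this implies: a parameter scan in which every grid point is an independent calculation and the only shared step is collecting the results. The parameter grid and the toy formula are invented for illustration.

```python
# Hypothetical embarrassingly-parallel parameter scan: every grid point is an
# independent job; the only "communication" is gathering the results at the end.
import itertools
import json
from concurrent.futures import ProcessPoolExecutor

MASSES    = [100.0 * i for i in range(1, 21)]   # placeholder mass grid (GeV)
COUPLINGS = [0.1 * i for i in range(1, 11)]     # placeholder coupling grid

def simulate_point(point):
    """Stand-in for a real cross-section or event-rate calculation."""
    mass, coupling = point
    predicted_rate = coupling**2 / mass**2  # toy formula, purely illustrative
    return {"mass": mass, "coupling": coupling, "rate": predicted_rate}

if __name__ == "__main__":
    grid = list(itertools.product(MASSES, COUPLINGS))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(simulate_point, grid))
    with open("scan_results.json", "w") as f:
        json.dump(results, f, indent=2)
```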

Overview of Brazos* and Why We Use It (and Not Somewhere Else)
HEP is not well suited to the existing supercomputing at A&M because of experimental requirements:
– Jobs and data come from around the world
– Firewall issues for external users
– Automated data distribution, and local jobs regularly accessing remote databases
Well matched to the Brazos cluster: high THROUGHPUT, not high PERFORMANCE
– We just run LOTS of independent jobs on multi-TB datasets
– We have gotten a big bang for our buck by cobbling together money to become a stakeholder
Purchased ~$200k of computing/disk over the last few years from the Department of Energy, ARRA, NHARP, the Mitchell Institute and the College of Science
– 309 compute nodes/3096 cores in the cluster; the Institute owns 700 cores and can run on other cores opportunistically(!)
– ~200 TB of disk; the Institute owns about 100 TB and can use extra space if available
– Can get ~1 TB/hour from Fermilab, 0.75 TB/hour from SLAC
– Links to places around the world (CERN-Switzerland, DESY-Germany, CNAF-Italy, UK, FNAL-US, Pisa, Caltech, Korea, France, etc.)
– Accepts jobs from the Open Science Grid (OSG)
– Excellent support from the admins: Almes (PI), Dockendorf and Johnson
More detail on how we run at:
*Special thanks to Guy Almes for his leadership, management and wise counsel
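
A quick back-of-the-envelope check of what the quoted rates imply for staging data locally, assuming the ~1 TB/hour (Fermilab) and 0.75 TB/hour (SLAC) figures above; the dataset sizes are made-up examples.

```python
# Back-of-the-envelope transfer times using the rates quoted on this slide.
# Dataset sizes are illustrative placeholders.
RATES_TB_PER_HR = {"Fermilab": 1.0, "SLAC": 0.75}

def transfer_hours(dataset_tb, source):
    """Time in hours to pull a dataset of the given size from the given source."""
    return dataset_tb / RATES_TB_PER_HR[source]

for source in RATES_TB_PER_HR:
    for size_tb in (5, 20, 100):
        print(f"{size_tb:>4} TB from {source}: ~{transfer_hours(size_tb, source):.1f} hours")
```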

Fun Plots About How Well We've Done
Cumulative CPU hours:
– More than 25M core-hours used!
CPU-hours per month:
– Picked up speed with the new operating system and sharing rules
– 7 months over 1.5M core-hours/month

More Fun Numbers: Sharing the Wealth
October 2015:
– 1.95M total core-hours used
– 1,628,972 core-hours by Katrina Colletti (CDMS)
Lots of individual users:
– 3 over 1M integrated core-hours
– 23 over 100k
– 39 over 10k
– 49 over 1k
Well spread over Dark Matter, LHC, Pheno and other work

Students
Already had 4 PhDs using the system, with more graduations coming in the next year or so
Two of my students who helped bring up the system have gone on to careers in scientific computing:
– Mike Mason: Master's 2010, now a computing professional at Los Alamos
– Vaikunth Thukral: Master's 2011, now a computing professional at the Stanford Linear Accelerator Center (SLAC) on LSST (Astronomy)

Some Lessons Learned
Monitoring how quickly data gets transferred can tell you if there are bad spots in the network, locally as well as around the world (see the sketch after this list)
– Found multiple bad/flaky boxes in Dallas using perfSONAR
Monitoring how many jobs each user has running tells you how well the batch system is doing fair-share and load balancing
– Much harder than it looks, especially since some users are very "bursty": they don't know exactly when they will need to run, and when they do they have big needs NOW (telling them to plan doesn't help)
Experts who know both the software and the administration are a huge win
– Useful to have users interface with local software experts (my students) as the first line of defense before bugging the admins
Hard to compete with the national labs, as they are better set up for "collaborative work" since they trust collaborations, but we can't rely on them alone
– Upside of working at a lab: much more collective disk and CPU, and important data stored locally
– Downside: no one gets much of the disk or CPU (most of our users could use both, but choose to work locally if they can)
– The different balancing of security against the ability to get work done is difficult
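
A hedged sketch of the transfer-monitoring idea in the first lesson: compare each site's latest test-transfer throughput against its usual baseline and flag links that have degraded. The site names, rates and threshold are placeholders; a real setup would pull these numbers from perfSONAR or the transfer logs.

```python
# Hypothetical transfer-rate check: compare each site's latest test-transfer
# throughput against a baseline and flag likely bad spots in the network.
BASELINE_MBPS = {"FNAL": 900.0, "SLAC": 700.0, "CERN": 400.0}  # placeholder baselines
LATEST_MBPS   = {"FNAL": 880.0, "SLAC": 150.0, "CERN": 390.0}  # placeholder measurements
ALERT_FRACTION = 0.5  # alarm if a link drops below half its usual rate

def flag_slow_links(baseline, latest, alert_fraction):
    """Return the sites whose latest throughput is well below their baseline."""
    return [site for site, rate in latest.items()
            if rate < alert_fraction * baseline.get(site, float("inf"))]

for site in flag_slow_links(BASELINE_MBPS, LATEST_MBPS, ALERT_FRACTION):
    print(f"ALERT: transfers from {site} are much slower than usual -- investigate the path")
```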

Online Monitoring
Constantly interrogate the system
– Disks up? Jobs running? Small data transfers working?
Run short dummy jobs for various test cases
– Both run local jobs and accept automated jobs from outside
Automated alarms for the "first line of defense" team (not the admins), as well as for the admin team
– Send the alarms out, as well as turning the monitoring page red
More detail about our monitoring at:
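
A minimal sketch of the "constantly interrogate the system" loop described above: run a set of cheap probes and raise an alarm for the first-line team when any of them fails. The probes and the alarm action are placeholders, not the group's actual monitoring code.

```python
# Hypothetical health-check loop: each probe is a small, cheap test of one
# service; failures trigger an alarm for the first-line-of-defense team.
import subprocess

def disks_mounted():
    # Placeholder probe: a real check might verify that expected mount points exist.
    return subprocess.run(["df", "/"], capture_output=True).returncode == 0

def dummy_job_runs():
    # Placeholder probe: a real check would submit a short test job and wait for it.
    return subprocess.run(["true"]).returncode == 0

PROBES = {"disks": disks_mounted, "dummy job": dummy_job_runs}

def run_checks():
    """Run every probe and report the ones that failed."""
    failures = [name for name, probe in PROBES.items() if not probe()]
    for name in failures:
        # Placeholder alarm: a real system would notify the team and turn the page red.
        print(f"ALARM: check '{name}' failed")
    return failures

if __name__ == "__main__":
    run_checks()
```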

Conclusions
High energy particle physicists are (and have been) leaders in Big Data/Big Computing for decades
Local users continue this tradition at A&M by effectively using the Brazos cluster for our high-throughput needs
We are happy to help others
Want to help us?
– The ability to get data here quickly has ameliorated short-term problems for now, but we need much more disk and CPU
– We have used Brazos and Ada for simulations, but Ada has limited our ability to store data and run jobs; getting Ada on the OSG would be helpful
– The amount of red tape to get jobs in, and to allow our non-A&M colleagues to run, has been significant (but not insurmountable); it is slowly getting better
– Provide software support in addition to the admins
Bottom line: we have been happy with the Brazos cluster (thanks, admins!), which helped us discover the Higgs boson, but our needs have been growing faster than the CPU and disk availability, and we have struggled to support the software