
Status of DØ Computing at UTA (Presentation transcript)


1  Status of DØ Computing at UTA
Introduction
The UTA – DØ Grid team
DØ Monte Carlo Production
The DØ Grid Computing
– DØRAC
– DØSAR
– DØ Grid Software Development Effort
Impact on Outreach and Education
Conclusions
DoE Site Visit, Nov. 13, 2003
Jae Yu, University of Texas at Arlington

2  Introduction
UTA has been producing DØ MC events as the US leader.
UTA led the effort to:
– start remote computing at DØ
– define the remote computing architecture at DØ
– implement the remote computing design at DØ in the US
UTA leveraged its experience as the ONLY active US DØ MC farm; this is no longer the case, but UTA remains the leader in the US DØ Grid effort.
The UTA DØ Grid team has been playing a leadership role in monitoring software development.

3  The UTA – DØ Grid Team
Faculty: Jae Yu, David Levine (CSE)
Research Associate: HyunWoo Kim
– SAM/Grid expert
– Development of the McFarm SAM/Grid job manager
Software Program Consultant: Drew Meyer
– Development, improvement, and maintenance of McFarm
CSE Master's Degree Students:
– Nirmal Ranganathan: investigation of resource needs in Grid execution
EE M.S. Student: Prashant Bhamidipati
– MC farm operation and McPerM development
PHY Undergraduate Student: David Jenkins
– Taking over MC farm operation and development of the monitoring database
Graduated:
– Three CSE MS students, all now in industry
– One CSE undergraduate student, now in an MS program at U. of Washington

4  UTA DØ MC Production
Two independent farms:
– Swift farm (HEP): 36 P3 866 MHz CPUs, 250 MB/CPU, a total of 0.6 TB disk space
– CSE farm: 12 P3 866 MHz CPUs
McFarm as our production control software
Statistics (11/1/2002 – 11/12/2003):
– Produced: ~10M events
– Delivered: ~8M events
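As a rough consistency check, the statistics above imply the following average production rate. This is a back-of-envelope sketch that treats all 48 CPUs as equivalent and continuously busy, which is of course a simplification.

```python
# Rough average production rate implied by the slide's numbers, treating
# all 48 CPUs as equivalent and continuously busy (a simplification).
from datetime import date

cpus = 36 + 12                                          # Swift farm + CSE farm
days = (date(2003, 11, 12) - date(2002, 11, 1)).days    # 376 days
produced = 10e6                                         # ~10M events produced

print(f"~{produced / days:,.0f} events/day, ~{produced / days / cpus:.0f} events/CPU/day")
```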

5  What do we want to do with the data?
We want to analyze data no matter where we are!!!
Location- and time-independent analysis

6  DØ Data Taking Summary
30 – 40M events/month

7  What do we need for efficient data analyses in a HEP experiment?
Total expected data size is ~4 PB (4 million GB = 100 km of 100 GB hard drives)!!!
Detectors are complicated: many people are needed to construct them and make them work.
The collaboration is large and scattered all over the world.
We need to:
– allow software development at remote institutions
– provide optimized resource management, job scheduling, and monitoring tools
– provide efficient and transparent data delivery and sharing
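A back-of-envelope check of the quoted data-volume scale, starting from the 30 – 40M events/month on the previous slide; the per-event size and assumed run duration below are illustrative assumptions, not DØ bookkeeping numbers.

```python
# Back-of-envelope check of the data-volume scale quoted on the slide.
# The per-event size and run duration are illustrative assumptions only.

events_per_month = 35e6        # slide quotes 30-40M events/month
months_of_running = 12 * 5     # assume ~5 years of Run II data taking
bytes_per_event = 1.5e6        # assume ~1.5 MB/event total (raw + reconstructed + derived)

total_bytes = events_per_month * months_of_running * bytes_per_event
print(f"~{total_bytes / 1e15:.1f} PB")   # ~3.2 PB, same order as the ~4 PB on the slide
```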

8  DØ Collaboration
650 collaborators
78 institutions
18 countries

9  Old Deployment Models
Started with the Fermilab-centric SAM infrastructure in place, then transitioned to a hierarchically distributed model.

10  DØ Remote Analysis Model (DØRAM)
(Diagram.) Tiered architecture: a Central Analysis Center (CAC), Regional Analysis Centers (RAC), Institutional Analysis Centers (IAC), and Desktop Analysis Stations (DAS).
The diagram distinguishes normal interaction communication paths from occasional interaction communication paths.
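To make the tier structure concrete, here is a minimal sketch of the DØRAM hierarchy as a small data model; the tier names come from the diagram, while the example site names and the simple parent/child routing are assumptions for illustration.

```python
# Minimal data model of the DØRAM hierarchy sketched above. Tier names come
# from the diagram; the example site names below are placeholders.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Center:
    name: str
    tier: str                      # "CAC", "RAC", "IAC", or "DAS"
    parent: Center | None = None
    children: list[Center] = field(default_factory=list)

    def attach(self, child: Center) -> Center:
        child.parent = self
        self.children.append(child)
        return child

    def normal_path_to_cac(self) -> list[str]:
        """Normal communication path: walk up the hierarchy to the CAC."""
        node, path = self, []
        while node is not None:
            path.append(node.name)
            node = node.parent
        return path

cac = Center("Fermilab CAC", "CAC")
rac = cac.attach(Center("UTA RAC", "RAC"))
iac = rac.attach(Center("Example IAC", "IAC"))
das = iac.attach(Center("Desktop station", "DAS"))
print(" -> ".join(das.normal_path_to_cac()))
# Desktop station -> Example IAC -> UTA RAC -> Fermilab CAC
```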

11  What is a DØRAC?
A large, concentrated computing resource hub
An institute willing to provide storage and computing services to a few small institutes in the region
An institute capable of providing increased infrastructure as the data from the experiment grows
An institute willing to provide support personnel
Complementary to the central facility

12  DØ Southern Analysis Region (DØSAR)
The first US region, centered around the UTA RAC
Member sites (map): Mexico/Brazil, OU/LU, UAZ, Rice, LTU, UTA, KU, KSU, Ole Miss
It is a regional virtual organization (RVO) within the greater DØ VO!!

13  SAR Institutions
First-generation IACs:
– Langston University
– Louisiana Tech University
– University of Oklahoma
– UTA
Second-generation IACs:
– Cinvestav, Mexico
– Universidade Estadual Paulista, Brazil
– University of Kansas
– Kansas State University
Third-generation IACs:
– Ole Miss, MS
– Rice University, TX
– University of Arizona, Tucson, AZ

14  Goals of the DØ Southern Analysis Region
Prepare institutions within the region for grid-enabled analyses using the RAC at UTA
Enable IACs to contribute to the experiment as much as they can, including MC production and data re-processing
Provide grid-enabled software and computing resources to the DØ collaboration
Provide regional technical support and help new IACs
Perform physics data analyses within the region
Discover and draw in more computing and human resources from external sources

15  SAR Workshops
Twice-yearly workshops to promote healthy regional collaboration and to share expertise
Two workshops held so far:
– April 18 – 19, 2003 at UTA: ~40 participants
– Sept. 25 – 26, 2003 at OU: 32 participants
Each workshop had different goals and outcomes:
– Established SAR, RAC, and IAC web pages and e-mail
– Identified institutional representatives
– Enabled three additional IACs for MC production
– Paired new institutions with existing ones

16  SAR Strategy
Set up all IACs with the full DØ software suite (DØRACE Phases 0 – IV)
Install the Condor (or PBS) batch control system on desktop farms or clusters
Install the McFarm MC production control software
Produce MC events on IAC machines
Install Globus for monitoring information transfer
Install SAM-Grid and interface McFarm to it
Submit jobs through SAM-Grid and monitor them
Perform analysis at the individual's desk
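A minimal sketch of how one might track each IAC's progress through the deployment steps above; the step list mirrors this slide, but the site names and completion counts are placeholders, not actual DØSAR records.

```python
# Minimal sketch of tracking each IAC's progress through the deployment
# steps listed above. Site names and completion data are placeholders.

DEPLOYMENT_STEPS = [
    "DØ software (DØRACE Phases 0-IV)",
    "Condor/PBS batch system",
    "McFarm production control",
    "Local MC production",
    "Globus (monitoring transfer)",
    "SAM-Grid + McFarm interface",
    "Grid job submission and monitoring",
]

def report(progress: dict[str, int]) -> None:
    """Print how far each site has progressed through the checklist."""
    for site, done in sorted(progress.items()):
        next_step = DEPLOYMENT_STEPS[done] if done < len(DEPLOYMENT_STEPS) else "complete"
        print(f"{site:10s} {done}/{len(DEPLOYMENT_STEPS)} done, next: {next_step}")

# Hypothetical snapshot: number of completed steps per site.
report({"UTA": 7, "OU": 4, "LTU": 4, "LU": 4})
```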

17  SAR Software Status
Up to date with DØ releases
McFarm MC production control
Condor or PBS as batch control
Globus v2.xx for grid-enabled communication
– Globus & DOE SG certificates obtained and installed
SAM-Grid on two of the farms (the UTA IAC farms)

18  UTA Software for SAR
McFarm job control
– All DØSAR institutions use this product for automated MC production
Ganglia resource monitoring
– Covers 7 clusters (332 CPUs), including the Tata Institute, India
McFarmGraph: MC job status monitoring system using GridFTP
– Provides detailed information for an MC request
McPerM: MC farm performance monitoring
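The internals of McPerM are not described in the slides; below is a minimal sketch of the kind of farm-performance metric such a monitor might compute (events produced per hour from job-completion records). The record format and sample numbers are illustrative assumptions.

```python
# Sketch of a farm-performance metric in the spirit of McPerM: events
# produced per hour, computed from (timestamp, events) completion records.
# The record format and the sample numbers are illustrative only.
from datetime import datetime, timedelta

def events_per_hour(records: list[tuple[datetime, int]]) -> float:
    """Average production rate over the time span covered by the records."""
    if len(records) < 2:
        return 0.0
    records = sorted(records)
    span_hours = (records[-1][0] - records[0][0]).total_seconds() / 3600.0
    total_events = sum(n for _, n in records)
    return total_events / span_hours if span_hours > 0 else 0.0

# Hypothetical completion log: three job batches finishing over one day.
t0 = datetime(2003, 11, 1)
log = [(t0, 0), (t0 + timedelta(hours=8), 5000), (t0 + timedelta(hours=24), 12000)]
print(f"{events_per_hour(log):.0f} events/hour")   # ~708 events/hour for this sample
```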

19  Ganglia Grid Resource Monitoring
(Monitoring plot; annotation marks the 1st SAR workshop.)

20  Job Status Monitoring: McFarmGraph

21  Farm Performance Monitor: McPerM
(Plot annotation: increased productivity.)

22  UTA RAC and Its Status
NSF MRI-funded facility
– Joint proposal of UTA HEP and CSE + UTSW Medical
– 2 HEP, 10 CSE, and 2 UTSW Medical
Core system (high-throughput research system):
– CPU: 64 P4 Xeon 2.4 GHz (total ~154 GHz)
– Memory & NIC: 1 GB/CPU & one 1 Gbit/sec port each (total of 64 GB)
– Storage: 5 TB Fibre Channel supported by 3 GFS servers (3 Gbit/sec throughput)
– Network: Foundry switch with 52 Gbit/sec + 24 100 Mbit/sec ports
Expansion system (high CPU cycle, large-storage Grid system):
– CPU: 100 P4 Xeon 2.6 GHz (total ~260 GHz)
– Memory & NIC: 1 GB/CPU & one 1 Gbit/sec port each (total of 100 GB)
– Storage: 60 TB IDE RAID supported by 10 NFS servers
– Network: 52 Gbit/sec
The full facility went online on Oct. 31, 2003.
Software installation is in progress.
Plan to participate in the SC2003 demo next week.
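A quick check of the aggregate CPU figures quoted above, computed directly from the per-node counts on this slide.

```python
# Quick check of the aggregate CPU figures quoted above.
core_ghz      = 64 * 2.4    # -> 153.6, the "~154 GHz" core system
expansion_ghz = 100 * 2.6   # -> 260.0, the "~260 GHz" expansion system
print(f"core: {core_ghz:.1f} GHz, expansion: {expansion_ghz:.1f} GHz")
```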

23  Just to Recall Two Years Ago…
(Diagram: disk server, IDE RAID arrays, Gbit switch.)
IDE hard drives are ~$2.5/GB.
Each IDE RAID array set gives ~1.6 TB, hot-swappable.
Can be configured for up to 10 – 16 TB in a rack.
A modest server can manage the entire system.
A Gbit network switch provides high-throughput transfer to the outside world.
Flexible and scalable system.
Still needed: an efficient monitoring and error-recovery system, and communication to resource management.
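A rough drive-only cost estimate implied by the figures above (servers, enclosures, and networking are not included); the rack size used is the upper end of the range quoted on the slide.

```python
# Rough drive-only cost implied by the slide's figures (no servers,
# enclosures, or network hardware included).
cost_per_gb = 2.5     # ~$2.5/GB for IDE drives (slide)
array_tb    = 1.6     # one hot-swappable IDE RAID array set
rack_tb     = 16      # upper end of the 10-16 TB per rack quoted above

print(f"per array: ${array_tb * 1000 * cost_per_gb:,.0f}")   # ~$4,000
print(f"per rack:  ${rack_tb  * 1000 * cost_per_gb:,.0f}")   # ~$40,000
```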

24  UTA DØRAC
100 P4 Xeon 2.6 GHz CPUs = 260 GHz, with 64 TB of disk space
84 P4 Xeon 2.4 GHz CPUs = 202 GHz, with 7.5 TB of disk space
Total CPU: 462 GHz
Total disk: 73 TB
Total memory: 168 GB
Network bandwidth: 54 Gb/sec

25  SAR Accomplishments
Held two workshops; the third is planned.
All first-generation institutions produce MC events using McFarm on desktop PC farms.
– Generated MC events: OU 300k, LU 250k, LTU 150k, UTA ~1.3M
– Discovered additional resources
Significant local expertise has been accumulated in running farms and producing MC events.
Produced several documents, including two DØ notes.
Hold regular bi-weekly meetings (VRVS) to keep up progress.
Working toward data re-processing.

26  SAR Computing Resources
Institution | CPU (GHz)       | Storage (TB)     | People
Cinvestav   | 13              | 1.1              | 1F+?
Langston    | 13              | 1                | 1F+1GA
LTU         | 25+12           | 0.5+0.5          | 1F+1PD+2GA
KU          | 12              | ??               | 1F+1PD(?)
KSU         | 40              | 1.2              | 1F+2GA
OU          | 36+27 (OSCER)   | 1.8 + 120 (tape) | 4F+3PD+2GA
Sao Paulo   | 60+144 (future) | 3                | 1F+Many
UTA         | 192             | 31               | 2F+1.4PD+0.5C+3GA
Total       | 430             | 40 + 120 (tape)  | 12F+6PD+10GA

27  SAR Plans
Four second-generation IACs have been paired with four first-generation institutions.
– Success is defined as: regular production and delivery of MC events to SAM using McFarm; installing SAM-Grid and performing a simple SAM job
– Add all these new IACs to Ganglia, McFarmGraph, and McPerM
Discover and integrate more resources for DØ:
– Integrate OU's OSCER cluster
– Integrate other institutions' large, university-wide resources
Move toward grid-enabled regional physics analyses.
– Collaborators need to be educated to use the system.

28  Future Software Projects
Preparation of UTA DØRAC equipment for:
– MC production (DØ is suffering from a shortage of resources)
– Re-reconstruction
– SAM-Grid
McFarm:
– Integration of re-processing
– Enhanced monitoring
– Better error handling
McFarm interface to SAM-Grid (job_manager):
– Initial script successfully tested for the SC2003 demo
Work with the SAM-Grid team on the monitoring database and integration of McFarm technology.
Improvement and maintenance of McFarmGraph and McPerM.
Universal graphical user interface to the Grid (PHY PhD student).
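The job_manager interface itself is not shown in the slides; the sketch below illustrates what a McFarm-to-SAM-Grid adapter could look like, assuming hypothetical McFarm command names and a simple submit/status contract. It is not the actual SAM-Grid job-manager API.

```python
# Minimal sketch of a McFarm <-> grid adapter in the spirit of the
# job_manager interface mentioned above. The McFarm command names and the
# submit/status contract are hypothetical, for illustration only.
import subprocess

class McFarmJobManager:
    """Translate grid-level requests into local McFarm operations."""

    def submit(self, request_id: str, n_events: int) -> str:
        # Hypothetical McFarm CLI; the real interface may differ.
        subprocess.run(
            ["mcfarm_submit", "--request", request_id, "--events", str(n_events)],
            check=True,
        )
        return request_id  # use the request id as the local job handle

    def status(self, handle: str) -> str:
        # Map a hypothetical McFarm status query onto coarse grid states.
        out = subprocess.run(
            ["mcfarm_status", "--request", handle],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        return {"done": "DONE", "running": "RUNNING"}.get(out, "PENDING")
```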

29  SAR Physics Interests
OU/LU:
– EWSB/Higgs searches
– Single top search
– CPV / rare decays in heavy flavors
– SUSY
LTU:
– Higgs search
– B-tagging
UTA:
– SUSY
– Higgs searches
– Diffractive physics
Diverse topics, but common samples can be defined.

30  Funding at SAR
Hardware support:
– UTA RAC: NSF MRI
– UTA IAC: DoE + local funds (totally independent of RAC resources)
– Need more hardware to adequately support desktop analyses utilizing RAC resources
Software support:
– Mostly UTA local funding, which will run out this year!!!
– Many attempts at different sources, but none worked
We seriously need help to:
– maintain the leadership in DØ remote computing
– maintain the leadership in grid computing
– realize the DØRAM and expeditious physics analyses

31  Tevatron Grid Framework: SAM-Grid
DØ already has the data-delivery part of the Grid system (SAM).
The project started in 2001 as part of the PPDG collaboration to handle DØ's expanded needs.
The current SAM-Grid team includes:
– Andrew Baranovski, Gabriele Garzoglio, Lee Lueking, Dane Skow, Igor Terekhov, Rod Walker (Imperial College), Jae Yu (UTA), Drew Meyer (UTA), HyunWoo Kim (UTA), in collaboration with the U. Wisconsin Condor team
http://www-d0.fnal.gov/computing/grid
UTA is working on developing an interface for McFarm to SAM-Grid.
This brings all the SAR institutions, plus any institution running McFarm, into the DØ Grid.

32  Fermilab Grid Framework (SAM-Grid)
(Architecture diagram, with UTA labeled.)

33  UTA – FNAL CSE Master's Student Exchange Program
To establish usable Grid software on the DØ time scale, the project needs highly skilled software developers.
– FNAL cannot afford computer professionals.
– The UTA CSE department has 450 MS students; many are highly trained but back at school due to the economy.
– Students can participate in cutting-edge Grid computing topics in a real-life setting.
– Students' Master's theses become a well-documented record of the work, something many HEP computing projects lack.
The third-generation students are at FNAL working on improving SAM-Grid and its implementation (two-semester rotation period).
The previous two generations have made a significant impact on SAM-Grid:
– One of the four previous-generation students is in the PhD program at CSE.
– One is on the Wisconsin Condor team, with a possibility of entering a PhD program.
– Two are in industry.

34  Impact on Education and Outreach
UTA DØ Grid program:
– Trained: 12 (10 MS + 1 undergraduate) students
– Graduated: 5 CSE Masters + 1 undergraduate
– CSE Grid course: many class projects on DØ
QuarkNet:
– UTA is one of the founding institutions of the QuarkNet program
– Initiated the TECOS project
– Other school rooftop cosmic-ray projects across the nation need storage and computing resources (QuarkNet Grid)
– Will be working with QuarkNet on data storage and eventual use of computing resources by teachers and students
UTA recently became a member of the Texas grid (HiPCAT):
– HEP is leading this effort
– Strongly supported by the university
– Expect a significant increase in infrastructure, such as bandwidth

35  Conclusions
The UTA DØ Grid team has accomplished a tremendous amount.
UTA played a leading role in DØ remote computing:
– MC production
– Design of the DØ Grid architecture
– Implementation of the DØRAM
The DØ Southern Analysis Region is a great success:
– Four new institutions (3 US) are now MC production sites
– Enabled exploitation of available intelligence and resources in an extremely distributed environment
– Remote expertise is being accumulated

36  The UTA DØRAC is up and running; software installation is in progress.
– Soon to add significant resources to SAR and to DØ
The SAM-Grid interface to McFarm is working: one step closer to establishing a globalized grid.
The UTA – FNAL MS student exchange program is very successful.
The UTA DØ Grid computing program has a significant impact on outreach and education.
UTA is the ONLY US DØ institution that has been playing a leading role in the DØ grid, which makes UTA unique.
Local support runs out this year!! UTA needs support to maintain leadership in, and support for, DØ remote computing.

