1 3/26/02 Midrange Computing Workshop Sandy Merola Gary Jung March 26, 2002.

Presentation transcript:

1 3/26/02 Midrange Computing Workshop Sandy Merola Gary Jung March 26, 2002

2 3/26/02 Approach
- Survey results
- Options:
  - Support
  - Shared computational resources
- Open discussion
- Firm next steps

3 3/26/02 Survey: Received 43 Responses
- Environmental Energy Technologies: 7
- AFRD: 7
- Nuclear Science: 5
- Physics: 5
- NERSC: 4
- Physical Biosciences: 4
- Chemical Sciences: 3
- Life Sciences: 3
- Material Sciences: 3
- Earth Sciences: 2
No response from ALS and Genome
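
As a quick consistency check, the per-division counts on this slide should add up to the 43 responses named in its title. The sketch below is illustrative only: the variable names are assumptions, and the counts are copied from the slide.

```python
# Per-division response counts as listed on slide 3.
responses_by_division = {
    "Environmental Energy Technologies": 7,
    "AFRD": 7,
    "Nuclear Science": 5,
    "Physics": 5,
    "NERSC": 4,
    "Physical Biosciences": 4,
    "Chemical Sciences": 3,
    "Life Sciences": 3,
    "Material Sciences": 3,
    "Earth Sciences": 2,
}

# Sum the counts and compare against the total stated in the slide title.
total_responses = sum(responses_by_division.values())
print(total_responses)  # 43
```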

4 3/26/02 Type of Research
- Experimental (NS, HEP): 12
- Simulation/Modeling (EETD, AFRD, ESD, PBD, LSD): 12
- Theory (CSD, AFRD, MS, EETD): 9

5 3/26/02 Current Primary Computing System
- Linux, Mac, SGI, Solaris, Compaq Alpha desktops: 26
- PDSF (Physics, NS): 9
- NERSC IBM SP utilization: 4
- Linux clusters: 2
- 18-processor IBM Power 3 cluster: 1
- Cray T3E: 1

6 3/26/02 Impact of Increased Computing Resources
- Analyze larger volume of data: 16
- Analyze experimental data faster: 19
- Perform larger simulations: 20
- Perform faster simulations: 27
- Perform simulations with higher resolutions: 19
- Implement new algorithms resulting in improved simulations: 18
Almost all Physics, NS, and PBD respondents would use high performance computing to analyze larger volumes of data and to analyze data faster

7 3/26/02 Form of Computing That Would Be Most Useful
- Medium cluster: 16
- Medium size SMP: 15
- High performance desktop: 4
- Other: 1

8 3/26/02 Critical Elements In A New System
- Memory size: 25
- Processor clock speed: 25
- Storage: 20
- Network connectivity: 16
- I/O: 14
- Tightly coupled processors: 9

9 3/26/02 Source of Software
- Written by group: 27
- Freely available: 8
- Commercial: 6

10 3/26/02 Midrange Computing Readiness
- Ready now: 17
- Will be ready shortly: 7
- Will be ready mid-term: 7
- Will be ready long-term: 3
- Unsure: 8

11 3/26/02 How Parallelizable Is Your Code?
- Already done: 12
- Easy: 5
- Moderately difficult: 5
- Difficult: 6
- Inconceivable: 1
- Unnecessary, serial OK: 11
- Unsure: 3
Memory model: most respondents indicated either distributed or shared could be accommodated; many didn’t know

12 3/26/02 Planned Procurements
- Linux cluster: 13
- Expansion of current clusters: 2
- SMP consideration: 2
- No change: 3
- Unsure: 23

13 3/26/02 Support
- Prepurchase consulting: 17
- Vendor negotiating expertise: 13
- Facilities: 20
- Configuration expertise: 25
- HW maintenance: 22
- Ongoing support: 25
- Application porting support: 8

14 3/26/02 Comments
- Quality of support
- Cost of support (reasonable)
- Leveraging NERSC
- Networking infrastructure
- In the case of a pooled or institutional usage, it is important to determine the appropriate size of the shared resource
So, now we discuss support options

15 3/26/02 Pertinent Issues for Support
- Standardization
  - Cannot fully realize economies of scale if clusters are different
  - More difficult to manage a cluster built by someone else
- Scale
  - ITSD currently supports 2 small clusters and is willing to develop a service offering
  - Support of larger clusters would require the Laboratory to develop the expertise

16 3/26/02 Pre-Purchase Consulting
Deliver the basics for RFP
What can we provide?
- Advice on small to mid size clusters up to 32 nodes (more complex at > 32 nodes, e.g., network switch latency issues)
- ITSD might set up a small cluster to provide a “try before you buy” service
- Cost analysis of purchase, timeline, and effort
- Specifying systems HW configuration or components
- Specifying peripherals such as racks, UPS, KVM terminal switches
- Specifying cluster distribution
- Estimating software licensing costs
- Recommendations for data storage systems
- Vendor recommendations

17 3/26/02 Computer Room Space
What is the advantage of a centralized facility?
Machine room environment:
- Access to electrical infrastructure
- Proper air conditioning
- Access to high speed local area & wide area networks
- Secure card key access

18 3/26/02 Facilities: Examples of Costs
One-time costs:
- Transportation, seismic bracing, electrical: $1,000
- LBNL network drop: $400 per drop
- Facilities coordination (1.5 – 2 days): $1,500
Recurring costs:
- Housing costs in either the 50A-2109 or 50B-1275 computer room, per rack (including space & electricity): $225/rack/mo
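
The example costs on this slide can be combined into a rough estimate for housing a cluster. The following is a minimal sketch, not an official rate calculator: the function name and parameters are hypothetical, and it assumes the $1,000 and $1,500 one-time charges apply once per installation rather than per rack, which the slide does not state.

```python
# Example rates taken from slide 18 (March 2002 figures).
TRANSPORT_BRACING_ELECTRICAL = 1000  # one-time; assumed per installation
NETWORK_DROP = 400                   # one-time, per LBNL network drop
FACILITIES_COORDINATION = 1500       # one-time; assumed per installation
HOUSING_PER_RACK_MONTH = 225         # recurring, includes space and electricity


def estimate_facility_cost(racks, network_drops, months):
    """Return (one_time, recurring, total) in dollars for a hosted cluster."""
    one_time = (TRANSPORT_BRACING_ELECTRICAL
                + NETWORK_DROP * network_drops
                + FACILITIES_COORDINATION)
    recurring = HOUSING_PER_RACK_MONTH * racks * months
    return one_time, recurring, one_time + recurring


# Hypothetical example: one rack with two network drops, housed for 12 months.
one_time, recurring, total = estimate_facility_cost(racks=1, network_drops=2, months=12)
print(one_time, recurring, total)  # 3300 2700 6000
```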

19 3/26/02 Initial Set Up and Configuration
Major set up tasks:
- Assembly of racks and equipment
- HW assembly and network wiring
- Build master node, set up file systems
- Install PGI compilers
- Integration of 3rd party compilers (Portland Group)
- Build Myrinet drivers/kernel modules
- Build client image
- Install client node file systems
Example: Estimate of effort for a 10 node system with Myrinet, PGI compilers: 3 days

20 3/26/02 Hardware Maintenance
- PC hardware tends to be less reliable, especially on larger clusters
- Important to get a responsible vendor
- Users with larger clusters should consider purchasing spares

21 3/26/02 Systems & Security Administration
What does CIS provide?
- Upgrades
- Updating of nodes
- Security/SSH
- Troubleshooting
- Crash recovery
- User account admin
- Network admin: sendmail, NFS
- Installation of 3rd party software
- Software license management
- Scheduler
- Monitoring of nodes

22 3/26/02 Advantages of Institutional Set Up and Support
- Better coverage, expertise
- Expertise, knowledge
- Economy of scale
- Best practice
- Standardization
- Can mean days instead of weeks for troubleshooting
- Cyber protection and emergency response

23 3/26/02 Cost Factors
What are the cost factors in providing ongoing systems admin?
- # of cluster nodes
- # of users
- Is the system used for code development or production running?

24 3/26/02 Effort
What is the level of effort to provide system admin support?

Cluster                                Minimal Level   Standard Level
10 node cluster w/ 1 master node       1.5 days/mo     3 days/mo
node cluster w/ 1 master node          2 days          4 days
node cluster w/ 1 master node          2.5 days/mo     5 days/mo

* Current effort costs are $110/hr or $880/day
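
The effort figures translate into a monthly dollar cost using the quoted rate. The sketch below works through the 10 node row only; the function name is hypothetical, and the 8-hour day is an assumption inferred from $880/day divided by $110/hr.

```python
HOURLY_RATE = 110   # dollars per hour, from the slide footnote
HOURS_PER_DAY = 8   # assumption: implied by $880/day at $110/hr
DAILY_RATE = HOURLY_RATE * HOURS_PER_DAY  # 880 dollars per day


def monthly_support_cost(days_per_month):
    """Return the monthly system admin cost in dollars for a given effort level."""
    return DAILY_RATE * days_per_month


# 10 node cluster with 1 master node, per the first table row.
print(monthly_support_cost(1.5))  # minimal level: 1320.0
print(monthly_support_cost(3))    # standard level: 2640
```

At these rates, moving from the minimal to the standard level for a 10 node cluster doubles the monthly charge, from $1,320 to $2,640.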

25 3/26/02 Feasibility
Some issues may not be feasible for us to address (outside our core competency at this time):
- Determining if code is suitable to run on a cluster
- Defining classes of problems: some may run better depending on cluster configuration
- Porting issues: How do we marry code to cluster?
- Formal procurement/negotiations

26 3/26/02 Shared Computational Resource
- 20 respondents indicated they may be interested in pooling resources with another project to gain access to a larger system or lower support costs
- Same respondents would also be interested in pooling with several projects
- Approximately 15 of 17 respondents who are considering procurement stated a preference for a cluster

27 3/26/02 Shared Resource Options
1. No offering at this time
   - Acceptable
2. Provide systems support as a gradual mechanism to create a shared resource
3. Procure an institutional MRC
4. Build on an existing computational resource: alvarez, PDSF, or division owned

28 3/26/02 Shared Resource
A shared mid-range computing resource must be:
- Appropriate
- Sustainable
This implies:
- Compatible user requirements
- Advantage to the programs
- Affordable acquisition
- Sustainable financial model

29 3/26/02 Issues
- There must be added value that results from sharing before divisions/projects would be willing to give up control of owning/running their own systems
  - Cheaper
  - Expertise
  - Environment
  - Fungibility of resources
  - Cybersecurity
- If ITSD were to facilitate this, it must build expertise to provide added value
  - $
  - Time

30 3/26/02 Issues
- Under any approach, there is an institutional startup cost for a shared resource
- A combined and shared resource could be managed to provide a more powerful resource than the same capability owned and controlled individually
- Bky Lab management must see an institutional advantage in order to allocate overhead dollars

31 3/26/02 Growing A Shared Resource
- Systems support may be a gradual means of creating a shared resource
- A fungible resource could allow building/sharing of a larger machine given future divisional investments
- Lab overhead might help with this, if a large institutional advantage can be recognized

32 3/26/02 Procure an Institutional MRC A number of divisions could contribute to the acquisition and startup costs of a new MRC

33 3/26/02 Build On Existing Computational Resources
Discussion:
- What could be the role of PDSF?
- What could be the role of alvarez?
- Is there an existing divisional owned computer that could serve as the foundation for growing a shared resource?
- Other pertinent questions?

34 3/26/02 Path Forward
- ITSD will provide a specific acquisition and/or support proposal at your invitation
- If there is sufficient interest, ITSD will facilitate a working group that will result in the creation of a shared resource