Grid Computing: Harnessing Underutilized Resources UNCW Department of Chemistry & Biochemistry Seminar September 24, 2004 Ned H. Martin.

Outline
–Definition of Grid computing
–A brief history of computing
–Growth of computing power
–Rationale for Grid computing
–How a Grid works
–Examples of Grid projects
–Grid computing in NC
–Limitations of Grid computing
–UNCW Grid initiative: GridNexus
–What’s next?

Definition of Grid Computing
Grid computing is a form of distributed computing that involves the coordinated, controlled sharing of diverse computing, application, data, storage, or network resources across dynamic, geographically dispersed, multi-institutional virtual organizations. A Grid user does not need to have the data and the software on the same computer, and neither needs to reside on the user’s home (login) computer.

Grid Computing
The term Grid computing suggests a computing paradigm similar to an electric power grid: a variety of resources contribute power into a shared "pool" that many consumers can draw on as needed.

Background of Grid Computing
The idea of Grid computing resulted from the confluence of three developments:
–The proliferation of largely unused computing resources (especially desktop computers)
–Their greatly increased CPU speeds in recent years
–The widespread availability of fast, universal network connections (the Internet)

Brief History of Computing
1943: "I think there is a world market for maybe 5 computers." -Thomas Watson, chairman of IBM
1947: Testudo, the very first computer in the Netherlands; the relay-based machine was 5 m long. Adding took 30 s and multiplication 45 s.

Brief History of Computing
1949: "Computers in the future may weigh no more than 1.5 tons." -Popular Mechanics, forecasting the relentless march of science
1957: "I have traveled the length and breadth of this country and talked with the best people, and I can assure you that data processing is a fad that won't last out the year." -The business book editor for Prentice Hall

Brief History of Computing
1977: "There is no reason anyone would want a computer in their home." -Ken Olsen, president, chairman and founder of Digital Equipment Corp.
1980: "DOS addresses only 1 Megabyte of RAM because we cannot imagine any applications needing more." -Microsoft on the development of DOS
1981: "640k ought to be enough for anybody." -Bill Gates

Brief History of Computing
1978: Introduction of the 8086 chip by Intel, a 16-bit processor. It was too expensive, so a version with an 8-bit external data bus was developed (the 8088, 1979), which IBM chose for the first IBM PC. Available clock frequencies ranged up to 10 MHz, and it had an instruction set of about 300 operations. At introduction the fastest version ran at 8 MHz, achieved 0.8 MIPS (0.8 × 10⁶ instructions per second), and contained 29,000 transistors.

Brief History of Computing
1982: Intel 80286 released. It supported clock frequencies of up to 20 MHz. At introduction the fastest version ran at 12.5 MHz, achieved 2.7 MIPS, and contained 134,000 transistors.
1985: Intel 386DX released. It supported clock frequencies of up to 33 MHz. At the date of release the fastest version ran at 20 MHz and achieved 6.0 MIPS. It contained 275,000 transistors.

Brief History of Computing
1989: Intel 486DX released. It contained the equivalent of about 1.2 million transistors. At the time of release the fastest version ran at 25 MHz and achieved up to 20 MIPS. Later versions had clock speeds up to 100 MHz.
1993: Intel Pentium released. At that time it was available only in 60 and 66 MHz versions, which achieved up to 100 MIPS, with over 3.1 million transistors.

Brief History of Computing
1995: Pentium Pro released. At introduction it reached a clock speed of up to 200 MHz. It achieved 440 MIPS and contained 5.5 million transistors, nearly 2,400 times as many as the first microprocessor (the Intel 4004, 1971), and it was capable of 70,000 times as many instructions per second.
2004: Pentium 4 chips available with clock speeds of up to 3.6 GHz, providing 11,356 MIPS and containing 125,000,000 transistors.
2005: chips with 500,000,000 transistors!

Growth of Computing Power
[Figure: growth of computing power]
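The transistor counts quoted in the history slides can be turned into a rough doubling time. This sketch uses only the slides' own figures (3.1 million transistors for the 1993 Pentium and 125,000,000 for the 2004 Pentium 4); treating the growth as a single smooth exponential is a simplifying assumption.

```python
import math


def doubling_time(year0: float, count0: float, year1: float, count1: float) -> float:
    """Years per doubling, assuming count(t) = count0 * 2 ** ((t - year0) / T)."""
    doublings = math.log2(count1 / count0)  # how many times the count doubled
    return (year1 - year0) / doublings


# Figures taken from the history slides.
pentium_1993 = 3_100_000      # Intel Pentium, 1993
pentium4_2004 = 125_000_000   # Pentium 4, 2004

t = doubling_time(1993, pentium_1993, 2004, pentium4_2004)
print(f"Transistor count doubled roughly every {t:.1f} years")
```

About 5.3 doublings in 11 years works out to roughly two years per doubling, in line with Moore's law.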

Rationale for Grid Computing
–The proliferation of largely unused computing resources (especially desktop computers, of which 152 million were sold in 2003)
–Their greatly increased CPU speeds in recent years (now >3 GHz)
–The widespread availability of fast, universal network connections (the Internet)

Rationale for Grid Computing
High-performance computers (formerly called supercomputers) are very expensive to buy and maintain. Much of the recent gain in computing power has come from applying multiple CPUs to a problem (e.g., NCSC had a 720-processor IBM parallel computer). Many computing tasks relegated to these (especially massively parallel) computers could instead be performed with a “divide and conquer” strategy, using the many more, albeit slower, processors available on a Grid.
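The “divide and conquer” strategy can be sketched in a few lines: split one large task into independent chunks, farm the chunks out to many workers, and combine the partial results. Here a local thread pool stands in for Grid nodes, which is an assumption made for illustration (real Grid middleware would ship chunks to remote machines), and the task itself, summing squares over a range, is a placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n: int, parts: int) -> list[range]:
    """Divide range(n) into `parts` contiguous, nearly equal chunks."""
    size, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + size + (1 if i < extra else 0)  # spread the remainder
        chunks.append(range(start, stop))
        start = stop
    return chunks


def work(chunk: range) -> int:
    """Placeholder computation done independently on one chunk."""
    return sum(x * x for x in chunk)


def divide_and_conquer(n: int, workers: int) -> int:
    chunks = split_range(n, workers)
    # Each chunk goes to a separate worker; map() returns results in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(work, chunks))
    return sum(partials)  # combine the partial results


if __name__ == "__main__":
    print(divide_and_conquer(1_000_000, workers=16))
```

The approach only pays off when chunks are independent; tasks that need constant communication between processors still favor a tightly coupled parallel machine.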

How a Grid Works
As with an electric power grid, a variety of resources contribute power into a shared "pool" that many consumers can draw on as needed. Ideally, the user neither knows nor cares where the computing operation is performed; the process is invisible to the user. Middleware handles security, authentication, authorization, resource selection, and routing of input and output seamlessly.
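The middleware duties listed on this slide (authentication, authorization, resource selection, and routing of input and output) can be pictured as one pipeline behind a single `submit` call. Everything here is hypothetical: the `Middleware` and `Resource` classes, the user table, and the in-memory "machines" are illustrative stand-ins, not the Globus Toolkit or GridNexus APIs, and real Grids authenticate with certificates (e.g., GSI) rather than passwords.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Resource:
    """A Grid machine as the middleware sees it (illustrative fields only)."""
    name: str
    free_cpus: int
    software: set[str]
    run: Callable[[str], str]  # stand-in for shipping input out and fetching output


@dataclass
class Middleware:
    users: dict[str, str]          # user name -> SHA-256 hex digest of password
    allowed: dict[str, set[str]]   # user name -> programs the user may run
    resources: list[Resource] = field(default_factory=list)

    def authenticate(self, user: str, password: str) -> bool:
        digest = hashlib.sha256(password.encode()).hexdigest()
        return self.users.get(user) == digest

    def submit(self, user: str, password: str, program: str, job_input: str) -> str:
        # 1. Authentication: one login covers every resource on the Grid.
        if not self.authenticate(user, password):
            raise PermissionError("authentication failed")
        # 2. Authorization: is this user allowed to run this program?
        if program not in self.allowed.get(user, set()):
            raise PermissionError(f"{user} may not run {program}")
        # 3. Resource selection: any machine with the software and a free CPU.
        candidates = [r for r in self.resources
                      if program in r.software and r.free_cpus > 0]
        if not candidates:
            raise RuntimeError("no suitable resource available")
        chosen = max(candidates, key=lambda r: r.free_cpus)
        # 4. Routing: input goes out, output comes back; the user never sees `chosen`.
        return chosen.run(job_input)


if __name__ == "__main__":
    grid = Middleware(
        users={"student": hashlib.sha256(b"s3cret").hexdigest()},
        allowed={"student": {"Gaussian 03"}},
        resources=[
            Resource("quantum-cube", 8, {"Gaussian 03", "PQS"}, lambda s: f"[cube] {s} done"),
            Resource("beowulf", 16, {"Gaussian 03"}, lambda s: f"[beowulf] {s} done"),
        ],
    )
    print(grid.submit("student", "s3cret", "Gaussian 03", "water.gjf"))
```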

Examples of Grid Projects
–DNet (distributed.net)
–GRID.ORG (anti-cancer ligand screening)
–IBM Smallpox cure
–Entropia.org
–CERN

Grid Projects: SETI@home
–A large-scale search through data gathered by radio telescopes in Puerto Rico for evidence of extraterrestrial life
–Involved more than 3 million computers, averaging about 14 teraFLOPS (14 trillion floating-point operations per second)
–Used over 500,000 years of processing time in the past year and a half

Grid Projects: DNet
DNet (distributed.net)
–Began in 1997 as the first general-purpose distributed computing network on the Internet
–Highly successful in bringing individuals together to complete cryptographic challenges in a distributed environment
–Equivalent to more than 160,000 Pentium II 266 MHz computers working 24 hours a day, 7 days a week, 365 days a year!
–The core distributed.net development team joined United Devices in 2000.
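Cryptographic challenges like DNet's fit this model well: the key space is cut into blocks, each volunteer machine tests every key in the blocks it is handed, and the server only collects a "found / not found" answer per block. This toy sketch assumes a deliberately tiny 16-bit key and a made-up XOR "cipher" so that it runs instantly. It illustrates the partitioning only and is not distributed.net's actual client or protocol.

```python
from typing import Optional

KEY_BITS = 16          # toy key size; real challenges used keys of 56 bits and more
BLOCK_SIZE = 1 << 10   # keys per work block handed to one volunteer


def toy_encrypt(plaintext: bytes, key: int) -> bytes:
    """Made-up stand-in cipher: XOR bytes alternately with the key's two bytes."""
    k = key.to_bytes(2, "big")
    return bytes(b ^ k[i % 2] for i, b in enumerate(plaintext))


def work_blocks() -> list[range]:
    """Partition the whole key space into fixed-size blocks."""
    space = 1 << KEY_BITS
    return [range(start, min(start + BLOCK_SIZE, space))
            for start in range(0, space, BLOCK_SIZE)]


def search_block(block: range, plaintext: bytes, ciphertext: bytes) -> Optional[int]:
    """The volunteer's job: try every key in one block."""
    for key in block:
        if toy_encrypt(plaintext, key) == ciphertext:
            return key
    return None


def coordinator(plaintext: bytes, ciphertext: bytes) -> Optional[int]:
    """Hand out blocks until one reports a hit (sequential here for simplicity)."""
    for block in work_blocks():
        found = search_block(block, plaintext, ciphertext)
        if found is not None:
            return found
    return None


if __name__ == "__main__":
    known = b"known plaintext"
    secret = 0xBEEF
    print(hex(coordinator(known, toy_encrypt(known, secret))))  # 0xbeef
```

Because blocks are independent, the coordinator could hand them to thousands of machines at once; only the block that contains the key ever reports a hit.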

Grid Projects: GRID.ORG
The United Devices Cancer Research Project (GRID.ORG) will advance research to uncover new cancer drugs through the combination of chemistry, computers, and specialized software. The research centers on proteins that have been determined to be a possible target for cancer therapy. Through a process called "virtual screening", LigandFit docking software by Accelrys identifies molecules that interact with these proteins, and determines which ones have a high likelihood of being developed into a drug. In the first year and a half, over 3.5 million drug candidates were screened using over a million personal computers.

Grid Projects: Smallpox Cure
–To help find a cure for smallpox, IBM and a group of partners harnessed the processing power of 2 million idle PCs. They then screened 35 million drug compounds against smallpox proteins to find the most effective candidates.

Grid Projects: Entropia
In 1997, Entropia began applying idle computers worldwide to problems of scientific interest. In just two years, this network grew to encompass 30,000 computers with an aggregate speed of over one teraflop (10¹² floating-point operations per second). Among its scientific achievements is the identification of the then-largest known prime number.

Grid Projects: CERN
–By 2005, detectors at the Large Hadron Collider at CERN, the European Laboratory for Particle Physics, will produce several petabytes of data per year, a million times the storage capacity of a desktop computer
–Just the basic data analysis requires 20 teraflops of computing power (the fastest supercomputer at the time delivered about 3 teraflops)
–More sophisticated analyses will need orders of magnitude more computing power

Grid Computing in NC
–NCBioGrid (www.ncbiogrid.org/): an outgrowth of the High Performance Computing and Data Storage Focus Group of the NC Genomics and Bioinformatics Consortium
–NC Computing Grid: now includes 7 universities plus MCNC; UNCW will be joining soon
–UNCW Grid: started as a grid for UNCW bioinformatics/genomics research, now expanded into chemistry and business applications

Limitations of Grid Computing
Although efforts are being made to standardize protocols (e.g., the Globus Toolkit and Avaki), interacting with Grid services remains a complex process. Most existing applications that access Grid services require the user to type cumbersome commands, often at a command-line interface. Creating new clients and services requires programming in a language such as C or Java and using a host of libraries for interacting with the Open Grid Services Infrastructure (OGSI), the Grid Security Infrastructure (GSI), the Web Services Description Language (WSDL), and other standards.

Limitations of Grid Computing
These tools and techniques are useful to a select group of computing specialists; however, the only way to make Grid resources accessible to a wide range of users is to provide a relatively simple graphical user interface (GUI). The UNCW Grid project proposes to develop a graphical Grid user interface that is easy to use and can access a wide range of applications. Our hope is to create an interface that does for Grid computing what Web browsers (Netscape and Internet Explorer) did to open up the WWW.

UNCW Grid Initiative: GridNexus
This initiative grew in part out of a need for HPC resources following the closure of the NCSC in June 2003, coupled with the availability of faculty with software programming expertise and others with computing applications that could benefit from a Grid. UNC-OP funded UNCW’s proposal for $557,634 over two years to develop Grid portals (GUI middleware that allows users to access software on computers on a Grid).

UNCW Grid Initiative: GridNexus
The UNCW Grid Computing Project is a two-year collaborative project among a multi-discipline, multi-investigator core research team at UNCW and several discipline-focused researchers at partner institutions: NCSU, WCU, NCCU, ECU, and CFCC. The research areas and institutional interests of this project are:
–Advanced Grid Software Development (UNCW)
–Computational Chemistry (UNCW and ECU)
–Bioinformatics (UNCW, NCSU, and NCCU)
–Combinatorics (UNCW)
–Business Computing (UNCW and NCCU)
–Education and Training (UNCW, WCU, CFCC)
This project proposes to develop a Grid interface that is easy to use and can serve a wide range of applications and users. We have developed an innovative graphical user interface (GUI) for Grid applications. In particular, we introduced a new scripting language (JXPL) designed for web-based services and a GUI for creating scripts, and we have demonstrated the use of these tools with Grid services.

UNCW Grid Initiative: GridNexus
UNCW’s initiative is unique in that it involves undergraduate students as the main players in developing the Grid portal (GUI). Undergraduate computer science students are partnered with faculty and students in application areas (chemistry, biology, business) to develop graphical front ends for accessing services (programs) on computers on the Grid. Grid portals are being developed for the two computational chemistry programs (Gaussian 03 and DMol³) most often used in research by our faculty and students.

Resources of UNCW Grid
–Beowulf cluster: 16 PIII processors in the Computer Science Department
–Fire and FireDev servers plus disk storage devices
–PQS Quantum Cube: 8-CPU cluster with PQS and Gaussian 03 computational chemistry software, plus the TCP-Linda environment
–An 8-processor IBM blade cluster with 0.5 TB of disk storage will be added soon.
–Other computers may be added, possibly including all computing lab computers, or even all faculty/staff computers (when not in use).

Remote Computing before Grid
Now, to submit a quantum chemistry calculation to a remote computer, e.g., at NCSU, one must:
1. Telnet to the remote computer and log in (a separate login and password for each user account and for each computer)
2. FTP the input data file from the local computer to the remote machine (requires login, password)
3. Create and edit an input file for the job (using vi or another text editor)
4. Create a .job file and edit it if necessary
5. Select a queue based on the number of CPUs and the time required; submit the .job file
6. Check the progress of the calculation periodically: telnet to the remote machine and look for the file that indicates completion of the job
7. FTP the output file to the local computer
8. Open the output file in a text editor and examine the numerical data
9. Open the output file in a commercial program on the local computer to visualize the structure

Remote Computing on a Grid
In the future, using Grid middleware to submit a quantum chemistry calculation to a remote computer at NCSU:
1. Log in to the Grid (a single user login and password to access ANY Grid resource)
2. Select a data file and job parameters from pull-down menus; click to submit (the .input and .job files are created automatically by the Grid middleware, and the job is submitted automatically to an appropriate available computer)
3. When the computation completes, the output file is automatically sent to the local computer to visualize the structure (a step that can also be automated)

Development of a Grid Portal
The objective is to make HPC resources (wherever they may be located) easy to access for scientists who are not computer savvy. Most computation involves performing various mathematical operations on a dataset. A GUI approach is employed: after a single login that checks authentication and authorization, the user creates a ‘workflow’ of functions/operations graphically by connecting boxes dragged from a series of option lists, then applies that series of steps to a dataset. Such a ‘workflow’ can be saved for later application to another dataset.
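A saved ‘workflow’ is essentially a chain of operations that can be reapplied to any dataset. This minimal sketch models each GUI box as a Python function and a workflow as their composition. The step names (`drop_missing`, `relative_to_min`, `hartree_to_kcal`) are hypothetical placeholders chosen for a chemistry flavor, not GridNexus components.

```python
import math
from functools import reduce
from typing import Callable

Step = Callable[[list[float]], list[float]]
HARTREE_TO_KCAL = 627.5095  # 1 hartree expressed in kcal/mol


def drop_missing(data: list[float]) -> list[float]:
    """Discard NaN entries (e.g., failed calculations)."""
    return [x for x in data if not math.isnan(x)]


def relative_to_min(data: list[float]) -> list[float]:
    """Express each value relative to the lowest one."""
    lowest = min(data)
    return [x - lowest for x in data]


def hartree_to_kcal(data: list[float]) -> list[float]:
    return [x * HARTREE_TO_KCAL for x in data]


def make_workflow(*steps: Step) -> Step:
    """Chain boxes left to right into a single reusable workflow."""
    return lambda data: reduce(lambda acc, step: step(acc), steps, data)


# Build the workflow once, then reuse it on any dataset of energies (hartree).
relative_energies = make_workflow(drop_missing, relative_to_min, hartree_to_kcal)
print(relative_energies([-76.0107, float("nan"), -76.0268, -76.0200]))
```

Because the workflow is an ordinary value, saving it for "subsequent application to another dataset" amounts to keeping a reference to `relative_energies` (or the list of step names) and calling it again later.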

Development of a Grid Portal
Job submission: ideally, the Grid middleware should select the ‘best’ resource, namely those computers that are available, capable, and have the software needed to handle the job. The user need not select, nor even know, where the computation is taking place. In fact, the job may even be passed from one computer to another for various aspects of the calculation. The output is returned to the user’s workstation or account, rather than the user having to access and download the output file from a remote computer.
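The ‘best resource’ rule (available, capable, and has the needed software) translates naturally into a filter followed by a ranking. In this sketch the machine descriptions echo the UNCW resources listed earlier, but the field names, the queue lengths, and the tie-breaking rule (shortest queue first, then most free CPUs) are assumptions for illustration, not the project's actual scheduler.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Machine:
    name: str
    online: bool
    free_cpus: int
    queued_jobs: int
    software: frozenset[str]


@dataclass(frozen=True)
class Job:
    program: str
    cpus: int


def select_resource(job: Job, machines: list[Machine]) -> Optional[Machine]:
    """Return the 'best' machine for the job, or None if nothing qualifies."""
    eligible = [m for m in machines
                if m.online                       # available
                and m.free_cpus >= job.cpus       # capable
                and job.program in m.software]    # has the software
    if not eligible:
        return None
    # Prefer the shortest queue; break ties by the most free CPUs.
    return min(eligible, key=lambda m: (m.queued_jobs, -m.free_cpus))


grid = [
    Machine("pqs-cube", True, 8, 2, frozenset({"Gaussian 03", "PQS"})),
    Machine("beowulf", True, 16, 2, frozenset({"Gaussian 03"})),
    Machine("ibm-blade", False, 8, 0, frozenset({"Gaussian 03"})),  # not yet online
]
best = select_resource(Job("Gaussian 03", cpus=8), grid)
print(best.name if best else "no resource available")
```

The user only describes the job; which machine runs it is decided entirely inside `select_resource`, which is the transparency the slide asks for.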

UNCW’s Grid Portal: GridNexus
3 main application types: genomics/bioinformatics, business, and chemistry
Chemistry resources on UNCW Grid:
–PQS Quantum Cube: 8-CPU cluster with PQS and Gaussian 03 computational chemistry software and TCP-Linda
–Beowulf Cluster: 16-CPU cluster with Gaussian 03 computational chemistry software and TCP-Linda
–Soon to be added: IBM blade server with 8 or 16 CPUs; Gaussian 03 will be installed on it.
–Java script for file transformation, e.g., to convert a HyperChem file into a Gaussian 03 input file
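The file-transformation step can be sketched as a parser that pulls element symbols and Cartesian coordinates out of a HyperChem file and emits the geometry lines a Gaussian 03 input expects. The sketch assumes HyperChem .hin atom records have the layout `atom <n> <name> <element> <type> <flags> <charge> <x> <y> <z> ...`; verify that layout against the HyperChem documentation before relying on it. The sample molecule is made up, and the code is written in Python for brevity; it is not the project's own Java script.

```python
def parse_hin_atoms(text: str) -> list[tuple[str, float, float, float]]:
    """Extract (element, x, y, z) from lines beginning with 'atom'.

    Assumed field layout (0-based, whitespace-separated):
    0 'atom', 1 serial number, 2 name, 3 element, 4 type, 5 flags,
    6 partial charge, 7-9 x y z in angstroms.
    """
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "atom":
            element = fields[3]
            x, y, z = (float(v) for v in fields[7:10])
            atoms.append((element, x, y, z))
    return atoms


def gaussian_geometry(atoms: list[tuple[str, float, float, float]]) -> str:
    """Format atoms as Gaussian Cartesian-coordinate lines."""
    return "\n".join(f"{el:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms)


# Made-up water molecule written in the assumed .hin layout.
sample = """\
mol 1 "water"
atom 1 - O OW - -0.834 0.000 0.000 0.117 2 2 s 3 s
atom 2 - H HW - 0.417 0.000 0.757 -0.469 1 1 s
atom 3 - H HW - 0.417 0.000 -0.757 -0.469 1 1 s
endmol 1
"""
print(gaussian_geometry(parse_hin_atoms(sample)))
```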

Quantum Chemistry Portal
A GUI is under development to allow a user to select the following from pull-down menus within ‘boxes’ that are linked into a ‘workflow’:
–Data input file
–Transform to another file type if necessary
–Level of calculation: HF, DFT, MP2, etc.
–Basis set: 6-31G(d,p), G(2d,p), etc.
–Number of processors needed
–CPU time requested
–Keywords: opt, nmr, freq, pop=npa, etc.
–Charge and multiplicity
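These pull-down selections map fairly directly onto a Gaussian 03 input file: a Link 0 line for processors, a route line built from level of theory, basis set, and keywords, a title, the charge and multiplicity line, and the geometry, followed by a terminating blank line. The sketch assumes shared-memory parallelism via `%NProcShared` (a Linda run across nodes would use a different Link 0 setting; check the Gaussian 03 manual) and takes the geometry lines as already prepared. The function and parameter names are hypothetical, and the CPU-time request is omitted because it belongs to the batch-queue job file rather than the Gaussian input.

```python
def gaussian_input(level: str, basis: str, keywords: list[str], charge: int,
                   multiplicity: int, geometry: str, nprocs: int = 1,
                   title: str = "Job submitted from the grid portal") -> str:
    """Assemble a Gaussian-style input deck from the portal's selections."""
    if multiplicity < 1:
        raise ValueError("multiplicity must be a positive integer")
    route = " ".join([f"#P {level}/{basis}", *keywords])
    lines = [
        f"%NProcShared={nprocs}",     # Link 0: processors on one node
        route,                        # route section
        "",
        title,                        # title section
        "",
        f"{charge} {multiplicity}",   # charge and spin multiplicity
        geometry.rstrip("\n"),        # Cartesian coordinates
        "",                           # blank line ends the molecule specification
    ]
    return "\n".join(lines) + "\n"


water = "O 0.0 0.0 0.117\nH 0.0 0.757 -0.469\nH 0.0 -0.757 -0.469"
print(gaussian_input("HF", "6-31G(d,p)", ["opt", "freq"], 0, 1, water, nprocs=8))
```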

Design of UNCW Grid GUI
Select from pull-down menus in categories:
–Data sets (Windows Explorer-like file browser)
–File Type Transformer
–Level of Theory
–Basis Set
–CPU Time
–# Processors
–Keywords
–Chg. & Multiplicity
–Submit
–Visualize

Design of UNCW Grid GUI
Same pull-down categories as above, with the Level of Theory menu showing HF, MP2, and DFT.

Design of UNCW Grid GUI
Functions can be grouped into sets called “workflows” for repetitive operations: Data sets (Windows Explorer-like file browser); File Type Transformer; Level of Theory; Basis Set; CPU Time; # Processors; Keywords; Chg. & Multiplicity; Submit; Visualize.

Design of UNCW Grid GUI
Preferences among choices can be saved as part of the workflow, e.g.: Data sets (Windows Explorer-like file browser); File Type Transformer; HF; 6-31G(d); NMR; 0,1 (charge and multiplicity); Submit; Visualize.

Design of UNCW Grid GUI
The result is a much simpler process for the user: Select data, Transform it, Calculate, Visualize.

Design of UNCW Grid GUI
Multiple repeatedly used sets of commands (‘workflows’) can be saved. A user’s preferences within a workflow (e.g., level of theory, basis set, # processors, CPU time requested, keywords, charge and multiplicity) could also be saved (a future design feature). In the future, a user may need only to specify a data set (file) and link it to a preset ‘workflow’ to initiate a calculation!
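Saving a user’s preferences with a workflow amounts to serializing the selected options and reloading them later, so that only a new data file has to be supplied. The sketch below stores the preset as JSON; the file name, the key names, the default processor count and CPU hours, and the choice of JSON itself are assumptions, since the slides do not specify a storage format. The remaining defaults mirror the example preferences on the earlier slide (HF, 6-31G(d), NMR, charge 0, multiplicity 1).

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Preset:
    """User preferences saved alongside a workflow (hypothetical field names)."""
    level: str = "HF"
    basis: str = "6-31G(d)"
    keywords: list[str] = field(default_factory=lambda: ["NMR"])
    charge: int = 0
    multiplicity: int = 1
    nprocs: int = 8          # placeholder value
    cpu_hours: float = 2.0   # placeholder value


def save_preset(preset: Preset, path: Path) -> None:
    path.write_text(json.dumps(asdict(preset), indent=2))


def load_preset(path: Path) -> Preset:
    return Preset(**json.loads(path.read_text()))


def make_job(data_file: str, preset: Preset) -> dict:
    """Link a new data set to a saved preset; the file is the only new input."""
    return {"input": data_file, **asdict(preset)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "nmr_preset.json"
        save_preset(Preset(), path)
        print(make_job("benzene.hin", load_preset(path)))
```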

Chemistry Portal
Initially, the portal will operate under Linux. Next, it will be ported to Windows. Eventually, computations will be submitted online through web browsers. This could be accomplished from any device (e.g., PC, laptop, or even a cell phone) that can access the Internet.

JXPL Language
UNCW Mathematics faculty member Dr. Jeff Brown, with help from Computer Science faculty member Dr. Clayton Ferner and recent graduate Mike Wood, developed a new Java-based programming language called JXPL. JXPL is the language used in the GridNexus project, and it is designed for use with web services and Grid services. The advantages of JXPL include:
–It is readily extensible
–It interfaces easily with (LISP-like) data structures in the GUI
–JXPL scripts are written in XML, a widely used language
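To illustrate why XML and LISP-like data structures fit together, the sketch below evaluates a nested, prefix-style expression written in XML: each element names an operation and its children are the arguments, just like the LISP list `(add 2 (mul 3 4))`. The element names (`apply`, `num`) and the tiny operator table are invented for this example; they are not JXPL syntax, which the slides do not show.

```python
import operator
import xml.etree.ElementTree as ET

# Hypothetical operator table; not part of JXPL.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def to_sexpr(elem: ET.Element):
    """Convert the XML tree into a LISP-like nested list, e.g. ['add', 2, ['mul', 3, 4]]."""
    if elem.tag == "num":
        return int(elem.text)
    if elem.tag == "apply":
        return [elem.get("op"), *(to_sexpr(child) for child in elem)]
    raise ValueError(f"unknown element <{elem.tag}>")


def evaluate(expr):
    """Evaluate the nested list: the first item is the operator, the rest are arguments."""
    if isinstance(expr, int):
        return expr
    op, *args = expr
    values = [evaluate(a) for a in args]
    result = values[0]
    for v in values[1:]:
        result = OPS[op](result, v)  # fold left: (op (op a b) c) ...
    return result


script = """
<apply op="add">
  <num>2</num>
  <apply op="mul"><num>3</num><num>4</num></apply>
</apply>
"""
tree = to_sexpr(ET.fromstring(script.strip()))
print(tree, "=", evaluate(tree))  # ['add', 2, ['mul', 3, 4]] = 14
```

Because the XML tree and the nested list have the same shape, a GUI that builds boxes-and-arrows diagrams can emit one and the interpreter can consume the other with a direct, recursive translation.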

What’s Next?
–More “filters” to transform data need to be developed and tested
–Fancier graphics may be added to the GUIs
–More computational nodes will be added to the Grid. The eventual goal is to include all NC institutions of higher learning.
–Extend the Grid to include more software applications
–Extend Grid services to other disciplines
–Include industry and businesses as users and developers


Acknowledgments
UNC-OP for funding the UNCW Grid Initiative Proposal: “Fostering Undergraduate Research Partnerships through a Graphical User Environment for the North Carolina Computing Grid,” Dr. Ron Vetter, PI
–Co-PIs: Dr. Rebecca S. Boston, NCSU; Dr. Anthony Wilkinson, WCU; Dr. Marilyn McClelland, NCCU; Dr. Libero Bartolotti, ECU; Ms. Judy Porter, CFCC.
–UNCW Participants: Computer Science: Dr. Ron Vetter, Dr. Clayton Ferner, Dr. David Berman, and Dr. Tom Hudson. Information Technology Systems: Dr. Bob Tyndall and Mr. Bobby Miller. Mathematics and Statistics: Dr. Jeff Brown. Chemistry and Biochemistry: Dr. Ned H. Martin. Biological Sciences: Dr. Ann Stapleton. Information Systems and Operations Management: Dr. Tom Janicki.
–UNCW Computer Science students working on the Chemistry portal: Tristan Carland, Jerry Martin, Andrew Martin