The Distributed Data Interface in GAMESS Brett M. Bode, Michael W. Schmidt, Graham D. Fletcher, and Mark S. Gordon Ames Laboratory-USDOE, Iowa State University.

The Distributed Data Interface in GAMESS
Brett M. Bode, Michael W. Schmidt, Graham D. Fletcher, and Mark S. Gordon
Ames Laboratory-USDOE, Iowa State University
10/7/99

2. What is GAMESS?
- General Atomic and Molecular Electronic Structure System
- First principles - fully quantum mechanical
- Created from other programs in ~1980
- Developed by Dr. Mark Gordon's research group since 1982, with Dr. Michael Schmidt as the principal developer
- Parallelization began in 1991
- Emphasis on distributed memory systems
- Currently includes methods for treating 1 atom to several hundred atoms

3. Partial list of capabilities
C = uses disk storage, D = minimal disk usage, P = parallel execution

4. First Generation Parallel Code
- Parallel communications were performed using either:
  - TCGMSG
  - Vendor-supplied MPI
- The parallel version was usually a slightly modified version of the sequential code
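To make the first-generation style concrete, the sketch below (my illustration, not GAMESS source) shows the classic replicated-data pattern: every rank holds a full copy of a Fock-like matrix, computes only its share of the contributions, and a single global sum (the kind of operation TCGMSG or MPI supplies) replicates the complete result everywhere.

    /* Sketch of a first-generation replicated-data step (illustrative, not GAMESS code).
     * Each rank accumulates its share of contributions into a local copy of the
     * matrix; MPI_Allreduce then sums the copies so every node has the full result. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int nbf = 345;   /* e.g. 345 basis functions, as in the test molecule later in the talk */
        double *fock = calloc((size_t)nbf * nbf, sizeof(double));

        /* Each rank would compute only the integral contributions assigned to it,
         * e.g. work units handed out round-robin: (task % nproc == rank). */

        /* Global sum: after this call every rank holds the complete matrix. */
        MPI_Allreduce(MPI_IN_PLACE, fock, nbf * nbf, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);

        free(fock);
        MPI_Finalize();
        return 0;
    }

Because every node stores the whole matrix, this style keeps the sequential code largely intact, but it also means the aggregate memory of the machine is never exploited, which is the limitation the later slides address.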

5. IBM-SUR cluster
- 22 IBM RS/6000 43P-260 workstations:
  - Dual 200 MHz POWER3 CPUs
  - 4 MB of Level 2 cache
  - 1 GByte of RAM
  - 18 GBytes of fast local disk
  - Gigabit Ethernet (jumbo frames)
  - Integrated Fast Ethernet
- Fast Ethernet switch connecting all nodes
- 3 x 9-port Gigabit switches

6. Gigabit Performance on the IBM 43P-260 Cluster

7. Test Molecule
- Ti(C5H5)2 C2H4 SiHCl3
- Basis set: 6-31G(d,p) on C and H; SBKJC ECP on Si, Ti, and Cl, extended with 1 d-type polarization function on Si and Cl
- 345 total basis functions

8. Parallel SCF
- Very good scaling, dependent on the size of the molecule
- Large systems show nearly linear scaling through 256 nodes

9. Successes and Limitations
- SCF methods scale very well
- Most methods run in parallel
- Good use is made of aggregate CPU and disk resources
- MP2 and MCSCF methods scale to only a few (8-32) nodes
- The aggregate memory is not utilized, so jobs are still limited by the memory size of one node

10. Second Generation Methods
- New methods should take advantage of the aggregate memory of a parallel system
  - Implies higher communication demands
  - Many-to-many messaging profile
- Methods should scale to hundreds of nodes (at least)
- Demanding local storage needs

11. The Distributed Data Interface (DDI)
DDI provides the core functions needed to treat a portion of the memory on each node as part of a global shared array.
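The basic bookkeeping behind such a global array can be sketched as follows. This is a minimal illustration of the distributed-data idea, assuming a simple column-block layout; the struct and function names are hypothetical, not the DDI API itself.

    /* Sketch of a column-distributed global array (illustrative, not DDI itself).
     * A global nrows x ncols matrix is split into contiguous column blocks, one
     * block resident in the memory of each of nproc compute processes. */
    typedef struct {
        int nrows, ncols;   /* global dimensions            */
        int nproc;          /* processes sharing the array  */
    } dist_array;

    /* Which process owns global column j (assumes ncols >= nproc). */
    int owner_of_col(const dist_array *a, int j)
    {
        int base = a->ncols / a->nproc, extra = a->ncols % a->nproc;
        /* the first 'extra' ranks hold base+1 columns, the rest hold base */
        return (j < extra * (base + 1)) ? j / (base + 1)
                                        : extra + (j - extra * (base + 1)) / base;
    }

    /* First global column stored on rank p, so a remote get/put/accumulate
     * can be translated into (owning rank, local offset) before data moves. */
    int first_col_on_rank(const dist_array *a, int p)
    {
        int base = a->ncols / a->nproc, extra = a->ncols % a->nproc;
        return p * base + (p < extra ? p : extra);
    }

With this layout, get/put/accumulate operations on a patch of the global array reduce to finding the owning rank(s) and local offsets and moving only the requested columns.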

12. DDI
- Runs on top of:
  - MPI (MPI-2 preferred)
  - TCP/IP sockets
- Lightweight: provides only the functionality needed by GAMESS
- Is not intended as a general-purpose library
- Does optimize for mixed SMP and distributed-memory systems
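Since MPI-2 is listed as the preferred transport, the sketch below shows one way a one-sided fetch from another node's slice of a distributed array can be expressed with standard MPI-2 remote memory access. It is an illustration of the mechanism under that assumption, not the actual DDI implementation, and the TCP/IP socket path is not shown.

    /* Sketch: expose each rank's slice of a distributed array through an MPI-2
     * window so another rank can fetch a patch with a one-sided MPI_Get.
     * Illustrative only; not the DDI source. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nproc;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);

        int nrows = 1000, cols_here = 8;                 /* toy local slice */
        MPI_Aint bytes = (MPI_Aint)nrows * cols_here * sizeof(double);
        double *local = calloc((size_t)nrows * cols_here, sizeof(double));
        MPI_Win win;
        MPI_Win_create(local, bytes, sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        /* A DDI_GET-like operation: rank 0 pulls one column owned by rank 1. */
        if (rank == 0 && nproc > 1) {
            double *buf = malloc(nrows * sizeof(double));
            int target = 1, target_col = 0;
            MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
            MPI_Get(buf, nrows, MPI_DOUBLE, target,
                    (MPI_Aint)target_col * nrows, nrows, MPI_DOUBLE, win);
            MPI_Win_unlock(target, win);   /* data is valid after the unlock */
            free(buf);
        }

        MPI_Win_free(&win);
        free(local);
        MPI_Finalize();
        return 0;
    }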

13. New MP2 implementation
- Uses DDI to utilize the aggregate memory of the parallel machine, at the expense of more communication
- Trades some symmetry in the MP2 equations for better parallel scalability
  - Requires more memory than the sequential version
  - Is slower than the sequential version on 1 CPU
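The access pattern this implies can be sketched as below. The helper names and block layout are hypothetical (this is not the GAMESS MP2 code): transformed-integral blocks live in a DDI-style distributed array spanning the aggregate memory, each rank fetches the blocks it is responsible for, accumulates partial pair energies, and one global sum at the end yields the correlation energy.

    /* Sketch of the distributed-memory MP2 pattern (hypothetical helpers,
     * not the actual GAMESS routines). */
    #include <mpi.h>

    /* Stub standing in for a DDI-style remote get: in a real run this would copy
     * a block of the distributed integral array from the node that owns it. */
    static void dist_get_block(int block_id, double *buf, int len)
    {
        (void)block_id;
        for (int k = 0; k < len; ++k) buf[k] = 0.0;   /* placeholder data */
    }

    double mp2_energy(int nblocks, int blen, double *scratch,
                      int rank, int nproc, MPI_Comm comm)
    {
        double e_local = 0.0, e_total = 0.0;

        /* Round-robin distribution of integral blocks over the ranks. */
        for (int ib = rank; ib < nblocks; ib += nproc) {
            dist_get_block(ib, scratch, blen);   /* the communication cost */
            for (int k = 0; k < blen; ++k)       /* stand-in for the actual */
                e_local += scratch[k];           /* pair-energy arithmetic  */
        }

        /* One global sum combines the per-rank partial energies. */
        MPI_Allreduce(&e_local, &e_total, 1, MPI_DOUBLE, MPI_SUM, comm);
        return e_total;
    }

Because the integrals are spread over every node's memory rather than replicated or written to disk, the largest feasible job is limited by the aggregate memory of the machine instead of a single node's memory, at the price of the extra communication noted above.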

14. MP2 Scalability

15. Conclusions
- DDI provides a scalable way of taking advantage of the global memory of a parallel system
- The new MP2 code demonstrates code written specifically for parallel execution, without replacing the sequential version

16. Future Work
- DDI needs further work to enhance its features and increase robustness, or may need to be replaced with a more general library such as the GA (Global Arrays) tools from PNNL
- The global shared-memory approach is being applied to many other parts of GAMESS to increase scalability

17. Thanks!
- David Halstead
- Guy Helmer
For $:
- IBM Corp. for an SUR grant (of 15 workstations)
- DOE MICS program (interconnects and 7 workstations)
- Air Force OSR (long-term development funding)
- DOD CHSSI program (improved parallelization)