A High-Performance Scalable Graphics Architecture
Daniel R. McLachlan
Director, Advanced Graphics Engineering, SGI

Growth in Model Sizes
[Chart: worldwide production of information, in exabytes. Source: Gartner.]
Images courtesy of Parametric Technology Corporation; Photodisc; and Magic Earth, LLC.

Problems Are Getting Increasingly Complex Over Time
- Crash models: from the bumper alone, to bumper, hood, engine, and wheels, to the entire car
- Dummy models: from the crash dummy, to the e-crash dummy, to organ damage
Images courtesy of EAI; SCI Institute; NLM; Theoretical Biophysics Group of the Beckman Institute at UIUC; and Livermore Software Technology Corporation.

The Complexity of the Simple
- Potato chips
- Diapers
Images courtesy of Procter & Gamble.

Graphics Cards Are Outpacing PC Architecture and Bandwidth
[Chart: the widening performance gap; graph based on relative scale.]

Visualization Breaks the Cognitive Barrier for Better Decisions
[Chart: performance vs. time, 1992 to 2003, comparing clusters, graphics cards, and VAN.]
- Clusters and commodity graphics: low cost, fast simple polygons, single-screen image quality
- High-end visualization: extreme resolution, absolute visual quality
- Addressing real needs: solving complex problems, visualizing dense data sets
Images courtesy of Advantage CFD; SCI Institute; NLM; Theoretical Biophysics Group of the Beckman Institute at UIUC; Laboratory for Atmospheres, NASA Goddard Space Flight Center; Donghoon Shin, Art Center College of Design; NVIDIA Corporation; ATI Technologies, Inc.; and Nintendo Co., Ltd.

Cluster Comparison
Pros:
- Cheap
- Industry standard
- High display-list performance
- Good for "embarrassingly parallel" problems
- Can potentially scale to 1000s of processors
Cons:
- Cumbersome to program
- High administration costs
- Few applications for visualization
- Difficult to scale for large problems
- Difficult to dynamically load balance
- Lack of software productivity tools
- Often requires data replication
- Reliability concerns
- Limited to a 2GB memory space

The Benefits of Shared Memory
[Diagram: traditional clusters (1-2 CPUs per node, each node with its own memory and OS, joined by a commodity interconnect) vs. SGI® NUMAflex™ (fewer than 64 CPUs per node, a fast NUMAflex™ interconnect, and global shared memory).]
What is shared memory?
- All nodes operate on one large shared memory space, instead of each node having its own small memory space.
Shared memory is high-performance:
- All nodes can access one large memory space efficiently, so complex communication and data passing between nodes aren't needed.
- Big data sets fit entirely in memory, so less disk I/O is needed.
Shared memory is cost-effective and easy to deploy:
- It requires less memory per node, because large problems can be solved in one big shared memory.
- Simpler programming means lower tuning and maintenance costs.

How SGI® Onyx® Enables the Role: System at a Glance
[Diagram: an SGI Onyx system linking scalable compute and large memory, scalable graphics, scalable rendering, and scalable data through compositors and networks.]
- Scalable graphics I/O
- Scalable disk I/O
- Scalable resolution
- Scalable interaction
- Appropriate delivery
- Large data sets

Changing the Application Paradigm
Moving from a fixed rendering path... to a scalable and programmable rendering path.
- Application accelerators
- Programmable geometry
Silicon Graphics® Onyx4™ UltimateVision™
Images courtesy of Pratt and Whitney Canada and Magic Earth, LLC.

Scaling: A Shift in Pipe Paradigm
1. Screen-based decomposition
2. Eye-based decomposition
3. Time-based decomposition
4. Data-based decomposition
Even more powerful in combination: all modes can be used separately or combined in any number of ways.
Data courtesy of DaimlerChrysler; images courtesy of MAK; Visible Human public data set.

Compositor Flexibility
Multi-tier composition:
- Composite the output of multiple compositors, e.g., a first layer does 2D composition and a second layer does anti-aliasing.
Visual serving:
- Composited output is sent to workstations for viewing and/or editing.

Silicon Graphics® Onyx4™ UltimateVision™ System Architecture
[Diagram: each node pairs CPUs and 8GB RAM with a memory controller driving 2 graphics pipes, plus standard I/O or 2 optional additional graphics pipes; nodes are linked by SGI® NUMA scalability.]

Conclusion: Silicon Graphics® Onyx4™ UltimateVision™
Solving bigger and more complex problems:
- World's most scalable visualization system: up to 32 GPUs in a single-system-image (SSI) architecture
- World-leading computational capability: up to 64 CPUs per node, scalable to 1024 processors
- Solves the system-bandwidth limitations of PCs and clusters: up to 8 NUMAlink 3 connections to a single shared memory pool
- New-generation programmable graphics architecture: OpenGL Shading Language