1
Workshop on Commodity-Based Visualization Clusters
Learning From the Stanford/DOE Visualization Cluster
Mike Houston, Greg Humphreys, Randall Frank, Pat Hanrahan
2
Outline
Stanford’s current cluster
  – Design decisions
  – Performance evaluation
  – Bottleneck evaluation
Cluster “Landscape”
  – General classification
  – Bottleneck evaluation
Stanford’s next cluster
  – Design goals
  – Research directions
3
Stanford/DOE Visualization Cluster
The Chromium Cluster
4
Cluster Configuration (Jan. 2000)
Cluster: 32 graphics nodes + 4 server nodes
Computer: Compaq SP750
  – 2 processors (800 MHz PIII Xeon, 133 MHz FSB)
  – i840 core logic (big issue for vis-clusters)
      Simultaneous fast graphics and networking
      Network: 64-bit, 66 MHz PCI
      Graphics: AGP-4x
  – 256 MB memory
  – 18 GB SCSI 160 disk (+ 3*36 GB on servers)
Graphics (Sept. 2002)
  – 16 NVIDIA GeForce3 w/ DVI (64 MB)
  – 16 NVIDIA GeForce4 TI4200 w/ DVI (128 MB)
Network
  – Myrinet 64-bit, 66 MHz (LANai 7)
5
Graphics Evaluation
NVIDIA GeForce3
  – 25 MTri/s triangle rate observed
  – 680 MPix/s fill rate observed
NVIDIA GeForce4
  – 60 MTri/s triangle rate observed
  – 800 MPix/s fill rate observed
Read Pixels performance
  – 35 MPix/s (140 MB/s) RGBA
  – 22 MPix/s (87 MB/s) Depth
Draw Pixels performance
  – 45 MPix/s (180 MB/s) RGBA
  – 21 MPix/s (85 MB/s) Depth
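The MB/s figures in parentheses follow from the pixel rates once a pixel size is fixed. A minimal sketch of that arithmetic, assuming 4 bytes per RGBA pixel (the slide does not state the pixel or depth format) and using illustrative function names that are not part of any library:

```python
def pixel_rate_to_mb_per_s(mpix_per_s, bytes_per_pixel=4):
    """Convert a pixel rate in MPix/s to a bandwidth in MB/s."""
    return mpix_per_s * bytes_per_pixel


def max_readback_fps(mpix_per_s, width, height):
    """Upper bound on frames/s if every pixel of a width x height tile is read back."""
    return mpix_per_s * 1e6 / (width * height)


rgba_read = pixel_rate_to_mb_per_s(35)  # slide quotes 140 MB/s
rgba_draw = pixel_rate_to_mb_per_s(45)  # slide quotes 180 MB/s

# Readback alone caps a full 1024x768 tile at well under 60 frames/s.
tile_fps = max_readback_fps(35, 1024, 768)
print(rgba_read, rgba_draw, round(tile_fps, 1))
```

The depth figures (22 MPix/s at 87 MB/s, 21 MPix/s at 85 MB/s) are also close to 4 bytes per pixel, which suggests a 32-bit depth transfer, but that is an inference from the numbers rather than something the slide states.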
6
Network Evaluation
Myrinet LANai 7 PCI64A boards
  – Theoretical limit: 160 MB/s
  – 142 MB/s observed peak under Linux
  – ~100 MB/s observed sustained under Linux
ServerNet not chosen
  – Driver support
  – Large switching infrastructure required
Gigabit Ethernet
  – Performance and scalability concerns
7
Myrinet Issues
Fairness: Clients starved of network resources
  – Implemented credit scheme to minimize congestion
Lack of buffering in switching fabric
  – Causes poor performance in high load conditions
  – Open issue
[Figures: Partitioned Cluster; Unpartitioned Cluster]
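The slide does not describe the credit scheme itself. As a rough illustration of the general idea behind credit-based flow control (not the Stanford implementation), a sender can be limited to as many in-flight packets as the receiver has free buffers, with a credit returned each time the receiver drains one. The class and method names below are hypothetical:

```python
from collections import deque


class CreditedLink:
    """Toy model of one sender/receiver pair using credit-based flow control."""

    def __init__(self, credits):
        self.credits = credits          # free receive buffers the sender may use
        self.receiver_queue = deque()   # packets waiting at the receiver

    def try_send(self, packet):
        """Send only if a credit is available; otherwise the sender must wait."""
        if self.credits == 0:
            return False
        self.credits -= 1
        self.receiver_queue.append(packet)
        return True

    def receiver_consume(self):
        """Receiver processes one packet and returns its credit to the sender."""
        if not self.receiver_queue:
            return None
        packet = self.receiver_queue.popleft()
        self.credits += 1
        return packet


link = CreditedLink(credits=2)
sent = [link.try_send(p) for p in ("a", "b", "c")]  # third send is refused
first = link.receiver_consume()                      # frees one credit
retry = link.try_send("c")                           # now succeeds
print(sent, first, retry)
```

Because a sender can never have more packets outstanding than the receiver can buffer, a single busy client cannot flood a server's receive buffers, which is one common way to address the kind of starvation the slide mentions.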
8
i840 Chipset Evaluation
66 MHz 64-bit PCI performance not full speed:
  – 210 MB/s PCI read (40% of theoretical peak)
  – 288 MB/s PCI write (54% of theoretical peak)
  – Combined read/write ~121 MB/s
AGP
  – Fast Writes / Side Band Addressing unstable under Linux
9
Sort-First Performance
Configuration
  – Application runs on client
  – Primitives distributed to servers
Tiled Display
  – 4x3 @ 1024x768
  – Total resolution: 4096x2304, 9 Megapixel
Quake 3
  – 50 fps
Atlantis
  – 450 fps
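In a sort-first configuration, each primitive is sent only to the servers whose tiles it can touch. A minimal sketch of that bucketing step for the 4x3 wall of 1024x768 tiles, assuming primitives are already reduced to integer screen-space bounding boxes; the function name and the half-open box convention are illustrative choices, not taken from WireGL or Chromium:

```python
TILE_W, TILE_H = 1024, 768
COLS, ROWS = 4, 3

# Total wall resolution implied by the tile layout.
wall_w, wall_h = COLS * TILE_W, ROWS * TILE_H
wall_pixels = wall_w * wall_h


def tiles_for_bbox(x0, y0, x1, y1):
    """Return (col, row) for every tile overlapped by the box [x0, x1) x [y0, y1)."""
    c0 = max(0, x0 // TILE_W)
    c1 = min(COLS - 1, (x1 - 1) // TILE_W)
    r0 = max(0, y0 // TILE_H)
    r1 = min(ROWS - 1, (y1 - 1) // TILE_H)
    return [(c, r) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


print(wall_w, wall_h, wall_pixels)
# A box straddling the corner shared by four tiles goes to all four servers.
print(tiles_for_bbox(1000, 700, 1100, 800))
# A box entirely off the wall goes to none.
print(tiles_for_bbox(5000, 100, 5100, 200))
```

Primitives that straddle tile boundaries are sent more than once, which is part of why primitive distribution loads the network and bus in this configuration.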
10
Sort-Last Performance
Configuration
  – Parallel rendering on multiple nodes
  – Composite to final display node
Volume Rendering on 16 nodes
  – 1.57 GVox/s [Humphreys 02]
  – 1.82 GVox/s (tuned) 9/02
  – 256x256x1024 volume [1] rendered twice
[1] Data courtesy of G. A. Johnson, G. P. Cofer, S. L. Gewalt, and L. W. Hedlund from the Duke Center for In Vivo Microscopy (an NIH/NCRR National Resource)
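The "composite to final display node" step merges full-size images rendered by different nodes. The slide does not say which compositing operator was used for the volume-rendering result, so the sketch below shows only the simplest generic case, a per-pixel depth test between two images stored as flat lists; the function name and data layout are hypothetical:

```python
def depth_composite(color_a, depth_a, color_b, depth_b):
    """Merge two images pixel by pixel, keeping whichever fragment is nearer.

    Smaller depth means closer to the viewer; ties keep image A's pixel.
    """
    out_color, out_depth = [], []
    for ca, da, cb, db in zip(color_a, depth_a, color_b, depth_b):
        if da <= db:
            out_color.append(ca)
            out_depth.append(da)
        else:
            out_color.append(cb)
            out_depth.append(db)
    return out_color, out_depth


# Three-pixel example: node A wins pixel 0, node B wins pixel 1, pixel 2 ties.
colors, depths = depth_composite(
    ["a0", "a1", "a2"], [0.2, 0.9, 0.5],
    ["b0", "b1", "b2"], [0.5, 0.1, 0.5],
)
print(colors, depths)
```

Each such merge requires both color and depth to be read back from one graphics card, shipped over the network, and drawn or compared on another node, so every pixel of every partial image crosses the read/draw path and the interconnect.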
11
Cluster Accomplishments
Development Platform
  – WireGL
  – Chromium
Cluster configuration replicated
Interactive Performance
  – 256x512x1024 volume @ 15fps
  – 9 Megapixel Quake3 @ 50fps
12
Sources of Bottlenecks
Sort-First
  – Packing speed (processor)
  – Primitive distribution (network and bus)
  – Rendering (processor and graphics chip)
Sort-Last
  – Rendering (graphics chip)
  – Composite (network, bus, and read/draw pixels)
13
Bottleneck Evaluation – Stanford
Sort-First: Processor and Network
Sort-Last: Network and Read/Draw
14
The Landscape of Graphics Clusters
Many Options
  – Low End <$2500/node
  – Mid End ~$5000/node
  – High End >$7500/node
Tradeoffs
  – Different bottlenecks
  – Price/Performance
  – Scalability
  – Usage
Evaluation
  – Based on published benchmarks and specs
15
Cluster Interconnect Options
Many choices
  – GigE: ~100 MB/s
  – Myrinet 2000 (http://www.myrinet.com): 245 MB/s
  – SCI/Dolphin (http://www.dolphinics.com): 326 MB/s
  – Quadrics (http://www.quadrics.com): 340 MB/s
Future options
  – 10 GigE
  – Infiniband
  – HyperTransport
16
Low End
General Definition
  – Single CPU
  – Consumer mainboard
  – Integrated graphics
  – High speed commodity network
Example Node Configuration
  – Nvidia NForce2
  – AMD Athlon 2400+
  – 512 MB DDR
  – GigE and 10/100
  – 1U rack chassis
  – Estimated Price: $1500
17
Bottleneck Evaluation – Low End
Bus/Network limited
18
Mid End
General Definition
  – Dual processor
  – “Workstation” mainboard
  – High performance bus
      64-bit PCI or PCI-X
  – High speed commodity / low end cluster interconnect
  – High-end consumer graphics board
Example Node Configuration
  – Intel i860
  – Dual Intel P4 Xeon 2.4 GHz
  – 2 GB RDRAM
  – ATI Radeon 9700
  – GigE onboard + Myrinet 2000
  – 2U rack chassis
  – Estimated Price: $4000
19
Bottleneck Evaluation – Mid End
Sort-First: Network limited
Sort-Last: Read/Draw and Network limited
20
High End
General Definition
  – Dual or quad processor
  – Cutting edge bus
      PCI-X, HyperTransport, PCI Enhanced
  – High speed commodity / high end cluster interconnect
  – “Professional” graphics board
  – RAID system
Example Node Configuration
  – ServerWorks GC-WS
  – Dual P4 Xeon 2.6 GHz
  – Nvidia Quadro4 900XGL
  – 4 GB DDR
  – GigE onboard + Infiniband
  – Estimated Price: $7500
21
Bottleneck Evaluation – High End
Sort-First: Well balanced
Sort-Last: Read/Draw limited
22
Balanced System is Key
Only as fast as slowest component
  – Spend money where it matters!
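"Only as fast as slowest component" can be expressed as taking the minimum over the stages of a pipeline. A minimal sketch using the measured rates quoted earlier in the deck for the current Stanford cluster's sort-last composite path (RGBA read pixels, sustained Myrinet, RGBA draw pixels); treating these three stages as one serial pipeline is a simplifying assumption, and the function name is illustrative:

```python
def bottleneck(stages):
    """Return (name, rate) of the slowest stage in a {name: MB/s} pipeline."""
    name = min(stages, key=stages.get)
    return name, stages[name]


# Rates in MB/s, taken from the Graphics Evaluation and Network Evaluation slides.
sort_last_composite = {
    "read pixels (RGBA)": 140,
    "network (Myrinet sustained)": 100,
    "draw pixels (RGBA)": 180,
}

stage, rate = bottleneck(sort_last_composite)
print(stage, rate)
```

Under this model, speeding up any stage other than the limiting one leaves end-to-end throughput unchanged, which is the sense in which money should go "where it matters".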
23
Goals for Next Cluster
Performance
  – Sort-Last
      5 GVox/s
      1 GTri/s
  – Sort-First at 4096x2304
      Quake3 @ >100fps
Research
  – Remote visualization
  – Time-varying datasets
  – Compositing
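To put the 5 GVox/s goal in terms of frame rate, the voxel rate can be divided by the number of voxels rendered per frame. A small sketch, assuming the same 256x512x1024 volume size quoted under Cluster Accomplishments and 1 GVox = 10^9 voxels; the slides do not say which dataset the goal refers to, and the function name is illustrative:

```python
def fps_for_voxel_rate(gvox_per_s, dims):
    """Frames/s achievable at a given voxel rate for a volume of the given dimensions."""
    x, y, z = dims
    return gvox_per_s * 1e9 / (x * y * z)


voxels_per_frame = 256 * 512 * 1024
goal_fps = fps_for_voxel_rate(5, (256, 512, 1024))
print(voxels_per_frame, round(goal_fps, 1))
```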
24
What we plan to build
16 node cluster, 1U nodes
Mainboard chipsets
  – Intel Placer
  – ServerWorks GC-WS
  – AMD Hammer
Memory
  – 2-4 GB
Graphics Chip
  – Nvidia NV30
  – ATI R300/350
Interconnect
  – Infiniband, Quadrics
Disk
  – IDE RAID or SCSI
25
Continuing Chipset Issues
Why do chipsets perform so poorly?
  – “Workstation”
      Intel i860
        – 215 MB/s read (40% of theoretical)
        – 300 MB/s write (56% of theoretical)
      AMD 760MPX
        – 300 MB/s read (56% of theoretical)
        – 312 MB/s write (59% of theoretical)
  – “Server”
      ServerWorks ServerSet III LE
        – 423 MB/s read (79% of theoretical)
        – 486 MB/s write (91% of theoretical)
Why can’t a “server” have an AGP slot?
Performance numbers from http://www.conservativecomputer.com
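The "% of theoretical" figures can be checked against the peak rate of a 64-bit, 66 MHz PCI bus. A minimal sketch, assuming a 66.67 MHz clock and one 8-byte transfer per cycle (about 533 MB/s); the slide does not state the exact peak it used, so the results are only expected to land within about a percentage point of the quoted values:

```python
def pci_peak_mb_per_s(bus_bits=64, clock_mhz=66.67):
    """Peak PCI bandwidth in MB/s: bytes per transfer times transfers per microsecond."""
    return bus_bits / 8 * clock_mhz


# (read MB/s, write MB/s) as quoted on the slide.
measured = {
    "Intel i860": (215, 300),
    "AMD 760MPX": (300, 312),
    "ServerWorks ServerSet III LE": (423, 486),
}

peak = pci_peak_mb_per_s()
fractions = {
    name: (read / peak, write / peak) for name, (read, write) in measured.items()
}

for name, (read_frac, write_frac) in fractions.items():
    print(f"{name}: read {read_frac:.1%}, write {write_frac:.1%} of peak")
```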
26
Ongoing Bottlenecks
Readback performance
  – Will be fixed “soon”
  – Hardware compositing?
Chipset Performance
  – Achieves only a fraction of theoretical peak
  – Need faster buses in commodity chipsets
Network Performance
  – Scalability
  – Fast is VERY expensive
27
Conclusions
What we still need
  – More vendors
  – More chipsets
  – More performance
Graphics Clusters are getting better
  – Chipsets
  – Interconnects
  – Form factor
  – Processing
  – Graphics Chips
Things are really starting to get interesting!