
Slide 1: Hash-Join
- Hybrid hash join
  - R: 71k rows x 145 B
  - S: 200k rows x 165 B
  - TPC-D lineitem, part
- Clusters benefit from one-pass algorithms
- IDISK benefits from more processors, faster network
- IDISK speedups:
  - NCR: 1.2X
  - Compaq: 5.9X
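The one-pass build/probe structure that favors clusters can be sketched as follows. This is an illustrative in-memory hash join in Python, not the measured IDISK implementation; a full hybrid hash join would additionally partition both inputs and spill oversized partitions to disk, and the row layouts here are invented for the example.

```python
# Sketch of the build/probe core of a hash join (illustrative only).
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    # Build phase: hash the smaller relation (R) on its join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: stream the larger relation (S) once, emitting matches.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**match, **row}

# Hypothetical rows standing in for part (R) and lineitem (S).
R = [{"pk": 1, "name": "bolt"}, {"pk": 2, "name": "nut"}]
S = [{"pk": 1, "qty": 10}, {"pk": 1, "qty": 5}, {"pk": 3, "qty": 7}]
result = list(hash_join(R, S, "pk", "pk"))
# Three S rows are probed; the two with key 1 find a match.
```

When the build side fits in memory, the join reads each input exactly once, which is why per-disk processors with enough memory avoid the extra partitioning passes.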

Slide 2: Other Uses for IDISK
- Software RAID
- Backup accelerator
  - High-speed network connecting to tapes
  - Compression to reduce data sent, saved
- Performance monitor
  - Seek analysis, related accesses, hot data
- Disk data movement accelerator
  - Optimize layout without using CPU, buses
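As a concrete illustration of the software-RAID item above, here is a minimal sketch of RAID-4/5-style XOR parity, which lets any single lost data block be rebuilt from the survivors. The block contents are toy values, not anything from the slides.

```python
# Toy XOR-parity sketch of the software-RAID idea (RAID-4/5 style).
def parity(blocks):
    # XOR all equal-sized blocks together byte by byte.
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

data = [b"disk", b"node", b"IRAM"]   # equal-sized data blocks
p = parity(data)                      # parity block stored on another disk
# Reconstruct a "failed" block from the surviving blocks plus parity.
recovered = parity([data[0], data[2], p])
```

Because XOR is its own inverse, recombining the survivors with the parity block yields exactly the missing block, which is the whole trick behind single-disk-failure recovery.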

Slide 3: IDISK App: Network-attached web, files
- Snap!Server: plug in Ethernet 10/100 & power cable, turn on
- 32-bit CPU, flash memory, compact multitasking OS, SW update from Web
- Network protocols: TCP/IP, IPX, NetBEUI, and HTTP (Unix, Novell, M/S, Web)
- 1 or 2 EIDE disks
- 6 GB $950, 12 GB $1727 (7 MB/$, 14¢/MB)
Source: www.snapserver.com, www.cdw.com

Slide 4: Related Work
[Chart: x-axis = more disks; y-axis = more {memory, CPU speed, network} per disk]
- CMU "Active Disks": small app functions, e.g., scan
- UCSB "Active Disks": medium app functions, e.g., image processing
- UCB "Intelligent Disks": general purpose
Source: Erik Riedel, Garth Gibson, Christos Faloutsos, CMU, VLDB '98; Anurag Acharya et al., UCSB T.R.

Slide 5: IDISK Summary
- IDISK less expensive by 2X to 10X, faster by 2X to 12X?
  - Need more realistic simulation, experiments
- IDISK scales better as the number of disks increases, as needed by Greg's Law
- Fewer layers of firmware and buses; less controller overhead between processor and data
- IDISK not limited to database apps: RAID, backup, network-attached storage, ...
- Near a strategic inflection point?

Slide 6: Messages from Architect to Database Community
- Architects want to study databases; why ignored?
  - Need company OK before publishing! ("DeWitt" clause)
  - DB industry and researchers can fix this if they want better processors
  - SIGMOD/PODS join FCRC?
- Disk performance opportunity: minimize seek and rotational latency; utilize space vs. spindles
- Think about smaller-footprint databases: PDAs, IDISKs, ...
  - Legacy code a reason to avoid virtually all innovations???
  - Need a more flexible/new code base?

Slide 7: Acknowledgments
- Thanks for feedback on the talk: M/S BARC (Jim Gray, Catharine van Ingen, Tom Barclay, Joe Barrera, Gordon Bell, Jim Gemmell, Don Slutz) and the IRAM group (Krste Asanovic, James Beck, Aaron Brown, Ben Gribstad, Richard Fromm, Joe Gebis, Jason Golbus, Kimberly Keeton, Christoforos Kozyrakis, John Kubiatowicz, David Martin, David Oppenheimer, Stelianos Perissakis, Steve Pope, Randi Thomas, Noah Treuhaft, and Katherine Yelick)
- Thanks for research support: DARPA, California MICRO, Hitachi, IBM, Intel, LG Semicon, Microsoft, NeoMagic, SGI/Cray, Sun Microsystems, TI

Slide 8: Questions?
Contact us if you're interested:
email: patterson@cs.berkeley.edu
http://iram.cs.berkeley.edu/

Slide 9: 1970s != 1990s

1970s                                  1990s
Scan only                              Whole database code
Limited communication between disks    High-speed communication between disks
Custom hardware                        Optional intelligence added to standard
                                       disk (e.g., Cheetah track buffer)
Custom OS                              Commodity OS
Invent new algorithms                  20 years of development
Only for databases                     Useful for WWW, file servers, backup

Slide 10: Stonebraker's Warning
"The history of DBMS research is littered with innumerable proposals to construct hardware database machines to provide high performance operations. In general these have been proposed by hardware types with a clever solution in search of a problem on which it might work."
Source: Readings in Database Systems (second edition), edited by Michael Stonebraker, p. 603

Slide 11: Grove's Warning
"...a strategic inflection point is a time in the life of a business when its fundamentals are about to change.... Let's not mince words: A strategic inflection point can be deadly when unattended to. Companies that begin a decline as a result of its changes rarely recover their previous greatness."
Source: Only the Paranoid Survive, Andrew S. Grove, 1996

Slide 12: Clusters of PCs?
- 10 PCs/rack = 20 disks/rack
- 1312 disks / 20 = 66 racks, 660 PCs
- 660 / 16 = 42 100-Mbit Ethernet switches + 9 1-Gbit switches
- 72 racks / 4 = 18 UPS
- Floor space (aisles between racks to access, repair PCs): 72 / 8 x 120 = 1100 sq. ft.
- HW, assembly cost: ~$2M
- Quality of equipment? Repair? System admin.?
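The sizing arithmetic above can be checked mechanically. A small sketch using ceiling division, assuming (as the slide appears to) that the 72-rack figure adds switch and other non-PC racks to the 66 PC racks:

```python
# Re-deriving the cluster-of-PCs sizing from the slide's assumptions.
import math

DISKS = 1312
disks_per_rack = 20                          # 10 PCs/rack x 2 disks/PC
racks = math.ceil(DISKS / disks_per_rack)    # 66 PC racks
pcs = racks * 10                             # 660 PCs
switches_100mb = math.ceil(pcs / 16)         # 42 100-Mbit switches
total_racks = 72                             # slide's total incl. non-PC racks
ups = total_racks // 4                       # 18 UPS units
floor_sqft = total_racks / 8 * 120           # 1080; slide rounds to ~1100
```

The numbers line up with the slide once the divisions are rounded up, which suggests the slide's 1100 sq. ft. is simply the 1080 result rounded.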

Slide 13: Today's Situation: Microprocessor

MIPS MPUs               R5000      R10000        10k/5k
Clock rate              200 MHz    195 MHz       1.0x
On-chip caches          32K/32K    32K/32K       1.0x
Instructions/cycle      1 (+ FP)   4             4.0x
Pipe stages             5          5-7           1.2x
Model                   in-order   out-of-order  ---
Die size (mm^2)         84         298           3.5x
  w/o cache, TLB        32         205           6.3x
Development (man-yr.)   60         300           5.0x
SPECint_base95          5.7        8.8           1.6x

Slide 14: Potential Energy Efficiency: 2X-4X
- Case study: StrongARM memory hierarchy vs. IRAM memory hierarchy
  - Cell-size advantages -> much larger cache -> fewer off-chip references -> up to 2X-4X energy efficiency for memory
  - Less energy per bit access for DRAM
- Memory cell area ratio by process (P6, Alpha 21164, StrongARM): cache in logic process : SRAM in SRAM process : DRAM in DRAM process = 20-50 : 8-11 : 1

Slide 15: Today's Situation: DRAM
[Chart: DRAM market revenue, $16B vs. $7B]
- Intel: 30%/year since 1987; 1/3 income profit

Slide 16: Mbits per square inch: DRAM as % of Disk over time
[Chart, 1974-1998, y-axis 0%-40%; DRAM vs. disk areal density data points:]
- 1974: 0.2 vs. 1.7 Mb/sq. in.
- 9 vs. 22 Mb/sq. in.
- 1998: 470 vs. 3000 Mb/sq. in.
Source: New York Times, 2/23/98, page C3, "Makers of disk drives crowd even more data into even smaller spaces"
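Converting the chart's three data points into the plotted percentages (the middle point's year is not recoverable from the transcript, so it is left unlabeled):

```python
# DRAM areal density as a fraction of disk areal density (Mb/sq. in.).
points = {
    "1974": (0.2, 1.7),
    "mid":  (9, 22),     # year not given in the transcript
    "1998": (470, 3000),
}
ratios = {year: dram / disk for year, (dram, disk) in points.items()}
# ~12% in 1974, peaking around 41%, back down to ~16% by 1998:
# DRAM gained on disk, then disk areal density pulled ahead again.
```

The hump-then-decline shape is the point of the chart: disk density growth re-accelerated in the 1990s.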

Slide 17: What about I/O?
- Current system architectures have limitations
- I/O bus performance lags other components
- Parallel I/O bus performance scaled by increasing clock speed and/or bus width
  - E.g., 32-bit PCI: ~50 pins; 64-bit PCI: ~90 pins
  - Greater number of pins -> greater packaging costs
- Are there alternatives to parallel I/O buses for IRAM?

Slide 18: Serial I/O and IRAM
- Communication advances: fast (Gbit/s) serial I/O lines [YangHorowitz96], [DallyPoulton96]
  - Serial lines require 1-2 pins per unidirectional link
  - Access to standardized I/O devices:
    - Fibre Channel-Arbitrated Loop (FC-AL) disks
    - Gbit/s Ethernet networks
- Serial I/O lines are a natural match for IRAM
- Benefits:
  - Serial lines provide high I/O bandwidth for I/O-intensive applications
  - I/O bandwidth incrementally scalable by adding more lines
    - Number of pins required still lower than a parallel bus
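A rough bandwidth-per-pin comparison illustrates why serial links beat widening the parallel bus. The 64-bit/66-MHz PCI figure and the 1 Gbit/s-per-link rate are assumptions chosen for illustration, not numbers from the slide:

```python
# Bandwidth per pin: parallel PCI bus vs. Gbit/s serial links (rough).
pci_bw_mb = 64 / 8 * 66            # 528 MB/s peak for 64-bit, 66 MHz PCI
pci_pins = 90                      # ~90 pins per the slide
serial_link_mb = 1000 / 8          # 125 MB/s per 1 Gbit/s link
serial_pins_per_link = 2           # one differential pair, one direction

pci_per_pin = pci_bw_mb / pci_pins                       # ~5.9 MB/s/pin
serial_per_pin = serial_link_mb / serial_pins_per_link   # 62.5 MB/s/pin
# Each added serial link scales I/O bandwidth incrementally, at roughly
# 10x the bandwidth per pin of the parallel bus.
```

Under these assumptions, pin count (and hence packaging cost) grows far more slowly per unit of bandwidth on the serial side, which is the slide's scalability argument.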

