Published by Laura Sanders. Modified over 9 years ago.
1
Performance Tradeoffs in Read-Optimized Databases
Stavros Harizopoulos, MIT CSAIL
Joint work with: Velen Liang, Daniel Abadi, and Sam Madden
© Stavros Harizopoulos 2006, Massachusetts Institute of Technology
2
Read-optimized databases
Row stores: SQL Server, DB2, Oracle
Column stores: Sybase IQ, MonetDB, CStore
Read optimizations: materialized views, multiple indices, compression
How does column-orientation affect performance?
[Figure: the same table (1 Joe 45 …; 2 Sue 37 …) laid out row by row and column by column]
3
Rows vs. columns
Row data: single file (1 Joe 45; 2 Sue 37; …); project to get (Joe, 45)
Column data: 3 files (1, 2, …; Joe, Sue, …; 45, 37, …); seek, then reconstruct (Joe, 45)
Study performance tradeoffs solely in data storage
4
Performance study
Methodology
–Built storage manager from scratch
–Sequential scans
–Analyze CPU, disk, memory
Findings
–Columns are generally more I/O efficient
–Competing traffic favors columns
–Conditions where columns are CPU-constrained
–Conditions where rows are MemBW-constrained
5
Talk outline
–System architecture
–Workload and Experiments
–Analysis
–Conclusions
6
System architecture
Block-iterator operators
–Single-threaded, C++, Linux AIO
No buffer pool
–Use filesystem, bypass OS cache
Compression
Dense-pack
[Figure: pages 60% full vs. 100% full]
7
Compression methods
Dictionary
–Example: … ‘low’ … ‘high’ … ‘low’ … ‘normal’ … is stored as … 00 … 10 … 00 … 01 …
Bit-pack
–Pack several attributes inside a 4-byte word
–Use as many bits as the max value needs
Delta
–Base value per page
–Arithmetic differences
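The three methods on the slide above can be sketched in a few lines. This is a minimal illustration, not the storage manager's code: all function names are hypothetical, the dictionary is chosen so that it reproduces the slide's 00 / 10 / 00 / 01 example, and the delta variant assumes differences are taken from the page's base value (the slide does not say whether they are relative to the base or to the previous value).

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str], codes: Dict[str, int]) -> List[int]:
    """Replace each value with its small integer code."""
    return [codes[v] for v in values]


def bits_needed(max_value: int) -> int:
    """Number of bits required to represent max_value (at least 1)."""
    return max(1, max_value.bit_length())


def bit_pack(values: List[int], bits: int) -> List[int]:
    """Pack fixed-width values into 4-byte words; values never straddle words."""
    per_word = 32 // bits
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for slot, v in enumerate(values[start:start + per_word]):
            if not 0 <= v < (1 << bits):
                raise ValueError(f"{v} does not fit in {bits} bits")
            word |= v << (slot * bits)
        words.append(word)
    return words


def bit_unpack(words: List[int], bits: int, count: int) -> List[int]:
    """Inverse of bit_pack; count is the number of values originally packed."""
    per_word = 32 // bits
    mask = (1 << bits) - 1
    out = [(w >> (slot * bits)) & mask for w in words for slot in range(per_word)]
    return out[:count]


def delta_encode(page: List[int]) -> Tuple[int, List[int]]:
    """Assumed variant: store the page's first value as base, then value - base."""
    base = page[0]
    return base, [v - base for v in page]


def delta_decode(base: int, deltas: List[int]) -> List[int]:
    return [base + d for d in deltas]


# Assumed dictionary matching the slide's example codes.
codes = {"low": 0, "normal": 1, "high": 2}
encoded = dictionary_encode(["low", "high", "low", "normal"], codes)
width = bits_needed(max(codes.values()))
words = bit_pack(encoded, width)
```

Dictionary codes are small integers, so they combine naturally with bit-packing: here four 2-bit codes fit in a single 4-byte word with room for twelve more.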
8
Storage engine
Query: SELECT name, age WHERE age > 40
Row scanner
–Scan (S) the row file, apply predicate(s), output (Joe, 45, …)
Column scanner
–Scan (S) the age column, apply predicate #1, output (#POS, 45) pairs
–Scan (S) the name column at those positions, output (Joe, 45, …)
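The difference between the two scanners on the slide above can be illustrated with in-memory lists standing in for files. This is a sketch under that assumption; `row_scan`, `column_scan`, and the sample data layout are hypothetical names introduced for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple


def row_scan(rows: List[Dict[str, Any]],
             predicate: Callable[[Dict[str, Any]], bool],
             project: List[str]) -> List[Tuple[Any, ...]]:
    """Read every full tuple, apply the predicate, then project the attributes."""
    return [tuple(row[c] for c in project) for row in rows if predicate(row)]


def column_scan(columns: Dict[str, List[Any]],
                pred_col: str,
                predicate: Callable[[Any], bool],
                project: List[str]) -> List[Tuple[Any, ...]]:
    """Scan only the predicate column first and emit qualifying positions,
    then fetch the other selected columns at those positions."""
    positions = [pos for pos, v in enumerate(columns[pred_col]) if predicate(v)]
    return [tuple(columns[c][pos] for c in project) for pos in positions]


# The slide's two-tuple example, stored both ways.
rows = [{"id": 1, "name": "Joe", "age": 45},
        {"id": 2, "name": "Sue", "age": 37}]
columns = {"id": [1, 2], "name": ["Joe", "Sue"], "age": [45, 37]}

# SELECT name, age WHERE age > 40
from_rows = row_scan(rows, lambda r: r["age"] > 40, ["name", "age"])
from_cols = column_scan(columns, "age", lambda age: age > 40, ["name", "age"])
```

Both calls return the same result; the column version touches only the `age` and `name` lists, which is the source of its I/O advantage when few attributes are selected.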
9
Platform
CPU: 3.2GHz
L2 cache: 1MB
RAM: 1GB, 3.2 GB/sec
Disks (striped): 180 MB/sec, direct IO
Prefetching:
–Disk: 100ms read, 10ms seek
–L2 cache prefetching: read 128 bytes
10
Workload
LINEITEM (wide)
–60m rows → 9.5 GB
ORDERS (narrow)
–60m rows → 1.9 GB
Query
–SELECT a1, a2, a3, … WHERE a1 yields variable selectivity
[Figure labels: 150 bytes, 50 bytes, 32 bytes, 12 bytes]
11
Wide tuple: 10% selectivity
Large prefetch hides disk seeks in columns
[Figure: time (sec) vs. selected bytes per tuple; series: Row, Row (CPU only), Column, Column (CPU only); attribute labels: 25B, 10B, 69B, int 4B, text, char 1B]
12
Wide tuple: 10% sel. (CPU)
Row-CPU suffers from memory stalls
[Figure: time (sec) vs. # attributes selected; series: row store, column store]
13
Wide tuple: 10% and 0.1% sel. (CPU)
Column-CPU efficiency with lower selectivity
[Figure: time (sec) vs. # attributes selected; series: row store, column store]
14
Narrow tuple: 10% selectivity
Memory stalls disappear in narrow tuples
Compression: similar to narrow (not shown)
[Figure: time (sec) vs. selected bytes per tuple and vs. # attributes selected; series: row store, column store]
15
Varying prefetch size
No prefetching hurts columns in single scans
[Figure: time (sec) vs. selected bytes per tuple, no competing disk traffic; series: Row (any prefetch size), Column 48 (x 128KB), Column 16, Column 8, Column 2]
16
Varying prefetch size
No prefetching hurts columns in single scans
Under competing traffic, columns outperform rows for any prefetch size
[Figure: time (sec) vs. selected bytes per tuple; panels: no competing disk traffic, with competing disk traffic]
17
Analysis
Central parameter in analysis: cycles per disk byte (cpdb)
What can it model:
–More / fewer disks
–More / fewer CPUs
–CPU / disk competing traffic
Trends in cpdb:
–10 → 30 from 1995 to 2006
–Further increase with multicore chips
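As a rough worked example, cpdb can be computed from the figures on the Platform slide. The slide names the parameter but does not spell out a formula, so the definition below (aggregate CPU cycles per second divided by aggregate disk bytes per second) is an assumption, as are the function name and the choice of 1 MB = 10^6 bytes.

```python
def cycles_per_disk_byte(clock_hz: float, num_cpus: int,
                         disk_bytes_per_sec: float) -> float:
    """Assumed definition: aggregate CPU cycles/sec over aggregate disk bytes/sec."""
    return clock_hz * num_cpus / disk_bytes_per_sec


# Platform slide: one 3.2GHz CPU, 180 MB/sec from the striped disks.
baseline = cycles_per_disk_byte(3.2e9, 1, 180e6)

# Competing disk traffic that halves the bandwidth available to the scan
# (hypothetical scenario) doubles cpdb; so does adding a second CPU.
competing = cycles_per_disk_byte(3.2e9, 1, 90e6)
two_cpus = cycles_per_disk_byte(3.2e9, 2, 180e6)

print(f"baseline cpdb: {baseline:.1f}")
```

Under this definition the test platform sits at roughly 18 cycles per disk byte, and the three things the slide says cpdb can model (disks, CPUs, competing traffic) all enter as changes to the numerator or the denominator.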
18
Analysis
Rows favored by narrow tuples and low cpdb
–Disk-bound workloads have higher cpdb
[Figure: speedup of cols over rows as a function of tuple width and cycles per disk byte (cpdb), at 10% selectivity and 50% projection; bands: 0.4 – 0.8, 0.8 – 1.2, 1.2 – 1.6, 1.6 – 2, 2]
19
See our paper for the rest
–CPU time breakdowns, L2 prefetcher
–Disk prefetching implementation
–Compression results
–Non-pipelined column scanner
–Analysis
20
Conclusions
Given enough space for prefetching, columns outperform rows in most workloads
Competing traffic favors columns
Memory-bandwidth bottleneck in rows
Future work
–Column scanners, random I/O, write performance
21
Thank you
db.csail.mit.edu/projects/cstore
22
Analysis
parameter: what it can model
–SizeFile, TupleWidth: various DB schemas
–MemBytesCycle: memory bus speed
–f: # of selected attributes
–I: CPU work
–cpdb (cycles per disk byte): more / fewer disks, more / fewer CPUs, CPU / disk competing traffic