Compressing Very Large Data Sets in Oracle
Luca Canali, CERN
UKOUG Conference, Birmingham, December 1st, 2009
Outline
- Physics databases at CERN and compression
- Use cases for compression and archive for Physics at CERN
- Current experience with 10g
- Advanced compression tests with 11g
- Oracle and compression internals
  - A closer look at how compression is implemented in Oracle
  - BASIC, OLTP, columnar compression
- Appendix: how to play with hybrid columnar without Exadata
CERN and LHC
LHC Data
- LHC data correspond to about 20 million CDs each year!
- [Figure: stack-height comparison - a CD stack with 1 year of LHC data (~20 km) vs. Mt. Blanc (4.8 km), Concorde (15 km), and a balloon (30 km)]
- RDBMS play a key role in the analysis of LHC data
Why Compression?
The ‘high level’ view:
- Databases are growing fast, beyond the TB scale
- CPUs are becoming faster
- There is an opportunity to reduce storage cost by using compression techniques
  - and to gain in performance while doing so
- Oracle provides compression in the RDBMS
Making it Work in the Real World
- Evaluate gains case by case
  - Not all applications can profit
  - Not all data models allow for it
- Compression can give significant gains for some applications
  - In other cases applications can be modified to take advantage of compression
- Comment: our experience of deploying partitioning followed the same path
  - Implementation involves developers and DBAs
Evaluating Compression Benefits
Compressing segments in Oracle:
- Saves disk space
  - Can save cost in HW
  - Beware that capacity is often not as important as the number of disks, which determines max IOPS
- Compressed segments need fewer blocks, so
  - Less physical IO required for a full scan
  - Less logical IO / space occupied in the buffer cache
- Beware: compressed segments will make you consume more CPU
Compression and Expectations
- Can a 10TB DB be shrunk to 1TB of storage with 10x compression?
- Not really, unless one can get rid of indexes
  - Data-warehouse-like workloads with only FULL SCAN operations
  - Data very rarely read (data on demand, almost taken offline)
- Licensing costs
  - Advanced Compression option required for anything but basic compression
  - Exadata storage required for hybrid columnar
Use Case 1: Data Life Cycle
- Transactional application with historical data
- Data has an active part (high DML activity)
- Older data is made read-only (or read-mostly)
- As data ages, it becomes less and less used
Example of Archiving and Partitioning for Physics Applications
- Separation of the active and the read-mostly parts
  - Data is inserted with a timestamp (log-like)
  - Most queries specify a timestamp value or a time interval
  - Range/interval partitioning by time comes naturally (see the sketch below)
  - Examples: Atlas’ PANDA, DASHBOARD
- Other option: the application takes care of using multiple tables
  - Example: PVSS
  - More flexible, but the app needs to maintain metadata
- Alternative: archiving as post-processing
  - This step allows adding modifications
  - Ex: changing the index structure (Atlas’ PANDA)
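As an illustration of time-based partitioning for log-like data, a minimal sketch using 11g interval partitioning (table and column names are hypothetical, not taken from the applications named above):

  create table event_log (
    event_time  timestamp     not null,  -- log-like data, inserted with a timestamp
    payload     varchar2(200)
  )
  partition by range (event_time)
  interval (numtoyminterval(1, 'MONTH'))  -- one new partition per month, created automatically
  ( partition p_initial values less than (timestamp '2009-01-01 00:00:00') );

Queries that specify a timestamp value or interval are then pruned to the relevant partitions, and aged partitions can be handled (compressed, exchanged out) individually.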
Archive DB and Compression
- Non-active data can be compressed
  - To save storage space
  - In some cases to speed up queries that use full scans
- Compression can be applied as post-processing
  - On read-mostly data partitions (Ex: Atlas’ PVSS, Atlas MDT DCS, LCG’s SAME)
  - With alter table move or online redefinition (see the sketch below)
- Active data
  - Not compressed, or compressed for OLTP (11g)
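A minimal sketch of compressing a read-mostly partition as post-processing (object names are hypothetical; note that moving a partition marks its local index partitions unusable, so a rebuild is needed):

  -- compress one aged partition in place
  alter table event_log move partition p_2009_01 compress for oltp;

  -- local index partitions are left unusable by the move and must be rebuilt
  alter index event_log_ix rebuild partition p_2009_01;

Online redefinition (dbms_redefinition) achieves the same result without an offline window, at the cost of more setup.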
Data Archive and Indexes
- Indexes do not compress well
  - Drop indexes in the archive when possible
  - Risk: the archive compression factor gets dominated by index segments
- Important details when using partitioning
  - Local partitioned indexes preferred for ease of maintenance and performance
  - Note the limitation: columns in unique indexes need to be a superset of the partitioning key
  - May require some index changes for the archive
    - Disable PKs in the archive table (Ex: Atlas PANDA)
    - Or change the PK to add the partitioning key
Use Case 2: Archive DB
- Move data from production to a separate archive DB
- Cost reduction: the archive DB is sized for capacity instead of IOPS
- Maintenance: reduces the impact of production DB growth
- Operations: the archive DB is less critical for HA than production
Archive DB in Practice
- Detach ‘old’ partitions from prod and load them into the archive DB
  - Can use partition exchange to table (see the sketch below)
  - Transportable tablespaces are another tool that can help
- Archive DB post-move jobs can implement compression for archive (Exadata 11gR2)
- Post-move jobs may be implemented to drop indexes
- Difficult point:
  - One needs to move a consistent set of data
  - Applications need to be developed to support this move
  - Access to archive data needs to be validated with application owners/developers
  - New releases of the software need to be able to read archived data
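A minimal sketch of detaching a partition via partition exchange (object names are hypothetical):

  -- on prod: swap the aged partition with an empty staging table of identical shape
  create table event_log_stage as select * from event_log where 1 = 0;

  alter table event_log exchange partition p_2009_01 with table event_log_stage;

  -- the staging table (now holding the old data) can then be moved to the
  -- archive DB, e.g. with Data Pump or transportable tablespaces, and
  -- exchanged into a partitioned archive table there

Partition exchange is a data-dictionary operation, so the detach itself is fast regardless of partition size.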
Example of Data Movement
Use Case 3: Read-Only Large Fact Table
- Data-warehouse-like workload
  - Data is loaded once and queried many times
  - Table access has many full scans
- Compression
  - Saves space
  - Reduces physical IO
  - Can be beneficial for performance
Is There Value in the Available Compression Technology?
- Measured compression factors for tables:
  - About 3x for BASIC and OLTP
    - In prod at CERN, examples: PVSS, LCG SAME, Atlas TAGs, Atlas MDT DCS
  - 10-20x for hybrid columnar (archive), more details in the following
- Compression can help for the use cases described above
- The technology is worth investigating further
  - Compression for archive is very promising
Compression for Archive – Some Tests and Results
- Tests of hybrid columnar compression
  - Exadata V1, ½ rack
  - Oracle 11gR2
- Courtesy of Oracle
  - Remote access to a test machine in Reading
Advanced Compression Tests
- Representative subsets of data from production exported to an Exadata V1 machine
- Applications:
  - PVSS (slow control system for the detector and accelerator)
  - GRID monitoring applications
  - File transfer applications (PANDA)
  - Log application for ATLAS
- Exadata machine accessed remotely in Reading, UK, for a 2-week test
- Tests focused on:
  - OLTP and hybrid columnar compression factors
  - Query speedup
Hybrid Columnar Compression on Oracle 11gR2 and Exadata
[Chart: compression factor per application and compression type. Data from Svetozar Kapusta – Openlab (CERN).]
Full Scan Speedup – a Basic Test
IO Reduction for Full Scan Operations on Compressed Tables
[Chart: speed-up of full scans on compressed tables. Data from Svetozar Kapusta – Openlab (CERN).]
Time to Create Compressed Tables
[Chart: table creation times per compression type. Data from Svetozar Kapusta – Openlab (CERN).]
A Closer Look at Compression... the Technology Part
Oracle Segment Compression – What is Available
- Heap table compression:
  - Basic (from 9i)
  - For OLTP (from 11gR1)
  - Hybrid columnar (11gR2, Exadata)
- Other compression technologies
  - Index compression (key factoring), applies also to IOTs
  - SecureFiles (LOB) compression and deduplication (11g)
  - Compressed external tables (11gR2)
  - Details not covered here
Compression – Syntax
11gR2 syntax for compressed tables, example (the compress clauses below are alternatives):

  create table MYCOMP_TABLE
    compress BASIC                       -- or one of:
    -- compress for OLTP
    -- compress for QUERY [LOW|HIGH]
    -- compress for ARCHIVE [LOW|HIGH]
  as select * from MY_UNCOMP_TABLE;
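A quick way to verify how a table is compressed, as a sketch using the data dictionary (the COMPRESS_FOR column is available from 11g):

  select table_name, compression, compress_for
  from   user_tables
  where  table_name = 'MYCOMP_TABLE';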
Compression Details – Basic and ‘OLTP’ Compression
From the Oracle Concepts manual:
"The format of a data block that uses basic and OLTP table compression is essentially the same as an uncompressed block. The difference is that a symbol table at the beginning of the block stores duplicate values for the rows and columns. The database replaces occurrences of these values with a short reference to the symbol table."
How to investigate what goes on?

  alter system dump datafile <file#> block min <block#> block max <block#>;
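To find the file and block numbers to dump for a given row, a sketch using the dbms_rowid package (table name is hypothetical):

  select dbms_rowid.rowid_relative_fno(rowid) as file_no,
         dbms_rowid.rowid_block_number(rowid) as block_no
  from   mycomp_table
  where  rownum = 1;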
Block Dumps Show the Symbols’ Table

  block_row_dump:
  tab 0, row 0, tl: 13 fb: --H-FL-- lb: 0x0 cc: 2
    col 0: [ 3] ...
    col 1: [ 5] 49 4e ...
    bindmp: 00 d4 02 cb cd 49 4e ...
  tab 0, row 1, tl: 13 fb: --H-FL-- lb: 0x0 cc: 2
    col 0: [ 3] ...
    col 1: [ 5] ... 4c 45
    bindmp: 00 b6 02 cb cd ... 4c 45
  tab 1, row 0, tl: 11 fb: --H-FL-- lb: 0x0 cc: 3
    col 0: [ 3] ...
    col 1: [ 5] ... 4c 45
    col 2: [ 5] ... 4f 4c 24
    bindmp: 2c cd ... 4f 4c 24
  tab 1, row 1, tl: 13 fb: --H-FL-- lb: 0x0 cc: 3
    col 0: [ 3] ...
    col 1: [ 5] 49 4e ...
    col 2: [ 7] 49 5f ...
    bindmp: 2c cf 49 5f ...
  tab 1, row 2, tl: 10 fb: --H-FL-- lb: 0x0 cc: 3
    col 0: [ 3] ...
    col 1: [ 5] ... 4c 45
    col 2: [ 4] 43 4f 4e 24
    bindmp: 2c cc 43 4f 4e 24

- tab 0 is the symbols’ table: it stores the duplicate values (uncompressed data)
- tab 1 holds the actual data rows; bindmp is the actual binary dump of each row, with short references into the symbols’ table
- Decoded actual data (tab 1): Row 1: SYS, ICOL$, TABLE; Row 2: SYS, I_USER1, INDEX; Row 3: SYS, CON$, TABLE
- Symbols include SYS, TABLE, INDEX, ...
OLTP Compression – Not Limited to Direct Load
- Allows ‘normal INSERT’ into the table
[Figure: block life cycle under OLTP compression - empty block, initially uncompressed block, compressed block, partially compressed block, compressed block; legend: header, data, free space, uncompressed data, compressed data. Picture: B. Hodak, Oracle (OOW ‘09 and OTN whitepaper).]
Compression and INSERT Operations
- Test: insert row by row with a PL/SQL loop into a compressed table (see the sketch below)
  - Test from T. Kyte’s presentation on 11g (UKOUG 2007)
  - The following test used 50k rows from dba_objects
- Basic compression reverts to no compression
- Hybrid columnar reverts to OLTP compression

  Table Compr Type     | Table Blocks | Elapsed Time (from trace)
  ---------------------+--------------+--------------------------
  no compression       |     ...      |   ... sec
  basic compression    |     ...      |   ... sec
  comp for oltp        |     ...      |   ... sec
  comp for query high  |     ...      |   ... sec
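A minimal sketch of such a row-by-row insert test (this is an assumed reconstruction, not T. Kyte’s original script):

  -- empty OLTP-compressed table with the shape of dba_objects
  create table t_oltp compress for oltp
  as select * from dba_objects where 1 = 0;

  begin
    -- conventional (non-direct-path) inserts, one row at a time
    for r in (select * from dba_objects where rownum <= 50000) loop
      insert into t_oltp values r;
    end loop;
    commit;
  end;
  /

Repeating the same loop against tables created with the other compress clauses gives the block counts and elapsed times compared in the table above.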
Compression Internals – OLTP
- When is the block compressed?
- How much CPU does this operation take?
- SQL trace to investigate:

  EXEC #4:c=0,e=24,p=0,cr=0,cu=1,mis=0,r=1,dep=1,og=1,plh=0,tim=...
  EXEC #2:c=0,e=25,p=0,cr=0,cu=1,mis=0,r=1,dep=1,og=1,plh=0,tim=...
  EXEC #4:c=1000,e=527,p=0,cr=1,cu=5,mis=0,r=1,dep=1,og=1,plh=0,tim=...   <- extra CPU used to perform block compression
  EXEC #2:c=0,e=28,p=0,cr=0,cu=1,mis=0,r=1,dep=1,og=1,plh=0,tim=...
  EXEC #4:c=0,e=26,p=0,cr=0,cu=1,mis=0,r=1,dep=1,og=1,plh=0,tim=...

- The block is compressed when it becomes full, i.e. when it reaches PCTFREE
- Each time block compression takes place, the insert operation takes more CPU to execute
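The output above is a raw 10046-style SQL trace; one common way to enable it for the test session, as a sketch:

  alter session set events '10046 trace name context forever, level 8';
  -- run the PL/SQL insert loop here, then:
  alter session set events '10046 trace name context off';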
Comments on Basic and OLTP Compression
- Basic and OLTP compression use the same underlying algorithm
  - Rows are compressed by the use of a symbols’ table
  - Block structure not too different from ‘normal’ heap tables
- Compression factor variations depend on the data
  - Sorting (row order) can also make a difference
- A basic example has been shown
  - Block dumps can be used for further investigation
A Closer Look at Hybrid Columnar Compression
- Data is stored in compression units (CUs), each a collection of blocks (around 32K)
- Each compression unit stores data internally ‘by column’: this enhances compression
[Figure: a logical compression unit spanning several blocks - CU header, block headers, and column data C1..C8 laid out column by column. Picture and info from: B. Hodak, Oracle (OOW09 presentation on OTN).]
Block Dumps, Hybrid Columnar 1/2

  block_row_dump:
  tab 0, row 0, tl: 8016 fb: --H-F--N lb: 0x0 cc: 1
  nrid: 0x00d...
  col 0: [8004]
  Compression level: 02 (Query High)
  Length of CU row: 8004
  kdzhrh: PC CBLK: 4 Start Slot: 00
  NUMP: 04
  PNUM: 00 POFF: 7954 PRID: 0x00d...
  PNUM: 01 POFF: ...  PRID: 0x00d...
  PNUM: 02 POFF: ...  PRID: 0x00d...
  PNUM: 03 POFF: ...  PRID: 0x00d...
  CU header:
  CU version: 0   CU magic number: 0x4b445a30
  CU checksum: 0xf980be25
  CU total length: ...
  CU flags: NC-U-CRD-OP
  ncols: 15
  nrows: 6237
  algo: 0
  CU decomp length: ...   len/value length: ...
  row pieces per row: 1
  num deleted rows: 0
  START_CU: ...

- Compressed for query high: a CU of 4 blocks (CBLK: 4)
- 6237 rows are contained in this CU
Block Dumps, Hybrid Columnar 2/2

  Compression level: 04 (Archive High)
  Length of CU row: 8004
  kdzhrh: PC CBLK: 12 Start Slot: 00
  NUMP: 19
  PNUM: 00 POFF: 7804 PRID: 0x00d...
  PNUM: 01 POFF: ...  PRID: 0x00d...
  ...
  PNUM: 16 POFF: ...  PRID: 0x00d...
  PNUM: 17 POFF: ...  PRID: 0x00d...
  PNUM: 18 POFF: ...  PRID: 0x00d...
  CU header:
  CU version: 0   CU magic number: 0x4b445a30
  CU checksum: 0x966d9a47
  CU total length: ...
  CU flags: NC-U-CRD-OP
  ncols: 15
  nrows: ...
  algo: 0
  CU decomp length: ...   len/value length: ...
  row pieces per row: 1
  num deleted rows: 0
  START_CU: ...

- Compressed for archive high: a CU of 19 blocks (NUMP: 19), many more rows contained in this CU
Compression Factors
- An artificial test to put the compression algorithms to work
- Two different types of test table (see the sketch below):
  - Constant: each row contains a repetition of a given string
  - Random: each row contains a random string
- 100M rows of about 200 bytes each
- Details of the test in the notes for this slide
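A minimal sketch of how two such test tables can be generated (scaled down to 1M rows here; this is an assumption, not the exact script from the slide notes):

  -- 'constant' data: the same ~200-byte string repeated in every row
  create table t_constant as
  select rpad('ABCD', 200, 'ABCD') as payload
  from   dual connect by level <= 1e6;

  -- 'random' data: a different random 200-character string in every row
  create table t_random as
  select dbms_random.string('x', 200) as payload
  from   dual connect by level <= 1e6;

Constant data is the best case for the symbol-table and columnar algorithms; random data is close to the worst case.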
Compression Factors

  Table Type      | Compression Type       | Blocks Used | Comp Factor
  ----------------+------------------------+-------------+------------
  Constant Table  | no compression         |     ...     |    ...
  Constant Table  | comp basic             |     ...     |    ...
  Constant Table  | comp for oltp          |     ...     |    ...
  Constant Table  | comp for archive high  |     ...     |    ...
  Constant Table  | comp for query high    |     ...     |    ...
  Random Table    | no compression         |     ...     |    ...
  Random Table    | comp basic             |     ...     |    ...
  Random Table    | comp for oltp          |     ...     |    ...
  Random Table    | comp for archive high  |     ...     |    ...
  Random Table    | comp for query high    |     ...     |    ...
Hybrid Columnar and gzip
- Compression for archive reaches high compression: how does it compare with gzip?
- A simple test to give a ‘rough idea’
- Test: used a table populated with dba_objects
- Result: ~20x compression in both cases

  Method                    | Uncompressed | Compressed | Ratio
  --------------------------+--------------+------------+------
  gzip                      |  ... bytes   | ... bytes  |  22
  compress for archive high |  896 blocks  |  48 blocks |  19
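A sketch of the block-count side of the comparison (segment names are hypothetical; the gzip side can be reproduced by dumping the same rows to a flat file and compressing it with gzip):

  create table t_plain                             as select * from dba_objects;
  create table t_archive compress for archive high as select * from dba_objects;

  select segment_name, blocks
  from   user_segments
  where  segment_name in ('T_PLAIN', 'T_ARCHIVE');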
Compression and DML
- What happens when I update a row in a compressed table? What about locks?
- BASIC and OLTP: the updated row stays in the compressed block
  - ‘Usual’ Oracle row-level locks
- Hybrid columnar:
  - The updated row is moved, as in a delete + insert
    - How to see that? With the dbms_rowid package (see the sketch below)
  - The new row is OLTP-compressed if possible
  - The lock affects the entire CU that contains the row
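A minimal sketch of observing the row movement with dbms_rowid (table and predicate are hypothetical):

  -- note the row's block number before the update
  select dbms_rowid.rowid_block_number(rowid) as block_no
  from   t_archive where object_id = 42;

  update t_archive set owner = 'NEWOWNER' where object_id = 42;
  commit;

  -- on a hybrid columnar table the block number typically changes:
  -- the row was moved, as in a delete + insert
  select dbms_rowid.rowid_block_number(rowid) as block_no
  from   t_archive where object_id = 42;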
Compression and Single-Row Index Range Scan Access
- Test SQL: select from a copy of dba_objects with an index on object_id (see the measurement sketch below)
- Predicate: where object_id = 100
- Note: ‘owner’ is the first column of the test table

  Table Compr Type      | Consistent gets (select *) | Consistent gets (select owner)
  ----------------------+----------------------------+-------------------------------
  no compression        |              4             |               4
  basic compression     |              4             |               4
  comp for oltp         |              4             |               4
  comp for query high   |              9             |               5
  comp for query low    |              9             |               5
  comp for archive low  |             10             |               5
  comp for archive high |             24             |               5
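A sketch of how the consistent gets can be measured in SQL*Plus (table name is hypothetical):

  set autotrace traceonly statistics
  select *     from t_copy where object_id = 100;
  select owner from t_copy where object_id = 100;
  set autotrace off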
Comments on Hybrid Columnar Compression
- The algorithms used by the various hybrid columnar compression levels are not exposed
- A few guesses from simple tests:
  - Compression units seem to group together several thousand rows, up to 32K rows
  - Compression of table data in the CU seems to have a similar compression efficiency to gzip
  - Rows are not identifiable inside a block dump of a CU
  - Single-row index access requires more consistent block gets
Appendix
- How to test hybrid columnar compression without Exadata storage?
- The correct answer: you don’t!
- In the case of ‘a playground’... there is a way
- The issue:

  ORA-64307: hybrid columnar compression is only supported in tablespaces residing on Exadata storage
Chain of Requirements

  SQL> select PREDICATE_EVALUATION from dba_tablespaces
       where tablespace_name='USERS';

  PREDICATE_EVALUATION
  --------------------
  STORAGE

  -> Note: the default is HOST

Predicate evaluation is determined by a diskgroup attribute:

  SQL> select VALUE from v$asm_attribute
       where NAME='cell.smart_scan_capable';

  VALUE
  -----
  TRUE

  -> Note: on non-Exadata storage it's FALSE
How to Force-Change (Corrupt) an ASM Attribute
- Alter diskgroup does not work, as ASM checks whether the storage is Exadata or not
- Workaround (for playgrounds only): modify the attribute directly on ASM file number 9
- How? (see the sketch below)
  - Find where the extents of ASM file 9 are located by querying x$kffxp
  - Read block 3 of ASM file 9 with Oracle's kfed utility
  - Edit the local copy of the block and change the cell.smart_scan_capable attribute
  - Write block 3 of ASM file 9 back with kfed
- Details and important post-change steps in the notes part of this slide
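A sketch of the first two steps (the disk group number, device name, and exact kfed parameters are assumptions; kfed syntax varies with version and AU size):

  -- on the ASM instance: locate the allocation units of ASM metadata file 9
  select group_kffxp, disk_kffxp, au_kffxp, xnum_kffxp
  from   x$kffxp
  where  group_kffxp = 1 and number_kffxp = 9;

  -- then, from the OS shell (hypothetical device name and AU number):
  -- $ kfed read /dev/asm_disk1 aun=<AU#> blkn=3 text=blk3.txt
  -- $ ...edit blk3.txt: set cell.smart_scan_capable to TRUE...
  -- $ kfed write /dev/asm_disk1 aun=<AU#> blkn=3 text=blk3.txt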
Conclusions
- Oracle and DB table compression:
  - Successfully used in production physics DBs
    - In particular for archival of read-mostly data
    - Also for DW-like workloads
  - Works well with partitioning in our experience
  - Caveat: indexes do not compress as well as table data
  - Compression factors vary with the data: benefits are to be evaluated case by case
- Advanced compression in 11g
  - Compression for OLTP is very interesting
- Hybrid columnar compression
  - Can provide high compression ratios
  - Currently available in prod only on Exadata
Acknowledgments
- Physics DB team, and in particular for this work: Maria Girone, Svetozar Kapusta, Dawid Wojcik
- Oracle, in particular for the opportunity of testing on Exadata: Monica Marinucci, Bill Hodak, Kevin Jernigan
- More info: