Presentation on theme: "Building Things :Disks Databases. Introduction This is a mix and match section It contains all the information that does not have a home elsewhere in."— Presentation transcript:
Building Things :Disks Databases
Introduction This is a mix and match section It contains all the information that does not have a home elsewhere in this presentation It covers –Disk layouts –Disk Farms –Striping on Striping –Database build
Disk Layouts Basically in an IQ-M system you can have a number of different disk configurations We consider the following –Simple SCSI (or UDMA-66) Disks –Basic Disk Farms (Collections of raw disks) –RAID Systems (NT RAID, Clariion etc.) –Disk Subsystems (EMC, MTI Guardian etc.)
Basic Disk Farms Here we are considering a number of dumb disks There is no intelligence in the disks – or the controllers Here is where we use IQ-M striping –Here we have a number of disk drives (on various controllers) –We stripe across a series of disks
Disk Striping Basic Disks 1-dimensional IQ Striping In the 1-dimensional striping we are spreading the data writes (and reads) across a series of disk drives – limiting the overall disk head movement.
Problems with Disk Striping IQ-M has to make location decisions as to where to write and read the data It is not as simple as writing to bit bucket or a serial list of disk blocks It is faster than simple 1 disk = 1 device The performance hit is measured in extra micro seconds per read/write – but with a 10 TBytes database this could be important
ADCB RAID – 0,1 Redundant Array of Inexpensive Disks Raid 1 - Mirror In RAID 1 – Mirror, we can consider that Disks A and B are mirrored and Disks C and D This improves read and write performance, as the controller will pick which disk heads are closer to the data required. Also there is an improvement in data security – 2 copies ADCB Raid 0 - Stripe In RAID 0 – Stripe, we can spread the reads and writes as per the last slide This dramatically improves the read and write performance, but does nothing for the data security
ADCB RAID – 0/1 and 5 Redundant Array of Inexpensive Disks Raid 0/1 – Stripe and Mirror In RAID 0/1 – Stripe and Mirror, we can see that the stripe sets on A and B mirror to C and D This gives security – in the mirroring (copies) of the data, and improved performance due both to the two copies and the stripe sets ADCB Raid 5 – Checksum Multi-read In RAID 5 – Csum/MR, The data is in stripes across the disks then a checksum is written to the final disk This gives a small performance improvement for reads, no improvement for writes (in fact it will slow writes down!). The only bonus is security. The system will allow a disk failure in the pack without requiring the 2 to 1 overhead of mirroring systems
Other RAID There is RAID 2, 3 and 4 These are variations on the mirroring/striping and checksum theme There are also proprietary RAID schemes –RAID SThe is EMCs RAID system –RAID MThis is the MTI RAID Scheme We will come on to talk about proprietary schemes in a few slides time
Well is RAID useful ? Yes it is – mainly ! If the RAID system handles the data distribution at the hardware level (with RAID controllers) then RAID is both fast and safe Windows NT (and 2000) can drive RAID through the operating system This can be slower than IQ-M performing the striping –Watch out for large numbers of reads – and individual read performance slowing down –Also Avoid RAID 5. It is very slow for writes
Disk Subsystems (emc, HDS) These are disk arrays where we have no idea what the internal organization of the disks is (and nor do we care) Generally we can consider that we are writing to memory (and usually we are!) The tuning of the disks is usually best left up to the Hardware Support guys
Striping on Striping There is one very contentious area in disk organisation This is 2 dimensional disk striping or striping on stripes This is driven by using an operating system or RAID unit to provide 1 dimensional striping on a series of disks Then we apply IQ striping onto the existing striping 2 Dimensional stripes Do it right – its very fast, do it wrong and it isnt!
A diagram Operating System Striping AS IQ-M Striping The reason this works is now the individual disks only have a very small amount of data to read and write so we rely on very fast disk -> processor communications If this is done properly it is the fastest disk access for ASIQ-M The secret, if there is one, is to make the block size the same as the micro stripe size… But this needs further experimentation
Disk Striping By default Disk_Striping is set ON for RAW devices and OFF for file system devices We should be using RAW devices – they are (generally) a lot faster and potentially safer than file system devices You may want to play with this parameter if you are running devices on disk farms or RAID array systems
File system Storage If you must have the IQ Store on Operating System File System OS_File_Cache_Buffering can be set off for Solaris and Windows NT (and 2000) However the system will slow down in the following areas –IQ Page Size < file system block size –During Loads –Solaris > 4Gb memory The description in the manual is awesome!
Create Database OK so we have built the device or devices – about time we considered some of the options to create the database Of all of the options when issuing a CREATE DATABASE command the following slide details the most important
CASE - 1 CASE –CASE RESPECT is the fastest –There is a 10-20% hit going to CASE IGNORE Implications to FP Indexes –Regardless of RESPECT vs. IGNORE all 1-byte and 2-byte FP indexes store all the binary values for the data –So ABC, abc, Abc, Abc are all stored even for CASE IGNORE
CASE - 2 Remember because we store all bitmaps we can go from 1-byte to 2-byte FP, or 2-byte to flat FP where we might not want to A solution to this is to set the server in CASE RESPECT (because it is faster) Then use an ETL tool to rtrim() and ucase() or lcase() all of the incoming character data
CASE - 2 The HG index stores data in what is called conditioned mode. –For a CASE IGNORE database there is only one entry per logical value ABC = abc = Abc etc. –For a CASE RESPECT database there has to be one entry per value ABC != abc != Abc etc.
CASE – 3 For an LF index we hold partially conditioned values –For CASE RESPECT and CASE IGNORE all values have a bit- map –This can be wasteful on space The reason for having this is two fold –To allow for the recreation of the FP index from the LF –To allow for some rare cases (some group bys) where we still project values from an LF index
COLLATION –Set to ISO_BINENG, this is the fastest –If you must have a collation sequence this will slow the system down by around 10% for 8 bit character sets and 50% for multi-byte character sets –There are substantial slow downs for all multi-byte character sets
IQ PAGE SIZE - 1 This is an area of extreme contention The IQ PAGE size (effectively) determines the the size of the smallest addressable area in memory NOT ON DISK ! The disk parameter is the BLOCK SIZE
IQ PAGE SIZE – 2 The rules for IQ PAGE SIZE (or memory buffer size) are simple Set to 64K unless…. –Set to 128K when the memory model exceeds around 1 Gbytes per cache, and the number of rows in the FACT table exceeds 100m –Set to 256K when the memory model exceeds 4 Gbytes per cache (this may be a rare case!) –Do not set to 512K – there is a little bug…
BLOCK SIZE The block size is set automatically when the IQ PAGE SIZE is set (In 11.x IQ you could set the Max Compression parameter that would vary the number of blocks per page) In IQ 12, you can vary the BLOCK SIZE but it does not do much except in some extreme cases
So what is BLOCK SIZE? Disk Device Memory Cache A Page A Chunk (A variable length object) When a page is written out of memory it is compressed. Only the resulting used blocks within the page are written to disk, this set of blocks is called a chunk
Writing to Disk Default Page in memory 64Kb – 16 Blocks Compress before Disk Write Operation 10 Blocks used before compression Compressor 4 blocks used after compression Disk Write 1 Chunk on Disk the size of 4 blocks - 16kbytes
Disk_Striping_Packed The problem with large systems is that we only have 1 freelist This means that if we want 8 blocks of space we will grab the first > 8 space we can find, which tends to fragment the devices If Disk_Striping_Packed is ON then we have one freelist for each number of blocks available –1 for 1 block free –1 for 2 blocks free –Etc. The trade-off is better space usage against slightly worse localisation.
BLOCKS So as can be seen the IO operation is performed in blocks but how many are written at one time depends upon –The usage (how much of the page is used) –The Compressibility (how well the used blocks compress) We have found the default works the best –That is for a 64K page 16 blocks of 4k
Block Size The compressor is designed to run in 2Kbyte chunks so –Never make the block size less than 2Kbyte –Always make is a multiple of 2Kbyte The compressor does not have a huge amount of memory to run in so –If the Page size is big (>128Kbyte) do not set the block size much bigger than 32-64Kbyte
Scoping - 1 ASIQ-M used the ASA scoping rules, not the ANSI rules In ANSI there are strict rules as to where aliases can be used, and what they are ASA has rules that an extension to the ANSI rules and allow much more freedom In ASA you can re-use and rename anything, anywhere
Scoping - 2 An example: Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 In ASE rows are returned that match the where clause i.e. alpha between 27 and 29 In ASA/ASIQ all table rows are returned Why ?
Scoping – 3 Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 In ASIQ the following happens –In the select we have defined an alias alpha and assigned a value to it (27) –Then in the where clause we check the value of alpha against the values 27 thru 29 and the condition is true because alpha is 27 Note: Alpha is defined as an alias here not a column identifier
A Picture Select alpha = 27, beta, gamma From tablename Where alpha between 27 and 29 ASE ASIQ In ASE this alpha is a alias In ASE this alpha is a column name In ASIQ this alpha is a alias In ASIQ this alpha is also a alias, because it has been defined To make the code correct for ASIQ you need to prepend the second alpha with The table name. E.g. tablename.alpha