Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Database Design (for IQ-M)

Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely for IQ This is to allow me to add more information directly related to IQ So this is not a Data Warehouse Design or Methodology course, I have colleagues that do it much better then me!

Aggregation This was a dirty word in IQ-M Not anymore It is not sensible to trawl through 10 billion rows of data, when a simple aggregation table can reduce the queried data to 10 thousand rows – or less Even with IQ-M we can improve performance by as much as 100 times using aggregation correctly

A Statement Just because ASIQ-M is fast does not give us the right to generate slow databases or applications

Primary Key/Foreign Key IQ-M does not enforce Primary Key/Foreign Key relationships – but it will do soon (12.5 Q2 2002) The optimiser does use the PK/FK relationship for query planning Only specify this relationship if the relationship does exist Incorrect specification can result in query plan errors (performance degradation) and possibly errors ASA does modify a join that is defined as PK/FK to an ANSI NATURAL join – this can cause problems with orphan rows

Key Specification In a Data Warehouse the production key is not, generally, used as the warehouse key It is more acceptable practice to use a generated key Make this key an Unsigned INT or BIGINT This is the absolutely most efficient key datatype in IQ- M

Primary Keys In IQ-M a Primary Key is an ANSI standard Primary Key –It is UNIQUE –It must not be null If specified as a table or column constraint then a specialised form of the HG index is created

Foreign Keys Always generate an HG index on a Foreign Key If the relationship is 1:1 then generate the Foreign Key column as a UNIQUE –This will force auto generation of a unique HG index Again try to specify join columns as Unsigned INT or BIGINT

A digression on Datatypes There are some very important issues concerning datatypes We have discussed the actions of the indexes – there are areas where an index can be forced to run slowly if the datatype is specified wrongly Always consider the requirements for the datatype In correct datatype specification is as bad as incorrect index selection

Signed vs. Unsigned If you don’t need signed data in an int or bigint – use UNSIGNED This will speed up the accessing of the HNG index sometimes doubling the performance –HNG stores negative data as 1s complement  This means SUM() AVG() etc. run quickly  But range checks require another set of scans –If we stored as 2s complement then  Range checks would run with 1 scan  But SUM() AVG() would be slower!!

Other Datatype Issues Signed vs. Unsigned does not affect the other indexes to any great degree But… The selection of datatypes does We have already discussed keys but some other areas are worth commenting on…

Long Varchar() - 1 A long varchar() is defined as a varchar() with a length greater than 255. If you can avoid this please try to –Only FP and word index index is allowed –No enumerated indexes or HNG We have seen a number of customers who use varchar(1024) as Primary Keys –please DO NOT DO THIS!!

Long Varchar() - 2 Long varchar() are stored as 256 byte chunks, so using 4 bytes in a varchar(32000) only uses 256 bytes By default these 256 byte chunks are memset (set to zeros to improve compression) There is an upgrade option to memset existing 12.4.0 varchar() – this is worth doing, if you have the time!

Char() vs. Varchar() Always, if you can use char() Generally this will improve performance, at the modest cost of storing some small number of extra bytes Query performance on retrieval of char() vs. Varchar() indicates that there can be a 2-3% performance hit per column, and we have seen 10% degradation on single columns

Memset - 1 Initialize_Memory_To_Ones_On_Allocation Some slides ago I mentioned memset This is a function of IQ that allows varchar() – and other objects – to have their unused portions set to one on initialization This will improve the performance of the compression algorithms (usually) It may possible slow down loads – but I have not found this to be the case

Memset - 2 To memset existing varchar() when upgrading to 12.4.2 set the following option Convert_Varchar_To_1242 To bring the HG indexes to 12.4.2 standard when upgrading a pre 12.4.2 database set the option Convert_HG_To_1242 In the 12.4.3 upgrade, both these options happen by default

Float, Real and Double Unless you really need them – please do not use –FLOAT –REAL –DOUBLE They can only have Flat FP indexes – no others The do not store “exact” values – only approximate Please try to use –NUMERIC –DECIMAL

NUMERIC and DECIMAL Numeric and Decimal are aliases of each other Any numeric or decimal with a precision of less than 12 will be stored as an INT (with conversions) Any numeric or decimal with a precision of between 12 and 20 will be stored as a BIGINT (with conversions)

Join Columns You must generate the database schema with the table join columns having the same datatype. INT, UINT and BIGINT are best, but the column datatypes for each join must be the same Conversion cost is horrendous

Case and Collation Sequences In terms of RAW performance the fastest IQ database is one where CASE is set to RESPECT and the collation sequence is BINARY (ISO_bineng) This is probably not suitable for the general application of the database or warehouse server CASE set to IGNORE is the next fastest, then changes in the collation sequence The performance hits can be quite high (around 10- 20% - we think!)

String Searches String Searches such as substr(1,3,col_name) are really very slow, they rely on FP searches With low cardinality (1 and 2 byte FP) data the search is faster, but this can still be a restriction Create a new column which is the first 3 characters of the col_name column, then search on this This way there is no function call, so no projection, so the optimiser can use a fast index LF or HG (or if it is a range query an HNG)

Telephone Numbers A classic example of the above is the telephone number +1-301-896-1733 +1-> Country Code 301-> Area Code 896-> Sub Area Code 1733-> Local Number Make this 4 columns (actually 5 - the whole number), then searches use fast indexes

Date time As with telephone numbers, try storing a data time as as series of columns (or a dimension table) Try creating columns DDMMYY HHMMSS DoWeekDoYearQuarter etc.

Date vs. Datetime A slightly better solution to the above can be considered in the light of the 1 and 2 byte FP indexes Try storing the date part of a datetime as a date and the time part as hh mm ss So: Datetime ->date_col, hh_col, mm_col, ss_col

Loading Dates There is NO default date or datetime format for loads into IQ The format must be explicitly set for the load/insert to get the best performance However some formats are conversion enhanced

Enhanced Conversion formats DD/MM/YYYY DD.MM.YYYY DD-MM-YYYY HH:NN:SS HHNNSS HH:NN:SS.S HH:NN:SS.SS HH:NN:SS.SSS HH:NN:SS.SSSS HH:NN:SS.SSSSS YYYY-MM-DD HH:NN:SS YYYYMMDD HHNNSS YYYY-MM-DD HH:NN:SS YYYY-MM-DD HH:NN:SS.S YYYY-MM-DD HH:NN:SS.SS YYYY-MM-DD HH:NN:SS.SSS YYYY-MM-DD HH:NN:SS.SSSS YYYY-MM-DD HH:NN:SS.SSSSS

Date Load So it is better to use Col1DATE(‘YYYY-MM-DD’) than Col1ASCII(10) The performance enhancement can be as much as a 100 fold speed up in loads (for small tables)

UNION In IQ-M 12.4.3 the UNION clause has very few disadvantages Generally UNIONs are all processed in parallel so if you have a low user count they work well Also the delete question now can be solved Do not use DISTINCT in the UNION clause, or in the SELECT statement

UNION and Delete If you are storing a fixed (in time) amount of data e.g.. 6 months –Then every month you delete 1/6 th of the data in the table –This is expensive It is better to split the fact table into 6 x one month tables –At the end of the month you truncate the oldest table –And possibly rename the table sets –Remember for Multiplex table rename is DDL and hence can only be done in simplex mode!

What is wrong Here ? SELECTcol1, col2 FROMtable1, table2, table3 WHEREtable1.col1 < table2.col1 AND table1.col1 > table3.col1

Cartesian Joins These are expensive – they involve the join of every row in one table to every row in a second table. –Table A 1,000,000 rows –Table B 100,000 rows –Worktable 100,000,000,000 rows Select * from T, R where T.a = 10 Cartesian Select * from T, R where T.a between R.b and T.b Cartesian Select * from T, R where ABS(T.a * R.b) = T.b Cartesian But Select * from T, R where ABS(T.a * T.b) = R.bNot Cartesian

Database Design - End

Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Similar presentations

Presentation on theme: "Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Similar presentations

Presentation on theme: "Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely."— Presentation transcript:

Similar presentations

About project

Feedback