School of Computing and Management Sciences © Sheffield Hallam University

Finding Data
– In a list of 700 composers, how do we find Berlioz?
– The row with Berlioz's details will exist in a physical data block.
– The system knows where, by mapping from a ROWID to a physical block.
– The physical location will probably be totally unrelated to any attribute of the row, including the alphabetical order of the composer's name.

Finding Data
First option is to run sequentially through all the rows:
– Read Row 1: is this Berlioz?
– No, so:
– Read Row 2 ... and so on.
Disadvantages:
– What if we were looking for Wiren from 1 million composers?
– No control over the number of disk reads required
Advantages:
– No processing overhead
– Rapid access for small volumes

Finding Data
Binary Search (this only works if the rows are held in name order):
– Read Row (row count)/2: does this name come before or after Berlioz?
– If Berlioz comes before it, then:
– Read Row ((row count)/2)/2, and so on.
Disadvantages:
– Processing required to control where to binary chop
– The number of disk reads still grows with the table, roughly log2(row count) in the worst case
Advantages:
– Will often be less read intensive than sequential
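The difference between the two approaches can be sketched by counting how many rows each one reads. This is only an illustration: the 700 names are made-up placeholders, the list is assumed to be already sorted, and each list access stands in for one disk read.

```python
def sequential_search(rows, target):
    """Scan rows in storage order; return (position, rows_read)."""
    reads = 0
    for position, name in enumerate(rows):
        reads += 1
        if name == target:
            return position, reads
    return None, reads


def binary_search(sorted_rows, target):
    """Repeatedly halve a sorted list; return (position, rows_read)."""
    low, high = 0, len(sorted_rows) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        reads += 1
        if sorted_rows[mid] == target:
            return mid, reads
        if sorted_rows[mid] < target:
            low = mid + 1   # target sorts after the middle row
        else:
            high = mid - 1  # target sorts before the middle row
    return None, reads


# Hypothetical data: 700 placeholder names, already in alphabetical order.
composers = [f"Composer{n:03d}" for n in range(700)]
target = "Composer650"

seq_pos, seq_reads = sequential_search(composers, target)
bin_pos, bin_reads = binary_search(composers, target)
print("sequential reads:", seq_reads)
print("binary search reads:", bin_reads)
```

The sequential count depends entirely on where the row happens to sit, whereas the binary search count is capped by the number of times 700 can be halved.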

Finding Data
But what if we could discover the ROWID, and hence the physical address of the row? This would enable us to go straight to the data :-)
There will however be a cost :-(
– the processing needed to “look up” the ROWID
– the need to maintain the index
– the storage implications
– performance hits because of multiple writes for each created or updated row
– the problem of keeping indexes consistent, a well-known problem with Paradox, for example

Finding Data
Two main types of index:
– B-tree Index
– Bitmap Index
The selection of index type depends primarily upon cardinality:
– Low cardinality means the number of distinct values in the column is small compared to the number of rows in the table.
– Bitmap indexes are best for low-cardinality columns.
– B-tree indexes are most effective for high-cardinality data such as Name or Phone Number.
– A B-tree index can grow to be larger than the indexed data.
– Bitmap indexes can be significantly smaller than a corresponding B-tree index.

B-Tree
ROOT / branch blocks (key ranges): A..D | F..H | H..Z
Leaf blocks:
Name      Rowid
Arnold    xxxxx
Bartok    xxxyy
Berlioz   xxxzz
Bizet     .....

Name      Rowid
Finzi     qqqqq
Grieg     hhhhh
.....     .....
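A lookup through a structure like the one in the diagram can be sketched as a two-step search: pick the leaf block from the branch level, then find the name inside that leaf. This is a deliberately simplified two-level model, not Oracle's actual block format; the rowid strings, the third leaf block and the name Holst are placeholders added for illustration.

```python
from bisect import bisect_right

# Leaf blocks: sorted (name, rowid) entries. Rowids are placeholder strings.
leaf_blocks = [
    [("Arnold", "rowid-01"), ("Bartok", "rowid-02"),
     ("Berlioz", "rowid-03"), ("Bizet", "rowid-04")],
    [("Finzi", "rowid-05"), ("Grieg", "rowid-06")],
    [("Holst", "rowid-07"), ("Wiren", "rowid-08")],
]

# Branch level: the first key of every leaf block except the first one.
separators = ["Finzi", "Holst"]


def index_lookup(name):
    """Return (rowid, blocks_read) for a name, or (None, blocks_read)."""
    blocks_read = 1  # reading the branch block
    leaf = leaf_blocks[bisect_right(separators, name)]
    blocks_read += 1  # reading the chosen leaf block
    for key, rowid in leaf:
        if key == name:
            return rowid, blocks_read
    return None, blocks_read


print(index_lookup("Berlioz"))
print(index_lookup("Wiren"))
```

Every lookup touches the same number of blocks, whichever name is requested, because all leaf blocks sit at the same depth.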

B-Tree Issues
What is the most efficient number of branches?
– More means definitely more links, but maybe fewer reads overall
– More means more data, more blocks, slower performance
All leaf blocks of the tree are at the same depth, so retrieval of any record from anywhere in the index takes approximately the same amount of time.
– Be that good, or bad :-)

Bitmap Index
Sample Bitmap Index on Gender:
ROWID            'M'   'F'
cust_id 70        0     1
cust_id 80        0     1
cust_id 90        1     0

Bitmap Index
– Not appropriate for OLTP applications with a heavy load of concurrent INSERTs, UPDATEs, and DELETEs.
– Primarily intended for decision support in data warehousing applications where users typically query the data rather than update it.
– Bitmap indexes are also not suitable for columns that are primarily queried with less-than or greater-than comparisons.
Bitmap indexes can substantially improve performance of queries with the following characteristics:
– The WHERE clause contains multiple predicates on low- or medium-cardinality columns.
– The individual predicates on these low- or medium-cardinality columns select a large number of rows.
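Why several low-cardinality predicates combine so cheaply can be sketched with plain integers used as bit strings: one bitmap per distinct value, and a bitwise AND to answer a query with two predicates. This is an illustration of the idea only, not Oracle's storage format; the region column and customer 100 are invented for the example.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i holds it."""
    bitmaps = {}
    for position, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << position)
    return bitmaps


def matching_positions(bitmap):
    """List the row positions whose bit is set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Genders for 70, 80 and 90 follow the sample bitmap; the rest is hypothetical.
customers = [
    {"cust_id": 70, "gender": "F", "region": "north"},
    {"cust_id": 80, "gender": "F", "region": "south"},
    {"cust_id": 90, "gender": "M", "region": "north"},
    {"cust_id": 100, "gender": "F", "region": "north"},
]

gender_index = build_bitmap_index(customers, "gender")
region_index = build_bitmap_index(customers, "region")

# WHERE gender = 'F' AND region = 'north' becomes a single bitwise AND.
both = gender_index["F"] & region_index["north"]
hits = [customers[i]["cust_id"] for i in matching_positions(both)]
print(hits)
```

The same sketch hints at the OLTP problem: changing one row's value means rewriting bits in two shared bitmaps, which is awkward under heavy concurrent updates.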

To index, or not to index…
The purpose of an index is to reduce the cost of data retrieval. But at a cost.
The overhead involved in maintenance and use of secondary indexes has to be balanced against the performance improvement gained when retrieving data. This includes:
– adding/updating an index record in every secondary index whenever a row is inserted/updated;
– an increase in the disk space required;
– possible performance degradation during query optimization, which has to consider all secondary indexes.

To index, or not to index…
Considerations:
– The relative “cost” of a Full Table Scan. For example: “Create an index if you frequently want to retrieve less than 15% of the rows in a large table. The percentage varies greatly according to the relative speed of a table scan and how clustered the row data is about the index key. The faster the table scan, the lower the percentage; the more clustered the row data, the higher the percentage.” (Source: Oracle manual)
– In Oracle, primary and unique keys automatically have indexes

To index, or not to index…
Rules of thumb (adapted from Connolly and Begg):
– (1) Do not index small tables.
– (2) Index the PK of a relation.
– (3) Index any FK if heavily used: Oracle doesn’t enforce this.
– (4) Add a secondary index to any attribute that is heavily used as a secondary key.
– (5) Add a secondary index on attributes that are involved in: selection or join criteria; ORDER BY; GROUP BY; and other operations involving sorting (such as UNION or DISTINCT).

To index, or not to index…
Rules of thumb (Connolly and Begg):
– (6) Add a secondary index on attributes involved in built-in functions.
– (7) Avoid indexing an attribute or relation that is frequently updated.
– (8) Avoid indexing an attribute if the query will retrieve a significant proportion of the tuples in the relation.
– (9) Avoid indexing attributes that consist of long character strings.

Oracle Indexing Options
Straightforward Index:
– Use the SQL command CREATE INDEX
– CREATE INDEX emp_ename ON emp(ename);
UNIQUE:
– CREATE UNIQUE INDEX uniq_dept_dname ON dept(dname);
– Be sure you need this as it is a costly extra!
Bitmap Index:
– CREATE BITMAP INDEX i_alloc ON Requirements(Allocated);

Oracle Indexing Options
Function-Based Index:
– For example, case-insensitive searches are assisted by:
– CREATE INDEX Name_Idx ON Emp (UPPER(Ename));
– Note: the initialisation parameter QUERY_REWRITE_ENABLED needs to be TRUE for this to work. On SHU92 it is set to FALSE, so you can’t create a function-based index there.
Reverse Key Indexes:
– Useful performance improvement in environments that tend towards index leaf “hot spots”
– Architecturally the same as a B-Tree, but the bytes of the column data values are reversed.
– This means that sequential entries are more evenly distributed, so many inserts of sequential values are spread across different leaf blocks.
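The effect of reversing a key can be sketched with decimal digit strings. This is an assumption made for readability: Oracle reverses the bytes of its internal column representation, not a zero-padded decimal string, and the key width of 6 and the use of the leading character as a stand-in for "which leaf block" are both illustrative choices.

```python
def reverse_key(key: int, width: int = 6) -> str:
    """Zero-pad the key to a fixed width, then reverse its characters."""
    return str(key).zfill(width)[::-1]


# Ten consecutive keys, as generated by a sequence or an ascending ID column.
sequential_ids = list(range(1001, 1011))

# In key order, every new entry shares the same leading character,
# so they all compete for the same (right-most) part of the index.
normal_leading = {str(k).zfill(6)[0] for k in sequential_ids}

# After reversal, the fastest-changing digit comes first,
# so consecutive keys land in different parts of the index.
reversed_leading = {reverse_key(k)[0] for k in sequential_ids}

print(sorted(normal_leading))
print(sorted(reversed_leading))
```

The price of this spread is that neighbouring key values are no longer stored next to each other, so a reverse key index is of little use for range scans on the key.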

IOT: An Oracle 8i+ alternative:
– An index-organized table (IOT) has a storage organization that is a variant of a primary B-tree. Unlike an ordinary table, whose data is stored as an unordered collection, data for an index-organized table is stored in primary key sorted order.
– Index-organized tables provide faster access to table rows by the primary key.
– Since rows are stored in primary key order, range access by the primary key involves minimum block accesses.

IOT: An Oracle 8i+ alternative:
    CREATE TABLE WardReqIOT (
        WardID        NUMBER,
        RequestDate   DATE,
        Grade         NUMBER,
        QtyReq        NUMBER,
        AuthorisedBy  VARCHAR2(30),
        CONSTRAINT PK_WardReqIOT PRIMARY KEY (WardID, RequestDate, Grade)
    )
    ORGANIZATION INDEX
    INCLUDING QtyReq
    OVERFLOW TABLESPACE Users;
– Must have a primary key
– Reduce row size in the B-tree by relegating low-use columns (those after the INCLUDING column) to the overflow segment

IOT: When?
– Broadly speaking, any table that is frequently searched and accessed almost exclusively via the primary key.
– But a table which needs secondary indexes and unique key constraints can bring problems.
– IOTs can improve performance by reducing the I/O associated with having to read blocks from both an index and a table.
– IOTs can also save storage because they avoid the need to duplicate key columns in both index and table segments.
– The down-side is the increased maintenance requirement: it is an index, and indexes do need rebuilding.

Partitions
Oracle can manage very large tables comprising millions of rows. However:
– Performing maintenance operations such as bulk deletion and back-up on such large tables is difficult, particularly in a 24/7 environment.
– There can be performance issues associated with inserting into, or reading from, tables this large.
From Oracle 8i, one solution can be to partition the logical object into separate physical segments.

Partitions
– Depending upon the type of data being kept, rows may be divided into a series of key ranges, such as ones based on date.
– Although each partition will have the same logical structure as the others, they may have different physical storage properties.
– This enables the effect of striping the data across several disks, speeding up access to the data by allowing parallel reading to occur.
– Furthermore, if the historic data is unlikely to be updated, the majority of the partitions in a history table can be stored in a read-only tablespace (good for performance).
– This approach can also reduce back-up times and volume.

Partitions
Partitions can be created in a number of ways:
– Range partitioning: for example, by month.
– Hash partitioning: for when a Range Partition is not likely to give an even spread of data. Hash partitioning works because rows are mapped into partitions based on a hash value of the partitioning key.
– List partitioning: allows unordered and unrelated sets of data to be grouped and organized together very naturally. For example, 'Malaysia' and 'Hong Kong' may be values used to create a "Far East Region" partition.
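How a row is routed under hash and list partitioning can be sketched as two small functions. The sketch makes several assumptions: `zlib.crc32` is only a stand-in for whatever hash function the database uses internally, the partition count of 4 is arbitrary, and the "europe" entries are invented alongside the Far East example.

```python
import zlib


def hash_partition(key: str, partition_count: int) -> int:
    """Pick a partition number from a hash of the partitioning key."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


# List partitioning: each partition is defined by an explicit set of values.
LIST_PARTITIONS = {
    "Malaysia": "far_east",
    "Hong Kong": "far_east",
    "France": "europe",    # hypothetical extra partition
    "Germany": "europe",   # hypothetical extra partition
}


def list_partition(country: str) -> str:
    """Look up the partition that was declared for this value."""
    return LIST_PARTITIONS[country]


for name in ["Berlioz", "Bizet", "Finzi", "Wiren"]:
    print(name, "-> hash partition", hash_partition(name, 4))

print("Malaysia ->", list_partition("Malaysia"))
print("Hong Kong ->", list_partition("Hong Kong"))
```

With hashing, the partition a row lands in says nothing about its value, which is what evens out the spread; with lists, the designer chooses the grouping explicitly.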

Partitions
Range based example:
    CREATE TABLE sales (
        invoice_no  NUMBER,
        sale_year   INT NOT NULL,
        sale_month  INT NOT NULL,
        sale_day    INT NOT NULL
    )
    PARTITION BY RANGE (sale_year, sale_month, sale_day) (
        PARTITION sales_q1 VALUES LESS THAN (1999, 04, 01) TABLESPACE tsa,
        PARTITION sales_q2 VALUES LESS THAN (1999, 07, 01) TABLESPACE tsb,
        PARTITION sales_q3 VALUES LESS THAN (1999, 10, 01) TABLESPACE tsc,
        PARTITION sales_q4 VALUES LESS THAN (2000, 01, 01) TABLESPACE tsd
    );
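The routing implied by the VALUES LESS THAN bounds in the range example can be sketched as a search over the sorted upper bounds. The sketch assumes the multi-column key is compared column by column, as Python compares tuples, and that a key at or beyond the highest bound is rejected; the partition names follow the quarterly example.

```python
from bisect import bisect_right

# Exclusive upper bound of each partition, in ascending order.
UPPER_BOUNDS = [(1999, 4, 1), (1999, 7, 1), (1999, 10, 1), (2000, 1, 1)]
PARTITION_NAMES = ["sales_q1", "sales_q2", "sales_q3", "sales_q4"]


def partition_for(sale_year, sale_month, sale_day):
    """Return the first partition whose upper bound is greater than the key."""
    key = (sale_year, sale_month, sale_day)
    # Count the bounds that are <= key; those partitions cannot hold the row.
    position = bisect_right(UPPER_BOUNDS, key)
    if position == len(UPPER_BOUNDS):
        raise ValueError("key is beyond the highest partition bound")
    return PARTITION_NAMES[position]


print(partition_for(1999, 3, 31))
print(partition_for(1999, 4, 1))
print(partition_for(1999, 12, 31))
```

A sale dated exactly on a bound belongs to the next partition, because each bound is "less than", not "less than or equal to".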

Constraints
Help enforce business rules and as such are essential:
– CHECK(), REFERENCES, UNIQUE, NOT NULL
There is, however, a processing overhead:
– “In general, the cost of including an integrity constraint is, at most, the same as executing a SQL statement that evaluates the constraint.” (Oracle Manual)
MySQL historically did not enforce RI (referential integrity) with its MyISAM storage engine, claiming it to be too costly in performance; the InnoDB engine does enforce foreign keys.
– After all, foreign keys shouldn’t be needed if developers did their job properly!
Oracle does not automatically create an index on a foreign key, but you MUST!
