Select Operation- disk access and Indexing *Some info on slides from Dr. S. Son, U. Va.


Disk access DBs traditionally stored on disk Cheaper to store on disk than in memory Costs for: –Seek time, latency, data transfer time Disk access is page (block) oriented 2 - 4 KB page size

Access time Access time is the time to randomly access a page System initially determines if page in memory buffer (page tables, etc.) Large disparity between disk access and memory access

Select operation using table scan If read the entire table for a select – table scan Improvements to table scan of disk: –Parallel access –Sequential prefetch

Parallel access Linear search - all data rows read in from disk – I/O parallelism can be used (RAID): multiple I/O read requests satisfied at the same time; stripe the data across different disks –Problems with parallelism? Must balance disk arm load to gain maximum parallelism; requires the same total number of random I/Os, but uses the devices for a shorter time

Sequential prefetch I/O Retrieve one disk page after another (on same track) – (32 in DB2, varies in Oracle) Seek time no longer a problem Must know in advance to read 32 successive pages Speed up of I/O by a factor of ≈10 (500 I/O's per second vs. 70)

Access time Seek time –as low as 4 ms server Latency time –as low as 1 ms or less Data transfer time –.4-2 ms Solid state disks up to 100,000 I/Os per sec. – still expensive

Access time for fast I/O (seconds)
                                RIO        Seq. Prefetch
Seek - disk arm to cylinder     .004       .004
Latency - platter to sector     .001       .001
Data transfer - page            .0005      .016
Total                           .0055      .021
                                (1 page)   (32 pages)
32 pages for both               .176*      .021
*.0055 × 32 = .176 seconds for 32 pages of RIO vs. .021 seconds for 32 pages of Seq. Prefetch
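The comparison on this slide can be reproduced with a few lines of arithmetic. This is a minimal sketch using only the timings quoted on the slide; the function and constant names are made up for illustration.

```python
# Timings in seconds, taken from the slide.
SEEK = 0.004               # move disk arm to cylinder
LATENCY = 0.001            # rotate platter to sector
TRANSFER_PER_PAGE = 0.0005
PREFETCH_PAGES = 32        # pages read by one sequential prefetch


def random_io_time(pages: int) -> float:
    """Every page pays its own seek, latency and transfer."""
    return pages * (SEEK + LATENCY + TRANSFER_PER_PAGE)


def sequential_prefetch_time(pages: int) -> float:
    """One seek and one latency, then a transfer for each page."""
    return SEEK + LATENCY + pages * TRANSFER_PER_PAGE


rio = random_io_time(PREFETCH_PAGES)
sp = sequential_prefetch_time(PREFETCH_PAGES)
print(f"RIO: {rio:.3f} s  Seq. prefetch: {sp:.3f} s  ratio: {rio / sp:.1f}x")
```

The saving comes entirely from paying seek and latency once instead of 32 times; the transfer time is the same in both cases.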

Organizing disk space How to store data so minimize access time if read the entire table?

Disk allocation Disk Resource Allocation for Databases (DBA has control) Goal – contiguous sectors on disk - want data as close together as possible to minimize seek time No standard SQL approach, but general way to deal with allocation Some OS allow specification of size of file and disk device

Types of Files Heap files (unordered – sequential) Sorted files (ordered – sort key) Hash files (hash key, hash function) B+-trees Storage Area Networks SAN – ERP (enterprise resource planning) and DW (data warehouses) –Storage devices configured as nodes in network – can attach/detach

Tablespace Tablespace is: Allocation medium for tables and indexes for ORACLE, DB2, etc. Can put >1 table in a table space if accessed together Tablespace corresponds to 1 or more OS files and can span disk devices Usually relations cannot span disk devices

DB storage structures [Figure: storage hierarchy]
Database: DBCompany
Tablespaces: tspace 1, system
OS files: fname1, fname2, fname3
Tables and indexes: Empl, Dept, Proj, Dep, EmpIndx
Segments: data, data, data, data, index
Extents

Tablespace ORACLE DB's contain several tablespaces, including one called system - data description + indexes + user-defined tables default tablespace given to each user if multiple tablespaces - better control over load balancing can take some disk space off-line

Extent Relation composed of 1 or more extents Extent - contiguous storage on disk when data segment or index segment first created, given an initial extent from tablespace 10KB (5 pages) if need more space given next contiguous extent


Extent Can increase the size by a positive % (cannot decrease) – initial n - size of initial extent – next n - size of next – max extents - maximum number of extents – min extents - number of extents initially allocated – pct increase n - % by which next extent grows over previous one
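A minimal sketch of how these storage parameters interact. It assumes the first extent has size "initial", the second has size "next", and each later extent is "pct increase" percent larger than the one before it. The function name and sample values are illustrative only, and a real DBMS would also round sizes to whole pages.

```python
def extent_sizes(initial_kb, next_kb, pct_increase, count):
    """Return the sizes (in KB) of the first `count` extents of a segment."""
    sizes = [initial_kb]
    size = next_kb
    while len(sizes) < count:
        sizes.append(size)
        # Each following extent grows by pct_increase percent.
        size = size * (1 + pct_increase / 100)
    return sizes


sizes = extent_sizes(initial_kb=10, next_kb=10, pct_increase=50, count=4)
print(sizes, "total KB:", sum(sizes))
```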

Oracle create tablespace http://www.adp-gmbh.ch/ora/sql/create_tablespace.html

Create table Create table statement - can specify tablespace, no. of extents –When initial extent full, new extent allocated –pctfree - percentage of each page that is kept free and not used for inserts of new rows: if pctfree = 10%, inserts stop when page is 90% full »Uses another page –pctused – determines when new inserts start again: when the used space falls below this percentage of the total, default pctused = 40% pctfree + pctused < 100
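The interaction of pctfree and pctused can be sketched as a small state machine. This is an illustrative model only (the class and attribute names are invented, and space is tracked as whole percentages of a page), not how any particular DBMS implements it.

```python
class Page:
    def __init__(self, pctfree=10, pctused=40):
        assert pctfree + pctused < 100
        self.pctfree = pctfree
        self.pctused = pctused
        self.used = 0                  # percent of the page currently occupied
        self.open_for_inserts = True

    def insert(self, row_pct):
        """Try to add a row that occupies row_pct percent of the page."""
        if not self.open_for_inserts:
            return False               # caller must use another page
        self.used += row_pct
        if self.used >= 100 - self.pctfree:
            self.open_for_inserts = False
        return True

    def delete(self, row_pct):
        self.used -= row_pct
        if self.used < self.pctused:
            self.open_for_inserts = True   # inserts start again


page = Page()
accepted = [page.insert(10) for _ in range(10)]
print(accepted, page.used)
```

With the defaults, the ninth insert fills the page to 90% and closes it; the page accepts inserts again only after deletes bring it below 40%.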

Rows Row layout on each disk page: | Header info | Row directory: 1 2 3 … N | free space | data rows: Row N, Row N-1, …, Row 1 | Header - Row directory – row number and page byte offset –Row number is row number in page – also called slot# –Page byte offset – with varchar, row size not constant To identify a particular row use RID (RowID) – page #, slot # [file#]; slot# is number in row directory (logical #)
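The page layout above can be sketched as follows. This is a simplified illustration: the class name, the 2 KB page size and the choice to ignore the space taken by the header and directory are all assumptions made for brevity.

```python
class SlottedPage:
    def __init__(self, page_no, size=2048):
        self.page_no = page_no
        self.data = bytearray(size)
        self.directory = []        # slot# - 1  ->  (byte offset, row length)
        self.free_end = size       # rows are placed from the end of the page

    def insert(self, row: bytes):
        """Store a variable-length row and return its RID (page #, slot #)."""
        if len(row) > self.free_end:
            raise ValueError("page full")
        self.free_end -= len(row)
        self.data[self.free_end:self.free_end + len(row)] = row
        self.directory.append((self.free_end, len(row)))
        return (self.page_no, len(self.directory))   # slots numbered from 1

    def fetch(self, rid):
        """Follow the row directory from slot # to the row's byte offset."""
        _, slot = rid
        offset, length = self.directory[slot - 1]
        return bytes(self.data[offset:offset + length])


page = SlottedPage(page_no=7)
rid_a = page.insert(b"Smith,Research")
rid_b = page.insert(b"Wong,Administration")
print(rid_a, page.fetch(rid_a), rid_b, page.fetch(rid_b))
```

Because a RID names a slot rather than a byte offset, a row can be moved within the page by updating only its directory entry.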

Differences in DBMSs re: rows ROWID can be retrieved in ORACLE but not DB2 (violates relational model rule) ORACLE rows can be split between pages (row record fragmentation) Can have rows from multiple tables on same page, more info DB2, no splitting, entire row moved to new page, need forwarding pointer

Select operation using Indexes Alternative to table scan

Why use an index? If a select (or join) on the same attribute is used frequently, want a way to improve performance - use indexes –For example: Select * from Employee where ssn = 333445555

B+-tree Most commonly used index structure type in DBs today Based on B-tree Good for equality and range searches B+ tree: dynamic, adjusts gracefully under inserts and deletes. Used to minimize disk I/O available in DB2, ORACLE also has hash cluster, Ingres has heap structure, B-tree, isam (chain together new nodes)

Structure of B+ Trees Leaf level holds pointers to data (RIDs); the remaining nodes are directory (index) nodes that point to other index nodes [Figure: Index Entries (Direct search) above Data Entries ("Sequence set")]

Example of B+Tree [Figure: index node with keys 10, 20, 40; leaf entries 10, 12, 20, 35, 40, 42, 50; leaf entries point to data]

Characteristics of B+ Tree Order of tree (fan out) – max number of child nodes Minimum 50% occupancy (except for root). Each node contains d/2 <= m <= d-1 entries. –Where the parameter d is the order of the tree. Insert/delete at log_F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages) Supports equality and range-searches efficiently

Cost of I/O for B+-tree One index node is one page If tree with depth of 3, 3 I/Os to get pointer to data Read in index node can remain in memory –likely since frequent access to upper -level nodes of actively used B+-trees

B+ Trees in Practice Typical order: between 100-200 children Typical fill-factor: 2/3 full (66.6%) –average fanout = 133 (if 200 children) Typical capacities: – Height 4: 133^4 = 312,900,721 records – Height 3: 133^3 = 2,352,637 records Can often hold top levels in buffer pool: – Level 1 = 1 page = 8 Kbytes – Level 2 = 133 pages ≈ 1 Mbyte – Level 3 = 17,689 pages ≈ 138 MBytes
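The capacity figures follow directly from the fanout. A minimal sketch, assuming 200 children per node, a 2/3 fill factor and 8 KB pages as on the slide; the function names are made up for illustration.

```python
MAX_CHILDREN = 200
FILL_FACTOR = 2 / 3
PAGE_KB = 8

fanout = int(MAX_CHILDREN * FILL_FACTOR)   # average children per node


def records_at_height(height: int) -> int:
    """Number of entries reachable through a tree of the given height."""
    return fanout ** height


def pages_at_level(level: int) -> int:
    """Number of index pages on a level, with the root as level 1."""
    return fanout ** (level - 1)


for height in (3, 4):
    print(f"height {height}: {records_at_height(height):,} records")
for level in (1, 2, 3):
    pages = pages_at_level(level)
    print(f"level {level}: {pages:,} pages = {pages * PAGE_KB / 1024:.1f} MB")
```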

Why B+-tree Directory structure - retrieve range of values efficiently –search for leftmost index entry S_i such that X <= S_i Index entries always in sequence by value - can use sequential prefetch on index Index entries shorter than data rows - less I/O
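The range-retrieval idea can be sketched on a flat, sorted list of leaf entries: find the leftmost entry S_i with X <= S_i, then read forward in sequence. This stands in for the leaf level only (a real B+-tree reaches the starting leaf through the directory nodes), and the RID strings are invented.

```python
from bisect import bisect_left


def range_search(leaf_entries, low, high):
    """leaf_entries: list of (key, rid) pairs sorted by key."""
    keys = [key for key, _ in leaf_entries]
    i = bisect_left(keys, low)           # leftmost entry with low <= key
    rids = []
    while i < len(leaf_entries) and leaf_entries[i][0] <= high:
        rids.append(leaf_entries[i][1])  # entries are contiguous, so scan on
        i += 1
    return rids


leaf = [(10, "r1"), (12, "r2"), (20, "r3"), (35, "r4"),
        (40, "r5"), (42, "r6"), (50, "r7")]
print(range_search(leaf, 12, 40))
```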

B+-tree Balancing of B+-trees - insert, delete Nodes usually not full Utilities to reorganize to lower disk I/O Most systems allow nodes to become depopulated- no automatic algorithm to balance Average node below root level 71% full in active growing B+-trees

Duplicate key values Duplicate key values in index leaf nodes have sibling pointers but a delete of a row that has a heavily duplicated key entails a long search through the leaf-level of the B+-tree Index compression - with multiple duplicates | header info | PrX keyval RID RID... RID | PrX keyval RID…RID| where PrX is count of RID values
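The compressed leaf format (PrX, keyval, RID ... RID) can be sketched as a grouping step over sorted index entries. The function name and the sample keys and RIDs are illustrative assumptions.

```python
def compress(entries):
    """entries: (keyval, rid) pairs sorted by keyval.

    Returns one (PrX, keyval, [rid, ...]) block per distinct key,
    where PrX is the count of RIDs stored for that key.
    """
    blocks = []
    for key, rid in entries:
        if blocks and blocks[-1][0] == key:
            blocks[-1][1].append(rid)    # same key: store the RID only
        else:
            blocks.append((key, [rid]))  # new key: start a new block
    return [(len(rids), key, rids) for key, rids in blocks]


entries = [("Boston", 11), ("Boston", 42), ("Boston", 97), ("Dallas", 5)]
print(compress(entries))
```

Each key value is stored once per block instead of once per row, which is where the space saving for heavily duplicated keys comes from.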

Create Index Options: multiple columns tablespace storage - initial extents, etc. percent free default = 10 % of each page left unfilled (creation) free page (1 free page for every n index pages during creation)

Types of indexes (textbook) Primary index - key field is a candidate key (must be unique) – data file ordered by key field Clustering index - key field is not unique, data file is ordered – all records with same values on same pages Secondary index - non-clustering index – data file not ordered –First record in the data page (or block) is called the anchor record Non-dense index - pointer in index entry points to anchor Dense index - pointer to every record in the file

Clustering Efficiency advantage read in a page, get all of the rows with the same value clustering is useful for range queries e.g. between keyval1 and keyval2

Clustering Can only cluster table by 1 clustering index at a time In SQL server –creates clustered index on PK automatically if no other clustered index on table and PK nonclustered index not specified In DB2 – –if the table is empty, rows sorted as placed on disk –subsequent insertions not clustered, must use REORG In Oracle- –Cluster index – now available for PK in 10g –Define a cluster to create cluster index for 2 tables


Indexes vs. table scan To illustrate the difference between table scan, secondary index (non clustered) and clustered index Assume 10 M customers, 200 cities 2KB/page, row = 100 bytes, 20 rows/page Select * From Customers Where city = 'Birmingham' If assume selectivity = 1/200: 1/200 * 10M = 50,000 customers in a city

Rules of Thumb for I/O Assume slightly slower times than before: –Random I/O – 160 pages/second,.00625 –Sequential prefetch I/O – 1600 pages/second,.000625 Will discuss later: –List prefetch I/O – 400 pages/second,.0025

Table Scan Table Scan - read entire table: 10,000,000/20 = 500,000 pages If random I/O (RIO) were used – WHICH ONE WOULD NEVER DO – 500,000*RIO = 3125 seconds Instead, it makes more sense to use sequential prefetch (SP), reading 32 pages at a time: 500,000*SP = 312.5 seconds

Clustering Index Clustering Index – All entries for B'ham clustered on same pages 50,000/20 = 2500 data pages (with 20 rows per page) Assume 3 upper nodes of the tree Assume 1000 index entries per leaf node, read 50000/1000 = 50 index pages 3 + 50000/1000 + 50,000/20 = number of pages to access If top 3 levels of tree in memory, count access time as 0 Access time: (3*0) + (50*SP) + (2500*SP) = 2,550 *.000625 = 1.6

Secondary Index In the worst case 1 entry for B'ham per page: 50,000 data pages (10M/200) 3 + 50 + 50,000 = 50,053 page accesses (3*0) + (50*SP) + (50,000*RIO) ≈ 312.5 seconds access time REALLY slow – see next slide for a better solution! Use List Prefetch instead of RIO

List Prefetch – Better solution Create list of data pages to access Pages not necessarily in contiguous sequential order System orders pages to minimize disk I/O E.g. elevator algorithm for disk request scheduling Using list prefetch (LP) 0+(50*SP)+50,000*LP=125.03 access time
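The four access paths for the Birmingham query can be compared side by side. A minimal sketch using the rule-of-thumb rates from the slides (160, 1600 and 400 pages per second) and the slides' assumption that the top 3 index levels are in memory; variable names are made up.

```python
RIO = 1 / 160      # seconds per page, random I/O
SP = 1 / 1600      # seconds per page, sequential prefetch
LP = 1 / 400       # seconds per page, list prefetch

ROWS = 10_000_000
ROWS_PER_PAGE = 20
MATCHING_ROWS = ROWS // 200          # selectivity 1/200
ENTRIES_PER_LEAF = 1000

table_pages = ROWS // ROWS_PER_PAGE                  # 500,000
leaf_pages = MATCHING_ROWS // ENTRIES_PER_LEAF       # 50
clustered_data_pages = MATCHING_ROWS // ROWS_PER_PAGE  # 2,500
unclustered_data_pages = MATCHING_ROWS               # worst case: 1 row per page

costs = {
    "table scan (RIO)": table_pages * RIO,
    "table scan (SP)": table_pages * SP,
    "clustering index": (leaf_pages + clustered_data_pages) * SP,
    "secondary index (RIO)": leaf_pages * SP + unclustered_data_pages * RIO,
    "secondary index (LP)": leaf_pages * SP + unclustered_data_pages * LP,
}
for plan, seconds in costs.items():
    print(f"{plan:24s} {seconds:10.2f} s")
```

Under these assumptions a secondary index read with random I/O is no faster than a prefetching table scan, which is why list prefetch matters.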

% Free Redo the previous calculations assuming relations created with 50% free option specified.

Creating Indexes When determining what indexes to create consider: –workload - mix of queries and frequencies of requests 20% of requests are updates, etc. –can create lots of indexes but: cost to create insertions initial load time high if a large table index entries can become longer and longer as multiple columns included

Multiple Indexes More than one index on a relation –e.g. age – one index, class - one index, gender - one index

Composite Index One index based on more than one attribute Create Index index_name on Table (col1, col2,... coln) Composite index entry - values for each attribute age, class, gender entry in index is: C1, C2, C3, RID

Using Indexes System must decide if to use index What if more than one index, which one? What if composite index?

Plans using Indexes Can use an index if index matches select condition in where clause: 1. A matching index scan - only have to access a limited number of contiguous leaf entries to access data 2. Predicate screening with matching index scan – use index entries to eliminate RIDs 3. Non-matching index scan – use index to identify RIDs 4. Index-only retrieval – don’t access data, index entries only 5. Multiple index retrieval – use >1 index to identify RIDs

Matching index scan Definition of a matching index scan - Only have to access contiguous leaf nodes 1)Single where clause and index matches Create index Idx1 on T1 ( C1) Select * from T1 where C1=10 search B+-tree to leaf level for leftmost entry having specified values useful for =, between

Matching Index Scan 2)If multiple where clauses and all '=' Select * from T1 where C1=10 and C2=5 i) if there is a composite index and select columns match all index columns, e.g. Create index Idx2 on T1 ( C1, C2) only have to read contiguous leaf pages ii) if there is a separate index for each clause, e.g. Create index idx3 on T1(C1); Create index idx4 on T1(C2); must choose one or more of the indexes (later)

Matching Index Scan - Rules A matching scan can be used ONLY IF one of the columns in select is the first column of index Decide how many attributes to match in a composite index after the first column, so can read in a small contiguous range of leaf entries in B+-tree to get RIDs Match first column of composite index then: –look at index columns from left to right –Match ends when no predicate found –If range (<=, like, between) for a column, match terminates thereafter easier to scan all entries for range – process rest of entries using predicate screening

Matching Index Scan with Predicate screening 1) If select conditions match some index columns of composite index Create index idx6 on T1(C1, C2, C3, C4); Select * from T1 where C1=10 and C2=3 and C4=20 Access contiguous leaf pages, but not all results on contiguous leaf pages Must examine index entries to determine if in the result - - called predicate screening

Matching Index Scan with Predicate screening Another example: 2) If all select conditions match composite index columns and some selects are a range Create index idx7 on T1(C1, C2, C3); Select * from T1 where C1=10 and C2 between 1 and 5 and C3 =‘F’
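The matching rules can be sketched as a small function that walks the index columns from left to right and stops at the first column without a predicate, or just after the first range predicate. The function name and the "=" / "range" encoding of predicates are assumptions made for this illustration; predicates on later index columns are left for predicate screening.

```python
def split_predicates(index_columns, predicates):
    """predicates maps a column name to '=' or 'range'.

    Returns (matching columns, screening columns) for a composite index.
    """
    matching = []
    for column in index_columns:
        op = predicates.get(column)
        if op is None:
            break                      # match ends when no predicate found
        matching.append(column)
        if op != "=":
            break                      # a range terminates the match
    screening = [c for c in index_columns
                 if c in predicates and c not in matching]
    return matching, screening


# idx6 on T1(C1, C2, C3, C4): where C1=10 and C2=3 and C4=20
idx6 = split_predicates(["C1", "C2", "C3", "C4"],
                        {"C1": "=", "C2": "=", "C4": "="})
# idx7 on T1(C1, C2, C3): where C1=10 and C2 between 1 and 5 and C3='F'
idx7 = split_predicates(["C1", "C2", "C3"],
                        {"C1": "=", "C2": "range", "C3": "="})
print(idx6, idx7)
```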

Advantages to Predicate screening discard RIDs based on values (for index) will access fewer tuples because RIDs used to eliminate potential tuples

Non-matching index scan Not always used by DBMSs attributes in where clause don't include initial attribute of index Create index idx3 on T1(C1, C2, C3); Select * from T1 where C2=2 and C3=‘M’ Search leaf entries of index and compare values for entries must read in all index leaf pages to find C2, C3 value (so why do it?) –50 index pages vs 500,000 data pages

Index only retrieval Elements retrieved in select clause are attributes of composite index Don't need to access rows (actual data) Create index idx5 on T1(C1, C3); Select C1, C3 from T1 where C1=5 and C3 between 2 and 5 Select sum(C3) from T1

Multiple Index Access If conjunctive conditions (and) in where clause, can use >1 index –Extract RIDs from each index satisfying matching predicate –Intersect lists of RIDs (and them) from each index –Final list - satisfies all predicates indexed If disjunctive conditions (or) –Union the two lists of RIDs
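Combining RID lists amounts to set intersection for "and" and set union for "or". A minimal sketch, assuming RIDs are (page #, slot #) pairs; the function names and sample RIDs are invented. The result is sorted by page so the data pages could then be fetched in order.

```python
def and_rids(*rid_lists):
    """RIDs that appear in every list (conjunctive predicates)."""
    result = set(rid_lists[0])
    for rids in rid_lists[1:]:
        result &= set(rids)
    return sorted(result)


def or_rids(*rid_lists):
    """RIDs that appear in at least one list (disjunctive predicates)."""
    return sorted(set().union(*rid_lists))


age_rids = [(3, 1), (3, 4), (8, 2), (9, 5)]
class_rids = [(3, 4), (5, 1), (9, 5)]
print(and_rids(age_rids, class_rids))
print(or_rids(age_rids, class_rids))
```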

Some Query optimizer rules for using RID-lists (then use list prefetch) 1. predicted active resulting RIDs must not be > 50% of RID pool 2. Limit to any single RID list the size of the RID memory pool (16M RIDs) 3. RID list cannot be generated by screening predicates

Rules for multiple index Access Optimizer determines diminishing returns using multiple index access 1. List indexes with matching predicates in where clause 2. Place indexes in order by increasing filter factor 3. For successive indexes, extract RID list only if reduced cost for final row returned e.g. no sense reading 100's of pages of a new index to get number of rows to only 1 tuple

Example: Using RID lists with Multiple Indexes Prospects Table : 50M rows - 10 rows per page Pages in table: 5,000,000 There are 4 Indexes: age – 50 values (1000 entries per page) zipcode – 100,000 values (100 entries per page) hobby – 100 values (1000 entries per page) incomeclass – 10 values (1000 entries per page)

Problem cont’d Select name, straddr from prospects where zipcode between 02159 and 02658 and age = 40 and hobby = ‘chess’ and incomeclass = 10; Compute FF : Make sure in ascending order FF(zipcode) = 500/100,000 = 1/200 FF(hobby) = 1/100 FF(age) = 1/50 FF(incomeclass) = 1/10

Problem cont’d Data rows read if use indexes: (1) 50,000,000/200 = 250,000 (1,2) 250,000/100 = 2500 (1,2,3) 2500/50 = 50 (1,2,3,4) 50/10 = 5 How much time will this take? Is it cost effective to use all of these indexes?

Problem cont’d I/O costs Cost: –Random IO: RIO= 1/160 =.00625 –Sequential Prefetch: SP = 1/1600 =.000625 –List Prefetch: LP = 1/400 =.0025 Note: –Some textbooks assume if read <= 3 pages use RIO –They also assume non-leaf nodes RIO, we assume in memory so it takes 0 disk access time

Problem cont’d Table scan: 50M/10 per page * SP Total time: 5,000,000 * 0.000625 = 3125 Using index 1: (100 entries per page) data: 50M*FF*LP 250,000 * 0.0025 = 625 index: (non-leaf pages + (#leaf entries * FF / entries per page)) * SP (3*0) + (50,000,000/200/100) * 0.000625 = 1.56 Total time: 1.56 + 625 = 626.56

Problem cont’d Using indexes 1&2: data: 250,000/100 * LP 2500 * 0.0025 = 6.25 index 2: (1000 entries per page) (3*0) + (50,000,000/100/1000)* 0.000625 = 0.3125 To use both indexes: 1.56 + 0.3125 = 1.8725 Total time: 1.8725 + 6.25 = 8.1225

Problem cont’d Using indexes 1,2,3: data: 50 * 0.0025 = 0.125 index 3: (1000 entries per page) (3*0) + (50,000,000/50/1000) *.000625=.625 To use 3 indexes: 1.56 + 0.3125 + 0.625 = 2.4975 Total time: 2.4975 + 0.125 = 2.6225 Using indexes 1,2,3,4: data: 5 * 0.0025 = 0.0125 index 4: (1000 entries per page) (3*0)+ (50,000,000/10/1000)*.000625 = 3.125 To use 4 indexes: 1.56+0.3125+0.625+3.125=5.6225 Total time: 5.6225 + 0.0125 = 5.635

Problem cont’d
Index used   Data rows   I/O cost     Index I/O cost                           Trade off if use index
None         50M         3125 sec     -                                        -
1            250,000     625 sec      1.56 sec                                 Decrease 3125 to 625 sec with 1.56 additional sec
1,2          2500        6.25 sec     1.56 + 0.3125 sec                        Decrease 625 to 6.25 sec with 0.3125 additional sec
1,2,3        50          0.125 sec    1.56 + 0.3125 + 0.625 sec                Decrease 6.25 to 0.125 sec with 0.625 additional sec
1,2,3,4      5           0.0125 sec   1.56 + 0.3125 + 0.625 + 3.125 sec        Decrease 0.125 to 0.0125 sec with 3.125 additional sec
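The trade-off can be recomputed with a short loop that adds one index at a time in order of increasing filter factor. This is a sketch under the slides' assumptions (non-leaf index pages in memory, leaf pages read with sequential prefetch, one data page per qualifying row read with list prefetch); the variable names are made up. It keeps 1.5625 where the slides round to 1.56, so totals differ slightly in the last digits.

```python
ROWS = 50_000_000
SP = 1 / 1600   # seconds per page, sequential prefetch
LP = 1 / 400    # seconds per page, list prefetch

# (index name, 1/filter factor, index entries per leaf page)
indexes = [("zipcode", 200, 100), ("hobby", 100, 1000),
           ("age", 50, 1000), ("incomeclass", 10, 1000)]

rows_left = ROWS
index_cost = 0.0
plans = []
for name, ff_denominator, entries_per_page in indexes:
    leaf_pages = ROWS // ff_denominator // entries_per_page
    index_cost += leaf_pages * SP            # cost of reading this index too
    rows_left //= ff_denominator             # rows surviving all filters so far
    data_cost = rows_left * LP
    plans.append((name, rows_left, index_cost, data_cost,
                  index_cost + data_cost))

for name, rows, idx, data, total in plans:
    print(f"up to {name:12s} rows={rows:>7,} index={idx:.4f} "
          f"data={data:.4f} total={total:.4f}")

best = min(plans, key=lambda plan: plan[4])
print("cheapest plan stops after index on:", best[0])
```

The fourth index costs more to read (3.125 seconds) than it saves in data access, so the cheapest plan uses the first three indexes only.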

Indexes and Information Retrieval Some information on slides taken from CS245 – Stanford Univ.

Query: Get employees in (Toy Dept) ^ (2nd floor) [Figure: Dept. index entry "Toy" and Floor index entry "2nd" both point into EMP] Intersect Toy RIDs and 2nd Floor RIDs to get set of matching EMPs


This idea used in text information retrieval Documents: "...the cat is fat...", "...was raining cats and dogs...", "...Fido the dog..." Inverted lists: cat, dog


IR QUERIES Find articles with “cat” and “dog” Find articles with “cat” or “dog” Find articles with “cat” and not “dog” Find articles with “cat” in title Find articles with “cat” and “dog” within 5 words
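The first three queries map directly onto set operations over inverted lists. A minimal sketch using the three document fragments from the earlier slide; the document ids and the tiny SINGULAR table (a stand-in for a real stemmer) are assumptions for illustration. Title and proximity queries would need field and position information in the postings, which this sketch does not store.

```python
docs = {
    1: "the cat is fat",
    2: "was raining cats and dogs",
    3: "Fido the dog",
}
SINGULAR = {"cats": "cat", "dogs": "dog"}   # hypothetical normalization table

inverted = {}                                # term -> set of document ids
for doc_id, text in docs.items():
    for word in text.lower().split():
        term = SINGULAR.get(word, word)
        inverted.setdefault(term, set()).add(doc_id)

cat, dog = inverted["cat"], inverted["dog"]
print("cat and dog:    ", sorted(cat & dog))
print("cat or dog:     ", sorted(cat | dog))
print("cat and not dog:", sorted(cat - dog))
```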

IR – Web search problems –Crawling and indexing share similar characteristics and requirements –Both are offline problems, no need for real-time –A delay of a few minutes before content is searchable is tolerable –OK to run smaller-scale index updates frequently –Querying is an online problem –Demands sub-second response time –Low latency, high throughput –Loads can vary greatly

Architecture of IR Systems [Figure: offline, Documents → Representation Function → Document Representation → Index; online, Query → Representation Function → Query Representation; a Comparison Function matches the Query Representation against the Index to produce Hits]

How do we represent text? “Bag of words” –Treat all the words in a document as index terms for that document –Assign a “weight” to each term based on “importance” –Disregard order, structure, meaning, etc. of the words –Simple, yet effective! Assumptions –Term occurrence is independent –Document relevance is independent –“Words” are well-defined

Stop Word List Words filtered out: common words Match on common word not as useful as match on rare words Not one definitive list

Representing Documents
Document 1: The quick brown fox jumped over the lazy dog’s back.
Document 2: Now is the time for all good men to come to the aid of their party.
Stopword list: the, is, for, to, of
Term     Document 1   Document 2
aid      0            1
all      0            1
back     1            0
brown    1            0
come     0            1
dog      1            0
fox      1            0
good     0            1
jump     1            0
lazy     1            0
men      0            1
now      0            1
over     1            0
party    0            1
quick    1            0
their    0            1
time     0            1
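The term-document representation for these two documents can be built in a few lines. A minimal sketch: the tokenizer, the handling of the possessive and the one-entry NORMALIZE table (a stand-in for a real stemmer) are assumptions for illustration.

```python
import re

STOPWORDS = {"the", "is", "for", "to", "of"}
NORMALIZE = {"jumped": "jump"}     # hypothetical stand-in for a stemmer

docs = {
    1: "The quick brown fox jumped over the lazy dog's back.",
    2: "Now is the time for all good men to come to the aid of their party.",
}


def terms(text):
    """Lowercase, drop possessive 's, split into words, remove stopwords."""
    words = re.findall(r"[a-z]+", text.lower().replace("'s", ""))
    return {NORMALIZE.get(w, w) for w in words if w not in STOPWORDS}


doc_terms = {doc_id: terms(text) for doc_id, text in docs.items()}
vocabulary = sorted(set().union(*doc_terms.values()))
vectors = {doc_id: [int(term in present) for term in vocabulary]
           for doc_id, present in doc_terms.items()}

for term, in_doc1, in_doc2 in zip(vocabulary, vectors[1], vectors[2]):
    print(f"{term:8s} {in_doc1} {in_doc2}")
```

Word order and structure are discarded, as in the bag-of-words model: each document is reduced to a 0/1 vector over the vocabulary.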

Inverted Index Inverted indexing is fundamental to all IR models Consists of postings lists, one for each term in the collection Posting list – document id and payload –Payload can be term frequency (number of times the term occurs in the document), position of occurrence, properties, etc. –Can be ordered by document id, page rank, etc. –Data structure necessary to map from document id to e.g. URL

Inverted Index [Figure: table with a Term column (quick, brown, fox, over, lazy, dog, back, now, time, all, good, men, come, jump, aid, their, party) and a Postings column giving, for each term, the list of document ids that contain it]

Posting: an entry in inverted list. Represents occurrence of term in article Size of a list (in postings): from 1 (rare words or misspellings) to 10^6 (common words) Size of a posting: 10-15 bits (compressed)

Process query Given a query, fetch posting lists associated with query, traverse postings to compute result set Query document scores must be computed Partial scores stored in accumulators Top k documents extracted Optimization strategies to reduce # postings must examine
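Query processing with accumulators can be sketched as follows. The postings, the use of raw term frequency as the partial score and the function name are all illustrative assumptions; real systems use weighted scores and skip postings where they can.

```python
import heapq
from collections import defaultdict

# term -> postings list of (document id, term frequency); made-up data
postings = {
    "cat": [(1, 2), (2, 1)],
    "dog": [(2, 3), (3, 1)],
}


def top_k(query_terms, k):
    """Traverse each term's postings, add partial scores, keep the best k."""
    accumulators = defaultdict(int)          # document id -> partial score
    for term in query_terms:
        for doc_id, term_frequency in postings.get(term, []):
            accumulators[doc_id] += term_frequency
    return heapq.nlargest(k, accumulators.items(), key=lambda item: item[1])


print(top_k(["cat", "dog"], k=2))
```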

Indexing: Performance Analysis The indexing problem –Must be relatively fast, but need not be real time –For Web, incremental updates are important How large is the inverted index? –Size of vocabulary –Size of postings Fundamentally, a large sorting problem –Terms usually fit in memory –Postings usually don’t

Index Size of index depends on payload Well-optimized inverted index can be 1/10 of size of original document collection If store position info, could be several times larger Usually can hold entire vocabulary in memory (using front-coding) Postings lists usually too large to store in memory Query evaluation involves random disk access and decoding postings –Try to minimize random seeks
