©Silberschatz, Korth and Sudarshan, Database System Concepts 21.1 Chapter 21:

Indices and Queries

Indices
- Sequential, hash
- Unique and non-unique
- Primary
- Simple and composite
- Advantages and disadvantages of indices
- Declaration in SQL:
    CREATE UNIQUE INDEX name ON table(a1, ..., an)

Indices and Queries
- Select A1, ..., An from T where Ai = x
  - Hash indices are better in these cases.
- Select A1, ..., An from T where Ai < y
  - Sequential (ordered) indices are better in these cases.
- Select A1, ..., An from T where Ai = x AND Aj = y
  - We can use simple indices on Ai and on Aj.
  - We can use a composite index on (Ai, Aj).

Basic Concepts
- Indexing mechanisms are used to speed up access to desired data.
  - E.g., author catalog in a library
- Search key: attribute or set of attributes used to look up records in a file.
- An index file consists of records (called index entries) of the form (search-key, pointer).
- Index files are typically much smaller than the original file.
- Two basic kinds of indices:
  - Ordered indices: search keys are stored in sorted order.
  - Hash indices: search keys are distributed uniformly across "buckets" using a "hash function".

Index Evaluation Metrics
Indexing techniques are evaluated on the basis of:
- Access types supported efficiently. E.g.,
  - records with a specified value in the attribute,
  - or records with an attribute value falling in a specified range of values.
- Access time
- Insertion time
- Deletion time
- Space overhead

Ordered Indices
- In an ordered index, index entries are stored sorted on the search-key value. E.g., author catalog in a library.
- Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.
  - Also called clustering index.
  - The search key of a primary index is usually, but not necessarily, the primary key.
- Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index.
- Index-sequential file: ordered sequential file with a primary index.
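
As a rough illustration of an ordered index, the sketch below keeps (search-key, pointer) entries sorted and uses binary search for both point and range lookups. The sample records and the names `lookup` and `range_lookup` are made up for this example; because the data file is not stored in author order, this is a secondary (non-clustering) index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: list position acts as the record pointer.
records = [
    ("Silberschatz", "Database System Concepts"),
    ("Knuth", "Sorting and Searching"),
    ("Gray", "Transaction Processing"),
    ("Date", "An Introduction to Database Systems"),
]

# Ordered index: (search-key, pointer) entries sorted on the search key.
index = sorted((author, pos) for pos, (author, _) in enumerate(records))
keys = [key for key, _ in index]


def lookup(key):
    """Return pointers of records whose search key equals key."""
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [ptr for _, ptr in index[lo:hi]]


def range_lookup(low, high):
    """Return pointers of records with low <= search key <= high."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [ptr for _, ptr in index[lo:hi]]


print(lookup("Knuth"))
print(range_lookup("D", "H"))
```

Both lookups touch only a contiguous slice of the sorted entries, which is why ordered indices handle range queries well.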

Hash Indices
- Hashing can be used not only for file organization, but also for index-structure creation.
- A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
- Strictly speaking, hash indices are always secondary indices:
  - if the file itself is organized using hashing, a separate primary hash index on it using the same search key is unnecessary.
  - However, we use the term hash index to refer to both secondary index structures and hash-organized files.
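
A minimal sketch of a hash index, assuming a fixed number of buckets and a toy hash function (sum of character codes modulo the bucket count). `NUM_BUCKETS`, `bucket_of`, and the sample account records are illustrative choices, not taken from the slides.

```python
NUM_BUCKETS = 4  # assumed bucket count, for illustration only


def bucket_of(key):
    """Toy hash function: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for ch in key) % NUM_BUCKETS


def build_hash_index(records, key_field):
    """Place a (search-key, pointer) entry for each record into its bucket."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for pos, rec in enumerate(records):
        key = rec[key_field]
        buckets[bucket_of(key)].append((key, pos))
    return buckets


def hash_lookup(buckets, key):
    """Search only the one bucket the key hashes to."""
    return [ptr for k, ptr in buckets[bucket_of(key)] if k == key]


accounts = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-215", "branch_name": "Mianus"},
    {"account_number": "A-102", "branch_name": "Perryridge"},
    {"account_number": "A-305", "branch_name": "Perryridge"},
]

account_index = build_hash_index(accounts, "account_number")
branch_index = build_hash_index(accounts, "branch_name")
print(hash_lookup(account_index, "A-215"))
print(hash_lookup(branch_index, "Perryridge"))
```

A point lookup inspects a single bucket, but the buckets carry no ordering, so a range query would have to scan every bucket.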

Example of Hash Index
[Figure: example of a hash index]

Comparison of Ordered Indexing and Hashing
- Cost of periodic re-organization
- Relative frequency of insertions and deletions
- Is it desirable to optimize average access time at the expense of worst-case access time?
- Expected type of queries:
  - Hashing is generally better at retrieving records having a specified value of the key.
  - If range queries are common, ordered indices are to be preferred.

Index Definition in SQL
- Create an index:
    create index <index-name> on <relation-name> (<attribute-list>)
  E.g.: create index b-index on branch(branch-name)
- Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key.
  - Not really required if the SQL unique integrity constraint is supported.
- To drop an index:
    drop index <index-name>
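
The statements above can be tried against SQLite through Python's standard `sqlite3` module. This is only a sketch: the `branch` table layout and sample rows are assumptions, and the names use underscores (`b_index`, `branch_name`) because hyphenated names are not valid unquoted SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text)")

# A unique index enforces that branch_name behaves as a candidate key.
conn.execute("create unique index b_index on branch(branch_name)")
conn.execute("insert into branch values ('Perryridge', 'Horseneck')")

try:
    conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The unique index rejects a second row with the same search key.
    duplicate_rejected = True

# After dropping the index, the uniqueness condition is no longer enforced.
conn.execute("drop index b_index")
conn.execute("insert into branch values ('Perryridge', 'Brooklyn')")
row_count = conn.execute("select count(*) from branch").fetchone()[0]

print(duplicate_rejected, row_count)
```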

Multiple-Key Access
- Use multiple indices for certain types of queries.
- Example:
    select account-number
    from account
    where branch-name = "Perryridge" and balance = 1000
- Possible strategies for processing the query using indices on single attributes:
  1. Use the index on branch-name to find accounts of the Perryridge branch; test balance = 1000.
  2. Use the index on balance to find accounts with balances of $1000; test branch-name = "Perryridge".
  3. Use the branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly use the index on balance. Take the intersection of both sets of pointers obtained.
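
The strategies above can be sketched with two single-attribute indices held as dictionaries from key value to a set of record pointers. The sample `accounts` rows and the `build_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical account relation: (account-number, branch-name, balance).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 1000),
    ("A-201", "Perryridge", 900),
    ("A-215", "Mianus", 1000),
    ("A-217", "Perryridge", 1000),
]


def build_index(rows, column):
    """Map each value of the given column to the set of record pointers."""
    index = defaultdict(set)
    for pos, row in enumerate(rows):
        index[row[column]].add(pos)
    return index


branch_index = build_index(accounts, 1)
balance_index = build_index(accounts, 2)

# Strategy 1: use the branch-name index, then test balance on each record.
strategy1 = sorted(
    accounts[p][0] for p in branch_index["Perryridge"] if accounts[p][2] == 1000
)

# Strategy 3: intersect the two pointer sets before fetching any record.
pointers = branch_index["Perryridge"] & balance_index[1000]
strategy3 = sorted(accounts[p][0] for p in pointers)

print(strategy1, strategy3)
```

Strategy 1 fetches every Perryridge record, whereas strategy 3 fetches only the records whose pointers survive the intersection.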

Indices on Multiple Attributes
- Suppose we have an index on the combined search key (branch-name, balance).
- With the where clause
    where branch-name = "Perryridge" and balance = 1000
  the index on the combined search key will fetch only records that satisfy both conditions. Using separate indices is less efficient: we may fetch many records (or pointers) that satisfy only one of the conditions.
- Can also efficiently handle
    where branch-name = "Perryridge" and balance < 1000
- But cannot efficiently handle
    where branch-name < "Perryridge" and balance = 1000
  May fetch many records that satisfy the first but not the second condition.
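
To see why the two where clauses behave differently, the sketch below models the composite index as a list of (branch-name, balance, pointer) entries sorted lexicographically. The sample data and function names are assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical account relation: (account-number, branch-name, balance).
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 1000),
    ("A-201", "Perryridge", 900),
    ("A-215", "Mianus", 1000),
    ("A-217", "Perryridge", 750),
    ("A-305", "Brighton", 350),
]

# Composite index on (branch-name, balance), sorted lexicographically.
index = sorted((b, bal, pos) for pos, (_, b, bal) in enumerate(accounts))
keys = [(b, bal) for b, bal, _ in index]


def branch_eq_balance_lt(branch, limit):
    """branch-name = branch and balance < limit: one contiguous index range."""
    lo = bisect_left(keys, (branch,))        # first entry for this branch
    hi = bisect_left(keys, (branch, limit))  # first entry with balance >= limit
    return [ptr for _, _, ptr in index[lo:hi]]


def branch_lt_balance_eq(branch, balance):
    """branch-name < branch and balance = balance: scan and filter the range."""
    hi = bisect_left(keys, (branch,))
    scanned = index[:hi]
    matches = [ptr for _, bal, ptr in scanned if bal == balance]
    return matches, len(scanned)


print(branch_eq_balance_lt("Perryridge", 1000))
print(branch_lt_balance_eq("Perryridge", 1000))
```

In the first query every entry in the scanned range is a match. In the second, all entries for smaller branch names are scanned and most are discarded by the balance test.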

Performance Tuning

Performance Tuning
- Adjusting various parameters and design choices to improve system performance for a specific application.
- Tuning is best done by
  1. identifying bottlenecks, and
  2. eliminating them.
- A database system can be tuned at 3 levels:
  - Hardware: e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.
  - Database system parameters: e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. The system may have automatic tuning.
  - Higher-level database design, such as the schema, indices and transactions (more later)

System Performance
- Given the performance of today's processors, why do we still have performance problems?
- Breakdown of the time spent in an optimal transaction:
  - 10%: CPU
  - 30%: network
  - 30%: database
  - 30%: idle (otherwise the queues start to thrash)
- With distributed transactions, the picture is considerably worse.
- Typically, 5 to 20 I/Os per transaction.
[Figure labels: Network; Processor & Memory; I/O; Database; Log]

Bottlenecks
- Performance of most systems (at least before they are tuned) is usually limited by the performance of one or a few components: these are called bottlenecks.
  - E.g., 80% of the code may take up 20% of the time, and 20% of the code takes up 80% of the time.
  - Worth spending most time on the 20% of the code that takes 80% of the time.
- Transactions request a sequence of services.
  - E.g., CPU, disk I/O, locks
- Bottlenecks may be in hardware (e.g., disks are very busy, CPU is idle), or in software.
- Removing one bottleneck often exposes another.
- De-bottlenecking consists of repeatedly finding bottlenecks and removing them.
  - This is a heuristic.

Identifying Bottlenecks
- With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served.
- Can model the database as a queueing system with a queue for each service.
  - Transactions repeatedly do the following: request a service, wait in the queue for the service, and get serviced.
- Bottlenecks in a database system typically show up as very high utilizations (and correspondingly, very long queues) of a particular service.
  - E.g., disk vs. CPU utilization
  - 100% utilization leads to very long waiting time:
    - Rule of thumb: design the system for about 70% utilization at peak load.
    - Utilization over 90% should be avoided.
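
The slide gives no formula for waiting time. As a rough illustration of why high utilization hurts, the sketch below assumes the textbook single-server M/M/1 queueing model, in which mean response time equals service time divided by (1 - utilization). The model choice and the 10 ms service time are assumptions, not part of the original material.

```python
def mean_response_time(service_time, utilization):
    """Mean time in system for an M/M/1 queue (assumed model)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


SERVICE_TIME_MS = 10.0  # hypothetical time to serve one request

for rho in (0.5, 0.7, 0.9, 0.99):
    t = mean_response_time(SERVICE_TIME_MS, rho)
    print(f"utilization {rho:.0%}: about {t:.0f} ms per request")
```

Under this model the response time grows slowly up to roughly 70% utilization and then climbs steeply, which is consistent with the rule of thumb above.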

Tunable Parameters
- Tuning of hardware
- Tuning of schema
- Tuning of indices
- Tuning of materialized views

Tuning of Hardware
- Even well-tuned transactions typically require a few I/O operations.
  - A typical disk supports about 100 random I/O operations per second.
  - Suppose each transaction requires just 2 random I/O operations. Then one disk supports 50 transactions per second, so to support n transactions per second we need to stripe data across n/50 disks (ignoring skew).
- The number of I/O operations per transaction can be reduced by keeping more data in memory.
  - If all data is in memory, I/O is needed only for writes.
  - Keeping frequently used data in memory reduces disk accesses, reducing the number of disks required, but has a memory cost.
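
The disk-count arithmetic above can be written as a small helper. The defaults (2 I/Os per transaction, 100 random I/Os per second per disk) come from the slide; the function name and the rounding up to whole disks are choices made for this sketch.

```python
import math


def disks_needed(tps, ios_per_txn=2, ios_per_disk_per_sec=100):
    """Disks required to sustain tps transactions per second, ignoring skew."""
    return math.ceil(tps * ios_per_txn / ios_per_disk_per_sec)


print(disks_needed(1000))  # 1000 / 50
print(disks_needed(120))   # fractional result rounded up to whole disks
```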

Hardware Tuning: Five-Minute Rule
- Question: which data to keep in memory?
  - If a page is accessed n times per second, keeping it in memory saves
      n * (price-per-disk-drive / accesses-per-second-per-disk)
  - Cost of keeping a page in memory:
      price-per-MB-of-memory / pages-per-MB-of-memory
  - Break-even point: the value of n for which the above costs are equal.
    - If accesses are more frequent, then the saving is greater than the cost.
  - Solving the above equation with current disk and memory prices leads to:
    5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes, it should be kept in memory (by buying sufficient memory!)
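
The break-even computation can be sketched as follows. The prices and sizes are hypothetical values picked so that the result lands at five minutes; they are not market data, and `break_even_interval` is a name invented for this example.

```python
def break_even_interval(price_per_disk, accesses_per_sec_per_disk,
                        price_per_mb, pages_per_mb):
    """Seconds between accesses at which memory cost equals disk saving.

    Setting n * price_per_disk / accesses_per_sec_per_disk equal to
    price_per_mb / pages_per_mb and solving for n gives the break-even
    access rate; the interval is its reciprocal.
    """
    cost_per_page_in_memory = price_per_mb / pages_per_mb
    cost_per_access_per_sec = price_per_disk / accesses_per_sec_per_disk
    n = cost_per_page_in_memory / cost_per_access_per_sec
    return 1.0 / n


# Hypothetical figures, for illustration only.
interval = break_even_interval(
    price_per_disk=300.0,
    accesses_per_sec_per_disk=100.0,
    price_per_mb=1.28,
    pages_per_mb=128,  # assumes 8 KB pages
)
print(f"break-even: one access every {interval:.0f} seconds")
```

Pages accessed more often than the break-even interval are cheaper to keep in memory; pages accessed less often are cheaper to leave on disk.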

Hardware Tuning: One-Minute Rule
- For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1 MB of data at a time:
  1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory.
- Prices of disk and memory have changed greatly over the years, but the ratios have not changed much,
  - so the rules remain the 5-minute and 1-minute rules, not 1-hour or 1-second rules!

RAID Levels
- RAID 0: non-redundant striping (block striping)
- RAID 1: mirrored disks (with block striping)
- RAID 3: bit-interleaved parity
- RAID 5: block-interleaved distributed parity
[Figure: disk layouts for each RAID level, showing copy (C) and parity (P) disks]

Hardware Tuning: Choice of RAID Level
- To use RAID 1 or RAID 5?
  - Depends on the ratio of reads and writes.
  - RAID 5 requires 2 block reads and 2 block writes to write out one data block.
- If an application requires r reads and w writes per second:
  - RAID 1 requires r + 2w I/O operations per second.
  - RAID 5 requires r + 4w I/O operations per second.
- For reasonably large r and w, this requires lots of disks to handle the workload.
  - RAID 5 may require more disks than RAID 1 to handle the load!
  - The apparent saving in the number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory!
- Rule of thumb: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise.
  - If you need more disks to handle the I/O load, just mirror them, since disk capacities these days are enormous!
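
A short sketch of the comparison above. The I/O formulas are the ones on the slide; the workload figures and the per-disk rate of 100 I/O operations per second (borrowed from the hardware-tuning slide) are illustrative assumptions.

```python
import math


def raid1_ios(reads, writes):
    """RAID 1: each write goes to both mirrors."""
    return reads + 2 * writes


def raid5_ios(reads, writes):
    """RAID 5: each write costs 2 block reads plus 2 block writes."""
    return reads + 4 * writes


def disks_for_load(ios_per_sec, ios_per_disk_per_sec=100):
    """Disks needed to sustain the given I/O rate."""
    return math.ceil(ios_per_sec / ios_per_disk_per_sec)


r, w = 1000, 500  # hypothetical reads and writes per second
raid1_disks = disks_for_load(raid1_ios(r, w))
raid5_disks = disks_for_load(raid5_ios(r, w))
print(raid1_disks, raid5_disks)
```

With this write-heavy workload RAID 5 needs more disks than RAID 1 purely to absorb the I/O load, regardless of storage capacity.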

Tuning the Database Design
- Schema tuning
  - Vertically partition relations to isolate the data that is accessed most often, so that only needed information is fetched.
    - E.g., split account into two relations, (account-number, branch-name) and (account-number, balance).
    - Branch-name need not be fetched unless required.
  - Improve performance by storing a denormalized relation.
    - E.g., store the join of account and depositor; branch-name and balance information is repeated for each holder of an account, but the join need not be computed repeatedly.
    - Price paid: more space, and more work for the programmer to keep the relation consistent on updates.
    - Better to use materialized views (more on this later).
  - Cluster together on the same disk page records that would match in a frequently required join,
    - so that the join can be computed very efficiently when required.

Tuning the Database Design (Cont.)
- Index tuning
  - Create appropriate indices to speed up slow queries/updates.
  - Speed up slow updates by removing excess indices (tradeoff between queries and updates).
  - Choose the type of index (B-tree/hash) appropriate for the most frequent types of queries.
  - Choose which index to make clustered.
- Index tuning wizards look at the past history of queries and updates (the workload) and recommend which indices would be best for the workload.
