Presentation is loading. Please wait.

Presentation is loading. Please wait.

©Silberschatz, Korth and SudarshanDatabase System Concepts 21.1 Chapter 21:

Similar presentations


Presentation on theme: "©Silberschatz, Korth and SudarshanDatabase System Concepts 21.1 Chapter 21:"— Presentation transcript:

1 ©Silberschatz, Korth and SudarshanDatabase System Concepts 21.1 Chapter 21:

2 ©Silberschatz, Korth and SudarshanDatabase System Concepts 21.2 Índices e Queries

3 ©Silberschatz, Korth and Sudarshan21.3Database System Concepts 3 Índices Indices  Sequenciais, Hash,  Únicos e não únicos  primários  Simples e compostos  Vantagens e desvantagens dos indices Declaração em SQL:  CREATE UNIQUE INDEX nome ON tabela(a1,....,an)

4 ©Silberschatz, Korth and Sudarshan21.4Database System Concepts 4 Índices e Queries Select A1,..,An from T where Ai = x  O índices tipo hash são melhores nestes casos Select A1,..,An from T where Ai y O índices tipo sequenciais são melhores nestes casos. Select A1,..,An from T where Ai = x AND Aj = y Podemos usar índices simples em Ai e Aj Podemos usar índices compostos (Ai, Aj)

5 ©Silberschatz, Korth and Sudarshan21.5Database System Concepts 5 Basic Concepts Indexing mechanisms used to speed up access to desired data.  E.g., author catalog in library Search Key - attribute to set of attributes used to look up records in a file. An index file consists of records (called index entries) of the form Index files are typically much smaller than the original file Two basic kinds of indices:  Ordered indices: search keys are stored in sorted order  Hash indices: search keys are distributed uniformly across “buckets” using a “hash function”. search-key pointer

6 ©Silberschatz, Korth and Sudarshan21.6Database System Concepts 6 Index Evaluation Metrics Access types supported efficiently. E.g.,  records with a specified value in the attribute  or records with an attribute value falling in a specified range of values. Access time Insertion time Deletion time Space overhead

7 ©Silberschatz, Korth and Sudarshan21.7Database System Concepts 7 Ordered Indices In an ordered index, index entries are stored sorted on the search key value. E.g., author catalog in library. Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file.  Also called clustering index  The search key of a primary index is usually but not necessarily the primary key. Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called non-clustering index. Index-sequential file: ordered sequential file with a primary index. Indexing techniques evaluated on basis of:

8 ©Silberschatz, Korth and Sudarshan21.8Database System Concepts 8 Hash Indices Hashing can be used not only for file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure. Strictly speaking, hash indices are always secondary indices  if the file itself is organized using hashing, a separate primary hash index on it using the same search-key is unnecessary.  However, we use the term hash index to refer to both secondary index structures and hash organized files.

9 ©Silberschatz, Korth and Sudarshan21.9Database System Concepts 9 Example of Hash Index

10 ©Silberschatz, Korth and Sudarshan21.10Database System Concepts 10 Comparison of Ordered Indexing and Hashing Cost of periodic re-organization Relative frequency of insertions and deletions Is it desirable to optimize average access time at the expense of worst-case access time? Expected type of queries:  Hashing is generally better at retrieving records having a specified value of the key.  If range queries are common, ordered indices are to be preferred

11 ©Silberschatz, Korth and Sudarshan21.11Database System Concepts 11 Index Definition in SQL Create an index create index on ( ) E.g.: create index b-index on branch(branch-name) Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key is a candidate key.  Not really required if SQL unique integrity constraint is supported To drop an index drop index

12 ©Silberschatz, Korth and Sudarshan21.12Database System Concepts 12 Multiple-Key Access Use multiple indices for certain types of queries. Example: select account-number from account where branch-name = “Perryridge” and balance = 1000 Possible strategies for processing query using indices on single attributes: 1.Use index on branch-name to find accounts with balances of $1000; test branch-name = “Perryridge”. 2.Use index on balance to find accounts with balances of $1000; test branch-name = “Perryridge”. 3.Use branch-name index to find pointers to all records pertaining to the Perryridge branch. Similarly use index on balance. Take intersection of both sets of pointers obtained.

13 ©Silberschatz, Korth and Sudarshan21.13Database System Concepts 13 Indices on Multiple Attributes With the where clause where branch-name = “Perryridge” and balance = 1000 the index on the combined search-key will fetch only records that satisfy both conditions. Using separate indices in less efficient — we may fetch many records (or pointers) that satisfy only one of the conditions. Can also efficiently handle where branch-name = “Perryridge” and balance < 1000 But cannot efficiently handle where branch-name < “Perryridge” and balance = 1000 May fetch many records that satisfy the first but not the second condition. Suppose we have an index on combined search-key (branch-name, balance).

14 ©Silberschatz, Korth and SudarshanDatabase System Concepts 21.14 Performance Tuning

15 ©Silberschatz, Korth and Sudarshan21.15Database System Concepts 15 Performance Tuning Adjusting various parameters and design choices to improve system performance for a specific application. Tuning is best done by 1. identifying bottlenecks, and 2. eliminating them. Can tune a database system at 3 levels:  Hardware -- e.g., add disks to speed up I/O, add memory to increase buffer hits, move to a faster processor.  Database system parameters -- e.g., set buffer size to avoid paging of buffer, set checkpointing intervals to limit log size. System may have automatic tuning.  Higher level database design, such as the schema, indices and transactions (more later)

16 ©Silberschatz, Korth and Sudarshan21.16Database System Concepts 16 Desempenho dos Sistemas Como desempenho do Processadores actualmente, porquê que ainda temos problemas com o desempenho? Decomposição do tempo gasto numa transacção óptima: 10% – CPU 30% – Rede 30% – Base de Dados 30% – Idle (caso contrário as queues entram em “trash”) Com transacções distribuídas, o cenário é bastante pior Rede Processador & Memória IO Base de Dados Log Tipicamente, 5 a 20 I/Os por transacção

17 ©Silberschatz, Korth and Sudarshan21.17Database System Concepts 17 Bottlenecks Performance of most systems (at least before they are tuned) usually limited by performance of one or a few components: these are called bottlenecks Transactions request a sequence of services  e.g. CPU, Disk I/O, locks  E.g. 80% of the code may take up 20% of time and 20% of code takes up 80% of time  Worth spending most time on 20% of code that take 80% of time Bottlenecks may be in hardware (e.g. disks are very busy, CPU is idle), or in software Removing one bottleneck often exposes another De-bottlenecking consists of repeatedly finding bottlenecks, and removing them  This is a heuristic

18 ©Silberschatz, Korth and Sudarshan21.18Database System Concepts 18 Identifying Bottlenecks With concurrent transactions, transactions may have to wait for a requested service while other transactions are being served Can model database as a queueing system with a queue for each service  transactions repeatedly do the following  request a service, wait in queue for the service, and get serviced Bottlenecks in a database system typically show up as very high utilizations (and correspondingly, very long queues) of a particular service  E.g. disk vs CPU utilization  100% utilization leads to very long waiting time:  Rule of thumb: design system for about 70% utilization at peak load  utilization over 90% should be avoided

19 ©Silberschatz, Korth and Sudarshan21.19Database System Concepts 19 Queues In A Database System

20 ©Silberschatz, Korth and Sudarshan21.20Database System Concepts 20 Tunable Parameters Tuning of hardware Tuning of schema Tuning of indices Tuning of materialized views

21 ©Silberschatz, Korth and Sudarshan21.21Database System Concepts 21 Tuning of Hardware Even well-tuned transactions typically require a few I/O operations  Typical disk supports about 100 random I/O operations per second  Suppose each transaction requires just 2 random I/O operations. Then to support n transactions per second, we need to stripe data across n/50 disks (ignoring skew) Number of I/O operations per transaction can be reduced by keeping more data in memory  If all data is in memory, I/O needed only for writes  Keeping frequently used data in memory reduces disk accesses, reducing number of disks required, but has a memory cost

22 ©Silberschatz, Korth and Sudarshan21.22Database System Concepts 22 Hardware Tuning: Five-Minute Rule Question: which data to keep in memory:  If a page is accessed n times per second, keeping it in memory saves  n * price-per-disk-drive accesses-per-second-per-disk  Cost of keeping page in memory  price-per-MB-of-memory ages-per-MB-of-memory  Break-even point: value of n for which above costs are equal  If accesses are more then saving is greater than cost  Solving above equation with current disk and memory prices leads to: 5-minute rule: if a page that is randomly accessed is used more frequently than once in 5 minutes it should be kept in memory  (by buying sufficient memory!)

23 ©Silberschatz, Korth and Sudarshan21.23Database System Concepts 23 Hardware Tuning: One-Minute Rule For sequentially accessed data, more pages can be read per second. Assuming sequential reads of 1MB of data at a time: 1-minute rule: sequentially accessed data that is accessed once or more in a minute should be kept in memory Prices of disk and memory have changed greatly over the years, but the ratios have not changed much  so rules remain as 5 minute and 1 minute rules, not 1 hour or 1 second rules!

24 ©Silberschatz, Korth and Sudarshan21.24Database System Concepts 24 RAID Levels RAID 0: nonredudant striping (Block striping) RAID 1: mirroed disks (Block striping) RAID 3: Bit interleaved parity RAID 5: Block interleaved distributed parity C C C C P P P P P P

25 ©Silberschatz, Korth and Sudarshan21.25Database System Concepts 25 Hardware Tuning: Choice of RAID Level To use RAID 1 or RAID 5?  Depends on ratio of reads and writes  RAID 5 requires 2 block reads and 2 block writes to write out one data block If an application requires r reads and w writes per second  RAID 1 requires r + 2w I/O operations per second  RAID 5 requires: r + 4w I/O operations per second For reasonably large r and w, this requires lots of disks to handle workload  RAID 5 may require more disks than RAID 1 to handle load!  Apparent saving of number of disks by RAID 5 (by using parity, as opposed to the mirroring done by RAID 1) may be illusory! Thumb rule: RAID 5 is fine when writes are rare and data is very large, but RAID 1 is preferable otherwise  If you need more disks to handle I/O load, just mirror them since disk capacities these days are enormous!

26 ©Silberschatz, Korth and Sudarshan21.26Database System Concepts 26 Tuning the Database Design Schema tuning  Vertically partition relations to isolate the data that is accessed most often -- only fetch needed information. E.g., split account into two, (account-number, branch-name) and (account-number, balance). Branch-name need not be fetched unless required  Improve performance by storing a denormalized relation E.g., store join of account and depositor; branch-name and balance information is repeated for each holder of an account, but join need not be computed repeatedly. Price paid: more space and more work for programmer to keep relation consistent on updates better to use materialized views (more on this later..)  Cluster together on the same disk page records that would match in a frequently required join,  compute join very efficiently when required.

27 ©Silberschatz, Korth and Sudarshan21.27Database System Concepts 27 Tuning the Database Design (Cont.) Index tuning  Create appropriate indices to speed up slow queries/updates  Speed up slow updates by removing excess indices (tradeoff between queries and updates)  Choose type of index (B-tree/hash) appropriate for most frequent types of queries.  Choose which index to make clustered Index tuning wizards look at past history of queries and updates (the workload) and recommend which indices would be best for the workload


Download ppt "©Silberschatz, Korth and SudarshanDatabase System Concepts 21.1 Chapter 21:"

Similar presentations


Ads by Google