Query Evaluation Techniques for Large Databases**


1 Query Evaluation Techniques for Large Databases**
By Goetz Graefe. Prepared by: Edwin Andrés Bernal López, Claudia Jeanneth Becerra Cortés. Course: Advanced Topics in Databases (Tópicos Avanzados de Bases de Datos). Bogotá, March 23, 2006. **Portland State University, Computer Science Department, P.O. Box 751, Portland, Oregon. Received January 1992; final revision accepted February 1993. Published in ACM Computing Surveys, Vol. 25, No. 2, June 1993.

2 List of Publications by Goetz Graefe http://www. informatik
2004
Goetz Graefe, Michael J. Zwilling: Transaction support for indexed views. SIGMOD Conference 2004
Goetz Graefe: Write-Optimized B-Trees. VLDB 2004
Conor Cunningham, Goetz Graefe, César A. Galindo-Legaria: PIVOT and UNPIVOT: Optimization and Execution Strategies in an RDBMS. VLDB 2004
2003
Goetz Graefe: Executing Nested Queries. BTW 2003: 58-77
Goetz Graefe: Partitioned B-trees - a user's guide. BTW 2003
Goetz Graefe: Sorting And Indexing With Partitioned B-Trees. CIDR 2003
2001
Goetz Graefe, Per-Åke Larson: B-Tree Indexes and CPU Caches. ICDE 2001
William O'Connell, Andrew Witkowski, Goetz Graefe: Collaborative Analytical Processing - Dream or Reality? (Panel abstract). VLDB 2001: 613, presented in the framework of the 27th International Conference on Very Large Data Bases VLDB '01
Sameet Agarwal, José A. Blakeley, Thomas Casey, Kalen Delaney, César A. Galindo-Legaria, Goetz Graefe, Michael Rys, Michael J. Zwilling: Microsoft SQL Server (Chapter 27). Database System Concepts, 4th Edition. 2001

3 List of Publications by Goetz Graefe http://www. informatik
2000
Goetz Graefe: Dynamic Query Evaluation Plans: Some Course Corrections? IEEE Data Eng. Bull. 23(2): 3-6 (2000)
1999
Goetz Graefe: The Value of Merge-Join and Hash-Join in SQL Server. VLDB 1999
Surajit Chaudhuri, Eric Christensen, Goetz Graefe, Vivek R. Narasayya, Michael J. Zwilling: Self-Tuning Technology in Microsoft SQL Server. IEEE Data Eng. Bull. 22(2) (1999)
1998
Goetz Graefe: The New Database Imperatives. ICDE 1998: 69-72
Goetz Graefe, Usama M. Fayyad, Surajit Chaudhuri: On the Efficient Gathering of Sufficient Statistics for Classification from Large SQL Databases. KDD 1998
Per-Åke Larson, Goetz Graefe: Memory Management During Run Generation in External Sorting. SIGMOD Conference 1998
Goetz Graefe, Ross Bunker, Shaun Cooper: Hash Joins and Hash Teams in Microsoft SQL Server. VLDB 1998: 86-97
Jim Gray, Goetz Graefe: The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb. CoRR cs.DB/ (1998)

4 List of Publications by Goetz Graefe http://www. informatik
1997
Jim Gray, Goetz Graefe: The Five-Minute Rule Ten Years Later, and Other Computer Storage Rules of Thumb. SIGMOD Record 26(4) (1997)
1996
Goetz Graefe: The Microsoft Relational Engine. ICDE 1996
Goetz Graefe: Iterators, Schedulers, and Distributed-memory Parallelism. Softw., Pract. Exper. 26(4) (1996)
1995
Diane L. Davison, Goetz Graefe: Dynamic Resource Brokering for Multi-User Query Execution. SIGMOD Conference 1995
Goetz Graefe, Richard L. Cole: Fast Algorithms for Universal Quantification in Large Databases. ACM Trans. Database Syst. 20(2) (1995)
Goetz Graefe: The Cascades Framework for Query Optimization. IEEE Data Eng. Bull. 18(3) (1995)
Goetz Graefe: Letter from the Special Issue Editor. IEEE Data Eng. Bull. 18(3): 2 (1995)
Patrick E. O'Neil, Goetz Graefe: Multi-Table Joins Through Bitmapped Join Indices. SIGMOD Record 24(3): 8-11 (1995)

5 List of Publications by Goetz Graefe http://www. informatik
1994
Goetz Graefe: Sort-Merge-Join: An Idea Whose Time Has(h) Passed? ICDE 1994
Richard L. Cole, Goetz Graefe: Optimization of Dynamic Query Evaluation Plans. SIGMOD Conference 1994
Diane L. Davison, Goetz Graefe: Memory-Contention Responsive Hash Joins. VLDB 1994
Goetz Graefe: Volcano - An Extensible and Parallel Query Evaluation System. IEEE Trans. Knowl. Data Eng. 6(1) (1994)
Goetz Graefe, Ann Linville, Leonard D. Shapiro: Sort versus Hash Revisited. IEEE Trans. Knowl. Data Eng. 6(6) (1994)
1993
Goetz Graefe, Richard L. Cole, Diane L. Davison: Dynamic Techniques for Very Complex Database Queries. FMLDO 1993
Richard L. Cole, Goetz Graefe: Dynamic Plan Optimization. FMLDO 1993: 45-58
Goetz Graefe, William J. McKenna: The Volcano Optimizer Generator: Extensibility and Efficient Search. ICDE 1993
José A. Blakeley, William J. McKenna, Goetz Graefe: Experiences Building the Open OODB Query Optimizer. SIGMOD Conference 1993
Richard H. Wolniewicz, Goetz Graefe: Algebraic Optimization of Computations over Scientific Databases. VLDB 1993: 13-24

6 List of Publications by Goetz Graefe http://www. informatik
Goetz Graefe: Query Evaluation Techniques for Large Databases. ACM Comput. Surv. 25(2) (1993)
Richard H. Wolniewicz, Goetz Graefe: Algebraic Optimization of Computations over Scientific Databases. IEEE Data Eng. Bull. 16(1) (1993)
Goetz Graefe: Letter from the Special Issue Editor. IEEE Data Eng. Bull. 16(4): 3 (1993)
Goetz Graefe, Diane L. Davison: Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution. IEEE Trans. Software Eng. 19(8) (1993)
Goetz Graefe: Options in Physical Database Design. SIGMOD Record 22(3) (1993)
David Maier, Lois M. L. Delcambre, Calton Pu, Jonathan Walpole, Goetz Graefe, Leonard D. Shapiro: Database Research at the Data-Intensive Systems Center. SIGMOD Record 22(4) (1993)
1992
David Maier, Goetz Graefe, Leonard D. Shapiro, Scott Daniels, Thomas Keller, Bennet Vance: Issues in Distributed Object Assembly. IWDOM 1992
Goetz Graefe, Shreekant S. Thakkar: Tuning a Parallel Database Algorithm on a Shared-memory Multiprocessor. Softw., Pract. Exper. 22(7) (1992)

7 List of Publications by Goetz Graefe http://www. informatik
1991
Thomas Keller, Goetz Graefe, David Maier: Efficient Assembly of Complex Objects. SIGMOD Conference 1991
Michael J. Carey, David J. DeWitt, Daniel Frank, Goetz Graefe, Joel E. Richardson, Eugene J. Shekita, M. Muralikrishna: The Architecture of the EXODUS Extensible DBMS. On Object-Oriented Database System 1991
Goetz Graefe, Richard L. Cole, Diane L. Davison, William J. McKenna, Richard H. Wolniewicz: Extensible Query Optimization and Parallel Execution in Volcano. Query Processing for Advanced Database Systems, Dagstuhl 1991
David Maier, Scott Daniels, Thomas Keller, Bennet Vance, Goetz Graefe, William J. McKenna: Challenges for Query Processing in Object-Oriented Databases. Query Processing for Advanced Database Systems, Dagstuhl 1991
Scott Daniels, Goetz Graefe, Thomas Keller, David Maier, Duri Schmidt, Bennet Vance: Query Optimization in Revelation, an Overview. IEEE Data Eng. Bull. 14(2) (1991)
Goetz Graefe: Heap-Filter Merge Join: A New Algorithm For Joining Medium-Size Inputs. IEEE Trans. Software Eng. 17(9) (1991)

8 List of Publications by Goetz Graefe http://www. informatik
1990
Goetz Graefe: Encapsulation of Parallelism in the Volcano Query Processing System. SIGMOD Conference 1990
1989
Goetz Graefe: Relational Division: Four Algorithms and Their Performance. ICDE 1989
Goetz Graefe, Karen Ward: Dynamic Query Evaluation Plans. SIGMOD Conference 1989
1988
Goetz Graefe, David Maier: Query Optimization in Object-Oriented Database Systems: A Prospectus. OODBS 1988
1987
Goetz Graefe, David J. DeWitt: The EXODUS Optimizer Generator. SIGMOD Conference 1987
Goetz Graefe: Rule-Based Query Optimization in Extensible Database Systems. Univ. of Wisconsin-Madison 1987

9 List of Publications by Goetz Graefe http://www. informatik
1986
Michael J. Carey, David J. DeWitt, Daniel Frank, Goetz Graefe, M. Muralikrishna, Joel E. Richardson, Eugene J. Shekita: The Architecture of the EXODUS Extensible DBMS. OODBS 1986: 52-65
David J. DeWitt, Robert H. Gerber, Goetz Graefe, Michael L. Heytens, Krishna B. Kumar, M. Muralikrishna: GAMMA - A High Performance Dataflow Database Machine. VLDB 1986
Goetz Graefe: Software Modularization with the EXODUS Optimizer Generator. IEEE Database Eng. Bull. 9(4) (1986)
1984
Michael J. Carey, David J. DeWitt, Goetz Graefe: Mechanisms for Concurrency Control and Recovery in Prolog - A Proposal. Expert Database Workshop 1984

10 State of the Art in Query Processing, 1993
Bulletin of the Technical Committee on Data Engineering, December 1993, Vol. 16 No. 4, IEEE Computer Society
Special Issue on Query Processing in Commercial Database Systems
Letter from the Special Issue Editor. Goetz Graefe
Query Optimization in the IBM DB2 Family. Peter Gassner and Guy Lohman
Query Processing in the IBM Application System. Richard L. Cole, Mark J. Anderson
Query Processing in NonStop SQL. A. Chen, Y-F Kao, M. Pong, D. Shak, S. Sharma, J. Vaishnav
Query Processing in DEC Rdb: Major Issues and Future Challenges. Gennady Antoshenkov
Letter from the Editor-in-Chief: “… Goetz Graefe, our issue editor, has succeeded in overcoming these difficulties. He has collected four papers from prominent database vendors. These papers introduce us to the inside world of ‘real’ query processing.”
Letter from the Special Issue Editor: “… Second, in some aspects of query processing, the industrial reality has bypassed academic research. By asking leaders in the industrial field to summarize their work, I hope that this issue is a snapshot of the current state of the art. Undoubtedly, some researchers will find inspirations for new, relevant work of their own in these articles.”

11 Talks at VLDB

12 Contributions to SQL Server 7

13 Adding “PIVOT” to SQL

15 Table of Contents of the Paper (Part 1)
INTRODUCTION
1. ARCHITECTURE OF QUERY EXECUTION ENGINES
2. SORTING AND HASHING
2.1. Sorting
2.2. Hashing
3. DISK ACCESS
3.1. File Scans
3.2. Associative Access Using Indices
3.3. Buffer Management
4. AGGREGATION AND DUPLICATE REMOVAL
4.1. Aggregation Algorithms Based on Nested Loops
4.2. Aggregation Algorithms Based on Sorting
4.3. Aggregation Algorithms Based on Hashing
4.4. A Rough Performance Comparison
4.5. Additional Remarks on Aggregation
5. BINARY MATCHING OPERATIONS
5.1. Nested-Loops Join Algorithms
5.2. Merge-Join Algorithms
5.3. Hash Join Algorithms
5.4. Pointer-Based Joins
5.5. A Rough Performance Comparison
6. UNIVERSAL QUANTIFICATION
7. DUALITY OF SORT- AND HASH-BASED QUERY PROCESSING ALGORITHMS

16 Table of Contents of the Paper (Part 2)
8. EXECUTION OF COMPLEX QUERY PLANS
9. MECHANISMS FOR PARALLEL QUERY EXECUTION
9.1. Parallel versus Distributed Database Systems
9.2. Forms of Parallelism
9.3. Implementation Strategies
9.4. Load Balancing and Skew
9.5. Architectures and Architecture Independence
10. PARALLEL ALGORITHMS
10.1. Parallel Selections and Updates
10.2. Parallel Sorting
10.3. Parallel Aggregation and Duplicate Removal
10.4. Parallel Joins and Other Binary Matching Operations
10.5. Parallel Universal Quantification
11. NONSTANDARD QUERY PROCESSING ALGORITHMS
11.1. Nested Relations
11.2. Temporal and Scientific Database Management
11.3. Object-oriented Database Systems
11.4. More Control Operators
12. ADDITIONAL TECHNIQUES FOR PERFORMANCE IMPROVEMENT
12.1. Precomputation and Derived Data
12.2. Data Compression
12.3. Surrogate Processing
12.4. Bit Vector Filtering
12.5. Specialized Hardware
SUMMARY AND OUTLOOK

17 INTRODUCTION

18 INTRODUCTION

19 INTRODUCTION

20 INTROD.: Query Processing Steps [2]

21 INTROD.: Query Processing Steps [2]

22 1. ARCHITECTURE OF QUERY EXECUTION ENGINES

23 1. ARCHITECTURE OF QUERY EXECUTION ENGINES

24 2. SORTING AND HASHING

25 Access Path
Algorithm + data structure used to locate rows satisfying some condition
File scan: can be used for any condition
Hash: equality search; all search key attributes of the hash index are specified in the condition
B+ tree: equality or range search; a prefix of the search key attributes is specified in the condition
Binary search: relation sorted on a sequence of attributes, and some prefix of that sequence is specified in the condition

26 2. ACCESS PATHS

27 Sorting and Hashing The slides for this text are organized into chapters. This lecture covers Chapter 12, providing an overview of query optimization and execution. This chapter is the first of a sequence (Chapters 12, 13, 14, 15) on query evaluation that might be covered in full in a course with a systems emphasis. It can also be used stand-alone, as a self-contained overview of these issues, in a course with an application emphasis. It covers the essential concepts in sufficient detail to support a discussion of physical database design and tuning in Chapter 20.

28 General External Merge Sort
To sort a file with N pages using B buffer pages:
Pass 0: use B buffer pages. Produce $\lceil N/B \rceil$ sorted runs of B pages each.
Pass 1, 2, …, etc.: merge B-1 runs at a time.
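The two phases above can be sketched in a few lines of Python. This is a minimal in-memory simulation, not code from the slides: the function name `external_merge_sort` and the representation of a file as a list of pages (each a list of keys) are assumptions made for illustration, and a real system would read and write the runs on disk.

```python
import heapq


def external_merge_sort(pages, B):
    """Sort a file given as a list of pages (lists of keys) using B buffer pages.

    Returns (sorted_keys, number_of_passes). Everything stays in memory here;
    a real system would read and write the runs on disk.
    """
    if B < 3:
        raise ValueError("need at least 3 buffer pages (2 inputs + 1 output)")

    # Pass 0: load B pages at a time, sort them in memory, emit one run each.
    runs = []
    for start in range(0, len(pages), B):
        keys = [key for page in pages[start:start + B] for key in page]
        runs.append(sorted(keys))
    passes = 1

    # Passes 1, 2, ...: merge B-1 runs at a time (one buffer is the output).
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[start:start + B - 1]))
                for start in range(0, len(runs), B - 1)]
        passes += 1

    return (runs[0] if runs else []), passes
```

With five two-key pages and B = 3, pass 0 produces two runs and a single merge pass finishes the sort.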

29 Cost of External Merge Sort
Number of passes: $1 + \lceil \log_{B-1} \lceil N/B \rceil \rceil$
Cost = 2N * (# of passes)
E.g., with 5 buffer pages, to sort a 108-page file:
Pass 0: $\lceil 108/5 \rceil$ = 22 sorted runs of 5 pages each (last run is only 3 pages)
Pass 1: $\lceil 22/4 \rceil$ = 6 sorted runs of 20 pages each (last run is only 8 pages)
Pass 2: 2 sorted runs, 80 pages and 28 pages
Pass 3: Sorted file of 108 pages
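The arithmetic of this example can be checked with a short calculation. The sketch below is only an illustration; the helper names `runs_after_each_pass` and `external_sort_cost` are hypothetical, and the cost model is the slide's own (every pass reads and writes all N pages).

```python
def runs_after_each_pass(N, B):
    """Return the number of sorted runs left after pass 0, pass 1, ..."""
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = -(-N // B)               # ceil(N / B) runs after pass 0
    history = [runs]
    while runs > 1:
        runs = -(-runs // (B - 1))  # each later pass merges B-1 runs at a time
        history.append(runs)
    return history


def external_sort_cost(N, B):
    """Total page I/Os: every pass reads and writes all N pages."""
    return 2 * N * len(runs_after_each_pass(N, B))


print(runs_after_each_pass(108, 5))  # [22, 6, 2, 1]
print(external_sort_cost(108, 5))    # 864
```

The run counts 22, 6, 2, 1 correspond to passes 0 through 3 of the example, so the file is sorted in four passes at a cost of 2 × 108 × 4 = 864 page I/Os.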

30 Number of Passes of External Sort

31 Double Buffering
To reduce wait time for an I/O request to complete, can prefetch into a 'shadow block'.
Potentially, more passes; in practice, most files still sorted in 2-3 passes.
[Figure: B main memory buffers, k-way merge. Input buffers INPUT 1 … INPUT k with shadow blocks INPUT 1' … INPUT k', and output buffers OUTPUT and OUTPUT', each of block size b, exchanged with disk.]

32 Sorting Records!
Sorting has become a blood sport! Parallel sorting is the name of the game ...
Datamation: Sort 1M records of size 100 bytes
Typical DBMS: 15 minutes
World record: 3.5 seconds (12-CPU SGI machine, 96 disks, 2GB of RAM)
New benchmarks proposed:
Minute Sort: How many can you sort in 1 minute?
Dollar Sort: How many can you sort for $1.00?

33 Using B+ Trees for Sorting
Scenario: Table to be sorted has B+ tree index on sorting column(s).
Idea: Can retrieve records in order by traversing leaf pages.
Is this a good idea? Cases to consider:
B+ tree is clustered: Good idea!
B+ tree is not clustered: Could be a very bad idea!

34 Clustered B+ Tree Used for Sorting

35 Unclustered B+ Tree Used for Sorting

36 External Sorting vs. Unclustered Index
p: # of records per page
B=1,000 and block size=32 for sorting
p=100 is the more realistic value.

37 Query Evaluation

38 Relational Operations
We will consider how to implement:
Selection (σ): Selects a subset of rows from a relation.
Projection (π): Deletes unwanted columns from a relation.
Join (⋈): Allows us to combine two relations.
Set-difference (−): Tuples in relation 1, but not in relation 2.
Union (∪): Tuples in relation 1 or in relation 2.
Aggregation (SUM, MIN, etc.) and GROUP BY
Since each operation returns a relation, operations can be composed! After we cover the operations, we will discuss how to optimize queries formed by composing them.

40 1. ARCHITECTURE OF QUERY EXECUTION ENGINES

41 Access Paths
A tree index matches (a conjunction of) terms that involve only attributes in a prefix of the search key.
E.g., a tree index on <a, b, c> matches the selections a=5 AND b=3, and a=5 AND b>6, but not b=3.
A hash index matches (a conjunction of) terms that includes a term of the form attribute = value for every attribute in the search key of the index.
E.g., a hash index on <a, b, c> matches a=5 AND b=3 AND c=5; but it does not match b=3, or a=5 AND b=3, or a>5 AND b=3 AND c=5.
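The two matching rules can be written as small predicates. This is a simplified sketch, not part of the slides: the function names are hypothetical, and a selection is modeled as a dictionary from attribute name to comparison operator, which allows only one term per attribute.

```python
def tree_index_matches(search_key, terms):
    """True if the terms involve exactly the attributes of some prefix of search_key.

    search_key: tuple of attribute names, e.g. ("a", "b", "c")
    terms: dict mapping attribute -> operator, e.g. {"a": "=", "b": ">"}
    """
    attrs = set(terms)
    return bool(attrs) and attrs == set(search_key[:len(attrs)])


def hash_index_matches(search_key, terms):
    """True if every attribute of search_key has an equality term."""
    return all(terms.get(attr) == "=" for attr in search_key)


key = ("a", "b", "c")
print(tree_index_matches(key, {"a": "=", "b": ">"}))            # True
print(tree_index_matches(key, {"b": "="}))                      # False
print(hash_index_matches(key, {"a": "=", "b": "=", "c": "="}))  # True
print(hash_index_matches(key, {"a": ">", "b": "=", "c": "="}))  # False
```

The four calls reproduce the slide's examples: the tree index accepts any prefix of the key with equality or range terms, while the hash index needs an equality term on every key attribute.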

42 Access Paths Supported by B+ tree
Example: Given a B+ tree whose search key is the sequence of attributes a2, a1, a3, a4
Access path for search $\sigma_{a1>5 \wedge a2=3.0 \wedge a3='x'}(R)$: find the first entry having a2=3.0 ∧ a1>5 ∧ a3='x' and scan leaves from there until an entry having a2>3.0. Select satisfying entries.
Access path for search $\sigma_{a2=3.0 \wedge a3>'x'}(R)$: locate the first entry having a2=3.0 and scan leaves until an entry having a2>3.0. Select satisfying entries.
No access path for search $\sigma_{a1>5 \wedge a3='x'}(R)$
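The first access path can be imitated with a sorted list standing in for the leaf level of the B+ tree. This is a sketch under stated assumptions: the function name `leaf_scan`, the tuple layout `(a2, a1, a3, a4)`, and the sample entries are all invented for illustration, and building the `prefixes` list costs a full pass that a real tree descent would avoid.

```python
from bisect import bisect_right


def leaf_scan(entries, a2_eq, a1_gt, a3_eq):
    """entries: list of (a2, a1, a3, a4) tuples sorted in search-key order."""
    # Position on the first entry whose (a2, a1) prefix exceeds (a2_eq, a1_gt).
    prefixes = [(a2, a1) for a2, a1, _a3, _a4 in entries]
    i = bisect_right(prefixes, (a2_eq, a1_gt))
    selected = []
    # Scan leaf entries until a2 changes, keeping those that satisfy a3 = a3_eq.
    while i < len(entries) and entries[i][0] == a2_eq:
        if entries[i][2] == a3_eq:
            selected.append(entries[i])
        i += 1
    return selected


leaves = [(2.0, 9, "x", 1), (3.0, 4, "x", 2), (3.0, 6, "x", 3),
          (3.0, 7, "y", 4), (3.0, 8, "x", 5), (4.0, 9, "x", 6)]
print(leaf_scan(leaves, 3.0, 5, "x"))  # [(3.0, 6, 'x', 3), (3.0, 8, 'x', 5)]
```

The condition on a3 cannot narrow the scan range because a3 follows a range condition on a1 in the key order; it is applied as a filter on the scanned entries.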

43 Choosing an Access Path
Selectivity of an access path refers to its cost: higher selectivity means lower cost (# pages retrieved).
If several access paths cover a query, the DBMS should choose the one with the greatest selectivity.
The size of an attribute's domain is a measure of the selectivity of a condition on that attribute.
Example: for $\sigma_{CrsCode='CS305' \wedge Grade='B'}$, a B+ tree with search key CrsCode is more selective than a B+ tree with search key Grade (there are far more course codes than grades).

44 Computing Selection condition: (attr op value)
No index on attr:
If rows are unsorted, cost = F, the number of data pages in the file: scan all data pages to find rows satisfying the condition.
If rows are sorted on attr, cost = $\log_2 F$ + (cost of scan): use binary search to locate the first data page containing a row in which (attr = value), then scan further to get all rows satisfying (attr op value).
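A quick calculation shows how far apart the two costs are. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the "cost of scan" term equals the number of pages holding qualifying rows.

```python
from math import ceil, log2


def selection_cost_unsorted(F):
    """Page I/Os when every one of the F data pages must be scanned."""
    return F


def selection_cost_sorted(F, matching_pages):
    """Binary search for the first qualifying page, then scan the matches."""
    return ceil(log2(F)) + matching_pages


print(selection_cost_unsorted(1024))   # 1024
print(selection_cost_sorted(1024, 3))  # 13
```

For a 1,024-page file with three pages of qualifying rows, the sorted file needs 10 + 3 = 13 page reads instead of 1,024.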

45 Computing Selection condition: (attr op value)
B+ tree index on attr (for equality or range search):
Locate first index entry corresponding to a row in which (attr = value); cost = depth of tree
Clustered index: rows satisfying the condition are packed in sequence in successive data pages; scan those pages; cost depends on the number of qualifying rows
Unclustered index: index entries with pointers to rows satisfying the condition are packed in sequence in successive index pages; scan entries and sort pointers to identify table data pages with qualifying rows; each page (with at least one such row) is fetched once
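The pointer-sorting step for the unclustered case can be sketched as follows. The names `fetch_rows` and `read_page`, and the model of a rid as a `(page_no, slot_no)` pair, are assumptions made for this illustration rather than anything defined in the slides.

```python
from itertools import groupby


def fetch_rows(rids, read_page):
    """Fetch rows for (page_no, slot_no) rids, reading each data page once.

    read_page: callable returning the list of rows stored on a page.
    Returns (rows, number_of_page_reads).
    """
    rows, page_reads = [], 0
    # Sorting the rids groups all pointers into the same page together.
    for page_no, group in groupby(sorted(rids), key=lambda rid: rid[0]):
        page = read_page(page_no)  # one I/O per distinct page
        page_reads += 1
        rows.extend(page[slot_no] for _page_no, slot_no in group)
    return rows, page_reads


data_file = {0: ["r0", "r1"], 1: ["r2", "r3"], 2: ["r4", "r5"]}
rids = [(2, 0), (0, 1), (2, 1), (0, 0)]
print(fetch_rows(rids, data_file.__getitem__))  # (['r0', 'r1', 'r4', 'r5'], 2)
```

Without the sort, the four rids would touch pages 2, 0, 2, 0 in turn and could cost four page reads if the buffer held only one page; sorted, they cost two.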

46 Unclustered B+ Tree Index
[Figure: B+ tree whose index entries satisfying the condition point to data pages scattered across the data file.]

47 Computing Selection condition: (attr = value)
Hash index on attr (for equality search only):
Hash on value; cost ≈ 1.2 (to account for a possible overflow chain) to search the (unique) bucket containing all index entries or rows satisfying the condition
Unclustered index: sort pointers in index entries to identify data pages with qualifying rows; each page (containing at least one such row) is fetched once

48 Complex Selections
Conjunctions: $\sigma_{a1=x \wedge a2<y \wedge a3=z}(R)$
Use the most selective access path, or
Use multiple access paths
Disjunctions: $\sigma_{(a1=x \vee a2<y) \wedge a3=z}(R)$
Convert to DNF (disjunctive normal form): $(a1=x \wedge a3=z) \vee (a2<y \wedge a3=z)$
Use a file scan if any disjunct requires a file scan
If better access paths exist, and their combined selectivity is better than a file scan, use the better access paths; otherwise use a file scan

49 Two Approaches to General Selections
First approach: Find the most selective access path, retrieve tuples using it, and apply any remaining terms that don't match the index.
Most selective access path: an index or file scan that we estimate will require the fewest page I/Os.
Terms that match this index reduce the number of tuples retrieved; other terms are used to discard some retrieved tuples, but do not affect the number of tuples/pages fetched.
Consider day<8/9/94 AND bid=5 AND sid=3. A B+ tree index on day can be used; then, bid=5 and sid=3 must be checked for each retrieved tuple. Similarly, a hash index on <bid, sid> could be used; day<8/9/94 must then be checked.

50 Intersection of Rids
Second approach (if we have 2 or more matching indexes that use Alternatives (2) or (3) for data entries):
Get sets of rids of data records using each matching index.
Then intersect these sets of rids.
Retrieve the records and apply any remaining terms.
Consider day<8/9/94 AND bid=5 AND sid=3. If we have a B+ tree index on day and an index on sid, both using Alternative (2), we can retrieve rids of records satisfying day<8/9/94 using the first, rids of records satisfying sid=3 using the second, intersect, retrieve records and check bid=5.
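The second approach reduces to a set intersection followed by a filter. In the sketch below the function name `select_by_rid_intersection`, the toy table, and the use of plain integers for rids and days are all assumptions made for illustration; the rid lists stand in for what the two index lookups would return.

```python
def select_by_rid_intersection(rid_lists, fetch, residual):
    """Intersect rid lists from several indexes, fetch the rows, filter the rest.

    rid_lists: one non-empty collection of rids per matching index
    fetch: callable mapping a rid to its record
    residual: predicate for the terms that no index covered
    """
    rids = set.intersection(*(set(rid_list) for rid_list in rid_lists))
    # Fetching in rid order keeps accesses to the same page together.
    return [row for row in (fetch(rid) for rid in sorted(rids)) if residual(row)]


table = {
    1: {"day": 5, "bid": 5, "sid": 3},
    2: {"day": 9, "bid": 5, "sid": 3},
    3: {"day": 2, "bid": 7, "sid": 3},
    4: {"day": 1, "bid": 5, "sid": 4},
}
day_rids = [1, 3, 4]   # stand-in for the B+ tree lookup day < 8
sid_rids = [1, 2, 3]   # stand-in for the index lookup sid = 3
result = select_by_rid_intersection(
    [day_rids, sid_rids], table.__getitem__, lambda row: row["bid"] == 5)
print(result)  # [{'day': 5, 'bid': 5, 'sid': 3}]
```

Only rids 1 and 3 survive the intersection, so two records are fetched instead of four, and the remaining term bid=5 is checked on those two.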

51 Aggregation and Duplicate Removal
The idea of aggregation is to represent a group of items by a single value, or to classify items into groups and determine one value per group.
Scalar aggregation
Aggregate functions

52 Aggregation and Duplicate Removal (3)
Sort-based aggregation algorithm
Sorting brings items with the same characteristics together, which makes removing duplicate data much simpler.
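A minimal in-memory sketch of the sort-based idea follows. It is not the external algorithm from Graefe's paper, which aggregates during run generation and merging; the function names and the sample rows are assumptions made for illustration.

```python
from itertools import groupby
from operator import itemgetter


def sort_distinct(items):
    """Duplicate removal: after sorting, equal items are adjacent."""
    return [item for item, _group in groupby(sorted(items))]


def sort_aggregate(rows, key, value):
    """SUM(value) GROUP BY key, computed over the sorted input."""
    get_key = itemgetter(key)
    ordered = sorted(rows, key=get_key)
    return [(k, sum(row[value] for row in group))
            for k, group in groupby(ordered, key=get_key)]


print(sort_distinct([3, 1, 3, 2, 1]))  # [1, 2, 3]
rows = [{"dept": "b", "sal": 2}, {"dept": "a", "sal": 1}, {"dept": "b", "sal": 3}]
print(sort_aggregate(rows, "dept", "sal"))  # [('a', 1), ('b', 5)]
```

Once the input is ordered on the grouping key, a single sequential pass is enough, because a group ends as soon as the key changes.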

53 Aggregation and Duplicate Removal (4)
Sort-based aggregation algorithm
The I/O volume estimated for this algorithm is given by a formula in which 2 is the factor that accounts for reading and writing, R is the input size, L1 is the number of levels that are not affected, O is the output size, and W is the estimated number of runs.

54 Aggregation and Duplicate Removal (5)
Hash-based algorithm
The general idea is to partition the data being processed.
A hash table is built that essentially contains output items.
The amount of I/O for aggregation depends on the number of levels needed.
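The interplay between the in-memory table of output items and partitioning on overflow can be sketched as below. This is a simplified in-memory analogue, not the hybrid-hashing algorithm of the paper: the function name, the `max_groups` memory limit, and the `fanout` parameter are hypothetical, and the "partition files" are ordinary lists.

```python
def hash_aggregate(rows, max_groups, fanout=4, level=0):
    """SUM per key with a hash table limited to max_groups output items.

    rows: iterable of (key, value) pairs. Items whose key does not fit in the
    table are spilled to one of `fanout` partitions and aggregated recursively.
    """
    if max_groups < 1:
        raise ValueError("max_groups must be at least 1")
    table = {}
    partitions = [[] for _ in range(fanout)]
    for key, value in rows:
        if key in table:
            table[key] += value      # absorb the item into its output item
        elif len(table) < max_groups:
            table[key] = value       # new group still fits in memory
        else:
            # Overflow: route the item to a partition chosen by hashing.
            partitions[hash((level, key)) % fanout].append((key, value))
    for partition in partitions:
        if partition:
            table.update(hash_aggregate(partition, max_groups, fanout, level + 1))
    return table


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("d", 6), ("c", 1)]
print(hash_aggregate(rows, max_groups=2))
```

With room for only two groups, keys "a" and "b" are aggregated in the first table while "c" and "d" are spilled and handled one recursion level deeper; the more levels are needed, the more often each item is moved.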

55 Aggregation and Duplicate Removal (6)
Hash-based algorithm (2)
$2 \times \bigl( R \times (L + 1) - F^{L} \times (M - K \times C) \times G \bigr)$, with $K = \lceil (R'/G - M)/(M - C) \rceil$
where L is the recursion level, R is the size of the input file, K is the number of partition files, F is the fan-out, and M is the memory size beyond which overflow occurs.

56 Aggregation and Duplicate Removal (7)
Graph: comparison of the algorithms

57 Binary Matching Operations
Just as duplicate removal and aggregation are important for data sets of considerable size, it is also desirable to be able to match items of one input against another; establishing these relationships is the main function of “matching”. The join is the main operation used for this purpose, in the following ways.

58 Binary Matching Operations

59 Binary Matching Operations
Nested-loops join algorithms
This is the simplest algorithm.
For each item of one input, it scans the other input completely in order to find the matches.
The input that is being scanned must be available as a temporary file.
Obviously, the algorithm performs poorly for very large data sets.
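The description above corresponds to two nested loops. The sketch below is illustrative only; the function name, the `match` predicate, and the sample tuples are assumptions, and both inputs are plain in-memory lists rather than files.

```python
def nested_loops_join(outer, inner, match):
    """Compare every outer item with every inner item; keep the matching pairs."""
    result = []
    for o in outer:
        for i in inner:  # the inner input is scanned once per outer item
            if match(o, i):
                result.append((o, i))
    return result


employees = [("ann", 10), ("bob", 20), ("eve", 10)]
departments = [(10, "sales"), (30, "ops")]
print(nested_loops_join(employees, departments, lambda e, d: e[1] == d[0]))
# [(('ann', 10), (10, 'sales')), (('eve', 10), (10, 'sales'))]
```

The number of comparisons is the product of the two input sizes, which is why the method degrades quickly on large inputs.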

60 Binary Matching Operations
Merge-join algorithms
This algorithm requires both inputs to be sorted beforehand; the procedure is similar to the merging reviewed earlier.
Because the inputs are sorted, the algorithm needs no additional memory, except when the items sharing one join value (a value packet) do not fit in memory.
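A compact merge join over two sorted lists might look like this. It is a sketch rather than the paper's formulation: the function name and the default key (the first field of each tuple) are assumptions, and the inner `while` loops show where the group of equal join values must be held.

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Join two inputs that are already sorted on key."""
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Delimit the run of right items with this join value ...
            j_end = j
            while j_end < len(right) and key(right[j_end]) == kl:
                j_end += 1
            # ... and pair it with every left item carrying the same value.
            while i < len(left) and key(left[i]) == kl:
                result.extend((left[i], r) for r in right[j:j_end])
                i += 1
            j = j_end
    return result


left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
print(merge_join(left, right))
```

The example yields the four combinations for join value 2 and one pair for value 4; only the run `right[j:j_end]` has to be revisited, which is the part that may not fit in memory when one value has many duplicates.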

61 Binary Matching Operations
Merge-join algorithms
The previous algorithm and this one can be combined to optimize the results.
Since the preceding algorithms need a certain amount of memory, it is convenient to make the allocation $W = R / (2 \times M) + 1$, where R is the input size, M is the amount of memory needed, and the other two terms are constants for reading and writing.

62 Binary Matching Operations
Hash join algorithms
These algorithms start from the basic idea of building a hash table on one input and probing that table with the items of the other input.
A drawback of this approach is that memory overflows frequently, but a good deal of research has been devoted to improving the solutions to this problem.
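The build-and-probe idea fits in a few lines when the build input is small enough for memory. This sketch deliberately ignores overflow handling, which the slide names as the main difficulty; the function name and the default key (the first field of each tuple) are assumptions.

```python
from collections import defaultdict


def hash_join(build, probe, key=lambda row: row[0]):
    """Build a hash table on one input, then probe it with the other."""
    table = defaultdict(list)
    for b in build:                      # build phase
        table[key(b)].append(b)
    return [(b, p)                       # probe phase
            for p in probe
            for b in table.get(key(p), [])]


build = [(1, "a"), (2, "b"), (2, "c")]
probe = [(2, "x"), (3, "y"), (1, "z")]
print(hash_join(build, probe))
# [((2, 'b'), (2, 'x')), ((2, 'c'), (2, 'x')), ((1, 'a'), (1, 'z'))]
```

Choosing the smaller input as the build side keeps the table small; when it still does not fit, the inputs have to be partitioned first.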

63 Binary Matching Operations
The partitioning performed by the algorithm can be pictured as follows

64 Binary Matching Operations
Comparison of methods

65 Contributions to SQL Server 7

