Constraint Processing Techniques for Improving Join Computation: A Proof of Concept Anagh Lal & Berthe Y. Choueiry Constraint Systems Laboratory Department.


Constraint Processing Techniques for Improving Join Computation: A Proof of Concept Anagh Lal & Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science & Engineering University of Nebraska-Lincoln

An illustrative example
- Join query:
  SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
- 10 tuples in 3 nested tuples
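To make the nested-tuple idea concrete, here is a minimal Python sketch that expands nested tuples (one set of values per attribute) into ordinary tuples by Cartesian product. The three nested tuples are hypothetical values, chosen only so that the counts match the slide (3 nested tuples, 10 flat tuples); they are not the data of the original example, and the helper name `expand` is an assumption.

```python
from itertools import product

def expand(nested_tuple):
    """Expand one nested tuple (a sequence of value sets) into flat tuples."""
    return sorted(product(*(sorted(values) for values in nested_tuple)))

# Hypothetical nested tuples over attributes (A, B, C); not the slide's data.
nested_result = [
    ({1, 5}, {12, 13, 14}, {23}),   # 2 * 3 * 1 = 6 flat tuples
    ({2, 4}, {10}, {21}),           # 2 * 1 * 1 = 2 flat tuples
    ({6}, {13, 14}, {27}),          # 1 * 2 * 1 = 2 flat tuples
]

flat_result = [t for nested in nested_result for t in expand(nested)]
print(len(nested_result), "nested tuples represent", len(flat_result), "flat tuples")
```

Each nested tuple stands for every combination of its per-attribute values, which is where the compaction comes from.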

Advantages
- Direct
  - Savings in the number of tuple comparisons
  - Savings in I/O for the next operator
  - Space reduction of materialized join queries
- Future applications
  - Use for query-size estimation
  - Assist in high-level analysis of data & in data mining

Our contributions
- A new representation of a join query as a Constraint Satisfaction Problem (CSP)
- A new sorting-based bundling algorithm
  - Suitable for CSPs with fewer and larger constraints (i.e., join)
  - Improves memory usage
- A new sort-merge join algorithm for producing (dynamically) bundled tuples
  - Yields a compact representation, saves memory space
- Identification of possible applications (suggested, not yet demonstrated)
  - Data analysis
  - Materialized views
  - Assisting query-size estimation

Constraint Satisfaction Problem
- Given P = (V, D, C)
  - V = {V_i}, a set of variables
  - D = {D_Vi}, the set of their respective domains
  - C, a set of constraints restricting the acceptable combinations of values for variables
- A solution is a consistent assignment of values to variables
- Query: find 1 solution, all solutions, etc.
[Figure: example constraint graph over variables V1, V2, V3, V4 with domains {d}, {a, b, d}, {a, b, c}, {c, d, e, f}]
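The definition P = (V, D, C) maps directly onto simple data structures. The sketch below is one possible encoding, under the assumption that each constraint is stored as a (scope, predicate) pair; the toy variables X, Y, Z and the function `is_solution` are hypothetical and do not correspond to the graph on the slide.

```python
# P = (V, D, C) for a hypothetical toy problem (not the graph on the slide).
variables = ["X", "Y", "Z"]
domains = {"X": {1, 2}, "Y": {1, 2}, "Z": {2, 3}}
# Each constraint is a (scope, predicate) pair over the variables in its scope.
constraints = [
    (("X", "Y"), lambda x, y: x != y),
    (("Y", "Z"), lambda y, z: y < z),
]

def is_solution(assignment):
    """True if every variable takes a domain value and all constraints hold."""
    in_domains = all(assignment[v] in domains[v] for v in variables)
    consistent = all(pred(*(assignment[v] for v in scope))
                     for scope, pred in constraints)
    return in_domains and consistent

print(is_solution({"X": 1, "Y": 2, "Z": 3}))
```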

Solving CSPs
- Typically, DFS & backtracking
- Improvement
  - Static bundling [Freuder 91]
  - Dynamic bundling [our group]
    - Based on dynamically identifying symmetries
    - Guaranteed never less efficient than non-bundling or static bundling
[Figure: search trees over V1 and V2 for the example constraint graph, in three panels: without bundling, static bundling, dynamic bundling]
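As a baseline for the improvements listed above, this is a minimal sketch of plain depth-first search with backtracking that enumerates all solutions. It does no bundling at all. The constraint encoding as (scope, predicate) pairs, the function name `backtrack_all`, and the toy problem are assumptions made for illustration.

```python
def backtrack_all(variables, domains, constraints, assignment=None):
    """Depth-first search with backtracking; yields every solution."""
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(variables):
        yield dict(assignment)
        return
    var = variables[len(assignment)]          # static variable ordering
    for value in sorted(domains[var]):
        assignment[var] = value
        # Check only the constraints whose scope is fully assigned.
        ok = all(pred(*(assignment[v] for v in scope))
                 for scope, pred in constraints
                 if all(v in assignment for v in scope))
        if ok:
            yield from backtrack_all(variables, domains, constraints, assignment)
        del assignment[var]                   # undo and try the next value

# Hypothetical toy problem, not the one pictured on the slide.
toy_vars = ["X", "Y", "Z"]
toy_domains = {"X": {1, 2}, "Y": {1, 2}, "Z": {2, 3}}
toy_constraints = [(("X", "Y"), lambda x, y: x != y),
                   (("Y", "Z"), lambda y, z: y < z)]
toy_solutions = list(backtrack_all(toy_vars, toy_domains, toy_constraints))
print(len(toy_solutions), "solutions")
```

Bundling would replace the single `value` tried at each level by a set of interchangeable values, so that one branch of the tree stands for several solutions.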

Modeling Join as a CSP
- Attributes of relations → CSP variables
- Attribute values → variable domains
- Relations → relational constraints
- Join conditions → join-condition constraints
SELECT R1.A, R1.B, R1.C FROM R1, R2 WHERE R1.A = R2.A AND R1.B = R2.B AND R1.C = R2.C
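The mapping can be illustrated with a deliberately naive Python sketch: six variables (one per attribute), domains taken from the attribute values, two relational constraints, and the equality join conditions. The relation contents and the function name `join_as_csp` are hypothetical, and the brute-force enumeration is only meant to show the model, not an efficient join.

```python
from itertools import product

def join_as_csp(r1, r2):
    """Evaluate the example query by enumerating assignments to the variables
    R1.A, R1.B, R1.C, R2.A, R2.B, R2.C and keeping the consistent ones."""
    # Attribute values -> variable domains.
    domains = ([sorted({t[i] for t in r1}) for i in range(3)] +
               [sorted({t[i] for t in r2}) for i in range(3)])
    tuples_r1, tuples_r2 = set(r1), set(r2)
    result = []
    for values in product(*domains):
        left, right = values[:3], values[3:]
        relational_ok = left in tuples_r1 and right in tuples_r2  # relations
        join_ok = left == right                                   # join conditions
        if relational_ok and join_ok:
            result.append(left)      # SELECT R1.A, R1.B, R1.C
    return result

# Hypothetical relations with schema (A, B, C).
R1 = [(1, 2, 3), (1, 2, 4), (5, 2, 3)]
R2 = [(1, 2, 3), (5, 2, 3), (7, 7, 7)]
print(join_as_csp(R1, R2))
```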

Sorting-based bundling
- Heuristic for variable ordering
  - Place variables linked by join conditions as close to each other as possible
[Figure: variable ordering R1.A, R2.A, R1.B, R2.B, R1.C, R2.C over relations R1 and R2]
- Sort relations using the above ordering
- Next: compute bundles of the variable at the head of the variable ordering (R1.A)
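A rough sketch of the two steps above, assuming the simplest reading of the heuristic: each pair of variables linked by a join condition is placed side by side, in the order the conditions are listed, and each relation is then sorted on its own attributes in that order. The function names and the sample data are hypothetical.

```python
def variable_ordering(join_conditions):
    """Place the two variables of each join condition next to each other."""
    order = []
    for left, right in join_conditions:
        for var in (left, right):
            if var not in order:
                order.append(var)
    return order

def sort_relation(relation, schema, name, ordering):
    """Sort a relation on its attributes, in the order given by the ordering."""
    columns = [schema.index(v.split(".")[1])
               for v in ordering if v.startswith(name + ".")]
    return sorted(relation, key=lambda t: tuple(t[c] for c in columns))

conditions = [("R1.A", "R2.A"), ("R1.B", "R2.B"), ("R1.C", "R2.C")]
ordering = variable_ordering(conditions)
sorted_r1 = sort_relation([(5, 2, 3), (1, 2, 4), (1, 2, 3)],
                          ["A", "B", "C"], "R1", ordering)
print(ordering)
print(sorted_r1)
```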

Bundling an attribute
- Partition of a constraint: tuples of the relation having the same value of R1.A
- Compare projected tuples of the first partition with those of another partition
- Compare with every other partition to get the complete bundle
[Figure: a relation split into partitions, marking unequal partitions and symmetric partitions; resulting bundle {1, 5}]
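The idea can be sketched in a few lines of Python. Note one substitution: the slides compare partitions pairwise on a sorted relation, whereas this sketch groups partitions by hashing their projected tuples, which gives the same bundles but is not the authors' procedure. The relation contents and the name `bundle_first_attribute` are hypothetical.

```python
from collections import defaultdict

def bundle_first_attribute(relation):
    """Group the values of the first attribute whose partitions are symmetric,
    i.e. whose projected tuples (the remaining attributes) are identical."""
    partitions = defaultdict(set)
    for t in relation:
        partitions[t[0]].add(t[1:])          # partition by first-attribute value
    bundles = defaultdict(list)
    for value in sorted(partitions):
        bundles[frozenset(partitions[value])].append(value)
    return sorted(bundles.values())

# Hypothetical relation with schema (A, B, C).
relation = [(1, 2, 3), (1, 2, 4), (5, 2, 3), (5, 2, 4), (6, 2, 3)]
print(bundle_first_attribute(relation))
```

Here values 1 and 5 have the same projected tuples {(2, 3), (2, 4)} and therefore fall into one bundle, while 6 stays alone.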

Join using dynamic bundling
[Flowchart. Nodes: Start; Select next variable; Compute next valid bundle; Found bundle?; Assign bundle; Last variable?; Output one tuple; Move to previous variable; Undo previous assignment; 1st in ordering?; Stop]
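The following is a simplified, recursive sketch of the search, not the iterative procedure of the flowchart. It assumes two in-memory relations with the same schema joined on all attributes (as in the example query), recomputes bundles for each sub-relation reached during search, and keeps only the values common to a bundle of each relation. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def bundles(relation):
    """Return (values, sub-relation) pairs for the first attribute, grouping
    values whose projected sub-relations are identical."""
    partitions = defaultdict(set)
    for t in relation:
        partitions[t[0]].add(t[1:])
    groups = defaultdict(set)
    for value, sub in partitions.items():
        groups[frozenset(sub)].add(value)
    return [(values, sub) for sub, values in groups.items()]

def bundled_join(r1, r2):
    """Join r1 and r2 on all attributes; return nested tuples (lists of sets)."""
    if not r1 or not r2:
        return []
    if len(next(iter(r1))) == 0:             # no attributes left: one solution
        return [[]]
    nested = []
    for values1, sub1 in bundles(r1):
        for values2, sub2 in bundles(r2):
            common = values1 & values2       # equality join condition
            if common:                       # empty means backtrack
                for rest in bundled_join(sub1, sub2):
                    nested.append([common] + rest)
    return nested

def expand(nested_tuple):
    """Flatten one nested tuple into the set of ordinary tuples it represents."""
    return set(product(*nested_tuple))

# Hypothetical relations with schema (A, B, C).
r1 = {(1, 2, 3), (1, 2, 4), (5, 2, 3), (5, 2, 4), (6, 2, 3)}
r2 = {(1, 2, 3), (1, 2, 4), (5, 2, 3), (5, 2, 4), (6, 2, 4)}
nested_result = bundled_join(r1, r2)
print(nested_result)
```

On this input the four result tuples come out as a single nested tuple, and expanding the nested tuples gives back exactly the ordinary join result.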

Finding the valid bundle
1. Compute a bundle for the attribute
2. Check bundle validity with future constraints
3. If no common value is found, GOTO 1
- Assign the variable the surviving values in the bundle
[Figure: bundles {1, 5, x} and {1, 5, y, z}; common values {1, 5}]
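For equality join conditions, the validity check amounts to a set intersection. The sketch below follows the three steps under that assumption; `common` and `next_valid_bundle` are hypothetical names, and the candidate bundles are passed in ready-made rather than computed from a relation.

```python
def common(bundle, other_bundles):
    """Values of `bundle` that also appear in every bundle of `other_bundles`."""
    surviving = set(bundle)
    for other in other_bundles:
        surviving &= set(other)
    return surviving

def next_valid_bundle(candidate_bundles, other_bundles):
    """Try candidate bundles in turn; return the first non-empty set of
    surviving values, or None when every candidate is invalid."""
    for bundle in candidate_bundles:                 # step 1
        surviving = common(bundle, other_bundles)    # step 2
        if surviving:
            return surviving
        # step 3: no common value, go on to the next bundle
    return None

# The two bundles from the slide's figure.
print(common({1, 5, "x"}, [{1, 5, "y", "z"}]))
```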

Analysis of overheads
- For bundling
  - Additional data structures: 2 arrays, 1 pointer
  - Only 1 array may become cumbersome
- Array size is largest when all the values of a variable are in one bundle
  - But this case also leads to the best savings!
- Improved implementation
  - Use of bitmaps?
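The slide only raises bitmaps as a question. As one possible reading, a bundle could be stored as an integer bitmask over the sorted domain of the variable, so that intersecting two bundles becomes a single bitwise AND. Everything in this sketch (names, domain, encoding) is an assumption, not the authors' design.

```python
def to_bitmap(values, index):
    """Encode a set of domain values as an integer bitmask."""
    bits = 0
    for v in values:
        bits |= 1 << index[v]
    return bits

def from_bitmap(bits, domain):
    """Decode an integer bitmask back into a set of domain values."""
    return {v for i, v in enumerate(domain) if (bits >> i) & 1}

domain = [1, 5, 7, 9]                         # hypothetical sorted domain
index = {v: i for i, v in enumerate(domain)}  # value -> bit position

b1 = to_bitmap({1, 5, 7}, index)
b2 = to_bitmap({1, 5, 9}, index)
common_values = from_bitmap(b1 & b2, domain)  # intersection via bitwise AND
print(bin(b1), bin(b2), common_values)
```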

Progressive Merge Join
- PMJ: a sort-merge algorithm by [Dittrich et al. 03]
- Provides early results
  - Assists in query-size estimation
- Two main phases
  - Sorting phase: starts producing results in this phase
  - Merging phase: merges sorted runs
- We use the framework of PMJ for our external join
- Implemented & evaluated with the XXL library
  - We use the same library for our implementation

Preliminary experiments
- Data sets
  - Random: 2 relations R1, R2 with the same schema as the example
    - Each relation: 10’000 tuples
    - Memory size: 4’000 tuples
    - Page size: 200 tuples
  - Real-world problem: 3 relations, 4 attributes
- Compaction rate achieved
  - Random problem: 1.48
    - Savings compensate for even the worst case (of the current experimental implementation)
  - Real-world problem: 2.26 (69 tuples in 32 nested tuples)

Related work
- Join algorithms
  - Well-established algorithms
  - Do not focus on exploiting symmetry
- Database compression
  - Output results are not compressed
  - Compression at the value level, not the tuple level

Related work (contd)
- [Mamoulis & Papadias 1998] Join using forward checking (FC) for spatial DBs
  - Restricted to binary constraints
  - No compaction of the solution space
- [Bayardo et al. 1996] Reduce the number of intermediate tuples of a sequence of joins
- [Rich et al. 1993]
  - Do not compact join attribute values
  - Do not detect redundancy present in the grouped sub-relations

Future work
- Refine implementation
  - Use of lighter data structures
- Test usefulness in the context of Constraint DBs
  - Values are continuous intervals, e.g., spatial databases
- Conduct thorough evaluations of overall performance & overhead (memory & CPU) on different data distributions
- Investigate the benefit of using bundling for
  - Query-size estimation
  - Materialized views

Research supported by CAREER Award # from NSF

DB vs. CSP terminology

Bundling relations: Data structures
- Considering the portion of the relation in memory
- Current-Inst: stores the current instantiations of the past variables V_p of R1
- Current-Constraint: selection of R’, where
  - Past variable values equal Current-Inst
  - Current variable V_c > all previous instantiations of V_c

Bundling relations: Computing bundles (Algorithm 1)
- NEXT-PARTITION(p) returns the first unchecked partition in Current-Constraint following the partition p
- Sorted constraints
  - Checking equality of tuples is efficient

Bundling relations: Data structures
- Processed-Values: cumulatively stores non-representative values of bundles
  - When computing bundles of V_c, values of V_c in it are ignored
- Partition p is marked as checked when:
  - Value(p) is in an instantiation bundle
  - p is selected for comparing with other partitions to check for bundles

Join computation: In memory
- Two subsets of relations (some pages) in memory: algorithm to find the result of joining the two
- Join computed as a search
  - Finding all solutions: after finding one solution, search resumes from the same depth
  - The algorithm shown can be entered at any “depth” in the search
- Uses Algorithm 1 to find bundles for assigning to variables

Join computation: In memory
- Join as a search (Algo. 2)
- BACKTRACK undoes the effects of the previous instantiation:
  - Variable[depth] in Current-Inst is reset
  - Processed-Values for the variable is emptied
  - Value in Current-Solution is reset
  - Current-Constraint is re-computed
- Expanded on next slide

Join computation: In memory
- COMMON(b_i, bundles): subset of b_i consistent under the join-condition constraints
- For equality: COMMON ≡ intersection
- Empty result of COMMON → inconsistency → BACKTRACK