Query Processing in Databases Dr. M. Gavrilova

1 Query Processing in Databases
CPSC 695, Week 6. Dr. M. Gavrilova

2 Overview
Introduction. I/O algorithms for large databases. Complex geometric operations in graphical querying. Applications.

3 Introduction
The geometric algorithms studied so far assumed that all data resides in RAM. In databases, the data is stored on disk in “pages” of memory, and the cost of accessing those pages becomes the problem. We will see how traditional algorithm design techniques can still be useful.

4 Example
4 pages of memory, 10 items in each. Listing all items sequentially requires 4 disk accesses. Listing the items in random order requires up to 40 disk accesses if only 1 page can be held in memory at a time, which is too expensive!
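The arithmetic above can be checked with a small simulation. This is only a sketch: the function name `count_page_loads` and the one-page memory model are illustrative assumptions, not part of the lecture.

```python
PAGE_SIZE = 10   # items per page (from the slide)
NUM_PAGES = 4    # pages on disk (from the slide)

def count_page_loads(item_order, page_size=PAGE_SIZE):
    """Count disk reads when only one page fits in memory at a time."""
    loads = 0
    resident = None  # page currently held in memory
    for item in item_order:
        page = item // page_size
        if page != resident:   # page fault: the needed page must be read
            loads += 1
            resident = page
    return loads

# Sequential listing: items 0..39 in storage order.
sequential = list(range(PAGE_SIZE * NUM_PAGES))

# Worst-case order: every consecutive pair of items lives on different pages.
worst = [p * PAGE_SIZE + i for i in range(PAGE_SIZE) for p in range(NUM_PAGES)]

print(count_page_loads(sequential))  # 4
print(count_page_loads(worst))       # 40
```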

5 PART 1: Techniques for large data sets
External sorting. Distributed sweeping. Two-step processing.

6 “External” sorting problem
n pages in the dataset on disk; m pages of memory, m < n.

7 “Divide and conquer” strategy
Step 1. Sorted “runs” of size m are created in memory with an internal sorting algorithm, then written to disk.
Step 2. Load the first records from each run into memory and merge them in sorted order. Once a block is sorted, write it back to disk.
Complexity: O(n log_m n)
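The two steps can be sketched as follows. This is a simplified in-memory model, not a real disk-based implementation: pages are plain Python lists, `external_sort` is a hypothetical name, and the merge fan-in is taken to be m (a real system would reserve at least one page as an output buffer).

```python
import heapq

def external_sort(pages, m):
    """Sort records spread over `pages` using at most m pages of 'memory'.

    pages: list of lists, each inner list standing in for one disk page.
    m:     number of pages that fit in memory (must be at least 2).
    """
    if m < 2:
        raise ValueError("need at least 2 pages of memory to merge")

    # Step 1: read m pages at a time, sort them internally, emit one run.
    runs = []
    for i in range(0, len(pages), m):
        chunk = [rec for page in pages[i:i + m] for rec in page]
        runs.append(sorted(chunk))

    # Step 2: repeatedly merge groups of up to m runs until one run remains.
    # Each pass divides the number of runs by m, giving about log_m(n) passes.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + m]))
                for i in range(0, len(runs), m)]

    return runs[0] if runs else []

pages = [[9, 3], [7, 1], [8, 2], [6, 4], [5, 0]]
print(external_sort(pages, 2))
```

`heapq.merge` only ever looks at the head of each run, which mirrors loading the first records of each run into memory.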

8 Distributed sweeping
Segment intersection problem, orthogonal segments.
In RAM: sweep-line algorithm, O(n log n + k), where n is the number of segments and k is the number of intersections.
In DB: O(n log_m n + k) algorithm, where m is the number of pages in RAM.
[Figure labels: range query, v, sweep-line]
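For reference, the in-RAM sweep for orthogonal segments can be sketched as below. The tuple layouts and the name `orthogonal_intersections` are assumptions made for this example. The active list is a sorted Python list, so insertions and deletions cost O(n) each; reaching the stated O(n log n + k) bound would require a balanced search tree instead.

```python
from bisect import bisect_left, bisect_right, insort

def orthogonal_intersections(horizontals, verticals):
    """Report all (horizontal, vertical) pairs that intersect.

    horizontals: tuples (x1, x2, y) with x1 <= x2
    verticals:   tuples (x, y1, y2) with y1 <= y2
    """
    INSERT, QUERY, REMOVE = 0, 1, 2   # order of events sharing the same x
    events = []
    for (x1, x2, y) in horizontals:
        events.append((x1, INSERT, (y, x1, x2)))
        events.append((x2, REMOVE, (y, x1, x2)))
    for (x, y1, y2) in verticals:
        events.append((x, QUERY, (x, y1, y2)))
    events.sort(key=lambda e: (e[0], e[1]))

    active = []   # horizontals crossing the sweep line, sorted by y
    found = []
    for _, kind, seg in events:
        if kind == INSERT:
            insort(active, seg)
        elif kind == REMOVE:
            active.remove(seg)
        else:
            _, y1, y2 = seg
            # Range query: all active horizontals with y1 <= y <= y2.
            lo = bisect_left(active, (y1,))
            hi = bisect_right(active, (y2, float("inf"), float("inf")))
            for (y, hx1, hx2) in active[lo:hi]:
                found.append(((hx1, hx2, y), seg))
    return found

hs = [(0, 10, 5), (2, 4, 1), (6, 9, 8)]
vs = [(3, 0, 6), (7, 6, 9), (12, 0, 10)]
print(orthogonal_intersections(hs, vs))
```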

9 Distributed sweeping
Idea: split the space into m horizontal strips, each containing approximately n/m segments. An active list is created for each strip: L1, L2, …, Lm. When a vertical segment is met, it is tested for intersection with the segments in the active lists of the strips that it overlaps.
[Figure: strips 1 to 4 and a vertical segment v]

10 Distributed sweeping
However, in the worst case, all strips have to be tested for every vertical segment. In the picture, segment v intersects strip 4, while no intersections are reported.
Solution: split each vertical segment into 3 parts. One part lies completely within some number of strips; the other two each partially cover a strip.
[Figure: vertical segment v across strips 1 to 4, with its “end” and “middle” parts labeled]

11 Distributed sweeping
Test for intersections between the vertical segment and all segments in the “middle” strips. Then recursively do the same for the two “end” strips. The recursion terminates when all processing can be carried out in RAM.
Complexity: O(n log_m n + k)

12 Rectangle intersection
The same idea carries over to rectangle intersection. The Θ(n log_m n + k) bound is met again.

13 Two-step processing: Spatial Join
Spatial predicates: Overlaps, Contains, Adjacent, etc.
2 steps: a filter step, then a refinement step.
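The slide does not say how the two steps are carried out. A common arrangement, assumed here, is that the filter step compares cheap approximations (minimum bounding rectangles) and the refinement step runs the exact predicate only on the surviving candidate pairs. The sketch below uses circles as stand-in objects and Overlaps as the predicate; all names are illustrative.

```python
from math import hypot

def mbr(circle):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of (cx, cy, r)."""
    cx, cy, r = circle
    return (cx - r, cy - r, cx + r, cy + r)

def mbr_overlap(a, b):
    """Cheap test on bounding rectangles (filter step)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def circles_overlap(a, b):
    """Exact geometric test (refinement step)."""
    return hypot(a[0] - b[0], a[1] - b[1]) <= a[2] + b[2]

def spatial_join(R, S):
    # Filter: keep pairs whose bounding rectangles overlap (may include false hits).
    candidates = [(a, b) for a in R for b in S if mbr_overlap(mbr(a), mbr(b))]
    # Refinement: run the exact predicate on the candidates only.
    results = [(a, b) for (a, b) in candidates if circles_overlap(a, b)]
    return candidates, results

R = [(0, 0, 1)]
S = [(1.8, 1.8, 1), (1, 0, 1), (5, 5, 1)]
candidates, results = spatial_join(R, S)
print(len(candidates), len(results))  # 2 1
```

The pair with the circle at (1.8, 1.8) passes the filter because the bounding rectangles overlap at a corner, but it is rejected by the exact test.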

14 Additional Database Specifics
In databases, the challenges with I/O (file access) are resolved using the techniques discussed above.
Specific methods exist for: grid files (linear structures), R-trees, and unindexed collections of objects.

15 PART 2: Computer Graphics Applications
DB operations: windowing and clipping.
Windowing(g, r) is a Boolean operator that tests whether object g intersects rectangle r.
Clipping(g, r) computes the part of g that lies inside r.
[Figure: object g and rectangle r, shown for windowing and for clipping]

16 Computer Graphic Primitives
Windowing: scan the edges of g and test each for intersection with r. Checking only the vertices is not enough. Complexity: O(n).
Clipping: treat each edge of r as the boundary of a half-plane, clip g against each of those, and combine the results.
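Clipping against one half-plane at a time is the approach taken by the Sutherland-Hodgman algorithm, which is used here as a stand-in since the slide does not name a specific method. The sketch assumes g is a convex polygon given as a vertex list; for concave polygons this technique can produce degenerate edges. The function name and argument layout are illustrative.

```python
def clip_polygon(poly, rect):
    """Clip polygon `poly` (list of (x, y)) to rect = (xmin, ymin, xmax, ymax)."""
    xmin, ymin, xmax, ymax = rect

    def clip_half_plane(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]          # wraps around to the last vertex
            if inside(cur):
                if not inside(prev):      # edge enters the half-plane
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):            # edge leaves the half-plane
                out.append(intersect(prev, cur))
        return out

    # Intersection of edge pq with a vertical or horizontal boundary line.
    # Only called when p and q are on opposite sides, so no division by zero.
    def x_cross(x):
        return lambda p, q: (x, p[1] + (q[1] - p[1]) * (x - p[0]) / (q[0] - p[0]))

    def y_cross(y):
        return lambda p, q: (p[0] + (q[0] - p[0]) * (y - p[1]) / (q[1] - p[1]), y)

    half_planes = [
        (lambda p: p[0] >= xmin, x_cross(xmin)),
        (lambda p: p[0] <= xmax, x_cross(xmax)),
        (lambda p: p[1] >= ymin, y_cross(ymin)),
        (lambda p: p[1] <= ymax, y_cross(ymax)),
    ]
    for inside, intersect in half_planes:
        if not poly:
            break
        poly = clip_half_plane(poly, inside, intersect)
    return poly

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(clip_polygon(square, (2, 2, 6, 6)))
```

For a convex g, an empty result means g and r do not intersect, so the same routine can double as a (more expensive) windowing test.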

17 Computer Graphic Primitives
Polygon partitioning (for large data sets). Polygon triangulation. Intersections (polyline, polygon).

18 Polygon partitioning
Sort the vertices of polygon P by X coordinate.
Use the sweep-line technique with a vertical line L: for each vertex v, compute the maximal vertical segment of L that is internal to P and contains v. This is done by examining the nearest edges above and below v.
The visibility segments define trapezoids.
Complexity: O(n lg n)
Note: complex polygons can be triangulated if the trapezoids are further triangulated.

19 Polygon partitioning
The visibility segments define trapezoids (geometric objects with 2 parallel edges).

20 Triangulation of a simple polygon
Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P that lie inside P. Such v_i and v_j are said to be visible to each other.
Each triangulation of a simple polygon with n vertices has (n-3) diagonals and (n-2) triangles.
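The counts can be checked on the simplest case, a convex polygon, where every diagonal from one fixed vertex is valid (a "fan" triangulation). This is only an illustration of the (n-3) and (n-2) counts, not a triangulation method for general simple polygons; `fan_triangulate` is a hypothetical name and vertices are referred to by index.

```python
def fan_triangulate(n):
    """Fan triangulation of a convex polygon with vertices 0..n-1.

    Returns (diagonals, triangles) as lists of vertex-index tuples.
    """
    if n < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    # Diagonals join vertex 0 to every non-adjacent vertex.
    diagonals = [(0, j) for j in range(2, n - 1)]
    # Each triangle uses vertex 0 and one polygon edge (j, j+1).
    triangles = [(0, j, j + 1) for j in range(1, n - 1)]
    return diagonals, triangles

diagonals, triangles = fan_triangulate(6)
print(len(diagonals), len(triangles))  # 3 4
```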

21 Triangulation of a monotone polygon
Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons.
[Figure: a monotone polygon and a simple polygon]

22 Triangulation of a monotone polygon
Idea: sweep-line; sort all vertices of P.
If the angle formed by the 3 previously processed points is convex, create a triangle and remove the point from list L. If the angle is reflex, add the next point.
Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep a line over the edges and find the trapezoids, which represent monotone chains. Complexity: O(n lg n)

23 Convex partitioning
Convex partitioning is the partitioning of a polygon into convex components. It can minimize the number of components; done in O(n).

24 Geometric Relationships
Computing intersections: point in a polygon, polyline intersection, polygon intersection (general and convex).

25 Point in a polygon (simple)
Draw a half-ray from p and count the number of intersections with the boundary. If the count is odd, p is inside; if even, p is outside.
O(n) algorithm.
[Figure: points p and q relative to a polygon]
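The crossing-count test can be sketched as follows, casting the half-ray horizontally to the right of p. The function name is illustrative, and points lying exactly on the boundary are not handled specially in this sketch.

```python
def point_in_polygon(p, poly):
    """Return True if point p = (x, y) is inside the simple polygon `poly`.

    poly is a list of (x, y) vertices in boundary order.
    """
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through p?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:          # crossing lies on the ray to the right
                inside = not inside  # each crossing flips the parity
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```

The half-open comparison `(y1 > y) != (y2 > y)` counts a vertex lying on the ray for only one of its two edges, which avoids double counting.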

26 Polyline intersection
Given a set of line segments, detect whether any 2 segments intersect.
O(n^2): straightforward.
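The straightforward quadratic method tests every pair of segments. The pairwise test below uses the standard orientation (cross product) predicate; the function names are illustrative, and segments that merely touch are counted as intersecting.

```python
from itertools import combinations

def orient(a, b, c):
    """Sign of the turn a -> b -> c: >0 left, <0 right, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def on_segment(a, b, c):
    """For c collinear with ab: does c lie within the extent of ab?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(s, t):
    a, b = s
    c, d = t
    d1, d2 = orient(a, b, c), orient(a, b, d)
    d3, d4 = orient(c, d, a), orient(c, d, b)
    # Proper crossing: each segment's endpoints lie on opposite sides of the other.
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True
    # Degenerate cases: an endpoint lies on the other segment.
    return ((d1 == 0 and on_segment(a, b, c)) or
            (d2 == 0 and on_segment(a, b, d)) or
            (d3 == 0 and on_segment(c, d, a)) or
            (d4 == 0 and on_segment(c, d, b)))

def any_intersection(segments):
    """O(n^2): check all pairs."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))

segs = [((0, 0), (4, 4)), ((0, 4), (4, 0)), ((5, 5), (6, 6))]
print(any_intersection(segs))  # True
```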

27 Polyline intersection
Plane-sweep, O(n lg n):
When the sweep line meets the leftmost point of a segment S: S is inserted into the list L, and S is tested for intersection with its two neighboring segments, below and above.
When the sweep line meets the rightmost point of S: S is deleted from L, and the segments above and below S are tested for intersection with each other.
[Figure: sweep line]

28 Polygon intersection
Two simple polygons P and Q. Possible cases:
Some edge of P intersects some edge of Q (use the segment intersection test).
P is inside Q (use the point-inside-polygon test).
Q is inside P (use the point-inside-polygon test).
Otherwise, P and Q don’t intersect.
Complexity: O(n lg n)

29 Convex polygon intersection
Convexity makes a faster O(n) algorithm possible.
Idea: a synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the “inner” intersection boundary is known at each step. Scanned edges are advanced if they “point” at each other.

30 Summary
Dealing with large data sets requires additional resources.
Some methods can be useful: external sorting, distributed sweeping, two-step processing.
Other applications (spatial map querying) require computer graphics primitives.
Various intersection operations exist.

