Presentation on theme: "Query Processing in Databases", Dr. M. Gavrilova. Presentation transcript:
◦ Introduction ◦ I/O algorithms for large databases ◦ Complex geometric operations in graphical querying ◦ Applications
Geometric algorithms studied before assumed that all data fits in RAM. In databases, we encounter the problem of accessing “pages” of memory stored on disk. We will see how traditional algorithm design techniques can be useful.
Example: 4 pages of memory, 10 items in each. To list all items sequentially, 4 disk accesses are required. To list the items in random order, up to 40 disk accesses are required if only 1 page can be held in memory at a time, which is too expensive!
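The access counts above can be checked with a small simulation. This is only a sketch: the function name `count_page_loads` and the single-slot buffer model are assumptions made for illustration, not part of the original slides.

```python
import random

def count_page_loads(access_order, items_per_page):
    """Count disk reads when only one page fits in memory at a time."""
    loads = 0
    resident_page = None  # page currently held in the single memory slot
    for item in access_order:
        page = item // items_per_page
        if page != resident_page:  # requested page is not in memory: read it
            loads += 1
            resident_page = page
    return loads

ITEMS_PER_PAGE = 10
N_PAGES = 4  # 4 pages of 10 items, as in the example

sequential = list(range(N_PAGES * ITEMS_PER_PAGE))
# Worst case: every access lands on a different page than the previous one.
worst = [p * ITEMS_PER_PAGE + i for i in range(ITEMS_PER_PAGE) for p in range(N_PAGES)]
shuffled = sequential[:]
random.Random(0).shuffle(shuffled)  # fixed seed chosen arbitrarily

print(count_page_loads(sequential, ITEMS_PER_PAGE))  # 4
print(count_page_loads(worst, ITEMS_PER_PAGE))       # 40
print(count_page_loads(shuffled, ITEMS_PER_PAGE))    # somewhere between 4 and 40
```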
n pages in the dataset on disk; m pages of memory, m < n.
Step 1. Sorted “runs” of size m are created in memory using an internal sorting algorithm, then written to disk. Step 2. Load some number of the first records from each run into memory and merge them in sorted order. Once a block is sorted, write it back to disk. Complexity: O(n log_m n).
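The two steps can be sketched in memory, with Python lists standing in for disk-resident runs. The names `external_sort`, `run_size` and `fan_in` are hypothetical; a real implementation would stream pages from files rather than slice a list.

```python
import heapq

def external_sort(records, run_size, fan_in=2):
    """Two-phase merge sort; lists stand in for runs stored on disk."""
    if run_size < 1 or fan_in < 2:
        raise ValueError("run_size must be >= 1 and fan_in must be >= 2")
    # Step 1: cut the input into chunks that fit in memory and sort each one.
    runs = [
        sorted(records[i:i + run_size])
        for i in range(0, len(records), run_size)
    ]
    # Step 2: merge up to `fan_in` runs at a time until a single run remains.
    # heapq.merge consumes its inputs lazily, mimicking block-wise reads.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)
        ]
    return runs[0] if runs else []

print(external_sort([5, 3, 8, 1, 9, 2, 7], run_size=3, fan_in=2))
# [1, 2, 3, 5, 7, 8, 9]
```

Each pass of the loop reads and writes the whole dataset once, and the number of passes shrinks as `fan_in` grows, which is where the logarithm base m in the bound comes from.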
Segment intersection problem for orthogonal segments. In RAM: sweep-line algorithm, O(n log n + k), where n is the number of segments and k is the number of intersections. In DB: O(n log_m n + k) algorithm, where m is the number of pages in RAM. [Figure: sweep-line; range query for vertical segment v]
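A minimal in-RAM sketch of the sweep-line idea for orthogonal segments follows. The input encoding (tuples for horizontal and vertical segments) and the function name are assumptions. A plain sorted list is used for the active set, so insertions cost O(n) here; a balanced search tree would be needed to reach the stated O(n log n + k) bound.

```python
from bisect import bisect_left, bisect_right, insort

def orthogonal_intersections(horizontals, verticals):
    """horizontals: (x_left, x_right, y); verticals: (x, y_low, y_high)."""
    INSERT, QUERY, DELETE = 0, 1, 2  # tie-break order for events at equal x
    events = []
    for x_left, x_right, y in horizontals:
        events.append((x_left, INSERT, y, y))
        events.append((x_right, DELETE, y, y))
    for x, y_low, y_high in verticals:
        events.append((x, QUERY, y_low, y_high))
    events.sort()

    active = []  # sorted y-coordinates of horizontals crossing the sweep line
    found = []
    for x, kind, a, b in events:
        if kind == INSERT:
            insort(active, a)
        elif kind == DELETE:
            active.pop(bisect_left(active, a))
        else:
            # Range query: every active y in [a, b] meets the vertical segment.
            for y in active[bisect_left(active, a):bisect_right(active, b)]:
                found.append((x, y))
    return found

print(orthogonal_intersections(
    horizontals=[(0, 10, 5), (2, 4, 1)],
    verticals=[(3, 0, 6), (8, 0, 3)],
))  # [(3, 1), (3, 5)]
```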
Idea: split the space into m horizontal strips, each containing approximately n/m segments. An active list is created for each strip: L_1, L_2, …, L_m. When a vertical segment is met, it is tested for intersection against the segments in the active lists of the strips that it overlaps. [Figure: strips 1 to 4 and vertical segment v]
However, in the worst case, all strips must be tested for every vertical segment. In the picture, segment v intersects strip 4, yet no intersections are reported there. Solution: split each vertical segment into 3 parts: ◦ A middle part that spans some number of strips completely ◦ Two end parts that each partially cover a strip. [Figure: strips 1 to 4; segment v with its end and middle parts]
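The three-way split can be sketched as a lookup over the strip borders. The function name `split_vertical`, the 0-based strip numbering, and the `boundaries` list are assumptions for illustration; the sketch also assumes the segment lies within the outermost borders.

```python
from bisect import bisect_left, bisect_right

def split_vertical(y_low, y_high, boundaries):
    """Split a vertical segment's y-range by horizontal strips.

    boundaries: sorted borders b[0] < ... < b[m]; strip i spans [b[i], b[i+1]].
    Assumes b[0] <= y_low <= y_high <= b[m].
    Returns (middle_strips, end_pieces), where each end piece is
    (strip_index, piece_low, piece_high).
    """
    lo = bisect_left(boundaries, y_low)        # first border >= y_low
    hi = bisect_right(boundaries, y_high) - 1  # last border <= y_high
    if lo > hi:
        # No border is crossed: the segment sits inside a single strip.
        return [], [(hi, y_low, y_high)]
    middle = list(range(lo, hi))  # strips covered completely
    ends = []
    if y_low < boundaries[lo]:
        ends.append((lo - 1, y_low, boundaries[lo]))
    if y_high > boundaries[hi]:
        ends.append((hi, boundaries[hi], y_high))
    return middle, ends

borders = [0, 10, 20, 30, 40]  # four strips, numbered 0..3 here
print(split_vertical(5, 33, borders))
# ([1, 2], [(0, 5, 10), (3, 30, 33)])
```

The middle strips can be handled with one test per strip, while each end piece is passed down to the recursive call for its strip.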
Test for intersections between the vertical segment and all segments in the “middle” strips. Then handle the two “end” strips recursively. The recursion terminates when all processing can be carried out in RAM. Complexity: O(n log_m n + k).
The same idea carries over to rectangle intersection. The Θ(n log_m n + k) bound is met again.
In databases: challenges with I/O (file access) are resolved using techniques discussed above. Specific methods exist for: ◦ Grid files (linear structures) ◦ R-trees ◦ Unindexed collections of objects
DB operations: windowing and clipping. Windowing(g, r) is a Boolean operator that tests whether object g intersects rectangle r. Clipping(g, r) computes the part of g inside r. [Figure: object g and rectangle r, for windowing and for clipping]
Windowing: ◦ scan edges of g ◦ test for intersection with r ◦ checking vertices is not enough ◦ O(n) Clipping: ◦ consider each edge of r as a half-plane ◦ clip g against each of those ◦ combine results ◦ O(n)
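The clipping steps can be sketched with the Sutherland-Hodgman scheme, which is one common way to clip a polygon against each half-plane of a rectangle in turn; the slides do not name a specific algorithm, so this choice is an assumption. For non-convex input the output may contain degenerate (overlapping) edges.

```python
def clip_polygon(polygon, xmin, ymin, xmax, ymax):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned rectangle."""

    def clip_half_plane(points, inside, intersect):
        result = []
        for i, current in enumerate(points):
            previous = points[i - 1]  # wraps to the last vertex when i == 0
            if inside(current):
                if not inside(previous):  # edge enters the half-plane
                    result.append(intersect(previous, current))
                result.append(current)
            elif inside(previous):        # edge leaves the half-plane
                result.append(intersect(previous, current))
        return result

    def cross_x(x):  # intersection of edge p-q with the vertical line at x
        def f(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return f

    def cross_y(y):  # intersection of edge p-q with the horizontal line at y
        def f(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return f

    half_planes = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]
    points = list(polygon)
    for inside, intersect in half_planes:
        if not points:  # nothing left to clip
            break
        points = clip_half_plane(points, inside, intersect)
    return points

triangle = [(0, 0), (4, 0), (0, 4)]
print(clip_polygon(triangle, 0, 0, 2, 2))  # vertices of the 2 x 2 square
```

Each half-plane pass visits every edge once, so the total work is linear in the number of vertices, matching the O(n) bound above.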
◦ Polygon partitioning (for large data sets) ◦ Polygon triangulation ◦ Intersections (polyline, polygon)
Sort the vertices of polygon P by x-coordinate. Use the sweep-line technique with a vertical line L: for each vertex v, compute the maximal vertical segment of L that is internal to P and contains v. This is done by examining the nearest edges above and below v. The visibility segments define trapezoids. Complexity: O(n lg n). Note: complex polygons can be triangulated if the trapezoids are further triangulated.
The visibility segments define trapezoids (a trapezoid is a geometric object with 2 parallel edges).
Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P that lie inside P. In that case v_i and v_j are said to be visible to each other. Each triangulation of a simple polygon with n vertices has (n-3) diagonals and (n-2) triangles.
Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons. [Figure: a monotone polygon and a simple polygon]
Idea: use a sweep line over the sorted vertices of P. If the angle formed by the 3 most recently processed points is convex, create a triangle and remove the point from list L. If the angle is reflex, add the next point. Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep by edges and find the trapezoids, which represent monotone chains. O(n lg n).
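Convex partitioning: partitioning into convex components, with the aim of keeping the number of components small. Given a triangulation, a partition with at most four times the minimum number of components can be obtained in O(n) (Hertel-Mehlhorn); finding an exactly minimum partition is more expensive.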
Computing intersections: ◦ Point in a polygon ◦ Polyline intersection ◦ Polygon intersection (general and convex)
Draw a half-ray from p. Count the number of intersections with the boundary. If the count is odd, p is inside; if even, p is outside. O(n) algorithm. [Figure: points p and q]
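The parity test can be sketched as follows, casting the ray horizontally to the right of p. The function name is an assumption, and points lying exactly on the boundary are not treated specially in this sketch.

```python
def point_in_polygon(p, polygon):
    """Ray-casting test: polygon is a list of (x, y) vertices in order."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        # The edge matters only if its endpoints lie on opposite sides of the
        # ray's height; the strict/non-strict mix avoids double-counting a
        # vertex shared by two edges.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:  # crossing is to the right of p
                inside = not inside
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
```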
Given a set of line segments, detect whether any 2 segments intersect. The straightforward approach is O(n^2).
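The straightforward quadratic approach can be sketched with the standard orientation predicate, testing every pair of segments. All function names are assumptions; segments are given as pairs of (x, y) endpoints, and touching or overlapping segments count as intersecting.

```python
from itertools import combinations

def orientation(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)

def on_segment(a, b, c):
    """For c collinear with a-b: True if c lies within the extent of a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_intersect(s, t):
    (p1, p2), (p3, p4) = s, t
    o1, o2 = orientation(p1, p2, p3), orientation(p1, p2, p4)
    o3, o4 = orientation(p3, p4, p1), orientation(p3, p4, p2)
    if o1 != o2 and o3 != o4:  # each segment straddles the other's line
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, p3))
            or (o2 == 0 and on_segment(p1, p2, p4))
            or (o3 == 0 and on_segment(p3, p4, p1))
            or (o4 == 0 and on_segment(p3, p4, p2)))

def any_intersection(segments):
    """Check all pairs, hence O(n^2) segment tests."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))

print(any_intersection([((0, 0), (4, 4)), ((0, 4), (4, 0))]))  # True
print(any_intersection([((0, 0), (1, 1)), ((2, 2), (3, 3))]))  # False
```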
Plane-sweep, O(n lg n): 1. The sweep line meets the leftmost point of segment S: S is inserted into the list L, and S is tested for intersection with its two neighboring segments, below and above. 2. The sweep line meets the rightmost point of S: S is deleted from L, and the segments that were above and below S are tested for intersection with each other. [Figure: sweep line]
Two simple polygons P and Q. Possible cases: ◦ One edge of P intersects one edge of Q (use segment intersection test) ◦ P is inside Q (point inside polygon) ◦ Q is inside P (point inside polygon) ◦ Otherwise, P and Q don’t intersect. O(n lg n)
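The case analysis can be sketched directly. This version tests all edge pairs by brute force, so it runs in O(|P|·|Q|) rather than the O(n lg n) quoted above, and it only detects proper edge crossings (polygons that merely touch are not handled). All names are assumptions for illustration.

```python
def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a)."""
    cross = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (cross > 0) - (cross < 0)

def _edges_cross(a, b, c, d):
    """Proper crossing only: each edge has endpoints strictly on both sides."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def _contains(polygon, p):
    """Ray-casting point-in-polygon test (boundary points not handled)."""
    inside = False
    for i in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[i - 1], polygon[i]
        if (y1 > p[1]) != (y2 > p[1]) and \
                x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1) > p[0]:
            inside = not inside
    return inside

def _edges(polygon):
    return [(polygon[i - 1], polygon[i]) for i in range(len(polygon))]

def polygons_intersect(P, Q):
    # Case 1: some edge of P crosses some edge of Q.
    if any(_edges_cross(a, b, c, d) for a, b in _edges(P) for c, d in _edges(Q)):
        return True
    # Cases 2 and 3: with no edge crossings, one polygon is inside the other
    # exactly when any one of its vertices is inside the other.
    return _contains(Q, P[0]) or _contains(P, Q[0])

P = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(polygons_intersect(P, [(2, 2), (6, 2), (6, 6), (2, 6)]))  # True
print(polygons_intersect(P, [(1, 1), (2, 1), (2, 2), (1, 2)]))  # True
print(polygons_intersect(P, [(10, 10), (11, 10), (11, 11)]))    # False
```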
Convexity allows a faster O(n) algorithm. Idea: a synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the “inner” intersection boundary is known at each step. Scanned edges are advanced if they “point” at each other.
Dealing with large data sets requires additional resources. Some methods, such as those below, can be useful: ◦ External sorting ◦ Distributed sweeping ◦ Two-step processing. Other applications (spatial map querying) require computer graphics primitives. Various intersection operations exist.