# Query Processing in Databases, Dr. M. Gavrilova


Query Processing in Databases Dr. M. Gavrilova

- Introduction
- I/O algorithms for large databases
- Complex geometric operations in graphical querying
- Applications

- The geometric algorithms studied so far assumed that all data resides in RAM.
- In databases, data is stored on disk in "pages" of memory, and the cost of accessing those pages becomes the problem.
- We will see how traditional algorithm design techniques can be useful.

- 4 pages of memory, 10 items in each
- To list all items sequentially, 4 disk accesses are required.
- To list the items in random order, up to 40 disk accesses are required if only 1 page can be held in memory at a time, which is too expensive!
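The two counts above can be checked with a small simulation. This is only a sketch: it assumes items are numbered 0 to 39 and stored in order, 10 per page, and that exactly one page fits in memory. The name `count_page_loads` is hypothetical.

```python
def count_page_loads(access_order, items_per_page):
    """Count disk accesses when only one page is held in memory at a time."""
    loads = 0
    current_page = None
    for item in access_order:
        page = item // items_per_page  # page that stores this item
        if page != current_page:       # page fault: fetch the page from disk
            loads += 1
            current_page = page
    return loads


sequential = list(range(40))
# Worst case: every access touches a different page than the previous one.
worst_case = [page * 10 + slot for slot in range(10) for page in range(4)]

sequential_loads = count_page_loads(sequential, 10)
worst_case_loads = count_page_loads(worst_case, 10)
```

With these inputs the sequential order loads each page once, while the interleaved order reloads a page on every access.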

- External sorting
- Distributed sweeping
- Two-step processing

- n pages in the dataset on disk
- m pages of memory, m < n

- Step 1. Sorted "runs" of size m are created in memory using an internal sorting algorithm, then written to disk.
- Step 2. Load some number of the first records from each run into memory and merge them in sorted order. Once a block is sorted, write it back to disk.
- Complexity: $O(n \log_m n)$
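The two steps can be sketched as follows. This is a minimal illustration, not a tuned implementation: it assumes integer records stored one per line in temporary files, uses `run_size` (a hypothetical parameter) as a stand-in for the memory budget, and merges all runs in a single pass with `heapq.merge`. A real external sort would merge about m runs at a time over several passes, which is where the logarithmic factor comes from.

```python
import heapq
import os
import tempfile


def external_sort(values, run_size):
    """Yield the values in sorted order, holding at most run_size values per run."""
    run_paths = []

    def write_run(run):
        run.sort()  # Step 1: internal sort of one memory-sized run
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as f:
            f.writelines(f"{v}\n" for v in run)
        run_paths.append(path)

    run = []
    for v in values:
        run.append(v)
        if len(run) == run_size:
            write_run(run)
            run = []
    if run:
        write_run(run)

    # Step 2: stream the runs back and merge them lazily.
    files = [open(p) for p in run_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


sorted_values = list(external_sort([5, 1, 4, 2, 3, 9, 0], run_size=3))
```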

- Segment intersection problem, orthogonal segments.
- In RAM: sweep-line algorithm, $O(n \log n + k)$, where n is the number of segments and k is the number of intersections.
- In DB: $O(n \log_m n + k)$ algorithm, where m is the number of pages in RAM.

[Figure: sweep line; a vertical segment v is handled as a range query]
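The in-RAM sweep can be sketched as below. The input encoding (horizontal segments as `(x1, x2, y)`, vertical segments as `(x, y1, y2)`) and the function name are assumptions made for the example. The active set is kept in a sorted Python list via `bisect`, so insertions and deletions are linear in the worst case; a balanced search tree would be needed to reach the $O(n \log n + k)$ bound.

```python
import bisect


def orthogonal_intersections(horizontals, verticals):
    """Report intersection points between horizontal and vertical segments.

    horizontals: (x1, x2, y) with x1 <= x2; verticals: (x, y1, y2) with y1 <= y2.
    """
    INSERT, QUERY, REMOVE = 0, 1, 2  # order of events that share the same x
    events = []
    for x1, x2, y in horizontals:
        events.append((x1, INSERT, y, y))
        events.append((x2, REMOVE, y, y))
    for x, y1, y2 in verticals:
        events.append((x, QUERY, y1, y2))
    events.sort()

    active = []  # sorted y values of horizontals crossing the sweep line
    hits = []
    for x, kind, lo, hi in events:
        if kind == INSERT:
            bisect.insort(active, lo)
        elif kind == REMOVE:
            active.pop(bisect.bisect_left(active, lo))
        else:  # range query: all active y values in [lo, hi]
            i = bisect.bisect_left(active, lo)
            j = bisect.bisect_right(active, hi)
            hits.extend((x, y) for y in active[i:j])
    return hits


points = orthogonal_intersections(
    horizontals=[(0, 10, 5), (2, 4, 1)],
    verticals=[(3, 0, 6), (8, 0, 3)],
)
```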

- Idea: split the space into m horizontal strips, each containing approximately n/m segments.
- An active list is created for each strip: L_1, L_2, …, L_m.
- When a vertical segment is met, it is tested for intersection against the segments in the active lists of the strips that it overlaps.

[Figure: strips 1 to 4 and a vertical segment v]

- However, in the worst case, all strips must be tested for every vertical segment.
- In the picture, segment v intersects strip 4, while no intersections are reported.
- Solution: split each vertical segment into 3 parts:
  - One middle part that completely spans some number of strips
  - Two end parts, each of which only partially covers a strip

[Figure: strips 1 to 4; segment v split into an end part, a middle part, and an end part]

- Test for intersection between the vertical segment and all segments in the "middle" strips.
- Then recursively do the same for the two "end" strips.
- Recursion terminates when all processing can be carried out in RAM.
- $O(n \log_m n + k)$

- The same idea carries over to the case of rectangle intersection.
- The $\Theta(n \log_m n + k)$ bound is met again.

- Spatial predicates:
  - Overlaps
  - Contains
  - Adjacent
  - etc.
- 2 steps:
  - Filter step
  - Refinement step
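The slide only names the two steps. In the usual reading, the filter step runs a cheap test on an approximation of each object (typically its minimum bounding rectangle) to produce candidates, and the refinement step runs the exact predicate on those candidates only. The sketch below illustrates this for an "overlaps" predicate; using disks `(cx, cy, r)` as the exact geometry is an assumption made to keep the example short, and all function names are hypothetical.

```python
from itertools import combinations


def mbr(disk):
    """Minimum bounding rectangle (xmin, ymin, xmax, ymax) of a disk."""
    cx, cy, r = disk
    return (cx - r, cy - r, cx + r, cy + r)


def mbr_overlap(a, b):
    """Cheap filter test on rectangles."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def disks_overlap(a, b):
    """Exact refinement test on the real geometry."""
    (ax, ay, ar), (bx, by, br) = a, b
    return (ax - bx) ** 2 + (ay - by) ** 2 <= (ar + br) ** 2


def overlapping_pairs(disks):
    # Filter step: keep only pairs whose bounding rectangles overlap.
    candidates = [
        (i, j)
        for i, j in combinations(range(len(disks)), 2)
        if mbr_overlap(mbr(disks[i]), mbr(disks[j]))
    ]
    # Refinement step: exact test on the surviving candidates.
    results = [(i, j) for i, j in candidates if disks_overlap(disks[i], disks[j])]
    return candidates, results


disks = [(0, 0, 1), (1.8, 1.8, 1), (1, 0, 1), (5, 5, 1)]
candidates, results = overlapping_pairs(disks)
```

Here the pair `(0, 1)` passes the filter but is rejected by the refinement, which is exactly the kind of false hit the second step exists to remove.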

- In databases, challenges with I/O (file access) are resolved using the techniques discussed above.
- Specific methods exist for:
  - Grid files (linear structures)
  - R-trees
  - Unindexed collections of objects

- DB operations: windowing and clipping
- Windowing(g, r) is a Boolean operator that tests whether object g intersects rectangle r.
- Clipping(g, r) computes the part of g inside r.

[Figure: object g and rectangle r, shown for windowing and for clipping]

- Windowing:
  - scan the edges of g
  - test each edge for intersection with r
  - checking vertices is not enough: an edge can cross r while both of its endpoints lie outside
  - O(n)
- Clipping:
  - consider each edge of r as a half-plane
  - clip g against each of those
  - combine the results
  - O(n)
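The clipping steps match the classical Sutherland-Hodgman scheme, which clips a polygon successively against the four half-planes of the rectangle. Whether the slides intend exactly this algorithm is an assumption. The sketch below takes g as a list of `(x, y)` vertices and r as `(xmin, ymin, xmax, ymax)`; it is exact for convex g, while a concave g may produce degenerate overlapping edges in the output.

```python
def clip_polygon(polygon, rect):
    """Clip a polygon (list of (x, y) vertices) to an axis-aligned rectangle."""
    xmin, ymin, xmax, ymax = rect

    def clip(points, inside, intersect):
        """Clip against one half-plane, given its membership and crossing tests."""
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # i == 0 wraps around to the last vertex
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # edge enters the half-plane
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # edge leaves the half-plane
        return out

    def cross_x(x):
        # Only called when p and q lie on opposite sides, so q[0] != p[0].
        def f(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (x, p[1] + t * (q[1] - p[1]))
        return f

    def cross_y(y):
        def f(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), y)
        return f

    half_planes = [
        (lambda p: p[0] >= xmin, cross_x(xmin)),
        (lambda p: p[0] <= xmax, cross_x(xmax)),
        (lambda p: p[1] >= ymin, cross_y(ymin)),
        (lambda p: p[1] <= ymax, cross_y(ymax)),
    ]
    result = list(polygon)
    for inside, intersect in half_planes:
        result = clip(result, inside, intersect)
    return result


square = [(0, 0), (4, 0), (4, 4), (0, 4)]
clipped = clip_polygon(square, (2, 2, 6, 6))
```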

- Polygon partitioning (for large data sets)
- Polygon triangulation
- Intersections (polyline, polygon)

- Sort the vertices of polygon P according to the X coordinate.
- Use the sweep-line technique with a vertical line L: for each vertex v, compute the maximal vertical segment of L that is internal to P and contains v. This is done by examining the nearest edges above and below v.
- The visibility segments define trapezoids.
- Complexity: O(n lg n)
- Note: complex polygons can be triangulated if the trapezoids are further triangulated.

- The visibility segments define trapezoids (geometric objects with 2 parallel edges).

- Triangulation involves finding diagonals within the polygon, i.e. segments v_i v_j between vertices of P.
- v_i and v_j are said to be visible to each other.
- Each triangulation of a polygon has (n-3) diagonals and (n-2) triangles.

- Idea: monotone polygons can be triangulated in linear time. A simple polygon can be partitioned into monotone polygons.

[Figure: a monotone polygon and a simple polygon]

- Idea: sweep-line; sort all vertices of P.
- If the angle between the 3 previously processed points is convex, create a triangle and remove the point from list L.
- If the angle is reflex, add the next point.
- Partitioning a polygon into monotone polygons is done similarly to trapezoidation: sweep-line by edges, find the trapezoids, which represent monotone chains. O(n lg n)

- Convex partitioning: partitioning into convex components; the number of components can be minimized; done in O(n).

- Computing intersections:
  - Point in a polygon
  - Polyline intersection
  - Polygon intersection (general and convex)

- Draw a half-ray from p.
- Count the number of intersections with the boundary.
- If the count is odd, p is inside; if even, p is outside.
- O(n) algorithm

[Figure: query points p and q]
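The crossing count can be sketched as follows, assuming the polygon is a list of `(x, y)` vertices in order and the half-ray is shot horizontally to the right of p. Points lying exactly on the boundary are not treated specially here, so their classification is an artifact of the comparisons used.

```python
def point_in_polygon(p, polygon):
    """Return True if p is strictly inside the polygon (even-odd rule)."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through p.
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > px:         # crossing lies on the ray to the right of p
                inside = not inside  # each crossing flips the parity
    return inside


l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
p_inside = point_in_polygon((1, 3), l_shape)
q_inside = point_in_polygon((3, 3), l_shape)
```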

- Given a set of line segments, detect whether any 2 segments intersect.
- $O(n^2)$: straightforward
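The straightforward quadratic approach tests every pair of segments. A minimal sketch, assuming each segment is a pair of `(x, y)` endpoints and using the standard orientation test (function names are illustrative); touching and collinear overlapping segments count as intersecting:

```python
from itertools import combinations


def orient(a, b, c):
    """Sign of the turn a -> b -> c: 1 left, -1 right, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, c):
    """For c collinear with a-b: does c lie within the segment's bounding box?"""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(s, t):
    (a, b), (c, d) = s, t
    o1, o2 = orient(a, b, c), orient(a, b, d)
    o3, o4 = orient(c, d, a), orient(c, d, b)
    if o1 != o2 and o3 != o4:  # endpoints of each segment straddle the other
        return True
    # Collinear special cases: an endpoint lies on the other segment.
    return ((o1 == 0 and on_segment(a, b, c)) or (o2 == 0 and on_segment(a, b, d))
            or (o3 == 0 and on_segment(c, d, a)) or (o4 == 0 and on_segment(c, d, b)))


def any_intersection(segments):
    """O(n^2) check over all pairs."""
    return any(segments_intersect(s, t) for s, t in combinations(segments, 2))


crossing = any_intersection([((0, 0), (4, 4)), ((0, 4), (4, 0)), ((5, 5), (6, 5))])
parallel = any_intersection([((0, 0), (1, 0)), ((0, 1), (1, 1))])
```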

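- Plane-sweep, O(n lg n):
  1. The sweep line meets the leftmost point of a segment S: S is inserted into the list of active segments, and S is tested for intersection with its two neighbors, the segments immediately below and above it.
  2. The sweep line meets the rightmost point of S: S is deleted, and the segments that were immediately above and below S are tested for intersection with each other.

[Figure: sweep]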

- Two simple polygons P and Q.
- Possible cases:
  - One edge of P intersects one edge of Q (use the segment intersection test)
  - P is inside Q (point inside polygon)
  - Q is inside P (point inside polygon)
  - Otherwise, P and Q don't intersect.
- O(n lg n)
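The case analysis can be sketched as below. Note the swap: for brevity this example tests all edge pairs directly, which costs O(|P|·|Q|) rather than the O(n lg n) obtained with a plane sweep. Polygons are assumed to be lists of `(x, y)` vertices, and the helper names are hypothetical.

```python
def _orient(a, b, c):
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def _on_segment(a, b, c):
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def _segments_intersect(s, t):
    (a, b), (c, d) = s, t
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def _contains(polygon, p):
    """Even-odd ray casting test."""
    px, py = p
    inside = False
    for (x1, y1), (x2, y2) in _edges(polygon):
        if (y1 > py) != (y2 > py):
            if x1 + (py - y1) * (x2 - x1) / (y2 - y1) > px:
                inside = not inside
    return inside


def _edges(polygon):
    n = len(polygon)
    return [(polygon[i], polygon[(i + 1) % n]) for i in range(n)]


def polygons_intersect(P, Q):
    # Case 1: some edge of P intersects some edge of Q.
    if any(_segments_intersect(e, f) for e in _edges(P) for f in _edges(Q)):
        return True
    # Cases 2 and 3: no boundary crossing, so one vertex decides containment.
    return _contains(Q, P[0]) or _contains(P, Q[0])


P = [(0, 0), (4, 0), (4, 4), (0, 4)]
nested = polygons_intersect(P, [(1, 1), (2, 1), (2, 2), (1, 2)])
overlapping = polygons_intersect(P, [(3, 3), (5, 3), (5, 5), (3, 5)])
disjoint = polygons_intersect(P, [(10, 10), (12, 10), (12, 12), (10, 12)])
```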

- Convexity makes it possible to devise a faster O(n) algorithm.
- Idea: a synchronized scan of the edges of P and Q, so that all intersection points are eventually found and the "inner" intersection boundary is known at each step.
- Scanned edges are advanced if they "point" at each other.

- Dealing with large data sets requires additional resources.
- Some methods, such as those below, can be useful:
  - External sorting
  - Distributed sweeping
  - Two-step processing
- Other applications (spatial map querying) require computer graphics primitives.
- Various intersection operations exist.
