Published by Mariah Lawson. Modified about 1 year ago.

1
Geometric Algorithms and Data Structures
Prof. Neeraj Suri, Andreas Johansson, Constantin Sarbu, Abdelmajid Khelil
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
© Neeraj Suri, EU-NSF ICT March 2006

2
ICS-II 2006, Lecture 14: Geometric Algorithms and Data Structures
Outline
Introduction
Geometric Data Structures
Quadtree
□ Region quadtree
□ Point quadtree
K-d tree
Strip tree
K-d trie
Binary trie
Multidimensional Data
Z-Order
Multidimensional data
Data mining

3
Geometric Problems (1)
Algorithmic geometry: the study of the algorithmic complexity of elementary geometric problems.
Geometric problems are often abstract formulations of practical problems (similar to graph theory).
Some geometric problems and their interpretation:
Given a set of points in the plane, find all the points within a rectangle.
□ "Clipping" in VR
□ Find tuples in a database with values within given bounds for attributes A1 and A2
□ Generalization to searching in k dimensions (all points contained in a k-dimensional box)

4
Geometric Problems (2)
Given a set of rectangles in the plane, find all pairwise intersecting rectangles.
□ Correctness checks in Very Large Scale Integration (VLSI) design, with chip layers modelled as rectangles
Given a set of 3-dimensional objects (compounds), find all pairwise intersecting objects.
□ Ensuring the required minimum distance (safety margin) in CAD
Given a set of rectangles in the plane, find their area of intersection.
□ Geographic Information Systems (GIS): approximation of generic shapes by rectangles, determining areas with specific properties on distinct maps (e.g. find regions which are sandy (map 1), wet (map 2), and between 200 and 300 m altitude (elevation map))

5
Geometric Problems (3)
Given a set of polyhedra in space, determine the edges or portions of edges that are visible or hidden from a viewpoint.
□ Computation of a realistic view of a 3-dimensional scene
□ Determining the coverage area of a transmitter, i.e. the areas with no reception
Given a set of points in a k-dimensional space and a query point P, find the point S closest to P.
□ Speech recognition: a spoken word is characterized by features and compared with the vocabulary (a point set in a k-dimensional space).

6
Classification of Geometric Problems
2 classes of problems:
Set problems: compute a property of interest of a set of objects S.
□ E.g. the outline of the area covered by S
Search problems: given a set of objects S and a query object q, find all objects in S that have a specific relation with q.
Set problems are often reducible to search problems.
□ E.g. plane-sweep algorithms reduce a k-dimensional set problem to a (k-1)-dimensional search problem
Search problems are solved by organizing S with the aid of appropriate data structures and indexing.

7
First Problem
How do we efficiently represent this figure?

8
Representing Figures (1)
How about a matrix representation? Black = 1, empty = 0.
[Figure: the image written out as a matrix of 0s and 1s]
Not very effective.

9
Representing Figures (2)
Idea: represent areas, not points.
Now represent the areas using another structure.
Quadtrees do this.
[Figure: the same image with only the black (1) areas marked]

10
Overview of Quadtrees
Quadtree is a generic term.
Quadtree: a class of hierarchical data structures that are based on recursive decomposition of space.
Differentiation is possible based on:
□ Data type represented by the quadtree: point data, regions, curves, surfaces, and volumes
□ Principle of decomposition: regular vs. input-driven
□ Resolution: fixed vs. variable number of decomposition steps
Examples:
□ Region quadtree
□ Point quadtree
Literature: Samet, H.; "The Quadtree and Related Hierarchical Data Structures", ACM Comp. Surveys, Vol. 16, No. 2, June 1984 (available from ACM DL)

11
Region Quadtree
Successive subdivision of the image array into 4 equal-sized quadrants.
Basic idea:
Treat the figure as an image array, i.e. every pixel of the figure has a value of 1 and all other pixels have a value of 0.
The entire area (image array) is subdivided into 4 equal-sized quadrants (the array usually has 2^k x 2^k pixels).
Upon each division one has to check whether the image array of a quadrant is homogeneous (i.e. only 1s or only 0s):
□ homogeneous: no further subdivision
□ heterogeneous: further subdivisions until homogeneous (possibly down to single pixels)
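The subdivision rule above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function name `build_region_quadtree`, the tuple encoding of GREY nodes, and the quadrant order (NW, NE, SW, SE, with row 0 as the northern edge) are all assumptions made for the example.

```python
def build_region_quadtree(img, x=0, y=0, size=None):
    """Return "B" (BLACK leaf), "W" (WHITE leaf), or a 4-tuple of
    sub-quadrants (NW, NE, SW, SE) for a heterogeneous (GREY) block.
    Assumes img is a square list of rows whose side is a power of two."""
    if size is None:
        size = len(img)
    first = img[y][x]
    homogeneous = all(img[r][c] == first
                      for r in range(y, y + size)
                      for c in range(x, x + size))
    if homogeneous:
        return "B" if first else "W"
    half = size // 2
    return (build_region_quadtree(img, x, y, half),                 # NW
            build_region_quadtree(img, x + half, y, half),          # NE
            build_region_quadtree(img, x, y + half, half),          # SW
            build_region_quadtree(img, x + half, y + half, half))   # SE

# Hypothetical 4 x 4 image (1 = black pixel), row 0 at the top.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 1, 0, 0],
         [1, 1, 0, 0]]
tree = build_region_quadtree(image)
```

Only the south-west quadrant is heterogeneous here, so it is the only block that is subdivided down to single pixels.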

12
Region Quadtree: Terminology
[Figure: a block divided into the quadrants NW, NE, SW, SE, with the directions N, E, S, W marked on its sides]

13
Region Quadtree: Terminology
[Figure: a small image and its quadtree, with children ordered NW, NE, SW, SE and nodes labelled GREY, BLACK, and WHITE]
Leaf nodes are said to be either BLACK or WHITE.
Non-leaf nodes are said to be GREY.

14
Region Quadtree: Example
Step 1
[Figure: the image array and its first subdivision into four quadrants]

15
Region Quadtree: Example
Step 2
[Figure: the heterogeneous quadrants subdivided once more]

16
Region Quadtree: Example
Step 3
[Figure: subdivision continued until every block is homogeneous]

17
Region Quadtree: Set Operations
Quadtrees are especially useful for performing set operations:
□ Overlap (intersection)
□ Overlay (union)
Example: from data provided on forests, grassland, fields, nature reserves, and polders, identify which areas are in agricultural use (a typical overlay problem).

18
Overlays with Quadtrees: Example

19
Overlays with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:
□ If the node in QT1 is BLACK, the corresponding node in the resulting quadtree is also BLACK.
□ If the node in QT1 is WHITE, the node in the resulting quadtree is set to the node in QT2.
□ If the node in QT1 is GREY, the node in the resulting quadtree is set to GREY if QT2 is GREY, GREY (the QT1 subtree) if QT2 is WHITE, and BLACK if QT2 is BLACK.
If both nodes are GREY, the algorithm processes the next level and then returns to consolidate the result if necessary.

20
Overlays with Quadtrees: Algorithm (2)
Decision table (rows: node in QT1, columns: node in QT2):
QT1 \ QT2 | BLACK | WHITE | GREY
BLACK | BLACK | BLACK | BLACK
WHITE | BLACK | WHITE | GREY
GREY | BLACK | GREY | GREY 1)
1) A check for a merger needs to be performed to determine whether all 4 sons are BLACK.
[Figure: example]
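The overlay rules can be written as a short recursive function. The sketch below is only an illustration: it assumes a hypothetical encoding in which a leaf is the string "B" or "W" and a GREY node is a 4-tuple of children (NW, NE, SW, SE), and the name `overlay` is not taken from the lecture.

```python
BLACK, WHITE = "B", "W"

def overlay(a, b):
    """Union of two region quadtrees covering the same area."""
    if a == BLACK or b == BLACK:
        return BLACK                 # BLACK dominates the union
    if a == WHITE:
        return b                     # result is whatever QT2 holds
    if b == WHITE:
        return a                     # result is the QT1 subtree
    # Both nodes are GREY: process the next level first ...
    kids = tuple(overlay(x, y) for x, y in zip(a, b))
    # ... then consolidate: four BLACK sons merge into one BLACK leaf.
    return BLACK if all(k == BLACK for k in kids) else kids

qt1 = (BLACK, WHITE, WHITE, WHITE)
qt2 = (WHITE, BLACK, BLACK, BLACK)
merged = overlay(qt1, qt2)
```

In this example every quadrant is BLACK in one of the two inputs, so the merger check collapses the result into a single BLACK leaf.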

21
Intersection with Quadtrees (Example)

22
Intersection with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:
□ If the node in QT1 is BLACK and the node in QT2 is BLACK, the corresponding node in the resulting quadtree is set to BLACK.
□ If the node in QT1 or QT2 is WHITE, the resulting node is WHITE.
□ If the node in QT1 is GREY, the resulting node is set to GREY if QT2 is also GREY, WHITE if QT2 is WHITE, and GREY (the QT1 subtree) if QT2 is BLACK.
If both nodes are GREY, the algorithm processes the next level and then returns to consolidate the result if necessary.

23
Intersection with Quadtrees: Algorithm (2)
Decision table (rows: node in QT1, columns: node in QT2):
QT1 \ QT2 | WHITE | BLACK | GREY
WHITE | WHITE | WHITE | WHITE
BLACK | WHITE | BLACK | GREY
GREY | WHITE | GREY | GREY 1)
1) A check for a merger needs to be performed to determine whether all 4 sons are WHITE.
[Figure: example]
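Intersection mirrors the overlay with the roles of BLACK and WHITE exchanged. The following sketch again assumes a hypothetical encoding (leaf = "B" or "W", GREY node = 4-tuple of children in the order NW, NE, SW, SE); the function name `intersect` is chosen for the example only.

```python
BLACK, WHITE = "B", "W"

def intersect(a, b):
    """Intersection of two region quadtrees covering the same area."""
    if a == WHITE or b == WHITE:
        return WHITE                 # WHITE dominates the intersection
    if a == BLACK:
        return b                     # result is whatever QT2 holds
    if b == BLACK:
        return a                     # result is the QT1 subtree
    # Both nodes are GREY: process the next level first ...
    kids = tuple(intersect(x, y) for x, y in zip(a, b))
    # ... then consolidate: four WHITE sons merge into one WHITE leaf.
    return WHITE if all(k == WHITE for k in kids) else kids

qt1 = (BLACK, WHITE, WHITE, WHITE)
qt2 = (WHITE, BLACK, BLACK, BLACK)
common = intersect(qt1, qt2)
```

The two example trees have no black area in common, so the merger check reduces the result to a single WHITE leaf.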

24
Complexity Analysis
Complexity is proportional to the number of nodes in the quadtree.
□ Best case: the whole area has a single colour (1 node)
□ Worst case: "salt and pepper", i.e. all inner nodes are GREY and one needs to go down to pixel level (depends on the resolution)

25
Point-Quadtree: Definition
Point data: 2-D points can be stored and indexed in a point-quadtree.
A point-quadtree splits the space into 4 quadrants at the insertion point.
The insertion order is thus important (it determines the structure of the tree).

26
Point-Quadtree (Example)
Space: the square from (0,0) to (100,100).
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami

27
Point-Quadtree (Example)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
[Figure: the partition of the square from (0,0) to (100,100) and the resulting point-quadtree, rooted at Chicago]

28
Point-Quadtree (Search Example)
"Find all points (records) within a given distance from another point (record)"
Find all the cities at most 8 units from the point (83,10).
[Figure: the point-quadtree and the partition of the square from (0,0) to (100,100), with the query point marked]

29
Point-Quadtree (Search Example)
Find all the cities at most 8 units from the point (83,10).
The root is Chicago (35,40): NW, NE, and SW can be ignored.
Next is Mobile (50,10): NW and SW can be ignored.
Are Atlanta or Miami within 8? Atlanta (85,15) is at distance √29 ≈ 5.4 and qualifies; Miami (90,5) is at distance √74 ≈ 8.6 and does not, although it lies inside the bounding box of the query circle.
Solutions based on approximations with rectangles (bounding box) can therefore contain false reports.
The exact solution tests against the circle.
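The search above can be reproduced with a small sketch. The class and function names (`PQNode`, `pq_insert`, `pq_within`) are hypothetical, and so is the tie rule that points on a dividing line go to the north or east quadrant; the slides do not state one. Quadrants are pruned with the bounding box of the query circle, and the exact circle test is applied to each visited point.

```python
import math

class PQNode:
    def __init__(self, name, x, y):
        self.name, self.x, self.y = name, x, y
        self.kids = {}                      # "NW"/"NE"/"SW"/"SE" -> PQNode

def pq_insert(root, name, x, y):
    """Insert a point; the new node splits its quadrant at (x, y)."""
    if root is None:
        return PQNode(name, x, y)
    node = root
    while True:
        q = ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")
        if q not in node.kids:
            node.kids[q] = PQNode(name, x, y)
            return root
        node = node.kids[q]

def pq_within(root, cx, cy, r):
    """Return (hits, visited): names within distance r of (cx, cy), and
    every node that had to be inspected."""
    hits, visited = [], []
    def visit(node):
        visited.append(node.name)
        if math.hypot(node.x - cx, node.y - cy) <= r:   # exact circle test
            hits.append(node.name)
        for q, child in node.kids.items():
            # Skip quadrants that the bounding box of the circle cannot reach.
            if q[0] == "N" and cy + r < node.y: continue
            if q[0] == "S" and cy - r >= node.y: continue
            if q[1] == "E" and cx + r < node.x: continue
            if q[1] == "W" and cx - r >= node.x: continue
            visit(child)
    if root is not None:
        visit(root)
    return hits, visited

cities = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]
root = None
for name, x, y in cities:
    root = pq_insert(root, name, x, y)
hits, visited = pq_within(root, 83, 10, 8)
```

Only four of the eight nodes are inspected. Miami is visited because it falls inside the bounding box, but the circle test rejects it.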

30
Search in Point-Quadtrees
Especially suitable for search problems of the following type: "find all points (records) within a given distance from another point (record)".
Point quadtrees are quite efficient for 2 dimensions.
In k > 2 dimensions, however, point quadtrees have a large branching factor (2^k) and thus contain many NULL pointers.
[Figure: the point-quadtree of the cities example]

31
K-d Trees
k-dimensional point data.
We want to avoid the large fan-out of the point quadtree:
□ Quadtrees (2^2 = 4-way split)
□ Octrees (2^3 = 8-way split)
□ In general: 2^k-way split
A k-d tree is a binary search tree with the distinction that at each level a different coordinate (dimension) is tested to determine the direction of the branch.
2-way split.
A node consists of:
□ 2 child pointers
□ Name
□ Key

32
K-d Tree: Basic Idea
Construct a binary tree.
At each step, choose one of the coordinates as the basis for dividing the rest of the points.
For example, at the root, choose x as the basis:
□ As in binary search trees, all items to the left of the root have an x-coordinate less than that of the root
□ All items to the right of the root have an x-coordinate greater than (or equal to) that of the root
Choose y as the basis for discrimination for the root's children.
Choose x again for the root's grandchildren.
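As a rough sketch of this idea, the insertion routine below cycles through the coordinates by depth (x at the root, y for its children, and so on). The names `KDNode` and `kd_insert` are illustrative assumptions; the city coordinates and insertion order are the ones used in the lecture's examples.

```python
class KDNode:
    def __init__(self, name, point):
        self.name, self.point = name, point
        self.left = None                    # coordinate < node's coordinate
        self.right = None                   # coordinate >= node's coordinate

def kd_insert(node, name, point, depth=0):
    """Insert a point; the discriminator alternates with the depth."""
    if node is None:
        return KDNode(name, point)
    axis = depth % len(point)               # 0 = x, 1 = y, then x again
    if point[axis] < node.point[axis]:
        node.left = kd_insert(node.left, name, point, depth + 1)
    else:
        node.right = kd_insert(node.right, name, point, depth + 1)
    return node

cities = [("Chicago", (35, 40)), ("Mobile", (50, 10)), ("Toronto", (60, 75)),
          ("Buffalo", (80, 65)), ("Denver", (5, 45)), ("Omaha", (25, 35)),
          ("Atlanta", (85, 15)), ("Miami", (90, 5))]
kd_root = None
for name, point in cities:
    kd_root = kd_insert(kd_root, name, point)
```

Every node has at most two child pointers, which is why this layout leaves fewer NULL pointers than a point quadtree with its four children per node.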

33
K-d Tree: Example
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
[Figure: the partition of the square from (0,0) to (100,100) and the resulting k-d tree rooted at Chicago, with the discriminator alternating between x and y from level to level; the left branch of the root holds x < x_Chicago, the right branch x ≥ x_Chicago]
Fewer NULL pointers!

34
Adaptive k-d Tree
Like the k-d tree, but:
□ Division is between (not on) data points.
□ Division is not by alternating the discriminator, but according to the dimension with the maximum spread (max - min).
Balanced k-d tree.
Internal nodes contain only the split coordinate and its value (e.g. X=30).
The records are stored at the terminal nodes (leaves).
Insertion of one record requires rebuilding the tree (static structure).
Deletion of one record is highly complex.
Search works as in the k-d tree.

35
Example adaptive k-d tree (k=2)
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
Split nodes, level by level: 55,x; 30,x; 40,y; 15,x; 25,y; 10,y; 70,x
Leaves: Denver (5,45), Omaha (25,35), Chicago (35,40), Mobile (50,10), Miami (90,5), Atlanta (85,15), Toronto (60,75), Buffalo (80,65)
[Figure: the partition of the square from (0,0) to (100,100) and the corresponding tree]
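The split values in this example can be reproduced with a short construction sketch. Two details are assumptions rather than statements from the slides: the split value is taken as the midpoint between the two middle points along the chosen dimension, and ties in spread are resolved in favour of the lower dimension. The function names are hypothetical.

```python
def build_adaptive_kd(points):
    """points: list of (name, coords). Leaves are the records themselves;
    internal nodes are dicts holding only the split dimension and value."""
    if len(points) == 1:
        return points[0]
    k = len(points[0][1])
    spreads = [max(p[1][d] for p in points) - min(p[1][d] for p in points)
               for d in range(k)]
    axis = spreads.index(max(spreads))          # dimension of maximum spread
    pts = sorted(points, key=lambda p: p[1][axis])
    mid = len(pts) // 2
    split = (pts[mid - 1][1][axis] + pts[mid][1][axis]) / 2   # between points
    return {"axis": axis, "split": split,
            "low": build_adaptive_kd(pts[:mid]),
            "high": build_adaptive_kd(pts[mid:])}

def splits_by_level(tree):
    """List (value, dimension) of the internal nodes in breadth-first order."""
    out, queue = [], [tree]
    while queue:
        node = queue.pop(0)
        if isinstance(node, dict):
            out.append((node["split"], "xy"[node["axis"]]))
            queue += [node["low"], node["high"]]
    return out

cities = [("Chicago", (35, 40)), ("Mobile", (50, 10)), ("Toronto", (60, 75)),
          ("Buffalo", (80, 65)), ("Denver", (5, 45)), ("Omaha", (25, 35)),
          ("Atlanta", (85, 15)), ("Miami", (90, 5))]
adaptive = build_adaptive_kd(cities)
levels = splits_by_level(adaptive)
```

Because the whole point set is needed to choose each split, inserting a single record means rebuilding, which is the static behaviour noted for this structure.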

36
Comparison
Region quadtree: parallelizable
Point quadtree: parallelizable, dynamic
K-d tree: not easily parallelizable, dynamic, better sequential data structure
Adaptive k-d tree: not easily parallelizable, static, balanced, optimized search

37
Curvilinear Data: Strip Tree (Example)
Basic idea: represent the curve by strips enclosing portions of it.
[Figure: a curve from P to Q enclosed by the root strip A and by the thinner strips B, C, D, E; Wl and Wr denote the strip widths on the two sides of the line joining the endpoints; the splitting points for A and C are marked]
The splitting point for A is selected on the left side, since Wl > Wr.
Strips become successively thinner.
The splitting finishes when all strips are thinner than a predefined value.

38
Strip Tree: Algorithm
Recursive splitting:
Join the endpoints of the curve (i.e. P and Q).
The root corresponds to a rectangle enclosing the curve whose sides are parallel to line PQ.
The next split point:
□ Lies on the curve and on one side of the strip rectangle
□ Has maximum distance to line PQ
Node structure:
The node is an 8-tuple and contains
□ 2 pairs of X,Y coordinates (the diagonal endpoints)
□ The strip width on each side of the line connecting the endpoints
□ Pointers to the 2 sons
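A minimal sketch of the recursive splitting follows, under several assumptions that are not in the slides: the curve is given as a polyline (a list of points), the two endpoints of every portion are distinct, and splitting stops once the total strip width Wl + Wr is at most a threshold `min_width`. The name `build_strip_tree` and the dict layout are illustrative; the eight stored values correspond to the 8-tuple described above.

```python
import math

def build_strip_tree(points, min_width):
    """Return a node with endpoints p and q, widths wl and wr on the left
    and right of line PQ, and two sons (None for a leaf)."""
    p, q = points[0], points[-1]
    length = math.hypot(q[0] - p[0], q[1] - p[1])   # assumes p != q
    # Signed distance of every curve point from line PQ (positive = left).
    dist = [((q[0] - p[0]) * (pt[1] - p[1]) - (q[1] - p[1]) * (pt[0] - p[0]))
            / length for pt in points]
    wl, wr = max(dist), -min(dist)
    node = {"p": p, "q": q, "wl": wl, "wr": wr, "left": None, "right": None}
    if wl + wr > min_width and len(points) > 2:
        # Split at the curve point farthest from line PQ; it is never an
        # endpoint, because both endpoints lie on the line itself.
        i = max(range(len(points)), key=lambda j: abs(dist[j]))
        node["left"] = build_strip_tree(points[:i + 1], min_width)
        node["right"] = build_strip_tree(points[i:], min_width)
    return node

curve = [(0, 0), (1, 1), (2, 0), (3, -2), (4, 0)]
strip_root = build_strip_tree(curve, min_width=0.5)
```

For this curve the root strip reaches 1 unit to the left and 2 units to the right of PQ, so the first split happens at (3,-2).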

39
Representation of Arbitrary Curves
Curves are well represented by chains; however, indexing them is difficult.
A strip tree is a quadtree variant for representing arbitrary curves by hierarchical decomposition.
Useful in applications that involve search and set operations.

40
Trees and Tries
We have seen (normal) trees for storing figures.
We can also use tries!
Tries store the key "along the way".

41
Kd-Tries: Example
[Figure: a kd-trie whose levels alternate between the X dimension (branches L and R) and the Y dimension (branches U and D)]
L: left, R: right, D: down, U: up
The key is stored along the path from the root, e.g. "RDRU".
The complete keys are located at the leaves.

42
Binary Tries
[Figure: a binary trie with branches labelled 0 and 1; the highlighted key is 100101]
A binary trie is a binary tree in which left sons correspond to a "0" at the corresponding position in the key and right sons correspond to a "1".

43
Geometric Interpretation of the Binary Trie
A trie encodes a 1-dimensional space with 2^d addresses as strings of d characters.
In the previous example: d = 3 + 3 = 6.
The root represents the complete space.
The left son (first character = 0) represents the lower half of the search space.
The right son (first character = 1) represents the upper half of the search space.

44
Binary Tries, Revisited
[Figure: the binary trie next to an 8 x 8 grid; columns are labelled with the binary x coordinate of the cell (X0X1X2 = 000 to 111), rows with the binary y coordinate of the cell (Y0Y1Y2 = 000 to 111); the highlighted key is 100101]
In 2D each key is a pair of bit sequences (x,y).
The path to the key is composed of bits that are taken from the x and y coordinates on a rotating basis.

45
Observations
A kd-trie splits by rotating between the x and y coordinates.
A kd-trie is unique for a given set of keys.
The trie structure does not depend on the insertion order.
Geometric kd-tries generate a total order of the search space.
Two points P1 and P2 in the kd-space will always have the same order.

46
Building a Linear Order
Given a 2D grid, how can we
(1) find a linear order for the cells of the grid such that cells close together in space are also (as far as possible) close to each other in the linear order, and
(2) define this order recursively for a grid that is obtained by a hierarchical subdivision of space?
The most popular solution is bit interleaving (Z-order).

47
Z-Order
Addresses in a 2-dimensional space are identified by pairs (x,y) of values.
Each x and y value is a sequence of d bits.
This results in a grid with 2^d x 2^d cells.
How do we build the addresses using bit interleaving?
Start with a vertical split for X0 (Z = X0).
[Figure: 8 x 8 grid with columns X0X1X2 = 000 to 111 and rows Y0Y1Y2 = 000 to 111; the left half is labelled 0, the right half 1]

48
Z-Order
Horizontal split for Y0 (Z = X0 Y0).
[Figure: the 8 x 8 grid divided into four quadrants labelled 00, 01, 10, 11]

49
Z-Order
Vertical split for X1 (Z = X0 Y0 X1).
[Figure: the 8 x 8 grid divided into eight blocks labelled 000 to 111]

50
Z-Order
Horizontal split for Y1 (Z = X0 Y0 X1 Y1).
[Figure: the 8 x 8 grid divided into sixteen blocks labelled 0000 to 1111]

51
Z-Order
Vertical split for X2 (Z = X0 Y0 X1 Y1 X2).
[Figure: the 8 x 8 grid divided into 32 blocks labelled 00000 to 11111]

52
Z-Order
Horizontal split for Y2 (Z = X0 Y0 X1 Y1 X2 Y2).
[Figure: the 8 x 8 grid with every cell labelled by its 6-bit z-value, 000000 to 111111]
The lowest z-value (z-low) and the highest z-value (z-hi) are located in the lower left and upper right corners.

53
Z-Order
If each possible z-value represents a cell in the grid, this yields the following space-filling curve:
[Figure: the 8 x 8 grid of 6-bit z-values with the Z-shaped curve connecting the cells in increasing z order]

54
Example: Point Data
[Figure: the 8 x 8 grid of 6-bit z-values with the data point A marked]
Data point: A = (3, 5) = (011, 101)
Bit interleaving: z = 011011
This gives a simple method for translating between x,y coordinates and z-values.
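The translation between coordinates and z-values can be sketched as follows. The function names `interleave` and `deinterleave` are assumptions for this example; the bit order X0 Y0 X1 Y1 ... (most significant bits first, x before y) follows the splits shown on the preceding slides.

```python
def interleave(x, y, d):
    """Build the z-value X0 Y0 X1 Y1 ... from two d-bit coordinates."""
    z = 0
    for i in range(d - 1, -1, -1):          # most significant bit first
        z = (z << 1) | ((x >> i) & 1)       # next x bit
        z = (z << 1) | ((y >> i) & 1)       # next y bit
    return z

def deinterleave(z, d):
    """Recover (x, y) from a 2d-bit z-value."""
    x = y = 0
    for i in range(d - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y

z_a = interleave(3, 5, 3)                   # data point A = (011, 101)
z_bits = format(z_a, "06b")                 # "011011"
```

Both directions cost one pass over the d bit positions, so no lookup table or tree is needed for the conversion.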

55
Example: Region Data
[Figure: the 8 x 8 grid of 6-bit z-values with regions marked by shorter z-values, i.e. prefixes]
The object with a z-value of 001 contains all elements with a prefix equal to 001.

56
Bit Interleaving: Recursive Definition
A vertical split differentiates values of X0.
A horizontal split differentiates values of Y0.
The address is given by the z-value (00, 01, 10, 11).
The z-value represents the path in the kd-trie.
We can use the z-values alone, so that we do not need the kd-trie anymore.
[Figure: a square split at X0 = 0 / X0 = 1 and Y0 = 0 / Y0 = 1 into the cells 00, 01, 10, 11, with one cell split again into 1100, 1101, 1110, 1111, next to the corresponding kd-trie with branches L/R and U/D]

57
Explanation
Z-order encoding preserves the spatial proximity of points:
□ Homogeneous regions are represented compactly
□ The elements are clustered => efficient access to secondary storage
Z-order coded data can be stored in secondary storage using conventional prefix B+ trees:
□ Efficient "range queries" are possible
□ Direct access via the z-value

58
Geometric Data Structures for non-Geometric Data?
The application of geometric data structures to geometric problems is obvious:
□ Geographic Information Systems (GIS)
□ Computer graphics
A further application of geometric data structures: multidimensional databases
□ OLAP (Online Analytical Processing)
□ Data mining

59
Multidimensional Data Space
[Figure: a data cube with the dimensions Product (Coke, Fanta, Beer, Milk, Juice, Water), Day (1 to 7), and Region (West, East, South, North)]
Each cell corresponds to an observation point, described by the attributes of the individual cell.
Each cell contains an observation, e.g. the sales value of Product "Coke" on Day "4" in Region "East".

60
Multidimensional (MD) Data Space
Each observed fact w can be expressed as a function of the dimensions, which define the multidimensional data space:
w = f(x, y, z)
DOM(f) = DOM(x) x DOM(y) x DOM(z)
A fact w0 is the value of function f for the specific values (x0, y0, z0):
w0 = f(x0, y0, z0)

61
Sparseness in the MD Space
Typically, only a small fraction of the space defined by DOM(a) x ... x DOM(z) is actually used.
Addressing in the MD space (a multi-dimensional array) is easy and fast, but memory usage is inefficient.
We need mechanisms to compress the MD space:
□ Linearization of the data space by totally ordering the facts with the aid of space-filling curves
□ Extraction of all facts into a table, then joining this table with descriptive dimension tables

62
Linearization of the MD Space
Linearization with the aid of space-filling curves (e.g. Z-order or the Hilbert construction).
The principle is based on a coding that generates a total order of all points in the data space.
The indexing is done by conventional, order-preserving indexing methods (e.g. B+ trees).
The mechanism is well suited for 2 to 4 dimensions (x, y, z, t), for tracking applications, and for range queries.

63
Data-Mining
Until now: storage and search of data.
Evaluation and interpretation of results is done using data mining.
Typical problem: "Where in the supermarket should we put the beer that should be sold as early as possible (close to its expiry date, low sales volume, ...)?"

64
Data-Mining
Overview of basic techniques for data mining:
Data Mining
□ Forecast, Prediction: Classification, Numerical Prediction
□ Knowledge Discovery: Clustering, Association, Variance Detection

65
Prediction: Classification
Data entries are classified according to a certain property.
Columns: Purchased year, Lending total, Lending last year, To sort out (the two lending figures are fused in the source)
1994 | 15785 | yes
2000 | 3410203 | no
1982 | 2558310 | yes
...
A new data entry is assigned automatically:
1988 | 58939 | ?

66
Prediction: Numerical Prediction
Numerical prediction is similar to classification; however, a value is predicted instead of a class.
Most important application: weather forecast
Yesterday Temp. | Yesterday Pressure | Today Temp. | Today Pressure | Tomorrow Temp.
17.0 | 990 | 19.2 | 1001 | 20.5
10.8 | 1011 | 12.1 | 973 | 8.2
30.5 | 1000 | 30.4 | 994 | 29.9
... | ... | ... | ... | ...
14.2 | 980 | 17.0 | 991 | ?

67
Knowledge Discovery: Association
Tries to find common rules between the characteristics of data. Interesting relations are returned.
Example: from the previous weather data one could derive the following rules:
With a probability of 0.89:
IF "Air pressure today" > "Air pressure yesterday" AND "Temperature today" > 12° THEN "Temperature tomorrow" > "Temperature today"
With a probability of 0.75:
IF "Air pressure today" < "Air pressure yesterday" AND "Temperature today" > 15° THEN "Temperature tomorrow" < "Temperature today"

68
Knowledge Discovery: Variance Detection
Given a data pool, variance detection tries to distinguish normal data entries from "outlier" entries.
Example: a home security system with 100 sensors (temperature, light barrier, sound detector, ...) should detect intruders. Flying birds, shadows in the moonlight, or car headlights should not have any impact on the operation of the system.
The system gets a database describing "safe" configurations (where no alarm has to be triggered).
The system creates a model of the non-alarm cases. Data for real intrusions are not provided!
Using this model, updates from the sensors can be checked: if they do not fit the non-alarm cases, an alarm is triggered.

69
Knowledge Discovery: Clustering
Find similar data entries and group them into clusters.
Example: an exam. What percentage of exercises E1 to E5 was answered correctly?
Student | E1 | E2 | E3 | E4 | E5
S1 | 20 | 84 | 11 | 17 | 74
S2 | 62 | 41 | 57 | 81 | 19
S3 | 79 | 33 | 60 | 68 | 30
S4 | 19 | 93 | 25 | 23 | 87
S5 | 28 | 89 | 0 | 26 | 79
Ø | 41.6 | 68 | 30.6 | 43 | 57.8
Clustering may divide the students taking the exam into 2 groups:
G1 = {S1, S4, S5}: good at exercises E2 and E5
G2 = {S2, S3}: good at exercises E1, E3 and E4
Possibility of individual support!

70
k-means Clustering: Example

71
k-means Clustering: Algorithm
1. Fix the number of desired clusters: parameter k.
2. Place k random points into the space: the initial group centroids.
3. For all m data objects, determine the Euclidean distance of the object (as a vector) from all centroids and assign the object to the closest centroid.
4. For all k centroids, determine the real center (average) of the assigned cluster. These are the new centroids.
5. Repeat steps 3 and 4 until the centroids no longer move (old and new ones are so close to each other that no noticeable improvement remains).
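The steps above can be sketched as follows, applied to the exam data from the clustering example. To keep the run reproducible, this sketch deviates from step 2: the initial centroids are passed in explicitly (here the rows of S1 and S2) instead of being placed at random. The function name `kmeans`, the `max_rounds` cap, and the rule that an empty cluster keeps its old centroid are assumptions as well.

```python
import math

def kmeans(points, centroids, max_rounds=100):
    """Steps 3 to 5 of k-means; k is the number of initial centroids."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(max_rounds):
        # Step 3: assign every object to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: the new centroid is the average of its cluster.
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
               for cl, c in zip(clusters, centroids)]
        # Step 5: stop once the centroids no longer move.
        if new == centroids:
            break
        centroids = new
    return centroids, clusters

exam = {"S1": (20, 84, 11, 17, 74), "S2": (62, 41, 57, 81, 19),
        "S3": (79, 33, 60, 68, 30), "S4": (19, 93, 25, 23, 87),
        "S5": (28, 89, 0, 26, 79)}
centroids, clusters = kmeans(list(exam.values()), [exam["S1"], exam["S2"]])
groups = [sorted(name for name, row in exam.items() if row in cl)
          for cl in clusters]
```

With this start the run settles after the first reassignment and reproduces the two groups named in the clustering example; other starting centroids can lead to a different local optimum.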

72
k-means Algorithm: Properties
Finds a local optimum, but does not necessarily find the best configuration (global optimum).
Is a heuristic.
Is highly sensitive to the initial, randomly selected cluster centers.
Optimizations:
□ Randomly modify the results between different rounds
□ The k-means algorithm can be run multiple times
Operates with linear optimization.
Highly stable and frequently used approach.
Also works for very large data sets with a controllable complexity.
Literature: Ian H. Witten, Eibe Frank; "Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations", Academic Press, San Diego, CA; 2000; ISBN 1-55860-552-5
