Published by Mariah Lawson. Modified about 1 year ago.

1
Geometric Algorithms and Data Structures
Prof. Neeraj Suri, Andreas Johansson, Constantin Sarbu, Abdelmajid Khelil
Dependable Embedded Systems & SW Group
© Neeraj Suri, EU-NSF ICT, March 2006

2
ICS-II Lecture 14: Geometric Algorithms and Data Structures
Outline
Introduction
Geometric Data Structures
Quadtree
□ Region quadtree
□ Point quadtree
K-d tree
Strip tree
K-d trie
Binary trie
Multidimensional Data
Z-Order
Multidimensional data
Data mining

3
Geometric Problems (1)
Algorithmic geometry: the study of the algorithmic complexity of elementary geometric problems.
Geometric problems are often abstract formulations of practical problems (similar to graph theory).
Some geometric problems and their interpretation:
Given a set of points in the plane, find all the points within a rectangle.
□ "Clipping" in VR
□ Find tuples in a database with values within given bounds for attributes A1 and A2
□ Generalization: searching in k dimensions (find all points contained in a k-dimensional box)

4
Geometric Problems (2)
Given a set of rectangles in the plane, find all pairwise intersecting rectangles.
□ Correctness checks in Very Large Scale Integration (VLSI) design, with chip layers modeled as rectangles
Given a set of 3-dimensional objects (compounds), find all pairwise intersecting objects.
□ Ensuring the required clearance or safety margin in CAD
Given a set of rectangles in the plane, find their intersection area.
□ Geographic Information Systems (GIS): approximation of generic shapes by rectangles, determining areas with specific properties on distinct maps (e.g. find regions which are sandy (map 1), wet (map 2), and between 200 and 300 m altitude (elevation map))

5
Geometric Problems (3)
Given a set of polyhedra in space, determine the edges or portions of edges that are visible or hidden from a viewpoint.
□ Computation of a realistic view of a 3-dimensional scene
□ Determining the coverage area of a transmitter and the area with no reception
Given a set of points in a k-dimensional space and a query point P, find the point S closest to P.
□ Voice recognition: a spoken word is characterized by features and compared with the vocabulary (a point set in a k-dimensional space)

6
Classification of Geometric Problems
2 classes of problems:
Set problems: compute a property of interest of a set of objects S.
□ E.g. the outline of the area covered by S
Search problems: given a set of objects S and a query object q, find all objects in S that have a specific relation with q.
Set problems are often reducible to search problems.
□ E.g. plane-sweep algorithms reduce a k-dimensional set problem to a (k-1)-dimensional search problem
Search problems are solved by organizing S with the aid of appropriate data structures and indexing.

7
First Problem
How do we efficiently represent this figure?

8
Representing Figures (1)
How about a matrix representation?
Black = 1, empty = 0
Not very effective.

9
Representing Figures (2)
Idea: represent areas, not points.
Now represent the areas using another structure.
Quadtrees do this.

10
Overview of Quadtrees
"Quadtree" is a generic term: a class of hierarchical data structures that are based on recursive decomposition of space.
Differentiation is possible based on:
□ Data type represented by the quadtree: point data, regions, curves, surfaces, and volumes
□ Principle of decomposition: regular vs. input-driven
□ Resolution: fixed vs. variable number of decomposition steps
Examples: region quadtree, point quadtree
Literature: Samet, H.: "The Quadtree and Related Hierarchical Data Structures", ACM Comp. Surveys, Vol. 16, No. 2, June 1984 (available from ACM DL)

11
Region Quadtree
Successive subdivision of the image array into 4 equal-sized quadrants.
Basic idea:
The figure is given as an image array: every pixel of the figure has the value 1, all other pixels have the value 0.
The entire area (image array) is subdivided into 4 equal-sized quadrants (the array is usually $2^k \times 2^k$ pixels).
After each division, check whether the image array of each quadrant is homogeneous (only 1s or only 0s):
□ homogeneous: no further subdivision
□ heterogeneous: further subdivisions until homogeneous (possibly down to single pixels)
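The subdivision rule above can be sketched in a few lines of Python. This is only an illustration under assumptions that the slides do not state: the image is a square list of lists with side length $2^k$, row 0 is the northern edge, a leaf is the string BLACK or WHITE, and a GREY node is a list of its four sons in the order NW, NE, SW, SE. The names `build` and `count_nodes` are hypothetical.

```python
from typing import List, Optional, Union

BLACK, WHITE = "BLACK", "WHITE"

# A leaf is the string BLACK or WHITE; a GREY node is a list of four
# sons in the (assumed) order [NW, NE, SW, SE].
Node = Union[str, list]


def build(image: List[List[int]], row: int = 0, col: int = 0,
          size: Optional[int] = None) -> Node:
    """Recursively decompose a 2^k x 2^k binary image into a region quadtree."""
    if size is None:
        size = len(image)
    first = image[row][col]
    homogeneous = all(
        image[r][c] == first
        for r in range(row, row + size)
        for c in range(col, col + size)
    )
    if homogeneous:
        # Only 1s or only 0s: no further subdivision.
        return BLACK if first == 1 else WHITE
    half = size // 2
    return [
        build(image, row, col, half),                # NW
        build(image, row, col + half, half),         # NE
        build(image, row + half, col, half),         # SW
        build(image, row + half, col + half, half),  # SE
    ]


def count_nodes(node: Node) -> int:
    """Number of nodes (GREY nodes plus leaves) in the quadtree."""
    if isinstance(node, str):
        return 1
    return 1 + sum(count_nodes(son) for son in node)


image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
]
tree = build(image)
print(tree)
print(count_nodes(tree))
```

Only the SW quadrant of this 4 x 4 image is heterogeneous, so it is the only one that gets subdivided down to single pixels.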

12
Region Quadtree: Terminology
Quadrants: NW, NE, SW, SE
Directions: N, E, S, W

13
Region Quadtree: Terminology
The four sons of a node are labeled NW, NE, SW and SE.
Leaf nodes are said to be either BLACK or WHITE.
Non-leaf nodes are said to be GREY.

14
Region Quadtree: Example, Step 1

15
Region Quadtree: Example, Step 2

16
Region Quadtree: Example, Step 3

17
Region Quadtree: Set Operations
Quadtrees are especially useful for performing set operations:
□ Overlap (intersection)
□ Overlay (union)
Example: from data provided on forests, grassland, fields, nature reserves and polders, identify which areas are in agricultural use (a typical overlay problem).

18
Overlays with Quadtrees: Example

19
Overlays with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:
□ If the node in QT1 is BLACK, the corresponding node in the resulting quadtree is also BLACK.
□ If the node in QT1 is WHITE, the node in the resulting quadtree is set to the node in QT2.
□ If the node in QT1 is GREY, the node in the resulting quadtree is set to GREY if the QT2 node is GREY, GREY if the QT2 node is WHITE, and BLACK if the QT2 node is BLACK.
If both nodes are GREY, the algorithm comes back to this node after processing the next level and consolidates (merges) the sons if necessary.

20
Overlays with Quadtrees: Algorithm (2)
Decision table (rows: node in QT1, columns: node in QT2):
QT1 \ QT2   BLACK   WHITE   GREY
BLACK       BLACK   BLACK   BLACK
WHITE       BLACK   WHITE   GREY
GREY        BLACK   GREY    GREY 1)
1) A merge check needs to be performed to determine whether all 4 sons are BLACK.
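The overlay rules can be written as a short recursive function. The sketch below assumes a representation that is not from the slides: a leaf is the string BLACK or WHITE, and a GREY node is a list of four sons in a fixed quadrant order. The function name `union` is hypothetical, and subtrees taken over from an input tree are shared rather than copied.

```python
BLACK, WHITE = "BLACK", "WHITE"


def union(a, b):
    """Overlay (union) of two region quadtrees that cover the same area."""
    if a == BLACK or b == BLACK:
        return BLACK               # BLACK dominates everything
    if a == WHITE:
        return b                   # WHITE adds nothing: take the QT2 node
    if b == WHITE:
        return a                   # GREY over WHITE: take the QT1 subtree
    # Both nodes are GREY: process the next level first ...
    sons = [union(x, y) for x, y in zip(a, b)]
    # ... then consolidate if all 4 sons turned out BLACK.
    if all(son == BLACK for son in sons):
        return BLACK
    return sons


qt1 = [BLACK, WHITE, [WHITE, BLACK, WHITE, BLACK], WHITE]
qt2 = [WHITE, WHITE, [BLACK, WHITE, BLACK, WHITE], [BLACK, WHITE, WHITE, WHITE]]
result = union(qt1, qt2)
print(result)
```

In this example the third quadrant is GREY in both inputs, but its four sons complement each other, so the merge check collapses it into a single BLACK leaf.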

21
Intersection with Quadtrees (Example)

22
Intersection with Quadtrees: Algorithm (1)
Traverse quadtree QT1 top-down, beginning with the root, and compare each node with the corresponding node in quadtree QT2:
□ If the node in QT1 is BLACK and the node in QT2 is BLACK, the corresponding node in the resulting quadtree is set to BLACK.
□ If the node in QT1 or in QT2 is WHITE, the resulting node is WHITE.
□ If the node in QT1 is GREY, the resulting node is set to GREY if the QT2 node is also GREY, WHITE if the QT2 node is WHITE, and GREY if the QT2 node is BLACK.
If both nodes are GREY, the algorithm comes back to this node after processing the next level and consolidates (merges) the sons if necessary.

23
Intersection with Quadtrees: Algorithm (2)
Decision table (rows: node in QT1, columns: node in QT2):
QT1 \ QT2   WHITE   BLACK   GREY
WHITE       WHITE   WHITE   WHITE
BLACK       WHITE   BLACK   GREY
GREY        WHITE   GREY    GREY 1)
1) A merge check needs to be performed to determine whether all 4 sons are WHITE.
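The intersection rules mirror the overlay rules with the roles of BLACK and WHITE exchanged. The following sketch uses an assumed representation that is not from the slides: leaves are the strings BLACK and WHITE, GREY nodes are lists of four sons in a fixed quadrant order. The function name `intersection` is hypothetical.

```python
BLACK, WHITE = "BLACK", "WHITE"


def intersection(a, b):
    """Intersection of two region quadtrees that cover the same area."""
    if a == WHITE or b == WHITE:
        return WHITE               # WHITE dominates everything
    if a == BLACK:
        return b                   # BLACK restricts nothing: take the QT2 node
    if b == BLACK:
        return a                   # GREY with BLACK: take the QT1 subtree
    # Both nodes are GREY: process the next level first ...
    sons = [intersection(x, y) for x, y in zip(a, b)]
    # ... then consolidate if all 4 sons turned out WHITE.
    if all(son == WHITE for son in sons):
        return WHITE
    return sons


qt1 = [BLACK, [BLACK, WHITE, BLACK, WHITE], WHITE, BLACK]
qt2 = [[WHITE, BLACK, BLACK, WHITE], [WHITE, BLACK, WHITE, BLACK], BLACK, BLACK]
result = intersection(qt1, qt2)
print(result)
```

Here the second quadrant is GREY in both trees but the black parts never coincide, so the merge check turns it into a single WHITE leaf.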

24
Complexity Analysis
Complexity is proportional to the number of nodes in the quadtree.
□ Best case: the whole area has a single color (1 node)
□ Worst case: "salt and pepper", i.e. all inner nodes are GREY and the decomposition has to go down to pixel level (depends on the resolution)

25
Point-Quadtree: Definition
Point data: 2-D points can be stored and indexed in a point-quadtree.
A point-quadtree splits the space into 4 quadrants at the insertion point.
The insertion order is thus important (it determines the structure of the tree).

26
Point-Quadtree (Example)
Space: from (0,0) to (100,100)
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami

27
Point-Quadtree (Example)
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
Resulting tree: Chicago is the root; its sons are Denver (NW), Toronto (NE), Omaha (SW) and Mobile (SE); Buffalo lies below Toronto; Atlanta and Miami lie below Mobile.

28
Point-Quadtree (Search Example)
Query type: "find all points (records) within a given distance from another point (record)"
Find all the cities at most 8 units from the point (83,10).

29
Point-Quadtree (Search Example)
Find all the cities at most 8 units from the point (83,10).
The root is Chicago (35,40): NW, NE and SW can be ignored.
Next is Mobile (50,10): NW and SW can be ignored.
Are Atlanta or Miami within 8 units?
Solutions based on approximating the search area with a rectangle (bounding box) can contain false reports.
The exact solution uses a circle.
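The insertion and the search example can be reproduced with a small sketch. Everything beyond the city data and the query is an assumption made for illustration: the class `PQNode`, the functions `insert` and `within`, and the tie rule that points on a split line go to the east or north quadrant. Subtrees are pruned with the bounding box of the query circle, and the exact circle test is applied to every visited node.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PQNode:
    name: str
    x: float
    y: float
    sons: Dict[str, "PQNode"] = field(default_factory=dict)  # keys: NW, NE, SW, SE


def quadrant(node: PQNode, x: float, y: float) -> str:
    """Quadrant of (x, y) relative to node (ties go north / east: an assumption)."""
    return ("N" if y >= node.y else "S") + ("E" if x >= node.x else "W")


def insert(root: Optional[PQNode], name: str, x: float, y: float) -> PQNode:
    """Insert a point; the new node becomes a leaf in the quadrant it falls into."""
    if root is None:
        return PQNode(name, x, y)
    node = root
    while True:
        q = quadrant(node, x, y)
        if q not in node.sons:
            node.sons[q] = PQNode(name, x, y)
            return root
        node = node.sons[q]


def box_overlaps(node: PQNode, q: str, cx: float, cy: float, r: float) -> bool:
    """Does quadrant q of node overlap the bounding box of the query circle?"""
    vertical = (cy + r >= node.y) if q[0] == "N" else (cy - r < node.y)
    horizontal = (cx + r >= node.x) if q[1] == "E" else (cx - r < node.x)
    return vertical and horizontal


def within(root: Optional[PQNode], cx: float, cy: float, r: float) -> List[str]:
    """Names of all points at most r away from (cx, cy)."""
    hits, stack = [], [root] if root else []
    while stack:
        node = stack.pop()
        if math.hypot(node.x - cx, node.y - cy) <= r:   # exact circle test
            hits.append(node.name)
        for q, son in node.sons.items():
            if box_overlaps(node, q, cx, cy, r):        # prune other quadrants
                stack.append(son)
    return sorted(hits)


cities = [("Chicago", 35, 40), ("Mobile", 50, 10), ("Toronto", 60, 75),
          ("Buffalo", 80, 65), ("Denver", 5, 45), ("Omaha", 25, 35),
          ("Atlanta", 85, 15), ("Miami", 90, 5)]
root = None
for name, x, y in cities:
    root = insert(root, name, x, y)

print(within(root, 83, 10, 8))
```

Miami lies inside the bounding box of the query (x between 75 and 91, y between 2 and 18) but about 8.6 units from (83,10), so a pure bounding-box test would report it falsely; Atlanta is about 5.4 units away and is the only hit.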

30
Search in Point-Quadtrees
Especially suitable for search problems of the following type: "find all points (records) within a given distance from another point (record)".
Point quadtrees are quite efficient for 2 dimensions.
In k > 2 dimensions, however, point quadtrees have a large branching factor and thus contain many NULL pointers.

31
K-d Trees
k-dimensional point data.
We want to avoid the large fan-out of point quadtrees:
□ Quadtrees ($2^2 = 4$-way split)
□ Octrees ($2^3 = 8$-way split)
□ In general: $2^k$-way split
A k-d tree is a binary search tree with the distinction that at each level a different coordinate (dimension) is tested to determine the direction of the branch (2-way split).
A node consists of:
□ 2 child pointers
□ Name
□ Key

32
K-d Tree: Basic Idea
Construct a binary tree.
At each step, choose one of the coordinates as the basis for dividing the rest of the points.
For example, at the root, choose x as the basis:
□ As in binary search trees, all items to the left of the root have an x-coordinate less than that of the root
□ All items to the right of the root have an x-coordinate greater than (or equal to) that of the root
Choose y as the basis for discrimination for the root's children.
Choose x again for the root's grandchildren.

33
K-d Tree: Example
Insertion order: Chicago, Mobile, Toronto, Buffalo, Denver, Omaha, Atlanta, Miami
Points: Chicago (35,40), Denver (5,45), Omaha (25,35), Mobile (50,10), Miami (90,5), Atlanta (85,15), Buffalo (80,65), Toronto (60,75)
The discriminator alternates between x and y from level to level, starting with x at the root (Chicago): points with x < x_Chicago go left, points with x ≥ x_Chicago go right.
Resulting tree: Chicago (x) is the root; Denver (left) and Mobile (right) are on the y level; Omaha lies below Denver, Miami and Toronto lie below Mobile; Buffalo lies below Toronto and Atlanta below Buffalo.
Fewer NULL pointers!
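A minimal insertion routine is enough to reproduce this example. The class `KDNode` and the function `kd_insert` are hypothetical names; the rule "less than goes left, greater than or equal goes right" follows the basic idea stated in the lecture, and the discriminator is taken as the tree depth modulo k.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class KDNode:
    name: str
    point: Tuple[float, ...]
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def kd_insert(root: Optional[KDNode], name: str, point: Tuple[float, ...]) -> KDNode:
    """Insert a point into a k-d tree, alternating the discriminator per level."""
    if root is None:
        return KDNode(name, point)
    k = len(point)
    node, depth = root, 0
    while True:
        axis = depth % k                     # x at the root, then y, then x, ...
        side = "left" if point[axis] < node.point[axis] else "right"
        son = getattr(node, side)
        if son is None:
            setattr(node, side, KDNode(name, point))
            return root
        node, depth = son, depth + 1


cities = [("Chicago", (35, 40)), ("Mobile", (50, 10)), ("Toronto", (60, 75)),
          ("Buffalo", (80, 65)), ("Denver", (5, 45)), ("Omaha", (25, 35)),
          ("Atlanta", (85, 15)), ("Miami", (90, 5))]
root = None
for name, point in cities:
    root = kd_insert(root, name, point)

print(root.left.name, root.right.name)
```

With 8 nodes and 2 pointers each, this tree has 16 pointers of which 7 are used; the point quadtree for the same data has 32 pointers of which 7 are used, which is the saving the slide refers to.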

34
Adaptive k-d Tree
Like the k-d tree, but:
□ Division is between (not on) data points
□ Division not by alternating the discriminator, but according to the dimension with the maximum spread (max - min)
Balanced k-d tree.
Internal nodes contain only the split coordinate and its value (e.g. X=30).
The records are stored at the terminal nodes (leaves).
Insertion of one record requires rebuilding the tree (static structure).
Deletion of one record is highly complex.
Search works as in the k-d tree.

35
Example adaptive k-d tree (k=2)
Space: from (0,0) to (100,100)
Internal nodes (split value, dimension), level by level: 55,x; then 30,x and 40,y; then 15,x, 25,y, 10,y and 70,x
Leaves: Chicago (35,40), Mobile (50,10), Toronto (60,75), Buffalo (80,65), Denver (5,45), Omaha (25,35), Atlanta (85,15), Miami (90,5)
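The split values of this example can be reproduced with a short recursive build. The sketch makes two assumptions that the slides leave open: the point set is halved at the median, and the split value is the midpoint between the two middle points in the chosen dimension. The function name `build_adaptive` and the nested-tuple output format are hypothetical.

```python
def build_adaptive(points):
    """Build an adaptive k-d tree (k=2) from a dict name -> (x, y).

    Returns a leaf name, or a tuple (split value, dimension, low subtree, high subtree).
    """
    if len(points) == 1:
        return next(iter(points))            # records live in the leaves
    # Choose the dimension with the maximum spread (max - min).
    def spread(axis):
        values = [p[axis] for p in points.values()]
        return max(values) - min(values)
    axis = max(range(2), key=spread)
    ordered = sorted(points.items(), key=lambda item: item[1][axis])
    mid = len(ordered) // 2
    # Split between (not on) data points: midpoint of the two middle values
    # (an assumption; the lecture only says "between").
    split = (ordered[mid - 1][1][axis] + ordered[mid][1][axis]) / 2
    return (split, "xy"[axis],
            build_adaptive(dict(ordered[:mid])),
            build_adaptive(dict(ordered[mid:])))


cities = {"Chicago": (35, 40), "Mobile": (50, 10), "Toronto": (60, 75),
          "Buffalo": (80, 65), "Denver": (5, 45), "Omaha": (25, 35),
          "Atlanta": (85, 15), "Miami": (90, 5)}
tree = build_adaptive(cities)
print(tree)
```

Under these assumptions the build yields the splits 55,x at the root, 30,x and 40,y below it, and 15,x, 25,y, 10,y and 70,x on the last internal level.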

36
Comparison
Region quadtree: parallelizable
Point quadtree: parallelizable, dynamic
K-d tree: not easily parallelizable, dynamic, better sequential data structure
Adaptive k-d tree: not easily parallelizable, static, balanced, optimized search

37
Curvilinear Data: Strip Tree (Example)
Basic idea: represent the curve by strips enclosing portions of it.
A is the root strip for the curve from P to Q; B, C, D and E are the strips of the following levels.
The splitting point for A is selected on the left side, since W_l > W_r; the figure also marks the splitting point for C.
Strips become successively thinner.
The splitting finishes when all strips are thinner than a predefined value.

38
Strip Tree: Algorithm
Recursive splitting:
□ Join the endpoints of the curve (i.e. P and Q)
□ The root corresponds to a rectangle enclosing the curve whose sides are parallel to line PQ
□ The next split point lies on the curve and on one side of the strip rectangle, and has maximum distance to line PQ
Node structure: the node is an 8-tuple and contains
□ 2 pairs of X,Y coordinates (the diagonal endpoints)
□ The strip width on each side of the line connecting the endpoints
□ Pointers to the 2 sons
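For a curve given as a polyline, the recursive splitting can be sketched as follows. The assumptions here go beyond the slides: the curve is an open list of points whose endpoints differ (in every recursion step), the strip stores the endpoints P and Q rather than rectangle corners, and splitting stops when the total strip width is at most a caller-supplied `max_width`. The names `Strip`, `signed_distance` and `build_strip` are hypothetical.

```python
import math
from typing import List, NamedTuple, Optional, Tuple

Point = Tuple[float, float]


class Strip(NamedTuple):
    p: Point                      # first endpoint (2 numbers)
    q: Point                      # second endpoint (2 numbers)
    w_left: float                 # strip width to the left of line PQ
    w_right: float                # strip width to the right of line PQ
    left: Optional["Strip"]       # son covering the curve from P to the split point
    right: Optional["Strip"]      # son covering the curve from the split point to Q


def signed_distance(p: Point, q: Point, r: Point) -> float:
    """Distance of r from the directed line P->Q (positive means left of it)."""
    (px, py), (qx, qy), (rx, ry) = p, q, r
    length = math.hypot(qx - px, qy - py)     # assumes P != Q
    return ((qx - px) * (ry - py) - (qy - py) * (rx - px)) / length


def build_strip(curve: List[Point], max_width: float) -> Strip:
    """Recursively enclose a polyline in strips until they are thin enough."""
    p, q = curve[0], curve[-1]
    dists = [signed_distance(p, q, r) for r in curve]
    w_left = max(0.0, max(dists))
    w_right = max(0.0, -min(dists))
    if w_left + w_right <= max_width or len(curve) < 3:
        return Strip(p, q, w_left, w_right, None, None)
    # Split at the interior point with maximum distance to line PQ.
    split = max(range(1, len(curve) - 1), key=lambda i: abs(dists[i]))
    return Strip(p, q, w_left, w_right,
                 build_strip(curve[:split + 1], max_width),
                 build_strip(curve[split:], max_width))


curve = [(0, 0), (1, 2), (2, 0), (3, -1), (4, 0)]
root = build_strip(curve, max_width=0.5)
print(root.w_left, root.w_right)
```

The six fields of `Strip` hold the eight values of the node structure: four coordinates, two widths and two pointers. In the example the root strip is 2 wide on the left and 1 wide on the right, so the point (1, 2) on the wider side becomes the first splitting point.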

39
Representation of Arbitrary Curves
Curves are well represented by chains; however, indexing them is difficult.
A strip tree is a quadtree variant for representing arbitrary curves by hierarchical decomposition.
Useful in applications that involve search and set operations.

40
Trees and Tries
We have seen (normal) trees for storing figures.
We can also use tries!
Tries store the key "along the way".

41
Kd-Tries: Example
The levels alternate between the X dimension (L: left, R: right) and the Y dimension (D: down, U: up).
The key is stored along the path from the root, e.g. "RDRU".
The complete keys are located at the leaves.

42
Binary Tries
A binary trie is a binary tree in which left sons correspond to a "0" at the corresponding position in the key, and right sons correspond to a "1".

43
Geometric Interpretation of the Binary Trie
A trie compresses a 1-dimensional space with $2^d$ addresses, through coding, to a string with d characters.
In the previous example: d = 3 + 3 = 6
The root represents the complete space.
The left son (first character = 0) represents the lower half of the search space.
The right son (first character = 1) represents the upper half of the search space.
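A binary trie over bit strings can be sketched with nested dictionaries. The representation (one dictionary per node, keys "0" and "1" for the sons, a "value" entry at the end of a key) and the function names are assumptions chosen for brevity, not the structure used in the lecture. The prefix query shows the geometric reading: all keys below one node share a prefix and therefore lie in the same part of the search space.

```python
def trie_insert(root: dict, key: str, value) -> None:
    """Insert a bit-string key; "0" follows the left son, "1" the right son."""
    node = root
    for bit in key:
        node = node.setdefault(bit, {})
    node["value"] = value


def trie_lookup(root: dict, key: str):
    """Return the value stored under key, or None if the key is absent."""
    node = root
    for bit in key:
        if bit not in node:
            return None
        node = node[bit]
    return node.get("value")


def keys_with_prefix(root: dict, prefix: str) -> list:
    """All stored keys starting with prefix, i.e. all keys in one subspace."""
    node = root
    for bit in prefix:
        if bit not in node:
            return []
        node = node[bit]
    found, stack = [], [(node, prefix)]
    while stack:
        current, path = stack.pop()
        if "value" in current:
            found.append(path)
        for bit in "01":
            if bit in current:
                stack.append((current[bit], path + bit))
    return sorted(found)


keys = ["011011", "001010", "001111", "110000"]
trie = {}
for key in keys:
    trie_insert(trie, key, key.upper())

# Inserting the same keys in a different order gives the same structure.
other = {}
for key in reversed(keys):
    trie_insert(other, key, key.upper())

print(keys_with_prefix(trie, "001"))
print(trie == other)
```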

44
Binary Tries, Revisited
In 2D, each key is a pair of bit sequences (x,y): X0 X1 X2 is the binary x coordinate of the cell, Y0 Y1 Y2 is the binary y coordinate of the cell.
The path to the key is composed of bits that are taken from the x and y coordinates on a rotating basis.

45
Observations
A kd-trie splits by rotating through the x and y coordinates.
A kd-trie is unique for a given set of keys.
The trie structure does not depend on the insertion order.
Geometric kd-tries generate a total order of the search space.
Two points P1 and P2 in the kd-space will always have the same order.

46
Building a Linear Order
Given a 2D grid, how can we
(1) find a linear order for the cells of the grid such that cells close together in space are also (as far as possible) close to each other in the linear order, and
(2) define this order recursively for a grid that is obtained by a hierarchical subdivision of space?
The most popular solution is bit interleaving (Z-order).

47
Z-Order
Addresses in a 2-dimensional space are identified by pairs (x,y) of values.
Each x and y value is a sequence of d bits (here X0 X1 X2 and Y0 Y1 Y2).
This results in a grid with $2^d \times 2^d$ cells.
How to build the addresses using bit interleaving?
Start with a vertical split for X0 (Z = X0).

48
Z-Order
Horizontal split for Y0 (Z = X0 Y0).

49
Z-Order
Vertical split for X1 (Z = X0 Y0 X1).

50
Z-Order
Horizontal split for Y1 (Z = X0 Y0 X1 Y1).

51
Z-Order
Vertical split for X2 (Z = X0 Y0 X1 Y1 X2).

52
Z-Order
Horizontal split for Y2 (Z = X0 Y0 X1 Y1 X2 Y2).
The lowest and highest z-values (z-low and z-hi) are located in the lower-left and upper-right corners.

53
Z-Order
If each possible z-value represents a cell in the grid, this yields a space-filling curve (shown in the figure).

54
Example: Point Data
Data point: A = (3, 5) = (011, 101)
Bit interleaving: z = 011011
This gives a simple method for translating between x,y coordinates and z-values.
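The translation between (x, y) and z can be sketched with a few bit operations. The bit order follows the construction Z = X0 Y0 X1 Y1 X2 Y2, with X0 and Y0 taken as the most significant bits of x and y; the function names `interleave` and `deinterleave` and the explicit bit-count parameter `d` are assumptions for this illustration.

```python
from typing import Tuple


def interleave(x: int, y: int, d: int) -> int:
    """z-value of cell (x, y) on a 2^d x 2^d grid: bits X0 Y0 X1 Y1 ..."""
    z = 0
    for i in range(d - 1, -1, -1):          # from most to least significant bit
        z = (z << 1) | ((x >> i) & 1)       # next x bit
        z = (z << 1) | ((y >> i) & 1)       # next y bit
    return z


def deinterleave(z: int, d: int) -> Tuple[int, int]:
    """Inverse of interleave: recover (x, y) from a z-value."""
    x = y = 0
    for i in range(d - 1, -1, -1):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


z = interleave(3, 5, 3)                     # A = (3, 5) = (011, 101)
print(format(z, "06b"))
print(deinterleave(z, 3))
```

Because the first bits of z correspond to the first splits, all cells of a region share a common prefix; for d = 3, the prefix 001 covers exactly the z-values 001000 to 001111.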

55
Example: Region Data
The object with a z-value of 001 contains all elements with a prefix equal to 001.

56
Bit Interleaving: Recursive Definition
A vertical split differentiates values of X0 (X0 = 0: left, L; X0 = 1: right, R).
A horizontal split differentiates values of Y0 (Y0 = 0: down, D; Y0 = 1: up, U).
The address is given by the z-value (00, 01, 10, 11).
The z-value represents the path in the kd-trie.
We can use the z-values alone, so that we do not need the kd-trie anymore.

57
Explanation
Z-order encoding preserves the spatial proximity of points:
□ Homogeneous regions are represented compactly
□ The elements are clustered, which gives efficient access to secondary storage
Z-order coded data can be stored in secondary storage using conventional prefix B+ trees:
□ Efficient "range queries" are possible
□ Direct access via the z-value

58
Geometric Data Structures for non-Geometric Data?
The application of geometric data structures to geometric problems is obvious:
□ Geographic Information Systems (GIS)
□ Computer graphics
A further application of geometric data structures: multidimensional databases
□ OLAP (Online Analytical Processing)
□ Data mining

59
Multidimensional Data Space
Dimensions: Product (Coke, Fanta, Beer, Milk, Juice, Water), Region (West, East, South, North), Day
Each cell corresponds to an observation point, described by the attribute values of that cell.
Each cell contains an observation, e.g. the sales value of Product "Coke" on Day "4" in Region "East".

60
Multidimensional (MD) Data Space
Each observed fact w can be expressed as a function of the dimensions, which define the multidimensional data space:
$w = f(x, y, z)$
$DOM(f) = DOM(x) \times DOM(y) \times DOM(z)$
A fact $w_0$ is the value of function f for the specific values $(x_0, y_0, z_0)$:
$w_0 = f(x_0, y_0, z_0)$

61
Sparseness in the MD Space
Typically, only a small fragment of the space defined by $DOM(a) \times \dots \times DOM(z)$ is actually used.
Addressing in the MD space (a multi-dimensional array) is easy and fast, but memory usage is inefficient.
We need mechanisms to compress the MD space:
□ Linearization of the data space by totally ordering the facts with the aid of space-filling curves
□ Extraction of all facts into a table, then joining this table with descriptive dimension tables

62
Linearization of the MD Space
Linearization with the aid of space-filling curves (e.g. Z-order or the Hilbert construction).
The principle is based on a coding that generates a total order of all points in the data space.
The indexing is done by conventional, order-preserving indexing methods (e.g. B+ trees).
The mechanism is well suited for 2 to 4 dimensions (x,y,z,t), for tracking applications and range queries.

63
Data-Mining
So far: storage and search of data.
Evaluation and interpretation of results is done using data mining.
Typical problem: "Where in the supermarket should we put the beer that should be sold as early as possible (close to its expiry date, low sales volume, ...)?"

64
Data-Mining
Overview of basic techniques for data mining:
Forecast, Prediction
□ Classification
□ Numerical Prediction
Knowledge Discovery
□ Association
□ Variance Detection
□ Clustering

65
Prediction: Classification
Data entries are classified according to a certain property.
Example table: columns Purchased (year), Lending (total), Lending (last year) and the class column To sort out, with values such as yes, no, yes.
A new data entry is assigned its class automatically (To sort out = ?).

66
Prediction: Numerical Prediction
Numerical prediction is similar to classification; however, a value is predicted instead of a class.
Most important application: weather forecast.
Example table: columns Yesterday (Temp., Pressure), Today (Temp., Pressure) and Tomorrow (Temp.); the value to predict is tomorrow's temperature in the last row (?).

67
Knowledge Discovery: Association
Tries to find common rules between the characteristics of data. Interesting relations are returned.
Example: from the previous weather data one could derive the following rules.
With a probability of 0.89:
IF "Air pressure today" > "Air pressure yesterday" AND "Temperature today" > 12° THEN "Temperature tomorrow" > "Temperature today"
With a probability of 0.75:
IF "Air pressure today" < "Air pressure yesterday" AND "Temperature today" > 15° THEN "Temperature tomorrow" < "Temperature today"

68
Knowledge Discovery: Variance Detection
Given a data pool, variance detection tries to distinguish normal data entries from "outlier" entries.
Example: a home security system with 100 sensors (temperature, light barrier, sound detector, ...) should detect intruders. Flying birds, shadows in the moonlight or car headlights should not have any impact on the operation of the system.
The system gets a database describing "safe" configurations (where no alarm has to be triggered).
The system creates a model of the non-alarm cases. Data for real intrusions are not provided!
Using this model, updates from the sensors can be checked: if they do not fit the non-alarm cases, an alarm is triggered.

69
Knowledge Discovery: Clustering
Find similar data entries and group them into clusters.
Example: an exam; what percentage of exercises E1 to E5 was answered correctly?
Table: one row per student (S1 to S5), columns E1 to E5; averages (Ø): E1 = 41.6, E2 = 68, E3 = 30.6, E4 = 43, E5 = 57.8
Clustering may divide the students taking the exam into 2 groups:
G1 = {S1, S4, S5}: good at exercises E2 and E5
G2 = {S2, S3}: good at exercises E1, E3 and E4
This opens the possibility of individual support!

70
k-means Clustering: Example

71
k-means Clustering: Algorithm
1. Fix the number of desired clusters: parameter k.
2. Place k random points into the space: the initial group centroids.
3. For all m data objects, determine the Euclidean distance of the object (as a vector) from all centroids and assign the object to the closest centroid.
4. For all k centroids, determine the real center (average) of the assigned cluster. These are the new centroids.
5. Repeat steps 3 and 4 until the centroids no longer move (old and new centroids are so close to each other that no noticeable improvement remains).
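The five steps translate almost directly into code. The following is a plain-Python sketch under a few assumptions that are not part of the slide: the random initial centroids are drawn from the data points themselves (unless the caller passes `initial`), a centroid that loses all its objects keeps its old position, and "no longer move" is tested against a small tolerance `tol`. The function name `k_means` and all parameter names are hypothetical; `math.dist` requires Python 3.8 or later.

```python
import math
import random


def k_means(points, k, initial=None, seed=0, max_rounds=100, tol=1e-9):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, clusters)."""
    # Steps 1 and 2: k is given; pick the initial centroids.
    # Assumption: random centroids are sampled from the data points.
    centroids = list(initial) if initial else random.Random(seed).sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(max_rounds):
        # Step 3: assign every object to the closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Step 4: the new centroid is the average of the assigned objects.
        new_centroids = []
        for j, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])   # empty cluster: keep old centroid
        # Step 5: stop when the centroids no longer move.
        moved = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if moved <= tol:
            break
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, clusters = k_means(points, k=2, initial=[(0.0, 0.0), (1.0, 1.0)])
print(centroids)
```

Both initial centroids are deliberately placed in the same group of points here; after one correction round the centroids settle on the centers of the two groups.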

72
k-means Algorithm: Properties
Finds a local optimum, but does not necessarily find the best configuration (global optimum).
Is a heuristic.
Highly sensitive to the randomly selected initial cluster centers.
Optimizations:
□ Randomly modify the results between different rounds
□ The k-means algorithm can be run multiple times
Operates with linear optimization.
Highly stable and frequently used approach.
Also works for very large data sets, with a controllable complexity.
Literature: Ian H. Witten, Eibe Frank: "Data Mining – Practical Machine Learning Tools and Techniques with Java Implementations", Academic Press, San Diego, CA, 2000
