# Christoph F. Eick: Questions and Topics Review, Dec. 10, 2013

1. Compare AGNES/hierarchical clustering with K-means; what are the main differences?
2. K-means has a runtime complexity of O(t*k*n*d), where t is the number of iterations, d is the dimensionality of the dataset (the number of attributes an object has), k is the number of clusters, and n is the number of objects in the dataset. Explain! In general, is K-means an efficient clustering algorithm? Give a reason for your answer by referring to its runtime complexity formula! [5]
3. Assume the Apriori-style sequence mining algorithm described at pages 429-435 is used and the algorithm generated the 3-sequences listed below (see 2007 Final Exam!):

[Table not recovered in transcript. Column headers: Frequent 3-sequences | Candidate Generation | Candidates that survived pruning]

## Answers: Question 1

a. AGNES creates a set of clusterings (a dendrogram); K-means creates a single clustering.
b. K-means forms clusters using an iterative procedure that minimizes an objective function; AGNES forms the dendrogram by repeatedly merging the two closest clusters until a single cluster is obtained.
c. …
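The merging procedure described in answer (b) can be sketched in a few lines of Python. This is only an illustration: the answer does not say how the distance between two clusters is measured, so the sketch assumes single-link distance (the smallest Euclidean distance between members), and the function name `agnes_single_link` and the sample points are hypothetical.

```python
import math
from itertools import combinations


def agnes_single_link(points):
    """Merge the two closest clusters until one remains.

    Returns the merge history, which plays the role of the dendrogram.
    Assumption: cluster distance is single-link (closest pair of members).
    """
    clusters = [[p] for p in points]  # start with one cluster per object
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                math.dist(a, b)
                for a in clusters[ij[0]]
                for b in clusters[ij[1]]
            ),
        )
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


sample = [(0, 0), (0, 1), (5, 5), (5, 6.5), (20, 20)]
history = agnes_single_link(sample)
for left, right in history:
    print(left, "+", right)
```

Each entry of `history` is one level of the dendrogram, so n objects always yield n-1 merges, whereas K-means returns exactly one partition.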

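## Answers: Questions 2 and 3

Answer to Question 2, with t = number of iterations, k = number of clusters, n = number of objects to be clustered, and d = number of attributes: in each iteration, all n points are compared to the k centroids to assign each point to its nearest centroid, which takes O(k*n) distance computations; each distance computation costs O(d). Therefore, the total runtime is O(t*k*n*d). Because this is linear in n, K-means is generally considered an efficient clustering algorithm as long as t, k, and d are small relative to n.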
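To make the O(t*k*n*d) argument concrete, the following is a minimal K-means sketch that counts one operation per attribute in every point-to-centroid distance. The function name `kmeans_with_op_count`, the sample data, and the initial centroids are assumptions chosen for illustration; the stopping rule (assignments no longer change) is one common choice, not something the answer specifies.

```python
def kmeans_with_op_count(points, init_centroids, max_iter=100):
    """Plain K-means that also counts per-attribute distance operations."""
    centroids = list(init_centroids)
    k = len(centroids)
    ops = 0          # counts one unit per attribute per distance computation
    t = 0            # number of iterations actually executed
    assignment = None
    for _ in range(max_iter):
        t += 1
        new_assignment = []
        for p in points:                          # n points
            best, best_dist = None, float("inf")
            for c_idx, c in enumerate(centroids):  # k centroids
                dist = 0.0
                for a, b in zip(p, c):             # d attributes
                    dist += (a - b) ** 2
                    ops += 1
                if dist < best_dist:
                    best, best_dist = c_idx, dist
            new_assignment.append(best)
        if new_assignment == assignment:           # converged
            break
        assignment = new_assignment
        # Recompute each centroid as the mean of its members (O(n*d)).
        for c_idx in range(k):
            members = [p for p, a in zip(points, assignment) if a == c_idx]
            if members:
                centroids[c_idx] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return assignment, centroids, t, ops


data = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centers, t, ops = kmeans_with_op_count(data, [(0, 0), (10, 10)])
n, k, d = len(data), len(centers), len(data[0])
print(t, ops, t * k * n * d)
```

The counter equals t*k*n*d exactly, because the assignment step dominates and touches every attribute of every point for every centroid in every iteration.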

4. Gaussian Kernel Density Estimation and DENCLUE
   a. Assume we have a 2D dataset X containing 4 objects: X = {(1,0), (0,1), (1,2), (3,4)}; moreover, we use the Gaussian kernel density function to measure the density of X. Assume we want to compute the density at point (1,1); you can also assume h = 1 (σ = 1) and that Manhattan distance is used as the distance function. Give a sketch of how the Gaussian Kernel Density Estimation approach determines the density for point (1,1). Be specific!
   b. What is a density attractor? How does DENCLUE form clusters?
5. PageRank [8]
   a. What does PageRank compute? What are the challenges in using the PageRank algorithm in practice? [3]
   b. Give the equation system that PageRank would use for the webpage structure given below. Give a sketch of an approach that determines the page rank of the 4 pages from this equation system! [5]

[Figure: link structure among the four web pages P1, P2, P3, and P4; not recovered in transcript]

## Answer: Question 4

a. With σ = 1 and Manhattan distance, the four objects (1,0), (0,1), (1,2), and (3,4) lie at distances 1, 1, 1, and 5 from (1,1), and an object at distance $\mathrm{dist}$ contributes $e^{-\mathrm{dist}^2/(2\sigma^2)}$. The density of (1,1) is therefore computed as follows:

$$f_X((1,1)) = e^{-1/2} + e^{-1/2} + e^{-1/2} + e^{-25/2}$$

b. A density attractor is a local maximum of a density function. DENCLUE iterates over the objects in the dataset and uses hill climbing to associate each object with a density attractor. Next, it forms clusters such that each cluster contains the objects that are associated with the same density attractor; objects that belong to a cluster whose attractor has a density below a user-defined threshold are considered outliers.
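The density computation in answer (a) can be checked with a short Python sketch. It follows the unnormalized sum used in the answer (no 1/(n·h) or 1/√(2π) factor); the function name `gaussian_density_manhattan` is hypothetical.

```python
import math


def gaussian_density_manhattan(query, data, sigma=1.0):
    """Sum of Gaussian kernel influences, using Manhattan distance.

    Unnormalized, matching the form used in the answer above.
    """
    total = 0.0
    for x in data:
        dist = sum(abs(a - b) for a, b in zip(query, x))  # Manhattan distance
        total += math.exp(-dist ** 2 / (2 * sigma ** 2))
    return total


X = [(1, 0), (0, 1), (1, 2), (3, 4)]
density = gaussian_density_manhattan((1, 1), X)
print(density)  # 3*e^(-1/2) + e^(-25/2), roughly 1.8196
```

The object (3,4) contributes almost nothing (about 3.7e-6), which shows how quickly the Gaussian influence of distant objects decays.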

## Answers: Questions 5 and 6

5a) PageRank computes the probability of a web page being accessed. [1] As there are a lot of web pages and links, finding an efficient, scalable algorithm is a major challenge. [2]

5b) The equation system for the given webpage structure is:

$$
\begin{aligned}
PR(P_1) &= (1-d) + d\left(\frac{PR(P_3)}{2} + \frac{PR(P_4)}{3}\right)\\
PR(P_2) &= (1-d) + d\left(\frac{PR(P_3)}{2} + \frac{PR(P_4)}{3} + PR(P_1)\right)\\
PR(P_3) &= (1-d) + d\,\frac{PR(P_4)}{3}\\
PR(P_4) &= 1-d
\end{aligned}
$$

One solution: initialize all page ranks with 1 [0.5] and then repeatedly update the PageRank of each page using the above 4 equations until the values converge. [1]

6) A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P).
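The iterative approach from the answer to 5b can be written out as a short Python sketch. The damping factor d = 0.85 is an assumption (the answer leaves d symbolic), as are the function name `pagerank_four_pages` and the convergence tolerance.

```python
def pagerank_four_pages(d=0.85, tol=1e-12, max_iter=1000):
    """Iterate the four PageRank equations from answer 5b until convergence.

    d = 0.85 is an assumed damping factor; the source leaves d symbolic.
    """
    pr = {"P1": 1.0, "P2": 1.0, "P3": 1.0, "P4": 1.0}  # initialize with 1
    for _ in range(max_iter):
        new = {
            "P1": (1 - d) + d * (pr["P3"] / 2 + pr["P4"] / 3),
            "P2": (1 - d) + d * (pr["P3"] / 2 + pr["P4"] / 3 + pr["P1"]),
            "P3": (1 - d) + d * pr["P4"] / 3,
            "P4": (1 - d),
        }
        # Stop once no page rank changes by more than the tolerance.
        if max(abs(new[p] - pr[p]) for p in pr) < tol:
            return new
        pr = new
    return pr


ranks = pagerank_four_pages()
for page, value in ranks.items():
    print(page, round(value, 6))
```

Because P4 depends on no other page, P3 only on P4, and so on, the values settle after a handful of sweeps here; a link structure with cycles would generally need more iterations.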

6. What is a Delaunay triangulation?
7. SVM
   a. The soft margin support vector machine solves the following optimization problem: [equation not recovered in transcript]. What does the second term minimize? Depict all non-zero $\xi_i$ in the figure below! What is the advantage of the soft margin approach over the linear SVM approach? [5]
   b. Referring to the figure above, explain how examples are classified by SVMs! What is the relationship between $\xi_i$ and example i being classified correctly? [4]

[Figure not recovered in transcript]

## Answer: Question 7

a. The second term minimizes the error, which is measured as the distance to the class' hyperplane for points that are on the wrong side of that hyperplane [1.5]. Depict [2; distances to the wrong hyperplane: at most 1 point]. The soft margin approach can deal with classification problems in which the examples are not linearly separable [1.5].

b. The middle hyperplane is used to classify the examples [1.5]. If $\xi_i$ is less than or equal to half of the width of the margin, example i is classified correctly. The length of the arrow for point i is the value of $\xi_i$; for points i without an arrow, $\xi_i = 0$.
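The relationship between the slack value and correct classification can be illustrated with a small Python sketch. It assumes the standard soft-margin formulation, in which the constraints are y_i(w·x_i + b) ≥ 1 − ξ_i with ξ_i ≥ 0; the transcript does not reproduce the optimization problem, so this form, the weight vector, and the sample points are all assumptions. In this scaling, ξ_i = 1 corresponds to half the margin width, that is, to a point lying exactly on the middle hyperplane.

```python
def score(w, b, x):
    """Signed value w.x + b; its sign decides the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def slack(w, b, x, y):
    """Slack xi = max(0, 1 - y*(w.x + b)) for a label y in {-1, +1}.

    Assumes the standard soft-margin constraints (not given in the source).
    """
    return max(0.0, 1.0 - y * score(w, b, x))


def predict(w, b, x):
    """Classify with the middle hyperplane w.x + b = 0."""
    return 1 if score(w, b, x) >= 0 else -1


w, b = (1.0, 0.0), 0.0               # hypothetical separating hyperplane
examples = [
    ((2.0, 0.0), 1),                 # outside the margin: slack 0
    ((0.5, 0.0), 1),                 # inside the margin, still correct
    ((-0.5, 0.0), 1),                # beyond the middle hyperplane: wrong
]
results = [(slack(w, b, x, y), predict(w, b, x) == y) for x, y in examples]
print(results)
```

Only the third example has a slack above 1, and it is the only one the middle hyperplane misclassifies, which matches the statement in answer (b).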
