Published by Aiyana Frail. Modified over 2 years ago.

1
Christoph F. Eick Questions and Topics Review Dec. 10, 2013
1. Compare AGNES/hierarchical clustering with K-means; what are the main differences?
2. K-means has a runtime complexity of O(t*k*n*d), where t is the number of iterations, d is the dimensionality of the dataset (the number of attributes an object has), k is the number of clusters in the dataset, and n is the number of objects in the dataset. Explain! In general, is K-means an efficient clustering algorithm? Give a reason for your answer by discussing its runtime with reference to its runtime complexity formula! [5]
3. Assume the Apriori-style sequence mining algorithm described on pages 429-435 is used and that the algorithm generated the 3-sequences listed below (see 2007 Final Exam!):
[Table with columns: Frequent 3-sequences | Candidate Generation | Candidates that survived pruning]
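Because the exam's table of 3-sequences is not preserved, here is a minimal sketch of the two steps Question 3 asks about, run on hypothetical 3-sequences. It assumes the textbook-style rules: s1 and s2 are merged when s1 without its first event equals s2 without its last event, and a candidate is pruned if any (k-1)-subsequence obtained by deleting one event is not frequent. All function names are illustrative, not from the course material.

```python
from typing import Iterator, Optional, Tuple

# A sequence is a tuple of elements; each element is a sorted tuple of items,
# e.g. <(1)(2 3)> is written ((1,), (2, 3)).
Seq = Tuple[Tuple[int, ...], ...]


def drop_first(s: Seq) -> Seq:
    """Remove the first event (the first item of the first element)."""
    rest = s[0][1:]
    return ((rest,) if rest else ()) + s[1:]


def drop_last(s: Seq) -> Seq:
    """Remove the last event (the last item of the last element)."""
    rest = s[-1][:-1]
    return s[:-1] + ((rest,) if rest else ())


def merge(s1: Seq, s2: Seq) -> Optional[Seq]:
    """Candidate generation: merge s1 and s2 if their overlaps agree."""
    if drop_first(s1) != drop_last(s2):
        return None
    item = s2[-1][-1]
    if len(s2[-1]) > 1:
        # Last two events of s2 share an element: extend s1's last element.
        return s1[:-1] + (s1[-1] + (item,),)
    # Otherwise the last event of s2 becomes a new element of the candidate.
    return s1 + ((item,),)


def subsequences(s: Seq) -> Iterator[Seq]:
    """All (k-1)-subsequences obtained by deleting exactly one event."""
    for i, elem in enumerate(s):
        for j in range(len(elem)):
            reduced = elem[:j] + elem[j + 1:]
            yield s[:i] + ((reduced,) if reduced else ()) + s[i + 1:]


def generate_candidates(frequent):
    """Return (all merged candidates, candidates that survive pruning)."""
    freq = set(frequent)
    candidates = set()
    for s1 in frequent:
        for s2 in frequent:  # s1 and s2 may be the same sequence
            c = merge(s1, s2)
            if c is not None:
                candidates.add(c)
    survivors = {c for c in candidates
                 if all(sub in freq for sub in subsequences(c))}
    return candidates, survivors


if __name__ == "__main__":
    # Hypothetical frequent 3-sequences (not the exam's data).
    freq3 = [((1,), (2,), (3,)), ((2,), (3,), (4,)),
             ((1,), (3,), (4,)), ((1,), (2,), (4,))]
    print(generate_candidates(freq3))
```

Here only <(1)(2)(3)> and <(2)(3)(4)> merge, yielding <(1)(2)(3)(4)>; it survives pruning because all four of its 3-subsequences are in the frequent list.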

2
Answers Question 1
a. AGNES creates a set of clusterings (a dendrogram); K-means creates a single clustering.
b. K-means forms clusters using an iterative procedure that minimizes an objective function; AGNES forms the dendrogram by repeatedly merging the two closest clusters until a single cluster is obtained.
c. …
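To make difference (b) concrete, the following sketch implements a naive AGNES that records every merge, so its output is a whole hierarchy rather than one partition. It assumes single-link distance between clusters; the function name `agnes` and the 1D toy data are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def agnes(points, dist):
    """Naive AGNES with single link: start with one cluster per object and
    repeatedly merge the two closest clusters until one cluster is left.
    Returns the merge sequence, i.e. the dendrogram from bottom to top."""
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []

    def single_link(pair):
        # Distance between the closest members of the two clusters.
        a, b = pair
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=single_link)
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((sorted(a), sorted(b)))
    return merges


if __name__ == "__main__":
    print(agnes([0, 1, 5, 6, 20], lambda p, q: abs(p - q)))
    # [([0], [1]), ([2], [3]), ([0, 1], [2, 3]), ([4], [0, 1, 2, 3])]
```

Cutting this merge sequence at different levels yields clusterings with 5, 4, 3, 2, or 1 clusters, whereas a K-means run with a fixed k returns exactly one of them.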

3
Answers Questions 2&3
Answer Question 2:
t: number of iterations; k: number of clusters; n: number of objects to be clustered; d: number of attributes.
In each iteration, each of the n points is compared to the k centroids to assign it to its nearest centroid, which requires O(k*n) distance computations; each distance computation costs O(d). Therefore, the total runtime is O(t*k*n*d).
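The counting argument above can be checked with a plain K-means loop that tallies every per-attribute operation in the assignment step. This is a minimal sketch, assuming random initial centroids and a fixed number of iterations t (no convergence test); the name `kmeans` and the `ops` counter are illustrative.

```python
import random


def kmeans(points, k, t, seed=0):
    """Lloyd-style K-means run for exactly t iterations.
    Returns (centroids, labels, ops), where ops counts the per-attribute
    operations performed in the assignment step."""
    rng = random.Random(seed)
    d = len(points[0])
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    ops = 0
    for _ in range(t):                            # t iterations
        for i, p in enumerate(points):            # n points
            best, best_dist = 0, float("inf")
            for j, c in enumerate(centroids):     # k centroids
                dist = 0.0
                for a in range(d):                # d attributes
                    dist += (p[a] - c[a]) ** 2
                    ops += 1
                if dist < best_dist:
                    best, best_dist = j, dist
            labels[i] = best
        # Update step: this naive version rescans all n points for each of the
        # k clusters, which stays within O(k*n*d) per iteration.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(m[a] for m in members) / len(members)
                                for a in range(d)]
    return centroids, labels, ops


if __name__ == "__main__":
    data = [(1, 0), (0, 1), (1, 2), (3, 4), (8, 8), (9, 8)]
    _, labels, ops = kmeans(data, k=2, t=5)
    print(labels, ops)  # ops == t*k*n*d == 5*2*6*2 == 120
```

Since t and k are usually small constants compared to n, the cost grows linearly in the number of objects.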

4
Questions and Topics Review Dec. 10, 2013
4. Gaussian Kernel Density Estimation and DENCLUE
a. Assume we have a 2D dataset X containing 4 objects: X = {(1,0), (0,1), (1,2), (3,4)}; moreover, we use the Gaussian kernel density function to measure the density of X. Assume we want to compute the density at point (1,1); you can also assume h = 1 (σ = 1) and that we use Manhattan distance as the distance function. Give a sketch of how the Gaussian kernel density estimation approach determines the density for point (1,1). Be specific!
b. What is a density attractor? How does DENCLUE form clusters?
5. PageRank [8]
a) What does PageRank compute? What are the challenges in using the PageRank algorithm in practice? [3]
b) Give the equation system that PageRank would use for the webpage structure given below. Give a sketch of an approach that determines the page rank of the 4 pages from this equation system! [5]
[Figure: link structure among pages P1, P2, P3, P4]

5
Answer Question 4 (Gaussian Kernel Density Estimation and DENCLUE)
a. Each object $x_i$ contributes $e^{-\mathrm{dist}(x, x_i)^2/(2h^2)}$ to the density at $x$. The Manhattan distances from (1,1) are 1 to each of (1,0), (0,1), and (1,2), and 2 + 3 = 5 to (3,4). With h = 1, the density of (1,1) is therefore:
$f_X((1,1)) = e^{-1/2} + e^{-1/2} + e^{-1/2} + e^{-25/2} = 3e^{-1/2} + e^{-25/2} \approx 1.8196$
b. A density attractor is a local maximum of the density function. DENCLUE iterates over the objects in the dataset and uses hill climbing to associate each point with a density attractor. Next, it forms clusters such that each cluster contains the objects associated with the same density attractor; objects that belong to a cluster whose attractor's density is below a user-defined threshold are considered outliers.
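The following sketch reproduces the density computation for (1,1) and illustrates the clustering idea from part b. The hill climbing here is a simplified discrete version (fixed-size axis-aligned steps, an assumption) rather than DENCLUE's gradient-based climbing, and grouping points by their rounded attractor is likewise a simplification. All function names and the `step`/`xi` parameters are hypothetical.

```python
import math


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def gaussian_density(x, data, h=1.0, dist=manhattan):
    """Unnormalized Gaussian kernel density, as on the slide:
    f_X(x) = sum_i exp(-dist(x, x_i)^2 / (2 h^2))."""
    return sum(math.exp(-dist(x, xi) ** 2 / (2 * h * h)) for xi in data)


def hill_climb(x, data, step=0.1, h=1.0, max_iter=1000):
    """Move to the densest of the 4 axis neighbours until none is denser;
    the stopping point approximates a density attractor (local maximum)."""
    current = tuple(x)
    f_cur = gaussian_density(current, data, h)
    for _ in range(max_iter):
        neighbours = [(current[0] + dx, current[1] + dy)
                      for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step))]
        best = max(neighbours, key=lambda p: gaussian_density(p, data, h))
        f_best = gaussian_density(best, data, h)
        if f_best <= f_cur:
            break
        current, f_cur = best, f_best
    return current, f_cur


def denclue_clusters(data, xi, step=0.1, h=1.0):
    """Group objects by the attractor they climb to; objects whose attractor
    density is below the threshold xi are returned as outliers."""
    clusters, outliers = {}, []
    for p in data:
        attractor, f_att = hill_climb(p, data, step, h)
        if f_att < xi:
            outliers.append(p)
        else:
            key = tuple(round(c, 6) for c in attractor)  # absorb float drift
            clusters.setdefault(key, []).append(p)
    return clusters, outliers


if __name__ == "__main__":
    X = [(1, 0), (0, 1), (1, 2), (3, 4)]
    print(gaussian_density((1, 1), X))  # 3*e^(-1/2) + e^(-25/2), about 1.8196
    print(denclue_clusters(X, xi=1.5))
```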

6
Answers Questions 5 and 6
5a) What does PageRank compute? What are the challenges in using the PageRank algorithm in practice? [3]
It computes the probability of a webpage being accessed. [1] As there is a huge number of webpages and links, finding an efficient, scalable algorithm is a major challenge. [2]
5b) Give the equation system that PageRank would use for the webpage structure given below. Give a sketch of an approach that determines the page rank of the 4 pages from this equation system! [5]
With d as the damping factor:
PR(P1) = (1-d) + d * (PR(P3)/2 + PR(P4)/3)
PR(P2) = (1-d) + d * (PR(P3)/2 + PR(P4)/3 + PR(P1))
PR(P3) = (1-d) + d * PR(P4)/3
PR(P4) = 1-d
One solution: initialize all page ranks to 1 [0.5], then repeatedly update the PageRank of each page using the four equations above until the values converge [1].
6) A Delaunay triangulation for a set P of points in a plane is a triangulation DT(P) such that no point in P is inside the circumcircle of any triangle in DT(P).

7
Questions and Topics Review Dec. 10, 2013
6. What is a Delaunay triangulation?
7. SVM
a) The soft margin support vector machine solves the following optimization problem: What does the second term minimize? Depict all non-zero $\xi_i$ in the figure below! What is the advantage of the soft margin approach over the linear SVM approach? [5]
b) Referring to the figure above, explain how examples are classified by SVMs! What is the relationship between $\xi_i$ and example i being classified correctly? [4]

8
Answer Question 7
a. It minimizes the error, measured as the distance to the class's margin hyperplane, for points that are on the wrong side of that hyperplane [1.5]. Depict [2; at most 1 point if the distances are drawn to the wrong hyperplane]. It can deal with classification problems in which the examples are not linearly separable [1.5].
b. The middle hyperplane is used to classify the examples [1.5]. If $\xi_i$ is at most half of the margin width, example i is classified correctly. The length of the arrow for point i is the value of $\xi_i$; for points i without an arrow, $\xi_i = 0$.
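The relationship in part b can be checked numerically for a given hyperplane. This sketch assumes the usual formulation with margin hyperplanes w·x + b = ±1, so the slack in functional units is max(0, 1 - y_i(w·x_i + b)), the arrow length is that slack divided by ||w||, and half the margin width is 1/||w||. The weights, bias, and points are made up for illustration, and `svm_slacks` is a hypothetical name.

```python
import math


def svm_slacks(w, b, X, y):
    """For each example return (slack, arrow_length, correctly_classified)."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    results = []
    for x, label in zip(X, y):
        f = sum(wj * xj for wj, xj in zip(w, x)) + b  # decision value w.x + b
        slack = max(0.0, 1.0 - label * f)             # xi_i in functional units
        arrow = slack / norm_w                        # distance to own margin hyperplane
        predicted = 1 if f >= 0 else -1               # middle hyperplane decides
        results.append((slack, arrow, predicted == label))
    return results


if __name__ == "__main__":
    w, b = (1.0, 0.0), 0.0          # margin hyperplanes at x = -1 and x = +1
    X = [(2, 0), (0.5, 0), (-0.5, 0), (-3, 1)]
    y = [1, 1, 1, -1]
    half_margin = 1.0 / math.sqrt(sum(wj * wj for wj in w))
    for (slack, arrow, ok) in svm_slacks(w, b, X, y):
        print(slack, arrow, ok, arrow < half_margin)
```

For these points, (0.5, 0) lies inside the margin with an arrow shorter than half the margin width and is still classified correctly, while (-0.5, 0) has an arrow longer than half the margin width and is misclassified.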
