Canopy Clustering


1 Canopy Clustering. Given a distance measure and two threshold distances T1 > T2:
1. Determine canopy centers: go through the list of input points to form a list of "clusterCenters". If a point is within T2 of a point already in clusterCenters, ignore it; otherwise, append it to clusterCenters.
2. Determine canopy membership: for each point in the input set, if the point is within T1 of a cluster center, then the point is a member of the corresponding canopy. Because T1 > T2, a point may belong to more than one canopy.
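
The two steps on this slide can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the function names (`find_canopy_centers`, `canopy_membership`), the use of Euclidean distance, and the choice to treat "within" as "less than or equal to" are all assumptions made for the example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def find_canopy_centers(points, t2, distance=euclidean):
    """Step 1: a point becomes a center unless it lies within T2 of an existing center."""
    centers = []
    for p in points:
        if all(distance(p, c) > t2 for c in centers):
            centers.append(p)
    return centers


def canopy_membership(points, centers, t1, distance=euclidean):
    """Step 2: a point joins every canopy whose center lies within T1 of it."""
    members = {i: [] for i in range(len(centers))}
    for p in points:
        for i, c in enumerate(centers):
            if distance(p, c) <= t1:
                members[i].append(p)
    return members


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (5, 0), (6, 0), (3, 0)]
    centers = find_canopy_centers(data, t2=1.5)
    print(centers)
    print(canopy_membership(data, centers, t1=3.5))
```

Note that the result depends on the order of the input points, since the first point seen in any region becomes that region's center.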

2 Combine Canopy and k-means or EM. Only calculate distances for points that share a canopy with the centroid; assign infinite distance to points outside the canopies containing the centroid.
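
As a rough sketch of how this restriction might look in the assignment step of k-means, the code below computes a point-to-centroid distance only when the two share at least one canopy and uses infinity otherwise. The helper names (`canopies_of`, `assign_points`), the Euclidean metric, and returning `None` for a point that shares no canopy with any centroid are assumptions for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies_of(x, canopy_centers, t1):
    """Indices of all canopies whose center lies within T1 of x."""
    return {i for i, c in enumerate(canopy_centers) if euclidean(x, c) <= t1}


def assign_points(points, centroids, canopy_centers, t1):
    """Assign each point to its nearest centroid among those sharing a canopy with it."""
    # Canopy sets are computed once per point and once per centroid.
    point_canopies = [canopies_of(p, canopy_centers, t1) for p in points]
    centroid_canopies = [canopies_of(m, canopy_centers, t1) for m in centroids]

    assignment = []
    for p, p_can in zip(points, point_canopies):
        best, best_dist = None, math.inf
        for j, (m, m_can) in enumerate(zip(centroids, centroid_canopies)):
            # No shared canopy: treat the distance as infinite and skip the computation.
            if not (p_can & m_can):
                continue
            d = euclidean(p, m)
            if d < best_dist:
                best, best_dist = j, d
        assignment.append(best)  # None if no centroid shares a canopy with p
    return assignment
```

The saving comes from the skipped `euclidean(p, m)` calls; with an expensive distance measure and many centroids, most point-centroid pairs never need the full computation.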

3 Canopy Clustering with MR (MapReduce). Given a distance metric and the tighter threshold T2:
Mapper: start with an empty set of canopyCenters. For each x in inputData, if x is farther than T2 from every member of canopyCenters, then add x to canopyCenters and emit (1, x).
Reducer: start with an empty set of canopyCenters. Input = (key, iterator over the mappers' cluster centers). For each x in the iterator, if x is farther than T2 from every member of canopyCenters, then add x to canopyCenters and emit (1, x).
This results in a list of canopy centers to be used for determining canopy membership.
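
The mapper and reducer described on this slide can be simulated in plain Python without a MapReduce framework. The sketch below is only an in-memory illustration: `canopy_mapper`, `canopy_reducer`, and `run_job` are hypothetical names, the input splits are ordinary lists, and Euclidean distance is assumed. Every mapper emits the same key (1), so all candidate centers reach a single reducer.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy_mapper(input_split, t2):
    """Emit (1, x) for each x farther than T2 from every center found so far in this split."""
    canopy_centers = []
    for x in input_split:
        if all(euclidean(x, c) > t2 for c in canopy_centers):
            canopy_centers.append(x)
            yield (1, x)


def canopy_reducer(key, values, t2):
    """Apply the same T2 filter to the centers proposed by all mappers."""
    canopy_centers = []
    for x in values:
        if all(euclidean(x, c) > t2 for c in canopy_centers):
            canopy_centers.append(x)
            yield (key, x)


def run_job(splits, t2):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    grouped = {}
    for split in splits:
        for key, x in canopy_mapper(split, t2):
            grouped.setdefault(key, []).append(x)
    return [x for key, values in grouped.items()
            for _, x in canopy_reducer(key, values, t2)]


if __name__ == "__main__":
    splits = [[(0, 0), (1, 0), (5, 0)], [(0.5, 0), (5.5, 0), (10, 0)]]
    print(run_job(splits, t2=1.5))
```

The reducer pass is needed because two mappers can each propose centers that lie within T2 of one another; filtering again over the combined list removes those near-duplicates.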

