Published by Lane Upham. Modified over 4 years ago.

1
Sheng Yu UM Statistics

2
Outline: Motivation, Strategy, Sample Algorithms

3
Motivation (pattern) Most current clustering methods are only able to detect agglomerated patterns. New generation methods, such as normalized cut, have more flexibility, but are still not able to detect twisted, perhaps also entangled manifolds. Such manifold patterns are not rare.

4
This is a manifold...

5
Example: Try to cluster a symmetric double spiral (two intertwined spiral arms).

6
Example: result from k-means

7
Example: result from normalized cut

8
Motivation (noise) In theory, hierarchical clustering with “single linkage” as the merging criterion is able to cluster twisted patterns. However, because “single linkage” is extremely sensitive to noise, it is not a usable method in practice.
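
The noise sensitivity of single linkage can be illustrated with a small sketch. It is not taken from the presentation: the toy data (two parallel strands plus four stray points) and the function name `single_linkage` are assumptions made for illustration. Single linkage merges clusters by their closest pair of points, so a thin chain of noise is enough to join two otherwise well-separated groups.

```python
import math
from itertools import combinations

def single_linkage(points, k):
    """Merge clusters by closest-pair distance until k clusters remain."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Visiting pairs by increasing distance reproduces the single-linkage merges.
    pairs = sorted(combinations(range(len(points)), 2),
                   key=lambda ij: math.dist(points[ij[0]], points[ij[1]]))
    clusters = len(points)
    for i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    return [find(i) for i in range(len(points))]

# Toy data (assumed): two parallel strands, spacing 0.5 within a strand, 2.0 apart.
strand_a = [(x * 0.5, 0.0) for x in range(10)]
strand_b = [(x * 0.5, 2.0) for x in range(10)]
clean = strand_a + strand_b
labels_clean = single_linkage(clean, 2)

# Four stray points spaced 0.4 apart form a bridge between the strands.
noise = [(2.0, 0.4), (2.0, 0.8), (2.0, 1.2), (2.0, 1.6)]
labels_noisy = single_linkage(clean + noise, 2)

print(labels_clean[0] != labels_clean[10])   # strands separated without noise
print(labels_noisy[4] == labels_noisy[14])   # strands chained together by noise
```

Without the stray points the two strands come out as two clusters. With them, the bridge links are shorter than the spacing inside a strand, so the strands are merged before the target of two clusters is reached.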

9
Motivation To design a new method that not only accomplishes the traditional “easy” tasks but also handles twisted, entangled patterns. The new method should also not be ruined by a moderate level of noise (in terms of signal-to-noise ratio).

10
Outline: Motivation, Strategy, Sample Algorithms

11
Strategy (rationale)

12
Strategy (design) Engine: Searches for paths between each pair of points. A more powerful engine runs faster. Filter: Tells the engine which neighboring points can be connected from a given start point. The filter controls the quality of the result.
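
The engine/filter split can be sketched as follows. The slides do not specify either component, so everything here is an assumption: `radius_filter` is a hypothetical filter that allows a link to any point within a fixed distance, and `engine` is a plain breadth-first search that gives the same label to all points joined by a path of allowed links.

```python
import math
from collections import deque

def radius_filter(radius):
    """Hypothetical filter: a neighbor is connectable if it lies within `radius`."""
    def allowed(points, start):
        return [j for j, q in enumerate(points)
                if j != start and math.dist(points[start], q) <= radius]
    return allowed

def engine(points, allowed):
    """Simple engine (assumed): breadth-first search over filter-approved links.
    Points joined by some path receive the same cluster label."""
    labels = [None] * len(points)
    next_label = 0
    for seed in range(len(points)):
        if labels[seed] is not None:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            for j in allowed(points, i):
                if labels[j] is None:
                    labels[j] = next_label
                    queue.append(j)
        next_label += 1
    return labels

def ring(r, n):
    """n evenly spaced points on a circle of radius r."""
    return [(r * math.cos(2 * math.pi * k / n),
             r * math.sin(2 * math.pi * k / n)) for k in range(n)]

# Two concentric rings: neighbors on a ring are about 0.16 apart, the rings 2.0 apart.
points = ring(1.0, 40) + ring(3.0, 120)
labels = engine(points, radius_filter(0.3))
print(len(set(labels)))
```

Keeping the filter as a separate callable means a stricter or more noise-tolerant rule can be swapped in without touching the search, and a faster search can replace the breadth-first one without changing which links are allowed.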

13
Example (easy one)

14
Example (not so easy one)

16
Example (hard one)

17
Outline: Motivation, Strategy, Sample Algorithms

18
Algorithms (filter) The filter I currently use is still primitive, but it already handles many tasks, such as the examples above. The strategy is an open framework: we can build better filters that detect even more difficult patterns and are more resistant to noise.

19
Algorithms: the importance of the engine

Engine              Sample size 320    Sample size 640
Brute force         97"                "Death touch"
Fission             1.5"               65"
Algebraic fission   "Never minded"     0.5"

20
Strategy (rationale)

21
Algorithms The true benefit of a super fast engine is that it makes iteration affordable. We only need to specify an acceptable range for the number of clusters. The initial parameters do not need to be precise, because the algorithm performs a heuristic search for us.
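
One way such an iteration could look is sketched below. The slides do not describe the search itself, so this is an assumption: a single hypothetical parameter `radius` (the maximum distance at which two points are linked) is grown or shrunk until the cluster count falls inside the acceptable range `[k_min, k_max]`. The names `count_clusters` and `search_radius` are illustrative only.

```python
import math

def count_clusters(points, radius):
    """Number of connected groups when points within `radius` are linked."""
    unseen = set(range(len(points)))
    count = 0
    while unseen:
        stack = [unseen.pop()]
        count += 1
        while stack:
            i = stack.pop()
            near = [j for j in unseen
                    if math.dist(points[i], points[j]) <= radius]
            unseen.difference_update(near)
            stack.extend(near)
    return count

def search_radius(points, k_min, k_max, radius=1.0, max_iter=50):
    """Adjust the radius until the cluster count lands in [k_min, k_max].
    Returns (radius, count), or None if no radius is found in max_iter steps."""
    lo, hi = 0.0, None
    for _ in range(max_iter):
        k = count_clusters(points, radius)
        if k_min <= k <= k_max:
            return radius, k
        if k > k_max:   # too fragmented: the radius is too small
            lo = radius
            radius = radius * 2 if hi is None else (lo + hi) / 2
        else:           # too merged: the radius is too large
            hi = radius
            radius = (lo + hi) / 2
    return None

# Three small, well-separated groups (toy data, assumed).
points = [(0, 0), (0.1, 0), (0, 0.1),
          (5, 5), (5.1, 5), (5, 5.1),
          (10, 0), (10.1, 0), (10, 0.1)]
result_from_small = search_radius(points, 3, 3, radius=0.01)
result_from_large = search_radius(points, 3, 3, radius=100.0)
print(result_from_small, result_from_large)
```

The search works because the cluster count can only stay the same or drop as the radius grows, so a poor initial guess in either direction is corrected by doubling or bisection. Each step reruns the clustering, which is why the speed of the engine matters.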

22
A demo of visual aids for choosing parameters
