DISC-Finder: A distributed algorithm for identifying galaxy clusters.

DISC-Finder: A distributed algorithm for identifying galaxy clusters

Friends-of-Friends (FoF) technique: Identification of galaxy clusters Sequential algorithms Exact: O((n ∙ log n) 1.5 ) Approximate: O(n) We need to identify its connected components Two galaxies are “friends” if they are close to each other We analyze an undirected graph, where galaxies are vertices and their “friendships” are edges

Distributed procedure Divide the space into “slightly overlapping” cubes Identify cross-cube edges and merge the respective clusters Load balancing: -Randomly select a subset of galaxies -Apply the kd-tree construction to build a balanced partition for the subset -Use it for the full set of galaxies -Use any sequential FoF -Allocate different cores to cubes -Apply the union-find algorithm to the galaxies in the cube overlaps Distributed computation: Apply a sequential FoF algorithm to find the clusters within each cube

Distributed procedure galaxy sets local clusters divide the space into cubes apply local sequential FoF

Advantages Scalable: We can apply it to massive datasets and use all available cores Black-box use of a sequential FoF: We can utilize any FoF algorithm Hadoop friendly: We have mapped all main operations into the Hadoop framework, which has resulted in very compact code (800 lines)

Scalability 8 16 3264 128 256 4 15 60 240 Time (min) Number of cores 1000 mln galaxies 500 mln galaxies 14,800 mln galaxies

DISC-Finder: A distributed algorithm for identifying galaxy clusters.

Similar presentations

Presentation on theme: "DISC-Finder: A distributed algorithm for identifying galaxy clusters."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

DISC-Finder: A distributed algorithm for identifying galaxy clusters.

Similar presentations

Presentation on theme: "DISC-Finder: A distributed algorithm for identifying galaxy clusters."— Presentation transcript:

Similar presentations

About project

Feedback