Group 15: Swathi Gurram, Prajakta Purohit


1 K-means Clustering

2 Goal: Implement K-means on Twister (iterative MapReduce) and on Hadoop (MapReduce), and measure how the choice of framework affects running time.

3 Survey: Twister
- Configurable, long-running (cacheable) map/reduce tasks
- Pub/sub messaging-based communication and data transfers
- Efficient support for iterative MapReduce computations
- Combine phase to collect all reduce outputs
- Data access via local disks
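With pub/sub communication, a driver can publish one message, such as the current cluster centers, to a topic, and every task subscribed to that topic receives it. The sketch below is a minimal in-process stand-in for that pattern, assuming plain Python callbacks in place of a real message broker; the `Broker` class and the `"centroids"` topic name are hypothetical and are not part of Twister's API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class Broker:
    """Hypothetical in-process publish/subscribe broker, for illustration only."""

    def __init__(self) -> None:
        # Topic name -> list of subscriber callbacks.
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver `message` to every subscriber of `topic`; return the number of deliveries."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    received = {}
    # Three "map tasks" subscribe to the topic on which the driver broadcasts centroids.
    for task_id in range(3):
        broker.subscribe("centroids", lambda msg, t=task_id: received.__setitem__(t, msg))
    count = broker.publish("centroids", [(0.0, 0.0), (10.0, 10.0)])
    print(count, received[2])  # 3 [(0.0, 0.0), (10.0, 10.0)]
```

A single publish reaches every subscriber, which is why broadcasting small, changing data (like centroids) each iteration is cheap in this style of framework.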

4 Survey: Hadoop, a software framework that supports data-intensive distributed applications
- Uses the MapReduce programming model
- Has its own file system, HDFS (Hadoop Distributed File System), which is based on the Google File System and tailored to large files
- Manages the distribution of processing and data across the cluster, breaking large files into smaller blocks for processing
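A rough sketch of how HDFS breaks a file into fixed-size blocks, assuming a 128 MB block size (the Hadoop 2.x default for `dfs.blocksize`; Hadoop 1.x used 64 MB). With the default input format, each block usually becomes one input split, processed by one map task. The helper `split_into_blocks` is hypothetical and only shows the arithmetic.

```python
from typing import List, Tuple

# Assumption: 128 MB, the default dfs.blocksize in Hadoop 2.x (Hadoop 1.x defaulted to 64 MB).
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[Tuple[int, int]]:
    """Hypothetical helper: (offset, length) for each block; only the last may be shorter."""
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    MB = 1024 * 1024
    print(len(split_into_blocks(1024 * MB)))  # 8 full 128 MB blocks
    print(split_into_blocks(300 * MB)[-1][1] // MB)  # 44: the tail block after two full blocks
```

A 1 GB input therefore yields about eight map tasks under these assumptions, and a file that is not a multiple of the block size ends with one shorter block.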

5 Survey: HaLoop, a modified version of the Hadoop MapReduce framework
- Provides caching options for loop-invariant data access
- Lets users reuse major building blocks from their applications' Hadoop implementations
- Has intra-job fault-tolerance mechanisms similar to Hadoop's
- Reduces query runtimes by a factor of 1.85 compared with Hadoop

6 K-means Clustering

7 K-means Clustering
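Standard K-means (Lloyd's algorithm) alternates two steps until the centroids stop moving: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points. The following minimal sequential sketch assumes Euclidean distance and random initial centroids drawn from the input; these choices and the function names are illustrative assumptions, not necessarily the authors' design.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def kmeans(points: List[Point], k: int, max_iters: int = 100, tol: float = 1e-12, seed: int = 0):
    # Assumption: initial centroids are k points sampled at random from the input.
    centroids = random.Random(seed).sample(points, k)
    for _ in range(max_iters):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to its cluster mean (an empty cluster keeps its centroid).
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        # Stop once no centroid moved by more than tol (measured as squared distance).
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, labels = kmeans(pts, k=2)
    print(sorted(centroids))  # one centroid near (0.33, 0.33), the other near (10.33, 10.33)
    print(labels)
```

Both MapReduce versions parallelize exactly these two steps: the assignment step maps naturally onto map tasks and the mean computation onto reduce tasks.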

8 Twister K-means
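In an iterative MapReduce framework such as Twister, the input points can be loaded into long-running map tasks once, and only the small set of centroids is broadcast each iteration; the combine phase then gathers all reduce outputs at the driver, which decides whether to iterate again. The sketch below simulates that flow in a single Python process. It is not Twister's Java API: `map_partition`, `reduce_centroid`, and `twister_style_kmeans` are hypothetical names, and the in-memory list `cached` stands in for cached map tasks.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[float], int]  # (running coordinate sum, number of points)


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partition(partition: Sequence[Point], centroids: Sequence[Point]) -> Dict[int, Partial]:
    """Hypothetical map task: centroid index -> partial sum and count for one partition."""
    out: Dict[int, Partial] = {}
    for p in partition:
        idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        sums, count = out.get(idx, ([0.0] * len(p), 0))
        out[idx] = ([s + x for s, x in zip(sums, p)], count + 1)
    return out


def reduce_centroid(partials: List[Partial]) -> Point:
    """Hypothetical reduce task: merge partial sums for one centroid into the new mean."""
    total = [0.0] * len(partials[0][0])
    n = 0
    for sums, count in partials:
        total = [t + s for t, s in zip(total, sums)]
        n += count
    return tuple(t / n for t in total)


def twister_style_kmeans(partitions, centroids, max_iters: int = 50, tol: float = 1e-12):
    # Static data is loaded once and reused in every iteration,
    # standing in for Twister's long-running, cacheable map tasks.
    cached = [list(part) for part in partitions]
    for iteration in range(1, max_iters + 1):
        # Broadcast the current centroids to all map tasks; group map outputs by key.
        grouped: Dict[int, List[Partial]] = defaultdict(list)
        for part in cached:
            for idx, partial in map_partition(part, centroids).items():
                grouped[idx].append(partial)
        # Reduce per centroid, then combine all reduce outputs at the driver.
        new_centroids = [
            reduce_centroid(grouped[i]) if grouped[i] else c for i, c in enumerate(centroids)
        ]
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, iteration


if __name__ == "__main__":
    parts = [[(0, 0), (0, 1)], [(1, 0), (10, 10)], [(10, 11), (11, 10)]]
    final, iters = twister_style_kmeans(parts, [(0, 0), (10, 10)])
    print(final, iters)  # centroids near (0.33, 0.33) and (10.33, 10.33) after 2 iterations
```

Emitting per-partition partial sums rather than raw points keeps the data sent to the reducers proportional to the number of centroids, not the number of points.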

9 Hadoop K-means
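Classic Hadoop MapReduce has no built-in notion of iteration, so K-means is usually driven by a loop that launches one job per iteration: each job re-reads the full input from HDFS, and the new centroids are written out for the next job to read. The sketch below imitates that pattern with local files standing in for HDFS; `run_job` and `hadoop_style_kmeans` are hypothetical helpers, not Hadoop's `Mapper`/`Reducer` API, and the CSV and JSON file formats are assumptions.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def run_job(input_path: str, centroid_path: str, output_path: str) -> None:
    """Hypothetical job: one K-means iteration; the input is re-read from disk every time."""
    with open(centroid_path) as f:
        centroids = json.load(f)
    # Map phase: parse every record again and emit (nearest centroid index, point).
    grouped = defaultdict(list)
    with open(input_path) as f:
        for line in f:
            p = [float(v) for v in line.split(",")]
            idx = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
            grouped[idx].append(p)  # shuffle: group values by key
    # Reduce phase: average each group (an empty group keeps its old centroid).
    new = [
        [sum(col) / len(grouped[i]) for col in zip(*grouped[i])] if grouped[i] else c
        for i, c in enumerate(centroids)
    ]
    with open(output_path, "w") as f:
        json.dump(new, f)


def hadoop_style_kmeans(points: List[Point], init: List[Point], max_iters: int = 50, tol: float = 1e-12):
    with tempfile.TemporaryDirectory() as workdir:
        input_path = os.path.join(workdir, "points.csv")  # stands in for an HDFS input file
        with open(input_path, "w") as f:
            f.writelines(",".join(str(v) for v in p) + "\n" for p in points)
        current = os.path.join(workdir, "centroids-0.json")
        with open(current, "w") as f:
            json.dump([list(c) for c in init], f)
        for it in range(1, max_iters + 1):
            nxt = os.path.join(workdir, f"centroids-{it}.json")
            run_job(input_path, current, nxt)  # the driver launches a new job per iteration
            with open(current) as f_old, open(nxt) as f_new:
                old, new = json.load(f_old), json.load(f_new)
            current = nxt
            if max(sq_dist(a, b) for a, b in zip(old, new)) <= tol:
                break
        return [tuple(c) for c in new], it


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(hadoop_style_kmeans(pts, [(0, 0), (10, 10)]))
```

Re-reading and re-parsing the whole input on every iteration, plus per-job startup in a real cluster, is the overhead that caching frameworks are designed to avoid for iterative algorithms.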


11 Implementation Timeline
- Oct 24th – Oct 31st: Understand the K-means algorithm and design (team members: Prajakta, Swathi)
- Nov 1st – Nov 7th: Implement K-means
- Nov 8th – Nov 21st: Implement K-means on Twister and analyze performance
- Nov 21st – Nov 28th: Optimized validation method for the K-means algorithm
- Nov 29th – Dec 3rd: Implement K-means on Hadoop
- Dec 4th – Dec 5th: Performance analysis and presentation
- Dec 6th – Dec 12th: Final technical report

12 Validation methods
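One common way to validate a K-means result (an assumption here, not necessarily the method the authors used) is to check that every point is assigned to its nearest final centroid and to compute the within-cluster sum of squared errors (SSE). SSE should never increase from one iteration to the next, and a sequential run and a distributed run started from the same initial centroids should agree up to floating-point rounding. A minimal sketch, with hypothetical function names:

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(points: Sequence[Point], centroids: Sequence[Point], labels: Sequence[int]) -> float:
    """Hypothetical check: sum of squared distances from each point to its assigned centroid."""
    return sum(sq_dist(p, centroids[label]) for p, label in zip(points, labels))


def assignments_consistent(points, centroids, labels) -> bool:
    """Hypothetical check: True if each point's label is (one of) its nearest centroid(s)."""
    for p, label in zip(points, labels):
        best = min(sq_dist(p, c) for c in centroids)
        if sq_dist(p, centroids[label]) > best:
            return False
    return True


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents = [(1 / 3, 1 / 3), (31 / 3, 31 / 3)]
    print(sse(pts, cents, [0, 0, 0, 1, 1, 1]))  # about 2.667 (exactly 8/3)
    print(assignments_consistent(pts, cents, [1, 0, 0, 1, 1, 1]))  # False
```

Comparing SSE and labels across implementations gives a cheap correctness check before comparing their running times.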

13 Conclusion: The Twister framework is faster than Hadoop for iterative MapReduce applications.


15 Demo

16 Thank you

