Mr. Scan: Efficient Clustering with MRNet and GPUs
Evan Samanas and Ben Welton
Paradyn Project, Paradyn / Dyninst Week
Madison, Wisconsin, April 29 - May 3, 2013

Density-based clustering
o Discovers the number of clusters
o Finds oddly-shaped clusters

Clustering Example (DBSCAN [1])
Goal: Find regions that meet minimum density and spatial distance characteristics
The two parameters that determine whether a point belongs to a cluster are Epsilon (Eps) and MinPts
If the number of points within Eps of a point is > MinPts, that point is a core point
For every discovered point, the same calculation is performed until the cluster is fully expanded
(Figure: Eps neighborhood around a point, with MinPts = 3)
[1] M. Ester et al., A density-based algorithm for discovering clusters in large spatial databases with noise (1996)
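The core-point rule above maps directly to a small amount of code. Below is a minimal sequential sketch of that rule and of cluster expansion, intended only as an illustration (it is not Mr. Scan's GPU implementation); the names Point, neighbors, and expandCluster are chosen for this example.

```cpp
// Minimal sketch of the DBSCAN rule on this slide: a point is a core point
// when more than MinPts neighbors lie within Eps, and a cluster grows by
// repeating the test on every newly discovered neighbor.
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

struct Point { double x, y; };

static std::vector<std::size_t> neighbors(const std::vector<Point>& pts,
                                          std::size_t p, double eps) {
    std::vector<std::size_t> result;
    for (std::size_t q = 0; q < pts.size(); ++q) {
        double dx = pts[p].x - pts[q].x, dy = pts[p].y - pts[q].y;
        if (q != p && std::sqrt(dx * dx + dy * dy) <= eps)
            result.push_back(q);
    }
    return result;
}

// Expand one cluster from a seed core point; labels[i] == 0 means unlabeled.
void expandCluster(const std::vector<Point>& pts, std::size_t seed,
                   double eps, std::size_t minPts,
                   std::vector<int>& labels, int clusterId) {
    std::queue<std::size_t> frontier;
    frontier.push(seed);
    labels[seed] = clusterId;
    while (!frontier.empty()) {
        std::size_t p = frontier.front();
        frontier.pop();
        std::vector<std::size_t> nbrs = neighbors(pts, p, eps);
        if (nbrs.size() <= minPts) continue;   // p is not a core point
        for (std::size_t q : nbrs) {           // core point: absorb neighbors
            if (labels[q] == 0) {
                labels[q] = clusterId;
                frontier.push(q);              // re-test q on a later pass
            }
        }
    }
}
```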

Scaling DBSCAN
o PDBSCAN (1999) [2]
  o Quality equivalent to single-node DBSCAN
  o Linear speedup up to 8 nodes
o DBDC (2004) [3]
  o Sacrifices quality
  o ~30x speedup on 15 nodes
o PDSDBSCAN (2012) [4]
  o Quality equivalent to single-node DBSCAN
  o 5675x speedup on 8192 nodes (72 million points)
o Two Map/Reduce attempts (2011, 2012)
  o Quality equivalent to single-node DBSCAN
  o 6x speedup on 12 nodes
[2] X. Xu et al., A fast parallel clustering algorithm for large spatial databases (1999)
[3] E. Januzaj et al., DBDC: Density Based Distributed Clustering (2004)
[4] M. Patwary et al., A new scalable parallel DBSCAN algorithm using the disjoint-set data structure (2012)

Challenges of scaling DBSCAN
o Data distribution
  o How do we effectively take an input file and create partitions that can be clustered by DBSCAN?
  o Answer: a distributed 2-D partitioner reading from a distributed file system
o Load balancing
  o How do we keep the variance in clustering times across nodes to a minimum?
  o Answer: Dense Box
o Merge
  o How do we reduce the amount of data needed for the merge while keeping accuracy high?
  o Answer: representative points

MRNet – Multicast / Reduction Network
o General-purpose TBON API
  o Network: user-defined topology
  o Stream: logical data channel
    o to a set of back-ends
    o multicast, gather, and custom reduction
  o Packet: collection of data
  o Filter: stream data operator
    o synchronization
    o transformation
o Widely adopted by HPC tools
  o CEPBA toolkit
  o Cray ATP & CCDB
  o Open|SpeedShop & CBTF
  o STAT
  o TAU
(Figure: a TBON with a front-end (FE), communication processes (CP) applying a filter F(x1,…,xn), and back-ends (BE) attached to the application)

TBON Computation
Ideal characteristics:
o Filter output size constant or decreasing
o Computation rate similar across levels
o Adjustable for load balance
(Figure: example tree with 10 MB of data per BE and packet sizes ≤ 10 MB at every level; per-level times of ~10 sec vs. ~40 sec lead to total times of ~30 sec vs. ~60 sec)
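To make the "output size constant or decreasing" property concrete, here is a toy C++ sketch of a level-by-level tree reduction; it illustrates the TBON idea only and is not the MRNet API. Packet, mergeFilter, and reduceTree are names assumed for this example.

```cpp
// As long as a filter's output is no larger than one of its input packets,
// the data volume handled per process stays bounded as the tree deepens.
#include <algorithm>
#include <cstddef>
#include <vector>

struct Packet { std::vector<double> payload; };

// A well-behaved filter: output size <= max input size (here, an
// element-wise reduction across the children's payloads).
Packet mergeFilter(const std::vector<Packet>& children) {
    Packet out;
    std::size_t width = 0;
    for (const Packet& c : children) width = std::max(width, c.payload.size());
    out.payload.assign(width, 0.0);
    for (const Packet& c : children)
        for (std::size_t i = 0; i < c.payload.size(); ++i)
            out.payload[i] += c.payload[i];
    return out;
}

// Apply the filter level by level, with a fan-in of `arity` children per parent.
Packet reduceTree(std::vector<Packet> level, std::size_t arity) {
    while (level.size() > 1) {
        std::vector<Packet> next;
        for (std::size_t i = 0; i < level.size(); i += arity) {
            std::size_t end = std::min(i + arity, level.size());
            next.push_back(mergeFilter(
                std::vector<Packet>(level.begin() + i, level.begin() + end)));
        }
        level.swap(next);
    }
    return level.front();
}
```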

Intro to Mr. Scan
Mr. Scan phases:
o Partition: distributed
o DBSCAN: GPU (BE)
o Merge: CPU (x #levels)
o Sweep: CPU (x #levels)
(Figure: MRNet tree with the FE, CPs performing Merge, and BEs running DBSCAN against the file system)

Mr. Scan Architecture
Clustering 6.5 billion points, from time 0 to 18.2 minutes:
o Partitioner: FS read 224 secs, FS write 489 secs
o MRNet startup: 130 secs
o DBSCAN: FS read 24 secs, DBSCAN 168 secs
o Merge & Sweep: merge 6 secs, sweep 4 secs, write output 19 secs

Partition Phase
o Goal: partitions that are computationally equivalent for DBSCAN
o Algorithm:
  o Form initial partitions
  o Add shadow regions
  o Rebalance
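As an illustration of shadow regions, the sketch below splits a 2-D bounding box into equal strips and grows each strip by Eps so that boundary points are visible to both neighboring partitions. This is a simplification under assumed names (Box, Partition, makeStrips); the real Mr. Scan partitioner is distributed and also rebalances partitions.

```cpp
// Each back-end owns a region but clusters the region grown by Eps, so
// clusters that cross a boundary can be merged correctly later.
#include <vector>

struct Box { double xmin, ymin, xmax, ymax; };

struct Partition {
    Box owned;    // region this back-end is responsible for
    Box extended; // owned region grown by eps: includes the shadow region
};

std::vector<Partition> makeStrips(const Box& space, int nParts, double eps) {
    std::vector<Partition> parts;
    double width = (space.xmax - space.xmin) / nParts;
    for (int i = 0; i < nParts; ++i) {
        Box owned{space.xmin + i * width, space.ymin,
                  space.xmin + (i + 1) * width, space.ymax};
        Box ext{owned.xmin - eps, owned.ymin - eps,
                owned.xmax + eps, owned.ymax + eps};
        parts.push_back({owned, ext});
    }
    return parts;
}
```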

Distributed Partitioner

GPU DBSCAN Filter
DBSCAN is performed in two distinct steps:
o Step 1: Detect core points
o Step 2: Expand core points and color
(Figure: both steps run across GPU thread blocks, e.g. blocks 1..900 of 512 threads each)
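The following sketch shows the same two-step decomposition as plain C++ loops, since the GPU kernels themselves are not given here: step 1 counts Eps-neighbors to mark core points, and step 2 repeatedly lets every core point color its neighborhood until labels stop changing. On the GPU, each loop over points would map to threads (the figure's blocks of 512 threads); all names (Pt, detectCorePoints, expandAndColor) are assumptions for the example, and the propagation-based expansion is one possible realization, not necessarily Mr. Scan's.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };

inline bool withinEps(const Pt& a, const Pt& b, double eps) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy <= eps * eps;
}

// Step 1: one "thread" per point counts its Eps-neighbors.
void detectCorePoints(const std::vector<Pt>& pts, double eps,
                      std::size_t minPts, std::vector<char>& isCore) {
    isCore.assign(pts.size(), 0);
    for (std::size_t p = 0; p < pts.size(); ++p) {  // parallel over points on the GPU
        std::size_t count = 0;
        for (std::size_t q = 0; q < pts.size(); ++q)
            if (q != p && withinEps(pts[p], pts[q], eps)) ++count;
        isCore[p] = (count > minPts);
    }
}

// Step 2: every core point colors its neighbors with the smaller label;
// repeated sweeps propagate labels until nothing changes (a GPU-friendly
// alternative to a sequential frontier expansion).
void expandAndColor(const std::vector<Pt>& pts, const std::vector<char>& isCore,
                    double eps, std::vector<int>& label) {
    label.resize(pts.size());
    for (std::size_t p = 0; p < pts.size(); ++p) label[p] = static_cast<int>(p);
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t p = 0; p < pts.size(); ++p) {
            if (!isCore[p]) continue;
            for (std::size_t q = 0; q < pts.size(); ++q) {
                if (withinEps(pts[p], pts[q], eps) && label[q] > label[p]) {
                    label[q] = label[p];
                    changed = true;
                }
            }
        }
    }
}
```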

Dense Box
One significant scalability issue is dealing with dense regions of data:
o Density increases the computation cost of DBSCAN, since it requires more comparison operations
We reduce the computation cost of high-density regions by pre-clustering them:
o Build a KD-tree and examine each leaf bounding box, looking for boxes with point count > MinPts and size < 0.35 * Eps
o DBSCAN no longer needs to expand these regions
(Figure: example regions R1 and R2 of differing density)
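A minimal sketch of the dense-box test as stated above, assuming a KD-tree whose leaves carry a bounding box and a point count (the tree construction itself is omitted, and Leaf/isDenseBox are names chosen for the example):

```cpp
#include <algorithm>
#include <cstddef>

struct Leaf {
    double xmin, ymin, xmax, ymax;  // leaf bounding box
    std::size_t pointCount;         // points stored in this leaf
};

bool isDenseBox(const Leaf& leaf, double eps, std::size_t minPts) {
    double longestSide = std::max(leaf.xmax - leaf.xmin, leaf.ymax - leaf.ymin);
    // A box with sides below 0.35 * Eps has a diagonal shorter than Eps, so
    // all of its points are within Eps of one another; if it also holds more
    // than MinPts points, the whole box can be pre-clustered as one dense
    // region and DBSCAN never has to expand its points individually.
    return leaf.pointCount > minPts && longestSide < 0.35 * eps;
}
```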

Merge Algorithm
o Merge overlapping clusters found on different nodes
o Two steps in the merge operation:
  1. Select representative points (BE)
  2. Merge operation

Representative Points
o These are points that represent the core points in the dataset.
o They create a boundary within which at least one core point shared between overlapping clusters must be contained.
Representative points are the points closest to the corners and the middles of the sides of the Eps box. These points create a boundary (the shaded region in the figure) into which a point must fall in order to merge overlapping clusters.
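A hedged sketch of one way to pick representative points per the description above: for a core point's Eps box, keep the neighbor closest to each of the four corners and the four side midpoints, so a fixed, small set of points stands in for the whole neighborhood during the merge. The names P2 and selectRepresentatives, and the eight-anchor layout, are assumptions for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

struct P2 { double x, y; };

static double dist2(const P2& a, const P2& b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// candidates: points within Eps of `core` (its Eps-neighborhood).
std::vector<P2> selectRepresentatives(const P2& core, double eps,
                                      const std::vector<P2>& candidates) {
    // Eight anchors on the Eps box: four corners and four side midpoints.
    const P2 anchors[8] = {
        {core.x - eps, core.y - eps}, {core.x + eps, core.y - eps},
        {core.x - eps, core.y + eps}, {core.x + eps, core.y + eps},
        {core.x, core.y - eps}, {core.x, core.y + eps},
        {core.x - eps, core.y}, {core.x + eps, core.y}};
    std::vector<P2> reps;
    for (const P2& a : anchors) {
        double best = std::numeric_limits<double>::max();
        P2 bestPt{core.x, core.y};  // fall back to the core point itself
        for (const P2& c : candidates) {
            double d = dist2(a, c);
            if (d < best) { best = d; bestPt = c; }
        }
        reps.push_back(bestPt);
    }
    return reps;
}
```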

Merge Algorithm
The merge algorithm is responsible for merging overlapping clusters detected on different DBSCAN nodes. It must handle the merge with low overhead and without access to the full dataset.
o Case 1: Core/core overlap - the clusters have a core point in common; 64 operations to detect
o Case 2: Non-core/core overlap - a core point is seen as non-core by one node; MinPts * 2 operations required to detect
(Figure: clusters from Node 1 and Node 2, with core and non-core points, illustrating each case)
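For case 1, here is a sketch of the core/core overlap test: with roughly eight representative points kept per cluster boundary, checking whether two clusters share a core point takes at most 8 x 8 = 64 comparisons, which matches the operation count on the slide. The non-core/core case is omitted, and RepSet/coreCoreOverlap are assumed names.

```cpp
#include <cmath>
#include <vector>

struct RP { double x, y; };
using RepSet = std::vector<RP>;   // ~8 representative points per cluster side

static bool equalWithin(const RP& a, const RP& b, double tol) {
    return std::fabs(a.x - b.x) <= tol && std::fabs(a.y - b.y) <= tol;
}

// True if the two clusters share a representative core point and should
// therefore be merged into one cluster id at this tree level.
bool coreCoreOverlap(const RepSet& a, const RepSet& b, double tol = 1e-9) {
    for (const RP& pa : a)
        for (const RP& pb : b)
            if (equalWithin(pa, pb, tol)) return true;
    return false;
}
```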

Sweep Step
o Gets cluster identifiers and file offsets down to the BEs so they can write the final clusters.
o The FE gives each cluster a unique ID and a file offset.
o This data is passed back down to the BEs that hold the data in the cluster.
o The data is written out to disk by the BEs.
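One simple way the FE's bookkeeping could look is sketched below: given the size of each merged cluster, assign consecutive IDs and compute each cluster's starting file offset as a running sum, so the BEs can write their portions of the output file independently. The record layout and the name ClusterSlot are assumptions; the slide does not specify how Mr. Scan computes the offsets.

```cpp
#include <cstddef>
#include <vector>

struct ClusterSlot {
    int id;                 // globally unique cluster id assigned by the FE
    std::size_t offset;     // byte offset of this cluster in the output file
    std::size_t sizeBytes;  // bytes this cluster will occupy
};

std::vector<ClusterSlot> assignOffsets(const std::vector<std::size_t>& clusterSizes) {
    std::vector<ClusterSlot> slots;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < clusterSizes.size(); ++i) {
        slots.push_back({static_cast<int>(i), offset, clusterSizes[i]});
        offset += clusterSizes[i];  // next cluster starts right after this one
    }
    return slots;
}
```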

Experiment Setup
o Dataset: generated data with a distribution taken from real Twitter data
o Measuring:
  o Weak scaling up to 8192 GPUs
  o Strong scaling
  o Quality compared to single-threaded DBSCAN

Results
Weak scaling: a 4096x increase in data and compute results in an 18.48x-31.68x increase in time
(Figure: weak-scaling results)

Results Breakdown – Partition
6.5 billion points: 65.9% of Mr. Scan's time, 94.6% I/O time

Results Breakdown – GPU Cluster Time
(Figure: GPU clustering time breakdown)

Strong Scaling
(Figure: strong-scaling results)

Quality
(Figure: clustering quality compared to single-threaded DBSCAN)

Future Work
o Remove the partitioner's I/O bottleneck
o Multiple dimensions

Conclusion
o Clustered 6.5 billion points with DBSCAN in 18.2 minutes
o Controlled the computational variance of DBSCAN
o Partitioner I/O = scaling enemy

Questions?
A Brief Discussion of Ways and Means

Summary of previous Mr. Scan implementation
Algorithm steps:
o SpatialDecomp: FE
o DBSCAN: CPU or GPU (BE)
o DrawBoundBox: CPU or GPU
o MergeCluster: CPU (x #levels)
(Figure: MRNet tree with the FE, CPs performing MergeCluster, and BEs running DBSCAN)