Presentation transcript:

Venkatram Ramanathan

Outline
- Motivation: Evolution of Multi-Core Machines and the Challenges
- Background: MapReduce and FREERIDE
- Co-clustering on FREERIDE
- Experimental Evaluation
- Conclusion

Motivation
- Performance increase now comes from more cores at lower clock frequencies
- Cost-effective scalability of performance
- HPC environments are clusters of multi-cores

Multi-level parallelism
- Among cores within a node: shared-memory parallelism (Pthreads, OpenMP)
- Across nodes: distributed-memory parallelism (MPI)
- Achieving both programmability and performance is a major challenge
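To make the two levels concrete, here is a minimal hybrid MPI+OpenMP sketch (illustrative only, not from the original slides): OpenMP threads reduce within each node, MPI combines across nodes.

```cpp
// Hybrid parallel sum: shared-memory parallelism inside a node (OpenMP),
// distributed-memory parallelism across nodes (MPI).
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Each MPI process (one per node) owns a chunk of the data.
    std::vector<double> chunk(1000000, 1.0);

    double local_sum = 0.0;
    // Level 1: threads within the node.
    #pragma omp parallel for reduction(+ : local_sum)
    for (long i = 0; i < (long)chunk.size(); ++i)
        local_sum += chunk[i];

    // Level 2: combine partial results across nodes.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("sum = %f\n", global_sum);
    MPI_Finalize();
    return 0;
}
```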

Possible solution: use higher-level, restricted APIs
- Reduction-based APIs such as Map-Reduce
- A single higher-level API can program a whole cluster of multi-cores
- Their expressive power is considered limited, so the question is how to express computations using reduction-based APIs

MapReduce
- Map(in_key, in_value) -> list(out_key, intermediate_value)
- Reduce(out_key, list(intermediate_value)) -> list(out_value)
FREERIDE
- Users explicitly declare a reduction object and update it
- Map and Reduce steps are combined
- Each data element is processed and reduced before the next element is processed
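The contrast can be sketched in code. Below is a minimal rendering of the generalized-reduction pattern FREERIDE exposes; the type and function names are illustrative assumptions, not the actual FREERIDE API.

```cpp
// Generalized reduction sketch: the user declares a reduction object
// explicitly, and each element is processed and folded into it before the
// next element is touched, combining the map and reduce steps.
#include <vector>

struct ReductionObject {                 // declared explicitly by the user
    std::vector<double> sums;            // one accumulator per key
    std::vector<long>   counts;
    explicit ReductionObject(int nkeys) : sums(nkeys, 0.0), counts(nkeys, 0) {}
};

int assign_key(double e) { return e < 0.5 ? 0 : 1; }  // stand-in "map" logic

void local_reduction(const std::vector<double>& elements, ReductionObject& robj) {
    for (double e : elements) {
        int k = assign_key(e);   // map-like step: derive the key
        robj.sums[k]  += e;      // reduce-like step: immediate update,
        robj.counts[k] += 1;     // before the next element is processed
    }
}
// The middleware then merges per-thread/per-node reduction objects globally.
```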

Co-clustering
- Simultaneously clusters rows into row clusters and columns into column clusters
- Maximizes the mutual information retained by the clustering
- Uses Kullback-Leibler divergence
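For reference, the standard information-theoretic co-clustering formulation (due to Dhillon et al.) makes the connection between mutual information and KL divergence explicit:

```latex
% Rows X are grouped into row clusters \hat{X}, columns Y into column
% clusters \hat{Y}. Co-clustering minimizes the loss in mutual information,
% which equals a KL divergence:
\min_{\hat{X},\hat{Y}} \; I(X;Y) - I(\hat{X};\hat{Y})
    \;=\; D_{KL}\!\left( p(x,y) \,\Vert\, q(x,y) \right),
% where q is the approximation induced by the co-clustering:
\quad q(x,y) = p(\hat{x},\hat{y})\, p(x \mid \hat{x})\, p(y \mid \hat{y}).
```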

Data distribution
- The input matrix and its transpose are precomputed
- Both are divided into files and distributed among the nodes
- Each node holds the same amount of row and column data
- rowCL and colCL (the row and column cluster assignments) are replicated on all nodes
- Initial clustering is assigned in round-robin fashion for consistency across nodes (see the sketch below)
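A minimal sketch of the round-robin initialization: because it is deterministic, every node derives identical replicated copies of rowCL and colCL without any communication. Variable names are assumptions.

```cpp
// Round-robin initial clustering: item i goes to cluster i mod k.
#include <vector>

std::vector<int> round_robin_init(int num_items, int num_clusters) {
    std::vector<int> assignment(num_items);
    for (int i = 0; i < num_items; ++i)
        assignment[i] = i % num_clusters;   // same result on every node
    return assignment;
}

// e.g. rowCL = round_robin_init(num_rows, num_row_clusters);
//      colCL = round_robin_init(num_cols, num_col_clusters);
```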

Preprocessing
- pX and pY (the row and column marginals) must be normalized by the total sum, so normalization waits until all nodes have processed their data
- Each node calculates pX and pY from its local data
- The reduction object is updated with the partial sum and the pX and pY values
- Accumulating the partial sums yields the total sum, and pX and pY are normalized
- xnorm and ynorm are calculated in a second iteration, since they need the total sum
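A sketch of how this accumulation might look; the structure and names are assumptions, not FREERIDE code.

```cpp
// Each node accumulates unnormalized marginals and a partial total from its
// local rows; once the middleware has summed the reduction objects across
// nodes, pX and pY are normalized by the global total.
#include <vector>

struct MarginalsRObj {
    std::vector<double> pX, pY;   // unnormalized row/column marginals
    double total = 0.0;           // partial sum of all local entries
    MarginalsRObj(int nrows, int ncols) : pX(nrows, 0.0), pY(ncols, 0.0) {}
};

void accumulate(const std::vector<std::vector<double>>& local_rows,
                const std::vector<int>& global_row_ids, MarginalsRObj& r) {
    for (size_t i = 0; i < local_rows.size(); ++i)
        for (size_t j = 0; j < local_rows[i].size(); ++j) {
            double v = local_rows[i][j];
            r.pX[global_row_ids[i]] += v;   // row marginal
            r.pY[j]                 += v;   // column marginal
            r.total                 += v;   // partial sum
        }
}
// After global accumulation: pX[i] /= total; pY[j] /= total.
```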

Compressed matrix
- A matrix of size #rowclusters x #colclusters is calculated from local data
- Each entry sums the values of one row cluster across one column cluster
- The final compressed matrix is the sum of the local compressed matrices
- Local compressed matrices are updated in the reduction object; accumulation produces the final compressed matrix
- Cluster centroids are then calculated
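A sketch of the local compressed-matrix computation under the same assumed data layout as above:

```cpp
// Entry (I, J) sums all local values whose row belongs to row cluster I and
// whose column belongs to column cluster J. Summing the per-node matrices
// in the reduction object yields the global compressed matrix.
#include <vector>

using Matrix = std::vector<std::vector<double>>;

Matrix compress(const Matrix& local_rows, const std::vector<int>& global_row_ids,
                const std::vector<int>& rowCL, const std::vector<int>& colCL,
                int k_row, int k_col) {
    Matrix c(k_row, std::vector<double>(k_col, 0.0));
    for (size_t i = 0; i < local_rows.size(); ++i)
        for (size_t j = 0; j < local_rows[i].size(); ++j)
            c[rowCL[global_row_ids[i]]][colCL[j]] += local_rows[i][j];
    return c;   // updated into the reduction object and summed across nodes
}
```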

Row clustering iteration
- Reassign each row to a cluster, as determined by Kullback-Leibler divergence (sketched below), and update the reduction object
- Compute the compressed matrix and update the reduction object
- Column clustering proceeds similarly
- Finalize the objective function, then start the next iteration
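The reassignment can be rendered as picking, for each row, the cluster whose centroid distribution minimizes the KL divergence. This is an illustrative sketch, not the exact implementation; distributions are assumed normalized, and zero entries are skipped for simplicity.

```cpp
// Reassign a row to the cluster minimizing KL(row || centroid).
#include <cmath>
#include <limits>
#include <vector>

int best_cluster(const std::vector<double>& row,
                 const std::vector<std::vector<double>>& centroids) {
    int best = 0;
    double best_kl = std::numeric_limits<double>::infinity();
    for (size_t c = 0; c < centroids.size(); ++c) {
        double kl = 0.0;
        for (size_t j = 0; j < row.size(); ++j)
            if (row[j] > 0.0 && centroids[c][j] > 0.0)   // skip zeros
                kl += row[j] * std::log(row[j] / centroids[c][j]);
        if (kl < best_kl) { best_kl = kl; best = static_cast<int>(c); }
    }
    return best;
}
```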

Experimental setup
- The same algorithm is used for shared-memory, distributed-memory, and hybrid parallelization
- Experiments were conducted on 2 clusters:
- env1: 8 nodes, Intel Xeon E5345 quad-core CPUs (2.33 GHz), 6 GB main memory
- env2: 4 nodes, AMD Opteron 8350 CPUs (8 cores), 16 GB main memory

Datasets
- 2 datasets: 1 GB (16k x 16k matrix) and 4 GB (32k x 32k matrix)
- Each dataset and its transpose were split into 32 files (row partitioning) and distributed among the nodes
- Number of row and column clusters: 4

Results
- For the smaller dataset, the preprocessing stage is a bottleneck because it is not compute intensive: speedup without preprocessing is higher than speedup with it
- For the larger dataset, preprocessing involves more computation and scales well: speedup is the same with and without preprocessing

Conclusions
- FREERIDE offers the following advantages: no need to load data into custom file systems; a C/C++ based framework; much better performance (shown in comparisons for other algorithms)
- Co-clustering can be viewed as a generalized reduction, which allowed it to be implemented on FREERIDE
- The implementation achieves a speedup of 21 on 32 cores