
Translating Chapel to Use FREERIDE: A Case Study in Using an HPC Language for Data-Intensive Computing
Bin Ren, Gagan Agrawal, Brad Chamberlain, Steve Deitz

Outline
- Background
- Chapel and Reduction Support
- FREERIDE Middleware
- Transformation Issues and Implementation
- Experiments
- Conclusion

Background
Data-Intensive SuperComputing:
- Data sizes: increasingly large
- Data analysis: large-scale computations
- Multi-core and many-core applications
New programming paradigms:
- Map-Reduce and similar programming models
- High-level languages

Map-Reduce Programming Model
- A set of high-level APIs that hide the low-level communication details
- Easy to write parallel programs: map and reduce
- Suitable for large-scale data processing
FREERIDE
- Shares a similar processing structure
- Uses an explicit, user-defined reduction-object data structure
- Has outperformed Map-Reduce for a sub-class of data-intensive applications
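To make the contrast concrete, below is a minimal C sketch of the generalized-reduction style that FREERIDE supports, using a k-means-like accumulation. The type and function names (reduction_object, local_reduce, global_combine, closest_cluster) are illustrative assumptions, not FREERIDE's actual API.

#include <stddef.h>

/* Illustrative reduction object for k-means-like processing:
   running sums and counts per cluster, updated in place. */
enum { K = 100 };
typedef struct {
    double sum[K];
    int    count[K];
} reduction_object;

/* Local reduction: each thread/node scans its data chunk and folds
   each element directly into the reduction object. No (key, value)
   pairs are emitted, so no sorting, grouping, or shuffling is needed. */
static void local_reduce(reduction_object *ro,
                         const double *chunk, size_t len,
                         int (*closest_cluster)(double)) {
    for (size_t i = 0; i < len; i++) {
        int k = closest_cluster(chunk[i]);
        ro->sum[k]   += chunk[i];
        ro->count[k] += 1;
    }
}

/* Global reduction: merge per-thread/per-node objects pairwise. */
static void global_combine(reduction_object *dst,
                           const reduction_object *src) {
    for (int k = 0; k < K; k++) {
        dst->sum[k]   += src->sum[k];
        dst->count[k] += src->count[k];
    }
}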

High-Level Programming Languages
Data-intensive computing languages:
- Sawzall from Google
- Pig Latin from Yahoo
- …
- Built on Map-Reduce or similar programming models, providing higher-level programming logic
General HPC languages:
- Chapel from Cray
- X10 from IBM
- A separate effort from the above, with more general applicability

Motivation
Questions:
- Are HPC languages suitable for expressing data-intensive computations?
- Can we exploit the high productivity of general HPC languages without a large performance penalty on data-intensive applications?
Our method:
- Start from Chapel: its high-level abstractions improve productivity
- Invoke FREERIDE through a compilation framework: its C libraries ensure good performance

Chapel & Reduction Support
Selected features of Chapel:
- A high-level programming language that compiles to C
- Supports calls to C via extern declarations
- Supports built-in and user-defined reduction operations at multiple levels of abstraction
Local-view abstraction:
- Straightforward and flexible
- Users must handle the low-level communication details themselves
Global-view abstraction:
- Built-in reduction model
- Users only need to implement the exposed functions

Chapel Reduction Example
(Chapel code figure not reproduced here.) The user-defined reduction exposes three functions:
- accumulate: local reduction
- combine: global reduction
- generate: post-processing
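Since the slide's Chapel listing did not survive, here is the same three-function shape rendered in C for a simple sum reduction, as a stand-in. This only illustrates the structure of the interface; it is neither Chapel syntax nor FREERIDE's API.

/* Illustrative C rendering of Chapel's user-defined reduction
   interface (accumulate / combine / generate), for a sum. */
typedef struct { double value; } sum_reduce;

/* accumulate: fold one input element into the local state */
static void accumulate(sum_reduce *r, double x) {
    r->value += x;
}

/* combine: merge another task's local state into this one */
static void combine(sum_reduce *r, const sum_reduce *other) {
    r->value += other->value;
}

/* generate: produce the final result from the reduced state */
static double generate(const sum_reduce *r) {
    return r->value;
}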

FREERIDE Middleware
- Both FREERIDE and Map-Reduce have two stages: local reduction / map, then global reduction / reduce
- FREERIDE maintains an explicit, user-defined reduction object to represent the intermediate state
- Map-Reduce maintains (key, value) pairs as the intermediate result; sorting, grouping, and shuffling them introduces a large amount of overhead

Chapel and FREERIDE User Code
(Side-by-side user-code figure not reproduced here.)

Transformation Issues and Implementation
Three transformations:
- Invoke the split function: transform the hierarchical data structures in Chapel into dense memory buffers in C
- Call the reduction function to update the reduction object: map operations on Chapel data to FREERIDE data
- Call the combine function: use the default combine function
Two algorithms are emphasized: linearization and mapping (sketched on the following slides).

Transformation Issues and Implementation
(Diagram: a hierarchical array data[l], where each element holds b1[n] and b2, and each b1 element holds a1[m] and a2, is flattened by the linearizing algorithm into a dense buffer Linear_data[] in the order a1[0] … a1[m-1], a2, …, b2; the mapping algorithm translates indices back into buffer offsets.)

Transformation Issues and Implementation: Linearization Algorithm
Two-stage recursive algorithm:
- Compute the size of the whole memory buffer
- Copy the actual data from the high-level data structure into the memory buffer
Different strategies for different types:
- Primitive types
- Iterative (array) types and record types
Collect the information needed by the mapping algorithm:
- Data unit size at each level
- Starting offset of each member
A sketch of the two stages appears below.
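As a concrete illustration, here is a minimal C sketch of the two stages for the data[l] layout in the earlier diagram (records holding n sub-records, which in turn hold m reals). The struct shapes and helper names are assumptions for illustration; the real code is produced by the compiler.

#include <stddef.h>
#include <string.h>

/* Hypothetical hierarchical layout from the diagram. */
typedef struct { double *a1; double a2; } A;  /* a1 has m elements */
typedef struct { A *b1; double b2; } B;       /* b1 has n elements */

/* Stage 1: compute the size of the whole memory buffer. */
static size_t linear_size(size_t l, size_t n, size_t m) {
    size_t sizeA = m * sizeof(double) + sizeof(double); /* a1[] then a2 */
    size_t sizeB = n * sizeA + sizeof(double);          /* b1[] then b2 */
    return l * sizeB;
}

/* Stage 2: copy the actual data depth-first into the dense buffer,
   in the order a1[0..m-1], a2 (repeated n times), then b2, for each
   of the l top-level elements. */
static void linearize(char *buf, const B *data,
                      size_t l, size_t n, size_t m) {
    char *p = buf;
    for (size_t i = 0; i < l; i++) {
        for (size_t j = 0; j < n; j++) {
            memcpy(p, data[i].b1[j].a1, m * sizeof(double));
            p += m * sizeof(double);
            memcpy(p, &data[i].b1[j].a2, sizeof(double));
            p += sizeof(double);
        }
        memcpy(p, &data[i].b2, sizeof(double));
        p += sizeof(double);
    }
}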

Transformation Issues and Implementation: Information Collected During Linearization
For the example above (three levels):
- levels = 3
- unitSize[levels] = { unitSize_B, unitSize_A, sizeof(real) }
- unitOffset[levels - 1][2] = { unitOffset_B[], unitOffset_A[] }
- unitOffset_B[2] = { 0, unitSize_A * n }
- unitOffset_A[2] = { 0, sizeof(real) * m }

Transformation Issues and Implementation: Mapping Algorithm
Recursive algorithm. Basic idea:
- Start from the outermost level and terminate at the innermost
- At each level, compute the offset from the index at that level and the position information collected during the linearization stage
A sketch follows.
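Below is a minimal C sketch of this offset computation, using the unitSize and unitOffset tables from the previous slide; the signature and the member-selection encoding are assumptions for illustration.

#include <stddef.h>

/* Accumulate the byte offset of an element in the dense buffer by
   walking from the outermost level inward. At each level we skip
   idx[lvl] whole units, then (except at the innermost, primitive
   level) jump to the chosen member inside the unit. */
static size_t map_offset(int levels,
                         const size_t *unitSize,        /* one per level */
                         const size_t (*unitOffset)[2], /* per level, per member */
                         const size_t *idx,             /* index at each level */
                         const int *member)             /* member chosen at each level */
{
    size_t off = 0;
    for (int lvl = 0; lvl < levels; lvl++) {
        off += idx[lvl] * unitSize[lvl];
        if (lvl < levels - 1)
            off += unitOffset[lvl][member[lvl]];
    }
    return off;
}

For example, the element data[i].b1[j].a1[k] would use idx = {i, j, k} and member = {0, 0}, giving i * unitSize_B + j * unitSize_A + k * sizeof(real), since unitOffset_B[0] and unitOffset_A[0] are both 0.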

Transformation Issues and Implementation: Optimizations
Two levels of optimization:
- Classic optimization
- Adaptive optimization: for instance, in k-means the clusters are also a frequently accessed hierarchical data structure, so we can linearize them as well

Experiments: Configuration
- CPU: Intel Xeon E5345, two quad-core processors at 2.33 GHz
- Memory: 6 GB
- OS: 64-bit Linux
Terms used in the plots:
- generated: compiler-generated C code with FREERIDE
- opt-1: the version with classic optimization
- opt-2: the version with adaptive optimization
- manual FR: hand-written FREERIDE user code

Experiments: K-means
Data size = 12 MB, k = 100, iterations = 10
- Scalability is good
- The benefit of adaptive optimization is more pronounced here
- Overhead relative to the manual FREERIDE version is within 20%

Experiments: K-means
(Plots: data size = 1.2 GB, k = 10, iterations = 10 on the left; data size = 1.2 GB, k = 100, iterations = 1 on the right.)

Experiments: PCA
(Plots: rows = 1000, columns = 10,000 on the left; rows = 1000, columns = 100,000 on the right.)

Conclusion
- Presented a case study on the possible use of a new HPC language for data-intensive computations
- Showed how to transform the reduction features of Chapel down to the FREERIDE middleware
- Combined the productivity of a high-level language with the performance of a specialized runtime system

Thank you for your attention! Any questions?