Weekly Report. Ph.D. Student: Leo Lee. Date: Oct. 9, 2009.


Outline –Courses –Research –Work plan

Courses Data mining –Homework; –Hidden Markov Model: read the classic tutorial; forward-backward procedure; Viterbi algorithm.
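The Viterbi algorithm mentioned above can be sketched in a few lines of Python. This is a generic textbook-style implementation applied to a standard toy healthy/fever HMM of my own choosing, not code from the tutorial:

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = probability of the most likely state path ending in s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor state for s at time t
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back the most likely path from the best final state
    prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return prob, path

# Toy example: infer hidden health states from observed symptoms
states = ("Healthy", "Fever")
obs = ("normal", "cold", "dizzy")
start_p = {"Healthy": 0.6, "Fever": 0.4}
trans_p = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
           "Fever":   {"Healthy": 0.4, "Fever": 0.6}}
emit_p = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
          "Fever":   {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
prob, path = viterbi(obs, states, start_p, trans_p, emit_p)
```

The dynamic program keeps only the best path probability per state per step, so it runs in O(T * |S|^2) rather than enumerating all state sequences.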

Courses Network security –Checked the homework; –Modified the tutorial for next week; –Learned C# and developed a simple chat application.

Outline –Courses –Research –Work plan

Research

Mars: A MapReduce Framework on Graphics Processors Introduction –For search engines and other web server applications, high performance is essential. –The MapReduce framework is a successful paradigm to support such data processing applications, which reduces the complexity of parallel programming. –Encouraged by the success of the CPU-based MapReduce frameworks, we develop Mars, a MapReduce framework on graphics processors, or GPUs.
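The MapReduce paradigm the slide refers to can be illustrated with a minimal word-count sketch. This is my own CPU-side Python illustration of the programming model; Mars itself runs the map and reduce stages as GPU kernels:

```python
from collections import defaultdict

def map_fn(doc):
    # Map: emit a (word, 1) pair for every word in the document
    for word in doc.split():
        yield (word.lower(), 1)

def reduce_fn(key, values):
    # Reduce: sum all counts emitted for one word
    return (key, sum(values))

def mapreduce(docs, mapper, reducer):
    groups = defaultdict(list)
    for doc in docs:                      # map phase
        for k, v in mapper(doc):
            groups[k].append(v)
    # group-by-key, then reduce phase
    return dict(reducer(k, vs) for k, vs in groups.items())

counts = mapreduce(["a b a", "b c"], map_fn, reduce_fn)
```

The framework owns the grouping and scheduling; the developer supplies only the two small functions, which is the "reduced complexity of parallel programming" the slide mentions.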

Mars: A MapReduce Framework on Graphics Processors Introduction –Since GPUs are traditionally designed as special-purpose co-processors for gaming applications, their languages lack support for some basic programming constructs: variable-length data types; more complex functions such as recursion. –GPU architectural details are highly vendor-specific, and programmers have limited access to these details. –All these factors make GPU programming a difficult task in general, and more so for complex tasks such as web data analysis. Therefore, we propose to develop a MapReduce framework on the GPU so that programmers can easily harness the GPU's computation power for their data processing tasks.

Mars: A MapReduce Framework on Graphics Processors Introduction –First, the synchronization overhead must be low so that the system can scale to hundreds of processors. –Second, due to the lack of dynamic thread scheduling on current GPUs, it is essential to allocate work evenly across threads on the GPU to exploit its massive thread parallelism. –Third, the core tasks of MapReduce programs, including string processing, file manipulation and concurrent reads and writes, are unconventional to GPUs and must be handled efficiently.

Mars: A MapReduce Framework on Graphics Processors Preliminaries and overview –GPUs –GPGPU –MapReduce

Mars: A MapReduce Framework on Graphics Processors Design and implementation –Ease of programming. Ease of programming encourages developers to use the GPU for their tasks. –Performance. The overall performance of our GPU-based MapReduce should be comparable to or better than that of the state-of-the-art CPU counterparts.

Mars: A MapReduce Framework on Graphics Processors Design and implementation-APIs –User-implemented

Mars: A MapReduce Framework on Graphics Processors Design and implementation-APIs –System-provided

Mars: A MapReduce Framework on Graphics Processors System Workflow and Configuration
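As I understand the Mars workflow, the framework works around the GPU's lack of dynamic memory allocation by running a counting pass first, turning the per-thread output sizes into write offsets with a prefix sum, and then letting every thread write its results at its precomputed offset, so no locks are needed. A CPU-side Python sketch of that idea (the function names are mine, not the Mars API):

```python
from itertools import accumulate

def exclusive_prefix_sum(sizes):
    # Each "thread" writes at the total size of all earlier threads' output
    return [0] + list(accumulate(sizes))[:-1]

def two_pass_map(records, count_fn, map_fn):
    # Pass 1: every thread only reports how many items it will emit
    sizes = [count_fn(r) for r in records]
    offsets = exclusive_prefix_sum(sizes)
    # Pass 2: allocate the exact output buffer once, then each thread
    # writes into its own disjoint slice -- conflict-free, no locking
    out = [None] * sum(sizes)
    for rec, off in zip(records, offsets):
        for i, item in enumerate(map_fn(rec)):
            out[off + i] = item
    return out

result = two_pass_map(
    ["a b", "c d e"],
    lambda r: len(r.split()),   # counting pass
    lambda r: r.split(),        # real map pass
)
```

The cost is running the map logic (or a cheap size estimate of it) twice, traded for deterministic, synchronization-free writes across thousands of GPU threads.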

Mars: A MapReduce Framework on Graphics Processors Optimization Techniques –Coalesced accesses –Accesses using built-in vector types: char4 and int4?
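The benefit of coalesced accesses can be illustrated by counting how many memory segments a single warp's loads fall into. This is my own simplified model (real GPU coalescing rules are more detailed and generation-specific), not code from the paper:

```python
def segments_touched(indices, seg_size=32):
    # Number of distinct memory segments a warp's loads fall into;
    # fewer segments means fewer memory transactions
    return len({i // seg_size for i in indices})

warp = range(32)  # 32 threads in a warp
coalesced = segments_touched([t for t in warp])       # thread t reads a[t]
strided   = segments_touched([t * 16 for t in warp])  # thread t reads a[t*16]
```

With contiguous per-thread indices, the whole warp is served from one segment; with a stride of 16, the same 32 loads scatter across 16 segments, which is why layouts and built-in vector types (char4, int4) that keep a warp's accesses adjacent matter so much on the GPU.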

Mars: A MapReduce Framework on Graphics Processors Experimental evaluation (the original slides present the results as figures only)

Mars: A MapReduce Framework on Graphics Processors

Outline –Courses –Research –Work plan

Work Plan –Continue paper reading; –Learn more CUDA applications; –Work hard on data mining and try to implement some classical algorithms; –Continue learning C#.

Thank you for listening.