Applying GPU and POSIX Thread Technologies in Massive Remote Sensing Image Data Processing By: Group 17 King Mongkut's Institute of Technology Ladkrabang

Why? High-resolution graphics still not smooth? 1. Buy a better graphics card 2. Install the newest driver 3. Better GPU & CPU management!!

What? This paper applies GPU & POSIX thread technologies. POSIX = the IEEE thread standard, a thread API available on many OSes (Pthreads). This paper = GPU + thread techniques.

Overview 1. Solve? 2. Technique 3. Process 4. Real App

1.Solve?

Problem: Remote sensing image data are massive! High resolution = slow processing = can't run in real time. Goal: process the images faster.

2.Technique

2.Technique (1/4) 1. CUDA 2. Block and Tile 3. Dual-parallel Processing

2.Technique (2/4) 1. CUDA (Compute Unified Device Architecture) Is: NVIDIA's parallel computing architecture for general-purpose programming on the GPU. Used by: NVIDIA GPUs since 2007, e.g. GeForce GT 420*
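A minimal sketch (not from the paper; the kernel and its brightness operation are illustrative) of how CUDA extends C: each GPU thread processes one pixel, and the host launches enough 256-thread blocks to cover the image.

    #include <cuda_runtime.h>
    #include <stdio.h>

    /* Illustrative kernel: every thread brightens one pixel, saturating at 255. */
    __global__ void brighten(unsigned char *img, int n, int delta) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
        if (i < n) {
            int v = img[i] + delta;
            img[i] = v > 255 ? 255 : v;
        }
    }

    int main(void) {
        const int n = 1024;                   /* toy image size */
        unsigned char h[1024];
        for (int i = 0; i < n; i++) h[i] = i % 256;

        unsigned char *d;
        cudaMalloc(&d, n);                               /* GPU memory */
        cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);     /* host -> GPU */
        brighten<<<(n + 255) / 256, 256>>>(d, n, 40);    /* parallel launch */
        cudaMemcpy(h, d, n, cudaMemcpyDeviceToHost);     /* GPU -> host */
        cudaFree(d);
        printf("pixel 250 after brighten: %d\n", h[250]);
        return 0;
    }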

2.Technique (3/4) 2. Block and Tile What are Block and Tile? ●Image data are divided into blocks for I/O ●If a block is larger than GPU memory, it cannot be processed in one piece, so it is further split into tiles

2.Technique (4/4) 3.Dual-parallel Processing POSIX threads are applied to perform the I/O step and the GPU processing step simultaneously.
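A minimal sketch of the dual-parallel idea, with stub worker functions standing in for the real I/O and GPU code:

    #include <pthread.h>
    #include <stdio.h>

    /* Stubs: in the real system these read/write blocks and run CUDA kernels. */
    void *io_thread(void *arg)  { puts("I/O: read/write image blocks"); return NULL; }
    void *gpu_thread(void *arg) { puts("GPU: process buffered blocks"); return NULL; }

    int main(void) {
        pthread_t tid_io, tid_gpu;
        pthread_create(&tid_io,  NULL, io_thread,  NULL);  /* disk <-> memory */
        pthread_create(&tid_gpu, NULL, gpu_thread, NULL);  /* memory <-> GPU  */
        pthread_join(tid_io,  NULL);   /* wait for both steps to complete */
        pthread_join(tid_gpu, NULL);
        return 0;
    }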

3.Process

3.Process (1/2)

3.Process (2/2)

4.Real App EXPERIMENT Database PostgreSQL

4.Real App EXPERIMENT 1. CPU 1 (AMD Athlon 3000+, 1.81 GHz) 2. CPU 2 (Intel Core i3 530, 2.93 GHz) 3. GPU (NVIDIA GeForce GT240, 512 MB)

4.Real App Input → Process → Output: the original images (Image 1 – Image 4) are processed into the corresponding output images.

4.Real App Original image vs. processed image.

4.Real App IMAGE 1, IMAGE 2, IMAGE 3, IMAGE 4.

4.Real App Image sizes: Image … * …, Image … * …, Image … * 5480, Image … * 2740.

4.Real App Database PostgreSQL -PostgreSQL is a powerful object-relational database system -More feature-rich than MySQL -CUDA technology is applied directly to process the image blocks stored in the database rapidly, via SQL statements

4.Real App Database PostgreSQL: 1. Use 2. Download 3. Install

4.Real App 1. Use PostgreSQL with PostPIC and PostGIS. Example query: SELECT * FROM images WHERE date(the_img) > '…'::date AND size(the_img) > 1600; PostGIS adds support for geographic objects to PostgreSQL, follows the Simple Features Specification for SQL, and provides spatial types and functions.
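A minimal sketch of issuing the slide's query from C with libpq; the connection string and the date literal are illustrative assumptions (the slide's actual date value was not preserved):

    #include <stdio.h>
    #include <libpq-fe.h>

    int main(void) {
        PGconn *conn = PQconnectdb("dbname=imagedb");    /* hypothetical database */
        if (PQstatus(conn) != CONNECTION_OK) {
            fprintf(stderr, "%s", PQerrorMessage(conn));
            return 1;
        }
        PGresult *res = PQexec(conn,
            "SELECT * FROM images "
            "WHERE date(the_img) > '2010-01-01'::date "  /* illustrative date */
            "AND size(the_img) > 1600;");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("%d matching images\n", PQntuples(res));
        PQclear(res);
        PQfinish(conn);
        return 0;
    }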

4.Real App 2. Download -Download from: …

4.Real App 3.1 Install PostgreSQL

4.Real App 3.2 Install PostGIS

4.Real App PostgreSQL 9.3 + PostGIS: install complete.

Conclusion!

Conclusion 1.Solve? = Processing faster 2.Technique = Dual-parallel processing 3.Process = Flowchart 4.Real App = Speed-up

Conclusion CPU + GPU, Block and Tile, PostgreSQL with PostPIC and PostGIS

Applying GPU and POSIX Thread Technologies in Massive Remote Sensing Image Data Processing By: Group 17 THANK YOU !!

Applying GPU and POSIX Thread Technologies in Massive Remote Sensing Image Data Processing Extended!! 1. Technique (Extended) 2. Process (Extended)

1. Technique X

1. CUDA X 2. Block and Tile X 3. Dual-parallel Processing X

1. Technique X 1. CUDA (Extended)

1. Technique X 1. CUDA (Extended) 1.1 Architecture: the GPU is configured and driven through an API

1. Technique X 1. CUDA (Extended) 1.1 Architecture -> API levels: 1. Language-integration level, e.g. C, C++, … 2. Device level, e.g. assembly
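A hedged sketch of the contrast between the two levels (the kernel and file names are illustrative):

    /* Language-integration level: the CUDA runtime API extends C/C++,
       so a kernel launch looks like a decorated function call:
           scale<<<blocks, threads>>>(d_data, n);
       Device level: the CUDA driver API works near the hardware, loading
       compiled modules and launching functions through explicit handles: */
    // CUmodule   mod;  cuModuleLoad(&mod, "scale.ptx");
    // CUfunction fn;   cuModuleGetFunction(&fn, mod, "scale");
    // void *args[] = { &d_data, &n };
    // cuLaunchKernel(fn, blocks,1,1, threads,1,1, 0, NULL, args, NULL);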

1. Technique X 1. CUDA (Extended) 1.2 Parallelism: CPU and GPU work together

1. Technique X 1. CUDA (Extended) 1.2 Parallelism -> CPU: runs the resource-management program in the OS; allows copying results back and helps with other computation

1. Technique X 1. CUDA (Extended) 1.2 Parallelism -> GPU: the image data are partitioned into a grid for a kernel, and the data in each block are processed by the GPU's threads
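A minimal sketch of such a partition (the block dimensions are illustrative): the image becomes a grid of 16×16 thread blocks, and each thread handles one pixel.

    /* Each thread computes its pixel coordinates from its block and thread IDs. */
    __global__ void process(unsigned char *img, int w, int h) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;   /* pixel column */
        int y = blockIdx.y * blockDim.y + threadIdx.y;   /* pixel row    */
        if (x < w && y < h)
            img[y * w + x] = 255 - img[y * w + x];       /* e.g. invert the pixel */
    }

    /* Launch configuration, rounding up so edge pixels are covered:
       dim3 block(16, 16);
       dim3 grid((w + 15) / 16, (h + 15) / 16);
       process<<<grid, block>>>(d_img, w, h);            */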

1. Technique X 2. Block and Tile (Extended)

1. Technique X 2. Block and Tile (Extended) The block is the I/O unit between sensing image data and the system memory

1. Technique X Block and Tile (cont.) We tested the I/O performance with different block sizes; the results show that the I/O performance of sensing image data declines when blocks grow too large. The block is the I/O unit between the image and system memory; the tile is the I/O unit used to transfer data between the system memory and the GPU memory.

1. Technique X Block and Tile (cont.) In this approach, the block in system memory is partitioned into multiple tiles; the tile is used to transfer data between the system memory and the GPU memory.
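A minimal sketch of the tile transfers (the tile size and the kernel call are assumptions, not the paper's values):

    #include <cuda_runtime.h>

    #define TILE_BYTES (64u * 1024 * 1024)   /* illustrative tile size */

    /* A block already in system memory is moved to the GPU one tile at a
       time, processed, and copied back in place. */
    void process_block(unsigned char *block, size_t block_bytes) {
        unsigned char *d_tile;
        cudaMalloc(&d_tile, TILE_BYTES);
        for (size_t off = 0; off < block_bytes; off += TILE_BYTES) {
            size_t n = block_bytes - off;
            if (n > TILE_BYTES) n = TILE_BYTES;          /* last, shorter tile */
            cudaMemcpy(d_tile, block + off, n, cudaMemcpyHostToDevice);
            /* kernel<<<...>>>(d_tile, n);  process this tile on the GPU */
            cudaMemcpy(block + off, d_tile, n, cudaMemcpyDeviceToHost);
        }
        cudaFree(d_tile);
    }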

1. Technique X 3. Dual-parallel Processing (Extended)

1. Technique X 3. Dual-parallel Processing (Extended) Uses buffer-pool technology. The I/O step and the GPU processing step are independent of each other. Simple and easy to implement if the buffer size equals the block size.
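A sketch of what such a buffer pool can look like; the names, the states, and the pool size are assumptions based on these slides:

    #include <pthread.h>

    /* One state per buffer tells the two threads whose turn it is. */
    enum buf_state { READY_TO_READ, READY_TO_PROCESS, READY_TO_WRITE };

    struct buffer {
        unsigned char  *data;   /* one block (buffer size == block size) */
        enum buf_state  state;
        pthread_mutex_t lock;   /* protects state between the two threads */
    };

    #define POOL_SIZE 4         /* illustrative number of buffers */
    struct buffer pool[POOL_SIZE];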

1.Technique X 3. Dual-parallel Processing (Extended) Thread 1 is used for the I/O task between the image data and the system memory; Thread 2 is responsible for delivering the buffers from the buffer pool to the GPU memory and processing them.

1.Technique X 3.Dual-parallel Processing (Extended)

1.Technique X 3. Dual-parallel Processing (Extended) From the micro perspective, image data are processed by hundreds of execution units simultaneously in the GPU. From the macro perspective, the I/O operation and the processing operation are performed simultaneously by POSIX threads.

1.Technique X 3.Dual-parallel Processing (Extended)

2.Process X

2.Process (1/3) Execution begins with function main. The CUDA environment is initialized, a group of buffers is set up, and two POSIX threads are created: the first inputs and outputs the remote sensing image data, the second is responsible for processing the buffers read by the first. Buffer states, for example: Ready_to_read = the I/O thread should read a block from the image data into the current buffer; Ready_to_process = the buffer is ready to be processed by the GPU; Ready_to_write = the I/O thread should write the block back to the image data.
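A sketch of the two thread bodies driving those states, continuing the buffer-pool sketch from the Technique section; the read/write/process helpers and the progress check are hypothetical stand-ins for the real steps:

    void *io_thread(void *arg) {
        while (!all_blocks_done()) {                    /* hypothetical check */
            for (int i = 0; i < POOL_SIZE; i++) {
                pthread_mutex_lock(&pool[i].lock);
                if (pool[i].state == READY_TO_READ) {
                    read_block(&pool[i]);               /* image -> buffer */
                    pool[i].state = READY_TO_PROCESS;
                } else if (pool[i].state == READY_TO_WRITE) {
                    write_block(&pool[i]);              /* buffer -> image */
                    pool[i].state = READY_TO_READ;
                }
                pthread_mutex_unlock(&pool[i].lock);
            }
        }
        return NULL;
    }

    void *process_thread(void *arg) {
        while (!all_blocks_done()) {
            for (int i = 0; i < POOL_SIZE; i++) {
                pthread_mutex_lock(&pool[i].lock);
                if (pool[i].state == READY_TO_PROCESS) {
                    process_on_gpu(&pool[i]);           /* tiles -> GPU -> tiles */
                    pool[i].state = READY_TO_WRITE;
                }
                pthread_mutex_unlock(&pool[i].lock);
            }
        }
        return NULL;
    }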

2.Process (2/3) Two global variables are declared to record the execution state of each thread. For example: when Is_IO_Over is true, all the work of the I/O thread is finished; when Is_Process_Over is true, all the buffer-processing work is finished. pthread_join is called to wait for each thread to finish. When both threads have finished, all the remote sensing image data are completely processed and the program ends by calling return.
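A small sketch of the completion flags (the variable names follow the slide; everything around them is assumed):

    volatile int Is_IO_Over      = 0;   /* set by io_thread as its last action      */
    volatile int Is_Process_Over = 0;   /* set by process_thread as its last action */

    /* In main(), after creating the two threads:
           pthread_join(tid_io,   NULL);   // returns once the I/O thread ends
           pthread_join(tid_proc, NULL);   // returns once the processing ends
           return 0;                       // all image data processed          */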