Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA) U.S. Department of the Interior U.S. Geological Survey Michael P. Finn, Jing Li, and David Mattli ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services Suzhou, China 14 – 16 May 2014

HPC Research / Motivation
- Prime test case: map projection/reprojection for large raster datasets ("Big" Data?)
- pRasterBlaster: mapIMG in an HPC environment
- Solve problems using multiple processors
- Currently testing within the NSF CyberGIS Project, leveraging XSEDE (a more traditional supercomputing (SC) environment)
- How does the same problem compare, in a computational sense, between a CPU-dominated SC environment and a more lightweight, general-purpose-GPU-dominated environment?

CUDA
- A parallel computing platform and programming model invented by NVIDIA
- Allows GPUs to be used for general-purpose processing (not exclusively graphics)
- GPUs have a parallel throughput architecture that allows executing many concurrent threads slowly (rather than executing a single thread very quickly)
- Accessible to software developers through libraries, compiler directives, and extensions to programming languages, including C, C++, and Fortran

Accurate Raster Reprojection in Three (Primary) Steps
- Step 1: Calculate and partition output space
- Step 2: Read input and reproject
- Step 3: Combine temporary files
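Step 1 can be illustrated with a small partitioning routine. This is only a sketch: the function name `partition_output` and the example raster size are hypothetical, and it assumes that a chunk dimension of 1024 means square tiles of at most 1024 cells per side, which the slides do not state explicitly.

```python
def partition_output(width, height, chunk=1024):
    """Split a width x height output raster into tiles of at most chunk x chunk cells.

    Returns (row_start, col_start, n_rows, n_cols) tuples in row-major order.
    Tiles on the right and bottom edges are smaller when the raster size is
    not a multiple of the chunk dimension.
    """
    tiles = []
    for row in range(0, height, chunk):
        for col in range(0, width, chunk):
            n_rows = min(chunk, height - row)
            n_cols = min(chunk, width - col)
            tiles.append((row, col, n_rows, n_cols))
    return tiles


# Hypothetical output raster of 3000 x 2500 cells.
tiles = partition_output(width=3000, height=2500)
print(len(tiles), tiles[-1])
```

Each tile can then be reprojected independently (Step 2) and written to its own temporary file before the results are combined (Step 3).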

The Equations
Projection Transformation Process: Framing
- The frame of a raster dataset defines the extent of the dataset in the projection space.
- It also defines the alignment of projection space with the input (often) image coordinate system.

$$X = \text{ULprojX} + (\text{sample} - 1) \times \text{pixelSizeX} \tag{1}$$
$$Y = \text{ULprojY} - (\text{line} - 1) \times \text{pixelSizeY} \tag{2}$$

Alternatively:

$$\text{sample} = \frac{X - \text{ULprojX}}{\text{pixelSizeX}} + 1 \tag{3}$$
$$\text{line} = \frac{\text{ULprojY} - Y}{\text{pixelSizeY}} + 1 \tag{4}$$
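The framing equations can be checked with a short round trip. The sketch below is illustrative only: the function names and the frame values (upper-left corner at (-180, 90), 0.5-unit pixels) are assumptions and are not taken from mapIMG or pRasterBlaster.

```python
def pixel_to_projected(sample, line, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (1) and (2): 1-based (sample, line) to projected (X, Y)."""
    x = ul_x + (sample - 1) * pixel_size_x
    y = ul_y - (line - 1) * pixel_size_y
    return x, y


def projected_to_pixel(x, y, ul_x, ul_y, pixel_size_x, pixel_size_y):
    """Equations (3) and (4): projected (X, Y) back to 1-based (sample, line)."""
    sample = (x - ul_x) / pixel_size_x + 1
    line = (ul_y - y) / pixel_size_y + 1
    return sample, line


# Hypothetical frame, chosen only for illustration.
frame = dict(ul_x=-180.0, ul_y=90.0, pixel_size_x=0.5, pixel_size_y=0.5)

x, y = pixel_to_projected(3, 5, **frame)
sample, line = projected_to_pixel(x, y, **frame)
print(x, y, sample, line)
```

Applying (3) and (4) to the output of (1) and (2) returns the original sample and line, which is a quick sanity check on the signs: Y decreases as the line number increases.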

CUDA Implementation
- Four-corner-point-based map projection using CUDA

Raster Chunk Handling
- Output chunks cannot be merged because of limited computing resources

Results
Configuration of the testing machine
- Intel Quad-core CPU (i
- GeForce GT 640, 384 GPU cores
- 8 GB RAM
- NVIDIA CUDA SDK 5.5
- Visual Studio 2010

Results
CUDA configuration
- Block size: 256*1
- Chunk dimension: 1024
Resample GLC (original: ~900 MB), Equirectangular to Albers
Columns: File, Res, Dimension, Volume, CPU, GPU, Ratio
Rows: * MB * MB896 (C)21.635/17.553(C) * MB5548(C) (C) * MBNA407.65(C)
NA = Out of memory (8 GB) on test machine

Results
CUDA configuration
- Block size: 256*1
- Chunk dimension: 1024
Resample NLCD (original: 15.6 GB), Albers to Equirectangular
Columns: File, Res, Dimension, Volume, CPU, GPU, Ratio
Rows: * MB C * MB C * MB C * MBNA204.93

Issues (1 of 2)
- The inverse/forward map projection for Mollweide is not accurate
  - Need to find the reasons why (should be a minor fix)
  - Therefore, restricted the current testing to Equirectangular and Albers
- The results of map projection were inaccurate due to a misapplied resampling method (minor fix)
- The way input data chunks are retrieved, based on the bounding box of the output chunk, may not be quite accurate
  - Problem identified: chunks near the edges of the dataset need to have some overlap retrieved (negative coordinates)
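The bounding-box retrieval issue can be sketched as follows. This is not the presenters' code: `input_window`, `toy_to_input_pixel`, the `overlap` parameter, and the raster sizes are all hypothetical, and the toy transform is a plain linear mapping to 0-based pixel coordinates standing in for a real inverse projection. The sketch pads the box by a few pixels and clamps it to the input raster so that edge chunks never request negative coordinates.

```python
import math


def input_window(corners_xy, to_input_pixel, in_width, in_height, overlap=1):
    """Input pixels needed to cover an output chunk, from its corner points.

    Returns (col0, row0, col1, row1), end-exclusive, padded by `overlap`
    pixels on every side and clamped to the input raster.
    """
    cols, rows = zip(*(to_input_pixel(x, y) for x, y in corners_xy))
    col0 = max(0, math.floor(min(cols)) - overlap)
    row0 = max(0, math.floor(min(rows)) - overlap)
    col1 = min(in_width, math.ceil(max(cols)) + overlap)
    row1 = min(in_height, math.ceil(max(rows)) + overlap)
    return col0, row0, col1, row1


def toy_to_input_pixel(x, y):
    """Stand-in for inverse projection plus framing (hypothetical, linear)."""
    return (x + 180.0) / 0.5, (90.0 - y) / 0.5


# Corner points of a hypothetical output chunk at the upper-left edge.
corners = [(-180.0, 90.0), (-170.0, 90.0), (-180.0, 80.0), (-170.0, 80.0)]
window = input_window(corners, toy_to_input_pixel,
                      in_width=720, in_height=360, overlap=2)
print(window)
```

Without the clamp, the padded box for this edge chunk would start at column -2 and row -2. For projections whose chunk edges curve in input space, four corners alone can underestimate the true extent, so a larger overlap or additional edge sample points may be needed.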

Issues (2 of 2)
- Needs better memory management
  - CPU: out-of-memory error even with chunking
  - Suspect the test machine is not releasing memory in a timely fashion
- GPU: not always stable; kernels may fail during execution; grid/block setup
  - Workload may not be balanced very well
  - Kernels can fail when sending too much data
  - Using remote desktop to manipulate the data may cause issues

Conclusion
- CUDA provides a lightweight, less expensive alternative to CPU parallel environments like supercomputers
- Raster map projection behaves similarly in initial tests to established pRasterBlaster testing in CPU-dominated HPC environments
  - Greater than one order of magnitude faster
- More work necessary; issues remain

Other Collaborators (primarily on the CyberGIS project)
- Shaowen Wang, Anand Padmanabhan, Yan Liu: University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory
- David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel: USGS, Center of Excellence for Geospatial Information Science (CEGIS)
- Kristina H. Yamamoto: USGS, National Geospatial Technical Operations Center
- Babak Behzad: UIUC, Department of Computer Science
- Eric Shook: Kent State University, Department of Geography
- Qingfeng (Gene) Guan: China University of Geosciences

Disclaimer Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Small-Scale Raster Map Projection using the Compute Unified Device Architecture (CUDA) U.S. Department of the Interior U.S. Geological Survey QUESTIONS? ISPRS Technical Commission IV Symposium on Geospatial Databases and Location Based Services Suzhou, China 14 – 16 May 2014

The block size is not directly related to the chunking concept. Block size is the number of threads within each GPU thread block; a related concept is grid size, the number of blocks in a launch. CUDA can launch many threads at the same time (e.g., 512 threads). All threads in a block are sent to the GPU processors together, but they may not all start executing at the same time (depending on how many GPU cores are available).

In my implementation, I assign each cell of the output image/chunk to a thread. If the output image has a dimension of 256*256 and the block size is 16*16, then the grid size is (256/16)*(256/16) = 16*16. If the output image has a dimension of 250*250 and the block size is 16*16, then the grid size is (250/16)*(250/16) = 15.625*15.625, which rounds up to 16*16. This implies that the last row and column of blocks have less data (e.g., 16*10, since 250 - 15*16 = 10). So the selection of the block size is determined by the number of GPU cores as well as the dimensions of the image.

When dealing with a large image that cannot be read into CPU main memory all at once, the image should be divided into chunks. Each chunk then becomes an input image, and the GPU processes it.
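The grid-size arithmetic above can be reproduced on the CPU. The sketch below is written in Python rather than CUDA so that it runs anywhere; the function names are hypothetical, and `thread_to_cell` only mirrors the index arithmetic that a one-thread-per-cell kernel would typically perform, including the bounds check that leaves threads in the partial edge blocks idle.

```python
def grid_dim(width, height, block_w=16, block_h=16):
    """Number of blocks per axis, rounded up so every cell is covered."""
    return (width + block_w - 1) // block_w, (height + block_h - 1) // block_h


def thread_to_cell(block_idx, thread_idx, block_dim, width, height):
    """Map (block, thread) indices to an output cell (row, col).

    Returns None for threads that fall outside the image, which happens
    in the last row and column of blocks when the image size is not a
    multiple of the block size.
    """
    col = block_idx[0] * block_dim[0] + thread_idx[0]
    row = block_idx[1] * block_dim[1] + thread_idx[1]
    if col >= width or row >= height:
        return None
    return row, col


print(grid_dim(256, 256))  # exact fit
print(grid_dim(250, 250))  # rounded up; edge blocks are partly idle

# Threads launched versus cells that actually exist for the 250*250 case.
gx, gy = grid_dim(250, 250)
launched = gx * 16 * gy * 16
idle = launched - 250 * 250
print(launched, idle)
```

For the 250*250 image, the last block along each axis covers only 10 of its 16 columns or rows, so threads with a local index of 10 or more in that block do no work.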