Parallel ODETLAP for Terrain Compression and Reconstruction

Parallel ODETLAP for Terrain Compression and Reconstruction. Jared Stookey, Zhongyi Xie, W. Randolph Franklin, Dan Tracy, Barbara Cutler, Marcus V. A. Andrade. RPI GeoStar Group

Outline: Quick Overview of Our Research; ODETLAP (Non-Patch); Motivation for Parallelization; Our Approach; MPI Implementation; Results; Current and Future Work. RPI GeoStar Group

Quick Overview. Our research: terrain compression. We compress terrain by selecting a subset of its points, and reconstruct it by solving a system of equations to fill in the missing points. The reconstruction method is slow for large datasets, so we developed a method, using MPI, for reconstructing very large datasets quickly. RPI GeoStar Group

ODETLAP: Over-Determined Laplacian. Two equations: $4z_{i,j} = z_{i-1,j} + z_{i+1,j} + z_{i,j-1} + z_{i,j+1}$ (smoothness) and $z_{i,j} = h_{i,j}$ (known elevations). Some points therefore get multiple equations, so a smoothness parameter R controls the interpolation when multiple values exist. ODETLAP reconstructs an approximated surface from the known points $\{h_{i,j}\}$ (the red points in the figure). RPI GeoStar Group
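
The talk gives the equations but no code. As a minimal sketch of how the over-determined system above could be assembled and solved, the following uses Eigen's sparse least-squares solver; the choice of Eigen, the convention that R multiplies the smoothness equations, and every function and variable name are assumptions made for illustration, not the GeoStar implementation.

```cpp
// Minimal ODETLAP sketch: build the over-determined sparse system and solve
// it in the least-squares sense.  Grid is n x n, known points are given as
// (row, col, height) triples, R weights smoothness vs. accuracy (assumed
// convention).
#include <vector>
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

struct Known { int i, j; double h; };

Eigen::VectorXd odetlap(int n, const std::vector<Known>& known, double R) {
    typedef Eigen::Triplet<double> T;
    std::vector<T> trips;
    std::vector<double> rhs;
    auto idx = [n](int i, int j) { return i * n + j; };

    // Smoothness equations (interior points only in this sketch), weighted by R:
    // 4*z(i,j) - z(i-1,j) - z(i+1,j) - z(i,j-1) - z(i,j+1) = 0
    for (int i = 1; i < n - 1; ++i)
        for (int j = 1; j < n - 1; ++j) {
            int row = (int)rhs.size();
            trips.push_back(T(row, idx(i, j),      4.0 * R));
            trips.push_back(T(row, idx(i - 1, j), -1.0 * R));
            trips.push_back(T(row, idx(i + 1, j), -1.0 * R));
            trips.push_back(T(row, idx(i, j - 1), -1.0 * R));
            trips.push_back(T(row, idx(i, j + 1), -1.0 * R));
            rhs.push_back(0.0);
        }

    // Data equations: z(i,j) = h(i,j) for every stored point.
    for (const Known& k : known) {
        int row = (int)rhs.size();
        trips.push_back(T(row, idx(k.i, k.j), 1.0));
        rhs.push_back(k.h);
    }

    Eigen::SparseMatrix<double> A((int)rhs.size(), n * n);
    A.setFromTriplets(trips.begin(), trips.end());
    Eigen::VectorXd b = Eigen::Map<Eigen::VectorXd>(rhs.data(), (long)rhs.size());

    // Least-squares solve of the over-determined system.
    Eigen::LeastSquaresConjugateGradient<Eigen::SparseMatrix<double>> solver;
    solver.compute(A);
    return solver.solve(b);   // n*n elevations, row-major
}
```

Solving one such system for a full 16,000 x 16,000 grid would mean 256 million unknowns, which is why the rest of the talk splits the terrain into patches.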

ODETLAP Compression. Lossily compress the image by selecting a subset of points; ODETLAP reconstruction then solves for the whole terrain. 1) Compress, 2) Store, 3) Reconstruct (ODETLAP). RPI GeoStar Group

Motivation for Parallelization. ODETLAP is prohibitively slow for large datasets, so we need a scalable implementation. Only a small neighborhood of points affects a particular elevation: a single pixel only affected a 62x62 area. RPI GeoStar Group

Our Approach. Divide the terrain into individual patches and run ODETLAP on each patch separately: 1) start from the compressed terrain, 2) divide it into patches, 3) reconstruct each patch, 4) merge the patches. RPI GeoStar Group
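
A sketch of the per-patch step, reusing the odetlap() sketch above: collect the stored points that fall inside one patch, shift them to patch-local coordinates, and solve only that patch. The helper name solvePatch and the patch parameters are illustrative assumptions.

```cpp
// Run ODETLAP on a single patch of the terrain (sketch).
#include <vector>
#include <Eigen/Dense>

struct Known { int i, j; double h; };
// From the earlier ODETLAP sketch:
Eigen::VectorXd odetlap(int n, const std::vector<Known>& known, double R);

Eigen::VectorXd solvePatch(const std::vector<Known>& allKnown,
                           int row0, int col0, int patchSize, double R) {
    std::vector<Known> local;
    for (const Known& k : allKnown)
        if (k.i >= row0 && k.i < row0 + patchSize &&
            k.j >= col0 && k.j < col0 + patchSize)
            local.push_back({k.i - row0, k.j - col0, k.h});  // patch-local coords
    return odetlap(patchSize, local, R);   // patchSize x patchSize elevations
}
```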

There is a problem! Points near the edges of patches have incomplete data, which causes errors. (Figure: pixels in red show erroneous results; pixels in blue show correct results.) RPI GeoStar Group

There is a problem! (continued) We get discontinuities if we naively merge the patches. (Figures: naively reconstructed terrain, and its errors.) RPI GeoStar Group

Solution: Use overlapping layers of patches. RPI GeoStar Group

Solution: Use overlapping layers of patches, then merge the results. RPI GeoStar Group
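
The slides show the overlapping layers pictorially. One plausible way to generate two overlapping layers of patch origins is to offset the second layer by half a patch, as sketched below; the half-patch offset and the patch size are assumptions, since the talk does not state the exact overlap used.

```cpp
// Sketch of generating two overlapping layers of patches over an n x n grid.
#include <vector>
#include <algorithm>
#include <cstdio>

struct Patch { int row0, col0, rows, cols; };

std::vector<Patch> makePatches(int n, int patchSize, int offset) {
    std::vector<Patch> patches;
    for (int r = offset; r < n; r += patchSize)
        for (int c = offset; c < n; c += patchSize) {
            Patch p;
            p.row0 = r;
            p.col0 = c;
            p.rows = std::min(patchSize, n - r);  // clip at the terrain edge
            p.cols = std::min(patchSize, n - c);
            patches.push_back(p);
        }
    return patches;
}

int main() {
    int n = 1000, patchSize = 100;
    // Layer 1 starts at (0,0); layer 2 is shifted by half a patch so that
    // every layer-1 patch border lands in the interior of some layer-2 patch.
    // A full implementation would also cover the half-patch border strips
    // that the shifted layer leaves uncovered.
    std::vector<Patch> layer1 = makePatches(n, patchSize, 0);
    std::vector<Patch> layer2 = makePatches(n, patchSize, patchSize / 2);
    std::printf("layer 1: %zu patches, layer 2: %zu patches\n",
                layer1.size(), layer2.size());
    return 0;
}
```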

Problem: Averaging the Patches. A simple averaging of the patches incorporates the border error into the reconstructed terrain. (Figures: terrain reconstructed using averaged patches, and its errors.) RPI GeoStar Group

Solution: Bilinear Interpolation. Use bilinear interpolation to do a weighted average in which a patch's contribution falls off to zero at its borders. (Figures: naive averaging results vs. bilinear interpolation results; error avg 0.1 m, max 2 m; elevation range of the original: 1105 m to 1610 m; data: DTED Level 2, 30 m spacing.) RPI GeoStar Group

Weighting Pattern for Bilinear Interpolation vs. Simple Averaging RPI GeoStar Group
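
The talk describes the weighting only at the level of the figure above. The sketch below shows one way such a merge could work: each patch weights its pixels with a separable tent that is largest at the patch center and zero at the border, and overlapping patches are combined by a normalized weighted sum. The tent shape and all names are assumptions made for illustration, not the group's exact formula.

```cpp
// Sketch of merging overlapping patches with a separable "tent" weight.
#include <vector>
#include <cmath>

// Weight of local pixel (r, c) inside a rows x cols patch: 1 at the center,
// falling linearly to 0 at the patch border (assumes rows, cols > 1).
double tentWeight(int r, int c, int rows, int cols) {
    double wr = 1.0 - std::fabs(2.0 * r / (rows - 1) - 1.0);
    double wc = 1.0 - std::fabs(2.0 * c / (cols - 1) - 1.0);
    return wr * wc;
}

// Accumulate one reconstructed patch into the full-terrain buffers.
void accumulatePatch(const std::vector<double>& patch, int row0, int col0,
                     int rows, int cols, int n,
                     std::vector<double>& sum, std::vector<double>& wsum) {
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            double w = tentWeight(r, c, rows, cols);
            sum[(row0 + r) * n + (col0 + c)]  += w * patch[r * cols + c];
            wsum[(row0 + r) * n + (col0 + c)] += w;
        }
}

// After all patches are accumulated, divide to get the merged terrain.
void finalize(std::vector<double>& sum, const std::vector<double>& wsum) {
    for (size_t k = 0; k < sum.size(); ++k)
        if (wsum[k] > 0.0) sum[k] /= wsum[k];
}
```

Because each weight is zero on the patch border, the erroneous border values contribute nothing to the merged terrain, while every interior pixel is dominated by the patch whose center is nearest.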

MPI Implementation. 1) Each processor (except the central process) is pre-assigned one or more patches. 2) Every MPI process does the following for each patch assigned to it: load the patch, run ODETLAP on it, and MPI_Send the result to the central process. 3) When all of the patches have been received by the central process, merge them using bilinear interpolation. RPI GeoStar Group
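
A minimal MPI skeleton of the scheme described in this slide, with rank 0 as the central process and every other rank pre-assigned patches round-robin. The loadPatch and runOdetlap bodies are placeholders and the patch count is illustrative; this is a sketch of the pattern, not the group's code. It assumes at least two MPI ranks.

```cpp
#include <mpi.h>
#include <vector>

const int PATCH_PIXELS = 100 * 100;   // 100x100 patches, as in the results
const int NUM_PATCHES  = 16;          // illustrative count

// Placeholders for the real patch I/O and the ODETLAP solve.
std::vector<double> loadPatch(int id)                { return std::vector<double>(PATCH_PIXELS, (double)id); }
std::vector<double> runOdetlap(std::vector<double> p){ return p; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // Central process: receive every patch, then merge.
        std::vector<std::vector<double>> patches(NUM_PATCHES);
        for (int recvd = 0; recvd < NUM_PATCHES; ++recvd) {
            std::vector<double> buf(PATCH_PIXELS);
            MPI_Status st;
            // The patch id is carried in the message tag.
            MPI_Recv(buf.data(), PATCH_PIXELS, MPI_DOUBLE,
                     MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            patches[st.MPI_TAG] = buf;
        }
        // ... merge the patches with bilinear-interpolation weighting here ...
    } else {
        // Worker: pre-assigned patches, round-robin over the worker ranks.
        for (int id = rank - 1; id < NUM_PATCHES; id += size - 1) {
            std::vector<double> result = runOdetlap(loadPatch(id));
            MPI_Send(result.data(), PATCH_PIXELS, MPI_DOUBLE,
                     0, id, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```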

Results. 16,000 x 16,000 Central USA terrain data. Used 128 2.6 GHz processors on the RPI CCNI cluster. Divided into 101,761 patches of size 100x100. Completed in 28 minutes and 32 seconds; non-patch ODETLAP would have taken 179 days. RPI GeoStar Group

Results (cont.). Size: 16K x 16K; standard deviation: 217; range: 1013; mean error: 1.96; max error: 50; RMS error: 2.76. The terrain was compressed by a factor of 100, with a mean error within 0.2% of the range. RPI GeoStar Group

Original and Reconstructed Terrain. (Figures: original terrain, 1000 x 1000; reconstruction result, 1000 x 1000.) RPI GeoStar Group

Patch Size vs. Time & Error
Total size: 2000*2000; points used: 39,894.

  Patch size   Running time   Mean abs. error   Max error   RMS error
  50*50        0m 38s         0.6640            13          0.9617
  100*100      0m 55s         0.6598                        0.9530
  200*200      5m 25s                                       0.9527
  400*400      18m 49s

These results come from an 8-processor machine. RPI GeoStar Group

Serialized vs. Parallel. Serialized: a single worker processor runs each patch sequentially (speedup of 9.5 in the test). Parallel: several processors run on many patches in parallel (an additional speedup of 5.6 in the test). Test data: 800 x 800 with a mean elevation of 107. RPI GeoStar Group

Running Time Comparison

  Method             Running time   Mean error   Max error   RMS error
  Original ODETLAP   549s           0.6150       7           0.8835
  Serial ODETLAP     34s            0.6156                   0.8846
  Parallel ODETLAP   9s

Test data: 800 x 800 with a mean elevation of 107, run on 8 processors. Parallel ODETLAP is 50 times faster, while introducing only 0.1% additional error. RPI GeoStar Group

Current and Future Work. Improvements to our implementation: reduce data size (a regular grid can be more compact); have each process grab the next available patch (one such scheme is sketched below); optimize for the Blue Gene/L system (see next slide). Reduce errors from the patch method: improve the method for merging patches. RPI GeoStar Group
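
The "grab the next available patch" item is listed as future work, so the talk does not specify how it would be done. One standard self-scheduling pattern is sketched below, with a coordinator rank handing out patch ids on demand; all names and constants are illustrative assumptions, not the group's design.

```cpp
#include <mpi.h>

const int NUM_PATCHES  = 16;     // illustrative
const int TAG_REQUEST  = 1, TAG_ASSIGN = 2;
const int NO_MORE_WORK = -1;

void processPatch(int id) { /* placeholder: load patch id, run ODETLAP, send result */ }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        // Coordinator: answer each request with the next unprocessed patch id,
        // then tell every worker there is no more work.
        int next = 0, finished = 0;
        while (finished < size - 1) {
            int dummy;
            MPI_Status st;
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, &st);
            int assignment = (next < NUM_PATCHES) ? next++ : NO_MORE_WORK;
            if (assignment == NO_MORE_WORK) ++finished;
            MPI_Send(&assignment, 1, MPI_INT, st.MPI_SOURCE, TAG_ASSIGN,
                     MPI_COMM_WORLD);
        }
    } else {
        // Worker: keep requesting patches until told to stop.
        for (;;) {
            int dummy = 0, id;
            MPI_Send(&dummy, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD);
            MPI_Recv(&id, 1, MPI_INT, 0, TAG_ASSIGN, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            if (id == NO_MORE_WORK) break;
            processPatch(id);
        }
    }
    MPI_Finalize();
    return 0;
}
```

Compared with the pre-assignment in the current implementation, this keeps all workers busy even when some patches take much longer to solve than others.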

Blue Gene/L System. Computational Center for Nanotechnology Innovations (CCNI) at RPI: 32,768 CPUs at 700 MHz, 512-1024 MB of memory per CPU (non-shared). An opportunity to run very large datasets quickly. New method: Source, Sink, Workers, and Coordinator. DEM size is not limited by a single process's memory; processors are used as a cache instead of the disk (on the BG, disk is slow, while the network and memory are very fast). We must reduce overhead to take advantage of all of the CPUs. RPI GeoStar Group