High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics Yan Liu1,3,5, Babak Behzad1,2,


1 High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics
Yan Liu (1,3,5), Babak Behzad (1,2), Anand Padmanabhan (1,3,5), Eric Shook (1,3), Shaowen Wang (1,2,3,4,5), and Yanli Zhao (1,3)
(1) CyberInfrastructure and Geospatial Information Laboratory (CIGI)
(2) Department of Computer Science
(3) Department of Geography and Geographic Information Science
(4) Department of Urban and Regional Planning
(5) National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
Michael P. Finn and E. Lynn Usery
U.S. Geological Survey, U.S. Department of the Interior

2 Outline
- Introduction
- NED data access
  - Interfaces and performance issues
- Computational challenges
  - Data-intensive spatial analysis
- Experience and solutions
  - CyberGIS
  - Scalable spatial data access and analytics
- Concluding discussions
Notes (overall flow of the presentation):
- Introduce NED and its broad usage, with two examples: the CyberGIS analytical environment and the “Great Flood” movie.
- NED data access differs from simple file sharing or downloading. It therefore calls for highly usable download client tools and a different programming pattern for integrating the download step into application logic.
- Once the data are downloaded, using big data in spatial analysis can be computationally prohibitive in memory, I/O, and CPU time. High-performance spatial analysis can reduce CPU time, after which I/O can become the bottleneck: 1) there may be too many intermediate input/output steps during an analysis; 2) a single I/O step on big data can slow down the whole analysis.
- Solutions: 1) reduce intermediate I/O steps by integrating geospatial data processing libraries and analysis methods; 2) use parallel computing to reduce the time of a single I/O step.

3 National Elevation Dataset (NED)
- Digital elevation models (DEM)
- Product of the USGS National Map
- Resolutions: 3-meter, 10-meter, 30-meter
- Formats: ArcGrid, GridFloat, IMG
- Organized as 1 degree x 1 degree tiles
- Sizes (U.S. continent)
  - 10-meter: 936 tiles; 440GB raw files; 1TB with pyramid tiles
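Because the dataset is organized as 1 degree x 1 degree tiles, the set of tiles needed for a study area can be listed from its bounding box. The sketch below is illustrative only: the function names are hypothetical, and the label format (built from the tile's north-west corner) is an assumption rather than something stated in the slides.

```python
import math


def tiles_for_bbox(west, south, east, north):
    """List the 1 x 1 degree tiles that intersect a lon/lat bounding box.

    Each tile is identified by the integer (lat, lon) of its south-west corner.
    """
    if west >= east or south >= north:
        raise ValueError("empty bounding box")
    lats = range(math.floor(south), math.ceil(north))
    lons = range(math.floor(west), math.ceil(east))
    return [(lat, lon) for lat in lats for lon in lons]


def tile_label(lat, lon):
    """Hypothetical label derived from the tile's north-west corner (assumed convention)."""
    top = lat + 1  # northern edge of the tile
    ns = "n" if top >= 0 else "s"
    ew = "w" if lon < 0 else "e"
    return f"{ns}{abs(top):02d}{ew}{abs(lon):03d}"


if __name__ == "__main__":
    # A box straddling four tiles in central Illinois.
    for lat, lon in tiles_for_bbox(-88.5, 39.5, -87.5, 40.5):
        print(tile_label(lat, lon))
```

A list like this is what a batch download would iterate over instead of requesting tiles one at a time by hand.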

4 NED Access Challenges
- Data integration and processing
  - Data are stored on multiple file/database servers
  - Data processing is needed to extract subsets of data from the data collection
  - Downloading becomes complex, involving processing operations such as location, extraction, aggregation, archiving, and transfer among data servers
  - Computationally intensive
- User interface
  - Usability is crucial to make big data usable
  - Programmable interface for automatic downloading

5 CyberGIS Analytics Based on NED
- CyberGIS: high-performance and collaborative GIS based on cyberinfrastructure
- Viewshed analysis
- Web Mapping Service for online visualization
  - NED WMS layer built using GeoServer
  - Pre-generated pyramid tiles for 20-level zooming
- CyberGIS Gateway

6 The Great Flood Project
- A 75-minute multimedia work of original music and film inspired by the 1927 Mississippi River floods
- Contributors include
  - Bill Frisell, Grammy Award-winning guitarist and composer
  - Bill Morrison, Obie-winning experimental filmmaker
  - Illinois Emerging Digital Research and Education in Arts Media Institute (eDream)
  - Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA)
  - CyberInfrastructure and Geospatial Information Laboratory (CIGI), University of Illinois at Urbana-Champaign
- Used NED
  - Approximately 70GB of 10-meter NED tiles covering the Mississippi river valley were used for creating the 3D landscape animation

7 Open YouTube URL http://www.youtube.com/watch?v=Lgy7mDJ_fVI
Relevant parts: 0:00 – 0:24, historical maps; 0:25 – 1:16, 3D digital map animation based on 1/3 arc sec NED

8 NED Data Access

9 NED Download: User Interface
- Download tool web interface
- New interface: National Map Viewer
Notes: This slide and the next one show the inconvenience of the NED downloading tools. Reason: these tools were developed primarily around how the data were produced and hosted, and less around how the data should be used by users.

10 NED Downloading Process
1. Queue a request
2. Launch data extractor
3. Extract data
4. Archive data files
5. Notify data readiness
6. User download
Figure annotations: "Click each URL"; "File list"
Please repeat 936 times to get all 1 degree x 1 degree tiles for the U.S. continent!

11 NED Downloading Web Service Interface
- Start download
- Check status
- Download
- Cleanup
Notes: This slide illustrates the trend in programming big data downloads. A single request-response round trip is no longer practical (it takes too long and blocks the main program), which points to an asynchronous programming model that overlaps downloading with processing and synchronizes the two.
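The start / check status / download / cleanup cycle can be sketched as a polling loop run in worker threads, so that several requests are in flight while the main program continues. This is a minimal illustration, not the actual downloader (which the slides describe as Bash + PHP): `FakeNedService` and all of its method names are hypothetical stand-ins for the real web service.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeNedService:
    """In-memory stand-in for the download web service (hypothetical API)."""

    def __init__(self, polls_until_ready=2):
        self._polls_until_ready = polls_until_ready
        self._remaining = {}
        self._lock = threading.Lock()
        self.cleaned = []

    def start(self, tile):
        request_id = f"req-{tile}"
        with self._lock:
            self._remaining[request_id] = self._polls_until_ready
        return request_id

    def status(self, request_id):
        with self._lock:
            self._remaining[request_id] -= 1
            done = self._remaining[request_id] <= 0
        return "ready" if done else "processing"

    def download(self, request_id):
        return f"data for {request_id}".encode()

    def cleanup(self, request_id):
        with self._lock:
            self.cleaned.append(request_id)


def fetch_tile(service, tile, poll_interval=0.01, max_polls=100):
    """Run one start -> poll -> download -> cleanup cycle for a single tile."""
    request_id = service.start(tile)
    try:
        for _ in range(max_polls):
            if service.status(request_id) == "ready":
                return tile, service.download(request_id)
            time.sleep(poll_interval)
        raise TimeoutError(f"{tile} was not ready after {max_polls} polls")
    finally:
        # Cleanup runs whether the download succeeded or not.
        service.cleanup(request_id)


def fetch_all(service, tiles, workers=4):
    """Overlap several tile requests instead of blocking on each in turn."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(lambda t: fetch_tile(service, t), tiles))


if __name__ == "__main__":
    results = fetch_all(FakeNedService(), ["n40w089", "n40w088"])
    print(sorted(results))
```

Placing the cleanup call in a `finally` block is one way to make sure server-side request state is released even when a tile times out.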

12 NED Downloader
- Goal
  - Provide an easy-to-use NED downloading utility by supporting batch downloads and managing downloading status transitions automatically
- Software
  - Linux-based
  - Bash + PHP
  - Open source (MIT license)
  - Hosted on CyberGIS SVN
- Status
  - Used by the National Science Foundation CyberGIS project team for NED data integration and the Great Flood project
Notes (facts):
- We used this downloader to keep a copy of the 1/3 arc-second NED dataset, converted it to GeoTIFF format, created pyramid tiles for 20-level zooming, and published it as a WMS.
- We used this downloader to download 70GB of 1/3 arc-second NED dataset files for the Mississippi river valley area. They were used for making the “Great Flood” movie by the Advanced Visualization Laboratory (AVL) at NCSA.

13 Computational Challenges in Related CyberGIS Analytics

14 Why CyberGIS?
- Most commonly used GIS software is based on sequential computing
- Not scalable for big data analytics
- Many runtime input/output (I/O) steps in an analysis workflow
- Transfer of big data to/from cyberinfrastructure resources

15 Viewshed Analysis
- Input DEM
  - HTTP downloading
  - Data processing using GDAL commands
- High-performance viewshed computation
  - Exploiting Graphics Processing Units (GPUs)
- Output transfer
  - GridFTP, a parallel file transfer protocol
- Computational bottlenecks
  - The test viewshed analysis (see figure) handled 3.9GB of raster data in total: 1.8GB input NED; 436MB output; 1.67GB runtime output
  - Execution time: 4 minutes 55 seconds
    - Input data transfer: 21 seconds
    - Input data processing: [value missing] seconds
    - Computing: 65 seconds
    - Output data processing: 88 seconds
    - Output transfer: 7 seconds
  - Input/output data processing took 68.4% of analysis time
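The input-data-processing duration is missing from the timing breakdown. As a rough consistency check, the sketch below infers it as the remainder of the total, assuming the five stages ran one after another and account for the entire 4 minutes 55 seconds. That assumption is not stated in the slides, so the inferred value should be read as an estimate.

```python
# Durations stated on the slide, in seconds.
total_s = 4 * 60 + 55  # 4 minutes 55 seconds
stated = {
    "input data transfer": 21,
    "computing": 65,
    "output data processing": 88,
    "output transfer": 7,
}

# Assumption: stages are sequential and sum to the total run time,
# so the missing stage is the remainder: 295 - (21 + 65 + 88 + 7) = 114.
input_processing_s = total_s - sum(stated.values())

# Share of the run spent in input plus output data processing.
io_processing_share = (input_processing_s + stated["output data processing"]) / total_s

if __name__ == "__main__":
    print(f"inferred input data processing: {input_processing_s} s")
    print(f"I/O data processing share: {io_processing_share:.4f}")
```

Under that assumption the share comes to roughly 0.685, which is in line with the 68.4% quoted on the slide and suggests the inferred remainder is plausible.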

16 Resolving Computational Bottlenecks
- Reduce the number of runtime I/O steps
- Employ high-performance I/O techniques
[Figure: workflow diagrams of input data storage, input files, transfer, input processing, analysis (CPU/GPU), output processing, output files, and output data storage]

17 Experience and Solutions

18 CyberGIS Approach
- Tightly couple geospatial data processing libraries to eliminate unnecessary I/O operations
- Exploit parallel I/O for geospatial data processing
- Integrate high-performance data transfer capability in CyberGIS analytics

19 Integrated CyberGIS Architecture
- CyberGIS Software Environment
  - Applications
  - Scalable Analytical Libraries
  - Scalable Data Libraries
  - Spatial Middleware
  - Dependent Libraries
    - Geospatial: GRASS, GDAL, NetCDF, HDF5
    - Parallel Computing: OpenMP, MPI, CUDA
- CyberGIS computational resources
  - Parallel File Systems
  - Processors
  - Memory
  - Network

20 Highlights
- Analytical libraries
  - pRasterBlaster (a high-performance map reprojection library under joint development by CEGIS and CIGI)
- Data libraries
  - Parallel Geospatial I/O library (pGIO) with NetCDF/HDF5 support is to be released soon
  - GDAL + MPI-IO for parallel I/O of the GeoTIFF format is under development
- Spatial middleware
  - GridFTP transfer between CyberGIS data source sites and XSEDE sites
  - CEGIS <-> supercomputer centers (NCSA, SDSC, TACC)
- CyberGIS computational resources
  - CEGIS high-performance computers
  - CIGI cloud infrastructure
  - Key national cyberinfrastructure environments
    - NSF XSEDE (http://xsede.org)
    - Open Science Grid (http://opensciencegrid.org)

21 Parallel I/O Strategies
- Row-wise I/O
- Column-wise I/O
- Block-wise I/O
[Figure: for each strategy, processes P0, P1, P2, ..., Pn access their own partition of the data on a storage device]
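The three decompositions can be illustrated by computing which index ranges of a raster each process would read or write. This is a sketch of the partitioning arithmetic only, with hypothetical function names. It performs no actual I/O and is not taken from pGIO or any other library named in the slides.

```python
def split_range(length, parts):
    """Split range(length) into `parts` contiguous (start, stop) spans of near-equal size."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    base, extra = divmod(length, parts)
    spans, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        spans.append((start, stop))
        start = stop
    return spans


def row_wise(rows, cols, nprocs):
    """Each process gets a band of full-width rows."""
    return [((r0, r1), (0, cols)) for r0, r1 in split_range(rows, nprocs)]


def column_wise(rows, cols, nprocs):
    """Each process gets a band of full-height columns."""
    return [((0, rows), (c0, c1)) for c0, c1 in split_range(cols, nprocs)]


def block_wise(rows, cols, proc_rows, proc_cols):
    """Processes are arranged in a proc_rows x proc_cols grid of rectangular blocks."""
    return [
        (row_span, col_span)
        for row_span in split_range(rows, proc_rows)
        for col_span in split_range(cols, proc_cols)
    ]


if __name__ == "__main__":
    for rank, (row_span, col_span) in enumerate(block_wise(10, 8, 2, 2)):
        print(f"P{rank}: rows {row_span}, cols {col_span}")
```

For a row-major raster file, row-wise bands map to contiguous byte ranges, whereas column-wise and block-wise partitions generally require strided access. That difference is one reason the choice of strategy can affect I/O performance.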

22 High-Performance Data Transfer
[Figure: network map marking CEGIS]
- White lines: high-speed network connections among supercomputer centers
- Blue lines: parallel data transfer connections between CEGIS and accessible supercomputer centers
Background image source: https://www.xsede.org/documents/10157/169907/xsedenet.pdf

23 Data Transfer Service between USGS and XSEDE
- Technology
  - GridFTP, a secure and high-performance data transfer protocol
- Data transfer service setup
  - USGS GridFTP server: usgs-ybother.srv.mst.edu
  - Globus Toolkit 5
- Data transfer capability
  - Parallel data channels for large dataset transfer
  - Data transfer is initiated in the CyberGIS Gateway as a third-party transfer
  - Transfer rate: up to 100MB/second
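To put the quoted peak rate in perspective, the sketch below estimates the shortest possible transfer time for the dataset sizes mentioned earlier in the presentation (70GB for the Great Flood tiles, 440GB for the raw 10-meter collection). It assumes decimal units (1GB = 1000MB) and a sustained 100MB/second. Since that figure is stated as an upper bound, the results are lower bounds and not measured times.

```python
def min_transfer_seconds(size_gb, rate_mb_per_s=100.0):
    """Lower bound on transfer time at a sustained rate, assuming 1 GB = 1000 MB."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb * 1000.0 / rate_mb_per_s


if __name__ == "__main__":
    for label, size_gb in [("Great Flood tiles", 70), ("10-meter raw files", 440)]:
        seconds = min_transfer_seconds(size_gb)
        print(f"{label}: at least {seconds:.0f} s ({seconds / 60:.1f} min)")
```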

24 Concluding Discussions
- The usability of NED can be significantly improved if the data access interface is made friendlier
- Big data require cyberinfrastructure and significant computational power for scalable data access and analytics
- CyberGIS has emerged as a new-generation GIS for resolving these challenges and represents significant opportunities for the National Map communities


26 DISCLAIMER & ACKNOWLEDGEMENT
DISCLAIMER: Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government ACKNOWLEDGEMENT: This work is supported in part by the National Science Foundation (NSF) under Grant Numbers: BCS and OCI Computational experiments used the NSF Extreme Science and Engineering Discovery Environment (XSEDE) (Award Number SES090019), which is supported by NSF under Grant Number OCI

27 Comments / Questions?
Contact: usery@usgs.gov or shaowen@illinois.edu
- University of Illinois at Urbana-Champaign
  - CyberInfrastructure and Geospatial Information Laboratory
  - Department of Computer Science
  - Department of Geography and Geographic Information Science
  - Department of Urban and Regional Planning
  - National Center for Supercomputing Applications
- U.S. Department of the Interior, U.S. Geological Survey

