1 Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler U.S. Department of the Interior U.S. Geological Survey Michael P. Finn High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014

2 Collaborators Shaowen Wang, Anand Padmanabhan, Yan Liu – University of Illinois at Urbana-Champaign (UIUC), CyberInfrastructure and Geospatial Information Laboratory David M. Mattli, Jeff Wendel, E. Lynn Usery, Michael Stramel – USGS, Center of Excellence for Geospatial Information Science (CEGIS) Kristina H. Yamamoto – USGS, National Geospatial Technical Operations Center Babak Behzad – UIUC, Department of Computer Science Eric Shook – Kent State University, Department of Geography Qingfeng (Gene) Guan – China University of Geosciences

3 Where Do We Want to Go? Geospatial Analytics – Spatial Modeling – Geovisualization (GeoViz/ Visual Analytics) For Decision Makers (agencies/ citizens) – Protect natural resources – Empower cultures – Provide for our future

4 Geospatial Analytics Spatial Modeling/ Geovisualization

5 Data / Software Geospatial Methods, Technologies, and Applications GIScience and Cyberinfrastructure Geospatial Toolkits Geospatial Analytics (Spatial Modeling / GeoViz) So: -Where have we been? -Where are we now? -Where do we want to go?

6 Data Analog → Digital “Big” Data Spatial Data (geometric structure) Data: Open? – mostly – Findable, Accessible, Exploitable (standard format) Example: USGS Data holdings – 8 Layers of the National Map – Soon: Hyperspectral cubes and LiDAR point clouds

7 The National Map - Elevation: Quality Levels
Quality Level | Horizontal Point Spacing (meters) | Vertical Accuracy (centimeters) | Description
1 | 0.35 | 9.25 | High accuracy and resolution lidar (example: lidar data collected in the Pacific Northwest)
2 | 0.7 | 9.25 | Medium-high accuracy and resolution lidar
3 | 1-2 | <18.5 | Medium accuracy and resolution lidar, analogous to USGS specification v. 13 and most data collected to date
4 | 5 | 46-139 | Early or lower quality lidar and photogrammetric elevations produced from aerotriangulated NAIP imagery
5 | 5 | 93-185 | Lower accuracy and resolution, primarily from IfSAR
Source: http://nationalmap.gov/3DEP/neea.html
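The horizontal point spacings for quality levels 1 and 2 can be related to a nominal point density. The sketch below assumes points on a regular square grid, so density is the inverse of the spacing squared. That grid model and the function name `nominal_density` are illustrative assumptions, not part of the 3DEP specification. Under that assumption, the results are close to the roughly 8 and 2 points per square meter quoted for levels 1 and 2 elsewhere in this presentation.

```python
def nominal_density(point_spacing_m: float) -> float:
    """Points per square meter, assuming a regular square grid of points."""
    if point_spacing_m <= 0:
        raise ValueError("point spacing must be positive")
    return 1.0 / point_spacing_m ** 2


# Horizontal point spacings (meters) for quality levels 1 and 2, from the slide.
ql_spacing_m = {1: 0.35, 2: 0.7}

ql_density = {ql: nominal_density(s) for ql, s in ql_spacing_m.items()}

for ql, density in ql_density.items():
    print(f"QL{ql}: about {density:.1f} points per square meter")
```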

8 Big Spatial Data Geographic data of high resolution and covering large areas creates big spatial data Remotely-sensed images – One-meter resolution NAIP images for Dent County, Missouri (1,955 km²) require 800 GB of storage space (more than 4 PB equivalent for U.S.) – Atlanta footprint of 0.33 m resolution color images is almost 1 TB of data – Satellite images with finer than one meter resolution – LiDAR data of level 1 (8 points per square meter), level 2 (2 points per square meter)
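The national figure on this slide can be checked by scaling the Dent County storage requirement by area. This is a rough sketch only: the U.S. area of about 9.8 million km² is an assumption supplied here, not a number from the slide, and the calculation assumes storage grows linearly with area at a fixed resolution.

```python
DENT_COUNTY_AREA_KM2 = 1_955     # from the slide
DENT_COUNTY_STORAGE_GB = 800     # from the slide
US_AREA_KM2 = 9_800_000          # ASSUMPTION: approximate total U.S. area


def scale_storage_gb(area_km2: float) -> float:
    """Scale the Dent County storage figure linearly to another area."""
    return DENT_COUNTY_STORAGE_GB * area_km2 / DENT_COUNTY_AREA_KM2


# Decimal units: 1 PB = 1,000,000 GB.
us_storage_pb = scale_storage_gb(US_AREA_KM2) / 1_000_000

print(f"Estimated U.S. coverage: about {us_storage_pb:.1f} PB")
```

With these inputs the estimate comes out slightly above 4 PB, in line with the "more than 4" figure quoted on the slide.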

9 Big Spatial Data USGS 3DEP – Level 2 LiDAR for all of U.S. except Alaska which is acquiring level 5 IfSAR – Data volume for point cloud, intensity images, and bare Earth elevation model – 7 to 9 petabytes – Processing and file creation usually doubles to triples the storage requirements Other geospatial data – USGS National Hydrography Dataset based on 1:24,000 scale about 700 GB (equivalent resolution 12 m; accuracy 25 m RMSE) New project to extract hydrography from level 2 lidar – How big will the vector (< 1 m Resolution) dataset be that results?

10 Software Computer compiled/ scripting languages – Manipulate data Software – Commercial? Open? Modifiable code? Functional? Tools: SAS (SPSS)/ R/ MATLAB, etc. GIS Software: Esri ArcGIS/ QGIS – and image processing S/W: Imagine/ ENVI – Libraries: GDAL Example software: mapIMG (based on GCTP; open)

11 Geospatial Methods, Technologies, and Applications Analytical Cartography – Mathematical Cartography – Since roughly the 18th century Quantitative Geography – Since 1960s GIS (and image processing S/W) – Since about the 1970s – combining data & software → GIS Packages – Legacy of primarily commercial software Open Source Software – Since roughly 1980s OpenGIS? – early wide-spread but often spotty “open” GIS – Foundation for maturity, expansion, and further openness

12 Here we are/ where are we going? Open GIS: Technology and Applications (exploitable) Hardware and Operating Systems evolving Data Storage trying to keep pace with Big Data Advanced GeoViz on cusp of exploding HPC → High-Performance Spatial Computing Increasing Spatiotemporal fidelity Cyberinfrastructure

13 CyberGIS Cyberinfrastructure (eScience) HPC & GIScience A balance/ interaction between theory/ data (Rey, 2013) Collaborative Research Standards (for interoperability)

14 NSF CyberGIS Project NSF Software Infrastructure for Sustained Innovation Award – http://cybergis.org USGS/ CEGIS Participation Cyberinfrastructure resources – XSEDE – Blue Waters supercomputer allocation – Open Science Grid Integration – CyberGIS Toolkit – CyberGIS Gateway – GISolve middleware services

15 CyberGIS Software Environment From Liu et al. (2014)

16 CyberGIS Toolkit Software Components PABM – Parallel Agent-Based Modeling pRasterBlaster – Parallel Map Reprojection Parallel PySAL (Python Spatial Analysis Library) Spatial Text An open and reliable software toolbox for high-end users Hide compute complexity A rigorous software building, testing, packaging, and deployment framework Focused on computational intensity, performance, scalability, and portability in various CI environments Easy to configure and use

17 Scalable Raster Processing Need for scalable map reprojection in CyberGIS analytics – Spatial analysis and modeling Distance calculation on raster cells requires appropriate projection – Visualization Reprojection for faster visualization on Web Mercator base maps pRasterBlaster integration in CyberGIS Toolkit and Gateway – Software componentization: librasterblaster, pRasterBlaster, MapIMG – Build, test, and documentation – Gateway user interface
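To make the Web Mercator case concrete, here is a minimal sketch of the forward transformation for a single longitude/latitude pair, using the standard spherical Mercator formulas with a sphere radius of 6378137 m. The function name is hypothetical and this is not code from librasterblaster; raster reprojection applies a transformation like this one (or its inverse) to every cell, which is why the workload grows with raster size.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator


def lonlat_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple:
    """Forward spherical Mercator: degrees in, meters out.

    Latitudes should stay within about +/-85.05 degrees; the projection
    stretches without bound toward the poles.
    """
    lam = math.radians(lon_deg)
    phi = math.radians(lat_deg)
    x = SPHERE_RADIUS_M * lam
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + phi / 2))
    return x, y


x_edge, y_equator = lonlat_to_web_mercator(180.0, 0.0)
print(f"x at 180 degrees east: {x_edge:.2f} m")
```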

18 Performance Profiling Performance profiling is an important tool for developing scalable and efficient high performance applications Performance profiling identified computational bottlenecks in pRasterBlaster Demonstration of one example of the value of profilers for pRasterBlaster in the next slides

19 A Computational Bottleneck: Symptom

20 A Computational Bottleneck: Symptom

21 A Computational Bottleneck: Cause

22 A Computational Bottleneck: Analysis Spatial data-dependent performance anomaly – The anomaly is data dependent – Four corners of the raster dataset were processed by processors whose indexes are near either end of the index range Exception handling in C++ is costly – Coordinate transformation on nodata area was handled as an exception Solution – Remove the C++ exception handling
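The pattern behind this fix can be sketched as follows. The original code was C++, and this Python version is only an illustration of the idea, not pRasterBlaster's implementation: an inverse transformation reports an off-globe cell by returning a sentinel (`None`) that the caller checks, rather than raising an exception for every nodata cell. The sinusoidal projection on a unit sphere and both function names are assumptions chosen because that projection leaves the corners of the output rectangle outside the globe.

```python
import math
from typing import Optional, Tuple


def sinusoidal_inverse(x: float, y: float) -> Optional[Tuple[float, float]]:
    """Return (lon, lat) in radians, or None when (x, y) lies off the globe."""
    if abs(y) >= math.pi / 2:
        return None
    lon = x / math.cos(y)
    if abs(lon) > math.pi:
        return None  # nodata cell: reported by value, no exception raised
    return lon, y


def count_cells(n_cols: int, n_rows: int) -> Tuple[int, int]:
    """Count valid and nodata cells of an output raster covering the full extent."""
    valid = nodata = 0
    for row in range(n_rows):
        y = math.pi / 2 - (row + 0.5) * math.pi / n_rows       # cell-center y
        for col in range(n_cols):
            x = -math.pi + (col + 0.5) * 2 * math.pi / n_cols  # cell-center x
            if sinusoidal_inverse(x, y) is None:
                nodata += 1  # skip cheaply; nothing to resample
            else:
                valid += 1
    return valid, nodata


valid_cells, nodata_cells = count_cells(8, 4)
print(f"valid: {valid_cells}, nodata: {nodata_cells}")
```

In this toy raster the nodata cells cluster in the top and bottom rows, which mirrors the observation that the expensive cells sat near the corners of the dataset.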

23 A Computational Bottleneck: Performance Improvement

24 A Computational Bottleneck: Summary Symptom – Processors responsible for polar regions spent more time than those processing the equatorial region Cause – Corner cells were mapped to invalid input raster cells, generating exceptions – C++ exception handling was expensive Solution – Removed C++ exception handling – Corner cells need not be processed; they now contribute less computation time

25 pRasterBlaster Component View [Figure labels: librasterblaster, pRasterBlaster, MapIMG; Cyberinfrastructure Service Providers, GIS Programmers, End Users; via API; CyberToolkit]

26 Performance Test: -On an XSEDE supercomputer (Trestles at the San Diego Supercomputer Center) -Using a parallel file system (Lustre) and MPI I/O (vs. traditional Network File System (NFS)) -40 GB data -Processor cores were increased from 256 to 1024

27 Obstacles, Issues, Challenges Parallel I/O (particularly raster) is the proverbial long pole in the tent Raster decomposes nicely (embarrassingly parallel) File I/O (especially output file re-composition) is a huge bottleneck Lessons learned; one of our prime contributions to the community (to date): optimized parallel I/O for raster – GeoTIFF (SPTW – Simple Parallel TIFF Writer) led by David Mattli, USGS – HDF5 parallel work by Babak Behzad, UIUC
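The "decomposes nicely" point can be illustrated with a simple row-block partition, in which each process gets a contiguous band of output rows and works on it independently. This is a generic sketch; the function name is hypothetical, and whether pRasterBlaster partitions its output exactly this way is an assumption. The output-file bottleneck arises afterward, when all of these bands must be written into a single file.

```python
from typing import List, Tuple


def partition_rows(n_rows: int, n_procs: int) -> List[Tuple[int, int]]:
    """Split n_rows into n_procs contiguous (start, stop) half-open ranges.

    Any remainder rows are given one each to the lowest-numbered ranks,
    so block sizes differ by at most one row.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    base, extra = divmod(n_rows, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges


blocks = partition_rows(10, 3)
for rank, (start, stop) in enumerate(blocks):
    print(f"rank {rank}: rows {start} to {stop - 1}")
```

With a partition like this, the first and last ranks hold the top and bottom rows of the raster, which is consistent with the earlier observation that processors at the two ends of the index range handled the corner regions.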

28 Computational Challenges Converting legacy (linear) code to HPC (parallel) environment requires a lot of skilled manpower Scaling to large-scale analysis using HPC resources is difficult Cyberinfrastructure-based computational analysis needs in-depth knowledge and expertise on computational performance profiling and analysis

29 Geospatial Analytics Spatial Modeling/ Geovisualization Solving “Changing World” Problems Smart Decisions Protecting Natural Resources Democratizing Science Empowering cultures Products and Services for society and its citizens Data & Software → Solving (Geospatial) Problems

30 Geospatial Analytics Spatial Modeling/ Geovisualization

31 References
Behzad, Babak, Yan Liu, Eric Shook, Michael P. Finn, David M. Mattli, and Shaowen Wang (2012). A Performance Profiling Strategy for High-Performance Map Re-Projection of Coarse-Scale Spatial Raster Data. Abstract presented at the Auto-Carto 2012, A Cartography and Geographic Information Society Research Symposium, Columbus, OH.
Finn, Michael P., Yan Liu, David M. Mattli, Babak Behzad, Kristina H. Yamamoto, Qingfeng (Gene) Guan, Eric Shook, Anand Padmanabhan, Michael Stramel, and Shaowen Wang (2014). High-Performance Small-Scale Raster Map Projection Transformation on Cyberinfrastructure. Paper accepted for publication as a chapter in CyberGIS: Fostering a New Wave of Geospatial Discovery and Innovation, Shaowen Wang and Michael F. Goodchild, editors. Springer-Verlag.
Finn, Michael P., Yan Liu, David M. Mattli, Qingfeng (Gene) Guan, Kristina H. Yamamoto, Eric Shook, and Babak Behzad (2012). pRasterBlaster: High-Performance Small-Scale Raster Map Projection Transformation Using the Extreme Science and Engineering Discovery Environment. Abstract presented at the XXII International Society for Photogrammetry & Remote Sensing Congress, Melbourne, Australia.
Liu, Yan, Michael P. Finn, Babak Behzad, and Eric Shook (2013). High-Resolution National Elevation Dataset: Opportunities and Challenges for High-Performance Spatial Analytics. Abstract presented in the Special Session on “Big Data,” American Society for Photogrammetry and Remote Sensing Annual Conference, Baltimore, Maryland.
Liu, Yan, Anand Padmanabhan, and Shaowen Wang (2014). CyberGIS Gateway for enabling data-rich geospatial research and education. Concurrency Computat.: Pract. Exper., DOI: 10.1002/cpe.3256.
Rey, S.J. (2014). “Open regional science.” Presidential Address, Western Regional Science Association, San Diego. February.
http://cegis.usgs.gov/ http://nationalmap.gov/3DEP/ http://cybergis.cigi.uiuc.edu/cyberGISwiki/doku.php http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Main_Page http://cgwiki.cigi.uiuc.edu:8080/mediawiki/index.php/Software:pRasterBlaster

32 Geospatial Analytics for Government Agencies and the General Public: The CyberGIS Toolkit as an Enabler U.S. Department of the Interior U.S. Geological Survey Questions? http://cegis.usgs.gov/index.html High Performance Computing and Geospatial Analytics Workshop Argonne National Laboratory 29 – 30 Apr 2014

