1 High-Resolution National Elevation Dataset: CyberGIS Challenges and Opportunities for Scalable Spatial Data Access and Analytics
Yan Liu (1,3,5), Babak Behzad (1,2), Anand Padmanabhan (1,3,5), Eric Shook (1,3), Shaowen Wang (1,2,3,4,5), and Yanli Zhao (1,3)
(1) CyberInfrastructure and Geospatial Information Laboratory (CIGI)
(2) Department of Computer Science
(3) Department of Geography and Geographic Information Science
(4) Department of Urban and Regional Planning
(5) National Center for Supercomputing Applications (NCSA)
University of Illinois at Urbana-Champaign
Michael P. Finn and E. Lynn Usery
U.S. Geological Survey, U.S. Department of the Interior
2 Outline
- Introduction
- NED data access
  - Interfaces and performance issues
- Computational challenges
  - Data-intensive spatial analysis
- Experience and solutions
  - CyberGIS
  - Scalable spatial data access and analytics
- Concluding discussions
Overall flow of the presentation:
- Introduce NED and its broad usage (two examples: the CyberGIS analytical environment and the "Great Flood" movie).
- NED data access differs from simple file sharing and downloading. It therefore calls for highly usable download client tools and a different programming pattern for integrating the download step into application logic.
- Once the data are downloaded, using big data in spatial analysis can be computationally prohibitive in memory, I/O, and CPU time. High-performance spatial analysis can reduce CPU time, after which the bottleneck can be I/O: 1) there may be too many intermediate input/output steps during an analysis; 2) a single I/O step on big data can slow down the whole analysis.
- Solutions: 1) reduce intermediate I/O steps by integrating geospatial data processing libraries with analysis methods; 2) use parallel computing to reduce the time of a single I/O step.
3 National Elevation Dataset (NED)
- Digital elevation models (DEM)
- Product of the USGS National Map
- Resolutions: 3-meter, 10-meter, 30-meter
- Formats: ArcGrid, GridFloat, IMG
- Organized as 1 degree x 1 degree tiles
- Sizes (U.S. continent)
  - 10-meter: 936 tiles; 440GB raw files; 1TB with pyramid tiles
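Because the dataset is organized as 1 degree x 1 degree tiles, an analysis over a study area first has to work out which tiles it touches. The following is a minimal sketch of that lookup. The function name and the naming scheme (each tile labelled by its north-west corner, such as "n40w089") are assumptions made for illustration and are not taken from the slides.

```python
import math


def tiles_for_bbox(west, south, east, north):
    """Return names of the 1 x 1 degree tiles that intersect a bounding box.

    Assumption: a tile is labelled by its north-west corner, so "n40w089"
    covers latitudes 39..40 and longitudes -89..-88.
    """
    names = []
    # Southern and western edges of the tiles, in whole degrees.
    for lat in range(math.floor(south), math.ceil(north)):
        for lon in range(math.floor(west), math.ceil(east)):
            top = lat + 1  # northern edge of this tile
            ns = "n" if top >= 0 else "s"
            ew = "w" if lon < 0 else "e"
            names.append(f"{ns}{abs(top):02d}{ew}{abs(lon):03d}")
    return names


# A small box around Champaign, Illinois, straddles four tiles.
champaign = tiles_for_bbox(-88.5, 39.9, -87.9, 40.4)
print(champaign)
```

Even a small study area can straddle several tiles, which is one reason batch downloading matters.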
4 NED Access Challenges
Data integration and processing
- Data are stored on multiple file/database servers
- Data processing is needed to extract subsets of data from the data collection
- Downloading becomes complex, involving processing operations such as location, extraction, aggregation, archiving, and transfer among data servers
- Computationally intensive
User interface
- Usability is crucial to make big data usable
- Programmable interface for automatic downloading
5 CyberGIS Analytics Based on NED
- CyberGIS: high-performance and collaborative GIS based on cyberinfrastructure
- Viewshed analysis
- Web Map Service (WMS) for online visualization
  - NED WMS layer built using GeoServer
  - Pre-generated pyramid tiles for 20-level zooming
- CyberGIS Gateway
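A client would typically read a WMS layer like the one on this slide through a standard GetMap request. The sketch below only builds such a request URL from the standard WMS 1.1.1 parameters. The endpoint `http://example.org/geoserver/wms` and the layer name `ned_10m` are hypothetical placeholders, not the actual CyberGIS service.

```python
from urllib.parse import urlencode


def getmap_url(base_url, layer, bbox, width=512, height=512):
    """Build a WMS 1.1.1 GetMap URL for a (west, south, east, north) box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                 # empty value selects the default style
        "SRS": "EPSG:4326",           # longitude/latitude in degrees
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",
    }
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and layer name, for illustration only.
url = getmap_url("http://example.org/geoserver/wms", "ned_10m", (-89, 39, -88, 40))
print(url)
```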
6 The Great Flood Project
- A 75-minute multimedia work of original music and film inspired by the 1927 Mississippi River floods
- Contributors include
  - Bill Frisell, Grammy Award-winning guitarist and composer
  - Bill Morrison, Obie-winning experimental filmmaker
  - Illinois Emerging Digital Research and Education in Arts Media Institute (eDream)
  - Advanced Visualization Laboratory (AVL) at the National Center for Supercomputing Applications (NCSA)
  - CyberInfrastructure and Geospatial Information Laboratory (CIGI), University of Illinois at Urbana-Champaign
- Used NED
  - Approximately 70GB of 10-meter NED tiles covering the Mississippi River valley were used to create the 3D landscape animation
7 Open YouTube URL http://www.youtube.com/watch?v=Lgy7mDJ_fVI
Relevant parts:
- 0:00 - 0:24: historical maps
- 0:25 - 1:16: 3D digital map animation based on 1/3 arc-second NED
9 NED Download: User Interface
- Download tool web interface
- New interface
- National Map Viewer:
This slide and the next one show the inconvenience of the NED downloading tools. The reason is that these tools were developed primarily around how the data were produced and hosted, and less around how users need to use the data.
10 NED Downloading Process
1. Queue a request
2. Launch data extractor
3. Extract data
4. Archive data files
5. Notify data readiness
6. User download
Figure labels: Click each URL; File list
Please repeat 936 times to get all 1 degree x 1 degree tiles for the U.S. continent!
11 NED Downloading Web Service Interface
- Start download
- Check status
- Download
- Cleanup
This slide illustrates the trend in programming big data downloads: a single request-response round trip is no longer practical because it takes too long and blocks the main program. This points to an asynchronous programming model that overlaps downloading with processing and then synchronizes the two.
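The start / check status / download / cleanup cycle, and the overlap of downloading with processing, can be sketched with Python's `asyncio`. This is an illustration only: `FakeDownloadService` is an in-memory stand-in invented for this example, and its method names and status strings are assumptions, not the actual NED web service API.

```python
import asyncio


class FakeDownloadService:
    """In-memory stand-in for a start/status/download/cleanup web service."""

    def __init__(self, polls_until_ready=3):
        self._polls_until_ready = polls_until_ready
        self._polls = {}

    async def start(self, tile):
        self._polls[tile] = 0
        return tile  # the tile name doubles as the request id here

    async def status(self, request_id):
        await asyncio.sleep(0.01)  # simulated network latency
        self._polls[request_id] += 1
        done = self._polls[request_id] >= self._polls_until_ready
        return "ready" if done else "processing"

    async def download(self, request_id):
        return f"data for {request_id}".encode()

    async def cleanup(self, request_id):
        del self._polls[request_id]


async def fetch_tile(service, tile, poll_interval=0.01):
    """Run one start -> poll -> download -> cleanup cycle for a tile."""
    request_id = await service.start(tile)
    try:
        while await service.status(request_id) != "ready":
            await asyncio.sleep(poll_interval)
        return tile, await service.download(request_id)
    finally:
        await service.cleanup(request_id)  # always release server-side state


async def process(tile, payload):
    """Placeholder for application logic; here it only measures the payload."""
    return tile, len(payload)


async def main(tiles):
    service = FakeDownloadService()
    pending = [asyncio.create_task(fetch_tile(service, t)) for t in tiles]
    results = {}
    # Process each tile as soon as it arrives while the others keep polling.
    for finished in asyncio.as_completed(pending):
        tile, payload = await finished
        name, size = await process(tile, payload)
        results[name] = size
    return results


results = asyncio.run(main(["n40w089", "n40w088", "n41w089"]))
print(results)
```

The `try`/`finally` in `fetch_tile` keeps the cleanup call paired with every started request, which is the synchronization point between downloading and processing that the slide asks for.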
12 NED Downloader
Goal
- Provide an easy-to-use NED downloading utility by supporting batch downloads and managing download status transitions automatically
Software
- Linux-based
- Bash + PHP
- Open source (MIT license)
- Hosted on CyberGIS SVN
Status
- Used by the National Science Foundation CyberGIS project team for NED data integration and the Great Flood project
Facts:
- We used this downloader to keep a copy of the 1/3 arc-second NED dataset, converted it to GeoTIFF format, created pyramid tiles for 20-level zooming, and published it as a WMS.
- We used this downloader to download 70GB of 1/3 arc-second NED dataset files for the Mississippi River valley area. They were used to make the "Great Flood" movie by the Advanced Visualization Laboratory (AVL) at NCSA.
13 Computational Challenges in Related CyberGIS Analytics
14 Why CyberGIS?
- Most commonly used GIS software is based on sequential computing
  - Not scalable for big data analytics
- Many runtime input/output (I/O) steps in an analysis workflow
- Transfer of big data to and from cyberinfrastructure resources
15 Viewshed Analysis
- Input DEM
  - HTTP downloading
  - Data processing using GDAL commands
- High-performance viewshed computation
  - Exploiting Graphics Processing Units (GPU)
- Output transfer
  - GridFTP, a parallel file transfer protocol
- Computational bottlenecks
  - The test viewshed analysis (see figure) handled 3.9GB of raster data in total
    - 1.8GB input NED; 436MB output; 1.67GB runtime output
  - Execution time: 4 minutes 55 seconds
    - Input data transfer: 21 seconds; input data processing: (value missing) seconds; computing: 65 seconds; output data processing: 88 seconds; output transfer: 7 seconds
  - Input/output data processing took 68.4% of analysis time
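The timing breakdown above can be cross-checked with a little arithmetic. The sketch below assumes the five listed stages account for the whole 4 minutes 55 seconds, and derives the input data processing time as the remainder. That remainder is an inference from the other figures, not a number stated on the slide.

```python
# Total execution time reported on the slide: 4 minutes 55 seconds.
total_seconds = 4 * 60 + 55

# Stage durations that the slide states explicitly, in seconds.
known = {
    "input transfer": 21,
    "computing": 65,
    "output data processing": 88,
    "output transfer": 7,
}

# Assumption: the five stages sum to the total, so the remainder is the
# input data processing time.
input_processing = total_seconds - sum(known.values())

# Share of the run spent on input plus output data processing.
io_processing_share = (input_processing + known["output data processing"]) / total_seconds

# Data volumes from the slide, in GB (436MB taken as 0.436GB).
total_gb = 1.8 + 0.436 + 1.67

print(f"implied input data processing: {input_processing} s")
print(f"I/O data processing share: {io_processing_share:.2%}")
print(f"total raster data: {total_gb:.2f} GB")
```

Under that assumption the share comes out just under 68.5%, which agrees with the reported 68.4% if the slide truncated rather than rounded, and the three data volumes sum to roughly the stated 3.9GB.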
16 Resolving Computational Bottlenecks
- Reduce the number of runtime I/O steps
- Employ high-performance I/O techniques
[Figure: analysis workflow from input data storage and input files, through input processing, analysis (CPU, GPU, ...), and output processing, to output files and output data storage, with transfer and input/output steps between the stages]
18 CyberGIS Approach
- Tightly couple geospatial data processing libraries to eliminate unnecessary I/O operations
- Exploit parallel I/O for geospatial data processing
- Integrate high-performance data transfer capability in CyberGIS analytics
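The idea behind parallel I/O for raster data can be sketched as a row-strip decomposition: each worker seeks to its own byte range and reads it independently, and the partial results are then reduced. The example below is a thread-based illustration on a synthetic file, not MPI-IO and not the pGIO library. It assumes a headerless, row-major raster of 32-bit floats in native byte order (similar in spirit to a GridFloat data file), and all function names are hypothetical.

```python
import os
import tempfile
from array import array
from concurrent.futures import ThreadPoolExecutor

CELL_BYTES = 4  # assumption: one 32-bit float per raster cell


def write_raster(path, rows, cols):
    """Write a synthetic row-major raster whose cell value is its row index."""
    with open(path, "wb") as f:
        for r in range(rows):
            array("f", [float(r)] * cols).tofile(f)


def read_strip(path, cols, row_start, row_end):
    """Read rows [row_start, row_end) through an independent handle and seek."""
    values = array("f")
    with open(path, "rb") as f:
        f.seek(row_start * cols * CELL_BYTES)
        values.fromfile(f, (row_end - row_start) * cols)
    return row_start, max(values)


def parallel_max(path, rows, cols, workers=4):
    """Split the raster into row strips, read them concurrently, and reduce."""
    step = -(-rows // workers)  # ceiling division
    strips = [(s, min(s + step, rows)) for s in range(0, rows, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda s: read_strip(path, cols, *s), strips))
    return max(v for _, v in partial), len(strips)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dem.flt")
    write_raster(path, rows=100, cols=50)
    highest, n_strips = parallel_max(path, rows=100, cols=50)
    print(highest, n_strips)
```

Because the strips are disjoint byte ranges, the readers never contend for the same data. That property is what lets the same decomposition scale across processes on a parallel file system.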
20 Highlights
Analytical libraries
- pRasterBlaster (a high-performance map reprojection library under joint development by CEGIS and CIGI)
Data libraries
- Parallel Geospatial I/O library (pGIO) with NetCDF/HDF5 support is to be released soon
- GDAL+MPI IO for parallel I/O of GeoTIFF format is under development
Spatial middleware
- GridFTP transfer between CyberGIS data source sites and XSEDE sites
  - CEGIS <-> supercomputer centers (NCSA, SDSC, TACC)
CyberGIS computational resources
- CEGIS high-performance computers
- CIGI cloud infrastructure
- Key national cyberinfrastructure environments
  - NSF XSEDE (http://xsede.org)
  - Open Science Grid (http://opensciencegrid.org)
22 High-Performance Data Transfer
[Figure: network map with CEGIS marked]
- White lines: high-speed network connections among supercomputer centers
- Blue lines: parallel data transfer connections between CEGIS and accessible supercomputer centers
Background image source: https://www.xsede.org/documents/10157/169907/xsedenet.pdf
23 Data Transfer Service between USGS and XSEDE
- Technology
  - GridFTP, a secure and high-performance data transfer protocol
- Data transfer service setup
  - USGS GridFTP server: usgs-ybother.srv.mst.edu
  - Globus Toolkit 5
- Data transfer capability
  - Parallel data channels for large dataset transfer
  - Data transfer is initiated in the CyberGIS Gateway as a third-party transfer
  - Transfer rate: up to 100MB/second
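To put the quoted peak rate in perspective, the sketch below estimates lower bounds on transfer time for the dataset sizes mentioned earlier in the talk. It assumes decimal units (1GB = 1000MB, 1TB = 1000GB) and a sustained 100MB/second, so real transfers would take at least this long. The function name is hypothetical.

```python
def min_transfer_seconds(size_gb, rate_mb_per_s=100, mb_per_gb=1000):
    """Lower bound on transfer time if the peak rate were sustained throughout."""
    return size_gb * mb_per_gb / rate_mb_per_s


datasets = [
    ("Mississippi River valley tiles", 70),
    ("10-meter NED raw files", 440),
    ("10-meter NED with pyramid tiles", 1000),  # 1TB taken as 1000GB
]

for label, size_gb in datasets:
    seconds = min_transfer_seconds(size_gb)
    print(f"{label}: at least {seconds / 60:.0f} minutes")
```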
24 Concluding Discussions
- The usability of NED can be significantly improved if the data access interface is made friendlier
- Big data require cyberinfrastructure and significant computational power for scalable data access and analytics
- CyberGIS has emerged as a new-generation GIS for resolving these challenges and represents significant opportunities for the National Map communities
26 DISCLAIMER & ACKNOWLEDGEMENT
DISCLAIMER: Any use of trade, product, or firm names in this paper is for descriptive purposes only and does not imply endorsement by the U.S. Government.
ACKNOWLEDGEMENT: This work is supported in part by the National Science Foundation (NSF) under Grant Numbers: BCS and OCI. Computational experiments used the NSF Extreme Science and Engineering Discovery Environment (XSEDE) (Award Number SES090019), which is supported by NSF under Grant Number OCI
27 Comments / Questions?
Contact: firstname.lastname@example.org or email@example.com
University of Illinois at Urbana-Champaign
- CyberInfrastructure and Geospatial Information Laboratory
- Department of Computer Science
- Department of Geography and Geographic Information Science
- Department of Urban and Regional Planning
- National Center for Supercomputing Applications
U.S. Department of the Interior
- U.S. Geological Survey