CyberGIS Toolkit: A Software Toolbox Built for Scalable cyberGIS Spatial Analysis and Modeling. Yan Liu, Michael Finn, Hao Hu, Jay Laura, David Mattli, Anand Padmanabhan, Serge Rey, Eric Shook, Kornelijus Survila, and Shaowen Wang.


CyberGIS Toolkit: A Software Toolbox Built for Scalable cyberGIS Spatial Analysis and Modeling

Yan Liu 1,2, Michael Finn 4, Hao Hu 1, Jay Laura 3, David Mattli 4, Anand Padmanabhan 1,2, Serge Rey 3, Eric Shook 5, Kornelijus Survila 1, and Shaowen Wang 1,2

1 CyberInfrastructure and Geospatial Information Laboratory (CIGI), University of Illinois at Urbana-Champaign
2 National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign
3 GeoDa Center for Spatial Analysis and Computation, Arizona State University
4 Center of Excellence for Geospatial Information Science, U.S. Geological Survey
5 Department of Geography, Kent State University

CyberGIS All-Hands Meeting 2013, Seattle, WA, September 15, 2013

Outline
Purposes
Software integration approach
  o Software component selection
  o Software engineering
  o Scalability analysis
Progress update
  o CyberGIS Toolkit 0.5-alpha release
  o Continuous integration framework
  o Deployment on advanced cyberinfrastructure
Case study: Parallel PySAL (pPySAL)
  o Parallel PySAL project
  o Illustration of parallelization strategies
Future work and concluding discussion

Six Major Goals of the NSF CyberGIS Project
1. Engage multidisciplinary communities through a participatory approach to evolving CyberGIS software requirements;
2. Integrate and sustain a core set of composable, interoperable, manageable, and reusable CyberGIS software elements based on community-driven and open source strategies;
3. Empower high-performance and scalable CyberGIS by exploiting spatial characteristics of data and analytical operations for achieving unprecedented capabilities for geospatial scientific discoveries;
4. Enhance an online geospatial problem-solving environment to allow for the contribution, sharing, and learning of CyberGIS software by numerous users, which will foster the development of crosscutting education, outreach, and training programs with significant broad impacts;
5. Deploy and test CyberGIS software by linking with national and international cyberinfrastructure to achieve scalability to significant sizes of geospatial problems, amounts of cyberinfrastructure resources, and numbers of users; and
6. Evaluate and improve the CyberGIS framework through domain science applications and vibrant partnerships to gain better understanding of the complexity of coupled human-natural systems.
(Slide annotations: software as deliverables; building blocks; computational solutions)

CyberGIS Software Environment
[Diagram: CyberGIS Gateway, CyberGIS Toolkit, GISolve Middleware]

“CyberGIS Toolkit, a deep approach to CyberGIS software integration research and development, is focused on developing and leveraging innovative computational strategies needed to solve significant geospatial scientific problems by exploiting high-end cyberinfrastructure (CI) resources.”
-- NSF CyberGIS Project, 3rd Year Annual Report (2013)

Objectives
Identify and integrate a set of loosely coupled, scalable geospatial software components into the CyberGIS Toolkit
Establish and sustain the CyberGIS Toolkit as a reliable software toolbox through an open and rigorous software building, testing, packaging, and deployment framework
Capture the computational and spatial characteristics of each software element, focusing on computational performance, scalability, and portability in various CI environments:
  o XSEDE (Extreme Science and Engineering Discovery Environment)
  o Open Science Grid
  o Extreme-scale supercomputers
Provide a software environment for computational and data scientists to easily configure and use CyberGIS Toolkit components

Software Integration Approach
Software component selection
  o Open source strategy
  o Community-driven component identification
  o Benefits to related science areas
Software engineering
  o Complexity in software integration
  o A holistic framework for streamlined component integration
CI-based scalability evaluation and enhancement
  o Computational intensity analysis – theoretical approach
  o Performance analysis and profiling – experimental approach

Open Source Strategy
Components
  o Each component must be open source
  o No restriction to any particular open source license
Toolkit
  o Improves the accessibility of integrated software capabilities through the establishment of a software toolbox
  o Focuses on the robustness, portability, compatibility, and scalability of each component within advanced CI environments

Component Selection Example: pRasterBlaster
Need for scalable map reprojection in cyberGIS analytics
  o Spatial analysis and modeling: distance calculation on raster cells requires an appropriate projection
  o Visualization: reprojection for faster visualization on Web Mercator base maps
pRasterBlaster integration in CyberGIS Toolkit and Gateway
  o Software componentization: librasterblaster, pRasterBlaster, MapIMG
  o Build, test, and documentation
  o Gateway user interface
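The core idea behind parallel raster reprojection of this kind is inverse mapping: partition the output raster into row blocks, and for each output cell compute the source coordinate it came from. The following is a minimal, self-contained sketch of that pattern using a toy spherical sinusoidal projection; all function names are illustrative and are not the librasterblaster/pRasterBlaster API.

```python
import math

EARTH_RADIUS = 6371000.0  # meters, spherical approximation

def sinusoidal_inverse(x, y, r=EARTH_RADIUS):
    """Inverse sinusoidal projection: planar meters -> (lon, lat) in degrees."""
    lat = y / r
    cos_lat = math.cos(lat)
    lon = x / (r * cos_lat) if abs(cos_lat) > 1e-12 else 0.0
    return math.degrees(lon), math.degrees(lat)

def partition_rows(n_rows, n_workers):
    """Split output rows into contiguous (start, stop) blocks, one per worker."""
    base, extra = divmod(n_rows, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        count = base + (1 if w < extra else 0)
        blocks.append((start, start + count))
        start += count
    return blocks

def reproject_block(row_range, n_cols, cell_size, origin_x, origin_y):
    """For each output cell in one row block, compute its source lon/lat.

    A real implementation would then sample the source raster at these
    coordinates; here we just return the coordinate lookup table.
    """
    coords = {}
    for row in range(*row_range):
        y = origin_y - (row + 0.5) * cell_size  # cell-center y
        for col in range(n_cols):
            x = origin_x + (col + 0.5) * cell_size  # cell-center x
            coords[(row, col)] = sinusoidal_inverse(x, y)
    return coords
```

In an MPI setting each rank would process one block from `partition_rows` and write its rows with parallel I/O, which is the row-decomposition structure the slide's "scalable map reprojection" refers to.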

pRasterBlaster Component View
[Diagram: librasterblaster, pRasterBlaster, and MapIMG, mapped to their audiences: cyberinfrastructure service providers, developers, and end users]

Integration Challenges
Diversity and heterogeneity of software components
  o Programming languages
  o Application vs. library
  o Dependent libraries
  o Code availability
  o Programming models
  o Cyberinfrastructure resources
Multiple levels of integration
  o Desktop level
  o Heterogeneous software environments
  o Cyberinfrastructure
Software distribution
  o CI deployment
  o Packaging for broader distribution

Strategies
Build and test
  o Establish a streamlined process to build and test software codes
Computational intensity analysis
  o Computational bottleneck evaluation
  o Scalability analysis
  o Generalize solutions as computational and spatial knowledge
Packaging and distribution
  o Package software for download and build in user computing environments
  o Provide common distribution packages (Debian, RPM, Windows installer, etc.)
  o Establish the CyberGIS Toolkit as a configurable software module that can be loaded/unloaded on supercomputers
Documentation and training
  o Build a CyberGIS Toolkit web site to host the software suite, user guide, development documentation, education materials, user feedback, and forums
  o Develop online training materials

Continuous Integration Framework
[Diagram: developer-level testing support, NMI-based continuous integration, scalability analysis and enhancement, and a continuous integration management service (e.g., Travis CI, Jenkins)]
NMI: NSF Middleware Initiative
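The developer-level tests that a continuous-integration service such as Travis CI or Jenkins runs on every commit are ordinary unit tests. A minimal sketch of such a test is below; the component function `reproject_extent` is a hypothetical stand-in, not a real Toolkit API.

```python
import unittest

def reproject_extent(extent, scale):
    """Toy stand-in for a Toolkit routine: scale an (xmin, ymin, xmax, ymax) box."""
    xmin, ymin, xmax, ymax = extent
    return (xmin * scale, ymin * scale, xmax * scale, ymax * scale)

class ToolkitComponentTest(unittest.TestCase):
    """Developer-level regression tests run automatically on each commit."""

    def test_identity_scale(self):
        # Scaling by 1.0 must leave the extent unchanged.
        self.assertEqual(reproject_extent((0, 0, 10, 10), 1.0),
                         (0.0, 0.0, 10.0, 10.0))

    def test_extent_stays_ordered(self):
        # A positive scale must preserve min < max on both axes.
        xmin, ymin, xmax, ymax = reproject_extent((-5, -5, 5, 5), 2.0)
        self.assertLess(xmin, xmax)
        self.assertLess(ymin, ymax)

# Run locally with: python -m unittest <this_module>
```

The CI service's contribution is not the test itself but running it automatically across the compiler/MPI/platform matrix the Toolkit targets, so integration breakage surfaces at commit time rather than at deployment.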

Component Integration Process
Selection
Parallelization
Development of test cases
Development of build and test plans
NMI regular build and test
Scalability analysis on CI
  o Identification of potential computational bottlenecks
  o Scalability to the number of processors
  o Scalability to problem size
Release in the CyberGIS Toolkit
Accessibility in CyberGIS
  o Cyberinfrastructure deployment for high-end users
  o Lowering access barriers for community users through the Gateway and GISolve
  o Incorporation in advanced application workflows
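Scalability to the number of processors is conventionally summarized by speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p at a fixed problem size (strong scaling). A small sketch of how such an analysis is tabulated, using hypothetical timings rather than measurements from any real Toolkit run:

```python
def scaling_table(timings):
    """Strong-scaling summary from wall-clock timings.

    timings: {processor_count: wall_time_seconds}, must include a
    single-processor baseline under key 1.
    Returns {p: {"speedup": T(1)/T(p), "efficiency": speedup/p}}.
    """
    t1 = timings[1]
    table = {}
    for p in sorted(timings):
        speedup = t1 / timings[p]
        table[p] = {"speedup": speedup, "efficiency": speedup / p}
    return table

# Hypothetical timings for one component at a fixed problem size.
example = scaling_table({1: 100.0, 2: 55.0, 4: 30.0, 8: 20.0})
```

Falling efficiency as p grows (here 0.625 at p = 8) is exactly the signal used in the process above to flag a computational bottleneck, e.g., serial I/O or load imbalance, worth investigating before release.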

Demo
CyberGIS Toolkit 0.5-alpha release

Software Components
Parallel Agent-Based Modeling – PABM
  o HPC models: MPI, parallel I/O
  o Contributor: UIUC team
Parallel PySAL – pPySAL
  o Parallel Python implementation
  o Contributor: ASU team
Parallel map reprojection – pRasterBlaster
  o HPC models: MPI, parallel I/O
  o Contributor: high-performance mapping group, CEGIS, USGS
SpatialText
  o Full-text geocoding of massive social media and text data
  o Contributor: Kalev Leetaru and UIUC team

Dependency Graph – CyberGIS Toolkit 0.5-alpha
[Diagram: components PABM, pRasterBlaster, SpatialText, and pPySAL, with dependencies on MPI, IPM, PAPI, Lustre, GDAL, GEOS, Proj4, PySAL, Perl, NumPy, SciPy, and Python]

Toolkit Deployment on XSEDE
XSEDE resources
  o 10 petaflops; cluster computing, multi-threaded, GPU computing
  o 0.3 petaflops; cluster computing, multi-threaded, large memory
  o 0.1 petaflops; cluster computing, data I/O intensive computing
  o 0.34 petaflops; cluster computing, data I/O intensive computing, large memory
  o petaflops; shared memory
  o cluster computing, GPU computing
Compilation and installation
  o Compilers: Intel, GNU, PGI
  o MPI: Open MPI, MVAPICH2, MPICH2, PGI, Intel MPI
Software environment configuration
  o Environment Modules
  o Adaptive software module loading/unloading
  o Demo

Case Study: Parallel PySAL
Serge Rey, Jay Laura
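The pPySAL slides themselves are not reproduced in this transcript. As a hedged illustration of one common parallelization strategy for spatial analysis in Python, decompose the data into chunks, compute partial results concurrently, then reduce, here is a toy computation of a global mean center of a point set. This is not pPySAL code or its API; a thread pool is used only to keep the sketch self-contained (for CPU-bound pure-Python work, a process pool would be the practical choice).

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(points, n_chunks):
    """Split a list of (x, y) points into up to n_chunks nearly equal slices."""
    size = -(-len(points) // n_chunks)  # ceiling division
    return [points[i:i + size] for i in range(0, len(points), size)]

def partial_sums(points):
    """Per-chunk partial result: (count, sum_x, sum_y)."""
    return (len(points),
            sum(p[0] for p in points),
            sum(p[1] for p in points))

def mean_center_parallel(points, n_workers=4):
    """Map partial_sums over chunks concurrently, then reduce to the mean center."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(partial_sums, chunk(points, n_workers)))
    n = sum(c for c, _, _ in parts)
    return (sum(sx for _, sx, _ in parts) / n,
            sum(sy for _, _, sy in parts) / n)
```

The pattern generalizes to statistics whose partial results combine associatively; measures that need neighborhood information across chunk boundaries additionally require ghost zones or a shared spatial weights structure, which is where the real engineering effort in a parallel spatial library lies.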

Concluding Discussion
The first set of selected software components has been integrated into the CyberGIS Toolkit 0.5-alpha
A prototype integration framework has been established to allow sophisticated multi-level integration tests
Scalability analysis has been effective in identifying computational bottlenecks and improving the performance of several components
The CyberGIS Toolkit has been deployed on XSEDE for community access
  o Through the GISolve Open Service API
Community evaluation and feedback are critical for building high-quality and scientifically sound software

Acknowledgements
NSF Software Infrastructure for Sustained Innovation (SI2) Program
This material is based in part upon work supported by NSF under Grant Number OCI. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

We need your feedback!
Contact:
Thanks!