
UNIVERSITY of MARYLAND GLOBAL LAND COVER FACILITY High Performance Computing in Support of Geospatial Information Discovery and Mining Joseph JaJa Institute.




1 UNIVERSITY of MARYLAND GLOBAL LAND COVER FACILITY
High Performance Computing in Support of Geospatial Information Discovery and Mining
Joseph JaJa
Institute for Advanced Computer Studies and Department of Electrical and Computer Engineering
University of Maryland, College Park

2 Outline
- Brief Background:
  - Laboratory for Parallel and Distributed Computing
  - NPACI
- Environmental Informatics:
  - Information Discovery
  - Ingestion and Metadata Management
  - Advanced Image Processing and Data Fusion
- Geospatial Data Mining:
  - Data Structures
  - Analysis and Mining Tools
- Emerging Trends and Planned Activities

3 Laboratory for Parallel and Distributed Computing
Advanced high-end computing platforms in support of research in systems software tools and scalable algorithms for a wide variety of science applications.
Current platforms:
- 16-node IBM SP2 with a large disk array – Grand Challenge Project and IBM Shared University Research (SUR)
- 10-node DEC cluster, each node with 4 Alpha processors – Keck Foundation and Grand Challenge Project
- SP-based supercomputer for Earth System Science applications – NSF, IBM Shared University Award, and NASA
- 32-node (64-processor) Linux cluster with Gigabit Ethernet, for systems software tools and applications – NSF and IBM SUR
- A large SMP coupled with a 10-TB “active” disk array – NSF and IBM SUR
- 32-node IBM SP in support of scientific computing and computational biology – Center for Scientific Computing and IBM

4 Supporting Hardware for the GLCF
[Diagram: IBM SP2 (RS6000) with a 4-way High Node (1 GB memory), Silver Nodes, and Thin Nodes; 3 TB of disks; tape robot (18 terabytes, 8 drives); 3590 drive]

5 NSF Partnership for Advanced Computational Infrastructure
UMD is a major partner in PACI/UCSD, one of the two surviving supercomputer centers.
UMD roles:
- Data cache site
- R&D participation in the thrust areas:
  - Programming Tools and Environments
  - Data Intensive Computing
  - Earth Systems Science
- Resources

6 Environmental Informatics: NPACI Project
Develop and prototype a software infrastructure on top of multiple distributed data sites that will allow:
- Information discovery from distributed, heterogeneous environmental and biodiversity data sources
- Integration with current and emerging Web technologies
- Advanced browsing, subsetting, and image processing at different granule levels, including automatic overlay of different types of data
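The automatic overlay capability can be illustrated with a small sketch: point observations (for example, bird sightings from a biodiversity source) are matched to the raster cell, and hence the land cover class, beneath them. This assumes a north-up grid in geographic coordinates with square pixels; `GridSpec`, `point_to_cell`, and `overlay_points` are hypothetical names, not part of the prototype.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class GridSpec:
    """Hypothetical description of a north-up raster in geographic coordinates."""
    origin_lon: float  # longitude of the upper-left corner
    origin_lat: float  # latitude of the upper-left corner
    pixel_size: float  # degrees per pixel (same in both directions)
    n_rows: int
    n_cols: int


def point_to_cell(grid: GridSpec, lon: float, lat: float) -> Optional[Tuple[int, int]]:
    """Return the (row, col) of the raster cell containing the point, or None if outside."""
    col = int((lon - grid.origin_lon) // grid.pixel_size)
    row = int((grid.origin_lat - lat) // grid.pixel_size)  # rows grow southward
    if 0 <= row < grid.n_rows and 0 <= col < grid.n_cols:
        return row, col
    return None


def overlay_points(grid: GridSpec, raster: List[List[int]],
                   points: List[Tuple[float, float]]) -> Dict[Tuple[float, float], Optional[int]]:
    """Attach the raster value (e.g., a land cover class) found under each point."""
    result = {}
    for lon, lat in points:
        cell = point_to_cell(grid, lon, lat)
        result[(lon, lat)] = raster[cell[0]][cell[1]] if cell else None
    return result
```

Points falling outside the raster map to `None` rather than raising, so a mixed batch of observations can be overlaid in one call.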

7 Initial Prototype
[Diagram of the initial prototype. Labeled components: Informix (site, workspace, and remote link management); data search and retrieval; data overlay; ESS Web Site; WWW interface; GLCF; user workspace; KUBirds; SDSC-SRB; LTER; users.]

8 Software Modules
[Diagram of the software modules: Map Server; Data Overlay; Ingestion; Data Transport Description; Remote Site Description; Database (site, workspace, and remote metadata; preview management); WWW Interface; Image Browsing and Processing; Distributed Search & Retrieve; Workspace.]

9 Remote Data or Link Ingestion
XML DTDs at three levels: granule, data set, and web site.
- Granule level: describes the data item. The XML fields are either extracted from the header file or, for historical data, provided by the user.
- Data set level: describes the list of data items at the collection level and specifies the searchable parameters.
- Web site level: gives all the information about a data provider's whole site, such as its search engine, interaction protocol, etc.
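A granule-level record might be generated as below. This is a sketch only: the element names, the `source` provenance attribute, and the rule that header values override user-supplied ones are assumptions, and no DTD validation is performed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional


def granule_record(header: Dict[str, str],
                   user_fields: Optional[Dict[str, str]] = None) -> str:
    """Build a granule-level XML record (hypothetical element names).

    Fields come from the data file's header when available; for historical
    data, the user supplies them. Assumption: header values win on conflict.
    """
    fields = dict(user_fields or {})
    fields.update(header)  # header values take precedence over user input
    root = ET.Element("granule")
    for name in sorted(fields):
        # Record where each value came from, so curators can audit user input
        origin = "header" if name in header else "user"
        child = ET.SubElement(root, name, source=origin)
        child.text = fields[name]
    return ET.tostring(root, encoding="unicode")
```

Sorting the field names keeps the output deterministic, which makes records easy to compare across re-ingestions.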

10 Collaboration Scenario
No migration:
- No data is migrated; instead, each site provides its interaction protocol and searchable parameters.
- Each site needs to provide an FTP or HTTP service through which users can access the data.
Metadata Management System:
- The metadata in the granule-level XML is transferred to and ingested at UMD.
- The raw data remains hosted at the original site.
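The division of labor in this scenario (metadata centralized, raw data left in place) can be sketched with an in-memory SQLite catalog: ingesting a granule record stores its metadata fields plus the FTP/HTTP address at the owning site, and a search returns those addresses rather than the data. The table layout, function names, and URLs are illustrative assumptions, not the actual Informix schema.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Tuple


def create_catalog() -> sqlite3.Connection:
    """Create an in-memory metadata catalog (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE granule (id INTEGER PRIMARY KEY, site TEXT, data_url TEXT)")
    conn.execute("CREATE TABLE granule_field (granule_id INTEGER, name TEXT, value TEXT)")
    return conn


def ingest(conn: sqlite3.Connection, site: str, granule_xml: str, data_url: str) -> int:
    """Store a granule's metadata centrally; only a pointer to the raw data is kept."""
    root = ET.fromstring(granule_xml)
    cur = conn.execute("INSERT INTO granule (site, data_url) VALUES (?, ?)", (site, data_url))
    granule_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO granule_field (granule_id, name, value) VALUES (?, ?, ?)",
        [(granule_id, child.tag, child.text) for child in root],
    )
    conn.commit()
    return granule_id


def search(conn: sqlite3.Connection, name: str, value: str) -> List[Tuple[str, str]]:
    """Return (site, data_url) for granules whose metadata field `name` equals `value`."""
    return conn.execute(
        "SELECT g.site, g.data_url FROM granule AS g "
        "JOIN granule_field AS f ON f.granule_id = g.id "
        "WHERE f.name = ? AND f.value = ? ORDER BY g.id",
        (name, value),
    ).fetchall()
```

A user would then fetch the raw data directly from the returned URL at the original site, so the central catalog never stores or relays the data itself.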

11 Geospatial Data Analysis and Mining
Develop basic building blocks to efficiently manage and analyze large-scale spatio-temporal data:
- Efficient indexing schemes for large-scale heterogeneous geospatial raster data
- Built-in modules for aggregate and statistical analysis over space and time
- Mining for spatio-temporal regions that satisfy user-specified characteristics
- Efficient algorithms for clustering, discovery of association rules, and decision-tree induction
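One common way to index large raster data, shown here only as an illustration and not as the scheme used in this project, is Z-order (Morton) ordering: interleaving the bits of a tile's row and column gives a single key under which spatially nearby tiles tend to sit close together in storage. `morton_key` and `tile_order` are hypothetical helpers.

```python
from typing import List, Tuple


def morton_key(row: int, col: int, bits: int = 16) -> int:
    """Interleave the bits of (row, col) into a Z-order key.

    Column bits go to even bit positions and row bits to odd ones.
    """
    limit = 1 << bits
    if not (0 <= row < limit and 0 <= col < limit):
        raise ValueError("row and col must be in [0, 2**bits)")
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)
        key |= ((row >> i) & 1) << (2 * i + 1)
    return key


def tile_order(n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Return all tile coordinates of an n_rows x n_cols grid in Z-order."""
    tiles = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    return sorted(tiles, key=lambda rc: morton_key(rc[0], rc[1]))
```

For a 4 x 4 grid of tiles, the first four tiles in this order form the upper-left 2 x 2 quadrant, which is the locality property an index of this kind relies on.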

12 A Typical Class of Queries
Given a time series of geospatial data and a set of functions {f}, determine the regions and time intervals over which each function varies in a specified way.
Example: find regions with land cover type x that have an unusually warm and dry winter season, followed by a summer drought lasting d days, followed by a period of above-normal precipitation.
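The last two phases of the example query can be sketched per grid cell as a scan over a daily precipitation series. This is a simplified illustration under assumed definitions: "drought" means precipitation below a fixed threshold, "above normal" means above a fixed normal value, and the warm, dry winter condition is omitted (it would be one more predicate chained in the same way). All names are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def drought_then_wet(precip: List[float], d: int, dry_threshold: float,
                     normal: float, wet_days: int) -> List[Tuple[int, int]]:
    """Find dry runs of at least d days immediately followed by a wet period.

    A day is dry if precipitation < dry_threshold; the wet period must have
    wet_days consecutive days with precipitation > normal.
    Returns (start, end) index pairs of the dry runs, end exclusive.
    """
    matches = []
    i, n = 0, len(precip)
    while i < n:
        if precip[i] >= dry_threshold:
            i += 1
            continue
        start = i
        while i < n and precip[i] < dry_threshold:
            i += 1
        # i now points at the first non-dry day after the run
        wet = precip[i:i + wet_days]
        if i - start >= d and len(wet) == wet_days and all(p > normal for p in wet):
            matches.append((start, i))
    return matches


def query_cells(land_cover: Dict[Cell, int], series: Dict[Cell, List[float]],
                target_class: int, **criteria) -> Dict[Cell, List[Tuple[int, int]]]:
    """Apply drought_then_wet to every cell whose land cover class is target_class."""
    result = {}
    for cell, cls in land_cover.items():
        if cls != target_class:
            continue
        hits = drought_then_wet(series[cell], **criteria)
        if hits:
            result[cell] = hits
    return result
```

Filtering on land cover class before scanning means the per-cell time series are only read for cells that can possibly match.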

13 Preliminary Results
- Efficient data structures built around multidimensional arrays and R-trees:
  - Three-dimensional arrays that hold aggregate and statistical values of the attributes of interest (average, maximum, minimum, sum, standard deviation, etc.)
  - R-trees built around attributes, such that each node contains the rectangular regions whose indicators fall within that node
- Efficient high-performance algorithms to build these data structures
- Efficient algorithms to perform bulk updates
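The three-dimensional aggregate arrays can be sketched as follows, assuming the raw data is a (time, row, column) grid cut into fixed-size blocks. Each block keeps a small mergeable summary (count, sum, sum of squares, minimum, maximum) from which average and standard deviation follow, so a query over many blocks combines summaries instead of rescanning the data. The block layout and all names are illustrative assumptions; the R-tree structure is not shown.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Grid3D = List[List[List[float]]]  # raw data indexed [t][row][col]


@dataclass
class Stats:
    """Mergeable summary of one block: enough to derive sum, mean, min, max, and std."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, v: float) -> None:
        self.count += 1
        self.total += v
        self.total_sq += v * v
        self.minimum = min(self.minimum, v)
        self.maximum = max(self.maximum, v)

    def merge(self, other: "Stats") -> "Stats":
        return Stats(self.count + other.count, self.total + other.total,
                     self.total_sq + other.total_sq,
                     min(self.minimum, other.minimum), max(self.maximum, other.maximum))

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else math.nan

    @property
    def std(self) -> float:
        """Population standard deviation (sum-of-squares form, adequate for a sketch)."""
        if not self.count:
            return math.nan
        return math.sqrt(max(self.total_sq / self.count - self.mean ** 2, 0.0))


def _block_stats(data: Grid3D, block: Tuple[int, int, int], tb: int, rb: int, cb: int) -> Stats:
    """Recompute the Stats of one block directly from the raw data."""
    bt, br, bc = block
    s = Stats()
    for t in range(tb * bt, min((tb + 1) * bt, len(data))):
        for r in range(rb * br, min((rb + 1) * br, len(data[t]))):
            for c in range(cb * bc, min((cb + 1) * bc, len(data[t][r]))):
                s.add(data[t][r][c])
    return s


def build_aggregates(data: Grid3D, block: Tuple[int, int, int]) -> List[List[List[Stats]]]:
    """Build a 3-D array of per-block Stats for block size (bt, br, bc)."""
    bt, br, bc = block
    nt = -(-len(data) // bt)  # ceiling division
    nr = -(-len(data[0]) // br)
    nc = -(-len(data[0][0]) // bc)
    return [[[_block_stats(data, block, tb, rb, cb) for cb in range(nc)]
             for rb in range(nr)] for tb in range(nt)]


def summarize(agg: List[List[List[Stats]]], tb_range: range, rb_range: range,
              cb_range: range) -> Stats:
    """Combine precomputed block Stats over a box of block indices (no raw data read)."""
    s = Stats()
    for tb in tb_range:
        for rb in rb_range:
            for cb in cb_range:
                s = s.merge(agg[tb][rb][cb])
    return s


def bulk_update(data: Grid3D, agg: List[List[List[Stats]]], block: Tuple[int, int, int],
                updates: Dict[Tuple[int, int, int], float]) -> None:
    """Apply many cell updates, then recompute each affected block exactly once."""
    bt, br, bc = block
    touched = set()
    for (t, r, c), v in updates.items():
        data[t][r][c] = v
        touched.add((t // bt, r // br, c // bc))
    for tb, rb, cb in touched:
        agg[tb][rb][cb] = _block_stats(data, block, tb, rb, cb)
```

Keeping the sum of squares rather than the standard deviation is what makes block summaries mergeable. The bulk update recomputes touched blocks instead of patching them because minimum and maximum cannot be decremented incrementally; batching the updates ensures each block is rescanned only once.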

14 Emerging Trends and Planned Activities
- Persistent Distributed Data Archives: data, information, and knowledge management infrastructure (project in collaboration with NARA, led by SDSC)
- Computational Grid: widely distributed computational and storage resources that can be accessed as if they were all local (NPACI)
- Storage Area Networks (SAN): storage devices (tapes, disk arrays, NAS) connected to servers via Fibre Channel

