
1 University of Alabama in Huntsville
NMI Testing and Experiences
Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
National Space Science and Technology Center
256-961-7806
sredman@itsc.uah.edu
Sandra.Redman@msfc.nasa.gov
www.itsc.uah.edu

2 Improving Data Usability
Advanced Applications Development
– Data organization and management for archival and analysis
– Data mining in real time and for post-run analysis
– Interchange technologies for improved data exploitation
– Semantics to transform data exploitation via intelligent automated processing
Exploiting Technology
– Grid technologies for seamless access to multiple computational and data resources in a virtual computing environment
– Cluster technologies for high-speed parallel computation, multiple-agent computations, and other applications
– High-performance networking for advanced applications development and high-performance connectivity
– Next-generation technologies in videoconferencing and electronic collaboration

3 Exploiting Technology to Improve Data Usability
[Figure: timeline of increasing capability from now into the future, spanning real-time data fusion and information delivery, custom order processing, customized knowledge delivery, adaptive/learning grid processing, data mining, knowledge discovery, on-board mining, visual navigation aids, Earth Science Markup Language (ESML), distributed immersive collaborative environments, and 3D/4D distributed dynamic data fusion.]

4 Data Usability Success Builds on the Integration of User Domains and Information Technology
[Diagram: domain scientists and engineers (research and analysis, data set development) collaborate with information technology scientists (information science research, knowledge management, data exploitation) to accelerate the research process, maximize knowledge discovery, minimize data handling, and contribute to both fields.]

5 Data Mining
– Automated discovery of patterns and anomalies from vast observational data sets
– Derived knowledge for decision making, predictions, and disaster response
http://datamining.itsc.uah.edu

6 Mining Environment: When, Where, Who, and Why?
WHEN: Real time, on ingest, on demand, repeatedly
WHERE: User workstation, data mining center, cluster, grid, on board
WHO: End users, domain experts, mining experts
WHY: Event, relationship, association, corroboration, collaboration

7 Creating a Successful Environment for Data Mining
– Provide scientists with capabilities that allow the flexibility of creative scientific analysis
– Provide the data mining benefits of automating the analysis process and reducing data volume
– Provide a framework that gives a well-defined structure to the entire process
– Provide a suite of mining algorithms for creative analysis that can adapt to new hypotheses
– Provide capabilities to add "science algorithms" to the environment
– Exploit emerging technologies in computational and data grids, high-performance networks, and collaborative environments

8 Algorithm Development and Mining System (ADaM) – System Overview
– Consists of over 100 interoperable mining and image processing components
– Each component provides a C++ application programming interface (API) and an executable that supports scripting tools (e.g., Perl, Python, Tcl, shell); a scripting sketch follows this slide
– ADaM components are lightweight and autonomous and have been used successfully in a grid environment
– Several translation components provide data-level interoperability with other mining systems (such as WEKA and Orange) and point tools (such as libSVM and SVMlight)
– Components include Python wrappers and web service interfaces
– Results can be visualized easily with various visualization packages
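To make the scripting interface concrete, a pipeline of ADaM-style component executables might be driven from Python roughly as below. This is only a sketch: the executable names (adam_import, adam_texture_features, adam_classify), their command-line flags, and the file names are hypothetical and are not taken from the actual ADaM distribution.

```python
# Hypothetical sketch of chaining ADaM-style component executables from Python.
# The component names, flags, and file formats below are illustrative only and
# do not correspond to the real ADaM command-line interfaces.
import subprocess

def run(cmd):
    """Run one component executable and fail loudly if it returns nonzero."""
    print("running:", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Convert an input granule to a (hypothetical) internal format.
run(["adam_import", "--input", "goes_granule.hdf", "--output", "scene.bin"])

# 2. Compute texture features from the imported scene.
run(["adam_texture_features", "--input", "scene.bin", "--output", "features.bin"])

# 3. Classify each pixel with a previously trained model.
run(["adam_classify", "--model", "cumulus_model.bin",
     "--input", "features.bin", "--output", "cloud_mask.bin"])
```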

9 ADaM Components
Pattern Recognition
– Classification techniques
– Clustering techniques
– Feature selection / reduction techniques
– Pattern recognition utilities
– Association rules
– Optimization techniques
Image Processing
– Basic image operations
– Segmentation / edge detection
– Filtering
– Texture features

10 Current Mining Environments
Multiple configurations
– Complete system (client and engine)
– Mining engine (user provides own client)
– Application-specific mining systems
– Operations toolkit
– Stand-alone mining algorithms
Distributed/federated/grid mining
– Distributed services
– Distributed data
– Chaining using interchange technologies
On-board mining
– Real-time and distributed mining
– Processing environment constraints
– Space-based/ground-based/unmanned

11 ADaM Feature Subset Selection Application Chosen for Testing
– Supervised pattern classification is an important technique in many domains
– Feature subset selection is the process of choosing a subset of the features from the original data set in order to maximize classifier accuracy
– It improves both the runtime and the accuracy of a supervised pattern classifier by eliminating noisy, irrelevant, or redundant attributes (features) from the data set
– Feature subset selection is both processor- and data-intensive (a wrapper-style sketch follows this slide)
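A minimal, self-contained sketch of the general wrapper approach to feature subset selection is shown below. It is not the ADaM implementation: the nearest-centroid classifier, the greedy forward search, and the synthetic data are simplifications chosen only to illustrate the idea of selecting the feature subset that maximizes validation accuracy.

```python
# Wrapper-style forward feature subset selection (illustrative sketch only).
import numpy as np

def nearest_centroid_accuracy(Xtr, ytr, Xva, yva):
    """Train a nearest-centroid classifier on Xtr and score it on Xva."""
    classes = np.unique(ytr)
    centroids = np.array([Xtr[ytr == c].mean(axis=0) for c in classes])
    dists = ((Xva[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    preds = classes[np.argmin(dists, axis=1)]
    return (preds == yva).mean()

def forward_subset_selection(Xtr, ytr, Xva, yva, max_features=5):
    """Greedily add the feature that most improves validation accuracy."""
    selected, best_acc = [], 0.0
    remaining = list(range(Xtr.shape[1]))
    while remaining and len(selected) < max_features:
        scored = [(nearest_centroid_accuracy(Xtr[:, selected + [f]], ytr,
                                             Xva[:, selected + [f]], yva), f)
                  for f in remaining]
        acc, f = max(scored)
        if acc <= best_acc:            # stop when no remaining feature helps
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return selected, best_acc

# Synthetic demonstration data: only the first two features carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
subset, acc = forward_subset_selection(X[:300], y[:300], X[300:], y[300:])
print("selected features:", subset, "validation accuracy:", round(acc, 3))
```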

12 Parallel Version of Cloud Extraction
[Diagram: a GOES image is passed through a Laplacian filter, a Sobel horizontal filter, and a Sobel vertical filter in parallel; an energy computation follows each branch, and a classifier combines the results into a cumulus cloud mask.]
– GOES images are used to recognize cumulus cloud fields
– Cumulus clouds are small and do not show up well in 4 km resolution IR channels
– Detection of cumulus cloud fields in GOES imagery can be accomplished using texture features or edge detectors
– Three edge detection filters are used together to detect cumulus clouds, which lends itself to implementation on a parallel cluster (a parallel sketch follows this slide)
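The sketch below illustrates how the three edge-filter branches could be run in parallel from Python. The filters follow the slide (Laplacian, Sobel horizontal, Sobel vertical), but the energy window size, the final thresholding "classifier", and the random stand-in for a GOES image are assumptions made for illustration, not the operational cloud-mask algorithm.

```python
# Parallel three-filter cloud-extraction sketch (illustrative only).
from multiprocessing import Pool
import numpy as np
from scipy import ndimage

def branch_energy(args):
    """Apply one edge filter to the image and return its local energy."""
    image, filter_name = args
    if filter_name == "laplacian":
        response = ndimage.laplace(image)
    elif filter_name == "sobel_horizontal":
        response = ndimage.sobel(image, axis=0)
    else:  # "sobel_vertical"
        response = ndimage.sobel(image, axis=1)
    # Local energy: mean of the squared response over a small window.
    return ndimage.uniform_filter(response ** 2, size=5)

if __name__ == "__main__":
    # Stand-in for a GOES IR channel; a real run would read the satellite granule.
    goes_image = np.random.rand(512, 512).astype(np.float32)

    filters = ["laplacian", "sobel_horizontal", "sobel_vertical"]
    with Pool(processes=3) as pool:
        energies = pool.map(branch_energy, [(goes_image, f) for f in filters])

    # Toy "classifier": flag pixels whose combined edge energy is unusually high.
    combined = np.mean(energies, axis=0)
    cloud_mask = combined > combined.mean() + 2 * combined.std()
    print("cumulus-flagged pixels:", int(cloud_mask.sum()))
```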

13 Feature Subset Selection Testing
– Application ported to Linux
– Support vector machine downloaded and tested
– Developed application scripts
– Modified for the Globus environment by writing a simple Globus RSL file (a sketch follows this slide)
– Ran each combination of tools on a different node on the grid; Globus used to execute jobs on different machines
– Experimented with both real and synthetic data
[Diagram: grid mining agents running at multiple sites, drawing from satellite data archives X and Y.]
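For reference, a simple pre-WS GRAM RSL job description of the kind mentioned above looks roughly like the following; the executable path, arguments, and output file names are hypothetical rather than the actual test scripts. Such a file would typically be submitted with globusrun against the gatekeeper on the chosen grid node.

```
& (executable = /home/adam/scripts/run_fss.sh)
  (arguments = "training_set.dat" "svm")
  (directory = /home/adam/scripts)
  (stdout = fss_run.out)
  (stderr = fss_run.err)
  (count = 1)
```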

14 Early Findings (NMI R2)
– Globus documentation improved, installation trouble-free, application port straightforward
– No problems encountered during Condor-G installation, but found a problem with Condor-G under Red Hat Linux 7.3 when using nss_ldap; the developer provided a workaround: start the name service caching daemon (nscd)
– GSI-OpenSSH installed, but Kerberos authentication did not work because Linux was not compiled with the PAM option (undocumented)
– Network Weather Service installed, but learned we are more interested in MDS

15 MEAD: Modeling Environment for Atmospheric Discovery
– One of the NSF PACI Alliance research Expeditions
– Expeditions ensure intense collaboration among technology developers and application scientists and focus on deploying infrastructure that supports computational science and engineering across a variety of disciplines
– MEAD's focus is retrospective analysis of hurricanes and severe storms using the TeraGrid, integrating computation, grid workflow management, data management, model coupling, data analysis/mining, and visualization

16 MEAD
Science objective:
– Investigate different thunderstorm cell interactions favorable for subsequent tornado (mesocyclone) formation
Approach:
– Use idealized WRF model simulations with different initial conditions
– Create a large parameter space of thunderstorm cell interaction and storm behavior
– Mine this search space for patterns and trends

17 WRF Initializations
– 230 WRF runs were made, plus two control (single-cell) runs
– Each run corresponded to a particular arrangement of a pair of initial storm cells
[Figure: matrix of WRF simulations; each square represents one simulation, with the first (stronger) storm cell in the middle and the second cell placed at one of the surrounding squares.]
Slide source: Brian Jewett

18 Goals of this Mining Study
– Develop a mesocyclone detection algorithm, in both 2D and 3D (a 2D sketch follows this slide)
– Develop an algorithm to track the temporal evolution of the mesocyclone features
– Investigate the use of clustering techniques to:
  – Summarize differences in simulation runs
  – Provide an overview of all the simulations
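As a rough illustration of the 2D detection goal, the sketch below thresholds a vertical-vorticity field and keeps contiguous regions above a minimum size. It is not the algorithm developed in this study; the threshold, the minimum area, and the synthetic field are placeholders.

```python
# 2D mesocyclone-candidate detection by vorticity thresholding (sketch only).
import numpy as np
from scipy import ndimage

def detect_mesocyclones_2d(vorticity, threshold=0.01, min_cells=20):
    """Return a label image and centroids of candidate mesocyclone regions."""
    candidate = vorticity > threshold                 # strong cyclonic rotation
    labels, n = ndimage.label(candidate)              # connected components
    keep = []
    for region in range(1, n + 1):
        area = np.count_nonzero(labels == region)
        if area >= min_cells:                         # ignore tiny, noisy blobs
            keep.append(region)
    centroids = ndimage.center_of_mass(candidate, labels, keep) if keep else []
    return labels, list(zip(keep, centroids))

# Synthetic demonstration field: background noise plus one rotating region.
rng = np.random.default_rng(1)
vort = rng.normal(0.0, 0.002, size=(200, 200))
vort[80:110, 90:120] += 0.02
labels, detections = detect_mesocyclones_2d(vort)
for region_id, (row, col) in detections:
    print(f"mesocyclone candidate {region_id} near grid point ({row:.0f}, {col:.0f})")
```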

19 Example: Tracking Results

20 Mesocyclone Detection and Tracking Results
– Features with time durations of a single time step are filtered out (a filtering sketch follows this slide)
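The duration filter mentioned above amounts to discarding any tracked feature that appears at only one time step; a toy version, with an assumed track data structure (feature id mapped to the time steps where it was detected), is shown below.

```python
# Duration filter for tracked features (illustrative data structure).
tracks = {
    "meso_01": [0, 1, 2, 3],   # persists for four time steps -> kept
    "meso_02": [5],            # single time step -> filtered out
    "meso_03": [7, 8],         # two time steps -> kept
}

persistent = {fid: steps for fid, steps in tracks.items() if len(steps) > 1}
print(sorted(persistent))      # ['meso_01', 'meso_03']
```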

21 Summary – Mesocyclone Detection
– Mesocyclones with longer durations tend to be associated with initializations in which the second cell is closer to the first
– The mesocyclones found in the storm simulations are sensitive to the particular arrangement of the pair of initial storm cells (e.g., secondary storm placement at 45 degrees to the primary storm)
– Clustering techniques are useful for summarizing differences between simulation runs and for providing an overview of all the simulations (a clustering sketch follows this slide)
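To make the clustering idea concrete, the sketch below groups per-run summary vectors with a small k-means written in NumPy. The choice of features (mesocyclone count, mean duration, peak vorticity) and the synthetic values are invented for illustration; a real analysis would compute them from the WRF runs.

```python
# Clustering simulation runs by summary features with a tiny k-means (sketch).
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Return cluster assignments and centroids for the rows of X."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):
                centroids[c] = X[assign == c].mean(axis=0)
    return assign, centroids

# One row per simulation run: [mesocyclone count, mean duration, peak vorticity].
rng = np.random.default_rng(2)
runs = np.vstack([
    rng.normal([2, 3.0, 0.015], [0.5, 0.5, 0.002], size=(15, 3)),  # weak regime
    rng.normal([5, 8.0, 0.030], [0.5, 1.0, 0.003], size=(15, 3)),  # strong regime
])
assign, centroids = kmeans(runs, k=2)
print("runs per cluster:", np.bincount(assign))
print("cluster centroids:\n", centroids.round(3))
```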

22 Some Lessons Learned
– NMI Testbed process working well: answers found through NMI discussion lists, from developers, and from other users
– Have to "sell" the grid concept to developers, administrators, and users
– NMI work has proven helpful in other grid work:
  – TeraGrid
  – LEAD (Linked Environments for Atmospheric Discovery)
  – SpaceDoG (Space Development and Operations Grid)
  – CEOS (Committee on Earth Observation Satellites)
– More components needed

23 Linked Environments for Atmospheric Discovery (LEAD)
– NSF Information Technology Research program
– Creating a cyberinfrastructure for mesoscale meteorology:
  – Real-time, on-demand, and dynamically adaptive needs of mesoscale weather research
  – High-volume data sets and streams
  – Computationally demanding numerical models and data assimilation systems

24 The LEAD Goal
To create an integrated, scalable framework in which analysis tools, forecast models, and data repositories can be used as dynamically adaptive, on-demand systems that can:
– operate independently of data formats and the physical location of data or computing resources;
– change configuration rapidly and automatically in response to the weather;
– continually be steered by new data (i.e., the weather);
– respond to decision-driven inputs from users;
– initiate other processes automatically; and
– steer remote observing technologies to optimize data collection for the problem at hand

25 The LEAD Vision: Dynamic, Adaptive, Multi-Scale
[Diagram: users interact through the MyLEAD portal with tools such as ADaM and ADAS; inputs include NWS national static observations and grids, experimental dynamic observations, and local observations of mesoscale weather; the system draws on local physical resources, remote physical (grid) resources, and virtual/digital resources and services.]

26 LEAD
An integrated framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing meteorological data, independent of format and physical location

27 Challenges for Next-Generation Mining
– Develop and document common/standard interfaces for interoperability of data and services
– Design new data models for handling real-time/streaming input data fusion/integration
– Design and develop distributed standardized catalog capabilities
– Develop advanced resource allocation and load balancing techniques
– Exploit the grid concept for enhanced data mining functionality
– Develop more intelligent and intuitive user interfaces
– Integrate with collaborative environments
– Develop ontologies of scientific data, processes, and data mining techniques for multiple domains
– Support language- and system-independent components
– Incorporate data mining into science and engineering curricula

28 LEAD GWSTBs: Grid and Web Services Testbeds
– Local User Environment: customized portal, control of information flows, collaboration tools, managing processes
– Productivity Environment: models, tools, and algorithms
– Data Services Environment: data transport, data formatting, and interoperability
– Distributed Technologies Environment: workflow infrastructure to autonomously acquire resources and adapt to changing plans
– Data Archive: recent and historical data, products, and tools

29 LEAD Education Testbeds
– Provide hands-on access to assess the effectiveness of LEAD technologies for education
– Provide input and feedback to LEAD developers
– Facilitate knowledge transfer
– Collaborative technologies

30 LEAD Policy Development and Implementation
Define virtual organizations
– LEAD is designed for use principally by the meteorological higher education and operations research communities
Develop LEAD policies
– Develop LEAD global policies
– Adhere to the local policies of each site (security, resource utilization, etc.)
Policy management services
– PKI cryptography, X.509 certificates
– Authorization service
– Resource utilization monitoring and accounting services

31 Other Considerations
Emerging standards and middleware
– Application development can concentrate on the application, using NMI middleware (Globus, MyProxy, OGCE, etc.) for grid infrastructure, plus additional middleware (MCS, RSL, performance monitoring tools)
– Current software has dependencies on middleware versions
Configuration management
– A distributed team is developing and delivering software to multiple testbeds
– The goal is to allow heterogeneous host environments
Collaborative technologies
– Access Grid and H.323 videoconferencing facilitate LEAD team project planning and work sessions
– Collaborative technologies will be integrated into the testbeds for user education and research

32 Data Integration and Mining: From Global Information to Local Knowledge
– Precision agriculture
– Emergency response
– Weather prediction
– Urban environments
– Bioinformatics

