Focus Study: Mining on the Grid with ADaM Sara Graves Sandra Redman Information Technology and Systems Center and Information Technology Research Center.

Presentation transcript:

Focus Study: Mining on the Grid with ADaM
Sara Graves, Sandra Redman
Information Technology and Systems Center and Information Technology Research Center
University of Alabama in Huntsville
National Space Science and Technology Center

Data Mining
- Automated discovery of patterns, anomalies from vast observational data sets
- Derived knowledge for decision making, predictions and disaster response

Creating a Successful Environment for Data Mining
- Provide scientists with the capabilities to allow the flexibility of creative scientific analysis
- Provide data mining benefits of
  - Automation of the analysis process
  - Reducing data volume
- Provide a framework to allow a well defined structure to the entire process
- Provide a suite of mining algorithms for creative analysis that can adapt to new hypotheses
- Provide capabilities to add science algorithms to the environment
- Exploit emerging technologies in computational and data grids, high-performance networks, and collaborative environments

Challenges for Next-generation Mining
- Develop and document common/standard interfaces for interoperability of data and services
- Design new data models for handling real-time/streaming input data fusion/integration
- Design and develop distributed standardized catalog capabilities
- Develop advanced resource allocation and load balancing techniques
- Exploit the grid concept for enhanced data mining functionality
- Develop more intelligent and intuitive user interfaces
- Integrate with collaborative environments
- Develop ontologies of scientific data, processes and data mining techniques for multiple domains
- Support language and system independent components
- Incorporate data mining into science and engineering curricula

Algorithm Development and Mining System (ADaM) - System Overview
- Consists of over 100 interoperable mining and image processing components
- Each component is provided with a C++ application programming interface (API) and an executable in support of scripting tools (e.g. Perl, Python, Tcl, Shell)
- ADaM components are lightweight and autonomous, and have been used successfully in a grid environment (NASA IPG, TeraGrid, lab)
- ADaM has several translation components that provide data level interoperability with other mining systems (such as WEKA and Orange), and point tools (such as libSVM and svmLight)
- Web service interfaces in development
- Executes in multiple environments (e.g. workstation, cluster, grid, on-board, etc.)
- NMI Integration Testbed test cases

MEAD: Modeling Environment for Atmospheric Discovery
- One of the NSF PACI Alliance research Expeditions
- Expeditions ensure intense collaboration among technology developers and application scientists and focus on the deployment of infrastructure that supports computational science and engineering and science in a variety of disciplines
- MEAD’s focus is on retrospective analysis of hurricanes and severe storms using the TeraGrid, integrating computation, grid workflow management, data management, model coupling, data analysis/mining, and visualization

MEAD Mining Example: Mesocyclone Detection Algorithm
- Science Objective:
  - To investigate different thunderstorm cell interactions favorable for subsequent tornado (mesocyclone) formation
- Goals:
  - Develop a mesocyclone detection algorithm (in both 2D and 3D)
  - Develop an algorithm to track the temporal evolution of the mesocyclone features
  - Investigate the use of clustering techniques to:
    - Summarize differences in simulation runs
    - Provide an overview of all the simulations

Approach
- Mining Approach
  - Use idealized WRF model simulations with different initial conditions
  - Create a large parameter space of thunderstorm cell interaction and storm behavior
  - Mine this search space for patterns and trends
- Grid Approach
  - Application scripts developed in Python and tested on Linux; modified for the Globus environment by writing a simple Globus RSL file
  - Application scripts constructed to run each combination of tools in parallel on a different node on the grid
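The grid approach on this slide can be illustrated with a short Python sketch. It is not the authors' script: the tool names, node names, the `/home/user/run_tool.py` path, and the helpers `build_rsl` and `plan_jobs` are hypothetical, chosen only for illustration. The RSL attributes used (`executable`, `arguments`, `count`, `stdout`) are standard Globus GRAM attributes. The sketch builds one RSL string per tool combination and pairs it with a node, cycling through the node list if there are more combinations than nodes.

```python
from itertools import product


def build_rsl(executable, arguments, stdout):
    """Return a simple Globus RSL string for one single-process job."""
    args = " ".join('"{}"'.format(a) for a in arguments)
    return "&(executable={})(arguments={})(count=1)(stdout={})".format(
        executable, args, stdout)


def plan_jobs(detectors, trackers, nodes):
    """Pair every tool combination with a node (hypothetical scheduling rule)."""
    jobs = []
    for i, (det, trk) in enumerate(product(detectors, trackers)):
        node = nodes[i % len(nodes)]  # cycle through the available nodes
        rsl = build_rsl("/home/user/run_tool.py", [det, trk],
                        "out_{}.txt".format(i))
        jobs.append((node, rsl))
    return jobs


if __name__ == "__main__":
    # Hypothetical tool and node names, for illustration only.
    for node, rsl in plan_jobs(["detect2d", "detect3d"], ["track"],
                               ["nodeA", "nodeB"]):
        print(node, rsl)
```

Each resulting string could be written to an RSL file and submitted to the corresponding node; the submission step itself is left out because it depends on the local Globus installation.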

Example MEAD Workflow
[Workflow diagram. Stages: Initial Setup, Model Execution, Post Run Analysis. Elements: Initial Data and Parameters; Multiple WRF Models (Weather); Multiple ROMS Models (Ocean); Inter-model communications; Model Results; Data Mining (ADaM); Visualization]
- Grid environment supports the demanding computational, data storage and post analysis requirements

Using the TeraGrid
- Excellent user documentation at
- Account Management - Procedures vary per site
  - Get account at each site
  - Obtain certificate (from one of several sites, X.509 or KX.509)
  - Establish Distinguished Name in grid-mapfile at each site
  - Create certificate proxy (grid-proxy-init, MyProxy, kinit)
- Programming Environment - Know your systems
  - Compilers (you have a number of choices)
  - Environment Variables (SoftEnv)
  - Message Passing (several flavors available)
- Executing Jobs
  - Condor-G
  - Globus

WRF Initializations
- 230 WRF runs were made, + two control (single-cell)
- Each corresponded to a particular arrangement of a pair of initial storm cells
- In figure at left:
  - Each square: 1 simulation
  - 1st storm in the middle; 2nd at one of blue squares
  - Center cell stronger
[Figure: Matrix of WRF simulations]
Slide Source: Brian Jewett

Example: Tracking Results

Mesocyclone Detection and Tracking Results
- Features with time durations of a single time step are filtered out
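The filtering rule stated on this slide can be sketched as follows. This is a minimal illustration, not ADaM code: the data layout (a dictionary mapping a track identifier to the list of time steps at which the feature was detected) and the function name `filter_short_tracks` are assumptions.

```python
def filter_short_tracks(tracks, min_steps=2):
    """Keep only tracks seen at min_steps or more distinct time steps."""
    return {track_id: steps for track_id, steps in tracks.items()
            if len(set(steps)) >= min_steps}


if __name__ == "__main__":
    # Hypothetical tracks: identifier -> time steps where the feature appears.
    tracks = {"m1": [3], "m2": [3, 4, 5], "m3": [7, 8]}
    print(filter_short_tracks(tracks))
```

With the default `min_steps=2`, a feature that appears at only one time step (such as `m1` above) is dropped, while features that persist across steps are kept.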

Summary - Mesocyclone Detection
- Mesocyclones with longer durations tend to be associated with initializations where the second cell is closer to the first
- Mesocyclones found in the storm simulations are sensitive to the particular arrangement of a pair of initial storm cells (secondary storm placement at 45 degrees to the primary storm)
- Clustering techniques are useful
  - Summarize differences in simulation runs
  - Provide an overview of all the simulations
- Limitations of clustering algorithms
  - Investigated K-Means, DBSCAN, Maximin and Hierarchical clustering algorithms
  - K-Means clustering quality is inferior but provides useful cluster centers or profiles
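To show what "cluster centers or profiles" means in the K-Means case, here is a small pure-Python sketch of the standard algorithm. It is not the ADaM implementation. The per-simulation feature vectors (for example, a mesocyclone count and a mean duration) are invented for illustration, and seeding the centers with the first k points is a simplification made to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of floats, seeded with the first k points."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return centers, labels


if __name__ == "__main__":
    # Hypothetical features per simulation run: (count, mean duration).
    runs = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (9.0, 10.0)]
    centers, labels = kmeans(runs, k=2)
    print(centers, labels)
```

Each returned center is the mean feature vector of the runs assigned to it, which is what makes it usable as a summary profile for a group of simulations even when the partition itself is imperfect.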

LEAD: Linked Environments for Atmospheric Discovery
- A cyberinfrastructure for mesoscale meteorology
  - Real-time, on-demand, and dynamically adaptive needs for mesoscale weather research
  - High volume data sets and streams
  - Computationally demanding numerical models and data assimilation systems

LEAD
- NSF Information Technology Research (ITR) program
- Multi-disciplinary team contributing expertise in meteorological applications, analysis tools, forecast tools, data distribution and management, portal development, workflow orchestration, education and outreach

LEAD
- An integrated framework for identifying, accessing, preparing, assimilating, predicting, managing, analyzing, mining, and visualizing meteorological data, independent of format and physical location
- Dynamic workflow orchestration and data management are key elements

LEAD GWSTBs: Grid and Web Services Testbeds
- Local User Environment: customized portal, control of information flows, collaboration tools, managing processes
- Productivity Environment: models, tools, and algorithms
- Data Services Environment: data transport, data formatting, and interoperability
- Distributed Technologies Environment: workflow infrastructure to autonomously acquire resources and adapt to changing plans
- Data Archive: recent and historical data, products, and tools

The Portal as a Grid Access Point
- The Portal Server provides the user's Grid Context.
[Architecture diagram. Labels: OGCE or GridSphere Grid Portal Server; https; SOAP & WS-Security; Security; Data Management Service; Accounting Service; Logging; Event Service; Policy; Administration & Monitoring; Grid Orchestration; Registries and Name binding; Reservations and Scheduling; Open Grid Service Architecture Layer; Web Services Resource Framework - Web Services Notification; Physical Resource Layer]

Services Oriented Architecture
- User interfaces with portal via browser
- Portal provides tools for users to build and launch workflows
- Portlets (JSR-168) provide interface between user and grid services
- Applications can be wrapped as services via a Portal Factory Service Generator
  - Requires application, script to run it, input parameters, output parameters
  - Write an AppService document and upload to Portal Factory Service Generator (in portal)
  - Service is created as well as the portal client interface
- Security model integral to design

Data Integration and Mining: From Global Information to Local Knowledge
- Precision Agriculture
- Emergency Response
- Weather Prediction
- Urban Environments
- Bioinformatics