Research Design for Collaborative Computational Approaches and Scientific Workflows Deana Pennington January 8, 2007.



Informatics and the Research Cycle
Research cycle stages: Mental Model, Research Design, Collect Data, Conduct Analyses, Publish
Approaches to analysis:
–Inductive, Descriptive, Statistics
–Deductive, Prescriptive, Mechanistic
Informatics approaches:
–Data-intensive: data mining, bio-inspired algorithms, Exp. data analysis, visualization
–Compute-intensive: parallel processing, high throughput
–Knowledge-intensive: human cognition, ontologies, semantic mediation
Scientific Workflow System:
–Automation => replication
–Access to distributed resources
–Reusability & sharing
–Empowered by knowledge-intensive approaches***
Data Management:
–Data models
–Metadata
–Storage
Cyberinfrastructure: sharing data, analyses, mental models

Scientific Workflows
Scientists do their analyses now by:
–Focusing on data collection and the analytical steps
–Manually coordinating the export and import of data among software systems
Workflow systems collaborate with the scientist to:
–Discover existing data
–Handle data flow between components
–Document the analytical process
Diagram labels: Query EcoGrid to find data; Archive output to EcoGrid with workflow metadata
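The two roles "handle data flow between components" and "document the analytical process" can be illustrated with a minimal sketch. This is not how Kepler or EcoGrid work internally; `run_workflow` and the three toy steps are hypothetical names invented for illustration.

```python
from typing import Any, Callable, List, Tuple

Step = Tuple[str, Callable[[Any], Any]]  # (step name, function)

def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[str]]:
    """Run steps in order, passing each output to the next step.

    Returns the final output plus a log entry for every step that ran,
    standing in for automatic documentation of the analysis.
    """
    log: List[str] = []
    for name, func in steps:
        data = func(data)  # data flow is handled here, not by hand
        log.append(f"{name} -> {type(data).__name__}")
    return data, log

# Hypothetical steps: parse exported text, drop missing values, summarize.
steps: List[Step] = [
    ("parse", lambda text: [s.strip() for s in text.split(",")]),
    ("clean", lambda vals: [float(v) for v in vals if v != "NA"]),
    ("mean", lambda nums: sum(nums) / len(nums)),
]

result, log = run_workflow(steps, "1.0, NA, 3.0")
```

The scientist supplies only the steps; the runner moves data between them and records what was done, which is the manual export/import work the slide describes.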

–Not linear
–Involve multiple data sets
–Involve multiple analytical steps

Automated Workflows
Scripts: single platform
Visual modeling environment: single environment
Workflows:
–Cross-platform
–Cross-environment
–Distributed data & analyses

Productivity Example
Mental model: Biomass = f(Temp, Soil, et al.)
Concept: Climate (Temp), Soil, Biomass
Conceptual workflow: Merge, Model, Predict
Abstract workflow: a chain of typed steps
–DS: Data Step
–TS: Transformation Step
–AS: Analysis Step
–Dissemination
Executable workflow:
–“View1”: Excel, GIS, SAS, GIS
–“View2”: VBScript, R Script, GA, R
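As a rough illustration of the Merge, Model, Predict conceptual workflow, the sketch below joins three tables on site ID, fits biomass as a linear function of temperature and soil, and predicts for new conditions. The linear form, the function names, and the toy numbers are all assumptions made for illustration; the slide does not say which model f is.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float, float]  # (temp, soil, biomass)

def merge(temp: Dict[str, float], soil: Dict[str, float],
          biomass: Dict[str, float]) -> List[Row]:
    """Merge step: join the three tables on shared site IDs."""
    sites = sorted(set(temp) & set(soil) & set(biomass))
    return [(temp[s], soil[s], biomass[s]) for s in sites]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def model(rows: List[Row]) -> List[float]:
    """Model step: least-squares fit of biomass = b0 + b1*temp + b2*soil."""
    xs = [[1.0, t, s] for t, s, _ in rows]
    ys = [y for _, _, y in rows]
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(3)] for i in range(3)]
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(3)]
    return solve(xtx, xty)

def predict(coef: List[float], temp: float, soil: float) -> float:
    """Predict step: apply the fitted coefficients to new conditions."""
    return coef[0] + coef[1] * temp + coef[2] * soil

# Toy data (invented): site "e" has no temperature record, so merge drops it.
temp = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}
soil = {"a": 0.1, "b": 0.5, "c": 0.2, "d": 0.4, "e": 0.9}
biomass = {"a": 8.0, "b": 17.0, "c": 19.0, "d": 26.0}

rows = merge(temp, soil, biomass)
coef = model(rows)
estimate = predict(coef, 25.0, 0.3)
```

Each function corresponds to one box of the conceptual workflow, so swapping the model (for example, a different f) does not disturb the merge or predict steps.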

–Scientists design their research at the conceptual workflow level.
–This is often done on the fly over the period of time the research is being conducted.
–For automated approaches, the design must be well thought out from the beginning.
–However, because of the automation it is easy to modify the analysis and rerun it many times, so you are not locked into the original design.

Benefits
–Reusable analysis steps, pipelines, and workflows
–Formal documentation of methods (output in report format)
–Reproducibility of methods
–Visual creation and communication of methods
–Versioning
–Automated data typing and transformation

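Nested workflows
[Diagram: a top-level workflow (SW 0) containing an Image Processing Pipeline and a Signal Processing Pipeline, each built from analysis steps (AS x, AS y, AS z, AS r) and transformation steps (TS 1, TS 2).]
–Search for relevant data and analyses (Query)
–Inputs: Field Data, Ground Sensors, Imagery
–Semantically-integrated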
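Nesting can be sketched by making a pipeline look like a single step, so that it can be placed inside a larger workflow. The `pipeline` helper and the three toy steps below are hypothetical; they only borrow the "Signal Processing Pipeline" name from the diagram.

```python
from typing import Callable, List

Step = Callable[[List[float]], List[float]]

def pipeline(*steps: Step) -> Step:
    """Compose steps into one callable, so a pipeline can itself be a step."""
    def run(values: List[float]) -> List[float]:
        for step in steps:
            values = step(values)
        return values
    return run

# Hypothetical analysis and transformation steps.
def remove_mean(values: List[float]) -> List[float]:
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def clip_negative(values: List[float]) -> List[float]:
    return [max(0.0, v) for v in values]

def total(values: List[float]) -> List[float]:
    return [sum(values)]

# Inner pipeline, reused as a single step of the outer workflow.
signal_pipeline = pipeline(remove_mean, clip_negative)
outer_workflow = pipeline(signal_pipeline, total)

output = outer_workflow([1.0, 2.0, 6.0])
```

Because the inner pipeline has the same signature as any other step, it can be shared and reused across workflows without the outer workflow knowing what it contains.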

Ecological niche modeling conceptual workflow
[Diagram: workflow from EcoGrid data to archived prediction maps.]
–Inputs: species presence & absence points; environmental layers (each retrieved by EcoGrid Query from an EcoGrid database)
–Steps: Layer Integration (Transformation, Scaling), Sample Data (training sample, test sample), Data Calculation (GARP rule set), Map Generation, Validation, Generate Metadata, Archive to EcoGrid
–Intermediate products: integrated layers, GARP rule set, model quality parameters
–Outputs: native range prediction map; prediction maps selected by the user
–Annotations: + A1, + A2, + A3

Ecological niche modeling conceptual workflow (continued)
[Diagram: the same workflow, with two added annotations.]
–Spatial location
–Temporal extent

Generic Workflow
[Diagram: the niche modeling workflow with generalized labels.]
–Inputs: occurrence data (binary, categorical or numeric); environmental layers (each retrieved by EcoGrid Query from an EcoGrid database)
–Steps: Layer Integration (Physical Transformation, Scaling), Sample Data (training sample, test sample), Data Calculation (GARP rule set), Map Generation, Validation, Generate Metadata, Archive to EcoGrid
–Intermediate products: integrated layers, GARP rule set, model quality parameters
–Outputs: prediction map; prediction maps selected by the user

Temperature Interpolation Workflow
[Diagram: the generic workflow applied to temperature interpolation.]
–Occurrence data: weather station temperature data
–Environmental layers: elevation, aspect, land cover
–Prediction map: interpolated temperature grid
–Steps and intermediate products as in the Generic Workflow

Temperature Interpolation Workflow
[Diagram: the generic workflow applied to sinkhole occurrence.]
–Occurrence data: sinkhole occurrence
–Environmental layers: groundwater level, chemistry, etc.
–Prediction map: sinkhole distribution
–Steps and intermediate products as in the Generic Workflow
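The reuse shown across these slides (one workflow template, different occurrence data and layers) can be sketched as a function that takes the model-fitting and validation steps as parameters. This is an illustrative stand-in only: it does not implement GARP, and the function names, the alternating train/test split, and the toy records are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Record = Tuple[float, float]  # (environmental layer value, observed occurrence)
Predictor = Callable[[float], float]

def run_generic_workflow(records: Sequence[Record],
                         fit: Callable[[Sequence[Record]], Predictor],
                         score: Callable[[List[Record]], float]) -> float:
    """Sample data, fit on the training sample, validate on the test sample."""
    train, test = records[0::2], records[1::2]  # simple alternating split
    predictor = fit(train)
    return score([(predictor(x), y) for x, y in test])

def fit_line(train: Sequence[Record]) -> Predictor:
    """Numeric occurrence data: one-variable least-squares line."""
    mx = sum(x for x, _ in train) / len(train)
    my = sum(y for _, y in train) / len(train)
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    return lambda x: my + slope * (x - mx)

def fit_threshold(train: Sequence[Record]) -> Predictor:
    """Binary occurrence data: cut at the midpoint of the two class means."""
    pres = [x for x, y in train if y == 1]
    absn = [x for x, y in train if y == 0]
    mp, ma = sum(pres) / len(pres), sum(absn) / len(absn)
    cut = (mp + ma) / 2
    return lambda x: float((x > cut) == (mp > ma))

def mean_abs_error(pairs: List[Record]) -> float:
    return sum(abs(p - y) for p, y in pairs) / len(pairs)

def accuracy(pairs: List[Record]) -> float:
    return sum(p == y for p, y in pairs) / len(pairs)

# Toy records (invented): (elevation in m, station temperature in C).
stations = [(0.0, 20.0), (500.0, 17.0), (1000.0, 14.0),
            (1500.0, 11.0), (2000.0, 8.0), (2500.0, 5.0)]
# Toy records (invented): (groundwater measure, sinkhole present 1 / absent 0).
sinkholes = [(1.0, 0), (2.0, 0), (9.0, 1), (8.0, 1), (3.0, 0), (7.0, 1)]

temp_error = run_generic_workflow(stations, fit_line, mean_abs_error)
sink_accuracy = run_generic_workflow(sinkholes, fit_threshold, accuracy)
```

Only the data and the two plugged-in functions change between the temperature and sinkhole runs; the sampling and validation logic is shared, which is the point of the generic workflow.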

Exercise
1. Divide into groups of 4 (or so) with similar research interests.
2. Pick a research topic to collaborate on.
3. Construct a workflow diagram for an analysis that could be conducted.
4. Discuss how it could be reused for other related or unrelated analyses.