GCE Software Tools for Data Mining, Analysis and Synthesis
Wade M. Sheldon
Georgia Coastal Ecosystems LTER, University of Georgia, Athens, Georgia

Introduction
The GCE-LTER project and partner organizations (SINERR, UGAMI, USGS) collect extensive environmental monitoring data around Sapelo Island on the southeast Georgia coast. To put these observations into a broader spatial and temporal context, it is also important to compare them with other environmental monitoring observations. Long-term, spatially extensive climate and hydrologic databases maintained by the LTER Network (ClimDB/HydroDB), USGS (National Water Information System) and NOAA (National Weather Service) are valuable resources for broad-scale comparisons (Fig. 1). For example, the LTER Network's ClimDB/HydroDB database contains approximately 7 million daily records for 281 monitoring stations at 39 LTER and USFS sites, providing critical support for LTER cross-site comparisons. USGS and NOAA collect long-term climate and hydrologic data from a vast array of locations across North America (~8,000 real-time USGS monitoring stations and ~12,000 active NWS COOP weather stations), supporting truly large-scale regional analyses.

The Challenge: Dealing with Differences
Although GCE, other LTER sites, USGS and NOAA all provide free access to long-term data on the World Wide Web (Fig. 2), obtaining all the data required for a typical synthesis project can be daunting. Finding relevant stations, navigating to data request pages, and choosing data sets and file formatting options require very different procedures on each site. Once data are located and downloaded, other challenges arise, such as deciphering different file layouts, harmonizing parameter names and standardizing units. For example, total daily precipitation is reported as “Precip (total mm)” in millimeters by LTER ClimDB, as “00045_00006” in inches by USGS, and as “Prcp” in inches by NOAA.
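To make the harmonization problem concrete, the precipitation example above can be expressed as a small template lookup. The sketch below is in Python rather than MATLAB and is not the GCE Data Toolbox's own code; the standard name `Precip_Total`, the `TEMPLATES` table and the `harmonize` function are hypothetical. It renames each source-specific column to one standard name and converts inches to millimeters (1 in = 25.4 mm).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MM_PER_INCH = 25.4  # exact conversion factor


@dataclass(frozen=True)
class ColumnTemplate:
    standard_name: str  # harmonized column name shared by all sources
    source_units: str   # units the source reports ("mm" or "in")


# Hypothetical template table keyed by the source-specific column name.
TEMPLATES = {
    "Precip (total mm)": ColumnTemplate("Precip_Total", "mm"),  # LTER ClimDB
    "00045_00006": ColumnTemplate("Precip_Total", "in"),        # USGS NWIS
    "Prcp": ColumnTemplate("Precip_Total", "in"),               # NOAA
}

# Multipliers that convert each supported source unit to millimeters.
TO_MM = {"mm": 1.0, "in": MM_PER_INCH}


def harmonize(column: str,
              values: List[Optional[float]]) -> Tuple[str, List[Optional[float]]]:
    """Rename a precipitation column and convert its values to mm.

    Raises KeyError for columns without a template, so unmapped parameters
    are never silently passed through. Missing values (None) are preserved.
    """
    template = TEMPLATES[column]
    factor = TO_MM[template.source_units]
    converted = [None if v is None else v * factor for v in values]
    return template.standard_name, converted
```

For example, `harmonize("Prcp", [0.5, 1.0])` returns `("Precip_Total", [12.7, 25.4])`. Raising an error for unmapped columns is a deliberate choice in this sketch, so that an unrecognized parameter is never carried forward with the wrong units.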

Conventions for reporting missing values and quality assurance flags also differ among databases. When many data sets are required for an analysis, the cumulative effort needed to standardize them can become a limiting factor.

The Solution: GCE Tools for Data Mining, Analysis & Synthesis
Automating this cumbersome process is clearly necessary to support large-scale data synthesis. At the Georgia Coastal Ecosystems LTER we have developed a suite of MATLAB-based software tools (the GCE Data Toolbox for MATLAB) that help bridge this gap, allowing researchers to acquire, standardize and integrate data from many sources without manual reformatting. These tools can retrieve data from GCE, LTER ClimDB/HydroDB, USGS and NOAA databases directly over the Internet, and can also load files downloaded from these programs' web sites, as well as tabular data from a wide variety of other sources (e.g. delimited text files, MATLAB files, data logger files and SQL database queries). Detailed metadata are generated automatically for each data set using predefined or user-customized templates, which standardize column names and record data types and column units to support automated analysis. Once data sets are acquired, a wide variety of graphical dialogs and command-line functions can be used to manipulate, transform and integrate them for analysis and plotting (see below). Powerful rule-based and visual quality control tools are also available to identify invalid or questionable values and to flag them or omit them from subsequent analyses. Data files can also be saved to disk, then indexed, searched and managed using a graphical search engine program distributed with the toolbox. Results can be exported in various common formats (plain text, CSV, MATLAB arrays and matrices) for use in other analysis and plotting programs.
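The rule-based quality control step described above can be illustrated with a minimal Python sketch (not the toolbox's MATLAB implementation). It assumes hypothetical flag codes ("I" invalid, "Q" questionable, "M" missing), a hypothetical 250 mm upper limit for daily precipitation, and hypothetical missing-value sentinels (-9999, -999). It first normalizes the differing missing-value conventions, then assigns flags, then optionally omits flagged values while keeping record positions.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A QC rule pairs a flag code with a test that returns True when the rule fires.
Rule = Tuple[str, Callable[[float], bool]]

# Hypothetical rules for daily precipitation in mm; checked in order, first match wins.
PRECIP_RULES: List[Rule] = [
    ("I", lambda x: x < 0),    # invalid: negative precipitation
    ("Q", lambda x: x > 250),  # questionable: unusually large daily total
]

# Hypothetical sentinel values that different sources use for "no data".
MISSING_CODES = {-9999.0, -999.0}


def normalize_missing(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Replace source-specific missing-value sentinels with None."""
    return [None if v is None or v in MISSING_CODES else v for v in values]


def apply_rules(values: Sequence[Optional[float]],
                rules: Sequence[Rule]) -> List[str]:
    """Return one flag per value: 'M' if missing, else the first matching code, else ''."""
    flags = []
    for v in values:
        if v is None:
            flags.append("M")
        else:
            flags.append(next((code for code, test in rules if test(v)), ""))
    return flags


def omit_flagged(values: Sequence[Optional[float]], flags: Sequence[str],
                 omit: Tuple[str, ...] = ("I",)) -> List[Optional[float]]:
    """Set values whose flag code is in `omit` (a tuple of codes) to None."""
    return [None if f in omit else v for v, f in zip(values, flags)]
```

Flagging and omitting are kept as separate steps here, so the same flags can either be reported alongside the data or used to exclude values from a particular analysis.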

Experienced MATLAB users can also combine GCE Data Toolbox functions with their own code to automate large-scale synthesis projects.

Automated Data Retrieval
A powerful feature of the GCE Data Toolbox is support for “mining” data from the USGS and LTER databases directly over the Internet, using either interactive graphical dialogs (Fig. 3) or scriptable command-line functions. Data can be retrieved from any USGS NWIS station or LTER climate site across the country; stations are selected in the data request dialogs from browsable lists grouped by state/territory (USGS) or by site (LTER). Combined with the metadata templating and QA/QC flagging features, this capability allows users to acquire and standardize large amounts of long-term climate and hydrologic data from across the U.S. simultaneously, with just a few mouse clicks or MATLAB commands.

More Information
For more information about the GCE Data Toolbox for MATLAB, or to download the software package (compatible with MATLAB 6.5 or higher, including student versions, running on Windows 2000/XP/Vista, Linux, Solaris and Mac OS X), visit:

Visualizing Patterns and Trends
Several plotting tools are also provided with the GCE Data Toolbox to support data visualization as well as interactive QA/QC flagging with the mouse. A custom zoom toolbar is displayed on data plots to simplify axis scaling and stepping through time-series data sets to identify patterns of interest (Fig. 6). Plots can be customized and exported in many formats and resolutions for publication, and tools are also provided for automatically generating web pages containing thumbnail views of plots (linked to full-sized plots) at a specified temporal resolution (from a daily through a decadal time step per plot).

Integrating Data Sets
Several automated and semi-automated tools are also available in the GCE Data Toolbox for integrating multiple data sets for synthesis and comparative analysis. Multiple related data sets (e.g. daily data files for a monitoring station) can be “merged” in one step to create a single time series, with all columns automatically aligned by parameter, data type and units to prevent inappropriate pairings. When data sets with overlapping date ranges are merged, records with overlapping dates from the chronologically earlier data set can optionally be trimmed automatically, so that newer records replace older ones and the result is a continuous, monotonic time series. Data sets can also be “joined” by matching values in specified key columns to produce a composite data set containing a mixture of columns selected from both data sets (e.g. air temperature and wind speed from one data set and precipitation from another). Records are automatically aligned according to key column matches so that rows from the two data sets mesh correctly. For time-series data sets, key columns for date/time joins can also be identified automatically, greatly simplifying this process (Fig. 5).

Transforming and Analyzing Data
After data sets are acquired and standardized, a wide variety of tools are available for transforming and analyzing the data. Metadata are used to configure dialogs automatically and to verify that the selected data are suitable for each transformation, ensuring validity throughout processing. All transformation steps and data set changes are also logged automatically to the metadata to maintain a complete lineage of the data set, which can be reviewed at any time during processing and included in metadata files. Examples of transformations that can be performed:
- unit conversions (interactive and batch English↔Metric)
- filtering records based on column values or expressions
- sub-setting data by removing unneeded columns or rows
- statistical data reduction by aggregation and binning
- temporal re-sampling (date/time aggregation) (Fig. 4)
- generating derived data columns based on expressions
- splitting compound data series into separate columns

Figure 1. U.S. LTER Network sites (left) and real-time USGS stream-flow stations (right) active in 2008.

Figure 2. Web-based user interfaces to the GCE-LTER, LTER ClimDB/HydroDB, USGS NWIS and NOAA NWS climate/hydrologic databases.

Figure 3. Retrieving real-time data from USGS over the Internet and viewing the data using the GCE Data Toolbox (note that data can also be retrieved using command-line functions). A similar dialog is available for retrieving data from the LTER ClimDB/HydroDB database. Panels: USGS NWIS Data Retrieval Dialog; Retrieved Data Set (Editor View); Retrieved Data Set (Data View), with flagged values highlighted in red.

Figure 4. Data transformation using the GCE Data Toolbox date/time re-sampling tool. Numbers of flagged and missing values in the source data are automatically tallied in the derived data set, and these tallies can be used to flag statistical results automatically when user-set thresholds for flagged and/or missing values are exceeded. Panels: 30-minute real-time data set; Date/time re-sampling tool; Monthly re-sampled data set.

Figure 5. Automatic date/time join to integrate multiple USGS stream flow data sets for comparison. Panels: Independent Data Sets; Automatic Date/Time Join Dialog; Plot of Integrated Data (3 Datasets).

Figure 6. Left: time series plot of continuous CTD mooring data (note the scroll and zoom buttons at the bottom, and the visual QA/QC tools for revising qualifier flags directly on plots). Right: map plot of CTD survey data, illustrating spatial patterns in surface salinity values.
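The date/time re-sampling shown in Figure 4, where flagged and missing source values are tallied and used to flag the derived statistics, can be sketched in Python as below. This is an illustrative assumption, not the toolbox's algorithm: records are (timestamp, value, flag) tuples, flagged and missing values are excluded from the monthly mean, and the output flag code "Q" and the default 20% thresholds are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean
from typing import Dict, Iterable, Optional, Tuple

# (timestamp, value or None if missing, QC flag; '' means unflagged)
Record = Tuple[datetime, Optional[float], str]


def monthly_resample(records: Iterable[Record],
                     max_missing: float = 0.2,
                     max_flagged: float = 0.2) -> Dict[Tuple[int, int], dict]:
    """Aggregate timestamped records to monthly means with QC tallies.

    Missing and flagged values are counted per month and excluded from the
    mean; the monthly result gets the hypothetical flag 'Q' when the fraction
    of missing or flagged source values exceeds its threshold.
    """
    groups = defaultdict(list)
    for ts, value, flag in records:
        groups[(ts.year, ts.month)].append((value, flag))

    result = {}
    for key in sorted(groups):  # chronological order of (year, month)
        rows = groups[key]
        n = len(rows)
        n_missing = sum(1 for v, _ in rows if v is None)
        n_flagged = sum(1 for v, f in rows if v is not None and f)
        good = [v for v, f in rows if v is not None and not f]
        exceeded = n_missing / n > max_missing or n_flagged / n > max_flagged
        result[key] = {
            "mean": mean(good) if good else None,
            "n": n,
            "n_missing": n_missing,
            "n_flagged": n_flagged,
            "flag": "Q" if exceeded else "",
        }
    return result
```

Keeping the tallies (`n`, `n_missing`, `n_flagged`) in each output record, rather than only the flag, lets a user re-apply different thresholds later without returning to the source data.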