Deborah Agarwal BWC technical team 16 July 2007. 1.Applications of eddy covariance measurements, Part 1: Lecture on Analyzing and Interpreting CO2 Flux.

Presentation transcript:

Deborah Agarwal BWC technical team 16 July 2007

1. Applications of eddy covariance measurements, Part 1: Lecture on Analyzing and Interpreting CO2 Flux Measurements, Dennis Baldocchi, CarboEurope Summer Course, 2006, Namur, Belgium

Carbon-Climate Analysis Goals
- Towers measure consistent carbon flux and micrometeorological parameters
- Tower researchers quality check data and then provide the data to regional archives
- Regional and global carbon-climate analysis activities rely on data from regional archives
- Recent La Thuile workshop is gathering over 700 site-years of data available from over 200 sites around the world

Measurements Are Often Not Simple or Complete
- Gaps in the data
  - Quiet nights
  - Bird poop
  - High winds
  - …
- Difficult to make measurements
  - Leaf area index
  - Wood respiration
  - Soil respiration
  - …
- Localized measurements – tower footprint
- Local investigator knowledge important
- PIs’ science goals are not uniform across the towers

Typical Data Analysis Flow Today
- Validate & Quality Check data
- Perform Analysis
- Identify Data Issues & Retrieve Missing Data
- Retrieve Data from Source
- Matlab, Excel, S-Plus, PV-Wave, R, …
- Web page, web service, phone, Student, Post-doc, Friend, ...

Target Data Analysis Flow
- Scientific Data Server
- Validate & Quality Check data
- Perform Analysis
- Identify Data Issues & Obtain Missing Data
- Obtain Data from Source
- Track versions, results, and provenance

Scientific Data Server - Overview
- Databases
- Data Cubes
- User Interfaces
- Data Ingest

Database
- All descriptive metadata and data held in relational databases
  - Metadata is important too!
- While separate databases are shown, the datasets may actually reside in a single database
  - Mapping is transparent to the scientist
  - Separate databases used for performance
  - Unified databases used for simplicity
- New metadata and data are staged with a temporary database
  - Minimal quality checks applied
  - All name and unit conversions
- Data may be exported to flat file, copied to a private MyDb database, directly accessed programmatically, or ?
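The staging step (name and unit conversions plus minimal quality checks) could look roughly like the sketch below. Everything here is an assumption for illustration: the `NAME_MAP` and `UNIT_CONVERSIONS` tables, the `stage_record` helper, and the choice of degrees Celsius as the target unit for TA are hypothetical and not taken from the actual Data Server.

```python
import math

# Hypothetical mapping from site-specific variable names to canonical names.
NAME_MAP = {"Tair": "TA", "ustar": "UST", "Fc": "FC"}

# Hypothetical unit conversions keyed by (canonical name, incoming unit).
UNIT_CONVERSIONS = {
    ("TA", "K"): lambda v: v - 273.15,  # Kelvin to degrees Celsius (assumed target unit)
}


def stage_record(name, value, unit):
    """Apply name and unit conversion plus a minimal quality check to one value."""
    canonical = NAME_MAP.get(name, name)
    convert = UNIT_CONVERSIONS.get((canonical, unit))
    if convert is not None:
        value = convert(value)
    # Minimal quality check: reject values that are not finite numbers.
    if not math.isfinite(value):
        raise ValueError(f"non-finite value for {canonical}")
    return canonical, value


staged_name, staged_value = stage_record("Tair", 293.15, "K")
```

Keeping the conversions in lookup tables means a new site convention only adds a table entry and leaves the staging logic unchanged.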

Data Cubes
- A data cube is a database specifically for data mining (OLAP)
  - Simple aggregations (sum, min, or max) can be pre-computed for speed
  - Additional calculations can be computed dynamically or pre-computed
  - Both operate along dimensions such as time, site, or datumtype
  - Constructed from a relational database
  - A specialized query language (MDX) is used
- Client tool integration is evolving
  - Excel PivotTables allow simple data viewing
  - More powerful analysis and plotting using Matlab and statistics software
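To make the idea of pre-computed aggregations along dimensions concrete, here is a minimal in-memory sketch in plain Python. It is not how an OLAP server or MDX works internally; the `build_cube` helper, the row layout, and the sample values are made up for illustration.

```python
def build_cube(rows, dims):
    """Pre-compute sum, min, max, and count for every combination of the given dimensions.

    rows: iterable of dicts with the dimension keys plus a numeric "value".
    dims: tuple of dimension names to aggregate along, e.g. ("site", "year").
    """
    cube = {}
    for row in rows:
        key = tuple(row[d] for d in dims)
        v = row["value"]
        cell = cube.get(key)
        if cell is None:
            cube[key] = {"sum": v, "min": v, "max": v, "count": 1}
        else:
            cell["sum"] += v
            cell["min"] = min(cell["min"], v)
            cell["max"] = max(cell["max"], v)
            cell["count"] += 1
    return cube


# Made-up sample rows; real data would come from the relational database.
rows = [
    {"site": "A", "year": 2004, "datumtype": "TA", "value": 10.0},
    {"site": "A", "year": 2004, "datumtype": "TA", "value": 14.0},
    {"site": "A", "year": 2005, "datumtype": "TA", "value": 12.0},
    {"site": "B", "year": 2004, "datumtype": "TA", "value": 8.0},
]

by_site_year = build_cube(rows, ("site", "year"))
by_site = build_cube(rows, ("site",))
```

Once the cells exist, a lookup such as `by_site_year[("A", 2004)]` answers a query without rescanning the underlying rows, which is the speed benefit the slide refers to.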

Scientific Data Server – User Interface
- ORNL Ameriflux Web Site
- CSV Files
- BWC SQL Server Database
- Data Cube
- Reports, Web Services, Excel Pivot Table and Pivot Chart

Browsing For Data Availability
- Sites Reporting Data, Colored by Year

Required variable reporting by site by year
- Each row corresponds to one site-year
- Each cell corresponds to one site-year of (FC, CO2 or SCO2, UST, PAR or Rg, TA, and Rh or H2O)
- Color indicates:
  - Red – likely not enough for processing: fraction reported < 0.3 (roughly less than 5K of 17.5K)
  - Green – likely enough for processing: 0.3 < fraction reported < 0.999
  - Yellow – may not be good for processing due to gap-filling: fraction reported > 0.999
- Red CO2 (second column) can be ignored for cropland/grassland sites
- Sites shown are just a sample
- Of the 285 site years with good FC, 50 site years are missing one of (UST, PAR/Rg, and TA) and 79 sites have likely gap-filled data
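The color bands can be expressed as a small classification on the fraction of half-hourly values reported. This is a sketch under assumptions: the helper names are hypothetical, 17,520 is the number of half-hours in a 365-day year (the slide rounds it to 17.5K), and the treatment of values exactly at 0.3 or 0.999 is a guess because the slide does not specify it.

```python
HALF_HOURS_PER_YEAR = 48 * 365  # 17,520; the slide rounds this to 17.5K


def fraction_reported(n_values, expected=HALF_HOURS_PER_YEAR):
    """Fraction of the expected half-hourly values that were actually reported."""
    return n_values / expected


def classify(fraction):
    """Map a reported fraction to the color band used in the availability report.

    Boundary handling (exactly 0.3 or 0.999) is an assumption.
    """
    if fraction < 0.3:
        return "red"
    if fraction > 0.999:
        return "yellow"
    return "green"


sparse = classify(fraction_reported(5000))     # below the 0.3 threshold
partial = classify(fraction_reported(12000))   # between the thresholds
complete = classify(fraction_reported(17520))  # every half-hour present
```

A site-year with every half-hour present lands in the yellow band because a complete record suggests the values were gap-filled before submission.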

Obviously bad annual averages
- Data cube used to browse average yearly Rg values across all site-years
- 16 additional likely problematic site-years at 5 sites
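The same screening can be sketched outside the cube: compute the mean per site-year and flag means that fall outside a plausible range. The helper names, the sample values, and the 0 to 400 range are all assumptions chosen for illustration; the slides do not state which bounds were used.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(records):
    """records: iterable of (site, year, value); returns {(site, year): mean value}."""
    groups = defaultdict(list)
    for site, year, value in records:
        groups[(site, year)].append(value)
    return {key: mean(values) for key, values in groups.items()}


def flag_out_of_range(means, low, high):
    """Return the site-years whose mean lies outside [low, high], sorted for stable output."""
    return sorted(key for key, m in means.items() if not (low <= m <= high))


# Made-up Rg-like values; the bounds below are hypothetical, not from the slides.
sample = [
    ("A", 2004, 150.0),
    ("A", 2004, 170.0),
    ("B", 2005, 900.0),
    ("B", 2005, 1100.0),
]
means = yearly_means(sample)
suspect = flag_out_of_range(means, low=0.0, high=400.0)
```

Flagged site-years are candidates for the drill-down to daily values, not confirmed errors.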

Drill down to consistent (bad) daily values
- Data cube used to browse 2005 Rg values shows consistently high reporting (not just a few very large spikes) at Duke Forest sites

Drill down to Mead sites
- Daily average FC at each site shows likely units and/or sign issues

Simple check: the same FC value reported too often
- Database query returns count by (site, year, value) where count >= 500
- Graph shows sum of (returned counts) by site and year
- Sites with high sums likely report very few unique FC values
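A query of this shape can be tried with Python's built-in `sqlite3` module. The sketch below is not the SQL Server query used by the authors: the `flux` table, its columns, and the sample rows are hypothetical, and only the count threshold of 500 comes from the slide.

```python
import sqlite3


def repeated_value_sums(conn, threshold=500):
    """Sum, per site and year, the counts of FC values that repeat at least `threshold` times."""
    query = """
        SELECT site, year, SUM(n) AS total
        FROM (
            SELECT site, year, value, COUNT(*) AS n
            FROM flux
            GROUP BY site, year, value
            HAVING COUNT(*) >= ?
        )
        GROUP BY site, year
        ORDER BY site, year
    """
    return conn.execute(query, (threshold,)).fetchall()


# Hypothetical table of FC values with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flux (site TEXT, year INTEGER, value REAL)")
# Site X repeats one value 600 times and also has 10 distinct values.
conn.executemany("INSERT INTO flux VALUES (?, ?, ?)", [("X", 2004, -9999.0)] * 600)
conn.executemany("INSERT INTO flux VALUES (?, ?, ?)", [("X", 2004, float(i)) for i in range(10)])
# Site Y reports 100 distinct values, so nothing repeats.
conn.executemany("INSERT INTO flux VALUES (?, ?, ?)", [("Y", 2004, i / 10) for i in range(100)])

result = repeated_value_sums(conn)
```

The inner query finds heavily repeated values and the outer query adds their counts per site-year, which corresponds to the sums plotted in the graph.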

Ameriflux data
- ~145 million daily values in cube
- Advanced calculations and statistics in cube
- Methods of providing ½ hourly data access efficiently
- Data quality assessment
- Collection and incorporation of biological data

Fluxnet data
- ~210 million daily values in cube
- Waiting on gap filling and quality checking operations
- Data server, cubes, and reports will likely be the primary repository and access for researchers using the data

Russian River Data
- ~23 million values in cube
- Integration with Matlab and GIS
- Addition of data from other sources (fish, sediment size, …)

- Versioning of data and collection of data provenance
- Improved performance of database and cube
- Handling of biological data
- Advanced plotting capabilities
- Integration of collaborative tools
- Automation of data ingest, cube building, and report building
- MyDB and MyCube capabilities
- Integration of workflow capabilities
- Data server in a box
- Sociological change

- Ameriflux collaboration is adopting the Data Server architecture for the data repository
- BWC Data Server will be hosting the Fluxnet dataset which is expected to be the foundation for a broad range of research investigations
- Ecological measurements are often “messy”
- Applying the Data Server to watersheds introduces many additional challenges

Berkeley Water Center, University of California, Berkeley, Lawrence Berkeley Laboratory
- Jim Hunt
- Matt Rodriguez
- Monte Goode
- Rebecca Leonardson (student)
- Carolyn Remick
- Susan Hubbard
- Yoram Rubin

Microsoft
- Catharine van Ingen
- Jayant Gupchup (student)
- Nolan Li (student)
- Tony Hey
- Dan Fay
- Stuart Ozer
- SQL product team
- Jim Gray

Ameriflux Collaboration
- Dennis Baldocchi
- Beverly Law
- Tara Stiefl (student)
- Youngryel Ryu (postdoc)
- Gretchen Miller (student)
- Mattias Falk
- Tom Boden
- Bob Cook

CarboEurope Collaboration
- Dario Papale
- Markus Reichstein

*Project funded by Microsoft