Deb Agarwal (UCB and LBNL) Catharine van Ingen (MSFT) Berkeley Water Center Microsoft TCI IndoFlux Meeting, Chennai, India, July.

Slides:



Advertisements
Similar presentations
Early Experience Prototyping a Science Data Server for Environmental Data Deb Agarwal (LBL) Catharine van Ingen (MSFT) 25 October 2006.
Advertisements

Abuse Testing Laboratory Management Laboratory Management.
MICHAEL MARINO CSC 101 Whats New in Office Office Live Workspace 3 new things about Office Live Workspace are: Anywhere Access Store Microsoft.
NG-CHC Northern Gulf Coastal Hazards Collaboratory Simulation Experiment Integration Sandra Harper 1, Manil Maskey 1, Sara Graves 1, Sabin Basyal 1, Jian.
Flux Data Server User Tutorial Deb Agarwal, Catharine van Ingen, Susan Holladay, and Misha Krassovski Berkeley Water Center (UCB, LBL), ORNL, and Microsoft.
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
C van Ingen, D Agarwal, M Goode, J Gupchup, J Hunt, R Leonardson, M Rodriguez, N Li Berkeley Water Center John Hopkins University Lawrence Berkeley Laboratory.
Proposed Microsoft Water TCI+ Development of the AmeriFlux and Central Valley Data Portals conducted through a partnership with the Berkeley Water Center.
OLAP Cubes and Pivot Tables Leveraging the Power of a Microsoft EPM Solution EPM Customization Series Part 1 February 21 st, 2007 Brendan Giles, PMP, MCP.
Time Series Analyst An Internet Based Application for Viewing and Analyzing Environmental Time Series Jeffery S. Horsburgh Utah State University David.
Berkeley Water Center Early Experience Prototyping a Science Data Server for Environmental Data Deb Agarwal, LBL Catharine van Ingen,
Development of a Community Hydrologic Information System Jeffery S. Horsburgh Utah State University David G. Tarboton Utah State University.
Integrating Historical and Realtime Monitoring Data into an Internet Based Watershed Information System for the Bear River Basin Jeff Horsburgh David Stevens,
CLIMATE SCIENTISTS’ BIG CHALLENGE: REPRODUCIBILITY USING BIG DATA Kyo Lee, Chris Mattmann, and RCMES team Jet Propulsion Laboratory (JPL), Caltech.
CS2032 DATA WAREHOUSING AND DATA MINING
SAN DIEGO SUPERCOMPUTER CENTER Developing a CUAHSI HIS Data Node, as part of Cyberinfrastructure for the Hydrologic Sciences David Valentine Ilya Zaslavsky.
1 Alternate Title Slide: Presentation Name Goes Here Presenter’s Name Infrastructure Solutions Division Date GIS Perfct Ltd. Autodesk Value Added Reseller.
SQL Reporting Services Overview SSRS includes all the development and management pieces necessary to publish end user reports in  HTML  PDF 
Deborah Agarwal BWC technical team 16 July Applications of eddy covariance measurements, Part 1: Lecture on Analyzing and Interpreting CO2 Flux.
1 Building National Cyberinfrastructure Alan Blatecky Office of Cyberinfrastructure EPSCoR Meeting May 21,
TeraGrid Gateway User Concept – Supporting Users V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai Oak Ridge National Laboratory.
ArcGIS Workflow Manager An Introduction
Fluxdata.org FLUXNET Dataset Synthesis Support Deb Agarwal (LBNL) Catharine van Ingen (Microsoft) Fluxdata Team: Marty Humphrey (UVa), Norm Beekwilder.
Overview of SQL Server Alka Arora.
STATISTICS Microsoft Excel “Frequency Distribution”
Deborah Agarwal (UCB/LBL) Catharine van Ingen (MSR) Berkeley Water Center 22 October 2007.
Web 2.0: Concepts and Applications 6 Linking Data.
Virtual techdays INDIA │ November 2010 PowerPivot for Excel 2010 and SharePoint 2010 Joy Rathnayake │ MVP.
Cognos TM1 Satya Mobile:
Office Live Workspace Visio 2007 Outlook 2007 Groove 2007 Access 2007 Excel 2007 Word 2007.
© 2008 Ocean Data Systems Ltd - Do not reproduce without permission - exakom.com creation Dream Report O CEAN D ATA S YSTEMS O CEAN D ATA S YSTEMS The.
KM Technology Assessment “Knowledge and team collaboration servers” DSC8030/CIS8260 Dr. Samaddar Summer 2004 Jon A. Preston.
Module 10: Monitoring ISA Server Overview Monitoring Overview Configuring Alerts Configuring Session Monitoring Configuring Logging Configuring.
Agenda TimeSession 9:15Microsoft Business Intelligence Overview Break 10:40Creating High Impact Data Warehouse with Integration and Analysis Services 11:55Lunch.
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
CS480 Computer Science Seminar Introduction to Microsoft Solutions Framework (MSF)
Microsoft Business Intelligence Environment Overview.
Deb Agarwal abd Marty Humphrey e Norman Beekwilder e Monte Goode abd
Enhancing Linkages Between Projects and Datasets: Examples from LBA-ECO for NACP Lisa Wilcox, Amy L. Morrell,
IPlant Collaborative Tools and Services Workshop iPlant Collaborative Tools and Services Workshop Collaborating with iPlant.
Data Management BIRN supports data intensive activities including: – Imaging, Microscopy, Genomics, Time Series, Analytics and more… BIRN utilities scale:
The Future of the iPlant Cyberinfrastructure: Coming Attractions.
Fisheries Oceanography Collaboration Software Donald Denbo NOAA/PMEL-UW/JISAO Presented by Nancy Soreide NOAA/PMEL AMS 2002/IIPS 10.3.
Center Pixel Value Mean Value of All Pixels Percent of Pixels that meet QC Criteria Web and Web Services based tool that provides subsets and visualization.
Database Design and Management CPTG /23/2015Chapter 12 of 38 Functions of a Database Store data Store data School: student records, class schedules,
N ATIONAL E NERGY R ESEARCH S CIENTIFIC C OMPUTING C ENTER 1 NERSC Visualization Greenbook Workshop Report June 2002 Wes Bethel LBNL.
CUAHSI HIS Survey at Berkeley Seongeun Jeong and Xu Liang Department of Civil & Environmental Engineering UC Berkeley.
ORNL DAAC MODIS Subsetting and Visualization tools Tools and services to access subsets of MODIS data Suresh K. Santhana Vannan National Aeronautics and.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Reporting and Analysis With Microsoft Office. Reporting and Analysis Business User Reporting & Analysis OLAP Data Warehouse.
By N.Gopinath AP/CSE. There are 5 categories of Decision support tools, They are; 1. Reporting 2. Managed Query 3. Executive Information Systems 4. OLAP.
NQuery: A Network-enabled Data-based Query Tool for Multi-disciplinary Earth-science Datasets John R. Osborne.
Microsoft Management Seminar Series SMS 2003 Change Management.
Mercury – A Service Oriented Web-based system for finding and retrieving Biogeochemical, Ecological and other land- based data National Aeronautics and.
Vegetation Index Visualization of individual composite period. The tool provides a color coded grid display of the subset region. The tool provides time.
TeraGrid Gateway User Concept – Supporting Users V. E. Lynch, M. L. Chen, J. W. Cobb, J. A. Kohl, S. D. Miller, S. S. Vazhkudai Oak Ridge National Laboratory.
Computer Applications Chapter 16. Management Information Systems Management Information Systems (MIS)- an organized system of processing and reporting.
GOOGLE FUSION TABLES: WEB- CENTERED DATA MANAGEMENT AND COLLABORATION HectorGonzalez, et al. Google Inc. Presented by Donald Cha December 2, 2015.
Features Of SQL Server 2000: 1. Internet Integration: SQL Server 2000 works with other products to form a stable and secure data store for internet and.
Deb Agarwal (BWC), Marty Humphrey (Uva), and Norm Beekwilder (Uva)
Open Science Framework Jeffrey Spies University of Virginia.
ORNL DAAC MODIS Land Product Subsets 1 Suresh K. Santhana Vannan, Robert B. Cook, Bruce E. Wilson, Lisa M. Olsen Environmental Sciences Division, Oak Ridge.
Introduction to the Power BI Platform Presented by Ted Pattison.
The power of Power Pivot Cristian Nicola DynamicsBIGuide.com.
U.S. Department of the Interior U.S. Geological Survey July 2014 OPeNDAP Services – Present and Future at LP DAAC Brian Davis 1, Rob Quenzer 1, Jason Werpy.
Dan Fay Technical Computing Microsoft
Reporting and Analysis With Microsoft Office
Chapter 1 Database Systems
Staying afloat in the sensor data deluge
Chapter 1 Database Systems
Presentation transcript:

Deb Agarwal (UCB and LBNL) Catharine van Ingen (MSFT) Berkeley Water Center Microsoft TCI IndoFlux Meeting, Chennai, India, July 13, 2006 Designing CyberInfrastructure to Support End Science

Project Motivation l Data is now being gathered into common data archives l Data archives provide an opportunity for cross-discipline and cross-site investigations l Data analysis techniques which worked well on small data sets often do not scale l Current CS tools have evolved in support of other disciplines – Investigate their ability to facilitate data analysis

Distributed Data Sets Building BWC Water Cyberinfrastructure to Connect Data, Resources, and People Science Portal Data Harvesting and Transformations Data Cleaning, Models, Analysis Tools Computational Resources

Web Service Interface to Data and Tools Data Providers: Host Ameriflux Climate Data Statsgo Soils Data MODIS products Web-based Workbench access Tools: Statistical Graphical LAI Temp Fpar Veg Index Surf Refl NPP Albedo Choose Ameriflux Area/Transect, Time Range, Data Type Gap Fill, A technique Gap Fill, B technique Design Workflow Statistical & graphical analysis Canoak Model Site 9 Data harvest Sites 1-16 Canoak Model Site 1 Version control Network display LAI Statistical & Graphical analysis Data Cleaning Tools Data Mining and Analysis Tools Modeling Tools Visualization Tools Ecology Toolbox Compute Resources Carbon Community Workbench Climate Statsgo MODIS Import other Datasets Knowledge Generation Tools

Approach l Work closely with the end scientists to define, prototype, and test the system l Provide a solution that leverages both server-based and local desktop/laptop environments l Leverage commercial tools to the extent possible

Some Critical Capabilities l Support for versioning of data sets l Work with multiple data sets l Advanced data selection and plotting capabilities m Select data relative to an event m Simple calculation across any specified date range m Statistical information available m Plots - scatter, diurnal, time series, probability density function, tiled, correlation l Ability to access capabilities from desktop

Data Pipeline ORNL Ameriflux Site CSV Files BWC SQL Server Database Data Cube Excel Pivot Table and Chart

Data Cleaning and Versioning BWC SQL Server Database Excel spreadsheet of current data Investigator updated spreadsheet

Analysis Services Data Cube l An organized view of the data l A multi-dimensional view into the data l Can integrate multiple data sources l Define measures and dimensions m Measure – a value you want to be able to plot m Dimension – An axis you want to be able to use to select data and as axis l Calculations – define new measures

Precipitation trends and totals Summer precipitation: Tonzi and Vaira ~ 2% of total Metolius ~ 24% of total Walker Branch ~ 40% of total *Plot created by Gretchen Miller of UC Berkeley

Other applications *Plot created by Gretchen Miller of UC Berkeley

Observations by latitude *Plot created by Gretchen Miller of UC Berkeley

Observations by ecosystem type *Plot created by Gretchen Miller of UC Berkeley

Some Lessons Learned so Far l Data naming and unit consistency is critical to easy ingest of large amounts of data l Commercial tools do not necessarily provide all the right analysis capabilities directly l Scaling capabilities of the tools not yet clear l We will need tools to aid in notification of PIs

Portal Deployment l Behind the portal are a collection of databases and data cubes l Distribution for ease of use m Only see the data of interest m Private data remains stable l Distribution for scaling m Smaller queries on smaller databases take less resources m Larger databases and cubes can be replicated across machines l Batch job like infrastructure for managing very long running queries

Acknowlegements l Science Team m Dennis Baldocchi m Bev Law m Gretchen Miller l Cyberinfrastructure m Matt Rodriguez m Monte Goode l Microsoft m Tony Hey m Nolan Li l Oak Ridge National Lab CDIAC personnel l Berkeley Water Center m Yoram Rubin m Susan Hubbard

URLs and Connection Coordinates l Web Site m l Blog m l m