Deb Agarwal (UCB and LBNL) Catharine van Ingen (MSFT) Berkeley Water Center Microsoft TCI IndoFlux Meeting, Chennai, India, July.


1 http://esd.lbl.gov/BWC/ Deb Agarwal (UCB and LBNL) Catharine van Ingen (MSFT) Berkeley Water Center Microsoft TCI IndoFlux Meeting, Chennai, India, July 13, 2006 Designing CyberInfrastructure to Support End Science

2 Project Motivation
- Data is now being gathered into common data archives
- Data archives provide an opportunity for cross-discipline and cross-site investigations
- Data analysis techniques which worked well on small data sets often do not scale
- Current CS tools have evolved in support of other disciplines
  - Investigate their ability to facilitate data analysis

3 Building BWC Water Cyberinfrastructure to Connect Data, Resources, and People
[Diagram components: Distributed Data Sets; Science Portal; Data Harvesting and Transformations; Data Cleaning, Models, Analysis Tools; Computational Resources]

4 Web Service Interface to Data and Tools
[Diagram: Carbon Community Workbench]
- Data providers: Host Ameriflux, Climate Data, Statsgo Soils Data, MODIS products (LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo); import other datasets
- Web-based workbench access; tools: statistical, graphical
- Design workflow: choose Ameriflux area/transect, time range, data type; data harvest (sites 1-16); gap fill (technique A, technique B); Canoak model (site 1, site 9); statistical and graphical analysis; version control; network display of LAI
- Toolbox: Data Cleaning Tools, Data Mining and Analysis Tools, Modeling Tools, Visualization Tools, Ecology Toolbox, Compute Resources, Knowledge Generation Tools

5 Approach
- Work closely with the end scientists to define, prototype, and test the system
- Provide a solution that leverages both server-based and local desktop/laptop environments
- Leverage commercial tools to the extent possible

6 Some Critical Capabilities
- Support for versioning of data sets
- Work with multiple data sets
- Advanced data selection and plotting capabilities
  - Select data relative to an event
  - Simple calculation across any specified date range
  - Statistical information available
  - Plots: scatter, diurnal, time series, probability density function, tiled, correlation
- Ability to access capabilities from desktop
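The two selection capabilities listed above (selecting data relative to an event, and a simple calculation over a date range) can be sketched in a few lines. This is only an illustration, not the BWC implementation: the function names and the sample values are made up for the example.

```python
from datetime import date, timedelta
from statistics import mean

# Made-up daily observations as (date, value) pairs: June 1 -> 0.0, June 2 -> 1.0, ...
observations = [
    (date(2005, 6, 1) + timedelta(days=i), float(i)) for i in range(10)
]

def select_relative_to_event(rows, event_day, days_before, days_after):
    """Return rows whose date falls within a window around event_day."""
    start = event_day - timedelta(days=days_before)
    end = event_day + timedelta(days=days_after)
    return [(d, v) for d, v in rows if start <= d <= end]

def mean_over_range(rows, start, end):
    """Average the values dated within [start, end], inclusive; None if empty."""
    values = [v for d, v in rows if start <= d <= end]
    return mean(values) if values else None

# One day before through two days after a hypothetical event on June 5.
window = select_relative_to_event(observations, date(2005, 6, 5), 1, 2)
```

Keeping the window and the aggregate as separate functions means the same selection can feed any of the plot types named on the slide.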

7 Data Pipeline
ORNL Ameriflux Site CSV Files -> BWC SQL Server Database -> Data Cube -> Excel Pivot Table and Chart
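The first stage of this pipeline, loading CSV files into a relational database, might look roughly like the sketch below. It uses Python's built-in sqlite3 as a stand-in for SQL Server so that it runs anywhere; the table layout, column names, and values are assumptions made for the example and do not reflect the real Ameriflux file format.

```python
import csv
import io
import sqlite3

# Made-up CSV content standing in for a site file; column names are assumptions.
SAMPLE_CSV = """site,day,precip_mm
Tonzi,2005-06-01,0.0
Tonzi,2005-12-01,12.5
Metolius,2005-06-01,3.0
"""

def load_csv(conn, text):
    """Create the table if needed, insert every CSV row, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS obs (site TEXT, day TEXT, precip_mm REAL)"
    )
    rows = [
        (r["site"], r["day"], float(r["precip_mm"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the BWC database
loaded = load_csv(conn, SAMPLE_CSV)

# A per-site total, the kind of aggregate a cube or pivot table would expose.
totals = dict(
    conn.execute("SELECT site, SUM(precip_mm) FROM obs GROUP BY site").fetchall()
)
```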

8 Data Cleaning and Versioning
[Diagram components: BWC SQL Server Database; Excel spreadsheet of current data; Investigator updated spreadsheet]

9 Analysis Services Data Cube
- An organized view of the data
- A multi-dimensional view into the data
- Can integrate multiple data sources
- Define measures and dimensions
  - Measure: a value you want to be able to plot
  - Dimension: an axis you want to be able to use to select data and as a plot axis
- Calculations: define new measures
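The measure/dimension vocabulary can be illustrated without Analysis Services at all. The sketch below is a toy aggregation in plain Python, not the cube used by the project: the site names, seasons, and numbers are invented, and `aggregate` is a hypothetical helper.

```python
from collections import defaultdict

# Invented fact rows: "site" and "season" act as dimensions, "precip_mm" as a measure.
facts = [
    {"site": "site_a", "season": "summer", "precip_mm": 2.0},
    {"site": "site_a", "season": "winter", "precip_mm": 8.0},
    {"site": "site_b", "season": "summer", "precip_mm": 5.0},
    {"site": "site_b", "season": "winter", "precip_mm": 5.0},
]

def aggregate(rows, dimensions, measure):
    """Sum a measure for every combination of the chosen dimension values."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)

by_site = aggregate(facts, ["site"], "precip_mm")
by_site_season = aggregate(facts, ["site", "season"], "precip_mm")

# A calculated measure derived from existing ones: summer share of each site's total.
summer_share = {
    site: by_site_season[(site, "summer")] / total
    for (site,), total in by_site.items()
}
```

Choosing a different list of dimensions changes the slicing without touching the measure, which is the property the slide attributes to the cube.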

12 Precipitation trends and totals
Summer precipitation:
- Tonzi and Vaira: ~2% of total
- Metolius: ~24% of total
- Walker Branch: ~40% of total
*Plot created by Gretchen Miller of UC Berkeley

13 Other applications *Plot created by Gretchen Miller of UC Berkeley

14 Observations by latitude *Plot created by Gretchen Miller of UC Berkeley

15 Observations by ecosystem type *Plot created by Gretchen Miller of UC Berkeley

16 Some Lessons Learned so Far
- Data naming and unit consistency is critical to easy ingest of large amounts of data
- Commercial tools do not necessarily provide all the right analysis capabilities directly
- Scaling capabilities of the tools not yet clear
- We will need tools to aid in notification of PIs
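The first lesson, about naming and unit consistency, suggests a normalization step at ingest. The sketch below shows one possible shape for it; the alias table, the column names, and the choice of millimetres as the canonical unit are all assumptions for illustration.

```python
# Hypothetical alias table: maps column names as they might appear in incoming
# files to one canonical name plus a factor converting to its unit.
ALIASES = {
    "precip_mm": ("precip_mm", 1.0),
    "precip_cm": ("precip_mm", 10.0),
    "PRECIP": ("precip_mm", 1.0),  # assumed here to already be in millimetres
}

def normalize(record):
    """Rename known columns and convert values to the canonical unit.

    Unknown column names raise KeyError so they are noticed at ingest time
    rather than silently loaded under an inconsistent name.
    """
    out = {}
    for name, value in record.items():
        canonical, factor = ALIASES[name]
        out[canonical] = value * factor
    return out
```

For example, `normalize({"precip_cm": 1.5})` yields a record keyed by `precip_mm`, while a record with an unlisted column name fails loudly.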

17 Portal Deployment
- Behind the portal is a collection of databases and data cubes
- Distribution for ease of use
  - Only see the data of interest
  - Private data remains stable
- Distribution for scaling
  - Smaller queries on smaller databases take fewer resources
  - Larger databases and cubes can be replicated across machines
- Batch-job-like infrastructure for managing very long-running queries

19 Acknowledgements
- Science Team
  - Dennis Baldocchi
  - Bev Law
  - Gretchen Miller
- Cyberinfrastructure
  - Matt Rodriguez
  - Monte Goode
- Microsoft
  - Tony Hey
  - Nolan Li
- Oak Ridge National Lab CDIAC personnel
- Berkeley Water Center
  - Yoram Rubin
  - Susan Hubbard

20 URLs and Connection Coordinates
- Web Site
  - http://esd.lbl.gov/BWC
- Blog
  - http://dsd.lbl.gov/BWC/amfluxblog
- E-mail
  - bwc-tci@lists.berkeley.edu

21 http://esd.lbl.gov/BWC/

