Distributed Data Analysis & Dissemination System (D-DADS)
Prepared by Stefan Falke, Rudolf Husar, and Bret Schichtel
June 2000
Overview
Environmental data are collected by multiple, disparate sources, e.g. individual EMPACT projects.
Each data collector presents its data in its own format, making the data difficult to find, access, read, and integrate.
Standard formats and distribution systems are required to make distributed data sets accessible and to integrate them.
This proposal presents a distributed data analysis and delivery system that gives users access to data from multiple sources.
Such a system can facilitate the sharing of environmental data.
Interoperability
A requirement for an effective distributed environmental data system is interoperability, defined as “the ability to freely exchange all kinds of spatial information about the Earth and about objects and phenomena on, above, and below the Earth’s surface; and to cooperatively, over networks, run software capable of manipulating such information.” (Buehler & McKee, 1996)
Such a system has two key elements:
The exchange of meaningful information
Cooperative and distributed data management
Distributed Data Analysis & Dissemination System (D-DADS)
Specifications:
Use standardized forms of data, metadata, and access protocols
Support distributed data archives, each run by its own provider
Provide tools for data exploration, analysis, and presentation
Features:
The data are organized as multidimensional data cubes
The data cubes are distributed but shared
Analysis is supported by built-in and user-defined functions
D-DADS will use On-line Analytical Processing (OLAP) as its underlying structure
On-line Analytical Processing (OLAP)
A multidimensional data model that makes it easy to select, navigate, integrate, and explore the data.
An analytical query language that provides the power to filter, aggregate, and merge data and to explore complex data relationships.
The ability to create calculated variables from expressions based on other variables in the database.
Pre-calculation of frequently queried aggregate values, e.g. monthly averages, which enables fast response to ad hoc queries.
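Two of the OLAP capabilities listed above, calculated variables and pre-calculated aggregates, can be illustrated with a minimal Python sketch. The site names, dates, variable names (`pm25`, `pm10`), values, and function names are hypothetical and are not taken from the presentation; a real OLAP server would express the same ideas through its own query language.

```python
from collections import defaultdict

# Hypothetical daily observations: (site, "YYYY-MM-DD", pm25, pm10)
observations = [
    ("StLouis", "2000-06-01", 12.0, 20.0),
    ("StLouis", "2000-06-02", 18.0, 30.0),
    ("StLouis", "2000-07-01", 10.0, 25.0),
    ("Boston", "2000-06-01", 8.0, 16.0),
]


def fine_fraction(pm25, pm10):
    """Calculated variable: an expression over two stored variables."""
    return pm25 / pm10


def precompute_monthly_means(rows):
    """Aggregate once so that later ad hoc queries are dictionary lookups."""
    sums = defaultdict(lambda: [0.0, 0])  # (site, month) -> [total, count]
    for site, date, pm25, _pm10 in rows:
        key = (site, date[:7])  # "YYYY-MM" identifies the month
        sums[key][0] += pm25
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Built ahead of time; queries read from this table instead of the raw rows.
monthly_pm25 = precompute_monthly_means(observations)
```

A query such as `monthly_pm25[("StLouis", "2000-06")]` then reads a stored value instead of rescanning the daily records, which is the source of the fast response time the slide describes.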
Fast Analysis of Shared Multidimensional Information (FASMI) (Pendse, N., “The OLAP Report”)
An OLAP system is characterized by:
Fast – The system is designed to deliver relevant data to users quickly and efficiently.
Analysis – Users can extract not only “raw” data but also data that they “calculate”.
Shared – The data and access to them are distributed.
Multidimensional – The key feature. The system provides a multidimensional view of the data.
Information – The ability to disseminate large quantities of various forms of data and information.
Air Quality Multi-Dimensional Data Cube
Multidimensional data models use inherent relationships in data to populate multidimensional matrices called data cubes.
A cube's data can be queried using any combination of dimensions.
Hierarchical data structures are created by aggregating the data along successively larger ranges of a given dimension, e.g. the time dimension can contain the aggregates day, month, season, and year.
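As a rough illustration of querying a cube on any combination of dimensions and of rolling data up a time hierarchy, the following Python sketch stores cube cells in a dictionary keyed by (site, variable, date). The cell values, dimension names, and the functions `select` and `roll_up_time` are assumptions made for this example; the season level is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical cube cells keyed by (site, variable, date) -> value
cube = {
    ("StLouis", "ozone", "2000-06-01"): 40.0,
    ("StLouis", "ozone", "2000-06-02"): 60.0,
    ("StLouis", "ozone", "2000-07-01"): 50.0,
    ("Boston", "ozone", "2000-06-01"): 30.0,
    ("Boston", "pm25", "2000-06-01"): 9.0,
}


def select(cube, site=None, variable=None, date_prefix=""):
    """Slice the cube on any combination of dimensions; None means 'all'."""
    return {
        key: value
        for key, value in cube.items()
        if (site is None or key[0] == site)
        and (variable is None or key[1] == variable)
        and key[2].startswith(date_prefix)
    }


def roll_up_time(cube, level):
    """Average along the time hierarchy; level is 'day', 'month' or 'year'."""
    # Number of leading characters of "YYYY-MM-DD" that identify each level.
    width = {"day": 10, "month": 7, "year": 4}[level]
    groups = defaultdict(list)
    for (site, variable, date), value in cube.items():
        groups[(site, variable, date[:width])].append(value)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}
```

For instance, `select(cube, variable="ozone", date_prefix="2000-06")` slices on two dimensions at once, while `roll_up_time(cube, "month")` produces the coarser monthly level of the hierarchy.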
A Possible Architecture of D-DADS
There are four types of nodes in the system: Data Providers, Warehousers, Transformers, and Users.
The Users receive data on demand from the Providers through D-DADS.
D-DADS Components
Data Providers: Collect and store data in their preferred format.
Warehousers: Integrate and format data into data cubes for access and analysis. Also build translators.
Transformers: Integrate and format existing data cubes to generate a new, virtual data cube.
Users: Access and analyze distributed data cubes (the public, and EMPACT projects needing data from other EMPACT projects).
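The division of labor among the four node types can be sketched in a few lines of Python. Everything here is hypothetical: the two provider formats, the field names, the values, and the function names are invented for illustration, and the "virtual" cube is simply materialized in memory rather than resolved on demand across a network.

```python
# Data Provider A: keeps records in its own preferred (CSV-like) format.
provider_a = ["StLouis,2000-06-01,25.0", "StLouis,2000-06-02,15.0"]
# Data Provider B: a different, dictionary-based format.
provider_b = [{"station": "Boston", "day": "2000-06-01", "vr_km": 40.0}]


def translate_a(record):
    """Translator for provider A: CSV text -> ((site, date), value)."""
    site, date, value = record.split(",")
    return (site, date), float(value)


def translate_b(record):
    """Translator for provider B: dict -> ((site, date), value)."""
    return (record["station"], record["day"]), record["vr_km"]


def warehouse(records, translator):
    """Warehouser: apply a provider-specific translator to build a cube."""
    return dict(translator(r) for r in records)


def transform(*cubes):
    """Transformer: merge existing cubes into one new cube."""
    virtual = {}
    for cube in cubes:
        virtual.update(cube)
    return virtual


def user_query(cube, date):
    """User: read the values for one date across all providers."""
    return {site: v for (site, d), v in cube.items() if d == date}


virtual_cube = transform(
    warehouse(provider_a, translate_a),
    warehouse(provider_b, translate_b),
)
```

Calling `user_query(virtual_cube, "2000-06-01")` returns values from both providers without the user needing to know either native format, which is the point of placing translators at the Warehouser.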
Example Application: Visibility D-DADS
Visibility observations (extinction coefficient) are an indicator of air quality and serve as an important data set for research and for the public’s understanding of air quality.
A visibility D-DADS will consist of visual range observations and digital images from web cameras.
Some visibility data cubes will be stored at CAPITA and updated daily.
Other data cubes will be stored as part of the IMPROVE database, EMPACT projects, and …
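The slide mentions both extinction coefficient and visual range. One widely used link between the two is the Koschmieder relation, b_ext = K / VR, where K = 3.912 corresponds to a 2% contrast threshold. The presentation does not state which conversion, if any, D-DADS would apply, so the constant and the function name below are assumptions for illustration only.

```python
# Koschmieder constant for a 2% contrast threshold: ln(1 / 0.02) ~= 3.912.
# Assumed here; the presentation does not specify a conversion.
KOSCHMIEDER_CONSTANT = 3.912


def extinction_from_visual_range(visual_range_km, k=KOSCHMIEDER_CONSTANT):
    """Return the extinction coefficient (1/km) for a visual range in km."""
    if visual_range_km <= 0:
        raise ValueError("visual range must be positive")
    return k / visual_range_km
```

Storing the derived extinction coefficient alongside the reported visual range would be one example of the calculated variables that an OLAP structure supports.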
Example Viewer
Map View, Variable View, Time View, WebCam View
The views are linked so that a change made in one view updates the other three.
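One common way to link views like these is an observer-style hub that holds the shared selection and notifies every other view when one of them changes it. The sketch below assumes that design; the class names `LinkedViews` and `View` and the selection keys are hypothetical and do not describe the actual viewer.

```python
class LinkedViews:
    """Shared selection state that notifies every registered view."""

    def __init__(self):
        self._state = {}
        self._views = []

    def register(self, view):
        self._views.append(view)

    def update(self, source, **changes):
        """Record a change made in `source` and refresh all other views."""
        self._state.update(changes)
        for view in self._views:
            if view is not source:
                view.refresh(dict(self._state))  # each view gets its own copy


class View:
    """A single panel (map, variable, time, or webcam)."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.shown = {}
        hub.register(self)

    def refresh(self, state):
        self.shown = state

    def select(self, **changes):
        """User interaction in this view; propagate it through the hub."""
        self.shown.update(changes)
        self.hub.update(self, **changes)


hub = LinkedViews()
map_view, variable_view, time_view, webcam_view = (
    View(name, hub) for name in ("map", "variable", "time", "webcam")
)
map_view.select(site="StLouis")
time_view.select(date="2000-06-01")
```

After the two selections above, all four views show the same site and date, regardless of which view originated each change.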
Participants
Stefan Falke – EPA-EMPACT
Bret Schichtel – Colorado State University
Rudolf Husar – Washington University in St. Louis
Current EMPACT projects collecting visibility observations (Boston,
Microsoft OLAP Consultants