Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database management system Data analytics system:

Similar presentations


Presentation on theme: "Database management system Data analytics system:"— Presentation transcript:

1 Database management system Data analytics system:
Modeling change from large-scale high-dimensional spatio-temporal array data Meng Lu and Edzer Pebesma Institute for Geoinformatics, University of Muenster, Germany Contact: Meng Lu Introduction The massive data that comes from earth observation satellites and other sensors provide significant information for modeling global change. At the same time, high dimensionality and large size of the data has brought challenges in data acquisition, management, effective querying and processing. In addition, the output of earth system modeling tends to be data intensive and needs methodologies for storage, validation, analysis and visualization (e.g. as maps). An important proportion of earth system observations and simulated data can be represented as multi-dimensional array data, which has received increasing attention in big data management and spatio-temporal analysis. Array based data management and analysis brings opportunities in modeling change with large-scale high-dimensional data. Examples of Array Data 1-D: Time series D: Satellite images D: Image time series 3-D: Sediment/nutrient in flow D: Hyper-spectral remote sensing time series data x,y stand for spatial coordinates, t stands for time, z for height Time series analysis Goal Spatial-temporal change detection and quantification from multi-dimensional array data Time domain Frequency domain Trend analysis Pattern identification Seasonal variation Periodical pattern Forecasting Other cyclic variation Multi-dimensional array on t Change detection in NDVI time series Source: Verbesselt et al. (2009) Analysis in frequency domain: Spectrum density estimation Spatio-temporal analysis Research Questions How to model spatio-temporal change? • How to reduce dimensions spatially and temporally, or thematically? How to analyze array data? • How to extend existing GIS functions to work on multidimensional arrays? • How to combine data sets of different dimensionality or different resolutions? • Can map algebra be extended to an intelligible array algebra? • In what sense are space and time special, as dimensions, compared to other properties? Dimension Reduction --Practical PCA (Principle Component Analysis) Works Time Spatial rainfall variability 1) What is the spatial variability of the long- term rainfall? 1967 1968 1969 1 245 286 256 2 223 271 268 3 234 264 253 …. Gage ID 2-Dimensional array data for summer average rainfall amount, from year 1967 to 1999, 83 gages selected Map of rainfall gage network in a semi-arid watershed (ca. 150 square Km) (Walnut Gulch Experimental Watershed, Arizona, US) Temporal rainfall variability 2) What is the temporal variability of rainfall within the whole watershed ? reflectance PC 1 PC 2 PC 3 PC 4 loading loading loading loading Spatial distribution of PC1 loadings (top), PC2 loadings (middle) and PC3 loadings (bottom); gage as variable, time as record Tools: Database Management Systems PC 1st 2nd 3rd 4th Others Proportion of variance 80% 6% 3% 2.3% 8.7% Data analytics system year year year year Database management system + Data analytics system: The way to go? PC 1 PC 2 PC 3 PC 4 Variance explained by each PC, gage as variable, time as record Interpretation: Each of the circle represent the loadings of PC for each gage. The size of the circle indicates the magnitude of the PC loadings (how much the variable contributes to the variance). The blue and black indicated negative and positive of PC loadings. The first PC explains more than 80% of the total variance. The rather uniform distribution of PC1 loadings suggests that the spatial variability of climate across the watershed is small. PC2 contributes to 6% of total variance, but is more interesting. It forms a loading pattern that varies from west to east, which could be interpreted as weather variability. PC3 also shows spatial weather pattern, which is likely to show the variability from south to north. Noting that the PC3 explains less variance than PC2, the signal in west-east direction is dominating the spatial rainfall variability related to weather. A programming language for statistics and graphics Open source Difficult to scale R Main memory SciDB Array-based Database management and analytics system Open source Partially open source Scalable commercial version scalable AQL (Array Query Language) AFL (Array Functional Language) mixture of SQL syntax and trees of algebraic operators RaSQL (Raster SQL) based on SQL-92 and implemented array algebra Chunks maps onto a disk block. Tiles stored as BLOBs (binary large objects) in RDBMS R and Python interface No R or Python interface General features License Scalability Query Language Storage Interaction with .. predicted predicted predicted predicted Gage ID Gage ID Gage ID Gage ID Loadings of PC 1 to PC 4 (up) and predictions using different loadings from PC1 to PC4 (bottom). Time as variable, gage as record Interpretation: the loadings of PC1 to PC4 represent the variability of each year that contribute to the PC. Note that when using gages as variables (shown on right), the first PC represents the mean rainfall of the period. The data has been centered with the removal of the long-term climate information. The weather variability in each gage could be observed from each of the PCs. The bottom plot shows the predictions (scores) for each record, from which the variability of rainfall that each gage received could be observed.


Download ppt "Database management system Data analytics system:"

Similar presentations


Ads by Google