Phase 4(c) GPS Transient Detection Using Principal Components Analysis, Hidden Markov Modeling, and Covariance Descriptor Analysis
NASA / Caltech / JPL / Instrument Software and Science Data Systems
This presentation Copyright 2011 California Institute of Technology. US Government Support Acknowledged.
Robert Granat, Sharon Kedar, Jay Parker (Jet Propulsion Laboratory, California Institute of Technology)
Yehuda Bock (Scripps Institution of Oceanography, University of California, San Diego)
Introduction and Approach
- Separate detection and modeling steps
- Multi-step data cleaning process used to draw out hidden transient signals
- Statistical, data-driven approaches used for detection, to avoid bias from assuming particular physical models
- Multi-method voting scheme used for detection
- Simplex modeling used to characterize fault movement and geometry after detection
- All steps can be completely automated
Data Cleaning
- Smooth the time series via convolution with a width-five Gaussian kernel
- Remove secular motion via an iteratively reweighted least-squares linear fit
- Remove spike noise via application of the Grubbs criterion
- Remove seasonal signals via an iteratively reweighted least-squares fit of a sum of sines (with year and half-year periods)
- Remove common-mode error by applying probabilistic principal components analysis (PPCA) and subtracting the first component from each station
- Remove remaining periodic signals by fitting a sum of sines with half-year periods
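The single-station part of this cascade might be sketched as follows. The function name `clean_series`, the Huber-style IRLS weights, and the Grubbs significance level are our assumptions; the network-wide PPCA common-mode step is omitted because it requires all stations at once, and here a constant and linear column are kept in the seasonal fit so trend leakage is not re-introduced (our tweak, not stated on the slide):

```python
import numpy as np
from scipy.ndimage import convolve1d
from scipy.stats import t as tdist

def clean_series(t, y, n_iter=5, alpha=0.05):
    """Sketch of the per-station cleaning cascade on one component."""
    # 1. Smooth via convolution with a width-five Gaussian kernel.
    k = np.exp(-0.5 * np.arange(-2, 3) ** 2)
    y = convolve1d(y, k / k.sum(), mode="nearest")
    # 2. Remove secular motion: iteratively reweighted least-squares
    #    linear fit (Huber-style weights are an assumption here).
    X = np.column_stack([np.ones_like(t), t])
    w = np.ones_like(y)
    for _ in range(n_iter):
        beta, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
        r = y - X @ beta
        s = 1.4826 * np.median(np.abs(r)) + 1e-12
        w = np.minimum(1.0, 1.345 * s / (np.abs(r) + 1e-12))
    y = y - X @ beta
    # 3. Remove spike noise via the Grubbs criterion: repeatedly drop
    #    the worst outlier while it exceeds the critical value.
    keep = np.ones(y.size, bool)
    while keep.sum() > 3:
        z = np.abs(y[keep] - y[keep].mean()) / (y[keep].std() + 1e-12)
        i, n = np.argmax(z), keep.sum()
        tc = tdist.ppf(1 - alpha / (2 * n), n - 2)
        if z[i] <= (n - 1) / np.sqrt(n) * np.sqrt(tc**2 / (n - 2 + tc**2)):
            break
        keep[np.flatnonzero(keep)[i]] = False
    y = np.interp(t, t[keep], y[keep])   # fill removed spikes
    # 4. Remove seasonal signals: fit sines/cosines with one-year and
    #    half-year periods (constant and linear columns retained).
    S = np.column_stack([np.ones_like(t), t] +
                        [f(2 * np.pi * t / p)
                         for p in (365.25, 182.625)
                         for f in (np.sin, np.cos)])
    coef, *_ = np.linalg.lstsq(S, y, rcond=None)
    return y - S @ coef
```

Running this on a synthetic series of trend plus annual sine returns a residual near zero, which is the point: whatever survives the cascade is a candidate transient.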
Three Analysis Methods
Probabilistic PCA
- Similar to standard PCA, but represents each observation as a linear transformation of a k-dimensional latent variable plus a Gaussian noise term
- The latent variable is estimated via Expectation-Maximization (EM)
- Robust to missing data (which is estimated)
- As with PCA, the columns (components) of the PPCA transformation can be interpreted as modes of the system
Hidden Markov Modeling
- Models the system as switching between discrete states over time
- State outputs are probability distributions that generate the observations
- The optimal fit of the model to the observations can be found via EM
- Correlated state changes between stations indicate regional events
Covariance Descriptor Analysis
- Based on calculating a generalized distance metric between covariance matrices
- Computing covariance matrices of time-series sequences or sub-sequences enables comparison via this metric
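To make the PPCA step concrete, here is a minimal sketch of the closed-form maximum-likelihood solution of Tipping and Bishop (1999). The slide's pipeline uses the EM form, which also handles missing data; this sketch assumes complete data, and the function name `ppca` is ours:

```python
import numpy as np

def ppca(Y, k):
    """Closed-form ML probabilistic PCA (Tipping & Bishop 1999).
    Y is (n_samples, d) with no missing entries (sketch assumption)."""
    Y0 = Y - Y.mean(axis=0)
    C = np.cov(Y0, rowvar=False)
    evals, evecs = np.linalg.eigh(C)
    evals, evecs = evals[::-1], evecs[:, ::-1]   # descending order
    sigma2 = evals[k:].mean()                    # isotropic noise variance
    # Columns of W span the k leading modes of the system.
    W = evecs[:, :k] * np.sqrt(np.maximum(evals[:k] - sigma2, 0.0))
    # Posterior mean of the latent variable: E[x|y] = M^-1 W^T (y - mu).
    M = W.T @ W + sigma2 * np.eye(k)
    X = Y0 @ W @ np.linalg.inv(M)
    return W, sigma2, X
```

The first column of W is what the common-mode removal step subtracts, and the component plots later in the deck are columns of X over time.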
Covariance Descriptor Analysis
(Diagram: covariance matrices C1, C2, ..., CN computed from the time series.)
Using Covariance Descriptors
- We can calculate the so-called "divergence" between two time series using the generalized covariance distance metric [Tuzel 2008].
- Because the metric is based on the time-series statistics, we can easily compare time series of different lengths.
- To find anomalies, we compare sub-windows of an individual GPS time series with a "training set" from that same time series that contains nominal behavior.
- Windows with high divergence indicate anomalous activity relative to the normal behavior of the station.
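The metric in question (the affine-invariant covariance distance used by Tuzel et al. 2008) is d(C1, C2) = sqrt(sum_i ln^2 lambda_i), where the lambda_i are the generalized eigenvalues of the pair (C1, C2). A small sketch, with function names of our choosing:

```python
import numpy as np
from scipy.linalg import eigh

def cov_divergence(C1, C2):
    """Generalized distance between two covariance matrices:
    d(C1, C2) = sqrt(sum_i ln^2 lambda_i), with lambda_i the
    generalized eigenvalues of (C1, C2)."""
    lam = eigh(C1, C2, eigvals_only=True)
    return float(np.sqrt(np.sum(np.log(lam) ** 2)))

def window_divergences(series, train_cov, win=100):
    """Divergence of every length-`win` sub-window of a (T, d) series
    against a covariance estimated from nominal training behavior."""
    return np.array([
        cov_divergence(np.cov(series[s:s + win], rowvar=False), train_cov)
        for s in range(series.shape[0] - win + 1)])
```

Identical covariances give zero divergence, and a uniformly scaled covariance gives sqrt(d) * |ln a| for scale factor a, so windows whose statistics depart from the training set stand out cleanly.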
Sub-networks
Event localization was performed by dividing the network into sub-networks and performing individualized analysis on each.
Hidden Markov Model Results
Above: total number of state changes per day for the sub-network ecfs62.
Below: total number of state changes per day for the sub-network mhms40.
Covariance Descriptor Results
Average covariance divergence over all stations in the network between the entire time series and all sub-windows of length 100 days.
PPCA Results
The first three PPCA components of the sub-network mhms40 in the E-W (top), N-S (middle), and vertical (bottom) directions. Note the large motion around days 2000-2100 in the first (red) component in all three directions.
Strain Movie
Snapshots of the horizontal strain field estimated before, during, and after the time period spanning the transient event. Although the strain field was not useful as a detection mechanism, due to the large background noise levels, it provides positive confirmation of an event taking place in the Whittier area during the suspected transient period.
Detection!
- HMM and PPCA indicated an event in the mhms40 sub-network around days 2000-2100
- CDA indicated no event
- Majority vote indicates a detection
- We then used Simplex to find a fault geometry and motion model to match the observed signal
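The voting rule itself is tiny; a sketch (function name ours), applied to the case on this slide:

```python
def majority_vote(flags):
    """Declare a detection when a strict majority of the individual
    detection methods (here HMM, PPCA, CDA) has fired."""
    return sum(bool(f) for f in flags) > len(flags) / 2

# The mhms40 case: HMM yes, PPCA yes, CDA no -> detection declared.
detected = majority_vote([True, True, False])
```

Requiring a majority rather than any single method is what suppresses the false alarms discussed in the conclusion.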
Simplex: Okada Dislocation and Downhill Simplex (Numerical Recipes)
- Disloc (forward model): sums Okada (BSSA 1985) fault-patch slips to produce surface displacements
  - Homogeneous, isotropic, infinite elastic half-space
  - Each patch and its slip are defined by P = (x, y, phi, d, delta, u1, u2, u3, l, w)
- Chi-square: compares observed surface displacements with calculated (Disloc) values, scaled by uncertainty:
  chi^2 = sum_i (obs_i - calc_i)^2 / sigma_i^2
- Simplex (inversion):
  - Initial set of "P" (patches, slips): some parameters fixed, N free
  - Displacement observations at k points (obs_i)
  - Create N perturbed models (one parameter perturbed in each model)
  - Compute N sets of Disloc surface displacements at the k points, and N values of chi^2
  - Iterate (four allowed steps):
    - If it improves the worst-vertex chi^2, reflect through the opposite face
    - Or reflect with 2x expansion
    - Or contract by 0.5
    - Or contract all vertices toward the best vertex
    - (Simulated annealing variations are also supported)
  - Iterate to convergence (a local minimum of chi^2, or report FAIL)
- Report the best slip/fault model and residuals
(Figure: an N-dimensional simplex of models with vertices P_1, chi^2_1 through P_N, chi^2_N; Hurst et al., GRL 2000)
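The chi-square objective and the downhill-simplex search can be exercised with SciPy's Nelder-Mead implementation. The Gaussian "bump" below is a hypothetical stand-in for the Okada/Disloc forward model (implementing the real Okada solution is beyond a sketch); everything else mirrors the inversion loop described above:

```python
import numpy as np
from scipy.optimize import minimize

def forward(params, xy):
    """Hypothetical stand-in for Disloc (the real code sums Okada 1985
    fault-patch solutions): a 2-D Gaussian bump of surface displacement,
    enough to exercise the inversion machinery."""
    x0, y0, amp, scale = params
    r2 = (xy[:, 0] - x0) ** 2 + (xy[:, 1] - y0) ** 2
    return amp * np.exp(-r2 / (2.0 * scale ** 2))

def chisquare(params, xy, obs, sigma):
    """chi^2 = sum_i (obs_i - calc_i)^2 / sigma_i^2."""
    return np.sum(((obs - forward(params, xy)) / sigma) ** 2)

# Synthetic displacements at k stations, then a downhill-simplex
# (Nelder-Mead) inversion from a perturbed starting model.
rng = np.random.default_rng(0)
xy = rng.uniform(-10.0, 10.0, size=(30, 2))      # station coordinates
truth = np.array([2.0, -1.0, 5.0, 3.0])          # x0, y0, amp, scale
sigma = 0.05
obs = forward(truth, xy) + rng.normal(0.0, sigma, 30)
fit = minimize(chisquare, x0=truth + [1.0, 1.0, -1.0, 0.5],
               args=(xy, obs, sigma), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6, "fatol": 1e-6})
```

A converged fit recovers the true parameters to within the noise, and `fit.fun` lands near the number of observations, as expected for a chi-square with well-estimated sigmas.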
Simplex Results
Map view of the simplex-estimated Fault #1 (whc1 analysis).
Simplex Results: Fault #1
- x = 8.1 +/- 0.6 km (relative to station whc1, at (lon, lat) = (-118.031200, 33.9799000))
- y = -4.4 +/- 1.1 km
- strike = -57.86 +/- 3.8 deg
- dip = 37.24 +/- 2.4 deg
- depth = 3.58 +/- 0.4 km
- width = 2.79 +/- 0.8 km
- length = 16.39 +/- 1.7 km
- strike-slip = -0.87 +/- 3.0 mm
- dip-slip = 70.09 +/- 15 mm (this fault dips to the NE)
The chi-square is 1.33839; the coefficient of determination (fraction of explained variance) is 0.6. Fault vertices (in map view, lon, lat coordinates; see Figure 2) are -117.943, 33.941 and -118.094, 34.019 at the base of the fault slip (3.58 km depth), and -118.106, 34.002 and -117.9562, 33.924 at the top of the fault slip (1.89 km depth).
Conclusion: Towards Automated Operation
- The three analysis methods require (almost) no pre-tuning or knob-turning and can be run in an automated fashion on incoming data streams (a "plumbing" issue).
- Detection criteria for the individual methods are trickier: we need good, principled thresholds for CDA and HMM detections, and an objective criterion for PPCA detections.
- A voting scheme works well to reduce false alarms; this is something the transient detection community at large should consider when moving to automated operation.
- Simplex can be automated via a simple grid search over the candidate region; individual simplex runs are computationally cheap, and some cleverness in picking initial parameters can save effort.
- Question: what trade-offs do we want to make (e.g., false alarms vs. detections)?
Special thanks to the NASA AIST program, which funded this work.