
1 NKS meets the Grid and e-Science. NKS2003, Boston, June 29, 2003. Geoffrey Fox, Community Grids Lab, Indiana University. gcf@indiana.edu, http://www.grid2002.org

2 Moore's Law for Sensors; the Data Deluge
e-Science drivers:
– Science will be deluged with data from accelerators (LHC: 10 petabytes/year), satellites (InSAR for earthquakes), telescopes, sensors, video surveillance ...
– Scientific research is distributed across the world.
Grid technology aims to integrate distributed data, people, and computers (for simulation or, better, data mining).
– There is commercial interest from "utility computing" etc.
Can NKS provide the underlying modeling approach?
Figure: the total area of astronomical telescopes in m², and CCDs measured in gigapixels, over the last 25 years. The number of pixels and the data double every year.

3 SERVOGrid caricature (diagram): repositories, federated databases, sensor nets, and streaming data feed database, analysis, visualization, and exploration (Mathematica) services, with linked NKS models (linked with each other and with the data).

4 Simulations for Chaotic Earth Systems
Earth systems are now thought to be chaotic:
– many scales in space and time
– most dynamics are fundamentally unobservable
– stochastic processes (random forcings) are included
Examples include weather and climate; earthquakes and other crustal processes; plate tectonics and mantle convection; the geodynamo.
Two possible approaches to forecasting and prediction:
Deterministic: solve differential equations with initial conditions, boundary conditions, and fixed parameters. The critical problem is that many (or all) of these are unknown in nature. There is a data deluge, but it is the wrong data for PDEs. This is doomed as an approach to earthquake forecasting even with an "Earth Simulator follow-on" 2009 petaflop supercomputer.
Pattern Informatics and Complexity: the focus is on studying the manifold of all possible space-time patterns that a system can display, and then forecasting using either pure observation or phenomenological dynamics constrained by the data we can actually observe. Successful examples of pattern-based forecasting include weather and El Nino forecasting.

5 Models of Processes with Many Scales in Length and Time: a CA Model for Earthquakes
Statistical dynamics of an earthquake fault: the Burridge-Knopoff slider block model (R. Burridge and L. Knopoff, Bull. Seism. Soc. Am. 57, 341, 1967).
The nearest-neighbor BK model was the first slider block model. Sticking points on the fault are represented by blocks having uniform loader spring constant K_L (= k_p in the figure at right). Each block is connected to its 2d nearest neighbors (d = spatial dimension) by springs having constant K_C (= k_c at right). A friction law prevents the blocks from sliding until sufficient force (stress) builds up. A simulated earthquake begins when the force on a block due to the plate motion reaches a stress threshold σ_F. The avalanche of failing blocks, triggered by stress transfer from sliding blocks, represents an earthquake.
Earthquake work is from John Rundle (University of California, Davis) as part of SERVOGrid, the Solid Earth Research Virtual Observatory Grid, led by JPL.
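To make the slider-block picture concrete, here is a minimal Python sketch of a 1D cellular-automaton slider block model in the Burridge-Knopoff spirit: uniform plate loading raises the stress on every block, a block slips when its stress reaches the threshold σ_F, and part of the dropped stress is transferred to nearest neighbors, producing avalanches. All parameter values and the stress-transfer rule are illustrative assumptions, not the SERVOGrid or Virtual California code.

```python
# Toy 1D Burridge-Knopoff-style cellular-automaton slider block model.
import numpy as np

rng = np.random.default_rng(0)

N = 200                # number of blocks (sticking points on the fault)
sigma_F = 1.0          # failure (stress) threshold
alpha = 0.4            # fraction of dropped stress passed to each neighbour (assumption)
stress = rng.uniform(0.0, sigma_F, N)   # random initial stress state

def one_earthquake(stress):
    """Load the fault until one block fails, then cascade the avalanche.
    Returns the number of block failures (the 'size' of the event)."""
    # uniform plate loading: raise all blocks until the most stressed one fails
    stress += sigma_F - stress.max()
    size = 0
    failing = np.flatnonzero(stress >= sigma_F)
    while failing.size:
        size += failing.size
        for i in failing:
            drop = stress[i]          # block slips back to zero residual stress
            stress[i] = 0.0
            # transfer part of the dropped stress to nearest neighbours
            if i > 0:
                stress[i - 1] += alpha * drop
            if i < N - 1:
                stress[i + 1] += alpha * drop
        failing = np.flatnonzero(stress >= sigma_F)
    return size

sizes = [one_earthquake(stress) for _ in range(2000)]
print("largest simulated event involved", max(sizes), "block failures")
```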

6 Fault Network Model for Southern California: Dynamics of Earthquakes from Numerical Simulations of CAs
"Virtual California" is a cellular automaton model (J.B. Rundle et al., Phys. Rev. E 61, 2418, 2000; P.B. Rundle et al., Phys. Rev. Lett. 87, 148501, 2001).
Figures: an example of one of the large earthquakes that occur during a simulation; the buildup of Coulomb Failure Function (CFF) stress σ over time and space, plotted as time (years) vs. space (fault segments), where horizontal lines are earthquakes; the historic record of earthquakes over the last 200 years; the model fault system (including the San Andreas Fault) used for the simulations; and the friction model, a representation of the fault friction encoded via data assimilation of historic events.

7 Space-time Patterns in Earthquake Simulations
Method: correlation operator methods are used to compute the characteristic basis patterns, or eigenpatterns, of the earthquake activity. These eigenpatterns represent the characteristic modes of correlation and anticorrelation of earthquake activity. The corresponding eigenvalues, or eigenprobabilities, are a measure of the contribution of given eigenpatterns to the overall activity during the time period of interest. The eigenpatterns represent the normal modes of the earthquake activity time series (215 sites = 215 time series).
Figure legend: positively correlated: (red-red) and (blue-blue); negatively correlated: (red-blue); uncorrelated: (red-green) and (blue-green).
Reference: J.B. Rundle et al., Phys. Rev. E 61, 2000, and the AGU Monograph "GeoComplexity and the Physics of Earthquakes".
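As a minimal illustration of the correlation-operator method described above, the following Python sketch builds a site-by-site correlation matrix from activity time series and diagonalizes it; the eigenvectors play the role of eigenpatterns and the normalized eigenvalues the role of eigenprobabilities. The synthetic Poisson activity and array shapes are assumptions standing in for real seismicity data.

```python
# Eigenpattern (Karhunen-Loeve) analysis of synthetic activity time series.
import numpy as np

rng = np.random.default_rng(1)
n_sites, n_times = 215, 800              # 215 sites, each an activity time series
activity = rng.poisson(0.3, size=(n_sites, n_times)).astype(float)

# remove each site's mean, then form the site-site correlation operator
centered = activity - activity.mean(axis=1, keepdims=True)
std = centered.std(axis=1, keepdims=True)
std[std == 0] = 1.0                      # guard against silent sites
normalized = centered / std
corr = normalized @ normalized.T / n_times   # (n_sites x n_sites) correlation matrix

# eigenpatterns = eigenvectors; eigenprobabilities ~ normalized eigenvalues
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
eigenprob = eigvals / eigvals.sum()

print("top 3 eigenpatterns carry", eigenprob[:3].round(3), "of the activity")
```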

8 The "Pattern Informatics" Method is Somewhat Like Quantum Mechanics!
Earthquake activity over a period of time can be represented by a state vector ψ(x,t), which can be written as a sum over KL eigenfunctions. Differences in state vectors have been found to represent a probability measure for future activity. The method analyzes the shifting patterns of earthquakes through time.
How to generate an earthquake forecast (~2000 to 2010):
1. Spatially coarse grain (tile) the region with boxes 0.1° x 0.1° on a side (~3000 boxes, ~2000 with at least one earthquake from 1932 to 2000). This scale is approximately the size of an M ~ 6 earthquake, although the method seems to be sensitive down to a level of M ≥ 5.
2. φ1(x) = temporal average of activity from 1932 to 1990 for large earthquakes.
3. φ2(x) = temporal average of activity from 1932 to 2000 for large earthquakes.
4. Δφ(x) = φ2(x) - φ1(x) = change in average activity, 1990 to 2000, for large earthquakes.
5. ΔP(x) = {Δφ(x)}² - <{Δφ(x)}²> = increase in probability for a large earthquake; the symbol <> represents a spatial average.
6. Color code the result.
Figure: plot of log10 ΔP(x), the potential for large earthquakes, M ≥ 5, ~2000 to 2010.
From retrospective studies, we find that ΔP(x) measures not only the average change in activity of large events during 1990-2000 (triangles at right), but also indicates locations for future activity for the period ~2000 to 2010.
(J.B. Rundle, K.F. Tiampo, W. Klein, J.S.S. Martins, PNAS 99, Suppl. 1, 2514-2521, Feb 19, 2002; K.F. Tiampo, J.B. Rundle, S. McGinnis, S. Gross, and W. Klein, Europhys. Lett. 60, 481-487, 2002.)
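The six-step recipe above can be written down almost directly. The sketch below is a hedged Python illustration using a hypothetical synthetic catalogue (lon, lat, year arrays) in place of the Southern California data; the box size and time windows follow the slide, but the magnitude threshold for "large" events and the final color coding are omitted.

```python
# Pattern Informatics forecast recipe on a synthetic earthquake catalogue.
import numpy as np

rng = np.random.default_rng(2)
# hypothetical catalogue: event longitude, latitude, decimal year
lon = rng.uniform(-122.0, -114.0, 5000)
lat = rng.uniform(32.0, 37.0, 5000)
year = rng.uniform(1932.0, 2000.0, 5000)

# 1. coarse-grain the region into 0.1 x 0.1 degree boxes
lon_edges = np.arange(-122.0, -114.0 + 0.1, 0.1)
lat_edges = np.arange(32.0, 37.0 + 0.1, 0.1)

def mean_rate(t0, t1):
    """Temporal average of activity per box over [t0, t1)."""
    sel = (year >= t0) & (year < t1)
    counts, _, _ = np.histogram2d(lon[sel], lat[sel],
                                  bins=[lon_edges, lat_edges])
    return counts / (t1 - t0)

phi1 = mean_rate(1932.0, 1990.0)   # 2. average activity 1932-1990
phi2 = mean_rate(1932.0, 2000.0)   # 3. average activity 1932-2000
dphi = phi2 - phi1                 # 4. change in average activity
dP = dphi**2 - np.mean(dphi**2)    # 5. increase in probability, spatial mean removed

# 6. the map of dP (color coded, log10 where positive) would then be plotted
hotspots = np.argwhere(dP > np.percentile(dP, 99))
print(len(hotspots), "boxes flagged as candidate locations for future activity")
```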

9 Patterns in Nature: ENSO (El Nino Southern Oscillation) and the Pacific Decadal Oscillation
ENSO is the leading principal component of equatorial sea surface temperature variability. The Pacific Decadal Oscillation (PDO) Index is the leading principal component of North Pacific monthly sea surface temperature variability (poleward of 20N for the 1900-93 period). ENSOs are now being forecast using Karhunen-Loeve (KL) analysis, also called Empirical Orthogonal Function (EOF) or Principal Component Analysis (PCA).
Figure (right): typical wintertime sea surface temperature (colors), sea level pressure (contours), and surface wind stress (arrows) anomaly patterns of ENSO and PDO.
Differences between ENSO and PDO:
1. 20th century PDO "events" persist for 20 to 30 years, while typical ENSO events persist for 6 to 18 months.
2. The climatic fingerprints of the PDO are most visible in the North Pacific/North American sector, while secondary signatures exist in the tropics; the opposite is true for ENSO.
http://tao.atmos.washington.edu/pdo/

10 Pattern Informatics is Interesting
But it is "only" qualitative, as are many fields of science where the "real theory" is too complex.
– Note that Computational Fluid Dynamics for aircraft is, in contrast, quantitative.
– Earth science, strong interactions in particle physics, and most of biology don't have quantitative practical models.
Suggest we combine a new way of looking at things (NKS) and data-deluged science to make "Pattern Complexity".
– Data is a function of space and time and will give both dynamics and boundary conditions (the latter is the "old science" view of data).
We need Mathematica to become a Grid service, and to combine NKS with pattern informatics and data assimilation.

11 X-Complexity Infrastructure (concept diagram): the NKS approach and general complex systems simulations sit among e-Science, Grid computer science (Info Grid, workflow integration, Grid portals, databases, clusters, Grid visualization, integrated IDE, load balancing, algorithms, multi-scale parallel computing), modeling and experiments (sensors/satellites, field data), and application domains (GeoInformatics, geology, BioComplexity, complex fluids, the stock market, and other fields).

12 SERVOGrid Complexity Computing Environment (CCE) architecture diagram: users access a CCE control portal (aggregation); a middle tier with XML interfaces connects application services (Application Service-1, -2, -3) to back-end services: database service, sensor service, compute service, parallel simulation service, exploration service, XML metadata service, and a Complexity (NKS model) simulation service.

13 Approach
Build on e-Science methodology and Grid technology.
Geocomplexity and biocomplexity applications with multi-scale models, scalable parallelism, and data assimilation as the key issues.
– Data- and NKS-driven models.
Use existing code/database technology (SQL/Fortran/C++) linked to "Application Web/OGSA services".
– XML specification of models, computational steering, and scale are supported at the "Web Service" level, since we don't need "high performance" here.
– This allows use of Semantic Grid technology.
Diagram: an NKS Models Web Service linking to the user and to other Web Services (data sources); Application Web Services.

14 SERVOGrid (Complexity) Computing Model
Diagram: an HPC simulation is fed by distributed data filters that massage the data for the simulation; the data arrive from the Grid through OGSA-DAI Grid services, and other Grid and Web services provide analysis, control, and visualization.
This type of Grid integrates parallel computing with Grid data assimilation:
– multiple HPC facilities, but only one is used at a time;
– many simultaneous data sources and sinks.

15 Data Assimilation
Data assimilation implies one is solving some optimization problem, which might have a Kalman-filter-like structure.
As discussed by the DAO at the Earth Science meeting, one will become more and more dominated by the data (N_obs much larger than the number of simulation points).
The natural approach is to form, for each local (position, time) patch, the "important" data combinations so that the optimization doesn't waste time on large-error or insensitive data.
Data reduction is done in a naturally distributed fashion, NOT on the HPC machine, as distributed computing is most cost effective when the calculations are essentially independent.
– Filter functions must be transmitted from the HPC machine.
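For readers unfamiliar with the "Kalman-filter-like structure" mentioned above, here is a generic scalar Kalman filter update in Python: a forecast step propagates the state and its uncertainty with the model, and an analysis step blends the forecast with a noisy observation. This is a textbook sketch, not the DAO or SERVOGrid assimilation scheme, and all model and noise parameters are invented for illustration.

```python
# Scalar Kalman filter: forecast with a simple AR(1) model, correct with observations.
import numpy as np

rng = np.random.default_rng(3)
a, q, r = 0.95, 0.05, 0.4          # model dynamics, model noise, observation noise
truth, x, p = 1.0, 0.0, 1.0        # true state, state estimate, estimate variance

for step in range(50):
    truth = a * truth + rng.normal(0.0, np.sqrt(q))
    obs = truth + rng.normal(0.0, np.sqrt(r))    # noisy observation of the state

    # forecast step: propagate the estimate and its uncertainty with the model
    x, p = a * x, a * a * p + q
    # analysis step: blend forecast and observation using the Kalman gain
    k = p / (p + r)
    x, p = x + k * (obs - x), (1.0 - k) * p

print(f"final estimate {x:.3f} vs truth {truth:.3f}")
```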

16 Distributed Filtering
Diagram: geographically distributed sensor patches each feed a data filter on a distributed machine; the HPC machine sends the needed filter to each patch and receives the filtered data back. Each filter reduces N_obs(local patch) observations to N_filtered(local patch) values, with N_obs(local patch) >> N_filtered(local patch) ≈ Number_of_Unknowns(local patch).
In the simplest approach, the filtered data are obtained by linear transformations of the original data based on a Singular Value Decomposition of the least-squares matrix, factorizing the matrix into a product over local patches.
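A minimal Python sketch of the per-patch SVD reduction: given the least-squares design matrix for one local patch (assumed here to be supplied by the HPC machine), the thin SVD defines the linear combinations that compress N_obs observations down to about as many numbers as there are unknowns, without changing the least-squares solution. Matrix sizes and the synthetic data are illustrative assumptions.

```python
# SVD-based data reduction for one local patch of a distributed least-squares problem.
import numpy as np

rng = np.random.default_rng(4)
n_obs, n_unknowns = 5000, 20                 # many observations, few unknowns per patch

# A maps the patch's unknown parameters to predicted observations (sent by the HPC machine)
A = rng.normal(size=(n_obs, n_unknowns))
observations = rng.normal(size=n_obs)        # raw sensor data for this patch

# thin SVD of the design matrix; its left singular vectors define the
# "important" combinations of observations for the least-squares problem
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# filter applied where the data live: project N_obs values onto n_unknowns combinations
filtered = U.T @ observations                # length n_unknowns, shipped to the HPC machine

# the HPC machine can then recover the full least-squares solution from the reduced data
x_reduced = Vt.T @ (filtered / s)
x_full = np.linalg.lstsq(A, observations, rcond=None)[0]
print("max difference between reduced and full solutions:",
      np.abs(x_reduced - x_full).max())
```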

