Time and Depth Imaging Algorithms in a Hardware Accelerator Paradigm

G. Satta, A. Cristini, G. Siddi Moreau, G. Caddeo, D. Theis, C. Gallo, Z. Heilmann, E. Bonomi (CRS4)

Imaging and Numerical Geophysics at CRS4
CRS4: Center for Advanced Studies, Research and Development in Sardinia. Seismic imaging combined with high-performance computing is a competitive sector in which CRS4 has been involved since 1992. Activities: development and implementation of mathematical methods for ground characterization and subsurface reconstruction; wave propagation modeling; seismic depth migration; common-reflection-surface stacking; seismic attribute computation; inverse problems for environmental sciences.

Outline
Introduction; Motivation; Data-driven solutions; Migration scheme; Synthetic data examples; Conclusion

Motivation: HPC trends in seismic imaging. Highly accurate imaging techniques (for example RTM, reverse time migration) require ever more powerful computational resources. Advances in computing have led to an explosion in the amount of data being generated, whereas the performance of a single core is no longer increasing as it did in the past.

Motivation: Hardware solutions … Multicore systems; hybrid multicore systems; dedicated hardware (FPGAs).

Motivation: In spite of their spectacular speed, FPGAs and multicores obey a restrictive programming paradigm that makes the numerical solution of many algebraic problems less attractive. For example, the easiest current way to implement RTM is to use a finite-difference approximation and an explicit time-marching scheme. Drawbacks: the scheme is only conditionally stable, which limits the time-step size; numerical dispersion problems call for a smaller time step or high-order schemes.
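As a minimal illustration of the stability limit on the time step, the sketch below marches the 1D acoustic wave equation with a second-order explicit scheme. The function name `march_wave_1d`, the grid, the velocity and the pulse shape are assumptions chosen for illustration and do not come from the presentation. For this particular scheme the Courant number c*dt/dx must not exceed 1.

```python
import math

def march_wave_1d(c, dx, dt, nx=101, nsteps=200):
    """Explicit second-order time marching for u_tt = c^2 u_xx with fixed ends.

    Returns the largest absolute amplitude on the grid after nsteps steps.
    """
    r2 = (c * dt / dx) ** 2  # squared Courant number
    # Initial condition: narrow Gaussian pulse in the middle of the grid.
    prev = [math.exp(-((i - nx // 2) * 0.3) ** 2) for i in range(nx)]
    prev[0] = prev[-1] = 0.0
    curr = prev[:]  # pulse initially at rest (first-order start)
    for _ in range(nsteps):
        nxt = [0.0] * nx  # end points stay at zero
        for i in range(1, nx - 1):
            nxt[i] = (2.0 * curr[i] - prev[i]
                      + r2 * (curr[i + 1] - 2.0 * curr[i] + curr[i - 1]))
        prev, curr = curr, nxt
    return max(abs(v) for v in curr)

# Same grid and velocity, two time steps (hypothetical values).
stable_peak = march_wave_1d(c=2000.0, dx=10.0, dt=0.004)    # Courant = 0.8
unstable_peak = march_wave_1d(c=2000.0, dx=10.0, dt=0.006)  # Courant = 1.2
```

With the smaller time step the amplitude stays bounded; with the larger one the shortest wavelengths are amplified at every step and the solution blows up.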

Motivation The real challenge for the developer is the reduction of a mathematical model to a sequence of computational tasks that perfectly fit the paradigm supported by these extreme architectures.

Data-driven solutions
Algorithmic solution: the data-driven CRS technique is a prime example of a problem that fits this challenge. Its properties are: it is a concurrent optimization problem that maximizes the coherence between traveltime prototypes and recorded data; it does not need the numerical solution of algebraic equations; it does not need a medium velocity as input.

Data-driven solutions
The 3D ZO CRS problem. The CRS traveltime formula, approximated around the normal-incidence ray, depends on: 2 angular parameters (emergence angle and azimuth); 3 normal (N) wavefront parameters; 3 normal-incidence-point (NIP) wavefront parameters. There are eight attributes to find for each pixel of the zero-offset (ZO) section.

Semblance Coherence computation:
Each t_i is provided by the CRS traveltime formula. The eight parameters are determined by maximizing the semblance S. Since 0 ≤ S ≤ 1, it is usual practice to solve an equivalent problem by minimizing F = 1 − S.
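The semblance formula itself is not preserved in the transcript. The sketch below uses the standard textbook definition of semblance (energy of the stacked trace divided by the number of traces times the total energy); it is an assumption that the authors use this same normalization. The function names and the `samples` layout are hypothetical.

```python
def semblance(samples):
    """Standard semblance of a time window.

    samples[j][i] is the amplitude of trace i at window sample j, already
    read along the traveltime surface t_i (interpolation is not shown).
    """
    n_traces = len(samples[0])
    num = 0.0  # energy of the stack
    den = 0.0  # total energy of the individual traces
    for row in samples:
        stack = sum(row)
        num += stack * stack
        den += sum(a * a for a in row)
    if den == 0.0:
        return 0.0  # empty or dead window: no coherence
    return num / (n_traces * den)

def objective(samples):
    """Cost F = 1 - S, minimized instead of maximizing S."""
    return 1.0 - semblance(samples)
```

Identical traces give S = 1 and F = 0, while two traces of opposite polarity cancel in the stack and give S = 0.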

From serial to parallel programming
The serial CRS implementation takes advantage of: the overlap of spatial apertures; the slight difference between the attributes of neighboring pixels along the same trace.

From serial to parallel programming
To extract two levels of parallelism: reload the traces for every aperture; do not take into account the similarity of the attributes of neighboring pixels. Giving up these two serial optimizations makes every output pixel an independent task.
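Once pixels no longer share loaded traces or starting guesses, each attribute search can run on its own. The sketch below shows that structure with a thread pool; `search_pixel` is a hypothetical toy stand-in for the real attribute search, and a production code would more likely distribute pixels over processes or cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def search_pixel(pixel):
    """Toy stand-in for one independent attribute search.

    It uses only its own pixel: no traces and no starting guess are shared
    with neighboring pixels.
    """
    x, t0 = pixel
    candidates = [0.5, 1.0, 1.5, 2.0]
    target = 1.0 + 0.5 * (x % 2)  # made-up optimum for the toy cost
    best = min(candidates, key=lambda p: (p - target) ** 2)
    return pixel, best

def search_all(pixels, workers=4):
    """Run every pixel search concurrently and collect the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(search_pixel, pixels))

attributes = search_all([(0, 0.1), (1, 0.1), (2, 0.1)])
```

Because the tasks share nothing, the result does not depend on the number of workers or on the order in which pixels are processed.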

First level: spatial parallelism
[Figure labels: distributed memory, cache, data-pull paradigm]

CPU performance, running on an Intel Xeon 5450 at 3 GHz:
209 seconds per output sample; 116 seconds in line search.

Multi-core Performance

Second level of parallelism
[Figure labels: FPGA/GPGPU, ALU]

Time-axis parallelism with FPGA
Each kernel works on one trace at a time to calculate the semblance. A double buffer is used so that the calculation is not blocked. [Figure labels: memory of PCI card, kernels, manager]
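The double-buffering idea can be sketched on the host side as follows: a loader fills one buffer while the compute step consumes the other, so neither waits longer than necessary. This is a generic Python illustration with threads, not the FPGA implementation; `run_double_buffered`, `trace_blocks` and `process` are hypothetical names.

```python
import queue
import threading

def run_double_buffered(trace_blocks, process):
    """Overlap loading and computing with two buffers.

    trace_blocks: iterable of trace blocks (lists) to load.
    process: function applied to each loaded block (the "kernel").
    """
    buffers = [None, None]
    free = queue.Queue()   # indices of buffers that may be overwritten
    ready = queue.Queue()  # indices of buffers that hold fresh data
    free.put(0)
    free.put(1)

    def loader():
        for block in trace_blocks:
            slot = free.get()            # wait for a buffer not in use
            buffers[slot] = list(block)  # stand-in for the transfer to the card
            ready.put(slot)
        ready.put(None)                  # sentinel: no more data

    thread = threading.Thread(target=loader)
    thread.start()
    results = []
    while True:
        slot = ready.get()
        if slot is None:
            break
        results.append(process(buffers[slot]))
        free.put(slot)                   # hand the buffer back to the loader
    thread.join()
    return results

block_sums = run_double_buffered([[1, 2], [3, 4], [5]], sum)
```

The loader can run at most one block ahead of the compute step, which is the behavior a two-buffer scheme provides.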

Time-axis parallelism with GPGPU
Every arithmetic logic unit (ALU) calculates a semblance. Every ALU reads the new traces. [Figure labels: memory of PCI card, ALU]

Data Usage for multiple t0
[Figure panels: data use with 1 t0, 4 t0, 16 t0 and 64 t0]

Data-driven time imaging
Is it possible to design an accurate scheme for time imaging (stacking and migration) that: does not require knowledge of a velocity model; is driven solely by data; uses the existing optimized CRS infrastructure; fits the hardware accelerator perfectly? Work in progress: a new traveltime formula with fewer attributes. …very fast time imaging!

Time migration: flow chart
Flow chart steps: input prestack data; fix a point in image space; choose an initial set of parameters; map the point into the stack space; collect seismic prestack traces; calculate the coherency; step of data-driven optimization; is the optimum reached? (yes/no)
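The loop in the flow chart can be sketched as below. The presentation does not say which optimizer is used, so a simple coordinate pattern search is swapped in here as a stand-in. The `coherence` callable is hypothetical: it is assumed to hide the mapping into the stack space, the trace collection and the coherency calculation.

```python
def optimize_point(point, initial_params, coherence, step=0.5, tol=1e-3):
    """Maximize coherence(point, params) for one fixed image point.

    Coordinate pattern search: try +/- step on each parameter, keep any
    improvement, and halve the step when nothing improves.
    """
    params = list(initial_params)
    best = coherence(point, params)
    while step > tol:                 # "is the optimum reached?"
        improved = False
        for k in range(len(params)):  # one step of optimization
            for delta in (step, -step):
                trial = params[:]
                trial[k] += delta
                value = coherence(point, trial)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            step *= 0.5
    return params, best

def toy_coherence(point, p):
    """Made-up coherence with a single maximum at p = (2, -1)."""
    return 1.0 / (1.0 + (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2)

best_params, best_value = optimize_point((0, 0), [0.0, 0.0], toy_coherence)
```

The same loop would run independently for every image point, which is what makes the scheme a candidate for the accelerators discussed earlier.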

Same computing program but two outputs.
Outputs: migrated volume; post-stack volume.

Synthetic data examples
[Figure panels: dip angle, time, velocity]

Depth Migration algorithms
Question: what kind of algorithm has an intrinsic data-parallel structure and maintains good numerical properties? A possible candidate is PSPI (phase shift plus interpolation). It was developed for data-parallel architectures; the first version was written for the Connection Machine. After many years of development, PSPI is a standard tool but, of course, it is a one-way method. Slide callouts: 60x on FPGA; 20x on GPGPU. Is it possible to keep good numerical properties while preserving the data-parallel structure of PSPI in order to solve the Helmholtz equation?

Depth migration scheme
Migration algorithms. Our goal is to write a solution of the Helmholtz equation by factoring the two-way operator into two one-way operators. Both operators describe propagation along the vertical axis, with z as the marching variable. The factorization is exact only when the medium is uniform. For non-uniform media, an additive correction operator is necessary to iteratively compensate for the truncation error.
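The factorization can be written out as a short worked identity. The setting below (2D, constant density, frequency domain, with the symbols u, c, omega and the formal square-root operator Lambda) is an assumption for illustration; the presentation does not give its own notation or the form of its correction operator.

```latex
% Assumed setting: 2D Helmholtz equation, constant density
\frac{\partial^2 u}{\partial z^2} + \Lambda^2 u = 0,
\qquad
\Lambda^2 = \frac{\omega^2}{c^2(x,z)} + \frac{\partial^2}{\partial x^2}

% Product of the two one-way operators
\left(\frac{\partial}{\partial z} + i\Lambda\right)
\left(\frac{\partial}{\partial z} - i\Lambda\right) u
= \frac{\partial^2 u}{\partial z^2} + \Lambda^2 u
  - i\left[\frac{\partial}{\partial z},\,\Lambda\right] u
```

The commutator term vanishes when c does not vary with z, in particular in a uniform medium, so the product then reproduces the two-way operator exactly. Otherwise it is a remainder that has to be compensated, which is consistent with the need for a correction in non-uniform media.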

Wedge model

Vertical gradient model

Edge model

2.5D: Migration example Velocity model

2.5D: Migration example First iteration One-way Two-way

Conclusions
A perfect match between an algorithm and the corresponding hardware is possible at the price of rethinking the data structure and the algebraic problem. Two-way depth migration, data-driven stacking and time migration fit this description and can potentially be accelerated on dedicated SIMD hardware.
Thank you for your attention!
