Time and Depth Imaging Algorithms in a Hardware Accelerator Paradigm

Presentation transcript:

Time and Depth Imaging Algorithms in a Hardware Accelerator Paradigm. G. Satta, A. Cristini, G. Siddi Moreau, G. Caddeo, D. Theis, C. Gallo, Z. Heilmann, E. Bonomi (CRS4)

Imaging and Numerical Geophysics at CRS4 (Center for Advanced Studies, Research and Development in Sardinia). Seismic imaging combined with High Performance Computing is a competitive sector in which CRS4 has been involved since 1992. Activities: development and implementation of mathematical methods for ground characterization and subsurface reconstruction; wave propagation modeling; seismic depth migration; common-reflection-surface stacking; seismic attributes computation; inverse problems for environmental sciences.

Outline: Imaging Algorithms in a Hardware Accelerator Paradigm. Introduction; Motivation; Data-driven solutions; Migration scheme; Synthetic data examples; Conclusion.

Motivation: HPC trends in seismic. Highly accurate imaging techniques (for example RTM) require increasingly powerful computational resources. Advances in computing have led to an explosion in the amount of data being generated, whereas the performance of a single core is no longer increasing as it did in the past.

Motivation: Hardware solutions. Multicore systems; hybrid multicore systems; dedicated hardware (FPGAs); …

Motivation. In spite of their spectacular speed, FPGAs and multicores obey a restrictive programming paradigm that makes the numerical solution of many algebraic problems less attractive. For example, the easiest current way to implement RTM is to use a finite-difference approximation and an explicit time-marching scheme. Drawbacks: the scheme is only conditionally stable, which puts a limit on the time-step size and forces the time step to decrease; high-order schemes and numerical dispersion problems.
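The conditional stability mentioned on this slide can be seen on a toy problem. The sketch below is not the RTM scheme discussed in the talk: it assumes a 1D wave equation, a second-order explicit leapfrog scheme, fixed boundaries, and a hypothetical function name. It only illustrates that the solution stays bounded when the Courant number c*dt/dx is below 1 and blows up when it is above 1.

```python
import math


def leapfrog_max_amplitude(courant, n_points=101, n_steps=300):
    """March u_tt = c^2 u_xx with an explicit second-order leapfrog scheme.

    courant is c*dt/dx. Returns the largest |u| on the grid after n_steps.
    Toy 1D setup (assumption), not the scheme used by the authors.
    """
    c2 = courant * courant
    centre = n_points // 2
    # Gaussian pulse in the middle of the grid, zero at both fixed ends.
    u_prev = [math.exp(-0.05 * (i - centre) ** 2) for i in range(n_points)]
    u_prev[0] = u_prev[-1] = 0.0
    u_curr = list(u_prev)  # zero initial velocity (first-order start)
    for _ in range(n_steps):
        u_next = [0.0] * n_points
        for i in range(1, n_points - 1):
            laplacian = u_curr[i + 1] - 2.0 * u_curr[i] + u_curr[i - 1]
            u_next[i] = 2.0 * u_curr[i] - u_prev[i] + c2 * laplacian
        u_prev, u_curr = u_curr, u_next
    return max(abs(v) for v in u_curr)


if __name__ == "__main__":
    print("Courant 0.9:", leapfrog_max_amplitude(0.9))
    print("Courant 1.1:", leapfrog_max_amplitude(1.1))
```

Halving the grid spacing to reduce dispersion therefore also halves the largest admissible time step, which is the cost the slide points to.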

Motivation. The real challenge for the developer is the reduction of a mathematical model to a sequence of computational tasks that perfectly fit the paradigm supported by these extreme architectures.

Data-driven solutions: Algorithmic solution. The data-driven CRS technique is a perfect example of a problem that fits this challenge. Its properties are: it is a concurrent optimization problem that maximizes the coherence between traveltime prototypes and recorded data; it does not need the numerical solution of algebraic equations; it does not need the medium velocity as input.

Data-driven solutions: The 3D ZO CRS problem. The CRS traveltime formula is approximated around the normal-incidence ray and depends on: 2 angular parameters (emergence angle and azimuth); 3 normal (N) wavefront parameters; 3 NIP wavefront parameters. There are eight attributes to find for each pixel of the zero-offset section.
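The traveltime formula itself did not survive in this transcript. As a rough illustration of how eight attributes can enter such a formula, the sketch below assumes a generic second-order hyperbolic form: a linear term in the midpoint displacement (2 coefficients) plus two symmetric 2x2 quadratic forms, one in the midpoint displacement and one in the half-offset (3 coefficients each). The function name, argument layout, and this specific form are assumptions; the mapping from the coefficients to the angles and wavefront parameters named on the slide is not reproduced here.

```python
import math


def crs_traveltime(t0, dm, h, p, a, b):
    """Hyperbolic second-order traveltime (assumed generic form).

    t0 : zero-offset traveltime of the central ray
    dm : (dmx, dmy) midpoint displacement from the central point
    h  : (hx, hy) half-offset
    p  : (px, py) linear coefficients (2 attributes)
    a  : (a11, a12, a22) symmetric form acting on dm (3 attributes)
    b  : (b11, b12, b22) symmetric form acting on h (3 attributes)
    """
    linear = t0 + 2.0 * (p[0] * dm[0] + p[1] * dm[1])
    quad_m = a[0] * dm[0] ** 2 + 2.0 * a[1] * dm[0] * dm[1] + a[2] * dm[1] ** 2
    quad_h = b[0] * h[0] ** 2 + 2.0 * b[1] * h[0] * h[1] + b[2] * h[1] ** 2
    t_squared = linear * linear + 2.0 * t0 * (quad_m + quad_h)
    # A negative value means the approximation has no real traveltime there.
    return math.sqrt(t_squared) if t_squared >= 0.0 else float("nan")
```

At zero midpoint displacement and zero offset the expression reduces to t0, as a zero-offset approximation around the central ray should.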

Data-driven solutions: Semblance. Coherence computation: each t_i is provided by the CRS traveltime formula. The eight parameters are determined by maximizing the semblance S. Since 0 ≤ S ≤ 1, it is usual practice to solve the equivalent problem of minimizing F = 1 − S.
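The semblance formula on this slide was an image and is not in the transcript. A minimal sketch, assuming the standard definition (energy of the stacked trace divided by the number of traces times the total energy, summed over a short time window) and nearest-sample lookup at each traveltime t_i; the function names and the window handling are illustrative choices, not the authors' implementation.

```python
def semblance(traces, dt, traveltimes, half_window=2):
    """Semblance of traces sampled along per-trace traveltimes.

    traces      : list of traces, each a list of samples
    dt          : sampling interval in the same unit as traveltimes
    traveltimes : one traveltime t_i per trace
    half_window : samples taken on each side of t_i
    """
    n_traces = len(traces)
    numerator = 0.0
    denominator = 0.0
    for k in range(-half_window, half_window + 1):
        stacked = 0.0
        energy = 0.0
        for trace, t in zip(traces, traveltimes):
            idx = int(round(t / dt)) + k
            # Samples outside the recorded trace contribute zero.
            amplitude = trace[idx] if 0 <= idx < len(trace) else 0.0
            stacked += amplitude
            energy += amplitude * amplitude
        numerator += stacked * stacked
        denominator += energy
    return numerator / (n_traces * denominator) if denominator > 0.0 else 0.0


def objective(traces, dt, traveltimes, half_window=2):
    """F = 1 - S, the quantity to minimize."""
    return 1.0 - semblance(traces, dt, traveltimes, half_window)
```

Identical events aligned by the traveltimes give S = 1 (F = 0); events that cancel in the stack give S = 0 (F = 1).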

Data-driven solutions: From serial to parallel programming. The serial CRS implementation takes advantage of: the overlapping of spatial apertures; the slight difference of the attributes between nearby pixels along the same trace.

Data-driven solutions: From serial to parallel programming. To extract two levels of parallelism: reload the traces for every aperture; do not take into consideration the similarity of attributes corresponding to neighboring pixels.
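Giving up the two serial shortcuts makes every output sample an independent task. The sketch below only illustrates that independence: each task carries its own copy of the aperture traces and starts from scratch, so tasks can be mapped onto workers in any order. The per-sample function is a hypothetical placeholder (a plain average), and Python threads are used purely for illustration; they do not reproduce the distributed or accelerator setup of the talk.

```python
from concurrent.futures import ThreadPoolExecutor


def process_output_sample(task):
    """Hypothetical stand-in for the per-sample attribute search.

    task is (aperture_traces, sample_index); nothing is shared with
    neighbouring samples, so no ordering between tasks is required.
    """
    aperture_traces, sample_index = task
    values = [trace[sample_index] for trace in aperture_traces]
    return sum(values) / len(values)


def run_parallel(tasks, workers=4):
    """Map independent tasks onto a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_output_sample, tasks))


def run_serial(tasks):
    """Reference serial loop over the same independent tasks."""
    return [process_output_sample(task) for task in tasks]
```

Because no task reads another task's result, the parallel and serial runs return the same values.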

Data-driven solutions: First level: spatial parallelism. [Diagram labels: distributed memory, cache, data pull paradigm.]

Data-driven solutions: CPU performance. Running on an Intel Xeon 5450 at 3 GHz: 209 seconds per output sample; 116 seconds in line search.

Data-driven solutions: Multi-core performance. [Figure.]

Data-driven solutions: Second level of parallelism. [Diagram labels: FPGA/GPGPU, ALU.]

Data-driven solutions: Time-axis parallelism with FPGA. Each kernel works on one trace at a time to calculate the semblance. A double buffer keeps the calculation from blocking. [Diagram labels: memory of PCI card, kernels, manager.]

Data-driven solutions: Time-axis parallelism with GPGPU. Every arithmetic logic unit (ALU) calculates a semblance. Every ALU reads the new traces. [Diagram labels: memory of PCI card, ALU.]

Data-driven solutions: Data usage for multiple t0. [Figure panels: data use with 1 t0, 4 t0, 16 t0, and 64 t0.]

Migration scheme: Data-driven time imaging. Is it possible to design an accurate scheme for time imaging (stacking and migration) that: does not require knowledge of a velocity model; is driven solely by data; uses the existing optimized CRS infrastructure; perfectly fits the hardware accelerator? Work in progress: a new traveltime formula with fewer attributes. … very fast time imaging!

Migration scheme: Time migration flow chart. Input prestack data. Fix a point in image space. Choose an initial set of parameters. Map the point into the stack space. Collect seismic prestack traces. Calculate the coherency. Is the optimum reached? If no, take a step of data-driven optimization and repeat; if yes, stop.
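The loop in this flow chart can be sketched for a single image point as follows. The transcript does not say which optimizer is used, so a simple derivative-free compass search stands in for the "step of data-driven optimization"; the function name, the step-halving rule, and the stopping tolerance are assumptions. The `coherence` callable represents the chain "map the point, collect the traces, calculate the coherency" for one set of parameters.

```python
def data_driven_search(coherence, params, step=0.5, tol=1e-6, max_iter=10000):
    """Maximize coherence(params) by compass search (assumed optimizer).

    coherence : callable returning the coherency for a parameter list
    params    : initial set of parameters for the fixed image point
    Returns (best_params, best_coherence).
    """
    params = list(params)
    best = coherence(params)
    for _ in range(max_iter):
        improved = False
        for k in range(len(params)):
            for delta in (step, -step):
                trial = list(params)
                trial[k] += delta
                value = coherence(trial)
                if value > best:
                    params, best, improved = trial, value, True
        if not improved:
            # No neighbour is better: refine the step or declare the optimum.
            step *= 0.5
            if step < tol:
                break
    return params, best


if __name__ == "__main__":
    # Toy coherence with a single peak at (0.3, -1.2), for demonstration only.
    toy = lambda p: 1.0 / (1.0 + (p[0] - 0.3) ** 2 + (p[1] + 1.2) ** 2)
    print(data_driven_search(toy, [0.0, 0.0]))
```

In a real run the loop would be repeated for every point of the image space, each with its own initial parameters.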

Migration scheme: Same computing program but two outputs: the migrated volume and the post-stack volume.

Migration scheme: Synthetic data examples. [Figure panels: dip angle, time, velocity.]

Depth migration scheme: Depth migration algorithms. Question: what kind of algorithm has an intrinsic data-parallel structure and maintains good numerical properties? A possible candidate is PSPI. It was developed for data-parallel architectures; the first version was written for the Connection Machine. After many years of development, PSPI is a standard tool but, of course, it is a one-way method. Slide callouts: 60x on FPGA; 20x on GPGPU. Is it possible to keep good numerical properties while preserving the data-parallel structure of PSPI in order to solve the Helmholtz equation?
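The data-parallel structure of PSPI (phase shift plus interpolation) comes from the fact that, for a constant reference velocity, every frequency-wavenumber component is extrapolated in depth independently by a single complex multiplication. The sketch below shows that step for one component, together with a linear interpolation between two reference-velocity results. The function names, the sign convention of the phase, and the decay applied to evanescent components are assumptions; in PSPI the interpolation is applied to the wavefields after they are transformed back to the space domain, which is not shown here.

```python
import cmath
import math


def phase_shift_step(p, omega, kx, c, dz):
    """Extrapolate one (omega, kx) component by dz in a constant velocity c.

    Propagating components get a pure phase shift; evanescent ones
    (kx larger than omega/c) are damped. Sign convention is assumed.
    """
    kz_squared = (omega / c) ** 2 - kx ** 2
    if kz_squared >= 0.0:
        return p * cmath.exp(1j * math.sqrt(kz_squared) * dz)
    return p * math.exp(-math.sqrt(-kz_squared) * dz)


def interpolate(p_low, p_high, c_low, c_high, c_local):
    """Blend two reference-velocity results linearly in velocity."""
    weight = (c_local - c_low) / (c_high - c_low)
    return (1.0 - weight) * p_low + weight * p_high
```

Each call is independent of every other component and every other reference velocity, which is why the method maps naturally onto SIMD-style hardware.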

Depth migration scheme: Migration algorithms. Our goal is to write a solution of the Helmholtz equation by factoring the two-way operator into two one-way operators. Both operators describe propagation along the vertical axis, where z becomes the marching variable. The factorization is exact only when the medium is uniform. For non-uniform media, an additive correction operator is necessary to iteratively compensate for the truncation error.
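The factorization can be written out as follows. This is a textbook-style sketch, not the authors' formulation: the symbol Λ for the square-root operator is introduced here for illustration, and the commutator term only indicates where the exactness is lost when the operator depends on depth; the correction operator used in the talk is not given in the transcript.

```latex
% Helmholtz equation for the pressure field P at angular frequency \omega,
% with velocity c and transverse Laplacian \nabla_\perp^2.
\left( \partial_z^2 + \nabla_\perp^2 + \frac{\omega^2}{c^2} \right) P = 0,
\qquad
\Lambda^2 \equiv \nabla_\perp^2 + \frac{\omega^2}{c^2}.

% Uniform medium: \Lambda commutes with \partial_z and the factorization is exact.
\left( \partial_z + i\Lambda \right) \left( \partial_z - i\Lambda \right) P
  = \left( \partial_z^2 + \Lambda^2 \right) P = 0.

% Depth-dependent \Lambda: the product leaves a commutator residue.
\left( \partial_z + i\Lambda \right) \left( \partial_z - i\Lambda \right) P
  = \left( \partial_z^2 + \Lambda^2 \right) P - i \left[ \partial_z, \Lambda \right] P.
```

The two factors are the one-way operators for the two directions of propagation along z; the residue is the kind of term an additive correction has to account for.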

Depth migration scheme: Wedge model. [Figures.]

Depth migration scheme: Vertical gradient model. [Figures.]

Depth migration scheme: Edge model. [Figures.]

Depth migration scheme: 2.5D migration example. [Figure: velocity model.]

Depth migration scheme: 2.5D migration example. [Figure panels: one-way and two-way, first iteration.]

Conclusions. A perfect match between an algorithm and the corresponding hardware is possible at the price of rethinking the data structure and the algebraic problem. Two-way depth migration, data-driven stacking and time migration, which fit the above description, can potentially be accelerated on dedicated SIMD hardware. Thank you for your attention!