
1 SSCA #3: Sensor Processing, Knowledge Formation, and Data I/O. Serial v1.0. HPCS Productivity Benchmarks Working Group, MIT Lincoln Laboratory, January 4, 2007

2 Outline Scalable Synthetic Compact Applications SSCA #3 –Overview –Quick Recipe Data I/O Mode Implementation and Results

3 Scalable Synthetic Compact Applications: Goals
Identify which dimensions must be examined at full complexity and which can be examined at reduced scale, while providing understanding of both today's full applications and future applications.
[Figure: application size/complexity versus system size/complexity, locating Micro BMKs, HPCS Compact Apps, Full Apps, and NextGen Apps.]
Building on a motivation slide from Fred Johnson (15 January 2004).

4 HPCS Benchmark Spectrum SSCA #3

5 Outline The Vision SSCA #3 –Overview –Quick Recipe Data I/O Mode Implementation and Results

6 SSCA #3 Overview
SSCA #3 focuses on two stages:
– Front-end image processing and storage (Stage 1)
– Back-end image retrieval and knowledge formation (Stage 2)
It is representative of many areas:
– Medical imaging (e.g., tumor growth): image many patients daily, then later compare images of the same patient over time
– Astronomical image processing (e.g., monitoring supernovae): image many regions of the sky daily, then later compare images of a region over time
– Reconnaissance monitoring (e.g., enemy movement): image many areas daily, then later compare images of a given region over time

7 Overview
The benchmark stresses computation, communication, and data I/O.
It can be run in three modes:
– System Mode: a combination of Compute and Data I/O Modes
– Compute Mode (data I/O minimized)
– Data I/O Mode (computation minimized)
The principal performance goal is throughput:
– Maximize the rate at which answers are generated
– Operation of data I/O and compute kernels may overlap
– Data I/O and compute kernels may run on different systems
– Some data is required to be contiguous

8 SSCA #3 – System Mode
The community has traditionally focused on computation, but data I/O performance is increasingly important.
[Figure: System Mode data flow. The Scalable Data and Template Generator produces raw complex data, coefficients, and groups of templates. Stage 1 (Front-End Sensor Processing): Kernel #1 Data Read and Image Formation, Template Insertion, and Kernel #2 Image Storage into a grid of images. Stage 2 (Back-End Knowledge Formation): Kernel #3 Image Retrieval of image pairs with templates and indices, Kernel #4 Detection producing detections and detection sub-images, and Validation.]

9 SSCA #3 – Compute Mode
[Figure: Compute Mode block diagram. Sensor Processing: Scalable Data and Template Generator, Kernel #1 Image Formation, Template Insertion. Knowledge Formation: Kernel #4 Detection on a SAR image pair with templates, then Validation. Also shown are the file I/O components: Kernel #2 Image Storage and Kernel #3 Image Retrieval, with raw SAR, template, image, and sub-image detection files.]

10 SSCA #3: Compute Mode Challenges
[Figure: Compute Mode block diagram (Scalable Data and Template Generator, Kernel #1 Image Formation, Template Insertion, Kernel #4 Detection, Validation) spanning Front-End Sensor Processing and Back-End Knowledge Formation, annotated with the challenges below.]
– Scalable synthetic data generation
– Pulse compression, polar interpolation, FFT, IFFT (corner turn)
– Sequential store, non-sequential retrieve, large and small I/O
– Large image difference and threshold
– Many small correlations on selected pieces of a large image

11 SSCA #3 – Data I/O Mode
[Figure: Data I/O Mode data flow. The Scalable Data and Template Generator writes large complex data and groups of small data. Stage 1 (Front-End): Kernel #1 Data Read and Image Formation reads large data and a group of small data and produces an image, and Kernel #2 Image Storage writes it to the grid of images. Stage 2 (Back-End): Kernel #3 Image Retrieval reads an image pair and groups of small data, and Kernel #4 writes sub-images.]

12 Outline The Vision SSCA #3 –Overview –Quick Recipe Data I/O Mode Implementation and Results

13 Ingredients
To run Data I/O Mode, the user only needs to set: 1) SCALE, 2) N_SDG_GROUPS, and 3) the grid (GRID_SIDE_SIZE and AV_GRID_DEPTH).
Where:
– SCALE is a parameter that sets the size of the raw input data and of the image. It should be set so that these are a significant fraction of a single processor's memory.
– N_SDG_GROUPS is the number of groups of raw input data and templates. It should be set large enough to avoid disk cache effects.
– The number of images in the grid is GRID_SIDE_SIZE x GRID_SIDE_SIZE x AV_GRID_DEPTH.
[Figure: image grid with its side labeled GRID_SIDE_SIZE and its depth labeled AV_GRID_DEPTH.]

14 Ingredients
Parameters to Code:
– PICTURE_SIZE = GRID_SIDE_SIZE^2 is the number of images in a picture
– EST_TOT_GRID_SIZE = PICTURE_SIZE x AV_GRID_DEPTH is the total number of times that the input data will be retrieved, and the total number of images stored to the grid
– m_c x n is the size of the raw complex-valued input data, where m_c = 2 x ceil(80 x SCALE) and n = 2 x ceil( x SCALE + 60)
– ROTATION_STEP is the template rotation angle increment, in degrees
– nDistinctLetters x nDistinctRotations is the total number of pixelated templates, where nDistinctLetters is the number of least correlated letters in the alphabet (21) and nDistinctRotations is the number of ROTATION_STEP angles between 0 and 360 degrees
– FONT_SIZE x FONT_SIZE is the size of a single template, in pixels
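
As a rough illustration of how these sizing parameters follow from the user-set values, here is a minimal Python sketch. It is not taken from the MATLAB release. The function name and dictionary keys are hypothetical, and it assumes that the rotation angles are 0, ROTATION_STEP, 2 x ROTATION_STEP, and so on, strictly below 360 degrees. The value of n is left out because its SCALE coefficient is missing from this transcript.

```python
import math


def data_io_parameters(scale, grid_side_size, av_grid_depth,
                       rotation_step, n_distinct_letters=21):
    """Derive the sizing parameters on this slide from the user-set values."""
    picture_size = grid_side_size ** 2                # images in a picture
    est_tot_grid_size = picture_size * av_grid_depth  # images stored to the grid
    m_c = 2 * math.ceil(80 * scale)                   # first dimension of raw data
    # Assumption: angles are 0, step, 2*step, ... strictly below 360 degrees.
    n_distinct_rotations = math.ceil(360 / rotation_step)
    return {
        "PICTURE_SIZE": picture_size,
        "EST_TOT_GRID_SIZE": est_tot_grid_size,
        "m_c": m_c,
        "nDistinctRotations": n_distinct_rotations,
        "nTemplates": n_distinct_letters * n_distinct_rotations,
    }


if __name__ == "__main__":
    # Illustrative values only; none of them come from the slides.
    print(data_io_parameters(scale=1, grid_side_size=4,
                             av_grid_depth=5, rotation_step=30))
```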

15 Ingredients
Parameters to Code (Cont.):
– m x n_x is the size of an image, where:
  m = 2 x ceil(m_c / )
  k1n = x (1.5 - 1/n)
  k_xmin = sqrt( x (m/m_c)^2)
  k_xmax = sqrt((4 x k1n^2) x (1/m_c)^2)
  n_x = 2 x ceil(20 x SCALE x (k_xmax - k_xmin)/pi) + 20
– nSubImages = floor(pOccupancy x p2ndNot1st x (m/(SARLOBE_DISTANCE x FONT_SIZE)) x (n_x/(SARLOBE_DISTANCE x FONT_SIZE))) is the number of smaller images to be stored (by the last kernel), where pOccupancy = 0.5 is the probability of template occupancy, and p2ndNot1st = 0.5 is the probability that a template appears in the second image but not in the first
– Total memory required, in bytes = N_SDG_GROUPS x (8 x m_c x n + 4 x nDistinctLetters x nDistinctRotations x FONT_SIZE^2) + EST_TOT_GRID_SIZE x (4 x m x n_x + 4 x nSubImages x (4 x FONT_SIZE)^2) + (coefficients, support and verification parameters; stored once)
– Total memory grows with SCALE^2
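
The nSubImages and total-memory formulas can be evaluated with a short sketch like the one below. The function and argument names are hypothetical. The image dimensions are passed in directly because several of the constants that define them are missing from this transcript, and the fixed term (coefficients, support and verification parameters) is left out. The factors 8 and 4 correspond to single-precision complex (2 x 4 bytes) and single-precision real (4 bytes) values.

```python
import math


def n_sub_images(m, n_x, font_size, sarlobe_distance,
                 p_occupancy=0.5, p_2nd_not_1st=0.5):
    """Number of sub-images stored by the last kernel, per the slide's formula."""
    cell = sarlobe_distance * font_size
    return math.floor(p_occupancy * p_2nd_not_1st * (m / cell) * (n_x / cell))


def total_storage_bytes(n_sdg_groups, m_c, n, n_templates, font_size,
                        est_tot_grid_size, m, n_x, n_sub):
    """Total bytes, excluding the parameters that are stored only once."""
    # One SDG group: large complex matrix plus all small real templates.
    per_group = 8 * m_c * n + 4 * n_templates * font_size ** 2
    # One grid image plus the sub-images derived from it.
    per_image = 4 * m * n_x + 4 * n_sub * (4 * font_size) ** 2
    return n_sdg_groups * per_group + est_tot_grid_size * per_image


if __name__ == "__main__":
    # Illustrative values only; none of them come from the slides.
    subs = n_sub_images(m=400, n_x=400, font_size=10, sarlobe_distance=2)
    print(subs, total_storage_bytes(2, 160, 100, 252, 10, 80, 400, 400, subs))
```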

16 Directions
SDG
Create a group:
– Create a random single-precision complex-valued (large) m_c x n matrix
– Store the data
– Create a random real-valued (small) FONT_SIZE x FONT_SIZE matrix
– Store the small matrix nDistinctLetters x nDistinctRotations times
Copy the above group N_SDG_GROUPS times
STAGE 1
for iImage = 1 to EST_TOT_GRID_SIZE
  KERNEL 1
  – Randomly pick and retrieve one of the N_SDG_GROUPS groups
  – Create a random single-precision real-valued m x n_x matrix
  KERNEL 2
  – Randomly select i and j values in the range [1, GRID_SIDE_SIZE] and use these to create a filename
  – Store the image matrix
end
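
The SDG and Stage 1 directions above might be sketched in Python roughly as follows. This is a sketch using only the standard library, not the released MATLAB code. The file and directory names, the interleaved real/imaginary layout for complex data, and the use of a per-point depth counter in the image filename are all assumptions made for illustration.

```python
import array
import os
import random
import tempfile


def random_floats(count, rng):
    """Return `count` random values stored as 32-bit floats."""
    return array.array("f", (rng.random() for _ in range(count)))


def write_floats(path, values):
    with open(path, "wb") as f:
        values.tofile(f)


def read_floats(path):
    values = array.array("f")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    return values


def run_sdg(root, n_sdg_groups, m_c, n, font_size, n_templates, rng):
    """Create one group of random data and copy it N_SDG_GROUPS times."""
    large = random_floats(2 * m_c * n, rng)        # complex: real/imag interleaved
    small = random_floats(font_size * font_size, rng)
    for g in range(n_sdg_groups):
        group_dir = os.path.join(root, f"group_{g:04d}")   # hypothetical layout
        os.makedirs(group_dir)
        write_floats(os.path.join(group_dir, "large.bin"), large)
        for t in range(n_templates):
            write_floats(os.path.join(group_dir, f"small_{t:04d}.bin"), small)


def run_stage1(root, n_sdg_groups, est_tot_grid_size, grid_side_size, m, n_x, rng):
    """Return a map from grid point (i, j) to the number of images stored there."""
    depth = {}
    for _ in range(est_tot_grid_size):
        # Kernel 1: retrieve a randomly chosen group, then make a random image.
        group_dir = os.path.join(root, f"group_{rng.randrange(n_sdg_groups):04d}")
        for name in sorted(os.listdir(group_dir)):
            read_floats(os.path.join(group_dir, name))
        image = random_floats(m * n_x, rng)
        # Kernel 2: pick a random grid point, build a filename, store the image.
        i = rng.randint(1, grid_side_size)
        j = rng.randint(1, grid_side_size)
        depth[(i, j)] = depth.get((i, j), 0) + 1
        write_floats(os.path.join(root, f"image_{i}_{j}_{depth[(i, j)]}.bin"), image)
    return depth


if __name__ == "__main__":
    rng = random.Random(0)
    with tempfile.TemporaryDirectory() as root:
        run_sdg(root, n_sdg_groups=2, m_c=4, n=3, font_size=2, n_templates=3, rng=rng)
        print(run_stage1(root, 2, est_tot_grid_size=8, grid_side_size=2,
                         m=4, n_x=5, rng=rng))
```

A timed version would wrap the reads in Kernel 1 and the write in Kernel 2 with a clock, since those are the timed I/O operations in Data I/O Mode.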

17 Directions
STAGE 2
for iImageSeq = 1 to PICTURE_SIZE
  – Randomly select i and j values in the range [1, GRID_SIDE_SIZE]
  – Find the grid depth at this particular point
  for k = 1 to gridPointDepth-2
    KERNEL 3
    – Retrieve a pair of images, and an SDG group of templates
    KERNEL 4
    for l = 1 to nSubImages
      – Create a random (4 x FONT_SIZE) x (4 x FONT_SIZE) matrix
      – Store the sub-image
    end
  end
end
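
The Stage 2 loop structure could look roughly like this in Python. It is a sketch, not the released code. The filenames are hypothetical, the retrieved pair is assumed to be the images at depths k and k+1, a single templates.bin file stands in for an SDG group of templates, and make_demo_grid exists only so that the example has a grid to read.

```python
import array
import os
import random
import tempfile


def write_random_floats(path, count, rng):
    """Write `count` random 32-bit floats to `path`."""
    data = array.array("f", (rng.random() for _ in range(count)))
    with open(path, "wb") as f:
        data.tofile(f)


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def image_path(root, i, j, k):
    # Hypothetical naming scheme: grid point (i, j) and depth index k.
    return os.path.join(root, f"image_{i}_{j}_{k}.bin")


def run_stage2(root, depth, grid_side_size, n_sub_images, font_size, rng):
    """Run the Stage 2 loops and return the number of sub-images stored."""
    stored = 0
    for seq in range(1, grid_side_size ** 2 + 1):      # 1 .. PICTURE_SIZE
        i = rng.randint(1, grid_side_size)
        j = rng.randint(1, grid_side_size)
        grid_point_depth = depth.get((i, j), 0)
        for k in range(1, grid_point_depth - 1):       # 1 .. gridPointDepth-2
            # Kernel 3: retrieve an image pair and a group of templates.
            read_file(image_path(root, i, j, k))
            read_file(image_path(root, i, j, k + 1))
            read_file(os.path.join(root, "templates.bin"))
            # Kernel 4: create and store nSubImages random sub-images.
            for l in range(1, n_sub_images + 1):
                name = f"sub_{seq}_{k}_{l}.bin"
                write_random_floats(os.path.join(root, name),
                                    (4 * font_size) ** 2, rng)
                stored += 1
    return stored


def make_demo_grid(root, grid_side_size, depth_per_point, pixels, rng):
    """Populate a small grid of uniform depth for the demonstration."""
    write_random_floats(os.path.join(root, "templates.bin"), 16, rng)
    depth = {}
    for i in range(1, grid_side_size + 1):
        for j in range(1, grid_side_size + 1):
            for k in range(1, depth_per_point + 1):
                write_random_floats(image_path(root, i, j, k), pixels, rng)
            depth[(i, j)] = depth_per_point
    return depth


if __name__ == "__main__":
    rng = random.Random(1)
    with tempfile.TemporaryDirectory() as root:
        grid_depth = make_demo_grid(root, 2, 4, 6, rng)
        print(run_stage2(root, grid_depth, 2, n_sub_images=2, font_size=1, rng=rng))
```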

18 Outline The Vision SSCA #3 –Overview –Quick Recipe Data I/O Mode Implementation and Results

19 SSCA #3 Serial Release v1.0
Types of data I/O implemented:
– FWRITE: binary, IEEE floating point with appropriate big- or little-endian byte ordering and 32-bit data type
– HDF5: HDF5 32-bit float format
Modes:
– System Mode: includes both Compute (SAR processing) and Data I/O Modes
– Compute Mode: dials the smallest possible grid of 2 images, thus minimizing data I/O
– Data I/O Mode: generates random data, thus forgoing SAR processing
Outputs metrics at each level in the system's hierarchy (Kernels, Stages, and overall SSCA #3):
– Bytes, seconds, bandwidth (bytes/sec)
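
To illustrate the FWRITE-style output and the reported metrics, here is a small Python sketch that writes 32-bit floats with a chosen byte order and returns bytes, seconds, and bandwidth. The function name and return format are hypothetical, and the release itself uses MATLAB's own I/O routines rather than anything shown here.

```python
import array
import os
import sys
import tempfile
import time


def timed_write_float32(path, values, byte_order="little"):
    """Write values as raw 32-bit floats and report bytes, seconds, bandwidth."""
    if byte_order not in ("little", "big"):
        raise ValueError("byte_order must be 'little' or 'big'")
    data = array.array("f", values)
    if byte_order != sys.byteorder:
        data.byteswap()                   # convert from native to requested order
    start = time.perf_counter()
    with open(path, "wb") as f:
        data.tofile(f)
        f.flush()
        os.fsync(f.fileno())              # include the flush to the device in the timing
    seconds = time.perf_counter() - start
    n_bytes = os.path.getsize(path)
    bandwidth = n_bytes / seconds if seconds > 0 else float("inf")
    return {"bytes": n_bytes, "seconds": seconds, "bandwidth": bandwidth}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(timed_write_float32(os.path.join(root, "image.bin"),
                                  [0.5] * 100000, byte_order="big"))
```

Summing the bytes and seconds of individual calls gives the per-kernel and per-stage totals, from which the higher-level bandwidth figures follow.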

20 SSCA #3 Serial Release v1.0
One of many possible implementations
Over 2200 lines of well-commented MATLAB code
Carefully picked functional breakdown, data structures, variable names, and comments
Coding standard: modified "Programming in C++, Rules and Recommendations" by Mats Henricson and Erik Nyquist of Ellemtel Telecommunication System Laboratories
Development tools used:
– MATLAB Version (R14) Service Pack 3 (version required)
– Octave Version
– Pentium® GHz CPU with 1.00 GB of RAM and 2.5 GB of virtual RAM, running MS Windows XP Professional Version 2002 Service Pack 1
– A dedicated dual-processor hyperthreaded P4 Xeon, 2.8 GHz, ½ MB cache, GNU/Linux (Red Hat 9)
Accompanying documentation:
– Written Specification, and these slides
– MANIFEST.txt: list of files with brief description
– README.txt: installation and run time instructions; code overview
– RELEASE_NOTES.txt: known outstanding issues in current release

21 SSCA #3 Release v1.0a

22 Summary
Challenges:
– Large-scale parallel two-dimensional (2D) Inverse Fast Fourier Transform (IFFT); may require a corner turn or a gather/scatter (depending on architecture), with large quantities of data. Polar interpolation is known to be even more computationally intense than the IFFT (Kernel 1).
– Streaming image data storage to a data I/O device (write) may involve large block data transfers, storing one large image after another (Kernel 2).
– Random-location image sequence retrieval from a data I/O device (read), also involving large quantities of data, with possibly stressful spatial or temporal memory access patterns, and locality issues (Kernel 3).
– Small data I/O in all four kernels. Large data I/O in three of the four kernels.
– Many small convolutions on random pieces of a large image (Kernel 4).
Status:
– Written and MATLAB Executable Specification v1.0 released June 22, 2006
– Architecture of Data I/O Mode: Martha Bancroft of Shomo Tech Systems, and Jeremy Kepner
– Works with Octave
– Written Specification, SAR Editor: Glenn Schrader, MIT Lincoln Laboratory
– C version based on release v1.0a (unofficial): Meng-Ju of UMD, and Janice Onanian McMahon of USC/ISI

23 SSCA #3 Backup Slides

24 SSCA #3
Specification Intent
Overview
Compute Mode Main Components
– Synthetic Scalable Data Generator
– Kernel 1 SAR Image Formation
– Template Insertion
– Kernel 4 Detection
– Validation
Data I/O Mode Main Components
– Kernel 1 Large & Small Data Retrieval
– Image Grid
– Kernel 2 Image Storage
– Kernel 3 Image Retrieval
– Kernel 4 Small Image Storage

25 The Vision: Scalable Synthetic Compact Applications
Bridge the gap between scalable synthetic kernel benchmarks and (non-scalable) real applications, and become an important benchmarking tool
Representative of real application workloads, while not being numerically rigorous:
– Memory access characteristics
– Communications characteristics
– I/O characteristics
Multi-processor compact applications, designed to be easily scalable and verifiable
No limits on distribution to vendors and universities
SSCAs represent a wide spectrum of potential HPCS Mission Partner applications

26 Executable Specification
What is an Executable Specification:
– It implements the Written Specification, illustrating all specified properties; it is just one of many possible implementations
– It provides developers further insight into the corresponding Written Specification
– It is a tool with which developers can validate their own work
– It includes a serial version, and may include one or more approaches to a parallel version
– It must be easily readable and intelligible, through its choice of functional structure, variable names, comments, and supporting documentation
Structure:
– Scalable Data Generator: creates synthetic data that can be scaled to stress any computer, from a single workstation to a petascale multiprocessor
– Kernels: timed computational algorithms
– Verification: checks the correctness of select results
– Validation: validates the resulting solution

28 SSCA #3 – Compute Only Mode
[Figure: Compute Only Mode block diagram. Sensor Processing: Scalable Data and Template Generator, Kernel #1 Image Formation, Template Insertion. Knowledge Formation: Kernel #4 Detection on a SAR image pair with templates, then Validation. Also shown are the file I/O components: Kernel #2 Image Storage and Kernel #3 Image Retrieval, with raw SAR, template, image, and sub-image detection files.]

29 Spotlight SAR

30 Compute Mode – SAR Overview
Radar captures echo returns from a swath on the ground.
Notional linear FM chirp pulse train, plus two ideally non-overlapping echoes returned from different positions on the swath.
Summation and scaling of echo returns realizes a challengingly long antenna aperture along the flight path.
[Figure: spotlight SAR geometry fixed to broadside, with range X = 2X_0, cross-range Y = 2Y_0, and synthetic aperture L. The received raw SAR signal is shown as a sum of delayed transmitted SAR waveforms, each scaled by a reflection coefficient that differs for each return from the swath.]

31 Scalable Synthetic Data Generator
Generates synthetic raw SAR complex data
Data size is scalable to enable rigorous testing of high performance computing systems
– User-defined scale factor determines the size of the images generated
Generates templates that consist of rotated and pixelated capital letters
[Figure: spotlight SAR returns, range versus cross-range.]

32 Kernel 1 SAR Image Formation
[Figure: spotlight SAR reconstruction. The received samples s(t,u), which fit a polar swath, go through a Fourier transform, matched filtering, spatial frequency domain interpolation with k_x = sqrt(4k^2 - k_u^2) and k_y = k_u, and an inverse Fourier transform to give the image f(x,y). The processed samples fit a rectangular swath. Image axes: range (pixels) versus cross-range (pixels).]
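
The interpolation step maps each measured sample at wavenumber k and slow-time frequency k_u to a point (k_x, k_y) in the spatial frequency domain. A minimal Python sketch of just that coordinate mapping follows. The function name is hypothetical, and the resampling onto a rectangular grid, which is the expensive part of polar interpolation, is not shown.

```python
import math


def polar_to_rect_frequency(k, k_u):
    """Map (k, k_u) to (k_x, k_y) using k_x = sqrt(4k^2 - k_u^2), k_y = k_u."""
    radicand = 4.0 * k * k - k_u * k_u
    if radicand < 0:
        # Outside the region where the mapping gives a real k_x.
        raise ValueError("requires |k_u| <= 2|k|")
    return math.sqrt(radicand), k_u


if __name__ == "__main__":
    # Samples evenly spaced in k_u land on unevenly spaced k_x values,
    # which is why a separate interpolation onto a rectangular grid is needed.
    for k_u in (0.0, 2.0, 4.0, 6.0):
        print(k_u, polar_to_rect_frequency(5.0, k_u))
```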

33 Template Insertion (not timed)
Inserts rotated, pixelated capital-letter templates into each SAR image
– Non-overlapping locations and rotations
– Randomly selects 50%
– Used as ideal detection targets in Kernel 4
[Figure: a hypothetical 100% insertion of templates, and an image with only a random 50% of the templates inserted; axes are X pixels and Y pixels.]

34 Kernel 4 Detection
Detects targets in SAR images:
1. Image difference
2. Threshold
3. Sub-regions
4. Correlate with every template; the maximum is the target ID
Computationally difficult
– Many small correlations over random pieces of a large image
Requires 100% recognition and no false alarms, including objects that cross distributed memory boundaries
[Figure: Image A, Image B, image difference, thresholded, sub-region, correlated.]
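
The four detection steps can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of the Written Specification. It assumes that templates are square and sit on a regular grid of non-overlapping tiles, that a tile is a candidate sub-region when any differenced pixel exceeds the threshold, and that a normalized dot product serves as the correlation. All names are hypothetical.

```python
import math


def correlation(a, b):
    """Normalized dot product of two equally sized 2D lists."""
    dot = sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    norm = math.sqrt(sum(x * x for row in a for x in row) *
                     sum(y * y for row in b for y in row))
    return dot / norm if norm else 0.0


def detect(image_a, image_b, templates, threshold):
    """Return (row, col, template_id) for each tile that changed between images."""
    size = len(templates[0])
    rows, cols = len(image_a), len(image_a[0])
    # Step 1: image difference.
    diff = [[image_b[r][c] - image_a[r][c] for c in range(cols)]
            for r in range(rows)]
    detections = []
    for r0 in range(0, rows - size + 1, size):
        for c0 in range(0, cols - size + 1, size):
            # Step 3: cut out a sub-region (assumed tile-aligned here).
            tile = [row[c0:c0 + size] for row in diff[r0:r0 + size]]
            # Step 2: keep only tiles with a pixel above the threshold.
            if max(max(row) for row in tile) <= threshold:
                continue
            # Step 4: correlate with every template; the maximum is the ID.
            scores = [correlation(tile, t) for t in templates]
            best = max(range(len(templates)), key=scores.__getitem__)
            detections.append((r0, c0, best))
    return detections


if __name__ == "__main__":
    t0 = [[1, 0], [0, 1]]
    t1 = [[0, 1], [1, 0]]
    first = [[0] * 4 for _ in range(4)]
    second = [[0, 0, 0, 1],
              [0, 0, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 0]]
    print(detect(first, second, [t0, t1], threshold=0.5))
```

Even in this toy form, the work is dominated by many small correlations over scattered pieces of the image, which is the access pattern the slide identifies as difficult.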

35 Computational Challenges
[Figure: Compute Mode block diagram (Scalable Data and Template Generator, Kernel #1 Image Formation, Template Insertion, Kernel #4 Detection, Validation) spanning Front-End Sensor Processing and Back-End Knowledge Formation, annotated with the challenges below.]
– Scalable synthetic data generation
– Pulse compression, polar interpolation, FFT, IFFT (corner turn)
– Sequential store, non-sequential retrieve, large and small I/O
– Large image difference and threshold
– Many small correlations on selected pieces of a large image

37 SSCA #3 – Data I/O Mode
[Figure: Data I/O Mode data flow. The Scalable Data and Template Generator writes large complex data and groups of small data. Stage 1 (Front-End): Kernel #1 Data Read and Image Formation reads large data and a group of small data and produces an image, and Kernel #2 Image Storage writes it to the grid of images. Stage 2 (Back-End): Kernel #3 Image Retrieval reads an image pair and groups of small data, and Kernel #4 writes sub-images.]

38 Scalable Synthetic Data Generator
Generates large complex data, and groups of small data.
Writes a dialed number of large complex data sets to external memory.
For each large data set, it writes a group of small data to external memory.
Single precision
Not timed
[Figure: the Scalable Data Generator produces large complex data and associated groups of small data, which feed Kernel #1.]

39 Kernel 1 Data Retrieval
At each Stage 1 pass, reads one randomly chosen large complex data set from external memory.
Also reads the associated group of small data from external memory, at each Stage 1 pass.
Generates a single-precision random image (of the size dialed by SCALE).
I/O is timed
[Figure: Stage 1 (Front-End). Kernel #1 Data Read takes large complex data and the associated groups of small data, and outputs an image.]

40 Image Grid
The external-memory image grid is accessed by Kernels 2 and 3.
It is scalable by image size and by number of images.
Each image requires a non-trivial amount of memory.
Intended for dealing with an enormous quantity of data, with simultaneous reads and writes.
[Figure: image grid, shown scaled to 80 images, with its side labeled GRID_SIDE_SIZE and its depth labeled AV_GRID_DEPTH.]

41 Kernel 2 Image Storage
Writes a different image to a random location on the grid in external memory, at each Stage 1 pass.
Images may be stored together, or in separate pieces (to allow simultaneous reading and writing of the same image).
Computes filenames and addresses, and writes streaming data to random locations on the grid at each Stage 1 front-end processing pass.
I/O is timed
[Figure: Stage 1 (Front-End). Kernel #2 Image Storage takes an image and writes it into the images in the grid.]

42 Kernel 3 Image Retrieval
At each Stage 2 pass, it picks a random location in the grid, computes the address of the image sequence there, and reads the sequence one pair of images at a time until it reaches its full depth.
An image sequence is read through the entire depth of the grid at that location.
Also reads a group of small data at each Stage 2 pass.
I/O is timed
[Figure: Stage 2 (Back-End). Kernel #3 Image Retrieval reads images from the image grid, outputs an image pair, and reads a group of small data (templates).]

43 Kernels 2 and 3
Additional notes:
– If an optimal scheme is picked for data storage, it may not be optimal for data retrieval, and vice versa.
– Read behind Write is allowed.
[Figure: Kernel 2 image output and Kernel 3 image pair input.]

44 Kernel 4 Small Image
Writes labeled sub-images.
This is repeated for each image pair, at each grid point, at each Stage 2 pass.
I/O is timed
[Figure: Stage 2 (Back-End). Kernel #4 Small Image Output takes an image pair and writes sub-images.]

