SLAAC SLD Update Steve Crago USC/ISI September 14, 1999 DARPA.

Presentation transcript:

SLAAC SLD Update Steve Crago USC/ISI September 14, 1999 DARPA

Steve Crago USC INFORMATION SCIENCES INSTITUTE Page 2
Second-Level Detection
[Figure: ATR processing chain, with a SAR image of a T72 and the annotated SAR image produced by second-level detection. Diagram labels: ESAR Image, Focus of Attention, Detection Indexer (SLD), Identification, LPMMSECRM, Belief Management (Fusion Executive), MPM, PGA, Joint STARS Advance Workstation (JAWS), ATR Results Display. *Not Joint STARS imagery]
Goal: 200x performance improvement over old custom hardware

Page 3: Interface
- Input
  - Chips (regions of interest, 8-bit pixels)
  - Bright and Surround Templates (expected SAR reflection and absorption, 1-bit pixels)
- Output
  - Hypothetical target matches

Page 4
[Figure: SLD search space, showing a template positioned within a chip. Labels: Chip, Template, Search Space. Dimension labels: 64 pixels, 48 pixels, 32 pixels, 15 pixels. *Not Joint STARS imagery]

Page 5: Computation
- P1 Shape Sum: $SM(i,j) = \sum_{u,v} B(u,v)\, M(i+u, j+v)$ (8-bit additions)
- P2 Threshold: $TH(i,j) = SM(i,j)/BC - Bias$
- P3 Bright Sum: $BS(i,j) = \sum_{u,v} B(u,v)\, [M(i+u, j+v) > TH(i,j)]$ (8-bit comparisons, 1-bit additions)
- P4 Surround Sum: $SS(i,j) = \sum_{u,v} S(u,v)\, [M(i+u, j+v) < TH(i,j)]$ (8-bit comparisons, 1-bit additions)
- P5 Hit Quality: Q(i, j) = [BS(i, j) + SS(i, j)]*
Steps:
- Calculate the average intensity of chip pixels at positions expected to reflect signal
- For each position in the search space:
  - Count the number of pixels that exceed the average intensity under "on" bright template pixels
  - Count the number of pixels that are less than the average intensity under "on" surround template pixels
- Check hit conditions, calculate hit quality, and return the 2 highest hit quality scores
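The P1 to P4 stages on this slide can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the FPGA implementation: the function names, the row/column index order, and the reading of BC as the number of "on" bright-template pixels are assumptions. The bright-sum test counts pixels that exceed the threshold, as the slide text describes. For P5, the sketch only picks the two best scores from caller-supplied quality values, because the hit-quality expression is incomplete in the transcript.

```python
import heapq


def sld_sums(chip, bright, surround, bias):
    """Return {(i, j): (SM, TH, BS, SS)} for every search position.

    chip     -- 2D list of 8-bit pixel values M
    bright   -- 2D list of 0/1 bright-template bits B (at least one bit set)
    surround -- 2D list of 0/1 surround-template bits S (same shape as bright)
    bias     -- threshold bias
    """
    rows, cols = len(bright), len(bright[0])
    offsets = [(u, v) for u in range(rows) for v in range(cols)]
    # Assumption: BC is the count of "on" bright-template pixels,
    # so SM / BC is the average intensity under the bright template.
    bc = sum(sum(row) for row in bright)
    sums = {}
    for i in range(len(chip) - rows + 1):
        for j in range(len(chip[0]) - cols + 1):
            # P1: shape sum over "on" bright-template positions
            sm = sum(bright[u][v] * chip[i + u][j + v] for u, v in offsets)
            # P2: threshold
            th = sm / bc - bias
            # P3: bright pixels that exceed the threshold
            bs = sum(1 for u, v in offsets
                     if bright[u][v] and chip[i + u][j + v] > th)
            # P4: surround pixels that fall below the threshold
            ss = sum(1 for u, v in offsets
                     if surround[u][v] and chip[i + u][j + v] < th)
            sums[(i, j)] = (sm, th, bs, ss)
    return sums


def two_best(quality):
    """Return the two (position, score) pairs with the highest score."""
    return heapq.nlargest(2, quality.items(), key=lambda item: item[1])
```

Every search position is computed independently of the others, which is the property a parallel implementation can exploit.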

Page 6: ACS Implementation
- Compute independent search space pixels in parallel ( computational elements per FPGA)
[Figure: block diagram of the implementation. Labels: Host, Template Memory, Adaptive Computing Element, Chip Pixels, Packed 8-bit pixels, Packed bits, Highest Quality Hits (Chip, Template IDs, location)]

Page 7: I/O Requirements
- Each eight-bit chip pixel is used for 550 operations per match task
- Each FIFO element contains 8 chip pixels
- Each FIFO element contains enough operands for 3600 operations
=> I/O will not be a bottleneck any time soon!
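As a rough illustration of "each FIFO element contains 8 chip pixels", the sketch below packs eight 8-bit pixels into a single 64-bit word and unpacks them again. The 64-bit word width, the byte order (pixel 0 in the low byte), and the function names are assumptions for illustration; the slides do not give the FIFO layout.

```python
def pack_pixels(pixels):
    """Pack exactly eight 8-bit pixels into one integer word.

    Assumption: pixel 0 occupies the least significant byte.
    """
    if len(pixels) != 8 or any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("expected eight values in the range 0..255")
    return int.from_bytes(bytes(pixels), "little")


def unpack_pixels(word):
    """Recover the eight 8-bit pixels from a packed 64-bit word."""
    return list(word.to_bytes(8, "little"))
```

One transfer then delivers all eight pixels at once, so the number of FIFO reads is one eighth of the pixel count.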

Page 8: Memory Requirements
- Template pixels are only 1 bit (each memory access provides 18 operands)
- Computation uses one template bit per cycle
- Pixels are broadcast to all compute elements that are working on a single match task
- Multiple ports for parallel match tasks reduce logic complexity
=> Memory bandwidth will not be a bottleneck any time soon, but ports are helpful!
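The first two points on this slide can be made concrete with a small sketch: 1-bit template pixels are packed 18 to a memory word, and the compute side consumes one bit per cycle, so a new word is needed only once every 18 cycles. The packing order (first bit in the least significant position) and the function names are assumptions for illustration.

```python
BITS_PER_ACCESS = 18  # from the slide: each memory access provides 18 operands


def pack_template(bits):
    """Pack a flat list of 0/1 template bits into 18-bit words."""
    words = []
    for start in range(0, len(bits), BITS_PER_ACCESS):
        word = 0
        for k, bit in enumerate(bits[start:start + BITS_PER_ACCESS]):
            word |= (bit & 1) << k  # assumed order: first bit is the LSB
        words.append(word)
    return words


def stream_bits(words, count):
    """Yield `count` template bits, one per simulated cycle."""
    for cycle in range(count):
        # A different word is selected only once every 18 cycles.
        word = words[cycle // BITS_PER_ACCESS]
        yield (word >> (cycle % BITS_PER_ACCESS)) & 1
```

For a template of N bits this needs ceil(N / 18) memory accesses, which is why a single port leaves bandwidth to spare.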

Page 9: Virtex Features
- Chip pixel alignment pipeline
  - BlockSelectRAM+ can replace logic cells
  - Could buy some speed
- Template-specific reconfiguration
  - Potential speedup due to sparseness of templates
  - Not yet clear that this is a win
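A software analogy may help show why template sparseness offers a potential speedup. If the "on" positions of a template are extracted once, the per-position work scales with the number of set bits rather than with the full template area. The sketch below is only an analogy with hypothetical function names; it does not describe how template-specific reconfiguration would be done on the FPGA.

```python
def on_offsets(template):
    """List the (u, v) offsets of the "on" bits in a 0/1 template."""
    return [(u, v)
            for u, row in enumerate(template)
            for v, bit in enumerate(row) if bit]


def shape_sum_sparse(chip, offsets, i, j):
    """Sum chip pixels at search position (i, j) over the "on" offsets only."""
    return sum(chip[i + u][j + v] for u, v in offsets)
```

The offset list plays the role of a template-specific circuit: it is built once per template and reused for every search position.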

Page 10: Performance Projections

Page 11: Schedule
- Single-chip implementation working
- Full Wildforce implementation: 9/99
- SLAAC-2 implementation: 9/99?
- Virtex implementation: ???
  - Remap to use additional logic should be easy
  - Utilization of BlockSelectRAM+ will take a little more time, but straightforward