CHREC F3: Target Tracking Rafael Garcia 11/26/08.


F3 Goals, Motivations, & Challenges
- Goals
  - Develop applications & design strategies for scalable architectures from case-study
  - Analyze & examine available multi-FPGA platforms and tools for scalable system design
- Motivations
  - Meet performance requirements in HPC/HPEC scenarios by mapping across multiple FPGAs
  - Exploit multi-FPGA platforms to develop larger, complex designs and algorithms
  - Increase understanding of performance prediction, power, and usability for scalable apps
- Challenges
  - Perform multilevel algorithm partitioning, analysis, and optimization for multi-FPGA systems
  - Determine influence of application characteristics on selection of platforms, tools and languages
[Diagram labels: F3 Insights; Formulation; Translation; Design; Execution]

Kalman Filter Overview
- Traditional Kalman filters estimate the state of a dynamic system in a noisy environment
- Commonly used in target prediction and can be extended to multiple dimensions, targets, and models
- Excellent target tracker when an accurate model is known
  - Useful even if an accurate model is not known

Current Architecture
- 4 tightly coupled FPGAs mapped to 4 quadrants
  - System is driven by two global clocks
  - 100 MHz inter-FPGA communication links
  - 50 MHz data-processing clock
    - 2-step processing cycle returns results at 25 MSa/s
  - Inter-FPGA communication occurs when target crosses a quadrant boundary
    - Current state of target is passed along
- Non-pipelined design
  - 2-step cycle where one step depends on the previous one and the other step depends on pseudo-sensor data from host CPU
    - Low frequency and lack of pipeline registers are expected to lower power consumption
    - 2-cycle design simplifies communication network
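
The sample-rate arithmetic and the quadrant hand-off described above can be sketched in software. This is only an illustration: the field size, the quadrant numbering, and the names `sample_rate_hz`, `quadrant_of`, and `handoffs` are assumptions and do not come from the slides.

```python
# Illustrative sketch only: field size, quadrant numbering, and function
# names are assumptions; the slides do not specify them.

DATA_CLOCK_HZ = 50_000_000   # 50 MHz data-processing clock (from the slide)
STEPS_PER_SAMPLE = 2         # 2-step processing cycle (from the slide)

def sample_rate_hz(clock_hz=DATA_CLOCK_HZ, steps=STEPS_PER_SAMPLE):
    """One result is produced every `steps` clock cycles."""
    return clock_hz // steps

def quadrant_of(x, y, width, height):
    """Map a position to one of 4 quadrants (0..3), one per FPGA (assumed numbering)."""
    col = 0 if x < width / 2 else 1
    row = 0 if y < height / 2 else 1
    return 2 * row + col

def handoffs(track, width, height):
    """Yield (sample_index, from_quadrant, to_quadrant) whenever the target
    crosses a quadrant boundary, i.e. when its state would be passed along."""
    prev = None
    for i, (x, y) in enumerate(track):
        q = quadrant_of(x, y, width, height)
        if prev is not None and q != prev:
            yield i, prev, q
        prev = q

# Hypothetical track across a 100 x 100 field.
track = [(10, 10), (40, 10), (60, 10), (60, 60)]
print(sample_rate_hz())                 # 25000000
print(list(handoffs(track, 100, 100)))  # [(2, 0, 1), (3, 1, 3)]
```

Two 50 MHz clock cycles per result gives the 25 MSa/s figure quoted on the slide; the hand-off events mark the only points where inter-FPGA traffic would be needed.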

Current Architecture
- Continuously receiving pseudo-sensor data and returning condensed information
- Limited to a single target per quadrant
- Set sensor sampling rate of 25 MSa/s

Resource                      M4K RAMs   DSPs   ALUTs
Stratix II: EP2S180F1020C3    1%         15%    2%

Simplified Algorithm
- Assumes steady-state operation
  - Target must closely follow given movement model for accurate results
- Allows for precomputed covariance and Kalman-gain terms
- Model tracks four parameters
  - Horizontal position
  - Vertical position
  - Horizontal velocity
  - Vertical velocity
- Algorithm Changes
  - Remove the hardcoded terms, increasing prediction accuracy during non-steady-state situations
  - Modify model to include Z-axis parameters for airborne targets

New Module Types
Sensor          Target     Precision   Resource   Kernel
Low Power       Slow       Fixed       Low        Kalman Filter
Fast Sampling   Fast       Fixed       Low        Kalman Filter
Multi-Scale     Airborne   Floating    High       MKS
High-Noise      Noisy      Floating    Medium     Kalman Filter
Selective       Multiple   Floating    High       Feature Selection

[Figure: RCML Representation]
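
A minimal sketch of what a steady-state tracker for the four-parameter model might look like, assuming a constant-velocity movement model with fixed gains. The values of `DT`, `ALPHA`, and `BETA` and the function name `step` are invented for illustration; the actual hardcoded terms used in the design are not given in the slides.

```python
# Sketch of a steady-state (fixed-gain) tracker for the four-parameter model.
# DT, ALPHA and BETA are made-up values; a real design would precompute its
# gain terms offline from the noise covariances.

DT = 1.0            # sample period (assumed)
ALPHA = 0.5         # precomputed position gain (assumed)
BETA = 0.2          # precomputed velocity gain (assumed)
KV = BETA / DT      # folded into a constant so the loop never divides

def step(state, measurement):
    """One predict/update cycle using only multiplies and adds.

    state = (x, y, vx, vy); measurement = (zx, zy) is a noisy position.
    """
    x, y, vx, vy = state
    zx, zy = measurement
    # Predict: constant-velocity movement model.
    xp = x + vx * DT
    yp = y + vy * DT
    # Update: correct the prediction with the fixed gains.
    rx = zx - xp
    ry = zy - yp
    return (xp + ALPHA * rx, yp + ALPHA * ry, vx + KV * rx, vy + KV * ry)

state = (0.0, 0.0, 0.0, 0.0)
# Hypothetical target moving at (+1, +2) per sample, measured without noise.
for k in range(1, 51):
    state = step(state, (1.0 * k, 2.0 * k))
print([round(v, 3) for v in state])
```

Because the gains are constants, the per-sample work reduces to a handful of multiply-accumulate operations. The trade-off is the one the slide names: the estimate is only as good as the match between the target and the assumed movement model.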

VA migration

Kalman Filter
- Estimates state of a dynamic system in a noisy environment
  - In this case, the ‘dynamic system’ is a moving target
- Commonly used in target prediction and can be extended to multiple dimensions, targets, and models
- Assumes sensor noise is white Gaussian noise
- Requires a pre-programmed model describing the target’s motion
- Works in a continuous 2-cycle loop
- Developed in 1960 by Rudolf E. Kalman (A UF professor from !)
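
The 2-cycle loop can be illustrated with a scalar Kalman filter. This is a textbook one-dimensional form, not the design from the slides; the model, the variances `Q` and `R`, and the name `kalman_1d` are assumptions chosen for the example.

```python
import random

# Scalar model (assumed for illustration): the true value is nearly constant,
# x[k] = x[k-1] + w,  z[k] = x[k] + v,  with w ~ N(0, Q) and v ~ N(0, R).
Q = 0.01   # process-noise variance (assumed)
R = 1.0    # sensor-noise variance (assumed)

def kalman_1d(measurements, x0=0.0, p0=1.0):
    """Run the predict/update loop; return (estimates, gains)."""
    x, p = x0, p0
    estimates, gains = [], []
    for z in measurements:
        # Step 1, predict: depends only on the previous cycle's result.
        p = p + Q
        # Step 2, update: depends on the new sensor sample z.
        k = p / (p + R)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
        gains.append(k)
    return estimates, gains

rng = random.Random(0)
truth = 5.0
# White Gaussian sensor noise with variance R around a fixed true value.
zs = [truth + rng.gauss(0.0, R ** 0.5) for _ in range(200)]
estimates, gains = kalman_1d(zs)
print(f"first gain {gains[0]:.3f}, last gain {gains[-1]:.3f}")
```

In this sketch the gain sequence does not depend on the measurements and settles to a constant, which is why a steady-state design can precompute the gain and drop the division from the per-sample loop.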

- Kalman Filter can be viewed as a simple black box
  - An input stream of samples measuring a target’s position is contaminated with noisy samples
  - The output is a stream of samples with most of the noisy samples filtered
[Diagram labels: Kalman System Models; Accurate Samples; Noisy Samples; Mostly Accurate Samples; Kalman Filter; -9.8 m/s; NE wind at 23 mph; Follows Road]

Reasons for sensor noise
- Battery Power
  - variable battery voltage
    - voltage regulators cost money, draw power, and are not perfect
- Sensors
  - low quality sensors
    - cost-cutting for mass production sometimes requires cheap sensors
  - incorrectly deployed sensors
    - bad orientation, obstructed sensor
- Environment
  - environmental conditions
    - rain, dust, night-time tracking, snow
- Multiple targets
  - misinterpreted samples from neighboring targets during multiple-target tracking
    - Sensor processing stage must ensure proper target isolation
- Wireless signal
  - bad data from neighboring sensors due to a weak wireless signal

Kalman Filter example

PR Virtual Architecture with Kalman Filters
- Sensor records samples
- Image processing step extracts specific features
  - Target size, vertical position, horizontal position, target bearing, elevation, etc.
- Kalman filters remove sensor noise
- Results are sent to a central location to be displayed
[Diagram labels: Module interface; Kalman filter (x5); Switch 1 to Switch 5; Sensor Interface; Display Interface; Communication architecture; VLX25]

FPGA and PR benefits for the Kalman Filter
- FPGA amenable features
  - Low memory requirements
    - Simple filter with streaming inputs and outputs
  - Can be implemented using only logic and MAC units
  - Requires only multiplication and addition
    - No complex time-consuming operations such as division, square-root, differentiation, etc.
  - Low bandwidth requirements
    - Filter receives/produces a stream of coordinates, not a stream of images
- PR amenable features
  - Optimum resource usage
    - The right filter type for the right job
  - Swapping modules does not halt execution
    - Active filters are never disturbed

Experimental FPGA Power Measurements

Experimental Setup
- GiDEL Host Specifications
  - Dual Xeon 3.00 GHz processors (Pentium 4 era)
  - 2 GB RAM
  - Single 500 GB hard drive
  - CD Drive
  - 600 W max power supply
  - (Kappa clone)
- ProcStar II Power Characteristics
  - Main board supply rated at 7.6 A at 3.3 V
    - 7.6 A × 3.3 V = 25.08 W maximum power available to:
      - Stratix II EP2S180 FPGA (4x)
      - 2 GB SODIMM DDR memory (2x) (only 1 used for tests)
      - 64 MB SRAM memory (8x)
      - Miscellaneous oscillators, peripherals, controllers, etc.
    - This means roughly 5 W max available to each FPGA
- Test Design Characteristics
  - Kalman tracking filters
    - Heavy multiplier usage, no block RAMs, minimal logic usage (w/ dedicated multipliers)
  - In all cases, design runs at 33 MHz
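
The board power budget can be checked with a few lines of arithmetic. The variable names are illustrative, and the even four-way split is shown only as an upper bound for comparison with the slide's "roughly 5 W" figure.

```python
SUPPLY_CURRENT_A = 7.6   # main board supply rating (from the slide)
SUPPLY_VOLTAGE_V = 3.3
NUM_FPGAS = 4

# Total power the 3.3 V supply can deliver to the whole board.
board_budget_w = SUPPLY_CURRENT_A * SUPPLY_VOLTAGE_V
# Upper bound per FPGA if memories and peripherals drew nothing at all.
even_split_w = board_budget_w / NUM_FPGAS

print(f"board budget: {board_budget_w:.2f} W")
print(f"even split:   {even_split_w:.2f} W per FPGA")
```

An even split comes to about 6.27 W per FPGA, so the slide's figure of roughly 5 W each leaves about 5 W of the budget for the memories, oscillators, and other board components.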

Methodology
- GiDEL host system measured without FPGA board
  - P3 Kill-A-Watt AC power meter used for measurements
    - 0.2% documented accuracy
  - Accurate to within 1 Watt
  - 7 different test cases with varying power utilization
- GiDEL host system measured with FPGA board
  - Same 7 test cases were used (without loading an FPGA design)
    - This provides minimum power-use baseline for ProcStar II
  - GiDEL board is loaded with a computationally intensive FPGA design
    - CPU is kept idle
    - Power consumption under regular design is measured (at 33 MHz)
      - 2% logic use (per FPGA)
      - 15% multiplier use (per FPGA)
      - 1 filter instance per FPGA
    - Power consumption under maximum-multiplier-use design is measured (at 33 MHz)
      - 4% logic use
      - 88% multiplier use
      - 7 filter instances per FPGA
    - Power consumption under maximum-logic-use design is measured (at 33 MHz)
      - 77% logic use
      - 0% multiplier use
      - 34 filter instances per FPGA

Results: Baseline ProcStar II

Test Cases                                  Without ProcStar II   With ProcStar II
1. Server off (not standby)                 8 W (single value given)
2. Idle                                     127 W                 137 W
3. Idle with CDROM spinning                 131 W                 141 W
4. Full HDD load (defrag)                   132 W                 143 W
5. Full CPU load (1 thread)                 188 W                 198 W
6. Full CPU load (4 threads)                255 W                 257 W
7. Full CPU/HDD load (3 threads, defrag)    258 W                 264 W

- Threads are simple while(1) loops
- Although only 2 cores are present, 4 threads were used to bypass Hyper-threading and OS scheduling
  - HDD load is an exception since defrag requires its own thread to be effective
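
The overhead of the unloaded board in each test case is the difference between the two columns. The short sketch below computes it for test cases 2 to 7, using only the values reported above; the variable names are illustrative.

```python
# (test case, watts without ProcStar II, watts with ProcStar II),
# copied from the baseline results for test cases 2 to 7.
baseline = [
    ("Idle", 127, 137),
    ("Idle with CDROM spinning", 131, 141),
    ("Full HDD load (defrag)", 132, 143),
    ("Full CPU load (1 thread)", 188, 198),
    ("Full CPU load (4 threads)", 255, 257),
    ("Full CPU/HDD load (3 threads, defrag)", 258, 264),
]

# Extra wall power drawn when the board is installed but no design is loaded.
overhead = {name: with_board - without for name, without, with_board in baseline}

for name, watts in overhead.items():
    print(f"{name:40s} {watts:3d} W")
```

The difference is 10 to 11 W in the four lighter cases and 2 W and 6 W in the two heaviest ones. Each reading is only accurate to within about 1 W, so each difference carries an uncertainty of up to about 2 W.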

Results: Kalman Filters on ProcStar II
- Power estimates
  - 12.5% toggle rate at 33 MHz
  - Experimental numbers below assume FPGAs consume all power (i.e., ProcStar II memories, glue logic, etc. consume 0 W)
- Design 1
  - 140 W total power
    - ~3.25 W per FPGA
  - 15% mult., 2% logic
  - 1 filter instance, high Fmax
- Design 2
  - 140 W total power
    - ~3.25 W per FPGA
  - 88% mult., 4% logic
  - 7 filter instances, high Fmax
- Design 3
  - 152 W total power
    - ~6.25 W per FPGA
  - 0% mult., 77% logic
  - 34 filter instances, low Fmax

Results: Kalman Filter in ProcStar II
*Measured power is derived by subtracting baseline power consumption on ProcStar II board from measured power consumption and dividing by 4
- Power consumed by board components is not accounted for, so actual FPGA power consumption is lower
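
The per-FPGA figures can be reproduced with the subtract-and-divide rule. The slides do not state which baseline reading was subtracted; the sketch below assumes the idle host without the board (127 W), because that value reproduces the quoted ~3.25 W and ~6.25 W. The variable names are illustrative.

```python
# Assumed baseline: idle host WITHOUT the ProcStar II board (127 W).
# This choice is an inference; it is the reading that reproduces the
# per-FPGA numbers quoted in the results.
BASELINE_W = 127
NUM_FPGAS = 4

# Total wall power measured with each design loaded (from the results).
total_power_w = {"Design 1": 140, "Design 2": 140, "Design 3": 152}

# Attribute everything above the baseline to the FPGAs, split evenly.
per_fpga_w = {name: (total - BASELINE_W) / NUM_FPGAS
              for name, total in total_power_w.items()}

for name, watts in per_fpga_w.items():
    print(f"{name}: {watts:.2f} W per FPGA")
```

Since the memories, glue logic, and other board components are folded into these numbers, they are upper bounds on what each FPGA actually draws.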

Questions?