Design Exploration of a Human-machine Interface (HMI) Application Francis Li Sam Madden.

Presentation transcript:

Design Exploration of a Human-machine Interface (HMI) Application Francis Li Sam Madden

The Application
Data glove interface
  – Wired, bulky
SmartDust scenario
  – A mote on each fingertip
Investigate implementations
Explore design alternatives

Proof-of-Concept Prototype
By SmartDust group
  – Atmel AVR Microprocessor
  – RFM TR1000 Radio
  – 6 accelerometers
  – Host PC performs processing
Analysis
  – Power: 45 mW measured
  – Continuous operation of processor, accelerometers, communication with host

Application Analysis
Processing (on PC)
  – Do 20 times per second, for each accelerometer:
      Read in X and Y samples (10 bits each)
      Compute rolling average to smooth input data
      Convert averages to polar coordinates
  – Dominates cost: sqrt, acos, atan
  – Secondary cost: floating point operations
  – Periodically, calculate gesture via simple template matching (static hand positions)
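
The per-sample processing listed above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the window length, the exact polar formula, the squared-error metric, and all names (`AxisSmoother`, `to_polar`, `match_gesture`) are assumptions, since the slides name the steps but not their details.

```python
import math
from collections import deque

WINDOW = 4  # assumed smoothing window; the slides do not give a length


class AxisSmoother:
    """Rolling average over the most recent raw samples of one axis."""

    def __init__(self, window=WINDOW):
        self.samples = deque(maxlen=window)

    def update(self, raw):
        self.samples.append(raw)
        return sum(self.samples) / len(self.samples)


def to_polar(x, y):
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    return math.hypot(x, y), math.atan2(y, x)


def match_gesture(pose, templates):
    """Return the name of the template with the smallest squared error.

    pose: list of (magnitude, angle) pairs, one per accelerometer.
    templates: dict mapping gesture name -> list shaped like pose.
    Angle wrap-around is ignored to keep the sketch short.
    """
    def error(template):
        return sum((r - tr) ** 2 + (a - ta) ** 2
                   for (r, a), (tr, ta) in zip(pose, template))

    return min(templates, key=lambda name: error(templates[name]))
```

In the prototype this loop would run 20 times per second for each of the 6 accelerometers, with the template match run only periodically.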

Application Analysis (cont)
Communication (from Atmel to PC)
  – 20 samples/sec × 6 accelerometers × 4 bytes/sample → 480 bytes/sec
  – 115.6 kb/sec RF link
  – Radio = 3V, when transmitting → 1.2 mW for radio alone
Real-world power >> 1.2 mW, due to software and analog overhead (real-world analysis later)
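
As a quick check of the data-rate figure, the arithmetic can be written out as below. The constants are the values quoted on the slide; the function names are made up for illustration, and the link-utilization figure is a derived quantity that does not appear in the slides.

```python
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115_600  # "115.6 kb/sec" as quoted on the slide


def payload_bytes_per_sec():
    """Raw sensor payload sent to the host each second."""
    return SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE


def link_utilization():
    """Fraction of the RF link's bit rate occupied by the payload."""
    return payload_bytes_per_sec() * 8 / LINK_BITS_PER_SEC


if __name__ == "__main__":
    print(payload_bytes_per_sec(), "bytes/sec")
    print(f"{link_utilization():.1%} of the link")
```

The payload uses only a few percent of the link, which is consistent with the radio-alone figure being small compared with the measured total.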

Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP
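
To give a feel for the floating point → fixed point step, here is a minimal sketch that computes a vector magnitude using only integer operations. The Q8 format (8 fractional bits) and the function names are assumptions chosen for illustration; the slides do not say which fixed-point representation was used.

```python
import math

FRAC_BITS = 8            # assumed Q8 format: 8 fractional bits
ONE = 1 << FRAC_BITS     # fixed-point representation of 1.0


def fixed_magnitude(x, y):
    """Magnitude of integer samples (x, y), returned in Q8 fixed point.

    Shifting the squared sum left by 2 * FRAC_BITS before taking the
    integer square root yields floor(sqrt(x*x + y*y) * 2**FRAC_BITS).
    """
    return math.isqrt((x * x + y * y) << (2 * FRAC_BITS))


def to_float(value):
    """Convert a Q8 fixed-point value back to float, for inspection only."""
    return value / ONE
```

For example, `fixed_magnitude(3, 4)` returns 1280, which is 5.0 in Q8. On a processor without a floating-point unit, the same idea avoids the library calls that the analysis identifies as a major cost.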

Communication vs. Computation
Estimates of local processing cost on Atmel (via simulation of GCC program)
Loop: 6 × 20 / sec
  Average: 2223 instr. × 2
  CalcPolar: … instr.
  → 2.83 × 10^6 instructions
Report gesture once per second
  FindGestureError: 5444 instr. × 10 gestures × 6 accelerometers → 3.26 × 10^5 instr.
Memory operations are 2 cycles/instruction
Total cycles ~ 3.7M → 4 MHz → 13.5 mW
Communication = 8 bits/sec → negligible cost
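
The instruction budget above can be re-added as a sanity check. The sketch below uses only numbers that survive in the transcript; the 2.83 × 10^6 loop total is taken as given because the CalcPolar count is missing, and the average cycles-per-instruction figure is a derived value, not one stated on the slide.

```python
LOOP_INSTR_PER_SEC = 2.83e6        # per-second loop total quoted on the slide
FIND_GESTURE_ERROR_INSTR = 5444
GESTURES = 10
ACCELEROMETERS = 6
TOTAL_CYCLES_PER_SEC = 3.7e6       # "~3.7M" quoted on the slide


def gesture_instr_per_sec():
    """Instructions spent on template matching, run once per second."""
    return FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS


def total_instr_per_sec():
    return LOOP_INSTR_PER_SEC + gesture_instr_per_sec()


def average_cycles_per_instr():
    """Implied average, given that memory operations take 2 cycles."""
    return TOTAL_CYCLES_PER_SEC / total_instr_per_sec()
```

The gesture step comes to 326,640 instructions, matching the slide's 3.26 × 10^5, and the implied average of roughly 1.17 cycles per instruction fits comfortably under a 4 MHz clock.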

Communication vs. Computation 2
Cost of communication to Host PC (measured)
  4317 nJ/bit, from Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
  → 4317 nJ/bit × 480 bytes/sec × 8 = … mW
Processor still sucks power
  – Current implementation requires 13.5 mW
  – Using sleep, only 1.17 mW
→ … mW total
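
The radio cost formula on this slide lost its result in the transcript. The sketch below simply carries out the multiplication using the quoted inputs; the resulting value is a calculation from those inputs and may not match the figure originally shown on the slide. The function name is hypothetical.

```python
ENERGY_NJ_PER_BIT = 4317       # measured cost quoted on the slide
PAYLOAD_BYTES_PER_SEC = 480    # raw sensor stream from the earlier slide


def radio_power_mw(nj_per_bit, bits_per_sec):
    """Average radio power in mW for a steady bit stream.

    nJ/bit * bit/s gives nJ/s (nW); dividing by 1e6 converts to mW.
    """
    return nj_per_bit * bits_per_sec / 1e6


if __name__ == "__main__":
    power = radio_power_mw(ENERGY_NJ_PER_BIT, PAYLOAD_BYTES_PER_SEC * 8)
    print(f"{power:.2f} mW")
```

With these inputs the stream costs about 16.6 mW, which is larger than the 13.5 mW the processor needs for the whole computation and illustrates why moving computation next to the sensors is attractive.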

Distributed vs. Centralized
Move some processing to each sensor
  – 6 processors
      Each computing average, polar transform
      Transmitting 4 × 8 = 32 bits once/second
Using Atmel processor on each mote
  – Computation ~0.5M cycles/sec → 2.7V → 5.4 mW
  – Communication
      Very small: 4317 nJ × 32 = 0.13 mW
  – 5.53 mW/mote = 33.2 mW total (Bad Idea!)
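
The distributed estimate can be reproduced with a few lines of arithmetic. The constants are the slide's values; the function names are illustrative, and the sketch assumes that per-mote power is just computation plus communication, as the slide's sum implies.

```python
MOTES = 6
COMPUTE_MW_PER_MOTE = 5.4      # Atmel at ~0.5M cycles/sec, from the slide
ENERGY_NJ_PER_BIT = 4317
BITS_PER_SEC_PER_MOTE = 4 * 8  # 32 bits sent once per second


def comm_mw_per_mote():
    """Radio power per mote: nJ/bit * bit/s = nW, divided by 1e6 for mW."""
    return ENERGY_NJ_PER_BIT * BITS_PER_SEC_PER_MOTE / 1e6


def total_mw():
    """Total across all motes, each paying for compute plus radio."""
    return MOTES * (COMPUTE_MW_PER_MOTE + comm_mw_per_mote())
```

Communication comes to about 0.14 mW per mote, so the roughly 33.2 mW total is almost entirely six copies of the processor cost, which is why this configuration is marked as a bad idea on the Atmel.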

TI Microcontroller Evaluation
A microcontroller with better specs
  – MSP430P
      … μA/MHz active mode
      1.5 μA standby (6 ns wakeup)
Used IAR Systems compiler, profiler, development environment
Analysis
  – Centralized 3.3V, 4 MHz: 3.8 mW
  – Distributed 2.5V, 1 MHz: 0.48 mW per mote
      Six processors → 2.9 mW

TI DSP Evaluation
TMS320C54x
Used TI Code Composer Studio, compiler, simulator
Power
  – Active Mode, 3.3V 10 MHz: 33 mW
  – IDLE1, 0.36 mW
Analysis
  – Centralized: 7.8 mW
  – Distributed: 1.6 mW per mote
      Six processors = 9.6 mW total

TI DSP Evaluation Part 2
TMS320C55x (two parallel MACs)
Same tools, with C55x compiler, simulator
Power: no details available...
  – Advertised: 0.9V, 0.05 mW/MHz
Analysis
  – Centralized: … cycles (vs x), 2 MHz: 0.1 mW
  – Distributed: … cycles (vs x), 1 MHz: 0.05 mW
      Six processors: 0.3 mW total
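
To compare the hardware options side by side, the per-slide totals can be gathered into one table and ranked. This is a rough sketch: the dictionary layout and names are invented for illustration, the values are copied from the evaluation slides, and it is an assumption that the slides count the same components in each figure.

```python
MOTES = 6

# (processor, layout) -> total power in mW, taken from the evaluation slides
TOTAL_MW = {
    ("Atmel AVR", "distributed"): MOTES * 5.53,
    ("MSP430", "centralized"): 3.8,
    ("MSP430", "distributed"): MOTES * 0.48,
    ("TMS320C54x", "centralized"): 7.8,
    ("TMS320C54x", "distributed"): MOTES * 1.6,
    ("TMS320C55x", "centralized"): 0.1,
    ("TMS320C55x", "distributed"): MOTES * 0.05,
}


def ranked():
    """Options sorted from lowest to highest total power."""
    return sorted(TOTAL_MW.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for (chip, layout), mw in ranked():
        print(f"{chip:<11} {layout:<12} {mw:6.2f} mW")
```

On these figures the C55x comes out lowest in both layouts, and distributing the work only pays off when the per-mote processor cost is small, as with the MSP430.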

Other Explorations
Hand optimized code
  – Possible to massively reduce computation cost
  – FP/Transcendentals conspicuously painful
  – Outside scope of our exploration
Radio Hardware
  – Bluetooth ~ 100 times more efficient
Reconfigurable Computing
Other circuitry (e.g. accelerometers)

Results Summary
Cost, in mW, of various implementations
… using sleep mode, 28 without
31/104% improvement with same hardware
170x improvement with new hardware

Conclusions
By finding better mappings from SW → HW → Application, big performance gains are possible.
Effective use of local processor resources can reduce communication overheads, which are significant.
DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.