Design Exploration of a Human-machine Interface (HMI) Application. Francis Li, Sam Madden.


1 Design Exploration of a Human-machine Interface (HMI) Application
Francis Li, Sam Madden

2 The Application
Data glove interface
  – Wired, bulky
SmartDust scenario
  – A mote on each fingertip
Investigate implementations
Explore design alternatives

3 Proof-of-Concept Prototype
By SmartDust group
  – Atmel AVR microprocessor
  – RFM TR1000 radio
  – 6 accelerometers
  – Host PC performs processing
Analysis
  – Power: 45 mW measured
  – Continuous operation of processor, accelerometers, communication with host

4 Application Analysis
Processing (on PC)
  – Do 20 times per second, for each accelerometer
      Read in X and Y samples (10 bits each)
      Compute rolling average to smooth input data
      Convert averages to polar coordinates
        – Dominates cost: sqrt, acos, atan
        – Secondary cost: floating point operations
  – Periodically, calculate gesture via simple template matching (static hand positions)
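The per-sample processing on this slide can be sketched roughly as follows. This is a minimal illustration, not the SmartDust group's code: the window length, the class and function names, the squared-error template score, and the example templates are all assumptions, and `math.hypot` and `math.atan2` stand in for the sqrt, acos and atan calls the slide mentions.

```python
import math
from collections import deque


class RollingAverage:
    """Mean of the most recent `window` samples (window length is assumed)."""

    def __init__(self, window=4):
        self.samples = deque(maxlen=window)

    def update(self, value):
        self.samples.append(value)
        return sum(self.samples) / len(self.samples)


def to_polar(x_avg, y_avg):
    """Convert smoothed X/Y readings to (magnitude, angle in radians)."""
    return math.hypot(x_avg, y_avg), math.atan2(y_avg, x_avg)


def match_gesture(angles, templates):
    """Pick the template (name -> list of angles) with the smallest
    summed squared error against the current angles."""
    def error(name):
        return sum((a - t) ** 2 for a, t in zip(angles, templates[name]))
    return min(templates, key=error)


# Hypothetical example: two raw samples for one accelerometer,
# then a match against two made-up static hand positions.
smooth_x, smooth_y = RollingAverage(), RollingAverage()
for x, y in [(2, 2), (4, 6)]:
    x_avg, y_avg = smooth_x.update(x), smooth_y.update(y)
magnitude, angle = to_polar(x_avg, y_avg)

templates = {"fist": [1.5, 1.5], "open": [0.0, 0.1]}
gesture = match_gesture([0.1, 0.0], templates)
```

The sketch ignores angle wrap-around when scoring templates, which a real matcher would probably need to handle.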

5 Application Analysis (cont)
Communication (from Atmel to PC)
  – 20 samples/sec × 6 accelerometers × 4 bytes/sample → 480 bytes/sec
  – 115.6 kb/sec RF link
  – Radio = 12 mA @ 3 V, when transmitting → 1.2 mW for radio alone
Real world power >> 1.2 mW, due to software and analog overhead (real world analysis later)
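The 1.2 mW figure appears to follow from duty-cycling the radio: 12 mA at 3 V is 36 mW while transmitting, and 480 bytes/sec occupies only a small fraction of the link. A short sketch of that arithmetic, assuming the radio draws power only while bits are on the air (the variable names are illustrative):

```python
SAMPLES_PER_SEC = 20
ACCELEROMETERS = 6
BYTES_PER_SAMPLE = 4
LINK_BITS_PER_SEC = 115_600   # "115.6 kb/sec" as given on the slide
RADIO_CURRENT_A = 0.012       # 12 mA while transmitting
SUPPLY_VOLTS = 3.0

bytes_per_sec = SAMPLES_PER_SEC * ACCELEROMETERS * BYTES_PER_SAMPLE
bits_per_sec = bytes_per_sec * 8

# Fraction of each second the radio must be on (ideal duty cycling assumed).
duty_cycle = bits_per_sec / LINK_BITS_PER_SEC

tx_power_mw = RADIO_CURRENT_A * SUPPLY_VOLTS * 1000
avg_radio_power_mw = tx_power_mw * duty_cycle

print(f"{bytes_per_sec} bytes/sec, duty cycle {duty_cycle:.1%}, "
      f"average radio power {avg_radio_power_mw:.2f} mW")
```

Under these assumptions the duty cycle is about 3.3% and the average comes out at roughly 1.2 mW, matching the slide.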

6 Optimization Process
Match Application to HW

7 Optimization Process
Match Application to HW
Match Hardware to Application

8 Optimization Process
Match Application to HW
  – Local computation to reduce communication
Match Hardware to Application

9 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application

10 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized

11 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel

12 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP

13 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP

14 Communication vs. Computation
Estimates of local processing cost on Atmel (via simulation of GCC program)
Loop 6 × 20 / sec
    Average: 2223 instr. × 2
    CalcPolar: 19017 instr.
    → 2.83×10^6 instructions
Report gesture once per second
    FindGestureError: 5444 instr.
    10 gestures, 6 accelerometers → 5444 × 60 → 3.26×10^5 instr.
Memory operations are 2 cycles/instruction
Total cycles ~3.7M → 4 MHz → 13.5 mW
Communication = 8 bits/sec → negligible cost
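The instruction totals on this slide can be approximately rebuilt from the per-routine counts. The sketch below assumes the loop runs 6 × 20 = 120 times per second (six accelerometers at 20 samples each) and that the "× 2" on the averaging routine covers the X and Y axes; both readings are assumptions about a flattened slide.

```python
LOOPS_PER_SEC = 6 * 20            # assumed: 6 accelerometers x 20 samples/sec
AVERAGE_INSTR = 2223              # rolling average, assumed to run once per axis
CALC_POLAR_INSTR = 19017
FIND_GESTURE_ERROR_INSTR = 5444
GESTURES = 10
ACCELEROMETERS = 6

# Work done on every sample of every accelerometer.
per_loop_instr = 2 * AVERAGE_INSTR + CALC_POLAR_INSTR
sampling_instr = per_loop_instr * LOOPS_PER_SEC

# Template matching, reported once per second.
gesture_instr = FIND_GESTURE_ERROR_INSTR * GESTURES * ACCELEROMETERS

total_instr = sampling_instr + gesture_instr

print(f"sampling: {sampling_instr:.3g}, gestures: {gesture_instr:.3g}, "
      f"total: {total_instr:.3g} instructions/sec")
```

These products land within about 1% of the slide's rounded 2.83×10^6 and 3.26×10^5. The slide's ~3.7M cycle total is higher than the instruction total because memory operations take two cycles; the slide does not give the instruction mix, so that step is not reproduced here.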

15 Communication vs. Computation 2
Cost of communication to host PC (measured): 4317 nJ/bit
    From Culler, Hill, Szewczyk, Woo, “System Architecture For Networked Sensors.”
→ 4317 nJ/bit × 480 bytes/sec × 8 = 16.57 mW
Processor still sucks power
  – Current implementation requires 13.5 mW
  – Using sleep, only 1.17 mW → 17.74 mW total
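The communication cost is a unit conversion from energy per bit to average power. A small sketch of that conversion, using only the figures on the slide (the function name is illustrative):

```python
ENERGY_PER_BIT_NJ = 4317      # measured cost quoted on the slide
BYTES_PER_SEC = 480           # raw accelerometer stream to the host PC
CPU_SLEEP_MW = 1.17           # processor cost when sleep mode is used


def radio_power_mw(bits_per_sec, nj_per_bit=ENERGY_PER_BIT_NJ):
    """nJ/bit times bits/sec gives nW; divide by 1e6 to get mW."""
    return nj_per_bit * bits_per_sec / 1e6


comm_mw = radio_power_mw(BYTES_PER_SEC * 8)
total_with_sleep_mw = comm_mw + CPU_SLEEP_MW

print(f"communication: {comm_mw:.2f} mW, total with sleep: {total_with_sleep_mw:.2f} mW")
```

The unrounded values are slightly above the slide's 16.57 mW and 17.74 mW, which look truncated rather than rounded.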

16 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP

17 Distributed vs. Centralized
Move some processing to each sensor
  – 6 processors
      Each computing average, polar transform
      Transmitting 4 × 8 = 32 bits once/second
Using Atmel processor on each mote
  – Computation
      ~0.5M cycles/sec → 2 mA @ 2.7 V → 5.4 mW
  – Communication
      Very small: 4317 nJ/bit × 32 bits/sec = 0.13 mW
  – 5.53 mW/mote = 33.2 mW total (Bad Idea!)
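The per-mote budget shows why six Atmel processors cost more than one: computation dominates and is paid six times, while the radio term is almost negligible. A sketch of the sum, using the slide's figures with illustrative variable names:

```python
MOTES = 6
CPU_CURRENT_MA = 2.0              # at ~0.5M cycles/sec, per the slide
CPU_VOLTS = 2.7
ENERGY_PER_BIT_NJ = 4317
BITS_PER_SEC_PER_MOTE = 4 * 8     # 4 bytes sent once per second

compute_mw = CPU_CURRENT_MA * CPU_VOLTS                      # mA x V = mW
comm_mw = ENERGY_PER_BIT_NJ * BITS_PER_SEC_PER_MOTE / 1e6    # nW -> mW
per_mote_mw = compute_mw + comm_mw
total_mw = MOTES * per_mote_mw

print(f"per mote: {per_mote_mw:.2f} mW, six motes: {total_mw:.1f} mW")
```

The result agrees with the slide's 33.2 mW total to one decimal place; the slide's 0.13 mW and 5.53 mW appear to be truncated intermediate values.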

18 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP

19 TI Microcontroller Evaluation
A microcontroller with better specs
  – MSP430P112
      330 µA/MHz active mode
      1.5 µA standby (6 ns wakeup)
Used IAR Systems compiler, profiler, development environment
Analysis
  – Centralized
      3.3 V, 4 MHz: 3.8 mW
  – Distributed
      2.5 V, 1 MHz: 0.48 mW per mote
      Six processors → 2.9 mW

20 Optimization Process
Match Application to HW
  – Local computation to reduce communication
  – Floating point → Fixed point
Match Hardware to Application
  – Distributed vs. Centralized
  – TI vs. Atmel
  – DSP

21 TI DSP Evaluation
TMS320C54x
Used TI Code Composer Studio, compiler, simulator
Power
  – Active mode, 3.3 V, 10 MHz: 33 mW
  – IDLE1: 0.36 mW
Analysis
  – Centralized: 7.8 mW
  – Distributed: 1.6 mW per mote
      Six processors = 9.6 mW total
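The slide does not say how the analysis figures were derived. One model that reproduces them, offered here as an assumption, is a duty cycle between active and IDLE1 power: the DSP runs at 10 MHz for as many cycles as the workload needs each second and idles for the rest. The cycle counts used below are the 54x figures quoted on the C55x slide, taken to be cycles per second.

```python
ACTIVE_MW = 33.0          # active mode, 3.3 V, 10 MHz
IDLE_MW = 0.36            # IDLE1
CLOCK_HZ = 10_000_000


def average_power_mw(cycles_per_sec):
    """Time-weighted mix of active and idle power (assumed model)."""
    duty = cycles_per_sec / CLOCK_HZ
    return duty * ACTIVE_MW + (1 - duty) * IDLE_MW


centralized_mw = average_power_mw(2_290_440)   # 54x centralized cycle count
distributed_mw = average_power_mw(381_740)     # 54x per-mote cycle count
six_motes_mw = 6 * distributed_mw

print(f"centralized: {centralized_mw:.1f} mW, "
      f"distributed: {distributed_mw:.1f} mW per mote, "
      f"six motes: {six_motes_mw:.1f} mW")
```

Under this model the idle floor of 0.36 mW per processor is what makes the distributed C54x design more expensive than the centralized one.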

22 TI DSP Evaluation Part 2
TMS320C55x (two parallel MACs)
Same tools, with C55x compiler, simulator
Power: No details available...
  – Advertised: 0.9 V, 0.05 mW/MHz
Analysis
  – Centralized: 1170240 cycles (vs 2290440 on 54x)
      2 MHz: 0.1 mW
  – Distributed: 195040 cycles (vs 381740 on 54x)
      1 MHz: 0.05 mW
      Six processors: 0.3 mW total
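With only an advertised mW/MHz figure available, the C55x estimate reduces to picking the lowest clock that fits the workload and scaling linearly. The sketch below assumes the cycle counts are per second and that the processor runs continuously at the chosen clock, since the slide gives no idle-mode figure; the names are illustrative.

```python
MW_PER_MHZ = 0.05                 # advertised figure at 0.9 V

C55X_CENTRALIZED_CYCLES = 1_170_240
C55X_DISTRIBUTED_CYCLES = 195_040
C54X_CENTRALIZED_CYCLES = 2_290_440
C54X_DISTRIBUTED_CYCLES = 381_740


def active_power_mw(clock_mhz):
    """Linear scaling of the advertised mW/MHz figure (assumed to hold)."""
    return MW_PER_MHZ * clock_mhz


# The chosen clocks must cover the per-second cycle counts.
assert C55X_CENTRALIZED_CYCLES <= 2_000_000
assert C55X_DISTRIBUTED_CYCLES <= 1_000_000

centralized_mw = active_power_mw(2)
distributed_total_mw = 6 * active_power_mw(1)

# Cycle-count reduction relative to the C54x.
speedup_centralized = C54X_CENTRALIZED_CYCLES / C55X_CENTRALIZED_CYCLES
speedup_distributed = C54X_DISTRIBUTED_CYCLES / C55X_DISTRIBUTED_CYCLES

print(f"centralized: {centralized_mw:.2f} mW, six motes: {distributed_total_mw:.2f} mW, "
      f"cycle reduction vs 54x: {speedup_centralized:.2f}x")
```

Both workloads need a little under half the cycles of the C54x, which is in line with the slide's note about two parallel MACs.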

23 Other Explorations
Hand optimized code
  – Possible to massively reduce computation cost
  – FP/transcendentals conspicuously painful
  – Outside scope of our exploration
Radio hardware
  – Bluetooth ~100 times more efficient
Reconfigurable computing
Other circuitry (e.g. accelerometers)

24 Results Summary
Cost, in mW, of various implementations
17.74 using sleep mode, 28 without
31/104% improvement with same hardware
170x improvement with new hardware

25 Conclusions
By finding better mappings from SW → HW → Application, big performance gains are possible.
Effective use of local processor resources can reduce communication overheads, which are significant.
DSPs and other specialized processors can be a big win and don’t require hand-coded assembly or reconfigurable design.
