© Krithi Ramamritham / Kavi Arya 1 System Software for Embedded Systems Krithi Ramamritham Kavi Arya IIT Bombay VLSI 2004.

1 © Krithi Ramamritham / Kavi Arya 1 System Software for Embedded Systems Krithi Ramamritham Kavi Arya IIT Bombay VLSI 2004

2 © Krithi Ramamritham / Kavi Arya 2 Embedded Systems?

3 © Krithi Ramamritham / Kavi Arya 3 Plan Embedded Systems –Introduction –Application Examples New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

4 © Krithi Ramamritham / Kavi Arya 4 Embedded Systems Single functional e.g. pager, mobile phone Tightly constrained –cost, size, performance, power, etc. Reactive & real-time –e.g. car’s cruise controller –delay in computation => failure of system

5 © Krithi Ramamritham / Kavi Arya 5 Hardware is not the whole System !!! A Micro-Electronic System is the result of a projection of … –Architecture –Hardware –Software … distinguished by its gross Functional Behaviour ! Software is an important part of the Product and must be part of the Design Process … or we are only designing a Component of the system.

6 © Krithi Ramamritham / Kavi Arya 6 Why Is Embedded Software Not Just Software On Small Computers? Embedded = Dedicated Interaction with physical processes –sensors, actuators, processes Critical properties are not all functional –real-time, fault recovery, power, security, robustness Heterogeneity –hardware/software tradeoffs, mixed architectures Concurrency –interaction with multiple processes Reactivity –operating at the speed of the environment These features look more like hardware! Source: Edward A. Lee, UC Berkeley SRC/ETAB Summer Study 2001

7 © Krithi Ramamritham / Kavi Arya 7 What is Embedded SW? One definition: “Software that is directly in contact with, or significantly affected by, the hardware that it executes on, or can directly influence the behavior of that hardware.”

8 © Krithi Ramamritham / Kavi Arya 8 What is Embedded SW? What is it not ? Application software can be recompiled and executed on any number of hardware platforms so long as the basic services/libraries are provided. –It is divided by vertical market segments (application domains) –Well-established methodologies, architectures,… –HW platform independent, highly portable Any SW that has no direct relationship with HW.

9 © Krithi Ramamritham / Kavi Arya 9 Embedded System Challenges for HW Folks PARADIGM CHANGE! –Designers' main task shifts from processor integration to performance analysis –Concentration on functional requirements instead of integration work –Concentration on architectural exploration (including performance analysis) Re-use and platform-based design become key! –Early validation of system/solution correctness –Parallel hardware and software development –More effective use of previous work –Faster ways to build new elements of a solution –Ways to test more effectively, efficiently, quickly

10 © Krithi Ramamritham / Kavi Arya 10 Software Guys can Learn from Hardware Experts! Concurrency –the synchrony abstraction –event-driven modeling Reusability –cell libraries –interface definition Reliability –leveraging limited abstractions –leveraging verification Heterogeneity –mixing synchronous and asynchronous designs –resource management Source: Edward A. Lee, UC Berkeley SRC/ETAB Summer Study 2001

11 © Krithi Ramamritham / Kavi Arya 11 Trade-offs. Methodology ESW Architectural specifics Portability –ESW itself is intended to provide portability for higher SW layers –(At least parts of) ESW is by definition not portable Real-time –Restricted use of standardized inter-process communication (IPC) mechanisms (CORBA,…) for performance reasons –Typically hard real-time requirements RTOS dependency –Implementation of OS-like services –Sometimes shielding of the RTOS from higher-level SW layers –Direct dependency on RTOS implementation

12 © Krithi Ramamritham / Kavi Arya 12 Functional Design & Mapping [Figure: Functional Design (functions F1–F5) mapped onto an Architectural Design (threads, RTOS/drivers, hardware interface, HW1–HW4)] Source: Ian Phillips, ARM VSIA 2001

13 © Krithi Ramamritham / Kavi Arya 13 The Embedded Market: Disruptive Change Traditional Embedded World: –Never small enough –Never fast enough –Headless/character-based –Standalone –Boot & run from ROM –More hardware than software –Low-level programming model –Application tied to hardware Today’s Embedded World: –Never functional enough –Always connected –High-integration chips (ASIC/SOC) –Architectural diversity –COTS & custom hardware –EPROM/flash/rotating media –Software intensive –Web interfaces –OOP programming model –Standard applications –Time-to-market pressures –Shortage of embedded SW engineers Source: Jim Ready, President/CEO, MontaVista Software

14 © Krithi Ramamritham / Kavi Arya 14 Plan Embedded Systems New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

15 © Krithi Ramamritham / Kavi Arya 15 Motorola Software Survey Findings Hardware design is a software task: IC designers write code (VHDL, Verilog, Scripting)! We must become a software-intensive embedded system solutions company, focused on integrating our platforms into users’ products - in the future we’ll be neither a hardware nor a software company –Focus on developing systems capability, not just a software counterpart to our current hardware capability (though that’s needed too) –We should have software content from drivers to applications The fundamental goal isn’t 70% margin on software products, it’s helping someone choose your total solution –Embedded systems platforms and solutions will be the key to market differentiation and profitable growth Source: Bob Altizer, BASYS VSIA 2001

16 © Krithi Ramamritham / Kavi Arya 16 Common Design Metrics NRE (Non-recurring engineering) cost Unit cost Size (bytes, gates) Performance (execution time) Power (more power=> more heat & less battery time) Flexibility (ability to change functionality)

17 © Krithi Ramamritham / Kavi Arya 17 Common Design Metrics Time to prototype Time to market Maintainability Correctness Safety (probability that system won’t cause harm)

18 © Krithi Ramamritham / Kavi Arya 18 Time to Market Design Metric Simplified revenue model –Product life = 2W, peak at W –Time of market entry defines a triangle, representing market penetration –Triangle area equals revenue Loss –The difference between the on-time and delayed triangle areas Avg. time to market today = 8 mth; a 1-day delay may amount to $Ms –see Sony PlayStation vs Xbox [Figure: revenues ($) vs. time, market rise up to W and market fall up to 2W, with on-time and delayed-entry (delay D) triangles and their peak revenues] Source: Embedded System Design: Frank Vahid/Tony Givargis (John Wiley & Sons, Inc., 2002)

19 © Krithi Ramamritham / Kavi Arya 19 NRE and unit cost metrics Compare technologies by costs -- best depends on quantity –Technology A: NRE=$2,000, unit=$100 –Technology B: NRE=$30,000, unit=$30 –Technology C: NRE=$100,000, unit=$2 But, must also consider time-to-market Source: Embedded System Design: Frank Vahid/Tony Givargis (John Wiley & Sons, Inc., 2002)
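
The "best depends on quantity" point can be checked with a few lines of arithmetic. The sketch below assumes the usual model, total cost = NRE + unit cost × quantity, and uses the three technologies from the slide; the function names and sample quantities are illustrative only.

```python
# Total cost model (assumed): total = NRE + unit_cost * quantity.
# (NRE, unit cost) pairs are the figures quoted on the slide.
TECHNOLOGIES = {
    "A": (2_000, 100),
    "B": (30_000, 30),
    "C": (100_000, 2),
}


def total_cost(tech, quantity):
    """Total cost of building `quantity` units with technology `tech`."""
    nre, unit = TECHNOLOGIES[tech]
    return nre + unit * quantity


def cheapest(quantity):
    """Technology with the lowest total cost at this quantity."""
    return min(TECHNOLOGIES, key=lambda tech: total_cost(tech, quantity))


def break_even(tech1, tech2):
    """Quantity at which the two technologies cost the same."""
    nre1, unit1 = TECHNOLOGIES[tech1]
    nre2, unit2 = TECHNOLOGIES[tech2]
    return (nre2 - nre1) / (unit1 - unit2)


if __name__ == "__main__":
    for quantity in (100, 1_000, 10_000):
        best = cheapest(quantity)
        print(quantity, best, total_cost(best, quantity))
    print("A/B break-even:", break_even("A", "B"))
    print("B/C break-even:", break_even("B", "C"))
```

Under this model A is cheapest below 400 units, B between 400 and 2,500 units, and C above that.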

20 © Krithi Ramamritham / Kavi Arya 20 Losses due to delayed market entry Area = 1/2 * base * height –On-time = 1/2 * 2W * W –Delayed = 1/2 * (W-D+W) * (W-D) Percentage revenue loss = $\frac{D(3W-D)}{2W^2} \times 100\%$ Try some examples –Lifetime 2W=52 wks, delay D=4 wks: $\frac{4(3 \cdot 26 - 4)}{2 \cdot 26^2} \approx 22\%$ –Lifetime 2W=52 wks, delay D=10 wks: $\frac{10(3 \cdot 26 - 10)}{2 \cdot 26^2} \approx 50\%$ –Delays are costly! [Figure: revenues ($) vs. time, market rise up to W and market fall up to 2W, with on-time and delayed-entry (delay D) triangles and their peak revenues] Source: Embedded System Design: Frank Vahid/Tony Givargis (John Wiley & Sons, Inc., 2002)
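
As a quick check of the triangle model, the sketch below computes the loss both directly from the two triangle areas and from the closed form D(3W-D)/(2W²). The function names are illustrative; the only inputs are W and D as defined on the slide.

```python
def revenue_loss_percent(w, d):
    """Loss computed from the on-time and delayed triangle areas."""
    on_time = 0.5 * (2 * w) * w            # base 2W, height W
    delayed = 0.5 * (2 * w - d) * (w - d)  # base W-D+W, height W-D
    return (on_time - delayed) / on_time * 100


def revenue_loss_closed_form(w, d):
    """The same loss using the simplified formula D(3W-D)/(2W^2)."""
    return d * (3 * w - d) / (2 * w ** 2) * 100


if __name__ == "__main__":
    for delay in (4, 10):
        print(delay, round(revenue_loss_percent(26, delay), 1))
```

For a 52-week lifetime (W = 26) both forms give roughly 22% for a 4-week delay and roughly 50% for a 10-week delay.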

21 © Krithi Ramamritham / Kavi Arya 21 Trends Moore’s Law –IC transistor capacity doubles every 18 mths –1981: leading edge chip had 10k transistors –2002: leading edge chip has 150M transistors Designer productivity has improved due to better tools: –Compilation/Synthesis tools –Libraries/IP –Test/verification tools –Standards –Languages and frameworks (Handel-C, Lava, Esterel, …) –1981: designer produced 100 transistors per month –2002 designer produces 5000 transistors per month

22 © Krithi Ramamritham / Kavi Arya 22 Our New Understanding We have simultaneous optimisations of competing design metrics: speed, size, power, complexity, etc. We need a “Renaissance Engineer” –with a holistic view of the design process, comfortable with technologies ranging from hardware and software to formal methods Maturation of behavioral synthesis tools and other tools has enabled this kind of unified view of hardware/software co-design. Design efforts now focus on higher levels of abstraction => abstract specifications are now refined into programs and then into gates and logic. There is no fundamental difference between what hardware and software can implement.

23 © Krithi Ramamritham / Kavi Arya 23 Designer Productivity “The Mythical Man-Month” by Frederick Brooks ’75 More designers on team => lower productivity because of increasing communication costs between groups Consider 1M transistor project: –Say, a designer has productivity of 5000 transistors/mth –Each extra designer => decrease of 100 transistors/mth productivity in group due to comm. costs –1 designer: 1M/5000 = 200 mth –10 designers: 1M/(10*4100) = 24.3 mth –25 designers: 1M/(25*2600) = 15.3 mth –27 designers: 1M/(27*2400) = 15.4 mth Need new design technology to shrink the design gap Source: Embedded System Design: Frank Vahid/Tony Givargis (John Wiley & Sons, Inc., 2002)
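
The team-size figures follow from one formula: each designer produces 5000 - 100 × (team size - 1) transistors per month. A minimal sketch of that model, with illustrative function and parameter names:

```python
def project_months(designers, transistors=1_000_000,
                   base_rate=5000, penalty=100):
    """Months to finish when every extra designer costs each team
    member `penalty` transistors/month of productivity."""
    per_designer = base_rate - penalty * (designers - 1)
    return transistors / (designers * per_designer)


if __name__ == "__main__":
    for team in (1, 10, 25, 27):
        print(team, round(project_months(team), 2))
```

In this model team sizes of 25 and 26 tie for the shortest schedule; beyond that, adding designers makes the project take longer.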

24 © Krithi Ramamritham / Kavi Arya 24 Design Productivity Gap Designer productivity has grown over the last decade Rate of improvement has not kept pace with the chip-capacity growth 1981: leading edge chip: –100 designer mth * 100 trans/mth => 10k trans complexity 2002: leading edge chip: –30k designer mth * 5k trans/mth => 150M trans complexity Designers at avg. of $10k pm => cost of building leading edge chips gone from $1M in 1981 to $300M in 2002 Need paradigm shift to cope with the complexities of system design

25 © Krithi Ramamritham / Kavi Arya 25 Plan Embedded Systems –Introduction –Application Examples New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

26 © Krithi Ramamritham / Kavi Arya 26 Embedded Applications They are everywhere! wristwatches, washing machines, microwave ovens, elevators, mobile telephones, printers, FAX machines, telephone exchanges, automobiles, aircraft

27 © Krithi Ramamritham / Kavi Arya 27 Embedded Apps A modern home – has one general purpose desktop PC – but has several embedded systems. More prevalent in industrial sectors –Dozens of embedded computers in modern automobiles – chemical and nuclear power plants

28 © Krithi Ramamritham / Kavi Arya 28 Embedded Applications An embedded system typically has a digital signal processor and a variety of I/O devices connected to sensors and actuators. Computer (controller) is surrounded by other subsystems, sensors and actuators. The computer's (controller's) function is: –to monitor parameters of physical processes of its surrounding system –to control these processes whenever needed.

29 © Krithi Ramamritham / Kavi Arya 29 Simple Examples A simple thermostat controller –periodically reads the temperature of the chamber –switches the cooling system on or off. A pacemaker –constantly monitors the heart –paces the heart when heart beats are missed
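
The thermostat's read-then-switch cycle can be sketched as a periodic step function. The setpoint and the hysteresis band below are assumptions added for illustration (the slide only says the controller reads the temperature and switches the cooling); the band simply avoids rapid toggling around the setpoint.

```python
def thermostat_step(temperature, cooling_on, setpoint=25.0, band=1.0):
    """One periodic control step: decide whether cooling should be on.

    `setpoint` and `band` are hypothetical values, not from the slides.
    """
    if temperature > setpoint + band:
        return True            # too warm: switch cooling on
    if temperature < setpoint - band:
        return False           # cool enough: switch cooling off
    return cooling_on          # inside the band: keep the current state


def simulate(readings, setpoint=25.0, band=1.0):
    """Run the controller over a sequence of periodic readings."""
    cooling_on = False
    states = []
    for temperature in readings:
        cooling_on = thermostat_step(temperature, cooling_on, setpoint, band)
        states.append(cooling_on)
    return states


if __name__ == "__main__":
    print(simulate([24.0, 26.5, 25.5, 23.5]))
```

In a real controller the loop would be driven by a timer so that each step runs at a fixed period.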

30 © Krithi Ramamritham / Kavi Arya 30 Open loop temperature control Closed loop temperature control

31 © Krithi Ramamritham / Kavi Arya 31 Feedback Control Feedforward Control

32 © Krithi Ramamritham / Kavi Arya 32 Example: Elevator Controller

33 © Krithi Ramamritham / Kavi Arya 33 Remote Camera-based Surveillance Observers and the observed sites connected through a network. Input from sites displayed at observers' end at regular intervals. Need: System should capture, process and transmit images at regular intervals, predictably

34 © Krithi Ramamritham / Kavi Arya 34 When there is an alarm Observer redirects one or more cameras to zoom in on to a specific part of a site. Sends commands with the necessary pan/tilt/zoom parameters across the network. Cameras retarget their views within bounded time and start transmitting as before, scenes from the chosen location.

35 © Krithi Ramamritham / Kavi Arya 35 What do we need? timely transmission of user needs from observer to camera. camera platform retargeting the camera within bounded time. camera capturing images at regular intervals images sent to observers predictably across the network

36 © Krithi Ramamritham / Kavi Arya 36 Functional Design & Mapping [Figure: Functional Design (functions F1–F5) mapped onto an Architectural Design (threads, RTOS/drivers, hardware interface, HW1–HW4)] Source: Ian Phillips, ARM VSIA 2001

37 © Krithi Ramamritham / Kavi Arya 37 Examples of Embedded Systems We will look at the details of A simple Digital Camera Digital Flight Control Plastic Injection Molding What the future holds… e.g., automotive electronics

38 © Krithi Ramamritham / Kavi Arya 38 Digital camera… Only recently possible –Systems-on-a-chip: multiple processors and memories on one IC –High-capacity flash memory Source: Embedded System Design: Frank Vahid/Tony Givargis (John Wiley & Sons, Inc., 2002)

39 © Krithi Ramamritham / Kavi Arya 39 Designer’s perspective: two key tasks Processing images and storing in memory When shutter pressed: –Image captured –Converted to digital form by charge-coupled device (CCD) –Compressed and archived in internal memory Uploading images to PC –Digital camera attached to PC –Special software commands camera to transmit archived images serially

40 © Krithi Ramamritham / Kavi Arya 40 Compression Store more images Transmit image to PC in less time JPEG (Joint Photographic Experts Group)

41 © Krithi Ramamritham / Kavi Arya 41 Requirements Specification System’s requirements – what system should do –Nonfunctional requirements Constraints on design metrics (e.g., “should use 0.001 watt or less”) –Functional requirements System’s behavior (e.g., “output X should be input Y times 2”) –….

42 © Krithi Ramamritham / Kavi Arya 42 Requirements Specification… Initial specification is general - from marketing dept. E.g., short document detailing market need for a low-end digital camera that: –captures and stores at least 50 low-res images and uploads to PC, –costs around $100 with single medium-size IC costing less than $25, –has as long a battery life as possible, –expected sales vol. = 200,000 if mkt entry < 6 mths, –100,000 if between 6 and 12 months, –insignificant sales beyond 12 months

43 © Krithi Ramamritham / Kavi Arya 43 Nonfunctional requirements Design metrics of importance based on initial specification – Performance : time required to process image – Size : number of elementary logic gates (2-input NAND gate) in IC – Power : measure of avg. electrical energy consumed while processing – Energy : battery lifetime (power x time)

44 © Krithi Ramamritham / Kavi Arya 44 Nonfunctional requirements… Constrained metrics –Values must be below (sometimes above) certain threshold Optimization metrics –Improved as much as possible to improve product Metric can be both constrained and optimization

45 © Krithi Ramamritham / Kavi Arya 45 Nonfunctional requirements… Power –Must operate below certain temperature (cooling fan not possible) –Therefore, constrained metric Energy –Reducing power or time reduces energy –Optimized metric: want battery to last as long as possible

46 © Krithi Ramamritham / Kavi Arya 46 Nonfunctional requirements… Performance –Must process image fast enough to be useful –1 sec reasonable constraint Slower would be annoying Faster not necessary for low-end of market –Therefore, constrained metric Size –Must use IC that fits in reasonably sized camera –Constrained and optimization metric Constraint may be 200,000 gates, but smaller would be cheaper

47 © Krithi Ramamritham / Kavi Arya 47 Informal functional specification Flowchart breaks functionality down into simpler functions Each function’s details described in English Low quality image has resolution of 64 x 64 Mapping functions to a particular processor type not done at this stage [Flowchart: CCD input → Zero-bias adjust → DCT → Quantize → Archive in memory → More 8×8 blocks? (yes/no) → Transmit serially → Done? (yes/no); serial output e.g., 011010...]

48 © Krithi Ramamritham / Kavi Arya 48 Informal functional specification [Flowchart: CCD input → Zero-bias adjust → DCT → Quantize → Archive in memory → More 8×8 blocks? (yes/no) → Transmit serially → Done? (yes/no); serial output e.g., 011010...]

49 © Krithi Ramamritham / Kavi Arya 49 Refined functional specification Refine informal specification into one that can actually be executed Can use C-like code to describe each function –Called system-level model, prototype, or simply model –Also is first implementation [Figure: executable model of digital camera, with modules CCD.C, CCDPP.C, CNTRL.C, CODEC.C and UART.C, an image file (10101101...) as input and an output file (101010101...) as output]

50 © Krithi Ramamritham / Kavi Arya 50 Design Determine system’s architecture –Processors Any combination of single-purpose (custom or standard) or general-purpose processors –Memories, buses Map functionality to that architecture –Multiple functions on one processor –One function on one or more processors

51 © Krithi Ramamritham / Kavi Arya 51 Design.. Implementation –A particular architecture and mapping –Solution space is set of all implementations Starting point –Low-end general-purpose processor connected to flash memory All functionality mapped to software running on processor Usually satisfies power, size, time-to-market constraints If timing constraint not satisfied then try: –use single-purpose processors for time-critical functions –rewrite functional specification

52 © Krithi Ramamritham / Kavi Arya 52 Implementation 1: Microcontroller alone Low-end processor could be Intel 8051 microcontroller Total IC cost including NRE about $5 Well below 200 mW power Time-to-market about 3 months However…

53 © Krithi Ramamritham / Kavi Arya 53 Implementation 1: Microcontroller alone… However, one image per second not possible –12 MHz, 12 cycles per instruction: executes one million instructions per second –CcdppCapture has nested loops resulting in 4096 (64 x 64) iterations, ~100 assembly instructions each iteration: 409,600 (4096 x 100) instructions per image –Nearly half of budget for reading image alone –Would be over budget after adding compute-intensive DCT and Huffman encoding
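
The instruction budget behind this slide is simple arithmetic. The sketch below reproduces it from the figures quoted (12 MHz clock, 12 cycles per instruction, 64 x 64 iterations of roughly 100 instructions each); the variable names are illustrative.

```python
CLOCK_HZ = 12_000_000
CYCLES_PER_INSTRUCTION = 12
INSTRUCTIONS_PER_SECOND = CLOCK_HZ // CYCLES_PER_INSTRUCTION

ITERATIONS = 64 * 64                 # one iteration per pixel
INSTRUCTIONS_PER_ITERATION = 100     # the slide's approximate figure

capture_instructions = ITERATIONS * INSTRUCTIONS_PER_ITERATION
capture_seconds = capture_instructions / INSTRUCTIONS_PER_SECOND

if __name__ == "__main__":
    print("instructions/s:", INSTRUCTIONS_PER_SECOND)
    print("capture instructions:", capture_instructions)
    print("capture time (s):", capture_seconds)
```

Capturing the image alone takes about 0.41 s of the 1 s budget, before any DCT or Huffman work is counted.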

54 © Krithi Ramamritham / Kavi Arya 54 Implementation 2: Microcontroller and CCDPP [Block diagram: SOC containing 8051, UART, CCDPP, RAM, EEPROM]

55 © Krithi Ramamritham / Kavi Arya 55 Implementation 2: Microcontroller and CCDPP CCDPP function on custom single-purpose processor –Improves performance: fewer microcontroller cycles –Increases NRE cost and time-to-market –Easy to implement: simple datapath, few states in controller Simple UART easy to implement as single-purpose processor also EEPROM for program memory and RAM for data memory added as well [Block diagram: SOC containing 8051, UART, CCDPP, RAM, EEPROM]

56 © Krithi Ramamritham / Kavi Arya 56 Microcontroller Synthesizable version of Intel 8051 available –Written in VHDL –Captured at register transfer level (RTL) Fetches instruction from ROM Decodes using Instruction Decoder ALU executes arithmetic operations –Source and destination registers reside in RAM Special data movement instructions used to load and store externally Special program generates VHDL description of ROM from output of C compiler/linker [Block diagram of Intel 8051 processor core: 4K ROM, 128 RAM, Instruction Decoder, ALU, Controller, bus to external memory]

57 © Krithi Ramamritham / Kavi Arya 57 Implementation 2: Microcontroller and CCDPP Analysis of implementation 2 –Total execution time for processing one image: 9.1 seconds –Power consumption: 0.033 watt –Energy consumption: 0.30 joule (9.1 s x 0.033 watt) –Total chip area: 98,000 gates

58 © Krithi Ramamritham / Kavi Arya 58 Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT 9.1 seconds still doesn’t meet performance constraint of 1 second DCT operation prime candidate for improvement –Execution of implementation 2 shows microprocessor spends most cycles here –Could design custom hardware like we did for CCDPP More complex so more design effort –Instead, will speed up DCT functionality by modifying behavior

59 © Krithi Ramamritham / Kavi Arya 59 DCT floating-point cost Floating-point cost –DCT uses ~260 floating-point operations per pixel transformation –4096 (64 x 64) pixels per image –1 million floating-point operations per image –No floating-point support with Intel 8051 Compiler must emulate –Generates procedures for each floating-point operation: mult, add –Each procedure uses tens of integer operations –Thus, > 10 million integer operations per image –Procedures increase code size Fixed-point arithmetic can improve on this
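
To illustrate why fixed-point helps on a processor without floating-point hardware, here is a minimal sketch of fixed-point multiplication using only integer operations. The format (8 fractional bits) and the helper names are assumptions chosen for illustration, not the representation used in the camera design.

```python
FRAC_BITS = 8                # assumed format: 8 fractional bits
SCALE = 1 << FRAC_BITS       # 256


def to_fixed(x):
    """Convert a real number to the nearest fixed-point integer."""
    return int(round(x * SCALE))


def to_float(f):
    """Convert a fixed-point integer back to a real number."""
    return f / SCALE


def fixed_mul(a, b):
    """Multiply two fixed-point values: one integer multiply and one
    shift, instead of an emulated floating-point routine."""
    return (a * b) >> FRAC_BITS


if __name__ == "__main__":
    product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
    print(product, to_float(product))
```

The price is precision: a constant such as a DCT cosine coefficient is rounded to the nearest 1/256, so the error per coefficient is at most 1/512.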

60 © Krithi Ramamritham / Kavi Arya 60 Implementation 3: Microcontroller and CCDPP/Fixed-Point DCT Analysis of implementation 3 –Use same analysis techniques as implementation 2 –Total execution time for processing one image: 1.5 seconds –Power consumption: 0.033 watt (same as 2) –Energy consumption: 0.050 joule (1.5 s x 0.033 watt) Battery life 6x longer!! –Total chip area: 90,000 gates, 8,000 fewer gates (less memory needed for code)

61 © Krithi Ramamritham / Kavi Arya 61 Implementation 4: Microcontroller and CCDPP/DCT Performance close but not good enough Must resort to implementing CODEC in hardware –Single-purpose processor to perform DCT on 8 x 8 block [Block diagram: SOC containing 8051, UART, CCDPP, CODEC, RAM, EEPROM]

62 © Krithi Ramamritham / Kavi Arya 62 Implementation 4: Microcontroller and CCDPP/DCT Analysis of implementation 4 –Total execution time for processing one image: 0.099 seconds (well under 1 sec) –Power consumption: 0.040 watt Increase over 2 and 3 because SOC has another processor –Energy consumption: 0.0040 joule (0.099 s x 0.040 watt) Battery life 12x longer than previous implementation!! –Total chip area: 128,000 gates, significant increase over previous implementations
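
The energy figures for implementations 2 to 4 all come from energy = execution time × power. A short sketch using the times and power values quoted in the analysis slides (the dictionary layout and function name are illustrative):

```python
# (execution time in seconds, power in watts) per implementation,
# as quoted in the analysis slides.
IMPLEMENTATIONS = {
    2: (9.1, 0.033),
    3: (1.5, 0.033),
    4: (0.099, 0.040),
}


def energy_joules(implementation):
    """Energy per processed image: time multiplied by power."""
    seconds, watts = IMPLEMENTATIONS[implementation]
    return seconds * watts


if __name__ == "__main__":
    for impl in sorted(IMPLEMENTATIONS):
        print(impl, round(energy_joules(impl), 5))
    print("2 vs 3:", round(energy_joules(2) / energy_joules(3), 1))
    print("3 vs 4:", round(energy_joules(3) / energy_joules(4), 1))
```

The products are about 0.30 J, 0.050 J and 0.0040 J, which is where the "6x" and "12x" battery-life ratios come from.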

63 © Krithi Ramamritham / Kavi Arya 63 Digital Camera -- Summary Digital camera example –Specifications in English and executable language –Design metrics: performance, power and area Several implementations –Microcontroller: too slow –Microcontroller and coprocessor: better, but still too slow –Fixed-point arithmetic: almost fast enough –Additional coprocessor for compression: fast enough, but expensive and hard to design –Tradeoffs between hw/sw

64 © Krithi Ramamritham / Kavi Arya 64 Summary of implementations Implementation 3 Close performance Cheaper Less time to build Implementation 4 –Great performance and energy consumption –More expensive and may miss time-to-market window If DCT designed ourselves then increased NRE cost and time-to- market If existing DCT purchased then increased IC cost Which is better?

65 © Krithi Ramamritham / Kavi Arya 65 2. Flight Simulator CLIENT - pilot, SERVER - simulator Constraints on responses to pilot inputs, aircraft state updates

66 © Krithi Ramamritham / Kavi Arya 66 Time Periods to meet Timing Requirements Requirement: Continuous pilot inputs should be polled at intervals shorter than 16 ms. Choice made: The time period of the writer on the Client should be less than 16 ms. Rationale: The writer thread on the Client polls for the pilot inputs from the joystick. [Figure: CLIENT and SERVER]

67 © Krithi Ramamritham / Kavi Arya 67 Time Periods to meet Timing Requirements… Requirement: The state of the aircraft is to be advanced at 12.5 ms time steps. Choice made: The time period of the Flight Dynamics thread on the Server is 12.5 ms. Rationale: The flight dynamics thread on the Server advances the state of the system. [Figure: CLIENT and SERVER]

68 © Krithi Ramamritham / Kavi Arya 68 Time Periods to meet Timing Requirements… Requirement: Response time for pilots should be less than 150 ms for commercial aircraft and 100 ms for fighter aircraft. Choice made: Reader and Writer threads on the Server, and the Reader thread on the Client, should be as fast as the system permits (time period of 4 ms in our case). Rationale: Delay in data transfer at these threads increases the response time. These threads should be interrupt driven in order to minimize the response time.

69 © Krithi Ramamritham / Kavi Arya 69 Example: Injection Molding –Keep plastic at proper temperature (liquid, not boiling) –Control injector solenoid (make sure that the motion of the solenoid terminates before the piston reaches the end of its travel) Source: “Laboratory for Perceptual Robotics, UMass” Copyright 1996 by Roderic A. Grupen

70 © Krithi Ramamritham / Kavi Arya 70 Controlling a reaction We know: –if temperature too high, it explodes –maximum rate of temperature increase –rate of cooling Events: –temperature change –temperature > safe threshold We can derive: –how often we have to check temperature –when we have to finish cooling
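
One way to make the derivation concrete is a worst-case argument: the temperature crosses the safe threshold just after a check, is noticed one polling period later, and keeps rising at the maximum rate until cooling takes effect. The sketch below follows that reasoning; the model, the parameter names (including a single `cooling_latency` standing in for the time cooling needs to stop the rise) and the numbers are all assumptions for illustration, not values from the slides.

```python
def max_polling_period(t_safe, t_explode, max_rise_rate, cooling_latency):
    """Longest polling period that still leaves time to react.

    Worst case: the rise from t_safe to t_explode takes
    (t_explode - t_safe) / max_rise_rate seconds, and that time must
    cover one polling period plus the cooling latency.
    """
    margin_time = (t_explode - t_safe) / max_rise_rate
    period = margin_time - cooling_latency
    if period <= 0:
        raise ValueError("cooling is too slow for this safety margin")
    return period


if __name__ == "__main__":
    # Hypothetical plant: threshold 80, explosion at 100, rise of at most
    # 2 degrees per second, cooling effective 4 seconds after detection.
    print(max_polling_period(80.0, 100.0, 2.0, 4.0))
```

The same inequality read the other way gives the deadline for cooling: once the threshold event is detected, cooling must take effect within the margin time minus the polling period.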

71 © Krithi Ramamritham / Kavi Arya 71 Example – Injection Molding (cont.) – Timing constraints

72 © Krithi Ramamritham / Kavi Arya 72 Example – Injection Molding (cont.) – Concurrent control tasks

73 © Krithi Ramamritham / Kavi Arya 73 Examples of Embedded Systems We looked at details of A simple Digital Camera Digital Flight Control Plastic Injection Molding The world gets exciting… e.g. Automotive electronics

74 © Krithi Ramamritham / Kavi Arya 74 Automotive Electronics

75 © Krithi Ramamritham / Kavi Arya 75 Cruise Control Controls car speed Actuates the throttle valve by a cable connected to an actuator, instead of by pressing a pedal. The throttle valve controls the power and speed of the engine by limiting how much air the engine takes in.

76 © Krithi Ramamritham / Kavi Arya 76 Control Architecture for Cruise Control

77 © Krithi Ramamritham / Kavi Arya 77 State Machine for Activation

78 © Krithi Ramamritham / Kavi Arya 78 Adaptive Cruise Control with Driver Alert Helps to reduce the need for drivers to manually adjust speed or disengage cruise control when encountering slower traffic. Automatically manages vehicle speed to maintain a distance set by the driver. Alerts drivers when slower traffic is detected in the path. Audible and visual alerts warn the driver when braking is necessary to avoid slower moving vehicles ahead. Drivers can adjust system sensitivity to their preferred driving style.

79 © Krithi Ramamritham / Kavi Arya 79 Web Servers… get smaller

80 © Krithi Ramamritham / Kavi Arya 80 iPic : Tiny Web-Server 2mm*2mm, PIC 12c508 512b ROM, 24b RAM, 6bits IO, 4MHz RC

81 © Krithi Ramamritham / Kavi Arya 81 Plan Embedded Systems New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

82 © Krithi Ramamritham / Kavi Arya 82 Lava Not so much a hardware description language More a style of circuit description Emphasises connection patterns Think of Lego

83 © Krithi Ramamritham / Kavi Arya 83 Lava Mary Sheeran, Koen Claessen, & Satnam Singh Chalmers University (Sweden) Based on earlier work on MuFP to describe circuit functionality and layout in a single language Built using functional programming paradigm

84 © Krithi Ramamritham / Kavi Arya 84 Behaviour and Structure f ->- g [Figure: blocks f and g connected in series]

85 © Krithi Ramamritham / Kavi Arya 85 Lava Properties Higher-order functions –Circuits are functions –May be passed as arguments to other functions. –=> Easier to produce parameterized circuits than with VHDL. Functions can return circuits as results – Circuit combinators take circuits as arguments, return circuits as results. –=> Powerful glue for composing circuits to form larger systems. Circuit combinators combine behavior + layout –Combinators lay out circuits in rows, columns, triangles, trees etc. Performance of circuit –Improved by exploring the layout design space by experimenting with alternative layout combinators. Examples of circuits produced: –High speed constant coefficient multipliers, finite impulse response filters (1D and 2D), adder tree networks and sorting butterfly networks.

86 © Krithi Ramamritham / Kavi Arya 86 Parallel Connection Patterns f -|- g [Figure: blocks f and g placed in parallel]

87 © Krithi Ramamritham / Kavi Arya 87 map f [Figure: four copies of f applied in parallel]

88 © Krithi Ramamritham / Kavi Arya 88 Four Sided Tiles

89 © Krithi Ramamritham / Kavi Arya 89 Column

90 © Krithi Ramamritham / Kavi Arya 90 Full Adder
    fa (cin, (a, b)) = (sum, cout)
      where
        part_sum = xor (a, b)
        sum = xorcy (part_sum, cin)
        cout = muxcy (part_sum, (a, cin))
[Figure: full adder cell fa with inputs a, b, cin and outputs sum, cout]

91 © Krithi Ramamritham / Kavi Arya 91 Generic Adder adder = col fa [Figure: column of fa cells]
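
The behaviour of the full-adder cell and of a column of such cells can be modelled in ordinary software. The sketch below is a Python model, not Lava: it assumes `muxcy` selects its second data input (the carry-in) when the select bit is 1 and the first otherwise, and it models `col fa` as a ripple-carry chain over bit lists given least-significant bit first. Helper names such as `to_bits` are illustrative.

```python
def fa(cin, a, b):
    """Software model of the full-adder cell on the slide."""
    part_sum = a ^ b                 # xor (a, b)
    total = part_sum ^ cin           # xorcy (part_sum, cin)
    cout = cin if part_sum else a    # assumed muxcy behaviour
    return total, cout


def adder(cin, a_bits, b_bits):
    """Column of full adders: the carry ripples from cell to cell."""
    sums = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        bit, carry = fa(carry, a, b)
        sums.append(bit)
    return sums, carry


def to_bits(value, width):
    """Integer to list of bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits (least-significant first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


if __name__ == "__main__":
    sums, carry = adder(0, to_bits(40_000, 16), to_bits(30_000, 16))
    print(from_bits(sums) + (carry << 16))
```

What the model cannot show is the layout side of `col`: in Lava the same combinator also places the cells in a column.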

92 © Krithi Ramamritham / Kavi Arya 92 Top Level
    adder16Circuit
      = do a <- inputVec "a" (bit_vector 15 downto 0)
           b <- inputVec "b" (bit_vector 15 downto 0)
           (s, carry) <- adder4 (a, b)
           sum <- outputVec "sum" s (bit_vector 16 downto 0)
    ? circuit2VHDL "add16" adder16Circuit
    ? circuit2EDIF "add16" adder16Circuit
    ? circuit2Verilog "add16" adder16Circuit

93 © Krithi Ramamritham / Kavi Arya 93 Xilinx FPGA Implementation 16-bit implementation on a XCV300 FPGA Vertical layout required to exploit fast carry chain No need to specify coordinates in HDL code

94 © Krithi Ramamritham / Kavi Arya 94 16-bit Adder Layout Source: Mary Sheeran Nov.2002

95 © Krithi Ramamritham / Kavi Arya 95 Four adder trees Source: Mary Sheeran Nov.2002

96 © Krithi Ramamritham / Kavi Arya 96 No Layout Information Source: Mary Sheeran Nov.2002

97 © Krithi Ramamritham / Kavi Arya 97 Plan Embedded Systems New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

98 © Krithi Ramamritham / Kavi Arya 98 Handel-C Programming language - enables compilation of programs into synchronous hardware NOT Hardware Description Language - it’s a prog. language aimed at compiling high-level algorithms into gate-level hardware Syntax (loosely) based on “C” Handel-C is to hardware (gates) what “C” is to micro-assembly code

99 © Krithi Ramamritham / Kavi Arya 99 Handel-C (cont.) Inventor - Ian Page, Programming Research Group (Oxford University/UK) Semantics based on Hoare’s Communicating Sequential Processes (CSP) model & Occam: transputer prog. language Industry heavyweights using tools: Marconi, Ericsson, BAe, Creative Labs, etc.

100 © Krithi Ramamritham / Kavi Arya 100 What this means Hardware design produced is exactly the hardware specified in source program No intermediate “interpreting” layer as in assembly language targeting general purpose microprocessor Logic gates are the assembly instructions of the Handel-C system Design/re-design/optimise at software level!!!

101 © Krithi Ramamritham / Kavi Arya 101 What This Means True parallelism –not the time-shared (interpreted) parallelism of general-purpose computers PAR {a;b} –instructions executed in parallel (//) at the same instant of time by 2 separate pieces of hardware Timing –branches that complete early are forced to wait for the slowest branch before continuing

102 © Krithi Ramamritham / Kavi Arya 102 Comparison with “C” Similar: - Programs inherently sequential - Similar control-flow constructs: if-then-else, switch, while, for, etc. Dissimilar : - No malloc/ dynamic store allocation - No recursion (limited rec. in macros) - No nested procedures - No stdin/stdout - “Void main()” - variable width words - PAR, etc.

103 © Krithi Ramamritham / Kavi Arya 103 Handel-C is based on ANSI-standard C without external library-functions: –I/O functions: printf(), putc(), scanf(),... –File functions: fopen(), fclose(), fprintf(),... –String-functions: strlen(), strcpy(), strcmp(),… –Math-functions: sin(), cos(), sqrt(),… –...

104 © Krithi Ramamritham / Kavi Arya 104 Supported declarations statements & instructions: Main program structure Variables Arrays Switch statement FOR Loop Comments Constants Scope & Variable sharing Arithmetic, Relational, Relational Logic ops Conditional Execution While loop Do … While Loop

105 © Krithi Ramamritham / Kavi Arya 105 Channel Communication link!v … link?v –channel input is form of assignment Provides link between parallel (‘//’) branches –One // branch outputs data onto channel –Other // branch reads data from channel => Synchronisation –data transfers only when both processes are ready

106 © Krithi Ramamritham / Kavi Arya 106 Additional Features & Statements Channel unsigned int 8 a; chan unsigned int 8 c; c ! 5; c ? a;

107 © Krithi Ramamritham / Kavi Arya 107 Additional Features & Statements Prialt prialt { case CommsStatement : Statement break;... default: Statement break; }

108 © Krithi Ramamritham / Kavi Arya 108 Example 1 (sum) void main() { unsigned int 16 sum; /* variable width word */ unsigned int 8 data; chanin input; /* input/output */ chanout output; sum = 0; do { input ? data; sum = sum + (0 @ data); } while (data != 0); output ! sum; } IMPORTANT – width!!

109 © Krithi Ramamritham / Kavi Arya 109 Example 2 (divider) #define DATA_WIDTH 16 void main(void) { unsigned int DATA_WIDTH a, mult, result; unsigned int (DATA_WIDTH*2 - 1) b; chanin input; chanout output; while (1) { input ? a; input ? result; b = result @ 0; mult = 1 << (DATA_WIDTH-1); result = 0; /* division loop: see Example 2 (cont.) */ output ! result; } } Computes result = integer(a / b)

110 © Krithi Ramamritham / Kavi Arya 110 Example 2 (cont.) while (mult != 0) { if ((0 @ a) >= b) par { a -= b <- width(a); result |= mult; } par { b = b >> 1; mult = mult >> 1; } }
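
The divider loop above is a shift-and-subtract division. The following is a minimal software sketch of it in Python, not Handel-C: the function name `handelc_divide` and its parameters are illustrative assumptions, and it models only the arithmetic, not the clock-cycle timing of the `par` blocks.

```python
def handelc_divide(a, divisor, data_width=16):
    """Software model of the shift-and-subtract divider on the slide.

    Returns (quotient, remainder) for unsigned data_width-bit operands.
    """
    mask = (1 << data_width) - 1
    a &= mask
    # b = result @ 0 : the divisor is placed above (data_width - 1) zero bits
    b = (divisor & mask) << (data_width - 1)
    mult = 1 << (data_width - 1)
    result = 0
    while mult != 0:
        if a >= b:                 # if ((0 @ a) >= b)
            a = (a - b) & mask     # a -= b <- width(a)
            result |= mult         # result |= mult
        b >>= 1                    # these two shifts share one par block
        mult >>= 1
    return result, a
```

For a non-zero divisor the quotient matches integer division, for example `handelc_divide(100, 7)` gives quotient 14 and remainder 2.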

111 © Krithi Ramamritham / Kavi Arya 111 Example 3 void main(void) { chan unsigned int undefined link[2]; chanin unsigned int 8 input; chanout unsigned int 8 output; unsigned int undefined state[3]; par { while (1) /* first queue location */ { input ? state[0]; link[0] ! state[0]; } while (1) /* second queue location */ { link[0] ? state[1]; link[1] ! state[1]; } while (1) /* third queue location */ { link[1] ? state[2]; output ! state[2]; } } } [Diagram: input -> state[0] -> link[0] -> state[1] -> link[1] -> state[2] -> output. Annotations: parallel tasks; communication between tasks; array of variables; array of channels; parameterised on width.]

112 © Krithi Ramamritham / Kavi Arya 112 Additional Features & Statements Timing An assignment statement takes exactly one clock cycle to execute. Everything else is free void main(void) { unsigned 8 x, y; … x = x + y; }

113 © Krithi Ramamritham / Kavi Arya 113 Timing/efficiency issues One clock source for entire program - Assignment & delay take one clock cycle - Expressions are “for free” Handel-C designed such that experienced programmer can immediately tell which instructions execute on which clock cycles Example x = y; x = (((y*z) + (w*v) )<<2)<-7; both statements take one clock cycle Clock at longest logic depth => reduce the depth of logic to speed up program => pipelining

114 © Krithi Ramamritham / Kavi Arya 114 Porting “C” to Handel-C Decide how software maps to hardware platform Partition algorithm between multiple FPGAs Port C to Handel-C & use simulator to check correctness Modify code to take advantage of extra operators in Handel-C - simulate to ensure correctness Add fine-grain parallelism through PAR & parallel assignments or parallelise algorithm - simulate Add hardware interfaces for target architecture & map simulator channel communications onto these interfaces - simulate Use FPGA place & route tools to generate FPGA images

115 © Krithi Ramamritham / Kavi Arya 115 Design Flow Overview Port algorithm to Handel-C Compile program file for simulator Use simulator to evaluate and debug design Add interfaces to external hardware Use Handel-C compiler to target h/w netlist Use FPGA tools to place & route netlist Program FPGA with result of place & route Modify/ debug program

116 © Krithi Ramamritham / Kavi Arya 116 Essence Software approach allows us to rapidly prototype applications for a given domain Handel-C provides a seamless approach to derive expressive and fast implementations from the software level Cost of silicon is falling & shortage of trained engineers & high cost of programmer time => Software based, high-level approaches to solving problems become increasingly attractive.

117 © Krithi Ramamritham / Kavi Arya 117 Handel-C Concepts (Recap) Describes hardware - h/w design produced = h/w in source program Logic gates are assembly instructions of Handel-C system Real parallelism – not interpreted Assignment, delay take 1 clock cycle; Expression evaluation is free No side-effects, i.e. a++ is a statement (not an expression as in ‘C’) Variable width words => great performance improvement over software Min. datapath widths => minimal h/w usage

118 © Krithi Ramamritham / Kavi Arya 118 Additional Features & Statements Concurrency... par { } … { … }

119 © Krithi Ramamritham / Kavi Arya 119 Concurrency (example) void main(void) { unsigned 8 x, y; unsigned 5 temp1; unsigned 4 temp2;... temp1 = (0@(x <- 4)) + (0@(y <- 4)); temp2 = (x \\ 4) + (y \\ 4); x = (temp2 + (0@temp1[4])) @ temp1[3:0]; }

120 © Krithi Ramamritham / Kavi Arya 120 Additional Features & Statements Concurrency... par { temp1 = (0@(x<-4)) + (0@(y<-4)); temp2 = (x\\4) + (y\\4); } x = (temp2 + (0@temp1[4])) @ temp1[3:0];...

121 © Krithi Ramamritham / Kavi Arya 121 Features & Statements (contd.) Delay... par { x = 1; { delay; x=2; } while (x == 0) delay;

122 © Krithi Ramamritham / Kavi Arya 122 Additional Features & Statements Channel unsigned int 8 a; chan unsigned int 8 c; c ! 5; c ? a; Single variable must not be accessed by >1 // branch => par {out!3; out!4 } /* illegal */

123 © Krithi Ramamritham / Kavi Arya 123 Features & Statements(contd.) Macros(Examples - contd) –Combinatorial macro expr abs(a) = ((a)[width(a)-1] == 0 ? (a) : (-a)); shared expr incwrap(e, m) = (((e)==(m)) ? 0 : (e)+1); –Recursive macro expr copy (e, n) = select(n==1, (e), copy(e, n/2) @ copy(e, n-(n/2)));

124 © Krithi Ramamritham / Kavi Arya 124 Features & Statements(contd) Operators for Bit Manipulation z = x <- 2; /* Take least significant bits */ z = y \\ 2; /* Drop least significant bits */ z = x @ y; /* Concatenation */ z = x[3]; /* Bit selection */ z = y[2:3]; /* Bus selection */ z = width(x); /* Width of expression */ Note: in the form y[m:n] the order is MSB:LSB Example: unsigned int 3 y = 4; then y[0] is 0 and y[2] is 1.
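
As a rough illustration of what these operators compute, here is a small Python sketch. The helper names (`take`, `drop`, `concat`, `bit`, `bits`) are hypothetical and not part of Handel-C. Because Python integers carry no width, the width of the right-hand operand of a concatenation has to be passed explicitly.

```python
def take(x, n):
    """x <- n : keep the n least significant bits."""
    return x & ((1 << n) - 1)

def drop(x, n):
    """x \\\\ n : drop the n least significant bits."""
    return x >> n

def concat(hi, lo, lo_width):
    """hi @ lo : place hi above the lo_width bits of lo."""
    return (hi << lo_width) | take(lo, lo_width)

def bit(x, i):
    """x[i] : select a single bit, bit 0 being the least significant."""
    return (x >> i) & 1

def bits(x, msb, lsb):
    """x[msb:lsb] : select a range of bits, written MSB first."""
    return take(x >> lsb, msb - lsb + 1)
```

With the 3-bit value from the slide, `bit(4, 0)` is 0 and `bit(4, 2)` is 1.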

125 © Krithi Ramamritham / Kavi Arya 125 Additional Features & Statements External RAM / ROM ram unsigned int 4 ExtRAM[8] with {offchip = 1, data = {"P01", "P02", "P03", "P04"}, addr = {"P05", "P06", "P07"}, we = {"P08"}, oe = {"P09"}, cs = {"P10"} }; rom unsigned int 4 ExtROM[8] with {offchip = 1, data = {"P01", "P02", "P03", "P04"}, addr = {"P05", "P06", "P07"}, we = {}, oe = {"P09"}, cs = {"P10"} };

126 © Krithi Ramamritham / Kavi Arya 126 Additional Features & Statements Internal RAM / ROM ram unsigned int 8 speicher[256]; rom unsigned int 8 program[] = {1,2,3,4}; unsigned char i; i = 3; speicher[i] = 25; for (i = 0; i < 4; i++) stdout ! program[i];

127 © Krithi Ramamritham / Kavi Arya 127 Recursive Macro Expressions – Example Illustrates the generation of large quantities of hardware from simple macros. Multiplier whose width depends on the parameters of the macro. Starting point for generating large regular hardware structures using macros. Single-cycle long multiplication from single macro: macro expr multiply(x, y) = select(width(x) == 0, 0, multiply(x \\ 1, y << 1) + (x[0] == 1 ? y : 0)); a = multiply (b, c);
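
The recursion in the `multiply` macro can be mirrored in software to see why it unrolls into one adder stage per bit of `x`. This is a Python sketch under the assumption that the width of `x` is passed explicitly (`width_x` is a hypothetical parameter standing in for `width(x)`); in Handel-C the recursion is expanded at compile time into combinational logic rather than executed.

```python
def multiply(x, y, width_x):
    """Software mirror of the recursive multiply macro.

    Each level handles bit 0 of x, then recurses on x with that bit
    dropped and y shifted left by one place.
    """
    if width_x == 0:                      # select(width(x) == 0, 0, ...)
        return 0
    partial = y if (x & 1) == 1 else 0    # (x[0] == 1 ? y : 0)
    return multiply(x >> 1, y << 1, width_x - 1) + partial
```

For a 4-bit `x`, `multiply(13, 11, 4)` expands into four conditional additions.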

128 © Krithi Ramamritham / Kavi Arya 128 Timing

129 © Krithi Ramamritham / Kavi Arya 129 Additional Features & Statements Off-Chip Interface –Input, registered Input, latched Input –Output –Tristate Bus Off-Chip Interface (examples) interface bus_in (int 4) InBus() with {data = {"P1", "P2", "P3", "P4"} }; int 4 x; x = InBus.in; interface bus_out () OutBus (x+y) with {data = {"P11", "P12", "P13", "P14"} };

130 © Krithi Ramamritham / Kavi Arya 130 Parallel Access to Variables Rules of parallelism: same variable must not be accessed from two separate parallel branches. (to avoid resource conflicts on the variables) Actually, the same variable must not be assigned to more than once on the same clock cycle but may be read as often as required (see wires!) Allows some useful and powerful programming techniques. eg: par { a = b; b = a; } // swaps values of a and b in single clock cycle.

131 © Krithi Ramamritham / Kavi Arya 131 Parallel Access to Variables Four place queue: while(1) { par {int x[3]; x[0] = in; x[1] = x[0]; x[2] = x[1]; // values at “out” delayed out = x[2]; // by 4 clock cycles }
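
The key point of these two slides is that every assignment in a `par` block reads the values from before the clock edge. A minimal Python sketch of that behaviour follows; the function names and the zero reset state are assumptions made for illustration.

```python
def par_swap(a, b):
    """par { a = b; b = a; } : both right-hand sides use the old values."""
    return b, a

def clock_tick(state, sample):
    """One clock cycle of the queue: x[0]=in; x[1]=x[0]; x[2]=x[1]; out=x[2]."""
    x0, x1, x2, _out = state
    return (sample, x0, x1, x2)

def run_queue(samples):
    """Feed one sample per clock cycle and record 'out' after each cycle."""
    state = (0, 0, 0, 0)   # assumed reset values
    outputs = []
    for sample in samples:
        state = clock_tick(state, sample)
        outputs.append(state[3])
    return outputs
```

Feeding the samples 1 to 6, the first sample reaches `out` on the fourth clock cycle, which matches the delay of four cycles stated on the slide.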

132 © Krithi Ramamritham / Kavi Arya 132 Time Efficiency of Handel-C Hardware Requirement: Clock period for program to be longer than longest path thru combinatorial logic in whole program. => once FPGA place and route is done, max. clock-rate = 1/longest-path-delay Example: FPGA place and route tools calculate that the longest path delay between flip-flops in a design is 70 ns. The max. clock rate is 1/70 ns = 14.3 MHz. Speed allowed by system: 400 kHz - 100 MHz BUT WHAT IF THIS IS NOT FAST ENOUGH?

133 © Krithi Ramamritham / Kavi Arya 133 Improving Time Efficiency Reducing Logic Depth Avoid multiplication, avoid wide-adders, reduce complex expressions into stages, etc. unsigned 8 x; unsigned 8 y; unsigned 5 temp1; unsigned 4 temp2; par { temp1 = (0@(x<-4)) + (0@(y<-4)); temp2 = (x \\ 4) + (y \\ 4); } x = (temp2+(0@temp1[4])) @ temp1[3:0]; Pipelining => increased latency for higher throughput

134 © Krithi Ramamritham / Kavi Arya 134 Serialisation Multiplication in more than one clock cycle in order to save hardware Algorithm is parametrizable by a compile-time constant macro proc mult_serial(x, y, xy) { macro expr count_width = 5; macro expr steps = 1 << count_width; macro expr bits = width(xy) / steps; unsigned count_width count; par { xy = 0; count = 0; } do par { xy += (0 @ (x <- bits)) * y; x >>= bits; y <<= bits; count++; } while (count != 0); }
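
To make the trade-off concrete, the following Python sketch models `mult_serial` in software: each pass of the loop stands for one clock cycle and multiplies only `bits` bits of `x`. It assumes both operands fit in `width` bits and that `width` is at least `1 << count_width`; the `width` parameter is a stand-in for `width(xy)`.

```python
def mult_serial(x, y, width=32, count_width=5):
    """Software model of the serial multiplier macro on the slide."""
    steps = 1 << count_width           # number of clock cycles used
    bits = width // steps              # bits of x consumed per cycle
    mask = (1 << width) - 1
    xy, count = 0, 0
    while True:
        # One par block: accumulate a partial product, shift both operands.
        xy = (xy + (x & ((1 << bits) - 1)) * y) & mask
        x >>= bits
        y = (y << bits) & mask
        count = (count + 1) % steps    # count wraps like a count_width-bit word
        if count == 0:                 # do ... while (count != 0)
            return xy
```

With `count_width=5` and a 32-bit result the multiplier handles one bit per cycle over 32 cycles; with `count_width=3` it handles four bits per cycle over 8 cycles, which needs a wider partial multiplier but fewer cycles.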

135 © Krithi Ramamritham / Kavi Arya 135 Serialisation Gatecount for a 32-bit multiplication

136 © Krithi Ramamritham / Kavi Arya 136 Plan Embedded Systems New Approaches to building ESW –New paradigms: Lava, Handel-C –Examples + “Engineering Returns to Software” –Build a RISC processor in 48hrs –Advantages of reconfigurable hardware. Real-time support for ESW

137 © Krithi Ramamritham / Kavi Arya 137 RISC-Processor Features: –16 instructions – 4 bit I/O Ports – one accumulator – Program memory (16x8 ROM) – Data memory (16x4 RAM) Problem: Execute a program stored in ROM to calculate the first few members of the Fibonacci number sequence. 1, 2, 3, 5, 8, 13, 21, 34, … fib(n) = 1 if n = 0 or n = 1; fib(n) = fib(n-1) + fib(n-2) if n >= 2

138 © Krithi Ramamritham / Kavi Arya 138 RISC-Processor Instruction Set

139 © Krithi Ramamritham / Kavi Arya 139 RISC-Processor (cont.) Program : chanin input; chanout output; // Parameterisation #define dw 32 /* Data width */ #define opcw 4 /* Op-code width */ #define oprw 4 /* Operand width */ #define rom_aw 4 /* Width of ROM address bus */ #define ram_aw 4 /* Width of RAM address bus */ // The opcodes #define HALT 0 #define LOAD 1 #define LOADI 2 #define STORE 3 #define ADD 4 #define SUB 5 #define JUMP 6 #define JUMPNZ 7 #define INPUT 8 #define OUTPUT 9 // The assembler macro #define _asm_(opc, opr) (opc + (opr << opcw))

140 © Krithi Ramamritham / Kavi Arya 140 RISC-Processor(cont.) I/O Interface unsigned int dw output; interface bus_clock_in (unsigned int 1) reset() with {data = reset_pin}; interface bus_in (unsigned int dw) input() with {data = in_pins}; interface bus_out () out(output) with {data = out_pins}; Definition of available opcode #define HLD 0 #define NOP 1 #define OUT 2 #define IN 3... #define SRA 15

141 © Krithi Ramamritham / Kavi Arya 141 RISC-Processor Declaration of FPGA and Pinning set family = Altera10K; set part = "EPF10K70RC240-3"; set clock = external "91"; macro expr in_pins = {"38", "83", "101", "148"}; macro expr out_pins = {"153", "202", "218", "19"}; macro expr reset_pin = {"45"}; Defining Parameters #define dw 4 /* Data width */ #define opcw 4 /* Op-code width */ #define oprw 4 /* Operand width */ #define rom_aw 4 /* Width of ROM addr bus */ #define ram_aw 4 /* Width of RAM addr bus */

142 © Krithi Ramamritham / Kavi Arya 142 RISC-Processor (cont.) Program (cont): // Rom program data rom unsigned int undefined program[] = { _asm_(LOADI, 1), /* 0 */ /* Get a one */ _asm_(STORE, 3),/* 1 */ /* Store this */ _asm_(STORE, 1), /* 2 */ _asm_(INPUT, 0), /* 3 */ /* Read value from user */ _asm_(STORE, 2), /* 4 */ /* Store this */ _asm_(LOAD, 1), /* 5 */ /* Loop entry point */ _asm_(ADD, 0), /* 6 */ /* Make a fib number */ _asm_(STORE, 0), /* 7 */ /* Store it */ _asm_(OUTPUT, 0), /* 8 */ /* Output it */ _asm_(ADD, 1), /* 9 */ /* Make a fib number */ _asm_(STORE, 1), /* a */ /* Store it */ _asm_(OUTPUT, 0), /* b */ /* Output it */ _asm_(LOAD, 2), /* c */ /* Decrement counter */ _asm_(SUB, 3), /* d */ _asm_(JUMPNZ, 4), /* e */ /* Repeat if not zero */ _asm_(HALT, 0) /* f */ };

143 © Krithi Ramamritham / Kavi Arya 143 RISC-Processor (cont.) Program (cont): /* RAM for processor */ ram unsigned int dw data[1 << ram_aw]; /* Processor registers */ unsigned int rom_aw pc; /* Program counter */ unsigned int (opcw+oprw) ir; /* Instruction register */ unsigned int dw x; /* Accumulator */ /* Macros to extract opcode and operand fields */ #define opcode (ir <- opcw) #define operand (ir \\ opcw)

144 © Krithi Ramamritham / Kavi Arya 144 RISC-Processor (cont.) Program (cont): /* Main program */ void main(void) { pc = 0; // Processor loop do { // fetch par { ir = program[pc]; pc = pc + 1; } /* === MAIN DECODE/EXECUTE ===*/ } while (opcode != HALT); } /* main program */

145 © Krithi Ramamritham / Kavi Arya 145 RISC-Processor (cont.) Program (cont): // decode and execute switch (opcode) { case LOAD : x = data[operand<-ram_aw]; break; case LOADI : x = 0 @ operand; break; case STORE : data[operand<-ram_aw] = x; break; case ADD : x = x+data[operand<-ram_aw]; break; case SUB : x = x-data[operand<-ram_aw]; break; case JUMP : pc = operand<-rom_aw; break; case JUMPNZ : if (x!=0) pc=operand<-rom_aw; break; case INPUT : input ? x; break; case OUTPUT : output ! x; break; default : while(1) delay; // unknown opcode }
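
The fetch/decode/execute structure on these slides can be checked with a small software simulator. The sketch below is written in Python rather than Handel-C and rests on several assumptions: RAM starts zeroed (the slides leave it undefined), the channels are replaced by Python lists, `HALT` simply ends the run, and the names `asm`, `PROGRAM` and `run` are illustrative. The ROM contents are copied from the program slide.

```python
OPCW = 4  # op-code width, as on the slides

(HALT, LOAD, LOADI, STORE, ADD, SUB,
 JUMP, JUMPNZ, INPUT, OUTPUT) = range(10)

def asm(opc, opr):
    """Mirror of the _asm_ macro: opcode in the low bits, operand above."""
    return opc + (opr << OPCW)

PROGRAM = [
    asm(LOADI, 1), asm(STORE, 3), asm(STORE, 1), asm(INPUT, 0),
    asm(STORE, 2), asm(LOAD, 1), asm(ADD, 0), asm(STORE, 0),
    asm(OUTPUT, 0), asm(ADD, 1), asm(STORE, 1), asm(OUTPUT, 0),
    asm(LOAD, 2), asm(SUB, 3), asm(JUMPNZ, 4), asm(HALT, 0),
]

def run(program, inputs, dw=32, ram_aw=4, rom_aw=4):
    """Execute the ROM program and return the list of output values."""
    word = (1 << dw) - 1
    data = [0] * (1 << ram_aw)          # assumption: RAM starts zeroed
    pending = iter(inputs)
    outputs = []
    pc, x = 0, 0
    while True:
        ir = program[pc]                # fetch
        pc = (pc + 1) % (1 << rom_aw)
        opcode = ir & ((1 << OPCW) - 1) # ir <- opcw
        operand = ir >> OPCW            # ir \\ opcw
        if opcode == HALT:
            return outputs
        elif opcode == LOAD:
            x = data[operand]
        elif opcode == LOADI:
            x = operand & word
        elif opcode == STORE:
            data[operand] = x
        elif opcode == ADD:
            x = (x + data[operand]) & word
        elif opcode == SUB:
            x = (x - data[operand]) & word
        elif opcode == JUMP:
            pc = operand
        elif opcode == JUMPNZ:
            if x != 0:
                pc = operand
        elif opcode == INPUT:
            x = next(pending) & word
        elif opcode == OUTPUT:
            outputs.append(x)
        else:
            raise ValueError("unknown opcode %d" % opcode)
```

Each pass through the loop body of the ROM program emits two Fibonacci numbers, and the input value sets the number of passes. Note that `JUMPNZ` targets address 4, so the decremented counter still in the accumulator is stored back before the next pass.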

146 © Krithi Ramamritham / Kavi Arya 146 RISC-Processor (cont.) The Final Program! (Don’t worry if you can’t read it - fits on a page!!)

147 © Krithi Ramamritham / Kavi Arya 147 Simulation & debugging The simulator is integrated into the compiler. Executing a cycle-based simulation. Variables are traceable at any clock cycle. Port interface will be replaced by standard I/O. Handel-C simulator supports debugging at any clock-cycle. Highlighting of characteristic Values e.g. Area of any program line.

148 © Krithi Ramamritham / Kavi Arya 148 Some Representative Work “Customising Graphics Applications: Techniques & Programming Interface” Henry Styles & Wayne Luk, Proceedings of IEEE Symposium on Field Programmable Custom Computing Machines, IEEE Computer Society Press, 2000. Exploit custom data-formats and datapath widths to optimise graphics operations such as texture mapping & hidden-surface removal. Discusses techniques for balancing graphics pipeline Customised architectures captured in Handel-C compiled for Xilinx Virtex FPGAs Handel-C API based on OpenGL standard for automatic speedup of graphics applications, including Quake-2 action game.

149 © Krithi Ramamritham / Kavi Arya 149 The Graphics Pipeline

150 © Krithi Ramamritham / Kavi Arya 150 Performance Case Studies Geometric Visualisation
Implementation medium | Clock rate (MHz) | Frame rate (FPS) | Cost
Software on PC | 400 | 24 | $1,000
Xilinx XCV1000 | 40 | 41 | $4,000
Nvidia TNT2 Ultra | 170 | 55 | $200
Nvidia is a 3-D graphics chipset, i.e. a specialised graphics ASIC. Chart => FPGA platform fast approaching performance of dedicated graphics ASIC for general-purpose graphics applications

151 © Krithi Ramamritham / Kavi Arya 151 Performance Case Studies Infrared Simulation requires custom pixel format not supported by graphics ASICs
Implementation medium | Clock rate (MHz) | Frame rate (FPS) | Cost
Software on PC | 400 | 96 | $1,000
Xilinx XCV1000 | 40 | 330 | $4,000
SGI Onyx2 Reality | 180 | 2750 | $180,000
Onyx contains two 180 MHz MIPS processors, two Geometry Engine processors and two rasteriser ASICs, with a memory bandwidth of 6.4 GB/sec (i.e. 10X cost & mem. b/w of FPGA)

152 © Krithi Ramamritham / Kavi Arya 152 Some Observations FPGA renderer is a low-cost platform for custom graphics applications Development time of a customised FPGA renderer comparable to optimised software => effective to use a reconfigurable platform Good for reconfigurable designs where ASIC is not available or too expensive Useful in exploring desirable algorithms and architectures for ASICs Hardware renderer may be customised to maximise performance for each application

153 © Krithi Ramamritham / Kavi Arya 153 Some Features of the Rapid Prototyping Board Full length 32 bit PCI card Virtex XCV1000: 1,000,000 system gates, 131 kBit Block RAM, 393 kBit SelectRAM Programmable clock 400 kHz to 100 MHz 4 banks of fast asynchronous 32 bit wide SRAM, each 2 Mbytes PCI interface: 32 bit, 33 MHz, 132 Mbytes/sec burst 2 x PMC sites for VME grade I/O & processing modules 50 pin Aux I/O, 8 LEDs

154 © Krithi Ramamritham / Kavi Arya 154 Summary Cost of silicon is falling & Products are getting more complex & Time-to-market shrinking rapidly & shortage of trained engineers & cost of programmer time is major constraint => Software based, high-level approaches to solving problems become increasingly attractive. New generation of languages let us build systems at high level of abstraction. High-density FPGAs and SoCs allow complex designs to be rapidly prototyped => reduce the development cycle of new technology – perhaps even to deploy final product as “soft cores”. Broader understanding demanded from system designer – need “Renaissance Engineer” with equal understanding of hardware and software.

155 © Krithi Ramamritham / Kavi Arya 155 Plan Embedded Systems New Approaches to building ESW Real-Time Support –Special Characteristics of Real-Time Systems –Real-Time Constraints –Canonical Real-Time Applications –Scheduling in Real-time systems –Operating System Approaches

156 © Krithi Ramamritham / Kavi Arya 156 What is “real” about real-time?
computer world (e.g., PC) | real world (e.g., industrial system, airplane)
average response for user, interactive | events occur in environment at own speed
occasionally longer reaction: user annoyed | reaction too slow: deadline miss; damage, pot. loss of human life
computer controls speed of user | computer must follow speed of environment
“computer time” | “real-time”

157 © Krithi Ramamritham / Kavi Arya 157 Real-Time Systems A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals. [Diagram: event -> real-time computing system (I/O data) -> action, drawn along a time axis]

158 © Krithi Ramamritham / Kavi Arya 158 CLIENT SERVER Flight Avionics Constraints on responses to pilot inputs, aircraft state updates

159 © Krithi Ramamritham / Kavi Arya 159 Constraints: –Keep plastic at proper temperature (liquid, but not boiling) –Control injector solenoid (make sure that the motion of the piston reaches the end of its travel)

160 © Krithi Ramamritham / Kavi Arya 160 Real-Time Systems: Properties of Interest Safety : Nothing bad will happen. Liveness : Something good will happen. Timeliness : Things will happen on time -- by their deadlines, periodically,....

161 © Krithi Ramamritham / Kavi Arya 161 In a Real-Time System…. Correctness of results depends on value and its time of delivery: a correct value delivered too late is incorrect. e.g., traffic light: light must be green when crossing, not enough before. Real-time: (timely) reactions to events as they occur, at their pace; (real-time) system (internal) time has the same time scale as environment (external) time.

162 © Krithi Ramamritham / Kavi Arya 162 Performance Metrics in Real-Time Systems Beyond minimizing response times and increasing the throughput: – achieve timeliness. More precisely, how well can we predict that deadlines will be met?

163 © Krithi Ramamritham / Kavi Arya 163 Types of RT Systems Dimensions along which real-time activities can be categorized: how tight are the deadlines ? --deadlines are tight when the laxity (deadline -- computation time) is small. how strict are the deadlines ? what is the value of executing an activity after its deadline? what are the characteristics of the environment ? how static or dynamic must the system be? Designers want their real-time system to be fast, predictable, reliable, flexible.

164 © Krithi Ramamritham / Kavi Arya 164 Hard, soft, firm Hard: result useless or dangerous if deadline exceeded. Soft: result of some (lower) value if deadline exceeded. Firm: value drops to zero at deadline. Deadline intervals: result required not later and not before. [Plot: value of result against time, with the deadline (dl) marked and separate curves for hard and soft deadlines]

165 © Krithi Ramamritham / Kavi Arya 165 Examples Hard real time systems –Aircraft –Airport landing services –Nuclear Power Stations –Chemical Plants –Life support systems Soft real time systems –Multimedia –Interactive video games

166 © Krithi Ramamritham / Kavi Arya 166 Real-Time: Items and Terms Task –program, perform service, functionality –requires resources, e.g., execution time Deadline –specified time for completion of, e.g., task –time interval or absolute point in time –value of result may depend on completion time

167 © Krithi Ramamritham / Kavi Arya 167 Plan Special Characteristics of Real-Time Systems Real-Time Constraints Canonical Real-Time Applications Scheduling in Real-time systems Operating System Approaches

168 © Krithi Ramamritham / Kavi Arya 168 Timing Constraints Real-time means to be in time --- how do we know something is “in time”? how do we express that? Timing constraints are used to specify temporal correctness e.g., “finish assignment by 2pm”, “be at station before train departs”. A system is said to be (temporally) feasible, if it meets all specified timing constraints. Timing constraints do not come out of thin air: design process identifies events, derives, models, and finally specifies timing constraints

169 © Krithi Ramamritham / Kavi Arya 169 Periodic –activity occurs repeatedly –e.g., to monitor environment values, temperature, etc. time period periodic

170 © Krithi Ramamritham / Kavi Arya 170 Aperiodic –can occur any time –no arrival pattern given time aperiodic

171 © Krithi Ramamritham / Kavi Arya 171 Sporadic –can occur any time, but –minimum time between arrivals time mint sporadic

172 © Krithi Ramamritham / Kavi Arya 172 Who initiates (triggers) actions? Example: Chemical process –controlled so that temperature stays below danger level –warning is triggered before danger point …… so that cooling can still occur Two possibilities: –action whenever temp rises above warn: event triggered –look every int time intervals; action if the measured temp is above warn: time triggered

173 © Krithi Ramamritham / Kavi Arya 173 TT ET time t

174 © Krithi Ramamritham / Kavi Arya 174 TT ET time t

175 © Krithi Ramamritham / Kavi Arya 175 ET vs TT Time triggered –Stable number of invocations Event triggered –Only invoked when needed –High number of invocation and computation demands if value changes frequently

176 © Krithi Ramamritham / Kavi Arya 176 Slow down the environment? Importance –which parts of the system are important? –importance can change over time e.g., fuel efficiency during emergency landing Flow control who has control over speed of processing, who can slow partner down? –environment –computer system RT: environment cannot be slowed down

177 © Krithi Ramamritham / Kavi Arya 177 Other Issues to worry about Meet requirements -- some activities may run only: –after others have completed - precedence constraints –while others are not running - mutual exclusion –within certain times - temporal constraints Scheduling –planning of activities, such that required timing is kept Allocation –where should a task execute?

178 © Krithi Ramamritham / Kavi Arya 178 Plan Special Characteristics of Real-Time Systems Real-Time Constraints Canonical Real-Time Applications Scheduling in Real-time systems Operating System Approaches

179 © Krithi Ramamritham / Kavi Arya 179 A Typical Real time system Temperature sensor CPU Memory Input port Output port Heater

180 © Krithi Ramamritham / Kavi Arya 180 Code for example While true do { read temperature sensor if temperature too high then turn off heater else if temperature too low then turn on heater else nothing }

181 © Krithi Ramamritham / Kavi Arya 181 Comment on code Code is by Polling device (temperature sensor) Code is in form of infinite loop No other tasks can be executed Suitable for dedicated system or sub-system only
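
The polling loop can be sketched in Python as follows. The thresholds, the function names and the list of readings are illustrative assumptions; a real controller would read the sensor port inside an endless loop, which is exactly why no other task can run.

```python
def control_step(temperature, low, high, heater_on):
    """Decide the heater state for one sensor reading."""
    if temperature > high:      # temperature too high: turn heater off
        return False
    if temperature < low:       # temperature too low: turn heater on
        return True
    return heater_on            # otherwise do nothing

def poll(readings, low=18.0, high=22.0):
    """Polling loop over a finite list of readings (stand-in for 'while true')."""
    heater_on = False
    trace = []
    for temperature in readings:
        heater_on = control_step(temperature, low, high, heater_on)
        trace.append(heater_on)
    return trace
```

Separating the decision (`control_step`) from the loop keeps the control rule testable without hardware.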

182 © Krithi Ramamritham / Kavi Arya 182 Extended polling example Computer Temperature Sensor 1 Temperature Sensor 2 Temperature Sensor 3 Temperature Sensor 4 Heater 1 Heater 2 Heater 3 Heater 4 Task 1 Task 2 Task 3 Task 4 Conceptual link

183 © Krithi Ramamritham / Kavi Arya 183 Polling Problems –Arranging task priorities –Round robin is usual within a priority level –Urgent tasks are delayed

184 © Krithi Ramamritham / Kavi Arya 184 Interrupt driven systems Advantages –Fast –Little delay for high priority tasks Disadvantages –Programming –Code difficult to debug –Code difficult to maintain

185 © Krithi Ramamritham / Kavi Arya 185 How can we monitor a sensor every 100 ms Initiate a task T1 to handle the sensor T1: Loop {Do sensor task T2 Schedule T2 for +100 ms } Note that the time could be relative (as here) or could be an actual time - there would be slight differences between the methods, due to the additional time to execute the code.

186 © Krithi Ramamritham / Kavi Arya 186 An alternative… Initiate a task to handle the sensor T1 T1: Do sensor task T2 Repeat {Schedule T2 for n * 100 ms n:=n+1} There are some subtleties here...
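
One of the subtleties is drift. The Python sketch below compares the two schemes under the assumption that the sensor task takes a fixed `exec_time` (3 ms here, an invented figure) before the next activation is requested; the function names are illustrative.

```python
def relative_releases(n, period=100, exec_time=3):
    """Release times when each run schedules the next one for 'now + period'."""
    releases, t = [], 0
    for _ in range(n):
        releases.append(t)
        t = t + exec_time + period   # the request is issued after the work is done
    return releases

def absolute_releases(n, period=100):
    """Release times when run k is scheduled for the absolute time k * period."""
    return [k * period for k in range(n)]
```

With relative scheduling the execution time accumulates in every period, so the fourth release happens at 309 ms instead of 300 ms. With absolute times the releases stay on the 100 ms grid.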

187 © Krithi Ramamritham / Kavi Arya 187 Clock, interrupts, tasks ClockProcessor Interrupts Task 1Task 2Task 3Task 4 Job/Task queue Examines Tasks schedule events using the clock...

188 © Krithi Ramamritham / Kavi Arya 188 Plan Special Characteristics of Real-Time Systems Real-Time Constraints Canonical Real-Time Applications Scheduling in Real-time systems Operating System Approaches

189 © Krithi Ramamritham / Kavi Arya 189 Why is scheduling important? Definition : A real-time system is a system that reacts to events in the environment by performing predefined actions within specified time intervals.

190 © Krithi Ramamritham / Kavi Arya 190 Schedulability analysis a.k.a. feasibility checking: check whether tasks will meet their timing constraints.

191 © Krithi Ramamritham / Kavi Arya 191 Scheduling Paradigms Four scheduling paradigms emerge, depending on whether a system performs schedulability analysis if it does, – whether it is done statically or dynamically – whether the result of the analysis itself produces a schedule or plan according to which tasks are dispatched at run-time.

192 © Krithi Ramamritham / Kavi Arya 192 1. Static Table-Driven Approaches Perform static schedulability analysis by checking if a schedule is derivable. The resulting schedule (table) identifies the start times of each task. Applicable to tasks that are periodic (or have been transformed into periodic tasks by well known techniques). This is highly predictable but highly inflexible. Any change to the tasks and their characteristics may require a complete overhaul of the table.

193 © Krithi Ramamritham / Kavi Arya 193 2. Static Priority Driven Preemptive Approaches Tasks have -- systematically assigned -- static priorities. Priorities take timing constraints into account: –e.g. RMA: Rate-Monotonic ---- the shorter the period, the higher the priority. –e.g. EDF: Earliest-deadline-first --- the earlier the deadline, the higher the priority. Perform static schedulability analysis but no explicit schedule is constructed –RMA - Sum of task Utilizations <= n(2^(1/n) - 1), which tends to ln 2 (about 0.69) as the number of tasks n grows. –EDF - Sum of task Utilizations <= 1 At run-time, tasks are executed highest-priority-first, with preemptive-resume policy. When resources are used, need to compute worst-case blocking times. Task utilization = computation-time / Period

194 © Krithi Ramamritham / Kavi Arya 194 Static Priorities: Rate Monotonic Analysis presented by Liu and Layland in 1973 Assumptions Tasks are periodic with deadline equal to period. Release time of tasks is the period start time. Tasks do not suspend themselves Tasks have bounded execution time Tasks are independent Scheduling overhead negligible

195 © Krithi Ramamritham / Kavi Arya 195 RMA: Design Time vs. Run Time At Design Time: Task priorities are assigned according to their periods; shorter period means higher priority Schedulability test Taskset is schedulable if $\sum_{i=1}^{n} C_i / T_i \le n(2^{1/n} - 1)$, where $C_i$ is the computation time and $T_i$ the period of task $i$. Very simple test, easy to implement. Run-time The ready task with the highest priority is executed.

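196 © Krithi Ramamritham / Kavi Arya 196 RMA: Example taskset: t1, t2, t3, t4, each given as (period, computation time) t1 = (3, 1) t2 = (6, 1) t3 = (5, 1) t4 = (10, 2) The schedulability test: 1/3 + 1/6 + 1/5 + 2/10 ≤ 4 (2^(1/4) - 1) ? 0.9 ≤ 0.757 ? No: the taskset fails the utilization test, so this test alone does not establish schedulability.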
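
The arithmetic of the utilization test is easy to reproduce. The Python sketch below uses the (period, computation time) pairs from the example; the function names are illustrative.

```python
def utilization(tasks):
    """Sum of computation-time / period over all (period, computation) pairs."""
    return sum(c / t for t, c in tasks)

def rm_bound(n):
    """Liu and Layland utilization bound n(2^(1/n) - 1) for n tasks."""
    return n * (2 ** (1 / n) - 1)

def passes_rm_test(tasks):
    """Sufficient (not necessary) rate-monotonic schedulability test."""
    return utilization(tasks) <= rm_bound(len(tasks))

TASKS = [(3, 1), (6, 1), (5, 1), (10, 2)]   # t1, t2, t3, t4
```

For this taskset the utilization is 0.9 and the bound for four tasks is about 0.757, so `passes_rm_test(TASKS)` is false. Because the test is only sufficient, a failure leaves the question of schedulability open.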

197 © Krithi Ramamritham / Kavi Arya 197 RMA… A schedulability test is Sufficient: there may exist tasksets that fail the test, but are schedulable Necessary: tasksets that fail are (definitely) not schedulable The RMA schedulability test is sufficient, but not necessary. e.g., when periods are harmonic, i.e., multiples of each other, utilization can be 1.

198 © Krithi Ramamritham / Kavi Arya 198 Exact RMA by Joseph and Pandya, based on critical instance analysis (longest response time of task, when it is released at same time as all higher priority tasks) What is happening at the critical instance? Let T1 be the highest priority task. Its response time R1 = C1 since it cannot be preempted. What about T2? R2 = C2 + delays due to interruptions by T1. Since T1 has higher priority, it has shorter period. That means it will interrupt T2 at least once, probably more often. Assume T1 has half the period of T2: R2 = C2 + 2 x C1

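199 © Krithi Ramamritham / Kavi Arya 199 Exact RMA…. In general: $R_i^{n+1} = C_i + \sum_{j \in hp(i)} \lceil R_i^n / T_j \rceil C_j$, iterated from $R_i^0 = C_i$ until $R_i^{n+1} = R_i^n$. $R_i^n$ denotes the n-th iteration of the response time of task i; hp(i) is the set of tasks with higher priority than task i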

200 © Krithi Ramamritham / Kavi Arya 200 Example - Exact Analysis Let us look at our example, which failed the pure rate monotonic test although we can schedule it. Exact analysis says so. R1 = 1; easy. R3, second highest priority task: hp(t3) = {T1}, R3 = 2

201 © Krithi Ramamritham / Kavi Arya 201 R2, third highest priority task: hp(t2) = {T1, T3}, R2 = 3

202 © Krithi Ramamritham / Kavi Arya 202 R4, lowest priority task: hp(t4) = {T1, T3, T2}, R4 = 9. Response times of first instances of all tasks < their periods => taskset feasible under RM scheduling
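
The response times quoted on these slides can be reproduced with a short fixed-point iteration. This Python sketch assumes deadlines equal periods and tasks given as (period, computation time) pairs; the function names are illustrative.

```python
def response_time(task, higher):
    """Worst-case response time of 'task' given its higher-priority tasks.

    Returns None if the iteration exceeds the period (deadline miss).
    """
    period, c = task
    r = c
    while True:
        # Each higher-priority task j preempts ceil(r / T_j) times.
        interference = sum(-(-r // t_j) * c_j for t_j, c_j in higher)
        nxt = c + interference
        if nxt == r:
            return r
        if nxt > period:
            return None
        r = nxt

def analyse(tasks):
    """Rate-monotonic order (shorter period first), then exact analysis."""
    ordered = sorted(tasks)
    return [(task, response_time(task, ordered[:i]))
            for i, task in enumerate(ordered)]
```

For the example taskset the iteration for the lowest-priority task passes through 2, 5, 6, 7 and settles at 9, which is within its period of 10.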

203 © Krithi Ramamritham / Kavi Arya 203 3. Dynamic Planning based Approaches Feasibility is checked at run-time -- a dynamically arriving task is accepted only if it is feasible to meet its deadline. –Such a task is said to be guaranteed to meet its time constraints One of the results of the feasibility analysis can be a schedule or plan that determines start times Has the flexibility of dynamic approaches with some of the predictability of static approaches If feasibility check is done sufficiently ahead of the deadline, time is available to take alternative actions.

204 © Krithi Ramamritham / Kavi Arya 204 4. Dynamic Best-effort Approaches The system tries to do its best to meet deadlines. But since no guarantees are provided, a task may be aborted during its execution. Until the deadline arrives, or until the task finishes, whichever comes first, one does not know whether a timing constraint will be met. Permits any reasonable scheduling approach, EDF, Highest-priority,…

205 Cyclic scheduling
Ubiquitous in large-scale dynamic real-time systems.
Combination of both table-driven scheduling and priority scheduling.
Tasks are assigned one of a set of harmonic periods.
Within each period, tasks are dispatched according to a table that just lists the order in which the tasks execute.
Slightly more flexible than the table-driven approach: no start times are specified.
In many actual applications, rather than making worst-case assumptions, confidence in a cyclic schedule is obtained by very elaborate and extensive simulations of typical scenarios.
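
The dispatching rule described on this slide can be sketched as follows. The task names, periods, and the function `cyclic_dispatch` are assumptions made for illustration; the point is only that the table fixes the order of execution, not the start times, and that harmonic periods decide which tasks are due in each frame.

```python
def cyclic_dispatch(order_table, periods, base_period, frames):
    """List, for each frame, the tasks to run in table order (illustrative).

    order_table: task names in the fixed order the table prescribes.
    periods:     task name -> period, each a multiple of base_period
                 (harmonic periods assumed).
    Returns one list of task names per frame of length base_period.
    """
    schedule = []
    for k in range(frames):
        t = k * base_period  # start of frame k
        # A task is due in this frame if its period divides the frame start.
        schedule.append([task for task in order_table if t % periods[task] == 0])
    return schedule
```

With `order_table = ["sense", "control", "log"]`, periods of 10, 20, and 40 time units, and a base period of 10, four frames give `[["sense", "control", "log"], ["sense"], ["sense", "control"], ["sense"]]`.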

206 Plan
Special Characteristics of Real-Time Systems
Real-Time Constraints
Canonical Real-Time Applications
Scheduling in Real-time systems
Operating System Approaches

207 Real-Time Operating Systems
Support process management and synchronization, memory management, interprocess communication, and I/O.
Three categories of real-time operating systems:
– small, proprietary kernels, e.g. VRTX32, pSOS, VxWorks
– real-time extensions to commercial timesharing operating systems, e.g. RT-Linux, RT-NT
– research kernels, e.g. MARS, ARTS, Spring, Polis

208 Real-Time Applications Spectrum
[Figure: a spectrum of real-time applications from hard to soft, showing real-time operating systems (VxWorks, Lynx, QNX, ...) and general-purpose operating systems (Windows NT, Windows CE); Intime, HyperKernel, and RTX also appear on the spectrum.]

210 Embedded (Commercial) Kernels
Stripped down and optimized versions of timesharing operating systems.
Intended to be fast:
– a fast context switch
– external interrupts recognized quickly
– the ability to lock code and data in memory
– special sequential files that can accumulate data at a fast rate
To deal with timing requirements:
– a real-time clock with special alarms and timeouts
– bounded execution time for most primitives
– real-time queuing disciplines such as earliest deadline first
– primitives to delay/suspend/resume execution
– priority-driven best-effort scheduling mechanism or a table-driven mechanism
Communication and synchronization via mailboxes, events, signals, and semaphores.

211 Real-Time Extensions to General Purpose Operating Systems
E.g., extending LINUX to RT-LINUX, NT to RT-NT
Advantage:
– Based on a set of familiar interfaces (standards) that speed development and facilitate portability.
Disadvantage:
– Too many basic and inappropriate underlying assumptions still exist.

212 Using General Purpose Operating Systems
GPOSs offer some capabilities useful for real-time system builders.
RT applications can obtain leverage from existing development tools and applications.
Some GPOSs are accepted as de facto standards for industrial applications.

213 Real Time Linux approaches
1. Modify the current Linux kernel to handle RT constraints
– Used by KURT
2. Make the standard Linux kernel run as a task of the real-time kernel
– Used by RT-Linux, RTAI

214 Modifying Linux kernel
Advantages
– Most problems, such as interrupt handling, already solved
– Less initial labor
Disadvantages
– No guaranteed performance
– RT tasks don't always have precedence over non-RT tasks

215 Running Linux as a process of a second RT kernel
Advantages
– Can make hard real-time guarantees
– Easy to implement a new scheduler
Disadvantages
– Initial port difficult; must know a lot about the underlying hardware
– Running a small real-time executive is not a substitute for a full-fledged RTOS

216 RTLinux is an Aid.. and a Deterrent: Flight Simulator
RTLinux features well suited for the application …
– Concept of hard and soft interrupts enabled co-existence of real-time and non-real-time drivers
– Data in real-time space is shareable across threads
– Transfer of data between real-time and non-real-time threads through FIFOs
– Familiar POSIX-based development environment
… but there were scenarios when RTLinux had to be enhanced
– Absence of a mechanism to compare times in real-time and non-real-time space; need to measure this difference
– No dynamic allocation of memory in real-time space; as a result, numbers have to be hard-coded in the implementation
– System freezes under high load

217 GPOS -- for RT applications?
Scheduling and priorities
– Preemptive, priority-based scheduling: non-degradable priorities, priority adjustment
– No priority inheritance
– No priority tracking
– Limited number of priorities
– No explicit support for guaranteeing timing constraints

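218 Thread Priority = Process class + level
Thread levels: Time-critical, Highest, Above Normal, Normal, Below Normal, Lowest, Idle
Real-time class: Time-critical 31, Highest 26, Above Normal 25, Normal 24, Below Normal 23, Lowest 22, Idle 16
Dynamic classes (Time-critical 15 and Idle 1 in every dynamic class):
– High class: Highest 15, Above Normal 14, Normal 13, Below Normal 12, Lowest 11
– Normal class: Highest 10, Above Normal 9, Normal 8, Below Normal 7, Lowest 6
– Idle class: Highest 6, Above Normal 5, Normal 4, Below Normal 3, Lowest 2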
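
The "class + level" rule on this slide can be expressed as a small lookup. This is a sketch of the arithmetic only, using the priority figures from the slide; the names `CLASS_BASE`, `LEVEL_OFFSET`, and `thread_priority` are hypothetical and are not an operating-system API, and the sketch ignores any dynamic priority boosts.

```python
# Base priority of each process class (figures taken from the slide).
CLASS_BASE = {"realtime": 24, "high": 13, "normal": 8, "idle": 4}

# Offset of each relative thread level from the class base.
LEVEL_OFFSET = {"highest": 2, "above_normal": 1, "normal": 0,
                "below_normal": -1, "lowest": -2}

def thread_priority(process_class, level):
    """Return the base thread priority for a process class and thread level."""
    realtime = process_class == "realtime"
    # Time-critical and idle levels map to fixed values, not class offsets.
    if level == "time_critical":
        return 31 if realtime else 15
    if level == "idle":
        return 16 if realtime else 1
    return CLASS_BASE[process_class] + LEVEL_OFFSET[level]
```

For example, `thread_priority("realtime", "highest")` gives 26 and `thread_priority("high", "lowest")` gives 11, matching the figures on the slide.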

219 Scheduling Priorities
Threads scheduled by executive.
Priority-based preemptive scheduling.
– Interrupts
– Deferred Procedure Calls (DPC)
– System and user-level threads

220 GPOS -- for RT applications? (contd.)
Quick recognition of external events
– Priority inversion due to Deferred Procedure Calls (DPC)
I/O management
Timer granularity and accuracy
– High-resolution counter with a resolution of 0.8 µsec
– Periodic and one-shot timers with a resolution of 1 msec
Rich set of synchronization objects and communication mechanisms
– Object queues are FIFO

221 Research Operating Systems
MARS – static scheduling
ARTS – static priority scheduling
Spring – dynamic guarantees

222 MARS -- TU, Vienna (Kopetz)
Offers support for controlling a distributed application based entirely on time events (rather than asynchronous events) from the environment.
A priori static analysis to demonstrate that all the timing requirements are met.
Uses flow control on the maximum number of events that the system handles.
Based on the time-driven model -- assume everything is periodic.
Static table-driven scheduling approach
A hardware-based clock synchronization algorithm
A TDMA-like protocol to guarantee timely message delivery

223 ARTS -- CMU (Tokuda, et al)
The ARTS kernel provides a distributed real-time computing environment.
Works in conjunction with the static priority-driven preemptive scheduling paradigm.
Kernel is tied to various tools that a priori analyze schedulability.
The kernel supports the notion of real-time objects and real-time threads.
Each real-time object is time encapsulated via a time fence mechanism: the time fence provides a run-time check that ensures that the slack time is greater than the worst-case execution time for an object invocation.
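
The time fence check described above amounts to a comparison made before an object operation is allowed to start. The sketch below is a hypothetical illustration of that idea in Python, not the ARTS kernel interface: `TimeFenceViolation` and `invoke_with_time_fence` are invented names, and slack is assumed here to be the time remaining until the caller's deadline.

```python
class TimeFenceViolation(Exception):
    """Raised when an invocation cannot be allowed to start (hypothetical)."""

def invoke_with_time_fence(operation, wcet, now, deadline):
    """Run operation() only if the remaining slack exceeds its worst case.

    wcet:     worst-case execution time of the operation being invoked.
    now:      current time; deadline: the caller's deadline (assumed inputs).
    """
    slack = deadline - now
    if slack <= wcet:
        # The fence trips before any work is done, so the caller can react.
        raise TimeFenceViolation(f"slack {slack} does not exceed WCET {wcet}")
    return operation()
```

Calling it with `wcet=3, now=0, deadline=10` runs the operation, while `wcet=5, now=8, deadline=10` raises `TimeFenceViolation` because only 2 time units of slack remain.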

224 SPRING – Umass. (Ramamritham & Stankovic)
Real-time support for multiprocessors and distributed systems
Strives for a more flexible combination of off-line and on-line techniques
– Safety-critical tasks are dealt with via static table-driven scheduling.
– Dynamic planning based scheduling of tasks that arrive dynamically.
Takes tasks' time and resource constraints into account and avoids the need to a priori compute worst-case blocking times
Reflective kernel retains a significant amount of application semantics at run time – provides flexibility and graceful degradation.

225 Polis: Synthesizing OSs
Given an FSM description of an RT application
Each FSM becomes a task
Signals, interrupts, and polling
Tasks with waiting inputs are handled in FIFO order (priority order: to be done)
Some interrupts can be made to directly execute the corresponding task
The needed OS executive is synthesized based on just what is needed

226 Configurable Computing Lab -- Hardware Environment

227 IIT-KReSIT Reconfigurable Computing Lab Projects (2003)
Network packet-processing - Packet Classifier (a la Stiliades/ Laxman)
Wireless Protocol – 802.11 interface card
Video codec - MPEG-4 with encryption
Encryption (IDEA, etc.)
Real-time reactive control systems
– Inertial Navigation System (ILS)
– Flight simulation
– Scheduling co-processor
Satellite Error Correcting codec

228 References
This tutorial is a short version of a semester-long course. Visit for all the material from that course
Jack Ganssle, "The Art of Designing Embedded Systems", Newnes, 1999.
David Simon, "An Embedded Software Primer", Addison Wesley, 2000.
C.M. Krishna and Kang G. Shin, "RTS: Real-Time Systems", McGraw-Hill, 1997, ISBN 0-07-057043.
Frank Vahid, Tony Givargis, "Embedded System Design: A Unified Hardware/Software Introduction", John Wiley & Sons Inc., 2002.
J. A. Stankovic and K. Ramamritham, "Advances in Hard Real-Time Systems", IEEE Computer Society Press, Washington DC, September 1993, 777 pages.

229 References…
K. Ramamritham and J. A. Stankovic, "Scheduling Algorithms and Operating Systems Support for Real-Time Systems", invited paper, Proceedings of the IEEE, Jan 1994, pp. 55-67.
Sundeep Kapila, K. Ramamritham, Sudhakar, "Distributed Real-Time Embedded Applications using Off-the-Shelf Components? Experiences Building a Flight Simulator", IEEE/IEE Real-Time Embedded Systems Workshop (held in conjunction with the IEEE Real-Time Systems Symposium), December 2001.
Real-Time Linux,
Handel-C material based on "Handel-C Language Reference Manual", Celoxica Ltd.

230 References…
David Harel, Hagi Lachover, Amnon Naamad, Amir Pnueli, Michal Politi, Rivi Sherman, Aharon Shtull-Trauring, and Mark Trakhtenbrot, "Statemate: A Working Environment for the Development of Complex Reactive Systems", IEEE Transactions on Software Engineering, Vol 16 No. 4, April 1990.
Ptolemy Project,
S. Ramesh and P. Bhaduri, "Validation of Pipelined Processors using Esterel Tools: A Case Study", Proc. of Computer Aided Verification, LNCS Vol. 1633, 1999.

231 Summary
What are Embedded Systems?
What is Embedded software?
New Approaches to building ESW
Real-time support for ESW
