2. Hardware for real-time systems

Outline
- Basic processor architecture
- Memory technologies
- Architectural advancements
- Peripheral interfacing
- Microprocessor vs. microcontroller
- Distributed real-time architectures

Basic processor architecture

Von Neumann Architecture
[Figure: Von Neumann computer architecture without explicit I/O — CPU and memory connected by the system bus.]
The system bus is a set of three buses:
- Address bus: unidirectional, controlled by the CPU
- Data bus: bidirectional, transfers data and instructions
- Control bus: a heterogeneous collection of control, status, clock, and power lines
Real-time design: certain considerations in the design of system programs (e.g., compilers, assemblers, linkers, or the operating system) and application programs intended to run in real-time environments.

Central Processing Unit
[Figure: CPU block diagram — a control unit (with program counter register and instruction register) and a datapath (ALU and registers), connected to memory over the address and data buses for instruction and data access.]

Instruction processing
Sequential instruction cycle with five phases spread over successive clock cycles: fetch, decode, load, execute, store. A minimal sketch of this cycle follows.
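For illustration only (not from the slides), this C sketch models the fetch-decode-load-execute-store loop of a made-up accumulator machine; the opcodes, instruction encoding, and memory size are assumptions.

    #include <stdint.h>

    /* Hypothetical 8-bit opcodes for a toy accumulator machine. */
    enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

    static uint16_t memory[256];               /* unified code/data memory (von Neumann) */

    void run(void)
    {
        uint8_t  pc  = 0;                      /* program counter wraps within 256 words */
        uint16_t acc = 0;                      /* accumulator                            */
        for (;;) {
            uint16_t instr   = memory[pc++];   /* fetch                                  */
            uint16_t opcode  = instr >> 8;     /* decode: opcode in the high byte        */
            uint16_t addr    = instr & 0xFFu;  /* decode: operand address in the low byte*/
            uint16_t operand = memory[addr];   /* load                                   */

            switch (opcode) {                  /* execute                                */
            case OP_LOAD:  acc = operand;      break;
            case OP_ADD:   acc += operand;     break;
            case OP_STORE: memory[addr] = acc; break;   /* store                         */
            case OP_HALT:
            default:       return;
            }
        }
    }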

Instruction set
An instruction set describes a processor's functionality and defines its architecture. Most instructions reference a memory location, a pointer to a memory location, or a register. Traditionally, the distinction between computer organization and computer architecture is that architecture involves only those hardware details that are visible to the programmer, while organization involves the implementation details.

Instruction Forms
- 0-address form: stack machine (RPN calculators), e.g., NOP
- 1-address form: uses an implicit register (accumulator), e.g., INC R1
- 2-address form: has the form op-code operand1, operand2, e.g., ADD R1, R2
- 3-address form: has the form op-code operand1, operand2, resultant, e.g., SUB R1, R2, R3

Core Instructions
There are generally six kinds of instructions. These can be classified as:
- horizontal-bit operations, e.g., AND, IOR, XOR, NOT
- vertical-bit operations, e.g., rotate left, rotate right, shift right, shift left
- control, e.g., TRAP, CLI, EPI, DPI, HALT
- data movement, e.g., LOAD, STORE, MOVE
- mathematical/special processing
- other (processor specific), e.g., LOCK, ILLEGAL

Addressing Modes
Three basic modes:
- immediate data
- direct memory location
- indirect memory location, e.g., INC &memory
Others are combinations of these:
- register indirect: one of the working registers stores a pointer to a data structure located in memory
- double indirect
A pointer-based analogy in C follows.
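For illustration only (not part of the slides), register-indirect and double-indirect addressing can be pictured with C pointers: a register holds the address of the data, or the address of a location that holds that address.

    #include <stdint.h>

    void addressing_demo(void)
    {
        uint32_t value = 41;       /* direct: the operand is the data's location      */
        uint32_t *p    = &value;   /* register indirect: a register holds the address */
        uint32_t **pp  = &p;       /* double indirect: a pointer to the pointer       */

        *p   += 1;                 /* INC through one level of indirection  */
        **pp += 1;                 /* increment through two levels          */
    }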

Two principal techniques for implementing the control unit
Microprogramming: each instruction is defined by a microprogram, a sequence of primitive hardware commands that activate the appropriate datapath functions.
Hard-wired logic: consists of combinational and sequential circuits; offers noticeably faster execution, but takes more space and makes it difficult to create or modify instructions.

I/O and interrupt considerations
[Figure: Enhanced von Neumann architecture with separate memory and input/output elements on an enhanced system bus.]
I/O-specific registers range from memory segments to the status, mode, and data registers of configurable I/O ports.

Memory-Mapped I/O
Provides a convenient data transfer mechanism that does not require special CPU I/O instructions: certain designated locations of memory appear as virtual I/O ports.

    LOAD  R1, &speed      ; motor speed into register R1
    STORE R1, &motoraddr  ; store to the address of the motor control

where speed is a bit-mapped variable and motoraddr is a memory-mapped location. I/O ports can be operated on by all instructions that take a memory operand.

Input from an appropriate memory-mapped location involves executing a LOAD instruction on a pseudo-memory location connected to an input device. Output uses a STORE instruction.
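In C, memory-mapped I/O is usually expressed with volatile pointers to fixed addresses. This sketch mirrors the LOAD/STORE example above; the addresses and register layout are assumptions for illustration only.

    #include <stdint.h>

    /* Hypothetical memory-mapped port addresses. */
    #define SPEED_SENSOR  (*(volatile uint32_t *)0x40000000u)  /* pseudo-memory input port */
    #define MOTOR_CONTROL (*(volatile uint32_t *)0x40000004u)  /* bit-mapped output port   */

    void update_motor(void)
    {
        uint32_t speed = SPEED_SENSOR;   /* LOAD from the memory-mapped input  */
        MOTOR_CONTROL = speed;           /* STORE to the memory-mapped output  */
    }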

Programmed I/O
Offers a separate address space for I/O registers; an additional control signal, memory/IO, distinguishes between memory and I/O accesses. This saves the limited memory address space. Special data movement instructions, IN and OUT, are used to transfer data to and from the CPU:

    IN  R1, &port   ; read the content of port and store it in R1
    OUT &port, R1   ; write the content of R1 to port

IN transfers data from a specified I/O device into a specified CPU register; OUT outputs from a register to some I/O device. Normally, the identity of the operative CPU register is embedded in the instruction code. Both memory-mapped and programmed I/O require the effort of the CPU and cost time that can impact real-time performance.
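On processors with a separate I/O address space, IN/OUT accesses are typically wrapped in small helper functions. The sketch below uses x86 port I/O with GCC-style inline assembly; the port number in the usage comment is purely illustrative.

    #include <stdint.h>

    /* Write one byte to an I/O port (x86 OUT instruction). */
    static inline void outb(uint16_t port, uint8_t val)
    {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    /* Read one byte from an I/O port (x86 IN instruction). */
    static inline uint8_t inb(uint16_t port)
    {
        uint8_t ret;
        __asm__ volatile ("inb %1, %0" : "=a"(ret) : "Nd"(port));
        return ret;
    }

    /* Example use (illustrative port number): uint8_t status = inb(0x3FD); */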

Hardware interrupts
Interrupts are an asynchronous hardware mechanism for providing prompt service to external events; they reduce real-time punctuality and make response times nondeterministic.
- Maskable interrupts: used for events that occur under regular operating conditions.
- Nonmaskable interrupts: reserved for extremely critical events.
Processors provide two atomic instructions, EPI (enable priority interrupt) and DPI (disable priority interrupt). These atomic instructions are used for many purposes: buffering, within interrupt handlers, and for parameter passing.
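One common use of DPI/EPI is to protect data shared with an interrupt handler. This is a minimal sketch: disable_interrupts() and enable_interrupts() are hypothetical wrappers around the processor's DPI and EPI instructions, and the buffer layout is assumed.

    #include <stdint.h>

    extern void disable_interrupts(void);   /* hypothetical wrapper around DPI */
    extern void enable_interrupts(void);    /* hypothetical wrapper around EPI */

    #define BUF_SIZE 8u
    static volatile uint16_t sample_buffer[BUF_SIZE];
    static volatile uint32_t sample_count;

    void store_sample(uint16_t sample)
    {
        disable_interrupts();                         /* enter critical section */
        if (sample_count < BUF_SIZE) {
            sample_buffer[sample_count++] = sample;   /* atomic w.r.t. the ISR  */
        }
        enable_interrupts();                          /* leave critical section */
    }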

Memory technologies

Different classes of memory
Volatile RAM (random-access memory) and nonvolatile ROM (read-only memory) are two distinct groups of memory. Today, the borderline between the two groups is no longer clear (EEPROM and flash): plain ROM is programmed during the manufacturing process or at the application factory, while EEPROM and flash are in-system programmable.
[Figure: Interface lines of a generic memory component of 2^(m+1) × (n+1) bits — address bus, data bus, read, (write), and chip-select lines.]

Writable ROM: EEPROM (electrically erasable programmable ROM) and flash
Common features:
- Can be rewritten in a similar way to RAM, but much more slowly (writing is up to 100 times slower than reading).
- Wears out: can typically be rewritten up to 1,000,000 times.
Common practices in real-time systems:
- EEPROM can be written sparsely and is normally used as nonvolatile program and parameter storage.
- Flash can only be erased in large blocks and is used for storing both application programs and large data records.
- Use flash as a low-cost mass memory and load application programs from flash to RAM for execution, so that programs run from the faster RAM instead of ROM (see the startup sketch below).
Nonwritable mask-programmed ROM is used as a low-cost memory in very large production volumes.
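A minimal startup sketch of the copy-to-RAM practice, assuming hypothetical linker-provided symbols for the RAM region and the flash image; real projects would use the names defined in their own linker script.

    #include <stdint.h>
    #include <string.h>

    extern uint8_t       __ram_start__[];     /* hypothetical linker symbols */
    extern uint8_t       __ram_end__[];
    extern const uint8_t __flash_image__[];

    /* Called once at reset, before the application starts running from RAM. */
    void copy_image_to_ram(void)
    {
        size_t size = (size_t)(__ram_end__ - __ram_start__);
        memcpy(__ram_start__, __flash_image__, size);
    }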

Static RAM (SRAM) and dynamic RAM (DRAM)
- SRAM: more space intensive and more expensive, but faster to access.
- DRAM: compact and cheaper, but slower to access; needs refresh circuitry to avoid loss of data.
Common practices: use DRAM when a large memory is needed; use SRAM when the memory need is no more than moderate.

Summary of memory types and usages Selection of the appropriate technology is a systems design issue.

Memory access
[Figure: Timing diagram of a memory-read bus cycle.]
Why is this important? The timing diagram illustrates the clock-synchronized memory transfer process between a device and the CPU. The symbol "<>" shown on the data and address signals indicates that multiple lines are involved in the transfer during this period.

How to determine the bus cycle length
- Worst-case access times of memory and I/O ports
- Latencies of the address decoding circuitry and of possible buffers on the system bus
- Synchronous bus protocol: it is possible to add wait states to the default bus cycle, dynamically adapting it to different access times
- Asynchronous bus protocol: no wait states; data transfer is based on a handshaking-type protocol
- Overall power consumption (CPU, system bus, memory devices) grows with increasing CPU clock rate

Memory layout issues
[Figure: Example memory map]
    FFFFF000 - FFFFF7FF   I/O registers (memory-mapped I/O)
    00080000 - ...        Data SRAM
    00040000 - 00047FFF   Configuration EEPROM
    00000000 - 0001FFFF   Program ROM

Hierarchical memory organization
Primary and secondary memory storage forms a hierarchy involving access time, storage density, cost, and other factors. The fastest possible memory is desired in real-time systems, but economics dictates that the fastest affordable technology is used as required. In order from fastest to slowest, and considering cost, memory should be assigned as follows:
- internal CPU memory
- registers
- cache
- main memory
- memory on board external devices

Locality of Reference
Refers to the relative "distance" in memory between consecutive code or data accesses. If the data or code being fetched tends to reside relatively close together in memory, the locality of reference is high; when programs execute instructions that are relatively widely scattered, the locality of reference is low. Well-written programs in procedural languages tend to execute sequentially within code modules and within the body of loops, and hence have a high locality of reference. Object-oriented code tends to execute in a much more nonlinear fashion, but portions of such code can be linearized (e.g., array access).
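A small C illustration (array size assumed) of the idea: row-major traversal of a C array touches consecutive addresses and has high locality of reference, while column-major traversal strides across the array and has low locality.

    #include <stddef.h>

    #define N 512
    static double grid[N][N];

    double sum_row_major(void)        /* high locality: consecutive addresses    */
    {
        double s = 0.0;
        for (size_t i = 0; i < N; i++)
            for (size_t j = 0; j < N; j++)
                s += grid[i][j];
        return s;
    }

    double sum_column_major(void)     /* low locality: a large stride per access */
    {
        double s = 0.0;
        for (size_t j = 0; j < N; j++)
            for (size_t i = 0; i < N; i++)
                s += grid[i][j];
        return s;
    }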

Cache
A small block of fast memory where frequently used instructions and data are kept; the cache is much smaller than main memory.
Usage: upon a memory access, the address tags are checked to see whether the data is in the cache. If present, the data is retrieved from the cache. If the data or instruction is not already in the cache, the cache contents are written back and a new block is read from main memory into the cache; the needed information is then delivered from the cache to the CPU and the address tags are adjusted.

Cache
Performance benefits are a function of the cache hit ratio: if the needed data or instructions are not found in the cache, the cache contents must be written back (if any were altered) and overwritten by a memory block containing the needed information. This overhead becomes significant when the hit ratio is low, so a low hit ratio degrades performance; if the locality of reference is low, few cache hits can be expected. Using a cache is also nondeterministic: it is impossible to know a priori what the cache contents, and hence the overall access time, will be.
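As a first-order model that is not in the slides, the effective access time can be written in terms of the hit ratio h, the cache access time t_cache, and the miss penalty t_miss (write-back plus block fill):

    t_{\mathrm{eff}} = h \, t_{\mathrm{cache}} + (1 - h)\,\bigl(t_{\mathrm{cache}} + t_{\mathrm{miss}}\bigr)

As h falls toward the low values that come with poor locality of reference, the miss-penalty term dominates and access times become both longer and harder to predict.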

Wait States
When a microprocessor must interface with a slower peripheral or memory device, wait states may need to be added to the bus cycles. A wait state extends the microprocessor read or write cycle by a certain number of processor clock cycles to allow the device or memory to "catch up." For example, EEPROM, RAM, and ROM may have different memory access times; since RAM is typically faster than ROM, wait states would need to be inserted when accessing the ROM. Wait states degrade overall system performance, but preserve determinism.
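A numeric sketch with assumed values (not from the slides): with a 10 ns processor clock and a default bus cycle of three clocks (30 ns), a ROM with a 55 ns access time needs

    \left\lceil \frac{55\,\text{ns} - 30\,\text{ns}}{10\,\text{ns}} \right\rceil = 3 \text{ wait states}

so every ROM access takes six clocks (60 ns) instead of three: slower, but the access time is fixed and therefore deterministic.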

Architectural advancements

Pipelining
Pipelining is a form of speculative execution: it imparts an implicit execution parallelism across the different stages of processing an instruction. With pipelining, several instructions can be processed in different phases simultaneously, increasing instruction throughput and improving processor performance.
Suppose a 4-stage pipeline:
- fetch: get the instruction from memory
- decode: determine what the instruction is
- execute: perform the decoded instruction
- store: store the results to memory

Pipelining
[Figure: Sequential instruction execution versus pipelined instruction execution — sequential execution of the 4 stages of 3 instructions takes 12 clock cycles, whereas the 3 instructions finish in 6 clock cycles when pipelined.]
Nine complete instructions can be completed in the pipelined approach in the same time it takes to complete three instructions in the sequential (scalar) approach.

Pros of pipelining
Under ideal conditions (instruction phases all of equal length, every instruction needing the same amount of time, and a continuously full pipeline), the best possible instruction completion time of an N-stage pipeline is 1/N times the completion time of the non-pipelined case. Pipelining also utilizes the ALU and other CPU resources more effectively.
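A standard timing model (assuming equal stage lengths and no stalls, which is not spelled out in the slides) makes the 1/N figure concrete: completing n instructions on a k-stage pipeline takes

    T_{\mathrm{pipe}} = (k + n - 1)\, t_{\mathrm{stage}} \quad \text{versus} \quad T_{\mathrm{seq}} = n\,k\, t_{\mathrm{stage}}

For the earlier example (k = 4, n = 3) this gives 6 cycles against 12, and the speedup approaches k as n grows.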

Cons of pipelining
- Requires buffer registers between stages, which add delay compared with the non-pipelined case.
- Degrades performance in certain situations:
  - branch instructions: the prefetched instructions further back in the pipeline may no longer be valid and must be flushed
  - external interrupts (unpredictable situations)
  - data dependencies between consecutive instructions

Superpipelines
Superpipelined architectures are achieved by decomposing the instruction cycle further, e.g., a 6-stage pipeline: fetch, two decode stages (for indirect addressing modes), execute, write-back, and commit. In practice, CPUs with GHz-level clock rates have more than 10 stages.
Cons: performance is degraded by cache misses and by pipeline flushing and refilling (when the locality of instruction reference is violated); an extensive pipeline is a source of significant nondeterminism in real-time systems.
Superscalar architectures, by contrast, use redundant hardware to replicate one or more stages of the pipeline.

ASICs (application-specific integrated circuits)
An ASIC is a special-purpose IC designed for one application only. In essence, ASICs are systems-on-chip that include a microprocessor, memory, I/O devices, and other specialized circuitry. They are used in many embedded applications, such as image processing, avionics, and medical systems. Real-time design issues are the same for them as for most other systems.

PAL (programmable array logic) / PLA (programmable logic array)
One-time programmable logic devices for special-purpose functionality in embedded systems.
A PAL is a programmable AND array followed by a fixed-width OR array; each OR element has a certain number of dedicated product terms.
A PLA is the same, except that the AND array is followed by a programmable-width OR array, which allows product terms to be shared between macrocells and increases device density.
Comparison: the PLA is much more flexible and yields more efficient logic, but is more expensive; the PAL is faster (it uses fewer fuses) and less expensive.

FPGAs (field-programmable gate arrays)
An FPGA allows construction of a system-on-a-chip with an integrated processor, memory, and I/O. It differs from an ASIC in that it is reprogrammable, even while embedded in the system. Its reconfigurable architecture provides the programmed interconnection and functionality of a conventional processor, and allows algorithms and functionality to be moved from the software side to the hardware side. FPGAs are widely used in embedded, mission-critical systems where fault tolerance and adaptive functionality are essential.

Multi-core processors
[Figure: Quad-core processor architecture with individual on-chip instruction and data caches per core, a common on-chip cache on an internal cache bus, and shared I/O on the system bus.]

Pros and cons
Pros: parallel processing; true task concurrency for multitasking real-time applications.
Cons: requires a complete collection of software tools to support the parallel development process; designers must learn to design algorithms for parallelism, otherwise the potential of the multi-core architecture remains largely unused; load balancing between cores is hard and needs expertise and the right tools for performance analysis.

Instruction processing in a multi-core architecture is nondeterministic. It is also time-consuming to port existing single-CPU software efficiently to a multi-core environment, which reduces companies' interest in switching to multi-core for mature real-time applications. Fast and punctual inter-core communication is a key issue when developing high-performance systems; the communication channel is a well-known bottleneck. A theoretical foundation for estimating the speedup as the number of parallel cores increases is given in Chapter 7; the limit of parallelism, in terms of speedup, appears to be a software property, not a hardware one.
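That foundation is presumably Amdahl's law (an assumption here; the slides defer the details to Chapter 7): if a fraction f of the workload can be parallelized, the speedup on n cores is bounded by

    S(n) = \frac{1}{(1 - f) + f/n} \le \frac{1}{1 - f}

so the ultimate limit on speedup is set by the software's serial fraction, not by the number of cores.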