Ki-Hyung Kim Division of Information and Computer Eng. Ajou University

Processor/Interface. Ki-Hyung Kim, Division of Information and Computer Eng., Ajou University

Embedded System Structure. Coordination of many levels of abstraction. Software: Application (Netscape), Operating System (Windows 98), Compiler, Assembler. Instruction Set Architecture. Hardware: Processor, Memory, I/O system, Datapath & Control, Digital Design, Circuit Design, transistors.

Embedded System H/W Structure. Embedded system composition: the embedded H/W consists of a processor/controller, memory, I/O interfaces, and network interfaces. [Figure: Embedded System = Processor (active): Control (“brain”) and Datapath (“brawn”); Memory (passive; where programs and data live when running); Devices: Input, Output.]

Embedded H/W Components. Embedded processor/controller: most processors are used in embedded systems. Finding the product best suited to the application among the many kinds of microprocessors/controllers is a very difficult and important design task. [Figure: 8.5B parts per year: Embedded Computers 80%, Vehicles 12%, Robots 6%, Direct 2%. Source: DARPA/Intel (Tennenhouse).]

Embedded H/W Components (2). Memory: ROM/RAM; trend toward higher speed and larger capacity; increasing use of FLASH memory; usefulness of CACHE/virtual memory. Bus. Peripherals: Timer/Counter, Interrupt, DMA. Others.

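Embedded Processor. Mainly responsible for computation tasks. Evolving into SoCs that integrate a variety of peripheral interfaces. Not only processing speed, power consumption, and price but also the fit with the development environment is very important. Consists of a control unit and a data-path. Processor selection is important: ARM, PPC, MIPS, i386, Alpha, Sparc, m68k, SH, CRIS, IA64, PARISC, etc.; MSP430, Atmega128 (AVR), i8051. This course uses the ARM core as its running example.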

Basic Processor Structure. Consists of a control unit and a data-path. Features: general data-path; the control unit doesn’t store the algorithm, the algorithm is “programmed” into the memory. [Figure: Processor (control unit: Controller, PC, IR; data-path: ALU, Registers, Control/Status) connected to Memory and I/O.]

Data-path Operation. Load: read a memory location into a register. ALU operation: arithmetic/logical operation. Store: write a register into a memory location. [Figure: Processor (control unit: Controller, PC, IR; datapath: ALU, Registers, Control/Status) connected to Memory and I/O; the value 10 is loaded, incremented (+1), and 11 is stored.]

Control Unit. Control unit: configures the data-path operations. The sequence of desired operations (“instructions”) is stored in memory as the “program”. Instruction cycle, broken into several sub-operations, each one clock cycle: Fetch: get the next instruction into IR. Decode: determine what the instruction means. Fetch operands: move data from memory to a data-path register. Execute: move data through the ALU. Store results: write data from a register to memory. [Figure: Processor (control unit: Controller, PC, IR; datapath: ALU, registers R0 and R1, Control/Status) connected to Memory and I/O; program at addresses 100 to 102: load R0, M[500]; inc R1, R0; store M[501], R1; data locations 500 (holding 10) and 501.]
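The instruction cycle described on this slide can be sketched as a tiny interpreter. This is only an illustrative sketch: the `opcode`, `instr`, and `cpu` types and the `run` function are hypothetical names, the program is kept in an array separate from data memory for simplicity (the slide draws a single memory), and the initial value 10 in M[500] is an assumption read off the figure.

```c
#include <stdint.h>

/* Hypothetical opcodes modeled on the slide's load/inc/store example. */
enum opcode { OP_LOAD, OP_INC, OP_STORE, OP_HALT };

struct instr {
    enum opcode op;
    int dst;   /* destination register (source register for OP_STORE) */
    int src;   /* source register for OP_INC */
    int addr;  /* data memory address for OP_LOAD / OP_STORE */
};

struct cpu {
    int pc;            /* program counter: index of the next instruction */
    struct instr ir;   /* instruction register */
    int32_t r[2];      /* data-path registers R0 and R1 */
};

/* Run until OP_HALT: fetch into IR, decode, then move data through the data-path. */
void run(struct cpu *c, const struct instr *prog, int32_t *mem)
{
    for (;;) {
        c->ir = prog[c->pc++];                      /* fetch */
        switch (c->ir.op) {                         /* decode */
        case OP_LOAD:                               /* fetch operand */
            c->r[c->ir.dst] = mem[c->ir.addr];
            break;
        case OP_INC:                                /* execute in the ALU */
            c->r[c->ir.dst] = c->r[c->ir.src] + 1;
            break;
        case OP_STORE:                              /* store result */
            mem[c->ir.addr] = c->r[c->ir.dst];
            break;
        case OP_HALT:
            return;
        }
    }
}

/* The slide's three-instruction program; returns the value left in M[501]. */
int32_t run_slide_program(void)
{
    static int32_t mem[502];
    const struct instr prog[] = {
        { OP_LOAD,  0, 0, 500 },   /* load  R0, M[500] */
        { OP_INC,   1, 0, 0   },   /* inc   R1, R0     */
        { OP_STORE, 1, 0, 501 },   /* store M[501], R1 */
        { OP_HALT,  0, 0, 0   },
    };
    struct cpu c = { 0 };
    mem[500] = 10;                 /* assumed initial data value */
    run(&c, prog, mem);
    return mem[501];
}
```

Calling `run_slide_program()` leaves 11 in M[501], matching the "10 ... +1 ... 11" annotation of the data-path slide.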

CISC and RISC Architectures. CISC (Complex Instruction Set Computer): has a large number of instructions that perform related operations; CISC code is compact; an instruction can take many clock cycles; large silicon area, hence higher cost per die. RISC (Reduced Instruction Set Computer): more modern architecture; one instruction executed per clock cycle, hence very fast; RISC CPU cores tend to be small. Typical dynamic instruction usage: data movement, control flow, arithmetic operations, comparisons, and logical operations account for 99% of instructions, which suggests the feasibility of RISC (?).

BUS. A bus is: a shared communication link; a single set of wires used to connect multiple subsystems. Data bus, address bus, control bus. Input/output bus (e.g., PCI): must be standardized. System bus (local bus): aims for high speed and is processor-dependent → chipset. A bus is also a fundamental tool for composing large, complex systems: a systematic means of abstraction. [Figure: Processor (Control, Datapath), Memory, Input, Output.]

Von Neumann Architecture. [Figure: CPU and a single memory connected by address and data lines; address 200; the instruction ADD r5,r1,r3 in IR.]

Harvard Architecture. Harvard can’t use self-modifying code (http://www.arm.com/support/faqip/3738.html). Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth; more predictable bandwidth. [Figure: CPU with PC, connected to a data memory and a separate program memory, each with its own address and data lines.]

Pipeline in RISC. T1: instruction fetch. T2: decode. T3: execution (load from memory). T4: write to memory (or register). [Figure: successive instructions, each passing through stages T1 to T5, overlapped one stage apart.]

MIPS MIPS (originally an acronym for Microprocessor without Interlocked Pipeline Stages) is a RISC microprocessor architecture developed by MIPS Technologies. By the late 1990s it was estimated that one in three RISC chips produced were MIPS-based designs.

Pipeline in CISC. The number and length of pipeline steps vary from instruction to instruction → a pipeline is hard to build → it is hard to minimize the length of each pipeline step. [Figure: overlapped instructions whose stage counts differ (T1 to T5, T1 to T6, T1 to T7).]

Simplified Harvard Architecture of ARM. TCM: Tightly Coupled Memory (used for real-time programs)

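Dataflow Architecture. There is no program counter: nothing designates the next instruction to execute. Any instruction executes as soon as the data it needs is ready (in parallel). [Figure: memory holding the instructions to be executed, e.g. Add M1 + M2 → M3, feeding several ALUs.]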

Dataflow Architecture 2. Dataflow architecture is a computer architecture that directly contrasts with the traditional von Neumann architecture or control flow architecture. Dataflow architectures do not have a program counter; (at least conceptually) the executability and execution of instructions are determined solely by the availability of the instructions' input arguments.

Memory: ROM. Read-Only Memory (ROM): non-volatile storage. Types: Mask ROM, Fuse ROM, PROM (programmable), OT-PROM (one-time programmable), EPROM, EEPROM, Flash memory (floating gate). [Figure: word line / bit line cell structures for Mask ROM, Fuse ROM, EPROM, EEPROM.] Notes: Brief tour of computer memory. Storing data and retaining program state were early and fundamental problems in computer development. Tubes arranged as flip-flops were very expensive (a waste of active devices). Early attempts included delay line approaches (mechanical spring, mercury pool), phosphor state, and magnetic domains (core, drums, tape, etc.). Semiconductor memory only became cheap enough around 1970, and was Intel’s first product.

Flash Memory: NOR, NAND, XIP. In computer science, execute in place (XIP) is a method of executing programs directly from long-term storage rather than copying them into RAM. It is an extension of using shared memory to reduce the total amount of memory required.

Memory: RAM. Random Access Memory: retains data only while power is applied. Two main types: Static RAM (SRAM) and Dynamic RAM (DRAM); they differ in how bits are stored. Static RAM: fast (active drive); less dense (4-6 transistors/bit); stable (holds value as long as power is applied). Dynamic RAM: slower; high density (1 transistor/bit); unstable (needs refresh). Other types: SDRAM, Video RAM, FERAM.

Inverter with CMOS

NAND with CMOS

NOR in NMOS

SRAM

DRAM

SDRAM SDRAM refers to synchronous dynamic random access memory, a term that is used to describe dynamic random access memory that has a synchronous interface. Traditionally, dynamic random access memory (DRAM) has an asynchronous interface which means that it responds as quickly as possible to changes in control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer's system bus.

Performance: Throughput. [Figure: timing from RD until the value is output; this interval is the latency (delay time).]

Basic Structure of RAM. [Figure: array of bit cells at the crossings of word lines and bit lines; Address input; sense amplifier distinguishing High/Low on the bit lines; Data output.]

Typical 16 Mb DRAM (4M x 4)

Static RAM (SRAM): Structure and Access. Cell: Word Line, Bit and !Bit lines. Read: drive word line, sense value on bit lines. Write: drive word line, drive new value (strongly) on bit lines. Accessing a static RAM: signals CE, Addr, Data for Read and Write. Note: the CE signal is often active-low, as opposed to how it is shown here. SRAMs also generally have a write enable signal. Where is the data physically stored? What energy form?

Dynamic RAM (DRAM). Cell: Word Line, Bit Line. Read: drive word line, sense value on bit line (destroys saved value). Write: drive word line, drive new value on bit line. Dynamic RAM timing (read): RAS, CAS, Addr; control signals are often active-low. Process modifications to enhance capacitor storage capacity.

Other RAM Types. Video RAM: optimized for high-speed regular accesses to the frame buffer. SDRAM: uses clocked organization to pipeline for speed. Flash RAM: non-volatile (holds data without power). FERAM (ferroelectric RAM): stores data in the polarization of a ferroelectric material rather than magnetically; holds value when power is off; capacity and access time similar to RAM (hard disks take ms). Nanotech RAMs: molecular electronics, carbon nanotubes; nowhere near ready for prime time.

Frame Buffer in Graphic Card: Video DRAM. VRAM is a dual-ported variant of DRAM which was once commonly used to store the frame buffer in some graphics adaptors. Dual-ported RAM (DPRAM) is a type of random access memory that allows multiple reads or writes to occur at the same time, or nearly the same time, unlike single-ported RAM, which only allows one access at a time. Video RAM or VRAM is a common form of dual-ported dynamic RAM mostly used for video memory, allowing the CPU to draw the image at the same time the video hardware is reading it out to the screen. [Figure: CPU → VRAM (frame buffer in graphic card) → DVI.]

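Flash Memory. 1. NOR type • Cells are arranged in parallel, so random access is possible and programming can be done byte by byte. • Reads are faster than NAND, but writes/erases are slower. • Each cell needs its own bit-line contact, so it needs more area per cell than NAND and is more expensive. • Because reads are fast, it is used for code storage (mainly for booting a device's OS). 2. NAND type • Cells are arranged in series, so reads and writes are done in page/block units. • Random access is not possible, so reads are slower than NOR, but writes/erases are faster. • High integration density. • Large capacities are possible, so it is used for data storage (digital cameras, MP3 players, etc.). Summary: NOR is hard to scale to large capacity, and NAND has slow reads.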

Cache Systems. SRAM → DRAM. [Figure: CPU (400 MHz) and cache exchange data objects; cache and main memory (10 MHz) exchange blocks over a 66 MHz bus.]

Why Memory Hierarchy?

Cache Mechanism (1)

Cache Address Mapping

Lines in a 512-byte cache? With 1 block = 4 bytes, the cache holds 128 blocks. [Figure: the cache drawn as a column of 4-byte cache blocks.]

Lines in a 512-byte cache? With 1 block = 16 bytes (4 words), the cache holds 32 blocks. [Figure: the cache drawn as rows of four 4-byte words.]

With 8 cache entries → 1-way cache (direct-mapped cache). Which blocks (tags) can be placed in the cache entry for line 0? If the cache has 8 lines: 0, 8, 16, 24, …. If the cache has 256 lines: 0, 256, 512, …. [Figure: blocks with Tag=0 and Tag=1 mapping onto the same lines.]
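The mapping asked about on this slide is a modulo operation on the block number. A minimal sketch, assuming block numbers count main-memory blocks from 0; the names `dm_slot`, `dm_map`, and `cache_lines` are hypothetical.

```c
#include <stdint.h>

struct dm_slot {
    uint32_t line;  /* cache line the block must occupy */
    uint32_t tag;   /* distinguishes the blocks that share that line */
};

/* Map a main-memory block number onto a direct-mapped cache with num_lines lines. */
struct dm_slot dm_map(uint32_t block, uint32_t num_lines)
{
    struct dm_slot s;
    s.line = block % num_lines;
    s.tag  = block / num_lines;
    return s;
}

/* Number of lines in a cache of cache_bytes bytes with block_bytes bytes per line. */
uint32_t cache_lines(uint32_t cache_bytes, uint32_t block_bytes)
{
    return cache_bytes / block_bytes;
}
```

With 8 lines, blocks 0, 8, 16, and 24 all give `line == 0` with tags 0, 1, 2, and 3; `cache_lines(512, 4)` and `cache_lines(512, 16)` give the 128 and 32 blocks of the two preceding slides.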

With 8 cache entries → 2-way set-associative cache. Ways: way 1, way 2. Sets: Set 0, Set 1, Set 2, Set 3. Tag: is the indexed block in the set or not? Set: set ID.

With 8 cache entries → 4-way cache

With 8 cache entries → n-way set-associative cache (n=8, s=1): fully associative cache

Direct Mapped

Direct Mapping Cache Organization

Direct Mapping Example

Direct Mapping pros & cons Simple Inexpensive Fixed location for given block If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Direct Mapping Cache Line Table
Cache line | Main memory blocks held
0 | 0, m, 2m, 3m, …, 2^s − m
1 | 1, m+1, 2m+1, …, 2^s − m + 1
… | …
m−1 | m−1, 2m−1, 3m−1, …, 2^s − 1

2 Way Set Associative

Set Associative Cache 2 S0 S1 S2 A0, A1

Set Associative Mapping Cache is divided into a number of sets Each set contains a number of lines A given block maps to any line in a given set e.g. Block B can be in any line of set i e.g. 2 lines per set 2 way associative mapping A given block can be in one of 2 lines in only one set

Set Associative Mapping Example. 13-bit set number. The block number in main memory is taken modulo 2^13. With a 2-bit word field, addresses 000000, 008000, 010000, 018000 … map to the same set.

Two Way Set Associative Cache Organization

Set Associative Mapping Address Structure. Fields: Tag 9 bit | Set 13 bit | Word 2 bit. Use the set field to determine which cache set to look in. Compare the tag field to see if we have a hit. E.g.:
Address | Tag | Data | Set number
1FF 7FFC | 1FF | 12345678 | 1FFF
001 7FFC | 001 | 11223344 | 1FFF
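The field split on this slide can be written as shifts and masks. A sketch under one stated assumption: the example address written "1FF 7FFC" is read as tag 1FF followed by the low 15 bits 7FFC, i.e. the 24-bit address 0xFFFFFC (and "001 7FFC" as 0x00FFFC). The names `sa_fields` and `sa_split` are hypothetical.

```c
#include <stdint.h>

/* Field widths from the slide: 9-bit tag, 13-bit set, 2-bit word. */
enum { WORD_BITS = 2, SET_BITS = 13, TAG_BITS = 9 };

struct sa_fields {
    uint32_t tag;
    uint32_t set;
    uint32_t word;
};

/* Split a 24-bit address into tag, set, and word fields. */
struct sa_fields sa_split(uint32_t addr)
{
    struct sa_fields f;
    f.word = addr & ((1u << WORD_BITS) - 1);
    f.set  = (addr >> WORD_BITS) & ((1u << SET_BITS) - 1);
    f.tag  = (addr >> (WORD_BITS + SET_BITS)) & ((1u << TAG_BITS) - 1);
    return f;
}
```

Under that reading, `sa_split(0xFFFFFC)` yields tag 0x1FF and set 0x1FFF, and `sa_split(0x00FFFC)` yields tag 0x001 and the same set 0x1FFF, which is why the two example blocks compete for the same set.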

Two Way Set Associative Mapping Example

Fully Associative Cache

Associative Mapping A main memory block can load into any line of cache Memory address is interpreted as tag and word Tag uniquely identifies block of memory Every line’s tag is examined for a match Cache searching gets expensive

Fully Associative Cache Organization

Associative Mapping Example

Associative Mapping Address Structure. Fields: Tag 22 bit | Word 2 bit. A 22-bit tag is stored with each 32-bit block of data. Compare the tag field with the tag entries in the cache to check for a hit. The least significant 2 bits of the address identify which of the four bytes is required from the 32-bit data block. E.g.:
Address | Tag | Data | Cache line
FFFFFC | FFFFFC | 24682468 | 3FFF

Replacement Algorithms (1) Direct mapping No choice Each block only maps to one line Replace that line

Replacement Algorithms (2). Associative & set associative: hardware-implemented algorithm (for speed). Least recently used (LRU): e.g., in a 2-way set-associative cache, which of the 2 blocks is LRU? First in first out (FIFO): replace the block that has been in the cache longest. Least frequently used: replace the block which has had the fewest hits. Random.

Locality of Reference. LRU: needs an access timestamp. LFU: needs an access frequency count. FIFO: needs a load timestamp (the time the block was loaded). [Figure: 2-way cache (way 1, way 2; Set 0 to Set 3) with a frequency field per entry.]

Access pattern 0,1,0,0,1,0,0,1. Frequency: 0=5, 1=3. Access timestamp: 1 is the most recent. Loading timestamp: 0 is the oldest. [Figure: 2-way cache (Set 0 to Set 3) with timestamp and frequency fields per entry.]
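The bookkeeping behind the three policies can be replayed for this access pattern. A minimal sketch, assuming blocks 0 and 1 occupy the two ways of one set and that time is counted in accesses starting at 1; `way_stats`, `policy`, `victim`, and `slide_victim` are hypothetical names.

```c
#include <stddef.h>

struct way_stats {
    int block;           /* block number held in this way */
    unsigned loaded_at;  /* time of first load (used by FIFO) */
    unsigned last_used;  /* time of most recent access (used by LRU) */
    unsigned hits;       /* number of accesses (used by LFU) */
};

enum policy { POLICY_LRU, POLICY_LFU, POLICY_FIFO };

/* Pick the block to evict: the way with the smaller key for the policy. */
int victim(const struct way_stats ways[2], enum policy p)
{
    unsigned k0, k1;
    switch (p) {
    case POLICY_LRU: k0 = ways[0].last_used; k1 = ways[1].last_used; break;
    case POLICY_LFU: k0 = ways[0].hits;      k1 = ways[1].hits;      break;
    default:         k0 = ways[0].loaded_at; k1 = ways[1].loaded_at; break;
    }
    return (k0 <= k1) ? ways[0].block : ways[1].block;
}

/* Replay the slide's pattern 0,1,0,0,1,0,0,1 and return the victim block. */
int slide_victim(enum policy p)
{
    static const int pattern[] = { 0, 1, 0, 0, 1, 0, 0, 1 };
    struct way_stats ways[2] = { { 0, 0, 0, 0 }, { 1, 0, 0, 0 } };
    size_t n = sizeof pattern / sizeof pattern[0];

    for (size_t t = 0; t < n; t++) {
        struct way_stats *w = &ways[pattern[t]];
        if (w->hits == 0)
            w->loaded_at = (unsigned)t + 1;  /* first touch loads the block */
        w->last_used = (unsigned)t + 1;
        w->hits++;
    }
    return victim(ways, p);
}
```

After the replay, block 0 has 5 accesses and block 1 has 3, block 1 was accessed last, and block 0 was loaded first, so LRU and FIFO evict block 0 while LFU evicts block 1.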

Write Policy. Must not overwrite a cache block unless main memory is up to date. Multiple CPUs may have individual caches → cache coherency (coherency = the data in the multiple caches and the data in main memory must be the same). I/O may address main memory directly.

Write through All writes go to main memory as well as cache Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date Lots of traffic Slows down writes Remember bogus write through caches!

Write back Updates initially made in cache only Update bit for cache slot is set when update occurs If block is to be replaced, write to main memory only if update bit is set Other caches get out of sync I/O must access main memory through cache N.B. 15% of memory references are writes

The Memory System. Embedded systems and applications: the memory system requirements vary considerably. Simple blocks. Multiple types of memory. Caches. Write buffers. Virtual memory.

Memory management units. The memory management unit (MMU) translates addresses and performs protection checks. [Figure: CPU → logical address → memory management unit → physical address → main memory.]

Memory management tasks Allows programs to move in physical memory during execution Allows virtual memory: memory images kept in secondary storage; images returned to main memory on demand during execution Page fault: request for location not resident in memory

Address translation Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses Two basic schemes: segmented paged Segmentation and paging can be combined (x86)

Memory Fragmentation. [Figure: memory holding P1, P5, P2, P3 with free gaps between them; the scattered gaps are the fragmentation.]

Compaction. [Figure: P2, P5, P3 moved together so that the free space becomes contiguous.]

Segments and pages. Pages (page 1, page 2): fixed size. Segments (segment 1): variable size. [Figure: memory divided into pages and a segment.]

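Code and data segment (section)

#include <stdio.h>
#include <stdlib.h>
int a, b, c = 3;      /* a, b: BSS; c: Data */
static int k = 2;     /* Data */
int add(int y, int z);
int main(void)
{
    {
        int d = 5;    /* Stack */
        int e = c;
        /* f is local to add(), so "int d = f;" would not compile here */
        e = add(3, 5);
        d = add(3, 5);
    }
    int *p = (int *) malloc(sizeof(int));   /* Heap */
    free(p);
    return 0;
}
int add(int y, int z)
{
    int d = 7;          /* Stack */
    static int f = 7;   /* Data */
    f++;
    return y + z + f;
}

Segments: Text (code), Data, BSS (Block Started by Symbol, or Block Static Storage), Heap, Stack.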

Heap and Stack. SP (Stack Pointer)

Code and data segment (section). [Figure: an executable file (COFF or ELF) containing Header, Text (code), Data, and Symbol table is loaded (LD, .so) into a memory image of Text (code), Data, BSS (Block Started by Symbol, or Block Static Storage), Heap, and Stack.]

Storage class and Scope. Static vs Volatile. Static vs Dynamic. Static vs External (in C and C++: extern). Static vs Instance (class variable in C++ and Java). Local vs Global. Fixed (size) data vs Variable (size) data: variable (dynamic) memory carries a risk of fragmentation.

Segment address translation. [Figure: segment base address + logical address → physical address; a range check compares it against the segment lower bound and segment upper bound and raises a range error if it falls outside.]
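The base-plus-range-check scheme on this slide can be sketched in a few lines. This is an assumption-laden illustration: the `segment` struct and `seg_translate` function are hypothetical, and the bounds are taken to be inclusive physical addresses.

```c
#include <stdbool.h>
#include <stdint.h>

struct segment {
    uint32_t base;   /* segment base address */
    uint32_t lower;  /* segment lower bound (inclusive, physical) */
    uint32_t upper;  /* segment upper bound (inclusive, physical) */
};

/*
 * Add the base to the logical address, then range-check the result.
 * Returns true and writes *phys on success; returns false on a range error.
 */
bool seg_translate(const struct segment *s, uint32_t logical, uint32_t *phys)
{
    uint32_t pa = s->base + logical;
    if (pa < s->lower || pa > s->upper)
        return false;  /* range error */
    *phys = pa;
    return true;
}
```

For a segment with base 0x4000 and bounds 0x4000 to 0x4FFF, logical address 0x10 translates to 0x4010, while logical address 0x1000 falls outside the segment and is rejected.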

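Page address translation. [Figure: the logical address is split into a page number and an offset; the page number selects the page i base, which is concatenated with the offset to form the physical address.]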
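Because pages have a fixed power-of-two size, paged translation is a table lookup followed by a concatenation rather than an addition. A minimal sketch, assuming 4-KB pages and a flat array of page-aligned base addresses; `PAGE_SHIFT` and `page_translate` are hypothetical names.

```c
#include <stdint.h>

enum { PAGE_SHIFT = 12 };  /* assumed page size: 4 KB */

/* page_base[i] holds the physical base address of page i, aligned to the page size. */
uint32_t page_translate(const uint32_t *page_base, uint32_t logical)
{
    uint32_t page   = logical >> PAGE_SHIFT;               /* page number */
    uint32_t offset = logical & ((1u << PAGE_SHIFT) - 1);  /* offset within page */
    return page_base[page] | offset;                       /* concatenate */
}
```

With a table mapping page 1 to base 0x00403000, logical address 0x1ABC becomes 0x00403ABC: only the page number is replaced, the offset passes through unchanged.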

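Page table organizations. [Figure: flat organization (a single array of page descriptors) vs. tree organization (page descriptors reached through intermediate tables).]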

Caching address translations Large translation tables require main memory access TLB: cache for address translation Typically small

ARM Memory Management Unit

ARM Memory Management. The system control coprocessor (CP15) controls: registers, write buffers, caches. Registers: up to 16 primary registers; the physical registers in CP15 number more than 16. Register access instructions: MCR (ARM to CP15), MRC (CP15 to ARM).

Cached MMU memory system

ARM Memory Management. The MMU can be enabled and disabled. Memory region types: section: 1-Mbyte block; large page: 64 Kbytes; small page: 4 Kbytes; tiny page: 1 Kbyte. Two-level translation scheme (why?): first-level table, second-level table. Page table size for 4-KB pages: 2^20 × 4 bytes = 4 MB.

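ARM address translation. [Figure: the virtual address is split into 1st index, 2nd index, and offset; the translation table base register concatenated with the 1st index selects a 1st-level table descriptor; that descriptor concatenated with the 2nd index selects a 2nd-level table descriptor; that descriptor concatenated with the offset gives the physical address.]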
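The index fields in this figure can be extracted with shifts and masks. The sketch below assumes the common case of a 4-KB small page reached through a coarse second-level table, where the 32-bit virtual address splits into a 12-bit first-level index, an 8-bit second-level index, and a 12-bit offset; the names `arm_va`, `arm_split_small_page`, and `table_bytes` are hypothetical.

```c
#include <stdint.h>

struct arm_va {
    uint32_t l1_index;  /* bits [31:20]: one of 4K first-level entries */
    uint32_t l2_index;  /* bits [19:12]: one of 256 coarse-table entries */
    uint32_t offset;    /* bits [11:0]: byte within the 4-KB page */
};

/* Split a virtual address for a small page mapped through a coarse table. */
struct arm_va arm_split_small_page(uint32_t va)
{
    struct arm_va f;
    f.l1_index = va >> 20;
    f.l2_index = (va >> 12) & 0xFFu;
    f.offset   = va & 0xFFFu;
    return f;
}

/* Bytes needed for a table of 2^index_bits descriptors of 4 bytes each. */
uint32_t table_bytes(unsigned index_bits)
{
    return (1u << index_bits) * 4u;
}
```

This also suggests an answer to the "why two levels?" question: `table_bytes(20)` is 4 MB for a single flat table of 4-KB pages, whereas the first-level table is `table_bytes(12)` = 16 KB and each coarse second-level table is `table_bytes(8)` = 1 KB, allocated only for regions actually in use.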

First-level descriptors AP: access permission C,B: cachability and bufferability

Section descriptor and translating section references CP reg 2: 16 KB boundary 4K Entries 1 MB block (section) Max: 16KB

Coarse Page table descriptor 4 K entries 256 entries Max: 16KB Max: 1KB

Fine page table descriptor 1 K entries Max: 4 KB

Second-level descriptor

Translating large page references

Access permissions System (S) and ROM (R) in CP15 register 1

TLB functions Invalidate instruction TLB Invalidate instruction single entry Invalidate entire data TLB Invalidate data single entry TLB lockdown

PC Bus Architecture. The northbridge, also known as the memory controller hub (MCH) in Intel systems (AMD, VIA, SiS and others usually use 'northbridge'), is traditionally one of the two chips in the core logic chipset on a PC motherboard. The southbridge, also known as the I/O Controller Hub (ICH) in Intel systems (AMD, VIA, SiS and others usually use 'southbridge'), is a chip that implements the "slower" capabilities of the motherboard in a northbridge/southbridge chipset computer architecture.

I/O devices. Usually include some non-digital component. Typical digital interface to the CPU: [Figure: CPU connected to the device's status register and data register, which front the device mechanism.]

I/O addressing A microprocessor communicates with other devices using some of its pins Port-based I/O (parallel I/O) Processor has one or more N-bit ports Processor’s software reads and writes a port just like a register E.g., P0 = 0xFF; v = P1; -- P0 and P1 are 8-bit ports Bus-based I/O Processor has address, data and control ports that form a single bus Communication protocol is built into the processor A single instruction carries out the read or write protocol on the bus

Bus-based I/O. The processor uses the same bus to communicate with both memory and peripherals. Memory-mapped I/O: peripheral registers occupy addresses in the same address space as memory; e.g., if the bus has a 16-bit address, the lower 32K addresses may correspond to memory and the upper 32K addresses may correspond to peripherals. Standard I/O (I/O-mapped I/O): an additional pin (M/IO) on the bus indicates whether the access is to memory or to a peripheral; all 64K addresses correspond to memory when M/IO is set to 0, and all 64K addresses correspond to peripherals when M/IO is set to 1.
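The two decoding schemes can be contrasted in code. This is a sketch of the slide's 16-bit example only, assuming the memory/peripheral split in the memory-mapped case is made on the top address bit; `target`, `decode_memory_mapped`, and `decode_standard_io` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

enum target { TARGET_MEMORY, TARGET_PERIPHERAL };

/* Memory-mapped I/O: the address alone decides (upper 32K = peripherals). */
enum target decode_memory_mapped(uint16_t addr)
{
    return (addr & 0x8000u) ? TARGET_PERIPHERAL : TARGET_MEMORY;
}

/* Standard I/O: the M/IO pin decides; every address is valid in both spaces. */
enum target decode_standard_io(uint16_t addr, bool m_io)
{
    (void)addr;  /* the address is not consulted when choosing the space */
    return m_io ? TARGET_PERIPHERAL : TARGET_MEMORY;
}
```

Address 0x7FFF decodes to memory and 0x8000 to a peripheral in the memory-mapped scheme, whereas in standard I/O the same address 0x0000 reaches either space depending on M/IO.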

Memory-mapped vs. Standard I/O. Memory-mapped I/O: requires no special instructions; assembly instructions involving memory, like MOV and ADD, work with peripherals as well. Standard I/O: no loss of memory addresses to peripherals; simpler address decoding logic in peripherals is possible, because when the number of peripherals is much smaller than the address space the high-order address bits can be ignored, allowing smaller and/or faster comparators. Standard I/O requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory.

Timers. Timer: measures time intervals, based on counting clock pulses. To generate timed output events, e.g., hold a traffic light green for 10 s. To measure input events, e.g., measure a car’s speed. E.g., let the Clk period be 10 ns and count 20,000 Clk pulses; then 200 microseconds have passed. A 16-bit counter would count up to 65,535 × 10 ns = 655.35 microseconds, with resolution = 10 ns. Top: indicates that the top count was reached (wrap-around). [Figure: basic timer: 16-bit up counter with Clk and Reset inputs, 16-bit Cnt output, and Top output.]
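The arithmetic on this slide is a multiplication of pulse count by clock period. A small sketch with hypothetical helper names (`elapsed_ns`, `timer_range_ns`), using 64-bit results to avoid overflow; `bits` is assumed to be less than 64.

```c
#include <stdint.h>

/* Elapsed time in nanoseconds after `count` pulses of a clock with period_ns. */
uint64_t elapsed_ns(uint32_t count, uint32_t period_ns)
{
    return (uint64_t)count * period_ns;
}

/* Longest interval an n-bit up counter can measure before it reaches top. */
uint64_t timer_range_ns(unsigned bits, uint32_t period_ns)
{
    uint64_t top = (UINT64_C(1) << bits) - 1;  /* e.g. 65,535 for 16 bits */
    return top * period_ns;
}
```

`elapsed_ns(20000, 10)` gives 200,000 ns (200 microseconds) and `timer_range_ns(16, 10)` gives 655,350 ns (655.35 microseconds), matching the slide's figures.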

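Counters. Similar to a timer, but instead of counting clock pulses a counter counts pulses on a general input signal, e.g., count cars passing over a sensor. A device can often be configured as either a timer or a counter. [Figure: timer/counter: a 2x1 mux controlled by Mode selects Clk or Cnt_in as the input of a 16-bit up counter with Reset input, 16-bit Cnt output, and Top output.]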

Watchdog timer. Since most industrial or mission-critical embedded systems cannot fail, how do we guarantee that a glitch doesn’t break the instruction flow? Watchdog timer: monitors the operation of the system and generates a RESET signal when one of several conditions occurs: the power supply voltage goes out of range; the computer hasn’t issued a reset pulse to the timer in the designated time interval. [Figure: processor output port bit 0 drives the watchdog timer's RESET IN; the watchdog timer's RESET OUT drives the processor's RESET INPUT.]
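The timeout condition can be modeled in software to show the protocol between program and watchdog. This is a simulation sketch, not a driver for any real device: the `watchdog` struct and the `wdt_init`, `wdt_kick`, and `wdt_tick` functions are hypothetical, and time is counted in abstract ticks.

```c
#include <stdbool.h>
#include <stdint.h>

struct watchdog {
    uint32_t timeout;    /* designated interval, in ticks */
    uint32_t remaining;  /* ticks left before RESET OUT is asserted */
};

void wdt_init(struct watchdog *w, uint32_t timeout_ticks)
{
    w->timeout = timeout_ticks;
    w->remaining = timeout_ticks;
}

/* The reset pulse a healthy program issues (e.g. via output port bit 0). */
void wdt_kick(struct watchdog *w)
{
    w->remaining = w->timeout;
}

/* Advance one tick; returns true when the processor should be reset. */
bool wdt_tick(struct watchdog *w)
{
    if (w->remaining > 0)
        w->remaining--;
    return w->remaining == 0;
}
```

With a timeout of 3 ticks, a program that kicks the watchdog every 2 ticks never sees a reset, while one that stops kicking is reset on the third tick after its last kick.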

Interrupt interface. An essential element for meeting the real-time requirements of embedded systems. [Figure: CPU (PC, IR) connected to a device (status reg, data reg, mechanism) by intr request, intr ack, and data/address lines.]

Interrupts Suppose a peripheral intermittently receives data, which must be serviced by the processor The processor can poll the peripheral regularly to see if data has arrived – wasteful The peripheral can interrupt the processor when it has data Requires an extra pin or pins: Int If Int is 1, processor suspends current program, jumps to an Interrupt Service Routine, or ISR Known as interrupt-driven I/O Essentially, “polling” of the interrupt pin is built-into the hardware, so no extra time!

Interrupts (2). What is the address of the ISR (interrupt service routine)? Fixed interrupt: the address is built into the microprocessor and cannot be changed; either the ISR is stored at that address, or a jump to the actual ISR is stored there if not enough bytes are available. Vectored interrupt: the peripheral provides the address; common when the microprocessor has multiple peripherals connected by a system bus. Compromise: interrupt address table.
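The interrupt address table compromise can be pictured as an array of function pointers indexed by interrupt number. A software-only sketch with hypothetical names (`isr_t`, `vector_table`, `irq_register`, `irq_dispatch`, and a made-up `uart_isr`); on real hardware the lookup and jump are performed by the processor, not by a C function.

```c
#include <stddef.h>

typedef void (*isr_t)(void);

enum { NUM_IRQS = 4 };  /* assumed table size */

static isr_t vector_table[NUM_IRQS];  /* interrupt address table */
static int uart_events;               /* state touched by the example ISR */

/* Hypothetical ISR: records that the peripheral was serviced. */
static void uart_isr(void)
{
    uart_events++;
}

/* Store an ISR address in the table entry for this interrupt number. */
int irq_register(unsigned irq, isr_t handler)
{
    if (irq >= NUM_IRQS)
        return -1;
    vector_table[irq] = handler;
    return 0;
}

/* Look up the entry and jump to the ISR; -1 if no ISR is installed. */
int irq_dispatch(unsigned irq)
{
    if (irq >= NUM_IRQS || vector_table[irq] == NULL)
        return -1;
    vector_table[irq]();
    return 0;
}

/* Install the example ISR on interrupt 2, raise it once, report the count. */
int demo_uart_interrupt(void)
{
    irq_register(2, uart_isr);
    irq_dispatch(2);
    return uart_events;
}
```

The peripheral only needs to supply a small index rather than a full ISR address, and the table entries can be changed by software, which is what distinguishes this scheme from a fixed interrupt address.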

Additional interrupt issues Maskable vs. non-maskable interrupts Maskable: programmer can set bit that causes processor to ignore interrupt Important when in the middle of time-critical code Non-maskable: a separate interrupt pin that can’t be masked Typically reserved for drastic situations, like power failure requiring immediate backup of data to non-volatile memory Jump to ISR Some microprocessors treat jump same as call of any subroutine Complete state saved (PC, registers) – may take hundreds of cycles Others only save partial state, like PC only Thus, ISR must not modify registers, or else must save them first Assembly-language programmer must be aware of which registers stored

Direct memory access (DMA) Buffering Temporarily storing data in memory before processing Data accumulated in peripherals commonly buffered Microprocessor could handle this with ISR Storing and restoring microprocessor state inefficient Regular program must wait DMA controller more efficient Separate single-purpose processor Microprocessor relinquishes control of system bus to DMA controller Microprocessor can meanwhile execute its regular program No inefficient storing and restoring state due to ISR call Regular program need not wait unless it requires the system bus

ARM Processors (focusing on the PXA255, based on the XScale core)

References. ARM Architecture Reference Manual, Second Edition, David Seal, Addison-Wesley, 1996. ARM System Developer’s Guide: Designing and Optimizing System Software, Andrew N. Sloss, Dominic Symes, and Chris Wright, Morgan Kaufmann and Elsevier, 2004.

PXA255 Processor. Intel PXA255 overview: high-performance 32-bit microprocessor; max 400 MHz; technology: 0.35 um, 3-layer metal CMOS, 2.6 million transistors; 256 PBGA package (17x17 mm). An XScale core based on ARMv5TE. An ARM processor that uses a modified Harvard architecture: separate instruction and data caches (2 caches).

ARM Processor Evolution

Evolution of ARM Architecture. Each ARM architecture revision (version) has a specific ISA (Instruction Set Architecture). http://www.arm.com/pdfs/ARM11%20Microarchitecture%20White%20Paper.pdf

ARM Nomenclature

ARM Nomenclature (2)

ARM Revision History

CPSR and Attribute Comparison

ARM Processor Variants

ARM7 Family ARM7core has a Von-Neumann style architecture, 3stage pipeline, ARMv4T instruction set ARM7TDMI is the first of a new range of processors introduced in 1995 by ARM ARM7TDMI-S : same as 7TDMI but synthesizable ARM720T : has MMU, (capable of Linux and WinCE), unified 8Kcache (Data + Instruction) A variation of ARM7 is ARM7EJ-S: 5-stage pipeline, executes ARMv5TEJ instructions

ARM9 Family. The ARM9 family was announced in 1997. 5-stage pipeline → higher clock frequency than the ARM7 family. Memory system redesign → Harvard architecture (separate D and I caches (buses)). The first processor in the ARM9 family is the ARM920T (separate D + I caches, MMU → OS with virtual memory, ARMv4T instructions). ARM922T is a variation on ARM920T (half of the cache size). ARM940T has smaller D + I caches and an MPU. The next processors in the ARM9 family are based on the ARM9E-S core (synthesizable version of the ARM9 core with the E extensions). Two variations of ARM9E-S: ARM946E-S and ARM966E-S. Both execute architecture v5TE instructions. Both support an optional embedded trace macrocell (ETM). ARM946E-S includes TCM, cache, and an MPU (designed for use in embedded control applications that require deterministic real-time response). ARM966E-S does not have the MPU and cache extensions (but does have configurable TCMs). The latest core in the ARM9 family is ARM926EJ-S (announced in 2000), designed for portable Java-enabled devices such as 3G phones and PDAs. ARM926EJ-S is the first ARM core to include Jazelle technology. It includes an MMU, configurable TCMs, and D + I caches.

ARM10 Family. ARM10 was designed for performance (announced in 1999). It extends the ARM9 pipeline to six stages. Optional vector floating-point (VFP) unit. ARM1020E is the first processor to use an ARM10E core: separate 32K D + I caches, MMU, optional vector floating-point unit, dual 64-bit bus interface for increased performance. ARM1026EJ-S is very similar to ARM926EJ-S but has both an MPU and an MMU. It has the performance of ARM10 and the flexibility of an ARM926EJ-S.

ARM11 Family ARM1136J-S was designed for high performance and power-efficient applications (announced in 2003) ARM1136J-S : the first processor implementation of ARMv6 architecture instructions 8stage pipeline with separate load-store and arithmetic pipelines ARMv6 instructions include SIMD extensions for media processing ARM1136JF-S is an ARM1136J-S with the addition of the vector floating point unit

Specialized Processors. StrongARM was originally co-developed by Digital Semiconductor and is now exclusively licensed by Intel. Popular for PDAs (high performance and low power consumption). Harvard architecture with separate D + I caches. 5-stage pipeline, without the Thumb instruction set. XScale is a follow-on product to the StrongARM (up to 1 GHz). XScale executes architecture v5TE instructions. It has a Harvard architecture, is similar to the StrongARM, and includes an MMU. SC100 is designed for low-power security applications. Based on the ARM7TDMI core with an MPU. Used for smart card applications.

Memory Management of ARM
• ARM provides three different types of memory management hardware
  - Non-protected memory
  - MPU: Memory Protection Unit. A simple system that uses a limited number of memory regions
  - MMU: Memory Management Unit. Used by the virtual memory management system of an OS

ARM Architecture Feature Comparison

ARM Processor Roadmap http://www.arm.com/pdfs/ARM11%20Microarchitecture%20White%20Paper.pdf

ARM Roadmap

Comparison of ARM7 and ARM9 Cores http://www.arm.com/documentation/ARMProcessor_Cores/index.html
• Maximum clock frequency: 1.8 to 2 times higher
• Performance: 30% higher

ARM Architecture Source: ARM6 Architecture: http://www.arm.com/documentation/White_Papers/index.html

ARMv6 Performance Improvement Techniques

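Little and Big Endians
• Little Endian: the least significant byte is stored at the lowest address
  - 0x345f stored at address 0x8000 -> 0x8000: 5f, 0x8001: 34
  - DNS analogy: www.ajou.ac.kr (MSB: kr, LSB: www)
  - Intel i386 CPUs
• Big Endian: the most significant byte is stored at the lowest address
  - DNS analogy: kr.ac.yu.www
  - 0x345f -> 34, 5f
  - IBM, Motorola
• Mixed (supports both Little and Big Endian)
  - ARM (default: Little Endian)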
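The byte orders on this slide can be checked with a short sketch. It uses Python's standard `struct` module to pack the slide's value 0x345f both ways; the helper name `layout` and the choice of a 16-bit field are assumptions made for illustration only.

```python
import struct

VALUE = 0x345F
BASE = 0x8000  # base address used on the slide

# "<H" = little-endian unsigned 16-bit, ">H" = big-endian unsigned 16-bit
little = struct.pack("<H", VALUE)
big = struct.pack(">H", VALUE)

def layout(data, base=BASE):
    """Return (address, byte-as-hex) pairs for a packed value (hypothetical helper)."""
    return [(base + i, f"{b:02x}") for i, b in enumerate(data)]

little_layout = layout(little)  # lowest address holds the least significant byte
big_layout = layout(big)        # lowest address holds the most significant byte

for name, rows in (("little", little_layout), ("big", big_layout)):
    print(name, ", ".join(f"0x{addr:04x}: {byte}" for addr, byte in rows))
```

Reading the two printed rows side by side shows that only the order of the bytes in memory differs, not the value itself.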

ARM Processors from Samsung http://www.samsung.com/Products/Semiconductor/common/product_list.aspx?family_cd=LSI090101

Qualcomm Processors

Qualcomm MSM6800 http://www.cdmatech.com/images/products/diagram_msm6800.pdf

Qualcomm MSM3300 http://www.cdmatech.com/solutions/pdf/msm3300_chipset.pdf

TMS320DM270 http://www.tij.co.jp/jsc/docs/apps/digital/pdf/tms320dm270.pdf

Intel XScale Core Architecture Refer to Intel XScale Core Developer’s Manual January, 2004

Extensions to ARM Architecture

Event Architecture

Event Priority of XScale

Configuration

MCR/MRC Format

LDC/STC Format when Accessing CP14

CP15 Registers

Intel PXA255 Processor Developer’s Manual January, 2004

System Integration Unit

PXA255 Pin
[Figure: pin diagram of the Intel XScale PXA250 (256 pins). Functional groups shown: Serial Channel 0 (USB), Serial Channel 1, Serial Channel 2 (IrDA), Serial Channel 3 (UART), Serial Channel 4 (CODEC), LCD Control, GPIO Ports, Memory Control, Power Management, Transceiver Control, PCMCIA Bus Signals, Clocks, Reset and Test, Address Bus, Data Bus, JTAG, Supply.]
Signals shown: UDC-, UDC+, RXD_1, TXD_1, RXD_2, TXD_2, RXD_3, TXD_3, TXD_C, RXD_C, SFRM_C, SCLK_C, L_DD(15:0), L_FCLK, L_LCLK, L_PCLK, L_BIAS, GP(27:0), nCAS/DQM(3:0), nRAS/nSDCS(3:0), nOE, nWE, nCS(5:0), RDY, nSDRAS, nSDCAS, SDCKE<1:0>, SDCLK<2:0>, RD/nWR, BATT_FAULT, VDD_FAULT, PWR_EN, nPOE, nPWE, nPIOR, nPIOW, nPCE<2:1>, PSKTSEL, nPREG, nPWAIT, nIOIS16, TCK_BYP, TESTCLK, PEXTAL, PXTAL, TEXTAL, TXTAL, nRESET, nRESET_OUT, SMROM_EN, ROM_SEL, A<25:0>, D<31:0>, TCK, TDI, TDO, TMS, nTRST, VDD, VDDX, VSS/VSSX

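PXA255 Address Map
Start address  Region (size)                                      Interface
0xB000 0000    Reserved (1280 Mbytes, up to 0xFFFF FFFF)
0xAC00 0000    SDRAM Bank 3 (64 Mbytes)                           Dynamic Memory Interface, 256 Mbytes
0xA800 0000    SDRAM Bank 2 (64 Mbytes)                           Dynamic Memory Interface, 256 Mbytes
0xA400 0000    SDRAM Bank 1 (64 Mbytes)                           Dynamic Memory Interface, 256 Mbytes
0xA000 0000    SDRAM Bank 0 (64 Mbytes)                           Dynamic Memory Interface, 256 Mbytes
0x4C00 0000    Reserved (1344 Mbytes)
0x4800 0000    Memory Mapped registers (Memory Ctl)               Memory Mapped registers Interface, 192 Mbytes
0x4400 0000    Memory Mapped registers (LCD)                      Memory Mapped registers Interface, 192 Mbytes
0x4000 0000    Memory Mapped registers (Peripherals)              Memory Mapped registers Interface, 192 Mbytes
0x3000 0000    PCMCIA/CF - Slot 1 (256 Mbytes)                    PCMCIA Interface, 512 Mbytes
0x2000 0000    PCMCIA/CF - Slot 0 (256 Mbytes)                    PCMCIA Interface, 512 Mbytes
0x1800 0000    Reserved (128 Mbytes)
0x1400 0000    Static Chip Select 5 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes
0x1000 0000    Static Chip Select 4 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes
0x0C00 0000    Static Chip Select 3 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes
0x0800 0000    Static Chip Select 2 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes
0x0400 0000    Static Chip Select 1 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes
0x0000 0000    Static Chip Select 0 (64 Mbytes)                   Static Memory Interface (ROM, Flash, SRAM), 384 Mbytes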
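As an illustration of how the address map is read, the sketch below looks up which region a physical address falls into. The start addresses and region names are taken from this slide; the table name `REGIONS` and the function `region_of` are hypothetical names introduced only for this example.

```python
import bisect

# (start address, region name), sorted by start address; values from the slide
REGIONS = [
    (0x0000_0000, "Static Chip Select 0"),
    (0x0400_0000, "Static Chip Select 1"),
    (0x0800_0000, "Static Chip Select 2"),
    (0x0C00_0000, "Static Chip Select 3"),
    (0x1000_0000, "Static Chip Select 4"),
    (0x1400_0000, "Static Chip Select 5"),
    (0x1800_0000, "Reserved"),
    (0x2000_0000, "PCMCIA/CF Slot 0"),
    (0x3000_0000, "PCMCIA/CF Slot 1"),
    (0x4000_0000, "Memory-mapped registers (Peripherals)"),
    (0x4400_0000, "Memory-mapped registers (LCD)"),
    (0x4800_0000, "Memory-mapped registers (Memory Ctl)"),
    (0x4C00_0000, "Reserved"),
    (0xA000_0000, "SDRAM Bank 0"),
    (0xA400_0000, "SDRAM Bank 1"),
    (0xA800_0000, "SDRAM Bank 2"),
    (0xAC00_0000, "SDRAM Bank 3"),
    (0xB000_0000, "Reserved"),
]
STARTS = [start for start, _ in REGIONS]

def region_of(addr):
    """Return the name of the region containing a 32-bit physical address."""
    if not 0 <= addr <= 0xFFFF_FFFF:
        raise ValueError("address out of 32-bit range")
    # bisect_right finds the first region starting above addr; step back one
    return REGIONS[bisect.bisect_right(STARTS, addr) - 1][1]

print(region_of(0xA000_0000))  # first SDRAM bank
print(region_of(0x4010_0000))  # a peripheral register address
```

A binary search over the sorted start addresses is used here simply because every region runs up to the start of the next one.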

PXA255-Based Example System
[Figure: example system built around the Intel XScale PXA255 Portable Communications Microprocessor. Labels: UART; Tablet / Serial Keyboard; AC97; Speaker; Microphone; Infrared; USB; Synchronization Port; TFT Color LCD Display; SDRAM/DRAM; SMROM/ROM; Flash; SRAM; Glue Logic; Variable Latency I/O; PCMCIA Interface (Flash, Modem); 3.686MHz; 32.768KHz.]
A palm-sized system configuration

PXA255 Processor(1)
[Figure: PXA255 Block Diagram. Labels: XScale Core (IMMU, DMMU, Icache (32 Kbytes), Dcache, Minicache, Write Buffer, Read Buffer; Instructions, PC, Addr, Load/Store Data); Megacell; DMA Controller and Bridge; System Bus; Peripheral Bus; Color or Grayscale LCD Controller (0x4400_0000); Memory Controller (Variable Latency I/O Control, PCMCIA & CF Control, Static Memory Control, Dynamic Memory Control); ASIC; XCVR; ROM/Flash; SRAM; 4 banks SDRAM/SMROM; Socket 0; Socket 1; CS# 0,1,2; CS# 3,4,5; RTC; OS Timer; PWM(2); Interrupt Controller; Clock & Power Man.; I2S; I2C; AC97; FF_UART; BT_UART; Slow IrDA; Fast IrDA; SSP; NSSP; USB Client; MMC; General Purpose I/O; 3.6864 MHz Osc; 32.768 KHz Osc.]

Peripheral base addresses
Unit                              Base address   Manual page
DMA Controller                    0x4000 0000    5-28
Full Function UART                0x4010 0000    10-26
Bluetooth UART                    0x4020 0000    10-27
I2C                               0x4030 0000    9-22
I2S                               0x4040 0000    14-17
AC97                              0x4050 0000    13-19
UDC                               0x4060 0000    12-50
Standard UART                     0x4070 0000    10-27
ICP                               0x4080 0000    11-17
RTC                               0x4090 0000    4-37
OS Timer                          0x40A0 0000    4-41
PWM 0                             0x40B0 0000    4-48
PWM 1                             0x40C0 0000    4-48
Interrupt Control                 0x40D0 0000    4-30
GPIO                              0x40E0 0000    4-21
Power Manager and Reset Control   0x40F0 0000    3-31
SSP                               0x4100 0000    8-18
MMC Controller                    0x4110 0000    15-22
Clocks Manager                    0x4130 0000    3-36
LCD Controller                    0x4400 0000    7-49
Memory Controller                 0x4800 0000    6-9

DMA Controller 0x4000 0000
0x4000 0000  DCSR0   DMA Control / Status Register for Channel 0
0x4000 0004  DCSR1   DMA Control / Status Register for Channel 1
0x4000 0008  DCSR2   DMA Control / Status Register for Channel 2
0x4000 000C  DCSR3   DMA Control / Status Register for Channel 3
0x4000 0010  DCSR4   DMA Control / Status Register for Channel 4
0x4000 0014  DCSR5   DMA Control / Status Register for Channel 5
0x4000 0018  DCSR6   DMA Control / Status Register for Channel 6
0x4000 001C  DCSR7   DMA Control / Status Register for Channel 7
0x4000 0020  DCSR8   DMA Control / Status Register for Channel 8
0x4000 0024  DCSR9   DMA Control / Status Register for Channel 9
0x4000 0028  DCSR10  DMA Control / Status Register for Channel 10
0x4000 002C  DCSR11  DMA Control / Status Register for Channel 11
0x4000 0030  DCSR12  DMA Control / Status Register for Channel 12
0x4000 0034  DCSR13  DMA Control / Status Register for Channel 13
0x4000 0038  DCSR14  DMA Control / Status Register for Channel 14
0x4000 003C  DCSR15  DMA Control / Status Register for Channel 15
0x4000 00F0  DINT    DMA Interrupt Register

DMA Request to Channel Map Registers (each DRCMRn is the Request to Channel Map Register for the request listed)
0x4000 0100  DRCMR0   DREQ 0
0x4000 0104  DRCMR1   DREQ 1
0x4000 0108  DRCMR2   I2S receive Request
0x4000 010C  DRCMR3   I2S transmit Request
0x4000 0110  DRCMR4   BTUART receive Request
0x4000 0114  DRCMR5   BTUART transmit Request
0x4000 0118  DRCMR6   FFUART receive Request
0x4000 011C  DRCMR7   FFUART transmit Request
0x4000 0120  DRCMR8   AC97 microphone Request
0x4000 0124  DRCMR9   AC97 modem receive Request
0x4000 0128  DRCMR10  AC97 modem transmit Request
0x4000 012C  DRCMR11  AC97 audio receive Request
0x4000 0130  DRCMR12  AC97 audio transmit Request
0x4000 0134  DRCMR13  SSP receive Request
0x4000 0138  DRCMR14  SSP transmit Request
0x4000 013C  DRCMR15  Reserved
0x4000 0140  DRCMR16  Reserved
0x4000 0144  DRCMR17  ICP receive Request
0x4000 0148  DRCMR18  ICP transmit Request
0x4000 014C  DRCMR19  STUART receive Request
0x4000 0150  DRCMR20  STUART transmit Request
0x4000 0154  DRCMR21  MMC receive Request
0x4000 0158  DRCMR22  MMC transmit Request
0x4000 015C  DRCMR23  Reserved
0x4000 0160  DRCMR24  Reserved
0x4000 0164  DRCMR25  USB endpoint 1 Request
0x4000 0168  DRCMR26  USB endpoint 2 Request
0x4000 016C  DRCMR27  USB endpoint 3 Request
0x4000 0170  DRCMR28  USB endpoint 4 Request
0x4000 0174  DRCMR29  Reserved
0x4000 0178  DRCMR30  USB endpoint 6 Request
0x4000 017C  DRCMR31  USB endpoint 7 Request
0x4000 0180  DRCMR32  USB endpoint 8 Request
0x4000 0184  DRCMR33  USB endpoint 9 Request
0x4000 0188  DRCMR34  Reserved
0x4000 018C  DRCMR35  USB endpoint 11 Request
0x4000 0190  DRCMR36  USB endpoint 12 Request
0x4000 0194  DRCMR37  USB endpoint 13 Request
0x4000 0198  DRCMR38  USB endpoint 14 Request
0x4000 019C  DRCMR39  Reserved

DMA descriptor registers per channel n (DDADRn: DMA Descriptor Address Register, DSADRn: DMA Source Address Register, DTADRn: DMA Target Address Register, DCMDn: DMA Command Address Register)
Channel  DDADRn       DSADRn       DTADRn       DCMDn
0        0x4000 0200  0x4000 0204  0x4000 0208  0x4000 020C
1        0x4000 0210  0x4000 0214  0x4000 0218  0x4000 021C
2        0x4000 0220  0x4000 0224  0x4000 0228  0x4000 022C
3        0x4000 0230  0x4000 0234  0x4000 0238  0x4000 023C
4        0x4000 0240  0x4000 0244  0x4000 0248  0x4000 024C
5        0x4000 0250  0x4000 0254  0x4000 0258  0x4000 025C
6        0x4000 0260  0x4000 0264  0x4000 0268  0x4000 026C
7        0x4000 0270  0x4000 0274  0x4000 0278  0x4000 027C
8        0x4000 0280  0x4000 0284  0x4000 0288  0x4000 028C
9        0x4000 0290  0x4000 0294  0x4000 0298  0x4000 029C
10       0x4000 02A0  0x4000 02A4  0x4000 02A8  0x4000 02AC
11       0x4000 02B0  0x4000 02B4  0x4000 02B8  0x4000 02BC
12       0x4000 02C0  0x4000 02C4  0x4000 02C8  0x4000 02CC
13       0x4000 02D0  0x4000 02D4  0x4000 02D8  0x4000 02DC
14       0x4000 02E0  0x4000 02E4  0x4000 02E8  0x4000 02EC
15       0x4000 02F0  0x4000 02F4  0x4000 02F8  0x4000 02FC

Full Function UART 0x4010 0000
0x4010 0000  FFRBR  Receive Buffer Register (read only)
0x4010 0000  FFTHR  Transmit Holding Register (write only)
0x4010 0004  FFIER  Interrupt Enable Register (read/write)
0x4010 0008  FFIIR  Interrupt ID Register (read only)
0x4010 0008  FFFCR  FIFO Control Register (write only)
0x4010 000C  FFLCR  Line Control Register (read/write)
0x4010 0010  FFMCR  Modem Control Register (read/write)
0x4010 0014  FFLSR  Line Status Register (read only)
0x4010 0018  FFMSR  Modem Status Register (read only)
0x4010 001C  FFSPR  Scratch Pad Register (read/write)
0x4010 0020  FFISR  Infrared Selection Register (read/write)
0x4010 0000  FFDLL  Divisor Latch Low Register (DLAB = 1) (read/write)
0x4010 0004  FFDLH  Divisor Latch High Register (DLAB = 1) (read/write)

Bluetooth UART 0x4020 0000
0x4020 0000  BTRBR  Receive Buffer Register (read only)
0x4020 0000  BTTHR  Transmit Holding Register (write only)
0x4020 0004  BTIER  Interrupt Enable Register (read/write)
0x4020 0008  BTIIR  Interrupt ID Register (read only)
0x4020 0008  BTFCR  FIFO Control Register (write only)
0x4020 000C  BTLCR  Line Control Register (read/write)
0x4020 0010  BTMCR  Modem Control Register (read/write)
0x4020 0014  BTLSR  Line Status Register (read only)
0x4020 0018  BTMSR  Modem Status Register (read only)
0x4020 001C  BTSPR  Scratch Pad Register (read/write)
0x4020 0020  BTISR  Infrared Selection Register (read/write)
0x4020 0000  BTDLL  Divisor Latch Low Register (DLAB = 1) (read/write)
0x4020 0004  BTDLH  Divisor Latch High Register (DLAB = 1) (read/write)

I2C 0x4030 0000
0x4030 1680  IBMR  I2C Bus Monitor Register
0x4030 1688  IDBR  I2C Data Buffer Register
0x4030 1690  ICR   I2C Control Register
0x4030 1698  ISR   I2C Status Register
0x4030 16A0  ISAR  I2C Slave Address Register

I2S 0x4040 0000
0x4040 0000  SACR0  Global Control Register
0x4040 0004  SACR1  Serial Audio I2S/MSB-Justified Control Register
0x4040 0008  -      Reserved
0x4040 000C  SASR0  Serial Audio I2S/MSB-Justified Interface and FIFO Status Register
0x4040 0010  -      Reserved
0x4040 0014  SAIMR  Serial Audio Interrupt Mask Register
0x4040 0018  SAICR  Serial Audio Interrupt Clear Register
0x4040 001C through 0x4040 005C  -  Reserved
0x4040 0060  SADIV  Audio Clock Divider Register
0x4040 0064 through 0x4040 007C  -  Reserved
0x4040 0080  SADR   Serial Audio Data Register (TX and RX FIFO access Register)

AC97 0x4050 0000
0x4050 0000  POCR  PCM Out Control Register
0x4050 0004  PICR  PCM In Control Register
0x4050 0008  MCCR  Mic In Control Register
0x4050 000C  GCR   Global Control Register
0x4050 0010  POSR  PCM Out Status Register
0x4050 0014  PISR  PCM In Status Register
0x4050 0018  MCSR  Mic In Status Register
0x4050 001C  GSR   Global Status Register
0x4050 0020  CAR   CODEC Access Register
0x4050 0024 through 0x4050 003C  -  Reserved
0x4050 0040  PCDR  PCM FIFO Data Register
0x4050 0044 through 0x4050 005C  -  Reserved
0x4050 0060  MCDR  Mic-in FIFO Data Register
0x4050 0064 through 0x4050 00FC  -  Reserved
0x4050 0100  MOCR  Modem Out Control Register
0x4050 0104  -     Reserved
0x4050 0108  MICR  Modem In Control Register
0x4050 010C  -     Reserved
0x4050 0110  MOSR  Modem Out Status Register
0x4050 0114  -     Reserved
0x4050 0118  MISR  Modem In Status Register
0x4050 011C through 0x4050 013C  -  Reserved
0x4050 0140  MODR  Modem FIFO Data Register
0x4050 0144 through 0x4050 01FC  -  Reserved
0x4050 0200 through 0x4050 02FC  -  Primary Audio codec Registers
0x4050 0300 through 0x4050 03FC  -  Secondary Audio codec Registers
0x4050 0400 through 0x4050 04FC  -  Primary Modem codec Registers
0x4050 0500 through 0x4050 05FC  -  Secondary Modem codec Registers

UDC 0x4060 0000
0x4060 0000  UDCCR    UDC Control Register
0x4060 0010  UDCCS0   UDC Endpoint 0 Control/Status Register
0x4060 0014  UDCCS1   UDC Endpoint 1 (IN) Control/Status Register
0x4060 0018  UDCCS2   UDC Endpoint 2 (OUT) Control/Status Register
0x4060 001C  UDCCS3   UDC Endpoint 3 (IN) Control/Status Register
0x4060 0020  UDCCS4   UDC Endpoint 4 (OUT) Control/Status Register
0x4060 0024  UDCCS5   UDC Endpoint 5 (Interrupt) Control/Status Register
0x4060 0028  UDCCS6   UDC Endpoint 6 (IN) Control/Status Register
0x4060 002C  UDCCS7   UDC Endpoint 7 (OUT) Control/Status Register
0x4060 0030  UDCCS8   UDC Endpoint 8 (IN) Control/Status Register
0x4060 0034  UDCCS9   UDC Endpoint 9 (OUT) Control/Status Register
0x4060 0038  UDCCS10  UDC Endpoint 10 (Interrupt) Control/Status Register
0x4060 003C  UDCCS11  UDC Endpoint 11 (IN) Control/Status Register
0x4060 0040  UDCCS12  UDC Endpoint 12 (OUT) Control/Status Register
0x4060 0044  UDCCS13  UDC Endpoint 13 (IN) Control/Status Register
0x4060 0048  UDCCS14  UDC Endpoint 14 (OUT) Control/Status Register
0x4060 004C  UDCCS15  UDC Endpoint 15 (Interrupt) Control/Status Register
0x4060 0060  UFNRH    UDC Frame Number Register High
0x4060 0064  UFNRL    UDC Frame Number Register Low
0x4060 0068  UBCR2    UDC Byte Count Register 2
0x4060 006C  UBCR4    UDC Byte Count Register 4
0x4060 0070  UBCR7    UDC Byte Count Register 7
0x4060 0074  UBCR9    UDC Byte Count Register 9
0x4060 0078  UBCR12   UDC Byte Count Register 12
0x4060 007C  UBCR14   UDC Byte Count Register 14
0x4060 0080  UDDR0    UDC Endpoint 0 Data Register
0x4060 0100  UDDR1    UDC Endpoint 1 Data Register
0x4060 0180  UDDR2    UDC Endpoint 2 Data Register
0x4060 0200  UDDR3    UDC Endpoint 3 Data Register
0x4060 0400  UDDR4    UDC Endpoint 4 Data Register
0x4060 00A0  UDDR5    UDC Endpoint 5 Data Register
0x4060 0600  UDDR6    UDC Endpoint 6 Data Register
0x4060 0680  UDDR7    UDC Endpoint 7 Data Register
0x4060 0700  UDDR8    UDC Endpoint 8 Data Register
0x4060 0900  UDDR9    UDC Endpoint 9 Data Register
0x4060 00C0  UDDR10   UDC Endpoint 10 Data Register
0x4060 0B00  UDDR11   UDC Endpoint 11 Data Register
0x4060 0B80  UDDR12   UDC Endpoint 12 Data Register
0x4060 0C00  UDDR13   UDC Endpoint 13 Data Register
0x4060 0E00  UDDR14   UDC Endpoint 14 Data Register
0x4060 00E0  UDDR15   UDC Endpoint 15 Data Register
0x4060 0050  UICR0    UDC Interrupt Control Register 0
0x4060 0054  UICR1    UDC Interrupt Control Register 1
0x4060 0058  USIR0    UDC Status Interrupt Register 0
0x4060 005C  USIR1    UDC Status Interrupt Register 1

Standard UART 0x4070 0000
0x4070 0000  STRBR  Receive Buffer Register (read only)
0x4070 0000  STTHR  Transmit Holding Register (write only)
0x4070 0004  STIER  Interrupt Enable Register (read/write)
0x4070 0008  STIIR  Interrupt ID Register (read only)
0x4070 0008  STFCR  FIFO Control Register (write only)
0x4070 000C  STLCR  Line Control Register (read/write)
0x4070 0010  STMCR  Modem Control Register (read/write)
0x4070 0014  STLSR  Line Status Register (read only)
0x4070 0018  STMSR  Reserved
0x4070 001C  STSPR  Scratch Pad Register (read/write)
0x4070 0020  STISR  Infrared Selection Register (read/write)
0x4070 0000  STDLL  Divisor Latch Low Register (DLAB = 1) (read/write)
0x4070 0004  STDLH  Divisor Latch High Register (DLAB = 1) (read/write)

ICP 0x4080 0000
0x4080 0000  ICCR0  ICP Control Register 0
0x4080 0004  ICCR1  ICP Control Register 1
0x4080 0008  ICCR2  ICP Control Register 2
0x4080 000C  ICDR   ICP Data Register
0x4080 0010  -      Reserved
0x4080 0014  ICSR0  ICP Status Register 0
0x4080 0018  ICSR1  ICP Status Register 1

RTC 0x4090 0000
0x4090 0000  RCNR  RTC Count Register
0x4090 0004  RTAR  RTC Alarm Register
0x4090 0008  RTSR  RTC Status Register
0x4090 000C  RTTR  RTC Timer Trim Register

OS Timer 0x40A0 0000
0x40A0 0000  OSMR<0>  OS Timer Match Registers<0>
0x40A0 0004  OSMR<1>  OS Timer Match Registers<1>
0x40A0 0008  OSMR<2>  OS Timer Match Registers<2>
0x40A0 000C  OSMR<3>  OS Timer Match Registers<3>
0x40A0 0010  OSCR     OS Timer Counter Register
0x40A0 0014  OSSR     OS Timer Status Register
0x40A0 0018  OWER     OS Timer Watchdog Enable Register
0x40A0 001C  OIER     OS Timer Interrupt Enable Register

PWM 0 0x40B0 0000
0x40B0 0000  PWM_CTRL0    PWM 0 Control Register
0x40B0 0004  PWM_PWDUTY0  PWM 0 Duty Cycle Register
0x40B0 0008  PWM_PERVAL0  PWM 0 Period Control Register

PWM 1 0x40C0 0000
0x40C0 0000  PWM_CTRL1    PWM 1 Control Register
0x40C0 0004  PWM_PWDUTY1  PWM 1 Duty Cycle Register
0x40C0 0008  PWM_PERVAL1  PWM 1 Period Control Register

Interrupt Control 0x40D0 0000
0x40D0 0000  ICIP  Interrupt Controller IRQ Pending Register
0x40D0 0004  ICMR  Interrupt Controller Mask Register
0x40D0 0008  ICLR  Interrupt Controller Level Register
0x40D0 000C  ICFP  Interrupt Controller FIQ Pending Register
0x40D0 0010  ICPR  Interrupt Controller Pending Register
0x40D0 0014  ICCR  Interrupt Controller Control Register

GPIO 0x40E0 0000
0x40E0 0000  GPLR0    GPIO Pin-Level Register GPIO<31:0>
0x40E0 0004  GPLR1    GPIO Pin-Level Register GPIO<63:32>
0x40E0 0008  GPLR2    GPIO Pin-Level Register GPIO<80:64>
0x40E0 000C  GPDR0    GPIO Pin Direction Register GPIO<31:0>
0x40E0 0010  GPDR1    GPIO Pin Direction Register GPIO<63:32>
0x40E0 0014  GPDR2    GPIO Pin Direction Register GPIO<80:64>
0x40E0 0018  GPSR0    GPIO Pin Output Set Register GPIO<31:0>
0x40E0 001C  GPSR1    GPIO Pin Output Set Register GPIO<63:32>
0x40E0 0020  GPSR2    GPIO Pin Output Set Register GPIO<80:64>
0x40E0 0024  GPCR0    GPIO Pin Output Clear Register GPIO<31:0>
0x40E0 0028  GPCR1    GPIO Pin Output Clear Register GPIO<63:32>
0x40E0 002C  GPCR2    GPIO Pin Output Clear Register GPIO<80:64>
0x40E0 0030  GRER0    GPIO Rising-Edge Detect Register GPIO<31:0>
0x40E0 0034  GRER1    GPIO Rising-Edge Detect Register GPIO<63:32>
0x40E0 0038  GRER2    GPIO Rising-Edge Detect Register GPIO<80:64>
0x40E0 003C  GFER0    GPIO Falling-Edge Detect Register GPIO<31:0>
0x40E0 0040  GFER1    GPIO Falling-Edge Detect Register GPIO<63:32>
0x40E0 0044  GFER2    GPIO Falling-Edge Detect Register GPIO<80:64>
0x40E0 0048  GEDR0    GPIO Edge Detect Status Register GPIO<31:0>
0x40E0 004C  GEDR1    GPIO Edge Detect Status Register GPIO<63:32>
0x40E0 0050  GEDR2    GPIO Edge Detect Status Register GPIO<80:64>
0x40E0 0054  GAFR0_L  GPIO Alternate Function Select Register GPIO<15:0>
0x40E0 0058  GAFR0_U  GPIO Alternate Function Select Register GPIO<31:16>
0x40E0 005C  GAFR1_L  GPIO Alternate Function Select Register GPIO<47:32>
0x40E0 0060  GAFR1_U  GPIO Alternate Function Select Register GPIO<63:48>
0x40E0 0064  GAFR2_L  GPIO Alternate Function Select Register GPIO<79:64>
0x40E0 0068  GAFR2_U  GPIO Alternate Function Select Register GPIO 80

Power Manager and Reset Control 0x40F0 0000
0x40F0 0000  PMCR   Power Manager Control Register
0x40F0 0004  PSSR   Power Manager Sleep Status Register
0x40F0 0008  PSPR   Power Manager Scratch Pad Register
0x40F0 000C  PWER   Power Manager Wake-up Enable Register
0x40F0 0010  PRER   Power Manager GPIO Rising-Edge Detect Enable Register
0x40F0 0014  PFER   Power Manager GPIO Falling-Edge Detect Enable Register
0x40F0 0018  PEDR   Power Manager GPIO Edge Detect Status Register
0x40F0 001C  PCFR   Power Manager General Configuration Register
0x40F0 0020  PGSR0  Power Manager GPIO Sleep State Register for GP[31-0]
0x40F0 0024  PGSR1  Power Manager GPIO Sleep State Register for GP[63-32]
0x40F0 0028  PGSR2  Power Manager GPIO Sleep State Register for GP[84-64]
0x40F0 002C  -      Reserved
0x40F0 0030  RCSR   Reset Controller Status Register

SSP 0x4100 0000
0x4100 0000  SSCR0  SSP Control Register 0
0x4100 0004  SSCR1  SSP Control Register 1
0x4100 0008  SSSR   SSP Status Register
0x4100 000C  SSITR  SSP Interrupt Test Register
0x4100 0010  SSDR   (Write / Read) SSP Data Write Register / SSP Data Read Register

MMC Controller 0x4110 0000
0x4110 0000  MMC_STRPCL  Control to start and stop MMC clock
0x4110 0004  MMC_STAT    MMC Status Register (read only)
0x4110 0008  MMC_CLKRT   MMC clock rate
0x4110 000C  MMC_SPI     SPI mode control bits
0x4110 0010  MMC_CMDAT   Command/response/data sequence control
0x4110 0014  MMC_RESTO   Expected response time out
0x4110 0018  MMC_RDTO    Expected data read time out
0x4110 001C  MMC_BLKLEN  Block length of data transaction
0x4110 0020  MMC_NOB     Number of blocks, for block mode
0x4110 0024  MMC_PRTBUF  Partial MMC_TXFIFO FIFO written
0x4110 0028  MMC_I_MASK  Interrupt Mask
0x4110 002C  MMC_I_REG   Interrupt Register (read only)
0x4110 0030  MMC_CMD     Index of current command
0x4110 0034  MMC_ARGH    MSW part of the current command argument
0x4110 0038  MMC_ARGL    LSW part of the current command argument
0x4110 003C  MMC_RES     Response FIFO (read only)
0x4110 0040  MMC_RXFIFO  Receive FIFO (read only)
0x4110 0044  MMC_TXFIFO  Transmit FIFO (write only)

Clocks Manager 0x4130 0000
0x4130 0000  CCCR  Core Clock Configuration Register
0x4130 0004  CKEN  Clock Enable Register
0x4130 0008  OSCC  Oscillator Configuration Register

LCD Controller 0x4400 0000
0x4400 0000  LCCR0   LCD Controller Control Register 0
0x4400 0004  LCCR1   LCD Controller Control Register 1
0x4400 0008  LCCR2   LCD Controller Control Register 2
0x4400 000C  LCCR3   LCD Controller Control Register 3
0x4400 0200  FDADR0  DMA Channel 0 Frame Descriptor Address Register
0x4400 0204  FSADR0  DMA Channel 0 Frame Source Address Register
0x4400 0208  FIDR0   DMA Channel 0 Frame ID Register
0x4400 020C  LDCMD0  DMA Channel 0 Command Register
0x4400 0210  FDADR1  DMA Channel 1 Frame Descriptor Address Register
0x4400 0214  FSADR1  DMA Channel 1 Frame Source Address Register
0x4400 0218  FIDR1   DMA Channel 1 Frame ID Register
0x4400 021C  LDCMD1  DMA Channel 1 Command Register
0x4400 0020  FBR0    DMA Channel 0 Frame Branch Register
0x4400 0024  FBR1    DMA Channel 1 Frame Branch Register
0x4400 0038  LCSR    LCD Controller Status Register
0x4400 003C  LIIDR   LCD Controller Interrupt ID Register
0x4400 0040  TRGBR   TMED RGB Seed Register
0x4400 0044  TCR     TMED Control Register

Memory Controller 0x4800 0000
0x4800 0000  MDCNFG    SDRAM Configuration Register 0
0x4800 0004  MDREFR    SDRAM Refresh Control Register
0x4800 0008  MSC0      Static Memory Control Register 0
0x4800 000C  MSC1      Static Memory Control Register 1
0x4800 0010  MSC2      Static Memory Control Register 2
0x4800 0014  MECR      Expansion Memory (PCMCIA/Compact Flash) Bus Configuration Register
0x4800 001C  SXCNFG    Synchronous Static Memory Control Register
0x4800 0024  SXMRS     MRS value to be written to SMROM
0x4800 0028  MCMEM0    Card interface Common Memory Space Socket 0 Timing Configuration
0x4800 002C  MCMEM1    Card interface Common Memory Space Socket 1 Timing Configuration
0x4800 0030  MCATT0    Card interface Attribute Space Socket 0 Timing Configuration
0x4800 0034  MCATT1    Card interface Attribute Space Socket 1 Timing Configuration
0x4800 0038  MCIO0     Card interface I/O Space Socket 0 Timing Configuration
0x4800 003C  MCIO1     Card interface I/O Space Socket 1 Timing Configuration
0x4800 0040  MDMRS     MRS value to be written to SDRAM
0x4800 0044  BOOT_DEF  Read-Only Boot-Time Register. Contains BOOT_SEL and PKG_SEL values.
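Several register groups in this listing are laid out at a regular stride, so their addresses can be computed rather than looked up. The sketch below derives the DMA per-channel addresses and the GPIO output-set register for a pin from the base addresses in the listing; the function names are hypothetical and the pin range 0..80 is assumed from the GPIO<80:64> entries.

```python
DMA_BASE = 0x4000_0000        # DCSR0
DMA_DESC_BASE = 0x4000_0200   # DDADR0
GPSR_BASE = 0x40E0_0018       # GPSR0 (GPIO Pin Output Set Register)

def dcsr_addr(channel):
    """Address of the DMA Control/Status Register for a channel (0..15)."""
    if not 0 <= channel <= 15:
        raise ValueError("DMA channel must be 0..15")
    return DMA_BASE + 4 * channel

def dma_descriptor_regs(channel):
    """Addresses of DDADR, DSADR, DTADR, DCMD for a channel (16-byte stride)."""
    if not 0 <= channel <= 15:
        raise ValueError("DMA channel must be 0..15")
    base = DMA_DESC_BASE + 16 * channel
    return {"DDADR": base, "DSADR": base + 4, "DTADR": base + 8, "DCMD": base + 12}

def gpio_set_reg(pin):
    """Return (GPSRn address, bit mask) for a GPIO pin; each register covers 32 pins."""
    if not 0 <= pin <= 80:
        raise ValueError("GPIO pin must be 0..80 (assumed from the listing)")
    return GPSR_BASE + 4 * (pin // 32), 1 << (pin % 32)

print(hex(dcsr_addr(15)))
print({name: hex(addr) for name, addr in dma_descriptor_regs(10).items()})
addr, mask = gpio_set_reg(70)
print(hex(addr), hex(mask))
```

On real hardware these addresses would be accessed through volatile memory-mapped I/O; the sketch only performs the address arithmetic.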

PXA255 Processor(2) Micro-architecture
[Figure: XScale core block diagram. Labels: Execution Core; Branch Target Buffer (BTB); Trace Buffer; Instruction Cache 32 KBytes; Mini I-Cache 2 KBytes; Data Cache 32 KBytes; Mini D-Cache 2 KBytes; MMU; Write Buffer; System Management; Debug; JTAG; CP0 Multiplier / Accumulator; CP14 Performance Monitoring; CP15 Config Registers; Coprocessor Interface; Interrupt Request (IRQ, FIQ); Instruction; Data; Address; Core Memory Bus.]
BTB (Branch Target Buffer)
• 128-entry, direct-mapped cache
• Each entry holds:
  - the address of the branch instruction
  - the target address associated with the branch instruction
  - a previous history of the branch being taken or not taken
• The history is one of four states: strongly taken / weakly taken / weakly not-taken / strongly not-taken
• Whether the BTB is used is determined by CP15 C1
• A correctly predicted branch avoids the branch-latency penalty of the superpipeline
• A mispredicted branch incurs a branch-latency penalty of 4 to 5 cycles
Lookup
• The current instruction address is fetched, and bits [8:2] of that address select the entry whose tag is read
• The instruction address is then compared with the tag; the bits compared are [31:9,1]
• If the addresses match and the history bits show that the branch has previously been taken, the BTB treats the stored data (the target address) as the next instruction address and supplies it
Update policy: a new entry is written when all of the following hold
  - the branch instruction has executed
  - the branch was taken
  - the branch is not currently in the BTB
BTB Control
• Disabling/Enabling
  - At reset: disabled
  - Enable: CP15 C1.11 (Z) = 1
• Invalidation occurs on
  - reset
  - invalidating the BTB through CP15 C7 (7-11)
  - a write to the Process ID register
  - invalidating the instruction cache through CP15 C7
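The BTB behaviour described on this slide can be sketched as a small model: bits [8:2] of the branch address index one of 128 entries, bits [31:9,1] form the tag, and a 2-bit saturating counter holds the four history states. This is a simplified illustration, not the hardware implementation; the class and function names are hypothetical, and starting a new entry in the weakly-taken state is an assumption, since the slide does not give the initial state.

```python
STRONG_NT, WEAK_NT, WEAK_T, STRONG_T = range(4)  # 2-bit history states

def btb_index(pc):
    """Bits [8:2] of the instruction address select one of 128 entries."""
    return (pc >> 2) & 0x7F

def btb_tag(pc):
    """Tag is formed from address bits [31:9] and bit [1]."""
    return ((pc >> 9) << 1) | ((pc >> 1) & 1)

class BranchTargetBuffer:
    def __init__(self):
        self.entries = [None] * 128  # direct mapped

    def predict(self, pc):
        """Return the predicted target, or None if no taken prediction is made."""
        entry = self.entries[btb_index(pc)]
        if entry and entry["tag"] == btb_tag(pc) and entry["hist"] >= WEAK_T:
            return entry["target"]
        return None

    def update(self, pc, taken, target):
        """Update after a branch executes."""
        idx = btb_index(pc)
        entry = self.entries[idx]
        if entry and entry["tag"] == btb_tag(pc):
            if taken:
                entry["hist"] = min(STRONG_T, entry["hist"] + 1)
                entry["target"] = target
            else:
                entry["hist"] = max(STRONG_NT, entry["hist"] - 1)
        elif taken:
            # Allocate only for taken branches not already in the BTB.
            # Initial state WEAK_T is an assumption of this sketch.
            self.entries[idx] = {"tag": btb_tag(pc), "target": target, "hist": WEAK_T}

btb = BranchTargetBuffer()
btb.update(0x8000, taken=True, target=0x9000)
print(btb.predict(0x8000))  # target is now predicted
print(btb.predict(0x8200))  # same index, different tag: no prediction
```

Because the buffer is direct mapped, two branches 512 bytes apart share an entry and evict each other, which the last line of the usage example shows.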

PXA255 Processor(3) XScale Core Architecture Features
• Instruction Cache: 32 Kbytes, 32 ways, lockable by line
• Microprocessor: 7-stage pipeline
• Data Cache: max 32 Kbytes, 32 ways, write-back or write-through, hit under miss
• Debug: hardware breakpoints, branch history table
• MAC: single-cycle throughput (16*32), 16-bit SIMD, 40-bit accumulator
• Power Management Control
• Write Buffer: 8 entries, full coalescing
• JTAG
• Data RAM: max 28 Kbytes, re-map of data cache
• Branch Target Buffer: 128 entries
• IMMU: 32-entry TLB, fully associative, lockable by entry
• DMMU: 32-entry TLB, fully associative, lockable by entry
• Fill Buffer: 4~8 entries
• Performance Monitoring
• Mini-Data Cache: 2 Kbytes, 2 ways
The Intel® XScale™ microarchitecture provides these features:
• ARM* Architecture Version 5TE ISA compliant
  - ARM* Thumb Instruction Support
  - ARM* DSP Enhanced Instructions
• Low power consumption and high performance
• Intel® Media Processing Technology
  - Enhanced 16-bit Multiply
  - 40-bit Accumulator
• 32-KByte Instruction Cache
• 32-KByte Data Cache
• 2-KByte Mini Data Cache
• 2-KByte Mini Instruction Cache
• Instruction and Data Memory Management Units
• Branch Target Buffer
• Debug Capability via JTAG Port
Terms
• MAC: Multiply-Accumulate unit
• ASSP: Application Specific Standard Product
• API: Application Programming Interface
• BTB: Branch Target Buffer
• TLB: Translation Look-aside Buffer
• Coalescing: merging a new store operation with an existing store operation so that they are carried out together

PXA255 Processor(4) XScale Core
• 32-bit RISC
  - 32-bit registers
  - 32-bit instructions, longword aligned
  - 32-bit datapaths
• 7~8 stage pipeline
[Figure: XScale pipeline. PC labels on the stages: PC, PC - 4, PC - 8, PC - 12, PC - 16.]
• Main execution pipeline: F1 (Instruction Fetch 1), F2 (Instruction Fetch 2), ID (Instruction Decode), RF (Register File / Operand Shifter), X1 (ALU Execute), X2 (State Execute), XWB (Write Back)
• Memory pipeline: D1, D2 (Data Cache Access), DWB (Data Cache Writeback)
• MAC pipeline: M1 (Multiplier Stage 1), M2 (Multiplier Stage 2), Mx (Multiplier Stage X)

PXA255 Processor(5) Advanced Microcontroller Bus Architecture
• CPU bus: consists of the A, B, and ALU buses
• AHB (Advanced High-performance Bus) or ASB (Advanced System Bus): connects the high-speed devices built into the processor
• APB (Advanced Peripheral Bus): connects devices that operate at low speed
[Figure: AMBA-based system. Labels: ARM; Bus I/F; Arbiter; TIC; EBI; Bridge; On-chip RAM; Decoder; Timer; Remap / Pause; Interrupt Controller; Reset; External ROM; AHB or ASB; APB; Slow Peripherals.]

PXA255 Processor(6) Memory Model
[Figure: PXA255 memory model. Labels: Core; MMU; On-chip Caches; Buffers; Memory Controller; Memory; Virtual Addresses; Physical Addresses; PXA255 Processor.]

PXA255 Processor(7) PXA255 Bus Reads
• Cache line fills read 8 words (32 bytes)
• Read allocate
• Round-robin replacement
[Figure: PXA255 read path. Labels: XScale Core; PC; Addr; VA[0:31]; I-MMU; D-MMU; 32KB I-Cache; 32KB D-Cache; hit; miss; Instructions; Data; Instructions & Data; Read Buffer; Memory Controller; System Memory; System Bus; External Bus; A[0:31]; A[0:25]; D[0:31]; 32 bytes; Core Clock; Half Core Clock.]

PXA255 Processor(8) PXA255 Bus Writes
• No write to the I-Cache
• Write-back D-Cache
• Software coherency is needed between the caches
• Not write allocate
[Figure: PXA255 write path. Labels: XScale Core; Addr; Data; VA[0:31]; D-MMU; 32KB D-Cache; Dirty Bits; Write Buffer (8 entries); Memory Controller; System Memory; System Bus; External Bus; A[0:31]; A[0:25]; D[0:31]; 32 bytes; Core Clock; Half Core Clock.]

What happens on a write?
• Write through: the information is written to both the block in the cache and the block in the lower-level memory.
• Write back: the information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  - Is the block clean or dirty?
• Pros and cons of each?
  - WT: read misses cannot result in writes
  - WB: repeated writes to the same block do not each cause a write to memory
• WT is always combined with write buffers so that the processor does not wait for the lower-level memory
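The difference between the two policies can be made concrete by counting how many writes reach memory. The following is a minimal sketch under simplifying assumptions: a tiny direct-mapped cache with one word per line, and allocation on a write miss (chosen here only to keep the model small). The class name `TinyCache` is hypothetical.

```python
class TinyCache:
    """Direct-mapped, one word per line; counts writes that reach memory."""

    def __init__(self, policy, num_lines=2):
        assert policy in ("write-through", "write-back")
        self.policy = policy
        self.num_lines = num_lines
        self.lines = {}          # index -> {"addr", "value", "dirty"}
        self.memory = {}
        self.memory_writes = 0

    def _mem_write(self, addr, value):
        self.memory[addr] = value
        self.memory_writes += 1

    def _evict(self, index):
        line = self.lines.pop(index, None)
        if line and line["dirty"]:
            # Write back: a dirty block goes to memory only when replaced
            self._mem_write(line["addr"], line["value"])

    def write(self, addr, value):
        index = addr % self.num_lines
        line = self.lines.get(index)
        if line is None or line["addr"] != addr:
            self._evict(index)
            line = self.lines[index] = {"addr": addr, "value": None, "dirty": False}
        line["value"] = value
        if self.policy == "write-through":
            self._mem_write(addr, value)   # every write also goes to memory
        else:
            line["dirty"] = True           # memory is updated later, on eviction

for policy in ("write-through", "write-back"):
    cache = TinyCache(policy)
    for v in range(10):
        cache.write(0, v)                  # ten writes to the same address
    print(policy, "memory writes:", cache.memory_writes)
```

With ten writes to one address, the write-through cache performs a memory write each time, while the write-back cache defers everything until the line is evicted.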

Write Buffer for Write Through
[Figure: Processor -> Cache and Write Buffer -> DRAM]
• A write buffer is needed between the cache and memory
  - Processor: writes data into the cache and the write buffer
  - Memory controller: writes the contents of the buffer to memory
• The write buffer is just a FIFO
• The write buffer exists to improve performance on writes
• Cache memory improves performance when reading instructions and data
• The address and data are stored in the write buffer so that the CPU can continue with other processing while the write is in progress
• When the write buffer is granted use of the bus, it writes to the external device
You are right, memory is too slow. We really didn't write to the memory directly. We are writing to a write buffer. Once the data is written into the write buffer, and assuming a cache hit, the CPU is done with the write. The memory controller will then move the write buffer's contents to the real memory behind the scenes. The write buffer works as long as the frequency of stores is not too high. Notice here, I am referring to the frequency with respect to time, not with respect to the number of instructions. Remember the DRAM cycle time we talked about last time. It sets the upper limit on how frequently you can write to the main memory. If the stores are too close together, or the CPU is so much faster than the DRAM cycle time, you can end up overflowing the write buffer and the CPU must stop and wait.
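The point about store frequency versus DRAM cycle time can be illustrated with a small timing model. This is a sketch under stated assumptions, not a model of any particular memory controller: the buffer is a FIFO that retires one entry per DRAM cycle, the CPU stalls only when the buffer is full, and the function name and parameters are hypothetical.

```python
from collections import deque

def simulate_write_buffer(store_gaps, depth, dram_cycle):
    """Return total CPU stall cycles.

    store_gaps: cycles between consecutive stores issued by the CPU
    depth:      number of entries in the FIFO write buffer
    dram_cycle: cycles the memory needs to retire one buffered write
    """
    pending = deque()   # completion times of buffered writes, oldest first
    now = 0
    last_done = 0
    stall = 0
    for gap in store_gaps:
        now += gap
        # Retire entries that memory has already written
        while pending and pending[0] <= now:
            pending.popleft()
        if len(pending) == depth:
            # Buffer full: CPU waits until the oldest entry drains
            wait = pending[0] - now
            stall += wait
            now += wait
            pending.popleft()
        # Memory handles writes one after another
        last_done = max(now, last_done) + dram_cycle
        pending.append(last_done)
    return stall

# Stores spaced wider than the DRAM cycle time never stall
print(simulate_write_buffer([10] * 8, depth=4, dram_cycle=5))
# Back-to-back stores with a slow DRAM overflow a 2-entry buffer
print(simulate_write_buffer([1] * 8, depth=2, dram_cycle=10))
```

In the second call the buffer absorbs the first two stores, after which every further store has to wait for memory, so a deeper buffer only postpones the stalls when the sustained store rate exceeds what DRAM can retire.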

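Cache organization
The virtual address is split into a tag, a set index and a line offset. Each set has a tag CAM and a data RAM of 32 lines, and each line holds 8 words of 4 bytes.
1 line = 8 words = 32 bytes
1 set = 32 lines = 256 words = 1024 bytes. A set is the group of addresses that share the same low-order address bits.
Each set is fully associative across its 32 lines, and this fully associative structure is replicated once per set.
With 16 sets (the figures given in the notes): I or D cache = 16 sets = 512 lines = 4096 words = 16 Kbytes, and addresses with the same index repeat at multiples of 512 bytes.
With 32 sets (as the diagram is labeled, matching the 32 KB PXA255 caches): 32 sets = 1024 lines = 8192 words = 32 Kbytes, and the same index repeats at multiples of 1024 bytes.
When data lies in a contiguous address range, the same line position in successive sets holds successive addresses.
Lookup: the index selects one of the sets. If a tag stored in that set matches the tag of the address, it is a hit, and the value is read from or written to that data line. On a miss, the data is fetched from memory.
The more index values there are, the more likely the cache is to hold contiguous data.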
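The tag / index / offset split can be sketched in a few lines. The sketch below assumes the 32-set, 32-byte-line geometry quoted for the PXA255 caches, which gives a 5-bit line offset in bits [4:0], a 5-bit set index in bits [9:5], and the tag in bits [31:10]. The helper name `split_address` is hypothetical.

```python
LINE_BYTES = 32  # 8 words x 4 bytes per line
NUM_SETS = 32    # assumption: 32 sets, as in the 32 KB PXA255 caches

OFFSET_BITS = LINE_BYTES.bit_length() - 1  # 5 bits of byte offset in the line
INDEX_BITS = NUM_SETS.bit_length() - 1     # 5 bits of set index


def split_address(vaddr):
    """Split a 32-bit virtual address into (tag, set index, line offset)."""
    offset = vaddr & (LINE_BYTES - 1)
    index = (vaddr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = vaddr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Consecutive 32-byte lines land in consecutive sets.
for addr in (0x0000, 0x0020, 0x0040, 0x0400):
    tag, index, offset = split_address(addr)
    print(f"addr=0x{addr:04X} tag={tag} index={index} offset={offset}")
```

Address 0x0400 wraps back to set 0 with a different tag, which is the "same index at multiples of 1024 bytes" behaviour for a 32-set cache.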

PXA255 - Instruction Cache
32 KB instruction cache: 1024 lines of 32 bytes (8 words)
Uses the virtual address
32-way, 32-set associative
Round-robin replacement
Mapped via the MMU page C bit
When the MMU is enabled, caching is controlled by the C bit in the memory management table. When the MMU is disabled, C=1 is assumed for every address.
If C=1 or the MMU is disabled, a miss triggers an 8-word line fetch, and the line to replace is chosen by round-robin replacement.
If the MMU is enabled and C=0, a single word is read from the external memory location for that virtual address and is not written into the cache.
Diagram: XScale core (PC) with the IMMU and 32 KB I-cache on the instruction side, and the DMMU with the main D-cache and mini D-cache on the data side.

PXA255 - Data Caches
Two data caches: main data cache and mini data cache
Both: write-back, read-allocate, virtually addressed
Mapped via the MMU page B and C bits
Main data cache: 32 KB, 32-way 32-set associative, round-robin replacement, selected by B=1 and C=1
Mini data cache: 2 KB, 2-way set associative, least recently used (LRU) replacement, selected by B=0 and C=1
Diagram: XScale core (PC) with the IMMU and I-cache on the instruction side, and the DMMU with the main D-cache and mini D-cache on the data side.

PXA255 - Read Buffer
Data prefetcher: saves the processor from waiting, so loading and calculation proceed in parallel
For read-only data; supplements the data cache
Under software control through coprocessor 15, register 9
4 entries of 32 bytes each (128 bytes)
Loads of 1, 4 or 8 words
Data can be replaced or invalidated
Diagram: XScale core (PC), I-cache, D-cache and mini D-cache, write buffer and 128-byte read buffer attached to the system bus.

PXA255 Memory Management
The MMU maps the virtual address space to the physical address space.
Instruction side: XScale core -> ITLB (VA to PA; diagram labels 32, C, A) -> I-cache
Data side: XScale core -> DTLB (VA to PA; diagram labels B, C, A) -> D-cache
On a TLB miss, descriptors are read from system memory, located through the Translation Table Base Register.
MMU support is provided through the coprocessor.

PXA255 Processor - CP15
CP15 register structure
Register    Purpose
0           ID Register
1           Control
2           Translation Table Base
3           Domain Access Control
5           Fault Status
6           Fault Address
7           Cache Operations
8           TLB Operations
9           Read Buffer Operations
10          TLB Lockdown
13          Process ID Mapping
14          Debug Support
15          Test & Clock Control
4, 11~12    UNUSED

PXA255 Coprocessor CP15 registers: C0, C1, C2, C3, C5, C6, C13
C0 (ID): bits 16..23 = Intel manufacturer; bits 4..15 = part number
C1 (Control):
  M   bit 0   MMU: 0 = on-chip memory-management unit disabled, 1 = enabled
  A   bit 1   Alignment fault: 0 = disabled, 1 = enabled
  C   bit 2   D-cache: 0 = data cache disabled, 1 = enabled
  W   bit 3   Write buffer: 0 = disabled, 1 = enabled
  P   bit 4   32-bit/26-bit exception handlers; should always be 1
  D   bit 5   32-bit/26-bit data address range; should always be 1
  L   bit 6   Implementation defined; should always be 1
  B   bit 7   Endianness: 0 = little-endian operation, 1 = big-endian operation
  S   bit 8   System protection
  R   bit 9   ROM protection
  F   bit 10
  Z   bit 11
  I   bit 12  I-cache: 0 = instruction cache disabled, 1 = enabled
  V   bit 13  Interrupt vector adjust: 0 = vector base address 0x0000_0000, 1 = vector base address 0xFFFF_0000
  RR  bit 14
C2: bits 14..31 hold the base address of the page table used by the MMU
C3: Domain register; memory access permission, Manager or Client
C5: Fault status
C6: Fault address (C5 and C6 together give the status and address of a page fault)
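The C1 bit layout above can be turned into a small decoder, which is handy when inspecting a control register value dumped from a debugger. This is a sketch that only uses the bit positions listed on the slide; the function names are hypothetical, and on real hardware the value would be read with a coprocessor instruction rather than passed in as an integer.

```python
# Bit position -> flag name for CP15 register 1, as listed on the slide.
CONTROL_BITS = {
    0: "M", 1: "A", 2: "C", 3: "W", 4: "P", 5: "D", 6: "L", 7: "B",
    8: "S", 9: "R", 10: "F", 11: "Z", 12: "I", 13: "V", 14: "RR",
}


def decode_control(value):
    """Return the names of the control bits set in value, lowest bit first."""
    return [name for bit, name in CONTROL_BITS.items() if value & (1 << bit)]


def vector_base(value):
    """Return the interrupt vector base address selected by the V bit."""
    return 0xFFFF0000 if value & (1 << 13) else 0x00000000


value = 0x1005  # example value: MMU, D-cache and I-cache enabled
print(decode_control(value))         # ['M', 'C', 'I']
print(hex(vector_base(value)))       # 0x0
```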

PXA255 Processor - SCM
System Control Module
Power management controller: supports normal, idle and sleep modes
81 general-purpose I/O ports: generate FIQ, IRQ and wake-up interrupts
Interrupt controller: routes all system interrupts (GPIOs, LCD, serial channels) to either IRQ or FIQ
Multi-channel DMA controller: software programmable to any serial port and the LCD; supports external DMA
Real-time clock and timer: 32-bit counter/comparator; 32.7 kHz crystal, accuracy +/- 5 sec/month
OS timer with alarm register: 3.68 MHz crystal, fine-grain timing interrupts

PXA255 - running mode
Summary of the running modes of the PXA255
HARDWARE RESET: entered at power-on or whenever nRESET is asserted (from RUN, IDLE or SLEEP); leaves to RUN when nRESET is negated
RUN -> IDLE: wait-for-interrupt instruction
IDLE -> RUN: system or peripheral unit interrupt
RUN -> SLEEP: force-sleep bit set, or VDD or battery fault pins asserted
IDLE -> SLEEP: VDD or battery fault pins asserted
SLEEP -> RUN: GPIO or RTC alarm interrupt
IDLE: CPU clock held low, all other resources active, waiting for an interrupt
SLEEP: waiting for a wake-up event

PXA255 Processor - GPIO General Purpose I/O GPIO[58:73] = dual panel color or 16 bit parallel input on LCD GPIO[23:27] = SPI if both synchronous serial protocols are required in a single system Modem control signals for UART (CTS, RTS, CD, etc) implemented via GPIO signals 4-5 GPIOs required for full PCMCIA support 3 GPIOs required for Intel® SA-1111 Interface

PXA255 General Purpose I/O Block Diagram
Base address: 0x40E0_0000. There are 84 GPIOs in total. See the Intel PXA255 Developer's Manual, page 4-1.
Register  Addresses                                   Description
GPLR      0x40E0_0000/04/08                           GPIO Pin Level Register
GPDR      0x40E0_000C/10/14                           GPIO Pin Direction Registers (GPDR0, GPDR1, GPDR2); 1 = output, 0 = input
GPSR      -                                           GPIO Pin Output Set Register
GPCR      -                                           GPIO Pin Output Clear Register
GRER      0x40E0_0030/34/38                           GPIO Rising Edge Detect Enable Register
GFER      0x40E0_003C/40/44                           GPIO Falling Edge Detect Enable Register
GEDR      0x40E0_0048/4C/50                           GPIO Edge Detect Status Register
GAFR      0x40E0_0054/58/5C and 0x40E0_0060/64/68     GPIO Alternate Function Register (GAFR0_L, GAFR0_U, GAFR1_L, GAFR1_U, GAFR2_L, GAFR2_U)
Diagram: the pin direction, pin set and pin clear registers drive the pin; the alternate function registers select between GPIO and alternate function input/output (selections 1, 2, 3); the edge detect logic feeds GEDR and the power manager sleep wake-up logic; GPLR reads the pin level.
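Each of the single-bit-per-pin GPIO registers comes as three 32-bit registers at consecutive word addresses, so the register address and bit mask for a pin follow from simple arithmetic. The sketch below uses only the addresses quoted on the slide; the helper name `gpio_register` is hypothetical, and GAFR is left out because it uses two bits per pin.

```python
GPIO_BASE = 0x40E0_0000

# Offset of the first register (pins 0-31) of each group, from the slide.
REGISTER_OFFSETS = {
    "GPLR": 0x00,  # pin level
    "GPDR": 0x0C,  # pin direction: 1 = output, 0 = input
    "GRER": 0x30,  # rising edge detect enable
    "GFER": 0x3C,  # falling edge detect enable
    "GEDR": 0x48,  # edge detect status
}


def gpio_register(name, pin):
    """Return (register address, bit mask) for a GPIO pin number."""
    bank = pin // 32  # bank 0: pins 0-31, bank 1: 32-63, bank 2: 64 and up
    if pin < 0 or bank > 2:
        raise ValueError(f"pin {pin} is outside the three register banks")
    address = GPIO_BASE + REGISTER_OFFSETS[name] + 4 * bank
    mask = 1 << (pin % 32)
    return address, mask


address, mask = gpio_register("GPDR", 19)
print(f"GPDR for GPIO 19: address 0x{address:08X}, mask 0x{mask:08X}")
```

Setting the returned mask bit in GPDR would configure the pin as an output; clearing it would make it an input.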

PXA255 Interrupt Controller
See the Intel PXA255 Developer's Manual, page 4-20.
Address     Register  Description
40D0 0000   ICIP      Interrupt Controller IRQ Pending register
40D0 0004   ICMR      Interrupt Controller Mask register
40D0 0008   ICLR      Interrupt Controller Level register; per interrupt bit, 0 = IRQ, 1 = FIQ
40D0 000C   ICFP      Interrupt Controller FIQ Pending register
40D0 0010   ICPR      Interrupt Controller Pending register
40D0 0014   ICCR      Interrupt Controller Control register; ICCR.0 = disable idle mask (DIM)
Signal flow: each interrupt source bit appears in ICPR, is qualified by ICMR (the mask is bypassed when ICCR[DIM]=0 and the processor is in idle mode), and is steered by ICLR to ICIP or ICFP. IRQ reaches the XScale core gated by CPSR bit 7 (I); FIQ is gated by CPSR bit 6 (F).
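The steering of pending interrupts to IRQ or FIQ can be expressed as bitwise operations on the three registers. This is a minimal sketch of the normal (non-idle) path only: it assumes a set ICMR bit means the interrupt is allowed through, and it leaves out the ICCR[DIM] idle-mode bypass. The function name `route_interrupts` is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def route_interrupts(icpr, icmr, iclr):
    """Return (ICIP, ICFP) computed from the pending, mask and level registers.

    Assumptions: a 1 in ICMR lets the interrupt through; a 1 in ICLR routes
    it to FIQ and a 0 routes it to IRQ; the idle-mode mask bypass is ignored.
    """
    unmasked = icpr & icmr & MASK32
    icip = unmasked & ~iclr & MASK32  # level bit 0 -> IRQ pending
    icfp = unmasked & iclr            # level bit 1 -> FIQ pending
    return icip, icfp


# Sources 1, 2 and 3 pending; 1 and 2 unmasked; source 2 routed to FIQ.
icip, icfp = route_interrupts(icpr=0b1110, icmr=0b0110, iclr=0b0100)
print(bin(icip), bin(icfp))  # 0b10 0b100
```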

Universal Serial Bus
USB: standard used for device and peripheral interconnect in the PC market
The Intel® PXA250 is a client, not a hub
Differential signaling (UDC+, UDC-)
Half-duplex
Individual bits are encoded with NRZI
Bit stuffing keeps the receiver synchronized
Handhelds use USB to synchronize with a desktop PC
Note: in Linux, interrupt handlers are implemented through this mode; FIQ is not used, and the processor switches to supervisor mode.

DMAC Block Diagram
DMA lets a device read and write memory directly, without going through the CPU.
DMA controller: 16 DMA channels (channel 0 to channel 15)
Per-channel registers (shown for channel 0): DCSR, DDADR, DSADR, DTADR, DCMD
Controller registers: DRCMR, DINT
Requests: DREQ[1:0] (external), PREQ[37:0] (internal)
Interrupt output: DMA_IRQ (internal)
Buses: system bus (internal) to the memory controller; peripheral bus (internal)

Serial Infrared Datalink
IrDA: Infrared Data Association Standard v1.1 (www.irda.org, 150 members including Digital)
HP-SIR at 115 kbps and 4PPM at 4 Mbps
HP-SIR: the UART data stream is divided by 16, and the pulse is then fed to the IR transceiver
4PPM encodes 2 data bits at a time: each symbol period is divided into four 125 ns time slots, and a 125 ns pulse in slot 1 represents 00, in slot 2 represents 01, and so on
Loopback for diagnostics
Handhelds talk IrDA with laptops, PDAs and printers
Pins: IrDA or UART on RXD2 and TXD2
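The 4PPM scheme described above maps each pair of data bits to the position of a single pulse among four time slots. The sketch below illustrates just that mapping and the resulting data rate; it takes already-formed 2-bit values and makes no claim about the order in which a real transmitter splits bytes into pairs. The function names are hypothetical.

```python
SLOT_NS = 125          # duration of one time slot
SLOTS_PER_SYMBOL = 4   # four slots per symbol, one of which carries the pulse
BITS_PER_SYMBOL = 2


def encode_4ppm(bit_pairs):
    """Map 2-bit values (0..3) to slot sequences with a single pulse each.

    Pair 0b00 pulses in the first slot, 0b01 in the second, and so on.
    """
    slots = []
    for pair in bit_pairs:
        if not 0 <= pair <= 3:
            raise ValueError(f"not a 2-bit value: {pair}")
        symbol = [0] * SLOTS_PER_SYMBOL
        symbol[pair] = 1
        slots.extend(symbol)
    return slots


def data_rate_bps():
    """Data rate implied by 2 bits per four 125 ns slots."""
    symbol_ns = SLOT_NS * SLOTS_PER_SYMBOL
    return BITS_PER_SYMBOL * 1_000_000_000 // symbol_ns


print(encode_4ppm([0b00, 0b01, 0b11]))
print(data_rate_bps())  # 4000000, the 4 Mbps quoted on the slide
```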

UART: Universal Asynchronous Receiver/Transmitter
UART: RS-232, the infamous PC "COM" ports
Operates up to 230 Kbits/s
Level shifters are needed for 5 V logic (TTL)
Loopback for diagnostics
Data is byte-wide if DMA is used
Handhelds talk RS-232 for synchronization, communication, keyboard I/O, software loading, etc.
Primary debug connection for the ARM Software Development Toolset
Pins: UART on RXD3 and TXD3

PXA255 - H/W Interface (1): RESET (EMPOS II example)
uP reset circuit: MAX811T
Voltage monitor (3 V to 3.15 V)
Manual reset input (push button, active low)
Multi-ICE reset
Reset output to flash
Diagram labels: PXA255 RESET_IN and RESET_OUT, MAX811T MR and RESET, JTAG_RST, JTAG port J20.

PXA255 - H/W Interface (2): Flash memory
3 V Intel StrataFlash 28F128
32-bit data bus
Size: 32 Mbytes = 128 Mbit (16 Mbytes) x 2 devices
MSC0: Static Chip Select 0 (Bank 0), base address 0x0000_0000
MSC0 register at 0x4800_0008
Diagram: PXA255 memory controller interface, ADDR[10..23] and DATA[0..31], to two flash devices, the low 16 bits on D[0..15] and the high 16 bits on D[16..31]; control signals CS0, RESET, OE.

PXA255 - H/W Interface (3): Static RAM (SRAM)
Samsung K6R4016V1C
3 V high-speed CMOS static RAM
32-bit data bus, 1 Mbyte
MSC1: Static Chip Select 3 (Bank 3), base address 0x0C00_0000
MSC1 register at 0x4800_000C
Diagram: PXA255 memory controller interface, ADDR[10..23] and DATA[0..31], to two SRAM devices, the low 16 bits on D[0..15] with DQM[0..1] and the high 16 bits on D[16..31] with DQM[2..3]; control signals CS3, WE, OE.

PXA255 - H/W Interface (4): SDRAM
Samsung synchronous DRAM K4S561632
32-bit data bus
256 Mbit per device: 4M x 16 bit x 4 banks
Size: 64 Mbytes = 256 Mbit (32 Mbytes) x 2 devices
SDRAM Bank 0: dynamic memory, base address 0xA000_0000
Registers: MDCNFG at 0x4800_0000, MDREFR at 0x4800_0004, MDMRS at 0x4800_0040
Diagram: PXA255 memory controller interface, ADDR[10..24] and DATA[0..31], to two SDRAM devices, the low 16 bits on D[0..15] with DQM[0..1] and the high 16 bits on D[16..31] with DQM[2..3]; control signals nSDCS0, WE, RAS/CAS, SDCLK1/SDCKE1.

PXA255 - H/W Interface (5): PCMCIA / CF
Two sockets (SOCKET 0 and SOCKET 1), each behind a data buffer with DIR and OE# controls; PSKTSEL selects the socket.
PXA255 signal   Socket signal
D[15:0]         D[15:0]
MA(25:0)        A(25:0)
nPOE            OE#
nPWE            WE#
nPIOR           IOR#
nPIOW           IOW#
nPREG           REG#
nPCE(1:2)       CE(1:2)#
nPWAIT          WAIT#
nPIOS16         IOIS16#
GPIO(7), GPIO(12): CD1#, CD2# (one GPIO per socket)
GPIO(11), GPIO(10): RDY/BSY# (one GPIO per socket)

PXA255 - H/W Interface (6): PS/2 Keyboard / Mouse
Holtek HT6542B
8-bit data bus
8 MHz operation
Supports a PS/2-compatible mouse
Diagram: PXA255 MD(31:0) to HT6542B D(7:0) through a buffer (DIR, OE#, RD_nWR); an address decoder on nCS1 to nCS4 generates HT6542_CS for CS#; MA(25:0) supplies A0; nOE drives RD# and nPWE drives WR#; GPIO(19) = KB_INT and GPIO(9) = MS_INT; keyboard port signals KBCO, KBCI, KBDO, KBDI; mouse port signals MSCO, MSCI, MSDO, MSDI; RESET#.

PXA255 - H/W Interface (7): Audio Codec
Cirrus Logic CS4202
AC'97 2.2 compliant
20-bit stereo D/A converters
18-bit stereo A/D converters
MIC input / headphone output
Diagram: PXA255 AC'97 Controller Unit (ACUNIT) to the CS4202 AC'97 primary codec; signals nACRESET, SDATA_OUT, SYNC (48 kHz), SDATA_IN_0, BITCLK (12.288 MHz).
See the Intel PXA255 Developer's Manual, Chapter 13, page 13-1.
AC97 base address: 0x4050 0000
Address                        Register  Description
0x4050 0000                    POCR      PCM Out Control Register
0x4050 0004                    PICR      PCM In Control Register
0x4050 0008                    MCCR      MIC In Control Register
0x4050 000C                    GCR       Global Control Register
0x4050 0010                    POSR      PCM Out Status Register
0x4050 0014                    PISR      PCM In Status Register
0x4050 0018                    MCSR      MIC In Status Register
0x4050 001C                    GSR       Global Status Register
0x4050 0020                    CAR       CODEC Access Register
0x4050 0024 to 0x4050 003C     -         Reserved
0x4050 0040                    PCDR      PCM FIFO Data Register
0x4050 0044 to 0x4050 005C     -         Reserved
0x4050 0060                    MCDR      MIC-in FIFO Data Register
0x4050 0064 to 0x4050 00FC     -         Reserved
0x4050 0100                    MOCR      Modem Out Control Register
0x4050 0104                    -         Reserved
0x4050 0108                    MICR      Modem In Control Register
0x4050 010C                    -         Reserved
0x4050 0110                    MOSR      Modem Out Status Register
0x4050 0114                    -         Reserved
0x4050 0118                    MISR      Modem In Status Register
0x4050 011C to 0x4050 013C     -         Reserved
0x4050 0140                    MODR      Modem FIFO Data Register
0x4050 0144 to 0x4050 01FC     -         Reserved
0x4050 0200 to 0x4050 02FC     -         Primary audio codec registers
0x4050 0300 to 0x4050 03FC     -         Secondary audio codec registers
0x4050 0400 to 0x4050 04FC     -         Primary modem codec registers
0x4050 0500 to 0x4050 05FC     -         Secondary modem codec registers

PXA255 - H/W Interface (8): Ethernet Controller
SMSC LAN91C111 10/100 Ethernet single chip
Internal 32-bit wide data path
8 Kbytes internal memory (receive and transmit FIFO buffers)
External 25 MHz output pin for an external PHY and MAC
MSC0, MSC1: Static Chip Select 1, 2 (Bank 1, 2)
Base address: 0x0400_0000 (primary), 0x0800_0000 (secondary)
Diagram: PXA255 MD(31:0) through a transceiver (DIR, OE#, RD_nWR) to D(31:0) of the primary and secondary Ethernet controllers; MA(25:0) supplies A(15:2); nDQM(3:0) to DQM(3:0)#; nPWE to WE#; nOE to OE#; chip-select logic on nCS1 to nCS4; interrupts (INTR0) on GPIO(0) and GPIO(1).

PXA255 - H/W Interface(9) Push Switches 8Bit Read [ D0~D7 ] Base Address = 0x1050_0000

PXA255 - H/W Interface(10) Discrete LED’s 8Bit Write [ D0~D7 ] Base Address = 0x1060_0000

PXA255 - H/W Interface(11) 7 Segment LED’s 16Bit Write [ D0~D7 ] Base Address = 0x1030_0000 [ Low 2 Segment ] 0x1040_0000 [ High 2 Segment ]

PXA255 - H/W Interface(12) Character LCD 8Bit Data Write [ D0~D7 ] 3Bit Control Write [ D8~D10] Base Address = 0x1060_0000 20 Characters x 2 Lines / Backlight Type
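The base addresses quoted on the hardware interface slides all fall inside the processor's static chip-select banks. As a rough sketch, assuming the PXA255 layout of six static banks (nCS0 to nCS5) of 64 Mbytes each starting at physical address 0, the bank for any of these addresses can be computed directly. The function name `static_chip_select` is hypothetical.

```python
STATIC_BANK_BYTES = 0x0400_0000  # assumption: 64 Mbytes per static bank
NUM_STATIC_BANKS = 6             # assumption: nCS0 to nCS5


def static_chip_select(address):
    """Return the static chip-select number for an address, or None."""
    bank = address // STATIC_BANK_BYTES
    return bank if 0 <= bank < NUM_STATIC_BANKS else None


devices = {
    "Flash": 0x0000_0000,
    "Primary Ethernet": 0x0400_0000,
    "Secondary Ethernet": 0x0800_0000,
    "SRAM": 0x0C00_0000,
    "Push switches": 0x1050_0000,
    "SDRAM": 0xA000_0000,
}
for name, base in devices.items():
    print(f"{name}: 0x{base:08X} -> chip select {static_chip_select(base)}")
```

The board-level peripherals at 0x1030_0000 to 0x1060_0000 all decode to the same chip select, which suggests they share one bank behind an external address decoder; the SDRAM address lies outside the static banks, so the function returns None for it.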