t-kernel Tutorial (Session A), July 6, 2006. Lin Gu, Dept. of Computer Science, University of Virginia.



Slide 1: t-kernel Tutorial (Session A), July 6, 2006
Lin Gu, Dept. of Computer Science, University of Virginia
lingu@cs.virginia.edu | http://www.cs.virginia.edu/~lg6e

Slide 2: Outline
- Overview
- Code modification and transitions
- Bridging
- Re-visit the code modification
- Differentiated virtual memory

Slide 3: Application domain
- Targeting application systems with very-low-power, networked, and unattended operation
[Photo by Duane Birkey, ecuadorphotos.tripod.com]

Slide 4: Goals
Design a new OS kernel for wireless sensor networks (WSNs) supporting:
- OS protection
- Virtual memory
- Preemptive priority scheduling
without assuming traditional hardware support.
[Diagram: t-kernel sits between the hardware abstraction and the application, managing complexity]

Slide 5: t-kernel – Approach
- Load-time code modification: the naturalized program becomes a cooperative program that supports OS protection, virtual memory, and preemptive scheduling
[Diagram: application → naturalized program, running on the OS (t-kernel)]

Slide 6: t-kernel – Naturalization Process
- Naturalizer: processes binary instructions page by page and generates naturalized instructions (called natins)
- Paging: storage management
- Dispatcher: controls execution
[Diagram: naturalizer, paging, and dispatcher on the microcontroller, with nonvolatile storage (flash, 512KB), program memory (128KB), and RAM (4KB)]

Slide 7: Source code tree
- Paging module
- Naturalizer
- Dispatcher
Source code will be available at http://www.sf.net/projects/vert

Slide 8: Outline
- Overview
- Code modification
- Dispatcher and transitions
- Re-visit the code modification
- Differentiated virtual memory

Slide 9: Code modification – CPU control
- Traditional solution: clock interrupts
  - Masking interrupts is privileged
  - Adopted by some WSN OSes
- Challenge: NO privilege support; the application can turn off clock interrupts:

    cli              ; disable interrupts
    self: rjmp self  ; sensor node halts!

Slide 10: Code modification – Branch regulating
- Branches are modified to transition logic
  - Guarantees that the kernel and the application alternately take hold of the CPU
  - A-K transition: application to kernel
  - K-A transition: kernel to application
- Performance is an important factor in the design
[Diagram: a branch instruction among ALU instructions in the app page becomes transition logic in the natin page]

Slide 11: Code modification – Branch regulating
- VPC (virtual program counter): generated by a compiler or a programmer
- HPC (host program counter): the physical program counter on the hardware platform
[Diagram: the branch in the app page targets a VPC; the transition logic in the natin page translates it to an HPC]

Slide 12: Code modification – Branch regulating
- "cli" is copied: we allow the application to turn off interrupts
- "rjmp" is modified to an "rcall": a branch helper stub (branch_stub) helps transfer the control flow to the kernel

  Original:
    cli
    self: rjmp self

  Naturalized:
    cli
    rcall branch_stub
    self (VPC)

    branch_stub:
      push r31
      in r31, 0x3f
      cli
      call townGate   ; into the kernel; the K-A transition returns control

Slide 13: Code modification – Branch regulating
- The dispatcher retrieves the destination VPC from the natin page
- HPC = lookup(VPC)

    rcall branch_stub
    self (VPC)

    branch_stub:
      push r31
      in r31, 0x3f
      cli
      call townGate   ; enters the dispatcher
[Diagram: stack frames holding the return addresses (ret0, ret1) and the saved r31]

Slide 14: t-kernel – lookup(VPC)?
- Problem: how to look up entry points?
  - Estimated index table size: 9KB
  - Entry table in RAM? Exceeds the 4KB RAM size.
  - In flash? Slow.
- Solution: In-Page Indexing
[Diagram: transition logic at the entry points of natin pages 102 and 107 maps entry-point VPCs to HPCs]

Slide 15: t-kernel – In-Page Indexing
- Embed indexing in the naturalized code
- Part of the K-A transition performs the VPC look-up logic

Slide 16: t-kernel – Required RAM
- Minimum: 0 bytes in RAM
- To enhance speed, an index cache
  - The hash has 16 possible results
[Diagram: a VPC is split into tag, hash index, and offset fields (the slide lists 8-bit, 3-bit, and 6-bit fields) to address the index cache]

Slide 17: t-kernel – Three-level lookup
1. VPC look-aside buffer (fast)
2. Two-associative VPC table
3. Brute-force search on the natin pages (slow)
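The three-level lookup on this slide can be sketched in C. Everything here is an illustrative assumption: the structure names, the buffer and table sizes, and the stand-in brute-force search (a fixed VPC-to-HPC offset) are not the actual t-kernel data structures; only the three-level fallback order comes from the slide.

```c
#include <assert.h>
#include <stdint.h>

#define VLB_SIZE   4   /* level 1: small look-aside buffer (assumed size) */
#define TABLE_SETS 8   /* level 2: two-associative table (assumed size) */

typedef struct { uint16_t vpc, hpc; int valid; } Entry;

static Entry vlb[VLB_SIZE];
static Entry table[TABLE_SETS][2];

/* Level 3 stand-in: brute-force search of the natin pages (slow).
 * Faked here with a fixed mapping HPC = VPC + 0x100 for illustration. */
static uint16_t brute_force_search(uint16_t vpc) { return vpc + 0x100; }

uint16_t lookup(uint16_t vpc) {
    /* 1. VPC look-aside buffer (fast) */
    for (int i = 0; i < VLB_SIZE; i++)
        if (vlb[i].valid && vlb[i].vpc == vpc) return vlb[i].hpc;

    /* 2. two-associative VPC table */
    int set = vpc % TABLE_SETS;
    for (int w = 0; w < 2; w++)
        if (table[set][w].valid && table[set][w].vpc == vpc)
            return table[set][w].hpc;

    /* 3. brute-force search, then fill the faster levels */
    uint16_t hpc = brute_force_search(vpc);
    table[set][0] = (Entry){vpc, hpc, 1};
    vlb[vpc % VLB_SIZE] = (Entry){vpc, hpc, 1};
    return hpc;
}
```

On a second lookup of the same VPC, the request is served by the caches instead of the slow search, which is the point of the hierarchy.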

Slide 18: Code modification – Branch regulating
- VPC → lookup → HPC
- The dispatcher controls the execution of the application
  - Starts at VPC = 0 (or any start address)
  - Performs lookup(VPC) and sets the HPC
[Diagram: branches in the app page carry VPCs; the dispatcher's transition logic in the natin page converts them to HPCs]

Slide 19: Outline
- Overview
- Code modification and transitions
- Bridging
- Re-visit the code modification
- Differentiated virtual memory

Slide 20: Bridging
- Bridging: to accelerate execution, shortcut between the branch source and destination
- Differentiate forward and backward branches
  - Forward: dest VPC > source VPC
  - Backward: dest VPC <= source VPC

  Original:
    rjmp Next
    ...
    Next: add r0, r1

  Naturalized (before bridging):
    rcall branch_stub
    Next (VPC)
    ...
    add r0, r1
    ...
    branch_stub:
      push r31
      in r31, 0x3f
      cli
      call townGate   ; enters the dispatcher
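The classification rule above is small enough to state directly in C; this restates the slide's definition (forward means the destination VPC is strictly greater than the source VPC, so a self-loop counts as backward):

```c
#include <assert.h>
#include <stdint.h>

/* Forward: dest VPC > source VPC; backward: dest VPC <= source VPC. */
int is_forward_branch(uint16_t src_vpc, uint16_t dest_vpc) {
    return dest_vpc > src_vpc;
}
```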

Slide 21: Bridging – Forward branch
- The dispatcher looks up the HPC for "Next"
- The naturalizer patches the natin page
- Handled by "townLogic" (town transition)

  After patching:
    jmp HPC      ; replaces the rcall to branch_stub
    ...
    add r0, r1
    ...
[Diagram: the dispatcher unwinds the stack frames (ret0, r31, ret1) and the naturalizer rewrites the branch site with the direct jmp]

Slide 22: Bridging – Backward branch
- The naturalizer patches the natin page
- SystemCounter is 8-bit: one trap into the kernel per 256 backward branches
- "headmaster" does a sanity check in the dispatcher

  Original:
    ...
    Prev: add r0, r1
    ...
    rcall branch_stub
    Prev (VPC)

  Patched natin code:
    push r31
    in r31, 0x3f
    push r31
    lds SystemCounter
    inc SystemCounter
    sts SystemCounter
    brne go
    cli
    call headmaster
    go: pop r31
    out 0x3f, r31
    pop r31
    jmp HPC
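The counter logic in the patched sequence can be modeled in C: an 8-bit counter is incremented on every backward branch, and only when it wraps to zero does the application trap into the kernel (the "headmaster" sanity check). The `kernel_traps` counter here is an illustrative stand-in for that call, not a real t-kernel symbol.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t  SystemCounter = 0;
static unsigned kernel_traps  = 0;

void take_backward_branch(void) {
    SystemCounter++;           /* inc SystemCounter */
    if (SystemCounter == 0)    /* brne go: skip the trap unless wrapped */
        kernel_traps++;        /* call headmaster (A-K transition) */
    /* jmp HPC: continue at the bridged destination */
}
```

Because the counter is 8-bit, the trap fires exactly once per 256 backward branches, which bounds how long the application can run in a loop without the OS regaining control.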

Slide 23: Bridging and CPU control
- Accelerates execution for both forward and backward branches
- Still guarantees CPU control by the OS
  - No infinite loops
[Diagram: bridged jumps between natin pages 102 and 107, with the dispatcher retaining control]

Slide 24: Outline
- Overview
- Code modification and transitions
- Bridging
- Re-visit the code modification
- Differentiated virtual memory

Slide 25: Jump instructions
- Example: rjmp
- Modified by: translateRjmp() in naturalizer.c
- Patched by: townLogic() in dispatcher.c; rewritePgmPage() in naturalizer.c

  Original:
    rjmp DEST

  Naturalized:
    rcall branch_stub
    DEST (VPC)

    branch_stub:
      push r31
      in r31, 0x3f
      cli
      call townGate

Slide 26: Conventional conditional branches
- Example: breq
- Modified by: translateBranch() in naturalizer.c
- Patched by: townLogic() in dispatcher.c; rewritePgmPage() in naturalizer.c

  Original:
    breq DEST
    fall_thru: ...

  Naturalized:
    breq taken
    rcall branch_stub
    fall_thru (VPC)
    taken:
    rcall branch_stub
    DEST (VPC)
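The transformation on this slide can be sketched as a tiny emitter: the condition is kept, but both outcomes (taken and fall-through) are routed through branch_stub with an explicit VPC, so the dispatcher observes every control transfer. This is a toy string emitter for illustration, not the real translateBranch() from naturalizer.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Emit the naturalized form of "op DEST" with fall-through fall_thru.
 * VPC operands are shown as comments after the stub call. */
int translate_branch(const char *op, int dest_vpc, int fallthru_vpc,
                     char out[4][40]) {
    snprintf(out[0], 40, "%s taken", op);                  /* keep condition */
    snprintf(out[1], 40, "rcall branch_stub ; %d", fallthru_vpc);
    snprintf(out[2], 40, "taken:");
    snprintf(out[3], 40, "rcall branch_stub ; %d", dest_vpc);
    return 4;  /* number of emitted lines */
}
```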

Slide 27: Skip instructions
- Example: sbrs (skip the next instruction if the bit in the register is set)
- Modified by: translateSkip() in naturalizer.c
- Patched by: townLogic() in dispatcher.c; rewritePgmPage() in naturalizer.c

  Original:
    sbrs r18, 1
    add r0, r1

  Naturalized:
    sbrs r18, 1
    rjmp _noskip
    _skipped:
      rcall branch_stub
      skipped (VPC)
    _noskip:
      natins for "add r0, r1"

Slide 28: Page boundary
- Add a town transition to the next VPC when a natin page is full
- VPC_Miss: a kernel service that handles the situation where a VPC is not serviced in this natin page

    ...
    natin(s) for insn1
    rjmp nextpage
    jmp VPC_Miss
    nextpage:
      rcall branch_stub
      insn2 (VPC)

Slide 29: Link-in and finis
- Bridging makes direct jumps between natin pages
- The incoming pages are recorded in the link-in record of the natin page
- Needed for invalidating direct jumps when a natin page is changed (new entry point inserted, or swapped out)
[Diagram: natin page 109's link-in record lists incoming pages 0, 1, and 2, after the last town transition and branch_stub]

Slide 30: Link-in and finis
- Version no.: used by the bridging
- Code length: number of VPCs in this natin page
- Start VPC: the first VPC in this natin page

  Example finis records:
    _VPC100: rcall branch_stub
             VPC205
    ...
    branch_stub: ...
    Finis: version = 6, code length = 23, start VPC = VPC100

    _VPC200:
    _VPC205: ...
    branch_stub: ...
    Finis: version = 7, code length = 28, start VPC = VPC200
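The finis fields on this slide suggest a small per-page record; a sketch of one plausible layout and the version-based invalidation check follows. The field widths and the `bridge_still_valid` helper are assumptions for illustration, not the actual on-flash format: the idea from the slide is only that a bridged jump into a page is stale once the page's version has changed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-page finis record. */
typedef struct {
    uint8_t  version;     /* bumped whenever the natin page is rewritten */
    uint8_t  code_length; /* number of VPCs in this natin page */
    uint16_t start_vpc;   /* first VPC in this natin page */
} Finis;

/* A bridge created against version V is valid only while the target
 * page still carries version V. */
int bridge_still_valid(const Finis *page, uint8_t bridged_version) {
    return page->version == bridged_version;
}
```

For instance, a bridge built against the slide's version-6 page must be re-resolved through the dispatcher after the page is rewritten to version 7.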

Slide 31: Outline
- Overview
- Code modification and transitions
- Bridging
- Re-visit the code modification
- Differentiated virtual memory

Slide 32: t-kernel – Differentiated Memory Access
- Physical address sensitive memory (PASM)
  - Virtual and physical addresses are the same
  - The fastest access
- Stack memory
  - Virtual and physical addresses directly mapped
  - Fast access with boundary checks
- Heap memory
  - May involve a transition to the kernel
  - The slowest; sometimes involves swapping
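The three-way access dispatch can be sketched as an address classifier in C. The boundary values are illustrative assumptions (the later slides compare the high address byte against 0x10, hinting at a stack boundary near 0x1000, and the PASM cutoff here is invented), not the actual t-kernel memory map.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MEM_PASM, MEM_STACK, MEM_HEAP } MemClass;

MemClass classify(uint16_t addr) {
    if (addr < 0x0060) return MEM_PASM;   /* assumed: native-speed region */
    if (addr < 0x1000) return MEM_STACK;  /* assumed: direct-mapped, bounds-checked */
    return MEM_HEAP;                      /* kernel service, may swap */
}
```

The naturalizer effectively compiles this decision into each memory instruction: PASM accesses stay native, stack accesses get an inline bounds check, and heap accesses trap into the kernel.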

Slide 33: PASM
- Example: lds/sts at PASM (load/store physical address sensitive memory)
- Not modified; runs at native speed

  Original and naturalized are identical:
    lds r18, 0x20
    sts 0x20, r18

Slide 34: Stack memory area
- Example: push register; std pointer+d, register
  - Pointer: a register pair pointing to an address, e.g., Y = r29:r28
- Modified by: translateLdd(), translateStd(), translateLd(), translateSt(), translateLduu(), translateStuu() in naturalizer.c

  "push r18" is unmodified:
    push r18

  "std Y+2, r18" becomes:
    adiw r28, 2
    cpi r29, 0x10
    brcs instack
    ...
    instack:
      std Y, r18
      sbiw r28, 2

Slide 35: Heap memory area
- Example: st pointer, register
- Modified by: translateLdd(), translateStd(), translateLd(), translateSt(), translateLduu(), translateStuu() in naturalizer.c
- Kernel services involved: scall_st, scall_ld

  "st Y, r18" becomes:
    cpi r29, 0x10
    brcs instack
    push r31
    push r29
    push r30
    push r28
    in r30, 0x3f
    push r30
    movw r30, r28
    mov r29, r18
    call scall_st
    pop r30
    out 0x3f, r30
    pop r28
    pop r30
    pop r29
    pop r31
    rjmp inheap
    instack:
      st Y, r18
    inheap:
      ...

Slide 36: t-kernel – Evaluation
Virtual memory overhead (old version), evaluated on a 7.3827 MHz Mica2 mote:

  Operation                    Execution time (cycles)
  Physical address sensitive   2
  Stack (best case)            2
  Stack (worst case)           16
  Heap (best case)             15
  Heap (worst case)            149,815 (20.3 ms)

Slide 37: t-kernel – Swapping
- Traditional virtual memory assumes hard disks that can (theoretically) be written an unlimited number of times
- Flash on the MICA2: 10,000 erase/write cycles
- WSNs often have write-unfriendly external storage
  - Bad swapping may destroy the flash in one day (at 1,000 swaps/hour)

Slide 38: t-kernel – Swapping
- Problem: how to swap with write-unfriendly storage and small RAM?
  - Direct mapping
    Pro: minimizes in-RAM data structures
    Con: could destroy a flash in less than 1 day (assuming 1,000 swaps/hour)
  - Page table with external addresses
    Pro: maximizes longevity (wear leveling)
    Con: needs 352B at minimum (8.5% of RAM)
- Solution: Partitioned swapping
[Diagram: direct mapping of virtual pages to flash pages vs. a page table holding external flash addresses]
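The "less than 1 day" claim follows from a back-of-the-envelope calculation: under direct mapping, a single hot page can absorb every swap, so its erase/write budget is exhausted after endurance / swap-rate hours. With the numbers from the slides (10,000 erase/write cycles, 1,000 swaps/hour) that is 10 hours.

```c
#include <assert.h>

/* Hours until a single hot flash page is worn out when every swap
 * lands on it (the direct-mapping worst case). */
int hours_until_worn(int erase_cycles, int swaps_per_hour) {
    return erase_cycles / swaps_per_hour;
}
```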

Slide 39: t-kernel – Partitioned Swapping
- Super page partition: fast swaps
- Overflow partition extends longevity
- 32 bytes in RAM; 266 days of lifetime (at 1,000 swaps/hour); 20% fast swaps

Slide 40: t-kernel – Partitioned Swapping
- No single partitioning fits all applications
- Application-directed partitioning
  - Associativity parameter

Slide 41: t-kernel – Partitioned Swapping
- Balance between swapping speed and longevity
- Example: associativity = 2
  - 266 days of lifetime, 20% fast swaps (assuming 1,000 swaps/hour)
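One way the associativity parameter trades speed for longevity can be sketched as set-associative slot placement: each virtual page may live in any of `ASSOC` flash slots, and successive swaps of the same page rotate through its set so that erase/write wear is spread across slots. The slot arithmetic and rotation policy here are assumptions for illustration, not the actual paging.c layout.

```c
#include <assert.h>
#include <stdint.h>

#define ASSOC 2        /* associativity parameter (example from the slide) */

static uint8_t next_way[64];  /* per-page rotation state, assumed 64 pages */

/* Return the flash slot for the next swap of `page`, rotating through
 * the page's set of ASSOC slots to level the wear. */
int swap_slot(int page) {
    int way = next_way[page];
    next_way[page] = (uint8_t)((way + 1) % ASSOC);
    return page * ASSOC + way;
}
```

Higher associativity multiplies the slots sharing each page's wear (longer lifetime) but also enlarges the set that must be searched when the page is read back (slower swaps), which is the balance the slide describes.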

Slide 42: t-kernel – Partitioned Swapping
- Related kernel code
  - scall_ld, scall_st in dispatcher.c
  - swapRam() in paging.c
- Flash operations (in paging.c)
  - writeExtFlashChunk()
  - readExtFlashChunk()

Slide 43: Resources
- L. Gu and J. A. Stankovic. t-kernel: Providing Reliable OS Support for Wireless Sensor Networks. SenSys 2006.
- L. Gu and J. A. Stankovic. t-kernel: a Translative OS Kernel for Sensor Networks. UVA CS Tech Report CS-2005-09, 2005.
- Web page (under construction): http://www.cs.virginia.edu/~lg6e/tkernel/
- Source code (to be available) in the vert repository: www.sf.net/projects/vert
- Comments and bug reports: lingu@cs.virginia.edu
... to be continued


