1 Kernfach System Software WS04/05 P. Reali M. Corti.


2 © P. Reali / M. Corti System-Software WS 04/05
Introduction: Admin
Lecture
– Mo 13-14, IFW A 36
– We 10-12, IFW A 36
Exercises
– Always on Thursday:
  14-15  IFW A34    C. Tuduce (E)
  14-15  IFW C42    V. Naoumov (E)
  15-16  IFW A32.1  I. Chihaia (E)
  15-16  RZ F21     C. Tuduce (E)
  16-17  IFW A34    T. Frey (E)
  16-17  IFW A32.1  K. Skoupý (E)

3 Introduction: Additional Info
Internet
– Homepage: http://www.cs.inf.ethz.ch/ssw/
– Inforum (VIS site)
Textbooks & Co.
– Lecture slides
– A. Tanenbaum, Modern Operating Systems
– Silberschatz / Galvin, Operating System Concepts
– Selected articles and book chapters

4 Introduction: Exercises
Exercises are optional (feel free to shoot yourself in the foot).
– Weekly paper exercises: test the knowledge acquired in the lecture; identify problems early; exercise questions are similar to the exam questions.
– Monthly programming assignment: feel the gap between theory and practice.

5 Introduction: Exam
Sometime in March 2005
Written, 3 hours
Allowed aids:
– a summary of 2 A4 pages
– a calculator
Official Q&A session 2 weeks before the exam

6 Introduction: Lecture Goals
Operating system concepts
– bottom-up approach
– not a full operating-systems course
– learn the most important concepts
– feel the complexity of operating systems: there's no silver bullet!
Basic knowledge for other lectures / term assignments
– Compilerbau (compiler construction)
– Component Software
– ...
– OS-related assignments

7 Introduction: What is an operating system?
An operating system has two goals:
Provide an abstraction of the hardware
– ABI (application binary interface)
– API (application programming interface)
– hide details
Manage resources
– time and space multiplexing
– resource protection

8 Introduction: Operating system target machines
Targets: mainframes, servers, multiprocessors, desktops, real-time systems, embedded systems
Different goals and requirements!
– memory efficiency
– reaction time
– abstraction level
– resources
– security
– ...

9 Introduction: Memory vs. Speed Tradeoff
Example: retrieve a list of names (N = number of names, n = name length)
                 memory    time
  1. Array       N·n       N
  2. List        N(n+4)    N/2
  3. Bin. Tree   N(n+8)    log(N)
  4. Hash Table  3N·n      1

10 Introduction: Operating System as resource manager
... in the beginning was the hardware!
Most relevant resources: CPU, memory, storage, network

11 Introduction: Lecture Topics
(figure: topics arranged by resource (Memory, CPU, Network, Disk) against abstraction level)
Topics: Scheduling, Virtual Memory, Demand Paging, Thread, Process, Coroutine, Memory Management, Garbage Collection, Concurrency Support, File System, Object-Oriented Runtime Support, Distributed File-System, Distributed Object-System, Virtual Machine, Runtime support

12 Introduction: A word of warning...
Most of the topics may seem simple...
... and in fact they are!
Problems are mostly due to:
– complexity when integrating the system
– low-level (bit-fiddling) details
– bootstrapping (X needs Y, Y needs X)

13 Introduction: Bootstrapping (Aos)
(figure: Aos kernel modules arranged by level: Locks, Storage, Modules, Processor, Memory, Interrupts, Active, Traps, Timers, SMP)

14 Introduction: Lecture Topics Overview
(timeline, Oct 2004 to Feb 2005) Runtime Support, Virtual Addressing, Memory Management, Distributed Obj. System, Concurrency, Disc / Filesystem, Case Study: JVM

15 Run-time Support: Overview
Support for programming abstractions
– Procedures: calling conventions, parameters
– Object-oriented model: objects, methods (dynamic dispatching)
– Exception handling
– ... more ...

16 Run-time Support: Application Binary Interface (ABI)
Objects a, b, c, … with methods P, Q, R, … and internal procedures p, q, r, …
Call sequence:
  Call a.P
    Call b.Q
      Call b.q
      Return b.q
      Call c.R
      Return c.R
    Return b.Q
  Return a.P
Each call pushes a procedure activation frame (PAF) on the stack; the stack pointer (SP) marks the top.
Stack snapshots (bottom to top):
  1: a.P, b.Q, b.q   (after Call b.q)
  2: a.P, b.Q        (after Return b.q)
  3: a.P, b.Q, c.R   (after Call c.R)

17 Run-time Support: Procedure Activation Frame
Frame layout, from the caller frame towards the top of the stack: params, saved PC, dynamic link (saved FP), locals. The frame pointer (FP) points to the dynamic link; the stack pointer (SP) points to the top of the stack.
Call:
– Caller: save registers, push parameters, save PC, branch
– Callee: save FP, FP := SP, allocate locals
Return:
– Callee: remove locals, restore FP, restore PC
– Caller: remove parameters, restore registers

18 Run-time Support: Procedure Activation Frame, Optimizations
Many optimizations are possible:
– use registers instead of the stack
– register windows
– procedure inlining
– use SP-relative instead of FP-relative addressing

19 Run-time Support: Procedure Activation Frame (Oberon / x86)
Caller:
    push params
    call P               ; push pc; pc := P
Callee:
    push fp
    mov fp, sp
    sub sp, size(locals)
    ...
    mov sp, fp
    pop fp
    ret size(params)     ; pop pc; add sp, size(params)
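The call sequence above can be simulated on a word-addressed array to check that the epilogue exactly undoes the prologue. This is a minimal sketch, assuming one word per parameter and per local and a stack that grows towards lower indices as on x86; `FrameSim` and its methods are hypothetical names, not part of Oberon or any runtime.

```java
// Simulation of the Oberon/x86 call sequence on a downward-growing stack.
// Assumption: word-addressed memory, one word per parameter and per local.
final class FrameSim {
    final int[] mem = new int[64];
    int sp = mem.length;   // stack pointer: index of the top element
    int fp = sp;           // frame pointer
    int pc = 0;            // program counter (just a number here)

    void push(int v) { mem[--sp] = v; }
    int pop() { return mem[sp++]; }

    // Caller: push params (left-to-right), then "call P" = push pc; pc := P.
    void call(int target, int... params) {
        for (int p : params) push(p);
        push(pc);          // return address
        pc = target;
    }

    // Callee prologue: push fp; mov fp, sp; sub sp, size(locals).
    void enter(int nLocals) {
        push(fp);          // dynamic link
        fp = sp;
        sp -= nLocals;     // allocate locals
    }

    // Parameter i (0 = first pushed): above the saved FP and the return PC.
    int param(int i, int nParams) { return mem[fp + 2 + (nParams - 1 - i)]; }

    // Callee epilogue: mov sp, fp; pop fp; ret size(params).
    void leave(int nParams) {
        sp = fp;           // remove locals
        fp = pop();        // restore the caller's FP
        pc = pop();        // return
        sp += nParams;     // remove parameters
    }
}
```

Because the callee removes the parameters (`ret size(params)`), the caller's SP and FP are back to their values before the call once `leave` returns.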

20 Run-time Support: Calling Convention
Convention between caller and callee:
– how parameters are passed: data layout; left-to-right or right-to-left; registers; register window
– stack layout: dynamic link, static link
– register saving: reserved registers

21 Run-time Support: Calling Convention (Oberon)
Parameter passing:
– on the stack (exception: Oberon/PPC uses registers)
– left-to-right
– self (methods only) as the last parameter
– structs and arrays passed by reference; value parameters are copied by the callee
Stack:
– dynamic link
– static link as the last parameter (for local procedures)
Registers:
– saved by the caller

22 Run-time Support: Calling Convention (C)
Parameter passing:
– on the stack
– right-to-left
– arrays passed by reference (an array argument decays to a pointer!)
Stack:
– dynamic link
Registers:
– some saved by the caller

23 Run-time Support: Calling Convention (Java)
Parameter passing:
– left-to-right
– self (this) as the first parameter
– parameters pushed as operands
– parameters accessed as locals
– access through symbolic, type-safe operations

24 Run-time Support: Object Oriented Support, Definitions
    Obj x = new ObjA();
– the static type of x is Obj
– the dynamic type of x is ObjA
– x is compiled as being compatible with Obj, but executes as ObjA
– static and dynamic type can be different: the system must keep track of the dynamic type with a hidden type descriptor
(figure: class hierarchy, Obj extends Obj0, ObjA and ObjB extend Obj; polymorphism)

25 Run-Time Support: Polymorphism
    VAR t: Triangle; s: Square; o: Figure;
    BEGIN
      t.Draw(); s.Draw();   (* type is statically known! *)
      o.Draw();             (* type is discovered at runtime! *)
    END;

    WHILE p # NIL DO p.Draw(); p := p.next END;   (* type is discovered at runtime! *)

26 Run-time Support: Object Oriented Support, Definitions
    Obj x = new ObjA();
    if (x IS ObjA) { ... }   // type test
    ObjA y = (ObjA) x;        // type cast
    x = y;                    // type coercion (automatic conversion)
(figure: class hierarchy Obj0, Obj, ObjA, ObjB)

27 Run-time Support: Object Oriented Support (High-level Java)
Type test implementation for ... a IS T ...
    static boolean is(Object a, Class<?> T) {
        if (a != null) {
            Class<?> c = a.getClass();
            while ((c != null) && (c != T)) {
                c = c.getSuperclass();
            }
            return c == T;
        } else {
            return false;
        }
    }
This walk covers superclasses only; interface types would also require checking getInterfaces().

28 Run-Time Support: Type Descriptors
    struct TypeDescriptor {
        int level;
        type[] extensions;
        method[] methods;
    }
    class Object {
        TypeDescriptor type;
    }
Many type-descriptor layouts are possible; the layout depends on the chosen optimizations.

29 Run-Time Support: Type Tests and Casts
Each type descriptor holds an extension table with one entry per extension level:
  TD(Obj0):  0: Obj0  1: NIL  2: NIL   3: NIL
  TD(Obj):   0: Obj0  1: Obj  2: NIL   3: NIL
  TD(ObjA):  0: Obj0  1: Obj  2: ObjA  3: NIL
Type test (obj IS T):  obj.type.extension[T.level] = T
    mov EAX, obj                     ; EAX := obj
    mov EAX, -4[EAX]                 ; EAX := type descriptor of obj
    cmp T, -4 * T.level - 8[EAX]     ; compare extension[T.level] with T
    bne ....
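A sketch of the extension-table type test, with the type descriptor modelled as a Java object. It assumes a fixed maximum depth of 4 (matching the four slots in the figure); `TypeDesc` and `is` are illustrative names.

```java
// Constant-time type test with per-type extension tables.
// Assumption: at most MAX_LEVEL levels in the hierarchy.
final class TypeDesc {
    static final int MAX_LEVEL = 4;
    final String name;
    final int level;                                   // depth in the hierarchy (root = 0)
    final TypeDesc[] extensions = new TypeDesc[MAX_LEVEL];

    TypeDesc(String name, TypeDesc base) {
        this.name = name;
        this.level = (base == null) ? 0 : base.level + 1;
        if (level >= MAX_LEVEL) throw new IllegalArgumentException("hierarchy too deep");
        if (base != null) {
            // Inherit the base type's entries for levels 0 .. base.level.
            System.arraycopy(base.extensions, 0, extensions, 0, MAX_LEVEL);
        }
        extensions[level] = this;                      // a type extends itself
    }

    // obj IS t  <=>  obj.type.extensions[t.level] == t (no superclass loop)
    static boolean is(TypeDesc objType, TypeDesc t) {
        return objType.extensions[t.level] == t;
    }
}
```

The test costs one load and one compare regardless of hierarchy depth; the price is a fixed-size table per type and a limit on the depth.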

30 Run-time Support: Object Oriented Support (High-level Java)
Method call implementation for ... a.M(...) ...
    Class<?>[] parTypes = new Class<?>[params.length];
    for (int i = 0; i < params.length; i++) {
        parTypes[i] = params[i].getClass();
    }
    Class<?> c = a.getClass();
    Method m = c.getMethod("M", parTypes);
    res = m.invoke(a, params);
Use the method implementation for the actual class (dynamic type).

31 Run-Time Support: Handlers / Function Pointers
    TYPE
      SomeType = POINTER TO SomeTypeDesc;
      Handler = PROCEDURE (self: SomeType; param: Par);
      SomeTypeDesc = RECORD
        handler: Handler;
        next: SomeType
      END;
(figure: a list starting at root; each node has a handler field, pointing to procedure Q or R, and a next field)
Advantages:
– bound per instance; can be changed at run time
Disadvantages:
– memory usage
– bad integration (explicit self)
– non-constant

32 Run-Time Support: Method tables (vtables)
    TYPE
      A = OBJECT
        PROCEDURE M0;
        PROCEDURE M1;
      END A;
      B = OBJECT (A)
        PROCEDURE M0;
        PROCEDURE M2;
      END B;
B.M0 overrides A.M0; B.M2 is new.
  A.MethodTable:  0: A.M0  1: A.M1
  B.MethodTable:  0: B.M0  1: A.M1  2: B.M2
Idea: have a per-type table of function pointers.
– New methods add a new entry to the method table.
– Overrides replace an entry in the method table.
– Each method has a unique entry number.

33 Run-Time Support: Method tables
    TYPE A = OBJECT PROCEDURE M0; PROCEDURE M1; END A;
         B = OBJECT (A) PROCEDURE M0; PROCEDURE M2; END B;
  A.MethodTable:  0: A.M0  1: A.M1
  B.MethodTable:  0: B.M0  1: A.M1  2: B.M2
Virtual dispatch: o.M0  ==>  call o.Type.Methods[0]
    mov eax, VALUE(o)
    mov eax, type[eax]
    mov eax, off + 4*mno[eax]
    call eax
(figure: object o with its fields and a hidden Type pointer to its type descriptor)
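The per-type method tables can be modelled with arrays of function objects. This sketch assumes methods are functions from the receiver to a `String`, so the dispatched implementation is visible; `VTableDemo`, `derive`, `invoke` and `buildExample` are hypothetical names.

```java
import java.util.Arrays;
import java.util.function.Function;

final class VTableDemo {
    // An object: fields omitted, plus the hidden pointer to its type.
    static final class Obj {
        final Type type;
        Obj(Type type) { this.type = type; }
    }

    // A type descriptor reduced to its name and method table.
    static final class Type {
        final String name;
        final Function<Obj, String>[] methods;
        Type(String name, Function<Obj, String>[] methods) {
            this.name = name;
            this.methods = methods;
        }
    }

    // Copies the base table (inherited entries keep their numbers) and grows it.
    @SuppressWarnings("unchecked")
    static Type derive(String name, Type base, int size) {
        Function<Obj, String>[] table = (base == null)
            ? (Function<Obj, String>[]) new Function[size]
            : Arrays.copyOf(base.methods, size);
        return new Type(name, table);
    }

    // o.M<n> compiles to: call o.type.methods[n]
    static String invoke(Obj o, int methodNumber) {
        return o.type.methods[methodNumber].apply(o);
    }

    // Builds the types A and B of the slide; returns {A, B}.
    static Type[] buildExample() {
        Type a = derive("A", null, 2);
        a.methods[0] = o -> "A.M0";
        a.methods[1] = o -> "A.M1";
        Type b = derive("B", a, 3);       // inherits entries 0 and 1
        b.methods[0] = o -> "B.M0";       // override replaces entry 0
        b.methods[2] = o -> "B.M2";       // new method gets the next entry
        return new Type[] { a, b };
    }
}
```

Overriding never changes an entry number, so code compiled against `A` calls `methods[0]` and transparently reaches `B.M0` for objects of type `B`.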

34 Run-Time Support: Oberon Type Descriptors
(figure: an object consists of a type-descriptor pointer followed by the object fields; the type descriptor is itself an object)
Type descriptor contents:
– td size, obj size: for object allocation
– method table: for method invocation
– superclass (extension) table: for type checks
– pointers in object (pointer offsets): for garbage collection
– type name

35 Run-Time Support: Interfaces, itables
    interface A { void m(); }
    interface B { void p(); }
    Object x;
    A y = (A) x;   // does x implement A?
    y.m();
x has a method table (itable) for each implemented interface.
Multiple itables: how is the right itable found?

36 Run-Time Support: Interface support
How to retrieve the right method table (if any)?
– global table indexed by [class, interface]
– local (per-type) table / list indexed by [interface]
Many optimizations are available; use the usual trick: enumerate the interfaces.

37 Run-Time Support: Interface support (I)
(figure: type descriptor with its vtable and a list of itables, e.g. for Intf0 and Intf7)
    Intf0 y = (Intf0) x;
    y.M();
is implemented as
    interface i = x.type.interfaces;
    while ((i != null) && (i != Intf0)) {
        i = i.next;
    }
    if (i != null) i.method[mth_nr]();
The call is expensive because it requires traversing a list: O(N) complexity.

38 Run-Time Support: Interface support (II)
(figure: type descriptor with its vtable and an interfaces array indexed by interface number; most entries are empty: a sparse array!)
    Intf0 y = (Intf0) x;
    y.M();
is implemented as
    interface i = x.type.interfaces[Intf0];
    if (i != null) i.method[mth_nr]();
Lookup is fast (O(1)), but wastes memory.
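A sketch of the two lookup variants (I: list, II: sparse array), assuming interfaces are numbered globally from 0 to `MAX_INTF - 1` and representing itable entries by method names; all names are illustrative.

```java
final class ITableDemo {
    static final int MAX_INTF = 8;   // assumption: global interface numbering 0..7

    static final class ITable {
        final int interfaceId;
        final String[] methods;      // stands in for the method pointers
        ITable next;                 // link used by variant I
        ITable(int interfaceId, String[] methods) {
            this.interfaceId = interfaceId;
            this.methods = methods;
        }
    }

    static final class TypeDesc {
        ITable list;                                 // variant I: list of itables
        final ITable[] byId = new ITable[MAX_INTF];  // variant II: sparse array
        void implement(ITable t) {
            t.next = list;
            list = t;
            byId[t.interfaceId] = t;
        }
    }

    // Variant I: walk the list until the interface is found, O(N).
    static ITable lookupList(TypeDesc td, int intf) {
        ITable i = td.list;
        while (i != null && i.interfaceId != intf) i = i.next;
        return i;
    }

    // Variant II: a single array access, O(1), but MAX_INTF slots per type.
    static ITable lookupArray(TypeDesc td, int intf) {
        return td.byId[intf];
    }
}
```

Both variants return `null` when the type does not implement the interface, which is where a failing cast would raise its error.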

39 Run-Time Support: Interface Implementation (III)
(figure: the sparse interface tables of type descriptors t and u, indexed by interface table index 0..7, are overlapped so that their used entries do not collide)

40 Run-Time Support: Interface Implementation (III)
(figure: several type descriptors, each with its vtable, pointing into one overlapped interface table)

41 Run-Time Support: Interface Implementation (III)
(figure: type descriptor pointing into the overlapped interface tables)
    Intf0 y = (Intf0) x;
    y.M();
is implemented as
    itable i = x.type.interfaces[Intf0];
    if ((i != null) && (i in x.type)) i.method[mth_nr]();
Because the tables overlap, the entry found may belong to another type, so it must also be checked to belong to x.type.

42 Run-Time Support: Exceptions
    void catchOne() {
        try {
            tryItOut();
        } catch (TestExc e) {
            handleExc(e);
        }
    }
compiles to
    Method void catchOne()
     0  aload_0
     1  invokevirtual tryItOut()
     4  return
     5  astore_1
     6  aload_0
     7  aload_1
     8  invokevirtual handleExc
    11  return
    Exception table:
    From  To  Target  Type
       0   4       5  TestExc

43 Run-Time Support: Exception Handling / Zero Overhead
    try { ... }
    catch (Exp1 e) { ... }
    catch (Exp2 e) { ... }
Global exception table, one entry per catch clause:
    start     end     exception  handler
    pc_start  pc_end  Exp1       pc_handler1
    pc_start  pc_end  Exp2       pc_handler2
Handler search:
    void ExceptionHandler(state) {
        pc = state.pc; exc = state.exception; i = 0;
        while (!Match(table[i], pc, exc)) {
            i++;
            if (i == TableLength) {
                PopActivationFrame(state);
                pc = state.pc;
                i = 0;
            }
        }
        state.pc = table[i].pc_handler;
        ResumeExecution(state);
    }
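A sketch of the table-driven handler search, with the call stack reduced to the current pc of each frame. `Entry` and `dispatch` are hypothetical names; as in the JVM, the table is assumed to list inner try ranges before outer ones, since the first match wins.

```java
import java.util.Deque;
import java.util.List;

final class ZeroOverhead {
    static final class Entry {
        final int startPc, endPc, handlerPc;
        final Class<? extends Throwable> type;
        Entry(int startPc, int endPc, Class<? extends Throwable> type, int handlerPc) {
            this.startPc = startPc; this.endPc = endPc; this.type = type; this.handlerPc = handlerPc;
        }
        boolean matches(int pc, Throwable exc) {
            return startPc <= pc && pc < endPc && type.isInstance(exc);
        }
    }

    // pcStack holds the current pc of each activation frame, innermost on top.
    // Returns the handler pc (the frame is resumed there), or -1 if no frame
    // handles the exception and the system's default handler must take over.
    static int dispatch(List<Entry> table, Deque<Integer> pcStack, Throwable exc) {
        while (!pcStack.isEmpty()) {
            int pc = pcStack.peek();
            for (Entry e : table) {               // linear scan of the global table
                if (e.matches(pc, exc)) {
                    pcStack.pop();
                    pcStack.push(e.handlerPc);    // resume this frame at the handler
                    return e.handlerPc;
                }
            }
            pcStack.pop();                        // no handler: pop the frame
        }
        return -1;
    }
}
```

Nothing is executed when no exception occurs; all the work (table scans and frame pops) happens on the exceptional path.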

44 Run-Time Support: Exception Handling / Zero Overhead
– the exception table is filled by the loader / linker
– the whole table is traversed for each stack frame
– the system has a default handler for uncaught exceptions
– no exceptions => no overhead
– the exception case is expensive
– the system is optimized for the normal case

45 Run-Time Support: Exception Handling / Fast Handling
    try {
        save(FP, SP, Exp1, pc_handler1);
        save(FP, SP, Exp2, pc_handler2);
        ...
        remove catch descr.; jump end
    } catch (Exp1 e) {       // pc_handler1
        ...
        remove catch descr.; jump end
    } catch (Exp2 e) {       // pc_handler2
        ...
        remove catch descr.; jump end
    }
    end:
– push catch descriptors on the stack
– add code instrumentation
– use an exception stack to keep track of the handlers

46 Run-Time Support: Exception Handling / Fast Handling
    void ExceptionHandler(ThreadState state) {
        int FP, SP, handler;
        Exception e;
        do {
            retrieve(FP, SP, e, handler);  // pop next exception descriptor from the exception stack
        } while (!Match(state.exp, e));
        state.fp = FP;          // set frame to the one
        state.sp = SP;          // containing the handler
        state.pc = handler;     // resume with the handler
        ResumeExecution(state);
    }
Execution can resume in a different activation frame.
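A sketch of the descriptor-stack variant. It deviates slightly from the slide by storing all catch clauses of one `try` in a single descriptor, so clause order is preserved and one pop removes the whole `try`; `Descriptor`, `ThreadState`, `enterTry`, `exitTry` and `raise` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class ExceptionStack {
    static final class Descriptor {
        final int fp, sp;          // frame of the try statement
        final Class<?>[] types;    // catch clauses in source order
        final int[] handlers;      // handler pc per clause
        Descriptor(int fp, int sp, Class<?>[] types, int[] handlers) {
            this.fp = fp; this.sp = sp; this.types = types; this.handlers = handlers;
        }
    }

    static final class ThreadState { int fp, sp, pc; }

    private final Deque<Descriptor> stack = new ArrayDeque<>();

    void enterTry(Descriptor d) { stack.push(d); }   // instrumentation at "try"
    void exitTry() { stack.pop(); }                  // instrumentation on normal exit

    // Pops descriptors until one has a matching clause, then redirects the thread
    // to that frame and handler. Returns false if no handler exists.
    boolean raise(ThreadState state, Throwable exc) {
        while (!stack.isEmpty()) {
            Descriptor d = stack.pop();
            for (int i = 0; i < d.types.length; i++) {
                if (d.types[i].isInstance(exc)) {
                    state.fp = d.fp;
                    state.sp = d.sp;
                    state.pc = d.handlers[i];
                    return true;
                }
            }
        }
        return false;
    }
}
```

The cost moved to the normal path: every `try` executes a push and a pop, but a throw only inspects the descriptors that are actually active.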

47 Run-Time Support: Exception Handling / Fast Handling
Code instrumentation:
– insert an exception descriptor at try
– remove the descriptor before catch
Fast exception handling, but overhead even when no exceptions occur: the system is optimized for the exception case.

48 Virtual Addressing: Overview
Virtual addressing: abstraction of the MMU (Memory Management Unit)
Work with virtual addresses, where $\mathit{address}_{\text{real}} = f(\mathit{address}_{\text{virtual}})$
Provides decoupling from real memory:
– virtual memory
– demand paging
– separate address spaces

49 Virtual Addressing: Pages
Memory as an array of pages.
(figure: pages of virtual address spaces 1 and 2 mapped onto page frames in real memory, the pool of page frames; some pages and ranges are unmapped (invalid))
Programs use and run in these address spaces.

50 Virtual Addressing: Page mapping
The MMU splits a virtual address into page number and offset. The TLB (Translation Lookaside Buffer), an associative cache of (page table, virtual address, real address) entries, is consulted first. On a miss, the page number indexes the page table, located through the page-table pointer register, which yields the page frame; the real address is the frame combined with the offset.
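A sketch of this translation path, assuming 32-bit virtual addresses, 4 KB pages, a single-level page table where -1 marks an unmapped page, and a TLB modelled as an unbounded map (a real TLB is a small associative cache). `Mmu` is an illustrative name; a page fault is signalled with an exception.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

final class Mmu {
    static final int OFFSET_BITS = 12;              // 4 KB pages
    static final int PAGE_SIZE = 1 << OFFSET_BITS;

    final int[] pageTable;                          // page number -> frame (-1 = invalid)
    final Map<Integer, Integer> tlb = new HashMap<>();
    int tlbHits, tableWalks;

    Mmu(int pages) {
        pageTable = new int[pages];
        Arrays.fill(pageTable, -1);
    }

    void map(int page, int frame) { pageTable[page] = frame; }

    long translate(long va) {
        int page = (int) (va >>> OFFSET_BITS);
        long off = va & (PAGE_SIZE - 1);
        Integer cached = tlb.get(page);
        int frame;
        if (cached != null) {                       // TLB hit
            tlbHits++;
            frame = cached;
        } else {                                    // TLB miss: walk the page table
            tableWalks++;
            frame = pageTable[page];
            if (frame < 0) throw new IllegalStateException("page fault at page " + page);
            tlb.put(page, frame);
        }
        return ((long) frame << OFFSET_BITS) | off;
    }
}
```

The offset passes through unchanged; only the page number is translated, which is why the TLB can be indexed by page number alone.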

51 Virtual Addressing: Definitions
  page         smallest unit in the virtual address space
  page frame   unit in the physical memory
  page table   table mapping pages to page frames
  page fault   access to a non-mapped page
  working set  pages a process is currently using

52 Virtual Addressing: Alternate Page Mapping
– Multilevel page tables
  – multipart virtual address (pno1, pno2, off): pno1 indexes the 1st-level table, pno2 the 2nd-level table; unassigned parts need no 2nd-level table
  – page table as (B*-)tree
– Inverted page table
  – one hashtable indexed by Hash(pr, vp), with pr = process and vp = virtual page; each entry holds pr, vp, the page frame pf and the next probe
  – suited to a 64-bit address space

53 Virtual Addressing: What for?
Decoupling from real memory
– virtual memory (cheat: use more virtual memory than the available real memory)
– dynamically allocated contiguous memory blocks (for multiple stacks in multitasking systems)
– some optimizations: null reference checks; garbage collection (using the dirty flag)
Virtual addressing is not free!
– address mapping may require additional memory accesses
– the page table takes space

54 Virtual Addressing: Virtual Memory
– Use secondary storage (disc) to keep currently unused pages (swapping).
– The page table usually keeps some per-page flags:
    invalid     page not mapped
    referenced  page has been referenced
    dirty       page has been modified
– Accessing an invalid page causes a page-fault interrupt:
  – select a page frame to be swapped out (victim or candidate)
  – swap in the requested page

55 Virtual Addressing: Virtual Memory / Demand Paging
(figure: the victim page is paged out from real memory to disc and set to invalid in the page table; the requested page is paged in)

56 Virtual Addressing: Demand Paging Sequence
MMU:
    IF VA IN TLB THEN RETURN RA
    ELSE
      Access Page Table;
      IF Page invalid THEN Page-Fault ELSE RETURN RA END
    END
OS page-fault handler:
    IF Free Page Frame exists THEN
      Assign frame to VA
    ELSE
      Search victim page;
      IF victim page modified THEN page-out to secondary storage END;
      Invalidate victim page;
      Assign frame to VA
    END;
    Page-in from secondary storage;
    Reset invalid flag
Expected time to translate VA into RA:
$$E[t] = P_{TLB} \cdot t_{TLB} + P_{PT} \cdot t_{PT} + P_{disc} \cdot t_{disc}$$

57 Virtual Addressing: Example
Page size: 4 KB; address size: 32 bits
– addressable memory: $2^{32}$ = 4 GB
– page offset: 12 bits (4 KB = $2^{12}$)
– page number: 20 bits (32 - 12)
– page table size: $2^{20}$ × 32 bits = 4 MB
Real memory: 128 MB
– page table overhead: ca. 3%
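The arithmetic above, written out so the parameters can be varied. It assumes 4-byte (32-bit) page-table entries and a single-level table, as on the slide; `PageTableSize` is an illustrative name.

```java
final class PageTableSize {
    // One entry per page: 2^(addressBits - offsetBits) entries of entryBytes each.
    static long pageTableBytes(int addressBits, int offsetBits, int entryBytes) {
        long entries = 1L << (addressBits - offsetBits);
        return entries * entryBytes;
    }

    // Fraction of real memory occupied by the page table.
    static double overhead(long tableBytes, long realMemoryBytes) {
        return (double) tableBytes / realMemoryBytes;
    }
}
```

With 32-bit addresses, 12 offset bits and 4-byte entries this gives 4 MB, which is 1/32 = 3.125% of 128 MB, the "ca. 3%" on the slide.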

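58 Virtual Addressing: Example
    mov EAX, @Addr
(figure: with probability $P_{TLB}$ the translation is found in the TLB; otherwise, with probability $1 - P_{TLB}$, the page table is read. Then either a page fault occurs (probability $P_{PF}$; 1 disc read, possibly 1 disc write) or 1 memory read suffices (probability $1 - P_{PF}$).)
$$E[t] = P_{TLB}\, t_{TLB} + (1 - P_{TLB})\bigl(t_{PT} + P_{PF}\, t_{disc} + (1 - P_{PF})\, t_{mem}\bigr)$$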

59 Virtual Addressing: Demand Paging: Page Replacement
– Optimal strategy (longest unused)
  – take the page that will remain unused for the longest time
  – requires an oracle
– NRU: Not Recently Used
  – reset the referenced flag at each tick
  – create page categories (from good candidate to bad candidate):
      Pref  ref  mod
       3     0    0
       2     0    1
       1     1    0
       0     1    1
  – choose the best candidate

60 Virtual Addressing: Demand Paging: Page Replacement (2)
– LRU: Least Recently Used
  – assumption: not used in the past ==> not used in the future
  – hardware implementation: 64-bit time stamp for each page
  – software implementation: aging algorithm
  – choose the page with the lowest value
(figure: aging from t(i) to t(i+1): each page's counter is shifted right and its reference flag, set if the page was accessed, enters as the top bit)
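A sketch of the aging algorithm, assuming 8-bit counters (the slide does not fix the width) and a software-visible reference flag per page; `Aging` is an illustrative name.

```java
final class Aging {
    final int[] counter;          // 8-bit aging counter per page
    final boolean[] referenced;   // reference flag, set on access

    Aging(int pages) {
        counter = new int[pages];
        referenced = new boolean[pages];
    }

    void access(int page) { referenced[page] = true; }

    // At every tick: shift right, insert the reference flag as the top bit, clear the flag.
    void tick() {
        for (int p = 0; p < counter.length; p++) {
            counter[p] = (counter[p] >>> 1) | (referenced[p] ? 0x80 : 0);
            referenced[p] = false;
        }
    }

    // Victim: the page with the lowest counter (least recently used, approximately).
    int victim() {
        int best = 0;
        for (int p = 1; p < counter.length; p++) {
            if (counter[p] < counter[best]) best = p;
        }
        return best;
    }
}
```

Recent references dominate because they land in the high bits; references older than 8 ticks are forgotten, which is where aging departs from exact LRU.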

61 Virtual Addressing: Demand Paging: Page Replacement (3)
– Least Recently Created, LRC (FIFO)
  – page lifespan as metric (old pages are swapped out)
  – chain sorted by creation time
  – bad handling of often-used pages
– Fix: second chance for pages accessed (ref flag set) during the last tick
    cur := earliest;
    WHILE cur.ref DO cur.ref := FALSE; cur := cur.next END
(figure: chain of pages from earliest onwards, each with a ref flag and a next link)
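A sketch of the second-chance fix in its circular ("clock") form, where the hand plays the role of `earliest`. Assumptions: a newly loaded page starts with its reference flag cleared and a hit sets the flag; `SecondChance` is a hypothetical name.

```java
import java.util.Arrays;

final class SecondChance {
    final int[] pages;        // page held by each frame (-1 = empty)
    final boolean[] ref;      // reference flag per frame
    int hand = 0;             // oldest frame ("earliest")
    int faults = 0;

    SecondChance(int frames) {
        pages = new int[frames];
        ref = new boolean[frames];
        Arrays.fill(pages, -1);
    }

    void access(int page) {
        for (int i = 0; i < pages.length; i++) {
            if (pages[i] == page) { ref[i] = true; return; }   // hit: set the flag
        }
        faults++;
        // WHILE cur.ref DO cur.ref := FALSE; cur := cur.next END
        while (ref[hand]) {
            ref[hand] = false;
            hand = (hand + 1) % pages.length;
        }
        pages[hand] = page;                                    // replace the victim
        ref[hand] = false;
        hand = (hand + 1) % pages.length;                      // next-oldest frame
    }
}
```

For the accesses 1, 2, 3, 1, 4 with three frames, plain FIFO would evict page 1; here page 1 was referenced again, so it is spared and page 2 is evicted instead.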

62 Virtual Addressing: Demand Paging: Page Replacement (4)
Strategies:
– optimal
– LRU / NRU / LRC
Exceptions:
– page pinning: the page cannot be swapped out (e.g. kernel code)

63 Virtual Addressing: Example
Accessed pages: 1, 2, 1, 3, 4, 1, 2, 3, 4
Available page frames: 3; working set {1, 2, 3, 4}
Resident pages after each page fault (– = no page fault):
  Access  1   2    1   3      4      1      2      3      4
  Ideal   1   1,2  –   1,2,3  1,2,4  –      –      2,3,4  –
  FIFO    1   1,2  –   1,2,3  2,3,4  3,4,1  4,1,2  1,2,3  2,3,4
  LRU     1   1,2  –   1,2,3  1,3,4  –      1,4,2  1,2,3  2,3,4
Page faults: Ideal 5, FIFO 8, LRU 7.
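The table can be checked with a small simulation of the three strategies. This is a sketch with illustrative names; ties in the optimal strategy (several pages never used again) are broken by taking the first such page.

```java
import java.util.ArrayList;
import java.util.List;

final class Replacement {
    // FIFO (LRC): evict the page that was loaded earliest.
    static int fifo(int[] refs, int frames) {
        List<Integer> q = new ArrayList<>();
        int faults = 0;
        for (int p : refs) {
            if (q.contains(p)) continue;               // hit: load order unchanged
            faults++;
            if (q.size() == frames) q.remove(0);
            q.add(p);
        }
        return faults;
    }

    // LRU: the list is kept in recency order, least recently used first.
    static int lru(int[] refs, int frames) {
        List<Integer> q = new ArrayList<>();
        int faults = 0;
        for (int p : refs) {
            if (q.remove(Integer.valueOf(p))) { q.add(p); continue; }  // hit: move to the end
            faults++;
            if (q.size() == frames) q.remove(0);
            q.add(p);
        }
        return faults;
    }

    // Optimal: evict the page whose next use lies farthest in the future.
    static int optimal(int[] refs, int frames) {
        List<Integer> resident = new ArrayList<>();
        int faults = 0;
        for (int t = 0; t < refs.length; t++) {
            int p = refs[t];
            if (resident.contains(p)) continue;
            faults++;
            if (resident.size() == frames) {
                int victim = 0, farthest = -1;
                for (int i = 0; i < resident.size(); i++) {
                    int next = nextUse(refs, t + 1, resident.get(i));
                    if (next > farthest) { farthest = next; victim = i; }
                }
                resident.remove(victim);
            }
            resident.add(p);
        }
        return faults;
    }

    static int nextUse(int[] refs, int from, int page) {
        for (int i = from; i < refs.length; i++) if (refs[i] == page) return i;
        return Integer.MAX_VALUE;                      // never used again
    }
}
```

For the sequence 1, 2, 1, 3, 4, 1, 2, 3, 4 with 3 frames, the simulation gives 5, 8 and 7 faults for the optimal strategy, FIFO and LRU respectively.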

64 Demand Paging: Belady's Anomaly
LRC (FIFO) strategy, page access sequence: 0 1 2 3 0 1 4 0 1 2 3 4
– 3 page frames: 9 page faults
– 4 page frames: 10 page faults
(figure: frame contents and victim page after each access, for 3 and for 4 page frames)
Belady's anomaly: more page frames can cause more page faults.
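A FIFO simulation reproduces the anomaly for the sequence above; `Belady.fifoFaults` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class Belady {
    // Counts page faults under FIFO replacement with the given number of frames.
    static int fifoFaults(int[] refs, int frames) {
        Deque<Integer> queue = new ArrayDeque<>();        // oldest page at the head
        int faults = 0;
        for (int p : refs) {
            if (queue.contains(p)) continue;               // hit: FIFO order unchanged
            faults++;
            if (queue.size() == frames) queue.pollFirst(); // evict the earliest loaded page
            queue.addLast(p);
        }
        return faults;
    }
}
```

LRU and the optimal strategy cannot show this anomaly: the pages they keep in n frames are always a subset of those kept in n+1 frames, which FIFO does not guarantee.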

65 Demand Paging: How many page frames per process?
– Even distribution: every process has the same amount of memory
– Thrashing: every memory access causes a page fault, because there are not enough page frames for the current working set; the system is swapping instead of running
(figure: CPU load against process count; the load rises towards 100% up to n processes and collapses at n+1)

66 Demand Paging: How many page frames per process? (2)
– Depending on the process needs (1): use the working set
  – page frames are assigned according to the size of the process's working set
  – swap out a process when not enough memory is available
(figure: a sliding window over the page-access sequence; the pages inside the window form the working set, e.g. {1, 2, 3, 4} at one position and {2, 3, 4} at another)

67 Demand Paging: How many page frames per process? (3)
– Depending on the process needs (2): use the page-fault rate
(figure: page-fault rate over time between a LOW and a HIGH threshold; above HIGH, swap out one process; below LOW, swap one in)

68 Virtual Addressing: Aos/Bluebottle, Memory Layout Example
(figure: address space with kernel and heap in the lower part and the stacks above 2 GB, up to 4 GB)
– 128 KB per stack: max. 32768 active objects
– the first stack page is allocated on process creation
    PROCEDURE PageFault;
    BEGIN
      IF adr > 2GB THEN add page to stack
      ELSE Exception(NilTrap)
      END
    END PageFault;

69 Virtual Addressing: Example: UNIX, Fork
A UNIX program consists of code (text) and data.
(figure: after fork(), processes A and B share the code pages read-only through their page tables; the read-write data pages are shared copy-on-write)

70 Virtual Addressing: OS Control
– Oberon: no virtual memory
– Windows: virtual memory configuration; Task Manager
– Linux: swap partition / swap files; ps / top

71 Virtual Addressing: Segmentation, e.g. Intel x86
Problem:
– 640 KB max. memory
– 16-bit addresses (i.e. 64 KB)
Solution:
– work in a segment
– code / data segments
– check segment boundaries
$$\mathit{Addr}_{\text{real}} = \mathit{Seg}_{\text{base}} + \mathit{Offset}$$
(figure: code and data segments in real memory, each described by a segment base and a segment limit)

72 Virtual Addressing: Summary
Virtual addresses: $\mathit{address}_{\text{real}} = f(\mathit{address}_{\text{virtual}})$
Decoupling from real memory:
– virtual memory
– demand paging
– separate address spaces
Keywords:
– page, page frame, page table, page fault
– page flags: dirty, used, unmapped
– page replacement strategy: LRC, LRU, ideal, ...
– swapping
– thrashing, Belady's anomaly

73 Memory Management: Overview
Abstractions for applications:
– heap
– memory blocks (<< memory pages)
Operations:
– allocate
– deallocate
Topics:
– memory organization
– free lists
– allocation strategies
– deallocation: explicit, or garbage collection (type-aware, conservative, copying / moving, incremental, generational)

74 Memory Management: Objects on the heap
Object instances: a, b, c, d, …
Sequence: NEW(a) NEW(b) NEW(c) DISPOSE(b) NEW(d) NEW(e) (dynamic allocation, explicit disposal)
(figure: heap holding a, c and d; NEW(e) fails. Case 1: e is larger than the total free space. Case 2: there is enough free space in total, but no single free block is large enough.)

75 Memory Management: Problem overview
Problems:
– heap size limitation (e, case 1)
– external fragmentation (e, case 2)
– dangling pointers (a still points to b after b was disposed)
Solutions:
– system-managed list of free blocks (free list)
– vector of blocks with fixed size (bitmap, with 0 = free, 1 = used)
– automated detection and reclamation of unused blocks (garbage collection)

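76 Memory Management: Theory: 50% rule
Assumption: stable state with M free blocks and N allocated blocks.
Allocated blocks are classified by their neighbours: A = both neighbours free, B = one neighbour free, C = no neighbour free.
50% rule: M = N/2
$$N = A + B + C, \qquad M = \tfrac{1}{2}\,(2A + B + \varepsilon), \quad \varepsilon \in \{0, 1, 2\}$$
Block disposal: $\Delta M = \frac{C - A}{N}$
Block allocation, with p = splitting likelihood: M decreases by $1 - p$.
Stable state:
$$\frac{C - A}{N} = 1 - p \;\Rightarrow\; C - A - N + pN = 0$$
$$2M = 2A + B + \varepsilon = 2A + (N - A - C) + \varepsilon = N + A - C + \varepsilon = pN + \varepsilon$$
Hence $M \approx pN/2$, i.e. $M \approx N/2$ for $p \approx 1$.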

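77 Memory Management: Theory: Memory Fragmentation
By the 50% rule there are B/2 free blocks. With H = heap size, B = number of allocated blocks, b = average allocated-block size, F = β·b = average free-block size and α = b·B/H = fraction of the heap in use:
$$\frac{B}{2}\,F = H - bB \;\Rightarrow\; \frac{\beta}{2}\, bB = H - bB \;\Rightarrow\; \frac{H}{bB} = 1 + \frac{\beta}{2} \;\Rightarrow\; \beta = \frac{2}{\alpha} - 2$$
(figure: β plotted against α, with the critical point marked)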

78 Memory Management: Free-list management with a Bitmap
Idea:
– partition the heap into blocks of size s
– use a bitmap to track allocated blocks: bitmap[i] = true ⇔ block i is allocated
Problems:
– internal fragmentation: block sizes are rounded up to the next multiple of s
– map size: (heap_size / s) bits; a larger s shrinks the map but increases the loss due to internal fragmentation
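A sketch of a bitmap-managed heap using `java.util.BitSet`, with a first-fit search for consecutive free blocks. Addresses are byte offsets into a hypothetical heap; note that the bitmap alone does not record block boundaries, so `free` must be told the size.

```java
import java.util.BitSet;

final class BitmapHeap {
    final int blockSize;   // s
    final int blocks;      // heap_size / s
    final BitSet used;     // bit i set <=> block i allocated

    BitmapHeap(int heapSize, int blockSize) {
        this.blockSize = blockSize;
        this.blocks = heapSize / blockSize;
        this.used = new BitSet(blocks);
    }

    // First fit over n consecutive free blocks; returns a byte offset or -1.
    int allocate(int bytes) {
        int n = (bytes + blockSize - 1) / blockSize;   // round up: internal fragmentation
        int start = used.nextClearBit(0);
        while (start + n <= blocks) {
            int nextUsed = used.nextSetBit(start);
            if (nextUsed == -1 || nextUsed >= start + n) {
                used.set(start, start + n);
                return start * blockSize;
            }
            start = used.nextClearBit(nextUsed);       // skip past the used run
        }
        return -1;
    }

    void free(int address, int bytes) {
        int first = address / blockSize;
        int n = (bytes + blockSize - 1) / blockSize;
        used.clear(first, first + n);
    }
}
```

With s = 32, a 50-byte request occupies two blocks (64 bytes), so 14 bytes are lost to internal fragmentation; adjacent free runs need no explicit merging, since clearing bits merges them implicitly.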

79 Memory Management: Free-list management with a list
List organization:
– sorted / non-sorted: merging of empty blocks is simpler with a sorted list
– one list / many lists (per size): search is simpler, merging is more difficult
– management data (size, next pointer) is stored in the free block itself
Operations:
– allocation
– disposal with merge: find free blocks next to the current block and merge them into a bigger free block

80 Memory Management: Memory allocation strategies
Block splitting: if a free block is bigger than the requested block, it is split.
– first-fit: use the first free block that is big enough
– best-fit: take the smallest fitting block; causes a lot of fragmentation
– worst-fit: take the biggest available block
– quick-fit: best-fit, but with multiple free lists (one per block size): fast allocation!
(figure: free and used blocks; rounding a request up to a supported block size causes internal fragmentation)
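A sketch of first-fit and best-fit over an address-sorted free list with block splitting; merging on disposal is left out. `FreeList` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

final class FreeList {
    static final class Block {
        int addr, size;
        Block(int addr, int size) { this.addr = addr; this.size = size; }
    }

    final List<Block> free = new ArrayList<>();

    // Adds a free block and keeps the list sorted by address.
    void addFree(int addr, int size) {
        free.add(new Block(addr, size));
        free.sort((x, y) -> Integer.compare(x.addr, y.addr));
    }

    // First-fit: the first block that is big enough.
    int firstFit(int size) {
        for (int i = 0; i < free.size(); i++) {
            if (free.get(i).size >= size) return take(i, size);
        }
        return -1;
    }

    // Best-fit: the smallest block that is big enough.
    int bestFit(int size) {
        int best = -1;
        for (int i = 0; i < free.size(); i++) {
            int s = free.get(i).size;
            if (s >= size && (best == -1 || s < free.get(best).size)) best = i;
        }
        return best == -1 ? -1 : take(best, size);
    }

    // Block splitting: allocate from the front, keep the remainder as a free block.
    private int take(int i, int size) {
        Block b = free.get(i);
        int addr = b.addr;
        if (b.size == size) free.remove(i);
        else { b.addr += size; b.size -= size; }
        return addr;
    }
}
```

With free blocks of 100, 40 and 60 bytes, a best-fit request for 30 bytes takes the 40-byte block and leaves a 10-byte remainder: this tendency to leave tiny, hard-to-use fragments is why best-fit fragments the heap.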

81 Memory Management: Buddy System (for fast block merging)
– Blocks have size $2^k$.
– A block of size $2^i$ has an address $j \cdot 2^i$ (the last i bits are 0).
– The blocks at addresses $x = j \cdot 2^i$ and $(j \oplus 1) \cdot 2^i$ ($\oplus$ = XOR) are buddies and can be merged into a block of size $2^{i+1}$:
$$\mathit{buddy}(x) = x \oplus 2^i$$
(figure: two buddies b1 = xxxx0 0000 and b2 = xxxx1 0000, differing only in bit i, merge from size $2^k$ into $2^{k+1}$; a block of size $2^k$ splits into two of size $2^{k-1}$)
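The buddy arithmetic in code, with addresses taken as offsets from the heap start (an assumption; the formula requires the heap to be aligned to the largest block size). Merging is only allowed when the buddy is free and has the same size, which this sketch does not track; `Buddy` is an illustrative name.

```java
final class Buddy {
    // Buddy of the block of size 2^order at addr: flip bit "order".
    static int buddyOf(int addr, int order) {
        return addr ^ (1 << order);
    }

    // The merged block of size 2^(order+1) starts at the lower buddy: clear bit "order".
    static int mergedAddress(int addr, int order) {
        return addr & ~(1 << order);
    }

    // Smallest order k with 2^k >= size (the size class of a request).
    static int orderFor(int size) {
        int k = 0;
        while ((1 << k) < size) k++;
        return k;
    }
}
```

For example, the 32-byte blocks at 64 and 96 are buddies (64 XOR 32 = 96) and merge into a 64-byte block at 64, whereas the blocks at 32 and 64 are adjacent but not buddies, so they cannot be merged.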

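82 Memory Management: Buddy System (for fast block merging)
Problem: only buddies can be merged; adjacent free blocks of the same size that are not buddies stay separate.
Merges can cascade: a merged block may in turn be merged with its own buddy.
(figure: examples of adjacent free blocks that are not buddies and that are buddies, with the resulting cascading merge)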

83 Memory Management: Buddy System (for fast block merging)
Allocation, e.g. allocate(8): quick fit using the per-size free lists, splitting a larger block when necessary.
(figure: free blocks of sizes 32, 8, 16, 8 before and after the allocation)

84 Memory Management: Example: Oberon / Aos
– block size = k*32
– free lists for k = 1..9, one list for blocks > 9*32
– allocate with quick fit; splitting may be required
– free-list management and block merging are done by the garbage collector
(figure: free lists for block sizes k*32 (32, 64, 96, ...) in the initial state and after ALLOCATE(50))

85 Memory Management: Garbage Collection
Two steps:
1. Free-block detection
   – type-aware: the collector is aware of the types traversed, i.e. knows which values are pointers
   – conservative: the collector doesn't know which values are pointers
2. Block disposal: return unused blocks to the free lists
GC characteristics:
– incremental: GC is performed in small steps to minimize program interruption
– moving / copying / compacting: blocks are moved around
– generational: blocks are grouped in generations, with different treatment or collection priority
Barriers:
– read: intercept and check every pointer read operation
– write: intercept and check every pointer write operation

86 Memory Management: Garbage Collection: Reference Counting
– Every object has a reference counter rc.
– rc = 0 ⇒ the object is garbage.
– Problems:
  – overhead
  – no support for circular structures (objects in an unreachable cycle keep rc >= 1)
– Useful for:
  – module hierarchies
  – DAG structures (e.g. LISP)
Write barrier for q := p (p, q pointers to objects):
    INC p.rc;
    DEC q.rc;
    IF q.rc = 0 THEN Collect q^ END;
    q := p
(figure: an unreachable circular structure in which every object keeps rc >= 1)
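A sketch of this write barrier on a single pointer field per object, with freed objects recorded by name. As on the slide, the increment happens before the decrement, so an assignment of a pointer to itself cannot free the object; `RcObject`, `assignChild` and `release` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class RcObject {
    final String name;
    int rc;
    RcObject child;                                   // one pointer field, for brevity
    static final List<String> freed = new ArrayList<>();

    RcObject(String name) { this.name = name; }

    // Write barrier for "holder.child := p".
    static void assignChild(RcObject holder, RcObject p) {
        if (p != null) p.rc++;                        // INC p.rc
        RcObject q = holder.child;
        holder.child = p;
        release(q);                                   // DEC q.rc; collect at 0
    }

    static void release(RcObject q) {
        if (q == null) return;
        if (--q.rc == 0) {
            freed.add(q.name);                        // Collect q^
            RcObject c = q.child;
            q.child = null;
            release(c);                               // its outgoing pointer dies too
        }
    }
}
```

Dropping the only pointer to a chain frees the whole chain, while a cycle such as c → d → c survives with rc = 1 after its last external pointer is removed.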

87 Memory Management: Garbage Collection: Mark & Sweep
Mark phase (garbage detection):
– compute the root set, consisting of
  – global pointers (statics) in each module
  – local pointers on the stack in each PAF
  – temporary pointers in the CPU's registers
– traverse the graph of live objects starting from the root set with a depth-first strategy; mark all reached objects
Sweep phase (garbage collection):
– linear heap traversal; non-marked blocks are inserted into the free lists
– optimization: lazy sweeping (sweep during allocation; allocation gets slower)
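A sketch of mark & sweep over a toy heap of Java objects, with an explicit stack for the depth-first mark and a linear sweep that returns the garbage instead of building real free lists. All names are illustrative.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class MarkSweep {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();   // outgoing pointers
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    // Mark phase: depth-first traversal from the root set.
    static void mark(List<Obj> roots) {
        Deque<Obj> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Obj o = stack.pop();
            if (o.marked) continue;
            o.marked = true;
            for (Obj r : o.refs) if (!r.marked) stack.push(r);
        }
    }

    // Sweep phase: linear pass; unmarked objects are garbage, marks are reset.
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> garbage = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) o.marked = false;
            else garbage.add(o);
        }
        heap.removeAll(garbage);
        return garbage;
    }
}
```

Unlike reference counting, an unreachable cycle is reclaimed, because reachability from the roots, not the number of incoming pointers, decides liveness.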

88 Memory Management: Garbage Collection: root set
– Run-time support from the object system: hidden data structures with (compiler-generated) information about pointers (metadata).
– Conservative approach: guess which values could be pointers and treat them as such.
(figure: a module descriptor lists the offsets of the global pointers in the module data; an object instance's type tag points to its type descriptor, which lists the offsets of the instance pointers)

89 Memory Management: Garbage Collection: Mark with Pointer Rotation/1
Problem: garbage collection is called when free memory is low, but marking may require a lot of memory (for the traversal stack).
Solution: pointer rotation algorithm (Deutsch, Schorr, Waite)
+ memory efficient
+ iterative
– structures are temporarily inconsistent
– non-concurrent
– non-incremental

90 Memory Management: Garbage Collection: Mark with Pointer Rotation/2
Simple case: list traversal
(figure: pointers q and p advance along the list; p.link is rotated to point back to the previous element)

91 Memory Management: Garbage Collection: Mark with Pointer Rotation/3
Generic case: structure traversal
(figure: pointers q and p during the traversal of a general structure)

92 Memory Management: Garbage Collection: Memory Compaction
– a nextavail pointer partitions the heap into allocated and free space
– allocate: increment nextavail
– the garbage collector performs memory compaction
(figure: ALLOC advances nextavail; GC compacts the live blocks and moves nextavail back; example: MS .NET)

93 Memory Management: Garbage Collection: Stop & Copy
Partition the heap into from and to regions.
Collection:
– traverse the objects in from, copy them to to
– leave a forwarding pointer behind
– requires a read barrier
– swap from and to
Characteristics:
– copying
– incremental
– (generational)
Read barrier (code instrumentation on every access to p):
    IF p is moved THEN replace p with forwarding pointer END;
    access p
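A sketch of a copying collector in Cheney's breadth-first style. It is the stop-the-world form, so it needs no read barrier; the incremental variant on the slide would add one. `CopyingGc` and `Cell` are illustrative names, and Java objects stand in for heap cells.

```java
import java.util.List;

final class CopyingGc {
    static final class Cell {
        final String name;
        final Cell[] fields;
        Cell forward;                        // forwarding pointer (null if not moved)
        Cell(String name, int nFields) { this.name = name; this.fields = new Cell[nFields]; }
    }

    // Copies a cell to to-space once; later visits follow the forwarding pointer.
    static Cell evacuate(Cell c, List<Cell> toSpace) {
        if (c == null) return null;
        if (c.forward != null) return c.forward;
        Cell copy = new Cell(c.name, c.fields.length);
        System.arraycopy(c.fields, 0, copy.fields, 0, c.fields.length);
        c.forward = copy;                    // leave the forwarding pointer behind
        toSpace.add(copy);
        return copy;
    }

    // Evacuates the roots, then scans to-space, fixing every field of every copy.
    static Cell[] collect(Cell[] roots, List<Cell> toSpace) {
        Cell[] newRoots = new Cell[roots.length];
        for (int i = 0; i < roots.length; i++) newRoots[i] = evacuate(roots[i], toSpace);
        for (int scan = 0; scan < toSpace.size(); scan++) {   // scan chases the end of to-space
            Cell c = toSpace.get(scan);
            for (int f = 0; f < c.fields.length; f++) c.fields[f] = evacuate(c.fields[f], toSpace);
        }
        return newRoots;
    }
}
```

Unreachable cells are never touched, so the cost is proportional to the live data; the forwarding pointers also make cycles safe, since each cell is copied exactly once.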

94 Memory Management: Garbage Collection: Stop & Copy
(figure: four steps of a collection, with the live objects moving from the from region to the to region)

95 Memory Management: Garbage Collection: Concurrent GC
– stop-and-go approach: mutator and GC alternate; the mutator is stopped for a whole collection
– incremental approach: GC work is interleaved with the mutator (user process) in small steps, to meet real-time constraints

96 Memory Management, Garbage Collection: Tricolor Marking. Wave-front model (state: color): already traversed, behind the wave: black; being traversed, on the wave: grey; not reached yet, in front of the wave: white.

97 Memory Management, Garbage Collection: Tricolor Marking / Isolation. The mutator can change pointers at any time. Critical case: a black object B gets a pointer to a white object W that is otherwise unreachable; W is never traversed and would be collected although it is in use. Remedy: a write barrier that either colors B grey or colors W grey.
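The write barrier can be illustrated with a small incremental marker in Python. The Heap class and its method names are hypothetical; the barrier shown colors the stored target grey (the "color W grey" remedy). The usage reproduces the critical case: after A is black, the mutator moves the only path to C into A.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Heap:
    def __init__(self, edges):
        self.edges = {k: list(v) for k, v in edges.items()}  # object -> pointer slots
        self.color = {k: WHITE for k in self.edges}
        self.grey = []                                       # the wave front

    def shade(self, obj):
        if self.color[obj] == WHITE:
            self.color[obj] = GREY
            self.grey.append(obj)

    def start(self, roots):
        for r in roots:
            self.shade(r)

    def step(self):
        """Traverse one grey object; return False when marking is finished."""
        if not self.grey:
            return False
        obj = self.grey.pop()
        for child in self.edges[obj]:
            self.shade(child)
        self.color[obj] = BLACK
        return True

    def write(self, src, slot, target):
        """Mutator store src.slot := target, with the write barrier."""
        self.edges[src][slot] = target
        self.shade(target)                 # barrier: a white target becomes grey


h = Heap({"A": ["B", "B"], "B": ["C"], "C": [], "D": []})
h.start(["A"])
h.step()                    # A is black, B grey, C still white
h.write("A", 1, "C")        # black A now points to white C
h.write("B", 0, "A")        # the only other path to C disappears
while h.step():
    pass
print(h.color)   # {'A': 'black', 'B': 'black', 'C': 'black', 'D': 'white'}
```

Without the `shade` call in `write`, C would stay white and be collected although A still references it.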

98 Memory Management, Garbage Collection: Baker's Treadmill. Heap: doubly linked chain of objects, divided into to-space, from-space and free-space. [Figure: the treadmill ring with the cur and scan pointers.]

99 Memory Management, Garbage Collection: Baker's Treadmill. [Figure: to-space, from-space and free-space with the cur and scan pointers; conservative allocation vs. progressive allocation.]

100 Memory Management, Garbage Collection: Baker's Treadmill. [Figure: collect; a referenced object is moved from the from-space into the to-space, shown before and after with the cur and scan pointers.]

101 Memory Management, Garbage Collection: Baker's Treadmill. State transitions after GC is complete: from-space + free-space → free-space; to-space → from-space. Fragmentation: external: not removed; internal: depends on the supported block sizes. Allocation: conservative: new objects are black; progressive: new objects are white. [Figure: root set, objects x and y allocated with NEW(x) and NEW(y), cur and scan pointers.]

102 Memory Management, Generational Garbage Collection. Generations by expected object life: young: short life (temporary data); old: long life. Generations G0, G1, G2. [Figure: objects A to J distributed over G0, G1 and G2.] Pointers across different generations require special handling. GC frequency: G0 high, G1 medium, G2 low. Collect where garbage is most likely to be found.
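Special handling of cross-generation pointers is often done with a write barrier that records old objects pointing into the young generation (a remembered set). The sketch below, with two generations and hypothetical names, treats those records as extra roots for a minor collection that promotes all survivors; it assumes that old objects are simply considered live during a minor collection.

```python
class GenHeap:
    """Toy two-generation heap: generation 0 is young, generation 1 is old."""

    def __init__(self):
        self.gen = {}             # object -> 0 (young) or 1 (old)
        self.refs = {}            # object -> set of referenced objects
        self.remembered = set()   # old objects that point into generation 0

    def new(self, obj):
        self.gen[obj] = 0
        self.refs[obj] = set()

    def write(self, src, dst):
        """Store the reference src -> dst; the write barrier records old -> young."""
        self.refs[src].add(dst)
        if self.gen[src] == 1 and self.gen[dst] == 0:
            self.remembered.add(src)

    def minor_collect(self, roots):
        """Collect generation 0 only; survivors are promoted to generation 1."""
        stack = [r for r in roots if self.gen[r] == 0]
        for old in self.remembered:                    # remembered set as extra roots
            stack.extend(d for d in self.refs[old] if self.gen[d] == 0)
        live = set()
        while stack:
            obj = stack.pop()
            if obj not in live:
                live.add(obj)
                stack.extend(d for d in self.refs[obj] if self.gen[d] == 0)
        for obj in [o for o, g in self.gen.items() if g == 0]:
            if obj in live:
                self.gen[obj] = 1                      # promote survivor
            else:
                del self.gen[obj], self.refs[obj]      # young garbage
        self.remembered.clear()                        # generation 0 is now empty


h = GenHeap()
h.new("root")
h.minor_collect(["root"])     # root survives and is promoted
h.new("y1")
h.new("y2")
h.write("root", "y1")         # old -> young pointer, recorded by the barrier
h.minor_collect([])           # only the remembered set keeps y1 alive
print(h.gen)                  # {'root': 1, 'y1': 1}
```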

103 Memory Management, Garbage Collection: Finalization. Finalization (after-use cleanup): a user-defined routine is called when the object is collected. Establish consistency: save buffers, flush caches. Release resources: close connections, release file descriptors. Dangers: resurrection of objects (the finalizer adds the object to live structures); the finalization sequence is undefined.

104 Memory Management, Garbage Collection: .NET Finalization Example. Rules: objects with a finalizer belong to an older generation; the finalizer is called only once (unless re-registered with ReRegisterForFinalize). Finalization Queue: live objects with a finalizer. Freachable Queue: collected objects to be finalized. Finalization is executed by a separate thread (the GC thread in the figure) for security reasons. [Figure: objects A to E, some of them garbage, with the Finalization Queue and the Freachable Queue before and after a collection.]

105 Memory Management, Garbage Collection: Weak Pointers. Objects referenced only through weak pointers can be collected by the GC in case of need. Used for caches and buffers. Implementation: 1. weak pointers are not registered with the GC; 2. use a weak reference table (indirect access). [Figure: objects in use and garbage, a weak pointer, and weak references through a weak reference table.]

106 Memory Management, Garbage Collection: Weak Pointers Example. Oberon: internal file list – the system must keep track of open files to avoid buffer duplication – a file descriptor must be collected once the user has no more references to it – use a weak pointer in the system (otherwise the list would keep the file alive!)
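The same pattern can be sketched with Python's standard `weakref.WeakValueDictionary` playing the role of the system's weak file list. The File and FileSystem classes are hypothetical stand-ins for the Oberon file system: `open` returns the already open File while some user still holds it, and the entry disappears once the last user reference is gone.

```python
import gc
import weakref


class File:
    def __init__(self, name):
        self.name = name
        self.buffer = bytearray()   # per-file buffer that must not be duplicated


class FileSystem:
    def __init__(self):
        # Weak registry: an entry vanishes once no user holds the File any more.
        self._open = weakref.WeakValueDictionary()

    def open(self, name):
        f = self._open.get(name)
        if f is None:               # not open (or already collected): create it
            f = File(name)
            self._open[name] = f
        return f                    # same File object, so the buffer is shared

    def open_count(self):
        return len(self._open)


fs = FileSystem()
a = fs.open("log")
b = fs.open("log")
print(a is b, fs.open_count())      # True 1
del a, b
gc.collect()                        # CPython frees it at once; collect() for other runtimes
print(fs.open_count())              # 0
```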

107 Memory Management, Object Pools. The application keeps a pool of preallocated object instances and handles allocation and disposal itself. Uses: discrete-event simulation; buffers in a file system; dynamic allocation in real-time systems.
    VAR freeT: ObjectT;                        (* list of disposed instances, NIL if empty *)

    PROCEDURE NewT (VAR p: ObjectT);
    BEGIN
      IF freeT = NIL THEN NEW(p)               (* pool empty: allocate on the heap *)
      ELSE p := freeT; freeT := freeT.next     (* reuse a pooled instance *)
      END
    END NewT;

    PROCEDURE DisposeT (p: ObjectT);
    BEGIN
      p.next := freeT; freeT := p              (* return the instance to the pool *)
    END DisposeT;

108 Garbage Collection, Recap. GC kinds: compacting, copying, incremental, generational. Helpers: write barrier, read barrier, forwarding pointer, pointer rotation. Algorithms: reference counting, Mark & Sweep, Stop & Copy, Mark & Copy (.NET), Baker's Treadmill, Dijkstra / Lamport, Steele.

109 Distributed Object Systems: Overview. Goals: – object-based approach – hide communication details. Advantages: – more space – more CPU – redundancy – locality. Problems: coherency – ensure that the same object definition is used; interoperability – serialization – type consistency – type mapping; object lifetime – distributed garbage collection.

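110 Distributed Object Systems: Architecture. [Figure: client and server, each with an object broker; on the client, the application calls a proxy (stub); on the server, a skeleton calls the implementation; the call context travels as a message between the brokers; an IDL compiler generates stub and skeleton from the IDL; a naming service locates objects.]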

111 Remote Procedure Invocation: Overview. Problem: – send structured information from A to B – A and B may have different memory layouts – endianness: how is 0x1234 (2 bytes) represented in memory? Big-endian: MSB before LSB, address 0 holds 12 and address 1 holds 34 (IBM, Motorola, SPARC; network byte ordering). Little-endian: LSB before MSB ("little end first"), address 0 holds 34 and address 1 holds 12 (VAX, Intel).
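Python's `struct` module shows both layouts of 0x1234 directly; the format prefixes `>` and `<` select big- and little-endian, and `!` selects network byte order.

```python
import struct

value = 0x1234
big = struct.pack(">H", value)      # big-endian: MSB first
little = struct.pack("<H", value)   # little-endian: LSB first, as on Intel
print(big.hex(), little.hex())      # 1234 3412
print(struct.unpack("!H", big)[0] == value)   # True: network order is big-endian
```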

112 Definitions. Serialization: conversion of an object instance into a byte stream. Deserialization: conversion of a stream of bytes into an object instance. Marshaling: gathering all relevant data, e.g. of a remote method call, and converting it into an appropriate format (this may require serialization); includes details like the representation of names.

113 Remote Procedure Invocation: Protocol Overview. Protocols: – RPC + XDR (Sun): RFC 1014, June 1987; RFC 1057, June 1988 – IIOP / CORBA (OMG): V2.0, February 1997; V3.0, August 2002 – SOAP / XML (W3C): V1.1, May 2000 – ... XDR type system (big-endian representation): – [unsigned] integer (32-bit) – [unsigned] hyper integer (64-bit) – enumeration (signed int) – boolean (enum) – float / double (IEEE 32/64-bit) – opaque – string – array (fixed and variable size) – structure – union – void
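A few XDR encodings can be sketched with Python's `struct` module, following RFC 1014: every item occupies a multiple of four bytes in big-endian order, and strings are sent as a length followed by the bytes, zero-padded to a four-byte boundary. The function names are hypothetical and only a subset of the type system is covered.

```python
import struct


def xdr_int(n):          # 32-bit signed integer, big-endian
    return struct.pack(">i", n)


def xdr_uint(n):         # 32-bit unsigned integer
    return struct.pack(">I", n)


def xdr_hyper(n):        # 64-bit signed hyper integer
    return struct.pack(">q", n)


def xdr_bool(b):         # boolean is an enum: FALSE = 0, TRUE = 1
    return xdr_int(1 if b else 0)


def xdr_string(s):
    """Length as unsigned int, then the bytes, zero-padded to a multiple of 4."""
    data = s.encode("ascii")
    pad = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * pad


print(xdr_string("hello").hex())   # 0000000568656c6c6f000000
```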

114 Remote Procedure Invocation: RPC Protocol. Remote procedure call: marshalling of procedure parameters, message format, authentication, naming. Client, PROCEDURE P(a, b, c): pack parameters; send message to server; await response; unpack response. Server: unpack parameters; find procedure; invoke P(a, b, c); pack response; send response.
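The client and server steps can be sketched in Python as an in-process simulation. JSON stands in for a real marshalling format such as XDR, the transport is a plain function call, and authentication and naming are left out; the procedure table and function names are hypothetical.

```python
import json

# Server side: table of callable procedures (hypothetical example procedure "P").
PROCEDURES = {"P": lambda a, b, c: a + b * c}


def server_handle(message: bytes) -> bytes:
    request = json.loads(message)                    # unpack parameters
    proc = PROCEDURES[request["proc"]]               # find procedure
    result = proc(*request["args"])                  # invoke
    return json.dumps({"result": result}).encode()   # pack response


def client_stub(send, proc, *args):
    message = json.dumps({"proc": proc, "args": list(args)}).encode()  # pack parameters
    reply = send(message)                            # send message, await response
    return json.loads(reply)["result"]               # unpack response


# In-process "network": the transport simply hands the bytes to the server.
print(client_stub(server_handle, "P", 1, 2, 3))      # 7
```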

115 Distributed Object Systems: Details. References vs. values: – the client receives a reference to a remote object – data values are copied to the client for efficiency reasons – decide whether an object is sent as a reference or as a value: by value: serializable (Java, .NET), valuetype (CORBA); by reference: MarshalByRefObject (.NET), java.rmi.Remote (Java), default (CORBA). Object creation: – the server creates objects – the client creates objects – the server can return references. Object instances: – one object for all requests – one object for each request – one object per proxy. Conversation state: – stateless – stateful.

116 Distributed Object Systems: Distributed Object Systems vs. Service Architecture. Distributed object system: – object-oriented model – object references – stateful / stateless – tight coupling – internal communication between application tiers. Service architecture: – OO model / RPC – service references – stateless – loose coupling – external communication between applications.

117 Distributed Object Systems: Distributed Object Systems vs. Service Architecture. [Figure: systems placed by environment (homogeneous, heterogeneous) and coupling (tight, loose): CORBA, .NET Remoting and RMI as distributed object systems (components / objects, stateful and stateless conversation, transactions); Web Services as services (remote procedure calls, stateless conversation (session?), messages).]

118 Distributed Object Systems: Type Mapping. [Figure: of the possible types of type system 1 and type system 2, only the mappable types can be expressed in the interoperability type system; their intersection forms the interop subset.]

119 Distributed Object Systems: Type Mapping, Example. [Figure: overlapping Java, CORBA and CLS type systems, with examples wchar, double, char, enum and union; some types require a custom implementation.]

120 Distributed Object Systems: Examples. Standards: – OMG CORBA IIOP – Web Services SOAP. Frameworks: – Java RMI (Sun) – DCOM (Microsoft) – .NET Remoting (Microsoft) – IIOP.NET

121 Distributed Object Systems: CORBA (Common Object Request Broker Architecture). [Figure: on the client, the application calls the object through a client stub; on the server, the object is reached through an object skeleton and the object adaptor; both sides run on the CORBA runtime (ORB) with an interface repository and an implementation repository, and communicate over the object bus using GIOP/IIOP on TCP/IP sockets.]

122 Distributed Object Systems: CORBA. – CORBA is a standard from the OMG (Object Management Group); CORBA stands for Common Object Request Broker Architecture. – CORBA is useful for building distributed object systems, heterogeneous environments, tight integration. – CORBA defines an object-oriented type system; an interface definition language (IDL); an object request broker (ORB); an inter-ORB protocol (IIOP) to serialize data and marshal method invocations; language mappings for Java, C++, Ada, COBOL, Smalltalk, Lisp, Python, ...; and many additional standards and interfaces for distributed security, transactions, ...

123 Distributed Object Systems: CORBA. Basic types: – integers: 16-, 32- and 64-bit (signed and unsigned) – IEEE floating point: 32-bit, 64-bit and extended-precision numbers – fixed point – char, string: 8-bit and wide – boolean – opaque (8-bit), any – enumerations. Compound types: – struct – union – sequence (variable-length array) – array (fixed-length) – interface: concrete (pass-by-reference), abstract (pure definition) – value type: pass-by-value, abstract (no state). Operations: in / out / inout parameters, raises. Attributes.

124 Distributed Object Systems: CORBA / General Inter-ORB Protocol (GIOP). CDR (Common Data Representation): – variable byte ordering – aligned primitive types – all CORBA types supported. IIOP (Internet Inter-ORB Protocol): – GIOP over TCP/IP – defines the Interoperable Object Reference (IOR): host, port, key. Message format: – defined in IDL – messages: Request, Reply; CancelRequest, CancelReply; LocateRequest, LocateReply; CloseConnection; MessageError; Fragment – byte ordering flag – connection management: request multiplexing; asymmetrical / bidirectional connections.

125 Distributed Object Systems: CORBA / GIOP Message in IDL.
    module GIOP {
      struct Version {
        octet major;
        octet minor;
      };
      enum MsgType_1_0 {
        Request, Reply, CancelRequest, CancelReply,
        LocateRequest, LocateReply, CloseConnection, MessageError
      };
      struct MessageHeader {
        char          magic[4];       // "GIOP"
        Version       GIOP_version;
        boolean       byte_order;     // FALSE = big-endian, TRUE = little-endian
        octet         message_type;   // a MsgType_1_0 value
        unsigned long message_size;   // number of bytes following the 12-byte header
      };
    }; // module end GIOP
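A sketch of building and parsing this 12-byte header with Python's `struct` module, assuming GIOP 1.0 (in later versions byte_order is replaced by a flags octet). The helper names are hypothetical, and the message body with its CDR alignment rules is not covered.

```python
import struct

MSG_TYPES = ["Request", "Reply", "CancelRequest", "CancelReply",
             "LocateRequest", "LocateReply", "CloseConnection", "MessageError"]


def giop_header(msg_type, body_size, little_endian=False):
    """Build a 12-byte GIOP 1.0 message header."""
    order = "<" if little_endian else ">"
    return (b"GIOP"                                   # char magic[4]
            + bytes([1, 0])                           # Version: major 1, minor 0
            + bytes([1 if little_endian else 0])      # boolean byte_order
            + bytes([MSG_TYPES.index(msg_type)])      # octet message_type
            + struct.pack(order + "I", body_size))    # unsigned long message_size


def parse_header(header):
    """Return (message type, body size), honouring the sender's byte order."""
    if header[:4] != b"GIOP":
        raise ValueError("not a GIOP message")
    order = "<" if header[6] else ">"                 # byte_order flag
    (size,) = struct.unpack(order + "I", header[8:12])
    return MSG_TYPES[header[7]], size


print(parse_header(giop_header("Request", 10, little_endian=True)))   # ('Request', 10)
```

The receiver reads the byte_order flag and converts only if its own order differs, so neither side has to convert to a fixed network order.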

126 Distributed Object Systems: CORBA Services. CORBA services: – system-level services defined in IDL – provide functionality required by most applications. Naming Service: – allows local or remote objects to be located by name – given a name, returns an object reference – hierarchical, directory-like naming tree – allows getting the initial reference of an object. Event Service: – allows objects to dynamically register interest in an event – the object is notified when the event occurs – push and pull models. ... and more: Trader, LifeCycle, Persistence, Transaction, Security.

127 Distributed Object Systems: Web Services. Service-oriented architecture. Relies on existing protocols: – SOAP: messaging protocol – WSDL: service description protocol – UDDI: service location protocol. Protocol stack: SOAP over HTTP over TCP/IP.

128 Distributed Object Systems: SOAP (Simple Object Access Protocol). Communication protocol, XML-based, describes object values. XML Schema as interface description language: – basic types: string, boolean, decimal, float, double, duration, dateTime, time, date, hexBinary, base64Binary, anyURI, QName, NOTATION – structured types: list, union. SOAP message: – SOAP envelope – SOAP header – SOAP body. Method call: – packed as a structure – messages are self-contained – no external object references.

129 Distributed Object Systems: SOAP Message. SOAP message: – SOAP envelope: SOAP header, SOAP body. Example: float Multiply(float a, float b);

130 Distributed Object Systems: SOAP Example (Request).
    POST /quickstart/aspplus/samples/services/MathService/CS/MathService.asmx HTTP/1.1
    Host: samples.gotdotnet.com
    Content-Type: text/xml; charset=utf-8
    Content-Length: length
    SOAPAction: "http://tempuri.org/Multiply"
[SOAP envelope containing the Multiply call with its two float arguments.]

131 Distributed Object Systems: SOAP Example (Answer).
    HTTP/1.1 200 OK
    Content-Type: text/xml; charset=utf-8
    Content-Length: length
[SOAP envelope containing the float result.]

132 Distributed Object Systems: SOAP Example (Service Description, 1)

133 Distributed Object Systems: SOAP Example (Service Description, 2)

134 Distributed Object Systems: Web Services. Comments: – XML (easily readable) – system independent – standard – stateless (encouraged design pattern) – bloated – big messages (but easily compressed) – requires expensive parsing. Constraints: – services: no object references, server-activated servant – goes over HTTP: requires a web server.

135 Distributed Object Systems: Web Services Future. Use the SOAP header to store additional information about the message or the context. Many standards to come: – WS-Security – WS-Policy – WS-SecurityPolicy – WS-Trust – WS-SecureConversation – WS-Addressing

136 Distributed Object Systems: Java RMI (Java Remote Method Invocation). [Figure: the client application calls the object through a stub; client and server each have a remote references layer and a transport layer, connected over the network by TCP/IP sockets; the server registers objects and the client looks them up (Register / Lookup).]

137 Distributed Object Systems: Java RMI Details. Framework: – supports various implementations, e.g. RMI/IIOP – mapping limited to the Java type system, workarounds needed – uses reflection to inspect objects.

138 Distributed Object Systems: Low-Level Details: Java RMI/IIOP. Common type system: – restricted CORBA. Marshalling: – name mapping – remote objects: only references. Interface description language (IDL): – Java-to-IDL mapping. Message representation. Underlying protocol: – IIOP (CORBA).

139 Distributed Object Systems: Microsoft DCOM (Distributed Component Object Model). [Figure: client application, object proxy, RPC channel over the network, object stub and object on the server; COM runtime, SCMs and registration, registry, OXID resolver and ping server.]

140 Distributed Object Systems: Microsoft .NET Remoting. The client obtains a transparent proxy with new Instance() or Activator.GetObject(...); the proxy reaches the instance across the application domain boundary over a network channel. [Figure: ObjRef fields: IChannelInfo ChannelInfo; IEnvoyInfo EnvoyInfo; IRemotingTypeInfo TypeInfo; string URI;]

141 Distributed Object Systems: Microsoft .NET Remoting, Channel. Client: Instance s = new Instance(); s.DoSomething(); [Figure: the call passes from the proxy through message channel sinks (custom operations), the formatter (serializes the object), stream channel sinks (custom operations) and the transport sink (handles communication) over the network, then through the corresponding sinks on the server to the dispatcher and the instance.]

142 Distributed Object Systems: Microsoft .NET Remoting. Activation: – client: one instance per activation – server / Singleton: one instance of the object – server / SingleCall: one instance per call. Leases (object lifetimes): – renew lease on call – set maximal object lifetime. Serialization: – SOAP (warning: non-standard types, only for .NET use) – binary – user-defined. Transport: – TCP – HTTP – user-defined.

143 Distributed Object Systems: Microsoft .NET Remoting (Object Marshalling). MarshalByRefObject: remoted by reference; the client receives an ObjRef object, which is a pointer to the original object. [Serializable]: all fields of the instance are cloned to the client; [NonSerialized] fields are ignored. ISerializable: the object has a method to define its own serialization. [Figure: between AppDomain 1 and AppDomain 2, a serialized ObjRef yields a proxy to Obj; a serialized object (fld 1 ... fld n) yields a copy.]

144 Distributed Object Systems: Microsoft .NET Remoting, Activation. Server-side activation (well-known objects): – Singleton objects: only one instance is allocated to process all requests – SingleCall objects: one instance is allocated per call (stateless). Client-side activation: – client-activated objects: the client allocates and controls the object on the server (stateful).

145 Distributed Object Systems: Microsoft .NET Remoting, Limitations. – Server-activated objects: object configuration limited to the default constructor. – Client-activated objects: the class must be instantiated, no access through an interface; class hierarchy limitations. Use the Factory pattern – to get an interface reference – to allow parametrization of the constructor. – Furthermore: interface information is lost when passing an object reference to another machine; no control over the channel (which channel is used, which peer is allowed to connect).

146 Distributed Object Systems: Case Study: IIOP.NET. Open-source project based on an ETH diploma thesis: http://iiop-net.sourceforge.net/. IIOP.NET (marketing): – provide seamless interoperability between .NET and CORBA-based peers (including J2EE). IIOP.NET (technical): – .NET remoting channel implementing the CORBA IIOP protocol – compiler to make .NET stubs from IDL definitions – IDL definition generator from .NET metadata.

147 Distributed Object Systems: Case Study: IIOP.NET. IIOP rather than SOAP: – transparent reuse of existing servers – tight coupling – object-level granularity – efficiency. Runtime: standard .NET remoting channel for IIOP: – transport sink – formatter – type mapper. Build tools: – IDL → CLS compiler – CLS → IDL generator. [Figure: .NET server and client, J2EE server, Java client and CORBA objects communicating over IIOP (binary remoting between .NET peers); the interop subset lies where the IDL-mappable types of the Java and CLS type systems overlap.]

148 Distributed Object Systems: Case Study: IIOP.NET, Interoperability. Layers, from bottom to top: – communication protocols: TCP/UDP, byte stream, point-to-point communication – data model: type system, mapping and conversion issues – message format: RPC, IIOP, HTTP, SOAP, proprietary binary format; messages, unknown data (exceptions), encryption – contextual data: SessionID, TransactionID, cultureID, logical threadID, ... – interception layer – conversation: activation model (EJB, MBR), global naming, distributed garbage collection, conversational state, ... – services: Distributed Transaction Coordinator, Active Directory, ... – application: this is what we want.

149 Distributed Object Systems: Case Study: IIOP.NET, Granularity. Service: message-based interface, stateless. Component: strongly-typed interface, stateless or stateful. Object: implementation dependency, stateful. [Figure: a system composed of services, components and objects, ordered by granularity and by coupling / interaction.]

150 Distributed Object Systems: Case Study: IIOP.NET. [Figure: release history 1.0 to 1.6, marking the 1st and the 2nd article.]

151 Distributed Object Systems: Case Study: IIOP.NET, Performance. Test case: – WebSphere 5.0.1 as server – clients: IBM SOAP-RPC Web Services, IBM Java RMI/IIOP, IIOP.NET – response time for receiving 100 beans from the server: WS 4.0 seconds, IIOP.NET 0.5 seconds – when sending many more beans, WS are then 200% slower than IIOP.NET. Source: posted on the IIOP.NET forum.

152 Processes and Threads: Introduction. CPU as a resource: provide an abstraction of it. Allow multiprogramming: – pseudo-parallelism (single processor) – real parallelism (multiprocessor). Required abstractions: – multiple activities (execution of instructions) – protection of resources – synchronization of activities. Topics: – coroutines – processes – threads – scheduling: fairness, starvation – synchronization: deadlocks.

153 Processes and Threads: Multithreading. [Figure: thread 1 executes a.run, which calls b.Q, which in turn calls b.q and then c.R; thread 2 executes c.run, which calls d.Q, which in turn calls d.q and then e.R. Each thread has its own stack (stack 1, stack 2), and execution alternates between the two threads over time.]

154 Processes and Threads: Coroutines (1). Coroutines: – each activity has its own stack; the address space is shared – explicit context switch (stack only) under the programmer's control – uses a Transfer call to switch to another coroutine.

155 Processes and Threads: Coroutines (2). [Figure: subroutines use Call and Return; coroutines are started with Start and switch with Transfer, and each of them may still Call and Return subroutines.]

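156 Processes and Threads: Coroutines (3).
    TYPE Coroutine = POINTER TO RECORD
      FP: LONGINT;                          (* saved frame pointer *)
      stack: POINTER TO ARRAY OF SYSTEM.BYTE
    END;

    VAR cur: Coroutine;                     (* current coroutine *)

    PROCEDURE Transfer*(to: Coroutine);
    BEGIN                                   (* compiler-generated prologue includes PUSH EBP and SUB ESP, 4 *)
      SYSTEM.GETREG(SYSTEM.EBP, cur.FP);    (* save FP of the current coroutine *)
      cur := to;
      SYSTEM.PUTREG(SYSTEM.EBP, cur.FP)     (* restore FP of the target coroutine *)
    END Transfer;                           (* epilogue: MOV ESP, EBP; POP EBP; RET 4 *)
The epilogue rebuilds the stack pointer from the restored frame pointer, so RET returns into the target coroutine at the point where it last called Transfer.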

157 Processes and Threads: Coroutines (4). [Figure: stacks of coroutines P and Q, each with a saved FP, PC and locals; P calls Transfer(Q); FP := Q.FP switches to Q's stack, and the return jumps to the PC saved on Q's stack.]

158 Processes and Threads: Coroutines (5). Current stack: current execution state. All other stacks: the top PAF (procedure activation frame) contains the last Transfer call. Start: create a stack with a fake Transfer-like PAF.
    PROCEDURE Start(C: Coroutine; size: LONGINT);
      VAR tos: LONGINT;                     (* address just above the top of the new stack *)
    BEGIN
      NEW(C.stack, size);
      tos := SYSTEM.ADR(C.stack[0]) + LEN(C.stack);
      SYSTEM.PUT(tos-4, 0);                 (* par = null *)
      SYSTEM.PUT(tos-8, 0);                 (* PC = null, not allowed to return *)
      SYSTEM.PUT(tos-12, 0);                (* FP *)
      C.FP := tos-12                        (* the new coroutine C, not cur, gets the fake frame *)
    END Start;
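Python generators can approximate the explicit Transfer of coroutines: each generator keeps its own suspended frame, and a small trampoline resumes whichever coroutine the current one names. This is a sketch under that substitution; unlike the Oberon version, a generator cannot transfer from inside a nested procedure call, and here a returning coroutine simply ends the run. All names are hypothetical.

```python
def run(coroutines, first):
    """Trampoline: each coroutine yields the name of the coroutine to transfer to."""
    cur = first
    while cur is not None:
        try:
            cur = next(coroutines[cur])   # resume the target until it transfers again
        except StopIteration:
            cur = None                    # a coroutine body returned: stop


log = []


def ping():
    for i in range(3):
        log.append(f"ping {i}")
        yield "pong"                      # like Transfer(pong)


def pong():
    for i in range(3):
        log.append(f"pong {i}")
        yield "ping"                      # like Transfer(ping)


run({"ping": ping(), "pong": pong()}, "ping")
print(log)   # ['ping 0', 'pong 0', 'ping 1', 'pong 1', 'ping 2', 'pong 2']
```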

159 Processes and Threads: Problems Caused by Multitasking. Concurrent access to resources: – protection: limit access to a resource – synchronization: synchronize a task with a resource state or with another task. Concurrent access to the CPU: – task priorities – scheduling. One problem's solution is another problem's cause: – deadlocks – fairness – deadlines / periodicity constraints.

160 Processes and Threads: Protection: Mutual Exclusion. Mutual exclusion: only one activity is allowed to access a resource at a time. – disable interrupts (single CPU only; avoids task switches) – locks: flag: lock taken / lock free; spin lock (uses busy waiting); exclusive lock; read-write lock (multiple readers, one writer).

161 Processes and Threads: Protection: Monitor. Shared resources as monitors: – resources are passive objects – execution of critical sections inside the monitor is mutually exclusive – global monitor lock – shared monitor lock for read access (optional). Variants: – monitor as a special module (original version: Hoare, Brinch Hansen) – object instance as monitor, with method and code-block granularity (Java, C#, Active Oberon, ...). [Figure: tasks P and Q each acquire and release the resource.]

162 Processes and Threads: Protection. Simplistic implementation with coroutines; non-reentrant lock (no recursion allowed). One waiting queue per resource is required.
    PROCEDURE Acquire(r: Resource);
    BEGIN
      IF r.taken THEN
        InsertList(r.waiting, cur);          (* block: wait in the resource's queue *)
        SwitchToNextRoutine()
      ELSE
        r.taken := TRUE
      END
    END Acquire;

    PROCEDURE Release(r: Resource);
      VAR next: Coroutine;
    BEGIN
      next := GetFromList(r.waiting);
      IF next # NIL THEN                     (* hand the lock over: r.taken stays TRUE *)
        InsertList(ready, next);
        Transfer(GetNextTask())
      ELSE
        r.taken := FALSE
      END
    END Release;

163 Processes and Threads: Protection. Shared resource as a process: synchronization during communication. Communicating Sequential Processes (CSP), C.A.R. Hoare (1978). Model of communication: rendezvous between two processes: P!x (send x to process P), Q?y (receive y from process Q). Used in Ada, Occam. [Figure: task P executes Q!z and task Q executes P?x; whichever arrives first waits for its partner.]
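A rendezvous in the CSP sense can be sketched in Python as an unbuffered channel built on `threading.Condition`: `send` (P!x) does not return until a receiver has taken the value, and `receive` (Q?y) blocks until a value is offered. The Channel class is hypothetical, and unlike CSP it names a channel rather than a partner process.

```python
import threading


class Channel:
    """Unbuffered (rendezvous) channel."""

    def __init__(self):
        self._cond = threading.Condition()
        self._value = None
        self._full = False       # a value is on offer
        self._taken = False      # the receiver has taken the offered value

    def send(self, x):           # like P!x
        with self._cond:
            while self._full:    # another sender is in the middle of a rendezvous
                self._cond.wait()
            self._value, self._full, self._taken = x, True, False
            self._cond.notify_all()
            while not self._taken:   # wait for the partner
                self._cond.wait()
            self._full = False
            self._cond.notify_all()

    def receive(self):           # like Q?y
        with self._cond:
            while not self._full or self._taken:
                self._cond.wait()
            x = self._value
            self._taken = True
            self._cond.notify_all()
            return x


ch = Channel()
received = []
consumer = threading.Thread(target=lambda: received.extend(ch.receive() for _ in range(3)))
consumer.start()
for i in range(3):
    ch.send(i)                   # each send completes only after the matching receive
consumer.join()
print(received)                  # [0, 1, 2]
```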

164 Processes and Threads: Protection. Some variations on the theme: – reentrant locks – readers / writers: one writer or multiple readers allowed – binary semaphores: one activity can get the resource – generic semaphores: N activities are allowed to get the resource.

165 Processes and Threads: Synchronization. Wait on a condition / state: – signals with Send/Wait methods: require cooperation from all processes; example: producer/consumer with the conditions nonempty/nonfull; semantics of Send: send-and-pass vs. send-and-continue – generic system-handled conditions (Active Oberon): AWAIT(x > y). Wait on a partner process: – CSP.
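The producer/consumer example with the conditions nonempty and nonfull can be sketched with Python's `threading.Condition`, both conditions sharing one monitor lock. Python's `notify` has send-and-continue semantics, so each waiter re-checks its condition in a loop; the class name and capacity are assumptions of the sketch.

```python
import threading
from collections import deque


class BoundedBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        lock = threading.Lock()                      # one monitor lock
        self.nonempty = threading.Condition(lock)    # signalled after put
        self.nonfull = threading.Condition(lock)     # signalled after get

    def put(self, x):
        with self.nonfull:
            while len(self.items) == self.capacity:  # Wait(nonfull)
                self.nonfull.wait()
            self.items.append(x)
            self.nonempty.notify()                   # Send(nonempty)

    def get(self):
        with self.nonempty:
            while not self.items:                    # Wait(nonempty)
                self.nonempty.wait()
            x = self.items.popleft()
            self.nonfull.notify()                    # Send(nonfull)
            return x


buf = BoundedBuffer(3)
producer = threading.Thread(target=lambda: [buf.put(i) for i in range(100)])
producer.start()
got = [buf.get() for _ in range(100)]
producer.join()
print(got[:5])   # [0, 1, 2, 3, 4]
```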

166 Processes and Threads: Synchronization: Implementation Example. Process list: – doubly linked list of all coroutines – cur points to the current (running) coroutine – each signal has a LIFO list. [Figure: coroutines C1 to C5 in a ring, each with a ready flag; a signal s links its waiting coroutines; cur marks the running one.]

167 Processes and Threads: Synchronization: Implementation Example.
    Schedule:
      prev := cur;
      WHILE ~cur.ready & (cur.next # prev) DO cur := cur.next END;
      IF cur.ready THEN Transfer(cur) ELSE (* deadlock *) END

    Terminate:
      cur.next.prev := cur.prev;
      cur.prev.next := cur.next;
      cur := cur.next;      (* search from the successor: the terminated coroutine has left the ring *)
      Schedule

168 Processes and Threads: Synchronization: Implementation Example.
    Send(s):
      IF s # NIL THEN                   (* send-and-pass *)
        cur := s; s.ready := TRUE; s := s.link
      END;
      Schedule                          (* to next ready from cur *)

    Wait(s):
      cur.link := s; s := cur; cur.ready := FALSE;
      Schedule                          (* to next ready from cur *)

    Init(s):
      s := NIL

169 Processes and Threads: Active Oberon: Bounded Buffer.
    Buffer* = OBJECT
      VAR
        data: ARRAY BufLen OF INTEGER;     (* BufLen: buffer size constant declared elsewhere *)
        in, out: LONGINT;

      (* Put - insert element into the buffer *)
      PROCEDURE Put* (i: INTEGER);
      BEGIN {EXCLUSIVE}
        AWAIT((in + 1) MOD BufLen # out);  (* AWAIT ~full *)
        data[in] := i;
        in := (in + 1) MOD BufLen
      END Put;

      (* Get - get element from the buffer *)
      PROCEDURE Get* (VAR i: INTEGER);
      BEGIN {EXCLUSIVE}
        AWAIT(in # out);                   (* AWAIT ~empty *)
        i := data[out];
        out := (out + 1) MOD BufLen
      END Get;

      PROCEDURE & Init;
      BEGIN
        in := 0; out := 0
      END Init;
    END Buffer;

170 Processes and Threads: CSP: Bounded Buffer (I). Example by Geoff Coulson, Lancaster University.
    [bounded_buffer || producer || consumer]
    producer :: *[ bounded_buffer ! item; ]
    consumer :: *[ bounded_buffer ? item; ]

171 Processes and Threads: CSP: Bounded Buffer (II). The two guarded alternatives are separated by [] (alternative command).
    bounded_buffer ::
      buffer: (0..9) item;
      in, out: integer; in := 0; out := 0;
      *[ in < out + 10; producer ? buffer(in mod 10) -> in := in + 1
      [] out < in; consumer ! buffer(out mod 10) -> out := out + 1
      ]

172 Processes and Threads: Process State. Process states: 1. Running: actually using the CPU. 2. Ready: waiting for a CPU. 3. Blocked: unable to run, waiting for an external event. Process state transitions: 1. Running → Blocked: wait for an external event. 2. Running → Ready: system scheduler. 3. Ready → Running: system scheduler. 4. Blocked → Ready: the external event happens.

173 Processes and Threads: Process State (Active Oberon). Active Oberon provides: – monitor-like object protection – conditions. Conditions are checked by the system; no explicit help or knowledge from the user is required (no x.Signal). [Figure: states Running, Ready, Awaiting Object, Awaiting Condition.]

174 Activities. Program (static concept) vs. process (dynamic). Processes, jobs, tasks, threads (differences later) consist of: – program code – context: program counter (PC) and registers, stack pointer, state ([new], running, waiting, ready, [terminated]) – stack – data section (heap).

175 Processes vs. Threads. Process or job (heavyweight): – code – address space – processor state – private data (stack + registers) – can have multiple threads. Thread (lightweight): – shared code – shared address space – processor state – private data (stack + registers). [Figure: CPU and kernel.]

176 Processes vs. Threads: Example. [Figure: two processes (PROC 1, PROC 2), each with its own instructions, heap and stack, vs. one process whose threads share instructions and heap but have separate stacks (STACK 1, STACK 2).]

177 Multitasking. Programmed events that can cause a task switch (synchronous): – protection (locks): acquire, release – synchronization: wait on a condition, send a signal (send-and-pass). System events that can cause a task switch: – voluntary switch (yield, task termination) – a process with higher priority becomes available – consumption of the allowed time quantum. The last two are asynchronous (task preemption).

178 Preemption. Assign each process a time quantum (normally in the order of tens of ms). Asynchronous task switches can happen at any time! – the task can be in the middle of a computation – save the whole CPU state (registers, flags, ...). Perform a switch: – on a resource conflict – on a synchronization request – on a timer interrupt (time quantum is over).

179 Context Switch. Scheduler invocation: – preemption: interrupt – cooperation: explicit call. Operations: – store the process state (PC, registers, ...) – choose the next process (strategy) – [accounting] – restore the state of the next process (registers, SP, PC, ...) – jump to the restored PC. A context switch is usually expensive: 1–1000 µs, depending on the system and the number of processes. – hardware optimizations (e.g., multiple register sets: SPARC, DECSYSTEM-20)

180 Scheduling Algorithms. Three categories of environments: batch systems (e.g., VPP, DOS): – usually non-preemptive (i.e., a task is not stopped by the scheduler; only synchronous switches). Interactive systems (UNIX, Windows, Mac OS): – cooperative or preemptive – no task is allowed to have the CPU forever. Real-time systems (PathWorks, RT Linux): – timing constraints (deadlines, periodicity).

181 Scheduling Performance. – CPU utilization – throughput: number of jobs per time unit; minimize the context-switch penalty – turnaround time = exit time - arrival time (execution, waiting, I/O) – response time = start time - request time – waiting time (I/O, waiting, ...) – fairness.

182 Scheduling Algorithm Goals
All systems:
– fairness: give every task a chance
– policy enforcement
– balance: keep all subsystems busy
Interactive systems:
– response time: respond quickly
– proportionality: meet users' expectations
Batch systems:
– throughput: maximize the number of jobs
– turnaround time: minimize the time in the system
– CPU utilization: keep the CPU busy
Real-time systems:
– meet deadlines: avoid losing data
– predictability: avoid degradation
– hard vs. soft real-time systems

183 Batch Scheduling Algorithms
Choose the task to run (a task is usually not preempted):
– First Come First Serve (FCFS): fair, may cause long waiting times
– Shortest Job First (SJF): requires knowledge about the job length
– Highest Response Ratio Next: response ratio = time in the system / CPU time; depends on the waiting time
– Highest Priority First: with or without preemption
– Mixed: the priority is adjusted dynamically (time in queue, length, priority, …)
ETH-VPP is a batch system! Which algorithm does it use?
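The response-ratio rule can be sketched in a few lines of C. The `struct job` record and `pick_hrrn` are hypothetical names for illustration, and the sketch assumes the CPU time of each job is known or estimated (the same requirement as SJF). Because the waiting time grows the ratio, a long job that has waited long enough eventually beats a newly arrived short one, which prevents starvation.

```c
#include <stddef.h>

/* Hypothetical job record for this sketch. */
struct job {
    const char *name;
    double waiting;   /* time spent waiting in the queue so far */
    double service;   /* (estimated) CPU time the job needs, > 0 */
};

/* Response ratio = time in the system / CPU time
                  = (waiting + service) / service, always >= 1. */
static double response_ratio(const struct job *j) {
    return (j->waiting + j->service) / j->service;
}

/* Returns the index of the job with the highest response ratio,
   or -1 if the queue is empty. Ties go to the earlier job. */
int pick_hrrn(const struct job *queue, size_t n) {
    int best = -1;
    double best_ratio = 0.0;
    for (size_t i = 0; i < n; i++) {
        double r = response_ratio(&queue[i]);
        if (best < 0 || r > best_ratio) {
            best = (int)i;
            best_ratio = r;
        }
    }
    return best;
}
```

For jobs A (waiting 0, service 2), B (waiting 10, service 10) and C (waiting 9, service 3) the ratios are 1, 2 and 4, so C is chosen.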

184 Preemptive Scheduling Algorithms
Time sharing:
– each task has a predefined time quantum
– Round-Robin: schedule the next task on the ready list (figure: circular ready list of P1–P4 with a "next" pointer)
– choice of the quantum: a small one may cause frequent switches, a big one may cause slow response
– implicit assumption: all tasks have the same importance

185 Preemptive Scheduling Algorithms
Priority scheduling: the process with the highest priority is scheduled first.
Variants:
– multilevel queue scheduling: one list per priority, round-robin within each list
– dynamic priorities: proportional to the time in the system, inversely proportional to the part of the quantum used
– time quantum proportional to the priority

186 Real-Time Scheduling Algorithms
Tasks need to meet their deadlines! The task cost is (or should be) known.
Two kinds of tasks:
– aperiodic
– periodic
Reservation: the scheduler decides whether the system has enough resources for the task.
Algorithms:
– Rate Monotonic Scheduling: assign static priorities (priority proportional to frequency)
– Earliest Deadline First: the task with the closest deadline is chosen
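A minimal C sketch of Earliest Deadline First together with a reservation (admission) test. The task record and function names are hypothetical. The admission test assumes periodic tasks whose deadline equals their period on a single CPU, for which EDF meets all deadlines exactly when the total utilization (sum of cost / period) is at most 1.

```c
#include <stddef.h>

/* Hypothetical real-time task record for this sketch. */
struct rt_task {
    int id;
    long deadline;   /* absolute deadline, e.g. in ticks */
    int ready;       /* nonzero if the task can run now */
};

/* Earliest Deadline First: among the ready tasks, return the index of
   the one with the closest deadline, or -1 if no task is ready. */
int pick_edf(const struct rt_task *tasks, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!tasks[i].ready)
            continue;
        if (best < 0 || tasks[i].deadline < tasks[best].deadline)
            best = (int)i;
    }
    return best;
}

/* Reservation test for periodic tasks (deadline = period, one CPU):
   admit the task set only if the total utilization is <= 1. */
int edf_admit(const long *cost, const long *period, size_t n) {
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += (double)cost[i] / (double)period[i];
    return u <= 1.0;
}
```

Rate Monotonic Scheduling would instead fix the priorities once (shorter period, higher priority), so its selection step is a plain priority lookup.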

187 Scheduling Algorithm Example
Situation:
– tasks P1, P2, P3, P4 arrive at time t = 0
– priority: P1 highest, P4 lowest
– time to process: 10, 2, 5, 3

188 Scheduling Algorithm Example: Highest Priority First
Timeline: P1 0–10, P2 10–12, P3 12–17, P4 17–20

189 Scheduling Algorithm Example: Shortest Job First
Timeline: P2 0–2, P4 2–5, P3 5–10, P1 10–20

190 Scheduling Algorithm Example: Timesharing with quantum = 2
(figure: round-robin timeline of P1–P4 with marks at t = 0, 2, 4, 6, 8, 10, 12, 13, 14, 16, 18, 20)

191 Scheduling Algorithm Example: Timesharing with quantum → 0
All ready tasks share the CPU equally: each runs at 1/4 speed until P2 completes at t = 8, at 1/3 until P4 completes at t = 11, at 1/2 until P3 completes at t = 15; P1 then runs alone until t = 20.

192 Scheduling Algorithm Example: Results
Situation:
– tasks P1, P2, P3, P4 arrive at time t = 0
– priority: P1 highest, P4 lowest
– time to process: 10, 2, 5, 3
Results (averages over the four tasks):
Algorithm                      | Turnaround | Response time
Highest Priority First         | 14.75      | 9.75
Shortest Job First             | 9.25       | 4.25
Timesharing with quantum = 2   | 12.75      | 3.0
Timesharing with quantum → 0   | 13.5       | 0
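The averages above can be recomputed with a small C simulation, a sketch under the assumptions of the example: all tasks arrive at t = 0 and no new task arrives, so turnaround equals completion time and response time equals start time. The function names are hypothetical. With no arrivals, a round-robin with a FIFO ready queue (the preempted task goes to the tail) is the same as cycling over the tasks in index order and skipping finished ones, which is what `run_round_robin` does.

```c
#include <stddef.h>

#define MAX_TASKS 16

/* Non-preemptive run in the given order (priority order for HPF,
   ascending length for SJF). Returns the sum of turnaround times;
   *resp_sum receives the sum of response (start) times. */
long run_in_order(const int *burst, const int *order, size_t n, long *resp_sum) {
    long t = 0, turn = 0;
    *resp_sum = 0;
    for (size_t k = 0; k < n; k++) {
        *resp_sum += t;          /* task order[k] starts now */
        t += burst[order[k]];
        turn += t;               /* ... and completes here */
    }
    return turn;
}

/* Round-robin with the given quantum. Preconditions: n <= MAX_TASKS
   and every burst > 0. Returns the sum of turnaround times. */
long run_round_robin(const int *burst, size_t n, int quantum) {
    int left[MAX_TASKS];
    size_t remaining = n;
    long t = 0, turn = 0;
    for (size_t i = 0; i < n; i++)
        left[i] = burst[i];
    while (remaining > 0) {
        for (size_t i = 0; i < n; i++) {   /* one pass = one round */
            if (left[i] == 0)
                continue;                  /* task already finished */
            int slice = left[i] < quantum ? left[i] : quantum;
            t += slice;
            left[i] -= slice;
            if (left[i] == 0) {            /* completed at time t */
                turn += t;
                remaining--;
            }
        }
    }
    return turn;
}
```

Dividing by 4, HPF gives 59/4 = 14.75 and 39/4 = 9.75, and SJF gives 37/4 = 9.25 and 17/4 = 4.25, as in the table. For round-robin with quantum 2 this sketch yields a turnaround sum of 53 (average 13.25, with P3 completing at t = 16); the table's 12.75 corresponds to a schedule in which P3 completes at t = 14 instead.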

193 Scheduling Examples
UNIX:
– preemption
– 32 priority levels (round robin)
– the priorities are recomputed every second (CPU usage, nice level, last run)
BSD: similar
– priorities are recomputed every 4th tick (usage estimation)
Windows NT:
– real-time priorities: fixed, may run forever
– variable: dynamic priorities, preemption
– idle: last choice (swap manager)

194 Scheduling Examples: Quantum & Priorities
Win2K:
– quantum = 20 ms (Professional), 120 ms (Server), configurable
– depends on the task type (I/O bound)
BSD:
– quantum = 100 ms
– priority = f(load, nice, time_last)
Linux:
– quantum = quantum / 2 + priority
– f(quantum, nice)

195 Scheduling Problems
Starvation: a task is never scheduled (although it is ready).
– remedy: fairness
Deadlock: no task is ready (nor will any ever become ready).
– remedies: detection + recovery, or avoidance

196 Deadlock Conditions
Coffman conditions for a deadlock (1971):
– mutual exclusion
– hold and wait
– no resource preemption
– circular wait (cycle)
Figure (T = thread, R = resource): thread A holds R and wants S; thread B holds S and wants R.

197 Deadlock Remedies
– Coarser lock granularity: use a single lock for all resources (e.g., the Big Kernel Lock in Linux 2.0–2.4)
– Locking order: the resources are ordered and always locked according to that order (ticketing)
– Two-phase locking: try to acquire all the resources; if successful, lock them, otherwise free them and try again
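The locking-order remedy can be sketched with POSIX mutexes. Here the global order is simply the mutex address, an assumption made for illustration; a kernel would more typically use an explicit rank such as a module hierarchy. `lock_pair` and `unlock_pair` are hypothetical helper names. Since every thread takes the lower-ordered lock first, no cycle of waiting threads can form.

```c
#include <pthread.h>
#include <stdint.h>

/* Acquire two mutexes in a fixed global order (here: by address),
   whatever order the caller passes them in. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    if (a == b) {                       /* same lock: take it once */
        pthread_mutex_lock(a);
        return;
    }
    if ((uintptr_t)a > (uintptr_t)b) {  /* lower address first */
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Release order does not matter for deadlock freedom. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b) {
    pthread_mutex_unlock(a);
    if (a != b)
        pthread_mutex_unlock(b);
}
```

Two threads calling `lock_pair(&r, &s)` and `lock_pair(&s, &r)` now both lock the same mutex first, so the "A holds R, B holds S" situation of the Coffman figure cannot arise.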

198 Deadlock Detection, Prevention & Recovery
Deadlock detection: the system keeps a graph of locks and tries to detect cycles.
– time consuming
– the graph has to be kept consistent with the actual state
Deadlock prevention (avoidance): remove one of the four Coffman conditions (e.g., cycles).
Recovery:
– kill processes and reclaim their resources
– rollback: requires saving the states of the processes regularly
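Cycle detection can be sketched as a depth-first search over a wait-for graph. This is a minimal sketch assuming single-instance resources, so that resource nodes can be folded into task-to-task edges (task i waits for a resource held by task j). The names are hypothetical. A back edge to a task that is still on the current DFS path closes a cycle.

```c
#include <stdbool.h>

#define MAX_NODES 16

/* state: 0 = unvisited, 1 = on the current DFS path, 2 = fully explored */
static bool visit(bool edge[][MAX_NODES], int n, int u, int *state) {
    state[u] = 1;
    for (int v = 0; v < n; v++) {
        if (!edge[u][v])
            continue;
        if (state[v] == 1)
            return true;                     /* back edge: cycle found */
        if (state[v] == 0 && visit(edge, n, v, state))
            return true;
    }
    state[u] = 2;
    return false;
}

/* edge[i][j] is true if task i waits for a resource held by task j.
   Returns true if the wait-for graph contains a cycle (a deadlock). */
bool has_deadlock(bool edge[][MAX_NODES], int n) {
    int state[MAX_NODES] = {0};
    for (int u = 0; u < n; u++)
        if (state[u] == 0 && visit(edge, n, u, state))
            return true;
    return false;
}
```

For Case 2 of the simple scenario, with A = 0, B = 1, C = 2, the edges A → B, B → C and C → A form a cycle; removing any one of them removes the deadlock.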

199 Simple Deadlock Scenario
Example:
– resources R, S, T
– tasks A, B, C require {R, S}, {S, T}, {T, R} respectively
Case 1: sequential execution, no deadlock (+ = acquire, - = release):
A: +R +S -R -S, then B: +S +T -S -T, then C: +T +R -T -R

200 Simple Deadlock Scenario
Case 2: interleaving, deadlock:
A: +R, B: +S, C: +T, then A: +S, B: +T, C: +R
Each task waits for a resource held by the next one (A waits for B, B for C, C for A): a cycle over R, S, T.

201 Complex Deadlock Scenario
Case with 6 resources (R, S, T, U, V, W) and 7 tasks (A to G), shown as a graph.
Is this a case of deadlock?

202 Deadlock Avoidance Strategy in Bluebottle
Each kernel module has a lock to protect its data.
When multiple locks are needed, they are acquired according to the module hierarchy.
(figure: module locks of Locks, Blocks, Modules, Configuration, Memory, Interrupts, Threads, Traps, Timers, Processors arranged in the module hierarchy)

203 Priority Inversion
A high-priority task can be blocked by a lower-priority one.
Example (figure): Low is running and holds a resource; High becomes ready and waits for that resource; Medium preempts Low, so High keeps waiting for a task of lower priority.

204 Priority Inversion
A big problem for RTOS.
Solutions:
– priority inheritance: a low-priority task holding a resource inherits the priority of the high-priority task wanting the resource
– priority ceilings: each resource has a priority corresponding to the highest priority of its users + 1; the priority of the resource is transferred to the locking process; can be used instead of semaphores

205 Example: Mars Pathfinder (1996–1998)
– VxWorks real-time system: preemptive, priorities
– communication bus: shared resource (mutexes)
– low-priority task (short): meteorological data gathering
– medium-priority task (long): communication
– high-priority task: bus manager
Detection: a watchdog on bus activity caused system resets.
Fix: priority inheritance was activated via a patch uploaded on the fly (no memory protection).

206 Locking on Multiprocessor Machines
Real parallelism!
Interrupts cannot simply be disabled as on single-processor machines (one could stop every task, but that is not efficient).
Software solutions:
– Peterson, Dekker, ...
Hardware support:
– bus locking
– atomic instructions (Test And Set, Compare And Swap)

207 Locking on Multiprocessor Machines
Test And Set:
  TAS s:  IF s = 0 THEN s := 1; CC := FALSE ELSE CC := TRUE END
Compare And Swap (Intel):
  CAS R1, R2, A:  (R1: expected value, R2: new value, A: address)
    IF R1 = M[A] THEN M[A] := R2; CC := TRUE
    ELSE R1 := M[A]; CC := FALSE
    END
These instructions are atomic even on multiprocessors! They usually achieve this by locking the data bus.
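C11 exposes both primitives portably: `atomic_flag_test_and_set` behaves like TAS (it sets the flag and returns the old value), and `atomic_compare_exchange_strong` / `_weak` behave like CAS, including writing the current value back into the "expected" operand on failure (the `R1 := M[A]` branch). A minimal test-and-set spinlock sketch; the type and function names are hypothetical:

```c
#include <stdatomic.h>

typedef struct {
    atomic_flag held;          /* set = lock taken */
} tas_lock;

#define TAS_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Busy wait: keep doing TAS until the old value was "clear". */
void tas_acquire(tas_lock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                      /* spin: another CPU holds the lock */
}

/* Single TAS attempt: returns 1 if the lock was acquired, 0 otherwise. */
int tas_try_acquire(tas_lock *l) {
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The acquire/release memory orders keep the accesses inside the critical section from being reordered across the lock operations, which the bare instruction-level description on the slide leaves implicit.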

208 Example: Semaphores on SMP
Binary semaphores with TAS (s = 1: resource taken)
Spinning (busy wait):
  Try: TAS s
       JMP Try      (taken if CC = TRUE, i.e., s was already set)
       CS
Blocking:
       TAS s
       JMP Queuing  (taken if CC = TRUE)
       CS

209 Example: Semaphores on SMP
Generic semaphores with CAS (counter s: available resources)
P(s):
  Enter: LOAD R1, s
  TryP:  MOVE R1, R2
         DEC R2
         CAS R1, R2, s
         BNE TryP        (CAS failed: R1 now holds the current value of s)
         CMP R2, 0
         BN Queuing      (s < 0: wait)
  [CS]
V(s):
         LOAD R1, s
  TryV:  MOVE R1, R2
         INC R2
         CAS R1, R2, s
         BNE TryV
         CMP R2, 0
         BNP Dequeuing   (s <= 0: wake up a waiting task)
  Exit CS
Semantics ({...} is executed atomically):
  P(S): {S := S - 1} IF S < 0 THEN jump queuing END
  V(S): {S := S + 1} IF S <= 0 THEN jump dequeuing END
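The counter part of P and V maps directly onto a C11 CAS retry loop. This is a sketch with hypothetical names: `atomic_compare_exchange_weak` reloads `old` with the current value when it fails, exactly like `R1 := M[A]` above, so the loop retries without an explicit reload. The queuing and dequeuing themselves are not shown; they would need a wait queue protected by its own lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Counter part of P(s): atomically s := s - 1.
   Returns true if the caller must be queued (new value < 0). */
bool sem_p_counter(atomic_int *s) {
    int old = atomic_load(s);
    int new_value;
    do {
        new_value = old - 1;
    } while (!atomic_compare_exchange_weak(s, &old, new_value));
    return new_value < 0;
}

/* Counter part of V(s): atomically s := s + 1.
   Returns true if a waiting task must be dequeued (new value <= 0). */
bool sem_v_counter(atomic_int *s) {
    int old = atomic_load(s);
    int new_value;
    do {
        new_value = old + 1;
    } while (!atomic_compare_exchange_weak(s, &old, new_value));
    return new_value <= 0;
}
```

The weak form may fail spuriously, which is harmless here because the loop simply retries with the value it was given back.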

210 Spin-Locks: the Bluebottle/i386 Way (simplified version)
PROCEDURE AcquireSpinTimeout(VAR locked: BOOLEAN);
CODE {SYSTEM.i386}
  MOV EBX, locked[EBP]   ; EBX := ADR(locked)
  MOV AL, 1              ; AL := 1
  CLI                    ; switch interrupts off before acquiring the lock
test:
  XCHG [EBX], AL         ; set and read the lock atomically (LOCK prefix implicit)
  CMP AL, 1              ; was it locked?
  JE test                ; retry
END AcquireSpinTimeout;

211 Active Objects in Active Oberon
Z = OBJECT
  VAR myT: T; i: INTEGER;              (* state *)

  PROCEDURE & NEW (t: T);              (* initializer *)
  BEGIN myT := t
  END NEW;

  PROCEDURE P (u: U; VAR v: V);        (* method *)
  BEGIN { EXCLUSIVE } i := 1           (* mutual exclusion *)
  END P;

BEGIN { ACTIVE }                       (* activity *)
  BEGIN { EXCLUSIVE }
    AWAIT (i > 0)                      (* condition *)
  END
END Z;

212 Active Oberon Runtime Structures
(figure: the ready queue; CPUs 1 and 2 with their running processes; per object a lock queue and a wait queue; process states Running, Awaiting Assertion, Awaiting Object, Ready)

213 Active Oberon Implementation
(figure: state diagram with the states Running, Awaiting Assertion, Awaiting Object, Ready; transitions numbered 0 to 7)
– NEW: create object; create process; set to ready (transition 0)
– END: run next ready
– Preempt: set to ready; run next ready

214 Active Oberon Implementation
Enter Monitor:
  IF monitor lock set THEN
    put me in monitor obj wait list; run next ready
  ELSE
    set monitor lock
  END
Exit Monitor:
  find first asserted x in wait list;
  IF x found THEN
    set x to ready
  ELSE
    find first x in obj wait list;
    IF x found THEN set x to ready ELSE clear monitor lock END
  END
(figure: the same state diagram, highlighting transitions 1, 2, 4 and 5)

215 Active Oberon Implementation
AWAIT: put me in monitor assertion wait list; call Exit Monitor (transition 3 in the state diagram)

216 Case Study: Windows CE 3.0
Real-time constraints:
– reaction time on events
– execution time
Threads with priorities and time quanta:
– priorities: 0 (high), …, 255 (low)
– time quanta in ms: default 100 ms; 0 = no quantum
Single processor.
(figure: a thread of priority p runs until the end of its quantum, or until a thread of priority q < p, i.e. higher, preempts it)

217 Case Study: Windows CE 3.0
Interrupt handling (figure: IRQ → ISR in NK.EXE, kernel mode → event → IST, user mode):
– ISR (Interrupt Service Routine): 1st-level handling; kernel mode, uses the kernel stack; installed at boot time; creates an event on demand; preempted by an ISR with higher priority
– IST (Interrupt Service Thread): 2nd-level handling; user mode; awaits events

218 Case Study: Windows CE 3.0
Synchronization on common resources:
– critical sections: enter, leave operations
– semaphores and mutexes (binary semaphores)
Synchronization is performed with system/library calls (it is not part of a language).
Priority inversion avoidance:
– priority inheritance (the thread holding the resource inherits the priority of the thread wanting it)

219 Case Study: Java
– activities are mapped to threads (no processes)
– synchronization is part of the language: locks, signals
– threads are provided by the library
– scheduling depends on the JVM

220 Case Study: Java
public class MyThread extends Thread {
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        MyThread t = new MyThread();
        t.start();   // start() returns void, so it cannot be assigned
    }
}

221 Case Study: Java
public class MyThread implements Runnable {
    public void run() {
        System.out.println("Running");
    }

    public static void main(String[] arguments) {
        // no "this" in a static method: pass a new Runnable instance
        Thread t = new Thread(new MyThread());
        t.start();
    }
}

222 Case Study: Java
Protection with monitor-like objects:
– with method granularity: public synchronized void someMethod()
– with statement granularity: synchronized (anObject) { ... }
Synchronization with signals:
– wait() (with optional time-out)
– notify() / notifyAll() (send-and-continue pattern)

223 Case Study: Java
private Object o;

public synchronized void consume() {
    while (o == null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    use(o);
    o = null;
    notifyAll();
}

public synchronized void produce(Object p) {
    while (o != null) {
        try { wait(); } catch (InterruptedException e) {}
    }
    o = p;
    notifyAll();
}

224 Case Study: POSIX Threads
– standard interface for threads in C
– mostly UNIX, possible on Windows
– provided by a library (libpthread), not part of the language
– IEEE POSIX 1003.1c standard (1995)
– various implementations (both user and kernel level)

225 Case Study: POSIX Threads
#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *run(void *arg) {
    pthread_mutex_lock(&m);
    // critical section
    pthread_mutex_unlock(&m);
    pthread_exit(NULL);
}

int main(int argc, char *argv[]) {
    pthread_t t;
    pthread_create(&t, NULL, run, NULL);
    pthread_exit(NULL);   // main thread exits, t keeps running
}

226 File Systems

227 File Systems - Overview
– Hardware
– File abstraction
– File organization
– File systems: Oberon, Unix, FAT
– Distributed file systems: NFS, AFS
– Special topics: error recovery, ISAM, B* trees

228 Hardware: the ATA Bus
– ATA / IDE (1986): Advanced Technology Attachment / Integrated Drive Electronics
– ATA-2 / EIDE
– ATA-4 / ATAPI: ATA Packet Interface (SCSI command set)
– ATA-5: UDMA 66
– ATA-6: UDMA 100
– SATA
– ATA-7: UDMA 133
Bus with 2 devices: master / slave
Low-level interface:
– head / cylinder / sector
– support for LBA (logical block addressing)
PIO mode: read byte by byte through a hardware port
DMA mode: use DMA transfers

229 Hardware: the SCSI Bus
SCSI: Small Computer Systems Interface
– SCSI-2: Fast SCSI, Wide SCSI
– SCSI-3
Bus with 8 devices (wide: 16 / 32 devices):
– bus arbitration
– disconnected mode
Device kinds:
– direct access
– CD-ROM
– ...
Block-oriented access: read-block, write-block
Transfer mode selection:
– asynchronous (handshake)
– synchronous (period / offset)

230 Hardware: Hard Disk
Organization (figure: surfaces with one head each, rotation axis, tracks forming cylinders, sectors):
– cylinder (c)
– head (h)
– sector (s)
Addressing:
– sector (c, h, s)
– block (LBA)
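The two addressing schemes are related by the classic CHS/LBA formula LBA = (c · heads + h) · spt + (s - 1), where heads is the number of heads per cylinder and spt the number of sectors per track; sectors are numbered from 1, cylinders and heads from 0. A small C sketch (function names are hypothetical):

```c
#include <stdint.h>

/* CHS -> LBA: heads = heads per cylinder, spt = sectors per track. */
uint32_t chs_to_lba(uint32_t c, uint32_t h, uint32_t s,
                    uint32_t heads, uint32_t spt) {
    return (c * heads + h) * spt + (s - 1);
}

/* LBA -> CHS, the inverse mapping. */
void lba_to_chs(uint32_t lba, uint32_t heads, uint32_t spt,
                uint32_t *c, uint32_t *h, uint32_t *s) {
    *c = lba / (heads * spt);
    *h = (lba / spt) % heads;
    *s = lba % spt + 1;
}
```

With 16 heads and 63 sectors per track, (0, 0, 1) is block 0, (0, 1, 1) is block 63 and (1, 0, 1) is block 1008.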

231 Hardware: Example
A current disk:
– ATA-100, 250 GB
– 512 bytes per sector (488·10^6 sectors)
– 8 MB cache
– 8.9 ms average seek time
– 7200 rpm

232 Hardware: Hard Disk Improvements
– interleaving: optimize sequential sector access (figure: sectors 1–7 laid out on a cylinder)
– read-ahead
– caching
– sector defect management

233 Hardware: Disk Scheduling
Disk controllers have a queue of pending requests, each with:
– type: read or write
– block number: translated into the (c, h, s) tuple
– memory address (where to copy from or to)
– amount to be transferred (byte or block count)

234 Hardware: Disk Scheduling
– first-come, first-served (FCFS)
– shortest seek time first (SSTF)
– SCAN (elevator) & C-SCAN
– LOOK & C-LOOK
Performance: minimize head movements, maximize throughput.
Nowadays, scheduling is done in the hardware.

235 Hardware: Disk Scheduling
Example (head position, track numbers): queue = 31, 72, 4, 18, 147, 193, 199, 153, 114, 72
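A C sketch that counts the total head movement for FCFS and SSTF. It assumes (as one reading of the example) that the head starts at track 31 and that the remaining nine numbers are the pending requests. The function names are hypothetical. With these assumptions FCFS moves the head 431 tracks and SSTF only 222 (order 18, 4, 72, 72, 114, 147, 153, 193, 199).

```c
#include <stdlib.h>
#include <stdbool.h>

#define MAX_REQ 64

/* FCFS: serve requests in arrival order; returns total tracks moved. */
long fcfs_movement(int head, const int *req, int n) {
    long total = 0;
    for (int i = 0; i < n; i++) {
        total += labs((long)req[i] - head);
        head = req[i];
    }
    return total;
}

/* SSTF: always serve the pending request closest to the current head
   position (ties go to the earlier request). Precondition: n <= MAX_REQ. */
long sstf_movement(int head, const int *req, int n) {
    bool done[MAX_REQ] = {false};
    long total = 0;
    for (int served = 0; served < n; served++) {
        int best = -1;
        for (int i = 0; i < n; i++) {
            if (done[i])
                continue;
            if (best < 0 || abs(req[i] - head) < abs(req[best] - head))
                best = i;
        }
        total += abs(req[best] - head);
        head = req[best];
        done[best] = true;
    }
    return total;
}
```

SSTF reduces head movement but can starve requests far from the head, which is the motivation for SCAN and LOOK.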

236 Hardware: Disk Scheduling

237 Abstractions
Disk: array of sectors
Block: array of sectors
– some systems call them clusters
– user configured
– reduces the address space
– increases access speed
– causes internal fragmentation
File: stream of bytes
– sequential access
– random access
– stored on disk: mapping byte to block, block allocation management

238 Abstraction Layers
Abstractions and implementations:
– Disk: ReadSector, WriteSector (ATA driver, SCSI driver)
– Volume: ReadBlock, WriteBlock, AllocateBlock, FreeBlock
– File System: OpenFile, WriteFile, ReadFile, SeekFile, CloseFile (FAT, Oberon, ISO 9660, ext3, NTFS)

239 File Organization
– How can we map groups of blocks into files?
– How do we manage free space?
– How can I jump to a certain location?
Operation: read n bytes at position p.

240 File Organization: Contiguous Allocation
A file is a group of contiguous blocks (described by start and length).
– simple management
– fast transfers
– example: IBM MVS (mainframe)

241 File Organization: Contiguous Allocation
– external fragmentation
– allocation: how much space does a file need? first fit, best fit, …?
– file growth (error? move? extensions?)
– preallocation: internal fragmentation

242 File Organization: Linked Allocation
A file is a linked list of blocks (starting at block start).
– no external fragmentation
– no growth problems
Problems:
– sequential files only (positioning requires traversal)
– space for pointers (e.g., 1 TB disk with 5-byte addresses: about 1% overhead with 512 B blocks)
– reliability (lost pointers)

243 File Organization: Linked Allocation
Clusters: series of contiguous blocks
– faster (fewer jumps)
– less space wasted for pointers
– internal fragmentation

244 File Organization: Linked Allocation
Pointer tables:
– the list of pointers is stored in a separate table
– can be cached
– is usually stored twice (reliability)
– FAT (MS-DOS, OS/2, Windows, solid-state memory)

245 File Organization: Indexed Allocation
An index holds the block addresses of the file.
– fast access for random-access files
– no external fragmentation
Problems:
– high management overhead
– limited file size (depending on the index structure)
– pointer overhead

246 File Organization: Indexed Allocation
Variation: linked list of indexes
– advantage: no file size limitation
– disadvantage: index lookup requires a sequential traversal of the index list

247 File Organization: Indexed Allocation
Multi-level indexes (index of indexes), e.g., UNIX
– advantage: fast index lookup
– disadvantage: limited file size

248 File Organization: Indexed Allocation
Example: 2 KB blocks, 4 B addresses, i.e., 512 entries per index block
– first-level index block: 512 entries · 2 KB = 1 MB
– second-level index block: 512 · 512 entries · 2 KB = 0.5 GB

249 Free Space Management
Bitmap (e.g., HFS):
– bit vector marking the free blocks
– simple
– needs caching
Linked lists:
– list of free blocks (similar to linked allocation)
Grouping:
– free blocks contain n addresses of free blocks (similar to multilevel indexing)
Counting:
– list of (start, length) pairs describing runs of free blocks

250 Case Study: Oberon File System
Layers: Files → FileDir → Disk
Disk module: controller driver
– block management
FileDir module:
– maps files to locations
– implemented with B-trees
– garbage collection (files): the directory is the root set; anonymous (non-registered) files are collected
Files module:
– allows user operations (read, create, write, …)
– access is performed through riders

251 Case Study: Oberon File System
Characteristics:
– block size = 1 KB
– file organization: multilevel index with 64 direct and 12 first-level indirect entries; 672 data bytes in the file header
– block allocation: allocation table created at boot time (partition GC); no collection at run time (the partition fills up!)
– designed to optimize small files

252 Case Study: Oberon File System
Block = 1 KB
(figure: the file header holds 672 data bytes, entries 0–63 pointing directly to 64 data blocks, and entries 64–75 pointing to 12 index blocks i1 with 256 data blocks each)

253 Case Study: Oberon File System
Free block management: bitmap (1 = free, 0 = used); garbage collection at startup.
Bits 0–31 of the bitmap:
initially:       11111111111111111111111111111111
after startup GC: 11010010011110111101110100011100
allocate 16, 17: 11010010011110110001110100011100
allocate 19:     11010010011110110000110100011100
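The allocation steps in the bitmap can be sketched in C. Following the example (bit set = free block), allocation finds the first set bit and clears it, and freeing sets the bit again. This is a minimal sketch with hypothetical names; it packs block b into byte b / 8 at bit position b % 8, which is a layout choice of this sketch and not necessarily Oberon's.

```c
#include <stdint.h>

/* Returns the number of the first free block (bit set) and marks it
   used, or -1 if no block is free. */
int bitmap_alloc(uint8_t *map, int nblocks) {
    for (int b = 0; b < nblocks; b++) {
        if (map[b / 8] & (1u << (b % 8))) {
            map[b / 8] &= (uint8_t)~(1u << (b % 8));   /* now used */
            return b;
        }
    }
    return -1;
}

/* Marks block b as free again. */
void bitmap_free(uint8_t *map, int b) {
    map[b / 8] |= (uint8_t)(1u << (b % 8));
}
```

A linear scan is simple; real implementations usually remember where the last search stopped or cache the bitmap, as the free-space slide notes.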

254 Case Study: Oberon File System Internals
(figure: riders R on a file handle f with its buffers)
– Rider: current read or write position
– Buffer (cache) shared per file for consistency (every rider sees the write operations on its file)

255 Case Study: Oberon RAM Disk
File = POINTER TO Header;
Index = POINTER TO Sector;
Rider = RECORD
  eof: BOOLEAN;
  file: File;
  pos: LONGINT;
  adr: LONGINT
END;
Header = RECORD
  mark: LONGINT;
  name: FileDir.Name;
  len, time, date: LONGINT;
  ext: ARRAY 12 OF Index;     (* ext table *)
  sec: ARRAY 64 OF LONGINT    (* primary sector table *)
END;
– the header points to sectors 0–63
– index sector 0 points to sectors 64–319
– index sector 1 points to sectors 320–575

256 Case Study: Oberon RAM Disk
(SS = sector size, STS = sector table size, XS = index size)
PROCEDURE Read(VAR r: Rider; VAR x: SYSTEM.BYTE);
  VAR m: INTEGER;
BEGIN
  IF r.pos < r.file.len THEN
    SYSTEM.GET(r.adr, x); INC(r.adr); INC(r.pos);
    IF r.adr MOD SS = 0 THEN (* end of sector *)
      m := SHORT(r.pos DIV SS);
      IF m < STS THEN r.adr := r.file.sec[m]
      ELSE r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS]
      END
    END
  ELSE
    x := 0X; r.eof := TRUE
  END
END Read;
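The address lookup in Read can be isolated in a small C sketch. It assumes SS = 1024 (matching the 1 KB block size), STS = 64 and XS = 256 (from "index sector 0 points to sectors 64–319"); `struct sector_slot` and `locate_sector` are hypothetical names. Given a byte position, it reports whether the sector address is found in the direct table `sec[m]` or in entry `x[idx]` of index sector `ext[ext]`.

```c
#include <stdint.h>

enum { SS = 1024, STS = 64, XS = 256 };   /* assumed sizes, see lead-in */

struct sector_slot {
    int direct;   /* 1: address in sec[idx]; 0: in ext[ext].x[idx] */
    int ext;      /* index into ext[] (-1 if direct) */
    int idx;      /* index into sec[] or into ext[ext].x[] */
};

struct sector_slot locate_sector(int32_t pos) {
    struct sector_slot s;
    int m = (int)(pos / SS);             /* logical sector number */
    if (m < STS) {
        s.direct = 1;
        s.ext = -1;
        s.idx = m;
    } else {
        s.direct = 0;
        s.ext = (m - STS) / XS;
        s.idx = (m - STS) % XS;
    }
    return s;
}
```

Under these assumptions a file can span 64 + 12 · 256 = 3136 sectors.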

257 Case Study: Oberon RAM Disk
PROCEDURE Write(VAR r: Rider; x: SYSTEM.BYTE);
  VAR k, m, n: INTEGER; ix: LONGINT;
BEGIN
  IF r.pos < r.file.len THEN (* overwrite *)
    m := SHORT(r.pos DIV SS);
    IF m < STS THEN r.adr := r.file.sec[m] + r.pos MOD SS
    ELSE r.adr := r.file.ext[(m-STS) DIV XS].x[(m-STS) MOD XS] + r.pos MOD SS
    END;
    INC(r.pos)
  ELSE
    ....
  END;
  SYSTEM.PUT(r.adr, x); INC(r.adr)
END Write;

258 Case Study: Oberon RAM Disk
IF r.pos < r.file.len THEN
  ....
ELSE (* expand *)
  IF r.adr MOD SS = 0 THEN
    m := SHORT(r.pos DIV SS);
    IF m < STS THEN
      Kernel.AllocSector(0, r.adr); r.file.sec[m] := r.adr
    ELSE
      n := (m-STS) DIV XS; k := (m-STS) MOD XS;
      IF k = 0 THEN
        Kernel.AllocSector(0, ix); r.file.ext[n] := SYSTEM.VAL(Index, ix)
      END;
      Kernel.AllocSector(0, r.adr); r.file.ext[n].x[k] := r.adr
    END
  END;
  INC(r.pos); r.file.len := r.pos
END;
SYSTEM.PUT(r.adr, x); INC(r.adr);

259 Case Study: UNIX, inodes
File system: files and directories (files with a special content).
A file is represented by an inode:
– file owner
– file type: regular / directory / special
– access permissions
– access time
– reference count (links)
– table of contents
– file size
Inode table of contents:
– 10 (12) direct blocks
– 1 indirect block
– 1 double indirect block
– 1 triple indirect block
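The table of contents bounds the file size: with P = block size / address size pointers per index block, an inode reaches direct + P + P² + P³ blocks. A C sketch (the function name is hypothetical):

```c
#include <stdint.h>

/* Maximum file size reachable through a UNIX-style inode with `direct`
   direct pointers plus one single, one double and one triple indirect
   block. block = block size in bytes, addr = address size in bytes. */
uint64_t inode_max_size(uint64_t block, uint64_t addr, uint64_t direct) {
    uint64_t per = block / addr;               /* pointers per index block */
    uint64_t blocks = direct + per + per * per + per * per * per;
    return blocks * block;
}
```

With 12 direct pointers, 2 KB blocks and 4 B addresses (the values of the indexed-allocation example) this gives 275,415,851,008 bytes, just over 256 GiB, almost all of it from the triple indirect block. With 10 direct pointers and 1 KB blocks it gives 17,247,250,432 bytes.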

260 Case Study: UNIX, inodes
(figure: inode with info (type, access, refc, …) and table of contents; entries 0–9 point to data blocks d; entries 10, 11 and 12 point to a single (i1), a double (i2) and a triple (i3) indirect block, which in turn point to further index blocks and finally to data blocks)

261 Case Study: UNIX, directories
Directories are normal files with a special content.
The data part contains a list of (inode, name) entries.
Every directory has two special entries:
– "." the directory itself
– ".." the parent directory

262 Case Study: UNIX, inodes
Example (directory entries are (inode #, name) pairs):
– inode 2 (/): type dir, blocks 132, owner root, ref count 1
  block 132: 2 ".", 2 "..", 4 "bin", 3 "root"
– inode 3 (/root/): type dir, blocks 406, owner root, ref count 1
  block 406: 3 ".", 2 "..", 5 ".tcshrc", 6 "mbox"
– inode 6: type file, blocks 42, 103, owner root, ref count 1
  data blocks 42 and 103

263 Case Study: UNIX, soft and hard links
Hard links:
– two directory entries with the same inode number
– each file has a reference counter
– example: 42 file, 42 hardlink
Soft links:
– the directory entry points to a special file containing the path of the linked file
– example: 42 file, 43 softlink (inode 43 points to a special file with the path of file)

264 Case Study: UNIX, hard links
– inode 2 (/): type dir, blocks 132, owner root, ref count 1
  block 132: 2 ".", 2 "..", 4 "bin", 3 "root"
– inode 3 (/root/): type dir, blocks 406, owner root, ref count 1
  block 406: 3 ".", 2 "..", 5 "mails", 5 "mbox"
– inode 5: type file, blocks 42, 103, owner root, ref count 2
  data blocks 42 and 103

265 Case Study: UNIX, soft links
– inode 2 (/): type dir, blocks 132, owner root, ref count 1
  block 132: 2 ".", 2 "..", 4 "bin", 3 "root"
– inode 3 (/root/): type dir, blocks 406, owner root, ref count 1
  block 406: 3 ".", 2 "..", 5 "mbox", 6 "mails"
– inode 5: type file, blocks 42, owner root, ref count 1
  data block 42
– inode 6: type file, blocks 43, owner root, ref count 1
  block 43 contains the path /root/mbox

266 Case Study: UNIX, Volume Layout
A volume (partition) contains:
– boot block: bootstrap code
– super block: size, max file, free space, …
– inodes
– data blocks
Layout: boot block | super block | inode list | data blocks

267 Case Study: UNIX, Functions
Core functions:
– bread: read block
– bwrite: write block
– iget: get inode from disk
– iput: put inode to disk
– bmap: map (inode, offset) to disk block
– namei: convert path name to inode

268 Case Study: UNIX, namei
namei(path)
  if (absolute path) inode = root;
  else inode = current directory inode;
  while (more path to process) {
    read directory (inode);
    if (match(directory, name component)) {
      inode = directory[name component];
      iget(inode);
    } else {
      return no inode;
    }
  }
  return inode;
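A runnable C sketch of the namei loop over in-memory directories. The tables reproduce the example of slide 262 (inode 2 is "/", inode 3 is "/root"); all type and function names are hypothetical, and reading a directory is replaced by a lookup in these tables instead of bread/iget.

```c
#include <string.h>

struct dir_entry { const char *name; int inode; };
struct dir { int inode; const struct dir_entry *entries; int n; };

static const struct dir_entry slash_entries[] = {
    {".", 2}, {"..", 2}, {"bin", 4}, {"root", 3}
};
static const struct dir_entry root_home_entries[] = {
    {".", 3}, {"..", 2}, {".tcshrc", 5}, {"mbox", 6}
};
static const struct dir dirs[] = {
    {2, slash_entries, 4}, {3, root_home_entries, 4}
};

#define ROOT_INODE 2

/* Returns the directory stored in `inode`, or NULL if it is not one. */
static const struct dir *find_dir(int inode) {
    for (size_t i = 0; i < sizeof dirs / sizeof dirs[0]; i++)
        if (dirs[i].inode == inode)
            return &dirs[i];
    return NULL;
}

/* Returns the inode for `path`, or -1 if a component cannot be found.
   `cwd` is the inode of the current directory (for relative paths). */
int namei(const char *path, int cwd) {
    int inode = (path[0] == '/') ? ROOT_INODE : cwd;
    while (*path) {
        while (*path == '/')
            path++;                          /* skip separators */
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");     /* next name component */
        const struct dir *d = find_dir(inode);
        if (d == NULL)
            return -1;                       /* not a directory */
        int next = -1;
        for (int i = 0; i < d->n; i++) {
            if (strlen(d->entries[i].name) == len &&
                strncmp(d->entries[i].name, path, len) == 0) {
                next = d->entries[i].inode;
                break;
            }
        }
        if (next < 0)
            return -1;                       /* no such entry */
        inode = next;
        path += len;
    }
    return inode;
}
```

For example, "/root/mbox" resolves to inode 6 and "/root/../bin" to inode 4, because ".." in /root refers back to inode 2.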

269 Case Study: FAT
FATnn: nn is the size of a FAT entry in bits (FAT12, FAT16, FAT32).
Used by MS-DOS and Windows for disks and floppies.
Volume layout: boot block | FAT1 | FAT2 | root directory | data

270 Case Study: FAT, Example
FAT entries (block: next block), one entry per block up to the disk size:
2: EOF, 4: 12, 5: FREE, 6: 9, 7: BAD, 8: 3, 9: 11, 10: EOF, 11: 10, 12: EOF, 13: FREE, …
– File 1: 6 → 9 → 11 → 10
– File 2: 4 → 12
– File 3: blocks 8 and 3
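Reading a file means following its chain through the FAT. A C sketch with hypothetical encodings `FAT_FREE`, `FAT_EOF` and `FAT_BAD` (real FAT variants reserve specific values, e.g. for FAT16 0x0000 = free, 0xFFF7 = bad, 0xFFF8 to 0xFFFF = end of chain). The length limit `max` also stops a corrupted, cyclic chain.

```c
#include <stddef.h>

#define FAT_FREE 0       /* hypothetical encodings for this sketch */
#define FAT_EOF  (-1)
#define FAT_BAD  (-2)

/* Follows the chain starting at block `start`, storing the block numbers
   in `out`. Returns the chain length, or -1 for a corrupt chain (free or
   bad entry, out-of-range block, or more than `max` blocks). */
int fat_chain(const int *fat, int nentries, int start, int *out, int max) {
    int len = 0;
    int b = start;
    while (b != FAT_EOF) {
        if (b < 2 || b >= nentries || len >= max)
            return -1;                   /* entries 0 and 1 are reserved */
        out[len++] = b;
        b = fat[b];
        if (b == FAT_FREE || b == FAT_BAD)
            return -1;
    }
    return len;
}
```

On the example table, the chain from block 6 yields 6, 9, 11, 10 and the chain from block 4 yields 4, 12. Random access to byte p requires walking p / block size links, which is why the table is cached in memory.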

271 Case Study: FAT, Directory
Information about files is kept in the directory.
Directory entry (field sizes in bytes):
– file name (8)
– extension (3)
– attributes (1): A D V S H R (archive, directory, volume label, system, hidden, read-only)
– reserved (10)
– time (2)
– date (2)
– first block (2)
– file size (4)

272 Case Study: FAT, Max. Partition Size
Block size | FAT-12  | FAT-16  | FAT-32
0.5 KB     | 2 MB    |         |
1 KB       | 4 MB    |         |
2 KB       | 8 MB    | 128 MB  |
4 KB       | 16 MB   | 256 MB  | 1 TB
8 KB       |         | 512 MB  | 2 TB
16 KB      |         | 1024 MB | 2 TB
32 KB      |         | 2048 MB | 2 TB
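The table follows from "number of addressable blocks × block size". A C sketch that reproduces the filled-in cells, ignoring the few reserved entry values. It relies on two facts about FAT-32: only 28 bits of each 32-bit entry are used, and the 32-bit sector count in the boot sector limits a volume to 2^32 sectors of 512 bytes = 2 TB. The function name is hypothetical.

```c
#include <stdint.h>

/* Approximate upper bound on a FAT partition size in bytes. */
uint64_t fat_max_bytes(int fat_bits, uint64_t block_bytes) {
    int addr_bits = (fat_bits == 32) ? 28 : fat_bits;  /* FAT-32 uses 28 bits */
    uint64_t size = ((uint64_t)1 << addr_bits) * block_bytes;
    uint64_t cap = (uint64_t)1 << 41;                  /* 2^32 sectors x 512 B */
    return size < cap ? size : cap;
}
```

For example, FAT-16 with 2 KB blocks gives 2^16 · 2^11 = 2^27 bytes = 128 MB, and FAT-32 with 16 KB blocks would reach 4 TB but is capped at 2 TB.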

273 File System Mounting
More than one volume mounted in the same directory tree.
(figure: directory tree / with usr, mnt (floppy, dos, cd), home (corti), bin, afs (ethz.ch))

274 Virtual File System
Support for several file systems:
– disk based
– network
– special
VFS: unifies the system calls; mirrors the traditional UNIX file system model.
Layers: Applications → VFS → ext3, FAT, NFS, AFS, proc, pts

275 File System Mounting
– each file system type has a method table
– system calls are indirect function calls through the method table
– common interface (open, write, readdir, lock, …)
– each file is associated with the method table of its file system
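A much-reduced C sketch of a method table. The `struct vfs_ops` layout, the toy in-memory "ramfs" and all function names are hypothetical, not the API of any real kernel. Each file system type fills in its own functions, each open file points at the table of its file system, and the generic calls `vfs_read` / `vfs_write` dispatch through it without knowing which file system they talk to.

```c
#include <string.h>

struct vfs_file;

struct vfs_ops {                          /* one table per file system type */
    const char *fs_name;
    long (*read)(struct vfs_file *f, char *buf, long n);
    long (*write)(struct vfs_file *f, const char *buf, long n);
};

struct vfs_file {
    const struct vfs_ops *ops;            /* set when the file is opened */
    char data[64];                        /* toy storage for the ramfs */
    long pos, len;
};

/* Toy in-memory file system. */
static long ram_read(struct vfs_file *f, char *buf, long n) {
    long avail = f->len - f->pos;
    if (n > avail)
        n = avail;
    memcpy(buf, f->data + f->pos, (size_t)n);
    f->pos += n;
    return n;
}

static long ram_write(struct vfs_file *f, const char *buf, long n) {
    long room = (long)sizeof f->data - f->pos;
    if (n > room)
        n = room;
    memcpy(f->data + f->pos, buf, (size_t)n);
    f->pos += n;
    if (f->pos > f->len)
        f->len = f->pos;
    return n;
}

const struct vfs_ops ramfs_ops = { "ramfs", ram_read, ram_write };

/* Generic calls: indirect function calls through the method table. */
long vfs_read(struct vfs_file *f, char *buf, long n) {
    return f->ops->read(f, buf, n);
}

long vfs_write(struct vfs_file *f, const char *buf, long n) {
    return f->ops->write(f, buf, n);
}
```

Adding another file system type only requires another `struct vfs_ops` instance; the generic layer stays unchanged.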

276 File System Mounting: Special Files
Devices:
– disks
– memory
– USB devices
– serial ports
– …
Kernel communication (e.g., proc)
Uniform interface (open, close, read, write)
Uniform protection (users, groups)

277 File Systems: Protection
Restrict access (who), operations (what), management.
– FAT: flags in the directory (e.g., read-only); execution based on the file name
– UNIX: restrictions in inodes, based on users and groups; operations: read, write, execute; for directories: manage files; not so flexible
– VMS: access lists, i.e., a list of users and rights per file

278 Distributed File Systems

279 Distributed File Systems (DFS)
Clients, servers and storage are dispersed among the machines of a distributed system.
(figure: several clients and servers connected by a network)

280 Overview
Naming (dynamic):
– location transparency: the file name does not reveal the file location
– location independence: the file name does not change when the storage is moved
Caching (efficiency):
– write-through
– delayed-write
– write-on-close
Consistency:
– client-initiated: poll the server for changes
– server-initiated: notify the clients

281 Naming
Simple approaches: a file is identified by a (host, path) pair
– Ibis (host:path)
– SMB (\\host\path)
Transparent:
– remote directories are mounted in the local file system
– not uniform (the mount point is not defined)
– NFS (/mnt/home, /home/)
– SMB (\\host\path mounted on Z:)
Global name structure:
– uniform and transparent naming
– AFS (/afs/cell/path)

282 Caching
– reduces network and disk load
– consistency problems
Granularity:
– How much? Big or small chunks of data? Entire files?
– big: +hit ratio, +hit penalty, +consistency problems
Location:
– memory: +diskless stations, +speed
– disk: +cheaper, +persistent
– hybrid
Space consumption on the clients

283 Caching
Policies:
– write-through: +reliability, -performance (the cache is effective only for read operations)
– delayed-write: +write speed, +unnecessary writes eliminated, -reliability; write when the cache is full (+performance, -data stays long in the cache) or at regular intervals
– write-on-close

284 Consistency
Is my cached copy up-to-date?
Client-initiated approach:
– the client performs validity checks
– when? on open / at fixed intervals / on every access
Server-initiated approach:
– the server keeps track of cached files (or parts)
– it notifies the clients when conflicts are detected
– should the server allow conflicts?

285 Stateless and Stateful Servers
Stateful: the server keeps track of each accessed file
– session IDs (e.g., identifying an inode on the server)
– fast: simple requests, caches, fewer disk accesses, read-ahead
– volatile: after a server crash, the structures must be rebuilt (recovery protocol); after a client crash, orphans must be detected and eliminated

286 Stateless and Stateful Servers
Stateless: each request is self-contained (request: file and position)
– complex requests
– need for a uniform low-level naming scheme (to avoid name translations)
– need for idempotent operations (same results if repeated), e.g., absolute byte counts
– no locking possible

287 File Replication
A file can be replicated on failure-independent machines.
The naming scheme manages the mapping:
– same high-level name
– different low-level names
Issues: transparency and consistency

288 Distributed File Systems (mainstream)
– NFS: Network File System (Sun)
– AFS: Andrew File System (CMU)
– SMB: Server Message Block (Microsoft)
– NCFS: Network Computer File System (Oberon)

289 Network File System (NFS)
UNIX-based (Sun)
– mounts a file system from another machine into a local directory
– stateless (no open/close)
– uses UDP to communicate
– based on RPC and XDR (External Data Representation): every operation is a remote procedure call
Known problems:
– no caching
– no disconnected mode
– efficiency
– security: IP-based

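290 NFS: Example
[Figure: the server has the tree /home (containing corti and reali) and /etc, and exports /home to the client:
   /home/ client(rw)
The client, with its own /home and /etc, mounts the exported directory:
   mount -t nfs server:/home /home
After the mount, the client's /home contains corti and reali.]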

291 NFS
No special servers (each machine can act both as a server and as a client)
Cascading mounts are allowed:
   mount -t nfs server1:/home /home
   mount -t nfs server2:/projects/corti /home/corti/projects
Limited scalability (limited number of exports)

292 NFS: Stateless Protocol
Each request contains a unique file identifier and an absolute offset.
No concurrency control (locking has to be performed by the applications).
Committed data is assumed to be on disk (the server cannot cache writes).

293 Network File System (NFS)
[Figure: NFS architecture. Client: system call layer, virtual file system layer, then either the local file system or the NFS client, which uses RPC / XDR. Server: RPC / XDR, NFS server, virtual file system layer, local file system. Client and server communicate over the network (UDP).]

294 Remote Procedure Invocation: Overview
Problem:
– send structured information from A to B
– A and B may have different memory layouts
– byte-order problems: how is 0x1234 (2 bytes) represented in memory?
Big-endian (MSB before LSB): IBM, Motorola, SPARC; also the network byte order
Little-endian (LSB before MSB, "little end first"): VAX, Intel
[Figure: 0x1234 stored at addresses 0 and 1 in both byte orders]
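Java's `java.nio.ByteBuffer` can lay out the same 16-bit value in either byte order, which makes the 0x1234 question concrete. A small sketch; the `layout` helper is an illustrative name:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Returns the two bytes of a 16-bit value in the order they would appear in memory.
    static byte[] layout(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    public static void main(String[] args) {
        byte[] big = layout((short) 0x1234, ByteOrder.BIG_ENDIAN);       // address 0: 0x12, address 1: 0x34
        byte[] little = layout((short) 0x1234, ByteOrder.LITTLE_ENDIAN); // address 0: 0x34, address 1: 0x12
        System.out.printf("big-endian:    %02x %02x%n", big[0], big[1]);
        System.out.printf("little-endian: %02x %02x%n", little[0], little[1]);
        System.out.println("this machine:  " + ByteOrder.nativeOrder());
    }
}
```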

295 Marshalling / Serialization
Marshalling: packing one or more data items into a buffer using a standard representation
Presentation layer (OSI)
– RPC + XDR (Sun): RFC 1014, June 1987; RFC 1057, June 1988
– IIOP / CORBA (OMG): V2.0, February 1997; V3.0, August 2002
– SOAP / XML (W3C): V1.1, May 2000
XDR type system:
– [unsigned] integer (32-bit)
– [unsigned] hyper integer (64-bit)
– enumeration (encoded as a signed int)
– boolean (enum)
– float / double (IEEE 32/64-bit)
– opaque
– string
– array (fixed and variable size)
– structure
– union
– void
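To make marshalling concrete, here is a sketch of an encoder for three XDR types as described in RFC 1014: 32-bit big-endian integers, booleans as the integers 0 and 1, and strings as a 4-byte length followed by the bytes and zero padding to a multiple of four. `XdrEncoder` is a hypothetical class, not Sun's XDR library, and it only handles ASCII strings:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Every encoded item occupies a multiple of 4 bytes, most significant byte first.
class XdrEncoder {
    private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();

    // 32-bit integer, big-endian (write(int) keeps only the low 8 bits).
    XdrEncoder putInt(int v) {
        bytes.write(v >>> 24);
        bytes.write(v >>> 16);
        bytes.write(v >>> 8);
        bytes.write(v);
        return this;
    }

    // Boolean is an enum: FALSE = 0, TRUE = 1.
    XdrEncoder putBoolean(boolean b) {
        return putInt(b ? 1 : 0);
    }

    // String: 4-byte length, the bytes, then zero padding up to a multiple of 4.
    XdrEncoder putString(String s) {
        byte[] data = s.getBytes(StandardCharsets.US_ASCII);
        putInt(data.length);
        bytes.write(data, 0, data.length);
        int pad = (4 - data.length % 4) % 4;
        for (int i = 0; i < pad; i++) {
            bytes.write(0);
        }
        return this;
    }

    byte[] toByteArray() {
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] msg = new XdrEncoder().putInt(0x1234).putBoolean(true).putString("abcde").toByteArray();
        System.out.println(msg.length + " bytes"); // 4 + 4 + (4 + 5 + 3) = 20 bytes
        for (byte b : msg) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
    }
}
```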

296 RPC Protocol
Client: procedure P(a, b, c)
– pack parameters
– send message to server
– await response
– unpack response
Server:
– unpack parameters
– find procedure
– invoke P(a, b, c)
– pack response
– send response
Issues:
– remote procedure call
– marshalling of procedure parameters
– message format
– authentication
– naming
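The client and server steps above can be simulated in a single process. In this sketch the wire format (procedure number plus two 32-bit arguments; reply = status plus result) is invented for illustration, `RpcServer` and `RpcClientStub` are hypothetical names, a direct method call stands in for the network, and authentication and naming are omitted:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Server side: unpack parameters, find procedure, invoke, pack response.
class RpcServer {
    private final Map<Integer, IntBinaryOperator> procedures = new HashMap<>();

    void register(int procNumber, IntBinaryOperator proc) {
        procedures.put(procNumber, proc);
    }

    byte[] handle(byte[] request) {
        ByteBuffer in = ByteBuffer.wrap(request); // ByteBuffer is big-endian by default
        int procNumber = in.getInt();
        int a = in.getInt();
        int b = in.getInt();
        IntBinaryOperator proc = procedures.get(procNumber);
        ByteBuffer reply = ByteBuffer.allocate(8);
        if (proc == null) {
            reply.putInt(1).putInt(0);                     // status 1: no such procedure
        } else {
            reply.putInt(0).putInt(proc.applyAsInt(a, b)); // status 0: result follows
        }
        return reply.array();
    }
}

// Client stub: looks like a local call, but marshals the arguments into a message.
class RpcClientStub {
    private final RpcServer server; // stands in for the network connection

    RpcClientStub(RpcServer server) {
        this.server = server;
    }

    int call(int procNumber, int a, int b) {
        byte[] request = ByteBuffer.allocate(12).putInt(procNumber).putInt(a).putInt(b).array();
        byte[] reply = server.handle(request); // "send message to server, await response"
        ByteBuffer in = ByteBuffer.wrap(reply);
        if (in.getInt() != 0) {
            throw new IllegalStateException("unknown remote procedure " + procNumber);
        }
        return in.getInt();
    }

    public static void main(String[] args) {
        RpcServer server = new RpcServer();
        server.register(1, (x, y) -> x + y); // procedure 1: add
        RpcClientStub stub = new RpcClientStub(server);
        System.out.println(stub.call(1, 2, 3)); // 5
    }
}
```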

297 NFS
[Figure: client and server exchange lookup, read and write requests through the RPC protocol]

298 NFS Efficiency
Stateless protocols are inherently slow (e.g., directory lookup).
Caching:
– file blocks (data)
– file attributes (inodes)
– read-ahead
– delayed write
– tradeoff between speed and consistency: two machines may see different data

299 NFS: Security
Exports are based on IP addresses:
– low security
– coarse granularity
Data is not encrypted.
Permissions are based on user and group IDs:
– uniform naming needed (e.g., NIS)

300 Andrew File System (AFS)
1983, CMU (later IBM, now open source)
– Scalable (>5000 workstations): the network is divided into clusters (cells)
– Client/user mobility: files are accessible from everywhere
– Security: encrypted communication (Kerberos)
– Protection: access control lists
– Heterogeneity: clear interface to the server

301 Andrew File System (AFS)
– each server provides a cell
– world-wide addressing scheme (cell name)
– the client caches whole files
– synchronization with the server on file open and close
AFS is efficient:
– low network overhead
– stateful: consistency is implemented with callbacks
– callback = promise that the client's copy is in sync with the server
– on a store, the server breaks the callbacks of the other clients

302 AFS: Logical View
[Figure: the /afs tree (dir, vol, bin, usr, file f) divided into a shared space and a private space; volumes are attached at mount points]

303 AFS: Physical View
[Figure: the cells ethz.ch, epfl.ch and cmu.edu, each containing clients and servers, connected by the network]

304 AFS
[Figure: the client contacts the server through the RPC protocol only on open and close; read and write operate on the client's local cache]

305 AFS: Consistency
Interaction with the server happens only when opening and closing files.
Writes are not visible on other machines before a close.
Clients assume that cached files are up to date.
Servers keep track of caching by the clients (callbacks):
– clients are notified in case of changes
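A minimal sketch of these callback semantics, assuming hypothetical `CallbackServer` and `CachingClient` classes: whole files are cached on open, writes stay local until close, and a store on close breaks the callbacks of the other clients so that their next open refetches the file. Network messages, volumes and crash recovery are left out:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CallbackServer {
    private final Map<String, String> files = new HashMap<>();
    private final Map<String, Set<CachingClient>> callbacks = new HashMap<>(); // file -> clients holding a callback

    // Open on a client without a cached copy: ship the whole file, promise a callback.
    String fetch(String name, CachingClient client) {
        callbacks.computeIfAbsent(name, k -> new HashSet<>()).add(client);
        return files.getOrDefault(name, "");
    }

    // Close after writing: store the new version and break everybody else's callback.
    void store(String name, String contents, CachingClient writer) {
        files.put(name, contents);
        Set<CachingClient> holders = callbacks.computeIfAbsent(name, k -> new HashSet<>());
        for (CachingClient c : holders) {
            if (c != writer) {
                c.breakCallback(name);
            }
        }
        holders.clear();
        holders.add(writer); // the writer's copy is now the current version
    }
}

class CachingClient {
    private final CallbackServer server;
    private final Map<String, String> cache = new HashMap<>();

    CachingClient(CallbackServer server) {
        this.server = server;
    }

    // A cached copy is trusted as long as its callback has not been broken.
    String open(String name) {
        return cache.computeIfAbsent(name, n -> server.fetch(n, this));
    }

    // Writes only change the local copy; other machines see them after close.
    void write(String name, String contents) {
        cache.put(name, contents);
    }

    void close(String name) {
        server.store(name, cache.get(name), this);
    }

    // Invoked by the server: the cached copy is stale, drop it.
    void breakCallback(String name) {
        cache.remove(name);
    }

    public static void main(String[] args) {
        CallbackServer server = new CallbackServer();
        CachingClient a = new CachingClient(server);
        CachingClient b = new CachingClient(server);
        a.open("f");
        b.open("f");
        a.write("f", "v1");
        System.out.println(b.open("f").isEmpty()); // true: not visible before close
        a.close("f");
        System.out.println(b.open("f"));           // v1: callback broken, file refetched
    }
}
```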

306 AFS: Kerberos
Kerberos (Cerberus: the three-headed dog guarding Hades)
– authentication
– accounting
– audit
Based on the Needham-Schroeder shared-key protocol
Distributed AFS: communication is encrypted

307 AFS: Protection
Access lists:
   %> fs listacl thesis
   Access list for thesis is
   Normal rights:
     system:anyuser l
     trg rlidwk
     corti rlidwka
It is possible to allow (or deny) access to users or customized groups.
Rights: read, write, lookup, insert, administer, lock and delete.
Supports the UNIX mode bits.

308 The Eight Fallacies of Distributed Computing (Peter Deutsch)
– The network is reliable
– Latency is zero
– Bandwidth is infinite
– The network is secure
– Topology doesn't change
– There is one administrator
– Transport cost is zero
– The network is homogeneous

309 General Principles (Satyanarayanan)
From distributed file systems we learned the following lessons:
– move computations to the clients whenever possible
– use caching whenever possible
– treat special files (e.g., temporary files) specially
– make systems scalable
– trust the fewest possible entities
– batch work if possible

310 Kernel Structure

311 Introduction
The kernel performs dangerous operations:
– page table mapping
– scheduling
The kernel must be protected against malicious user code:
– access to other processes' data
– raising the own process's priority
The kernel must have more rights than user code.
Solution:
– distinguish between kernel mode and user mode
– access the kernel through system calls
– the system calls define the interface to the kernel

312 Kernel Protection
[Figure: applications reach the kernel (drivers, memory manager, file systems) only through system calls]

313 Kernel Protection
Means:
– hardware support: privileged instructions, supervisor mode
– separate address spaces: user processes have no access to kernel structures
– access to memory and functions through symbolic names: users have no direct access to the hardware

314 Kernel Protection
Privileged instructions executed in user mode generate a trap.
Mode switch:
– interrupts
– gated calls (software interrupts generated by user code)
Parameters are passed on the:
– stack
– registers
Examples:
– Intel x86: 4 protection levels (code/segment attribute), interrupt
– PowerPC: 2 levels (CPU attribute), special instruction

315 Linux System Calls (Intel)
System calls are wrapped in libraries (e.g., libc). The library function:
– writes the parameters into registers (up to 5 parameters)
– writes the parameters on the stack (more than 5)
– writes the system call number into EAX
– executes int 0x80
The kernel jumps to the corresponding function in sys_call_table.
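Java cannot execute `int 0x80` itself, but the dispatch step can be simulated. In this sketch an array plays the role of `sys_call_table`; the numbers 4 (write) and 20 (getpid) follow the i386 Linux numbering, while the handlers, the fake pid 1234, and the `SimulatedKernel`/`LibC` names are hypothetical. No real system call is executed:

```java
import java.util.ArrayList;
import java.util.List;

class SimulatedKernel {
    interface Handler {
        long handle(long ebx, long ecx, long edx);
    }

    static final long ENOSYS = 38; // Linux error number for "function not implemented"
    private final Handler[] sysCallTable = new Handler[256];
    final List<String> console = new ArrayList<>();

    SimulatedKernel() {
        sysCallTable[4] = (fd, buf, count) -> {          // write(fd, buf, count)
            console.add("fd " + fd + ": " + count + " bytes");
            return count;
        };
        sysCallTable[20] = (unused1, unused2, unused3) -> 1234; // getpid(): fixed fake pid
    }

    // Reached through "int 0x80": EAX selects the table entry, EBX/ECX/EDX carry arguments.
    long trap(int eax, long ebx, long ecx, long edx) {
        if (eax < 0 || eax >= sysCallTable.length || sysCallTable[eax] == null) {
            return -ENOSYS;                              // unknown system call number
        }
        return sysCallTable[eax].handle(ebx, ecx, edx);
    }
}

// User-space wrapper in the spirit of libc: load the "registers", then trap.
class LibC {
    private final SimulatedKernel kernel;

    LibC(SimulatedKernel kernel) {
        this.kernel = kernel;
    }

    long write(int fd, long bufAddress, long count) {
        return kernel.trap(4, fd, bufAddress, count);
    }

    long getpid() {
        return kernel.trap(20, 0, 0, 0);
    }

    public static void main(String[] args) {
        SimulatedKernel kernel = new SimulatedKernel();
        LibC libc = new LibC(kernel);
        System.out.println(libc.getpid());            // 1234
        System.out.println(libc.write(1, 0x1000, 5)); // 5
        System.out.println(kernel.console);           // [fd 1: 5 bytes]
    }
}
```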

316 Linux System Calls
Examples:
– pid_t fork(void): creates a child process
– ssize_t write(int fd, const void *buf, size_t count): writes count bytes from buf to fd
– int kill(pid_t pid, int sig): sends a signal to a process
– int gettimeofday(struct timeval *tv, struct timezone *tz): gets the current time
– int open(const char *pathname, int flags): opens a file
– int ioctl(int d, int request, ...): manipulates special devices
– …

317 Windows System Calls
Layered system: system calls must be performed through a wrapper (NTDLL.DLL).
The position of a system call in the KiSystemServiceTable is not known (it depends on the build).
[Figure: the application calls WriteFile() in KERNEL32.DLL, which calls NtWriteFile() in NTDLL.DLL, which executes int 0x2e; the kernel dispatches through the KiSystemServiceTable]

318 Kernel Design: API vs. System Calls
Linux:
– system calls are clearly specified (POSIX standard)
– system calls do not change
– about 100 calls
Windows:
– system calls are hidden
– only the Win32 API is published
– Win32 is the standard
– thousands of API calls, still growing
– some API calls are handled in user space
– more than one API: POSIX, OS/2

319 Protection and SMP
What happens when two processes (on two CPUs) enter kernel mode at the same time?
[Figure: proc1 on CPU 1 and proc2 on CPU 2 both execute int 0x80]
– big kernel lock: not allowed, only one CPU at a time executes kernel code (OpenBSD, NetBSD)
– fine-grained locks in the kernel (FreeBSD 5, Linux 2.6)
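The two strategies differ in what a CPU has to wait for. A minimal sketch, assuming a hypothetical kernel with two unrelated tables and Java threads standing in for CPUs: with a single `bigKernelLock` the two threads serialize even though they touch different data, while per-table locks let them proceed in parallel. Both versions produce correct counts; only the contention differs:

```java
import java.util.concurrent.locks.ReentrantLock;

interface Kernel {
    void openFile();   // touches only the file table
    void fork();       // touches only the process table
    long openFiles();
    long processes();
}

// One lock for the whole kernel: at most one CPU executes kernel code at a time.
class BigLockKernel implements Kernel {
    private final ReentrantLock bigKernelLock = new ReentrantLock();
    private long openFiles, processes;

    public void openFile() {
        bigKernelLock.lock();
        try { openFiles++; } finally { bigKernelLock.unlock(); }
    }

    public void fork() {
        bigKernelLock.lock();
        try { processes++; } finally { bigKernelLock.unlock(); }
    }

    public long openFiles() { return openFiles; }
    public long processes() { return processes; }
}

// One lock per data structure: CPUs working on unrelated tables do not wait.
class FineGrainedKernel implements Kernel {
    private final ReentrantLock fileTableLock = new ReentrantLock();
    private final ReentrantLock processTableLock = new ReentrantLock();
    private long openFiles, processes;

    public void openFile() {
        fileTableLock.lock();
        try { openFiles++; } finally { fileTableLock.unlock(); }
    }

    public void fork() {
        processTableLock.lock();
        try { processes++; } finally { processTableLock.unlock(); }
    }

    public long openFiles() { return openFiles; }
    public long processes() { return processes; }
}

class SmpDemo {
    // Two "CPUs" enter the kernel at the same time; join() makes their writes visible.
    static void runOnTwoCpus(Kernel k, int n) {
        Thread cpu1 = new Thread(() -> { for (int i = 0; i < n; i++) k.openFile(); });
        Thread cpu2 = new Thread(() -> { for (int i = 0; i < n; i++) k.fork(); });
        cpu1.start();
        cpu2.start();
        try {
            cpu1.join();
            cpu2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        for (Kernel k : new Kernel[] {new BigLockKernel(), new FineGrainedKernel()}) {
            runOnTwoCpus(k, 100_000);
            System.out.println(k.getClass().getSimpleName() + ": "
                    + k.openFiles() + " files, " + k.processes() + " processes");
        }
    }
}
```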

320 Kernel Structure
Monolithic kernel:
– big mess, no structure, one big block, fast
– MS-DOS (no protection), original UNIX
– micro-kernel (AIX, OS X)
Layered system:
– layer n uses functions from layer n-1
– OS/2 (some degree of layering)
Virtual machine:
– defines an artificial environment for programs
Client-server:
– tiny communication microkernel to access various services

321 Monolithic Kernels
[Figure: monolithic kernel vs. micro-kernel, showing where the kernel components (terminal controllers, device drivers, memory controllers, scheduler, signal handling, file system, swapping, virtual memory) are placed relative to the user-level applications]

322 Layered Systems
THE operating system
– a layer uses only functions from the layers below
– what goes where?
– less efficient
[Figure: layers from top to bottom: user programs, buffering I/O, console drivers, memory management, CPU scheduling, hardware]

323 Virtual Machines
VM operating system (IBM)
– slow and difficult to implement
– complete protection
– no sharing of resources
– useful for development and research
– compatibility
[Figure: processes running on several virtual machines on top of the hardware]

324 Design: Kernel or User Space?
Big monolithic kernel:
– fast (fewer mode switches)
– less protection
– examples: HTTP server in the Linux kernel, graphics routines in Windows
Modular and micro-kernels:
– structured, more separation
– code moved to user space
– less efficient
– more secure
– example: user-level drivers

325 Virtual Machines
Machine specification in software:
– instruction set
– memory layout
– virtual devices
– ...
Examples:
– JVM (Java Virtual Machine)
– .NET / Mono
– VMware: the specified machine is a whole PC; allows multiple PC environments on the same machine
– IBM VM/370

326 Case Study: JVM

327 Virtual Machines
What is a machine?
– does something (...useful)
– programmable
– concrete (hardware)
What is a virtual machine?
– a machine that is not concrete
– a software emulation of a physical computing environment
Reality is somewhat fuzzy! Is a Pentium II a machine?
[Figure: an instruction decoder translating instructions into operations Op 1, Op 2, Op 3 for a RISC core]
"Hardware and software are logically equivalent" (A. Tanenbaum)

328 Virtual Machine, Intermediate Language
Pascal P-Code (1975)
– stack-based processor
– strongly typed machine language
– compiler: one front end, many back ends
– UCSD implementations on Apple ][, PDP-11, Z80
Modula M-Code (1980)
– high code density
– Lilith as a microprogrammed virtual processor
JVM: Java Virtual Machine (1995)
– Write Once, Run Everywhere
– interpreters, JIT compilers, HotSpot compiler
Microsoft .NET (2000)
– language interoperability

329 JVM Case Study
– compiler (Java to bytecode)
– interpreter, ahead-of-time compiler, JIT
– dynamic loading and linking
– exception handling
– memory management, garbage collection
– OO model with single inheritance and interfaces
– system classes providing an OS-like implementation: compiler, class loader, runtime, system

330 JVM: Type System
Primitive types: byte, short, int, long, float, double, char, reference, boolean (mapped to int)
Object types: classes, interfaces, arrays
Single class inheritance
Multiple interface implementation
Arrays:
– anonymous types
– subclasses of java.lang.Object

331 JVM: Java Bytecode (t stands for a type prefix, e.g., i for int, a for reference)
Memory access: tload / tstore, taload / tastore (array elements), tconst, getfield / putfield, getstatic / putstatic
Operations: tadd / tsub / tmul / tdiv, tshifts
Conversions: f2i / i2f / i2l / ...
Stack: dup / dup2 / dup_x1 / ...
Control: ifeq / ifne / iflt / ..., if_icmpeq / if_acmpeq, invokestatic, invokevirtual, invokeinterface, athrow, treturn
Allocation: new / newarray
Casting: checkcast / instanceof

332 JVM: Java Bytecode Example
bipush
Operation: push byte
Format: bipush byte
Forms: bipush = 16 (0x10)
Operand stack: ... => ..., value
Description: the immediate byte is sign-extended to an int value. That value is pushed onto the operand stack.

333 JVM: Machine Organization
Virtual processor:
– stack machine
– no registers
– typed instructions
– no memory addresses, only symbolic names
Runtime data areas:
– pc register
– stack: locals, parameters, return values
– heap
– method area: code, runtime constant pool
– native method stack

334 JVM: Execution Example
   iload 5
   iload 6
   iadd
   istore 4
[Figure: over time, iload 5 and iload 6 push the locals v5 and v6 onto the operand stack; iadd replaces them with v5+v6; istore 4 pops the sum into local v4]
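The execution example can be reproduced with a toy interpreter. A sketch under stated simplifications: instructions are text mnemonics rather than binary opcodes, there is a single frame, and nothing is verified; `MiniJvm` is a hypothetical name:

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniJvm {
    static void run(String[] program, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        for (String insn : program) {
            String[] parts = insn.split(" ");
            switch (parts[0]) {
                case "bipush": // immediate byte, sign-extended to int
                    operandStack.push((int) Byte.parseByte(parts[1]));
                    break;
                case "iload":  // push local variable n
                    operandStack.push(locals[Integer.parseInt(parts[1])]);
                    break;
                case "iadd": { // pop two ints, push their sum
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    break;
                }
                case "istore": // pop into local variable n
                    locals[Integer.parseInt(parts[1])] = operandStack.pop();
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + insn);
            }
        }
    }

    public static void main(String[] args) {
        int[] locals = new int[7];
        locals[5] = 2; // v5
        locals[6] = 3; // v6
        run(new String[] {"iload 5", "iload 6", "iadd", "istore 4"}, locals);
        System.out.println("v4 = " + locals[4]); // v4 = 5
    }
}
```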

335 JVM: Reflection
Load and manipulate unknown classes at run time.
java.lang.Class: getFields, getMethods, getConstructors
java.lang.reflect.Field: get / set, getInt / setInt, getFloat / setFloat, ...
java.lang.reflect.Method: getModifiers, invoke
java.lang.reflect.Constructor

336 JVM: Reflection – Example
   import java.lang.reflect.*;

   public class ReflectionExample {
       public static void main(String[] args) {
           try {
               // load the class whose name is given on the command line
               Class<?> c = Class.forName(args[0]);
               // list all methods declared by that class
               Method[] m = c.getDeclaredMethods();
               for (int i = 0; i < m.length; i++) {
                   System.out.println(m[i].toString());
               }
           } catch (Throwable e) {
               System.err.println(e);
           }
       }
   }

337 JVM: Java Weaknesses
Transitive closure of java.lang.Object (number of classes):
   JDK 1.1: 47
   JDK 1.2: 178
   JDK 1.3: 180
   JDK 1.4: 248
   JDK 5 (1.5): 280
   GNU Classpath 0.03: 299
The dependency chain starts like this:
   class Object { public String toString(); ... }
   class String { public String toUpperCase(Locale loc); ... }
   public final class Locale implements Serializable, Cloneable { ... }

338 JVM: Java Weaknesses
Class static initialization of T happens when:
– T is a class and an instance of T is created: T tmp = new T();
– T is a class and a static method of T is invoked: T.staticMethod();
– a nonconstant static field of T is used or assigned (a static field that is not final, or not initialized with a compile-time constant): T.someField = 42;
Problem: circular dependencies in static initialization code
   class A { static { x = B.f(); } }
   class B { static { y = A.f(); } }
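The circular dependency is easy to reproduce. In this sketch (class names `InitA` and `InitB` are illustrative), each class initializer reads the other class's static field. Because the JVM treats a recursive initialization request from the same thread as already complete, the inner initializer sees the default value 0:

```java
class InitA {
    static int x = InitB.y + 1;  // first use of InitB triggers its initialization
}

class InitB {
    static int y = InitA.x + 1;  // InitA is still being initialized: reads the default 0
}

class StaticInitDemo {
    public static void main(String[] args) {
        // Touching InitA first: InitB.y = 0 + 1 = 1, then InitA.x = 1 + 1 = 2.
        System.out.println("InitA.x = " + InitA.x + ", InitB.y = " + InitB.y); // InitA.x = 2, InitB.y = 1
    }
}
```

The result depends on which class is touched first: accessing `InitB` first would give `InitB.y = 2` and `InitA.x = 1`.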

339 JVM: Java Weaknesses
   interface Example {
       final static String labels[] = {"A", "B", "C"};
   }
Hidden static initializer:
   labels = new String[3];
   labels[0] = "A"; labels[1] = "B"; labels[2] = "C";
Warning:
– in Java, final means write-once!
– interfaces may contain code
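The write-once meaning of `final` can be observed directly. In this sketch (interface names are illustrative) the array reference in `Labels.LABELS` cannot be reassigned, but its elements can; an unmodifiable `List` (here `List.of`, available since Java 9) is one common way to obtain truly constant contents:

```java
import java.util.List;

interface Labels {
    String[] LABELS = {"A", "B", "C"}; // implicitly public static final
}

interface SafeLabels {
    List<String> LABELS = List.of("A", "B", "C"); // unmodifiable list
}

class FinalArrayDemo {
    public static void main(String[] args) {
        Labels.LABELS[0] = "Z"; // compiles and runs: only the reference is final
        System.out.println(String.join(",", Labels.LABELS)); // Z,B,C
        try {
            SafeLabels.LABELS.set(0, "Z");
        } catch (UnsupportedOperationException e) {
            System.out.println("SafeLabels is unmodifiable");
        }
    }
}
```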

340 JVM: Memory Model
The JVM specification defines a memory model:
– it defines the relationship between variables and the underlying memory
– it is meant to guarantee the same behavior on every JVM
The compiler is allowed to reorder operations unless synchronized or volatile is specified.

341 JVM: Reordering
Reads and writes to ordinary variables can be reordered.
   public class Reordering {
       int x = 0, y = 0;

       public void writer() {
           x = 1;
           y = 2;
       }

       public void reader() {
           int r1 = y;
           int r2 = x; // r1 == 2 does not imply r2 == 1
       }
   }

342 JVM: Memory Model
synchronized: in addition to specifying a monitor, it defines a memory barrier:
– acquiring the lock implies an invalidation of the caches
– releasing the lock implies a write-back of the caches
synchronized blocks on the same object are ordered.
The order among accesses to volatile variables is guaranteed (but not among volatile and other variables).

343 JVM: Double-Checked Locking
Singleton (correct, but every call acquires the lock):
   public class SomeClass {
       private static Resource resource = null;

       public static synchronized Resource getResource() {
           if (resource == null) {
               resource = new Resource();
           }
           return resource;
       }
   }

344 JVM: Double-Checked Locking
Double-checked locking:
   public class SomeClass {
       private static Resource resource = null;

       public static Resource getResource() {
           if (resource == null) {                // first check, without the lock
               synchronized (SomeClass.class) {
                   if (resource == null) {        // second check, with the lock
                       resource = new Resource();
                   }
               }
           }
           return resource;
       }
   }

345 JVM: Double-Checked Locking
[Figure: Thread 1 and Thread 2 execute the same double-checked getResource() concurrently]
   public Resource getResource() {
       if (resource == null) {
           synchronized (this) {
               if (resource == null) {
                   resource = new Resource();
               }
           }
       }
       return resource;
   }
Thread 1 enters the synchronized block and assigns resource, but the write of the reference may be reordered before the writes done by the constructor. Thread 2 performs the first (unsynchronized) check, sees resource != null and returns it: the object is instantiated but not yet initialized!
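Under the revised memory model (JSR 133, Java 5 and later), double-checked locking can be repaired by declaring the field `volatile`; the initialization-on-demand holder idiom avoids explicit locking altogether. A sketch of both, reusing the slide's `SomeClass` and `Resource` names with a made-up `Resource` body and a static accessor to match the static field:

```java
class Resource {
    final int value;

    Resource() {
        value = 42;
    }
}

class SomeClass {
    // volatile makes the pattern correct under JSR 133: a thread that sees a
    // non-null reference also sees the writes done by the constructor.
    private static volatile Resource resource;

    static Resource getResource() {
        Resource r = resource;             // first check: one volatile read, no lock
        if (r == null) {
            synchronized (SomeClass.class) {
                r = resource;              // second check, under the lock
                if (r == null) {
                    r = new Resource();
                    resource = r;          // volatile write publishes the object
                }
            }
        }
        return r;
    }
}

// Alternative: the JVM initializes Holder lazily and thread-safely on first use,
// because class initialization is itself synchronized.
class LazyHolder {
    private static class Holder {
        static final Resource INSTANCE = new Resource();
    }

    static Resource getResource() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        System.out.println(SomeClass.getResource() == SomeClass.getResource()); // true
        System.out.println(LazyHolder.getResource().value);                    // 42
    }
}
```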

346 JVM: Immutable Objects Are Not Immutable
Immutable objects:
– all fields are of primitive types or references to immutable objects
– all fields are final
Example (simplified): java.lang.String
– contains an array of characters, the length and an offset
– example: s = "abcd", length = 2, offset = 2 represents "cd"
   String s1 = "/usr/tmp";
   String s2 = s1.substring(4); // should contain "/tmp"
Sequence: s2 is instantiated, its fields are initialized (to 0), the array is copied, and the fields are written by the constructor.
What happens if these instructions are reordered?

347 JVM: Reordering Volatile and Nonvolatile Stores
Volatile reads and writes are totally ordered among threads, but not with respect to accesses to normal variables.
Example:
   volatile boolean initialized = false;
   SomeObject o = null;

   Thread 1:                      Thread 2:
   o = new SomeObject();          while (!initialized) { sleep(); }
   initialized = true;            o.field = 42;
Is o guaranteed to be non-null in Thread 2?
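Under the JSR 133 semantics, the write to `o` happens-before the volatile write to `initialized`, which happens-before any read of `initialized` that sees `true`, so Thread 2 cannot observe `o == null`. A runnable sketch of the example, with `Thread.sleep(1)` standing in for the slide's `sleep()` and a hypothetical `runBothThreads` helper:

```java
class SomeObject {
    int field;
}

class Publication {
    static SomeObject o = null;
    static volatile boolean initialized = false;

    static void writer() {                 // Thread 1
        o = new SomeObject();
        initialized = true;                // volatile write: earlier writes become visible
    }

    static void reader() throws InterruptedException { // Thread 2
        while (!initialized) {             // volatile read
            Thread.sleep(1);
        }
        o.field = 42;                      // JSR 133: o is guaranteed to be non-null here
    }

    // Runs both threads and waits; join() makes their writes visible to the caller.
    static void runBothThreads() {
        Thread t2 = new Thread(() -> {
            try {
                reader();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread t1 = new Thread(Publication::writer);
        t2.start();
        t1.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        runBothThreads();
        System.out.println(o.field); // 42
    }
}
```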

348 JVM: JSR 133
Java Community Process: revision of the Java memory model
– final means final
– volatile fields cannot be reordered

349 JVM: Execution
Interpreted (e.g., Sun JVM)
– bytecode instructions are interpreted sequentially
– the VM emulates the Java Virtual Machine
– slower
– quick startup
Just-in-time compilers (e.g., Sun JVM, IBM Jikes RVM)
– bytecode is compiled to native code at load time (or later)
– code can be optimized (at compile time or later)
– faster
– slow startup
Ahead-of-time compilers (e.g., GCJ)
– bytecode is compiled to native code offline
– quick startup
– quick execution
– static compilation

350 JVM: Loader – The Class File Format
   ClassFile {
       version
       constant pool
       flags
       super class
       interfaces
       fields
       methods
       attributes
   }
Constants:
– values: String / Integer / Float / ...
– references: Field / Method / Class / ...
Attributes: ConstantValue, Code, Exceptions
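The fixed-size start of every class file (magic number 0xCAFEBABE, minor and major version, constant pool count) can be read with `DataInputStream`, since class files are big-endian. A sketch, assuming a hypothetical `ClassFileHeader` helper; it stops before parsing the constant pool entries themselves:

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    final int minor;
    final int major;
    final int constantPoolCount; // number of constant pool entries + 1

    private ClassFileHeader(int minor, int major, int constantPoolCount) {
        this.minor = minor;
        this.major = major;
        this.constantPoolCount = constantPoolCount;
    }

    // Reads magic (u4), minor_version (u2), major_version (u2), constant_pool_count (u2).
    static ClassFileHeader read(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in); // class files are big-endian
            if (data.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();
            int cpCount = data.readUnsignedShort();
            return new ClassFileHeader(minor, major, cpCount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ClassFileHeader SomeClass.class");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            ClassFileHeader h = read(in);
            System.out.println("version " + h.major + "." + h.minor
                    + ", constant_pool_count " + h.constantPoolCount);
        }
    }
}
```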

351 JVM: Class File Format
   class HelloWorld {
       public static void printHello() {
           System.out.println("hello, world");
       }

       public static void main(String[] args) {
           HelloWorld myHello = new HelloWorld();
           myHello.printHello();
       }
   }

352 JVM: Class File (Constant Pool)
   1. String "hello, world"
   2. Class HelloWorld
   3. Class java/io/PrintStream
   4. Class java/lang/Object
   5. Class java/lang/System
   6. Methodref HelloWorld.<init>()
   7. Methodref java/lang/Object.<init>()
   8. Fieldref java/io/PrintStream java/lang/System.out
   9. Methodref HelloWorld.printHello()
   10. Methodref java/io/PrintStream.println(java/lang/String)
   11. NameAndType <init> ()V
   12. NameAndType out Ljava/io/PrintStream;
   13. NameAndType printHello ()V
   14. NameAndType println (Ljava/lang/String;)V
   15. Unicode ()V
   16. Unicode (Ljava/lang/String;)V
   17. Unicode ([Ljava/lang/String;)V
   18. Unicode <init>
   19. Unicode Code
   20. Unicode ConstantValue
   21. Unicode Exceptions
   22. Unicode HelloWorld
   23. Unicode HelloWorld.java
   24. Unicode LineNumberTable
   25. Unicode Ljava/io/PrintStream;
   26. Unicode LocalVariables
   27. Unicode SourceFile
   28. Unicode hello, world
   29. Unicode java/io/PrintStream
   30. Unicode java/lang/Object
   31. Unicode java/lang/System
   32. Unicode main
   33. Unicode out
   34. Unicode printHello

353 JVM: Class File (Code)
Methods
0 <init>()
   0 ALOAD0
   1 INVOKESPECIAL [7] java/lang/Object.<init>()
   4 RETURN
1 PUBLIC STATIC main(java/lang/String[])
   0 NEW [2] HelloWorld
   3 DUP
   4 INVOKESPECIAL [6] HelloWorld.<init>()
   7 ASTORE1
   8 INVOKESTATIC [9] HelloWorld.printHello()
   11 RETURN
2 PUBLIC STATIC printHello()
   0 GETSTATIC [8] java/io/PrintStream java/lang/System.out
   3 LDC1 "hello, world"
   5 INVOKEVIRTUAL [10] java/io/PrintStream.println(java/lang/String)
   8 RETURN

354 JVM: Compilation – Pattern Expansion
Each bytecode is translated according to fixed patterns:
+ easy
- limited knowledge
Example (pseudocode):
   switch (o) {
       case ICONST:
           generate(push n); PC++; break;
       case ILOAD:
           generate(push off_n[FP]); PC++; break;
       case IADD:
           generate(pop -> R1);
           generate(pop -> R2);
           generate(add R1, R2 -> R1);
           generate(push R1);
           PC++; break;
       …
   }

355 JVM: Optimizing Pattern Expansion
Main idea: use an internal virtual stack
– virtual stack values are constants / fields / locals / array fields / registers / ...
– flush the virtual stack as late as possible
[Figure: compiling iload 4, iload 5, iadd, istore 6 with a virtual stack:
   iload 4: virtual stack = local 4 (no code emitted)
   iload 5: virtual stack = local 4, local 5 (no code emitted)
   iadd: emits MOV EAX, off 4[FP] and ADD EAX, off 5[FP]; virtual stack = EAX
   istore 6: emits MOV off 6[FP], EAX]

356 JVM: Compiler Comparison
Bytecode: iload_4, iload_5, iadd, istore_6
Pattern expansion (5 instructions, 9 memory accesses):
   push off4[FP]
   push off5[FP]
   pop EAX
   add 0[SP], EAX
   pop off6[FP]
Optimized (3 instructions, 3 memory accesses):
   mov EAX, off4[FP]
   add EAX, off5[FP]
   mov off6[FP], EAX
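Both compilation strategies can be sketched as tiny code generators over the four bytecodes of this comparison. The x86-like output strings mirror the slide's notation; `TinyJit` is a hypothetical name, and the virtual-stack version models a single register (EAX) and rejects expressions that would need spilling, which a real JIT would handle:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class TinyJit {
    // Naive pattern expansion: each bytecode always produces the same code.
    static List<String> patternExpansion(String[] bytecode) {
        List<String> code = new ArrayList<>();
        for (String insn : bytecode) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload":  code.add("push off" + p[1] + "[FP]"); break;
                case "iadd":   code.add("pop EAX"); code.add("add 0[SP], EAX"); break;
                case "istore": code.add("pop off" + p[1] + "[FP]"); break;
                default: throw new IllegalArgumentException(insn);
            }
        }
        return code;
    }

    // Virtual stack: remember WHERE each value lives (a local slot or EAX) and
    // emit code only when a value is consumed.
    static List<String> virtualStack(String[] bytecode) {
        List<String> code = new ArrayList<>();
        Deque<String> vstack = new ArrayDeque<>();
        for (String insn : bytecode) {
            String[] p = insn.split(" ");
            switch (p[0]) {
                case "iload":
                    vstack.push("off" + p[1] + "[FP]"); // no code emitted
                    break;
                case "iadd": {
                    String right = vstack.pop();
                    String left = vstack.pop();
                    if (right.equals("EAX")) {          // addition is commutative
                        right = left;
                        left = "EAX";
                    }
                    loadEax(left, vstack, code);
                    code.add("add EAX, " + right);
                    vstack.push("EAX");
                    break;
                }
                case "istore":
                    loadEax(vstack.pop(), vstack, code); // no memory-to-memory mov on x86
                    code.add("mov off" + p[1] + "[FP], EAX");
                    break;
                default: throw new IllegalArgumentException(insn);
            }
        }
        return code;
    }

    private static void loadEax(String operand, Deque<String> vstack, List<String> code) {
        if (operand.equals("EAX")) {
            return; // value is already in the register
        }
        if (vstack.contains("EAX")) {
            throw new UnsupportedOperationException("would need a second register (spilling not modeled)");
        }
        code.add("mov EAX, " + operand);
    }

    public static void main(String[] args) {
        String[] bytecode = {"iload 4", "iload 5", "iadd", "istore 6"};
        System.out.println(patternExpansion(bytecode)); // 5 instructions
        System.out.println(virtualStack(bytecode));     // 3 instructions
    }
}
```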

357 Linking (General)
A compiled program contains references to external code (libraries).
After loading the code, the system needs to link it to the library:
– identify the calls to external code
– locate the callees (and load them if necessary)
– patch the loaded code
Two options:
– the code contains a list of call sites for each callee
– the calls to external code are jumps to a procedure linkage table, which is then patched (double indirection)

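358 Linking (General)
[Figure: before linking. Option 1: the jumps at addresses 2, 5 and 7 are unresolved; the entry table lists proc 0 at site 5 and proc 1 at site 7, and each unresolved jump holds the next call site of the same procedure (site 7 points to site 2; "-" marks the end of a list). Option 2: the jumps at 2 and 7 go to table entry 101 and the jump at 5 goes to entry 100; the table entries 100 and 101 are still unresolved.]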

359 Linking (General)
[Figure: after linking. Option 1: the jumps at 2 and 7 are patched to jump &p1 and the jump at 5 to jump &p0. Option 2: the code is unchanged (jumps to 100 and 101); only the table is patched: 100: jump &p0, 101: jump &p1.]
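The two linking options can be simulated with an array of jump operands indexed by address. A sketch, assuming the threaded fixup chain from the first figure (each unresolved operand holds the next call site of the same procedure) and made-up procedure addresses 500 and 700; `LinkingSketch` is a hypothetical name:

```java
class LinkingSketch {
    static final int END = -1; // end of a fixup chain ("-" in the figure)

    // Option 1: walk the chain of call sites and patch each one with the final address.
    static void patchChain(int[] operands, int head, int target) {
        int site = head;
        while (site != END) {
            int next = operands[site]; // the unresolved operand holds the next site
            operands[site] = target;
            site = next;
        }
    }

    // Option 2: call sites jump to a fixed linkage-table slot; linking patches only
    // the table, and every call pays one extra indirection at run time.
    static int callTarget(int[] operands, int site, int[] linkageTable, int tableBase) {
        int slot = operands[site]; // e.g. 100 or 101
        return linkageTable[slot - tableBase];
    }

    public static void main(String[] args) {
        // Option 1, as in the figure: proc 0 -> site 5; proc 1 -> site 7 -> site 2.
        int[] ops = new int[11];
        ops[2] = END;
        ops[5] = END;
        ops[7] = 2;
        patchChain(ops, 5, 500); // hypothetical address of p0
        patchChain(ops, 7, 700); // hypothetical address of p1
        System.out.println(ops[2] + " " + ops[5] + " " + ops[7]); // 700 500 700

        // Option 2: sites 2 and 7 jump to slot 101, site 5 to slot 100.
        int[] plt = {500, 700}; // patched table: slot 100 -> p0, slot 101 -> p1
        int[] code = new int[11];
        code[2] = 101;
        code[5] = 100;
        code[7] = 101;
        System.out.println(callTarget(code, 5, plt, 100)); // 500
    }
}
```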

360 JVM: Linking
Bytecode interpreter:
– references to other objects are made through the JVM (e.g., invokevirtual, getfield, …)
Native code (ahead-of-time compiler):
– static linking
– classic native linking
JIT compiler:
– only some classes are compiled
– calls could reference classes that are not yet loaded or compiled (delayed compilation)
– code instrumentation

361 JVM: Methods and Fields Resolution
Methods and fields are accessed through special VM instructions (e.g., invokevirtual, getfield, …).
The parameters of the special instruction define the target; they are indexes into the constant pool.
The VM checks whether the call is legal and whether the target is present.

362 JVM: JIT – Linking and Instrumentation
Use code instrumentation to detect the first access to static fields and methods:
   class A { ... B.x ... }
   class B { static int x; }
The access B.x in A is compiled as
   CheckClass(B); B.x
where CheckClass(B) is
   IF ~B.initialized THEN Initialize(B) END

363 Compilation and Linking Overview
[Figure: C headers and C source are translated by the compiler into object files; the linker combines the object files into one object file; the loader produces the loaded code]

364 Compilation and Linking Overview
[Figure: Oberon source is compiled into object and symbol files; the loader/linker turns them into loaded modules]

365 Compilation and Linking Overview
[Figure: Java source is compiled into class files; the class loader, linker and JIT compiler produce loaded classes, which are accessible through the reflection API]

366 Jaos
Jaos (Java on Active Object System) is a Java virtual machine for the Bluebottle system.
Goals:
– implement a JVM for the Bluebottle system
– show that the Bluebottle kernel is generic enough to support more than one system
– interoperability between the Active Oberon and Java languages
– interoperability between the Oberon System and the Java APIs

367 Jaos (Interoperability Framework)
[Figure: Oberon source is compiled into object and symbol files and loaded by the Oberon loader/linker into loaded modules; class files are loaded by the class file loader, JIT compiler and linker into loaded classes. A common metadata loader, fed by the Oberon and Java metadata loaders, serves both the Oberon browser and the Java reflection API.]

368 JVM: Verification
The compiler generates good code... that could be changed before reaching the JVM, so verification is needed.
Verification makes the VM simpler (fewer run-time checks):
– no operand stack overflow
– loads / stores are valid
– VM types are correct
– no pointer forging
– no violation of access restrictions
– objects are accessed as what they are (type)
– local variables are initialized before being loaded
– …

369 JVM: Verification
Pass 1 (loading):
– class file version check
– class file format check
– class file is complete
Pass 2 (linking):
– final classes are not subclassed
– every class has a superclass (except Object)
– constant pool references are valid
– constant pool names are valid

370 JVM: Verification
Pass 3 (linking): bytecode verification
For each operation in the code (independent of the path):
– the operand stack size is the same
– accessed variable types are correct
– method parameters are appropriate
– fields are assigned values of the correct types
– opcode arguments are appropriate
Pass 4 (run time), delayed for performance reasons:
First time a type is referenced:
– load the type
– check access visibility
– initialize the class
First member access:
– the member exists
– the member type is as declared
– the current method has the right to access the member

371 JVM: Bytecode Verification
Verification checks that:
– branch destinations exist
– opcodes are legal
– only existing locals are accessed
– code does not end in the middle of an instruction
– types in the bytecode are respected
– execution cannot fall off the end of the code
– exception handler begin and end are sound

372 Addendum: Security

373 Security
Internal protection:
– memory protection
– file system accesses
External protection:
– accessibility
Problems:
– program threats

374 Security: Program Threats
Trojan horses: code segments that misuse their environment
– mail attachments
– web downloads (e.g., SEXY.EXE, which formats your hard disk)
– programs with the same name as common utilities
– misleading names (e.g., README.TXT.EXE)
Trap doors (in programs or compilers): intentional holes in the software

375 Security: System Threats
Worms: standalone programs that spawn other processes (copies of themselves), reducing system performance
– example: the Morris worm (1988) exploited holes in rsh, finger and sendmail to gain access to other machines; once on another machine, it replicated itself
– used by spammers to spread and distribute spamming applications
Viruses: similar to worms, but embedded in other programs
– they usually infect other programs and the boot sector

376 Security: System Threats
Denial of service:
– perform many requests to exhaust all available resources
– often distributed (using worms)
Example: SYN flooding attacks
– the attacker sends connection requests (SYN)
– the victim answers each one with a synchronize-acknowledge (SYN-ACK) packet
– and keeps state while waiting for an acknowledgment that never arrives
Countermeasures:
– active filtering
– request dropping
– cookie-based protocols (requests must be authenticated)
– stateless protocols
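A cookie-based defense lets the server answer a SYN without storing anything: the initial sequence number in the SYN-ACK is a keyed hash of the connection, and the final ACK must echo it plus one. A simplified sketch, assuming HMAC-SHA256 over the client address, the ports and a coarse time slot; real SYN cookies (e.g., in Linux) use a different encoding that also carries the MSS, and `SynCookies` is a hypothetical class:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SynCookies {
    private final SecretKeySpec key;

    SynCookies(byte[] secret) {
        key = new SecretKeySpec(secret, "HmacSHA256");
    }

    // Cookie = first 32 bits of an HMAC over the connection and a time slot.
    int cookie(String clientAddr, int clientPort, int serverPort, long timeSlot) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            mac.update(clientAddr.getBytes(StandardCharsets.UTF_8));
            mac.update(ByteBuffer.allocate(16).putInt(clientPort).putInt(serverPort).putLong(timeSlot).array());
            return ByteBuffer.wrap(mac.doFinal()).getInt();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // On the final ACK: recompute the cookie instead of looking up stored state.
    boolean validAck(String clientAddr, int clientPort, int serverPort, long timeSlot, int ackNumber) {
        return ackNumber == cookie(clientAddr, clientPort, serverPort, timeSlot) + 1;
    }

    public static void main(String[] args) {
        SynCookies server = new SynCookies("server secret".getBytes(StandardCharsets.UTF_8));
        long slot = System.currentTimeMillis() / 64_000;       // coarse time slot
        int isn = server.cookie("192.0.2.1", 40000, 80, slot); // sent in the SYN-ACK
        System.out.println(server.validAck("192.0.2.1", 40000, 80, slot, isn + 1)); // true
    }
}
```

Because nothing is stored between the SYN-ACK and the ACK, a flood of SYNs no longer fills a table of half-open connections; the price is that a forged ACK must be rejected by recomputation alone.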

377 Security: System Threats
Badly designed and implemented software:
– lpr (setuid) with an option to delete the printed file
– mkdir (first creates the inode, then changes the owner): it was possible to change the inode before the chown …
– buffer overflows
– passwords in memory or swap files
– insecure protocols (FTP, SMTP)
– missing sanity checks (syscalls, commands in input, …)
– short keys and passwords
– proprietary protocols

378 Bad Design: A Very Recent Example
Texas Instruments produces RFID tags offering cryptographic functionality:
– used for cars and electronic payments
– 40-bit keys
– proprietary protocol
Attack by Johns Hopkins University and RSA Labs:
– less than 2 hours for 5 keys
– less than $3500

379 Security: Buffer Overflows
Overwrite a function's return address:
   void foo(int p1, int p2) {
       char array[10];
       strcpy(array, someinput); // no length check: longer input overwrites FP and RET
   }
[Figure: stack frame with array, saved FP, return address RET, then p1 and p2; writing past the end of array overwrites FP and RET]
Avoid strcpy and check the length, e.g., with strncpy.

380 Security: Monitoring
Check for suspicious patterns (e.g., login times)
Audit logs
Periodic scans for security holes (bad passwords, set-uid programs, changes to system programs)
– system integrity checks (checksums for executable files) [Tripwire]
Network services:
– monitor network activity
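A Tripwire-style integrity check boils down to recording cryptographic checksums of system files and later reporting any mismatch. A minimal sketch using SHA-256 from `java.security.MessageDigest`; `IntegrityChecker` is a hypothetical name, and in practice the baseline would be kept on read-only media so that an intruder cannot update it:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class IntegrityChecker {
    // Hex-encoded SHA-256 digest of the given bytes.
    static String sha256(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Record the baseline checksums of the monitored files.
    static Map<Path, String> baseline(List<Path> files) {
        Map<Path, String> sums = new HashMap<>();
        for (Path p : files) {
            sums.put(p, sha256(read(p)));
        }
        return sums;
    }

    // Report every file whose current checksum differs from the baseline.
    static List<Path> changed(Map<Path, String> baseline) {
        List<Path> result = new ArrayList<>();
        for (Map.Entry<Path, String> e : baseline.entrySet()) {
            if (!sha256(read(e.getKey())).equals(e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    private static byte[] read(Path p) {
        try {
            return Files.readAllBytes(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : args) { // e.g. /bin/ls /bin/login
            System.out.println(sha256(read(Paths.get(name))) + "  " + name);
        }
    }
}
```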

381 Example: Firewalling
Many applications use network sockets to communicate (even on a single machine).
Many applications are not protected.
Solution: filter all incoming connections by default and allow only the trusted ones.

382 Security: (some) Design Principles
– Open systems (programs and protocols): security must not depend on keeping the design secret
– Deny access by default
– Check for current authority on every access (timeouts, …)
– Grant the least privilege possible
– Keep protection mechanisms simple
– Do not ask too much of the users (or they will stop protecting themselves)

383 Security and Systems: Some Examples
Enhancements to memory management: Intel XD bit, AMD NX bit
– pages are marked according to their content (data or code)
– an exception is raised if the CPU tries to execute instructions from a page marked as data (non-executable)
– prevents some buffer overflow attacks (code injected into the stack or heap cannot be executed)
– dynamically generated code (e.g., from JIT compilers) must be made executable explicitly through special system calls
– supported by Windows XP SP2, Linux, BSD, …
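With NX/XD enabled, a program that generates code at run time must request executable memory explicitly. A minimal POSIX sketch, assuming mmap and mprotect play the role of the "special system calls" (on Windows the counterparts are VirtualAlloc and VirtualProtect); load_generated_code is hypothetical. The page is first writable but not executable, then executable but no longer writable.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Place generated machine code into a fresh page and switch the page from
   read+write to read+execute. Returns the page or NULL on failure. */
void *load_generated_code(const unsigned char *code, size_t len)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    void *page;

    if (len == 0 || len > pagesz)
        return NULL;
    page = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return NULL;
    memcpy(page, code, len);             /* writable, not executable */
    if (mprotect(page, pagesz, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, pagesz);
        return NULL;                     /* e.g., refused by a strict policy */
    }
    return page;                         /* executable, no longer writable */
}
```

Calling the code would require casting the page to a function pointer, which is platform-specific and therefore left out here.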

384 Security and Systems: Some Examples
SELinux
– developed by the National Security Agency (USA)
– patches to the Linux kernel to enforce mandatory access control
– open source
– independent of the traditional UNIX roles (users and groups)
– configurable policies restricting what each program is allowed to do
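In SELinux-style type enforcement, every process and object carries a type label, and an access is allowed only if the policy contains a matching rule, regardless of user and group IDs. The toy lookup below assumes hypothetical type names and a three-permission model; the real SELinux policy language and kernel security server are considerably richer.

```c
#include <stddef.h>
#include <string.h>

/* Permissions a policy can grant (bit mask). */
enum perm { P_READ = 1, P_WRITE = 2, P_EXEC = 4 };

/* Processes of type 'subject' may access objects of type 'object'
   with the permissions in 'perms'. */
struct allow_rule {
    const char *subject;
    const char *object;
    unsigned    perms;
};

/* Hypothetical policy: the web server may read its content and write its log. */
static const struct allow_rule policy[] = {
    { "webserver_t", "web_content_t", P_READ },
    { "webserver_t", "web_log_t",     P_READ | P_WRITE },
};

/* Mandatory check: depends only on the labels and the policy, not on file
   owners or the process uid. Anything not explicitly allowed is denied. */
int mac_allowed(const char *subject, const char *object, unsigned requested)
{
    size_t i;

    for (i = 0; i < sizeof policy / sizeof policy[0]; i++)
        if (strcmp(policy[i].subject, subject) == 0 &&
            strcmp(policy[i].object, object) == 0)
            return (policy[i].perms & requested) == requested;
    return 0;
}
```

Even if the web server runs as root and is compromised, it cannot write its content files or read unrelated files, because no rule grants that.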

385 Security and Systems: Some Examples
OpenBSD
– audit process (proactive bug search)
– random gaps in the stack
– ProPolice: gcc places a random integer (canary) on the stack in the function prologue and checks it before returning
– W^X: pages are either writable or executable, never both
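The ProPolice check can be illustrated by simulating a stack frame in an ordinary array: the "prologue" stores a canary right after a local buffer, and the "epilogue" verifies it. This is only a model with a hypothetical frame layout and a fixed canary; the real check is emitted by the compiler (gcc's -fstack-protector options) and uses a random value.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Fixed canary for the demo; ProPolice uses a random value chosen at startup. */
static const uint64_t CANARY = 0x9e3779b97f4a7c15ULL;

/* Models a function with a 16-byte local buffer followed by the canary,
   which sits between the locals and the saved frame pointer / return address.
   Copies n bytes without checking them against BUF_SIZE (as strcpy would),
   then runs the epilogue check. Returns 1 if the canary survived, 0 if not. */
int call_with_canary(const unsigned char *input, size_t n)
{
    unsigned char frame[BUF_SIZE + sizeof(uint64_t)];
    uint64_t check;

    if (n > sizeof frame)                /* keep the simulation itself in bounds */
        n = sizeof frame;
    memcpy(frame + BUF_SIZE, &CANARY, sizeof CANARY);   /* prologue */
    memcpy(frame, input, n);                            /* unchecked copy */
    memcpy(&check, frame + BUF_SIZE, sizeof check);     /* epilogue */
    return check == CANARY;  /* compiled code would abort the program here */
}
```

An overflow that reaches the return address must first pass the canary, so it is detected before the corrupted return address is used.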

386 Security and Systems: Some Examples
OpenBSD
– randomized order and addresses of shared libraries
– mmap() and malloc() return randomized addresses
– guard pages between objects
– privilege separation and revocation
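A guard page is a page mapped with no access rights next to an object, so that running off the end of the object faults immediately instead of silently corrupting its neighbor. A minimal POSIX sketch; alloc_with_guard is hypothetical and much simpler than OpenBSD's malloc (it does not randomize placement, align the object, or free memory).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate len usable bytes that end exactly where an inaccessible guard
   page begins. Returns a pointer to the object or NULL. */
void *alloc_with_guard(size_t len)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (len + pagesz - 1) / pagesz;
    size_t total = (data_pages + 1) * pagesz;
    unsigned char *base;

    if (len == 0)
        return NULL;
    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    /* Last page: no access at all. */
    if (mprotect(base + data_pages * pagesz, pagesz, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    return base + data_pages * pagesz - len;
}
```

With p = alloc_with_guard(100), writing p[0] to p[99] works, while writing p[100] touches the guard page and raises SIGSEGV.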

387 Privilege Separation
An unprivileged child process contains and restricts the effects of programming errors, e.g., in OpenSSH:
[Figure (timeline): privileged monitor listens on *22 → network connection → fork unprivileged child (network processing: key exchange, authentication; request auth → monitor → auth result) → state export → fork user child (user network data; request PTY → monitor → pass PTY)]
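The skeleton of this design: a privileged monitor forks an unprivileged child, the child handles untrusted input, and it can obtain privileged results only by asking the monitor over a socket. The sketch assumes a POSIX system, a hypothetical one-line request format, and hard-coded credentials, and it only marks where the privilege drop would happen; it is not OpenSSH's monitor protocol.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical credential check, run only by the privileged monitor
   (a real one would consult protected data such as the shadow file). */
static int monitor_check(const char *req)
{
    return strcmp(req, "auth alice secret") == 0;
}

/* The child forwards a login request to the monitor and gets a yes/no answer.
   Returns 1 if authenticated, 0 if rejected, -1 on error. */
int privsep_login(const char *request)
{
    int sv[2], status;
    pid_t pid;
    char buf[128];
    ssize_t n;

    /* Datagram socket pair: each write arrives as one complete message. */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {                          /* unprivileged child */
        char answer;
        close(sv[0]);
        /* A real implementation would chroot() and switch to an unprivileged
           uid here, before parsing any network data. */
        if (write(sv[1], request, strlen(request)) < 0 ||
            read(sv[1], &answer, 1) != 1)
            _exit(2);
        _exit(answer == 'Y' ? 0 : 1);        /* exit status carries the result */
    }

    close(sv[1]);                            /* privileged monitor */
    n = read(sv[0], buf, sizeof buf - 1);
    buf[n > 0 ? n : 0] = '\0';
    if (write(sv[0], monitor_check(buf) ? "Y" : "N", 1) != 1)
        kill(pid, SIGKILL);                  /* child would otherwise wait forever */
    close(sv[0]);

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

A bug in the child's request parsing then yields, at worst, an unprivileged process that can only send requests the monitor is willing to answer.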

