Drinking from the Firehose Inter-Process Communication (IPC)


1 Drinking from the Firehose Inter-Process Communication (IPC)
Number twelve of a series Drinking from the Firehose Inter-Process Communication (IPC) on the Mill CPU Family

2 Talks in this series
Encoding; The Belt; Memory; Prediction; Metadata and speculation; Execution; Security and reliability; Specification; Software pipelines; The Compiler; Switches; Inter-process Communication (you are here); Threading; Wide data. Slides and videos of other talks are at: millcomputing.com/docs

3 The Mill CPU
The Mill is a new general-purpose commercial CPU family. The Mill has a 10x single-thread area/power/performance gain over conventional out-of-order superscalar architectures, yet runs the same programs, without rewrite. This talk will explain: symmetric distrust; protection for mutual paranoia; avoiding translation; secure argument passing; OS “features” that hinder IPC.

4 The Mill ISA
Mill is wide-issue: 30+ MIMD ops per cycle. Mill is statically scheduled: no issue hazards or OOO. Mill is exposed pipeline: all ops have fixed latency. Mill has integrated vectors: all scalar ops are vector too. Mill has hardware SSA: no general registers.

5 Caution! Gross over-simplification!
This talk tries to convey an intuitive understanding to the non-specialist. The reality is more complicated. (We try not to over-simplify, but sometimes…)

6 Mechanism vs. policy This talk is about mechanism – how Mill IPC works. It is not about policy – how the mechanism is used. The Mill is a general-purpose CPU architecture. It is not a Unix machine. It is not a Windows, …, machine. It is not a C machine. It is not a Java, …, machine. It is a platform in which each of those can implement their own policies and models. To the extent that they have them.

7 Motivating example 1: buggy drivers
Device drivers need access to special parts of memory to make the device work: MMIO, on-device buffers, etc. They shouldn’t have access to the OS or application state. Ideally, each driver should be its own process, with the relevant device-specific memory regions mapped in. [Diagram: application, OS, driver, device.] Clean, simple, and too expensive.

8 Motivating example 2: sandboxing
Your browser downloads and decodes images. Media decoders are typically written in memory-unsafe languages such as C/C++ and have historically had vulnerabilities that could be exploited. Ideally, each image should be decoded in its own process to isolate it from the rest of the browser. [Diagram: browser, decoder.] Clean, simple, and too expensive.

9 Some philosophy
Security must be unobtrusive, unavoidable, and cheap, or it won’t be used.

10 Some philosophy
All must have equal security, none more equal than others. No pigs on this farm. This directly conflicts with the monolithic Unix model.

11 The Mill protection model
You can see only what I give you. I can see only what you give me. Fast, cheap, no third parties.

12 The Image Decoder example
The browser needs to decode an image. Solution: call the decode function in the image decoding library. Problem: the image exploits a vulnerability in the library, and your browser is pwned. Solution: we need to sandbox the image decoder in a separate process. [Diagram: browser, data, decoder.]

13 The Image Decoder example
The browser is now protected from the decoder. But data has to be copied back and forth by the kernel. Or, if using shared memory (shudder), messages to coordinate decoding still have to be sent back and forth. If it were cheap on a conventional machine, we’d already do it! Solution: get the kernel and the copies out of the way! [Diagram: browser, kernel, data, decoding sandbox.]

14 Services
Threads need to get work done that they don’t know how to do, using their own data. Solution: call a function from a library. But sometimes the function has its own data, and doesn’t want your fingers on it! A service is private data that exports an API. [Diagram: an app with its data calls mix, one of the API functions (stir, bake, mix, fry, boil) in front of the private data.]

15 Services
But the service data must be PRIVATE! So we need a wall… But now we can’t call the API, so… But now the API can’t use its data… So we need someone to carry the data across the wall. [Diagram: app and its data on one side of the wall; the API (stir, bake, mix, fry, boil) and the private data on the other.]

16 Inter-process Communication
That’s how conventional IPC works. Your app gives data to the OS, who goes over the wall between processes and gives it to the server process. And back and forth, and back, and forth… It’s fabulously expensive! [Diagram: app and its data, the wall, and the server’s API (stir, bake, mix, fry, boil) with its private data.]

17 On a Mill…
There is no thread or process over the wall. The OS is not involved. Orders of magnitude less overhead. And it’s cheap enough to use. [Diagram: app and its data, the wall, and the API (stir, bake, mix, fry, boil) with its private data.]

18 On a Mill…
[Diagram labels: “this is the server” (private data); “this is the permission hardware”; “these are portals” (the API: stir, bake, mix, fry, boil); “this is the client” (app data).]

19 So who does the work?
With no process over there, where’s the server thread? The client thread goes through the portal… and becomes the server thread. This is a portal call. [Diagram: the app calls mix through the API (stir, bake, mix, fry, boil) into the private data.]

20 Different guise, same old thread
In machine terms, “client thread becomes server thread” is the same as “thread running in client turf becomes thread running in server turf”. It cannot access client data when running in server, and cannot access server data when running in client. [Diagram: the same thread crossing from the app in the client turf to mix in the server turf.]

21 Permissions
Mill uses permissions to control access. The permissions are kept in tables in memory and cached by hardware. Each permission has a range (in bytes), rights (e.g. r, w, x), and other stuff. A turf is a bundle of permissions with an ID. Permissions can overlap. [Diagram: address space with overlapping ranges marked r and rw; a permission record with fields range, rights, stuff.]
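The range-plus-rights model on this slide can be pictured with a small software sketch. This is only an illustration: on the Mill the check is done by hardware against cached permission tables, and the names `perm_t`, `turf_t`, `turf_allows` and the `RIGHT_*` bits are hypothetical, not the Mill's encoding. As a simplification, the sketch requires one permission to cover the whole access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Rights bits; illustrative names, not the Mill's actual encoding. */
enum { RIGHT_R = 1, RIGHT_W = 2, RIGHT_X = 4 };

/* One permission: a byte range plus the rights it carries. */
typedef struct {
    uint64_t base;    /* first byte covered */
    uint64_t length;  /* number of bytes covered */
    unsigned rights;  /* OR of RIGHT_* bits */
} perm_t;

/* A turf: an ID and a bundle of permissions, which may overlap. */
typedef struct {
    unsigned id;
    const perm_t *perms;
    size_t count;
} turf_t;

/* True if some single permission of the turf covers [addr, addr+len)
 * and carries every right in `want`. */
static bool turf_allows(const turf_t *t, uint64_t addr, uint64_t len,
                        unsigned want)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->count; i++) {
        const perm_t *p = &t->perms[i];
        /* Overflow-safe form of: addr + len <= base + length. */
        bool inside = addr >= p->base && len <= p->length &&
                      addr - p->base <= p->length - len;
        if (inside && (p->rights & want) == want)
            return true;
    }
    return false;
}
```

With a turf holding a read-only range and an overlapping read-write range, a write is allowed only where the read-write permission covers it; everything outside both ranges is refused.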

22 Turfs
A turf is a collection of memory permissions: code it can execute, memory it can read and write, portals it can call. A turf is a Mill concept; turfs are orthogonal to threads. A process is an OS concept; a process is a collection of permissions and a collection of OS threads. E.g. the kernel can be a collection of turfs, with device drivers isolated in dedicated turfs. E.g. the browser can be a collection of turfs, with image decoders isolated in dedicated turfs. A thread can change turf by making a portal call.

23 What about the OS?
The operating system is an application, like any other. There are no privileged operations. There is no Supervisor Mode. All protection is by memory address. Byte address.

24 Turfs
The kernel may be split into many turfs, e.g. to isolate drivers. Even normal programs may be split into many turfs. For example, a web server may sandbox each request in its own turf, and keep its private keys in another turf. [Diagram: httpd with several request turfs and a private key turf.]

25 Mill call and return operations
On the Mill, all saving and restoring of state for a call or return is automatic and atomic. There are various flavors of call op and retn op. A call op takes an argument for the number of results expected, the address of a function, and arguments to be passed to the function. The retn op returns 0 or more results.

26 Portals
A thread can portal call between turfs. It works just like a normal call. The thread can only use the permissions of the current turf, and the call arguments. Fabulously fast IPC! [Diagram: httpd requests calling decrypt(buf) through the API of the private key turf.]

27 Portal Calls
A portal is just like any other code address, except that the current turf does not have eXecute permission. Instead, the current turf has Portal permission, and the address points to a Portal struct. The hardware changes turf and calls the actual address. [Diagram: stack frames for turf A and turf B; the portal leads to the actual address in turf B.]

28 The normal thread stack
The data stack is a list of frames. Every time you call a function, the function gets a new frame. Frames need link pointers to track which function to return to. Hazards: stack overflow! Return-oriented programming! [Diagram: a stack frame containing the return address.]

29 The Mill secure thread stack
The data stack is a list of frames. Every time you call a function, the function gets a new frame. Frames need link pointers to track which function to return to. Hardware manages a secure call stack, inaccessible to programs. And data access is checked against the turf’s permissions. [Diagram: a secure stack holding return addresses and the turf change, beside a data stack with frames for turf A and turf B; a disallowed access from turf B faults.]
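The portal call and the secure stack described above can be modelled in a few lines of software. This is a sketch under stated assumptions, not the hardware mechanism: `portal_t`, `thread_t`, `portal_call` and the fixed-size `saved_turf` array are hypothetical stand-ins for the Portal struct and the hardware-managed secure stack.

```c
#include <assert.h>
#include <stddef.h>

#define SECURE_STACK_MAX 16

typedef struct thread thread_t;
typedef int (*entry_fn)(thread_t *self, int arg);

/* Hypothetical portal: the turf to switch to and the actual code address. */
typedef struct {
    unsigned target_turf;
    entry_fn entry;
} portal_t;

struct thread {
    unsigned turf;                          /* turf the thread runs in now */
    unsigned saved_turf[SECURE_STACK_MAX];  /* stand-in for the secure stack */
    size_t depth;
};

/* Same thread, different turf: save the caller's turf where the program
 * cannot touch it, switch, run the callee, and restore on return. */
static int portal_call(thread_t *t, const portal_t *p, int arg)
{
    assert(t->depth < SECURE_STACK_MAX);
    t->saved_turf[t->depth++] = t->turf;
    t->turf = p->target_turf;
    int result = p->entry(t, arg);
    t->turf = t->saved_turf[--t->depth];
    return result;
}

/* Example service: encodes the turf it observed while running. */
static int report_turf(thread_t *self, int arg)
{
    return (int)self->turf * 100 + arg;
}
```

Calling `portal_call` from a thread in turf 17 through a portal targeting turf 20 runs `report_turf` in turf 20 and leaves the thread back in turf 17 afterwards. No second thread exists at any point.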

30 Spillets
The spiller holds the secure stack in a spillet in memory. Spiller frames contain the saved belt and the instruction pointers to restore on function return. When a spillet is full, an overflow spillet is allocated by the OS from a heap. [Diagram: a spillet and an overflow spillet; there are many more spiller frames than shown.]

31 Spillet Array
The highest part of the virtual address space is reserved for a 2D array of spillets, indexed by thread and by turf. (Only active spillets need actual physical memory.) [Diagram: a grid with a threads axis and a turfs axis; the highlighted cell is spillet [thread 9] [turf 17].]
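Because the spillets form a 2D array, locating one is plain index arithmetic. The sketch below assumes a row-major layout with thread as the row index; the slide does not give the region base, the array dimensions, or the spillet size, so `spillet_layout_t` and all of its fields are hypothetical parameters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout parameters; none of these values come from the talk. */
typedef struct {
    uint64_t base;           /* start of the reserved spillet region */
    uint64_t turf_count;     /* number of turf columns per thread row */
    uint64_t spillet_bytes;  /* virtual size of one spillet slot */
} spillet_layout_t;

/* Address of spillet [thread][turf], assuming row-major order by thread. */
static uint64_t spillet_addr(const spillet_layout_t *l,
                             uint64_t thread, uint64_t turf)
{
    assert(turf < l->turf_count);
    return l->base + (thread * l->turf_count + turf) * l->spillet_bytes;
}
```

Since the address is a pure function of the two IDs, no lookup table is needed to find a spillet, and slots that are never touched never need physical memory.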

32 Portal calls
Thread 9 runs the httpd request in turf 17; the private key is in turf 20. The call to decrypt(buf) is a portal call:

    void handle_request(packet* pkt) {
        decrypt(pkt->buf, pkt->buflen);
        ...
    }

    void decrypt(void* buf, size_t len) {
        ...
        return;
    }

33 Arguments
So how do you talk across the wall? By value, or by reference. Cannot access client data when running in server. Cannot access server data when running in client.

34 Arguments
So how do you talk across the wall? By value. Easy, right? Just put the arguments in the registers! For func(x, y, z): r1 = x; r2 = y; r3 = z; call func. But what is in r4?

35 Arguments
So how do you talk across the wall? By value. What about results? For return w: r1 = w; retn. But what is in r2?

36 Arguments
So how do you talk across the wall? By value. Get the compiler to clear unused registers? If you remember… If it remembers… If nothing messes up… Put not your trust in compilers, for they shall do exactly what you tell them to do, which is never what you wanted.

37 Arguments
So how do you talk across the wall? By value. Easy, right? Not quite (except on a Mill).

38 By value, on the belt
A call has the same belt effects as an op like add. A call can drop multiple results. [Diagram: the caller’s belt, positions b0 to b7; the op call func,b1,b5,b3,b3 names belt positions as arguments; the callee has its own belt; retn b4 returns a value, which drops onto the caller’s belt.]

39 Accessing data
Problem: turf B cannot access the image that needs decoding! [Diagram: a data stack with frames for turf A and turf B, where turf B’s access faults; a secure stack with return addresses and the turf change.]

40 Passing by reference
write(int fd, void* buf, size_t count) [Diagram: address space.] How can the callee use the pointer argument buf (without being able to use anything else)?

41 Passing by reference
Callers grant transient permissions to callees: grant(void* addr, size_t leng, int rights). [Diagram: address space with a granted range marked r; a permission record with fields range, rights, stuff.]

42 Passing by reference
Callers grant transient permissions to callees: grant(void* addr, size_t leng, int rights). The granted permission lets the callee use the memory in the same way that the caller could have. [Diagram: address space with the granted range marked r, labelled “transient permissions”.]

43 The transient permission table
The table is in the thread’s spillet, on top of the saved frames. Spillets reserve 2^50 bytes: spillet space is 1/1024 of virtual space. No physical memory is allocated unless used. The spillet is at spillets[turfId, threadId]. [Diagram: virtual memory with the spillets region; within a spillet, the spiller stack, the current frame, and the top of stack (tos).]

44 Accessing data redux
Problem: turf B cannot access the image that needs decoding! So turf A pushes a permission for the data, which turf B can use, just before making the portal call. This permission is transient: it is automatically revoked when the portal returns, and it can only be used by the current thread. [Diagram: a data stack with frames for turf A and turf B; a secure stack with return addresses, the turf change, and the pushed permissions.]
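The push-before-call, revoke-on-return discipline can be sketched as a small per-thread stack of grants. This is a software model under stated assumptions: the slides show `grant(void* addr, size_t leng, int rights)`, whereas the version here takes an explicit table argument so it can run standalone, and `transient_table_t`, `transient_allows` and `revoke_since` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { RIGHT_R = 1, RIGHT_W = 2 };  /* illustrative rights bits */
#define MAX_GRANTS 8

typedef struct { uint64_t base, length; unsigned rights; } grant_t;

/* Per-thread table of transient permissions (modelled as a stack). */
typedef struct {
    grant_t grants[MAX_GRANTS];
    size_t count;
} transient_table_t;

/* Caller side: push a transient permission just before the portal call. */
static void grant(transient_table_t *tt, uint64_t addr, uint64_t len,
                  unsigned rights)
{
    assert(tt->count < MAX_GRANTS);
    tt->grants[tt->count++] = (grant_t){ addr, len, rights };
}

/* Callee side: is [addr, addr+len) covered by a live grant with `want`? */
static bool transient_allows(const transient_table_t *tt, uint64_t addr,
                             uint64_t len, unsigned want)
{
    for (size_t i = 0; i < tt->count; i++) {
        const grant_t *g = &tt->grants[i];
        if (addr >= g->base && len <= g->length &&
            addr - g->base <= g->length - len &&
            (g->rights & want) == want)
            return true;
    }
    return false;
}

/* Portal return: every grant pushed since `mark` is revoked at once. */
static void revoke_since(transient_table_t *tt, size_t mark)
{
    assert(mark <= tt->count);
    tt->count = mark;
}
```

A caller records the table height, grants read access to the image buffer, makes the call, and on return the table is cut back to the recorded height, so the callee cannot keep the permission.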

45 Syscalls
Syscalls are just portals. They are not limited to a single syscall portal with a dispatch switch statement: each syscall can have a dedicated portal. And those portals can go straight to isolated driver turfs. A write to a file may use a different portal than a write to a socket, etc. [Diagram: app turf, kernel turf, and a bluetooth driver; syscall portals such as read, write, and pair.]

46 Issues: FORK

47 Hierarchy from 40,000 ft.
The Mill uses virtual caching and the single address space model. View is representative. Actual hierarchy is configured in each chip specification. [Diagram: CPU core with decode, load/store FUs, and retire stations; Harvard level 1 caches I$0e, I$0f, I$1e, I$1f, and D$1, with iPLB and dPLB; shared level 2 cache L$2; TLB; device controllers, devices, MMIO, DRAM, ROM.]

48 Hierarchy from 40,000 ft.
The Mill uses virtual caching and the single address space model. [Diagram: the same hierarchy divided into a virtual-address part (retire stations, load/store FUs, eI$0, fI$0, eI$1, fI$1, D$1, iPLB, dPLB, L$2) and a physical-address part beyond the TLB (device controllers, devices, MMIO, DRAM, ROM).]

49 Memory model
Traditional: program addresses must be translated to physical addresses before being looked up in cache. [Diagram: a load operation sends a virtual address from the CPU through the TLB (translation/protection, may fault), marked as the bottleneck, to a physical cache; data lines return to the regs.] Mill: all tasks use the same virtual addresses, no aliasing or translation across tasks or OS. [Diagram: a load operation sends a virtual address from the CPU to the cache, with protection checked by the PLB (may fault); data lines return to the belt.]

50 Why put translation in front of the cache?
Different programs must overlap addresses (aliasing) to fit in 32-bit memory. Translation gives each program private memory, even while using the same bit patterns as pointers. The cost: on the critical path, TLBs must be very fast, small, and power-hungry, and are frequently multilevel. Big programs can see 20% or more TLB overhead. [Diagram: the traditional load path, with the TLB between the CPU and a physical cache marked as the bottleneck.]

51 Why put translation after the cache?
The TLB is out of the critical path, only referenced on cache misses and evicts; it can be big, single-level, and low power. Pointers can be passed to the OS or other tasks without translation, which simplifies sharing and protection for apps. Protection checking is done in parallel with cache access. All tasks use the same virtual addresses, no aliasing or translation across tasks or OS. [Diagram: the Mill load path, with the virtual address going from the CPU to the cache, the PLB checking protection, and data lines returning to the belt.]

52 Semantics of fork()
fork() creates a child process by copying the parent. Internal pointers must still work, but refer to the child. External pointers refer to the same data as in the parent. How do you tell the difference? [Diagram: parent and child (a copy), each holding the pointer values 0xbaa and 0xf00; some are local pointers, others are global pointers to global shared data.]

53 The Mill Pointer
The Mill is a 64-bit machine. Pointers are not integers: pointers have special semantics. Fields: a 60-bit address, 1 local bit, and 3 barrier mask bits (useful for GC etc.). Local bit: 0 = global, address is absolute; 1 = local, address is relative to turf. There are dedicated ops for pointers, e.g. addp.
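The pointer format lends itself to a short bit-field sketch. Everything about exact placement here is an assumption made for illustration: the address is put in bits 0 to 59, the local bit at bit 60, and the barrier mask in bits 61 to 63. The `turf_base` parameter used to resolve local pointers and the rule that `ptr_add` leaves the upper bits untouched are likewise guesses at what "relative to turf" and a dedicated op like addp might mean, not documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field placement: address in bits 0..59, local bit at 60,
 * barrier mask in bits 61..63. */
#define ADDR_BITS 60
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define LOCAL_BIT (UINT64_C(1) << 60)

static uint64_t ptr_make(uint64_t addr, bool local, unsigned barrier_mask)
{
    assert(addr <= ADDR_MASK && barrier_mask < 8);
    return addr | (local ? LOCAL_BIT : 0) | ((uint64_t)barrier_mask << 61);
}

static uint64_t ptr_address(uint64_t p)  { return p & ADDR_MASK; }
static bool     ptr_is_local(uint64_t p) { return (p & LOCAL_BIT) != 0; }
static unsigned ptr_barrier(uint64_t p)  { return (unsigned)(p >> 61); }

/* Global pointers are absolute; local ones are taken relative to a
 * per-turf base (hypothetical parameter). */
static uint64_t ptr_resolve(uint64_t p, uint64_t turf_base)
{
    uint64_t a = ptr_address(p);
    return ptr_is_local(p) ? (turf_base + a) & ADDR_MASK : a;
}

/* Pointer add in the spirit of addp: only the address field changes,
 * the local bit and barrier mask are preserved (assumed semantics). */
static uint64_t ptr_add(uint64_t p, int64_t offset)
{
    uint64_t a = (ptr_address(p) + (uint64_t)offset) & ADDR_MASK;
    return (p & ~ADDR_MASK) | a;
}
```

Under this model a forked child can keep the same local pointer bits as its parent: resolving them against the child's base lands in the child's copy, while global pointers resolve to the same shared address in both.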

54 Want more?
Sign up for technical announcements, white papers, etc.:

