How the OS facilitates programming languages & compilers (with a focus on the Linux kernel) Eric Powders (ejp2127)


1 How the OS facilitates programming languages & compilers (with a focus on the Linux kernel) Eric Powders (ejp2127)

2 Motivation / Background: I worked for many years doing application development. I came to Columbia to get a background in systems programming & learn how things work at a low level:
–the Linux environment/kernel (my background: Windows; no kernel hacking)
–compilers
–architecture
–networking

3 Meeting with Professor Aho: I met with Professor Aho, & we chatted about a list of potential topics. I mentioned Linux as an active learning area; Prof. Aho suggested looking at:
–the interface between PL&Cs & OSes
–how the OS facilitates programming languages & compilers
Is there really a topic here?

4 Started brainstorming… How does the kernel launch a program/process? What efficiencies does an OS implement when launching a process? How does an OS execute programs compiled for other OSes? How do programs interface with the OS to obtain/release memory? Upon forking, how does an OS decide whether to run the child or the parent? How does an OS execute a non-binary file? Many, many more…

5 Why is this important? Doesn't the OS magically handle stuff for us? Avoid pitfalls (such as I/O data sitting in buffers when you thought it was on disk). Understand kernel options you can change (system calls & parameters, configuring/tuning the kernel, etc). Know when to handle something in the application instead of relying on the OS (the OS can't handle every situation optimally!). Know when to try your hand at kernel hacking (or someone on your team). I have a new appreciation for the OS.

6 Focus: This presentation focuses on Linux & C/C++ because that’s what I know best. OSes tend to copy each other; when something works, others follow. So much of this probably applies to other OSes, and much of it probably applies to other programming languages as well.

7 Topic selection: I had to narrow down to very few topics (after all, there are two semester-long courses on OS!). I tried to choose topics that an application developer would care about: topics related to programming, languages, and compilers; not extremely subtle, hidden, or overly complex topics that don’t affect an application developer.

8 4 Topics
1. Running a program: the execution context that the kernel establishes
2. Forking a process: behind the scenes of fork() & exec()
3. Memory management Part I: the heap: behind the scenes of new/malloc
4. Memory management Part II: virtual memory & paging: managing a process's address space

9 Disclaimer #1: I'm new to the Linux kernel & kernel hacking. Some of you probably know way more than me about Linux! I probably won’t be able to answer many of your questions! (Though feel free to offer your own insights & experiences from working with Linux or other OSes.)

10 Disclaimer #2: The Linux kernel is constantly changing, so anything I discuss today might already be out of date! Someone might even be releasing a kernel update right now as we speak.

11 RUNNING A PROGRAM

12 What happens when you run a program (launch a process)? The kernel is responsible for setting up the execution context, and the kernel has to be flexible:
–different executable formats
–shared libraries
–command-line arguments
–environment variables

13 Let’s take a step back to compiling. Each high-level source code file is transformed into an object file containing machine code. Then the linker collects all the object files & builds the executable. The linker also checks for libraries to include, and “glues” them in (more on this in a moment). The startup code that gcc links in invokes the exit_group() system call when main() returns; exit_group() terminates the program & hands control back to the OS.

14 “Glue-ing” in libraries. Old way: static libraries. The executable produced by the linker includes the libraries themselves (so big executables; wasteful). New way: shared libraries. The executable merely contains a reference to the library name (not the library itself).
–At run-time, the dynamic linker makes the libraries available to the program (more on this shortly)
Advantage of using shared libraries? Disadvantage of using shared libraries?

15 Using shared libraries. Benefit: reduces memory footprint because libraries can be loaded once & shared among programs. Disadvantage: different systems might have different versions of the shared libraries! Passing “-static” to gcc requests static linking.

16 At run-time, the dynamic linker determines which shared libraries are needed. If the libraries aren’t yet in memory, it creates memory regions to hold them & maps the libraries into the process’s address space. It updates all references to library function names in the code, so the program knows where to find them. The dynamic linker finishes by setting the eip register (the “instruction pointer”) to the entry point of the new program, so execution jumps to the program’s start-up code & eventually main().
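
(A related mechanism is also exposed to programs directly. Here is a minimal C sketch, assuming libm.so.6 is present on the system, that loads a shared library by hand at run-time with dlopen()/dlsym(); the automatic case described above is handled by the dynamic linker at program start. Compile with -ldl on older glibc.)

    /* Minimal sketch: loading a shared library by hand at run-time. */
    #include <stdio.h>
    #include <dlfcn.h>

    int main(void) {
        void *handle = dlopen("libm.so.6", RTLD_LAZY);   /* map libm (example library) into our address space */
        if (!handle) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        double (*cosine)(double) = (double (*)(double))dlsym(handle, "cos");
        if (cosine) printf("cos(0) = %f\n", cosine(0.0));

        dlclose(handle);   /* drop our reference to the mapping */
        return 0;
    }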

17 The program’s address space is partitioned into segments: the text segment (the program’s machine code), the initialized data segment, the uninitialized data segment (also called bss), the stack, & the heap. The size limit of each segment is determined by the kernel; you can use the ulimit shell command to alter them (e.g., reduce the heap & increase the stack if you use a lot of recursion). Command-line arguments & environment variables are placed on the user stack when the program is invoked.
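
(A rough illustrative C program, not from the slides, that prints the addresses of a few objects so you can see the segments. Exact values vary from run to run because of address-space randomization, but the relative layout should be visible.)

    #include <stdio.h>
    #include <stdlib.h>

    int initialized = 42;        /* initialized data segment */
    int uninitialized;           /* uninitialized data (bss) */

    int main(int argc, char *argv[]) {
        int on_stack;                              /* stack */
        int *on_heap = malloc(sizeof *on_heap);    /* heap */

        printf("text  (main)        %p\n", (void *)main);
        printf("data  (initialized) %p\n", (void *)&initialized);
        printf("bss   (uninit)      %p\n", (void *)&uninitialized);
        printf("heap  (malloc)      %p\n", (void *)on_heap);
        printf("stack (local)       %p\n", (void *)&on_stack);
        printf("argv  (near stack)  %p\n", (void *)argv);

        free(on_heap);
        return 0;
    }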

18 How does the OS execute a non-binary file?

19 How does the OS execute a non-binary file? Before executing, the system checks the file’s first 2 bytes looking for “#!” (the “shebang”).
–As a Windows programmer, for many years I wondered what a shebang was
If it finds that, it reads the entire first line looking for the pathname of an interpreter, then attempts to invoke that interpreter to execute this file.
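For example (assuming a Python interpreter installed at /usr/bin/python3), a script whose first line is #!/usr/bin/python3 is executed by the kernel invoking /usr/bin/python3 with the script’s pathname passed to it as an argument.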

20 How does Linux run files compiled for other OSes? If it’s a similar OS & there are only minor differences (e.g., how signals are numbered), the kernel can iron out the differences. If there are big differences, you need to run an emulator (e.g., DOSemu, Wine).
–Adapters: basically, emulators intercept the API / system calls, & typically translate them into & invoke the corresponding Linux API / system calls

21 FORKING A PROCESS

22 Review: forking & exec*()-ing. The only way to launch a new program (process) is via fork() (that’s what the shell or the UI does when you run a program). fork() duplicates (clones) the currently-running process: the original is called the “parent”; the new one is called the “child”. A typical use is to fork() the current process, then immediately call exec*() in the child, which replaces it with a new program/process. execl, execv, execle, execlp, & execvp are library (API) calls, all of which invoke execve(), the only actual “exec” system call.

23 Example: typing “ls -l” in the shell. When you type ls in the shell… the shell issues a fork(), so now you have 2 shells (a parent shell & a child shell). We don’t really want 2 shells, so the child shell immediately issues an exec*(…, “ls”, “-l”, …). This invokes the execve() system call, which replaces the current process (the child shell) with the “ls” program/process. The “parent shell” continues to exist (as “the shell”), while the child shell (now the “ls” program) runs to completion & terminates.
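
(A minimal C sketch of what the shell does for “ls -l”, using fork(), the execlp() library wrapper, & waitpid(); error handling is abbreviated.)

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        pid_t pid = fork();              /* clone the current process */
        if (pid < 0) { perror("fork"); exit(1); }

        if (pid == 0) {                  /* child: replace ourselves with ls */
            execlp("ls", "ls", "-l", (char *)NULL);
            perror("execlp");            /* only reached if exec failed */
            _exit(127);
        }

        waitpid(pid, NULL, 0);           /* parent ("the shell") waits for ls */
        printf("child %d finished\n", (int)pid);
        return 0;
    }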

24 Now let’s look behind the scenes… fork() semantics: parent & child should contain the same data, but should have separate copies. Upon fork(), the parent’s data is copied over to the child, but the child has a distinct address space: if the child alters a variable, it doesn’t change in the parent process. They retain a parent-child relationship: the parent can find & talk to its children; children can find & talk to their parent via its pid (getppid()). Is this efficient?
–Example: the shell. Did we really need to duplicate the shell just to run “ls”?
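
(A small C sketch, not from the slides, demonstrating that a change made by the child is invisible to the parent.)

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void) {
        int x = 1;
        pid_t pid = fork();

        if (pid == 0) {          /* child */
            x = 99;              /* triggers copy-on-write of this page */
            printf("child:  x = %d\n", x);   /* 99 */
            _exit(0);
        }

        wait(NULL);
        printf("parent: x = %d\n", x);       /* still 1 */
        return 0;
    }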

25 The kernel implements some fork() & exec*() efficiencies. After fork(), the child & parent share the same program text (marked as read-only).
–Each process has its own instruction pointer into the text
Copy-on-write (COW): the child & parent share all data until one of them changes some datum’s value.
–Then, the kernel clones only that page of data
If the parent issues multiple forks in a row, all the children & the parent will share the same data…

26 COW implementation. The kernel marks “shared” COW pages as read-only. If the parent or child tries to write to a shared COW page, it forces a trap to the kernel. The first thing the kernel checks upon this trap: do we have a protection-violation fault, or do we just need to un-share these pages? The kernel keeps a reference count of the number of processes sharing each page of data (it can un-share one process at a time).
–Un-share = copy the data to a new page & mark it writable; decrease the ref count on the original copy; once the ref count reaches 1, the remaining owner’s page can simply be marked writable

27 What exactly happens upon exec*()? When you issue exec*(), most of the “old process” resources are automatically discarded, closed, etc. The stack, heap, & all data are wiped. Except:
–The process ID (pid) remains the same
–All open file descriptors remain open (unless the close-on-exec flag is set: O_CLOEXEC passed to open(), or FD_CLOEXEC set via fcntl())
(That’s why COW: what a waste it would be to copy over code & data upon fork(), only to wipe it on exec!)
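
(A short sketch of the two standard ways to mark a descriptor close-on-exec, so it does not survive an exec*(); /etc/hostname is just an arbitrary example file.)

    #include <fcntl.h>
    #include <unistd.h>

    int main(void) {
        int fd1 = open("/etc/hostname", O_RDONLY | O_CLOEXEC);   /* closed automatically on exec */

        int fd2 = open("/etc/hostname", O_RDONLY);                /* inherited across exec... */
        fcntl(fd2, F_SETFD, FD_CLOEXEC);                           /* ...unless we set this flag */

        /* without these flags, fd1/fd2 would remain open in the exec'd program */
        close(fd1);
        close(fd2);
        return 0;
    }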

28 After fork(), who runs first? Child or parent? Which is better?

29 After fork(), who runs first? Child or parent? Which is better? We now have 2 processes: should the parent or the child run first? The child should! The child will often exec() & discard the data, but if the parent runs first then any data it changes requires copying the page first! The parent should! It probably has a lot of data it needs in the cache; if the child runs first, it’ll probably thrash the cache (especially if it exec()s a new process)! Bottom line: it’s indeterminate, though you can hack the kernel if this is important to you.

30 MEMORY MANAGEMENT PART 1: THE HEAP

31 How does the kernel allocate memory? Programs are allocated memory space upon start, & can request additional free store as needed. When the program requests memory (e.g., malloc), the C memory manager provides it from its free store. When the free store runs low, the C memory manager requests more memory from the OS. The C memory manager typically requests more than it needs from the OS (system calls are slow) to avoid frequent future requests. The C memory manager maintains a reserve that could dry up at any time (even during a critical real-time routine).

32 What does it mean to dynamically allocate memory via new/malloc? The “program break” is the heap’s current limit (the “end of the heap”). new/malloc = adjusting the program break. brk() is the only system call that adjusts the program break; malloc, calloc, realloc, & new ultimately call brk() (glibc may also use mmap() for large allocations). A program can call brk() directly, but all it does is move the program break; it doesn’t manage a free list.
–malloc/new manage the free list for you…
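
(A small C sketch, using the sbrk() library wrapper around brk(), that watches the program break move; purely illustrative.)

    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        void *before = sbrk(0);          /* current end of the heap */
        char *p = sbrk(4096);            /* grow the heap by one page; returns the old break */
        void *after = sbrk(0);

        printf("break before: %p\n", before);
        printf("break after:  %p\n", after);

        p[0] = 'x';                      /* the new bytes are ours to use, but nobody is */
                                         /* managing a free list for us: that's malloc's job */
        return 0;
    }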

33 Example:
    int* a = new int;   // assume returns addr 1,000
    int* b = new int;   // assume returns addr 1,004
    delete a;           // frees addr 1,000
    int* c = new int;   // can re-assign addr 1,000
You could have called brk() instead and asked for 8 bytes, but then you’d have to manually remember which bytes hold useful data! It could be useful to use brk(), however, if you ever need to build your own memory allocator.
–new/delete are very expensive

34 Speaking of building your own memory allocator… Linux has a “slab allocator” that sets aside “slabs” sized to commonly-used structs. Every time a process starts (& the kernel needs a new task_struct), it can grab a slab of that exact size from the slab allocator instead of asking the memory allocator to hunt around for a chunk of that size.
–e.g., divide a 4KB page into sixty 68-byte slabs
More efficient: avoids hunting for properly-sized chunks; avoids fragmentation. You can build a slab allocator for yourself if you ever find yourself needing a lot of same-sized chunks (see the sketch below).
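
(A minimal sketch of such a do-it-yourself slab-style allocator: carve one page into equal-sized objects & hand them out from a free list. The object & page sizes are just the numbers from this slide; the kernel’s real slab allocator is far more elaborate.)

    #include <stddef.h>

    #define OBJ_SIZE   68                      /* size of the struct we allocate a lot of */
    #define PAGE_SIZE  4096
    #define NOBJS      (PAGE_SIZE / OBJ_SIZE)  /* 60 objects per page */

    static unsigned char pool_page[PAGE_SIZE];
    static void *free_slots[NOBJS];
    static int   free_top;

    void pool_init(void) {                     /* carve the page into fixed-size slots */
        for (int i = 0; i < NOBJS; i++)
            free_slots[free_top++] = pool_page + i * OBJ_SIZE;
    }

    void *pool_alloc(void) {                   /* O(1): no hunting for a fit */
        return free_top ? free_slots[--free_top] : NULL;
    }

    void pool_free(void *obj) {                /* push the slot back onto the free list */
        free_slots[free_top++] = obj;
    }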

35 How malloc/new work. As discussed, malloc typically asks brk() for more memory than it needs & stores it on a free list to avoid slow system calls to the OS. malloc/new allocate extra bytes to hold the size of the block (& possibly other metadata) so they know how much to release when you call free/delete. If you declare a sequence of ints on the stack, they will typically be 4 bytes apart; on the heap, they’re 64 bytes apart on my system! Dynamic allocation of small objects is very wasteful. Another good reason to build your own memory allocator.
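
(A quick way to see this overhead for yourself; the exact spacing depends on your allocator, so treat the comment as an expectation, not a guarantee.)

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    int main(void) {
        char *prev = malloc(sizeof(int));
        for (int i = 0; i < 4; i++) {
            char *cur = malloc(sizeof(int));
            long gap = (long)((intptr_t)cur - (intptr_t)prev);
            printf("gap between allocations: %ld bytes\n", gap);  /* typically much more than 4 */
            prev = cur;
        }
        return 0;
    }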

36 alloca(): a function I didn’t know existed that allocates memory dynamically on the stack. It increases the size of the stack frame by modifying the value of the stack pointer. Why might this be useful?

37 Why might alloca() be useful? If you’re inside a function & suddenly need to grab a bunch of memory (that you won’t need outside of the function)… & don’t want to waste time manually freeing the memory. Why? Because the entire chunk will be freed once you exit the function (because the stack frame disappears). Also, if you’re planning to do a longjmp, you don’t need to worry about memory leaks; again, the memory will be freed when the stack frame is released. Also, it’s faster than new because there’s no searching. Beware though: you could exceed the stack boundary!
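
(A small sketch of alloca() providing scratch space that vanishes when the function returns; alloca() is non-standard but available on Linux via <alloca.h>.)

    #include <alloca.h>
    #include <stdio.h>
    #include <string.h>

    void make_greeting(const char *name) {
        char *buf = alloca(strlen(name) + 8);   /* stack space: freed on return, no free() call */
        strcpy(buf, "Hello, ");
        strcat(buf, name);
        puts(buf);
    }                                            /* buf is gone here */

    int main(void) {
        make_greeting("world");
        return 0;
    }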

38 What happens when you release memory back to the kernel? When you issue free/delete, the C memory manager typically holds the pages on its free list; it doesn’t release them to the kernel. But at some point your program terminates and the entire memory space is released back to the kernel. Which brings us to our next topic…

39 MEMORY MANAGEMENT PART 2: VIRTUAL MEMORY & PAGING

40 Virtual memory. Virtual memory & paging is a very large topic; we’re only going to focus here on a couple of items that relate more closely to PL&C. As mentioned, programs are allocated memory space by the kernel upon start. The kernel allocates memory in units of pages (typically a page is 4KB).

41 Growing the memory space. A program can acquire more pages (if available) as needed. However, you’re battling other programs, data, & caches for a finite memory space. Why acquire more memory? Examples:
–Memory mapping: you can map a file’s contents into the process’s address space for more efficient I/O
–IPC: you can create a shared memory region to communicate with another process
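
(A minimal C sketch of the memory-mapping case: mapping a file read-only into the address space so reads become plain memory accesses. /etc/hostname is just an example file, & error handling is abbreviated.)

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void) {
        int fd = open("/etc/hostname", O_RDONLY);     /* arbitrary example file */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        fstat(fd, &st);

        /* ask the kernel for pages backed by the file's contents */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); return 1; }

        fwrite(data, 1, st.st_size, stdout);          /* the file, read via memory */

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }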

42 The kernel’s “free list”. The kernel groups consecutive free pages together, & maintains lists of 1, 2, 4, 8, …, 1024 consecutive pages. Example: a process needs 6 more page frames.
–The kernel searches the “8” list; if it finds something, it assigns 6 to the process & moves the other 2 onto the “2” list
–Else it searches the “16” list; if it finds something, it assigns 6 to the process, moves 8 pages to the “8” list, & the final 2 pages to the “2” list
What are the advantages of this approach? What are the disadvantages?

43 Zoned page frame allocator. Advantage: a more efficient way to allocate memory (instead of manually traversing a free-store list looking for the perfect size). Downside: the “16” example from the previous slide.
–Assigns 6; splits the remaining 10 into 8 + 2
–A future request for 10 pages won’t find these!
–Fragments memory; can get a premature out-of-memory condition (we have 10 consecutive pages but the kernel doesn’t know it)
I think the emphasis is on quick allocation (& de-allocation), rather than low-on-memory situations, because there’s so much memory! If memory constraints are important to you (e.g., an embedded device), you might want to hack the kernel.

44 What happens when programs release page frames back to the kernel? Remember, the kernel keeps lists of 1, 2, …, 1024 consecutive free pages. If you release 5 pages, the kernel splits them into 4 + 1, & puts the 4 pages on the “4” list & the 1 page on the “1” list. Buddy system: the kernel tries to merge each chunk with its “buddy” (neighbor). For example, it tries to merge the chunk of 4 pages with its neighbor chunk of 4 pages so it can store them on the “8” list, to minimize fragmentation.

45 Who is my buddy? If we’re releasing 4 pages back to memory, starting at address 10101010101010101010, who is my buddy? (Technically every chunk has 2 neighbors, of course, but the kernel only considers 1 of them your “buddy”.) The buddies are mutually exclusive; they are each other’s buddy, & ignore their other neighbors.
–(Imagine houses on a street)
Who is my buddy?

46 Who is my buddy? If we’re releasing 4 pages back to memory, starting at address 10101010101010101010, our buddy starts at 10101110101010101010. 4 pages is a 16KB chunk, so we can “flip on” the 16KB bit to find our neighbor (in essence, moving forward 16KB). If the bit were already on, we’d flip the bit off, meaning our buddy is right behind us.
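
(The bit-flip is just an XOR with the block size. A tiny C sketch reproducing this slide’s example; in the real kernel the computation is done on page frame numbers (buddy_pfn = pfn ^ (1 << order)), but the idea is the same.)

    #include <stdio.h>

    unsigned long buddy_of(unsigned long addr, unsigned long block_size) {
        return addr ^ block_size;      /* block_size must be a power of two */
    }

    int main(void) {
        unsigned long addr = 0xAAAAA;      /* 1010 1010 1010 1010 1010 in binary */
        unsigned long size = 4 * 4096;     /* 4 pages = 16KB */
        printf("buddy is at 0x%lX\n", buddy_of(addr, size));   /* 0xAEAAA = 1010 1110 1010 1010 1010 */
        return 0;
    }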

47 So then what? We flip the bit to find our buddy’s address, and check whether it’s on the free-store list of the same size. If it is, we “combine” with it and together we move up to the “8” list. This can be applied iteratively: now we search for our buddy on the “8” list by flipping the next bit to the left. Of course, our buddy might not be in the free store. Or… our buddy might be on a different-sized free-store list, e.g., the “2” list, which does us no good; we can’t combine to make 4 + 2 = 6.

48 Pros & cons of the buddy system. Benefits:
–A very efficient method of finding your neighbor so you can combine chunks & reduce fragmentation
–Can be applied iteratively to build larger chunks
Downsides:
–You’re only checking 1 neighbor; 2 consecutive 4-page chunks can sit on the free list & won’t be merged if they’re not each other’s buddy
–We still have the 4 + 2 fragmentation problem from earlier: a consecutive 4-page chunk and 2-page chunk remain unmerged because we don’t store 6-page chunks
If memory constraints are important to you…

49 That’s it for now… Hopefully you learned a thing or 2 that you might find useful in the future… Questions?
