Copyright 2013 – Noah Mendelsohn Compiling C Programs Noah Mendelsohn Tufts University Web:

1 Compiling C Programs
Noah Mendelsohn, Tufts University
Email: noah@cs.tufts.edu
Web: http://www.cs.tufts.edu/~noah
COMP 40: Machine Structure and Assembly Language Programming (Fall 2014)
Copyright 2013 – Noah Mendelsohn

2 Today
• Much of this material is well covered in the course lecture notes
• Here we just present a few diagrams and samples
© 2010 Noah Mendelsohn

3 How do we get from source to executable program?

4 Executable files
• Executable file:
  – A single file with all code ready to run at a fixed address in memory
  – Typically the same address for all programs
• Requirements
  – Code divided into multiple source files (.c files and .h files)
  – Functions in shared .c files need to show up in lots of executables
  – Often we want to share only the compiled versions (.o files) [you don’t have the source for printf(), but you use it all the time]
• The challenge
  – In different executables using the same shared code…
  – … the same functions and global variables may wind up at different addresses …
  – … but we still need to make references work across source files
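To make the cross-file requirement concrete, here is a minimal sketch that adds a hypothetical global variable `call_count` next to `sum()`. The declarations are what other .c files see; the definitions are what the linker places at some address and then patches references to. Both parts are shown in one listing so it compiles on its own; in a real program they would live in a .h file and a .c file.

```c
/* ---- arith.h (hypothetical version with a shared global) ----
   Declarations: they give other .c files the names and types,
   but create no storage and no code. */
extern int call_count;       /* a global variable defined in some .c file */
int sum(int a, int b);       /* a function defined in some .c file */

/* ---- arith.c ----
   Definitions: exactly one .c file provides each of these. The compiler
   puts them in arith.o; the linker picks their final addresses and fixes
   up every reference to them from the other .o files. */
int call_count = 0;

int sum(int a, int b)
{
    call_count++;            /* every caller, in any file, updates this one variable */
    return a + b;
}
```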

5 Resolving external references
two_plus_one.c:
    #include <stdio.h>
    int main(int argc, char *argv[])
    {
        printf("The sum is %d\n", sum(1,2));
    }
arith.c:
    int sum(int a, int b)
    {
        return a+b;
    }
[Diagram: the executable two_plus_one contains both the call to sum(1,2) from main() and the code for sum().]
How do we know where sum() wound up?

6 From source code to executable (simplified)
arith.c (int sum(int a, int b) { return a+b; })
    gcc -c arith.c  →  arith.o: relocatable object code for sum()
two_plus_one.c (#include <stdio.h>; main() calls printf("The sum is %d\n", sum(1,2)))
    gcc -c two_plus_one.c  →  two_plus_one.o: relocatable object code for main()

7 From source code to executable (simplified)
gcc -c arith.c  →  arith.o: relocatable object code for sum()
gcc -c two_plus_one.c  →  two_plus_one.o: relocatable object code for main()
Relocatable .o files:
• Contain machine code
• References within the file are resolved
• References to external files are not resolved
• Some address fields may need adjusting later, depending on the final location in the executable program
• Include lists of:
  1) Names and addresses of defined externals
  2) Names and referents of things needing relocation

8 Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
two_plus_one.o (relocatable object code for main()) + arith.o (relocatable object code for sum())  →  executable program two_plus_one

9 Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
gcc actually runs a program named “ld” to create the executable.

10 Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
To create the executable:
• Code from all .o files is collected in one executable
• A fixed load address is assumed
• All references are resolved; code and variables are updated

11 Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
The executable contains all the code, with references resolved, loadable at a fixed address. It is ready to be invoked using the exec() family of system calls or from the command line [which uses exec()].
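As a minimal POSIX sketch of what a shell does when you type a program name (the helper `run_program` is hypothetical), the parent forks a child, the child replaces itself with the executable via execvp(), and the parent waits for the exit status.

```c
#define _POSIX_C_SOURCE 200809L   /* expose fork(), execvp(), waitpid() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start the program named by argv[0] (looked up on
   PATH), wait for it to finish, and return its exit status, or -1 if it
   could not be started or did not exit normally. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {
        /* Child: execvp() replaces this process image with the program
           loaded from the executable file; it returns only on failure. */
        execvp(argv[0], argv);
        _exit(127);                 /* conventional "command not found" status */
    }
    int status;                     /* parent: wait for the child */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `char *args[] = {"ls", "-l", NULL}; run_program(args);` runs `ls -l` much as typing it at the shell would.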

12 Linking .o files to create executable
gcc -o two_plus_one two_plus_one.o arith.o
The default name for an executable (what gcc produces when no -o option is given) is a.out, so programmers sometimes informally refer to any executable as an “a.out”.

13 We left out two important steps!

14 Preprocessor
    #include <stdio.h>
    #define TWO 2
    int main(int argc, char *argv[])
    {
        printf("The sum is %d\n", sum(1,TWO));
    }
Before the compiler even sees the code…
…the preprocessor rewrites the code, handling all #define, #include, #ifdef and macro substitution…
These are gone before the compiler sees the code.
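The rewriting can be seen in a small sketch. The macro names `SQUARE` and `TRACE`, the `DEBUG` switch and the function `square_of_sum_with_two` are illustrative assumptions, not from the slides; running `gcc -E` on this file shows the text the compiler actually receives, and compiling with `-DDEBUG` turns the trace on.

```c
#include <stdio.h>

#define TWO 2                      /* object-like macro: plain text substitution */
#define SQUARE(x) ((x) * (x))      /* function-like macro; parentheses guard precedence */

#ifdef DEBUG                       /* compile with -DDEBUG to enable tracing */
#define TRACE(msg) fprintf(stderr, "trace: %s\n", (msg))
#else
#define TRACE(msg) ((void)0)       /* expands to a do-nothing expression */
#endif

int sum(int a, int b) { return a + b; }

int square_of_sum_with_two(int x)
{
    TRACE("square_of_sum_with_two called");
    /* After preprocessing, the next line reads essentially:
           return ((sum(x, 2)) * (sum(x, 2)));
       so sum() is called twice: a macro argument is pasted as text,
       not evaluated once like a function argument. */
    return SQUARE(sum(x, TWO));
}
```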

15 Preprocessor used for sharing declarations
arith.h:
    int sum(int a, int b);
two_plus_one.c:
    #include <stdio.h>
    #include "arith.h"
    int main(int argc, char *argv[])
    {
        printf("The sum is %d\n", sum(1,2));
    }
arith.c:
    #include "arith.h"
    int sum(int a, int b)
    {
        return a+b;
    }
Caller and callee agree on the function prototype for sum().
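Here is the arith.h / arith.c half of that arrangement as a single compilable sketch, with the header text pasted inline where `#include "arith.h"` would go. The include guard (`ARITH_H`) is a common convention that the slide does not show; it makes a header safe to include more than once in the same .c file.

```c
/* ---- arith.h (shown inline; normally a separate file) ---- */
#ifndef ARITH_H          /* include guard: skip the body if already seen */
#define ARITH_H

int sum(int a, int b);   /* prototype shared by caller and callee */

#endif /* ARITH_H */

/* ---- arith.c ----
   In the real file this would begin with:  #include "arith.h"
   Because the prototype is visible here, the compiler checks that this
   definition matches it, just as it checks each call in two_plus_one.c. */
int sum(int a, int b)
{
    return a + b;
}
```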

16 We also left out the assembler step
• The object code in a .o file is binary (not human-readable)
• Assembly language is a human-readable form of machine code
  – Symbolic names for machine instructions
  – Symbolic labels for addresses (like variables and branch targets in code)
  – Etc.
• When you run gcc -c it actually does three steps:
  – Run the preprocessor
  – Run the compiler itself to create an assembler file
  – Run the assembler to create a .o file
  – Normally we do these steps together, but you can use switches to run them separately

17 Common invocations of gcc
• gcc -c two_plus_two.c
  Runs the preprocessor, compiler & assembler to make two_plus_two.o
• gcc -c arith.c
  Same: makes arith.o
• gcc -o two_plus_two two_plus_two.o arith.o
  Uses ld to link the .o files + system libraries to make the two_plus_two executable
• gcc -E two_plus_two.c
  Runs just the preprocessor (the result goes to standard output)
• gcc -S two_plus_two.c
  Runs just the preprocessor & compiler, producing assembler in a .s file
• gcc -c two_plus_two.s
  Notices the .s extension and runs just the assembler
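To see what `gcc -S` produces, compile this file with `gcc -S -O1 arith.c` and open arith.s. The assembly in the comment is what gcc typically emits for this function on x86-64 Linux; treat it as approximate, since the exact text varies with the gcc version, target and options.

```c
/* arith.c: compile with   gcc -S -O1 arith.c   to get arith.s.
   On x86-64 Linux the body of sum in arith.s is typically close to:

       sum:
           leal    (%rdi,%rsi), %eax   # eax = a + b (a arrives in edi, b in esi)
           ret

   together with assembler directives such as  .globl sum , which marks
   sum as a defined external that other .o files may refer to. At the
   default -O0 the code is longer: it also stores the arguments on the
   stack and reloads them. */
int sum(int a, int b)
{
    return a + b;
}
```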

18 Putting it All Together

19 Compiling a program
[Diagram: each source file goes through the same pipeline. two_plus_one.c (main(), which calls printf("The sum is %d\n", sum(1,2))) and arith.c (int sum(int a, int b) { return a+b; }) each pass through Preprocessor (cpp) → preprocessed source → Compiler (cc1) → assembler source → Assembler (as) → .o file. The Linker (ld) then combines the two .o files into two_plus_one (executable).]

20 Shared Libraries (not required for COMP 40)
(These slides on shared libraries were used in COMP 111; you may find them interesting to read.)

21 Ooops! Where does printf come from?
gcc -o two_plus_one two_plus_one.o arith.o libc.a
two_plus_one.o (relocatable object code for main()) + arith.o (relocatable object code for sum())  →  executable program two_plus_one
Routines like printf live in libraries.

22 Ooops! Where does printf come from?
gcc -o two_plus_one two_plus_one.o arith.o
Routines like printf live in libraries. These are created with the “ar” command, which packages up several .o files together into a “.a” archive or library. You can list the .a file along with your separate .o files (normally after them, since ld searches an archive only for symbols that are still undefined at that point), and ld will pull from it any .o files it needs.
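A minimal sketch of building and using a static library, assuming a Unix-like toolchain. The library name libmylib.a, the files max.c, min.c and prog.o, and the function mylib_max() are hypothetical; only the gcc and ar commands in the comment are standard.

```c
/* max.c: one member of a hypothetical static library libmylib.a.

   Build sketch (shell commands):
       gcc -c max.c min.c                  # makes max.o and min.o
       ar rcs libmylib.a max.o min.o       # r: add/replace, c: create, s: write index
       gcc -o prog prog.o libmylib.a       # ld copies in only the members prog needs

   If prog.o calls mylib_max() but nothing in min.o, only max.o is copied
   into prog; min.o stays behind in the archive. */
int mylib_max(int a, int b)
{
    return a > b ? a : b;
}
```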

23 Ooops! Where does printf come from?
gcc -o two_plus_one two_plus_one.o arith.o
printf used to live in the system library named libc.a, which the compiler links automatically into the executable (so you don’t have to list it).

24 Why shared libraries?
• Problem: if printf is linked from libc.a, then we get a separate copy in each program that uses printf
• Idea: what if we could have one copy and use memory mapping to put it into every running program that needs it?
• Challenges:
  – We can’t link it when ld builds the rest of the executable: we can just note that we need it
  – The same copy is likely to be mapped at different addresses in different programs

25 Why shared libraries?
• Solution: compiler, linker and OS work together to support shared libraries
  – gcc -c -fPIC printf.c  generates “position-independent code” that can load at any address
  – gcc -shared -o libc.so printf.o xxx.o obj3.o  creates a shared library
  – gcc -o two_plus_one two_plus_one.o arith.o libc.so
We’ll use printf as an example even though it’s built into the system… Compile the source with -fPIC to make a position-independent .o file.

26 Why shared libraries?
gcc -shared -o libc.so printf.o xxx.o obj3.o
Link that printf.o and any other files with the -shared option to create a shared library (.so) file.

27 Why shared libraries?
gcc -o two_plus_one two_plus_one.o arith.o libc.so
The linker recognizes .so files… instead of including the code, it leaves a little stub that tells the OS to find and map the shared copy of the .so file when exec loads the program. (Actually, libc.so is so widely used that it’s automatically linked, so you don’t need to list it as you would your own .so libraries.)
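The same kind of library member can instead be packaged as a shared library. This sketch assumes gcc on Linux; libcounter.so, counter.c, prog.o and counter_next() are hypothetical names. The comment applies the build steps from the slides to it and shows one way to tell the loader where to find a library that is not in a system directory.

```c
/* counter.c: a member of a hypothetical shared library libcounter.so.

   Build sketch (shell commands):
       gcc -c -fPIC counter.c                   # position-independent counter.o
       gcc -shared -o libcounter.so counter.o   # the shared library
       gcc -o prog prog.o -L. -lcounter         # records the dependency only
       LD_LIBRARY_PATH=. ./prog                 # let the loader find libcounter.so

   When several programs use libcounter.so, the code of counter_next()
   comes from one shared copy in memory, but each process gets its own
   private copy of the writable variable calls. */
static int calls = 0;

int counter_next(void)
{
    return ++calls;
}
```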

28 Memory mapping allows sharing of .so libraries
[Diagram: main memory holds the operating system and two processes, Angry Birds and a video-playing Browser. Each has its own stack, text (code), static initialized data, static uninitialized data, heap (malloc’d), argv and environ, and a mapping of libc.so.]
libc.so (with the printf code) shows up at different locations in the two programs.

29 Memory mapping allows sharing of .so libraries
[Diagram: the same two processes, with both libc.so mappings pointing to a single copy in main memory.]
Only one copy lives in memory… everyone shares it!

30 Memory mapping allows sharing of .so libraries
[Diagram: as on slide 29, one copy of libc.so in main memory mapped into both processes.]
Memory mapping hardware can do this… Code must be position-independent!
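Position independence is a property of machine code, but a data-structure analogy in C may help show why it matters. This sketch (the `struct blob` type and its functions are hypothetical) stores the same reference two ways: as an absolute pointer, valid only at the address where it was created, and as an offset from the start of the block, valid wherever the bytes are copied or mapped. Position-independent code plays a similar trick, largely by addressing things relative to the current instruction, plus a table of addresses filled in at load time for things outside the library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical example: a block of bytes that may be copied (or mapped)
   to any address, holding a reference to its own name in two ways. */
struct blob {
    size_t name_offset;     /* position-independent: offset from the blob's start */
    const char *name_ptr;   /* position-dependent: absolute address */
    char data[16];
};

void blob_init(struct blob *b, const char *name)
{
    strncpy(b->data, name, sizeof b->data - 1);
    b->data[sizeof b->data - 1] = '\0';
    b->name_offset = offsetof(struct blob, data);
    b->name_ptr = b->data;  /* correct only while the blob stays at this address */
}

/* Finds the name relative to wherever the blob currently is. */
const char *blob_name(const struct blob *b)
{
    return (const char *)b + b->name_offset;
}
```

If the bytes are copied with `memcpy(&copy, &orig, sizeof orig)`, then `blob_name(&copy)` points into the copy, while `copy.name_ptr` still points into the original: the same problem an absolute address causes when one library is mapped at different addresses in different programs.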

