Embedding Assembly Code in C Programs תרגול 7 שילוב קוד אסמבלי בקוד C.

Slides:

Advertisements

Similar presentations

Chapter 11 Introduction to Programming in C

Advertisements

SYMBOL TABLES &CODE GENERATION FOR EXECUTABLES. SYMBOL TABLES Compilers that produce an executable (or the representation of an executable in object module.

Software & Services Group, Developer Products Division Copyright© 2010, Intel Corporation. All rights reserved. *Other brands and names are the property.

CPU Review and Programming Models CT101 – Computing Systems.

INSTRUCTION SET ARCHITECTURES

What is a pointer? First of all, it is a variable, just like other variables you studied So it has type, storage etc. Difference: it can only store the.

Lecture 2 Introduction to C Programming

Introduction to C Programming

Semantic analysis Parsing only verifies that the program consists of tokens arranged in a syntactically-valid combination, we now move on to semantic analysis,

Inline Assembly Section 1: Recitation 7. In the early days of computing, most programs were written in assembly code. –Unmanageable because No type checking,

1 Homework Turn in HW2 at start of next class. Starting Chapter 2 K&R. Read ahead. HW3 is on line. –Due: class 9, but a lot to do! –You may want to get.

CS1061 C Programming Lecture 2: A Few Simple Programs A. O’Riordan, 2004.

CSCE 121, Sec 200, 507, 508 Fall 2010 Prof. Jennifer L. Welch.

1 Key Concepts:  Why C?  Life Cycle Of a C program,  What is a computer program?  A program statement?  Basic parts of a C program,  Printf() function?

State Machines Timing Computer Bus Computer Performance Instruction Set Architectures RISC / CISC Machines.

Assembly תרגול 8 פונקציות והתקפת buffer.. Procedures (Functions) A procedure call involves passing both data and control from one part of the code to.

Guide To UNIX Using Linux Third Edition

Introduction to C Programming

Introduction to Array The fundamental unit of data in any MATLAB program is the array. 1. An array is a collection of data values organized into rows and.

CHAPTER 1: INTORDUCTION TO C LANGUAGE

Basic Elements of C++ Chapter 2.

Copyright 2003 Scott/Jones Publishing Brief Version of Starting Out with C++, 4th Edition Chapter 1 Introduction to Computers and Programming.

CCSA 221 Programming in C CHAPTER 2 SOME FUNDAMENTALS 1 ALHANOUF ALAMR.

CHAPTER 4: INTRODUCTION TO COMPUTER ORGANIZATION AND PROGRAMMING DESIGN Lec. Ghader Kurdi.

INTRODUCTION TO COMPUTING CHAPTER NO. 06. Compilers and Language Translation Introduction The Compilation Process Phase 1 – Lexical Analysis Phase 2 –

Chapter 3 Elements of Assembly Language. 3.1 Assembly Language Statements.

C How to Program, 6/e © by Pearson Education, Inc. All Rights Reserved.

DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.

9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.

C Tokens Identifiers Keywords Constants Operators Special symbols.

Introduction to Java Applications Part II. In this chapter you will learn:  Different data types( Primitive data types).  How to declare variables?

Chapter 06 (Part I) Functions and an Introduction to Recursion.

Programming With C.

CSE 1301 Lecture 2 Data Types Figures from Lewis, “C# Software Solutions”, Addison Wesley Richard Gesick.

Input, Output, and Processing

Sharda University P. K. Mishra (Asst.Prof) Department of Computer Science & Technology Subject Name: Programming Using C Sub Code: CSE-106 Programming.

Week 1 Algorithmization and Programming Languages.

COMPUTER PROGRAMMING. A Typical C++ Environment Phases of C++ Programs: 1- Edit 2- Preprocess 3- Compile 4- Link 5- Load 6- Execute Loader Primary Memory.

Lists II. List ADT When using an array-based implementation of the List ADT we encounter two problems; 1. Overflow 2. Wasted Space These limitations are.

Execution of an instruction

Data TypestMyn1 Data Types The type of a variable is not set by the programmer; rather, it is decided at runtime by PHP depending on the context in which.

Algorithms  Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs.

Programming Fundamentals. Overview of Previous Lecture Phases of C++ Environment Program statement Vs Preprocessor directive Whitespaces Comments.

Introduction to OOP CPS235: Introduction.

Introduction to Java Applications Part II. In this chapter you will learn:  Different data types( Primitive data types).  How to declare variables?

FUNCTIONS. Midterm questions (1-10) review 1. Every line in a C program should end with a semicolon. 2. In C language lowercase letters are significant.

Announcements Assignment 2 Out Today Quiz today - so I need to shut up at 4:25 1.

1 Types of Programming Language (1) Three types of programming languages 1.Machine languages Strings of numbers giving machine specific instructions Example:

LECTURE 3 Translation. PROCESS MEMORY There are four general areas of memory in a process. The text area contains the instructions for the application.

Introduction to Computer Programming Concepts M. Uyguroğlu R. Uyguroğlu.

Chapter 1 slides1 What is C? A high-level language that is extremely useful for engineering computations. A computer language that has endured for almost.

Some of the utilities associated with the development of programs. These program development tools allow users to write and construct programs that the.

Lecture 3 Translation.

Chapter Topics The Basics of a C++ Program Data Types

Visit for more Learning Resources

User-Written Functions

CSC201: Computer Programming

Basic Elements of C++.

Algorithms Problem: Write pseudocode for a program that keeps asking the user to input integers until the user enters zero, and then determines and outputs.

Introduction to C Topics Compilation Using the gcc Compiler

Compiler Construction

Basic Elements of C++ Chapter 2.

IPC144 Introduction to Programming Using C Week 1 – Lesson 2

Compiler Construction

C Structures, Unions, Bit Manipulations and Enumerations

Introduction to C++ Programming

CSCE Fall 2013 Prof. Jennifer L. Welch.

Introduction C is a general-purpose, high-level language that was originally developed by Dennis M. Ritchie to develop the UNIX operating system at Bell.

CSCE Fall 2012 Prof. Jennifer L. Welch.

Presentation transcript:

Embedding Assembly Code in C Programs תרגול 7 שילוב קוד אסמבלי בקוד C.

History In the early days of computing, most programs were written in assembly code.  This becomes unmanageable for programs of significant complexity.  Since assembly code does not provide any form of type checking, it is very easy to make basic mistakes, such as using a pointer as an integer.  Even worse, writing in assembly code locks the entire program into a particular class of machine. Early compilers for higher-level programming languages did not generate very efficient code and did not provide access to the low-level object representations. Programs requiring maximum performance or requiring access to object representations were still often written in assembly code.

Nowadays Compilers have largely removed performance optimization as a reason for writing in assembly code. Code generated by a high quality compiler is generally as good or even better than what can be achieved manually. The C language has largely eliminated machine access as a reason for writing in assembly code.  Unions - A collection of variables of different types, where only one is stored at each point in time.  Pointer arithmetic.  Ability to operate on bit-level data representations. This provides a sufficient access to the machine for most programmers.  E.g., almost every part of a modern operating system such as Linux is written in C.

Why embedding? There are times when writing in assembly code is the only option:  (Especially) When implementing an operating system. Examples:  There are a number of special registers storing process state information that the operating system must access.  There are either special instructions or special memory locations for performing input and output operations.  Even for application programmers, there are some machine features, such as the values of the condition codes, that cannot be accessed directly in C.

The challenge: To integrate code consisting mainly of C with a small amount written in assembly language. Methods:  Writing the assembly function in a different file using the same conventions for argument passing and register usage as are followed by the C compiler, and combine all the code using the Linker. E.g., file p1.c contains C code and file p2.s contains assembly code. The command: unix> gcc -o p p1.c p2.s will cause file p1.c to be compiled, file p2.s to be assembled, and the resulting object code to be linked to form an executable program p.  Using Inline Assembly.

Inline Assembly ( asm directive) Inline assembly allows the user to insert assembly code directly into the code sequence generated by the compiler. Features are provided to specify instruction operands and to indicate to the compiler which registers are being overwritten by the assembly instructions. The resulting code is, of course, highly machine- dependent, since different types of machines do not have compatible machine instructions. The asm directive is also specific to GCC. Therefore, incompatible with many other compilers.

Basic form of inline assembly The basic form of inline assembly is to write code that looks like a procedure call: asm( code-string ); code-string is an assembly code sequence given as a quoted string. The compiler will insert this string verbatim into the assembly code being generated, and hence the compiler-supplied and the user supplied assembly will be combined. Important: The compiler doesn’t check the string for errors, and so the first indication of a problem might be an error report from the assembler.

An example: Consider functions with the following prototypes: int ok_smul(int x, int y, int *dest); int ok_umul(unsigned x, unsigned y, unsigned *dest); Each is supposed to compute the product of arguments x and y and store the result in the memory location specified by argument dest. As return values, they should return 0 when the multiplication overflows and 1 when it does not. We have separate functions for signed and unsigned multiplication, since they overflow under different circumstances.

An example: - first try The strategy here is to exploit the fact that register %eax is used to store the return value. Assuming the compiler uses this register for variable result, the first line will set the register to 0. The inline assembly will insert code that sets the low-order byte of this register appropriately, and the register will be used as the return value.

Unfortunately, GCC has its own ideas of code generation. Instead of setting register %eax to 0 at the beginning of the function, the generated code does so at the very end, by doing this overrides the assembly operation - and so the function always returns 0. The fundamental problem is that the compiler has no way to know what the programmer’s intentions are, and how the assembly statement should interact with the rest of the generated code.

An example: - second try A less than ideal code, but working code:

Same strategy as before, but this time the code reads a global variable dummy to initialize result to 0. Compilers are typically more conservative about generating code involving global variables, and therefore less likely to rearrange the ordering of the computations. Even the above code depends on quirks of the compiler to get proper behavior. In fact, it only works when compiled with optimization enabled. Otherwise it stores result on the stack and retrieves its value just before returning.

Extended form of asm The extended version of the asm allows the programmer to specify which program values are to be used as operands to an assembly code sequence and which registers are overwritten by the assembly code. With this information the compiler can generate code that will correctly set up the required source values, execute the assembly instructions, and make use of the computed results. It will also have information it requires about register usage so that important program values are not overwritten by the assembly code instructions.

Extended form syntax The general syntax of an extended assembly sequence is as follows: asm( code-string [ : output-list [ : input-list [ : overwrite-list ] ] ] ); The square brackets denote optional arguments.  code-string – the assembly code sequence.  output-list – results generated by the assembly code.  input-list – source values for the assembly code.  overwrite-list – registers that are overwritten by the assembly code. These lists are separated by the colon (‘:’) character. As the square brackets show, we only include lists up to the last nonempty list.

Extended form syntax (2) The syntax for the code string is reminiscent of that for the format string in a printf statement. It consists of a sequence of assembly code instructions separated by the semicolon (‘;’) character. Input and output operands are denoted by references %0, %1, and so on, up to possibly %9. Operands are numbered, according to their ordering first in the output list and then in the input list. Register names such as “%eax” must be written with an extra ‘%’ symbol, e.g., “%eax.”

An example: - third try

The first assembly instruction stores the test result in the single-byte register %bl. The second instruction zero-extends and copies the value to whatever register the compiler chooses to hold result, indicated by operand %0. The output list consists of pairs of values separated by spaces. (In this example there is only a single pair). The first element of the pair is a string indicating the operand type, where ‘r’ indicates an integer register and ‘=’ indicates that the assembly code assigns a value to this operand. The second element of the pair is the operand enclosed in parentheses. It can be any assignable value (known in C as an lvalue). The input list has the same general format, while the overwrite list simply gives the names of the registers (as quoted strings) that are overwritten.

The code shown above works regardless of the compilation flags. As this example illustrates, it may take a little creative thinking to write assembly code that will allow the operands to be described in the required form.  E.g., there are no direct ways to specify a program value to use as the destination operand for the setae instruction, since the operand must be a single byte.  Instead, we write a code sequence based on a specific register and then use an extra data movement instruction to copy the resulting value to some part of the program state.

An example - umul We can’t use the same code since GCC use imull (signed multiply) instruction for both signed and unsigned multiplication. This generates the correct value for either product, but it sets the carry flag according to the rules for signed multiplication. Therefore, we must include an assembly code that performs unsigned multiplication using the mull instruction.

An example - umul (2)