LLVM IR, File - Praakrit Pradhan

Overview: LLVM bitcode essentially consists of two things, a bitstream container format and an encoding of LLVM IR.

Bitstream format: The bitstream format is an abstract encoding of structured data, similar to XML in some ways. Like XML, bitstream files contain tags and nested structures, and you can parse a file without having to understand the tags. Unlike XML, the bitstream format is a binary encoding, and it provides a mechanism for the file to self-describe "abbreviations", which are effectively size optimizations for the content.
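Besides fixed-width fields, the bitstream stores many integers as variable bit rate (VBR) fields: a VBR-n field is written as n-bit chunks, each carrying n-1 payload bits plus a top bit that says whether another chunk follows. The sketch below is a minimal C++ illustration of that idea, not LLVM's own bitstream writer; the names `encodeVBR` and `decodeVBR` are hypothetical, and the chunks are returned as integers rather than packed into a real bit buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Split `value` into VBR-`width` chunks, least significant chunk first.
// Each chunk holds (width - 1) payload bits; its top bit is a continuation
// flag that is set when more chunks follow.
std::vector<uint32_t> encodeVBR(uint64_t value, unsigned width) {
  assert(width >= 2 && width <= 32);
  const unsigned payloadBits = width - 1;
  const uint64_t continueFlag = uint64_t{1} << payloadBits;
  const uint64_t payloadMask = continueFlag - 1;
  std::vector<uint32_t> chunks;
  while (value > payloadMask) {
    chunks.push_back(static_cast<uint32_t>((value & payloadMask) | continueFlag));
    value >>= payloadBits;
  }
  chunks.push_back(static_cast<uint32_t>(value));  // final chunk: flag clear
  return chunks;
}

// Reassemble a value from VBR-`width` chunks, stopping at the first chunk
// whose continuation flag is clear.
uint64_t decodeVBR(const std::vector<uint32_t>& chunks, unsigned width) {
  assert(width >= 2 && width <= 32);
  const unsigned payloadBits = width - 1;
  const uint64_t continueFlag = uint64_t{1} << payloadBits;
  const uint64_t payloadMask = continueFlag - 1;
  uint64_t value = 0;
  unsigned shift = 0;
  for (uint32_t chunk : chunks) {
    value |= (chunk & payloadMask) << shift;
    if ((chunk & continueFlag) == 0) break;
    shift += payloadBits;
  }
  return value;
}
```

For example, 27 fits in the 5 payload bits of a VBR-6 chunk and is emitted as a single chunk, while 123 needs two: 59 (binary 111011, continuation bit set) followed by 3 (binary 000011). Small values therefore cost few bits, yet arbitrarily large values remain representable.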

LLVM IR file: LLVM IR (bitcode) files may optionally be embedded in a wrapper structure or in a native object file. Both mechanisms make it easy to embed extra data along with LLVM IR files.
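Raw bitcode and wrapped bitcode can be told apart by the first bytes of a file. This is a minimal sketch, assuming the magic numbers described in LLVM's bitcode format documentation: raw bitcode starts with the bytes 'B', 'C', 0xC0, 0xDE, and the wrapper header starts with the magic 0x0B17C0DE stored as a little-endian 32-bit field. The helper names are hypothetical, and detecting bitcode inside a native object file (which requires parsing that object format) is not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class BitcodeKind { Raw, Wrapped, Unknown };

// Read a 32-bit little-endian integer starting at byte `offset`.
uint32_t readLE32(const std::vector<uint8_t>& buf, std::size_t offset) {
  assert(offset + 4 <= buf.size());
  return uint32_t{buf[offset]} | (uint32_t{buf[offset + 1]} << 8) |
         (uint32_t{buf[offset + 2]} << 16) | (uint32_t{buf[offset + 3]} << 24);
}

// Raw bitcode begins with 'B', 'C', 0xC0, 0xDE. The optional wrapper
// header begins with the little-endian 32-bit magic 0x0B17C0DE.
BitcodeKind classifyBitcode(const std::vector<uint8_t>& buf) {
  if (buf.size() < 4) return BitcodeKind::Unknown;
  if (buf[0] == 'B' && buf[1] == 'C' && buf[2] == 0xC0 && buf[3] == 0xDE)
    return BitcodeKind::Raw;
  if (readLE32(buf, 0) == 0x0B17C0DEu) return BitcodeKind::Wrapped;
  return BitcodeKind::Unknown;
}

// The wrapper header is five little-endian 32-bit fields:
// magic, version, offset, size, CPU type. The offset field (bytes 8..11)
// says where the embedded bitcode starts.
std::size_t wrappedBitcodeOffset(const std::vector<uint8_t>& buf) {
  assert(classifyBitcode(buf) == BitcodeKind::Wrapped && buf.size() >= 20);
  return readLE32(buf, 8);
}
```

The wrapper's offset and size fields are what let extra data sit alongside the bitcode: a reader skips straight to the recorded offset instead of assuming the bitcode starts at byte 0.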

LLVM IR encoding: LLVM IR is encoded into a bitstream by defining blocks and records. Blocks are used for things like constant pools, functions, and symbol tables; records are used for things like instructions, global variable descriptors, and type descriptions.

LLVM IR is defined with the following blocks (the numeric IDs follow the bitcode documentation of the time; some assignments changed in later LLVM releases):
MODULE_BLOCK (8): the top-level block that contains the entire module and describes a variety of per-module information.
PARAMATTR_BLOCK (9): enumerates the parameter attributes.
TYPE_BLOCK (10): describes all of the types in the module.
CONSTANTS_BLOCK (11): describes constants for a module or function.
FUNCTION_BLOCK (12): describes a function body.
TYPE_SYMTAB_BLOCK (13): describes the type symbol table.
VALUE_SYMTAB_BLOCK (14): describes a value symbol table.
METADATA_BLOCK (15): describes metadata items.
METADATA_ATTACHMENT (16): contains records associating metadata with function instruction values.
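To make the block/record split concrete, here is a hypothetical in-memory model in C++: a record carries a code and integer operands, and a block carries an ID, its records, and nested sub-blocks (for example, function blocks inside the module block). The enum copies the IDs from the list above; it is not LLVM's own bitcode header, whose names and numbering differ in newer releases, and the record codes in any example are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Block IDs as listed on the slide (not LLVM's current header).
enum BlockID : unsigned {
  MODULE_BLOCK = 8,
  PARAMATTR_BLOCK = 9,
  TYPE_BLOCK = 10,
  CONSTANTS_BLOCK = 11,
  FUNCTION_BLOCK = 12,
  TYPE_SYMTAB_BLOCK = 13,
  VALUE_SYMTAB_BLOCK = 14,
  METADATA_BLOCK = 15,
  METADATA_ATTACHMENT = 16,
};

// Hypothetical model: a record is a code plus integer operands.
struct Record {
  unsigned code;
  std::vector<uint64_t> operands;
};

// A block holds records and nested sub-blocks, much like nested XML elements.
struct Block {
  BlockID id;
  std::vector<Record> records;
  std::vector<Block> children;
};

// Count records in a block and, recursively, in all of its sub-blocks.
std::size_t countRecords(const Block& block) {
  std::size_t total = block.records.size();
  for (const Block& child : block.children) total += countRecords(child);
  return total;
}

std::string blockName(BlockID id) {
  switch (id) {
    case MODULE_BLOCK: return "MODULE_BLOCK";
    case PARAMATTR_BLOCK: return "PARAMATTR_BLOCK";
    case TYPE_BLOCK: return "TYPE_BLOCK";
    case CONSTANTS_BLOCK: return "CONSTANTS_BLOCK";
    case FUNCTION_BLOCK: return "FUNCTION_BLOCK";
    case TYPE_SYMTAB_BLOCK: return "TYPE_SYMTAB_BLOCK";
    case VALUE_SYMTAB_BLOCK: return "VALUE_SYMTAB_BLOCK";
    case METADATA_BLOCK: return "METADATA_BLOCK";
    case METADATA_ATTACHMENT: return "METADATA_ATTACHMENT";
  }
  return "UNKNOWN";
}
```

A module block holding one record plus a function block with two records contains three records in total; the recursion mirrors how a bitstream reader descends into nested blocks.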

To put it visually (diagram not preserved): is IR better than assembly? Possibly.

The stages:
Frontend: parses the original language and emits LLVM Intermediate Representation (IR) code.
Optimizer: transforms IR into optimized, equivalent IR. This stage does all the usual optimizations, such as constant propagation and dead code elimination.
Backend: takes IR and produces machine code optimized for a specific CPU.
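As a toy illustration of the optimizer stage, this sketch runs constant folding with propagation and then dead code elimination over a miniature SSA-style instruction list. It is not LLVM's pass infrastructure: the `Inst` structure and the function names are hypothetical, and only '+' and '*' on non-negative integer literals are supported.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy IR: every instruction defines a new name `dest`.
// Operands are names (earlier results or arguments) or integer literals.
// op is '+', '*', or 'r' (return lhs; dest and rhs unused).
struct Inst {
  std::string dest;
  char op;
  std::string lhs, rhs;
};

bool isLiteral(const std::string& s) {
  return !s.empty() && std::isdigit(static_cast<unsigned char>(s[0]));
}

// Constant folding with propagation: fold instructions whose operands are
// all literals, and substitute the folded value into every later use.
std::vector<Inst> foldConstants(const std::vector<Inst>& in) {
  std::map<std::string, long> known;
  auto resolve = [&](const std::string& s) {
    auto it = known.find(s);
    return it == known.end() ? s : std::to_string(it->second);
  };
  std::vector<Inst> out;
  for (Inst inst : in) {
    inst.lhs = resolve(inst.lhs);
    if (inst.op == 'r') { out.push_back(inst); continue; }
    inst.rhs = resolve(inst.rhs);
    if (isLiteral(inst.lhs) && isLiteral(inst.rhs)) {
      assert(inst.op == '+' || inst.op == '*');
      long l = std::stol(inst.lhs), r = std::stol(inst.rhs);
      known[inst.dest] = inst.op == '+' ? l + r : l * r;
      continue;  // folded away; later uses now see the literal
    }
    out.push_back(inst);
  }
  return out;
}

// Dead code elimination: walk backwards from the return and keep only
// instructions whose result is (transitively) used.
std::vector<Inst> eliminateDeadCode(const std::vector<Inst>& in) {
  std::set<std::string> live;
  std::vector<Inst> kept;
  for (auto it = in.rbegin(); it != in.rend(); ++it) {
    bool needed = it->op == 'r' || live.count(it->dest) > 0;
    if (!needed) continue;
    live.insert(it->lhs);
    if (it->op != 'r') live.insert(it->rhs);
    kept.insert(kept.begin(), *it);
  }
  return kept;
}

std::string printProgram(const std::vector<Inst>& prog) {
  std::string text;
  for (const Inst& i : prog) {
    if (i.op == 'r') text += "ret " + i.lhs + "\n";
    else text += i.dest + " = " + i.lhs + " " + i.op + " " + i.rhs + "\n";
  }
  return text;
}
```

For the program a = 2 * 3; b = a + x; u = x * y; ret b, folding replaces a with the literal 6 and rewrites b as b = 6 + x; dead code elimination then drops the unused u, leaving b = 6 + x followed by ret b. Running folding first matters, because it can turn live-looking instructions into constants that DCE then removes.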

IR is the heart of LLVM: The crucial part is the IR. It is a common language that sits between the high-level program and the low-level backend. The IR can express high-level concepts, yet it is specific enough that any backend can produce fast machine code from it.

Goals for LLVM IR: easy to produce, understand, and define; language- and target-independent; one IR for analysis and optimization; able to support aggressive IPO (interprocedural optimization), loop optimizations, and scalar optimizations; both high- and low-level optimization; optimize as early as possible.

Hardware support Expectation?

Flowchart of Source code to LLVM IR

LLVM IR: an in-memory compiler IR (intermediate representation) with an equivalent human-readable assembly language form (*.ll files). LLVM IR is in SSA form (Static Single Assignment form): each variable is assigned exactly once, and use-def chains are explicit, each containing a single element (the one definition).
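The "assigned exactly once" rule can be illustrated with a minimal renaming sketch for straight-line code: each assignment gets a fresh versioned name, and each use refers to the latest version, so every use has exactly one definition. The `Assign` structure is hypothetical, and the sketch assumes no control flow; real SSA construction must also insert phi nodes where control-flow paths merge, which this does not attempt.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical straight-line assignment form: dest = lhs op rhs.
struct Assign {
  std::string dest, lhs, rhs;
  char op;
};

// Rename straight-line code into SSA form: x -> x1, x2, ... on each
// assignment; every use refers to the most recent version.
std::vector<Assign> toSSA(const std::vector<Assign>& code) {
  std::map<std::string, int> version;
  auto use = [&](const std::string& name) {
    auto it = version.find(name);
    // Names never assigned here (parameters, literals) stay unchanged.
    return it == version.end() ? name : name + std::to_string(it->second);
  };
  std::vector<Assign> out;
  for (const Assign& a : code) {
    assert(!a.dest.empty());
    // Resolve operands first, so "x = x * 1" reads the previous x.
    Assign renamed{"", use(a.lhs), use(a.rhs), a.op};
    int v = ++version[a.dest];
    renamed.dest = a.dest + std::to_string(v);
    out.push_back(renamed);
  }
  return out;
}
```

For x = a + b; x = x * 1; y = x + x the result is x1 = a + b; x2 = x1 * 1; y1 = x2 + x2. Each name now has a single definition, which is what makes LLVM's use-def chains single-element.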

Global Variable & Array Representation

Function entry & Local Variables

Inner Most Loop

Let's try writing it... Consider a relatively straightforward function that takes three integer parameters and returns an arithmetic combination of them. The source function is nice and simple; what we need to end up with is its LLVM IR equivalent.
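The slide's code did not survive, but later slides name the function mul_add and describe its body as a multiply, an add, and a return. Under that assumption, a plausible source-level rendering is below, with a comment showing roughly the LLVM IR the walkthrough aims to produce (value names such as %tmp are arbitrary).

```cpp
#include <cassert>

// Assumed source function: multiply the first two parameters, add the third.
int mul_add(int x, int y, int z) {
  return x * y + z;
}

// Roughly the target LLVM IR (value names are arbitrary):
//
//   define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
//   entry:
//     %tmp = mul i32 %x, %y
//     %tmp2 = add i32 %tmp, %z
//     ret i32 %tmp2
//   }
```

Each IR instruction defines a new value exactly once (%tmp, %tmp2), which is the SSA property described earlier.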

Still trying... Here is what our basic main function will look like. The first segment is simple: it creates an LLVM "module". In LLVM, a module represents a single unit of code that is processed together. Here we have declared a makeLLVMModule() function to do the real work of creating the module. The second segment runs the LLVM module verifier on the newly created module; the verifier prints an error message if the module is malformed in any way. Finally, we instantiate an LLVM PassManager and run the PrintModulePass on the module.
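The real main() relies on LLVM's C++ API (makeLLVMModule(), the module verifier, PassManager, and PrintModulePass), and its code was not preserved. The sketch below mirrors only its three-step structure (build, verify, print) using hypothetical `Toy*` types that are not LLVM classes. The single verifier rule it checks, that every basic block ends in a terminator, is one of the checks LLVM's verifier performs; here only "ret" and "br" count as terminators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT LLVM classes: a module holds functions,
// a function holds basic blocks, a block holds instruction strings.
struct ToyBlock { std::string name; std::vector<std::string> insts; };
struct ToyFunction { std::string name; std::vector<ToyBlock> blocks; };
struct ToyModule { std::string name; std::vector<ToyFunction> functions; };

// Step 1: build the module (the role makeLLVMModule() plays on the slide).
ToyModule makeToyModule() {
  return ToyModule{"test", {ToyFunction{"mul_add", {ToyBlock{"entry", {"mul", "add", "ret"}}}}}};
}

// Step 2: verify. Every block must be non-empty and end in a terminator.
bool verifyToyModule(const ToyModule& m, std::string& error) {
  for (const ToyFunction& f : m.functions)
    for (const ToyBlock& b : f.blocks)
      if (b.insts.empty() || (b.insts.back() != "ret" && b.insts.back() != "br")) {
        error = "block '" + b.name + "' in '" + f.name + "' lacks a terminator";
        return false;
      }
  return true;
}

// Step 3: print the module (the role of the print pass on the slide).
std::string printToyModule(const ToyModule& m) {
  std::string out = "module " + m.name + "\n";
  for (const ToyFunction& f : m.functions) {
    out += "function " + f.name + "\n";
    for (const ToyBlock& b : f.blocks) {
      out += b.name + ":\n";
      for (const std::string& i : b.insts) out += "  " + i + "\n";
    }
  }
  return out;
}

// main()-like driver: build, verify, and print only if verification passed.
std::string runToyPipeline() {
  ToyModule m = makeToyModule();
  std::string error;
  if (!verifyToyModule(m, error)) return "error: " + error;
  return printToyModule(m);
}
```

Verifying before printing means a malformed module is reported instead of being silently emitted, which is the reason the slide's main() runs the verifier as its middle step.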

Almost there... The first chunk of our module simply instantiates a module and gives it a name. Next comes our function: we pass in its name, its return type, and its argument types; in our case these are all 32-bit integer types. We then set the function's calling convention to the C calling convention.

Functions and blocks... Let's also give names to the parameters. This isn't strictly necessary either (LLVM will generate names for them if you don't specify them). The IR, being an abstract assembly language, represents control flow using jumps (called branches), both conditional and unconditional. The straight-line sequences of code between branches are called basic blocks, or just blocks, so we need to create these blocks.
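Since basic blocks are the straight-line runs between branches, they can be recovered from a flat instruction list with the classic leader rule: an instruction starts a new block if it is the first instruction, the target of a branch, or the instruction right after a branch or return. In LLVM you create blocks explicitly, as the slides describe, so this hypothetical sketch only illustrates what a block is; the `LinearInst` structure is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical flat instruction form.
struct LinearInst {
  std::string label;                 // empty if the instruction has no label
  std::string opcode;                // e.g. "add", "br", "ret"
  std::vector<std::string> targets;  // branch targets, empty otherwise
};

bool isTerminator(const LinearInst& i) { return i.opcode == "br" || i.opcode == "ret"; }

// Leader rule: an instruction starts a block if it is the first one, is a
// branch target, or follows a branch/return. Returns each block's first index.
std::vector<std::size_t> findLeaders(const std::vector<LinearInst>& code) {
  std::set<std::string> targets;
  for (const LinearInst& i : code)
    targets.insert(i.targets.begin(), i.targets.end());
  std::vector<std::size_t> leaders;
  for (std::size_t k = 0; k < code.size(); ++k) {
    bool leader = k == 0 ||
                  (!code[k].label.empty() && targets.count(code[k].label) > 0) ||
                  isTerminator(code[k - 1]);
    if (leader) leaders.push_back(k);
  }
  return leaders;
}
```

For cmp; br then/else; then: add; br done; else: mul; done: ret, the leaders are instructions 0, 2, 4, and 5, so the code splits into four blocks, each ending in a branch or return (or falling through into the next block).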

Such blockage: We create a new basic block by calling its constructor, telling it its name and the function to which it belongs. We also create an IRBuilder object, a convenience for creating instructions and appending them to the end of the block. Instructions can also be created through their constructors, but those interfaces are complicated, so using IRBuilder makes life simpler (which is fine unless we need much finer control).

And finally... Our mul_add function is composed of just three instructions: a multiply, an add, and a return. IRBuilder gives us a simple interface for constructing these instructions and appending them to the "entry" block. Each call to IRBuilder returns a Value* that represents the value yielded by the instruction. Notice that x, y, and z are also Value*s, so instructions clearly operate on Value*s. All hail IRBuilder? Apparently the command lines above are helpful for compiling and running the code (never tried this, so not sure).
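The builder pattern described above can be sketched without LLVM: a builder holds an insertion block, each Create call appends an instruction and returns a Value* for its result, and those results feed later calls. `ToyIRBuilder` is a hypothetical miniature that borrows the method names CreateMul, CreateAdd, and CreateRet from LLVM's IRBuilder but is not its API; the real methods take LLVM types and create IR objects rather than text.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature of the IRBuilder idea; NOT LLVM's API.
struct Value {
  std::string name;
};

struct BasicBlock {
  std::string name;
  std::vector<std::string> lines;  // printed instructions, in order
};

class ToyIRBuilder {
 public:
  explicit ToyIRBuilder(BasicBlock* bb) : bb_(bb) {}

  // Each call appends an instruction to the block and returns a Value*
  // naming its result, so results can feed later instructions.
  Value* CreateMul(Value* l, Value* r, const std::string& name) { return emit("mul", l, r, name); }
  Value* CreateAdd(Value* l, Value* r, const std::string& name) { return emit("add", l, r, name); }
  void CreateRet(Value* v) { bb_->lines.push_back("ret i32 %" + v->name); }

 private:
  Value* emit(const std::string& op, Value* l, Value* r, const std::string& name) {
    bb_->lines.push_back("%" + name + " = " + op + " i32 %" + l->name + ", %" + r->name);
    values_.push_back(std::make_unique<Value>(Value{name}));
    return values_.back().get();
  }
  BasicBlock* bb_;
  std::vector<std::unique_ptr<Value>> values_;  // owns the created Values
};

// Build mul_add's body: a multiply, an add, and a return.
std::string buildMulAdd() {
  Value x{"x"}, y{"y"}, z{"z"};  // the parameters are Values too
  BasicBlock entry{"entry", {}};
  ToyIRBuilder builder(&entry);
  Value* tmp = builder.CreateMul(&x, &y, "tmp");
  Value* tmp2 = builder.CreateAdd(tmp, &z, "tmp2");
  builder.CreateRet(tmp2);
  std::string out = entry.name + ":\n";
  for (const std::string& line : entry.lines) out += "  " + line + "\n";
  return out;
}
```

buildMulAdd() yields the entry block %tmp = mul i32 %x, %y, then %tmp2 = add i32 %tmp, %z, then ret i32 %tmp2. Because parameters and instruction results share one Value type, the builder never needs to distinguish where an operand came from.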

Just another quick example:

What is IR? According to the AOSA book, IR is a low-level programming language, quite similar to assembly.

Conclusion: LLVM IR is a low-level, SSA-based, language-independent, and machine-independent IR. It allows libraries and program portions written in different languages to work together, and it is basically a better assembly language.

Thank you
