High Performance Embedded Computing © 2007 Elsevier Lecture 10: Code Generation Embedded Computing Systems Michael Schulte Based on slides and textbook.

© 2006 Elsevier
Embedded vs. general-purpose compilers
- General-purpose compilers must generate code for a wide range of programs:
  - No real-time requirements.
  - Often no explicit low-power requirements.
  - Generally want fast compilation times.
- Embedded compilers must generate code that meets real-time and low-power requirements.
  - May be willing to wait longer for compilation results.

Code generation steps
- Instruction selection chooses opcodes and modes.
- Register allocation binds values to registers.
  - Many DSPs and ASIPs have irregular register sets.
- Address generation selects the addressing mode, registers, etc.
- Instruction scheduling is important for pipelining and parallelism.

twig instruction models
- Rewriting rule:
  - replacement <- template {cost} = action
- Dynamic programming can be used to cover the program with instructions when the instructions are tree-structured.
  - Heuristics are needed for more general instructions.
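The tree-covering idea can be sketched in a few lines of Python. This is only an illustration, not twig itself: the rule names, templates, and costs are hypothetical, every rule produces a value in a register, and the "action" is reduced to emitting the rule name. Each rule pairs a template with a cost, and the cheapest cover of a node is found by dynamic programming over its subtrees.

```python
from functools import lru_cache

# Expression trees are nested tuples (op, child, child); leaves are strings.
# In a template, "reg" matches any subtree already computed into a register
# and "leaf" matches a variable name. Rules and costs are hypothetical.
RULES = [
    ("LOAD", "leaf", 1),
    ("ADD", ("add", "reg", "reg"), 1),
    ("MUL", ("mul", "reg", "reg"), 2),
    ("MAC", ("add", "reg", ("mul", "reg", "reg")), 2),  # multiply-accumulate
]

def match(template, tree):
    """Return the subtrees bound to "reg" wildcards, or None if no match."""
    if template == "reg":
        return [tree]
    if template == "leaf":
        return [] if isinstance(tree, str) else None
    if (not isinstance(tree, tuple) or tree[0] != template[0]
            or len(tree) != len(template)):
        return None
    bound = []
    for t_child, child in zip(template[1:], tree[1:]):
        sub = match(t_child, child)
        if sub is None:
            return None
        bound.extend(sub)
    return bound

@lru_cache(maxsize=None)
def best_cover(tree):
    """Return (cost, instructions) for the cheapest cover of tree."""
    best = None
    for name, template, cost in RULES:
        bound = match(template, tree)
        if bound is None:
            continue
        total, code = cost, ()
        for sub in bound:  # cost of a rule = own cost + best cost of operands
            sub_cost, sub_code = best_cover(sub)
            total += sub_cost
            code += sub_code
        if best is None or total < best[0]:
            best = (total, code + (name,))
    if best is None:
        raise ValueError(f"no rule covers {tree!r}")
    return best

tree = ("add", "a", ("mul", "b", "c"))
print(best_cover(tree))  # (5, ('LOAD', 'LOAD', 'LOAD', 'MAC'))
```

Covering the same tree with separate MUL and ADD rules would cost 6, so the dynamic program prefers the single MAC rule.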

ASIP instruction description
- Designing code generators for general-purpose machines:
  - Only need to describe how instructions modify the programmer-visible registers.
- Designing code generators for Application-Specific Instruction Processors (ASIPs):
  - May need to describe the complete behavior of the instruction in the pipeline.
  - Most ASIPs do not have general-purpose registers, and many important instructions use specialized registers. Why?

Register allocation and lifetimes
- Two variables can be assigned to the same register if they are not live at the same time:
  - The last use of one variable occurs before the other is first defined.
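A minimal sketch of the lifetime test, assuming straight-line code in which each variable is defined once. The instruction format (a destination plus a list of sources) and the function names are made up for illustration. The sketch also assumes an instruction reads its sources before writing its destination, so a variable whose last use is at instruction i can share a register with one defined at i.

```python
def lifetimes(code):
    """Map each variable to (definition index, last use index).

    code is a list of (dest, sources) pairs for straight-line code.
    Variables that are used but never defined are treated as live on
    entry and get a start index of -1.
    """
    live = {}
    for i, (dest, sources) in enumerate(code):
        for v in sources:
            start, _ = live.get(v, (-1, i))
            live[v] = (start, i)
        if dest not in live:
            live[dest] = (i, i)
    return live

def can_share(a, b, live):
    """True if the lifetimes of a and b do not overlap."""
    (a_start, a_end), (b_start, b_end) = live[a], live[b]
    return a_end <= b_start or b_end <= a_start

code = [
    ("t1", ["a", "b"]),   # 0: t1 = a + b
    ("t2", ["t1", "c"]),  # 1: t2 = t1 * c
    ("t3", ["t2", "a"]),  # 2: t3 = t2 - a
]
live = lifetimes(code)
print(can_share("t1", "t2", live))  # True: t1 is last used where t2 is defined
print(can_share("a", "t1", live))   # False: a is still needed at instruction 2
```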

Conflict graphs and clique covering
- In conflict graphs, edges connect nodes (variables) that have disjoint lifetimes.
  - Variables joined by an edge can be assigned to the same register.
- Clique: a set of vertices in which every pair is connected by an edge.
- Cliques in the graph correspond to registers.
- Cliques should be maximal.
- Each node should belong to exactly one clique.
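As a rough illustration of clique covering, the sketch below builds the graph from lifetimes given as (start, end) pairs and then partitions the nodes greedily. The greedy pass is a simple heuristic chosen for brevity (finding a minimum clique cover is NP-hard in general), and the variable names and lifetimes are invented.

```python
from itertools import combinations

def disjoint_lifetime_edges(live):
    """Edges join variables whose (start, end) lifetimes do not overlap."""
    edges = set()
    for a, b in combinations(live, 2):
        (a_start, a_end), (b_start, b_end) = live[a], live[b]
        if a_end <= b_start or b_end <= a_start:
            edges.add(frozenset((a, b)))
    return edges

def greedy_clique_cover(nodes, edges):
    """Put each node in the first clique whose members all share an edge with it."""
    cliques = []
    for v in nodes:
        for clique in cliques:
            if all(frozenset((v, u)) in edges for u in clique):
                clique.append(v)
                break
        else:
            cliques.append([v])  # no existing clique fits: use a new register
    return cliques

# Hypothetical lifetimes: variable -> (definition index, last use index)
live = {"a": (0, 3), "b": (0, 1), "c": (1, 2), "d": (2, 4), "e": (3, 5)}
registers = greedy_clique_cover(list(live), disjoint_lifetime_edges(live))
print(registers)  # [['a', 'e'], ['b', 'c', 'd']]
```

Each clique becomes one register, so these five variables fit in two registers.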

Instruction selection and scheduling
- Instruction selection is more challenging when processors have limited or irregular resources (e.g., DSPs).
- When resources are limited, instruction selection and scheduling often interact.
- FlexWare system:
  - Includes a code generation system, called CodeSyn, for ASIPs and DSPs with irregular register files.
  - Has an intermediate representation (IR) for the program's control and data flow (see next slide).
  - Target instructions use the same basic format as the IR, but include information about how registers can communicate.
  - Covers the program graph using dynamic programming for data flow and heuristics for control flow.

Register connectivity and classification
- (Figure: [Lie94], © 1994 IEEE)
- A separate representation indicates which registers can be used by which types of operations.
- This information has to be taken into account during instruction selection, instruction scheduling, and register allocation.

Code placement
- Place code to minimize cache conflicts.
- Possible cache conflicts may be determined using addresses.
  - Which conflicts actually matter is determined through analysis.
- May require blank areas in the program.
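To show how addresses alone reveal possible conflicts, here is a small sketch that assumes a direct-mapped instruction cache. The line size, number of sets, function names, and block addresses are all assumptions made for the example.

```python
def cache_sets(start, size, line_size=32, num_sets=256):
    """Set indices touched by code at [start, start + size).

    Assumes a direct-mapped cache: set index = (address // line_size) % num_sets.
    """
    first = start // line_size
    last = (start + size - 1) // line_size
    return {line % num_sets for line in range(first, last + 1)}

def possible_conflicts(block_a, block_b, **cache):
    """Cache sets that both (start, size) blocks map to."""
    return cache_sets(*block_a, **cache) & cache_sets(*block_b, **cache)

# Two 64-byte code blocks placed exactly one cache size (8 KB) apart collide.
print(possible_conflicts((0x0000, 64), (0x2000, 64)))  # {0, 1}
# Leaving a 64-byte blank area before the second block removes the overlap.
print(possible_conflicts((0x0000, 64), (0x2040, 64)))  # set()
```

Whether an overlap like the first one matters still depends on whether the two blocks execute close together in time, which addresses alone cannot show.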

Hwu and Chang code placement
- Analyzed traces to find relative execution times of code sections.
- Inline-expanded frequently used subroutines.
  - Eliminates function call overhead.
- Placed frequently used traces using a greedy algorithm.
  - The most frequently used programs are assigned to the blocks with the fewest conflicts.

McFarling code placement
- Analyzed program structure and trace information.
- Annotated the program with loop execution counts, basic block sizes, and procedure call frequencies.
- Walked through the program to propagate labels, group code based on labels, and place code groups to minimize interference.

McFarling procedure inlining
- Estimated the number of cache misses in a loop:
  - s_l = effective loop body size.
  - s_b = basic block size.
  - f = average execution frequency of block.
  - M_l = number of misses per loop instance.
  - l = average number of loop iterations.
  - S = cache size.
- Estimated the new cache miss rate for inlining; used a greedy algorithm to select functions to inline.

Pettis and Hansen
- Profiled programs using gprof.
- Put caller and callee close together in the program, increasing the chance they would be on the same page.
  - Ordered procedures using the call graph, weighted by number of invocations, merging highly weighted edges.
- Optimized if-then-else code to take advantage of the processor's branch prediction mechanism.
  - If branches are predicted taken, the code is restructured so that the more frequent path is the one predicted to be taken.
- Identified basic blocks that were not executed by the given input data (fluff blocks); moved them to separate procedures to improve memory system behavior.
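The procedure-ordering step can be approximated as follows. This is a simplified sketch rather than the published algorithm: it visits call-graph edges from heaviest to lightest and joins the caller's and callee's chains by plain concatenation, with no attempt to choose the best orientation when merging. The procedure names and call counts are invented.

```python
def order_procedures(call_counts):
    """Return a procedure layout from profiled call counts.

    call_counts maps (caller, callee) to a number of invocations.
    """
    chain_of = {}
    for caller, callee in call_counts:
        for proc in (caller, callee):
            chain_of.setdefault(proc, [proc])  # each procedure starts alone

    # Merge the chains joined by the heaviest edges first.
    for (caller, callee), _ in sorted(call_counts.items(),
                                      key=lambda item: -item[1]):
        a, b = chain_of[caller], chain_of[callee]
        if a is b:
            continue  # already placed in the same chain
        a.extend(b)
        for proc in b:
            chain_of[proc] = a

    # Concatenate the remaining chains in first-seen order.
    layout, seen = [], set()
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            layout.extend(chain)
    return layout

calls = {
    ("main", "parse"): 10,
    ("main", "eval"): 100,
    ("eval", "lookup"): 500,
    ("parse", "error"): 1,
}
print(order_procedures(calls))
# ['main', 'eval', 'lookup', 'parse', 'error']
```

The heavily used eval and lookup end up adjacent, while the rarely called error lands at the end of the layout.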
