Computer Organization CS224

Slides:



Advertisements
Similar presentations
1 Lecture 3: MIPS Instruction Set Today’s topic:  More MIPS instructions  Procedure call/return Reminder: Assignment 1 is on the class web-page (due.
Advertisements

C OMPUTER O RGANIZATION AND D ESIGN The Hardware/Software Interface 5 th Edition Chapter 2 Instructions: Language of the Computer.
Lecture 5: MIPS Instruction Set
MIPS assembly. Review  Lat lecture, we learnt  addi,  and, andi, or, ori, xor, xori, nor,  beq, j, bne  An array is stored sequentially in the memory.
The University of Adelaide, School of Computer Science
Lecture 20: 11/12/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Lecture 9: MIPS Instruction Set
Morgan Kaufmann Publishers Instructions: Language of the Computer
Recursion in MIPS Computer Organization I 1 August 2009 © McQuain, Feng & Ribbens Leaf and Non-Leaf Procedures A leaf procedure is one that.
1 Procedure Calls, Linking & Launching Applications Lecture 15 Digital Design and Computer Architecture Harris & Harris Morgan Kaufmann / Elsevier, 2007.
Chapter 2 — Instructions: Language of the Computer — 1 Branching Far Away If branch target is too far to encode with 16-bit offset, assembler rewrites.
Solution 2nd Exam.
10/6: Lecture Topics Procedure call Calling conventions The stack
1 Lecture 4: Procedure Calls Today’s topics:  Procedure calls  Large constants  The compilation process Reminder: Assignment 1 is due on Thursday.
Lecture 6: MIPS Instruction Set Today’s topic –Control instructions –Procedure call/return 1.
Assembly Code Example Selection Sort.
The University of Adelaide, School of Computer Science
Computer Architecture CSCE 350
1 Nested Procedures Procedures that don't call others are called leaf procedures, procedures that call others are called nested procedures. Problems may.
CPS3340 COMPUTER ARCHITECTURE Fall Semester, /17/2013 Lecture 12: Procedures Instructor: Ashraf Yaseen DEPARTMENT OF MATH & COMPUTER SCIENCE CENTRAL.
CS1104 – Computer Organization PART 2: Computer Architecture Lecture 4 Assembly Language Programming 2.
The University of Adelaide, School of Computer Science
Apr. 12, 2000Systems Architecture I1 Systems Architecture I (CS ) Lecture 6: Branching and Procedures in MIPS* Jeremy R. Johnson Wed. Apr. 12, 2000.
The University of Adelaide, School of Computer Science
CS3350B Computer Architecture Winter 2015 Lecture 4
Lecture 8: MIPS Instruction Set
MIPS Coding. Exercise – the bubble sort 5/8/2015week04-3.ppt2.
CSCE 212 Quiz 2 – 2/2/11 1.What is the purpose of the jal instruction? 2.What are the two conditional branching (if, goto; not the slt instruction) instructions.
1 Lecture 5: MIPS Examples Today’s topics:  the compilation process  full example – sort in C Reminder: 2 nd assignment will be posted later today.
Lecture 7: MIPS Instruction Set Today’s topic –Procedure call/return –Large constants Reminders –Homework #2 posted, due 9/17/
CS224 Fall 2011 Chapter 2c Computer Organization CS224 Fall 2011 Chapter 2 c With thanks to M.J. Irwin, D. Patterson, and J. Hennessy for some lecture.
CHAPTER 2 ISA Instructions (logical + procedure call)
MIPS function continued. Recursive functions So far, we have seen how to write – A simple function – A simple function that have to use the stack to save.
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB ECE232: Hardware Organization and Design Part 7: MIPS Instructions III
Lecture 19: 11/7/2002CS170 Fall CS170 Computer Organization and Architecture I Ayman Abdel-Hamid Department of Computer Science Old Dominion University.
Procedure (Method) Calls Ellen Spertus MCS 111 September 25, 2003.
Runtime Stack Computer Organization I 1 November 2009 © McQuain, Feng & Ribbens MIPS Memory Organization In addition to memory for static.
CSCI-365 Computer Organization Lecture Note: Some slides and/or pictures in the following are adapted from: Computer Organization and Design, Patterson.
Lecture 4: MIPS Instruction Set
Chapter 2 — Instructions: Language of the Computer — 1 Branch Addressing Branch instructions specify – Opcode, two registers, target address Most branch.
Computer Architecture CSE 3322 Lecture 4 Assignment: 2.4.1, 2.4.4, 2.6.1, , Due 2/10/09
Chapter 2 — Instructions: Language of the Computer — 1 Conditional Operations Branch to a labeled instruction if a condition is true – Otherwise, continue.
1 Lecture 6: Assembly Programs Today’s topics:  Large constants  The compilation process  A full example  Intro to the MARS simulator.
Pointers and Arrays in C and Assembly Language. Arrays Provide useful abstraction Match up to machine memory organization Array index computation –time.
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 12 Procedure Calling.
DR. SIMING LIU SPRING 2016 COMPUTER SCIENCE AND ENGINEERING UNIVERSITY OF NEVADA, RENO Session 14, 15 Addressing Mode.
CPEG323 Homework Review I Long Chen October, 17 th, 2005.
Computer Architecture
Lecture 5: Procedure Calls
Lecture 6: Assembly Programs
CS2100 Computer Organisation
The University of Adelaide, School of Computer Science
Lecture 4: MIPS Instruction Set
Procedures (Functions)
Procedures (Functions)
The University of Adelaide, School of Computer Science
CSCI206 - Computer Organization & Programming
Instructions - Type and Format
MIPS Procedures.
Addressing in Jumps jump j Label go to Label op address 2 address
Solutions Chapter 2.
The University of Adelaide, School of Computer Science
Lecture 7: Examples, MARS, Arithmetic
Lecture 5: Procedure Calls
Logical and Decision Operations
Chapter 2 Instructions: Language of the Computer part 2
Computer Organization and Design Assembly & Compilation
MIPS Coding.
Lecture 6: Assembly Programs
Systems Architecture I
Presentation transcript:

Computer Organization CS224 Fall 2012 Lessons 13 and 14

C Sort Example Illustrates use of assembly instructions for a C bubble sort function Swap procedure (leaf) void swap(int v[], int k) { int temp; temp = v[k]; v[k] = v[k+1]; v[k+1] = temp; } address of v[0] in $a0, k in $a1, temp in $t0 §2.13 A C Sort Example to Put It All Together

The Procedure Swap swap: sll $t1, $a1, 2 # $t1 = k * 4 add $t1, $a0, $t1 # $t1 = v+(k*4) # (address of v[k]) lw $t0, 0($t1) # $t0 (temp) = v[k] lw $t2, 4($t1) # $t2 = v[k+1] sw $t2, 0($t1) # v[k] = $t2 (v[k+1]) sw $t0, 4($t1) # v[k+1] = $t0 (temp) jr $ra # return to calling routine

The Sort Procedure in C Non-leaf (calls swap) void sort (int v[], int n) { int i, j; for (i = 0; i < n; i += 1) { for (j = i – 1; j >= 0 && v[j] > v[j + 1]; j -= 1) { swap(v,j); } address of v[0] in $a0, n in $a1, i in $s0, j in $s1 why is using $s0 & $s1 (for i & j) a good idea?

The Procedure Body move $s2, $a0 # save $a0 into $s2 move $s0, $zero # i = 0 for1tst: slt $t0, $s0, $s3 # $t0 = 0 if $s0 ≥ $s3 (i ≥ n) beq $t0, $zero, exit1 # go to exit1 if $s0 ≥ $s3 (i ≥ n) addi $s1, $s0, –1 # j = i – 1 for2tst: slti $t0, $s1, 0 # $t0 = 1 if $s1 < 0 (j < 0) bne $t0, $zero, exit2 # go to exit2 if $s1 < 0 (j < 0) sll $t1, $s1, 2 # $t1 = j * 4 add $t2, $s2, $t1 # $t2 = v + (j * 4) lw $t3, 0($t2) # $t3 = v[j] lw $t4, 4($t2) # $t4 = v[j + 1] slt $t0, $t4, $t3 # $t0 = 0 if $t4 ≥ $t3 beq $t0, $zero, exit2 # go to exit2 if $t4 ≥ $t3 move $a0, $s2 # 1st param of swap is v (old $a0) move $a1, $s1 # 2nd param of swap is j jal swap # call swap procedure addi $s1, $s1, –1 # j –= 1 j for2tst # jump to test of inner loop exit2: addi $s0, $s0, 1 # i += 1 j for1tst # jump to test of outer loop Move params Outer loop Inner loop Pass params & call Inner loop Outer loop

The Full Procedure sort: addi $sp,$sp, –20 # make room on stack for 5 registers sw $ra, 16($sp) # save $ra on stack sw $s3,12($sp) # save $s3 on stack sw $s2, 8($sp) # save $s2 on stack sw $s1, 4($sp) # save $s1 on stack sw $s0, 0($sp) # save $s0 on stack … # procedure body … exit1: lw $s0, 0($sp) # restore $s0 from stack lw $s1, 4($sp) # restore $s1 from stack lw $s2, 8($sp) # restore $s2 from stack lw $s3,12($sp) # restore $s3 from stack lw $ra,16($sp) # restore $ra from stack addi $sp,$sp, 20 # restore stack pointer jr $ra # return to calling routine

Compiler Benefits Comparing performance for bubble (exchange) sort To sort 100,000 words with the array initialized to random values on a Pentium 4 with a 3.06 GHz clock rate, a 533 MHz system bus, with 2 GB of DDR SDRAM, using Linux version 2.4.20 gcc optimization Relative performance Clk cycles (106) Instr count (106) CPI None 1.00 158,615 114,938 1.38 O1 (medium) 2.37 66,990 37,470 1.79 O2 (full) 2.38 66,521 39,993 1.66 O3 (proc mig) 2.41 65,747 44,993 1.46 The unoptimized code has the best CPI, the O1 version has the lowest instruction count, but the O3 version is the fastest. Which one do we go with?

Effect of Compiler Optimization Compiled with gcc for Pentium 4 under Linux

Sorting in C versus Java Comparing performance for two sort algorithms in C and Java (BubbleSort vs. Quicksort) The JVM/JIT is Sun/Hotspot version 1.3.1/1.3.1 Method Opt Bubble Quick Speedup Quick vs. Bubble Relative performance C Compiler None 1.00 2468 O1 2.37 1.50 1562 O2 2.38 1555 O3 2.41 1.91 1955 Java Interpreted 0.12 0.05 1050 JIT compiler (runtime = comp+exe) 2.13 0.29 338 Unoptimized C prgram is 8.3 times faster than the interpreted Java code for bubble. Using JIT makes Java 2.1 times faster than the unoptimized C and within a factor of 1.13 of the highest optimized C code. Ratio’s aren’t as close for quicksort, presumably because it is harder to amortize the cost of runtime compilation over the shorter execution time. The last column demonstrates the impact of a better algorithm Observations?

Effect of Language and Algorithm

Lessons Learned Instruction count and CPI are not good performance indicators in isolation Compiler optimizations are sensitive to the algorithm Java/JIT compiled code is significantly faster than JVM interpreted Comparable to optimized C in some cases Nothing can fix a dumb algorithm!

Arrays vs. Pointers Array indexing involves Multiplying index by element size Adding to array base address Indexing code is easy(er) to understand Pointers correspond directly to memory addresses Can avoid indexing complexity Pointer code executes faster Often difficult to understand (and to write correctly) §2.14 Arrays versus Pointers

Example: Clearing an Array clear1(int array[], int size) { int i; for (i = 0; i < size; i += 1) array[i] = 0; } clear2(int *array, int size) { int *p; for (p = &array[0]; p < &array[size]; p = p + 1) *p = 0; move $t0,$zero # i = 0 loop1: sll $t1,$t0,2 # $t1 = i * 4 add $t2,$a0,$t1 # $t2 = # &array[i] sw $zero, 0($t2) # array[i] = 0 addi $t0,$t0,1 # i = i + 1 slt $t3,$t0,$a1 # $t3 = # (i < size) bne $t3,$zero,loop1 # if (…) # goto loop1 move $t0,$a0 # p = & array[0] sll $t1,$a1,2 # $t1 = size * 4 add $t2,$a0,$t1 # $t2 = # &array[size] loop2: sw $zero,0($t0) # Memory[p] = 0 addi $t0,$t0,4 # p = p + 4 slt $t3,$t0,$t2 # $t3 = #(p<&array[size]) bne $t3,$zero,loop2 # if (…) # goto loop2 (  ) increment i, shift i, add i to form address vs. increment address (  )

Comparison of Array vs. Pointer Versions Multiply “strength reduced” to shift Both versions use sll instead of mul Array version requires shift to be inside loop Part of index calculation for incremented i c.f. incrementing pointer Compiler can achieve same effect as manual use of pointers Induction variable elimination, etc Better to make source program clearer and safer Write in indexing-style HLL code, but get pointer-style code from the compiler Optimizing compilers do these things, and many more! See Sec. 2.15 on CD-ROM