CSE 360: Introduction to Computer Systems

Course Notes Rick Parent Wayne Heym Copyright © by Rick Parent, Todd Whittaker, Bettina Bair, Pete Ware, Wayne Heym CSE360

Information Representation 1
Positional Number Systems: the position of a character in the string indicates a power of the base (radix). Common bases: 2, 8, 10, 16. (What base are we using to express the names of these bases?) Base ten (decimal): digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 form the alphabet of the decimal system. E.g., 316₁₀. Base eight (octal): digits 0, 1, 2, 3, 4, 5, 6, 7 form the alphabet. E.g., 474₈. Begin with this example: 316₁₀ = 3 × 10² + 1 × 10¹ + 6 × 10⁰ = 300 + 10 + 6 = 316₁₀. For the octal example, ask the following questions: What do I write first? Second? Third? How do I know what power to write? 474₈ = 4 × 8² + 7 × 8¹ + 4 × 8⁰ = 256 + 56 + 4 = 316₁₀. When we did this arithmetic, what base did we use?

Base 16 (hexadecimal): digits 0-9 and A-F. E.g., 13C₁₆. Base 2 (binary): digits (called “bits”) 0 and 1 form the alphabet. E.g., 100110₂. In general, radix-r representations use the first r characters of {0…9, A…Z} and have the form dₙ₋₁dₙ₋₂…d₁d₀. Summing dₙ₋₁ × rⁿ⁻¹ + dₙ₋₂ × rⁿ⁻² + … + d₀ × r⁰ converts to base 10. Why to base 10? In the first example, how do I know to use 12? 13C₁₆ = 1 × 16² + 3 × 16¹ + 12 × 16⁰ = 256 + 48 + 12 = 316₁₀. In the second example, what power do I use? What do I do for zeros? 100110₂ = 1 × 2⁵ + 1 × 2² + 1 × 2¹ = 32 + 4 + 2 = 38₁₀. Look at tables 1.4 and 1.5 (pg 10):
dec  oct  hex  bin
10   12   A    1010
11   13   B    1011
12   14   C    1100
13   15   D    1101
14   16   E    1110
15   17   F    1111
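The summation rule can be checked mechanically. The Python sketch below is illustrative only (the function name to_decimal and the DIGITS alphabet are our own choices): it evaluates a digit string in any radix from 2 to 36 by adding each digit times the matching power of the radix, exactly as in the worked examples.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def to_decimal(digits: str, radix: int) -> int:
    """Evaluate d[n-1] ... d[1] d[0], written in `radix`, as
    d[n-1]*radix**(n-1) + ... + d[0]*radix**0."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    n = len(digits)
    total = 0
    for i, ch in enumerate(digits.upper()):
        d = DIGITS.index(ch)               # digit value; ValueError for unknown characters
        if d >= radix:
            raise ValueError(f"digit {ch!r} is not valid in base {radix}")
        total += d * radix ** (n - 1 - i)  # the leftmost digit gets the highest power
    return total

print(to_decimal("316", 10), to_decimal("474", 8), to_decimal("13C", 16), to_decimal("100110", 2))
```

Python's built-in int(text, base) performs the same conversion (e.g., int("13C", 16)); the explicit loop is shown only to mirror the formula.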

Base Conversions
Convert to base 10 by multiplication of powers. E.g., 10012₅ = ( )₁₀. Convert from base 10 by repeated division. E.g., 632₁₀ = ( )₈. Converting base x to base y: convert base x to base 10, then convert base 10 to base y. 10012₅ = 1 × 5⁴ + 1 × 5¹ + 2 × 5⁰ = 625 + 5 + 2 = 632₁₀. 632 ÷ 8 = 79 r 0; 79 ÷ 8 = 9 r 7; 9 ÷ 8 = 1 r 1; 1 ÷ 8 = 0 r 1; reading the remainders from last to first, 632₁₀ = 1170₈. 13₁₀ = ?₂: 13/2 = 6 r 1; 6/2 = 3 r 0; 3/2 = 1 r 1; 1/2 = 0 r 1; so 13₁₀ = 1101₂. Back to base 10 (1101₂ = ?₁₀): 1 × 2⁰ = 1, 0 × 2¹ = 0, 1 × 2² = 4, 1 × 2³ = 8, and 1 + 0 + 4 + 8 = 13₁₀. 414₅ = ?₁₆: 4 × 5² + 1 × 5¹ + 4 × 5⁰ = 109₁₀; 109 ÷ 16 = 6 r 13 (D); 6 ÷ 16 = 0 r 6; so 414₅ = 6D₁₆.
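Repeated division can be sketched the same way. This is a minimal illustration (from_decimal is a hypothetical helper name, handling nonnegative integers only): each remainder is the next digit, produced least significant first, so the digits are reversed at the end. Chaining it with Python's int(text, base) gives the "base x to base 10 to base y" route.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def from_decimal(value: int, radix: int) -> str:
    """Convert a nonnegative integer to base `radix` by repeated division."""
    if not 2 <= radix <= len(DIGITS):
        raise ValueError("radix must be between 2 and 36")
    if value < 0:
        raise ValueError("this sketch handles nonnegative values only")
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        value, r = divmod(value, radix)  # the quotient feeds the next division
        digits.append(DIGITS[r])         # the remainder is the next digit
    return "".join(reversed(digits))     # remainders came out right to left

print(from_decimal(632, 8))              # 1170
print(from_decimal(13, 2))               # 1101
print(from_decimal(int("414", 5), 16))   # 6D  (base 5 -> base 10 -> base 16)
```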

Special case: converting among binary, octal, and hexadecimal is easier. Go through the binary representation, grouping the bits in sets of 3 (octal) or 4 (hexadecimal). E.g., 11011001₂ = 011 011 001₂ = 331₈, and 11011001₂ = 1101 1001₂ = D9₁₆. E.g., C3B₁₆ = ( )₈: C3B₁₆ = 1100 0011 1011₂ = 110 000 111 011₂ = 6073₈. What four bits do I write for C? 3? B? At which end do I start grouping in threes? What do I write for each group of three? Notice that 1000₂ = 8₁₀ = 2³ = 10₈ and 10000₂ = 16₁₀ = 2⁴ = 10₁₆. Ask: 1000000₂ = ?₈ (= 100₈); 100000000₂ = ?₁₆ (= 100₁₆). An assembler is an instruction translation program. We will be using an assembler for SPARC machines. It automatically converts notations in three bases (8, 10, and 16) to the machine’s internal binary representation. It can produce a listing that shows the results of the translation in hexadecimal notation. As assembler input, a token that begins with a character between ‘1’ and ‘9’ is a decimally represented number. A prefix of ‘0x’ indicates hexadecimal notation for a number, and a prefix of ‘0’ without the ‘x’ indicates octal notation.
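The grouping shortcut is easy to express in code. In this sketch (regroup and hex_to_bits are illustrative names, not part of any course tool), the bit string is padded on the left so that grouping starts at the least significant end, and each group of 3 or 4 bits becomes one octal or hex digit.

```python
def regroup(bits: str, group: int) -> str:
    """Binary string -> octal (group=3) or hex (group=4), grouping from the right."""
    pad = (-len(bits)) % group            # leading zeros to fill the leftmost group
    bits = "0" * pad + bits
    digits = "0123456789ABCDEF"
    return "".join(digits[int(bits[i:i + group], 2)]
                   for i in range(0, len(bits), group))

def hex_to_bits(hex_digits: str) -> str:
    """Expand each hex digit into exactly four bits."""
    return "".join(format(int(h, 16), "04b") for h in hex_digits)

print(regroup("11011001", 3), regroup("11011001", 4))   # 331 D9
print(regroup(hex_to_bits("C3B"), 3))                   # 6073
```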

What is special about binary? The basic component of a computer system is a transistor (transfer resistor): a two-state device that switches between logical “1” and “0” (actually represented as voltages in the range 5V to 0V). Octal and hexadecimal are bases that are powers of 2, and they are used as a shorthand way of writing binary. A hexadecimal digit represents 4 bits, half of a byte. 1 byte = 8 bits. A bit is a binary digit. Get comfortable converting among decimal, binary, octal, and hexadecimal. Converting from decimal to hexadecimal (or binary) is easier going through octal. Three other strategies for converting “big” numbers to binary (other than through octal): (1) Repeated division by 2. This is tedious and prone to error. (2) Repeated division by 16, then convert the hex digits to groups of 4 binary digits. Fewer divisions, but probably requires a calculator. (3) Add up the powers of 2. For example, the largest power of two that does not exceed 320 is 256 (2⁸). Then 320 − 256 = 64 (2⁶). Put a one in the 2⁸ and 2⁶ columns to make 256 + 64 = 320, i.e., 101000000₂ in simple binary. Review Qs: count to 20 in base 5; convert 0F3₁₆ to decimal; convert 239 to binary (try going through octal to make it easier); convert to hex; convert 7A48 to binary. ANSWERS: 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20; 243; 11101111₂; FD. Going through octal: 239 ÷ 8 = 29 r 7; 29 ÷ 8 = 3 r 5; 3 ÷ 8 = 0 r 3; so 239₁₀ = 357₈ = 011 101 111₂ = 11101111₂.
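Strategy (3), adding up powers of 2, can be sketched as a loop that keeps finding the largest power of two not exceeding what is left and puts a one in that column. The helper names below are hypothetical; the result is checked against Python's own format(n, "b").

```python
def largest_power_exponent(n: int) -> int:
    """Return k such that 2**k <= n < 2**(k + 1), for n >= 1."""
    k = 0
    while (1 << (k + 1)) <= n:
        k += 1
    return k

def to_binary_by_powers(value: int) -> str:
    """Simple binary built by subtracting the largest power of 2 that still fits."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    top = largest_power_exponent(value)     # exponent of the leftmost column
    columns = ["0"] * (top + 1)
    remaining = value
    while remaining:
        k = largest_power_exponent(remaining)
        columns[top - k] = "1"              # put a one in the 2**k column
        remaining -= 1 << k
    return "".join(columns)

print(to_binary_by_powers(320))   # 101000000
print(to_binary_by_powers(239))   # 11101111
```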

Binary  Hex  Decimal    Binary  Hex  Decimal
0000    0    0          1000    8    8
0001    1    1          1001    9    9
0010    2    2          1010    A    10
0011    3    3          1011    B    11
0100    4    4          1100    C    12
0101    5    5          1101    D    13
0110    6    6          1110    E    14
0111    7    7          1111    F    15
This is an extension of Table 1.5. Now would be a good time for The Secret Message exercise. Why is it important? Machines use binary. (See Information Representation 5 (slide 6) for what’s special about binary.) However, humans (even engineers!) find binary to be very cumbersome. So, when they must deal with binary encodings, they often collect the bits into groups of four and represent each group of four bits with a hexadecimal digit. The human has now reduced the difficulty by a factor of four. On the other hand, it is hard to multiply and divide by sixteen. It is much easier to multiply and divide by eight. Therefore, the octal system is useful for hand-calculated arithmetic (when a calculator is unavailable or inconvenient), and it reduces the difficulty of binary by a factor of three. A remaining advantage for hexadecimal notation is that 4 bits is an integer power of 2; furthermore, 4 bits is exactly half a byte. We can make neither of these claims for a group of 3 bits.

Ranges of values
Q: Given k positions in base n, how many values can you represent? A: nᵏ values, over the range (0 … nᵏ − 1)₁₀. n = 10, k = 3: 10³ = 1000, range is (0 … 999)₁₀. n = 2, k = 8: 2⁸ = 256, range is (0 … 255)₁₀. n = 16, k = 4: 16⁴ = 65536, range is (0 … 65535)₁₀. Q: How are negative numbers represented? What is −65535₁₀ in base 2? In base 16? Using pen or pencil, we just write a minus sign (which looks like a hyphen) in front of the representation of the magnitude, e.g., −FFFF₁₆ or −1111111111111111₂. But how will we represent negative numbers inside a computer? We could do the analogous thing with the minus sign, yes, but it turns out that we have additional options.

Integer representation: value and representation are distinct. E.g., 12 may be represented as XII, C₁₆, 12₁₀, and 1100₂. Note: −12 may be represented as −C₁₆, −12₁₀, and −1100₂. Simple and efficient use of hardware implies using a specific number of bits, e.g., a 32-bit string, in a binary encoding. Such an encoding is “fixed width.” Four methods: (fixed-width) simple binary, signed magnitude, binary coded decimal, and 2’s complement. Simple binary: as seen before, all numbers are assumed to be positive, e.g., the 8-bit representation of 66₁₀ = 01000010₂. Very likely the Romans had no shorthand for representing negative values. They would likely say, in Latin, things like “I owe you” or “you owe me”. Until now, we used as many bits as were necessary (and only that many bits) in our binary representations of numbers. Let’s call this method of representing numbers “general simple binary.” You may want to use the base-independent definition of radix notation (found on slide 3) now to define the specific general simple binary notation. From now on, many of our encoding schemes will be “fixed width”; i.e., we’ll need to specify how many bits are in the encoding.

Signed magnitude: simple binary with a leading sign bit, 0 = positive, 1 = negative. E.g., 8-bit signed magnitude: 66₁₀ = 01000010, −66₁₀ = 11000010. What range of numbers may be expressed in 8 bits? Largest: 01111111 = +127. Smallest: 11111111 = −127. Extend to 12 bits: 66₁₀ = 000001000010, −66₁₀ = 100001000010. The magnitude has k = 7 bits with n = 2, so 2⁷ = 128 and the magnitude range is {0 … 2⁷ − 1}. Largest = 7F = 127; smallest = −7F = −127. To increase the number of bits, insert 0s immediately to the right of the sign bit. ** These are encodings! **
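A small sketch of signed magnitude encoding and decoding (sm_encode and sm_decode are illustrative names). Note how decoding 10000000 yields 0: the scheme has a second, “negative” zero, which is one of the problems raised on the next slide.

```python
def sm_encode(value: int, width: int) -> str:
    """Signed magnitude: sign bit (0 = +, 1 = -), then |value| in width-1 bits."""
    limit = (1 << (width - 1)) - 1               # largest magnitude, e.g. 127 for 8 bits
    if abs(value) > limit:
        raise OverflowError(f"{value} does not fit in {width}-bit signed magnitude")
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{width - 1}b")

def sm_decode(bits: str) -> int:
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude

print(sm_encode(66, 8), sm_encode(-66, 8))   # 01000010 11000010
print(sm_encode(-66, 12))                    # 100001000010
print(sm_decode("10000000"))                 # 0: the pattern for "negative zero"
```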

Problems: (1) Compare the signed magnitude numbers 00000000 and 10000000 (there are two zeros). (2) We must have “subtraction” hardware in addition to “addition” hardware. Binary Coded Decimal (BCD): use a 4-bit pattern to express each digit of a base 10 number:
0000 = 0   0001 = 1   0010 = 2   0011 = 3
0100 = 4   0101 = 5   0110 = 6   0111 = 7
1000 = 8   1001 = 9   1010 = +   1011 = −
E.g., 123: 0001 0010 0011; +123: 1010 0001 0010 0011; −123: 1011 0001 0010 0011.
In 8-bit signed magnitude representation: is 10000000 negative zero? Q: Why must we have “subtraction” hardware? A: −1₁₀ = 10000001 and +2₁₀ = 00000010; adding these patterns as if they were simple binary gives 10000011 (−3₁₀), not the correct 1₁₀. BCD is not intended as a solution to these problems; however, it is easy to convert to decimal if you’re human. Leading zeros (shown in blue on the slide) are optional. BCD is often not a fixed-width encoding scheme. Q: Does use of BCD imply any sort of waste? A: Depending on how you count, 4, 5, or 6 patterns are unused in BCD.
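BCD encoding is one 4-bit group per decimal digit. The sketch below assumes the sign codes 1010 (+) and 1011 (−); other conventions exist (IBM packed decimal, for instance, uses 1100 and 1101), so PLUS and MINUS are parameters of this illustration, not a standard.

```python
PLUS, MINUS = "1010", "1011"   # assumed sign codes; other conventions exist

def bcd_encode(value: int, signed: bool = False) -> str:
    """Write each decimal digit of `value` as its own 4-bit group."""
    if value < 0 and not signed:
        raise ValueError("unsigned BCD cannot represent a negative value")
    groups = [format(int(d), "04b") for d in str(abs(value))]
    if signed:
        groups.insert(0, MINUS if value < 0 else PLUS)   # leading sign group
    return " ".join(groups)

print(bcd_encode(123))                 # 0001 0010 0011
print(bcd_encode(123, signed=True))    # 1010 0001 0010 0011
print(bcd_encode(-123, signed=True))   # 1011 0001 0010 0011
```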

BCD Disadvantages: Takes more memory. 32-bit simple binary can represent more than 4 billion discrete values. 32-bit BCD can hold a sign and 7 digits (or 8 digits for unsigned values), for a maximum of 100 million (10⁸) values, a 97% reduction. Arithmetic is more difficult: essentially, we must force the base 2 computer to do base 10 arithmetic. BCD Advantages: Used in business machines and languages, e.g., in COBOL for precise decimal math. We can have arrays of BCD numbers for essentially arbitrary precision arithmetic. The supposed BCD advantages arose from the failure to introduce an appropriate layer of abstraction between people and the machines. Regarding the greater difficulty of doing arithmetic, consider binary addition: 001 (1) + 010 (2) = 011 (3). Then does it follow that a BCD addition producing the pattern 1011 means eleven? It cannot: 1011 is not a BCD digit, and eleven in BCD is 0001 0001.
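One standard way to get correct BCD sums out of binary adders (a decimal-adjust step; the notes do not say which method any particular machine uses) is to add 6 whenever a digit sum exceeds 9, which skips the six unused patterns 1010 to 1111. A minimal sketch for unsigned BCD strings, with hypothetical function names:

```python
def bcd_add_digit(a: int, b: int, carry_in: int = 0):
    """Add two BCD digits (0-9) plus a carry; return (carry_out, digit)."""
    if not (0 <= a <= 9 and 0 <= b <= 9):
        raise ValueError("BCD digits must be 0-9")
    s = a + b + carry_in          # plain binary sum, 0..19
    if s > 9:                     # 1010..1111 are not BCD digits
        s += 6                    # skip the six unused patterns
    return s >> 4, s & 0xF        # high nibble is the carry, low nibble the digit

def bcd_add(x: str, y: str) -> str:
    """Add two unsigned BCD strings such as "0101" or "1001 1001"."""
    xs = [int(g, 2) for g in x.split()]
    ys = [int(g, 2) for g in y.split()]
    width = max(len(xs), len(ys))
    xs = [0] * (width - len(xs)) + xs
    ys = [0] * (width - len(ys)) + ys
    carry, out = 0, []
    for a, b in zip(reversed(xs), reversed(ys)):   # least significant digit first
        carry, d = bcd_add_digit(a, b, carry)
        out.append(format(d, "04b"))
    if carry:
        out.append(format(carry, "04b"))
    return " ".join(reversed(out))

print(bcd_add("0101", "0110"))   # 0001 0001  (5 + 6 = 11, not 1011)
```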

Code   Simple  Signed  2’s comp
0000   0       +0      0
0001   1       1       1
0010   2       2       2
0011   3       3       3
0100   4       4       4
0101   5       5       5
0110   6       6       6
0111   7       7       7
1000   8       −0      −8
1001   9       −1      −7
1010   10      −2      −6
1011   11      −3      −5
1100   12      −4      −4
1101   13      −5      −3
1110   14      −6      −2
1111   15      −7      −1
Two’s Complement: used by most machines and languages to represent integers. It fixes the −0 of signed magnitude and simplifies machine hardware arithmetic. It divides the bit patterns into a positive half and a negative half (with zero considered positive); n bits give a range of [−2ⁿ⁻¹ … 2ⁿ⁻¹ − 1]. Quick review: Convert 103 to base 2 [1100111], base 8 [147], base 16 [67], BCD [0001 0000 0011]. Convert −13 to 5-bit signed magnitude [11101]. In general: radix complement. Except for the break from 7 to −8, adding one to the code corresponds to adding one to the represented 2’s complement number. In contrast, this is true only for the positive signed magnitude numbers; for the negative signed magnitude numbers, adding one to the code corresponds to subtracting one from the represented number.
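The code table can be regenerated to check it. This sketch (interpretations is a hypothetical helper) reads every 4-bit pattern three ways: as simple binary, as signed magnitude (keeping “−0” visible as a string), and as 2’s complement, where a leading 1 means subtracting 2⁴.

```python
def interpretations(width: int = 4):
    """Yield (pattern, simple binary, signed magnitude, 2's complement)."""
    for code in range(2 ** width):
        bits = format(code, f"0{width}b")
        sign = code >> (width - 1)                       # leftmost bit
        magnitude = code & ((1 << (width - 1)) - 1)      # remaining bits
        signed_mag = ("-" if sign else "+") + str(magnitude)
        twos = code - (1 << width) if sign else code
        yield bits, code, signed_mag, twos

for row in interpretations():
    print(*row)
```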

Representation in 2’s complement, i.e., representing i in n-bit 2’s complement, where −2ⁿ⁻¹ ≤ i ≤ +2ⁿ⁻¹ − 1. Nonnegative numbers: same as simple binary. Negative numbers: obtain the n-bit simple binary equivalent of |i|, then obtain its negation as follows: invert the bits of that representation, and add 1 to the result. Ex.: convert −320₁₀ to 16-bit 2’s complement. Ex.: extend a 12-bit 2’s complement number to 16 bits (copy its sign bit into the four new high-order positions). 320₁₀ = 0000 0001 0100 0000; inversion yields 1111 1110 1011 1111; adding 1 gives 1111 1110 1100 0000 = −320₁₀. Obtain the complement of 1111 1110 1100 0000 to get back to the representation of 320₁₀: invert: 0000 0001 0011 1111; add 1: 0000 0001 0100 0000 = 320₁₀. Why does this work? Because rep(−x) = [(2ⁿ − 1) − rep(x)] + 1, and [(2ⁿ − 1) − rep(x)] can be performed as bit inversion. Note that if 0 ≤ x < 2ⁿ⁻¹, then rep(x) = bin(x), and if −2ⁿ⁻¹ ≤ x < 0, then rep(x) = bin(2ⁿ − |x|).
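The invert-and-add-1 rule, its inverse, and sign extension fit in a few lines. In this sketch (function names are our own), Python’s unbounded integers are masked to n bits so that ~ behaves like an n-bit bit inversion.

```python
def twos_encode(value: int, width: int) -> str:
    """Encode `value` in `width`-bit 2's complement by the invert-and-add-1 rule."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError(f"{value} is outside [{lo}, {hi}]")
    mask = (1 << width) - 1
    if value >= 0:
        return format(value, f"0{width}b")              # same as simple binary
    inverted = ~abs(value) & mask                       # invert the bits of |value|
    return format((inverted + 1) & mask, f"0{width}b")  # then add 1

def twos_decode(bits: str) -> int:
    """A leading 1 means the pattern stands for (simple binary value) - 2**n."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw

def sign_extend(bits: str, width: int) -> str:
    """Widen a 2's complement pattern by copying its sign bit to the left."""
    return bits[0] * (width - len(bits)) + bits

print(twos_encode(-320, 16))              # 1111111011000000
print(twos_decode("1111111011000000"))    # -320
print(sign_extend("111111000000", 16))    # 1111111111000000 (still -64)
```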

Binary Arithmetic
Addition and subtraction only, for now. Rules: similar to standard addition and subtraction, but working only with 0 and 1: 0 + 0 = 0; 1 + 0 = 1; 0 + 1 = 1; 1 + 1 = 10 (0, carry 1). 0 − 0 = 0; 1 − 0 = 1; 1 − 1 = 0; 0 − 1 = 1 (borrow 1). We must be aware of possible overflow. Ex.: 8-bit signed magnitude: 01010110 (86) + 01100011 (99) = 10111001 (−57?). [The maximum value in 8-bit signed magnitude is 127.] Ex.: an 8-bit signed magnitude subtraction of 01100011 (99). #1. Positive plus positive yielded a negative. That suggests something fishy. #2. If this were 2’s complement arithmetic, the ALU would check whether the carry-in to the last column (in this case 1) and the carry-out from the last column (0) are equal. In this case, they’re not, so the ALU would report an overflow error. Because this is signed magnitude arithmetic, the ALU merely checks whether the carry-in to the last column is 0; if not, it reports an overflow error. [99] We run out of columns to borrow bits from before we run out of non-zero values to subtract. Binary subtraction is just ugly (borrowing bits, ugh), and trying to do it in signed magnitude is worse because the MSB isn’t actually a part of the value. Complicated logic like this leads to expensive circuitry. What about unsigned arithmetic? There, a carry out of the last column indicates overflow.

2’s Complement binary arithmetic
Addition and subtraction are the same operation (to subtract, add the 2’s complement of the subtrahend). We still must be aware of overflow. Ex.: 8-bit 2’s complement (three examples; show carries). Two positive numbers added produce a positive number: no problem. The result is 68₁₀, and carry-in 0 = carry-out 0. Two positive numbers added produce a negative number: overflow. The sum should have been 145, but 145 > 127 (2ⁿ⁻¹ − 1); carry-in 1 ≠ carry-out 0. A positive and a negative number produce the correct answer, −22₁₀. The fourth example shows carry-in 1 = carry-out 1. The fifth example, −100 − 45, shows carry-in 0 ≠ carry-out 1 (overflow, since −145 < −128).

2’s Complement overflow
Operands with opposite signs can’t overflow. If the operand signs are the same but the result’s sign is different, there must be overflow. Can two positives sum to a positive and still have overflow? Can two negatives? More practice: 1001 = 9?! Explain (as a 4-bit 2’s complement number, 1001 is −7). Largest 4-bit positive plus itself: 0111 + 0111 = 1110: overflow, and negative. Smallest 4-bit negative plus itself: 1000 + 1000 = 10000; the carry out is discarded, leaving 0000: overflow, and positive.
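The carry-in/carry-out test for overflow can be simulated with a ripple-carry adder over bit strings. This is a sketch of the rule stated in these notes, not of any particular ALU; add_twos is a hypothetical name. It returns the sum (with the final carry discarded), the overflow flag, and the carry out.

```python
def add_twos(a_bits: str, b_bits: str):
    """Ripple-carry addition of two equal-width 2's complement bit strings.

    Returns (sum_bits, overflow, carry_out). Overflow is reported when the
    carry into the sign (leftmost) column differs from the carry out of it.
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    carry = 0
    carry_into_sign = 0
    result = []
    for i in range(len(a_bits) - 1, -1, -1):   # rightmost column first
        if i == 0:
            carry_into_sign = carry            # carry entering the sign column
        s = int(a_bits[i]) + int(b_bits[i]) + carry
        result.append(str(s & 1))
        carry = s >> 1
    return "".join(reversed(result)), carry_into_sign != carry, carry

print(add_twos("0111", "0111"))   # ('1110', True, 0): two positives gave a negative
print(add_twos("1000", "1000"))   # ('0000', True, 1): two negatives gave a positive
```

To try the 8-bit examples, encode each operand with format(x & 0xFF, "08b"); for instance, −100 becomes 10011100.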

Characters and Strings
EBCDIC, Extended Binary Coded Decimal Interchange Code: used by IBM in mainframes (the 360 architecture and its descendants). Earliest system. ASCII, American Standard Code for Information Interchange: most common system. Unicode: new international standard; variable-length encoding schemes with either an 8- or a 16-bit minimum; “a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.” EBCDIC: 8 bit. ASCII: 7 bit, 128 characters. Unicode: uses a default 16-bit encoding, but includes extensions that permit over 900,000 additional characters. It offers two Unicode Transformation Formats, UTF-8 and UTF-16, variable-length encoding schemes with, respectively, 8- and 16-bit minimum code lengths. UTF-8 is compatible with ASCII.

ASCII: see table 1.7 on pg. 18. In Unix, run “man ascii”. 7-bit code. Printable characters for human interaction. Control characters for non-human communication (computer-computer, computer-peripheral, etc.). 8-bit codes: the most significant bit may be set. Extended ASCII (IBM) includes graphical symbols and lines. ISO 8859: several international standards. Unicode’s UTF-8: variable-length code with an 8-bit minimum. What does “the bit is set” mean?

ASCII is easy to decode, but every character takes up the same, predictable amount of space. Upper- and lower-case characters are 0x20 (32₁₀) apart. The ASCII representation of ‘3’ is not the same as the binary representation of 3. To convert an ASCII digit to binary (an integer): ‘3’ − ‘0’ = 3. Line feed (LF) character = 0x0A = 10₁₀; ‘\n’ = 0x0A.
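Both ASCII facts on this slide can be checked with Python’s ord and chr, which return and accept character codes. The helper names below are illustrative.

```python
def ascii_digit_value(ch: str) -> int:
    """'0'..'9' have codes 0x30..0x39, so subtracting '0' yields the number."""
    if not "0" <= ch <= "9":
        raise ValueError(f"{ch!r} is not an ASCII digit")
    return ord(ch) - ord("0")

def to_upper(ch: str) -> str:
    """Lower-case letters sit 0x20 above their upper-case partners."""
    return chr(ord(ch) - 0x20) if "a" <= ch <= "z" else ch

print(hex(ord("3")), ascii_digit_value("3"))   # 0x33 3
print(to_upper("q"), hex(ord("\n")))           # Q 0xa
```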

String: the definition is programming-language dependent. In C and C++, strings are arrays of characters terminated by a null byte. Decode the bytes shown on the slide, or (in hex) the same bytes. How many bytes is this? What’s the use of the ’00’ byte at the end? What do you suppose a null byte is?
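The role of the terminating null byte can be illustrated by reading a C-style string out of a raw byte buffer: characters are consumed up to, but not including, the first 0x00. The buffer contents and the name read_c_string are hypothetical.

```python
def read_c_string(buf: bytes, start: int = 0) -> str:
    """Return the characters from `start` up to (not including) the first null byte."""
    end = buf.index(b"\x00", start)   # ValueError if there is no terminator
    return buf[start:end].decode("ascii")

data = b"Hi\x00junk"                  # hypothetical buffer: a string, then unrelated bytes
print(read_c_string(data))            # Hi
```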

Simple data compression
ASCII codes are fixed length. Huffman codes are variable length and based on statistics of the data to be transmitted. Assign the shortest encoding to the most common character. In English, the letter ‘e’ is the most common. Either establish a Huffman code for an entire class of messages, or create a new Huffman code for each message, sending/storing both the coding scheme and the message. “a widely used and very effective technique for compressing data; savings of 20% to 90% are typical, depending on the characteristics of the file being compressed.” (Cormen, p. 337) The quote is from: Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest, Introduction to Algorithms. Cambridge, Massachusetts: MIT Press, 1990, p. 337.

ECL - Expected Code Length
Char  Fixed-length code  Freq  Variable-length code  # bits  Expected # bits
#1    00                 .50   1                     1       .50
#2    01                 .25   01                    2       .50
#3    10                 .15   001                   3       .45
#4    11                 .10   000                   3       .30
Average length (ECL): 1.75 bits, versus 2 bits for the fixed-length code.
Frequency is given; the variable-length encoding is derived from the frequency.
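The expected code length is the frequency-weighted sum of code lengths. This sketch uses exact fractions to avoid floating-point noise; the symbols are identified only by their row in the table, and the helper names are our own. It also checks that no variable-length code word is a prefix of another, which is what makes decoding unambiguous.

```python
from fractions import Fraction

freqs = [Fraction(50, 100), Fraction(25, 100), Fraction(15, 100), Fraction(10, 100)]
fixed_codes = ["00", "01", "10", "11"]
var_codes = ["1", "01", "001", "000"]

def expected_code_length(freqs, codes):
    """ECL = sum over symbols of (frequency x code length)."""
    return sum(f * len(c) for f, c in zip(freqs, codes))

def is_prefix_free(codes):
    """No code word may be a prefix of another, or decoding would be ambiguous."""
    return not any(a != b and b.startswith(a) for a in codes for b in codes)

print(float(expected_code_length(freqs, fixed_codes)))  # 2.0
print(float(expected_code_length(freqs, var_codes)))    # 1.75
```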

Huffman Tree for “a man a plan a canal panama”
Examine the data set and determine the frequencies of the letters (this example ignores spaces, which are normally significant). Create a forest of single-node trees. Choose the two trees having the smallest total frequencies (the two “smallest” trees) and merge them together (lesser frequency as the left subtree, for definiteness, to make grading easier). Continue merging until only one tree remains. Count: 10 ‘a’s in a string of length 21 gives a frequency of .476. The two smallest trees combine: c (.0476) and l (.0952) form a tree of weight .1428. Combine the next two smallest trees: m and p (.0952 each) form a tree of weight .1905. Continue: the c-l tree and the m-p tree form a tree of weight .3333.

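Reading a ‘1’ calls for following the left branch. Reading a ‘0’ calls for following the right branch. Decoding using the tree: to decode ‘0001’, start at the root and follow r_child, r_child, r_child, l_child, revealing the encoded ‘m’.
[Figure: the finished Huffman tree for “a man a plan a canal panama”. The root (1.0) has left child ‘a’ (.4762) and right child .5238; .5238 has left child ‘n’ (.1905) and right child .3333; .3333 has left child .1428 (‘c’ .0476 on the left, ‘l’ .0952 on the right) and right child .1905 (‘m’ on the left, ‘p’ on the right).]
The resulting codes are a = 1, n = 01, c = 0011, l = 0010, m = 0001, p = 0000.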
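The merging procedure can be sketched with a priority queue (Python’s heapq). Following the slides, the smaller tree becomes the left subtree, ‘1’ labels a left branch, and ‘0’ a right branch. Ties between equal frequencies may be broken differently than in the hand-built tree, so individual codes can differ, but every Huffman code for this message has the same total length (46 bits). The names huffman_codes and decode are illustrative.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman code by repeatedly merging the two smallest trees."""
    freq = Counter(text)
    tiebreak = count()                    # keeps heap comparisons away from the trees
    heap = [(n, next(tiebreak), ch) for ch, n in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:                    # a one-symbol message still needs one bit
        return {heap[0][2]: "1"}
    while len(heap) > 1:
        n1, _, smaller = heapq.heappop(heap)
        n2, _, larger = heapq.heappop(heap)
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (smaller, larger)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):       # internal node: (left, right)
            walk(node[0], prefix + "1")   # '1' = follow the left branch
            walk(node[1], prefix + "0")   # '0' = follow the right branch
        else:                             # leaf: a single character
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

def decode(bits, codes):
    """Decode a prefix code by matching one complete code word at a time."""
    lookup = {c: ch for ch, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return "".join(out)

message = "a man a plan a canal panama".replace(" ", "")   # spaces ignored
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(len(message) * 3, len(encoded))   # 63 46
```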

Comparison of Huffman and 3-bit code example
3-bit: 21 × 3 = 63 bits. Huffman: 10 × 1 + 4 × 2 + (1 + 2 + 2 + 2) × 4 = 46 bits. Savings of 17 bits, or 27% of the original message. We’re not encoding the spaces between words, but spaces are displayed in the examples to make them easier for us to read. A decode tree represents a prefix code because none of the internal nodes is labeled with a character. Calc the ECL for: ABE DEFACED A FADED BED
Char  Freq   Code length  ECL contribution
A     4/19   2            8/19
B     2/19   4            8/19
C     1/19   4            4/19
D     5/19   2            10/19
E     5/19   2            10/19
F     2/19   3            6/19
ECL = 46/19 ≈ 2.42
Use the same encodings for A BEADED ACE CAB.
[Figure: decode tree in which C and B merge (3/19), then F joins them (5/19), then A joins (9/19), while D and E merge (10/19).]

Parity: Simple error detection
Data transmission, aging media, static interference, dust on media, etc., demand the ability to detect errors. Single-bit errors are detected by using parity checking. Parity, here, is “the state of being odd or even.”

How to detect a 1-bit error: Ex.: send ASCII ‘S’: send 1010011, but receive 1010010? Add a 1-bit parity to make an odd or even number of 1 bits per byte. The parity bit is stripped by hardware after checking. Sender and receiver both agree to odd or even parity. Two flipped bits in the same encoding are not detected. 1010010 is the ASCII encoding for ‘R’. Note that the value of the parity bit is different for ‘E’ (1000101) than for ‘S’ (1010011). Pursue the ‘S’ example. Suppose the parity bit is the one that gets flipped by the transmission? ‘S’ with odd parity: 11010011. Oops! The parity bit flipped: 01010011. Can we detect the error? Yes, because there is now an even number of 1s. Pursue an example with the flipping of two bits: 11010000, where the last two bits have flipped. We can’t detect the error.
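Odd parity can be sketched on integers. As an assumption of this illustration, the parity bit is placed in bit 7 (the most significant bit of the byte) above the 7-bit ASCII code; the function names are hypothetical. The demo repeats the ‘S’ cases: a flipped parity bit is caught, two flipped data bits are not.

```python
def ones(n: int) -> int:
    """Count the 1 bits in n."""
    return bin(n).count("1")

def add_odd_parity(code7: int) -> int:
    """Put a parity bit in bit 7 so the byte has an odd number of 1s."""
    parity = 0 if ones(code7) % 2 == 1 else 1
    return (parity << 7) | code7

def passes_odd_parity(byte: int) -> bool:
    return ones(byte) % 2 == 1

sent = add_odd_parity(ord("S"))
print(format(sent, "08b"))                    # 11010011
print(passes_odd_parity(sent ^ 0b10000000))   # False: flipped parity bit is caught
print(passes_odd_parity(sent ^ 0b00000011))   # True: two flipped bits slip through
```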

Two meanings for Hamming distance; the 2nd is a generalization of the 1st. 1st: the distance between two encodings of the same length, a count of the number of bits that differ between encoding 1 and encoding 2. E.g., dist(1100, 1001) = 2 and dist(0101, 1101) = 1. Generalize to an entire code by taking the minimum over all distinct pairs (2nd meaning). The ASCII encoding scheme has a Hamming distance of 1. A simple parity encoding scheme has a Hamming distance of 2. Hamming distance serves as a measure of the robustness of error checking (as a measure of the redundancy of the encoding). See page 21 for an English frequency table. Hamming distance gives the number of bits that have to be changed to reach some other valid encoding (e.g., 2 for ASCII with single parity).
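Both meanings of Hamming distance are short functions. This sketch (names are our own) computes the pairwise distance and the code distance, then confirms that 7-bit ASCII has distance 1 while ASCII with a prepended odd-parity bit (bit placement assumed) has distance 2.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """1st meaning: number of positions in which two equal-length encodings differ."""
    if len(a) != len(b):
        raise ValueError("encodings must have the same length")
    return sum(x != y for x, y in zip(a, b))

def code_distance(codewords) -> int:
    """2nd meaning: the minimum distance over all distinct pairs of a code."""
    return min(hamming(a, b) for a, b in combinations(codewords, 2))

ascii7 = [format(c, "07b") for c in range(128)]
# Prepend an odd-parity bit to each 7-bit ASCII code (placement is an assumption).
ascii_odd = [("0" if w.count("1") % 2 else "1") + w for w in ascii7]

print(hamming("1100", "1001"), hamming("0101", "1101"))   # 2 1
print(code_distance(ascii7), code_distance(ascii_odd))    # 1 2
```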

ISEM FAQ 1 Editing, Assembling, Linking, and Loading
There are three components to the Instructional SPARC Emulator (ISEM) package that we use for this class: the assembler, the linker, and the emulator/debugger. The assembler translates source modules into object modules; source modules are in assembly language, and object modules are in machine language. The linker links object modules together to create a single executable program; an executable program is in machine language. We use the emulator/debugger to observe the execution of the program in its intermediate steps.

ISEM FAQ 2 Editing
There are a number of programs that you can use to create your source files. Emacs is probably the most popular; vi is also available, but its command syntax is difficult to learn and use; if you use the pine program, you can use the pico editor, which combines many features of Emacs into a simple menu-driven facility. Start Emacs with “xemacs sourcefile.s &”, which creates the file called sourcefile.s. Use the tutorial, accessed by typing “Ctrl-H Ctrl-H t”. For other editors, you are on your own. In addition to the assembler, linker, and emulator/debugger, you will use an editor to create your program source files. The assembler and linker are part of the GNU Binutils package, which is distributed as free software on the internet and is actually used by real-world programmers to assemble and link programs. The emulator/debugger is an interactive environment that permits you to run SPARC programs on any architecture machine (HP/UX, Linux, DOS/Windows, etc.). Thus, there are three programs that you will invoke repeatedly to create your programs. Unix’s dos2unix program can help you solve the minor problem that text files brought directly from Windows systems may show “^M” at the end of each line in Emacs. This problem does not bother the new assembler. Lines in DOS text files are terminated with 0xD, 0xA, but lines in Unix text files are terminated with 0xA alone.

Example Sparc Assembly Language Instructions
% type xmp0.s
        .data                  ! Assembler directive: data starts here.  A_m, B_m, and
A_m:    .word   '?'            ! C_m are symbolic constants.  Furthermore, each
B_m:    .word   0x30           ! is an address of a certain-sized chunk of memory.  Here,
C_m:    .word   0              ! each chunk is four bytes (one word) long.  When the
                               ! program gets loaded, each of these chunks stores a
                               ! number in 2's complement encoding, as follows:  at
                               ! address C_m, zero; at B_m, 48; at A_m, 0x3F = 077 = 63.
        .text                  ! Assembler directive: instructions start here.
start:                         ! Label (symbolic constant) for this address
        set     A_m, %r2       ! Put address A_m into register 2
        ld      [%r2], %r2     ! Use r2 as an indirect address for a load (read)
        set     B_m, %r3       ! Put address B_m into register 3
        ld      [%r3], %r3     ! Read from B_m; replace r3 with the value at addr B_m
        sub     %r2, %r3, %r2  ! Subtract r3 from r2, save in r2
        set     C_m, %r4       ! Put address C_m into register 4
        st      %r2, [%r4]     ! Store (write) r2 to memory at address C_m
terminate:                     ! Label for the address where the 'ta 0' instruction is stored
        ta      0              ! Stop the program
beyond_end:                    ! Label for the address beyond the end of this program
The chunks of memory at addresses A_m, B_m, and C_m are contiguous, and A_m is the least address. Zero can also be spelled to the assembler as 0x0, or as a string of 0s of any length, such as 00 or 000. How many bits are in the 2’s complement encodings of those three numbers? A symbolic constant can be used in an assembly language program anywhere a literal constant of equal value could have been used. With our current versions of the assembler and linker, typically A_m = 0x4000, B_m = 0x4004, and C_m = 0x4008.

ISEM FAQ 3 Assembling
The assembler is called “isem-as”; it is the GNU Assembler (GAS), configured to cross-assemble to a SPARC object format. It takes your source code and produces object code that may be linked and run on the ISEM emulator. The syntax for invoking the assembler is:
isem-as [-a[ls]] sourcefile.s -o objectfile.o
The input is read from sourcefile.s, and the output is written to objectfile.o. The option “-a” tells the assembler to produce a listing file. The sub-options “l” and “s” tell the assembler to include the assembly source in the listing file and to produce a symbol table, respectively. I like to point out the following alternative to the class. Redirecting output to a file makes the error messages more visible and saves the listing file for future viewing:
isem-as -als sourcefile.s -o objectfile.o >! listfile.lst
In past tradition, the object (output) file has been listed before the source file:
isem-as -als -o objectfile.o sourcefile.s

ISEM FAQ 4 The listing file
The listing file identifies all the syntax errors in your program, and it warns you if the assembler identifies “suspicious” behavior in your source file. Column 1 identifies a line number in your source file. Column 2 is an offset for where this instruction or data resides in memory. Column 3 is the image of what is put in memory, either the machine instructions or the representation of the data. The final column is the source code that produced the line. At the bottom of the file you will find the symbol table. Again, the symbols are represented as offsets that are relocated when the program is loaded into memory.

isem-as -als labn.s -o labn.o >! labn.lst
[Figure: listing file xmp0.lst for xmp0.s, with callouts marking the columns “Line in source file (.s)”, “Offset to address in memory”, and “Contents at address in memory”. Its DEFINED SYMBOLS table lists A_m, B_m, and C_m (data) and start, terminate, and beyond_end (text), followed by NO UNDEFINED SYMBOLS. Labels are symbolic offsets.]
The command isem-as -als xmp0.s -o xmp0.o >! xmp0.lst produced this listing in file xmp0.lst. Trouble is very likely if there are any undefined symbols. Avoid using labels beginning with a capital ‘L’, because the assembler treats these labels as “local symbols”: it does not list them in the symbol table and does not export them to the debugger and the emulator. Other than that, however, the assembler does assemble using those labels correctly. One can force the assembler to treat symbols beginning with a capital ‘L’ just like any other symbols by giving these symbol names as arguments to the .global assembler directive.

ISEM FAQ 5 Linking
Linking turns a set of raw object file(s) into an executable program. From the manual page: “ld combines a number of object and archive files, relocates their data and ties up symbol references. Often the last step in building a new compiled program to run is a call to ld.” Several object files are combined into one executable using ld; the separate files could reference symbols from one another. The output of the linker is an executable program. The syntax for the linker is as follows:
isem-ld objectfile.o [-o execfile]
Examples:
% isem-ld foo.o -o foo    Links foo.o into the executable foo
% isem-ld foo.o           Links foo.o into the executable a.out

ISEM FAQ 6 Loading/Running
Execute the program and test it in the emulation environment. The program “isem” is used to do this, and the majority of its features are covered in your lab manual. Invoke isem as follows:
isem [execfile]
Examples:
% isem foo    Invokes the emulator and loads the program foo
% isem        Invokes the emulator; no program is loaded
Once you are in the emulator, you can run your program by typing “run” at the prompt.

Assembly language programs are not notoriously chatty.
ISEM Debugging Tools 1
% isem xmp0
Instructional SPARC Emulator
Copyright Computer Science Department University of New Mexico
ISEM comes with ABSOLUTELY NO WARRANTY
ISEM Ver 1.00d : Mon Jul 27 16:29:45 EDT 1998
Loading File: xmp0
2000 bytes loaded into Text region at address 8:2000
2000 bytes loaded into Data region at address a:4000
PC: 08: nPC: PSR: e N:0 Z:0 V:0 C:0
start : sethi 0x10, %g2
ISEM> run
Program exited normally.

ISEM Debugging Tools 2 reg symb dump [addr]
reg: gives the values of all 32 general registers, and also the PC. symb: shows the resolved values of all symbolic constants. dump [addr]: addr is either a symbol or a hex address; gives the values stored in memory.
[Screen capture: output of reg (register rows G, O, L, and I, then PC, nPC, PSR with flags N:0 Z:0 V:0 C:0, and the next instruction, beyond_end : sethi 0x0, %g0), of symb (a Symbol List from A_m and B_m through terminate), and of dump A_m (memory contents starting at A_m, where the ‘?’ character appears).]

ISEM Debugging Tools 3 break [addr] trace
break [addr]: sets breakpoints in execution. Once execution is stopped, you can look at the contents of registers and memory. trace: causes one (or more) instruction(s) to be executed, and the registers are displayed. Handy for sneaking up on an error when you’re not sure where it is. For the all-time “most wanted” list of errors (and their fixes)

Basic Components 1 Terminology from Ch. 2:
Flip-flop: basic storage device that holds 1 bit. D flip-flop: special flip-flop that outputs the last value that was input to it (a data signal). Clock: two different meanings: (1) a control signal that oscillates (low to high voltage) every x nanoseconds; (2) the “write select” line for a flip-flop. An edge-triggered device changes state when the clock rises (in other systems, when it falls). Other kinds of flip-flops are RS and JK.

Basic Components 2
Register: collection of flip-flops with parallel load, controlled by the clock (or “write select”) signal. Stores instructions, addresses, operands, etc. Bus: collection of related data lines (wires). The term “clock” remains due to tradition. For many uses of registers, a better term would be “write select”.

Basic Components 3
Combinational circuits: implement Boolean functions. There is no feedback in the circuit; the output is strictly a function of the input. Gates: and, or, not, xor. E.g., xy + ¬z (see the table on pg 38).
Z = X · Y    AND; requires 3 transistors (NAND requires 2 transistors)
Z = X + Y    OR; requires 3 transistors (NOR requires 2 transistors)
Z = ¬X       NOT; requires 1 transistor
Z = X ⊕ Y    XOR (either, not both); requires 7 transistors, built as ¬(¬(X + Y) + XY): a NOR (2), an AND (3), and a final NOR (2)
Note: the flow of power is orthogonal to the flow of signals.
[Figure: gate circuit for f = xy + ¬z, with inputs x, y, z and intermediate signals xy and ¬z.]

Basic Components 4
Gates can be used in combination to implement a simple (half) adder. Addition creates a value, plus a carry-out: Z = X ⊕ Y, CO = X · Y. The simple half adder contains 10 = 7 + 3 transistors (an XOR plus an AND).
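The half adder can be modeled with gates as functions on 0/1 values. In this sketch (transistor-level behavior is not modeled, and the function names are our own), XOR is built the 7-transistor way shown on the previous slide, as a NOR of a NOR and an AND, and the result is checked against ordinary addition.

```python
def not_(x): return 1 - x
def and_(x, y): return x & y
def or_(x, y): return x | y
def nor(x, y): return not_(or_(x, y))

def xor(x, y):
    """X xor Y = NOT( NOT(X + Y) + X·Y ): a NOR of a NOR and an AND."""
    return nor(nor(x, y), and_(x, y))

def half_adder(x, y):
    """Return (Z, CO): sum bit Z = X xor Y and carry-out CO = X · Y."""
    return xor(x, y), and_(x, y)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, *half_adder(x, y))   # X Y Z CO
```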

Basic Components 5
Sequential circuits: introduce feedback into the circuit. Outputs are functions of the input and the current state. Multiplexers: combinational circuits that use n bits to select an output from 2ⁿ input lines.
[Figure: clocked D latch with inputs D and C and output Q.]
The top figure shows a clocked D latch, a level-triggered device; it contains 9 transistors. See Maccabe’s p. 56 for the distinction between latches and flip-flops (edge-triggered). Tanenbaum’s Structured Computer Organization, 4th edition, p. 144, shows how to construct a D flip-flop from this D latch; it would contain 13 transistors. This D latch is a bistable circuit. Maccabe’s page 61 shows multiplexers. It takes two signals to select from 4 input lines because four different combinations of ‘1’s and ‘0’s are possible with two signals:
S0  S1  selects input line
0   0   0
0   1   1
1   0   2
1   1   3
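A 4-to-1 multiplexer can be written as a sum of products, one AND term per select combination, which is how such a combinational circuit is usually drawn with gates. This sketch treats S0 as the high-order select bit, matching the selection table; mux4 is a hypothetical name.

```python
def mux4(i0, i1, i2, i3, s0, s1):
    """4-to-1 multiplexer: exactly one AND term is enabled by (S0, S1)."""
    n0, n1 = 1 - s0, 1 - s1                      # inverted select lines
    return (n0 & n1 & i0) | (n0 & s1 & i1) | (s0 & n1 & i2) | (s0 & s1 & i3)

for s0 in (0, 1):
    for s1 in (0, 1):
        line = 2 * s0 + s1
        one_hot = [1 if k == line else 0 for k in range(4)]
        print(s0, s1, line, mux4(*one_hot, s0, s1))   # last column is always 1
```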

Basic Components 6 Von Neumann Architecture
Can access either instructions or data from memory in each cycle. One path to memory (the von Neumann bottleneck). Stored-program system: no distinction between programs and data. A history lesson: 1946 ENIAC: 5000 adds/sec in DECIMAL!, 20 accumulators, each with 10 digits. 1945 John von Neumann proposed a stored-program machine with organs analogous to a human’s for input, output, processing, storage, and transfer. 1952 IAS completed (project began in 1946); named for Princeton’s Institute for Advanced Study. Only two types of components are required: gates and memory cells. Where are the flip-flops? Combinational circuits? Buses?

Basic Components 7
Examples of von Neumann architecture to be explored in this course: SAM: tiny, good for learning architecture. MIPS: the text’s example assembly language. SPARC: labs. M68HC11: used in ECE 567 (taken by CSE majors). Roughly, the order of presentation in this course is as follows: a couple of days on the main memory system; weeks on the central processing unit (CPU); finish the course with the I/O system.

Basic Components 8
Memory: can be viewed as an array of storage elements. The index of each element is called its address. Each element holds the same number of bits. How many bits per element? 8 (8 bits = 1 byte), 16, 32, 64? It is not the size of the addressable element that depends on the size of the data bus, but, rather, the size of the word. The number of addressable elements depends on the size of the address bus. SPARC has a 32-bit word; a word is not always 32 bits. Mention the phrase “?-bit addressable”. SPARC’s memory is 8-bit addressable, i.e., byte addressable.
[Figure: four memories with 8-, 16-, 32-, and 64-bit elements, each with addresses 0, 1, 2, up to n−1.]

Memory Element & Address Sizes
If a machine’s memory is 5-bit addressable, then 5 bits are stored at each distinct address; the contents at each address are represented by 5 bits. If 3 bits are used to represent memory addresses, then the memory can have at most $2^3 = 8$ distinct addresses. Such a memory can store at most $8 \times 5 = 40$ bits of data. If the data bus is 10 bits wide, then up to 10 bits at a time can be transferred between memory and processor; this is a 10-bit word.
Address (decimal)   Address (binary)   Contents
0                   000                00011
1                   001                01111
2                   010                01110
3                   011                10100
4                   100                00101
5                   101
6                   110
7                   111                10011
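The capacity arithmetic above can be written as a small Python sketch. The function name `memory_capacity` is hypothetical. The data bus width does not appear because it only sets the word size, not how much can be stored.

```python
def memory_capacity(address_bits: int, bits_per_element: int) -> tuple[int, int]:
    """Return (maximum number of distinct addresses, maximum bits of storage)."""
    addresses = 2 ** address_bits        # one element per distinct address pattern
    return addresses, addresses * bits_per_element


print(memory_capacity(3, 5))   # (8, 40): the 5-bit addressable example above
print(memory_capacity(32, 8))  # (4294967296, 34359738368): 32-bit addresses, byte addressable
```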

Basic Components 9 Let’s look deeper.
Suppose each memory element is stored in a bank and given a relative address. You could have several such banks in your memory. The GLOBAL address of each element is the relative address followed by (concatenated with) the bank address: [relative address][bank address]. To get two elements at a time, start reading from bank 0 (don’t start from bank 1; this would be a “memory address not aligned” error).
Two banks. The table shows global addresses, not contents; think of the contents as being underneath the global addresses:
Relative   Bank 0   Bank 1
000        000 0    000 1
001        001 0    001 1
010        010 0    010 1
011        011 0    011 1
100        100 0    100 1
101        101 0    101 1
See the pattern forming? As you read the addresses across and then down, they’re in sequence! Access is easy if the size of the element (e.g., 1 byte) is the size you want. Now suppose that the data bus is 16 bits wide and each element is 8 bits. Sometimes you might want 1 byte and sometimes you might want two! Getting a contiguous 16 bits starting at an address in bank 1 is difficult, because the hardware would need to get bytes from two different rows. So, as programmers, we give the hardware designers a break and agree to request 2-byte reads/writes only at addresses that start in bank 0. This is easy because all the global addresses in bank 0 end in 0 and are even. This also works with four banks and more:
Relative   Bank 00   Bank 01   Bank 10   Bank 11
000        000 00    000 01    000 10    000 11
001        001 00    001 01    001 10    001 11
010        010 00    010 01    010 10    010 11
011        011 00    011 01    011 10    011 11
100        100 00    100 01    100 10    100 11
101        101 00    101 01    101 10    101 11
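A Python sketch of the bank scheme above. A global address is the relative address shifted left by the number of bank bits, with the bank number filling the low bits. A multi-byte access is easy only when all of its bytes fall in one row. All function names here are hypothetical.

```python
def global_address(relative: int, bank: int, bank_bits: int) -> int:
    """Concatenate [relative address][bank address] into a global address."""
    return (relative << bank_bits) | bank


def split_address(address: int, bank_bits: int) -> tuple[int, int]:
    """Inverse of global_address: return (relative address, bank)."""
    return address >> bank_bits, address & ((1 << bank_bits) - 1)


def single_row(address: int, nbytes: int, bank_bits: int) -> bool:
    """True if bytes address .. address+nbytes-1 all sit in one row across the banks."""
    first_row, _ = split_address(address, bank_bits)
    last_row, _ = split_address(address + nbytes - 1, bank_bits)
    return first_row == last_row


print(bin(global_address(0b011, 1, bank_bits=1)))  # 0b111: row 011 of bank 1
print(single_row(6, 2, bank_bits=1))   # True: a 2-byte read starting in bank 0
print(single_row(7, 2, bank_bits=1))   # False: would need rows 011 and 100
```

With four banks (`bank_bits=2`), `single_row(a, 4, 2)` is true exactly when `a` is a multiple of 4. This is the word-alignment rule.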

Basic Components 10 Memory alignment: assume a byte-addressable machine with 4-byte words. Where are operands of various sizes positioned?
bytes: on a byte boundary (any address)
half words: on a half word boundary (even addresses)
words: on a word boundary (addresses divisible by 4)
double words: on a double word boundary (addresses divisible by 8)
[Figure: bytes, half words, words, and double words laid out at addresses 0 through F; notice the pattern to the addresses]
Discuss the fact that one need only look at the least significant hexadecimal digit of an address to determine divisibility by 8, 4, 2, and 1. Also mention that divisibility by 8 implies divisibility by 4, 2, and 1. In general, divisibility by any higher power of 2 implies divisibility by lesser powers of 2.
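A Python sketch of the alignment rules above. For a power-of-two size, "divisible by size" is the same as "the low log2(size) bits are zero". This is why the last hexadecimal digit alone decides alignment for sizes up to 8. The function names are hypothetical.

```python
def is_aligned(address: int, size: int) -> bool:
    """size is 1, 2, 4, or 8 bytes; aligned means the low log2(size) bits are zero."""
    return (address & (size - 1)) == 0


def allowed_sizes(address: int) -> list[int]:
    """Operand sizes (in bytes) that may start at this address."""
    return [size for size in (1, 2, 4, 8) if is_aligned(address, size)]


for address in (0x1000, 0x1004, 0x1006, 0x1007):
    print(hex(address), allowed_sizes(address))
# 0x1000 [1, 2, 4, 8]
# 0x1004 [1, 2, 4]
# 0x1006 [1, 2]
# 0x1007 [1]
```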

Basic Components 11 Byte ordering: how numeric data is stored in memory.
Ex.: 0x0EC699BF (decimal 247896511), stored at address 0.
Big endian: the high-order byte (big end) is at byte 0. Little endian: the low-order byte (little end) is at byte 0.
Address   Big endian   Little endian
0         0E           BF
1         C6           99
2         99           C6
3         BF           0E
SPARC is inconsistent big endian. Byte ordering does not affect the ordering of ASCII string arrays.
Consistent big-endian (not pictured): MSB at far right of lower address.
Consistent little-endian (pictured): MSB at far left of higher address.
It is a matter of convention.
Big endian advantages: faster in comparing integer-aligned character strings, since the integer ALU can compare multiple bytes in parallel; decimal and ASCII dumps can be printed left to right without causing confusion; big endian processors store integers and characters in the same order (MSB comes first).
Little endian advantages: no addition is needed to isolate the least significant 2 bytes of a 4-byte integer, because they start at the integer’s own address; high precision arithmetic is easier because you don’t have to find the LSB and move backward.
Contrast with bit ordering: bits are numbered 7 6 5 4 3 2 1 0.
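Python's built-in `int.to_bytes` can reproduce the big endian and little endian columns of the table above. The helper name `byte_layout` is hypothetical.

```python
def byte_layout(value: int) -> tuple[bytes, bytes]:
    """Bytes at addresses 0..3 for a 32-bit value: (big endian, little endian)."""
    return value.to_bytes(4, "big"), value.to_bytes(4, "little")


big, little = byte_layout(0x0EC699BF)
for address in range(4):
    print(address, f"{big[address]:02X}", f"{little[address]:02X}")
# 0 0E BF
# 1 C6 99
# 2 99 C6
# 3 BF 0E

# Little endian: the low-order 2 bytes start at the same address as the whole word.
print(hex(int.from_bytes(little[:2], "little")))  # 0x99bf
```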

Basic Components 12 Read/Write operations: the CPU must know the address to read or write. (read = fetch = load, write = store)
A read proceeds as follows:
1. The CPU puts the address on the address bus.
2. The CPU sends the read signal (R/W=1, CS=1) (Read/don’t Write, Chip Select).
3. Wait.
4. Memory puts the data on the data bus.
5. Reset (CS=0).
[Figure: memory unit with address lines A0 to A(m-1), control lines CS and R/W, and data lines D0 to D(n-1)]
Despite some of their names, each of these operations is non-destructive of its source! Consider the size of the address and data buses, relative to the addressing scheme and size of memory. How does a Write work differently? Typically, the memory unit also needs to “know” the size of the data being transferred.
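A toy Python model of one memory cycle driven by the CS and R/W lines described above. The class `MemoryUnit` and its `cycle` method are hypothetical. Timing (the "wait" step) is not modeled.

```python
from typing import Optional


class MemoryUnit:
    """Toy model of a memory chip controlled by CS (chip select) and R/W lines."""

    def __init__(self, address_bits: int) -> None:
        self.cells = [0] * (2 ** address_bits)

    def cycle(self, address: int, cs: int, rw: int,
              data_in: Optional[int] = None) -> Optional[int]:
        """Perform one bus cycle; return the value driven onto the data bus, if any."""
        if not cs:                       # not selected: the chip ignores the buses
            return None
        if rw:                           # R/W = 1: read; the cell keeps its value
            return self.cells[address]
        self.cells[address] = data_in    # R/W = 0: write the data bus into the cell
        return None


mem = MemoryUnit(address_bits=3)
mem.cycle(5, cs=1, rw=0, data_in=0b10100)      # write
print(mem.cycle(5, cs=1, rw=1))                # 20
print(mem.cycle(5, cs=1, rw=1))                # 20 again: reading is non-destructive
print(mem.cycle(5, cs=0, rw=1))                # None: chip not selected
```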

Basic Components 13 Types of memory:
ROM: Read Only Memory: non-volatile (doesn’t get erased when powered down; it’s a combinational circuit!)
PROM: Programmable ROM: use a ROM burner to write data to it initially. Can’t be rewritten.
EPROM: Erasable PROM. Uses UV light to erase.
EEPROM: Electrically Erasable PROM. EEPROM is byte erasable.
ROM is for large-volume applications (appliances); PROM is for small-volume applications (equipment); EPROM and EEPROM are for device prototyping.
RAM: Random Access Memory. Can efficiently read/write any location (unlike sequential access memory). Used for main memory. There are many variations (types) of RAM, all volatile:
SRAM: Static RAM is built from flip-flops (sequential circuits built from transistors); it’s called static only to distinguish it from DRAM.
DRAM: Dynamic RAM is built from capacitors; it’s called dynamic because it must periodically refresh the capacitors. It also must rewrite each capacitor after reading.
SDRAM: Synchronous Dynamic RAM
DDR SDRAM: Double Data Rate SDRAM
RDRAM: Rambus DRAM (proprietary) (DRDRAM)
Flash memory is block erasable and rewritable; 100-nsec access times; wears out after about 10,000 erasures. Currently used as “film” in digital cameras.

Basic Components 14 CPU: executes instructions, the primitive operations that the computer can perform. E.g.,
arithmetic: A+B
data movement: A := B
control: if expr goto label
logical: AND, OR, XOR…
Instructions specify both the operation and the operands. An encoded operand is often a location in memory where the value of interest may be found (the address of the value of interest).

Basic Components 15 Instruction set: all instructions for a machine. Instruction format: specifies the number and type of operands. Ex.: we could have an instruction like ADD A, B, R, where A, B, and R are the addresses of operands in memory. The result is R := A+B. During execution, the machine instruction is more similar to “ADD 0, 4, 0xC”.

Basic Components 16 Actually, the “instruction” might be represented in a source file as the character codes 0x41 44 44 20 41 2C 20 42 2C 20 52 0A, i.e., the characters A D D (space) A , (space) B , (space) R followed by a newline. As such, it is an assembly language instruction. An assembler might translate it to, say, 0x504C, the machine’s representation of the instruction. As such, it is a machine language instruction.
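A quick Python check of the character codes above, assuming the source line is stored as plain ASCII with a newline (`\n`) line ending. It shows that the assembly language text takes 12 bytes, far more than its machine language encoding.

```python
source_line = "ADD A, B, R\n"
encoded = source_line.encode("ascii")      # the bytes a text editor would save
print(encoded.hex(" ").upper())            # 41 44 44 20 41 2C 20 42 2C 20 52 0A
print(len(encoded), "bytes of source text for one instruction")
```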

A Simple Instruction Set 1
Simple instruction set: the Accumulator machine. Simplify the instruction set by allowing only one operand; the accumulator is implied to be the second operand. The accumulator is a special register. Similar to a simple calculator.
The Simple Accumulator Machine (SAM):
ADD addr     ACC <- ACC + M[addr]
SUB addr     ACC <- ACC - M[addr]
MPY addr     ACC <- ACC * M[addr]
DIV addr     ACC <- ACC / M[addr]
LOAD addr    ACC <- M[addr]
STORE addr   M[addr] <- ACC
M[addr] means the value stored at (or the contents of the cell with) address addr. By the slash operator (‘/’), we mean integer division.

Ex.: C = A × B + C × D, with A, B, C, D at addresses 20, 21, 22, 23 and address 30 used as temporary storage:
    LOAD  20   ! 1) Acc <- M[20]
    MPY   21   ! 2) Acc <- Acc*M[21]
    STORE 30   !    M[30] <- Acc
    LOAD  22   ! 3) Acc <- M[22]
    MPY   23   ! 4) Acc <- Acc*M[23]
    ADD   30   ! 5) Acc <- Acc+M[30]
    STORE 22   !    M[22] <- Acc
[Figure: the program in machine language, and the contents of the Accumulator after steps 1) to 5)]
Converting from assembly language to machine language is called assembling.
Try C = 2A + B:
    LOAD  A
    ADD   A
    ADD   B
    STORE C
Try C = A + 2:
Solution 1:
    LOAD  TEMP   ! Acc <- TEMP
    DIV   TEMP   ! TEMP/TEMP = 1, under our usual assumption that even 0/0 = 1
    STORE TEMP   ! save 1
    ADD   TEMP   ! Acc <- 2
    ADD   A      ! Acc <- A+2
    STORE C      ! C <- Acc
Solution 2 (not so good; see the comment on DIV):
    LOAD  A      ! Acc <- a
    ADD   A      ! Acc <- 2a
    DIV   A      ! 2a/a = 2, unless a = 0!
    ADD   A      ! Acc <- 2+a
    STORE C      ! C <- Acc
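A minimal Python interpreter for the SAM register transfers, run on the programs above. Memory is modeled as a dict so that both numeric and symbolic addresses work. Several details are assumptions: the names `run_sam` and `divide_truncating` are hypothetical, integer division rounds toward zero (the notes only say "integer division"), and division by zero raises an error instead of following the 0/0 = 1 joke.

```python
def divide_truncating(a: int, b: int) -> int:
    """Integer division rounding toward zero (an assumed convention)."""
    q = abs(a) // abs(b)                 # raises ZeroDivisionError when b == 0
    return q if (a >= 0) == (b > 0) else -q


def run_sam(program: list[tuple[str, object]], memory: dict) -> int:
    """Execute (opcode, address) pairs in order; return the final accumulator."""
    acc = 0
    for opcode, addr in program:
        if opcode == "LOAD":
            acc = memory[addr]
        elif opcode == "STORE":
            memory[addr] = acc
        elif opcode == "ADD":
            acc = acc + memory[addr]
        elif opcode == "SUB":
            acc = acc - memory[addr]
        elif opcode == "MPY":
            acc = acc * memory[addr]
        elif opcode == "DIV":
            acc = divide_truncating(acc, memory[addr])
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return acc


# C = A*B + C*D with A, B, C, D at addresses 20-23 and a temporary at 30.
memory = {20: 2, 21: 3, 22: 4, 23: 5, 30: 0}
run_sam([("LOAD", 20), ("MPY", 21), ("STORE", 30), ("LOAD", 22),
         ("MPY", 23), ("ADD", 30), ("STORE", 22)], memory)
print(memory[22])   # 26 = 2*3 + 4*5

# C = 2A + B, using symbolic addresses.
memory = {"A": 7, "B": 1, "C": 0}
run_sam([("LOAD", "A"), ("ADD", "A"), ("ADD", "B"), ("STORE", "C")], memory)
print(memory["C"])  # 15
```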

An Instruction (Encoding) Format
Assume an 8-bit architecture. Each instruction may be 8 bits: 3 bits hold the op-code and 5 bits hold the operand. How much memory can we address? How many op-codes can we have? Convert the mnemonic op-codes into binary codes.
5 bits for operands means $2^5$, or 32, possible addresses.
3 bits for op-codes means $2^3$, or 8, possible op-codes.
What if the system had larger (more bits long) instructions? Consider the tradeoffs between the size of addressable memory and the number of instructions. Consider the number of memory accesses needed to fetch one instruction from memory. What possible problems could this create? One can find the binary operation codes in Table 3.6 on p. 88.
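A Python sketch of packing and unpacking the 3-bit op-code and 5-bit operand fields described above. The op-code value `0b011` used below is hypothetical, not the actual code from Table 3.6. `encode` and `decode` are illustrative names.

```python
OPCODE_BITS = 3
ADDRESS_BITS = 5


def encode(opcode: int, address: int) -> int:
    """Pack a 3-bit op-code and a 5-bit address into one 8-bit instruction."""
    if not 0 <= opcode < 2 ** OPCODE_BITS:
        raise ValueError(f"op-code {opcode} does not fit in {OPCODE_BITS} bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError(f"address {address} does not fit in {ADDRESS_BITS} bits")
    return (opcode << ADDRESS_BITS) | address


def decode(instruction: int) -> tuple[int, int]:
    """Split an 8-bit instruction back into (op-code, address)."""
    return instruction >> ADDRESS_BITS, instruction & (2 ** ADDRESS_BITS - 1)


word = encode(0b011, 20)             # hypothetical op-code 011, operand address 20
print(f"{word:08b}", decode(word))   # 01110100 (3, 20)
```

Widening either field would require longer instructions. That is the tradeoff the questions above point at.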

Hand assemble our program (LOAD, MPY, STORE, …). Instructions are stored in consecutive memory locations. Addresses of operands are encoded as 5-bit simple binary, with a range 0..31, while integer values are encoded as 8-bit 2’s complement, with a range -128..127.
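A Python sketch of the 8-bit 2's complement encoding of integer values mentioned above, with a check against the -128..127 range. The helper names are hypothetical.

```python
def to_pattern(value: int, bits: int = 8) -> int:
    """Encode a signed integer as a bits-wide 2's complement bit pattern."""
    low, high = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the range {low}..{high}")
    return value & ((1 << bits) - 1)       # negatives become 2**bits + value


def from_pattern(pattern: int, bits: int = 8) -> int:
    """Decode a bits-wide 2's complement bit pattern back to a signed integer."""
    sign = 1 << (bits - 1)
    return (pattern ^ sign) - sign


for v in (-128, -1, 0, 5, 127):
    print(v, f"{to_pattern(v):08b}")
# -128 10000000
# -1 11111111
# 0 00000000
# 5 00000101
# 127 01111111
```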

From fig. 3.12 in the text.
REGISTERS
ACC: accumulator
IR: Instruction Register, holds the instruction during interpretation
MAR: Memory Address Register, stores the address to read from or write to
MDR: Memory Data Register, stores data read from or written to memory
PC: Program Counter, stores the address of the next instruction
COMBINATIONAL CIRCUITS
ALU: Arithmetic and Logic Unit, implements the operations (e.g., +, -, *, /)
Decode: instruction decoder, splits off the opcode and operands
INC: incrementer, increments the PC
MUX: multiplexers, control the inputs to the PC and ACC
Timing and control: asserts control signals, clock (it must include sequential circuits, at least one register, to record its current state)

Control signals: control functional units to determine the order of operations, access to the bus, loading of registers, etc. See Table 3.7, p. 90, Maccabe.
Signals 1, 3, 4, 5, 7 control loads of registers.
Signals 0, 2, 6, 12 control access to the shared bus.
Signals 8 and 9 control the MUXes.
Signals 10 and 11 control which ALU op is executed.
Signals 13 and 14 control the memory system.

CPU
[Figure: SAM CPU data paths with numbered control signals; see Table 3.7, p. 90, Maccabe]

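[Figure: state diagram of SAM’s control unit: 0 -> 1 -> 2 -> 3; from 3, Y leads to 4 -> 5 and N leads to 6; from 6, Y leads to 7 and N leads to 8]
Big picture: the states necessary to implement all the instructions in our instruction set, divided into a fetch cycle and an execute cycle. Judging from what states 4 to 8 do, the first test asks whether the instruction is a STORE and the second asks whether it is a LOAD. The control signals of a state are executed simultaneously on the state change, and there is no bus contention.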
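A Python sketch of how the control unit might walk the state diagram for one instruction. It relies on an assumption made in the reading above: the first decision tests for STORE and the second tests for LOAD. `sam_states` is a hypothetical name.

```python
def sam_states(opcode: str) -> list[int]:
    """States visited for one SAM instruction (assumed reading of the diagram)."""
    states = [0, 1, 2, 3]        # fetch (0-2), then operand address -> MAR (3)
    if opcode == "STORE":
        return states + [4, 5]   # ACC -> MDR, then write memory
    states.append(6)             # read the operand into the MDR
    if opcode == "LOAD":
        return states + [7]      # MDR -> ACC
    return states + [8]          # ADD, SUB, MPY, DIV: ACC <- ACC op MDR


for op in ("LOAD", "STORE", "ADD"):
    print(op, sam_states(op))
```

After the last state of each path, the control unit returns to state 0 to fetch the next instruction.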

State 0: Control Signals 2, 5, 9, 3
Put the address of the next instruction in the Address Register and increment the PC.
If you are not comfortable or familiar with electricity, picture the data paths as a plumbing diagram. The wires are pipes, and the registers are “tanks” of water (but tanks that never get empty). Enabling a control signal is like opening a valve (EE people have trouble with this, since to them an “open” switch means 0). The water flows along all pipes and through any other valves (control signals) that are opened/enabled. See Maccabe’s p. 55: “Implementation of a bus requires a new gate called a tri-state device.… When the control signal is 0, the output of the tri-state device does not produce an output signal. Instead, the tri-state device enters a high-impedance state that does not interfere with other signals on a shared bus.” (The italic emphasis is ours, not Maccabe’s.) One CPU cycle.
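A Python sketch of the tri-state behavior Maccabe describes. Each source offers (enable signal, value). Disabled sources are in the high-impedance state and contribute nothing, and enabling two at once is bus contention. `resolve_bus` and the source names are hypothetical.

```python
from typing import Optional


def resolve_bus(drivers: dict[str, tuple[int, int]]) -> Optional[int]:
    """Value on a shared bus; drivers maps a source name to (enable signal, output value).

    A disabled tri-state device is in its high-impedance state and contributes nothing.
    """
    active = [value for enable, value in drivers.values() if enable]
    if len(active) > 1:
        raise RuntimeError("bus contention: more than one tri-state device enabled")
    return active[0] if active else None     # None models a floating bus


# State 0: only the PC's tri-state device (signal 2) is enabled.
print(resolve_bus({"PC": (1, 0x10), "ACC": (0, 7), "MDR": (0, 3), "Addr": (0, 9)}))  # 16
```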

State 1: Control Signals 13, 14
Fetch the word of memory at the address in the Address Register, and load it into the Data Register.

State 2: Send the word from the Data Register to the Instruction Register.

State 3: Control Signals 12, 5
Put the address from the instruction in the Address Register.

After State 3, what values are now stored in each register?
PC: address of the next instruction
MAR: address of the operand
MDR: current instruction (opcode and operand)
IR: current instruction (opcode and operand)
ACC: old value

State 4: Take the value from the ACCumulator and store it in the Data Register.

State 5: Control Signal 13
Write the data from the Data Register to the address stored in the MAR.

State 6: Load the word at the address in the Address Register into the Data Register.

After State 6, what values are now stored in each register?
PC: address of the next instruction
MAR: address of the operand of the current instruction
MDR: contents of memory at the operand’s address
IR: current instruction’s opcode and operand
ACC: same value, the result of the previous instruction

State 7: Load the word from the Data Register into the ACCumulator.

State 8: Control Signals 6, 8, 10/11, 1
Use the word from the Data Register for the arithmetic operation and put the result in the ACC. Now would be a great time for “Playing Roles of SAM’s components”.

New Instruction What is necessary to implement a new instruction?
New states? New control signals? New fetch/execute cycle? An example: SWAP exchanges the value in the Accumulator with the value at address addr:
SWAP addr   ! Acc <- #M[addr], M[addr] <- #Acc
Here #X denotes the original contents of X.

New Instruction What changes to fetch/execute cycle?
The fetch part of the cycle usually remains the same. Recall the values stored in registers after each state. E.g., after State 6, what values are in each register?
PC: #PC + 1
MAR: addr
MDR: M[addr]
IR: M[#PC], the instruction
ACC: #ACC
It is handy to have #M[addr] in the MDR, so start after state 6, then…

New State 9: Control Signals 6, 5
Save the data value from the MDR in the Address Register.
MDR -> bus, load MAR

New State 10: Control Signals 0, 7
Send the ACCumulator value to the Data Register.
ACC -> bus, load MDR

New State 11: Control Signals ?, 1
Put the saved value from the MAR into the ACCumulator.
MAR -> bus, load ACC
Note: the current architecture has no control signal that is the opposite of 5 (Load MAR), i.e., one that puts the MAR on the bus. So we would have to create a new control signal (MAR -> bus) in addition to creating these new states.

New State 12 (Old 3): Control Signals 12, 5
Put (reload) the address from the instruction in the Address Register.
Addr -> bus, load MAR

New State 13 (Old 5): Control Signals 13
Write the data from the Data Register to the address stored in the MAR.
CS

New Instruction Example Summary
Changes to states: added 9 through 13.
Changes to signals: added 15: MAR -> bus.
Changes to fetch/execute, in register transfer language (RTL):
PC -> bus, load MAR, INC -> PC, load PC
CS, R/W
MDR -> bus, load IR
Addr -> bus, load MAR
CS, R/W
MDR -> bus, load MAR
ACC -> bus, load MDR
MAR -> bus, load ACC
Addr -> bus, load MAR
CS
Try LOAD_indexed imm   ! Acc <- M[#Acc+imm]
PC -> bus, load MAR, INC -> PC, load PC
CS, R/W
MDR -> bus, load IR
Addr -> bus, ALU -> ACC, ALU add, load ACC
ACC -> bus, load MAR
CS, R/W
MDR -> bus, load ACC
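A Python sketch that replays SWAP's register transfers (states 0 to 3 and 6, then the new states 9 to 13). It checks that the accumulator and the memory cell really exchange values. The following are assumptions: every register is an unbounded Python int (so the MAR can hold a data value), memory is a dict, and an instruction is stored as an `(opcode, addr)` tuple. `execute_swap` is a hypothetical name.

```python
def execute_swap(memory: dict, pc: int, acc: int) -> dict:
    """Run one SWAP instruction through the register transfers listed above."""
    r = {"PC": pc, "ACC": acc, "MAR": None, "MDR": None, "IR": None}
    r["MAR"] = r["PC"]                 # state 0: PC -> bus, load MAR
    r["PC"] = r["PC"] + 1              #          INC -> PC, load PC
    r["MDR"] = memory[r["MAR"]]        # state 1: CS, R/W (read instruction)
    r["IR"] = r["MDR"]                 # state 2: MDR -> bus, load IR
    _, addr = r["IR"]
    r["MAR"] = addr                    # state 3: Addr -> bus, load MAR
    r["MDR"] = memory[r["MAR"]]        # state 6: CS, R/W (read operand)
    r["MAR"] = r["MDR"]                # state 9: MDR -> bus, load MAR (park #M[addr])
    r["MDR"] = r["ACC"]                # state 10: ACC -> bus, load MDR
    r["ACC"] = r["MAR"]                # state 11: MAR -> bus (new signal 15), load ACC
    r["MAR"] = addr                    # state 12: Addr -> bus, load MAR
    memory[r["MAR"]] = r["MDR"]        # state 13: CS (write)
    return r


memory = {0: ("SWAP", 20), 20: 99}
registers = execute_swap(memory, pc=0, acc=5)
print(registers["ACC"], memory[20], registers["PC"])   # 99 5 1
```

Note that state 12 is needed because state 9 overwrote the operand address in the MAR.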

Instruction Set Architectures 1
RISC vs. CISC
Complex Instruction Set Computer (CISC): many, powerful instructions. Grew out of the need for high code density. Instructions vary in length, number of operands, format, and number of clock cycles in execution.
Reduced Instruction Set Computer (RISC): fewer, less powerful, optimized instructions. Grew out of the opportunity for simpler, faster hardware. Instructions have a fixed length, number of operands, and format, and take a similar number of clock cycles in execution.
“Complex” refers to the complexity of individual instructions. Code density was sought as a way of alleviating the von Neumann bottleneck. Analogy: imagine a 100 meter race. CISC: the winner is whoever finishes in the fewest strides; RISC: the winner is whoever crosses the finish line earliest.
RISC made micro-coded hardware implementations unnecessary and made pipelining feasible. Next trend (maybe): VLIW (very long instruction words) to explicitly use multiple functional units (e.g., floating point, integer, logic) in one instruction.

Motivation: memory is comparatively slow, 10x to 20x slower than the processor. We need to minimize the number of trips to memory, so we provide faster storage in the processor: registers. Registers (16, 32, 64 bits wide) are used for intermediate storage for calculations, or for repeated operands.
Accumulator machine: one data register, ACC. 2 memory accesses per instruction: one for the instruction and one for the operand.
Add more registers (R0, R1, R2, …, Rn).

How many addresses to specify? With binary operations, we need to know two source operands, a destination, and the operation. E.g., op (dest_operand) (src_op1) (src_op2). Based on the number of operands, we could have:
3-addr. machine: both sources and the destination are named.
2-addr. machine: both sources are named; the destination is one of the sources.
1-addr. machine: one source is named; the other source and the destination are the accumulator.
0-addr. machine: all operands are implicit and available on the stack.

1-address architecture: a := a × b + c × d × e
[Listings not reproduced: memory only; using registers]
1½-address architecture: at least one operand must always be a register (the ½ address is the register, the 1 address is the memory operand: LOAD 100, R1). Like an accumulator machine, but with many accumulators.
Memory only: 16 total memory accesses; using registers: 14.
How many bits are needed to represent the ½ address as a register? How big does an instruction have to be? How many opcodes? How many registers? How many addresses?

3-address architecture: a := a × b + c × d × e
[Listings not reproduced: using memory only; using registers]
What about instruction size? Show how the values in memory change after each instruction.
Memory only: 4 accesses per instruction; 16 total accesses. Using registers: 3 or 2 accesses per instruction; 10 total.
Discuss: if each memory access takes 10 cycles and there are 16 accesses, it takes more than 160 cycles to complete the instructions.
How big does an instruction have to be to represent an opcode, two memory locations, and a register? How big an address? How many opcodes? How many registers?

2-address architecture: a := a × b + c × d × e
[Listings not reproduced: using memory only; using registers]
Most CISC architectures work this way, making 1 operand implicit.
Memory only: 19 total accesses; using registers: 13.

0-address architecture: a := a × b + c × d × e
Stack machine: all operands are implicit. Only push and pop touch memory. All other operands are popped from the top of the stack, and the result is pushed on top. E.g., HP calculators.
Show how the instructions update the stack. Ask: how many bits are needed to encode each instruction?
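A Python sketch of a 0-address (stack) machine evaluating a := a × b + c × d × e. The program is one plausible instruction sequence, not the slide's missing listing. The access count assumes each instruction is fetched in one memory access, plus one more access for each PUSH or POP. `run_stack_machine` is a hypothetical name.

```python
def run_stack_machine(program: list[str], memory: dict) -> int:
    """Execute 0-address code; return the number of memory accesses made."""
    stack: list[int] = []
    accesses = 0
    for instruction in program:
        op, *operand = instruction.split()
        accesses += 1                                  # fetch the instruction
        if op == "PUSH":
            stack.append(memory[operand[0]])
            accesses += 1                              # read the operand
        elif op == "POP":
            memory[operand[0]] = stack.pop()
            accesses += 1                              # write the result
        elif op in ("ADD", "MPY"):
            right, left = stack.pop(), stack.pop()     # operands come off the stack
            stack.append(left + right if op == "ADD" else left * right)
        else:
            raise ValueError(f"unknown instruction {instruction}")
    return accesses


# a := a*b + c*d*e
program = ["PUSH a", "PUSH b", "MPY", "PUSH c", "PUSH d", "MPY",
           "PUSH e", "MPY", "ADD", "POP a"]
memory = {"a": 2, "b": 3, "c": 4, "d": 5, "e": 6}
print(run_stack_machine(program, memory), memory["a"])   # 16 126
```

The arithmetic instructions name no operands, so they need only enough bits for an op-code.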

Load/Store Architectures (RISC): use of registers is simple and efficient. Therefore, the only instructions that can access memory are load and store; all others reference registers.
Total of 16 accesses.
Look deeper than simple memory reference counts for value (p. 128): load/store simplifies overlapping instruction execution to cope with memory latency, but may cause register interlock.

Why load/store architectures? The number of instructions (and hence of memory references to fetch them) is high, but the processor can work without waiting on memory. Claim: overall execution time is lower. Why?
Clock cycle time is lower (no microcode interpretation).
More room in the CPU for registers and memory cache.
Easier to overlap instruction execution through pipelining.
Side effects:
Register interlock: delaying execution until a memory read completes.
Instruction scheduling: rearranging instructions to prevent register interlock (loads on SPARC) and to avoid wasting the results of pipelined execution (branches on SPARC).
“Clock cycle time is lower” because CISC machines tend to need to have their more complex instructions interpreted in microcode, while RISC machines tend to be implemented without the use of interpretation. See Tanenbaum’s Structured Computer Organization, 4th edition, p. 47.
A reference for the following is Maccabe, pp. …: Overlapping instruction execution is made possible by the memory cache; when executing in a loop, the instructions are already present in the memory cache. Overlapping instruction execution is achieved through pipelining (see Maccabe’s section 9.3). It is easier to implement overlapped instruction execution because only the “load” instruction is involved on the supply side; the reasoning required is much simpler. Register interlock: the machine does wait when necessary, to avoid erroneous results. The goal of pipelining is to interpret an instruction on every processor cycle. Pipelining is based on overlapping the execution of several instructions (Maccabe, p. 320).
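A Python sketch of register interlock and instruction scheduling. It counts stall cycles under a simplifying assumption: a loaded value is not ready for the very next instruction, so using it there costs one stall. Real SPARC pipelines may differ. `Instr` and `count_interlocks` are hypothetical names, and the two instruction orders are illustrative.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    text: str                  # assembly text, for display only
    dest: str                  # register written
    srcs: tuple[str, ...]      # registers read
    is_load: bool = False


def count_interlocks(program: list[Instr]) -> int:
    """Stall cycles, assuming a loaded value is not ready for the very next instruction."""
    return sum(1 for prev, cur in zip(program, program[1:])
               if prev.is_load and prev.dest in cur.srcs)


naive = [Instr("ld [%r1], %r2", "%r2", ("%r1",), True),
         Instr("add %r2, %r3, %r4", "%r4", ("%r2", "%r3")),
         Instr("ld [%r5], %r6", "%r6", ("%r5",), True),
         Instr("add %r6, %r4, %r7", "%r7", ("%r6", "%r4"))]
scheduled = [naive[0], naive[2], naive[1], naive[3]]     # hoist the second load
print(count_interlocks(naive), count_interlocks(scheduled))   # 2 0
```

The rescheduled order computes the same results, because each add still follows the loads it depends on. The difference is that no instruction immediately follows the load that feeds it.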

SPARC Assembly Language 1
SPARC (Scalable Processor ARChitecture): used in Sun workstations; descended from RISC-II, developed at UC Berkeley.
General characteristics:
32-bit word size (integer, address, register size, etc.)
Byte-addressable memory
RISC load/store architecture, 32-bit instructions, few addressing modes
Many registers (32 general purpose, 32 floating point, various special purpose registers)
ISEM: Instructional SPARC Emulator, nicer than a real machine for learning to write assembly language programs.

Structure: line oriented; 4 types of lines:
Blank: ignored.
Labeled: any line may be labeled, which creates a symbol in the listing. Labels must begin with a letter (other than ‘L’), followed by any alphanumeric characters, and must end with a colon “:”. A label just assigns a name to an address.
Assembler directives: e.g., .data, .word, .text, etc.
Instructions.
Comments start after the “!” character and go to the end of the line.
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set x_m, %r2
        ld [%r2], %r2     ! Load x into reg 2
        set y_m, %r3
        ld [%r3], %r3     ! Load y into reg 3
Avoid starting labels with a capital ‘L’; this appears to mean “local label” to the assembler. A lower case ‘l’ is just like any other letter. For labels, the underscore character (‘_’) counts as a letter; i.e., in this context, it is considered to be an alphanumeric character, and it may be the first character of a label.
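A Python sketch of the label rules above, written as a regular expression. This is a style check based on these notes, not the assembler's own grammar. The assembler may accept more, such as labels starting with ‘L’, which it treats as local. `is_label` is a hypothetical name.

```python
import re

# A letter or '_' other than 'L', then letters, digits, or '_', then a colon.
LABEL = re.compile(r"[A-KM-Za-z_][A-Za-z0-9_]*:")


def is_label(token: str) -> bool:
    return LABEL.fullmatch(token) is not None


for token in ("x_m:", "start:", "_tmp:", "loop:", "Loop:", "9lives:", "x_m"):
    print(token, is_label(token))
```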

Directives: instructions to the assembler, not executed by the machine.
.data: the following section contains declarations. Each declaration reserves and initializes a certain number of bits of storage for each of zero or more operands in the declaration:
.word   32 bits
.half   16 bits
.byte   8 bits
E.g.,
        .data
w:      .half 26998
x:      .byte 8
y:      .byte 'm', 0x6e, 0x0, 0, 0
z:      .word 0x3C5F
.text: the following section contains executable instructions.
Operands are, of course, separated by commas. 26998 = 0x6978; ‘m’ = 0x6d.
Q: What does the assembler do when a .word directive has zero operands? A: It reserves and initializes 0 bits of storage!

Registers: 32 bits wide.
32 general purpose integer registers, known by several names to the assembler:
%r0-%r7, also known as %g0-%g7: global registers. Note: %r0 always contains the value 0.
%r8-%r15, also known as %o0-%o7: output registers
%r16-%r23, also known as %l0-%l7: local registers
%r24-%r31, also known as %i0-%i7: input registers
Use the %r0-%r31 names for now. The other names are used in procedure calls.
32 floating point registers, %f0-%f31. Each register is single precision; double precision uses register pairs.
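A Python sketch of the name mapping in the list above: each group name adds its base register number to the index 0 to 7. `reg_number` and `GROUP_BASE` are hypothetical names.

```python
GROUP_BASE = {"g": 0, "o": 8, "l": 16, "i": 24}   # first %r number in each group


def reg_number(name: str) -> int:
    """Map an integer register name such as '%r17', '%o0', or '%i7' to 0..31."""
    if len(name) < 3 or name[0] != "%" or not name[2:].isdigit():
        raise ValueError(f"not a register name: {name}")
    kind, n = name[1], int(name[2:])
    if kind == "r" and 0 <= n <= 31:
        return n
    if kind in GROUP_BASE and 0 <= n <= 7:
        return GROUP_BASE[kind] + n
    raise ValueError(f"not an integer register: {name}")


for name in ("%g0", "%o0", "%l7", "%i7", "%r17"):
    print(name, "->", f"%r{reg_number(name)}")
```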

3-address operations: the format differs from the book’s:
op src1, src2, dest     ! opposite of the text
E.g.,
add %r1, %r2, %r3       ! %r3 <- %r1 + %r2
or %r2, 0x0004, %r2     ! %r2 <- %r2 bitwise-OR 0x0004
Contrast SPARC with MIPS (used in the book): indirect address vs. [addr]; operand order, especially the destination register; register notation: R2 vs. %r2; branches.

2-address operations: load and store.
ld [addr], %r2    ! %r2 <- M[addr]
st %r2, [addr]    ! M[addr] <- %r2
We often use set to put an address (a label, a symbolic constant) into a register, followed by ld to load the data itself:
set x_m, %r1      ! put addr x_m into %r1
ld [%r1], %r2     ! use addr in %r1 to load %r2
Immediate values: the instruction itself contains some data to be used in execution.
“ld” means “load word” and “st” means “store word”. “ld [x_m], %r2”, while syntactically legal in SPARC assembly language, leads to an “object file inconsistency: nonexistent symbol” linker error. Usually, “addr” is a register in both of those instructions; later on we will use a combination of two registers or a register and a constant. In the text of an assembly language program, the label x_m is just another name for a particular address; the label x_m equals that address.
Try: A = A + B
        .data
A_m:    .word 2
B_m:    .word 3
        .text
        set A_m, %r1        ! %r1 <- address of A
        ld [%r1], %r2       ! %r2 <- A (keep the address in %r1)
        set B_m, %r3        ! %r3 <- address of B
        ld [%r3], %r3       ! %r3 <- B
        add %r2, %r3, %r2   ! %r2 <- A + B
        st %r2, [%r1]       ! A <- A + B
        ta 0                ! terminate

Immediate values (continued), e.g.,
add %rs, siconst13, %rd   ! %rd <- %rs + const
The constant is coded into the instruction itself, so it is available after fetching the instruction (no extra trip to memory for an operand). On SPARC, there is no special notation for differentiating constants from addresses, because there is no ambiguity in a load/store architecture. The immediate value is coded as a 13-bit sign-extended value; its range is, then, $-2^{12} \ldots 2^{12}-1$, or -4096 to 4095. Immediate values can be specified in decimal, hexadecimal, octal, or binary, e.g.,
add %r2, 0x1A, %r2   ! %r2 <- %r2 + 26
The prefix for binary is 0b. Please see … for details regarding the assembler’s treatment of immediate values outside the range -4096…4095.
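A Python sketch of the 13-bit signed immediate. It encodes a constant into the 13-bit field and sign-extends the field back to the 32-bit value the ALU uses. The function names are hypothetical.

```python
SIMM13_MIN, SIMM13_MAX = -(2 ** 12), 2 ** 12 - 1     # -4096 .. 4095


def encode_simm13(value: int) -> int:
    """Return the 13-bit field for a signed immediate, rejecting out-of-range values."""
    if not SIMM13_MIN <= value <= SIMM13_MAX:
        raise ValueError(f"{value} does not fit in a 13-bit signed immediate")
    return value & 0x1FFF


def sign_extend13(field: int) -> int:
    """Widen a 13-bit field to the 32-bit value the ALU actually uses."""
    field &= 0x1FFF
    if field & 0x1000:               # bit 12 is the sign bit
        field |= 0xFFFFE000          # copy it into bits 13..31
    return field


for v in (26, -1, -4096, 4095):
    print(v, f"{encode_simm13(v):#06x}", f"{sign_extend13(encode_simm13(v)):#010x}")
# 26 0x001a 0x0000001a
# -1 0x1fff 0xffffffff
# -4096 0x1000 0xfffff000
# 4095 0x0fff 0x00000fff
```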

Synthetic instructions: the assembler translates one “instruction” into one or more machine instructions.
set: used to load a 32-bit signed integer constant into a register. It has 2 operands: a 32-bit value and a register number. How does that fit into a 32-bit instruction? E.g.,
set iconst32, %rd
set -10, %r3
set x_m, %r4
set '=', %r8
clr %rd: used to set all bits in a register to 0. How?
mov %rs, %rd: copies a register.
neg %rs, %rd: copies the negation of a register.
clr is implemented as or %r0, %r0, %rd
mov is implemented as or %r0, %rs, %rd
neg is implemented as sub %r0, %rs, %rd
clr %rd could also be done with and %r0, %r0, %rd, or with sub %rd, %rd, %rd
mov could also be done with add %r0, %rs, %rd
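A Python sketch of one way `set` can fit a 32-bit constant into 32-bit instructions. If the constant fits a 13-bit immediate, one `or` with %r0 suffices. Otherwise, SPARC's `sethi` loads the upper 22 bits (`%hi`) and an `or` fills in the lower 10 bits (`%lo`). The exact expansion a given assembler chooses (especially for labels) may differ. `split_hi_lo` and `expand_set` are hypothetical names.

```python
def split_hi_lo(value: int) -> tuple[int, int]:
    """Split a 32-bit pattern into %hi (upper 22 bits) and %lo (lower 10 bits)."""
    v = value & 0xFFFFFFFF
    return v >> 10, v & 0x3FF


def expand_set(value: int, rd: str) -> list[str]:
    """One plausible expansion of 'set value, rd' into real SPARC instructions."""
    v = value & 0xFFFFFFFF
    signed = v - (1 << 32) if v & 0x80000000 else v
    if -4096 <= signed <= 4095:                       # fits a 13-bit immediate
        return [f"or %r0, {signed}, {rd}"]
    _, lo = split_hi_lo(v)
    insns = [f"sethi %hi({v:#x}), {rd}"]             # rd <- upper 22 bits, low 10 bits zero
    if lo:
        insns.append(f"or {rd}, %lo({v:#x}), {rd}")  # fill in the low 10 bits
    return insns


print(expand_set(-10, "%r3"))
print(expand_set(0x12345678, "%r4"))
hi, lo = split_hi_lo(0x12345678)
print(hex((hi << 10) | lo))    # 0x12345678: the two pieces rebuild the constant
```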

Operand sizes: double word = 8 bytes, word = 4 bytes, half word = 2 bytes, byte = 8 bits. Recall memory alignment issues.
set x_m, %r2      ! put addr x_m in %r2
ld [%r2], %r1     ! load word
ldsb [%r2], %r1   ! load byte, sign extended
ldub [%r2], %r1   ! load byte, extended with 0’s
st %r1, [%r2]     ! store word; addr must be a multiple of 4
stb %r1, [%r2]    ! store byte; any address
sth %r1, [%r2]    ! store half word; address must be even
Characters use 8 bits: use ldub to load a character and stb to store a character.
Diagram memory and registers %r1 and %r2, and show where in the destination the information goes, as well as where in the source the information comes from. The error message that occurs, at run time, when one of these restrictions on the address is violated is: “TRAP (memory address not aligned) occurred at PC: xy.”
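A Python sketch of what the load variants put into a 32-bit register. Memory is big endian, misaligned addresses trap, and the signed forms copy the sign bit into the upper bits. `lduh`/`ldsh` are SPARC's half-word counterparts of `ldub`/`ldsb`. The `load` function and its exception text are illustrative, not ISEM's behavior.

```python
LOAD_SIZE = {"ldub": 1, "ldsb": 1, "lduh": 2, "ldsh": 2, "ld": 4}


def load(memory: bytes, address: int, op: str) -> int:
    """Return the 32-bit register value produced by a SPARC-style load."""
    size = LOAD_SIZE[op]
    if address % size != 0:
        raise RuntimeError("TRAP (memory address not aligned)")
    value = int.from_bytes(memory[address:address + size], "big")   # big endian
    sign_bit = 1 << (8 * size - 1)
    if op in ("ldsb", "ldsh") and value & sign_bit:
        value -= 1 << (8 * size)          # negative: sign extend
    return value & 0xFFFFFFFF             # view as a 32-bit register


mem = bytes([0xA1, 0xB2, 0xC3, 0xD4])
for op in ("ld", "ldsb", "ldub"):
    print(op, f"{load(mem, 0, op):08x}")
# ld a1b2c3d4
# ldsb ffffffa1
# ldub 000000a1
```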

Traps: provide initial help with I/O; also used in operating systems programming.
ta 0: terminate program
ta 1: output ASCII character from %r8
ta 2: input ASCII character into %r8
ta 4: output integer from %r8 in unsigned hexadecimal
ta 5: input integer into %r8; can be decimal, octal, or hex
E.g.,
set '=', %r8    ! put '=' in %r8
ta 1            ! output the '='
ta 5            ! read in a value into %r8
mov %r8, %r1    ! copy %r8 into %r1
set 0x0a, %r8   ! load a newline into %r8
ta 1            ! output the newline

More assembler directives (.asciz and .ascii): the following two directives are equivalent:
msg01:  .asciz "a phrase"
msg01:  .byte 'a', ' ', 'p', 'h', 'r'
        .byte 'a', 's', 'e', 0
Note that .asciz generates one byte for each character between the quote (") marks in the operand, plus a null byte at the end. The .ascii directive does not generate that extra byte. The following three directives are equivalent:
digits: .ascii "0123456789"
digits: .byte '0', '1', '2', '3', '4', '5'
        .byte '6', '7', '8', '9'
digits: .byte 0x30, 0x31, 0x32, 0x33, 0x34
        .byte 0x35, 0x36, 0x37, 0x38, 0x39
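A Python sketch of the byte counts above. `.ascii` yields one byte per character, and `.asciz` adds a terminating null byte. The function names are hypothetical.

```python
def ascii_directive(text: str) -> bytes:
    """Bytes reserved by .ascii "text": one byte per character."""
    return text.encode("ascii")


def asciz_directive(text: str) -> bytes:
    """Bytes reserved by .asciz "text": the same, plus a terminating null byte."""
    return text.encode("ascii") + b"\x00"


print(list(asciz_directive("a phrase")))       # 9 values, ending in 0
print(ascii_directive("0123456789").hex(" "))  # 30 31 32 33 34 35 36 37 38 39
```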

Quick review of instructions so far:
ld [addr], %rd           ! %rd <- M[addr]
st %rd, [addr]           ! M[addr] <- %rd
op %rs1, %rs2, %rd       ! op is an ALU op
op %rs, siconst13, %rd   ! %rd <- %rs op const
set siconst32, %rd       ! %rd <- const
ta #                     ! trap signal
We have actually seen many more variants, e.g., ldub, ldsb, sth, clr, mov, neg, add, sub, smul, sdiv, umul, udiv, etc. We can evaluate just about any simple arithmetic expression.
Try this. Assume
        .data
x_m:    .word 0
prompt: .asciz "? "
Write the SPARC instructions to prompt the user for two values, multiply them, store the result in x_m, and output the result as a hex value.
        .text
        set prompt, %r1     ! %r1 <- address of the prompt
        ldub [%r1], %r8     ! %r8 <- '?'
        ta 1                ! prompt for the first value
        ta 5                ! read the first value into %r8
        mov %r8, %r2        ! save it in %r2
        ldub [%r1], %r8     ! %r8 <- '?' again
        ta 1                ! prompt for the second value
        ta 5                ! read the second value into %r8
        smul %r2, %r8, %r8  ! %r8 <- first * second
        set x_m, %r1        ! %r1 <- address of x_m
        st %r8, [%r1]       ! store the product in x_m
        ta 4                ! output the product in hex
        ta 0                ! terminate

Review: Sparc Loads, Stores
        .data
x_m:    .word 0xa1b2c3d4
        .skip 12
        .text
        set x_m, %r2
        ld [%r2], %r3
        ldsb [%r2], %r4
        ldub [%r2], %r5
        st %r3, [%r2+4]
        sth %r3, [%r2+8]
        stb %r3, [%r2+12]
        ta 0
After this runs, what values are in %r2-%r5, and in the memory locations starting at byte address x_m?
BEFORE (ISEM> dump x_m): a1 b2 c3 d4, followed by the 12 skipped bytes.
AFTER: memory at x_m: a1 b2 c3 d4 | a1 b2 c3 d4 | c3 d4 __ __ | d4 __ __ __ (__ = bytes the program does not write); %r2 = address of x_m, %r3 = a1b2c3d4, %r4 = ffffffa1, %r5 = 000000a1.
In 8-bit 2’s complement encoding, 0xa1 represents -0x5f; i.e., it (161) represents -95.
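A Python replay of the review program above, useful for checking answers. It assumes that `x_m` is offset 0 of a 16-byte big-endian memory image and that `.skip` leaves the skipped bytes zero. `run_review` is a hypothetical name.

```python
def run_review() -> tuple[dict[str, int], bytes]:
    """Replay the load/store review program; x_m is taken to be offset 0 of mem."""
    mem = bytearray((0xA1B2C3D4).to_bytes(4, "big") + bytes(12))  # .word, then .skip 12
    r = {"%r2": 0}                                         # set x_m, %r2
    x = r["%r2"]
    r["%r3"] = int.from_bytes(mem[x:x + 4], "big")         # ld   [%r2], %r3
    byte = mem[x]
    r["%r4"] = (byte | 0xFFFFFF00) if byte & 0x80 else byte  # ldsb: sign extend
    r["%r5"] = byte                                        # ldub: zero extend
    mem[x + 4:x + 8] = r["%r3"].to_bytes(4, "big")         # st  %r3, [%r2+4]
    mem[x + 8:x + 10] = (r["%r3"] & 0xFFFF).to_bytes(2, "big")  # sth: low-order half
    mem[x + 12] = r["%r3"] & 0xFF                          # stb: low-order byte
    return r, bytes(mem)


regs, mem = run_review()
print({k: f"{v:08x}" for k, v in regs.items()})
print(mem.hex(" "))   # a1 b2 c3 d4 a1 b2 c3 d4 c3 d4 00 00 d4 00 00 00
```

`sth` and `stb` store the low-order end of the register. This is why `c3 d4` and `d4` appear in memory rather than `a1 b2` and `a1`.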

Flow of Control 1: In addition to sequential execution, we need the ability to repeatedly and conditionally execute program fragments. A high level language has while, for, do, repeat, case, if-then-else, etc. Assembler has if, goto.
Compare high level vs. pseudo-assembler implementations of f = n!:
High level:
f = 1;
i = 2;
while (i <= n) {
    f = f * i;
    i = i + 1;
}
Pseudo-assembler:
      f = 1
      i = 2
loop: if (i > n) goto done
      f = f * i
      i = i + 1
      goto loop
done: ...
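Python has no goto. This sketch therefore mirrors the pseudo-assembler shape above with an explicit test-and-exit at the top of an unconditional loop. The comments map each line back to the labels. `factorial` is an illustrative name.

```python
def factorial(n: int) -> int:
    f = 1                    #       f = 1
    i = 2                    #       i = 2
    while True:              # loop:
        if i > n:            #       if (i > n) goto done
            break
        f = f * i            #       f = f * i
        i = i + 1            #       i = i + 1
                             #       goto loop (end of the while body)
    return f                 # done: ...


print([factorial(n) for n in range(6)])   # [1, 1, 2, 6, 24, 120]
```

Note that the loop test is the negation of the `while` condition (i > n instead of i <= n), because the branch jumps out of the loop.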

Flow of Control 2: Branch: put a new address in the program counter. The next instruction comes from the new address; effectively, a “goto”.
Unconditional branch:
(book)  BRANCH addr   ! PC <- addr
(SPARC) ba addr       ! PC <- addr
Conditional branch:
(book)  BRcc R1, R2, target   ! if R1 cc R2 then PC <- target
where cc is a comparison operation (e.g., LT is <, GE is >=, etc.).

Flow of Control 3 Evaluating conditional branches
Evaluate the condition. If the condition is true, then PC <- target, else PC <- PC+1.
Consider changes to the fetch-execute cycle given earlier for the accumulator machine. What needs to change? Do the data paths need to change? New control paths? New opcodes? New instruction formats? Answers: no, yes, yes, yes.

Flow of Control 4 Other conditions (from text, very similar to MIPS)
We can implement high level control structures now. Back to the factorial example, using the book’s assembly language:
      LOAD   R1, #1         ; R1 = f = 1
      LOAD   R2, #2         ; R2 = i = 2
      LOAD   R3, n          ; R3 = n
loop: BRGT   R2, R3, done   ; branch if i > n
      MPY    R1, R1, R2     ; f = f * i
      ADD    R2, R2, #1     ; i = i + 1
      BRANCH loop           ; goto loop
done: STORE  f, R1          ; f = n!
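A small Python interpreter for just the instructions the factorial program above uses. Instruction semantics are taken from the program's comments: `#k` is an immediate, `Rn` is a register, and other names are memory. A taken branch replaces the PC; otherwise the PC advances by one. `run_book_asm` and the tuple encoding of instructions are hypothetical.

```python
def run_book_asm(program: list[tuple], labels: dict[str, int], memory: dict) -> None:
    """Interpret the factorial program above (semantics assumed from its comments)."""
    regs: dict[str, int] = {}

    def value(operand: str) -> int:          # '#k' is an immediate, 'Rn' a register
        return int(operand[1:]) if operand.startswith("#") else regs[operand]

    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                              # default: fall through to the next line
        if op == "LOAD":
            dst, src = args
            regs[dst] = value(src) if src.startswith("#") else memory[src]
        elif op == "STORE":
            addr, src = args
            memory[addr] = regs[src]
        elif op in ("ADD", "MPY"):
            dst, a, b = args
            regs[dst] = value(a) + value(b) if op == "ADD" else value(a) * value(b)
        elif op == "BRGT":
            a, b, target = args
            if value(a) > value(b):
                pc = labels[target]          # PC <- target
        elif op == "BRANCH":
            pc = labels[args[0]]
        else:
            raise ValueError(f"unknown opcode {op}")


FACTORIAL = [("LOAD", "R1", "#1"), ("LOAD", "R2", "#2"), ("LOAD", "R3", "n"),
             ("BRGT", "R2", "R3", "done"),        # index 3: loop
             ("MPY", "R1", "R1", "R2"), ("ADD", "R2", "R2", "#1"),
             ("BRANCH", "loop"),
             ("STORE", "f", "R1")]                # index 7: done
LABELS = {"loop": 3, "done": 7}

memory = {"n": 5, "f": 0}
run_book_asm(FACTORIAL, LABELS, memory)
print(memory["f"])   # 120
```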

Flow of Control 5 Condition Codes
The book’s assembly language has 3-address branches. SPARC uses 1-address branches, so it must use condition codes. Many non-MIPS machines use condition codes to evaluate branches. The Condition Code Register (CCR) holds these bits. SPARC has a 4-bit CCR: N: Negative, Z: Zero, V: Overflow, C: Carry. All are shown in a trace, or by the reg command under ISEM. Condition codes are not changed by normal ALU instructions; you must use special instructions ending with cc, e.g., addcc.

Flow of Control 6
        .text
start:  set 1, %r2
        set 0xFFFFFFFE, %r1   ! -2 in 32-bit 2’s complement
cc_set: subcc %r1, %r2, %r3   ! %r3 <- -2 - 1
end:    ta 0
ISEM output (partially recoverable):
Before (ISEM> reg): %r1 = fffffffe; N:0 Z:0 V:0 C:0; next instruction: cc_set: subcc %g1, %g2, %g3
After (ISEM> trace): %r3 = fffffffd; PSR: 00b0003e; N:1 Z:0 V:0 C:0
Before the subcc, all the bits of the CCR are zero. After the subcc, the N bit is set and %r3 shows the 2’s complement representation of -3.

Flow of Control 7 Setting the condition codes
Regular ALU operations don’t set condition codes. Use addcc, subcc, smulcc, sdivcc, etc., to set condition codes.
E.g., suppose %r1 contains -4 and %r2 contains 5:
addcc %r1, %r2, %r3
subcc %r1, %r2, %r3
subcc %r2, %r1, %r3
subcc %r1, %r1, %r3
N and Z are easy to understand. A correct understanding of V and C can be obtained easily by regarding V=1 as signaling a wrong result for signed arithmetic, and C=1 as signaling a wrong result for unsigned arithmetic. These examples assume 5-bit encodings, both 2’s complement (signed) and unsigned; in 5 bits, -4 is 11100, which reads as 28 when unsigned.
addcc %r1, %r2: -4 + 5 = 1, but 28 + 5 = 33 ≠ 1; hence, C=1 for a wrong unsigned result:
  11100 + 00101 = 1,00001; N=0, Z=0, V=0, C=1.
These details do not matter, but the implementation of subtraction is not self-evident: it can be done as $a - b = a + (-b)$, or as $a - b = -(b + (-a))$. The cases below use the second form.
subcc %r1, %r2: -4 - 5 = -9, and 28 - 5 = 23; hence, V=0 and C=0 for correct signed and unsigned results:
  -(5 + (-(-4))): 00101 + 00100 = 0,01001; negation yields 10111 (-9); N=1, Z=0, V=0, C=0.
subcc %r2, %r1: 5 - (-4) = 9, but 5 - 28 = -23 ≠ 9; hence, C=1 for a wrong unsigned result:
  -(-4 + (-5)): 11100 + 11011 = 1,10111; negation yields 01001 (9); N=0, Z=0, V=0, C=1.
subcc %r1, %r1: -4 - (-4) = 0 = 28 - 28; hence, V=0 and C=0 for correct signed and unsigned results:
  -(-4 + (-(-4))): 11100 + 00100 = 1,00000; negation comes right back to 00000; N=0, Z=1, V=0, C=0.
In the case of addition, it is safe to assume that the ALU’s carry-out from the highest bit determines C’s value. However, this assumption does not hold for subtraction, where matters are a bit more complicated. The rule regarding correct signed and unsigned results still applies to subtraction.
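A Python sketch of the "wrong result" rule above. V is set when the fixed-width result differs from the true signed answer, and C is set when it differs from the true unsigned answer (for subcc this is a borrow, i.e., a < b unsigned). The width is a parameter, so the 5-bit examples and the 32-bit trace can both be checked. The function names are hypothetical.

```python
Flags = tuple[int, int, int, int]          # (N, Z, V, C)


def _signed(x: int, bits: int) -> int:
    """Read a bits-wide pattern as a 2's complement number."""
    sign = 1 << (bits - 1)
    return (x ^ sign) - sign


def _nzvc(result: int, bits: int, wrong_signed: bool, wrong_unsigned: bool) -> Flags:
    return ((result >> (bits - 1)) & 1, int(result == 0), int(wrong_signed), int(wrong_unsigned))


def addcc(a: int, b: int, bits: int = 32) -> tuple[int, Flags]:
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    result = (a + b) & mask
    wrong_signed = _signed(a, bits) + _signed(b, bits) != _signed(result, bits)
    wrong_unsigned = a + b != result           # i.e., a carry out of the top bit
    return result, _nzvc(result, bits, wrong_signed, wrong_unsigned)


def subcc(a: int, b: int, bits: int = 32) -> tuple[int, Flags]:
    mask = (1 << bits) - 1
    a, b = a & mask, b & mask
    result = (a - b) & mask
    wrong_signed = _signed(a, bits) - _signed(b, bits) != _signed(result, bits)
    wrong_unsigned = a - b != result           # i.e., a borrow: a < b as unsigned
    return result, _nzvc(result, bits, wrong_signed, wrong_unsigned)


for name, (result, flags) in [("addcc -4, 5", addcc(-4, 5, bits=5)),
                              ("subcc -4, 5", subcc(-4, 5, bits=5)),
                              ("subcc 5, -4", subcc(5, -4, bits=5)),
                              ("subcc -4, -4", subcc(-4, -4, bits=5))]:
    print(f"{name}: {result:05b}  N,Z,V,C = {flags}")
# addcc -4, 5: 00001  N,Z,V,C = (0, 0, 0, 1)
# subcc -4, 5: 10111  N,Z,V,C = (1, 0, 0, 0)
# subcc 5, -4: 01001  N,Z,V,C = (0, 0, 0, 1)
# subcc -4, -4: 00000  N,Z,V,C = (0, 1, 0, 0)
```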

ALU Hardware 1 How does a computer add?
Design a circuit that adds three single-digit binary numbers (x, y, and a carry-in cin). The result is a sum bit and a carry-out:
    $\mathit{Sum} = (x \oplus y) \oplus c_{in}$
    $c_{out} = (x \land y) \lor [(x \oplus y) \land c_{in}]$
[Figure: full adder (FA) with inputs x, y, cin and outputs Sum, cout, built from two half adders.]
See Maccabe, p. 41. Identify the half adders in this diagram; see slide 42 (Basic Components 4).
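The two equations above can be transcribed directly into C as a quick sanity check. This is a sketch; the function names are hypothetical. The first half adder produces x XOR y and x AND y. The second combines that partial sum with cin, and the two carries are ORed together.

```c
#include <assert.h>

/* One-bit full adder built from two half adders. */
void full_adder(unsigned x, unsigned y, unsigned cin,
                unsigned *sum, unsigned *cout) {
    assert(x <= 1 && y <= 1 && cin <= 1);
    unsigned p = x ^ y;        /* half adder 1: sum bit   */
    unsigned g = x & y;        /* half adder 1: carry bit */
    *sum  = p ^ cin;           /* half adder 2: sum bit   */
    *cout = g | (p & cin);     /* OR of the two carries   */
}

/* Returns 2*cout + sum, i.e., how many of x, y, cin are 1. */
unsigned fa_value(unsigned x, unsigned y, unsigned cin) {
    unsigned s, c;
    full_adder(x, y, cin, &s, &c);
    return 2 * c + s;
}
```

Checking all eight input combinations confirms that `fa_value` always equals x + y + cin.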

ALU Hardware 2 Now cascade the full adder hardware
[Figure: a ripple-carry adder. A chain of full adders (FA), each carry-out feeding the next carry-in, adds register x and register y into register z.]
How are the CCR bits set?
    $C = c_{out}$ (the carry out of the most significant bit)
    $V = c_{out} \oplus c_{n-1}$ (where $c_{n-1}$ is the carry into the most significant bit)
    $Z = \lnot(rz_{n-1} \lor rz_{n-2} \lor rz_{n-3} \lor \cdots \lor rz_0)$
    $N = rz_{n-1}$
See Maccabe, pp. 246 and following, for a discussion of the ripple-carry adder and some faster adders. This description of setting the CCR bits applies only to addition; subtraction is a somewhat more complicated matter.
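A small C simulation ties the figure to the flag equations. It runs one full adder per bit position, remembers the carry into the top bit, and derives C, V, Z, and N exactly as listed above. This is a sketch; `struct add_result` and `ripple_add` are hypothetical names, and the width is a parameter so the 5-bit examples from earlier can be reused.

```c
#include <assert.h>
#include <stdint.h>

struct add_result { uint32_t z; int n, zf, v, c; };

/* Ripple-carry add of two 'bits'-wide registers, one full adder per bit. */
struct add_result ripple_add(int bits, uint32_t x, uint32_t y) {
    assert(bits >= 1 && bits <= 32);
    struct add_result r = {0, 0, 0, 0, 0};
    unsigned carry = 0, carry_into_top = 0;
    for (int i = 0; i < bits; i++) {
        unsigned xi = (x >> i) & 1u, yi = (y >> i) & 1u;
        if (i == bits - 1)
            carry_into_top = carry;                 /* c_{n-1} */
        unsigned s = xi ^ yi ^ carry;               /* full adder sum   */
        carry = (xi & yi) | ((xi ^ yi) & carry);    /* full adder carry */
        r.z |= (uint32_t)s << i;
    }
    r.c = (int)carry;                               /* C = cout            */
    r.v = (int)(carry ^ carry_into_top);            /* V = cout XOR c_{n-1} */
    r.zf = (r.z == 0);                              /* Z = NOR of result bits */
    r.n = (int)((r.z >> (bits - 1)) & 1u);          /* N = top result bit  */
    return r;
}
```

With 5 bits, `ripple_add(5, 28, 5)` (that is, -4 + 5) yields result 1 with C = 1 and V = 0, matching the addcc example.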

Flow of Control 8 Branches use logic to evaluate CCR (SPARC)
Operation                          Syntax         Branch condition
Branch always                      ba target      1 (always)
Branch never                       bn target      0 (never)
Branch not equal                   bne target     ¬Z
Branch equal                       be target      Z
Branch greater                     bg target      ¬(Z ∨ (N ⊕ V))
Branch less or equal               ble target     Z ∨ (N ⊕ V)
Branch greater or equal            bge target     ¬(N ⊕ V)
Branch less                        bl target      N ⊕ V
Branch greater, unsigned           bgu target     ¬(C ∨ Z)
Branch less or equal, unsigned     bleu target    C ∨ Z
Branch carry clear                 bcc target     ¬C
Branch carry set                   bcs target     C
Branch positive                    bpos target    ¬N
Branch negative                    bneg target    N
Branch overflow clear              bvc target     ¬V
Branch overflow set                bvs target     V
When a subtraction yields zero, the minuend and subtrahend must have been equal. This is the relationship between "equal" and "zero", and it is why "be" can also be written "bz".
Suppose %r3 = 0x80000000, the most negative of the integers representable in the 32-bit 2's complement scheme. Consider "subcc %r3, 1, %r2" (similarly "subcc %r3, 1, %r0", or "cmp %r3, 1"). The result in %r2 is 0x7FFFFFFF, the most positive of the representable signed integers; hence, overflow. The resulting condition codes are N = 0, Z = 0, V = 1, C = 0.
Of course, 0x80000000 is less than 1; that's why the branch condition for bl (branch less) is N ⊕ V. Either there was no overflow and the result was negative, or there was overflow and the result was not negative.
Conversely, 1 - 0x80000000 results in 0x80000001 with N = 1, Z = 0, V = 1, C = 1. When there is both a negative result and overflow, the minuend is greater than the subtrahend.
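The branch table can be transcribed into a C lookup function for checking cases like the 0x80000000 example. This is a sketch; `branch_taken` is a hypothetical name, and the condition codes are passed in rather than read from hardware.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the named branch would be taken for the given condition
   codes (each 0 or 1), following the table above; -1 for an unknown name. */
int branch_taken(const char *op, int n, int z, int v, int c) {
    assert(n >= 0 && n <= 1 && z >= 0 && z <= 1);
    assert(v >= 0 && v <= 1 && c >= 0 && c <= 1);
    if (!strcmp(op, "ba"))   return 1;
    if (!strcmp(op, "bn"))   return 0;
    if (!strcmp(op, "bne"))  return !z;
    if (!strcmp(op, "be"))   return z;
    if (!strcmp(op, "bg"))   return !(z | (n ^ v));
    if (!strcmp(op, "ble"))  return z | (n ^ v);
    if (!strcmp(op, "bge"))  return !(n ^ v);
    if (!strcmp(op, "bl"))   return n ^ v;
    if (!strcmp(op, "bgu"))  return !(c | z);
    if (!strcmp(op, "bleu")) return c | z;
    if (!strcmp(op, "bcc"))  return !c;
    if (!strcmp(op, "bcs"))  return c;
    if (!strcmp(op, "bpos")) return !n;
    if (!strcmp(op, "bneg")) return n;
    if (!strcmp(op, "bvc"))  return !v;
    if (!strcmp(op, "bvs"))  return v;
    return -1;
}
```

Here `cmp 0x80000000, 1` gives N=0, Z=0, V=1, C=0. With those codes, `bl` is taken (signed less) and `bgu` is also taken, because 0x80000000 is larger than 1 when read as unsigned.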

Flow of Control 9 Setting Condition Codes (continued)
Synthetic instruction: cmp %rs1, %rs2
It sets the CCR but doesn't modify any registers. It is implemented as subcc %rs1, %rs2, %g0.
Back to the factorial example (SPARC):
        set  1, %r1          ! %r1 = f = 1
        set  2, %r2          ! %r2 = i = 2
        set  n, %r3          ! get loc of n
        ld   [%r3], %r3      ! put n in %r3
loop:   cmp  %r2, %r3        ! set CCR (i ? n)
        bg   done            ! if i > n, we're done
        nop                  ! branch delay
        umul %r1, %r2, %r1   ! f = f * i
        add  %r2, 1, %r2     ! i = i + 1
        ba   loop            ! goto loop
        nop                  ! branch delay
done:   set  f, %r3          ! get loc of f
        st   %r1, [%r3]      ! f = n!
For example, for cmp %r1, %r2 (that is, subcc %r1, %r2, %r0), fill in this table for various values of %r1 and %r2:
%r1   %r2   N   Z   V   C   be?   bg? ¬(Z ∨ (N ⊕ V))   ble? Z ∨ (N ⊕ V)

Flow of Control 10 Branch delay slots: unique to RISC architecture
Non-technical explanation: the processor is running so fast that it can't make a quick turn. The instruction following a branch is always executed (unless the branch's annul bit says otherwise).
Technical explanation: by the time a branch takes effect, the following instruction has already been fetched and is partway through the pipeline. The efficiency advantage of pipelining is greater if that instruction is allowed to complete.
Compilers take advantage of branch delay slots by putting a useful instruction there if possible. For our purposes, use the nop (no operation) instruction to fill branch delay slots. Beware! Forgetting the nop will be a large source of errors in your programs!
nop is another synthetic instruction, implemented as sethi 0, %g0. However, the ISEM Reference Card lists it on p. 2, not among the Synthetic Instructions on p. 3.

High Level Control Structures 1
Converting high-level control structures: you get to be the "compiler". Some compilers convert the source language (C, Pascal, Modula-2, etc.) into assembly language and then assemble the result into an object file. The GNU C and C++ compilers do this, producing input for GAS (the GNU Assembler). if-then-else, while-do, and repeat-until can all be created in a structured way in assembly language.

General guidelines:
    Break the code down into independent (or nested) logical units.
    Convert each unit to if/goto pseudo-code.
This is a mechanical, step-by-step, non-creative process. E.g.,
    f = 1;
    for (i=2; i<=n; i++)
        f = f * i;
becomes
          f = 1
          i = 2
    loop: if (i > n) goto done
          f = f * i
          i = i + 1
          goto loop
    done: ...
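One way to convince yourself that the if/goto form is equivalent is to write both forms in C and compare them. C happens to have `goto`, so the pseudo-code can be kept almost verbatim. This is a sketch; the function names and the use of `unsigned` are illustrative assumptions. In the if/goto version, each line corresponds to one or two of the SPARC instructions on the previous slide.

```c
#include <assert.h>

/* The for-loop version from the slide. */
unsigned fact_for(unsigned n) {
    unsigned f = 1, i;
    for (i = 2; i <= n; i++)
        f = f * i;
    return f;
}

/* The same computation in if/goto form. */
unsigned fact_goto(unsigned n) {
    assert(n <= 12);        /* 13! does not fit in 32 bits */
    unsigned f = 1;
    unsigned i = 2;
loop:
    if (i > n) goto done;   /* cmp %r2, %r3 ; bg done */
    f = f * i;              /* umul %r1, %r2, %r1     */
    i = i + 1;              /* add %r2, 1, %r2        */
    goto loop;              /* ba loop                */
done:
    return f;
}
```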

if-then-else:
    if (a < b) c = d + 1; else c = 7;
if/goto:
          if (a >= b) goto else
          c = d + 1
          goto end
    else: c = 7
    end:
SPARC:
init:   set  a, %r2          ! get &a into r2
        ld   [%r2], %r2      ! get a into r2
        set  b, %r3          ! get &b into r3
        ld   [%r3], %r3      ! get b into r3
if:     cmp  %r2, %r3        ! a ?? b (want >=)
        bge  else            ! a >= b: skip the then part
        nop
        set  d, %r5          ! get &d into r5
        ld   [%r5], %r5      ! get d into r5
        add  %r5, 1, %r4     ! r4 <- d+1
        ba   end
        nop
else:   set  7, %r4          ! get 7 into r4
end:    set  c, %r5          ! get &c into r5
        st   %r4, [%r5]      ! c <- r4
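The same if/goto translation can be written in C to check the key move: the test is inverted (a >= b) so that the "then" part falls through, just as the SPARC code uses "bge else". This is a sketch; the function name and passing a, b, d as arguments are assumptions. The label is spelled `else_part` because `else` is a C keyword.

```c
#include <assert.h>

/* if (a < b) c = d + 1; else c = 7;  in if/goto form. */
int if_then_else(int a, int b, int d) {
    int c;
    if (a >= b) goto else_part;   /* cmp %r2, %r3 ; bge else */
    c = d + 1;                    /* then part               */
    goto end;                     /* ba end                  */
else_part:
    c = 7;
end:
    return c;                     /* st %r4, [%r5]           */
}
```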

while loops:
    while (a < b) a = a + 1;
    c = d;
if/goto:
    whle: if (a >= b) goto done
    body: a = a + 1
          goto whle
    done: c = d
SPARC:
init:   set  a, %r4          ! get &a into r4
        ld   [%r4], %r2      ! get a into r2
        set  b, %r3          ! get &b into r3
        ld   [%r3], %r3      ! get b into r3
whle:   cmp  %r2, %r3        ! a ?? b (want >=)
        bge  done            ! a >= b: skip body
        nop
body:   add  %r2, 1, %r2     ! r2 = a + 1
        st   %r2, [%r4]      ! a = a + 1
        ba   whle            ! repeat loop body
        nop                  ! branch delay
done:   set  c, %r5          ! get &c into r5
        ...
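The while-loop pattern puts the exit test at the top and an unconditional jump at the bottom. It can be sketched in C the same way; the function name is an assumption, and the function returns the final value of a (the trailing c = d is omitted).

```c
#include <assert.h>

/* while (a < b) a = a + 1;  in if/goto form; returns the final a. */
int while_loop(int a, int b) {
whle:
    if (a >= b) goto done;   /* cmp %r2, %r3 ; bge done */
    a = a + 1;               /* body                    */
    goto whle;               /* ba whle                 */
done:
    return a;
}
```

If a starts out at or above b, the body never runs. That is the key difference from the repeat-until form on the next slide.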

repeat-until loops:
    repeat … until (a > b)
if/goto:
    repeat: …                         (loop body)
            if (a <= b) goto repeat
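For contrast with the while loop, here is a C sketch of the repeat-until pattern. The loop body `a = a + 1` is a hypothetical stand-in for the slide's "…". The test comes after the body and is the negation of the until-condition, so the body always runs at least once.

```c
#include <assert.h>

/* repeat a = a + 1 until (a > b);  in if/goto form; returns the final a. */
int repeat_until(int a, int b) {
repeat:
    a = a + 1;                 /* loop body (illustrative) */
    if (a <= b) goto repeat;   /* negation of (a > b)      */
    return a;
}
```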

Complex conditions:
    if ((a < b) and (b >= c)) …
    if ((a < b) or (b >= c)) …
These can be combined and used in if-else statements or while loops. Short-circuit evaluation: as soon as the result is known, stop doing comparisons!
Try this as an exercise in class. It is handy right now because the students are working on lab2. Assume str is a pointer to a null-terminated string in memory:
    strptr = str;
    while (*strptr != 0) {
        ch = *strptr;
        print ch;
        strptr++;
    }
1) Write the if/goto code to output the string:
           strptr = str
    while: ch = *strptr
           if ch = 0 then goto end
           print ch
           strptr = strptr + 1
           goto while
    end:   quit
2) Write the SPARC assembly code. Remember that ASCII characters are only 1 byte.
        .data
str:    .asciz "invalid operator"
        .text
        set  str, %r1
whle:   ldub [%r1], %r8
        cmp  %r8, %r0
        be   end
        nop
        ta   1
        add  %r1, 1, %r1
        ba   whle
        nop
end:    ta   0
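Short-circuit evaluation maps onto if/goto code as an early exit after each test. The C sketch below writes the "and" and "or" conditions that way. The global counter `comparisons` is a hypothetical addition that records how many tests actually ran. With "and", the first failing test jumps to the false part; with "or", the first succeeding test jumps to the true part.

```c
#include <assert.h>

int comparisons;   /* hypothetical: number of tests evaluated */

/* (a < b) and (b >= c), short-circuited. */
int and_cond(int a, int b, int c) {
    comparisons = 1;
    if (a >= b) goto is_false;   /* first test fails: skip the second */
    comparisons = 2;
    if (b < c) goto is_false;
    return 1;                    /* then-part would go here */
is_false:
    return 0;
}

/* (a < b) or (b >= c), short-circuited. */
int or_cond(int a, int b, int c) {
    comparisons = 1;
    if (a < b) goto is_true;     /* first test succeeds: skip the second */
    comparisons = 2;
    if (b >= c) goto is_true;
    return 0;
is_true:
    return 1;
}
```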

Flow of Control 11 Optimizing code: change the order of instructions, combine instructions, and take advantage of branch delay slots.
Factorial example again (for i := n downto 1 do …). This reduces the 7 instructions in the loop to just 4. (You gain no advantage if you optimize code in your labs.)
        set  1, %r1          ! %r1 = f = 1
        set  n, %r2          ! get loc of n
        ld   [%r2], %r2      ! put n in %r2
loop:   umul %r1, %r2, %r1   ! f = f * n
        subcc %r2, 1, %r2    ! decrement n
        bg   loop            ! repeat
        nop                  ! branch delay
        set  f, %r3          ! get loc of f
        st   %r1, [%r3]      ! f = n!
Optimizations:
    Eliminated i; replaced it with n.
    Moved the branch to the end of the loop.
The loop now executes one branch per iteration (about n branch instructions) instead of two (about 2n). This version assumes n ≥ 1; for n = 0 it stores 0.
Now would be a great time for "Tracing a Loop".
The following program uses instruction annulment to have only 3 instructions in the loop; it also handles n ≤ 1. With bg,a, the delay-slot instruction is annulled when the branch is not taken.
        set  1, %r1          ! %r1 = f = 1
        set  n, %r2          ! get loc of n
        ld   [%r2], %r2      ! put n in %r2
        cmp  %r2, %r1        ! if n <= 1 then
        ble  done            !   we're done: f = 1
        nop
        umul %r1, %r2, %r1   ! f = f * n
loop:   subcc %r2, 1, %r2    ! decrement n
        bg,a loop            ! repeat
        umul %r1, %r2, %r1   ! f = f * n (delay slot; annulled when the branch falls through)
done:   set  f, %r2          ! get loc of f
        st   %r1, [%r2]      ! f = n!
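The annulled loop is easy to misread, so here is a C sketch that mirrors it step by step. The function name is hypothetical. The multiply in the delay slot runs only when `bg,a` is taken; when the branch falls through, the loop ends without that final multiply.

```c
#include <assert.h>

/* Mirrors the 3-instruction annulled loop. */
unsigned fact_countdown(unsigned n) {
    unsigned f = 1;                /* set 1, %r1                    */
    assert(n <= 12);               /* 13! does not fit in 32 bits   */
    if (n <= 1) return f;          /* cmp %r2, %r1 ; ble done       */
    f = f * n;                     /* umul before the loop          */
    for (;;) {
        n = n - 1;                 /* subcc %r2, 1, %r2             */
        if (!(n > 0)) break;       /* bg,a not taken: umul annulled */
        f = f * n;                 /* delay-slot umul, branch taken */
    }
    return f;
}
```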

Synthetic Instructions
Remember lab0?
        .data
x_m:    .word 0x42
y_m:    .word 0x20
z_m:    .word 0
        .text
start:  set  x_m, %r2
        ld   [%r2], %r2
        set  y_m, %r3
        ld   [%r3], %r3
        and so on…
Suppose you gave this command to ISEM (after loading):
    ISEM> dump start
[ISEM output: the hexadecimal words stored at start]
Could you find the set instruction? How would you expect the SET instruction to be encoded?
Suppose we needed 5 bits for choosing the operation (a fair assumption, as we'll see when we look at sethi). Then:
    rd = 5 bits
    Constant = (32 bits - 5) - 5 = 22 bits
This implementation limits constants to 22 bits. These constants are often used as memory addresses, so we would be restricting the memory directly accessible through labels to 2^22 bytes. Here is a good opportunity for a synthetic instruction.

Instruction Encodings 1
First, instruction encoding is how instructions are assembled. All instructions must fit into 32 bits.
[Figure: format 3 instruction layouts]
    Register-register: op = 10, i = 0
    Register-immediate: op = 10, i = 1
    Floating point: op = 10, i = 0
Fields:
    op, op2, op3, opf   opcodes
    rd                  destination register
    rs1, rs2            source registers
    i                   immediate
    asi                 address space identifier (useful only for privileged instructions)
    simm13              signed immediate value, 13 bits

Instruction Encodings 2
[Figure: format 1 (call) and format 2 (branch, sethi) layouts]
    Call instructions: op = 01
    Branch instructions: op = 00, op2 = 010
    SETHI instructions: op = 00, op2 = 100
Fields:
    disp30   30-bit signed word displacement (call)
    a        annul (nullification) bit
    cond     condition for a conditional branch
    disp22   22-bit signed word displacement (branch)
    imm22    22-bit immediate value (sethi)
A word displacement is scaled by 4: PC <- PC + 4*displacement.
Ex.: add %r2, %r3, %r4 in hexadecimal: …
In a source file, one could replace the instruction "add %r2, %r3, %r2" with the directive ".word 0x…", ".word …", or ".word …"; each form writes the same 32-bit value in a different notation.
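The disp22 field is what makes branches position-relative. The C sketch below shows how a branch target follows from it: extract the low 22 bits, sign-extend them, multiply by 4, and add the result to the branch's own address. The function names are hypothetical, and the test instructions are built by hand with op = 00 and op2 = 010.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 'bits' bits of x. */
int32_t sign_extend(uint32_t x, int bits) {
    assert(bits > 0 && bits < 32);
    uint32_t m = (uint32_t)1 << (bits - 1);
    x &= ((uint32_t)1 << bits) - 1;
    return (int32_t)((int64_t)(x ^ m) - (int64_t)m);
}

/* Target of a format-2 branch instruction located at address pc. */
uint32_t branch_target(uint32_t pc, uint32_t insn) {
    assert((insn >> 30) == 0 && ((insn >> 22) & 7u) == 2u);  /* op=00, op2=010 */
    int32_t disp = sign_extend(insn & 0x3FFFFFu, 22);        /* disp22 field    */
    return pc + (uint32_t)disp * 4u;                         /* PC + 4*disp     */
}

/* The annul bit (bit 29) and cond field (bits 28..25). */
unsigned annul_bit(uint32_t insn)  { return (insn >> 29) & 1u; }
unsigned cond_field(uint32_t insn) { return (insn >> 25) & 0xFu; }
```

A displacement of -3 at address 0x1000 branches back to 0xFF4, three instructions earlier.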

Understanding SET Synthetic
SET is usually used to put an address (the address of something in memory) into a register. For example:
    set 0x4004, %r3
We can do neither "add %r0, 0x4004, %r3" nor "or %r0, 0x4004, %r3". Why not? 0x4004 is greater than 0xfff = 2^12 - 1 = 4095, which is the upper limit of siconst13 (a signed 13-bit constant).
SET is a synthetic instruction which may be implemented in two steps:
    #1  sethi 0x10, %r3        ! %r3 <- 0x4000 (0x10 shifted left 10 bits)
    #2  or    %r3, 0x4, %r3    ! %r3 <- 0x4004
Machine language encoding for "set 0x4004, %r3": 07 00 00 10, 86 10 e0 04.

Decoding an Instruction
    Instruction group (bits 30:31) = 00
    Destination register (bits 25:29) = 00010
    Op code (bits 22:24) = 100
    Constant (bits 0:21) = 0x10 (binary 00 0000 0000 0000 0001 0000)
Meaning: sethi 0x10, %r2      ! %r2 <- 0x4000
Ask: What is being done to %r2? Why? How many bits are required to express address 0x4000?
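The field extraction above is just shifting and masking. The C sketch below decodes a format-2 word and reproduces the slide's reading of sethi 0x10, %r2. The struct and function names are hypothetical. The sethi value is imm22 shifted left 10 bits, so the low 10 bits are zero.

```c
#include <assert.h>
#include <stdint.h>

struct fmt2 { unsigned op, rd, op2; uint32_t imm22; };

/* Split a format-2 instruction into its fields. */
struct fmt2 decode_format2(uint32_t insn) {
    struct fmt2 f;
    f.op    = insn >> 30;               /* bits 31:30 */
    f.rd    = (insn >> 25) & 0x1Fu;     /* bits 29:25 */
    f.op2   = (insn >> 22) & 0x7u;      /* bits 24:22 */
    f.imm22 = insn & 0x3FFFFFu;         /* bits 21:0  */
    return f;
}

/* Value that a sethi instruction leaves in rd. */
uint32_t sethi_value(uint32_t insn) {
    struct fmt2 f = decode_format2(insn);
    assert(f.op == 0 && f.op2 == 4);    /* op = 00, op2 = 100: sethi */
    return f.imm22 << 10;
}
```

The word 0x05000010 decodes to rd = 2 and imm22 = 0x10, so %r2 receives 0x4000.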

More Decoding
Hex          Binary, by field                               Instruction              Effect
84 10 a0 00  10 00010 000010 00010 1 0000000000000         or %r2, 0x0, %r2         ! %r2 <-- 0x4000
             (group, rd, op3, rs1, i, siconst13)
c…           (group, rd, op3, rs1, …)                       ld [%r2+%r0], %r2        ! %r2 <-- 0x42
07 00 00 10  00 00011 100 0000000000000000010000            sethi 0x10, %r3          ! %r3 <-- 0x4000
             (group, rd, op2, imm22)
86 10 e0 04  10 00011 000010 00011 1 0000000000100          or %r3, 0x4, %r3         ! %r3 <-- 0x4004
Now, encode these:
    smul  %r2, %r3, %r2      -> 0x…
    st    %r2, [%r1]         -> 0xC…   ! note: the address [%r1] is actually encoded as [%r1+%r0]
    subcc %r2, 0xFF, %r0     -> 0x80A0A0FF
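To check answers to encoding exercises like these, a tiny format-3 encoder in C is handy. This is a sketch with hypothetical names. The op3 values used in the tests (add 0x00, or 0x02, subcc 0x14, and ld 0x00 with op = 11) are the ones that reproduce the encodings shown on this slide. Confirm any others against the ISEM Reference Card.

```c
#include <assert.h>
#include <stdint.h>

/* Format 3: op(31:30) rd(29:25) op3(24:19) rs1(18:14) i(13), then
   i=0: asi(12:5) rs2(4:0)   or   i=1: simm13(12:0)               */
uint32_t enc3_reg(unsigned op, unsigned rd, unsigned op3,
                  unsigned rs1, unsigned rs2) {
    assert(op <= 3 && rd < 32 && op3 < 64 && rs1 < 32 && rs2 < 32);
    return ((uint32_t)op << 30) | ((uint32_t)rd << 25) | ((uint32_t)op3 << 19)
         | ((uint32_t)rs1 << 14) | rs2;              /* i = 0, asi = 0 */
}

uint32_t enc3_imm(unsigned op, unsigned rd, unsigned op3,
                  unsigned rs1, int32_t simm13) {
    assert(op <= 3 && rd < 32 && op3 < 64 && rs1 < 32);
    assert(simm13 >= -4096 && simm13 <= 4095);       /* fits in 13 signed bits */
    return ((uint32_t)op << 30) | ((uint32_t)rd << 25) | ((uint32_t)op3 << 19)
         | ((uint32_t)rs1 << 14) | (1u << 13) | ((uint32_t)simm13 & 0x1FFFu);
}
```

For example, `enc3_imm(2, 0, 0x14, 2, 0xFF)` produces 0x80A0A0FF, the subcc answer given above.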

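SET Synthetic Instruction
set iconst, rd expands into one of three forms:
    sethi %hi(iconst), rd          ! e.g., 0x4004, 0x2060
    or    rd, %lo(iconst), rd
--or--
    sethi %hi(iconst), rd          ! when the low 10 bits are zero; e.g., 0x4000, 0x2000
--or--
    or    %g0, iconst, rd          ! for iconst = i, where -0x1000 ≤ i ≤ 0xfff
Here %hi(iconst) is the upper 22 bits of iconst and %lo(iconst) is the lower 10 bits.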
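The choice among the three forms depends only on the constant's value. The C sketch below computes %hi and %lo and prints an expansion for a given constant. The function names are hypothetical. The order in which the forms are tried (or-only first, then sethi-only, then both) is an assumption; the assembler may decide differently in edge cases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

uint32_t hi22(uint32_t k) { return k >> 10; }      /* %hi(k): bits 31..10 */
uint32_t lo10(uint32_t k) { return k & 0x3FFu; }   /* %lo(k): bits 9..0   */

/* 1 if k fits in a signed 13-bit immediate (-0x1000 .. 0xfff). */
int fits_simm13(uint32_t k) { return k <= 0xFFFu || k >= 0xFFFFF000u; }

/* Number of instructions "set k, rd" needs under the assumed order. */
int set_length(uint32_t k) {
    if (fits_simm13(k)) return 1;      /* or %g0, k, rd  */
    if (lo10(k) == 0) return 1;        /* sethi only     */
    return 2;                          /* sethi then or  */
}

/* Text of the expansion for register number rd (static buffer). */
const char *expand_set(uint32_t k, int rd) {
    static char buf[96];
    unsigned long uk = (unsigned long)k;
    assert(rd >= 0 && rd < 32);
    if (fits_simm13(k))
        snprintf(buf, sizeof buf, "or %%g0, 0x%lx, %%r%d", uk, rd);
    else if (lo10(k) == 0)
        snprintf(buf, sizeof buf, "sethi %%hi(0x%lx), %%r%d", uk, rd);
    else
        snprintf(buf, sizeof buf, "sethi %%hi(0x%lx), %%r%d; or %%r%d, %%lo(0x%lx), %%r%d",
                 uk, rd, rd, uk, rd);
    return buf;
}
```

For 0x4004, %hi is 0x10 and %lo is 0x4. These are exactly the sethi 0x10 and or ... 0x4 seen on the decoding slides.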

Bitwise Operations 1 Bit Manipulation Instructions
Bitwise logical operations (applied independently at each of the 32 bit positions):
    and %rs1, %rs2, %rd
    or  %rs1, %rs2, %rd
    xor %rs1, %rs2, %rd
[Figure: bit-by-bit illustration of and, or, xor across the 32 bits]

Bitwise Operations 2
    andn %rs1, %rs2, %rd    ! rd <- rs1 AND (NOT rs2)
    orn  %rs1, %rs2, %rd    ! rd <- rs1 OR (NOT rs2)
    not  %rs, %rd           ! a synthetic instruction: xnor %rs, %g0, %rd
[Figure: bit-by-bit illustration across the 32 bits]
Recall the cc operations: andcc, orcc, etc. are available. There is no notcc synthetic (or actual) instruction, so use xnorcc if necessary.
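C's bitwise operators give a quick way to experiment with these instructions on 32-bit values. The sketch below is illustrative; the `sp_` function names are hypothetical. It shows in particular why `not` can be synthesized from `xnor` with %g0, which always reads as 0.

```c
#include <assert.h>
#include <stdint.h>

/* C equivalents of the SPARC logical instructions on 32-bit registers. */
uint32_t sp_and(uint32_t a, uint32_t b)  { return a & b; }
uint32_t sp_or(uint32_t a, uint32_t b)   { return a | b; }
uint32_t sp_xor(uint32_t a, uint32_t b)  { return a ^ b; }
uint32_t sp_andn(uint32_t a, uint32_t b) { return a & ~b; }    /* rs1 AND NOT rs2 */
uint32_t sp_orn(uint32_t a, uint32_t b)  { return a | ~b; }    /* rs1 OR NOT rs2  */
uint32_t sp_xnor(uint32_t a, uint32_t b) { return ~(a ^ b); }

/* "not rs, rd" is xnor rs, %g0, rd: XNOR with 0 flips every bit. */
uint32_t sp_not(uint32_t a) { return sp_xnor(a, 0); }
```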

Bitwise Operations 3 For what kinds of things are these bit-level operations used?
Recall the synthetic operations clr and mov:
    clr %r2        ⇒  or %r0, %r0, %r2
    mov %r2, %r3   ⇒  or %r0, %r2, %r3
Masking operations: we want to select a bit or group of bits from a set of 32. E.g., convert lower case to upper case (or upper to lower):
    'a' in binary is 0110 0001
    'A' in binary is 0100 0001
All we need to do is "turn off" the bit in position 5. This instruction will turn off that bit!
    and %r1, 0b11011111, %r1
What if we subtract 32 (0b100000) from %r1? What about converting upper to lower case?
Notes: this really only works if the character is already a letter of the alphabet, and subtracting 32 only works if the value is already lowercase. Think about numbers: ASCII for '0' is 0x30, or 0011 0000.
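A short C sketch makes the caveats on this slide concrete. The function names are hypothetical. Clearing bit 5 upper-cases a letter, and it leaves an already-uppercase letter unchanged. Subtracting 32 damages an uppercase letter ('A' becomes '!'). Neither approach is safe on digits, since '0' (0x30) also has bit 5 set.

```c
#include <assert.h>

/* Bit 5 (value 0x20) is the only difference between 'a' (0x61) and 'A' (0x41). */
int to_upper_mask(int ch) { return ch & ~0x20; }  /* like: and %r1, 0b11011111, %r1 */
int to_lower_mask(int ch) { return ch | 0x20; }   /* like: or  %r1, 0x20, %r1       */
int to_upper_sub(int ch)  { return ch - 32; }     /* correct only for lowercase input */
```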

Bitwise Operations 4 Bitwise shifting operations
Shift logical left: sll %rs1, %rs2, %rd
    %rs1: data to be shifted
    %rs2: shift count
    %rd: destination register
E.g.,
    set 0xABCD1234, %r2
    sll %r2, 3, %r3
    %r2: 1010 1011 1100 1101 0001 0010 0011 0100   (0xABCD1234)
    %r3: 0101 1110 0110 1000 1001 0001 1010 0000   (0x5E6891A0)
sll is equivalent to multiplying by a power of 2 (barring overflow). (In the decimal system, what's a shortcut for multiplying by a power of ten?)
Left shift of 1110 (= 14):
    1 bit:  11100    (= 28)
    2 bits: 111000   (= 56)
    3 bits: 1110000  (= 112)
Ergo, a left shift of 3 bits = multiplication by 2^3.
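The sll examples can be reproduced in C with unsigned 32-bit shifts. This is a sketch; `sll32` is a hypothetical name, and it masks the count to 5 bits, matching SPARC's use of only the low 5 bits of the shift count. The 0xABCD1234 case shows what "barring overflow" means. The shifted bits equal the wrapped 32-bit product, but not the true value 8 × 0xABCD1234, because the top bits fall off.

```c
#include <assert.h>
#include <stdint.h>

/* sll on a 32-bit register: zeros shift in at the bottom,
   bits shifted out of the top are lost. */
uint32_t sll32(uint32_t x, unsigned count) {
    return x << (count & 31u);
}
```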

Bitwise Operations 5 Shift Logical Right: srl %rs1, %rs2, %rd
srl shifts right instead of left, inserting zeros.
Arithmetic shifts propagate the sign bit when shifting right, e.g., sra. (An arithmetic left shift would be the same as a logical left shift.) sra is almost equivalent to dividing by a power of 2. It is only "almost" because, for negative values, sdiv rounds toward zero while sra rounds away from zero (toward negative infinity). E.g., -7 shifted right arithmetically by 1 is -4, but -7 divided by 2 is -3.
Rotating shifts: bits that would have gone into the bit bucket are shifted in at the other end instead (e.g., rr, rl). Rotate is not implemented in SPARC.
Arithmetic and logical shift instructions can be used to set up masks for bit setting and clearing. Consider bset and bclr below: bit set would use or, and bit clear would use andn.
    bset n, m, Rd    set bits n through m to 1     (or)
    bclr n, m, Rd    clear bits n through m to 0   (andn)
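The sra-versus-division difference and the bset/bclr masks can both be tried in C. The sketch below uses hypothetical function names. `sra32` implements an arithmetic shift portably, because right-shifting a negative signed value is implementation-defined in C. `bset_range` and `bclr_range` are stand-ins for the slide's bset/bclr: they build a mask of ones for bits lo..hi with shifts, then apply or or andn.

```c
#include <assert.h>
#include <stdint.h>

uint32_t srl32(uint32_t x, unsigned n) { return x >> (n & 31u); }

/* Arithmetic right shift: floor(x / 2^n), i.e., rounds toward -infinity. */
int32_t sra32(int32_t x, unsigned n) {
    n &= 31u;
    if (x >= 0) return (int32_t)((uint32_t)x >> n);
    return -1 - (int32_t)((uint32_t)(-1 - x) >> n);
}

/* Mask with bits lo..hi (inclusive) set. */
uint32_t bit_range_mask(unsigned lo, unsigned hi) {
    assert(lo <= hi && hi <= 31);
    uint32_t ones = (hi - lo == 31) ? 0xFFFFFFFFu : ((1u << (hi - lo + 1)) - 1u);
    return ones << lo;
}

uint32_t bset_range(uint32_t x, unsigned lo, unsigned hi) {
    return x | bit_range_mask(lo, hi);      /* or   */
}
uint32_t bclr_range(uint32_t x, unsigned lo, unsigned hi) {
    return x & ~bit_range_mask(lo, hi);     /* andn */
}
```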

More SPARC Assembly Language
Assembler directives are not encoded as machine instructions.
    Memory alignment: .align 4
        Used when mixing allocations of bytes, words, halfwords, etc. and word-boundary alignment is needed.
    Reserve bytes of space: .skip 20
        Useful for allocating large amounts of space (e.g., arrays).
    Create a symbolic constant: .set mask, 0x0f
        We can now use the word "mask" anywhere we could previously use the constant 0x0f.
All this is leading to additional addressing modes, which help us work with pointers, arrays, and records in assembly language.
Is the assembler (gas, the GNU SPARC assembler) case sensitive? Yes.
Do we know that all the space reserved by a .skip directive starts with a value of zero? Yes, at load time. Beware, however, that a program can be re-run in ISEM without reloading (by changing the pc).
Now would be a great time for "A Closer Look at Symbolic Constants". In connection with the .set directive, discuss ".set exit, 0" so that we may write "ta exit" rather than "ta 0". Also discuss assembler expressions (Lab Manual, p. 40).
If you have time, discuss the fact that the siconst13 restriction cannot be enforced at assemble time for labels. The linker is then responsible for issuing a "Relocation Truncated to Fit" error message.
If you have time, discuss the fact that 0xffffffff is a perfectly acceptable siconst13: as a 32-bit value it is -1, which fits in 13 signed bits.
If you have time, discuss the fact that a register interpreted as an integer can be read in two ways: according to 32-bit two's complement representation, or according to 32-bit simple binary representation. These representations are what we mean by "signed" and "unsigned", respectively. The operations add and sub are indifferent to which representation is intended; they work correctly for either. For multiplication and division of integers, we must choose between signed and unsigned (e.g., smul vs. umul, sdiv vs. udiv).
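The signed/unsigned point can be demonstrated by running one bit pattern through both kinds of operation in C. This is a sketch; the function names are hypothetical. The multiply helpers return the high 32 bits of the 64-bit product, the part SPARC's umul/smul place in the Y register. The divide helpers ignore the Y register, which SPARC's divide instructions also use as the high word of the dividend.

```c
#include <assert.h>
#include <stdint.h>

/* Add: the same result bits whether the operands are signed or unsigned. */
uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }

/* High word of the full 64-bit product (what umul/smul put in Y). */
uint32_t umul_hi(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}
uint32_t smul_hi(int32_t a, int32_t b) {
    return (uint32_t)((uint64_t)((int64_t)a * b) >> 32);
}

/* Quotient bits, unsigned vs. signed reading of the same registers. */
uint32_t udiv_bits(uint32_t a, uint32_t b) {
    assert(b != 0);
    return a / b;
}
uint32_t sdiv_bits(int32_t a, int32_t b) {
    assert(b != 0 && !(a == INT32_MIN && b == -1));
    return (uint32_t)(a / b);
}
```

Reading 0xFFFFFFFE as unsigned (4294967294) or as signed (-2) gives the same bits after adding 3. It gives different bits after multiplying or dividing by 2.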

Addressing Modes 1 How do we specify operand values?
    In a register: the register's location (number) is encoded in the instruction.
    As a constant: the immediate value is in the instruction.
    In memory: the operand is somewhere in memory, and its location may only be known at run time.
Memory operands: the effective address is the actual location of the operand in memory. It may be calculated implicitly (e.g., from a displacement in the instruction) or calculated by the programmer in code.

Addressing Modes 2 Summary of addressing modes
[Table: summary of addressing modes]
Memory direct does not exist on SPARC. Emphasize that, on SPARC, memory indirect does not exist either.

Addressing Modes 3 Memory Direct addressing, Memory Indirect addressing
Memory direct: the entire address is in the instruction (not in SPARC). E.g., on an accumulator machine, each instruction had an opcode and a hard address in memory. This can't be done on SPARC because an address is 32 bits, which is the length of an instruction, leaving no room for the opcode, etc. It can be done in CISC because multi-word instructions are permitted.
Memory indirect: a pointer to the operand is in memory, and the instruction specifies the location of the pointer. This requires three memory fetches (one each for the instruction, the pointer, and the data). It is not in RISC machines because such an instruction is too slow; it would cause its own register interlock!

Addressing Modes 4 Register Indirect addressing
A register holds the address of the operand (a pointer). The instruction specifies the register number, and the effective address is the contents of the register.
Ex.:
        .data
n_m:    .word 5              ! initialize n to 5
        .text
        set  n_m, %r1        ! %r1 has n_m, a pointer to n
        ld   [%r1], %r3      ! fetch n into %r3
I've used n_m as the name of an address to express "variable n's memory location". Then mw[n_m] could mean "the word in memory at address n_m", and "n" could be its abbreviation. --Wayne

Addressing Modes 5 Ex.: sum up an array of integers
        .data
n_m:    .word 5              ! size of array
a_m:    .word 4,2,5,8,3      ! 5-word array
sum_m:  .word 0              ! sum of elements
b_m:    .skip 5*4            ! another 5-word array
        .text
        clr  %r2             ! r2 will hold sum
        set  n_m, %r3        ! r3 points to n
        ld   [%r3], %r3      ! r3 gets array size
        set  a_m, %r4        ! r4 points to array a
loop:   ld   [%r4], %r5      ! load element of a into r5
        add  %r5, %r2, %r2   ! sum = sum + element
        add  %r4, 4, %r4     ! incr ptr by word size
        subcc %r3, 1, %r3    ! decrement counter
        bg   loop            ! loop until count = 0
        nop                  ! branch delay slot
        set  sum_m, %r1      ! r1 points to sum
        st   %r2, [%r1]      ! store sum
        ta   0               ! done
Memory: n_m: 5 | a_m: 4 | a_m+4: 2 | a_m+8: 5 | a_m+12: 8 | a_m+16: 3 | sum_m
Trace (values at label loop on each pass):
    pass   r3 (count)   r4 (pointer)   r5 (element)   r2 after add
    1      5            a_m            4              4
    2      4            a_m+4          2              6
    3      3            a_m+8          5              11
    4      2            a_m+12         8              19
    5      1            a_m+16         3              22
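The same loop can be written in C with a pointer standing in for %r4. This is a sketch; the function name is hypothetical. Like the assembly, it uses a counted do-while, so it assumes the array has at least one element. C scales `p++` by the element size automatically, which is the `add %r4, 4, %r4` step.

```c
#include <assert.h>
#include <stdint.h>

int32_t sum_array(const int32_t *a, int32_t n) {
    assert(a != 0 && n >= 1);
    int32_t sum = 0;               /* clr %r2                           */
    const int32_t *p = a;          /* set a_m, %r4                      */
    do {
        sum += *p;                 /* ld [%r4], %r5 ; add %r5, %r2, %r2 */
        p++;                       /* add %r4, 4, %r4                   */
        n--;                       /* subcc %r3, 1, %r3                 */
    } while (n > 0);               /* bg loop                           */
    return sum;
}

const int32_t a_m[5] = {4, 2, 5, 8, 3};   /* the slide's data */
```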

Addressing Modes 6 C-style example of the pointer data type:
    char x;       // object of type character
    char *ptr;    // pointer to character type
    ptr = &x;     // ptr has address of x (points to x)
    *ptr = 'a';   // store 'a' at address in ptr
Assembly language equivalent:
        .data
x_m:    .byte 0              ! reserve character space; x_m = &x; [x_m] = x
        .align 4             ! align to word boundary
ptr_m:  .word 0              ! pointer variable; [ptr_m] = ptr
        .text
        set  x_m, %r1        ! get address x_m into %r1
        set  ptr_m, %r2      ! get address ptr_m into %r2
        st   %r1, [%r2]      ! ptr = &x: store x_m at ptr_m
        set  'a', %r3        ! put character 'a' into %r3
        set  ptr_m, %r2      ! get address ptr_m into %r2
        ld   [%r2], %r1      ! get [ptr_m], i.e., x_m, into %r1
        stb  %r3, [%r1]      ! *ptr = 'a': store 'a' at address [ptr_m], i.e., x_m
[Figure: afterwards, the word at ptr_m holds x_m (the address of x) and the byte at x_m holds 'a'; %r1 = x_m, %r2 = ptr_m, %r3 = 'a'.]

Addressing Modes 7 Register Indexed addressing
Suitable for accessing successive elements of the same type in a data structure.
Ex.: Swap elements A[i] and A[k] in an array. Effective address calculations!
[Figure: array elements at A, A+4, A+8, A+12; registers r2, r3, r4, r7, r8. An index is shifted left 2 bits (sll) to form a byte offset, e.g., 001 0010 becomes 100 1000.]
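The swap can be sketched in C at the byte level to mirror register indexed addressing. Each index is turned into a byte offset with a left shift by 2, as in the figure, and memory is accessed at base + offset. This is a sketch; `swap_words` and the example array `A` are hypothetical. `memcpy` stands in for the word-sized `ld` and `st`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

int32_t A[4] = {10, 20, 30, 40};   /* example data */

/* Swap arr[i] and arr[k] using byte offsets, as [%rA + %rOffset] would. */
void swap_words(int32_t *arr, uint32_t i, uint32_t k) {
    unsigned char *base = (unsigned char *)arr;   /* byte address A        */
    uint32_t oi = i << 2, ok = k << 2;            /* sll by 2: index * 4   */
    int32_t ai, ak;
    memcpy(&ai, base + oi, sizeof ai);            /* ld [A + oi]           */
    memcpy(&ak, base + ok, sizeof ak);            /* ld [A + ok]           */
    memcpy(base + oi, &ak, sizeof ak);            /* st ..., [A + oi]      */
    memcpy(base + ok, &ai, sizeof ai);            /* st ..., [A + ok]      */
}
```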

Addressing Modes 8 Simulating Register Indirect addressing on SPARC
SPARC doesn't truly have register indirect addressing. We can write st %r2, [%r1], but the assembler converts this automatically into st %r2, [%r1+%r0].
Array mapping functions are used by compilers to determine the addresses of array elements. A compiler must know the upper bound, the lower bound, and the size of the elements of the array:
    Total storage = (upper - lower + 1) * element_size
    Address (byte) offset for the element at index k = (k - lower) * element_size
E.g., with lower = 0 and 4-byte elements, the address (byte) offset for A[3] = (3 - 0) * 4 = 12.
This is for 1-dimensional arrays only! Up to now, all calculations assumed lower limit = 0.
Consider a 10-element array where each element is 1 byte wide (good for character strings), with lower index 1, so upper = 10: [1] = 'N', [2] = 'o', [3] = 'w', …
    total storage = (upper - lower + 1) * element_size = (10 - 1 + 1) * 1 = 10 bytes
    offset of 3rd char = (3 - 1) * 1 = 2
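The two mapping formulas translate directly into C. The function names in this sketch are hypothetical. The tests reproduce the slide's two examples: A[3] with 4-byte elements and lower bound 0, and the 1-based character array.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes for a 1D array with bounds lower..upper. */
size_t array_storage(long lower, long upper, size_t elem_size) {
    assert(upper >= lower);
    return (size_t)(upper - lower + 1) * elem_size;
}

/* Byte offset of the element at index k. */
size_t array_offset(long k, long lower, long upper, size_t elem_size) {
    assert(k >= lower && k <= upper);
    return (size_t)(k - lower) * elem_size;
}
```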

Addressing Modes 9 1D array mapping functions: we want an array of n elements, each 4 bytes in size, starting at address arr. Total storage is 4n bytes. The first element is at arr+0, and the last element is at arr+4(n-1). The kth element (k ranging over 0…n-1) is at arr+4k. The array uses zero-based indexing.

Addressing Modes 10 2D array mapping functions: we must linearize the 2D concept, i.e., map the 2D structure into 1D memory.
[Figure: a 2D array converted into a 1D array in memory]
A good motivation for 2-dimensional arrays is an array of picture elements (pixels).

Addressing Modes 11 2 ways to convert to 1D
Row major order (Pascal, C, Modula-2) stores first by rows, then by columns.
[Figure: row major layout]
Column major order (FORTRAN) stores first by columns, then by rows.
[Figure: column major layout]
Row major 2D array mapping function: take an array starting at address arr that is x rows by y columns, where each element is m bytes in size and indices start at zero. Element (i, j) may be found at location
    $arr + (y \times i + j) \times m$
That is, effective address = start + (ncolumns * which row + which col) * element size.
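Here is a C sketch of the row major formula, alongside the corresponding column major one, arr + (x × j + i) × m, for comparison. The function names and the example array `grid2d` are hypothetical. The tests use the 2D layout from the "CALCULATE" worksheet (start 8, 2-byte elements, 5 columns). They also check that C's own arrays follow the row major formula.

```c
#include <assert.h>

/* x rows by y columns, m-byte elements, zero-based (i, j). */
unsigned long row_major(unsigned long arr, unsigned x, unsigned y, unsigned m,
                        unsigned i, unsigned j) {
    assert(i < x && j < y);
    return arr + ((unsigned long)y * i + j) * m;
}

unsigned long col_major(unsigned long arr, unsigned x, unsigned y, unsigned m,
                        unsigned i, unsigned j) {
    assert(i < x && j < y);
    return arr + ((unsigned long)x * j + i) * m;
}

int grid2d[3][5];   /* example: C lays this out in row major order */
```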

Addressing Modes 12 3D array mapping function: a natural extension of the 2D function. Store by row, then column, then depth. For an array starting at arr with x rows, y columns, depth z, and element size m, element (i, j, k) is found at location
    $arr + (z \times (y \times i + j) + k) \times m$
[Figure: layout of an array with y = 5 columns, depth z = 2, and 1-byte elements, with offsets from arr; element (1,0,0) is at +10.]
This slide is changed to be consistent with the row major layout strategy for multidimensional arrays used in C and Pascal, as discussed on p. 180 of Maccabe. The layout of indices would be as follows:
    <(0,0,0), (0,0,1), (0,1,0), (0,1,1), (0,2,0), (0,2,1), (0,3,0), (0,3,1), (0,4,0), (0,4,1), (1,0,0), …>
Note especially the change to the array mapping function. The picture has not changed. Ergo, to find arr(1,2,0):
    arr + (2 × (5 × 1 + 2) + 0) × m = arr + 14m, i.e., offset 14 when m = 1.
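The 3D function in C, checked against the slide's example, follows below. `addr3d` and the example array `cube` are hypothetical names. Element (1,2,0) lands at offset 14 and (1,0,0) at offset 10. A C array declared `char cube[2][5][2]` uses the same row major layout.

```c
#include <assert.h>

/* x rows, y columns, depth z, m-byte elements, zero-based (i, j, k). */
unsigned long addr3d(unsigned long arr, unsigned x, unsigned y, unsigned z,
                     unsigned m, unsigned i, unsigned j, unsigned k) {
    assert(i < x && j < y && k < z);
    return arr + ((unsigned long)z * ((unsigned long)y * i + j) + k) * m;
}

char cube[2][5][2];   /* example array with 1-byte elements */
```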

Addressing Modes 13 CALCULATE: total storage, offset for A(i,j,k), address for A(i,j,k)
                           1D     2D     3D
element size (#bytes)      4      2      1
# rows (x)
# cols (y)
# depth (z)
starting addr
i= j= k=
total storage
offset for A(i,j,k)
address for A(i,j,k)
Memory layouts (address: array element):
    1D: 4: A(0), 8: A(1), 12: A(2), 16: …
    2D: 8: A(0,0), 10: (0,1), 12: (0,2), 14: (0,3), 16: (0,4), 18: (1,0), 20: (1,1), 22: (1,2), 24: (1,3), 26: (1,4), 28: (2,0), 30: (2,1)
    3D: 12: A(0,0,0), 13: (0,0,1), 14: (0,1,0), 15: (0,1,1), 16: (0,2,0), 17 … 28

Addressing Modes 14
! Example that adds 1 to every element of columns 1 and 2, not 0, of a 5 by 3 array
        .data
        .set rows, 5         ! define symbolic constants
        .set cols, 3
arr_m:  .skip rows * cols * 4    ! allocate space (.skip 60 is the same)
        .text
        ...
        set  arr_m, %r3      ! get address of array
        clr  %r1             ! %r1 is i (row)
loop1:  cmp  %r1, rows       ! done if i >= rows
        bge  done
        nop
        set  1, %r2          ! %r2 is j (col); start at one (skip col zero)
loop2:  cmp  %r2, cols       ! if j >= cols, done with row
        bge  inc1
        nop
        umul %r1, cols, %r4  ! # elements to skip for current row
        add  %r4, %r2, %r4   ! then which column being accessed
        umul %r4, 4, %r4     ! change from element to byte offset
        ld   [%r3+%r4], %r5  ! get arr[i][j]
        add  %r5, 1, %r5     ! add 1 to the element value
        st   %r5, [%r3+%r4]  ! store it back to arr[i][j]
inc2:   add  %r2, 1, %r2     ! next column
        ba   loop2           ! continue inner loop over columns
        nop
inc1:   inc  %r1             ! next row
        ba   loop1           ! continue outer loop over rows
        nop
done:   ...
Trace:
    r1   r2   r4                         r5
    0    1    (0*3 + 1) * 4 = 4          [arr_m+4] + 1
    0    2    8                          [arr_m+8] + 1
    0    3    (bge inc1: row done)
    1    1    16                         [arr_m+16] + 1
    1    2    20                         [arr_m+20] + 1
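A C sketch of the same nested loop uses the same byte-offset arithmetic, ((i × cols) + j) × 4. The function name and the example array `grid` are hypothetical, and `int32_t` stands in for a SPARC word. The tests check that column 0 is untouched while columns 1 and 2 are incremented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ROWS = 5, COLS = 3 };            /* .set rows, 5 and .set cols, 3 */

void inc_cols(int32_t *arr) {
    assert(arr != NULL);
    unsigned char *base = (unsigned char *)arr;     /* %r3 = arr_m          */
    for (size_t i = 0; i < ROWS; i++)               /* loop1 over rows      */
        for (size_t j = 1; j < COLS; j++) {         /* loop2, skip col 0    */
            size_t off = (i * COLS + j) * 4;        /* umul, add, umul by 4 */
            int32_t *elem = (int32_t *)(base + off);
            *elem += 1;                             /* ld, add 1, st        */
        }
}

int32_t grid[ROWS * COLS];   /* example array, initially all zero */
```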

Addressing Modes 15 Displacement Addressing
Suitable for accessing the individual fields of record data structures, where each field can be of a different type. Use the .set directive to establish offsets to fields within records. Then use displacement addressing to access those fields. E.g., to add 1 to the age field of a person record:
    r1 <- person
    r2 <- [r1 + age]        ! = 26
    r2 <- r2 + 1            ! = 27
    [r1 + age] <- r2
[Figure: the person record in memory; the word at person+age holds 26, followed by the dob field at person+dob.]

Addressing Modes 16
Ex.: Add 1 to the age field in a person record.
[Figure: person record layout: N (name) from +0 through +10, -- (filler), A (age) at +14, D (dob) at +18.]
Problem: alignment in memory. We may have to waste some space in the person record in order to have the integer fields align on a word boundary:
    .set filler, 0x12?
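Displacement addressing can be simulated in C by treating the record as raw bytes. The field offsets are symbolic constants, playing the role of .set. This is a sketch; the offsets AGE = 0x14 and DOB = 0x18 follow the layout figure, while the names and the 0x1C record size are assumptions. Words are stored most significant byte first, as on SPARC, and each access asserts the word alignment that `ld`/`st` require.

```c
#include <assert.h>
#include <stdint.h>

enum { NAME = 0x00, AGE = 0x14, DOB = 0x18, PERSON_SIZE = 0x1C };

/* Big-endian word load/store at base + disp (displacement addressing). */
uint32_t load_word(const uint8_t *base, unsigned disp) {
    assert(disp % 4 == 0);                     /* word alignment */
    const uint8_t *p = base + disp;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

void store_word(uint8_t *base, unsigned disp, uint32_t w) {
    assert(disp % 4 == 0);
    uint8_t *p = base + disp;
    p[0] = (uint8_t)(w >> 24); p[1] = (uint8_t)(w >> 16);
    p[2] = (uint8_t)(w >> 8);  p[3] = (uint8_t)w;
}

/* ld [%r1+AGE], %r2 ; add %r2, 1, %r2 ; st %r2, [%r1+AGE] */
void birthday(uint8_t *rec) {
    store_word(rec, AGE, load_word(rec, AGE) + 1);
}

uint8_t person[PERSON_SIZE] = { [AGE + 3] = 26 };   /* example record: age 26 */
```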

Addressing Modes 17 Auto-increment and Auto-decrement addressing
SPARC does not support these modes. They may be simulated using register indirect addressing followed by adding or subtracting the size of the element to or from that register. They are useful for traversing arrays forward (auto-increment) and backward (auto-decrement), and also for stacks and queues of data elements. Now would be a great time for "Data Structures".

Subroutines 1 Subroutines and subroutine linkage
Subroutines: a programming mechanism to facilitate repeated computations and modularization.
Use of subroutines:
    Basis for structured and disciplined programming
    Compact code (no need to write monolithic loops)
    Relatively easy to debug (no cut-and-paste errors)
    Requires little hardware support, mostly protocols and conventions to handle parameters
An alternative requiring no hardware support is in-line subroutine translation: in-lining.

Subroutines 2 Terminology
    Caller: the code (which could itself be a subroutine) that invokes the subroutine of interest
    Callee: the subroutine being invoked by the caller
    Function: a subroutine that returns one or more values to the caller, exactly one of which is distinguished as the return value
    Return value: the distinguished value returned by a function

Subroutines 3 Terminology (continued)
    Procedure: a subroutine that may return values to the caller (through the subroutine's parameters), none of which is distinguished as the return value
    Return address: address of the subroutine call instruction
    Parameters: information passed to/from a subroutine (a.k.a. arguments)
    Subroutine linkage: a protocol for passing parameters between the caller and the callee
We used to say that the return address was the address of the instruction immediately following the subroutine call instruction. On SPARC, however, it is the address of the call instruction itself; at least, that's the value placed in register %r15. We can explain that, of course, the program will return to an instruction following the call instruction, not to the call instruction itself.

Subroutines 4 Subroutine linkage: calling a subroutine
Assembly language syntax for calling a subroutine:
        call label
        nop
We must change the program counter (as in a branch instruction); however, we must also keep track of where to resume execution after the subroutine finishes. The call instruction handles this atomically (i.e., without interruption) by:
    %r15 <- PC
    (PC <- nPC)
    nPC <- label
Returning from a subroutine: assembly language syntax:
        retl
        nop
It is the nPC, not the PC, that is changed. Why? Because the call instruction has a branch delay slot. Part of the execution of each instruction is PC <- nPC.

Subroutines 5 Returning from a subroutine (continued)
Again, we must change the program counter, this time to return to an instruction after the one that called the subroutine. The address of the calling instruction was saved in %r15, and we must skip over the call's branch delay slot as well. So retl (a synthetic instruction for jmpl %r15+8, %g0) accomplishes:
    nPC <- %r15 + 8
Parameter passing: 2 approaches
    Register-based linkage: pass parameters solely through registers. This has the advantage of speed, but it can pass only a few parameters, and it won't support nested subroutine calls. Such a subroutine is called a leaf subroutine.
    Stack-based linkage: pass parameters through the run-time stack. This is not as fast, but it can pass more parameters and supports nested subroutine calls (including recursion).
It is the nPC, not the PC, that is changed. Why? Because the return from a subroutine has its own branch delay slot. Part of the execution of each instruction is PC <- nPC.
Another name for register-based linkage is optimized leaf procedure linkage.
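To see why call saves PC and retl jumps to %r15 + 8, it can help to simulate the PC/nPC pair. The C sketch below uses a made-up five-opcode instruction set at made-up addresses; none of it is real SPARC encoding. Each step executes the instruction at PC, then sets PC <- nPC, and advances nPC by 4 unless the instruction changed it. The trace shows the call's delay slot executing before the subroutine body, and execution resuming at the call's address + 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A hypothetical toy program, not real SPARC encoding. */
enum op { NOP, CALL, RETL, WORK, HALT };
struct insn { uint32_t addr; enum op op; uint32_t target; };

static const struct insn prog[] = {
    {0x00, CALL, 0x10},   /* call sub        */
    {0x04, NOP,  0},      /* delay slot      */
    {0x08, HALT, 0},      /* ta 0            */
    {0x10, WORK, 0},      /* sub: body       */
    {0x14, RETL, 0},      /* retl            */
    {0x18, NOP,  0},      /* delay slot      */
};

static const struct insn *fetch(uint32_t addr) {
    for (size_t i = 0; i < sizeof prog / sizeof prog[0]; i++)
        if (prog[i].addr == addr) return &prog[i];
    return NULL;
}

uint32_t trace[16];   /* addresses executed, in order */

/* Runs from address 0; returns how many instructions executed, or -1. */
int run(void) {
    uint32_t pc = 0, npc = 4, r15 = 0;
    int n = 0;
    while (n < 16) {
        const struct insn *in = fetch(pc);
        if (in == NULL) return -1;
        trace[n++] = pc;
        if (in->op == HALT) break;
        uint32_t next_npc = npc + 4;                 /* sequential by default     */
        if (in->op == CALL) { r15 = pc; next_npc = in->target; }  /* %r15 <- PC  */
        if (in->op == RETL) next_npc = r15 + 8;      /* nPC <- %r15 + 8           */
        pc = npc;                                    /* PC <- nPC                 */
        npc = next_npc;
    }
    return n;
}
```

The executed addresses are 0x00, 0x04, 0x10, 0x14, 0x18, 0x08: the call, its delay slot, the body, retl, retl's delay slot, and finally the instruction after the caller's delay slot.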

Register-based Linkage 1
Subroutine linkage:
    Startup sequence: load parameters and the return address into registers, and branch to the subroutine.
    Prologue: if this is a non-leaf procedure, save the return address to memory; save registers used by the callee.
    Epilogue: place return parameters into registers, restore registers saved in the prologue, restore the saved return address, and return.
    Cleanup sequence: work with the returned values.
This slide presents an overview of subroutine linkage, applicable to both stack-based and register-based linkage. The following notes pertain to register-based linkage:
    Prologue: but where in memory? Register-based linkage does not establish a protocol for doing this! So a prologue is usually empty. An exception to this emptiness is when the author of a main program establishes a protocol for the program's subroutines.
    Epilogue: register-based linkage does not establish a protocol for restoring registers or a return address! So an epilogue often merely places return parameters into registers.

Example: Print subroutine.
        .text
main:   set  1, %r1          ! initialize r1 and r2
        set  3, %r2
        mov  %r1, %r8        ! print %r1
        call print
        nop
        mov  %r2, %r8        ! print %r2
        call print
        nop
        add  %r1, %r2, %r8   ! do our calculation
        call print           ! print the result (expect '4')
        nop
        ta   0
print:  set  '0', %r1        ! ASCII value of zero
        or   %r8, %r1, %r2   ! treat r8 as parameter
        mov  %r2, %r8        ! move into output register
        ta   1               ! output character
        mov  '\n', %r8
        ta   1               ! output end of line (newline)
        retl                 ! return
        nop
What's wrong with the above code? %r1 and %r2 are changed by the subroutine. See the next slide for a solution to this problem. In response, main could avoid %r1 (using, say, %r3 instead), and print could avoid %r2 (using only %r1, %r8, and, possibly, %r9). Unfortunately for this example, the three offending lines of print could simply be replaced by "or %r8, '0', %r8".

Which registers can leaf subroutines change?
[Table: registers an optimized leaf procedure may change]
Convention for optimized leaf procedures: the subroutine must not use the value in any other register except to save it to memory somewhere and restore it before returning to the caller.
Problem: how can a subroutine call another subroutine? How can a subroutine call itself?
%r2-%r7 are the caller's registers; %r8-%r13 are the callee's registers. You might want to go back to the previous slide with an overhead transparency marker and fix that program according to these ideas.

Example: procedure to print a linked list of ints.
    Node* head;        // head = %r8 = [head_m]
The terminology "head" is confusing (to me). The label should be "head_m" for "head memory location," or something.
A heap needs to be declared in the .data section:
heap:   .skip 4000
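Since the assembly for this procedure is not reproduced here, a C sketch of the same traversal may help fix the idea. The head pointer is passed by value copy (as %r8 would be), and the loop follows `next` links until it reaches a null pointer. The node layout (value word first, then the next pointer), the function name, and the example nodes are all hypothetical. Output goes into a buffer instead of through `ta`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical node layout: value first, then the link to the next node. */
struct node { int value; struct node *next; };

/* Writes each value on its own line into out, following next links. */
void print_list(const struct node *head, char *out, size_t len) {
    size_t used = 0;
    assert(out != NULL && len > 0);
    out[0] = '\0';
    for (const struct node *p = head; p != NULL; p = p->next) {
        int w = snprintf(out + used, len - used, "%d\n", p->value);
        if (w < 0 || (size_t)w >= len - used) break;   /* out of space */
        used += (size_t)w;
    }
}

struct node n3 = {3, NULL};
struct node n2 = {2, &n3};
struct node n1 = {1, &n2};
char out_buf[64];
```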

Parameter Passing 1 Review of parameter passing mechanisms:
    Pass by value copy: parameters to the subroutine are copies upon which the subroutine acts.
    Pass by result copy: parameters are copies of results produced by the subroutine.
    Pass by reference copy: parameters to the subroutine are (copies of) addresses of values upon which the subroutine acts. The callee is responsible for saving each result to memory at the location referred to by the appropriate parameter.
    Hybrid: some parameters are passed by value copy, some by result copy, and/or some by reference copy. The callee is responsible for saving results for reference parameters.

Parameter Passing 2 Parameter passing notes:
Array or record parameters typically are passed by reference copy (for efficiency). Primitive data types may be passed either way.
Shared conventions among languages allow code in one language to call functions written in another:
    Pascal: VAR parameters are passed by reference copy; all others are passed by value copy.
    C: all parameters are passed by value copy. You must explicitly pass a pointer if you want a reference parameter.
    C++: like Pascal, can pass by value or reference copy.
    FORTRAN: everything is passed by reference copy (even constants).
    Ada: pass by value/result copy.

Parameter Passing 3
        .text                ! Example 10.1 of Lab Manual
! pr_str - print a null-terminated string
! Parameters:  %r8 - pointer to string (initially)
! Temporaries: %r8 - the character to be printed
!              %r9 - pointer to string
pr_str: mov  %r8, %r9        ! we need %r8 for the "ta 1" below
pr_lp:  ldub [%r9], %r8      ! load character
        cmp  %r8, 0          ! check for null
        be   pr_dn
        nop
        ta   1               ! print character
        ba   pr_lp
        inc  %r9             ! increment the pointer (in branch delay slot)
pr_dn:  retl
        nop
What kind of parameter is %r8? Well, the pointer is passed by value copy, and the string is passed by reference copy!

Parameter Passing 4 Summary from text (p. 220)
Pass by value copy: For small “in” parameters. Subroutines cannot alter the originals whose copies are passed as parameters. Pass by value/result copy: For small “in/out” parameters. Caller’s cleanup sequence stores values of any “in/out” parameters. Pass by reference copy: for “in/out” parameters of all sizes, and large “in” parameters. “Out” values are provided by changing memory at those addresses. (Note: pass by reference copy is passing an address by value copy.) CSE360

Parameter Passing 5 Write Sparc code for the caller and callee of the following subroutine, using register-based parameter passing:
! global_function Integer subchr (A, B, C)
! Substitutes character C for each B in string [A],
! and returns count of changes.
!
! // In comments, "[A+index]" is denoted by "ch".
! index = 0
! count = 0
! LOOP: if [A+index]=0 go to END    // while (ch != 0) {
!       if [A+index]!=B go to INC   //   if (ch == B) {
!       [A+index]=C                 //     ch = C;
!       count=count+1               //     count++; }
! INC:  index=index+1               //   index++;
!       go to LOOP                  // }
! END:
“[A+index]” could also be spelled “*(A+index)”, “m[A+index]”, “M[A+index]”, “mem[A+index]”, or “Mem[A+index]”.
        .data                       ! data section
C_m:    .byte  'I'                  ! parameter C
B_m:    .byte  'i'                  ! parameter B
A_m:    .asciz "i will tip"         ! parameter A
        .align 4
R_m:    .word  0                    ! for storing result count
        .text
Main:   set    A_m, %r8             ! A: address of the string
        set    B_m, %r1
        ldub   [%r1], %r9           ! B: character value
        set    C_m, %r1
        ldub   [%r1], %r10          ! C: character value
        call   subchr
        nop                         ! branch delay slot
        set    R_m, %r1
        st     %r8, [%r1]           ! store the returned count
        ta     0
subchr: ! subchr (A, B, C)
        ! Substitutes character C for all B in string [A],
        ! and returns count of changes.
        ! // In comments, “[A+index]” is denoted by “ch”.
        clr    %r11                 ! index = 0
        clr    %r12                 ! count = 0
loop:   ldub   [%r8+%r11], %r13     ! get [A+index]
        cmp    %r13, %r0            ! while (ch != 0) {
        be     end
        cmp    %r13, %r9            !   if (ch == B) {   (branch delay slot)
        bne    inc
        nop                         !   branch delay slot: the store must not run when ch != B
        stb    %r10, [%r8+%r11]     !     ch = C;
        inc    %r12                 !     count++; }
inc:    ba     loop                 ! }
        inc    %r11                 ! index++ (branch delay slot)
end:    mov    %r12, %r8            ! return count in %r8
        retl
        nop                         ! branch delay slot
CSE360
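For reference, here is a C sketch of subchr that follows the pseudocode line by line. It illustrates the logic and is not the lab's required solution. The comments map each statement to the pseudocode.

```c
/* subchr(A, B, C): replace every B in the null-terminated string a with c
   and return the number of replacements made. */
int subchr(char *a, char b, char c) {
    int index = 0;
    int count = 0;
    while (a[index] != '\0') {      /* LOOP: if [A+index]=0 go to END */
        if (a[index] == b) {        /* if [A+index]!=B go to INC      */
            a[index] = c;           /* [A+index]=C                    */
            count++;                /* count=count+1                  */
        }
        index++;                    /* INC: index=index+1             */
    }
    return count;                   /* END: count returned (in %r8)   */
}
```

On "i will tip" with B = 'i' and C = 'I', the function returns 3 and leaves "I wIll tIp".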

Stack-based Linkage 1 Stack-based linkage
Advantages: Permits subroutines to call others. Allows a larger number of parameters to be passed. Permits records and arrays to be passed by value copy. Saving of registers by the callee is “built-in”. A way for the callee to reserve memory for other uses is “built-in”, too. Disadvantages: Slower than register-based linkage. More complex protocol. Why a stack? Subroutine calls and returns happen in last-in, first-out (LIFO) order. Also known as a runtime stack, parameter stack, or subroutine stack. CSE360

Stack-based Linkage 2 Items “saved” on the stack in one activation record: parameters to the subroutine; old values of registers used in the subroutine; local memory variables used in the subroutine; return value and return address. Say A() calls B(), B() calls C(), and C() calls A(). Each call pushes its own activation record, so once C() calls A(), the stack holds four records: A, B, C, and the new A. CSE360

Stack-based Linkage 3 Stack-based linkage parameter passing convention
Startup sequence: push parameters; push space for return value. Prologue: push registers that are changed (including return address); allocate space for local variables. Epilogue: restore general purpose registers; free local variable space; use return address to return. Cleanup sequence: pop and save returned values; pop parameters. “Push”: changing the stack pointer and copying into memory in the stack. “Pop”: copying from the stack and changing the stack pointer. The body is responsible for storing any return value and "out" parameters into memory, perhaps on the stack. The return value would be stored on the stack, of course. Any value/result parameters would be stored on the stack. "Out" parameters passed by reference would be stored in memory that is not part of the stack. CSE360

Stack-based Linkage 4 Stack based parameter passing example:
Register %r14 ≡ %sp ≡ stack pointer. Invariant: always indicates the top of the stack (it holds the memory address of the last item on the stack, usually a word). Moved when items are “pushed” onto the stack. Due to interruptions (system interrupts (I/O) and exceptions), values stored above %sp (at addresses less than %sp) can change at any time! Hence, any access above %sp is unsafe! Register %r30 ≡ %fp ≡ frame pointer. Indicates the previous stack pointer. The activation record extends from (some subroutine-specific number of words before) %fp to %sp. Invariant: %fp is constant within a subroutine (after the prologue). The assembly language programmer creates their own stack in memory and manages it. The names “%sp” and “%fp” are part of the assembler’s support for stack-based linkage. CSE360

Stack-based Linkage 5 Stack based parameter passing example:
Want to implement the following subroutine (also a caller):
! global_function Integer subchr (A, B, C)
! Substitutes character C for all B in string A,
! and returns count of changes.
!
! // In comments, "*(A+index)" is denoted by "ch".
! index = 0
! count = 0
! LOOP: if *(A+index)=0 go to END    // while (ch != 0) {
!       if *(A+index)!=B go to INC   //   if (ch == B) {
!       *(A+index)=C                 //     ch = C;
!       count=count+1                //     count++; }
! INC:  index=index+1                //   index++;
!       go to LOOP                   // }
! END:
        .data                        ! data section
C_m:    .byte  'I'                   ! parameter C
B_m:    .byte  'i'                   ! parameter B
A_m:    .asciz "i will tip"          ! parameter A
        .align 4
R_m:    .word  0                     ! for storing result count
CSE360

Stack-based Linkage 6
        .data                      ! data section
C_m:    .word  'I'                 ! parameter C
B_m:    .word  'i'                 ! parameter B
A_m:    .asciz "i will tip"        ! parameter A
        .align 4                   ! align to word address
stack:  .skip  250*4               ! allocate 250-word stack
bstak:                             ! point to bottom of stack
R_m:    .word  0                   ! reserve for count
        .text
        ! Program's one-time initialization
start:  set    bstak, %sp          ! set initial stack ptr
        mov    %sp, %fp            ! set initial frame ptr
        ! STARTUP SEQUENCE to call subchr()
        sub    %sp, 16, %sp        ! move stack ptr
        set    A_m, %r1            ! A is passed by reference
        st     %r1, [%sp+4]        ! push address on stack
        set    B_m, %r1            ! B is passed by value
        ld     [%r1], %r1          ! get value of B
        st     %r1, [%sp+8]        ! push parameter B on stack
        set    C_m, %r1            ! C is passed by value
        ld     [%r1], %r1          ! get value of C
        st     %r1, [%sp+12]       ! push parameter C on stack
        ! SUBROUTINE CALL
        call   subchr              ! make subroutine call
        nop                        ! branch delay slot
        ! CLEANUP SEQUENCE
        ld     [%sp], %r1          ! pop return value off stack
        add    %sp, 16, %sp        ! pop stack
        set    R_m, %r2            ! get address of R
        st     %r1, [%r2]          ! store R
        ! the rest of the program
(Figure: stack diagram showing the return-value slot at %sp, followed by addr(A), B, and C.)
So, where did 16 come from? 3 parameters, each 4 bytes, plus space for the return value (4 bytes). Note: the arguments are pushed so that the first argument is nearest the “top” of the stack. This allows argument lists with varying numbers of arguments. CSE360
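To check the offsets, here is a small C model of the convention. It rests on three stated assumptions. First, the stack is a hypothetical array of 32-bit words that grows toward lower indices. Second, `sp` and `fp` are word indices, so `[%sp+4]` becomes `stack_mem[sp + 1]`. Third, the callee `sum3` is a made-up three-parameter routine whose prologue saves only the old frame pointer, with no general registers.

```c
#include <stdint.h>

/* Hypothetical model: downward-growing stack of 32-bit words.
   sp and fp are word indices; no overflow checks. */
#define STACK_WORDS 250
static int32_t stack_mem[STACK_WORDS];
static int sp = STACK_WORDS;          /* set bstak, %sp */
static int fp = STACK_WORDS;          /* mov %sp, %fp   */

/* Callee: returns the sum of its three parameters via [%fp+0]. */
static void sum3(void) {
    /* PROLOGUE: open one word and save the old frame pointer in it. */
    sp -= 1;                          /* sub %sp, 4, %sp                 */
    stack_mem[sp] = fp;               /* st %fp, [%sp+0], i.e. [%fp-4]   */
    fp = sp + 1;                      /* add %sp, 4, %fp: old sp = new fp */
    /* BODY: return slot at fp+0, parameters at fp+1 .. fp+3. */
    stack_mem[fp] = stack_mem[fp + 1] + stack_mem[fp + 2] + stack_mem[fp + 3];
    /* EPILOGUE: restore the old frame pointer, then free the word. */
    fp = stack_mem[fp - 1];           /* ld [%fp-4], %fp  */
    sp += 1;                          /* add %sp, 4, %sp  */
}

/* Caller: startup sequence, call, cleanup sequence. */
int32_t call_sum3(int32_t a, int32_t b, int32_t c) {
    sp -= 4;                          /* sub %sp, 16, %sp: 3 params + return slot */
    stack_mem[sp + 1] = a;            /* st ..., [%sp+4]  */
    stack_mem[sp + 2] = b;            /* st ..., [%sp+8]  */
    stack_mem[sp + 3] = c;            /* st ..., [%sp+12] */
    sum3();                           /* call sum3        */
    int32_t result = stack_mem[sp];   /* ld [%sp], ...: pop return value */
    sp += 4;                          /* add %sp, 16, %sp */
    return result;
}
```

After each call, `sp` and `fp` are back at their initial values. This is the balance that the startup/cleanup and prologue/epilogue pairs are meant to guarantee.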

Stack-based Linkage 7
        ! SUBROUTINE PROLOGUE
subchr: sub    %sp, 32, %sp        ! open 8 words on stack
        st     %fp, [%sp+28]       ! save old frame pointer
        add    %sp, 32, %fp        ! old sp is new fp
        st     %r15, [%fp-8]       ! save return address
        st     %r8, [%fp-12]       ! save gen. register
        …                          ! save r9-r13, omitted
        ! SUBROUTINE BODY
ld_reg: ld     [%fp+4], %r8        ! “pop” (load) addr of A
        ld     [%fp+8], %r9        ! “pop” (load) value of B
        ld     [%fp+12], %r10      ! “pop” (load) value of C
        clr    %r12                ! count
        clr    %r13                ! index
loop:   ldub   [%r8+%r13], %r11    ! load a string chr
        cmp    %r11, 0x0           ! is chr=null?
        be     done                ! then go to done
        cmp    %r11, %r9           ! is chr<>B? (branch delay slot)
        bne    inc                 ! then go to inc
        nop                        ! branch delay slot
        stb    %r10, [%r8+%r13]    ! change chr to C
        add    %r12, 1, %r12       ! increment count
inc:    add    %r13, 1, %r13       ! increment index
        ba     loop                ! do next chr
        nop                        ! branch delay slot
done:   st     %r12, [%fp+0]       ! “push” (store) count on stack
        ! EPILOGUE
        …                          ! restore r9-r13, omitted
        ld     [%fp-12], %r8       ! restore r8
        ld     [%fp-8], %r15       ! get saved return address
        ld     [%fp-4], %fp        ! get old value of frame ptr
        add    %sp, 32, %sp        ! restore stack pointer
        retl                       ! return to caller
        nop                        ! branch delay slot
(Figure: the activation record: return value, addr(A), B, and C at %fp+0 through %fp+12; old frame ptr, return addr, %r8, %r9, ... at %fp-4, %fp-8, %fp-12, ...; %sp = %fp-32.)
Discuss: what if the subroutine calls another routine? Or itself? The caller must (even if it is a subroutine itself): push params; push space for return values; call the subroutine; then pop (and save) return values; pop parameters. The callee must: allocate space for (push) local variables; push locally used registers; do its stuff; pop and restore locally used registers; pop and restore the return address; free (pop) local variable space; return. Try implementing TRAV (the subroutine that prints a list of ints in a linked list, from register-based linkage) caller and callee as a stack-based linkage routine:
        .data
stack:  .skip  250*4
bstak:  .word  0
        .set   dta, 0
        .set   ptr, 4
head:   .word  0
        .text
        set    bstak, %sp
        mov    %sp, %fp
        …                          ! some bunch of instructions to allocate list
startup: sub   %sp, 4, %sp
        set    head, %r1
        ld     [%r1], %r9
        st     %r9, [%sp+0]        ! push pointer to first node
        call   trav
        nop                        ! branch delay slot
cleanup: add   %sp, 4, %sp
        ta     0
trav:
prolog: sub    %sp, 32, %sp
        st     %fp, [%sp+28]       ! old fp goes to what will be [%fp-4]
        add    %sp, 32, %fp        ! old sp is new fp
        st     %r15, [%fp-8]
        st     %r8, [%fp-12]
        st     %r9, [%fp-16]
        st     %r10, [%fp-20]
        st     %r11, [%fp-24]
        st     %r12, [%fp-28]
        st     %r13, [%fp-32]
body:   ld     [%fp+0], %r9        ! parameter: pointer to first node
loop:   cmp    %r9, 0
        be     epilog
        nop                        ! branch delay slot (never load through a null pointer)
        ld     [%r9+dta], %r8
        ta     4                   ! print the int in %r8
        ld     [%r9+ptr], %r9
        ba     loop
        nop                        ! branch delay slot
epilog: ld     [%fp-32], %r13
        ld     [%fp-28], %r12
        ld     [%fp-24], %r11
        ld     [%fp-20], %r10
        ld     [%fp-16], %r9
        ld     [%fp-12], %r8
        ld     [%fp-8], %r15
        ld     [%fp-4], %fp
        add    %sp, 32, %sp
        retl
        nop                        ! branch delay slot
CSE360

Stack-based Linkage 8 General Guidelines
Keep startups, cleanups, prologues, and epilogues standard (but not necessarily identical); then they are easy to cut, paste, and modify. Caller: leave space for the return value on the TOP of the stack. Callee: always save and restore locally used registers. Pass data structures and arrays by reference, all others by value (efficiency). Now would be a great time for “Stack-Based Subroutine Exercise” and “Recursive Subroutine Trace”. Try implementing TRAV (the subroutine that prints a list of ints in a linked list, from register-based linkage) as a recursive stack-based linkage routine:
trav:
prolog: sub    %sp, 32, %sp
        st     %fp, [%sp+28]       ! old fp goes to what will be [%fp-4]
        add    %sp, 32, %fp        ! old sp is new fp
        st     %r15, [%fp-8]
        st     %r8, [%fp-12]
        st     %r9, [%fp-16]
        st     %r10, [%fp-20]
        st     %r11, [%fp-24]
        st     %r12, [%fp-28]
        st     %r13, [%fp-32]
body:   ld     [%fp+0], %r9        ! parameter: pointer to current node
        cmp    %r9, 0
        be     epilog
        nop                        ! branch delay slot
        ld     [%r9+dta], %r8
        ta     4                   ! print the int in %r8
        ld     [%r9+ptr], %r9
startup: sub   %sp, 4, %sp
        st     %r9, [%sp+0]        ! push pointer to next node
        call   trav
        nop                        ! branch delay slot (must not pop before the call)
cleanup: add   %sp, 4, %sp
epilog: ld     [%fp-32], %r13
        ld     [%fp-28], %r12
        ld     [%fp-24], %r11
        ld     [%fp-20], %r10
        ld     [%fp-16], %r9
        ld     [%fp-12], %r8
        ld     [%fp-8], %r15
        ld     [%fp-4], %fp
        add    %sp, 32, %sp
        retl
        nop                        ! branch delay slot
CSE360
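Here is a C sketch of the recursive version, using an illustrative `Node` layout (an int followed by a next pointer; the names are hypothetical). It also estimates how much of the 250-word (1000-byte) stack the assembly version would use. The estimate assumes each activation costs the 4-byte parameter pushed by the startup plus the 32-byte frame opened by the prologue. It ignores the top-level caller's own use of the stack.

```c
#include <stddef.h>
#include <stdio.h>

/* Illustrative node: an int (dta) followed by a next pointer (ptr). */
typedef struct Node {
    int dta;
    struct Node *ptr;
} Node;

/* One activation per node: "print" this node, then call trav on the rest.
   used is the number of characters already in buf (always < size). */
static size_t trav_rec(const Node *p, char *buf, size_t size, size_t used) {
    if (p == NULL) return used;                                /* be epilog */
    int n = snprintf(buf + used, size - used, "%d ", p->dta);  /* ta 4 */
    if (n < 0 || (size_t)n >= size - used) return used;        /* buffer full */
    return trav_rec(p->ptr, buf, size, used + (size_t)n);      /* call trav */
}

void trav(const Node *head, char *buf, size_t size) {
    if (size == 0) return;
    buf[0] = '\0';
    trav_rec(head, buf, size, 0);
}

/* Stack bytes the recursive assembly version needs for an n-node list:
   n + 1 activations (the last one sees the null pointer), each a 4-byte
   pushed parameter plus a 32-byte frame. */
size_t trav_stack_bytes(size_t n) {
    return (n + 1) * (4 + 32);
}
```

Under these assumptions, `trav_stack_bytes(26)` is 972 bytes and `trav_stack_bytes(27)` is 1008. The 250-word stack therefore handles lists of at most 26 nodes before the recursion overruns it.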

Our Fourth Example Architecture
Motorola M68HC11 Called “HC11” for short Used in ECE 567, a course required of CSE majors References: Data Acquisition and Process Control with the M68HC11 Microcontroller, 2nd Ed., by F. F. Driscoll, R. F. Coughlin, and R. S. Villanucci, Prentice-Hall, 2000. is the M68HC11 Processor Manual from Motorola. CSE360

Another Reference Late in an academic term (such as now), you can hope to access on-line lecture notes from the Electrical and Computer Engineering course, ECE 265. Visit Under “Academic Program”, click on the link “ECE Course Listings”. Find 265 and click on the link “Syllabus of this quarter”. CSE360

HC11 compared with Sparc (1)
HC11 | Sparc
CISC | RISC, load/store
Instruction encoding lengths vary (8 to 32 bits) | Instruction encoding lengths constant (32 bits)
About 316 instructions | About 175 instructions
4 16-bit user registers, one of which is divided into two 8-bit registers | 32 32-bit user integer registers
If you don’t count immediate-value addressing mode instructions separately from register-direct addressing mode instructions, then Sparc has only 103 instructions. If you further remove the 12 alternate (address) space instructions, you are left with only 91 instructions. CSE360

HC11 | Sparc
8-bit data bus | 32-bit data bus
16-bit address bus | 32-bit address bus
8-bit (byte) addressable | 8-bit (byte) addressable
Instruction execution not overlapped | Instruction execution overlapped in a pipeline
Sparc can address 65,536 times as much memory as the HC11 (2^32 / 2^16 = 2^16), i.e., 6,553,500% more. CSE360

A Strange Fact: The HC11 architecture “allows accessing an operand from an external memory location with no execution-time penalty.” [p. 27, M68HC11 Processor Manual] Reason: the HC11 requirements state that the CPU cycle must be kept long enough to accommodate a memory access within one cycle. This seeming miracle is accomplished by keeping the processor speed slow enough. The internal processor of the M68HC11 microcontroller works with a relatively slow clock (for example, 2 MHz, not 2 GHz). Thus, each execution cycle is 0.5 µs, which is long enough to read a byte from a standard memory with a latency of a few tens or hundreds of ns. CSE360

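HC11 Programmer’s Model (1)
Accumulator A and Accumulator B (8 bits each, bits 7 to 0), which together form the 16-bit Accumulator D (bits 15 to 0, with A as the high byte); X Index Register; Y Index Register; Stack Pointer (SP); Program Counter (PC). X, Y, SP, and PC are 16 bits wide. CSE360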

HC11 Programmer’s Model (2)
Condition Code Register (CCR), bits 7 down to 0: S (stop), X (X interrupt mask), H (half-carry), I (I interrupt mask), N (negative), Z (zero), V (overflow), C (carry/borrow). CSE360
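A small C helper can make the CCR layout concrete. It names each bit mask and renders a CCR value as letters, from bit 7 (S) down to bit 0 (C). The helper and enum names are hypothetical.

```c
#include <stdint.h>

/* HC11 CCR bit masks, bit 7 (S) down to bit 0 (C). */
enum {
    CCR_C = 0x01,  /* carry/borrow      */
    CCR_V = 0x02,  /* overflow          */
    CCR_Z = 0x04,  /* zero              */
    CCR_N = 0x08,  /* negative          */
    CCR_I = 0x10,  /* I interrupt mask  */
    CCR_H = 0x20,  /* half-carry        */
    CCR_X = 0x40,  /* X interrupt mask  */
    CCR_S = 0x80   /* stop              */
};

/* Writes "SXHINZVC" with '-' in place of each clear bit (9 bytes incl. null). */
void ccr_to_string(uint8_t ccr, char out[9]) {
    static const char names[] = "SXHINZVC";
    for (int i = 0; i < 8; i++)
        out[i] = (ccr & (0x80 >> i)) ? names[i] : '-';
    out[8] = '\0';
}
```

For example, a CCR with only Z and C set renders as "-----Z-C".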

HC11 Assembly Language Format (1)
Like Sparc, it is line-oriented. A line may: Be blank (containing no printable characters), Be a comment line, the first printable character being either a semicolon (‘;’) or an asterisk (‘*’), or Have the following format (“[] means an optional field”): [Label] Operation [Operand field] [Comment field] CSE360

Label: begins in column 1, ending either with a space or a colon (‘:’) Contains 1 to 15 characters Case sensitive The first character may not be a decimal digit (0-9) Characters may be upper- or lowercase letter, digits 0-9, period (‘.’), dollar sign (‘$’), or underscore (‘_’) CSE360
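The label rules can be expressed as a small checker. This is a sketch with a hypothetical name, `hc11_label_ok`. It assumes the caller has already removed the optional trailing colon. Column-1 placement is not checked, because it is a property of the source line rather than of the label text.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True if label obeys the HC11 label rules listed above. */
bool hc11_label_ok(const char *label) {
    size_t len = strlen(label);
    if (len < 1 || len > 15) return false;               /* 1 to 15 characters */
    if (isdigit((unsigned char)label[0])) return false;  /* no leading digit   */
    for (size_t i = 0; i < len; i++) {
        unsigned char ch = (unsigned char)label[i];
        if (!(isalnum(ch) || ch == '.' || ch == '$' || ch == '_'))
            return false;                                /* illegal character  */
    }
    return true;
}
```

By these rules, "DELAY_1MS" and "L0" are accepted. "1ST" (leading digit) and "A-B" (illegal character) are rejected.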

Operation: Cannot begin in column 1. Contains an instruction mnemonic, an assembler directive, or a macro call (we haven’t studied macro expansion in this course). Operand field: terminated by a space or tab character, so multiple operands are separated by commas (‘,’) without using any spaces or tabs. CSE360

Comment field: Begins with the first space character following the operand field (or following the operation, if there is no operand field) So no special printable character is required to begin a comment field But it appears to be conventional to begin a comment field with a semicolon (‘;’) CSE360

Prefixes for Numeric Constants
Encoding | HC11 | Sparc
Decimal | no symbol | no symbol
Hexadecimal | $ | 0x
Octal | @ | 0
Binary | % | 0b
CSE360
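Here is a sketch of how an assembler might interpret the HC11 prefixes. The function is hypothetical; real assemblers also handle expressions, character constants, and overflow. It applies the positional rule from the base-conversion notes: each new digit multiplies the running value by the radix, then adds the digit.

```c
#include <stdbool.h>

/* Parse an HC11 numeric constant: $ hex, @ octal, % binary, else decimal.
   Returns false for an empty digit string or a digit invalid in the radix. */
bool hc11_parse_number(const char *tok, long *value) {
    int base = 10;
    if (*tok == '$')      { base = 16; tok++; }
    else if (*tok == '@') { base = 8;  tok++; }
    else if (*tok == '%') { base = 2;  tok++; }
    if (*tok == '\0') return false;              /* prefix with no digits */
    long v = 0;
    for (; *tok != '\0'; tok++) {
        int d;
        char ch = *tok;
        if (ch >= '0' && ch <= '9')      d = ch - '0';
        else if (ch >= 'A' && ch <= 'F') d = ch - 'A' + 10;
        else if (ch >= 'a' && ch <= 'f') d = ch - 'a' + 10;
        else return false;
        if (d >= base) return false;             /* digit not in this radix */
        v = v * base + d;                        /* positional accumulation */
    }
    *value = v;
    return true;
}
```

"$1C", "@34", "%11100", and "28" all parse to the same value, 28.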

Assembler Directives (1)
Meaning | HC11 | Sparc
Set location counter (origin) | ORG | .data or .text
End of source | END | Doesn’t have
Equate symbol to a value | EQU | .set
Form constant byte | FCB | .byte
CSE360

Assembler Directives (2)
Meaning | HC11 | Sparc
Form double byte | FDB | .half
Form character string constant | FCC | .ascii
Reserve memory byte or bytes | RMB | .skip
CSE360

HC11 Addressing Modes: Immediate (IMM), Extended (EXT), Direct (DIR), Inherent (INH), Relative (REL), Indexed (INDX, INDY). CSE360

Immediate (IMM) Assembler interprets the # symbol to mean the immediate addressing mode Examples LDAA #10 LDAA #$1C LDAA LDAA #%11100 LDAA #’C’ LDAA #LABEL LDAA means “Load accumulator A” CSE360

Extended (EXT) Lack of # symbol indicates extended or direct addressing mode. These are forms of memory direct addressing, like SAM. “Extended” means full 16-bit address, whereas “Direct” means directly to a low address, specified using only the least significant 8 bits of the address. Examples LDAA $2025 LDAA LABEL CSE360

Direct (DIR) Examples LDAA $C2 LDAA LABEL CSE360

Inherent (INH) All operands are implicit (i.e., inherent in the instruction). Examples: ABA, SBA, DAA. ABA means add the contents of register B to the contents of A, placing the sum in A (A + B → A). SBA means A – B → A. DAA means adjust the sum that the previous instruction placed in A to the correct BCD result; e.g., $09 + $26 yields $2F in A, then DAA changes this to $35. $09 encodes, in BCD, 9, and $26 encodes, in BCD, 26. The correct sum of the sources (when both are interpreted according to BCD encoding) is 35, which would be represented, in BCD, as $35. However, the ABA instruction always works according to simple binary arithmetic rules, setting condition code bits H, Z, and C accordingly. (It sets N and V according to 8-bit 2's complement rules.) So it leaves the $2F result in A. Bit H stays off here, because the low digits $9 + $6 = $F produce no carry out of bit 3. DAA changes $2F to $35 because the low digit $F is greater than 9, so it adds $06. DAA would also add $06 if H were on, e.g., after $19 + $28 = $41 (a half-carry occurs), where DAA yields $47. CSE360
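The ABA/DAA examples can be checked with a small C model. It uses a common simplified formulation of decimal adjust, with two rules. Add $06 if H is set or the low digit exceeds 9. Add $60 and set C if there was a carry or the binary sum exceeds $99. We assume these rules match the HC11's DAA table for valid BCD operands. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint8_t a;   /* accumulator A after DAA            */
    int h;       /* H as left by ABA: carry out of bit 3 */
    int c;       /* C after DAA: decimal carry out      */
} BcdResult;

/* ABA (binary A + B -> A) followed by DAA, in a simplified model. */
BcdResult aba_daa(uint8_t a, uint8_t b) {
    BcdResult r;
    unsigned sum = (unsigned)a + b;                  /* ABA: plain binary add */
    r.h = ((a & 0x0Fu) + (b & 0x0Fu)) > 0x0Fu;       /* half-carry            */
    int carry = sum > 0xFFu;                         /* C as left by ABA      */
    uint8_t bin = (uint8_t)sum;                      /* e.g. $09 + $26 = $2F  */
    uint8_t corr = 0;
    if (r.h || (bin & 0x0Fu) > 9u) corr |= 0x06u;   /* fix the low digit     */
    if (carry || bin > 0x99u) {                      /* fix the high digit    */
        corr |= 0x60u;
        carry = 1;
    }
    r.a = (uint8_t)(bin + corr);
    r.c = carry;
    return r;
}
```

`aba_daa(0x09, 0x26)` gives A = $35 with H = 0. `aba_daa(0x19, 0x28)` gives A = $47 with H = 1. `aba_daa(0x99, 0x01)` gives A = $00 with C = 1, i.e., 99 + 1 = 100.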

Relative (REL) Used only for branch instructions
Relative to the address of the following instruction (the new value of the PC) Signed offset from -128 to +127 bytes Examples BGE -18 BHS 27 BGT LABEL BHS means “branch if higher or same”, for unsigned comparisons CSE360
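Here is the offset arithmetic for relative branches, as a hypothetical C sketch. The target is the address of the following instruction plus the sign-extended 8-bit offset. An assembler must reject targets that fall outside -128 to +127 of that address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Branch target: address of the following instruction plus the signed
   8-bit offset, wrapping within the 16-bit address space. */
uint16_t rel_target(uint16_t next_pc, int8_t offset) {
    return (uint16_t)(next_pc + offset);
}

/* Offset an assembler would encode; false if the target is out of range. */
bool rel_offset(uint16_t next_pc, uint16_t target, int8_t *offset) {
    int32_t diff = (int32_t)target - (int32_t)next_pc;
    if (diff < -128 || diff > 127) return false;
    *offset = (int8_t)diff;
    return true;
}
```

Consider a 2-byte branch at $E000 that branches to itself. Its next_pc is $E002, so the encoded offset is -2 ($FE).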

Indexed (INDX, INDY) Uses the contents of either the X or Y register and adds it to a (positive, unsigned) offset contained in the instruction to calculate the effective address Example LDAA 4,X CSE360

Interrupts When an interrupt is acknowledged, the CPU’s hardware saves the registers’ contents on the stack. An interrupt service routine ends with an RTI instruction. This instruction automatically restores the CPU register values from the copies on the stack. CSE360

Condition Code Register (CCR)
It’s reasonably safe to say that every instruction that changes a register (A, B, D, X, Y, SP) affects the CCR appropriately. Unlike Sparc, there are no arithmetic instructions that do not set condition codes. There do exist instructions that compare a register to a memory location by subtracting the memory contents from the register and throwing the result away, but setting the CCR (CMPA, CMPB, CPD, CPX, CPY). CSE360

Example HC11 Program Problem: Produce the following waveforms on the three least significant bits (LSBs) of parallel 8-bit output Port B (mapped to $1004), where we name the bits X, Y, and Z in increasing order of significance (X is bit 0; Y is bit 1; Z is bit 2). (Figure: waveforms with X high for 10 ms, then Y high for 20 ms, then Z high for 15 ms, repeating.) CSE360

Example Source File, p. 1
STACK:  EQU   $00FF        ; set stack pointer
PORTB:  EQU   $1004        ; set address of Port B
        ORG   0
DELAY1: FCB   10           ; set the waveform times
DELAY2: FCB   20           ; for X, Y, and Z
DELAY3: FCB   15
CSE360

Example Source File, p. 2
        ORG   $E000        ; program starts at $E000
MAIN:   LDS   #STACK       ; initialize stack pointer
L0:     LDAA  #%00000001   ; set X on Port B to 1
        STAA  PORTB
        LDAB  DELAY1       ; delay for 10 ms
L1:     JSR   DELAY_1MS
        DECB
        BNE   L1
“set X on Port B to 1” really means “on Port B, set X to 1, reset Z to 0, keep Y at 0, and keep the other five bits at 0”. CSE360

Example Source File, p. 3
        LDAA  #%00000010   ; set Y on Port B to 1
        STAA  PORTB
        LDAB  DELAY2       ; delay for 20 ms
L2:     JSR   DELAY_1MS
        DECB
        BNE   L2
        LDAA  #%00000100   ; set Z on Port B to 1
        STAA  PORTB
        LDAB  DELAY3       ; delay for 15 ms
L3:     JSR   DELAY_1MS
        DECB
        BNE   L3
        BRA   L0           ; continue to cycle
“set Y on Port B to 1” really means “on Port B, set Y to 1, reset X to 0, keep Z at 0, and keep the other five bits at 0”. “set Z on Port B to 1” really means “on Port B, set Z to 1, reset Y to 0, keep X at 0, and keep the other five bits at 0”. CSE360

Example Source File, p. 4
DELAY_1MS: PSHB            ; subr. to delay for 1 ms
        LDAB  #198
DELAY:  DECB
        BRN   DELAY
        NOP
        BNE   DELAY
        PULB
RETURN: RTS
        ORG   $FFFE        ; initialize reset vector
RESET:  FDB   MAIN
        END
BRN is “branch never”! It takes 3 cycles to execute. NOP takes 2 cycles to execute. CSE360
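Why load B with 198? Each pass through the DELAY loop costs DECB + BRN + NOP + BNE cycles. The sketch below totals the cycles for one `JSR DELAY_1MS`. The BRN (3) and NOP (2) counts come from the notes. The other counts are assumptions taken from the HC11 instruction tables and should be checked against the processor manual.

```c
/* Assumed HC11 cycle counts for the instructions involved in DELAY_1MS. */
enum {
    CYC_JSR_EXT  = 6,   /* JSR DELAY_1MS (extended addressing) */
    CYC_PSHB     = 3,
    CYC_LDAB_IMM = 2,
    CYC_DECB     = 2,
    CYC_BRN      = 3,   /* from the notes */
    CYC_NOP      = 2,   /* from the notes */
    CYC_BNE      = 3,   /* same count whether or not the branch is taken */
    CYC_PULB     = 4,
    CYC_RTS      = 5
};

/* E-clock cycles for one "JSR DELAY_1MS" when B is loaded with count. */
unsigned long delay_1ms_cycles(unsigned count) {
    unsigned long per_pass = CYC_DECB + CYC_BRN + CYC_NOP + CYC_BNE;  /* 10 */
    return CYC_JSR_EXT + CYC_PSHB + CYC_LDAB_IMM
         + (unsigned long)count * per_pass
         + CYC_PULB + CYC_RTS;
}

/* Convert a cycle count to microseconds for a clock of clock_hz. */
double cycles_to_us(unsigned long cycles, double clock_hz) {
    return (double)cycles * 1e6 / clock_hz;
}
```

Under these assumptions, 6 + 3 + 2 + 198 × 10 + 4 + 5 = 2000 cycles. At the 2 MHz example clock (0.5 µs per cycle), that is exactly 1 ms. The DECB/BNE in the caller's L1, L2, and L3 loops add a few more cycles per millisecond, so each waveform segment runs slightly long.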

Traps and Exceptions 1 Traps, Exceptions, and Extended Operations
Other side of low level programming -- the interface between applications and peripherals OS provides access and protocols CSE360

Traps and Exceptions 2 BIOS: Basic Input/Output System
Subroutines that control I/O No need for you to write them as application programmer OS interfaces application with BIOS through traps (extended operations (XOPs)) CSE360

Traps and Exceptions 3 Where are OS traps kept? Two approaches:
Transient monitor: traps kept in a library that is copied into the application at link-time Resident monitor: always keep OS in main memory; applications share the trap routines. OS routines monitor devices. Frequently used routines kept resident; others loaded as needed. CSE360

Traps and Exceptions 4 (Assuming a resident monitor) How do we find the I/O routines? One way: store the routines in memory, and make a call to a hard address, e.g., call 256. When a new OS is released, all application programs must be recompiled to use the different addresses. Another way: use a dispatcher. The dispatcher is a subroutine that takes a parameter (the trap number). The dispatcher knows where all routines actually are in memory, and makes the branch for you. The dispatcher subroutine must always exist in the same location. CSE360

Traps and Exceptions 5 Use vectored linking
A branch table exists at a well-known location. The address of each trap subroutine is stored in the table, indexed by the trap number. On RISC machines, usually about 4 words per entry are reserved in the table. If the trap routine is larger than 4 words, the entry can call the actual routine, so an entry in the branch table may contain a few instructions, including a branch. CSE360
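Vectored linking amounts to an indexed table of routine addresses. A minimal C sketch is an array of function pointers indexed by trap number. The table contents and service routines below are hypothetical stand-ins for OS code.

```c
#include <stddef.h>

typedef int (*trap_fn)(int arg);

/* Hypothetical service routines standing in for OS trap code. */
static int svc_echo(int arg)   { return arg; }
static int svc_double(int arg) { return 2 * arg; }

/* Branch table at a fixed, well-known place, indexed by trap number.
   Entries 2 and 3 are unused in this sketch. */
static trap_fn const trap_table[4] = { svc_echo, svc_double, NULL, NULL };

/* Look up the routine for trap_no and call it; -1 when there is none. */
int dispatch(unsigned trap_no, int arg) {
    if (trap_no >= sizeof trap_table / sizeof trap_table[0]) return -1;
    trap_fn f = trap_table[trap_no];
    return f != NULL ? f(arg) : -1;
}
```

Applications depend only on the trap numbers. A new OS release can move the routines as long as it updates the table.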

Traps and Exceptions 6 Levels of privilege Exceptions
Supervisor mode: can access every resource. User mode: limited access to resources. OS routines operate in supervisor mode; the current mode is determined by a bit in the PSW (processor status word). XOP (book’s notation) can always be executed, and sets privilege to supervisor mode (ta). RTX (book’s notation) can only be executed by the OS, and returns privilege to user mode (rett). Exceptions: caused by invalid use of a resource, e.g., divide by zero, invalid address, illegal operation, protection violation, etc. ta = trap always; XOP = extended operation. The PSW (pg 416) contains: implementation & version, CCR, processor & co-processor, processor interrupt, supervisor mode, etc. XOP procedure (vector of routines or addresses): find entry; save PC and PSW; modify PC and PSW; execute OS routine; restore PC & PSW. There is no branch delay slot associated with either the “ta” instruction or the “rett” instruction. Because they change privilege between user and supervisor mode, it makes sense that the pipeline would always be broken at these points. Try to explain the distinction between trusted and untrusted code. If it is ever possible to execute untrusted code, even a single instruction, in privileged mode, then system security can be violated. Trusted does not imply perfect, of course. CSE360

Traps and Exceptions 7 Trap example: non-blocking read ta 3
Control is transferred automatically to an exception handler routine, similar to a trap or XOP transfer. Exceptions vs. XOPs: XOPs are explicit in code; exceptions are implicit. XOPs service a request and return to the application; exceptions print a message and abort (unless masked). Trap example: non-blocking read, ta 3. If there is nothing in the keyboard buffer, return with a message that nothing is there. Otherwise, put the character into register 8. CSE360

Traps and Exceptions 8 The status of the keyboard is kept in a memory location, as is the (one-character) keyboard buffer: these are memory-mapped devices. On SPARC, the trap table has 256 entries: entries 0 to 127 are reserved for exceptions and external interrupts, and entries 128 to 255 are used for XOPs (software traps). Trap table begins at address 0x Each entry is 4 instructions (16 bytes) long. This “keyboard buffer” should be understood to be a single character (the single-character receive register of a UART). Because this is memory-mapped I/O, “ld [%r1], %r8” has the special side effect of “changing the keyboard status to not ready” (clearing the receive register full bit (bit 0) of the keyboard UART’s status register). Here, %r1 is standing in place of a local register (%l0-%l7); SPARC uses register windows and assumes local registers are available. Similarly, %r8 is standing in place of %i0, which, upon return, is %o0 = %r8. CSE360

Traps and Exceptions 9 Trap execution: ta 3
Calculate the trap address: 16 × (3 + 0x080) = 0x0800 + 16 × 3 = 0x0830 (relative to the start of the trap table). Save nPC and PSW to memory. (SPARC uses register windows; it assumes local registers are available.) Set the privilege level to supervisor mode. Update PC with the trap address (and make nPC = PC + 4), which jumps into the trap table. The trap table entry has the instruction ba ta3_handler. The handler ends with rett, which restores PC (from the saved nPC value) and PSW (resetting to user mode) and returns to the application program. CSE360
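Here is the trap-address arithmetic as a C sketch. It assumes the SPARC V8 rule that `ta n` raises trap type 0x80 + (n mod 128), and that each trap-table entry is 16 bytes. The table base is a parameter because its address is not given here.

```c
#include <stdint.h>

/* Address of the trap-table entry for "ta n": the trap type is 0x80 plus
   the low 7 bits of n, and each entry is 16 bytes (4 instructions). */
uint32_t sw_trap_address(uint32_t table_base, uint32_t n) {
    uint32_t tt = 0x80u + (n & 0x7Fu);
    return table_base + 16u * tt;
}
```

With the table at address 0, `ta 3` lands at 0x830, and `ta 127` lands at 0xFF0, the last entry of the 4 KB table.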

Programmed I/O 1 Programmed I/O Early approach: Isolated I/O
Special instructions to do input and output, using two operands: a register and an I/O address. CPU puts device address on address bus, and issues an I/O instruction to load from or store to the device. Programmed I/O versus Interrupt-driven I/O Isolated I/O versus Memory-mapped I/O CSE360

Programmed I/O 2 Isolated I/O CSE360

Memory Mapped I/O No special I/O instructions. Treat the I/O device like a memory address. Hardware checks to see if the memory address is in the I/O device range, and makes the adjustment. Use high addresses (not “real” memory) for I/O memory maps, e.g., 0xFFFF0000 through 0xFFFFFFFF. (Figure: CPU, memory, and I/O device sharing the address bus, data bus, and read/write line.) CSE360

Programmed I/O 3 Advantages of each
Memory mapped: reduced instruction set, reduced redundancy in hardware. Isolated: don’t have to give up memory address space on machines with little memory CSE360

Programmed I/O - UARTs: Keyboard UART
UART: Universal Asynchronous Receiver Transmitter. Asynchronous = not on the same clock. A handshake coordinates communication between the two devices. A kind of programmed I/O. (Figure: the keyboard sends bits serially to the UART, which passes them to the CPU in parallel.) CSE360

UARTs 1 UART registers
Control: set up at init, speed, parity, etc. Status: transmit empty, receive ready, etc. Transmit: output data. Receive: input data. All four are needed for bidirectional communication. Status/control and transmit/receive are often combined. Why? The transmit register has the next character to send to the peripheral device; the receive register has the most recent character received from the peripheral device. The CPU can only read from the status/receive registers; it can only write to the control/transmit registers. The following could be arranged: write/put: address 0 means control; address 1 means transmit. read/get: address 0 means status; address 1 means receive. (Figure: UART block diagram with control, status, transmit, and receive registers plus transmit and receive logic, connected to the control, address, and data buses.) CSE360

UARTs 2 Memory-mapped UARTs
Address | Register
FFFF 0000 | UART 1 data
FFFF 0004 | UART 1 status
FFFF 0008 | UART 1 control
FFFF 000C | UART 2 xmit
FFFF 0010 | UART 2 recv
FFFF 0014 | UART 2 status
FFFF 0018 | UART 2 control
FFFF 001C | UART 3 xmit
and so on.
Both memory and I/O “listen” to the address bus. The appropriate device acts based on the address. Keyboards and printers require three addresses (when addresses are not combined). Modems require four. (Why?) All UARTs require initialization, so every UART needs an address for its control register. (Figure: CPU, memory, UART1, and UART2 sharing the address, data, and control buses.) CSE360
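Here is address decoding for the example map, as a hypothetical C sketch. An address selects a UART register only if it lies within the eight mapped slots and is word-aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Register names for the example map, one 4-byte slot each from 0xFFFF0000. */
static const char *const uart_map[8] = {
    "UART 1 data", "UART 1 status", "UART 1 control",
    "UART 2 xmit", "UART 2 recv", "UART 2 status", "UART 2 control",
    "UART 3 xmit"
};

/* Returns the register name selected by addr, or NULL when addr does not
   select one of the mapped registers. */
const char *uart_register(uint32_t addr) {
    const uint32_t base = 0xFFFF0000u;
    if (addr < base || (addr - base) % 4u != 0) return NULL;
    uint32_t index = (addr - base) / 4u;
    return index < 8u ? uart_map[index] : NULL;
}
```

In hardware, the same comparison is done by decoding logic on the address bus. The device whose range matches responds, and memory ignores the access.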

Programmed I/O 4 Programmed I/O Characteristics:
Used to determine if device is ready (can it be read or written). Each device has a status register in addition to the data register. Like previous trap example, must check status before getting data. Involves polling loops. CSE360

Programmed I/O – Polling
Ex.: ta 2 handler (blocking keyboard input). We can’t afford to wait like this: the computer is millions of times faster than a typist. Also, multitasking operating systems can’t wait. Special-purpose computers can wait, e.g., microwave oven controllers. There must be a better way! Interrupts are the answer! (Cartoon: the CPU keeps asking “Are you ready?... Are you ready now?... How about NOW?...” and the device keeps answering “Nope... Not yet... Hang on...”.) CSE360
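A simulated keyboard UART shows the difference between the non-blocking read (ta 3) and the blocking, polling read (ta 2). Everything here is hypothetical. On real hardware, `status` and `receive` would be memory-mapped registers. The `polls_until_ready` counter only simulates a key arriving after some number of status checks.

```c
#include <stdbool.h>
#include <stdint.h>

#define RX_READY 0x01u   /* bit 0 of status: receive register full */

typedef struct {
    volatile uint8_t status;
    volatile uint8_t receive;
    int polls_until_ready;   /* simulation only: when the key "arrives" */
    uint8_t pending;         /* simulation only: the key that arrives   */
} KbdUart;

/* Read the status register; in this simulation a key may arrive now. */
static uint8_t read_status(KbdUart *u) {
    if (u->polls_until_ready > 0 && --u->polls_until_ready == 0) {
        u->receive = u->pending;
        u->status |= RX_READY;
    }
    return u->status;
}

/* Reading the receive register clears the ready bit (the side effect
   of "ld [%r1], %r8" described for the keyboard UART). */
static uint8_t read_receive(KbdUart *u) {
    u->status &= (uint8_t)~RX_READY;
    return u->receive;
}

/* Non-blocking read (ta 3 style): one status check, no waiting. */
bool try_read(KbdUart *u, uint8_t *ch) {
    if (!(read_status(u) & RX_READY)) return false;   /* nothing there */
    *ch = read_receive(u);
    return true;
}

/* Blocking read (ta 2 style): busy-wait polling loop. Returns how many
   status polls it took, to show the wasted work. */
unsigned blocking_read(KbdUart *u, uint8_t *ch) {
    unsigned polls = 1;
    while (!(read_status(u) & RX_READY))   /* "Are you ready?" */
        polls++;
    *ch = read_receive(u);
    return polls;
}
```

With a real typist, the blocking loop's poll count would run to millions, which is the waste that interrupts remove.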

Interrupts and DMA transfers 1
Programmed (polled) I/O used busy waiting. Advantages: simpler hardware Disadvantages: wastes time Interrupts (IRQs on PCs) I/O device “requests” service from CPU. CPU can execute program code until interrupted. Solves busy waiting problems. Interrupt handlers are run (like traps) whenever an interrupt occurs. Current application program is suspended. CSE360

Servicing an interrupt: The I/O controller generates an interrupt, setting the request line “high”. The CPU detects the interrupt at the beginning of the fetch/execute cycle (for interrupts “between” instructions). The CPU saves the state of the running program and invokes the interrupt handler. The handler services the request and sets the request line “low”. Control is returned to the application program. (Figure: the application program runs until *Interrupt Detected*; the interrupt handler services the request and clears it; the application then resumes.) CSE360

Changes to the fetch/execute cycle. Problems: requires additional hardware in Timing & Control; queuing of interrupts; interrupting an interrupt handler (solution: priorities and maskable interrupts); interrupts that must be serviced within an instruction; how to find the address of the interrupt handler. (Figure: modified fetch cycle. Interrupt pending? If yes: save PC, save PSW, PSW = new PSW, PC = handler_addr. Then, in either case, the normal fetch: PC → bus, load MAR, INC to PC, load PC.) CSE360
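The modified cycle can be sketched as a toy C model. The PSW bits and field names are hypothetical. Here the new PSW both enters supervisor mode and masks further interrupts, which is one way to avoid re-entering the handler while the request line is still high.

```c
#include <stdbool.h>
#include <stdint.h>

#define PSW_SUPER 0x1u   /* hypothetical: supervisor mode           */
#define PSW_MASK  0x2u   /* hypothetical: further interrupts masked */

typedef struct {
    uint32_t pc, psw, mar;
    uint32_t saved_pc, saved_psw;
    uint32_t handler_addr;
    bool irq_line;       /* request line from the I/O controller */
} Cpu;

/* One fetch step with the interrupt check added at the beginning. */
void fetch_step(Cpu *c) {
    if (c->irq_line && !(c->psw & PSW_MASK)) {   /* Interrupt pending? */
        c->saved_pc = c->pc;                     /* Save PC            */
        c->saved_psw = c->psw;                   /* Save PSW           */
        c->psw |= PSW_SUPER | PSW_MASK;          /* PSW = new PSW      */
        c->pc = c->handler_addr;                 /* PC = handler_addr  */
    }
    c->mar = c->pc;     /* PC -> bus, load MAR */
    c->pc += 4;         /* INC to PC, load PC (4-byte instructions assumed) */
    /* decode and execute of the fetched instruction are omitted */
}

/* Return from the handler: restore the saved PC and PSW. */
void return_from_interrupt(Cpu *c) {
    c->pc = c->saved_pc;
    c->psw = c->saved_psw;
}
```

The handler itself, not the hardware, sets the request line low before returning. Restoring the old PSW then unmasks interrupts again.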

Example: interrupt-driven string output. We want to print a string without busy waiting, and we want to return to the application as fast as possible. Is the device faster than writes to memory? (Figure: CPU, memory, and monitor; the application executes ta 6, and the monitor signals “I’m ready!”.) CSE360

Trap handler implementation
Install the trap handler into the trap table. The buffer is a circular queue; the handler outputs, at most, one character.
disp_buf:     .skip 256      ! buffers string to print
disp_frnt:    .byte 0        ! offset to front of queue
disp_bck:     .byte 0        ! offset to back of queue
ta_6_handler:
        ! Copy str from mem[%r8] to mem[disp_buf+disp_bck]
        ! disp_bck = (disp_bck+len(str)) mod 256
        ! If display is ready
        !     If first char is not null, then output it
        !     disp_frnt = (disp_frnt+1) mod 256
        rett                 ! return from trap
(Figure: disp_buf as a ring, with disp_frnt at the oldest undisplayed byte and disp_bck just past the newest byte.)
CSE360
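The queue bookkeeping can be sketched in C. The sketch assumes 8-bit offsets, so the "mod 256" happens automatically on overflow, as it does for the .byte variables disp_frnt and disp_bck. The names mirror the assembly. Two choices are assumptions of this sketch: one slot is kept empty (so frnt == bck always means "empty"), and the string's terminating null is not stored.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    char buf[256];   /* disp_buf                                        */
    uint8_t frnt;    /* disp_frnt: offset of the oldest undisplayed byte */
    uint8_t bck;     /* disp_bck: offset just past the newest byte       */
} DispQueue;

/* ta_6_handler part: append the characters of s (without its null).
   Copies nothing and returns false if the string does not fit. */
bool disp_enqueue(DispQueue *q, const char *s) {
    size_t len = 0;
    while (s[len] != '\0') len++;
    size_t used = (uint8_t)(q->bck - q->frnt);   /* bytes queued, mod 256 */
    if (used + len > 255) return false;          /* keep one slot empty   */
    for (size_t i = 0; i < len; i++) {
        q->buf[q->bck] = s[i];
        q->bck++;                                /* uint8_t wraps: mod 256 */
    }
    return true;
}

/* display_IRQ_handler part: remove at most one character. */
bool disp_dequeue(DispQueue *q, char *out) {
    if (q->frnt == q->bck) return false;         /* queue is empty */
    *out = q->buf[q->frnt];
    q->frnt++;                                   /* wraps: mod 256 */
    return true;
}
```

Each display-ready interrupt would call `disp_dequeue` once. The string drains one character per interrupt while the application keeps running.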

Interrupt handler implementation
This handler also outputs at most one character, but when the display becomes ready again, it generates another interrupt, which invokes this routine again!
display_IRQ_handler:
        ! Save any registers used
        ! If disp_frnt != disp_bck (queue is not empty)
        !     Get char at mem[disp_buf+disp_frnt]
        !     If char is not null, then output it
        !     disp_frnt = (disp_frnt+1) mod 256
        ! Restore registers and set the request line “low”
        rett                 ! return from trap
This uses the UART for transmission. (Figure: the display's UART signals “I’m ready!” to the CPU, which takes the next character from memory.) CSE360

Problems with interrupt-driven I/O: The CPU is involved with each interrupt. Each interrupt corresponds to the transfer of a single byte. There is lots of overhead for large amounts of data (blocks of 512 bytes): the CPU executes 10s or 100s of instructions per byte. (Figure: for each byte, the device controller interrupts the CPU; the CPU transfers one byte of data from the controller and one word of data to memory.) CSE360

DMA (Direct Memory Access): We want I/O without CPU intervention, and we want data transfers larger than one byte. Solution: add a new device that can talk to both I/O devices and memory without the CPU; a “specialized” CPU strictly for data transfers. (Figure: CPU, memory, device controller, and DMA controller.) CSE360

Steps to a DMA transfer: The CPU gives the DMA controller a memory address, the operation (read/write), a byte count, and the disk block location (or another I/O device location). The DMA controller initiates the I/O, and transfers the data to/from memory directly. The DMA controller interrupts the CPU when the entire block transfer is completed. Problem: conflicts accessing memory. We can either arbitrate access or get a more expensive dual-ported memory system. CSE360
