1
CS 152 Computer Architecture and Engineering
Lecture Midterm II Review Session
John Lazzaro (not a prof - “John” is always OK)
TA: Eric Love
www-inst.eecs.berkeley.edu/~cs152/
2
Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN
3
CS152 Midterm II, May 1st, 2014
Problem:  1   2   3   4   Tot
Points:   25  25  25  25  100
Name:          SSID:
“All the work is my own. I have no prior knowledge of the exam contents, aside from guidance from class staff. I will not share the contents with others in CS152 who have not taken it yet.”
Signature:
Please write clearly, and put your name on each page. Please abide by word limits. Good luck!
Eric Love, John Lazzaro
4
What does it cover? Lectures 9 onward
Focus will be on problems that require you to do a task (write a small program, trace through execution, etc.) that demonstrates that you understand a concept. [...] No transistor-level questions (DRAM and SRAM cells, etc.). Time for a quick walk-through ...
5
CS 152 Computer Architecture and Engineering
Lecture 9 -- Memory
6
Latency is not the same as bandwidth!
DRAM array: 8192 rows x 16384 columns of usable bits (tester found good bits in bigger array). The 13-bit row address input feeds a 1 of 8192 decoder. 16384 bits are delivered by the sense amps. Select requested bits, send off the chip.
What if we want all of the bits? In row access time (55 ns) we can do 22 transfers at 400 MT/s. 16-bit chip bus -> 22 x 16 = 352 bits << 16384. Now the row access time looks fast! Thus, push to faster DRAM interfaces.
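The arithmetic on this slide can be checked with a short Python sketch. The constants are the figures quoted on the slide; the function name and the assumption that transfers run back to back with no gaps are illustrative only.

```python
# Figures quoted on the slide.
ROW_ACCESS_NS = 55         # row access time
TRANSFER_RATE_MT_S = 400   # mega-transfers per second on the chip bus
BUS_WIDTH_BITS = 16        # width of the chip data bus
ROW_BITS = 16384           # bits delivered by the sense amps per row access


def transfers_in(ns, rate_mt_s):
    """Whole transfers that fit in `ns` nanoseconds at `rate_mt_s` MT/s."""
    # 1 MT/s is one transfer per microsecond, i.e. 1/1000 transfer per ns.
    return ns * rate_mt_s // 1000


transfers = transfers_in(ROW_ACCESS_NS, TRANSFER_RATE_MT_S)
bits_moved = transfers * BUS_WIDTH_BITS

# Assumption: back-to-back transfers, no gaps, to stream the whole row.
transfers_for_row = ROW_BITS // BUS_WIDTH_BITS
ns_for_row = transfers_for_row * 1000 / TRANSFER_RATE_MT_S

print(transfers, bits_moved, transfers_for_row, ns_for_row)
```

Under these assumptions, streaming the full row takes far longer than the row access itself, which is the slide's point: the interface, not the array, is the bottleneck.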
7
CS 152 Computer Architecture and Engineering
Lecture Cache I
8
Latency: A closer look
Read latency: Time to return first byte of a random access.

                    Reg     L1 Inst   L1 Data   L2      DRAM    Disk
Size                1K      64K       32K       512K    256M    80G
Latency (cycles)    1       3         3         11      160     1E+07
Latency (sec)       0.6n    1.9n      1.9n      6.9n    100n    12.5m
Hz                  1.6G    533M      533M      145M    10M     80

Architect’s latency toolkit:
(1) Parallelism. Request data from N 1-bit-wide memories at the same time. Overlaps latency cost for all N bits. Provides N times the bandwidth. Requests to N memory banks (interleaving) have potential of N times the bandwidth.
(2) Pipeline memory. If memory has N cycles of latency, issue a request each cycle, receive it N cycles later.
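To see what item (2) of the toolkit buys, here is a minimal Python sketch comparing serial and pipelined access. It assumes an idealized memory that accepts one new request every cycle with no conflicts; the function names are hypothetical, and the 11-cycle figure is the L2 latency from the table.

```python
def serial_cycles(requests, latency):
    """Each request waits for the previous one to finish."""
    return requests * latency


def pipelined_cycles(requests, latency):
    """One request issued per cycle; each result returns `latency` cycles later."""
    if requests == 0:
        return 0
    # The last request issues in cycle (requests - 1) and returns
    # `latency` cycles after that.
    return latency + requests - 1


L2_LATENCY = 11  # cycles, from the table above
print(serial_cycles(100, L2_LATENCY), pipelined_cycles(100, L2_LATENCY))
```

Pipelining does not shorten the latency of any single request; it only hides it behind other requests, so it helps when many independent requests are available.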
9
CS 152 Computer Architecture and Engineering
Lecture Cache II
10
Issue #4: When to write to lower level ...
                                            Write-Through                  Write-Back
Policy                                      Data written to cache block    Write data only to the cache.
                                            also written to lower-level    Update lower level when a block
                                            memory.                        falls out of the cache.
Do read misses produce writes?              No                             Yes
Do repeated writes make it to lower level?  Yes                            No

Related issue: Do writes to blocks not in the cache get put in the cache (“write-allocate”) or not?
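The two policies can be compared with a toy model. The sketch below is a hypothetical one-line cache that counts writes reaching the lower level; it assumes write-allocate for both policies and is not meant to model any real cache.

```python
class OneLineCache:
    """Toy cache with a single block; counts writes sent to the lower level."""

    def __init__(self, write_back):
        self.write_back = write_back
        self.tag = None          # address of the block currently cached
        self.dirty = False       # only meaningful for write-back
        self.lower_writes = 0

    def _fill(self, addr):
        if self.tag != addr:                      # miss: replace the block
            if self.write_back and self.dirty:
                self.lower_writes += 1            # dirty block falls out
            self.tag = addr
            self.dirty = False

    def read(self, addr):
        self._fill(addr)

    def write(self, addr):
        self._fill(addr)                          # assumption: write-allocate
        if self.write_back:
            self.dirty = True                     # write stays in the cache
        else:
            self.lower_writes += 1                # write-through: every write goes down


def run(write_back):
    cache = OneLineCache(write_back)
    for _ in range(5):
        cache.write(0xA0)                         # repeated writes to one block
    before_read = cache.lower_writes
    cache.read(0xB0)                              # read miss evicts block 0xA0
    return before_read, cache.lower_writes


print("write-through:", run(False))
print("write-back:   ", run(True))
```

In this model the write-through cache sends all five writes down and the read miss adds nothing, while the write-back cache sends nothing until the read miss forces the dirty block out.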
11
CS 152 Computer Architecture and Engineering
Lecture Virtual Memory
12
The TLB caches page table entries
In this example, physical and virtual pages must be the same size!
[Diagram: a virtual address (page, offset) goes through the TLB, backed by the page table, to produce a physical address (frame, offset). Label: physical frame address for ASID.]
V=0 pages either reside on disk or have not yet been allocated. OS handles V=0: “Page fault”.
MIPS handles TLB misses in software (random replacement). Other machines use hardware.
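A minimal Python sketch of the lookup path described on this slide: try the TLB, fall back to the page table on a miss, and raise a page fault when the entry is invalid. The page size, the dictionary layout, and the unbounded TLB (no replacement policy, no ASIDs) are simplifying assumptions.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages for both virtual and physical


class PageFault(Exception):
    """Raised when the page table entry is missing or has V=0."""


def translate(vaddr, tlb, page_table, stats):
    """Translate a virtual address; `tlb` maps page number -> frame number."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn in tlb:
        stats["hit"] += 1
        frame = tlb[vpn]
    else:
        stats["miss"] += 1
        entry = page_table.get(vpn)
        if entry is None or not entry["valid"]:
            raise PageFault(vpn)          # the OS would handle this
        frame = entry["frame"]
        tlb[vpn] = frame                  # cache the page table entry
    return frame * PAGE_SIZE + offset


page_table = {2: {"valid": True, "frame": 5}, 3: {"valid": False, "frame": None}}
tlb, stats = {}, {"hit": 0, "miss": 0}
print(hex(translate(2 * PAGE_SIZE + 0x10, tlb, page_table, stats)), stats)
print(hex(translate(2 * PAGE_SIZE + 0x20, tlb, page_table, stats)), stats)
```

The first access misses in the TLB and walks the page table; the second access to the same page hits.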
13
CS 152 Computer Architecture and Engineering
Lecture 13 - Synchronization
14
Non-blocking consumer synchronization
Another atomic read-modify-write instruction:

Compare&Swap(Rt, Rs, m)
    if (Rt == M[m]) then
        M[m] = Rs; Rs = Rt;    /* do swap */
    else
        /* do not swap */

Assuming sequential consistency: MEMBARs not shown ...

try:  LW    R3, head(R0)               ; Load queue head into R3
spin: LW    R4, tail(R0)               ; Load queue tail into R4
      BEQ   R4, R3, spin               ; If queue empty, wait
      LW    R5, 0(R3)                  ; Read x from queue into R5
      ADDI  R6, R3, 4                  ; Shift head by one word
      Compare&Swap R3, R6, head(R0)    ; Try to update head
      BNE   R3, R6, try                ; If not success, try again

If R3 != R6, another thread got here first, so we must try again.
If thread swaps out before Compare&Swap, no latency problem; this code only “holds” the lock for one instruction!
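The consumer loop above can be mimicked in Python to watch several consumers race for the queue head. This is only a sketch: the `Memory` class and its `compare_and_swap` method are hypothetical, a `threading.Lock` stands in for the hardware atomicity of Compare&Swap, and an empty queue returns `None` where the slide's code would spin.

```python
import threading

HEAD, TAIL = "head", "tail"   # hypothetical names for the two queue pointers


class Memory:
    """Word-addressed memory with an emulated atomic compare-and-swap."""

    def __init__(self):
        self.words = {}
        self._atomic = threading.Lock()   # stands in for hardware atomicity

    def compare_and_swap(self, addr, expected, new):
        """Store `new` at `addr` only if it still holds `expected`."""
        with self._atomic:
            if self.words[addr] == expected:
                self.words[addr] = new
                return True
            return False


def try_dequeue(mem):
    """Non-blocking consumer: returns an item, or None if the queue is empty."""
    while True:
        head = mem.words[HEAD]                 # LW R3, head(R0)
        if head == mem.words[TAIL]:            # queue empty
            return None                        # the slide's code spins here
        x = mem.words[head]                    # LW R5, 0(R3)
        if mem.compare_and_swap(HEAD, head, head + 4):
            return x                           # we advanced head: item is ours
        # Otherwise another thread got here first, so try again.


mem = Memory()
for i in range(8):
    mem.words[4 * i] = i                       # queue items at word addresses
mem.words[HEAD], mem.words[TAIL] = 0, 4 * 8
print([try_dequeue(mem) for _ in range(9)])
```

Reading the item before the compare-and-swap is safe in this sketch because a failed swap simply discards the value and retries.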
15
CS 152 Computer Architecture and Engineering
Lecture 14 - Cache Design and Coherence
16
Writes from 10,000 feet ... for write-thru L1
For write-thru caches ...
1. Writing CPU takes control of bus.
2. Address to be written is invalidated in all other caches. Reads will no longer hit in cache and get stale data.
3. Write is sent to main memory. Reads will cache miss, retrieve new value from main memory.
To a first order, reads will “just work” if write-thru caches implement this policy. A “two-state” protocol (cache lines are “valid” or “invalid”).
[Diagram: CPU0 and CPU1, each with a cache and a snooper, on a memory bus shared with the main memory hierarchy.]
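The three steps can be sketched as a toy two-state protocol in Python. The class names are hypothetical, bus arbitration (step 1) is implicit because the sketch is single-threaded, and the writer keeping its own copy after a write is an assumption the slide does not state.

```python
class Bus:
    """Shared memory bus: holds main memory and the list of snooping caches."""

    def __init__(self):
        self.memory = {}
        self.caches = []


class WriteThroughCache:
    """Two-state cache: an address is either present (valid) or absent (invalid)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:                       # miss: fetch from memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        for other in self.bus.caches:                    # step 2: snoopers invalidate
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value                         # assumption: writer keeps a copy
        self.bus.memory[addr] = value                    # step 3: write to main memory


bus = Bus()
cpu0, cpu1 = WriteThroughCache(bus), WriteThroughCache(bus)
bus.memory[0x40] = 1
print(cpu1.read(0x40))      # CPU1 caches the old value
cpu0.write(0x40, 2)         # invalidates CPU1's copy, updates memory
print(cpu1.read(0x40))      # CPU1 misses and fetches the new value
```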
17
CS 152 Computer Architecture and Engineering
Lecture Advanced CPUs
18
Split pipelines: a write-after-write hazard.
WAW Hazard:
DIV R1, R2, R3
SUB R1, R2, R3
The pipeline splits after the RF stage, feeding functional units with different latencies. If long latency DIV and short latency SUB are sent to parallel pipes, SUB may finish first.
Solution: SUB detects R1 clash in decode stage and stalls, via a pipe-write scoreboard.
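A rough Python sketch of the stall decision follows. It tracks only pending register writes, ignores read-after-write hazards, and conservatively holds the second writer in decode until the first write completes. The latencies and the `schedule` function are made up for illustration and are not the scoreboard design from the lecture.

```python
def schedule(instrs, latency):
    """In-order issue; stall in decode while the destination has a pending write.

    instrs:  list of (opcode, destination register)
    latency: opcode -> cycles from issue to register write
    Returns a list of (opcode, dest, issue_cycle, write_cycle).
    """
    pending = {}     # scoreboard: dest register -> cycle its write completes
    cycle = 0        # earliest cycle the next instruction could issue
    result = []
    for op, dest in instrs:
        # WAW check: wait until any in-flight write to `dest` has finished.
        issue = max(cycle, pending.get(dest, 0))
        done = issue + latency[op]
        pending[dest] = done
        result.append((op, dest, issue, done))
        cycle = issue + 1
    return result


LATENCY = {"DIV": 10, "SUB": 1}   # assumed latencies, for illustration only
for row in schedule([("DIV", "R1"), ("SUB", "R1")], LATENCY):
    print(row)
```

Without the check, SUB would issue in cycle 1 and write R1 in cycle 2, well before DIV's write in cycle 10, leaving the stale DIV result in R1.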
19
Instruction Issue Logic
[Diagram: superscalar machine datapath with stages IF (Fetch), ID (Decode), EX (ALU), MEM, WB. PC and Sequencer address the Instr Mem (32-bit address, 64-bit data), which feeds the Instruction Issue Logic. The RegFile has four read ports (rs1-rs4, rd1-rd4) and two write ports (ws1/wd1/WE1, ws2/wd2/WE2), serving two parallel pipes.]
20
CS 152 Computer Architecture and Engineering
Lecture Networks, Routers, Google
21
6 key parameters scale across dimension of
“by one server”, “by 80-server rack” and “by array”. To get more DRAM and disk capacity, you must work on a scale larger than a single server. But as you do, latency and bandwidth degrade, because network performance << a server bus, and because array network is under-provisioned. Exception: disk latency is roughly scale-independent.
22
CS 152 Computer Architecture and Engineering
Lecture Dynamic Scheduling I
Thanks to Krste Asanovic ...
23
Given an endless supply of registers ...
Rename “architected registers” (Ri, Fi) to new “physical registers” (PRi, PFi) on each write.

Architected code: ADDI R1,R0,64 [...] F4,0(R1)

Renamed code:
       ADDI PR01, PR00, 64
       LD   PF00, 0(PR01)
       ADDD PF04, PF00, PF02
       SD   PF04, 0(PR01)
       SUBI PR11, PR01, 8
       BEQZ PR11, ENDLOOP
ITER2: LD   PF10, 0(PR11)
       ADDD PF14, PF10, PF02
       SD   PF14, 0(PR11)
       SUBI PR21, PR11, 8
       BEQZ PR21, ENDLOOP
ITER3: LD   PF20, 0(PR21)
       [...]

Mapping: R1 → PR01, F0 → PF00.
What was gained? An instruction may execute once all of its source registers have been written.
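The renaming rule (a fresh physical register on every write, sources read through the current mapping) can be sketched in Python. The tuple format and the `P0, P1, ...` naming are hypothetical and simpler than the slide's PRxx/PFxx scheme; registers never written inside the program keep their architected names as live-in values.

```python
def rename(program):
    """Rename architected registers to fresh physical registers on each write.

    program: list of (opcode, dest or None, [source registers]).
    Returns the same list with physical register names substituted.
    """
    mapping = {}      # architected name -> most recent physical name
    next_id = 0
    renamed = []
    for op, dest, srcs in program:
        # Sources are looked up first, so "SUBI R1, R1" reads the old R1.
        new_srcs = [mapping.get(s, s) for s in srcs]
        new_dest = None
        if dest is not None:
            new_dest = f"P{next_id}"      # endless supply of registers
            next_id += 1
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed


# One loop body in the style of the slide, repeated for two iterations.
body = [
    ("LD",   "F0", ["R1"]),
    ("ADDD", "F4", ["F0", "F2"]),
    ("SD",   None, ["F4", "R1"]),
    ("SUBI", "R1", ["R1"]),
    ("BEQZ", None, ["R1"]),
]
for instr in rename(body + body):
    print(instr)
```

After renaming, the second iteration's load writes a different physical register than the first iteration's, so only true data dependences remain between the two iterations.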
24
CS 152 Computer Architecture and Engineering
Lecture Dynamic Scheduling II
25
Rename stage close-up:
Input: 4 instructions specifying architected registers.
The rename stage (1) allocates new physical registers for destinations, (2) looks up physical register numbers for sources, and (3) handles rename dependences within the 4 issuing instructions, all in one clock cycle!
Output: 12 physical register numbers: 1 destination and 2 sources for each of the 4 instructions to be issued.
Diagram labels: “For mis-speculation recovery”, “Time-stamped”.
26
CS 152 Computer Architecture and Engineering
Lecture Dynamic Scheduling III
27
Micro-op translation example ...
ADC m32, r32:          // for a simple m32 address mode
Becomes:
LD  T1, 0(EBX);        // EBX register points to m32
ADD T1, T1, CF;        // CF is carry flag from EFLAGS
ADD T1, T1, r32;       // Add the specified register
ST  0(EBX), T1;        // Store result back to m32
Instruction traces of IA-32 programs show most executed instructions require 4 or fewer micro-ops. Translations for these ops are cast into logic gates, often over several pipeline cycles.
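To make the four micro-ops concrete, the sketch below executes them one at a time in Python. The dictionaries standing in for memory, registers, and flags are hypothetical, and the sketch models only the data path shown on the slide: it does not update EFLAGS, which a real ADC also does.

```python
MASK32 = 0xFFFFFFFF   # keep results to 32 bits


def adc_m32_r32(mem, regs, flags, src):
    """Run the slide's micro-op sequence for ADC m32, r32 (m32 addressed by EBX)."""
    t1 = mem[regs["EBX"]]                  # LD  T1, 0(EBX)
    t1 = (t1 + flags["CF"]) & MASK32       # ADD T1, T1, CF
    t1 = (t1 + regs[src]) & MASK32         # ADD T1, T1, r32
    mem[regs["EBX"]] = t1                  # ST  0(EBX), T1
    return t1


mem = {0x100: 7}
regs = {"EBX": 0x100, "EAX": 5}
flags = {"CF": 1}
print(adc_m32_r32(mem, regs, flags, "EAX"), mem)
```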
28
CS 152 Computer Architecture and Engineering
Lecture Dataflow
29
Dataflow stages of 21264
Idea: Write dataflow programs that reference physical registers, to execute on this machine. Input: Instructions that reference physical registers. Scoreboard: Tracks writes to physical registers.
30
CS 152 Computer Architecture and Engineering
Lecture GPU + SIMD + Vectors I
31
Pure data move opcode. Or, part of a math opcode.
32
CS 152 Computer Architecture and Engineering
Lecture GPU + SIMD + Vectors II
33
Assume MacBook Air ... 1366 x 768 screen ...
We are all zoomed in on Google Maps. Top pyramid image is 4K x 4K ... Idea: Keep only a 1366 x 768 window of top images in RAM ... Lets us cache a 1024 x 1024 window of the 11 PB Earth map in 34.7 MB!
34
Zoom all the way in ... Of all the stack images, the bottom stack image shows the smallest part of the 1 mile sq. patch of the Earth. Graphics hardware displays the bottom stack image, which fills the MacBook Air display. Hardware interpolation of stack levels.
[Figure axes: units of sq. miles, units of pixels, units of miles.]
35
CS 152 Computer Architecture and Engineering
Lecture Voxel Processing
36
A 3-D matrix of cubes, in object space (X,Y,Z).
After processing ...
8-bit density value stored for each cube (0 = “air”).
256^3 = 16 MB = 10 inch cube (for 1 mm voxels).
0.125 mm voxels? 8 GB.
Interesting to computer architects because n^3 grows so quickly!
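The storage figures on this slide follow from cubing the number of voxels per side. The Python sketch below reproduces them, assuming one byte per voxel (the 8-bit density value) and a cube 256 mm on a side; the function name is illustrative.

```python
MM_PER_INCH = 25.4


def voxel_bytes(side_mm, voxel_mm, bytes_per_voxel=1):
    """Bytes needed for a cube `side_mm` on a side, cut into `voxel_mm` voxels."""
    per_side = round(side_mm / voxel_mm)    # voxels along one edge
    return per_side ** 3 * bytes_per_voxel


coarse = voxel_bytes(256, 1.0)      # 1 mm voxels
fine = voxel_bytes(256, 0.125)      # 0.125 mm voxels

print(coarse / 2**20, "MB for a", round(256 / MM_PER_INCH, 2), "inch cube")
print(fine / 2**30, "GB at 0.125 mm")
```

Shrinking the voxel edge by 8x multiplies storage by 8^3 = 512, which is why the same cube goes from megabytes to gigabytes.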
37
CS 152 Computer Architecture and Engineering
Lecture Digital Imaging
38
Camera interface to the outside world
Simple power hookup.
8-bit Dout port, 54 MHz Clk.
1280 x 1024 @ 15 fps, or 640 x 512 @ 30 fps.
YCrCb 4:2:2.
Serial port to control the camera.
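A quick sanity check of the output port numbers, as a Python sketch. It assumes 8-bit samples, so YCrCb 4:2:2 averages 2 bytes per pixel, that the 8-bit port moves at most one byte per clock, and that blanking intervals are ignored. The pairing of each resolution with its frame rate is also read from the slide's layout and may not match the original figure.

```python
PORT_CLOCK_HZ = 54_000_000     # 54 MHz Clk; assumed one byte per clock on the 8-bit port
BYTES_PER_PIXEL_422 = 2        # YCrCb 4:2:2 with 8-bit samples: Y per pixel, Cr/Cb shared


def bytes_per_second(width, height, fps, bytes_per_pixel=BYTES_PER_PIXEL_422):
    """Raw pixel data rate, ignoring blanking and any protocol overhead."""
    return width * height * fps * bytes_per_pixel


full_res = bytes_per_second(1280, 1024, 15)
half_res = bytes_per_second(640, 512, 30)

for name, rate in (("1280 x 1024 @ 15 fps", full_res), ("640 x 512 @ 30 fps", half_res)):
    print(name, rate, "bytes/s, fits:", rate <= PORT_CLOCK_HZ)
```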
39
AWARE-2: Array of 98 phone camera modules (14 M-pixel) 1.3 G-pixel camera @ 3 frames/sec
40
On Thursday Mid-term II ... Ground rules ...
41
Mid-term: How to do well ...
Problem intro often features a lecture slide. If you have to teach yourself that slide during the test, you’re starting out behind. Getting the problem correct requires thinking on your feet to do a new design or analyze one given to you. There will not be “you can only get it if you do the reading” problems ... but the reading helps you understand how to think through the problem.
42
Mid-term: There may be math ...
No memorization: If we ask about Amdahl’s Law, we will show its definition lecture slide. Understanding is needed: A problem may require you to apply an equation to a design, etc. You may need to do: simple algebra and calculus, add a few numbers by hand, etc. Cannot use electronic devices ... more administrative info after we do some content.
43
When is it? Where is it? Ground rules.
9:30 AM sharp, Thursday May 1st, 306 Soda. Every-other-seat seating, except for the front rows, where every-seat is permitted. No blue-books needed. We will be handing out a paper test. Pencil is preferred. Pencils down at 10:55 AM, so we can collect papers before next class comes in.
44
When is it? Where is it? Ground rules.
No use of calculators, smartphones, laptops, etc ... during the exam. Closed-book, closed-notes. Just pencils, erasers. No consulting with other students. Restroom breaks are OK, but you’ll still need to hand in your exam at 10:55. Questions are reserved for serious concerns about a bug in the question.
45
Today - Midterm II Review Session
Study Tips
HW 2, problem by problem (if there is time)
HKN
46
On Thursday Mid-term II ... See you there !