1
CDA 3101 Introduction to Computer Organization
Course Requirements, Grading Criteria, Course Policies, Textbook, Course Outline
CDA 3101 – Fall. Copyright © Prabhat Mishra
2
Instructor: Prabhat Mishra, Ph.D.
Professor, CISE. Room: CSE 568. Phone: Office hours: Wednesday 12:00 – 2:00 pm
Brief history: at the University of Florida since 2004. Previously held positions at Texas Instruments, Intel, Motorola, Synopsys, and Sasken. Has published seven books and 150+ research articles.
3
Teaching Assistants
Zhixin Pan: 2nd-year Ph.D. student. Office hours: Monday 1:55 – 3:00, Room CSE 309.
Daniel Volya: 1st-year Ph.D. student. Office hours: Tue and Wed 5:10 – 6:00.
Miranda Overstreet: Senior undergraduate. Office hours: Monday 12:50 – 1:40.
4
Teaching Assistants
Audrey Servilio: Senior undergraduate. Office hours: Tuesday 12:50 – 1:40, Room CSE 309.
Nathan Edgar: Office hours: Wednesday 10:40 – 11:30.
5
Lectures and Discussions
Course Page: visit regularly for updates and announcements.
Lectures: Mon/Wed/Fri :30 – 9:20 am, CAR 100.
Discussions:
Section, Thu 9:35 – 10:25 am, CSE E221
Section 29C7, Thu 10:40 – 11:30 am, CSE E221
Section 29C8, Thu 11:45 – 12:35 pm, CSE E221
Section 29DA, Thu 12:50 – 1:40 pm, CSE E221
Section, Thu 1:55 – 2:45 pm, CSE E221
Section, Thu 3:00 – 3:50 pm, CSE E221
Section, Thu 4:05 – 4:55 pm, CSE E220
Please be on time.
6
Course Prerequisites (Required)
CIS 3020 (Introduction to CIS) or CIS (Programming Fundamentals for CIS Majors 2)
MAC 2233 (Survey of Calculus 1), MAC 2311 (Analytic Geometry and Calculus 1), or MAC 3472 (Honors Calculus 1)
Good programming skills using C, C++, or Java: necessary to do well on the course projects
Please check with the instructor if you do not have the required prerequisites.
7
When to consider dropping it?
You are not interested in learning or reading the book.
You plan to collaborate even on individual assignments.
You do not have the required background (required courses and programming skills).
The word "software" or "hardware" scares you.
You need a specific grade in this course to graduate but are not willing to put in the required effort.
Please meet me if your major is in the pink/red category:
Perfect fit: Computer Science, Computer Engineering, Electrical Engineering
Needs work: Civil, Mechanical, Chemical, Biomedical, Engineering Studies
Too far: Arts, Philosophy, Political Science, Linguistics, Business, Chemistry, Statistics, Mathematics, etc.
8
Grading Criteria
Five Homeworks – 25%: quantitative and analytical questions
Two Projects – 10%: programming in ARM assembly; developing an ARM simulator (500+ lines of code)
Participation and Quizzes – 5%: attend and actively participate in classes and discussions
Two Exams: Midterm – 25%; Comprehensive Final – 35%
Grading will be based on your overall score:
A (91 – 100), A- (86 – 90), B+ (81 – 85), B (76 – 80), B- (72 – 75), C+ (68 – 71), C (64 – 67), C- (60 – 63), D+ (56 – 59), D (52 – 55), D- (48 – 51), E (0 – 47)
9
How do I get an “A” grade? Make sure your overall score is 91+
How do you get more than 91%?
Make sure you have the required background.
Attend every lecture and actively participate in discussions.
Do not forget to submit the homeworks and projects on time.
Come prepared for the next lecture; make sure you understood the previous lectures completely.
Read the book along with the lectures and solve exercises: the book has extra material, more examples, and lots of exercises.
Use both TA and instructor office hours effectively, for technical discussions (not for "re-grading only").
Solve homeworks and projects on your own.
(Diagram: expected grade rises from the C/B range toward an A as effort grows from lectures + homeworks + projects, to also reading the book and understanding the examples, to also solving exercises and joining discussions.)
10
Course Policies
No grades for late submissions.
Complete homeworks/projects on your own; "zero tolerance" policy towards cheating.
Exams are closed book/notes; allowed: a hand-written note sheet (one side for the midterm, both sides for the final) and a calculator.
Re-grading requests within a week, i.e., one week from when the grade is available.
Attendance, cell phones, ...: the course webpage has details.
11
Required Textbook: FREE ebook via the UF Libraries Textbook Sustainability Project
Computer Organization and Design: The Hardware/Software Interface, ARM Edition
David Patterson and John L. Hennessy
Morgan Kaufmann Publishers, 2017. ISBN:
12
Tentative Schedule (Assigned / Due)
Homework 1: Sep 12 / Sep 18 (11:30 PM)
Project 1: Sep / Oct 09 (11:30 PM)
Homework 3: Oct / Oct 16 (11:30 PM)
Homework 4: Oct / Oct 23 (11:30 PM)
Midterm: Oct, :30 – 9:20 am in CAR 100
Homework 5: Oct / Nov 06 (11:30 PM)
Project 2: Nov / Nov 20 (11:30 PM)
Final: Dec, :30 – 2:30 pm in CAR 100
13
Course Outline
Computer Abstractions and Technology
Digital Logic and Number Representation
Instructions: Language of the Computer
Arithmetic for Computers
The Processor
Memory Hierarchy
Parallel Processors from Client to Cloud
Processor Validation and Verification
Security Attacks and Countermeasures
14
Introduction
This course is all about how computers work. But what do we mean by a computer?
Different types: desktops, servers, embedded devices
Different uses: automobiles, graphics, finance, genomics, ...
Different manufacturers: ARM, Intel, Apple, IBM, ...
Different underlying technologies and different costs!
Analogy: consider a course on "automotive vehicles"
Many similarities from vehicle to vehicle (e.g., wheels)
Huge differences from vehicle to vehicle (e.g., gas vs. electric)
Best way to learn: focus on a specific instance and learn how it works, while learning general principles and historical perspectives.
15
The PostPC Era
16
What You Will Learn
How programs are translated into machine language and how the hardware executes them
The hardware/software interface
What determines program performance and how it can be improved
How hardware designers improve performance
What parallel processing is
17
Design Complexity
Exponential growth: the number of transistors doubles every couple of years.
18
Moore's Law
In 1965, Gordon Moore predicted that the number of transistors that can be integrated on a die would double every 18 to 24 months, i.e., grow exponentially with time.
Amazingly visionary: the million-transistor-per-chip barrier was crossed in the 1980's.
2,300 transistors, 1 MHz clock (Intel 4004)
16 million transistors (UltraSPARC III)
42 million transistors, 2 GHz clock (Intel Xeon), 2001
55 million transistors, 3 GHz, 130 nm technology, 250 mm² die (Intel Pentium 4)
140 million transistors (HP PA-8500)
(Aside: 1 Tbyte = 2^40 bytes, roughly 10^12 bytes.)
Note that Moore's law is not about speed predictions but about chip complexity.
19
Technology and Demand
Technology: the number of transistors doubles every 2 years.
Demand: communication, multimedia, entertainment, networking.
The result is exponential growth of design complexity and of verification complexity.
20
Who Wants to Be a Millionaire?
You double your investment every day, starting with one cent. How long does it take to become a millionaire?
(a) 20 days (b) 27 days (c) 37 days (d) 365 days (e) Lifetime++
21
Who Wants to Be a Millionaire?
You double your investment every day, starting with one cent.
20 days: one million cents
27 days: millionaire
37 days: billionaire
Transistor counts double roughly every 18 months. This growth rate is hard to imagine, and most people underestimate it.
Believe it or not: each of us has more than a million ancestors in the last 20 generations.
22
Computer Market
Desktop: driven by price-performance; $ – $10,000 [$100 – $1,000 per processor]
Server: throughput, availability, scalability; $10K – $10M [$200 – $2,000 per processor]
Embedded systems: application specific; low cost, low power, real-time performance; $10 – $100,000 [$ – $200 per processor]
23
An Example Embedded System
Digital Camera Block Diagram
24
Components of Embedded Systems
(Block diagram: processor, coprocessors, ASICs, memory, controllers, interfaces, analog/digital converters, and software (application programs).)
25
Microprocessor Performance Trends
Performance relative to the VAX-11/780, measured with SPECint benchmarks. Early gains were due to technological advances, later gains due to advances in architecture; the recent slowdown is due to the limits of power and available ILP.
27
Power and Energy: E = P × t. In many cases, faster execution means less energy, but the opposite may be true if power has to be increased to allow faster execution.
28
Reducing Energy Consumption
Thermal images of a Pentium and a Crusoe processor running the same multimedia application. Infrared cameras (FLIR) can be used to detect the thermal distribution.
29
Understanding Performance
Algorithm: determines the number of operations executed
Programming language, compiler, architecture: determine the number of machine instructions executed per operation
Processor and memory system: determine how fast instructions are executed
I/O system (including OS): determines how fast I/O operations are executed
30
Below Your Program (§1.3)
Application software: written in a high-level language
System software:
Compiler: translates high-level-language code to machine code
Operating system: service code for handling input/output, managing memory and storage, scheduling tasks and sharing resources
Hardware: processor, memory, I/O controllers
31
Levels of Program Code
High-level language: level of abstraction closer to the problem domain; provides productivity and portability
Assembly language: textual representation of instructions
Hardware representation: binary digits (bits); encoded instructions and data
32
Components of a Computer (§1.4 Under the Covers)
The BIG Picture: the same components appear in all kinds of computers (desktop, server, embedded)
Input/output includes:
User-interface devices: display, keyboard, mouse
Storage devices: hard disk, CD/DVD, flash
Network adapters: for communicating with other computers
33
Touchscreen
A PostPC device: supersedes the keyboard and mouse
Resistive and capacitive types; most tablets and smart phones use capacitive
Capacitive screens allow multiple simultaneous touches
34
Through the Looking Glass
LCD screen: picture elements (pixels); mirrors the content of the frame buffer memory
35
Opening the Box
Capacitive multitouch LCD screen; 3.8 V, 25 watt-hour battery; computer board
36
Inside the Processor (CPU)
Datapath: performs operations on data
Control: sequences the datapath, memory, ...
Cache memory: small, fast SRAM memory for immediate access to data
37
Inside the Processor: the Apple A5
38
Abstractions (The BIG Picture)
Abstraction helps us deal with complexity by hiding lower-level detail
Instruction set architecture (ISA): the hardware/software interface
Application binary interface (ABI): the ISA plus the system software interface
Implementation: the details underlying an interface
39
A Safe Place for Data
Volatile main memory: loses instructions and data when power is off
Non-volatile secondary memory: magnetic disk, flash memory, optical disk (CD-ROM, DVD)
40
Networks
Communication, resource sharing, nonlocal access
Local area network (LAN): Ethernet
Wide area network (WAN): the Internet
Wireless network: WiFi, Bluetooth
41
Technology Trends (§1.5 Technologies for Building Processors and Memory)
Electronics technology continues to evolve: increased capacity and performance, reduced cost (e.g., DRAM capacity).
Year / Technology / Relative performance/cost
1951 / Vacuum tube / 1
1965 / Transistor / 35
1975 / Integrated circuit (IC) / 900
1995 / Very large scale IC (VLSI) / 2,400,000
2013 / Ultra large scale IC / 250,000,000,000
42
Semiconductor Technology
Silicon is a semiconductor; materials are added to transform regions into conductors, insulators, or switches.
43
Manufacturing ICs
Yield: the proportion of working dies per wafer
44
Intel Core i7 Wafer
300 mm wafer, 280 chips, 32 nm technology; each chip is 20.7 × 10.5 mm
45
Intel Pentium 4 Microprocessor
46
Integrated Circuit Cost
Cost per die has a nonlinear relation to die area and defect rate.
Wafer cost and area are fixed.
Defect rate is determined by the manufacturing process.
Die area is determined by architecture and circuit design.
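The cost formulas on this slide were images and did not survive extraction. The sketch below restates the standard die-cost model from the textbook in C; the input values (wafer cost, defect rate, die size) are made-up examples, and the dies-per-wafer estimate ignores edge loss.

    #include <stdio.h>

    /* Standard die-cost model (Patterson & Hennessy, Ch. 1):
       cost per die  = wafer cost / (dies per wafer * yield)
       dies per wafer ~ wafer area / die area
       yield = 1 / (1 + defects per area * die area / 2)^2         */
    int main(void) {
        double wafer_cost  = 5000.0;    /* assumed wafer cost in dollars        */
        double wafer_area  = 70686.0;   /* 300 mm wafer: pi * 150^2 mm^2        */
        double die_area    = 217.0;     /* e.g., a 20.7 mm x 10.5 mm chip       */
        double defect_rate = 0.001;     /* assumed defects per mm^2             */

        double dies_per_wafer = wafer_area / die_area;
        double tmp   = 1.0 + defect_rate * die_area / 2.0;
        double yield = 1.0 / (tmp * tmp);
        double cost_per_die = wafer_cost / (dies_per_wafer * yield);

        printf("dies/wafer ~ %.0f, yield = %.2f, cost/die = $%.2f\n",
               dies_per_wafer, yield, cost_per_die);
        return 0;
    }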
47
Defining Performance (§1.6 Performance)
Which airplane has the best performance?
48
Response Time and Throughput
Response time: how long it takes to do a task
Throughput: total work done per unit time, e.g., tasks/transactions/... per hour
How are response time and throughput affected by replacing the processor with a faster version? By adding more processors?
We'll focus on response time for now.
49
Relative Performance
Define Performance = 1 / Execution Time
"X is n times faster than Y" means Performance_X / Performance_Y = Execution Time_Y / Execution Time_X = n
Example: a program takes 10 s on A and 15 s on B
Execution Time_B / Execution Time_A = 15 s / 10 s = 1.5
So A is 1.5 times faster than B
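Not part of the original slides: a minimal C sketch of the speedup calculation above (the function name is mine).

    #include <stdio.h>

    /* Speedup of X over Y: how many times faster X is than Y. */
    double speedup(double exec_time_x, double exec_time_y) {
        return exec_time_y / exec_time_x;   /* = performance_x / performance_y */
    }

    int main(void) {
        /* Program runs in 10 s on A and 15 s on B (example from the slide). */
        printf("A is %.1f times faster than B\n", speedup(10.0, 15.0));
        return 0;
    }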
50
Measuring Execution Time
Elapsed time: total response time, including all aspects (processing, I/O, OS overhead, idle time); determines system performance
CPU time: time spent processing a given job; discounts I/O time and other jobs' shares; comprises user CPU time and system CPU time
Different programs are affected differently by CPU and system performance
51
CPU Clocking
The operation of digital hardware is governed by a constant-rate clock; each clock cycle performs data transfer and computation, then updates state.
Clock period: the duration of a clock cycle, e.g., 250 ps = 0.25 ns = 250 × 10^-12 s
Clock frequency (rate): cycles per second, e.g., 4.0 GHz = 4000 MHz = 4.0 × 10^9 Hz
52
CPU Time
Performance is improved by reducing the number of clock cycles or by increasing the clock rate.
Hardware designers must often trade off clock rate against cycle count.
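The CPU-time formula on this slide was an image and did not survive extraction. The standard relation from the textbook is CPU Time = CPU Clock Cycles × Clock Cycle Time = CPU Clock Cycles / Clock Rate; a minimal C sketch with made-up numbers:

    #include <stdio.h>

    int main(void) {
        double clock_cycles = 20e9;    /* assumed: 20 x 10^9 cycles for the program */
        double clock_rate   = 2e9;     /* assumed: 2 GHz clock                       */

        /* CPU time = clock cycles / clock rate = clock cycles * cycle time */
        double cpu_time = clock_cycles / clock_rate;
        printf("CPU time = %.1f s\n", cpu_time);   /* 10.0 s */
        return 0;
    }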
53
CPU Time Example
Computer A: 2 GHz clock, 10 s CPU time
Designing Computer B: aim for 6 s CPU time; a faster clock is possible, but it causes 1.2 × as many clock cycles
How fast must Computer B's clock be? (The arithmetic is sketched below.)
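The worked solution on this slide was an image; a minimal C sketch of the arithmetic, assuming the standard CPU-time relation:

    #include <stdio.h>

    int main(void) {
        double rate_a = 2e9;                   /* Computer A: 2 GHz            */
        double time_a = 10.0;                  /* Computer A: 10 s CPU time    */
        double time_b = 6.0;                   /* target CPU time for B        */

        double cycles_a = time_a * rate_a;     /* 20 x 10^9 cycles             */
        double cycles_b = 1.2 * cycles_a;      /* B needs 1.2x the cycles      */
        double rate_b   = cycles_b / time_b;   /* required clock rate for B    */

        printf("Computer B needs a %.1f GHz clock\n", rate_b / 1e9);  /* 4.0 GHz */
        return 0;
    }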
54
Instruction Count and CPI
Instruction count for a program: determined by the program, the ISA, and the compiler
Average cycles per instruction (CPI): determined by the CPU hardware; if different instructions have different CPIs, the average CPI is affected by the instruction mix
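The formulas on this slide were images; the standard relation from the textbook is CPU Time = Instruction Count × CPI × Clock Cycle Time = Instruction Count × CPI / Clock Rate. A minimal C sketch with made-up numbers:

    #include <stdio.h>

    int main(void) {
        double instruction_count = 1e9;   /* assumed: 10^9 instructions  */
        double cpi = 2.0;                 /* assumed: average CPI of 2.0 */
        double clock_rate = 4e9;          /* assumed: 4 GHz clock        */

        /* CPU time = instruction count * CPI / clock rate
                    = instruction count * CPI * clock cycle time */
        double cpu_time = instruction_count * cpi / clock_rate;
        printf("CPU time = %.2f s\n", cpu_time);   /* 0.50 s */
        return 0;
    }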
55
CPI Example
Computer A: cycle time = 250 ps, CPI = 2.0
Computer B: cycle time = 500 ps, CPI = 1.2
Same ISA. Which is faster, and by how much? A is faster; the ratio is worked out in the sketch below.
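The algebra on this slide was an image; assuming the same instruction count on both machines (it cancels in the ratio), a minimal C sketch of the comparison:

    #include <stdio.h>

    int main(void) {
        double cycle_time_a = 250e-12, cpi_a = 2.0;   /* Computer A */
        double cycle_time_b = 500e-12, cpi_b = 1.2;   /* Computer B */

        /* CPU time = I * CPI * cycle time; the instruction count I cancels
           in the ratio, so compare time per instruction.                  */
        double per_instr_a = cpi_a * cycle_time_a;    /* 500 ps */
        double per_instr_b = cpi_b * cycle_time_b;    /* 600 ps */

        printf("A is %.1f times faster than B\n", per_instr_b / per_instr_a); /* 1.2 */
        return 0;
    }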
56
CPI in More Detail
If different instruction classes take different numbers of cycles, the overall CPI is a weighted average: weight each class's CPI by its relative frequency, i.e., its share of the instruction count.
57
CPI Example
Alternative compiled code sequences use instructions from classes A, B, and C.
Class: A, B, C
CPI for class: 1, 2, 3
IC in sequence 1: 2, 1, 2
IC in sequence 2: 4, 1, 1
Sequence 1: IC = 5; clock cycles = 2×1 + 1×2 + 2×3 = 10; average CPI = 10/5 = 2.0
Sequence 2: IC = 6; clock cycles = 4×1 + 1×2 + 1×3 = 9; average CPI = 9/6 = 1.5
58
Performance Summary (The BIG Picture)
Performance depends on:
Algorithm: affects IC, possibly CPI
Programming language: affects IC, CPI
Compiler: affects IC, CPI
Instruction set architecture: affects IC, CPI, Tc
59
Power Trends (§1.7 The Power Wall)
In CMOS IC technology, dynamic power = capacitive load × voltage² × frequency.
In the trend data behind this slide, clock frequency grew by roughly ×1000 while supply voltage dropped from 5 V to 1 V, so power grew by only about ×30.
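Not part of the original slides: a minimal C sketch of the dynamic-power relation above; the capacitive load and the two operating points are made-up illustrations.

    #include <stdio.h>

    /* Dynamic power in CMOS: P = capacitive load * voltage^2 * frequency */
    double dynamic_power(double cap_load, double voltage, double freq) {
        return cap_load * voltage * voltage * freq;
    }

    int main(void) {
        double cap   = 1e-9;                             /* assumed 1 nF switched load */
        double p_old = dynamic_power(cap, 5.0, 3e6);     /* 5 V, ~3 MHz era            */
        double p_new = dynamic_power(cap, 1.0, 3e9);     /* 1 V, ~3 GHz era            */
        /* frequency up ~1000x, voltage down 5x -> power up ~1000/25 = 40x */
        printf("power ratio ~ %.0fx\n", p_new / p_old);
        return 0;
    }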
60
Reducing Power
Suppose a new CPU has 85% of the capacitive load of the old CPU, together with a 15% voltage reduction and a 15% frequency reduction.
The power wall: we can't reduce voltage further and we can't remove more heat. How else can we improve performance?
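The worked power ratio on this slide was an image; a minimal C sketch of the calculation using the dynamic-power relation:

    #include <stdio.h>

    int main(void) {
        /* New CPU relative to old: 0.85x capacitive load,
           0.85x voltage, 0.85x frequency.                  */
        double cap_ratio  = 0.85;
        double volt_ratio = 0.85;
        double freq_ratio = 0.85;

        /* P_new / P_old = (C * V^2 * f)_new / (C * V^2 * f)_old = 0.85^4 */
        double power_ratio = cap_ratio * volt_ratio * volt_ratio * freq_ratio;
        printf("P_new / P_old = %.2f\n", power_ratio);   /* about 0.52 */
        return 0;
    }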
61
Uniprocessor Performance
Single-processor performance growth is constrained by power, instruction-level parallelism, and memory latency.
62
Multiprocessors
Multicore microprocessors: more than one processor per chip
They require explicitly parallel programming. Compare with instruction-level parallelism, where the hardware executes multiple instructions at once, hidden from the programmer.
Parallel programming is hard to do: programming for performance, load balancing, optimizing communication and synchronization
63
SPEC CPU Benchmark
Benchmarks are programs used to measure performance, supposedly typical of the actual workload.
The Standard Performance Evaluation Corporation (SPEC) develops benchmarks for CPU, I/O, Web, ...
SPEC CPU2006: elapsed time to execute a selection of programs; negligible I/O, so it focuses on CPU performance
Results are normalized relative to a reference machine and summarized as the geometric mean of the performance ratios
CINT2006 (integer) and CFP2006 (floating-point)
64
CINT2006 for the Intel Core i7 920
65
SPEC Power Benchmark
Measures the power consumption of a server at different workload levels
Performance: ssj_ops/sec; Power: Watts (Joules/sec)
66
SPECpower_ssj2008 for the Xeon X5650
67
Pitfall: Amdahl's Law (§1.10 Fallacies and Pitfalls)
Pitfall: improving one aspect of a computer and expecting a proportional improvement in overall performance.
Example: multiply operations account for 80 s of a 100 s program. How much must multiply performance improve to get a 5× overall speedup? It can't be done: even if multiply took no time at all, the remaining 20 s caps the overall speedup at 5×, which no finite improvement reaches (see the sketch below).
Corollary: make the common case fast.
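The Amdahl's Law algebra on this slide was an image; a minimal C sketch of the reasoning, using improved time = (affected time / improvement factor) + unaffected time:

    #include <stdio.h>

    /* Amdahl's Law: improved time = affected / improvement + unaffected */
    double improved_time(double affected, double unaffected, double improvement) {
        return affected / improvement + unaffected;
    }

    int main(void) {
        double affected = 80.0, unaffected = 20.0;  /* 80 s of multiply in a 100 s run */
        double n;
        for (n = 2.0; n <= 1e6; n *= 10.0)
            printf("multiply %.0fx faster -> total %.2f s (%.2fx overall)\n",
                   n, improved_time(affected, unaffected, n),
                   100.0 / improved_time(affected, unaffected, n));
        /* Overall speedup approaches but never reaches 100/20 = 5x. */
        return 0;
    }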
68
Fallacy: Low Power at Idle
Look back at the i7 power benchmark: 258 W at 100% load, 170 W at 50% load (66%), 121 W at 10% load (47%)
Google data centers mostly operate at 10% – 50% load, and at 100% load less than 1% of the time
Consider designing processors to make power proportional to load
69
Pitfall: MIPS as a Performance Metric
MIPS: Millions of Instructions Per Second
Doesn't account for differences in ISAs between computers or differences in complexity between instructions
CPI varies between programs on a given CPU
70
Abstraction Layers in Modern Systems
Application
Algorithm
Programming Language
Operating System / Virtual Machines
Instruction Set Architecture (ISA)
Microarchitecture
Gates / Register-Transfer Level (RTL)
Circuits
Devices
Physics
(Side annotations in the original figure mark the original domain of the computer architect ('50s-'80s) and the broader domain of recent computer architecture ('90s).)
71
Example Machine Organization
Let's start with a simple machine organization, with a workstation as the design target:
25% of cost on the processor
25% of cost on memory (minimum memory size)
The rest on I/O devices, power supplies, and the box
(Block diagram: the computer consists of a CPU (control + datapath), memory, and input/output devices.)
72
The Big Picture
High-level language program (in C):

    swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

The C compiler translates it into an assembly language program (for MIPS):

    swap: sll  $2, $5, 2
          add  $2, $4, $2
          lw   $15, 0($2)
          lw   $16, 4($2)
          sw   $16, 0($2)
          sw   $15, 4($2)
          jr   $31

The assembler then produces the machine (object) code (for MIPS): the binary 0s and 1s the hardware actually executes.

Speaker notes: The computer only understands zeros and ones, i.e., instructions made of 0s and 1s. Early programmers found representing machine instructions in a symbolic notation (assembly language) easier, and developed programs that translate from assembly to machine code. Eventually, programmers found working even in assembly too tedious, so they migrated to higher-level languages and developed compilers that translate from the higher-level languages to assembly. Higher-level languages allow the programmer to think in a more natural language, improve programmer productivity (more understandable code that is easier to debug and validate), improve program maintainability, and allow programs to be independent of the computer on which they are developed (compilers and assemblers can translate high-level language programs to the binary instructions of any machine). They also enabled optimizing compilers that produce very efficient assembly code tuned for the target machine.
73
(von Neumann) Processor Organization
Control needs to read instructions from memory and issue signals that control the information flow between the datapath components and determine what operations they perform.
The datapath needs to have the components (the functional units and storage, e.g., the register file) needed to execute instructions, and interconnects so that instructions and data can flow between those components.
(Block diagram: CPU = control + datapath, connected to memory and input/output devices, cycling through Fetch, Decode, Exec.)
74
Input Device Inputs Code
The input devices bring the object code and input data from the outside world into the computer.
75
Code Stored in Memory
These data are kept in the computer's memory until ... Note that memory holds both INSTRUCTIONS and DATA, and you can't tell the difference: they are both just 32-bit strings of zeros and ones.
76
Processor Fetches an Instruction
The processor fetches an instruction from memory and processes it. Where does it fetch from?
77
Control Decodes the Instruction
The processor's control decodes the instruction to figure out what to tell the datapath to do, i.e., what to execute.
78
Datapath Executes the Instruction
The datapath executes the instruction as directed by control; the operation of the datapath is controlled by the processor's controller (in the example shown, the contents of Reg #4 are ADDed to the contents of Reg #2, and the result is put in Reg #2).
79
What Happens Next?
The next instruction is fetched, and the fetch, decode, execute cycle repeats. (How does the processor know where that next instruction is located in memory?)
80
Output Data Stored in Memory
The data to be output are kept in the computer's memory until ... At program completion, the data to be output reside in memory.
81
Output Device Outputs Data
The output device outputs the data in memory to the output medium.
82
Eight Great Ideas
Design for Moore's Law
Use abstraction to simplify design
Make the common case fast
Performance via parallelism
Performance via pipelining
Performance via prediction
Hierarchy of memories
Dependability via redundancy
83
Why learn all this?
To know how a computer executes a program that you have written
It makes it possible to write computer programs that are faster, smaller, more secure, more energy-efficient, and less prone to errors
A stepping stone to learning how to design and verify complex computer architectures
84
Notes
Course webpage:
Office hours start from the second week.
Discussion sessions start from the second week.
Please read Chapter 1, Sections 1.1 – 1.7.
e-Learning has been set up already.
The next lecture will cover digital logic design. If you do not have any digital logic background, read Appendix A: The Basics of Logic Design.