COSC 1306 COMPUTER SCIENCE AND PROGRAMMING Jehan-François Pâris


Presentation on theme: "COSC 1306 COMPUTER SCIENCE AND PROGRAMMING Jehan-François Pâris"— Presentation transcript:

1 COSC 1306 COMPUTER SCIENCE AND PROGRAMMING Jehan-François Pâris jfparis@uh.edu

2 CHAPTER VIII COMPUTER ORGANIZATION

3 WARNING These slides contain too much material. I need to remove some of them before the end of the semester. After it's done, this warning will be removed.

4 Chapter Overview We will focus on the main challenges of computer architecture • Managing the memory hierarchy: caching, multiprogramming, virtual memory • Speeding up the CPU: pipelined and multicore architectures • Protecting user computations and data: memory protection, privileged instructions

5 Managing the Memory Hierarchy

6 The memory hierarchy CPU registers Main memory (RAM) Secondary storage (Disks) Mass storage (Often offline)

7 CPU registers Inside the processor itself Some can be accessed by our programs • Others cannot Can be read or written in one processor cycle • If processor speed is 2 GHz: 2,000,000,000 cycles per second, or 2 cycles per nanosecond

8 Main memory (I) Byte accessible  Each group of 8 bits has an address Dynamic random access memory (DRAM)  Slower but much cheaper than static RAM  Contents must be refreshed every 64 ms Otherwise its contents are lost:  DRAM is volatile

9 Main memory (II) Memory is organized as a sequence of 8-bit bytes • Each byte has an address • Each byte can contain one character (Roman alphabet with accents) [Figure: a row of bytes at addresses 0 through 15]

10 Main memory (III) Groups of four bytes starting at addresses that are multiples of 4 form words • Better suited to hold numbers • Also have half-words, double words, quad words [Figure: words starting at addresses 0, 4, 8 and 12]

11 Accessing main memory (I) When we look for some item, our search criteria can include the location of the item • The book on the table • The student behind you, … More often our main search criterion is some attribute of the item • The color of a folder • The title or the authors of a book • The name of an individual

12 Accessing main memory (II) Computers always access their memory by location • The byte at address 4095 • The word at location 512 (states the address of the first byte in the word) Why? • Fastest way to access an item [Figure: the word at location 512 occupies bytes 512, 513, 514 and 515]
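
The word addressing described above comes down to simple arithmetic. A minimal Python sketch, assuming 4-byte words as on the slides; the function names `is_word_aligned` and `word_start` are illustrative only and do not come from the course material.

```python
WORD_SIZE = 4  # bytes per word, as stated on the slides

def is_word_aligned(address):
    """True when the address is a multiple of the word size."""
    return address % WORD_SIZE == 0

def word_start(address):
    """Address of the first byte of the word that contains this byte."""
    return address - (address % WORD_SIZE)

# Bytes 512, 513, 514 and 515 all belong to the word at location 512.
print([word_start(a) for a in (512, 513, 514, 515)])
```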

13 An analogy (I) Some research libraries have a closed-stack policy  Only library employees can access the stacks  Patrons wanting to get an item fill a form containing a call number specifying the location of the item Could be Library of Congress classification if the stacks are organized that way.

14 An analogy (II) The procedure followed by the employee fetching the book is fairly simple • Go to the location specified by the book's call number • Check if the book is there • Bring it to the patron

15 An analogy (III) The memory operates in an even simpler manner  Always fetch the contents of the addressed bytes Junk or not

16 Disk drives (I) Sole part of computer architecture with moving parts: Data stored on circular tracks of a disk  Spinning speed between 5,400 and 15,000 rotations per minute  Accessed through a read/write head

17 Disk drives (II) Platter R/W head Arm Servo

18 Disk drives (III) Data can be accessed by blocks of 4KB, 8 KB, …  Depends on disk partition parameters User selectable To access a disk block  Read/write head must be over the right track Seek time  Data to be accessed must pass under the head Rotational latency

19 Estimating the rotational latency On the average half a disk rotation If disk spins at 15,000 rpm  250 rotations per second  Half a rotation corresponds to 2ms Most desktops have disks that spin at 7,200 rpm Most notebooks have disks that spin at 5,400 or 7,200 rpm
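
The estimate above can be reproduced for any spinning speed. This is a small Python sketch of the same arithmetic (average latency is half a rotation); the function name is an assumption made for illustration.

```python
def average_rotational_latency_ms(rpm):
    """Average wait, in milliseconds, for the data to pass under the head."""
    rotations_per_second = rpm / 60
    full_rotation_ms = 1000 / rotations_per_second
    return full_rotation_ms / 2  # on average, half a rotation

for rpm in (15000, 7200, 5400):
    print(rpm, "rpm:", round(average_rotational_latency_ms(rpm), 2), "ms")
```

At 15,000 rpm this gives the 2 ms quoted on the slide; slower desktop and notebook disks wait roughly two to three times longer.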

20 Accessing disk contents Each block on a disk has a unique address  Normally a single number Logical block addressing (LBA)  Older PCs used a different scheme

21 The memory hierarchy (II)
Level  Device                          Access time
1      Fastest registers (2 GHz CPU)   0.5 ns
2      Main memory                     10-70 ns
3      Secondary storage (disk)        7 ms
4      Mass storage (CD-ROM library)   a few s

22 The memory hierarchy (III) To make sense of these numbers, let us consider an analogy

23-26 Writing a paper (I) to (IV)
Level  Resource             Access time
1      Open book on desk    1 s
2      Book on desk         20-140 s
3      Book in library      162 days
4      Book far away        63 years
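
The figures in the analogy follow from stretching every access time by the same factor, so that the 0.5 ns register access becomes 1 s. A short Python sketch of that scaling, assuming "a few s" of mass-storage access is taken as 1 s and a year is 365 days; the variable names are illustrative.

```python
SCALE = 1.0 / 0.5e-9   # 0.5 ns of computer time becomes 1 s of human time

def scaled_seconds(access_time_s):
    """Human-scale equivalent, in seconds, of a hardware access time."""
    return access_time_s * SCALE

SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

main_memory = (scaled_seconds(10e-9), scaled_seconds(70e-9))  # book on desk
disk_days = scaled_seconds(7e-3) / SECONDS_PER_DAY            # book in library
mass_storage_years = scaled_seconds(1.0) / SECONDS_PER_YEAR   # book far away

print(main_memory, disk_days, mass_storage_years)
```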

27 The two gaps (I) Gap between CPU and main memory speeds:  Will add intermediary levels L1, L2, and L3 caches  Will store contents of most recently accessed memory addresses Most likely to be needed in the future  Purely hardware solution Software does not see it

28 The two gaps (II) Gap between main memory and disk speeds: • Will store in main memory contents of most recently accessed disk blocks (most likely to be needed in the future) • Purely software solution, managed by the OS; user programs do not see it • Can also replace the hard disk by faster flash memory

29 Why? Having hardware handle an issue  Complicates hardware design  Offers a very fast solution  Best approach for very frequent actions Letting software handle an issue  Cheaper  Has a much higher overhead  Best approach for less frequent actions

30 Will the problem go away? It will become worse  RAM access times are not improving as fast as CPU power  Disk access times are limited by rotational speed of disk drive

31 What are the solutions? To bridge the CPU/DRAM gap:  Interposing between the CPU and the DRAM smaller, faster memories that cache the data that the CPU currently needs Cache memories Managed by the hardware and invisible to the software (OS included)

32 What are the solutions? To bridge the DRAM/disk drive gap: Storing in main memory the data blocks that are currently accessed (I/O buffer) Managing memory space and disk space as a single resource (Virtual memory) I/O buffer and virtual memory are managed by the OS and invisible to the user processes

33 Why do these solutions work? Locality principle:  Spatial locality: at any time a process only accesses a small portion of its address space  Temporal locality: this subset does not change too frequently

34 The true memory hierarchy CPU registers; L1, L2 and L3 caches; Main memory (RAM); Secondary storage (Disks); Mass storage (Often offline)

35 Handling the CPU/DRAM speed gap

36 The technology Caches use faster static RAM (SRAM)  (D flipflops if you really want to know) Can have  Separate caches for instructions and data Great for pipelining  A unified cache

37 Basic principles Assume we want to store in a faster memory 2^n words that are currently accessed by the CPU • Can be instructions or data or even both When the CPU needs to fetch an instruction or load a word into a register • It will look first into the cache • Can have a hit or a miss

38 Cache hits Occur when the requested word is found in the cache  Cache avoided a memory access  CPU can proceed

39 Cache misses Occur when the requested word is not found in the cache  Will need to access the main memory  Will bring the new word into the cache Must make space for it by expelling one of the cache entries  Need to decide which one

40 Cache design challenges Cache contains a small subset of memory addresses Must find a very fast access mechanism  No linear search, no binary search  Would like to have an associative memory Can search by content all memory entries in parallel  Like human brains do

41 An associative memory Search for “ice cream” COSC 1306 program Finding a parking spot My last ice cream Other ice cream moment Found

42 An analogy (I) Let us go back to our closed-stack library example • Librarians have noted that some books get asked for again and again Want to put them closer to the circulation desk • Would result in much faster service • The problem is how to locate these books They will not be at the right location!

43 An analogy (II) Librarians come up with a great solution • They put behind the circulation desk shelves with 100 book slots numbered from 00 to 99 • Each slot is a home for the most recently requested book that has a call number whose last two digits match the slot number 3141593 can only go in slot 93 1234567 can only go in slot 67

44 An analogy (III) The call number of the book I need is 3141593 Let me see if it's in bin 93

45 An analogy (IV) To let the librarian do her job each slot must contain either • Nothing or • A book and its reference number There are many books whose reference number ends in 93 or 67 or any two given digits

46 An analogy (V) Could I get this time the book whose call number is 4444493? Sure

47 An analogy (VI) This time the librarian will • Go to bin 93 • Find it contains a book with a different call number She will • Bring back that book to the stacks • Fetch the new book

48 A very basic cache Has 2^n entries Each entry contains • A word (4 bytes) • Its memory address Sole way to identify the word • A bit indicating whether the cache entry contains something useful

49 A very basic cache (I) [Figure: an eight-entry cache indexed 000 through 111; each entry holds a tag (RAM address), contents (word) and a valid Y/N bit] Actual caches are much bigger
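
The basic cache and the library slots work the same way: the low-order part of the address picks the entry, and the stored address tells whether the entry holds the requested word. A minimal Python sketch under stated assumptions: addresses are word numbers rather than byte addresses, main memory is a plain dictionary, and the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Toy cache with 2**index_bits one-word entries."""

    def __init__(self, index_bits, memory):
        self.size = 2 ** index_bits
        self.memory = memory                 # dict: word address -> word
        self.valid = [False] * self.size     # the valid Y/N bit of each entry
        self.tags = [None] * self.size       # address of the cached word
        self.words = [None] * self.size
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.size          # like the last digits of a call number
        if self.valid[index] and self.tags[index] == address:
            self.hits += 1                   # hit: no memory access needed
        else:
            self.misses += 1
            # Miss: expel whatever occupied the entry and fetch the new word.
            self.valid[index] = True
            self.tags[index] = address
            self.words[index] = self.memory[address]
        return self.words[index]

memory = {a: a * 10 for a in range(64)}
cache = DirectMappedCache(3, memory)
for address in (5, 5, 13, 5):                # 5 and 13 compete for entry 101
    cache.read(address)
print(cache.hits, cache.misses)
```

Addresses 5 and 13 map to the same entry, so each one expels the other, just as call numbers 3141593 and 4444493 compete for slot 93.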

50 Multiword cache [Figure: cache entries indexed 000 through 111, each with a tag (address), a valid Y/N bit and contents holding several words]

51 Set-associative caches (I) Can be seen as 2, 4, 8 caches attached together Reduces collisions Skip

52 Back to our library example What if two books whose call number have the same last two digits are often asked on the same day:  Say, 3141593 and 4444493 Best solution is  Keep the number of book slots equal to 100  Store more than one book with same last two digits in the same slot Skip

53 Set-associative caches (II) [Figure: two eight-entry caches side by side, each indexed 000 through 111; every entry holds a tag (address), contents (block) and a valid Y/N bit] Skip

54 Set-associative caches (III) Advantage: • We take care of more collisions Like a hash table with a fixed bucket size • Results in lower miss rates than direct-mapped caches Disadvantage: • Slower access • Best solution if miss penalty is very big Skip
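
The effect of associativity on collisions can be seen with the two call numbers from the library example. The sketch below is a Python illustration only: it tracks which addresses are cached, not their contents, and it assumes least-recently-used replacement, which the slides do not specify. The class name is hypothetical.

```python
class SetAssociativeCache:
    """Toy cache: num_sets sets, each holding up to `ways` addresses."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        # Each set lists its cached addresses, least recently used first.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        entries = self.sets[address % self.num_sets]
        if address in entries:
            self.hits += 1
            entries.remove(address)
        else:
            self.misses += 1
            if len(entries) == self.ways:
                entries.pop(0)           # expel the least recently used entry
        entries.append(address)          # now the most recently used

requests = [3141593, 4444493, 3141593, 4444493]
for ways in (1, 2):
    cache = SetAssociativeCache(100, ways)
    for call_number in requests:
        cache.access(call_number)
    print(ways, "way(s):", cache.hits, "hits,", cache.misses, "misses")
```

With one book per slot every request misses; with two books per slot the repeated requests hit.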

55 Fully associative caches The dream! A block can occupy any index position in the cache Requires an associative memory  Content-addressable  Like our brain! Remains a dream Skip

56 A cache hierarchy Two or three levels (L1, L2, L3) L1 cache:  Highest level  Optimized for speed: Direct mapping and no associativity L2 and L3 caches:  Optimized for hit ratio Higher associativity Skip

57 Handling the DRAM/disk speed gap

58 What can be done? Two main techniques • Making disk accesses more efficient • Doing something else while waiting for an I/O operation Not very different from what we are doing in our everyday lives

59 Optimizing read accesses (I) When we shop in a market that’s far away from our home, we plan ahead and buy food for several days The OS will read as many bytes as it can during each disk access  In practice, whole blocks (4KB or more)  Blocks are stored in the I/O buffer

60 Optimizing read accesses (II) Process I/O buffer Disk Drive Read operation Physical I/O Most short read operations can be completed without any disk access

61 Optimizing read accesses (III) Buffered disk reads work quite well • Most systems use them They have a major limitation • If we try to read too far ahead of the program, we risk bringing into main memory data that will never be used

62 Optimizing read accesses (IV) Can also keep in a buffer recently accessed blocks hoping they will be accessed again  Caching Works very well because we keep accessing again and again the data we are working with Caching is a fundamental technique of OS and database design
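
The read optimizations above can be sketched as a small buffer that fetches whole blocks and keeps the most recently used ones. This Python sketch is an illustration under assumptions: the disk is a `bytes` object, blocks are 4 KB, replacement is least-recently-used, and each read stays within one block. `BufferCache` is a hypothetical name, not an OS interface.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes; the slides mention blocks of 4 KB or more

class BufferCache:
    """Toy I/O buffer keeping the most recently accessed disk blocks."""

    def __init__(self, disk, capacity):
        self.disk = disk                 # bytes object standing in for the disk
        self.capacity = capacity         # maximum number of buffered blocks
        self.blocks = OrderedDict()      # block number -> block contents
        self.physical_reads = 0

    def read(self, offset, length):
        """Return `length` bytes starting at `offset` (within one block)."""
        number = offset // BLOCK_SIZE
        if number in self.blocks:
            self.blocks.move_to_end(number)       # mark as recently used
        else:
            self.physical_reads += 1              # one real disk access
            start = number * BLOCK_SIZE
            self.blocks[number] = self.disk[start:start + BLOCK_SIZE]
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)   # drop least recently used
        within = offset % BLOCK_SIZE
        return self.blocks[number][within:within + length]

disk = bytes(range(256)) * 64                     # 16 KB, i.e. four blocks
cache = BufferCache(disk, capacity=2)
data = b"".join(cache.read(i * 100, 100) for i in range(40))
print(len(data), "bytes read with", cache.physical_reads, "physical read(s)")
```

Forty short reads are served by a single physical read, which is the point made on the slides.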

63 Optimizing write accesses (I) If we live far away from a library, we wait until we have several books to return before making the trip The OS will delay writes for a few seconds then write an entire block • Since most writes are sequential, most short writes will not require any disk access

64 Optimizing write accesses (II) Delayed writes work quite well • Most systems use them They have a major drawback • We will lose data if the system or the program crashes after the program issued a write but before the data were saved to disk

65 Doing something else When we order something on the web, we do not remain idle until the goods are delivered The OS can implement multiprogramming and let the CPU run another program while a program waits for an I/O

66 Advantages (I) Multiprogramming is very important in business applications  Many of these applications use the peripherals much more than the CPU  For a long time the CPU was the most expensive component of a computer  Multiprogramming was invented to keep the CPU busy

67 Advantages (II) Multiprogramming made time-sharing possible Multiprogramming lets your PC run several applications at the same time  MS Word and MS Outlook

68 Requirements Two basic requirements  Must have a mechanism that handles I/O without CPU intervention I/O controller  Must have a way to notify the CPU when an I/O operation is completed Interrupts

69 Analogy When we buy a book in a brick-and-mortar store, • We wait until we get the book May have to waste time waiting in line When we order a book over the Internet, • We do other things while waiting for the book UPS takes care of it We get notified when the book has arrived

70 Interrupts Normally the kernel is inactive  Users' programs are in control When OS intervention is required  Must interrupt the flow of execution of the CPU  Give CPU to the OS

71 Interrupts Detected by the CPU hardware  After it has executed the current instruction  Before it starts the next instruction

72 A very schematic view (I) A very basic CPU would execute the following loop: forever { fetch_instruction(); decode_instruction(); execute_instruction(); } Pipelining makes things more complicated Skip

73 A very schematic view (II) We add an extra step: forever { check_for_interrupts(); fetch_instruction(); decode_instruction(); execute_instruction(); } Skip
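
The loop with its extra interrupt check can be mimicked in a few lines. The Python sketch below is a toy model, not a real instruction set: the two opcodes, the `interrupts` dictionary and the function name `run` are all assumptions, and the loop stops at the end of the program instead of running forever.

```python
def run(program, interrupts):
    """Execute a toy program, checking for interrupts before each instruction.

    program: list of (opcode, operand) pairs.
    interrupts: dict mapping a step number to the name of an interrupt
    that is pending before that step.
    """
    accumulator = 0
    pc = 0                                   # program counter
    step = 0
    log = []
    while pc < len(program):
        if step in interrupts:               # check_for_interrupts()
            log.append("handled " + interrupts[step])
        opcode, operand = program[pc]        # fetch and decode
        pc += 1
        if opcode == "ADD":                  # execute
            accumulator += operand
        elif opcode == "MUL":
            accumulator *= operand
        else:
            raise ValueError("unknown opcode " + opcode)
        step += 1
    return accumulator, log

print(run([("ADD", 2), ("MUL", 5), ("ADD", 1)], {1: "timer"}))
```

The interrupt is noticed only between instructions, after the current one has completed and before the next one starts.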

74 Types of interrupts I/O completion interrupts  Requested data are now in memory Timer interrupts  A process has been using the CPU for more than x ms System calls  Running process needs something from the OS …

75 Kernel System calls Program System call / system request

76 Managing the main memory

77 Virtual memory Combines two big ideas • Non-contiguous memory allocation: processes are allocated page frames scattered all over the main memory • On-demand fetch: Process pages are brought into main memory when they are accessed for the first time MMU takes care of almost everything

78 Main memory Divided into fixed-size page frames • Allocation units • Sizes are powers of 2 (512 B, 1KB, 2KB, 4KB) • Properly aligned • Numbered 0, 1, 2,... [Figure: page frames 0 through 8]

79 Program address space Divided into fixed-size pages • Same sizes as page frames • Properly aligned • Also numbered 0, 1, 2,... [Figure: pages 0 through 7]

80 The mapping Will allocate non-contiguous page frames to the pages of a process [Figure: pages numbered from 0 mapped to scattered page frames]

81 Is it virtual or real? MMU translates  Virtual addresses used by the process into  Real addresses in main memory

82 Realization [Figure: a virtual address (page number 2, offset 897) is translated through the page table (entries 1, 5, 3, 7) into a physical address (frame number 5, offset 897); the offset is unchanged] Skip

83 On-demand fetch (I) Most processes terminate without having accessed their whole address space • Code handling rare error conditions,... Other processes go through multiple phases during which they access different parts of their address space • Compilers Skip
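
The translation performed by the MMU amounts to replacing the page number by a frame number and leaving the offset alone. A minimal Python sketch, assuming 4 KB pages and a page table held in a dictionary; the table contents below are hypothetical, and a missing entry stands in for a page fault. In a real system this is done by hardware, not by software like this.

```python
PAGE_SIZE = 4096  # bytes; an assumed page size (a power of 2)

def translate(virtual_address, page_table):
    """Map a virtual address to a physical one, or raise on a page fault."""
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        raise LookupError("page fault on page %d" % page_number)
    frame_number = page_table[page_number]
    return frame_number * PAGE_SIZE + offset   # the offset is unchanged

page_table = {0: 1, 2: 5}                      # hypothetical: page 2 is in frame 5
virtual = 2 * PAGE_SIZE + 897                  # page 2, offset 897
physical = translate(virtual, page_table)
print(divmod(physical, PAGE_SIZE))             # frame number and offset
```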

84 On-demand fetch (II) VM systems do not fetch whole address space of a process when it is brought into memory They fetch individual pages on demand when they get accessed the first time  Page miss or page fault When memory is full, they expel from memory pages that are not currently in use Skip

85 Advantages System does not waste time loading pages that will never be accessed. Can have very large virtual address spaces Could run programs that are too big to fit in main memory • Important during the 70's and early 80's Skip

86 Disadvantages Slows down memory accesses • Address translation overhead • Page faults Page faults introduce unpredictable delays • Very bad for real-time systems No substitute for enough physical memory • Page faults are very expensive Skip

87 Implementation To speed up address translation  A few hundred recently accessed page table entries are cached in the Translation Lookaside Buffer (TLB)  Remainder of page table is divided between Main memory (active page table entries) Secondary storage (the other entries) Skip

88 SPEEDING UP THE CPU

89 Main techniques Speeding up the CPU clock • A 2 GHz CPU goes through two computing cycles every nanosecond Letting the CPU "pipeline" instructions • Let the CPU work as an assembly line Have a multicore architecture

90 Limitations Speeding up the CPU clock • CPUs with clock rates over 2 to 3 GHz become increasingly hard to cool Letting the CPU "pipeline" instructions • Cannot have perfect pipelining Have a multicore architecture • Harder to write software

91 Pipelining

92 The big idea Making different parts of the CPU work at the same time on different steps of different instructions • Transforming the CPU into an assembly line

93 An analogy (I) Washing your clothes  Four steps: 1.Putting in the washer 2.Putting in the dryer 3.Folding/ironing 4.Putting them away

94 An analogy (II) Most people • Start second wash load as soon as first wash load is in dryer • Put second wash load in dryer and start a third wash load while they are folding/ironing the first wash load

95 Purely sequential approach
Time    6 pm  6:30  7 pm  7:30  8 pm  8:30  9 pm  9:30
Load 1  Wash  Dry   Fold  Store
Load 2                          Wash  Dry   Fold  Store

96 Smart approach
Time    6 pm  6:30  7 pm  7:30  8 pm  8:30  9 pm  9:30
Load 1  Wash  Dry   Fold  Store
Load 2        Wash  Dry   Fold  Store
Load 3              Wash  Dry   Fold  Store
Load 4                    Wash  Dry   Fold  Store
Solution assumes that a housemate puts folded/ironed clothes away for us

97 Main advantage Can do much more in much less time
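
The gain in the laundry example can be put in numbers. A small Python sketch, assuming every step takes the same time (30 minutes, as the timeline suggests) and that each step handles one load at a time; the function names are illustrative.

```python
def sequential_time(loads, stages, stage_minutes):
    """Each load goes through every stage before the next load starts."""
    return loads * stages * stage_minutes

def pipelined_time(loads, stages, stage_minutes):
    """The first load needs all stages; each later load finishes one stage later."""
    return (stages + loads - 1) * stage_minutes

# Four stages: wash, dry, fold, store.
print(sequential_time(2, 4, 30))   # two loads, one after the other
print(pipelined_time(4, 4, 30))    # four loads, overlapped
print(sequential_time(4, 4, 30))   # four loads without overlap
```

Under these assumptions, four overlapped loads take less time than two loads done one after the other.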

98 An example Pipelining in the MIPS architecture Why? • Architecture developed by a team led by John Hennessy from Stanford University • Pipelining is well described in the architecture textbook by Hennessy and Patterson WARNING: It is just an example Skip

99 Multiprocessor architectures

100 The solutions Many parallel processing solutions  Multiprocessor architectures Two or more microprocessor chips Multiple architectures  Multicore architectures Several processors on a single chip  Can have both

101 A dual-core architecture RAM Shared Cache Core Cache Core Cache

102 Even in your cell phone Taiwanese chip maker MediaTek has introduced what it’s calling the first “true” octa-core chip. The MT6592 can use up to 8 processor cores at once • It’s not clear what that will actually mean in terms of day-to-day performance. Liliputing.com, November 20, 2013

103 A major challenge Keeping the caches consistent • Contents of same memory location can be cached by different processing units • What if one of the processing units modifies these contents? All other caches will have the old values • Must update the memory location and invalidate the values in the other caches
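
The update-and-invalidate rule can be illustrated with two toy cores. This Python sketch assumes the simplest possible policy (every write goes straight to memory and removes the copies held by other cores); real hardware protocols are considerably more elaborate, and the class names here are hypothetical.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.cache = {}                    # address -> cached value

class SharedMemory:
    def __init__(self, cores):
        self.cores = cores
        self.ram = {}

    def read(self, core, address):
        if address not in core.cache:      # miss: fetch from RAM
            core.cache[address] = self.ram.get(address, 0)
        return core.cache[address]

    def write(self, core, address, value):
        self.ram[address] = value          # update the memory location
        core.cache[address] = value
        for other in self.cores:           # invalidate stale copies
            if other is not core:
                other.cache.pop(address, None)

a, b = Core("A"), Core("B")
memory = SharedMemory([a, b])
memory.ram[100] = 1
memory.read(a, 100)
memory.read(b, 100)                        # both cores now cache address 100
memory.write(a, 100, 2)                    # B's copy is invalidated
print(memory.read(b, 100))                 # B misses and fetches the new value
```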

104 The software side Two ways for software to exploit parallel processing capabilities of hardware  Job-level parallelism Several sequential processes run in parallel Easy to implement (OS does the job!)  Process-level parallelism A single program runs on several processors at the same time

105 Main considerations Some problems are embarrassingly parallel  Many computer graphics tasks  Brute force searches in cryptography or password guessing Much more difficult for other applications  Communication overhead among sub-tasks  Balancing the load

106 A last issue Humans like to address issues one after the other • We have meeting agendas • We do not like to be interrupted • We write sequential programs

107 Rene Descartes Seventeenth-century French philosopher Invented  Cartesian coordinates  Methodical doubt [To] never to accept anything for true which I did not clearly know to be such Proposed a scientific method based on four precepts

108 Method's third rule The third, to conduct my thoughts in such order that, by commencing with objects the simplest and easiest to know, I might ascend by little and little, and, as it were, step by step, to the knowledge of the more complex; assigning in thought a certain order even to those objects which in their own nature do not stand in a relation of antecedence and sequence.

109 My take Things will have to change

110 PROTECTION

111 Protecting users’ data (I) Unless we have an isolated single-user system, we must prevent users from  Accessing  Deleting  Modifying without authorization other people's programs and data

112 Protecting users’ data (II) Two aspects • Protecting users' files on disk • Preventing programs from interfering with each other

113 Historical Considerations Earlier operating systems for personal computers did not have any protection  They were single-user machines  They typically ran one program at a time Windows 2000, Windows XP, Vista, Windows 7 and MacOS X are protected

114 Protecting users’ files Key idea is to prevent users’ programs from directly accessing the disk Will require I/O operations to be performed by the kernel Make them privileged instructions that only the kernel can execute

115 Privileged instructions Require a dual-mode CPU Two CPU modes • Privileged mode (also called executive or supervisor mode) that allows CPU to execute all instructions • User mode that allows CPU to execute only safe unprivileged instructions State of CPU is determined by a special bit

116 Kernel User Process X User must go through the kernel

117 Switching between states User mode will be the default mode for all programs • Only the kernel can run in supervisor mode Switching from user mode to supervisor mode is done through an interrupt • Safe because the jump address is at a well-defined location in main memory Skip

118 Performing an I/O Program Kernel answers the request and performs the I/O I/O request (interrupt) Physical I/O (executed by the kernel)

119 An analogy (I) Most UH libraries are open stacks • Anyone can consult books in the stacks and bring them to checkout National libraries and the Library of Congress have closed-stack collections • Users fill out a request for a specific document • A librarian will bring the document to the circulation desk

120 An analogy (II) Open-stack collections • Let users browse the collections • Users can misplace or vandalize books Closed-stack collections • Much slower and less flexible • Much safer

121 More trouble Having a dual-mode CPU is not enough to protect user’s files Must also prevent rogue users from tampering with the kernel  Same as a rogue customer bribing a librarian in order to steal books Done through memory protection

122 Memory protection (I) Prevents programs from accessing any memory location outside their own address space Requires special memory protection hardware Memory protection hardware  Checks every reference issued by program  Generates an interrupt when it detects a protection violation
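
The check made on every memory reference can be pictured as a simple range test. The Python sketch below assumes a base-and-limit scheme, which is only one possible mechanism and is not named on the slides; the exception class stands in for the interrupt generated by the protection hardware, and all names are hypothetical.

```python
class ProtectionViolation(Exception):
    """Stands in for the interrupt raised by the protection hardware."""

def checked_access(address, base, limit):
    """Allow the reference only if it falls inside [base, base + limit)."""
    if not (base <= address < base + limit):
        raise ProtectionViolation(
            "address %d is outside the address space" % address)
    return address

# A program whose address space starts at 8192 and is 4096 bytes long.
print(checked_access(9000, base=8192, limit=4096))
try:
    checked_access(100, base=8192, limit=4096)
except ProtectionViolation as violation:
    print("interrupt:", violation)
```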

123 Memory protection (II) Has additional advantages:  Prevents programs from corrupting address spaces of other programs  Prevents programs from crashing the kernel Not true for device drivers which are inside the kernel Required part of any multiprogramming system

124 Even more trouble Having both a dual-mode CPU and memory protection is not enough to protect users’ files Must also prevent rogue users from booting the system with a doctored kernel • Example: Can run Linux from a “live” CD; Linux will read all NTFS files, ignoring all restrictions set up by Vista or Windows 7 Skip

125 Conclusion As computer architecture becomes more complex  Some old problems continue to bother us: Wide access time gaps between  CPU and main memory  Main memory and disk (or even flash)  Some solutions bring new challenges: Multicore architectures

