L02: Information. Comp411 – Spring 2007, 1/16/2007. Presentation transcript:

1. What is "Computation"?

Computation is about "processing information":
- Transforming information from one form to another
- Deriving new information from old
- Finding information associated with a given input

"Computation" describes the motion of information through time; "communication" describes the motion of information through space.

2. What is "Information"?

information, n. Knowledge communicated or received concerning a particular fact or circumstance.

A computer scientist's definition: information resolves uncertainty. Information is simply that which cannot be predicted. The less predictable a message is, the more information it conveys!

("Carolina won again. Tell me something new..." vs. "10 problem sets, 2 quizzes, and a final!")

3. Real-World Information

Why do unexpected messages get allocated the biggest headlines? Because they carry the most information.

4. What Does a Computer Process?

A toaster processes bread and bagels. A blender processes smoothies and margaritas. What does a computer process? Two allowable answers:
- Information
- Bits

What is the mapping from information to bits?

5. Quantifying Information (Claude Shannon, 1948)

Suppose you're faced with N equally probable choices, and I give you a fact that narrows it down to M choices. Then I've given you log2(N/M) bits of information.

Examples:
- Information in one coin flip: log2(2/1) = 1 bit
- Roll of a single die: log2(6/1) = ~2.6 bits
- Outcome of a football game: 1 bit (well, actually, "they won" may convey more information than "they lost"...)

Information is measured in bits (binary digits) = the number of 0/1's required to encode the choice(s).
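A quick sketch of this definition in Python (not from the slides; it uses only the standard math module and reproduces the examples above):

    import math

    def info_bits(n, m=1):
        # Bits of information when N equally likely choices narrow to M.
        return math.log2(n / m)

    print(info_bits(2))   # coin flip: 1.0 bit
    print(info_bits(6))   # roll of a single die: ~2.585 bits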

6. Example: Sum of Two Dice

A sum s can occur in n_s ways out of the 36 equally likely rolls, so it conveys log2(36/n_s) bits:

i_2  = log2(36/1) = 5.170 bits
i_3  = log2(36/2) = 4.170 bits
i_4  = log2(36/3) = 3.585 bits
i_5  = log2(36/4) = 3.170 bits
i_6  = log2(36/5) = 2.848 bits
i_7  = log2(36/6) = 2.585 bits
i_8  = log2(36/5) = 2.848 bits
i_9  = log2(36/4) = 3.170 bits
i_10 = log2(36/3) = 3.585 bits
i_11 = log2(36/2) = 4.170 bits
i_12 = log2(36/1) = 5.170 bits

The average information provided by the sum of two dice, Σ p_i · log2(1/p_i) ≈ 3.274 bits, is called the entropy.
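A sketch that reproduces the table and the 3.274-bit average (plain Python; the counts come from enumerating the 36 rolls):

    import math

    # ways each sum can occur out of the 36 equally likely rolls
    counts = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6,
              8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

    entropy = 0.0
    for s, n in sorted(counts.items()):
        p = n / 36
        i = math.log2(36 / n)        # information conveyed by this sum
        entropy += p * i             # weight by how often it occurs
        print(f"i_{s} = log2(36/{n}) = {i:.3f} bits")

    print(f"entropy = {entropy:.3f} bits")   # prints ~3.274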

7. Show Me the Bits!

Can the sum of two dice REALLY be represented using 3.274 bits? If so, how?

The fact is, the average information content is a strict lower bound on how small a representation we can achieve. In practice, it is difficult to reach this bound, but we can come very close.

8. Variable-Length Encoding

Of course we can use differing numbers of bits to represent each item of data. This is particularly useful if all items are not equally likely.

Equally likely items lead to fixed-length encodings:
- Ex: encode which particular roll summed to 5?
- {(1,4), (2,3), (3,2), (4,1)}, which are equally likely if we use fair dice
- Entropy = log2(4/1) = 2 bits
- 00 - (1,4), 01 - (2,3), 10 - (3,2), 11 - (4,1)

Back to the original problem. Let's use this encoding (reproduced by the sketch below):

2 - 10011    3 - 0101    4 - 011     5 - 001
6 - 111      7 - 101     8 - 110     9 - 000
10 - 1000    11 - 0100   12 - 10010
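A minimal encoder sketch, assuming the code table reconstructed above is correct; it produces exactly the bit stream decoded on the next slide:

    # reconstructed variable-length code for the sum of two dice
    CODE = {2: "10011", 3: "0101", 4: "011", 5: "001",
            6: "111", 7: "101", 8: "110", 9: "000",
            10: "1000", 11: "0100", 12: "10010"}

    def encode(rolls):
        # concatenate the codeword for each roll
        return "".join(CODE[r] for r in rolls)

    print(encode([2, 5, 3, 6, 5, 8, 3]))
    # -> 1001100101011110011100101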

9. Variable-Length Encoding: Taking a Closer Look

2 - 10011    3 - 0101    4 - 011     5 - 001
6 - 111      7 - 101     8 - 110     9 - 000
10 - 1000    11 - 0100   12 - 10010

Unlikely rolls are encoded using more bits; more likely rolls use fewer bits.

Example stream: 1001100101011110011100101
Decoding: 10011 | 001 | 0101 | 111 | 001 | 110 | 0101 → 2 5 3 6 5 8 3
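Because the code is prefix-free (no codeword is a prefix of another), a greedy left-to-right scan decodes the stream unambiguously. A sketch, assuming the same reconstructed table:

    # codeword -> sum, the inverse of the table above
    DECODE = {"10011": 2, "0101": 3, "011": 4, "001": 5,
              "111": 6, "101": 7, "110": 8, "000": 9,
              "1000": 10, "0100": 11, "10010": 12}

    def decode(stream):
        rolls, buf = [], ""
        for bit in stream:
            buf += bit
            if buf in DECODE:      # prefix-free, so the first match is right
                rolls.append(DECODE[buf])
                buf = ""
        return rolls

    print(decode("1001100101011110011100101"))   # [2, 5, 3, 6, 5, 8, 3]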

10. Huffman Coding

A simple greedy algorithm for approximating an entropy-efficient encoding (see the code sketch below):
1. Find the 2 items with the smallest probabilities.
2. Join them into a new meta item whose probability is the sum.
3. Remove the two items and insert the new meta item.
4. Repeat from step 1 until there is only one item.

[Figure: Huffman decoding tree for the two-dice distribution; e.g. the two 1/36 items (2 and 12) merge into a 2/36 meta item, the 2/36 items (3 and 11) into 4/36, and so on, up to the 36/36 root.]
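A sketch of the greedy algorithm using Python's heapq as the priority queue. Ties among equal probabilities can be broken differently than in the slide's tree, so the exact codewords may differ, but the average code length comes out the same:

    import heapq
    from itertools import count

    COUNTS = {2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6,
              8: 5, 9: 4, 10: 3, 11: 2, 12: 1}    # out of 36

    def huffman(weights):
        tie = count()   # tie-breaker so the heap never compares the dicts
        heap = [(w, next(tie), {sym: ""}) for sym, w in weights.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w1, _, c1 = heapq.heappop(heap)   # two smallest probabilities
            w2, _, c2 = heapq.heappop(heap)
            # join into a meta item: prefix one side's codes with 0, the other's with 1
            merged = {s: "0" + c for s, c in c1.items()}
            merged.update({s: "1" + c for s, c in c2.items()})
            heapq.heappush(heap, (w1 + w2, next(tie), merged))
        return heap[0][2]                     # codes read off at the root

    for sym, code in sorted(huffman(COUNTS).items()):
        print(sym, code)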

11. Converting Tree to Encoding

[Figure: the same Huffman decoding tree, with each edge labeled 0 or 1.]

Once the tree is constructed, label its edges consistently and follow the path from the largest meta item (the root) to each of the real items to find its encoding:

2 - 10011    3 - 0101    4 - 011     5 - 001
6 - 111      7 - 101     8 - 110     9 - 000
10 - 1000    11 - 0100   12 - 10010

12. Encoding Efficiency

How does this encoding strategy compare to the information content of the roll? The expected code length is Σ p_i · len_i = (1·5 + 2·4 + 3·3 + 4·3 + 5·3 + 6·3 + 5·3 + 4·3 + 3·4 + 2·4 + 1·5)/36 = 119/36 ≈ 3.306 bits. Pretty close: recall that the lower bound was 3.274 bits.

However, an efficient encoding (one whose average code size is close to the information content) is not always what we want!

13. Encoding Considerations

Encoding schemes that attempt to match the information content of a data stream remove redundancy; they are data compression techniques. However, sometimes our goal in encoding information is to increase redundancy, rather than remove it. Why?
- To make the information easier to manipulate (fixed-size encodings)
- To make the data stream resilient to noise (error detecting and correcting codes)

Data compression allows us to store our entire music collection in a pocketable device; data redundancy enables us to store that same information reliably on a hard drive.

14. Error Detection Using Parity

Sometimes we add extra redundancy so that we can detect errors. For instance, this encoding detects any single-bit error:

2 - 1111000    3 - 1111101    4 - 0011       5 - 0101
6 - 0110       7 - 0000       8 - 1001       9 - 1010
10 - 1100      11 - 1111110   12 - 1111011

There's something peculiar about these codings. The same bitstream has four possible interpretations if we allow for only one error (the starred symbol is the one assumed to be corrupted):

ex: 001111100000101
- 4 2* 5
- 4 10* 7
- 4 9* 7
- 4 6* 7
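A sketch of how such a parity code catches single-bit errors: every valid codeword has an even number of 1s, so flipping any one bit makes the count odd (the codeword strings come from the table above):

    def add_parity(bits):
        # append a bit so that the total number of 1s is even
        return bits + ("1" if bits.count("1") % 2 else "0")

    def parity_ok(word):
        # valid codewords have an even number of 1s
        return word.count("1") % 2 == 0

    print(add_parity("001"))     # '0011', the valid codeword for 4
    print(parity_ok("0011"))     # True:  parity is even
    print(parity_ok("0111"))     # False: one flipped bit is detected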

15. Property 1: Parity

The sum of the bits in each symbol is even (this is how errors are detected):

2  - 1111000: 1+1+1+1+0+0+0 = 4
3  - 1111101: 1+1+1+1+1+0+1 = 6
4  - 0011:    0+0+1+1 = 2
5  - 0101:    0+1+0+1 = 2
6  - 0110:    0+1+1+0 = 2
7  - 0000:    0+0+0+0 = 0
8  - 1001:    1+0+0+1 = 2
9  - 1010:    1+0+1+0 = 2
10 - 1100:    1+1+0+0 = 2
11 - 1111110: 1+1+1+1+1+1+0 = 6
12 - 1111011: 1+1+1+1+0+1+1 = 6

How much information is in the last bit?

16. Property 2: Separation

Each encoding differs from all the others by at least two bits in their overlapping parts. This difference is called the "Hamming distance". (A Hamming distance of at least 1 is needed just to keep encodings unique; it is the extra separation, distance 2, that lets a single-bit error be detected rather than turn one valid encoding into another.)
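A one-liner sketch of the distance itself:

    def hamming(a, b):
        # number of positions where two equal-length bit strings differ
        assert len(a) == len(b)
        return sum(x != y for x, y in zip(a, b))

    print(hamming("0011", "0101"))   # 2: the codes for 4 and 5 above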

17. A Short Detour

It is illuminating to consider Hamming distances geometrically. Given 2 bits, the largest Hamming distance that we can achieve between 2 encodings is 2. This allows us to detect 1-bit errors if we encode 1 bit of information using 2 bits.

With 3 bits we can find 4 encodings with a Hamming distance of 2, allowing the detection of 1-bit errors when 3 bits are used to encode 2 bits of information. We can also identify 2 encodings with a Hamming distance of 3. This extra distance allows us to detect 2-bit errors. However, we could use this extra separation differently.

[Figures: the 2-bit square and 3-bit cube of code points, showing encodings separated by a Hamming distance of 2, and encodings separated by a Hamming distance of 3.]

18. Error Correcting Codes

We can actually correct 1-bit errors in encodings separated by a Hamming distance of 3. This is possible because the sets of bit patterns located a Hamming distance of 1 from our encodings are distinct. However, attempting error correction with such a small separation is dangerous: suppose we have a 2-bit error. Our error correction scheme will then misinterpret the encoding. Misinterpretations also occurred when we had 2-bit errors in our 1-bit-error-detection (parity) schemes.

[Figure: the 3-bit cube with 000 and 111 as the two valid encodings. This is just a voting scheme.]

A safe 1-bit error correction scheme would correct all 1-bit errors and detect all 2-bit errors. What Hamming distance is needed between encodings to accomplish this?
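The voting scheme in the figure is the triple-repetition code (000 vs. 111, Hamming distance 3). A minimal sketch of both its power and its danger:

    def encode(bit):
        return bit * 3              # '0' -> '000', '1' -> '111'

    def correct(word):
        # majority vote over the three received bits
        return "1" if word.count("1") >= 2 else "0"

    print(correct("010"))   # '0': a 1-bit error in 000 is corrected
    print(correct("110"))   # '1': a 2-bit error in 000 is misread as 1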

19. Information Summary

Information resolves uncertainty.

Choices equally probable:
- N choices narrowed down to M → log2(N/M) bits of information

Choices not equally probable:
- choice i with probability p_i → log2(1/p_i) bits of information
- average number of bits = Σ p_i · log2(1/p_i)
- use variable-length encodings

20. Computer Abstractions and Technology

1. Layer cakes
2. Computers are translators
3. Switches and wires

(Read Chapter 1)

21. Computers Everywhere

The computers we are used to:
- Desktops
- Laptops

Embedded processors:
- Cars
- Mobile phones
- Toasters, irons, wristwatches, happy-meal toys

22. A Computer System

What is a computer system? Where does it start? Where does it end?

[Figure: the same program at every level of the stack.]
- Compiler: for (i = 0; i < 3; i++) m += i*i;
- Assembler and linker: addi $8, $6, $6 / sll $8, $8, 4
- CPU module / ALU / cells (full adders with inputs A, B, CI and outputs S, CO) / gates / transistors

23. Computer Layer Cake

- Applications
- Systems software
- Shared libraries
- Operating system
- Hardware (the bare metal)

[Figure: the layer cake, hardware at the bottom; then the operating system, libraries, systems software, and apps on top.]

Computers are digital chameleons.

24. Computers are Translators

- User interface (visual programming)
- High-level languages
- Compilers / interpreters
- Assembly language
- Machine language

    int x, y;
    y = (x-3)*(y+123456);

compiles to:

    x: .word 0
    y: .word 0
    c: .word 123456
    ...
    lw   $t0, x
    addi $t0, $t0, -3
    lw   $t1, y
    lw   $t2, c
    add  $t1, $t1, $t2
    mul  $t0, $t0, $t1
    sw   $t0, y

25. Computers are Translators (continued)

The same assembly program, one level lower, assembled into machine language words:

    0x04030201
    0x08070605
    0x00000001
    0x00000002
    0x00000003
    0x00000004
    0x706d6f43

26. Why So Many Languages?

- Application specific: historically, COBOL vs. Fortran; today, C# vs. Java, Visual Basic vs. Matlab, static vs. dynamic
- Code maintainability: high-level specifications are easier to understand and modify
- Code reuse
- Code portability: virtual machines

27–33. Under the Covers

[Figure series, one per slide: the five classic components of a computer: input, output, storage, and processing, with processing split into datapath and control. Illustrated with photos, including a cathode ray tube (CRT), "the last vacuum tube", now nearing extinction; liquid crystal displays (LCDs); and an Intel Pentium III Xeon.]

34. Implementation Technology

- Relays
- Vacuum tubes
- Transistors
- Integrated circuits: gate-level integration; medium-scale integration (PALs); large-scale integration (processing unit on a chip); today, multiple CPUs on a chip
- Nanotubes??
- Quantum-effect devices??

35. Implementation Technology

Common links? A controllable switch. Computers are wires and switches.

[Figure: a switch with a control input, shown open and closed.]

36. Chips: Silicon Wafers

Chip manufacturers build many copies of the same circuit onto a single wafer. Only a certain percentage of the chips will work; those that work will run at different speeds. The yield decreases as the size of the chips increases and the feature size decreases. Wafers are processed by automated fabrication lines. To minimize the chance of contaminants ruining a process step, great care is taken to maintain a meticulously clean environment.

37. Field Effect Transistors (FETs)

Modern silicon fabrication technology is optimized to build a particular type of transistor. The flow of electrons from the source to the drain is controlled by a gate voltage.

[Figure: cross-section of a FET, labeled source, drain, gate, and bulk, with n+ regions in a p substrate; with the switch off, I_DS = 0 for the applied V_DS.]

38. Chips: Silicon Wafers

[IBM photomicrograph (the Si has been removed!), labeled: Metal 2, M1/M2 via, Metal 1, polysilicon, diffusion, and a MOSFET under the polysilicon gate.]

39. How Hardware WAS Designed 20 Years Ago

- I/O specification
- Truth tables
- State diagrams
- Logic design
- Circuit design
- Circuit layout

40. How Hardware IS Designed Today (with Software)

High-level hardware specification languages:
- Verilog
- VHDL

41. Reconfigurable Chips

- Programmable Array Logic (PALs): fixed logic / programmable wires
- Field Programmable Gate Arrays (FPGAs): repeated reconfigurable logic cells

42. Next Time: Computer Representations

How is X represented in computers?
- X = text
- X = numbers
- X = anything else

Encoding information.

