LING 388: Computers and Language


LING 388: Computers and Language Lecture 3

Today's Topics
- Review
- Quick Homework 2
- Quick Homework 3
- Underneath Python
- Binary Numbers
- Character sets

Homework 2 review: Egghead 1952

Homework 2 review: Egghead, nerd or geek
"While its true origins are a bit murky, it's widely agreed that 'egghead' reached the height of its usage in 1952, when then-vice-presidential candidate Richard Nixon used it to describe the Democratic candidate for president, Adlai Stevenson -- whose balding head certainly conjured images of an egg. But his supporters were quickly labeled eggheads as well, and thus the term took on a political meaning. It came, eventually, to mean approximately what the word 'elitist' means today."
(image: Adlai Stevenson; source: Wikipedia)

Python: Numbers
>>> import math
>>> math.pi

Python: Numbers
Identities: one here, one for your homework
Two decimal places: '{:.2f}'.format(i)
Let's define a function for this identity
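The formatting call above can be wrapped in a small function (a sketch; the name two_places is my own):

```python
import math

def two_places(x):
    """Format a number to two decimal places as a string."""
    return '{:.2f}'.format(x)

print(two_places(math.pi))  # 3.14
```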

Quick Homework 3
Complex number library: cmath
https://docs.python.org/3.0/library/cmath.html
i is j in Python:
>>> cmath.sqrt(-1)
1j
Demonstrate Euler's identity in Python
Due date: Monday (midnight)
https://en.wikipedia.org/wiki/Euler%27s_identity
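A sketch of the kind of demonstration the homework asks for (using cmath.exp; up to floating-point rounding, e^(i*pi) + 1 comes out as 0):

```python
import cmath

# Euler's identity: e^(i*pi) + 1 = 0 (up to floating-point rounding).
z = cmath.exp(1j * cmath.pi) + 1
print(abs(z) < 1e-12)  # True
print(cmath.sqrt(-1))  # 1j
```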

The underlying architecture
Python is a high-level language: beneath Python lies binary…
An interesting use of binary notation: the 6-bit Leica camera lens encoding.
The 6-bit code shown on the right is 101011 (1 = black, 0 = white), read starting from the Phillips screw head.

Introduction
Memory Representation
- binary: zeros and ones (1 bit)
- organized into bytes (8 bits)
- memory (RAM) is byte-addressable
- word (32 bits), e.g. an integer
- 64 bits: floating point number
- big-endian/little-endian: most significant byte first or least significant byte first, e.g. in communication… your Intel and ARM CPUs
(diagram: byte-addressable RAM, addresses 0 to FFFFFFFF; array element a[23])
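Byte order can be made visible from Python with the standard struct and sys modules; a minimal sketch:

```python
import struct
import sys

# Pack the 32-bit integer 1 both ways and compare the byte orders.
print(struct.pack('<I', 1).hex())  # 01000000  little-endian: least significant byte first
print(struct.pack('>I', 1).hex())  # 00000001  big-endian: most significant byte first
print(sys.byteorder)               # byte order of the machine running this
```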

Introduction
Increased address space and 64-bit registers
A typical notebook computer. Example: a 2013 MacBook Air
- CPU: Core i5-4250U (1.3 billion transistors, built-in GPU, TDP: 15W)
- Dual core, 1.3 GHz (Turbo: 2.6 GHz)
- Hyper-Threaded (4 logical CPUs, 2 physical); a 4-core machine: 8 virtual
- 64 bit
- 64 KB (32 KB instruction + 32 KB data) L1 cache
- 256 KB L2 cache per core
- 12 MB L3 cache, shared
- 16 GB max RAM
(anandtech.com)

Introduction
Machine Language
A CPU understands only one language: machine language; all other languages must be translated into machine language.
Primitive instructions include:
MOV, PUSH, POP, ADD/SUB, INC/DEC, IMUL/IDIV, AND/OR/XOR/NOT, NEG, SHL/SHR, JMP, CMP, JE/JNE/JZ/JG/JGE/JL/JLE, CALL/RET
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
Assembly Language is this notation; by definition, nothing built on top of it can be more powerful.

Introduction
Not all machine instructions are conceptually necessary; many are provided for speed/efficiency.
Theoretical Computer Science: all mechanical computation can be carried out using a TURING MACHINE
- finite state table + (infinite) tape
- tape instructions at the tape head: Erase, Write, Move (Left/Right/NoMove)
- finite state table: current state x tape symbol --> new state x new tape symbol x move
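A machine of this kind can be sketched in a few lines of Python; the bit-flipping state table below is a made-up example, not one from the slides:

```python
# A minimal Turing machine: finite state table + tape.
def run_tm(table, tape, state='start', blank='_'):
    tape = list(tape)
    pos = 0
    while state != 'halt':
        sym = tape[pos] if 0 <= pos < len(tape) else blank
        # (current state, tape symbol) -> (new state, new tape symbol, move)
        state, write, move = table[(state, sym)]
        if 0 <= pos < len(tape):
            tape[pos] = write
        else:
            tape.append(write)
            pos = len(tape) - 1
        pos += {'L': -1, 'R': 1, 'N': 0}[move]
    return ''.join(tape).rstrip(blank)

# Example machine: flip every bit, halt on blank.
flip = {
    ('start', '0'): ('start', '1', 'R'),
    ('start', '1'): ('start', '0', 'R'),
    ('start', '_'): ('halt',  '_', 'N'),
}
print(run_tm(flip, '0110'))  # 1001
```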

Introduction
Storage: based on digital logic
binary (base 2): everything is a power of 2
Byte: 8 bits
01011011 = 2^6 + 2^4 + 2^3 + 2^1 + 2^0 = 64 + 16 + 8 + 2 + 1 = 91 (in decimal)
Hexadecimal (base 16): digits 0-9, A, B, C, D, E, F (each needs 4 bits)
5B (= 1 byte) = 5*16^1 + 11*16^0 = 80 + 11 = 91
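Python can check these conversions directly:

```python
# 01011011 binary = 5B hex = 91 decimal
n = 0b01011011
print(n)              # 91
print(hex(n))         # 0x5b
print(int('5B', 16))  # 91
print(bin(n))         # 0b1011011
```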

Introduction: data types
Integers: in one byte (= 8 bits), what's the largest and smallest number we can represent?
Answer: -128 .. 0 .. 127, i.e. -2^(8-1) .. 0 .. 2^(8-1) - 1
Why?
00000000 = 0
01111111 = 127
10000000 = -128
11111111 = -1
(unsigned, a byte holds 0 .. 2^8 - 1 = 255; with the top bit as sign, the largest positive value is 2^7 - 1 = 127)

Introduction: data types
Integers: in one byte (= 8 bits), what's the largest and smallest number we can represent?
00000000 = 0
01111111 = 127
10000000 = -128
11111111 = -1
Counting up from 00000000: 0 .. 127, then 10000000 wraps around to -128, -127, …, up to 11111111 = -1
This is the 2's complement representation

Introduction: data types
Integers: in one byte, the range is -128 .. 0 .. 127, using the 2's complement representation.
Why? It's super-convenient for arithmetic operations.
"To convert a positive integer X to its negative counterpart, flip all the bits, and add 1."
Example:
00001010 = 2^3 + 2^1 = 10 (decimal)
flip: 11110101; add 1: 11110110 = -10 (decimal)
Back again: flip 11110110 = 00001001; add 1: 00001010 = 10
Addition: -10 + 10 = 11110110 + 00001010 = 0 (ignore the overflow)
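The flip-and-add-1 rule is easy to verify in Python by masking to one byte (the helper name neg8 is my own):

```python
# Two's complement negation in one byte: flip all the bits, add 1,
# and mask to 8 bits.
def neg8(x):
    return ((~x) + 1) & 0xFF

print(format(10, '08b'))        # 00001010
print(format(neg8(10), '08b'))  # 11110110
print((neg8(10) + 10) & 0xFF)   # 0  (the overflow bit is ignored)
```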

Introduction: data types
Typically 32 bits (4 bytes) are used to store an integer (C: int)
range: -2,147,483,648 (-2^31) to 2,147,483,647 (2^31 - 1)
What if you want to store even larger numbers?
Binary Coded Decimal (BCD): code each decimal digit separately, as a string (sequence) of decimal digits

Introduction: data types
What if you want to store even larger numbers?
Binary Coded Decimal (BCD):
- 1 byte can code two digits (0-9 requires 4 bits)
- 1 nibble (4 bits) codes the sign (+/-), e.g. hex C = credit (+), D = debit (-)
- Example: 214 is coded as three nibbles (2, 1, 4); with its sign nibble, +214 fits in 2 bytes (= 4 nibbles)
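A sketch of packed BCD along these lines; the trailing sign nibble C/D follows the slide's credit/debit convention, and the helper name to_bcd is my own:

```python
# Packed BCD sketch: each decimal digit in one nibble,
# plus a sign nibble C (+) or D (-), padded to whole bytes.
def to_bcd(n):
    sign = 0xD if n < 0 else 0xC
    nibbles = [int(d) for d in str(abs(n))] + [sign]
    if len(nibbles) % 2:            # pad with a leading 0 nibble
        nibbles = [0] + nibbles
    out = bytearray()
    for hi, lo in zip(nibbles[::2], nibbles[1::2]):
        out.append((hi << 4) | lo)  # two nibbles per byte
    return bytes(out)

print(to_bcd(214).hex())   # 214c
print(to_bcd(-214).hex())  # 214d
```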

Introduction: data types
Typically, 64 bits (8 bytes) are used to represent floating point numbers (double precision; C: double)
e.g. c = 2.99792458 x 10^8 (m/s)
- coefficient: 52 bits (implied leading 1, therefore treat as 53)
- exponent: 11 bits (not 2's complement; unsigned with a bias)
- sign: 1 bit (+/-)
x86 CPUs have a built-in floating point coprocessor (x87) with 80-bit registers (wikipedia)
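The three fields can be pulled apart with the standard struct module; a sketch:

```python
import struct

# Reinterpret a double as a 64-bit integer and slice out the IEEE 754 fields.
bits = struct.unpack('>Q', struct.pack('>d', 2.99792458e8))[0]
sign = bits >> 63                  # 1 bit
exponent = (bits >> 52) & 0x7FF    # 11 bits, stored with a bias of 1023
fraction = bits & ((1 << 52) - 1)  # 52 bits (the implied leading 1 is not stored)
print(sign, exponent - 1023)       # 0 28   (since 2^28 <= c < 2^29)
```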

Introduction: data types
char: how about letters, punctuation, etc.?
ASCII: American Standard Code for Information Interchange
Based on the English alphabet (upper and lower case) + space + digits + punctuation + control characters (Teletype Model 33)
Question: how many bits do we need? 7 bits + 1 parity bit
Remember: everything is in binary…
(image: Teletype Model 33 ASR Teleprinter, Wikipedia)

Introduction: data types
Order is important in sorting!
Digits 0-9: there's a connection with BCD. Notice: codes 30 (hex) through 39 (hex), so the low nibble of each digit's code is its value.
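The BCD connection can be seen directly: subtract 0x30, or just mask off the low nibble, to get the digit's value:

```python
# ASCII digit codes run from 0x30 ('0') to 0x39 ('9').
print(hex(ord('0')), hex(ord('9')))  # 0x30 0x39
print(ord('7') - 0x30)               # 7
print(ord('7') & 0x0F)               # 7  (the low nibble is the BCD digit)
```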

Introduction: data types
Parity bit: transmission can be noisy; a parity bit added to the 7-bit ASCII code can catch single-bit transmission errors.
Even/odd parity: the receiver understands each byte should have an even/odd number of 1s.
Example: 0 (zero) is ASCII 30 (hex) = 0110000; with even parity: 01100000, with odd parity: 01100001
Checking parity: exclusive or (XOR), a basic machine instruction
A xor B is true if either A or B is true, but not both
XOR-folding the bits of 0110000: 0 xor 1 = 1; 1 xor 1 = 0; 0 xor 0 = 0; 0 xor 0 = 0; 0 xor 0 = 0; 0 xor 0 = 0 — result 0 means even parity
x86 assembly language:
PF: parity flag, set by arithmetic ops when the result has an even number of 1-bits
TEST: AND (don't store the result), sets PF
JP: jump if PF set
Example:
MOV al,<char>
TEST al, al
JP <location if even>
<go here if odd>
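The XOR-fold can be sketched in Python (the function name parity is my own):

```python
# XOR-fold the bits of an integer: returns 0 if the count of 1-bits is even.
def parity(code):
    p = 0
    while code:
        p ^= code & 1
        code >>= 1
    return p

print(parity(0x30))  # 0: ASCII '0' = 0110000 has an even number of 1s
print(parity(0x31))  # 1: ASCII '1' = 0110001 has an odd number of 1s
```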

Introduction: data types
UTF-8: the standard in the post-ASCII world
- backwards compatible with ASCII
- (previously, different languages had multi-byte character sets that clashed)
UTF-8 = Universal Character Set (UCS) Transformation Format, 8-bit (Wikipedia)

Introduction: data types
Example: あ Hiragana letter A, UTF-8: E3 81 82
Byte 1: E = 1110, 3 = 0011
Byte 2: 8 = 1000, 1 = 0001
Byte 3: 8 = 1000, 2 = 0010
い Hiragana letter I, UTF-8: E3 81 84
Shift-JIS (hex): あ: 82A0, い: 82A2
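Python's encode makes the byte sequences visible:

```python
# UTF-8 and Shift-JIS bytes for the two Hiragana letters.
print('あ'.encode('utf-8').hex())      # e38182
print('い'.encode('utf-8').hex())      # e38184
print('あ'.encode('shift_jis').hex())  # 82a0
print('い'.encode('shift_jis').hex())  # 82a2
```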

Introduction: data types
How can you tell what encoding your file is using? Detecting UTF-8:
- Microsoft: the first three bytes of the file are EF BB BF, the UTF-8 byte-order mark (not all software understands this; not everybody uses it)
- HTML: <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> (not always present)
- Analyze the file: look for invalid UTF-8 byte sequences; if any are found, the file is not UTF-8…
Interesting paper: http://www-archive.mozilla.org/projects/intl/UniversalCharsetDetection.html
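The "analyze the file" idea amounts to attempting a decode; a minimal sketch (the name looks_like_utf8 is my own):

```python
# A byte string that fails to decode as UTF-8 contains an invalid sequence,
# so it cannot be UTF-8.
def looks_like_utf8(data):
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

print(looks_like_utf8(b'\xe3\x81\x82'))  # True  (UTF-8 あ)
print(looks_like_utf8(b'\x82\xa0'))      # False (Shift-JIS あ)
```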

Introduction: data types
Text files have lines: how do we mark the end of a line?
End of line (EOL) control character(s):
- LF 0x0A (Mac/Linux)
- CR 0x0D (old Macs)
- CR+LF 0x0D0A (Windows)
End of file (EOF) control character: EOT 0x04 (aka Control-D)
Programming languages: in C, NUL (0x00) marks the end of a string
(binaryvision.nl)
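The control codes are easy to inspect in Python:

```python
# Line-ending and terminator control characters as byte values.
print(hex(ord('\n')))         # 0xa   LF (Mac/Linux)
print(hex(ord('\r')))         # 0xd   CR (old Macs)
print('\r\n'.encode().hex())  # 0d0a  CR+LF (Windows)
print(hex(ord('\x04')))       # 0x4   EOT, aka Control-D
```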