Dale & Lewis Chapter 3 Data Representation

Analog and digital information
The real world is continuous and infinite; data on computers are finite, so we need to approximate real-world data for our computational needs.
Analog data: information represented in a continuous form.
Digital data: information represented in a discrete form.

Noise in signals

Digitizing a signal
Sample the signal at discrete points in time and assign each sample to one of a fixed set of discrete levels.
The pieces are numbered, and the binary number system is used to represent the numbers.
n bits can represent 2^n numbers.
Q: How many bits are needed to represent m numbers?
The number of bits that can be easily addressed in a computer sets some constraints.
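The question above can be explored with a small sketch. It assumes the answer is the smallest n with 2^n >= m, and it quantizes a sampled sine wave to 3 bits as an illustration. The names `bits_needed` and `quantize`, the sine-wave input, and the [-1, 1] range are assumptions made for this example, not part of the slides.

```python
import math

def bits_needed(m: int) -> int:
    """Smallest n such that 2**n >= m, i.e. ceil(log2(m)) for m >= 1."""
    if m < 1:
        raise ValueError("m must be at least 1")
    return (m - 1).bit_length()

def quantize(value: float, lo: float, hi: float, n_bits: int) -> int:
    """Map a value in [lo, hi] to a level number in 0 .. 2**n_bits - 1."""
    levels = 2 ** n_bits
    step = (hi - lo) / levels
    # Clamp so that out-of-range values land on the first or last level.
    return max(0, min(levels - 1, int((value - lo) / step)))

# Sample one period of a sine wave at 8 points in time (assumed input signal).
samples = [math.sin(2 * math.pi * t / 8) for t in range(8)]
codes = [quantize(s, -1.0, 1.0, 3) for s in samples]

print(codes)                              # level number of each sample
print([format(c, "03b") for c in codes])  # the same levels as 3-bit strings
print(bits_needed(26), bits_needed(128), bits_needed(129))  # 5 7 8
```

With 3 bits there are only 8 levels, so nearby sample values collapse onto the same code; adding bits shrinks the rounding error at the cost of more storage.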

Representing text
English-language character set: 26 letters (both upper and lower case), punctuation, numeric digits, etc.
How many bits can we use?
What about other languages?

ASCII character set
American Standard Code for Information Interchange.
Each character is coded as a byte (8 bits): a 7-bit code plus 1 check bit.
Later, all 8 bits were used in the “extended character set”.
128 characters encoded (2^7): 95 visible characters and 33 invisible (control) characters.
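The counts above can be checked with a short sketch. It treats codes 32 through 126 (space through `~`) as the visible characters and the rest as control characters. The slides do not say how the check bit is computed; even parity is assumed here purely for illustration, and `with_even_parity` is a hypothetical helper name.

```python
# Codes 32..126 are the visible characters; 0..31 and 127 are control codes.
visible = [c for c in range(128) if 32 <= c <= 126]
control = [c for c in range(128) if c < 32 or c == 127]

def with_even_parity(code7: int) -> int:
    """Put a check bit in bit 7 so the byte has an even number of 1 bits.

    Even parity is an assumption; the source only says "1 check bit".
    """
    ones = bin(code7).count("1")
    return ((ones % 2) << 7) | code7

print(len(visible), len(control))                 # 95 33
print(format(with_even_parity(ord("J")), "08b"))  # 11001010
print(format(with_even_parity(ord("A")), "08b"))  # 01000001
```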

7-bit ASCII character set

ASCII table
The table above is sorted by decimal value.
These decimal values really represent binary sequences.
So the character J is in position 74, which is 1001010 in binary or 4A in hexadecimal.
The character j, in position 106, is 1101010 in binary or 6A in hexadecimal.
Notice anything? There is a purpose for that! The two codes differ in a single bit, so changing the case of a letter means flipping one bit.

The Unicode character set
Originally a 16-bit standard with 65,536 possible codes (later versions extend beyond 16 bits).
Enough to cover the principal languages of the world.
Superset of ASCII: the first 256 codes of Unicode are the same as Extended ASCII (ISO 8859-1, Latin-1).
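A brief sketch of the J/j observation and the Unicode overlap, using only Python built-ins. The helper `toggle_case` and the constant `CASE_BIT` are names invented for this example, and "Extended ASCII" is taken here to mean Latin-1 (ISO 8859-1), which is an assumption about what the slide intends.

```python
for ch in "Jj":
    code = ord(ch)
    # Prints: J 74 1001010 4A   and   j 106 1101010 6A
    print(ch, code, format(code, "07b"), format(code, "02X"))

CASE_BIT = 0b0100000  # value 32: the only bit that differs between J and j

def toggle_case(ch: str) -> str:
    """Flip the case of a single ASCII letter by flipping one bit."""
    if not (ch.isascii() and ch.isalpha()):
        return ch
    return chr(ord(ch) ^ CASE_BIT)

print(toggle_case("J"), toggle_case("j"))  # j J

# The first 256 Unicode code points line up with the Latin-1 byte values.
same_as_latin1 = all(chr(i).encode("latin-1") == bytes([i]) for i in range(256))
print(same_as_latin1)  # True
```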

Text compression
Keyword encoding
Substitute frequently used words with single characters, e.g. “as” → ^, “the” → ~, “and” → +, “that” → $, etc.
Problems:
These characters can’t be part of the text.
Frequently used words tend to be short, so there is not much gain.
Word variations are not handled, e.g. “The” vs. “the”.
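A minimal sketch of keyword encoding using the four substitutions listed above. Splitting on single spaces and the function names `keyword_encode` and `keyword_decode` are simplifying assumptions; a word followed by punctuation (such as "the,") would not be matched.

```python
KEYWORDS = {"as": "^", "the": "~", "and": "+", "that": "$"}
SYMBOLS = {symbol: word for word, symbol in KEYWORDS.items()}

def keyword_encode(text: str) -> str:
    """Replace each listed word with its one-character symbol."""
    # The substitution characters must not already occur in the text.
    if any(symbol in text for symbol in SYMBOLS):
        raise ValueError("text already contains a substitution character")
    return " ".join(KEYWORDS.get(word, word) for word in text.split(" "))

def keyword_decode(text: str) -> str:
    """Expand each symbol back to the word it stands for."""
    return " ".join(SYMBOLS.get(word, word) for word in text.split(" "))

original = "The cat and the dog"
encoded = keyword_encode(original)
print(encoded)                        # The cat + ~ dog
print(len(original), len(encoded))    # 19 15
print(keyword_decode(encoded) == original)  # True
```

The capitalized "The" passes through unchanged, which shows the word-variation problem, and the saving of four characters shows how modest the gain is on short words.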

Run-length encoding
Replace a long series of a repeated character with a special short code, e.g. replace “AAAAAAA” with *A7.
[Figure: the two equivalent sequences]
Note that repetitions shorter than 4 characters are not worth encoding, because the code itself takes 3 characters.
Also note that the repetition count is encoded in binary, not ASCII, so that repetitions longer than 9 can be captured.
Used in limited-palette image compression and fax machines.
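The scheme above can be sketched over bytes, with the repetition count stored as a single binary byte rather than an ASCII digit. The flag byte `*` follows the slide's example; the one-byte count (which caps a run at 255) and the requirement that the flag byte never appears in the input are assumptions of this sketch.

```python
FLAG = ord("*")  # marker byte from the example; assumed absent from the input
MIN_RUN = 4      # a run costs 3 bytes to encode, so shorter runs are left alone
MAX_RUN = 255    # largest count that fits in the single count byte

def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        # Measure the run of identical bytes starting at position i.
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out += bytes([FLAG, data[i], run])  # flag, repeated byte, binary count
        else:
            out += data[i:i + run]
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == FLAG:
            out += bytes([data[i + 1]]) * data[i + 2]
            i += 3
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

print(rle_encode(b"AAAAAAA"))         # b'*A\x07'
print(rle_encode(b"AAABBBBBC"))       # b'AAA*B\x05C'
print(len(rle_encode(b"A" * 12)))     # 3
```

Because the count is a raw byte, a run of 12 still encodes in three bytes, which an ASCII digit could not do.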

Huffman encoding
Generalization of Morse code.
Morse code (dots and dashes) is based on the distribution of letters in general English usage; Huffman encoding is based on the distribution in a given message.
Algorithm:
Encoding: build a frequency table of letter usage, then build the code and encode the message.
Decoding: a Huffman code has the prefix property, meaning that no code is the front part of another code, so decoding processes the bit stream until a match is found.
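The algorithm outlined above can be sketched with Python's `heapq` and `collections.Counter`. This is one common way to build a Huffman code, not necessarily the construction used in the slides; the function names and the sample message "MISSISSIPPI" are assumptions chosen for illustration.

```python
import heapq
from collections import Counter

def build_huffman_code(message: str) -> dict:
    """Return {symbol: bit string} built from the message's own frequencies."""
    freq = Counter(message)
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {ch: ""}) for i, (ch, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        return {ch: "0" for ch in heap[0][2]}  # single symbol still needs one bit
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in t1.items()}
        merged.update({ch: "1" + code for ch, code in t2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(message: str, code: dict) -> str:
    return "".join(code[ch] for ch in message)

def decode(bits: str, code: dict) -> str:
    lookup = {c: ch for ch, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix property: the first match is the symbol
            out.append(lookup[current])
            current = ""
    return "".join(out)

message = "MISSISSIPPI"
code = build_huffman_code(message)
bits = encode(message, code)
print(len(bits), "bits instead of", 8 * len(message))  # 21 bits instead of 88
print(decode(bits, code) == message)                   # True
```

The exact bit patterns depend on how ties are broken, but the total encoded length does not.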

Example of Huffman encoding/decoding
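Message: DOORBELL
Encoding: [figure: code table and encoded bit stream]
Compression ratio (vs. ASCII): 25/64 ≈ 0.39, i.e. 25 encoded bits against 8 characters × 8 bits = 64 bits.
Decode: [figure]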
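The code table for this example is not preserved in the text. The sketch below uses a hypothetical prefix-free table, chosen only because it reproduces the stated 25-bit total; it should not be read as the table from the original slide.

```python
# Hypothetical code table (an assumption): prefix-free, and it yields 25 bits.
CODE = {
    "A": "00", "E": "01", "L": "100", "O": "110",
    "R": "111", "B": "1010", "D": "1011",
}

def encode(message: str) -> str:
    return "".join(CODE[ch] for ch in message)

def decode(bits: str) -> str:
    lookup = {code: ch for ch, code in CODE.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:  # no code is a prefix of another, so this is safe
            out.append(lookup[current])
            current = ""
    return "".join(out)

message = "DOORBELL"
bits = encode(message)
print(bits)                                   # 1011110110111101001100100
print(len(bits), 8 * len(message))            # 25 64
print(round(len(bits) / (8 * len(message)), 2))  # 0.39
print(decode(bits))                           # DOORBELL
```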
