Informatics I101 February 25, 2003 John C. Paolillo, Instructor.

Electronic Text
ASCII: American Standard Code for Information Interchange
EBCDIC (IBM mainframes, not standard)
Extended ASCII (8-bit, not standard)
  – DOS Extended ASCII
  – Windows Extended ASCII
  – Macintosh Extended ASCII
Unicode (originally designed as a 16-bit code; the standard is still evolving)

ASCII
The code number 65 (hex 41, binary 1000001) means the alphabet letter "A". On screen it is displayed as an A, and the exact shape of the glyph depends on the font.

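The ASCII Code
(code = column digit followed by row digit, in hexadecimal; e.g. A = 41 hex = 65)

     0    1    2      3  4  5  6  7
0    NUL  DLE  blank  0  @  P  `  p
1    SOH  DC1  !      1  A  Q  a  q
2    STX  DC2  "      2  B  R  b  r
3    ETX  DC3  #      3  C  S  c  s
4    EOT  DC4  $      4  D  T  d  t
5    ENQ  NAK  %      5  E  U  e  u
6    ACK  SYN  &      6  F  V  f  v
7    BEL  ETB  '      7  G  W  g  w
8    BS   CAN  (      8  H  X  h  x
9    HT   EM   )      9  I  Y  i  y
A    LF   SUB  *      :  J  Z  j  z
B    VT   ESC  +      ;  K  [  k  {
C    FF   FS   ,      <  L  \  l  |
D    CR   GS   -      =  M  ]  m  }
E    SO   RS   .      >  N  ^  n  ~
F    SI   US   /      ?  O  _  o  DEL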

An Example Text
This is an example
Note that each ASCII character corresponds to a number, including spaces, carriage returns, etc. Everything must be represented somehow; otherwise the computer couldn’t do anything with it.
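
To make the point concrete, here is a short Python sketch. Python is our choice; the slides contain no code. It turns the example text into its list of numbers and back. The built-in ord() and chr() convert between a character and its code, and for ASCII characters these codes are exactly the ones in the ASCII table.

```python
text = "This is an example"

# ord() gives each character's code; for ASCII text this is the ASCII code.
codes = [ord(ch) for ch in text]
print(codes)
# → [84, 104, 105, 115, 32, 105, 115, 32, 97, 110, 32, 101, 120, 97, 109, 112, 108, 101]

# Encoding the string as ASCII bytes yields the same numbers.
print(list(text.encode("ascii")) == codes)  # → True

# Round trip: the numbers alone are enough to recover the text.
restored = "".join(chr(c) for c in codes)
print(restored == text)  # → True
```

The spaces appear as 32, just like the letters: they are characters with codes of their own.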

Representation in Memory
_elpmaxe_na (the slide's memory diagram lists the bytes of "an example " in reverse order, with _ standing for a space)

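Features of ASCII
7-bit fixed-length code
  – all codes have the same number of bits
Sorting: A precedes B, B precedes C, etc., so ordering by code number gives alphabetical order (all capitals sort before all lower-case letters)
Caps + 32 = lower case (A + space = a, since the space character is code 32)
Word divisions, etc. must be parsed: the code marks only characters, not words
ASCII is very widespread and almost universally supported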
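
Each feature above can be checked directly. The Python sketch below uses only the standard library's `string` constants. Splitting on the space character is just one simple way a program might "parse" word divisions; it is an illustration, not a complete word splitter.

```python
import string

# 7-bit fixed-length code: every printable ASCII character's code fits in 7 bits (0..127).
fits_in_7_bits = all(ord(ch) < 2**7 for ch in string.printable)

# Sorting: letter codes follow alphabetical order, so sorting by code sorts alphabetically.
alphabetical = sorted("CAB")

# Caps + 32 = lower case, and 32 is the code of the space character.
space_code = ord(" ")
case_offset_holds = all(
    chr(ord(upper) + 32) == lower
    for upper, lower in zip(string.ascii_uppercase, string.ascii_lowercase)
)

# Word divisions are not part of the code; a program must find them,
# here simply by splitting on the space character.
words = "This is an example".split(" ")

print(fits_in_7_bits, alphabetical, space_code, case_offset_holds, words)
# → True ['A', 'B', 'C'] 32 True ['This', 'is', 'an', 'example']
```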

Variable-Length Codes
Some symbols (e.g. letters) have shorter codes than others
  – e.g. Morse code: e = dot, j = dot-dash-dash-dash
  – use the frequency of symbols to assign code lengths: frequent symbols get short codes
Why? Space efficiency
  – compression tools such as gzip and zip use variable-length codes, applied to repeated strings such as words and not only to single characters

Requirements
Starting and ending points of symbols must be clear
(Simplistic) example: four symbols must be encoded: 0, 10, 110, 1110
  – all symbols end with a zero
  – any zero ends a symbol
  – any one continues a symbol
Average number of bits per symbol (weighted by how often each symbol occurs) = 2
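
Because every code word ends in a zero, a decoder can split a bit stream without any separators. The sketch below assumes hypothetical symbol names A to D for the four code words. The message "ABACDA" is made up, and its average works out to 2 bits per symbol only because of how often each symbol appears in it.

```python
# Hypothetical symbol names A-D; the code words follow the slide's rule.
CODE = {"A": "0", "B": "10", "C": "110", "D": "1110"}
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(message):
    """Concatenate the code word of each symbol; no separators are needed."""
    return "".join(CODE[sym] for sym in message)

def decode(bits):
    """Split a bit string into symbols: a 1 continues a symbol, a 0 ends it."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if bit == "0":                       # any zero ends a symbol
            if current not in DECODE:        # e.g. 11110: longer than any code word
                raise ValueError(f"unknown code word {current!r}")
            symbols.append(DECODE[current])
            current = ""
    if current:                              # trailing 1s with no closing 0
        raise ValueError("bit string ends in the middle of a symbol")
    return "".join(symbols)

message = "ABACDA"
bits = encode(message)
print(bits)                      # → 010011011100
print(decode(bits))              # → ABACDA
print(len(bits) / len(message))  # → 2.0
```

A message that used D more often would average more than 2 bits per symbol, and one dominated by A would average less.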

Example
12 symbols
  – digits 0-9
  – decimal point and space (end of number)

Efficient Coding
Huffman coding (used in gzip)
1. Count the number of times each symbol occurs.
2. Start with the two least frequent symbols:
   a) combine them using a tree
   b) put 0 on one branch, 1 on the other
   c) combine their counts and treat the pair as a single symbol
3. Continue combining in the same way until every symbol is assigned a place in the tree.
4. Read the codes from the top of the tree down to each symbol.
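
The four steps above can be sketched in Python as follows. The standard `heapq` module serves as the priority queue that picks out the two least frequent entries at each step. The tie-breaking order is our own choice, so other equally good Huffman trees are possible. gzip applies Huffman coding inside its DEFLATE format, and this sketch is not gzip's implementation.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Build a Huffman code table {symbol: bit string} for the symbols in text."""
    freqs = Counter(text)                      # step 1: count each symbol
    if len(freqs) == 1:                        # a lone symbol still needs one bit
        return {sym: "0" for sym in freqs}
    tiebreak = count()                         # unique numbers, so nodes are never compared
    # Heap entries are (count, tiebreak, node); a node is a symbol or a (left, right) pair.
    heap = [(n, next(tiebreak), sym) for sym, n in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                       # step 3: repeat until one tree remains
        n1, _, left = heapq.heappop(heap)      # step 2: the two least frequent entries...
        n2, _, right = heapq.heappop(heap)
        # ...are joined under a new node whose count is the sum of theirs
        heapq.heappush(heap, (n1 + n2, next(tiebreak), (left, right)))

    codes = {}
    def walk(node, prefix):                    # step 4: read codes from the top down
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")        # 0 on one branch
            walk(node[1], prefix + "1")        # 1 on the other
        else:
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

print(huffman_codes("aaaabbc"))  # → {'c': '00', 'b': '01', 'a': '1'}

message = "this is an example"
codes = huffman_codes(message)
encoded = "".join(codes[ch] for ch in message)
print(len(encoded), "bits vs", 8 * len(message), "bits in 8-bit ASCII")
```

No code word is a prefix of another, so the encoded bits can be decoded without separators, just as with the zero-terminated code in the Requirements slide.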

Information Theory
Mathematical theory of communication
  – How many bits are needed in an efficient variable-length encoding?
  – How much information is in a chunk of data?
  – How can the capacity of an information medium be measured?
Probabilistic model of information
  – “noisy channel” model
  – less frequent ≈ more surprising ≈ more informative
Measures information using the notion of entropy

Noisy Channel
Diagram: Source → Destination
We measure the probability of each possible path from source to destination (correct reception and errors)
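
As an illustration, the sketch below enumerates every path for the simplest noisy channel. It assumes a binary symmetric channel, which the slides do not specify: the source sends 0 or 1, and the channel flips each bit independently with probability `p_error`. The parameter names and the numbers in the example call are hypothetical.

```python
def path_probabilities(p_zero, p_error):
    """Probability of each (sent, received) path through a binary symmetric channel.

    Assumptions (not from the slides): the source sends 0 with probability p_zero
    and 1 otherwise, and the channel flips any bit with probability p_error.
    """
    p_sent = {0: p_zero, 1: 1 - p_zero}
    paths = {}
    for sent, p_s in p_sent.items():
        for received in (0, 1):
            # Correct reception keeps the bit; an error flips it.
            p_channel = 1 - p_error if received == sent else p_error
            paths[(sent, received)] = p_s * p_channel
    return paths

# Hypothetical example: equally likely bits, 10% chance of a flip.
paths = path_probabilities(p_zero=0.5, p_error=0.1)
p_correct = paths[(0, 0)] + paths[(1, 1)]
print(paths)                           # four paths: two correct, two errors
print("correct reception:", p_correct)
```

The four path probabilities always sum to 1. The two error paths together give the probability that the destination receives the wrong bit.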

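Entropy
The entropy of a symbol s is calculated from its probability of occurrence p(s); it is the ideal number of bits required to encode that symbol: $h(s) = -\log_2 p(s)$
Average entropy: $H(p) = -\sum_i p_i \log_2 p_i$
Related to variance: both measure how spread out (unpredictable) a distribution is
Measured in bits (because the logarithm is base 2)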
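
Both formulas translate directly into Python. In this sketch the function names are our own. `text_entropy` estimates the probabilities $p_i$ from symbol counts, which is the same first step as in Huffman coding. A Huffman code's average length is never below H and is less than H + 1.

```python
import math
from collections import Counter

def symbol_bits(p):
    """Information carried by one symbol of probability p, in bits: h(s) = -log2 p(s)."""
    return -math.log2(p)

def entropy(probs):
    """Average entropy H(p) = -sum(p_i * log2 p_i), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)   # p = 0 terms contribute 0

def text_entropy(text):
    """Entropy of the symbol frequencies observed in text."""
    counts = Counter(text)
    return entropy(n / len(text) for n in counts.values())

print(symbol_bits(1 / 2))           # → 1.0: a 50/50 symbol carries one bit
print(symbol_bits(1 / 8))           # → 3.0: rarer symbols carry more bits
print(entropy([0.25] * 4))          # → 2.0: four equally likely symbols
print(entropy([0.5, 0.25, 0.25]))   # → 1.5
print(text_entropy("this is an example"))
```

This matches "less frequent ≈ more surprising ≈ more informative": halving a symbol's probability adds exactly one bit to $h(s)$.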

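Base 2 Logarithms
$2^{\log_2 x} = x$; e.g. $\log_2 2 = 1$, $\log_2 4 = 2$, $\log_2 8 = 3$, etc.
Often we round $\log_2 x$ up to the next whole number: $\lceil \log_2 x \rceil$ is the minimum number of bits needed to give x symbols distinct fixed-length codes (equivalently, round x up to the nearest power of two)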
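
Here is a small Python check of the identity and of the rounding-up rule. `min_bits` is a hypothetical helper name. It uses `int.bit_length()`, which computes $\lceil \log_2 n \rceil$ with integer arithmetic and avoids floating-point rounding in `math.log2`.

```python
import math

# 2 ** log2(x) gives back x; for powers of two, log2 is a whole number.
for x in (2, 4, 8):
    print(x, math.log2(x), 2 ** math.log2(x))   # prints 2 1.0 2.0, then 4 2.0 4.0, then 8 3.0 8.0

def min_bits(n_symbols):
    """Minimum bits for a fixed-length code with n_symbols distinct codes: ceil(log2 n)."""
    if n_symbols < 2:
        raise ValueError("need at least two symbols")
    return (n_symbols - 1).bit_length()

print(min_bits(12))    # → 4: digits 0-9, decimal point and space
print(min_bits(128))   # → 7: the 128 ASCII codes
print(min_bits(129))   # → 8: one symbol past a power of two costs a whole extra bit
```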

Unicode
Administered by the Unicode Consortium
Assigns a unique code to every written symbol (21 bits; of the 2^21 = 2,097,152 possible values, Unicode uses the 1,114,112 code points U+0000 to U+10FFFF)
  – UTF-32: four-byte fixed-length code
  – UTF-16: variable-length code of two or four bytes
  – UTF-8: variable-length code of one to four bytes: ASCII block (one byte) + rest of the basic multilingual plane (2-3 bytes) + supplementary planes (4 bytes)
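
The byte counts for each encoding form can be checked with Python's built-in codecs. The four sample characters were chosen for illustration, one from each range above. The "-le" codec variants are used so that Python does not prepend a byte-order mark and inflate the counts.

```python
import sys

# One sample from each range: ASCII, the rest of the BMP (2-byte and
# 3-byte UTF-8), and a supplementary-plane character.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]   # A, é, €, and an emoji (U+1F600)

sizes = {
    ch: {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}
    for ch in samples
}

for ch, row in sizes.items():
    print(f"U+{ord(ch):04X}", row)
# U+0041  {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
# U+00E9  {'utf-8': 2, 'utf-16-le': 2, 'utf-32-le': 4}
# U+20AC  {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
# U+1F600 {'utf-8': 4, 'utf-16-le': 4, 'utf-32-le': 4}

# The largest code point, and how many bits it needs.
print(hex(sys.maxunicode), sys.maxunicode.bit_length())   # → 0x10ffff 21
```

UTF-16 stores the supplementary character as a pair of two-byte units (a surrogate pair), which is why it needs four bytes there.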