Machine-level representation of data: Character representation

Presentation transcript:

Machine-level representation of data: Character representation
Bits can represent anything!
Characters?
  26 letters → 5 bits (2^5 = 32)
  upper/lower case + punctuation → 7 bits (in 8) ("ASCII")
  standard code to cover all the world's languages → 8, 16, 32 bits ("Unicode")
Logical values? 0 → False, 1 → True
Colors? Ex: Red (100), Green (010), Blue (001)
Locations / addresses? Commands?
MEMORIZE: N bits → at most 2^N things

Character representation in computers and devices
ASCII → ANSI → MultiByte → Unicode
ASCII: American Standard Code for Information Interchange. This was the de facto worldwide standard for the code numbers used by computers to represent all the upper- and lower-case Latin letters, numbers, punctuation, etc.
How many bits do we need to represent 5 letters?
How many bits do we need to represent:
  4 letters (A, B, C, D) - 2 bits - 2^2 = 4 patterns
  26 English uppercase letters - 5 bits - 2^5 = 32 patterns
  26 English lowercase letters - 5 bits - 2^5 = 32 patterns
  Decimal digits and special signs - 5 bits - 2^5 = 32 patterns
  Special control characters - 5 bits - 2^5 = 32 patterns
Together these four groups give 4 × 32 = 128 patterns. How many bits do we need to represent 128 patterns?
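The counting questions on this slide can be checked with a short sketch. The helper name `bits_needed` is made up for this illustration; it simply finds the smallest N with 2^N at least as large as the number of patterns.

```python
def bits_needed(patterns):
    """Smallest number of bits N such that 2**N >= patterns."""
    if patterns < 1:
        raise ValueError("need at least one pattern")
    # (patterns - 1).bit_length() equals ceil(log2(patterns)) for patterns >= 1
    return (patterns - 1).bit_length()


for count in (4, 5, 26, 128):
    n = bits_needed(count)
    print(f"{count:>3} patterns -> {n} bits (2^{n} = {2 ** n})")
```

Note that 5 letters already need 3 bits, because 2 bits give only 4 patterns.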

The content of the ASCII table
Contains 128 characters.
ASCII needs only 7 bits for character representation ("A" - 100 0001₂).
The first printable character is SP (space) and corresponds to the bit pattern 010 0000 - 0x20.
The characters A and B correspond to:
  A - 100 0001₂ - 0x41
  B - 100 0010₂ - 0x42
Find the ASCII code of "z": z - 111 1010₂ - 0x7A

ASCII full chart (rows: 16's place, columns: 1's place)
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
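The codes quoted on this slide can be reproduced with Python's built-in `ord`. The helper `ascii_info` below is a hypothetical name used only for this sketch; it returns the 7-bit pattern, the hex code, and the chart position (16's place and 1's place) of a character.

```python
def ascii_info(ch):
    """Return (bit pattern, hex code, 16's place, 1's place) for an ASCII character."""
    code = ord(ch)
    if code > 0x7F:
        raise ValueError("not a 7-bit ASCII character")
    bits = format(code, "07b")      # 7-bit binary pattern
    hex_code = f"0x{code:02X}"      # two hex digits
    return bits, hex_code, code >> 4, code & 0xF


for ch in (" ", "A", "B", "z"):
    bits, hex_code, row, col = ascii_info(ch)
    print(f"{ch!r}: {bits} {hex_code} (16's place {row:X}, 1's place {col:X})")
```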

Control characters
Control characters (codes 0x00 to 0x1F, and 0x7F DEL) are not shown or printed on devices. They control the devices. These control characters have different meanings for different devices.
0x0A - Line Feed, 0x0D - Carriage Return
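A small sketch can make control characters visible. The helper names `is_control` and `visible` are invented for this example, and the sample string is only an assumption about what a text with a Carriage Return + Line Feed line ending and a bare Line Feed line ending might look like.

```python
def is_control(ch):
    """True for the ASCII control codes 0x00-0x1F and 0x7F (DEL)."""
    code = ord(ch)
    return code < 0x20 or code == 0x7F


def visible(text):
    """Replace each control character with its hex code in angle brackets."""
    return "".join(f"<{ord(c):02X}>" if is_control(c) else c for c in text)


sample = "first\r\nsecond\n"   # "\r" is 0x0D (CR), "\n" is 0x0A (LF)
print(visible(sample))
```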

ASCII code: advantages and disadvantages
Why is this an advantage? How? How?
The history of ASCII since 1967 is mostly a history of attempts to overcome its limitations and make it more applicable to languages other than American English.

ASCII code extension. Character sets.
Each 8-bit character set keeps the 7-bit ASCII codes and adds 128 more:
  ANSI - 256 chars: ASCII 128 chars + extra 128 chars
  IBM - 256 chars: ASCII 128 chars + extra 128 chars
  Other - 256 chars: ASCII 128 chars + Armenian

1 byte - 256 characters. Code pages
ANSI and similar code pages need 8 bits for character representation:
  "A" - 0100 0001₂
  "Ա" - 0xB2 - 1011 0010₂
  "Б" - 0xE2 - 1110 0010₂
Each code page shares the ASCII half and differs in the extra 128 chars:
  ANSI - 256 chars: ASCII + extra 128 chars
  IBM - 256 chars: ASCII + extra 128 chars
  Armenian code page #1: ASCII + Armenian 1
  Armenian code page #2: ASCII + Armenian 2
  Cyrillic code page #1: ASCII + Cyrillic 1
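The effect of code pages can be sketched by decoding the same byte under several of them. The choice of codecs here is an assumption for illustration: KOI8-R is one Cyrillic code page that places "Б" at 0xE2, matching the slide's example, while Windows-1251 and Latin-1 assign other characters to that byte. Python's standard library has no Armenian (ArmSCII) codec, so the Armenian example is not reproduced.

```python
raw = bytes([0xE2])              # one byte from the upper (non-ASCII) half
ascii_byte = bytes([0x41])       # one byte from the shared ASCII half

code_pages = ("koi8_r", "cp1251", "latin_1")

# The same upper-half byte means a different character in each code page.
decoded = {name: raw.decode(name) for name in code_pages}

# The ASCII half decodes identically everywhere.
decoded_ascii = {name: ascii_byte.decode(name) for name in code_pages}

for name in code_pages:
    print(f"{name:8} 0xE2 -> {decoded[name]}   0x41 -> {decoded_ascii[name]}")
```

This is why a document written with one code page shows the wrong letters when opened with another.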

Double (or multi) byte character sets
Diagram: ASCII 128 chars; Chinese code page (lead byte); Chinese code page (trail byte)
A DBCS starts off with 256 codes. Like any well-behaved code page, the first 128 of these codes are ASCII. However, some of the codes in the higher 128 are always followed by a second byte. The two bytes together (called a lead byte and a trail byte) define a single character, usually a complex ideograph.

Double (or multi) byte character sets
Advantage: DBCS makes it possible to create code pages for languages that have more than 256 letters or signs.
Disadvantage: Documents created with different code pages, even for the same language (Cyrillic or Armenian), are still not compatible.
The problem with a double-byte character set is not that characters are represented by 2 bytes. The problem is that some characters (in particular, the ASCII characters) are represented by 1 byte. This creates odd programming problems. For example, the number of characters in a character string cannot be determined from the byte size of the string.
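The string-length problem can be illustrated with GBK, a Chinese double-byte code page available in Python. This is a sketch under the assumption that the input is valid GBK, where lead bytes fall in the range 0x81 to 0xFE; the function name `count_gbk_chars` is made up for this example.

```python
def count_gbk_chars(data):
    """Count characters in valid GBK bytes by walking lead/trail byte pairs."""
    count = 0
    i = 0
    while i < len(data):
        # A byte of 0x81 or above is a lead byte and is followed by a trail byte;
        # anything below 0x80 is a single-byte ASCII character.
        i += 2 if data[i] >= 0x81 else 1
        count += 1
    return count


text = "A中B文"                  # two ASCII letters and two ideographs
data = text.encode("gbk")

print(len(data), "bytes,", count_gbk_chars(data), "characters")
```

The byte size alone does not give the character count; the string has to be scanned from the start.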

Unicode
Code point - a symbol's identifier.
The best thing about Unicode is that there's only one character set. There's simply no ambiguity. The representation in bits is large enough to accommodate all languages and signs.
Flavors - UTF-8, UTF-16, UTF-32.
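The difference between a code point and its encoded bytes can be shown with Python's built-in `ord`, `chr`, and `str.encode`. The helper name `code_point_label` and the choice of sample characters are assumptions for this sketch.

```python
def code_point_label(ch):
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


for ch in ("A", "Ա", "Б"):
    print(ch, code_point_label(ch))

# One code point, three different byte sequences depending on the flavor.
armenian_a = chr(0x0531)         # Armenian capital letter AYB
encodings = {
    "utf-8": armenian_a.encode("utf-8").hex(),
    "utf-16": armenian_a.encode("utf-16-be").hex(),
    "utf-32": armenian_a.encode("utf-32-be").hex(),
}
print(encodings)
```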

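Unicode UTF-8 (like multibyte)
Code point - a symbol's identifier.

Bits  Last code point  Byte 1    Byte 2    Byte 3    Byte 4    Byte 5    Byte 6
7     U+007F           0xxxxxxx
11    U+07FF           110xxxxx  10xxxxxx
16    U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
21    U+1FFFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
26    U+3FFFFFF        111110xx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx
31    U+7FFFFFFF       1111110x  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx  10xxxxxx

Single byte: the header bit is always 0.
Leading byte of a multi-byte sequence: the number of leading "1"s in the header bits equals the number of bytes in the sequence.
Continuation byte of a multi-byte sequence: always starts with the header bits 10.
The table shows the original 6-byte design; current UTF-8 (RFC 3629) is limited to 4 bytes and code points up to U+10FFFF.
The Armenian character set uses the codes 0x0530 through 0x058F.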
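The header-bit layout can be sketched as a hand-written encoder and compared with Python's built-in UTF-8 codec. The function `utf8_encode` is a made-up name for this illustration; it covers only the 1- to 4-byte forms used by current UTF-8 and does not reject surrogate code points, which a real encoder would.

```python
def utf8_encode(cp):
    """Encode one code point (up to U+10FFFF) using the UTF-8 header-bit layout."""
    if cp < 0x80:                       # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                      # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:                  # 21 bits: 11110xxx followed by three 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point out of range")


for cp in (0x41, 0x0531, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    bits = " ".join(format(b, "08b") for b in encoded)
    print(f"U+{cp:04X}: {bits}")
```

Each 6-bit group of the code point lands in one continuation byte, and the remaining high bits go into the leading byte after its header bits.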

Unicode UTF-16
The Unicode code space is divided into seventeen planes of 2^16 (65,536) code points each.
The code points in each plane have the hexadecimal values xx0000 to xxFFFF, where xx is a hex value from 00 to 10.
1st plane: code points U+0000 to U+D7FF and U+E000 to U+FFFF - Basic Multilingual Plane - the most frequently used characters.
Code points U+D800 to U+DFFF - extensions: reserved for surrogate pairs, which UTF-16 uses to encode code points outside the Basic Multilingual Plane with two 16-bit units.
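The plane structure and the use of the U+D800 to U+DFFF range can be sketched as follows. The function names `plane` and `utf16_units` are hypothetical; the surrogate arithmetic is the standard UTF-16 rule, and the result is checked against Python's own codec.

```python
def plane(cp):
    """Plane number (0 to 16) of a code point: the 'xx' in xx0000..xxFFFF."""
    return cp >> 16


def utf16_units(cp):
    """Return the 16-bit code units that UTF-16 uses for one code point."""
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate range is reserved, not a character")
    if cp < 0x10000:                    # Basic Multilingual Plane: one unit
        return [cp]
    if cp > 0x10FFFF:
        raise ValueError("code point out of range")
    v = cp - 0x10000                    # 20 bits remain
    high = 0xD800 + (v >> 10)           # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)          # bottom 10 bits -> low surrogate
    return [high, low]


for cp in (0x0531, 0x1F600):
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X}: plane {plane(cp)}, UTF-16 units {units}")
```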

Unicode
Advantage: Supports all languages and signs with a single code page.
UTF-8 is backward compatible with ASCII.
The UTF-16 Basic Multilingual Plane (first 16 bits, 2 bytes) and UTF-32 (4 bytes) use a fixed size per character, so such text can be handled like regular single-byte text.

Unicode
Disadvantage: With fixed units of 2 or 4 bytes, UTF-16 and UTF-32 character strings occupy twice or four times as much memory as ASCII strings.
UTF-8 has variable length. This is an advantage in size compared with the fixed-width Unicode encodings, and a disadvantage for programming. For example, the number of characters in a character string cannot be determined from the byte size of the string.
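The size trade-off can be measured directly. In this sketch the helper name `sizes` and the two sample strings are assumptions; the little-endian codec variants are used so that no byte-order mark is added to the counts.

```python
def sizes(text):
    """Character count and encoded byte counts of a string in each Unicode flavor."""
    return {
        "chars": len(text),
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


english = "Hello"
armenian = "Բարև"                # four Armenian letters

print(english, sizes(english))
print(armenian, sizes(armenian))

# For pure ASCII text, UTF-8 produces exactly the ASCII bytes.
print(english.encode("ascii") == english.encode("utf-8"))
```

For the ASCII string, UTF-8 costs one byte per character; for the Armenian string it costs two, so the byte size no longer tells the character count.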