Lesson Objectives
Aims
You should be able to:
- Explain the use of binary to represent characters
- Explain the term “character set”
- Describe the relationship between the number of bits per character and the number of characters that can be represented, for:
  - ASCII
  - Unicode
We know that computers work in binary. Therefore, computers don’t understand text directly, so text needs encoding: turning it into another form (binary numbers) that the computer can store and process.
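To make “encoding” concrete, here is a minimal Python sketch (the lesson itself contains no code, so the language is our choice). It uses the built-in ord() function to look up each character’s number and format() to write that number as 8 binary digits. The helper name text_to_binary is hypothetical, and the sketch assumes every character has a code below 256 so it fits in 8 bits.

```python
# Minimal sketch: encode text as binary, one character at a time.
# Assumes every character has a code below 256 so it fits in 8 bits.
def text_to_binary(text):
    """Return a list of 8-bit binary strings, one per character in text."""
    # ord() gives the character's number; format(..., "08b") pads it to 8 binary digits
    return [format(ord(ch), "08b") for ch in text]

print(text_to_binary("Hi"))  # ['01001000', '01101001']
```

The word “Hi” is stored as two numbers, 72 and 105, which the computer holds as the two binary patterns shown.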
Text
All text, symbols, numbers and emoji (yes, even those) are encoded.
They form part of a “character set”.
A character set is simply a standard way of representing characters as numbers, e.g. A = 1, B = 2, C = 3… (a simplified example; in ASCII, A = 65, B = 66 and C = 67).
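The slide’s example can be sketched as a tiny, hypothetical character set in Python: a lookup table that turns letters into numbers, plus the reverse table needed to turn them back. The names TOY_CHARSET, TOY_REVERSE, encode and decode are illustrative only; real character sets such as ASCII use different numbers, as the final line shows.

```python
import string

# Hypothetical toy character set from the slide: A = 1, B = 2, C = 3, ..., Z = 26.
# Real character sets use different numbers (ASCII has A = 65).
TOY_CHARSET = {letter: n for n, letter in enumerate(string.ascii_uppercase, start=1)}
# The reverse lookup is what the receiving computer needs in order to decode
TOY_REVERSE = {n: letter for letter, n in TOY_CHARSET.items()}

def encode(text):
    """Turn each capital letter into its number in the toy character set."""
    return [TOY_CHARSET[ch] for ch in text]

def decode(numbers):
    """Turn numbers back into capital letters using the same character set."""
    return "".join(TOY_REVERSE[n] for n in numbers)

codes = encode("CAB")
print(codes)          # [3, 1, 2]
print(decode(codes))  # CAB
print(ord("A"))       # 65, the real ASCII/Unicode number for A
```

The point of a standard is that the sender and the receiver use the same table; decoding [3, 1, 2] with a different table would give different letters.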
Character sets
There are two main character sets you need to know for your exam:
- ASCII (American Standard Code for Information Interchange)
- Unicode
Using a character set means a computer can represent and display all the characters and symbols defined in that set.
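A quick way to see that a character set limits what can be represented is Python’s built-in str.encode(), which raises UnicodeEncodeError when a character is not in the chosen set. This is a minimal sketch; the helper name can_encode is hypothetical, while "ascii" and "utf-8" are standard Python codec names.

```python
# Minimal sketch: a character can only be encoded if it exists in the chosen character set.
def can_encode(text, charset):
    """Return True if every character in text can be encoded with the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        # At least one character is not part of this character set
        return False

print(can_encode("cafe", "ascii"))   # True
print(can_encode("café", "ascii"))   # False: é is not in ASCII
print(can_encode("café", "utf-8"))   # True: Unicode includes é
```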
ASCII (lowercase letters a to z)
Binary     Decimal  Hexadecimal  Character
110 0001   97       61           a
110 0010   98       62           b
110 0011   99       63           c
110 0100   100      64           d
110 0101   101      65           e
110 0110   102      66           f
110 0111   103      67           g
110 1000   104      68           h
110 1001   105      69           i
110 1010   106      6A           j
110 1011   107      6B           k
110 1100   108      6C           l
110 1101   109      6D           m
110 1110   110      6E           n
110 1111   111      6F           o
111 0000   112      70           p
111 0001   113      71           q
111 0010   114      72           r
111 0011   115      73           s
111 0100   116      74           t
111 0101   117      75           u
111 0110   118      76           v
111 0111   119      77           w
111 1000   120      78           x
111 1001   121      79           y
111 1010   122      7A           z
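The ASCII table can be regenerated with a short Python sketch, which also shows how the three number columns relate: ord() gives the decimal code, and format() writes that same number in 7-bit binary and in hexadecimal. The helper name ascii_row is hypothetical; splitting the binary into groups of 3 and 4 digits simply mirrors the table’s layout.

```python
# Minimal sketch: rebuild the ASCII table from Python's built-in ord().
def ascii_row(ch):
    """Return (binary, decimal, hexadecimal, character) for one ASCII character."""
    code = ord(ch)
    bits = format(code, "07b")              # 7-bit binary, e.g. '1100001' for 'a'
    binary = f"{bits[:3]} {bits[3:]}"       # split 3 + 4 like the table: '110 0001'
    return binary, code, format(code, "X"), ch

print(f"{'Binary':<11}{'Decimal':<9}{'Hexadecimal':<13}Character")
for ch in "abcdefghijklmnopqrstuvwxyz":
    binary, decimal, hexadecimal, char = ascii_row(ch)
    print(f"{binary:<11}{decimal:<9}{hexadecimal:<13}{char}")
```

Because the letters have consecutive codes, ord("b") - ord("a") is 1, and each lowercase letter is exactly 32 more than its capital (A = 65, a = 97).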
ASCII
7-bit encoding (the 8th bit of each byte can be used as a parity bit for error checking)
Advantages:
- A standard method of communicating
- Needs only 8 bits (1 byte) per character
Disadvantages:
- Can only represent 128 characters, because 7 bits give 2^7 = 128 different patterns
- Does not cater for other languages
- Does not encode many symbols
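The link between bits per character and the number of characters is 2 to the power of the number of bits. The sketch below shows that, and then fills the spare 8th bit with a parity bit. Even parity is an assumption here (odd parity is equally valid), and the helper names max_characters and add_even_parity are hypothetical.

```python
# Relationship between bits per character and how many characters can be represented:
# n bits give 2 ** n different patterns.
def max_characters(bits):
    """Return how many different characters a code of this many bits can represent."""
    return 2 ** bits

for bits in (7, 8, 16):
    print(bits, "bits ->", max_characters(bits), "characters")
# 7 bits -> 128 characters
# 8 bits -> 256 characters
# 16 bits -> 65536 characters

def add_even_parity(code):
    """Put a parity bit in the 8th (leftmost) bit of a 7-bit ASCII code.

    Assumes even parity: the bit is chosen so the byte has an even number of 1s.
    """
    ones = bin(code).count("1")
    parity_bit = ones % 2          # 1 when the count of 1s is odd, 0 when even
    return (parity_bit << 7) | code

print(format(add_even_parity(ord("a")), "08b"))  # 11100001 ('a' = 1100001 has three 1s)
```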
Unicode
A worldwide standard for encoding characters
Designed to cover ALL languages and symbols in a standard way
Version 9 of Unicode (released in 2016, the latest when this lesson was written) contains:
- More than 128,000 characters
- 135 scripts (writing systems) and symbol sets
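In Unicode, every character has its own number, called a code point, usually written as “U+” followed by hexadecimal. This minimal Python sketch uses the built-in ord() to print the code point of characters from several writing systems; the helper name code_point is hypothetical.

```python
# Minimal sketch: every Unicode character has a unique number (its code point).
def code_point(ch):
    """Return the code point of ch in the standard U+XXXX hexadecimal form."""
    return f"U+{ord(ch):04X}"

for ch in ["a", "é", "λ", "中", "€", "😀"]:
    print(ch, ord(ch), code_point(ch))
```

The first character still has its ASCII number (a = 97), while the emoji 😀 is U+1F600 (128512), larger than the 65,535 that 16 bits can hold.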
Unicode
Advantages:
- Designed to encode every character, number and symbol in any language, past or present
- UTF-8 and UTF-16 can both encode 1,112,064 possible characters
- UTF-8 is fully backward compatible with ASCII
- A unified, standard method of communication
Disadvantages:
- Requires more storage (bytes) per character than ASCII: usually quoted as 16 bits per character vs 8 for ASCII. (Loosely speaking; it actually depends on the standard used: UTF-8 uses 1 to 4 bytes and UTF-16 uses 2 or 4 bytes per character, so up to 32 bits, but OCR exam questions keep saying 16 bits.)
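The storage point, the ASCII compatibility of UTF-8 and the 1,112,064 figure can all be checked with a short Python sketch using the standard "utf-8" and "utf-16-be" codecs ("utf-16-be" avoids the 2-byte byte order mark that plain "utf-16" adds). The helper name byte_sizes is hypothetical.

```python
import sys

def byte_sizes(ch):
    """Return (UTF-8 bytes, UTF-16 bytes) used to store one character."""
    # "utf-16-be" is used instead of "utf-16" so Python does not add a byte order mark
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))

for ch in ["a", "é", "€", "😀"]:
    utf8, utf16 = byte_sizes(ch)
    print(f"{ch}: UTF-8 = {utf8} byte(s), UTF-16 = {utf16} byte(s)")

# UTF-8 is backward compatible with ASCII: plain ASCII text gives identical bytes
print("Hello".encode("utf-8") == "Hello".encode("ascii"))  # True

# Code points run from 0 to sys.maxunicode (0x10FFFF); the 2,048 values U+D800 to U+DFFF
# are reserved for UTF-16 surrogate pairs and are not characters themselves
SURROGATES = 0xDFFF - 0xD800 + 1
print(sys.maxunicode + 1 - SURROGATES)  # 1112064
```

An ASCII letter such as “a” takes 1 byte in UTF-8 but 2 in UTF-16, while the emoji needs 4 bytes in both.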
January 2013 Q8b
January 2011 Q10