Text Encoding.

Slides:



Advertisements
Similar presentations
JS Array Hijacking with MBCS encodings JS Array Hijacking with MBCS encodings MBCS文字コードを使ったJS配列の乗っ取り Apr Yosuke HASEGAWA.
Advertisements

Learning Objectives Explain the link between patterns, symbols, and information Determine possible PandA encodings using a physical phenomenon Encode.
Craig Schock, 2003 Binary Numbers Numbering Systems Counting Symbolic Bases Common Bases (10, 2, 8, 16) Representing Information Binary to Decimal Conversions.
論理回路 第2回 今日の内容 前回の課題の説明 数の体系 – 数の表現 – 代表的な数 – 基数の変換 – 補数.
Excelによる積分.
JPN 311: Conversation and Composition 許可 (permission)
EECC250 - Shaaban #1 lec #13 Winter HEX DEC CHR Ctrl 00 0NUL 01 1 SOH ^A 02 2STX ^B 03 3ETX ^C 04 4EOT ^D 05 5ENQ ^E 06 6ACK ^F 07 7BEL.
言語とジェンダー. 目的 言語には、性的な存在である人間の自己認識や 世界認識を決定する力が潜んでいる。 – 言語構造の面(言語的カテゴリー ) – 言語運用の面 日常に潜む無意識の言語の力を、記述し、意識 化することが本講義の目的である。 同時に、さまざまな言語、さまざまな文化には、 それぞれに特徴的な問題があり、ジェンダーの.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Representing Information Digitally Bits and the “Why” of Bytes lawrence snyder.
9/14/2004Comp 120 Fall September 2004 First Exam next Tuesday 21 September Programming Questions? All of Chapter 3 and Appendix A are relevant.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Fluency with Information Technology Third Edition by Lawrence Snyder Chapter.
CIS 234: Character Codes Dr. Ralph D. Westfall April, 2011.
Digital Text Primer Prepared for: AIEA Roundtable on Digitization of Armenian Documents Saturday 7 October 2006, University of Geneva, Switzerland Roland.
Characters & Strings Lesson 1 CS1313 Spring Characters & Strings Lesson 1 Outline 1.Characters & Strings Lesson 1 Outline 2.Numeric Encoding of.
1 Data Representation Computer Organization Prof. H. Yoon DATA REPRESENTATION Data Types Complements Fixed Point Representations Floating Point Representations.
Agenda Data Representation – Characters Encoding Schemes ASCII
The character data type char
BIOS1 Basic Input Output System BIOS BIOS refers to a set of procedures or functions that enable the programmer have access to the hardware of the computer.
Digital Design: From Gates to Intelligent Machines
Decimal Binary Octal Hex
BIOS1 Basic Input Output System BIOS BIOS refers to a set of procedures or functions that enable the programmer have access to the hardware of the computer.
B 02 Writing in Japanese: The 3 Character Sets How do you write in Japanese?
PHY281Variables operators and math functions Slide 1 More On Variables and Operators, And Maths Functions In this section we will learn more about variables.
Dept. of Computer Science Engineering Islamic Azad University of Mashhad 1 DATA REPRESENTATION Dept. of Computer Science Engineering Islamic Azad University.
Informatics I101 February 25, 2003 John C. Paolillo, Instructor.
Postacademic Interuniversity Course in Information Technology – Module C1p1 Chapter 1 Evolution of Communication Networks.
ICT Foundation 1 Copyright © 2010, IT Gatekeeper Project – Ohiwa Lab. All rights reserved. Character representation.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Representing Information Digitally Bits and the “Why” of Bytes lawrence snyder.
Lecture 4: Number Systems (Chapter 3) (1) Data TypesSection3-1 (2) ComplementsSection3-2 (3) Fixed Point RepresentationsSection3-3 (4) Floating Point RepresentationsSection3-4.
日本語一 1月 7 日 New Year’s Greetings : E b0.
Ho w to write ひらがな Left click the mouse to move through each of the slides. Place your mouse on each symbol to hear how it is said. When you see this.
日本語1 2月12日 愛 あい. みっきーは みにーを あいしてい ます。 ほーまーは まーじを あいしてい ます。
1 Information Representation in Computer Lecture Nine.
B 04 How to Type in Japanese How do you TYPE in Japanese?
CS 2130 Lecture 23 Data Types.
Assignments: -Writing practice prompt due THUR. -Quiz signed.
Systems Architecture, Fourth Edition 1 Data Representation Chapter 3.
Characters CS240.
Programming for GCSE Topic 2.2: Binary Representation T eaching L ondon C omputing William Marsh School of Electronic Engineering and Computer Science.
ようこそ日本・日本語のクラ スへ Welcome to Japanese Class! Transition Year 2011.
PRIMITIVE TYPES IN JAVA Primitive Types Operations on Primitive Types.
4th Edition, Irv Englander
Bits, Data Types, and Operations
Chapter 2 Bits, Data Types, and Operations
Machine level representation of data Character representation
Chapter 3 Data Representation Text Characters
Lec 3: Data Representation
Digitizing Discrete Information
Characters Lesson Outline
DATA REPRESENTATION Data Types Complements Fixed Point Representations
Chapter 2 Data Types and Representations
Javascript, Loops, and Encryption
Chapter 2 Bits, Data Types, and Operations
Chapter 2 Bits, Data Types, and Operations
ASCII Character Codes nul soh stx etx eot 1 lf vt ff cr so
October 1 Programming Questions?
DATA REPRESENTATION Data Types Complements Fixed Point Representations
Cosc 2P12 Week 2.
An introduction to UML 2 for modelling communications
Ask Have ~ ? / How long ~ ? Answer these questions
Number Systems Lecture 2.
School of Computer Science and Technology
Characters Lesson Outline
Introduction to Computer Engineering
DATA REPRESENTATION Data Types Complements Fixed Point Representations
Rayat Shikshan Sanstha’s S. M. Joshi College, Hadapsar
Text Representation ASCII Collating Sequence
DATA REPRESENTATION Data Types Complements Fixed Point Representations
Cosc 2P12 Week 2.
Chapter 2 Bits, Data Types, and Operations
Presentation transcript:

Text Encoding

Using binary numbers to represent characters Computers can handle characters As an example, assigning each character a unique binary number, also known as mapping, allows the computer to indirectly handle characters using bits and bytes. The process of converting characters to binary strings (bit sequences) is called encoding. The table that consists characters and binary strigs is called character code table. Character A B C D Binary number 00 01 10 11 ※ These mapping rules are not the real ones in computer.

Character Encoding ASCII code Characters are encoded in 7-digit binary numbers. The 4 left-most bits represent hexadecimal numbers from 0~F, the 3 right-most bits represent hexadecimal numbers from 0~7. (E.g.:A=41(16)=1000001(2)) Some keys trigger certain functions, such as BS and CR. BS(Back Space): turn back by one character CR(Carriage Return): start new line of text Some languages, such as Japanese, have more characters than English, which in this case requires more than 7 bits for representation.

ASCII code table 1 2 3 4 5 6 7 8 9 Null DLE @ P ` p SOH DC1 ! A Q a q 1 2 3 4 5 6 7 Null DLE Space @ P ` p SOH DC1 ! A Q a q STX DC2 " B R b r ETX DC3 # C S c s EOT DC4 $ D T d t ENQ NAK % E U e u ACK SYN & F V f v BEl ETB ' G W g w 8 BS CAN ( H X h x 9 HT EM ) I Y i y LF SUB * : J Z j z VT ESC + ; K [ k { FF FS , < L \ l ¦ CR GS - = M ] m } SO RS . > N ^ n ~ SI US / ? O _ o DEL

Japanese Encoding Multibyte Encoding (variable-width encoding) Japanese consists of 65536 characters including kanjis, and can be represented by 16-bit (2 bytes) binary numbers. JIS X 0208 standard prescribes 6879 characters of Hiragana, Katakana, Kanji, … There are three types of encoding based on JIS X 0208 ISO-2022-JP(JIS)・・・ Mainly used in email Shift_JIS・・・ First used in Windows and widely used in personal computer. EUC-JP・・・ Mainly used in Unix

Unicode Mapping characters of major languages into one character code table Shift-JIS or EUC-JP that is based on JIS X 0208 is only used in Japan Expressing characters of different languages by a16-bit binary number. Two encoding types UCS-2,UCS-4 UTF-7,UTF-8,UTF-16,UTF-32 Official website:http://unicode.org 中国語や日本語,韓国語で使われる漢字で字形が似ている文字を同一とみなす(統合作業)などの問題点もある There’s also a problem of integrated working (be assumed as the same) in such languages using similar Kanji as Japanese, Chinese or Korean