© 2001, Penn State University Encoding on the Internet Elizabeth J. Pyatt CETS.



Computers Do Numbers
- All data on computers are ultimately stored as numbers
- Letters are assigned numbers via an encoding system
- The numbers in the encoding system determine the default ("alphabetical") sort order of the letters
- Pressing a key on the keyboard inputs the number which corresponds to that letter
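
As a small illustration of the points above, here is a minimal Python sketch (Python is not part of the original slides; the variable names are made up for this example). It shows the number behind each letter and how those numbers drive the default sort order.

```python
# Each character is stored as a number; ord() and chr() convert between the two.
letters = ["b", "a", "C", "A"]
codes = {ch: ord(ch) for ch in letters}  # {'b': 98, 'a': 97, 'C': 67, 'A': 65}

# Plain string sorting compares those numbers, so capitals (65-90)
# come before lower-case letters (97-122).
sorted_by_code = sorted(letters)  # ['A', 'C', 'a', 'b']

# A sort that ignores case needs an explicit key.
sorted_ignoring_case = sorted(letters, key=str.lower)  # ['a', 'A', 'b', 'C']

print(codes)
print(sorted_by_code)
print(sorted_ignoring_case)
```

The second sort is what most readers expect from "alphabetical order"; the first is what the raw numbers give.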

ASCII Encoding
- ASCII = American Standard Code for Information Interchange
- Invented in the 1960s
- Limited to 128 (2^7) characters (English only)
- ASCII encoding is supported on all modern computers
- ASCII encodes letters, digits, punctuation and the blank space character
- Distinguishes capital letters from lower case
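
The 128-character limit can be checked directly. The following is a sketch only; `ASCII_SIZE` and `is_ascii` are hypothetical names chosen for this example, not something from the slides.

```python
ASCII_SIZE = 2 ** 7  # 128 code positions, numbered 0-127


def is_ascii(text: str) -> bool:
    """Return True if every character has a code below 128."""
    return all(ord(ch) < ASCII_SIZE for ch in text)


print(is_ascii("Penn State 2001"))  # letters, digits and the space are all ASCII
print(is_ascii("caf\u00e9"))        # the accented e falls outside ASCII
print(ord("A"), ord("a"))           # capital and lower case have different codes

# Encoding non-English text as ASCII fails outright.
try:
    "caf\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("cannot encode character at position", err.start)
```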

ASCII Chart (Excel)

First Steps Beyond ASCII
- Vendors add an additional 128 characters for 256 total characters (2^8, or "8-bit")
- Characters #0-127 = ASCII
- Characters #128-255 = non-English letters and punctuation
- Each accented letter (e.g. á, â or Á) is a separate character
- Multiple vendors = multiple standards
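
To make the split between the two halves concrete, this sketch looks up a few letters in one 8-bit encoding. Latin-1 is used here as an assumption, since this slide does not name a specific vendor encoding.

```python
# a, then three accented letters written as escapes: á, â, Á
letters = ["a", "\u00e1", "\u00e2", "\u00c1"]

# In an 8-bit encoding every character fits in a single byte (0-255).
codes = {ch: ch.encode("latin-1")[0] for ch in letters}

for ch, code in codes.items():
    half = "ASCII half (0-127)" if code < 128 else "upper half (128-255)"
    print(ch, code, half)
```

Each accented letter gets its own code in the upper half; none of them is stored as "a plus an accent".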

ISO-8859-1 / Latin 1
- The Internet standard for English and Western European languages is ISO-8859-1
  – ISO = International Organization for Standardization
  – 8859 = encoding standard
  – 1 = 1st one registered at ISO
- Latin / Roman = English alphabet
- Almost identical to Windows-1252 encoding
- Differs from "MacRoman" on Macintosh
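
The differences between these three encodings show up when the same byte is decoded under each one. This is a rough sketch using Python's built-in codecs `latin-1`, `cp1252` (Windows-1252) and `mac_roman`; the byte values were picked for illustration.

```python
accented = bytes([0xE9])

as_latin1 = accented.decode("latin-1")      # é
as_cp1252 = accented.decode("cp1252")       # é, same as Latin-1
as_macroman = accented.decode("mac_roman")  # a different letter on the Mac

print(as_latin1, as_cp1252, as_macroman)

# "Almost identical": Windows-1252 puts printable characters in 0x80-0x9F,
# where ISO-8859-1 has only control codes.
quote = bytes([0x93])
print(repr(quote.decode("cp1252")))   # a curly opening quotation mark
print(repr(quote.decode("latin-1")))  # an invisible control character
```

Text typed on one platform and read on another with the wrong assumption will show the wrong accented letters, even though the plain ASCII letters survive.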

Latin-1 vs. Mac Roman (GIF)

Encoding Non-Roman Scripts
- Alternate encodings were developed for other scripts such as Russian, Arabic, Greek and Hebrew
- The template is:
  – Characters #0-127 = ASCII
  – Characters #128-255 = non-Roman script
- Some scripts also developed multiple encodings, typically an ISO version and a Windows version (e.g. Hebrew = ISO-8859-8 or Windows-1255)
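
The template can be seen by decoding the same two bytes under an ISO encoding and a Windows encoding for the same script. Russian is used here as an assumed example, with Python's `iso8859_5` and `cp1251` codecs standing in for the ISO and Windows versions.

```python
raw = bytes([0x41, 0xE0])  # one byte from the ASCII half, one from the upper half

decoded = {name: raw.decode(name) for name in ("iso8859_5", "cp1251")}

for name, text in decoded.items():
    # The first character is always "A"; the second depends on the encoding.
    print(name, text, [hex(ord(ch)) for ch in text])
```

Both encodings agree on the ASCII byte and both produce a Cyrillic letter for the second byte, but not the same letter, which is why the encoding has to be stated along with the text.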

Encoding Schemes

16 Bit and Beyond
- Chinese, Japanese and Korean have more than 256 characters
- 16-bit encodings with 2^16 (65,536) characters were developed
- Unicode, which attempts to combine all modern scripts into one super encoding block, is currently being developed
- Unicode support on Windows and Macintosh is increasing, but is still limited in applications
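
A quick sketch of why 8 bits are not enough for Chinese, Japanese and Korean. It uses UTF-16, one of Unicode's encoding forms, as an assumed example of a 16-bit encoding; the slide itself does not name one.

```python
print(2 ** 8, 2 ** 16)  # 256 versus 65536 possible characters

ch = "\u4e2d"  # a common Chinese character ("middle")
code = ord(ch)
print(code, hex(code))  # far above 255

# One 16-bit unit (two bytes) holds it.
two_bytes = ch.encode("utf-16-be")
print(len(two_bytes), two_bytes.hex())

# An 8-bit encoding has no slot for it.
try:
    ch.encode("latin-1")
    fits_in_8_bits = True
except UnicodeEncodeError:
    fits_in_8_bits = False
print(fits_in_8_bits)
```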

How browsers read a site
- Web site specifies encoding to browser
- Browser matches encoding with the right font on your machine
- Browser displays the appropriate characters (English and non-English)
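
The first step can be sketched as follows, assuming the site declares its encoding in a `Content-Type` value such as `text/html; charset=ISO-8859-1`. The function names and the Latin-1 fallback are assumptions made for this example, not a description of how any particular browser works, and the font-matching step is not modelled.

```python
from email.message import Message

DEFAULT_ENCODING = "latin-1"  # assumed fallback when nothing is declared


def declared_charset(content_type: str) -> str:
    """Pull the charset parameter out of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset() or DEFAULT_ENCODING


def decode_page(body: bytes, content_type: str) -> str:
    """Turn the bytes of a page into text using the declared encoding."""
    return body.decode(declared_charset(content_type))


print(declared_charset("text/html; charset=ISO-8859-1"))
print(declared_charset("text/html"))
print(decode_page("caf\u00e9".encode("utf-8"), "text/html; charset=UTF-8"))
```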

How to mess up the browser
- Web site doesn't specify encoding, so browser stays on default (usually Latin-1)
- Web site specifies font not on user's machine
- Font doesn't match encoding
- Font doesn't have all the right characters (e.g. € (Euro currency symbol))
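
The first failure can be reproduced in a few lines. This sketch assumes a page saved as UTF-8 (an assumption; the slides do not mention UTF-8) that is read with the Latin-1 default, and it also shows that Latin-1 itself has no Euro symbol. Font problems are not modelled.

```python
page = "caf\u00e9"
sent = page.encode("utf-8")     # what the server actually sends
shown = sent.decode("latin-1")  # what a browser on the Latin-1 default displays
print(page, "->", shown)        # the accented letter turns into two wrong characters

# The Euro sign is missing from Latin-1 but present in Windows-1252.
euro = "\u20ac"
try:
    euro.encode("latin-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False
euro_in_cp1252 = euro.encode("cp1252")
print(euro_in_latin1, euro_in_cp1252)
```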

Keyboards & Fonts
- Normal fonts (e.g. Times, Arial) match the character to its ASCII/Latin-1 number based on the keyboard
- "Dingbat fonts" (e.g. Symbol, Wingdings) do not match the character to the ASCII code
- Most keyboards still access only 128 characters at a time (Mac can do 256)
- Therefore, older non-English script fonts (e.g. Symbol) do not always match the script's encoding

Where to get good fonts
- Microsoft provides free, properly encoded fonts with Windows NT and Windows 2000
- Apple provides free, properly encoded fonts via its Language Kits (free with System 9)
- Third party fonts are available (but can be glitchy)