Lis508 lecture 1: bits, bytes and characters Thomas Krichel 2003-09-30.

Presentation transcript:

lis508 lecture 1: bits, bytes and characters Thomas Krichel

Structure Numbers –Bits –Bytes Character sets –Coded character set –Character encoding

Literature, no need to read… Norton, New Inside the PC, chapter 4; http:// htm; http://wwwinfo.cern.ch/asdoc/WWW/publications/ictp99/ictp99N2705.html; http:// html

Information Information is best understood as what it takes to answer a question. The simplest question has a yes or no answer. Therefore a bit is the natural measure of information. The term was first used by John Tukey, as a contraction of "binary digit".

Usage of bits Computers are sometimes classified by the number of bits they can process at one time. "32 bit processor" Graphics are also often described by the number of bits used to represent each dot.

bits and bytes A bit can take the values 0 or 1, thus it can describe 2 possibilities. Two bits can take the values 00, 01, 10, 11, thus they can describe 2×2 = 4 possibilities. n bits can encode 2 to the power n possibilities. Early chips processed 8 bits at a time. It became customary to refer to a group of 8 bits as a byte. A byte can encode 2 to the power 8 = 256 possibilities. We can use binary numbers just as decimal numbers.
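The counting argument on this slide can be checked with a minimal Python sketch. The function name bit_patterns is made up for illustration and is not part of the lecture.

```python
from itertools import product

def bit_patterns(n):
    """Return every pattern that n bits can take, as strings of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

# Two bits give the four patterns listed on the slide.
two_bits = bit_patterns(2)

# One byte (8 bits) gives 2 to the power 8 patterns.
byte_values = 2 ** 8
```

Listing the patterns for small n and comparing the count with 2 ** n is an easy way to convince yourself of the rule.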

application of bytes IP (Internet Protocol) numbers are used as the addresses of computers on the Internet. In IP version 4 (the one that is most commonly used), each IP number has 4 bytes. It is represented as x.x.x.x where each x is a number between 0 and 255 (why?). How many computers can there be on the Internet at any one time?
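One way to explore the two questions on this slide is a short Python sketch using the standard ipaddress module. The address 192.0.2.17 is an arbitrary example chosen for illustration, not one from the lecture, and the count is only the number of distinct 4-byte values, not a statement about how many machines are actually reachable.

```python
import ipaddress

# An arbitrary example address (assumption, not from the lecture).
addr = ipaddress.IPv4Address("192.0.2.17")

# The address is stored as exactly 4 bytes.
raw = addr.packed
octets = list(raw)          # each entry is one byte, so it lies in 0..255

# Each of the 4 bytes has 256 possible values.
total_addresses = 256 ** 4  # the same as 2 ** 32
```

Each x is one byte, which is why it cannot exceed 255, and four bytes together give 2 to the power 32 distinct numbers.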

decimal/binary numbers
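As a rough illustration of moving between the two notations, here is a small Python sketch. The helper names to_binary and to_decimal are hypothetical and only wrap built-in conversions.

```python
def to_binary(n, width=8):
    """Write a non-negative integer as a binary string, padded to width digits."""
    return format(n, "0{}b".format(width))

def to_decimal(bits):
    """Read a string of 0s and 1s as a binary number."""
    return int(bits, 2)

five = to_binary(5)          # '00000101'
largest = to_binary(255)     # '11111111'
ten = to_decimal("1010")     # 10
```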

Many bytes Larger units are –A kilobyte is 2 to the power 10 bytes (= 1024 bytes) –A megabyte is 2 to the power 20 bytes –A gigabyte is 2 to the power 30 bytes –A terabyte is 2 to the power 40 bytes. The prefixes come from ancient Greek words for "thousand", "large", "giant", and "monster", respectively. The prefix kilo dates back to the French revolution; the others were introduced later.
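The sizes of these units follow directly from the powers of two. A minimal Python sketch, with the dictionary name UNITS chosen only for illustration:

```python
# Unit sizes in bytes, using the powers of two given on the slide.
UNITS = {
    "kilobyte": 2 ** 10,
    "megabyte": 2 ** 20,
    "gigabyte": 2 ** 30,
    "terabyte": 2 ** 40,
}

# Each unit is 1024 times the previous one.
ratio = UNITS["megabyte"] // UNITS["kilobyte"]
```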

Hex numbers A byte is often represented by two hex digits. Each hex digit can encode 16 values. The digits are written 0 to 9, then A B C D E F. F is 15. Hex numbers are conventionally prefixed with 0x. Use the Microsoft calculator in scientific view to convert.
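If no calculator is at hand, the same conversions can be sketched in Python. The value 0xF5 is just an example byte chosen for illustration.

```python
byte_value = 0xF5                       # the 0x prefix marks a hex number

# Split the byte into its two hex digits: 16 * high + low.
high, low = divmod(byte_value, 16)

# Convert in both directions between hex text and a number.
hex_text = format(byte_value, "02X")    # two hex digits for one byte
back = int("F5", 16)
```

The two digits work because 16 × 16 = 256, the number of values one byte can take.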

application of hex numbers –Media Access Control (MAC) addresses identify the hardware that allows access to computer networks. They are 6-byte numbers, each byte written as 2 hex digits, e.g. 00:60:08:F5:20:A9 –Character numbers that you see when you are inserting a special symbol in Microsoft software, e.g. PowerPoint.
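The example address on this slide can be taken apart with a few lines of Python. This is only a sketch of the notation, not of how network hardware handles addresses.

```python
mac = "00:60:08:F5:20:A9"   # the example address from the slide

# Each colon-separated part is one byte written as two hex digits.
raw = bytes(int(part, 16) for part in mac.split(":"))
values = list(raw)

# Writing the bytes back as hex digits recovers the original text.
rebuilt = ":".join(format(b, "02X") for b in raw)
```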

Characters Much of the information processed by computers is in the form of characters. A character only makes sense for a human user of a minimum cultural level. A character is not a glyph. –ligatures

Information in a computer file A file is a piece of data stored on a computer. Any file contains a sequence of 0s and 1s, like … For a computer to make sense of a file, it has to know what type of file it is.

executable files Files that are executable are files that make the computer do something. For example the file starts a program, say PowerPoint. An executable on one computer may not run on another. Non-executable files hold data that is used by an executable file. We will call them data files. Example: PowerPoint slides file.

text files Many data files contain textual data. Textual data is a sequence of characters. A character is an elementary symbol that has some meaning –alphabet letter –hieroglyph Example: file Text files can be read by many computer programs.

non-text files Examples of non-text files are –graphics files –movie files –sound files. Non-text files are not very important in library settings –there is no way to organize information retrieval for non-text files. They have to be retrieved using a textual surrogate. –traditional library materials are textual. We will talk about this later.

Representing characters Computers don't understand text, they only understand numbers. For computers to be able to treat text, there must be a correspondence between numbers and text characters. Such a correspondence is called a character set. Examples of characters are –a –c –ë
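The correspondence between characters and numbers can be seen in a short Python sketch. Note the assumption: Python's ord and chr use Unicode numbers, which is one particular character set and not necessarily the one a given file uses.

```python
# The three example characters; "\u00eb" is the letter e with diaeresis.
examples = ["a", "c", "\u00eb"]

# ord() gives the number assigned to a character (Unicode numbering assumed).
numbers = {ch: ord(ch) for ch in examples}

# chr() goes the other way, from number to character.
letter = chr(97)
```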

Legacy character sets In the early days, computers were a lot less powerful than they are today. They could only deal with the characters that are most commonly used. Such sets are –ASCII –ISO 8859-1 –cp1252

ASCII American Standard Code for Information Interchange. 7-bit character set. There is no such thing as 8-bit ASCII. 95 printable symbols. 33 control characters (0-31, 127). http:// scii2.html has a list up to 127.
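The counts on this slide can be verified with a small Python sketch. The variable names are made up for illustration.

```python
# 7 bits give 2 ** 7 = 128 positions, numbered 0 to 127.
positions = range(2 ** 7)

# Control characters sit at 0-31 and at 127; the rest are printable.
control = [c for c in positions if c < 32 or c == 127]
printable = [c for c in positions if 32 <= c <= 126]

# A character outside the 128 positions cannot be written in ASCII.
try:
    "\u00eb".encode("ascii")
    fits_in_ascii = True
except UnicodeEncodeError:
    fits_in_ascii = False
```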

some ASCII control characters –CR (13, ^M) is the carriage return –LF (10, ^J) is the linefeed –FF (12, ^L) is the form feed (new page) –BS (8, ^H) is the backspace –DEL (127, ALT-127) is delete –ESC (27, ^[) is escape
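The caret notation used on this slide (^M, ^J and so on) follows a simple pattern: the control code is the code of the written character with one bit flipped. A minimal Python sketch, where the function name caret_to_code is hypothetical:

```python
def caret_to_code(symbol):
    """Return the control code written as ^symbol, e.g. 'M' for ^M."""
    # Flipping the bit worth 64 maps 'M' (77) to 13, '[' (91) to 27, etc.
    return ord(symbol) ^ 0x40

codes = {name: caret_to_code(sym) for name, sym in
         [("CR", "M"), ("LF", "J"), ("FF", "L"), ("BS", "H"), ("ESC", "[")]}
```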

ISO 8859-1 ISO 8859-1, aka ISO-latin-1, extends ASCII with characters that are commonly used by the western European languages. It is the default character set of HTML. Positions 128 to 159 are not used. Cp1252 fills these with graphic characters. It is a Microsoft character set.
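The difference between Latin-1 and cp1252 shows up only in positions 128 to 159. Here is a sketch using Python's built-in codecs, with one caveat stated as an assumption: Python's "latin-1" codec maps those positions to invisible control codes rather than leaving them undefined.

```python
# Position 233 (0xE9) is the same letter in both sets.
shared = bytes([0xE9])
latin1_char = shared.decode("latin-1")
cp1252_char = shared.decode("cp1252")

# Position 128 (0x80) lies in the range that Latin-1 leaves without graphics.
in_gap = bytes([0x80])
as_cp1252 = in_gap.decode("cp1252")    # cp1252 puts the euro sign here
as_latin1 = in_gap.decode("latin-1")   # Python yields a control code here
```

This is why text written with cp1252 "smart quotes" or the euro sign can look wrong when a program reads it as Latin-1.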

This is not enough There are around 6800 different languages in the world. Some of these languages use character sets that are not finite, i.e. folks can make up new characters out of existing ones! Setting up a character set for all languages is almost impossible.

ISO 10646 ISO 10646 defines the Universal Character Set (UCS). UCS contains the characters required to represent the text of many known languages, even the likes of Oriya, Telugu, Bopomofo, Runic. ISO 10646 formally defines a 31-bit character set. Characters are represented as 32 bits, i.e. 4 bytes, or 8 hex digits. Not finished.
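To see what a 4-byte, 8-hex-digit representation looks like, here is a rough Python sketch. It uses Python's "utf-32-be" codec as a stand-in for the 4-byte form of the Universal Character Set, which is an assumption made for illustration; the helper name four_byte_hex is hypothetical.

```python
def four_byte_hex(ch):
    """Write one character as 4 bytes, shown as 8 hex digits."""
    return ch.encode("utf-32-be").hex().upper()

letter_a = four_byte_hex("a")
letter_e_diaeresis = four_byte_hex("\u00eb")
```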

Unicode ISO is an inter-government agency. Slow and bureaucratic. Industry has come together to work on Unicode, originally a 2-byte character set (it has since been extended beyond 16 bits). With some minor exceptions, the Unicode characters are the same as the first characters in UCS. Much better documented standard.

Unicode and legacy sets The first 128 characters are identical to those in ASCII. The next 128 characters are identical to those in ISO 8859-1 (Latin-1). Unicode is well documented and the Unicode book can be downloaded from the Internet. A must-have for the serious digital librarian.
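The overlap between Unicode and the legacy sets can be checked with a short Python sketch, assuming Python's built-in "ascii" and "latin-1" codecs and its Unicode-based chr function.

```python
# Decoding each ASCII byte gives the Unicode character with the same number.
ascii_matches = all(bytes([i]).decode("ascii") == chr(i) for i in range(128))

# The same holds for Latin-1 across all 256 byte values.
latin1_matches = all(bytes([i]).decode("latin-1") == chr(i) for i in range(256))
```

In practice this means plain ASCII or Latin-1 text keeps the same character numbers when it is moved to Unicode.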

Politics… Does it make sense to use Unicode rather than, say, ISO-latin-1? Many commercial pieces of software have data files that contain character data interspersed with non-character data. Is that good?

Thank you for your attention!