Lecture 3 Data Representation

Introduction to Information Technology Lecture 3 Data Representation Dr. Ken Tsang 曾镜涛 Room E408 R9

Outline
Distinguish between analogue and digital information. Explain data compression and compression ratios. Examine the binary formats for negative values. Describe the characteristics of the ASCII and Unicode character sets. Explain the nature of sound and its representation. Explain how RGB values define a colour. Look at representing audio information. Look at representing images and graphics. Look at representing video information.

Data Representation
Data comes in many forms. Numbers: 235, 11.01, -24, … Text: “hello, world!”, “你好！” Audio: .mp3. Images and graphics: .bmp, .gif, .jpeg. Video: .avi. All of this data is stored in computers as binary digits. Data must be represented in a way that captures the essence of the information and in a form that is convenient for computer processing.

Data Compression
Data compression: reduction in the amount of space needed to store a piece of data. Compression ratio: the size of the compressed data divided by the size of the original data, so a smaller ratio means stronger compression. Data compression techniques can be lossless, which means the data can be retrieved without any loss of the original information, or lossy, which means some information may be lost in the process of compaction.
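
The compression ratio defined above can be checked with a short Python sketch. It uses the standard-library `zlib` module as one example of a lossless technique; the sample text and the helper name `compression_ratio` are illustrative assumptions, not part of the lecture.

```python
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Size of the compressed data divided by the size of the original data."""
    return len(compressed) / len(original)


# Highly repetitive sample data (an assumption chosen so it compresses well).
original = b"hello, world! " * 200
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {compression_ratio(original, compressed):.3f}")
# Lossless: the round trip gives back exactly the original bytes.
print("lossless round trip:", restored == original)
```

Less repetitive input would give a ratio closer to 1, which is why the ratio depends on the data as well as on the technique.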

WinRAR
A popular file archiver. WinRAR Tutorial

Data about the world around us
Is the physical world around us smooth and continuous? At the microscopic level, materials are all made up of molecules and atoms, and energy comes in units of quanta. A smooth and continuous physical world is an illusion due to our limited senses.

Analogue Data: an example
Analogue: something that is analogous or similar to something else (Webster). Analogue data: the use of continuously changing quantities to represent data. A mercury thermometer is an analogue device. The mercury rises and falls in a continuous flow in the tube in direct proportion to the temperature. The mathematical idealization of this smooth change as a continuous function leads to “analogue data”, which would take an infinite amount of data to record exactly.

From Analogue to Digital data
Data can be represented in one of two ways, analogue or digital. Analogue data: a continuous representation (using a mathematical function or smooth curve), analogous to the actual information it represents. Digital data: a discrete representation, breaking the information up into separate elements (data).

Digital data in computer
Computer components are discrete in nature. Computer memory and other hardware (e.g. the CPU) have only finite room to store and manipulate data. The goal is to represent enough of the world to satisfy our computational needs and our senses of sight and sound.

Digitized Information
Computers cannot deal with analogue information directly, so we digitize information by breaking it into pieces and representing those pieces separately. Why do we use binary? Modern computers are designed to use and manage binary values because the devices that store and manage the data are far less expensive and far more reliable if they only have to represent one of two possible values.

Electronic Signals
An analogue signal continually fluctuates in voltage up and down. A digital signal has only a high or low state, corresponding to the two binary digits. All electronic signals (both analogue and digital) degrade as they move down a line. The voltage of the signal fluctuates due to environmental effects.

Analogue and Digital Information
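Periodically, a digital signal is reclocked to regain its original shape. An analogue signal cannot be restored this way, because every voltage is a valid value and the noise cannot be told apart from the data. Figures: an analogue and a digital signal; degradation of analogue and digital signals.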
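
A minimal Python sketch of reclocking, under stated assumptions: the voltage levels (0.0 and 1.0), the 0.5 decision threshold, and the noise bound of 0.3 are illustrative values, and `degrade` and `reclock` are hypothetical helper names. As long as the noise stays below the threshold margin, the original high/low pattern is recovered exactly.

```python
import random

HIGH, LOW = 1.0, 0.0   # assumed nominal voltage levels
THRESHOLD = 0.5        # assumed decision point between low and high


def degrade(levels, rng, max_noise=0.3):
    """Add bounded random noise to each voltage, imitating line degradation."""
    return [v + rng.uniform(-max_noise, max_noise) for v in levels]


def reclock(levels):
    """Snap every voltage back to the nearest clean level."""
    return [HIGH if v > THRESHOLD else LOW for v in levels]


bits = [1, 0, 1, 1, 0, 0, 1, 0]
sent = [HIGH if b else LOW for b in bits]
received = degrade(sent, random.Random(0))  # fixed seed for repeatability
restored = reclock(received)

print("received:", [round(v, 2) for v in received])
print("restored matches sent:", restored == sent)
```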

Binary Representation
One bit can be either 0 or 1. Therefore, one bit can represent only two things. To represent more than two things, we need multiple bits. Two bits can represent four things because there are four combinations of 0 and 1 that can be made from two bits: 00, 01, 10, 11.

Number of bits: 1, 2, 3, 4, 5. Number of things represented: 2, 4, 8, 16, 32.

In general, n bits can represent 2^n things because there are 2^n combinations of 0 and 1 that can be made from n bits. Note that every time we increase the number of bits by 1, we double the number of things we can represent. Questions: How many bits are needed to represent 128 things? How many bits are needed to represent 67 things?
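
One way to work out questions like these is to find the smallest n with 2^n at least as large as the number of things. The Python sketch below does that; `bits_needed` is a hypothetical helper name, not something from the lecture.

```python
def bits_needed(things: int) -> int:
    """Smallest n such that 2**n >= things."""
    if things < 1:
        raise ValueError("need at least one thing to represent")
    # (things - 1) is the largest code we must store, counting from 0.
    return (things - 1).bit_length()


for count in (2, 4, 67, 128, 129):
    n = bits_needed(count)
    print(f"{count} things need {n} bits (2^{n} = {2 ** n})")
```

Both 67 and 128 come out at 7 bits: 6 bits give only 64 combinations, which is too few for 67, while 7 bits give exactly 128.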

Representing Negative Values
You have used the signed-magnitude representation of numbers before. The sign represents the ordering/direction. The digits represent the magnitude of the number.

Problems with the signed-magnitude representation: there are two representations of zero (plus zero and minus zero, +0 and -0), which can cause unnecessary complexity, and the sign itself needs a separate representation. If we allow only a fixed number of values (stored in n bits), we can represent numbers as just integer values, where half of them represent negative numbers.

For example, if the maximum number of decimal digits we can represent is two, we can let 1 through 49 be the positive numbers 1 through 49 and let 50 through 99 represent the negative numbers -50 through -1. This representation of negative numbers is called ten’s complement.

Advantages of Using 10’s Complement
To perform addition within this scheme, you just add the numbers together and discard any carry. For example, 5 + (-3) is stored as 5 + 97 = 102, which becomes 2 once the carry is discarded.

Advantages of Using 10’s Complement
A - B = A + (-B). We can subtract one number from another by adding the negative of the second to the first, so addition and subtraction become the same operation.
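
The two-digit ten’s complement scheme can be sketched in Python as follows. The function names (`encode`, `decode`, `negate`, `add`, `subtract`) are illustrative assumptions; the range -50 to 49 and the rule of discarding the carry come from the slides above.

```python
MODULUS = 100  # two decimal digits give the codes 0..99


def encode(value: int) -> int:
    """Map a value in -50..49 onto its two-digit code."""
    if not -50 <= value <= 49:
        raise ValueError("value does not fit in two-digit ten's complement")
    return value % MODULUS


def decode(code: int) -> int:
    """Codes 50..99 stand for the negative numbers -50..-1."""
    return code if code < 50 else code - MODULUS


def negate(code: int) -> int:
    """Code of the negated value."""
    return (MODULUS - code) % MODULUS


def add(code_a: int, code_b: int) -> int:
    """Add the codes and discard any carry out of the two digits."""
    return (code_a + code_b) % MODULUS


def subtract(code_a: int, code_b: int) -> int:
    """A - B is computed as A + (-B)."""
    return add(code_a, negate(code_b))


print(encode(-3))                                # 97
print(decode(add(encode(5), encode(-3))))        # 2
print(decode(subtract(encode(20), encode(30))))  # -10
```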

2’s Complement
3 bits: 000 = 0, 001 = +1, 010 = +2, 011 = +3, 100 = -4, 101 = -3, 110 = -2, 111 = -1. 8 bits: the same scheme covers -128 to +127.
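
A short Python sketch that produces two’s complement bit patterns for any width and reads them back. The helper names `to_twos_complement` and `from_twos_complement` are assumptions made for illustration.

```python
def to_twos_complement(value: int, bits: int) -> str:
    """Bit pattern of value in two's complement using the given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} does not fit in {bits} bits")
    # Masking keeps the low `bits` bits, which wraps negatives around.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def from_twos_complement(pattern: str) -> int:
    """Value of a two's complement bit pattern such as '100'."""
    raw = int(pattern, 2)
    # A leading 1 marks a negative number.
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


for value in range(-4, 4):
    print(to_twos_complement(value, 3), value)
```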

Overflow
Overflow occurs when the value that we compute cannot fit into the number of bits we have allocated for the result. For example, if each value is stored using eight bits in two’s complement (range -128 to +127), adding 127 to 3 causes overflow. Overflow is a classic example of the type of problems we encounter by mapping an infinite world onto a finite machine.

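Overflow
127 + 3 in eight-bit two’s complement: 01111111 + 00000011 = 10000010, which reads as -126 rather than 130.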
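
Python integers do not overflow, so the sketch below imitates an eight-bit two’s complement register by masking the result. `add_8bit_signed` and `overflows` are hypothetical helper names.

```python
def add_8bit_signed(a: int, b: int) -> int:
    """Add two values the way an 8-bit two's complement register would."""
    raw = (a + b) & 0xFF  # keep only the low eight bits
    return raw - 256 if raw >= 128 else raw


def overflows(a: int, b: int) -> bool:
    """True when the exact sum falls outside the 8-bit range -128..127."""
    return not -128 <= a + b <= 127


print(add_8bit_signed(127, 3), overflows(127, 3))    # -126 True
print(add_8bit_signed(100, 27), overflows(100, 27))  # 127 False
```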

Representing Text
A text document can be decomposed into chapters, paragraphs, sentences, words, and ultimately individual characters. To represent a text document in digital form, we simply need to be able to represent every character that may appear. In English: “a, b, …, z, A, B, …, Z”. The general approach for representing characters is to list them all and assign each a binary string, for example ‘a’ → (01100001)2 = (97)10 = 61h.
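
The mapping from a character to its code can be inspected in Python with the built-in `ord`. The helper `char_codes` is an illustrative assumption; it returns the binary, decimal, and hexadecimal forms of the same code.

```python
def char_codes(ch: str):
    """Binary, decimal, and hexadecimal forms of one character's code."""
    code = ord(ch)
    return format(code, "08b"), code, format(code, "02X") + "h"


for ch in "aAz":
    binary, decimal, hexadecimal = char_codes(ch)
    print(f"'{ch}' -> ({binary})2 = ({decimal})10 = {hexadecimal}")
```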

Character Set
A character set is a list of characters and the codes used to represent them. By agreeing to use a particular character set, computer manufacturers have made the processing of text data easier. Examples: ASCII, Unicode, etc.

ASCII
ASCII stands for American Standard Code for Information Interchange. The ASCII character set originally used seven bits to represent each character, allowing for 128 unique characters. Later, eight-bit extensions of ASCII (often called extended ASCII) used all eight bits, which allows for 256 characters.

ASCII
Note that the first 32 characters in the ASCII character chart do not have a simple character representation that you could print to the screen (they are unprintable control characters).

Unicode Character Set
The extended version of the ASCII character set is not enough for international use. The Unicode character set was originally designed to use 16 bits per character, so it could represent 2^16, or over 65 thousand, characters; later versions extend beyond 16 bits. Unicode was designed to be a superset of ASCII. The first 256 characters in the Unicode character set correspond exactly to the extended ASCII character set known as Latin-1 (ISO 8859-1).
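
Python strings are sequences of Unicode code points, so the relationship between a character, its code point, and its stored bytes can be shown directly. This is a sketch for illustration; the sample characters and the helper name `describe` are assumptions, and UTF-8 and UTF-16 are two common encodings rather than the only ones.

```python
def describe(ch: str) -> str:
    """Code point of a character and its bytes in two common encodings."""
    code_point = ord(ch)
    utf8 = ch.encode("utf-8").hex()
    utf16 = ch.encode("utf-16-be").hex()
    return f"{ch} U+{code_point:04X} utf-8={utf8} utf-16={utf16}"


for ch in "aé你":
    print(describe(ch))

# The first 256 code points line up with Latin-1, so the single Latin-1
# byte for 'é' has the same value as its Unicode code point.
print("é".encode("latin-1")[0] == ord("é"))
```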

Unicode

Representing Audio Information
We perceive sound when a series of air compressions vibrate a membrane in our ear, which sends signals to our brain. A stereo sends an electrical signal to a speaker to produce sound. This signal is an analogue representation of the sound wave. The voltage in the signal varies in direct proportion to the sound wave.

To digitize the signal, we periodically measure the voltage of the signal and record the appropriate numeric value, a process called sampling. In general, a sampling rate of around 40,000 times per second is enough to create a reasonable sound reproduction.
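
Sampling can be sketched in Python by evaluating a continuous function at evenly spaced times and rounding each measurement to an integer. The 440 Hz sine tone, the 16-bit sample size, and the helper names `sample` and `quantize` are assumptions made for illustration; only the rate of about 40,000 samples per second comes from the slide.

```python
import math

SAMPLE_RATE_HZ = 40_000  # samples per second, as on the slide


def tone(t: float) -> float:
    """Stand-in for the analogue signal: a 440 Hz sine wave in -1.0..1.0."""
    return math.sin(2 * math.pi * 440 * t)


def sample(signal, rate_hz: int, count: int):
    """Measure the signal at count evenly spaced instants."""
    return [signal(i / rate_hz) for i in range(count)]


def quantize(value: float, bits: int = 16) -> int:
    """Round a measurement in -1.0..1.0 to a signed integer level."""
    max_level = (1 << (bits - 1)) - 1
    return round(value * max_level)


# 40 samples cover one millisecond of sound at this rate.
samples = [quantize(v) for v in sample(tone, SAMPLE_RATE_HZ, 40)]
print(samples[:5])
```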

A compact disc (CD) stores audio information digitally. On the surface of the CD are microscopic pits that represent binary digits. A low-intensity laser is pointed at the disc. The laser light reflects strongly if the surface is smooth and reflects poorly if the surface is pitted.

Audio Formats
WAV, AU, AIFF, VQF, and MP3. MP3 is dominant. MP3 is short for MPEG (Moving Picture Experts Group) audio layer 3. MP3 employs both lossy and lossless compression. First it analyzes the frequency spread and compares it to mathematical models of human psychoacoustics (the study of the interrelation between the ear and the brain), then it discards information that can’t be heard by humans. Then the bit stream is compressed losslessly to shrink it further.

Representing Colour
Colour is our perception of the various frequencies of light that reach the retinas of our eyes. Our retinas have three types of colour photoreceptor cone cells that respond to different sets of frequencies. These photoreceptor categories correspond roughly to the colours red, green, and blue.

Representing Colour
Colour is often expressed in a computer as an RGB (red-green-blue) value, which is actually three numbers that indicate the relative contribution of each of these three primary colours. For example, an RGB value of (255, 255, 0) maximizes the contribution of red and green and minimizes the contribution of blue, which results in a bright yellow.
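
A small Python sketch that turns an RGB triple into the six-digit hexadecimal form commonly used on web pages. The function name `rgb_to_hex` is an illustrative assumption.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Format an RGB triple (each component 0..255) as #RRGGBB."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("each component must be in 0..255")
    return f"#{red:02X}{green:02X}{blue:02X}"


print(rgb_to_hex(255, 255, 0))  # bright yellow: #FFFF00
print(rgb_to_hex(0, 0, 0))      # black: #000000
```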

Three-Dimensional Colour Space
Figure: the RGB colour cube, with black at (0,0,0) and white at (1,1,1).

Representing Images and Graphics
The amount of data that is used to represent a colour is called the colour depth. HiColour is a term that indicates a 16-bit colour depth: five bits are used for each number in an RGB value and the extra bit is sometimes used to represent transparency. TrueColour indicates a 24-bit colour depth: each number in an RGB value gets eight bits.
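
The two colour depths can be sketched in Python by packing the components into a single integer. The bit order used here (transparency bit first, then red, green, blue) is an assumption for illustration, since real formats differ in layout; `pack_hicolour` and `pack_truecolour` are hypothetical helper names.

```python
def pack_hicolour(red5: int, green5: int, blue5: int, opaque: int = 1) -> int:
    """Pack three 5-bit components plus one extra bit into 16 bits."""
    for component in (red5, green5, blue5):
        if not 0 <= component <= 31:
            raise ValueError("5-bit components must be in 0..31")
    return (opaque << 15) | (red5 << 10) | (green5 << 5) | blue5


def pack_truecolour(red: int, green: int, blue: int) -> int:
    """Pack three 8-bit components into 24 bits."""
    for component in (red, green, blue):
        if not 0 <= component <= 255:
            raise ValueError("8-bit components must be in 0..255")
    return (red << 16) | (green << 8) | blue


print(f"HiColour yellow:   {pack_hicolour(31, 31, 0):016b}")
print(f"TrueColour yellow: {pack_truecolour(255, 255, 0):024b}")
print("HiColour colours:  ", 2 ** 15)
print("TrueColour colours:", 2 ** 24)
```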

Indexed Colour
A particular application such as a browser may support only a certain number of specific colours, creating a palette from which to choose.
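
With indexed colour, each pixel stores a small palette index instead of a full RGB value. The Python sketch below uses a made-up four-colour palette; the palette contents and the name `PALETTE` are assumptions, not any particular browser's colour set.

```python
# A hypothetical four-entry palette of RGB values.
PALETTE = [
    (0, 0, 0),        # 0: black
    (255, 255, 255),  # 1: white
    (255, 0, 0),      # 2: red
    (255, 255, 0),    # 3: yellow
]

# Each pixel is stored as an index into the palette.
pixel_indices = [0, 2, 2, 3, 1]
pixels_rgb = [PALETTE[i] for i in pixel_indices]

# Bits needed per pixel for the index, compared with 24-bit TrueColour.
bits_per_index = (len(PALETTE) - 1).bit_length()

print(pixels_rgb)
print(f"{bits_per_index} bits per pixel instead of 24")
```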

Digitized Images and Graphics
Digitizing a picture is the act of representing it as a collection of individual dots called pixels. The number of pixels used to represent a picture is called the resolution. Storage of image information on a pixel-by-pixel basis is called a raster-graphics format. Several popular raster file formats exist, including bitmap (BMP), GIF, and JPEG.
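
The storage needed for an uncompressed raster image follows from the resolution and the colour depth. The Python sketch below computes it; the function name `raster_size_bytes` and the sample 1024 × 768 resolution are illustrative assumptions, and real file formats add headers and often compress the pixel data.

```python
def raster_size_bytes(width: int, height: int, depth_bits: int) -> int:
    """Bytes needed to store every pixel uncompressed, rounded up."""
    total_bits = width * height * depth_bits
    return (total_bits + 7) // 8


for depth in (8, 16, 24):
    size = raster_size_bytes(1024, 768, depth)
    print(f"1024 x 768 at {depth}-bit depth: {size:,} bytes")
```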

High Resolution

Low Resolution

Representing Video
A video codec (COmpressor/DECompressor) refers to the methods used to shrink the size of a movie to allow it to be played on a computer or over a network. Almost all video codecs use lossy compression to minimize the huge amounts of data associated with video. The goal is not to lose information that affects the viewer's senses.

Video Players
QuickTime Player (Apple), RealPlayer, VLC media player, Windows Media Player (Microsoft).

Summary
Distinguished between analogue and digital information. Explained data compression and compression ratios. Examined the binary formats for negative values. Described the characteristics of the ASCII and Unicode character sets. Explained the nature of sound and its representation. Explained how RGB values define a colour. Looked at representing audio information. Looked at representing images and graphics. Looked at representing video information.

