Chapter 3 Data Representation

CSCI 311 Dr. Frank Li ©2016. Cengage Learning. All rights reserved.

Data Representation and Processing
To “process” or manipulate data, it must be converted into a form that the “processor” can understand. Any data processing system (for example, a human or a computer system) must have the ability to:
- Recognize external data/stimuli and convert them into a suitable internal format
- Store and retrieve data internally
- Transport data among internal and external storage and processing components
- Manipulate the data to produce desired results or actions
The human brain analogy:
- Our five senses convert touch, taste, smell, hearing, and vision into electrical impulses
- Our brain has dedicated regions for both short-term and long-term memory
- Our brain transmits electrical signals among its various storage and processing regions
- Our brain’s “processing circuitry” is based on electro-chemical reactions among interconnected neurons
- “Processing results” can be stored internally or output as physical actions

Automated Data Processing
The processing circuitry of a modern computer consists of:
- Electrical switches combined into more complex circuits that implement storage and processing functions
- Wiring among the switches and complex circuits
Switches are two-state devices that are either open (off) or closed (on). Each position can represent a binary digit (bit) that is either zero or one. Thus, any data processed by a modern computer must be converted to a set of binary digits before processing. Processor outputs are also sets of binary digits, which must usually be converted to another form (e.g., text or visual images) for humans to use.

FIGURE 3.1 Two electrical inputs on the left flow through processing circuitry that generates their sum on the right
Courtesy of Course Technology/Cengage Learning

Data Representation and Processing Power
A computer processor’s ability to manipulate only binary digits is sometimes a shortcoming. Data that “naturally” converts to binary digits, such as accounting data, is easily and accurately manipulated. Data that isn’t easily converted to binary digits, such as literary concepts and facial images, is difficult to represent and manipulate accurately. In this respect our brains are more powerful than a computer because they can process a wider variety of data types. Computer processors are more limited in the complexity of data they handle, but they compensate with greater accuracy and execution speed.

Binary Data Representation
A binary number is a single- or multi-digit number in which each digit is either zero or one. Each digit of a multi-digit binary numeric value is called a bit (short for binary digit). Advantages of using binary numbers for processing:
- Binary numbers can be reliably transported via digital signals (more on this in Chapter 8)
- Two-state processing devices are relatively easy to design and fabricate (more on this in Chapter 4)
Both advantages are based on existing technology. Future technology changes, for example optical CPUs or quantum computing, may make other data representation and processing schemes a better choice.

Boolean Logic
Binary digit values correspond directly to the values “true” and “false”: either zero = “false” and one = “true”, or zero = “true” and one = “false”. The choice is arbitrary, though most computers use the first mapping (zero = “false”, one = “true”). Boolean logic is a mathematical system for manipulating and combining true and false values. Computer processing of simple and complex conditions is based on Boolean logic, for example, IF AccountBalance > WithdrawalAmount THEN …

Interpreting Binary Values
Multi-digit numeric values are interpreted by digit position and weight. For example:
- The value $456_{10}$ in base 10 is interpreted as $(6 \times 10^0) + (5 \times 10^1) + (4 \times 10^2)$
- The value $101_2$ in base 2 (binary) is interpreted as $(1 \times 2^0) + (0 \times 2^1) + (1 \times 2^2) = 5_{10}$
The base or radix of a numbering system is:
- The number of different digit values (e.g., decimal has 10 digits, numbered 0-9)
- The value that is raised to a power based on the position of a digit within a multi-digit value
The base/radix of the decimal numbering system is 10. The base/radix of the binary numbering system is 2.
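The digit-times-weight rule can be sketched in a few lines of Python. The function name `positional_value` is hypothetical. Python's built-in `int(text, base)` already performs the same whole-number conversion, and this loop only makes the sum of weighted digits explicit.

```python
def positional_value(digits: str, base: int) -> int:
    """Interpret a digit string in the given base by summing digit * base**position."""
    total = 0
    # Position 0 is the rightmost digit, so walk the digits from right to left.
    for position, ch in enumerate(reversed(digits)):
        digit = int(ch, base)  # parse one digit character (0-9, A-F) in this base
        total += digit * base ** position
    return total


print(positional_value("456", 10))  # 456
print(positional_value("101", 2))   # 5
```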

Interpreting Binary Values - Continued
Interpretation of multi-digit values with fractional components is a bit more complex. For example:
- The value $45.6_{10}$ in base 10 is interpreted as $(6 \times 10^{-1}) + (5 \times 10^0) + (4 \times 10^1)$
- The value $10.1_2$ in base 2 (binary) is interpreted as $(1 \times 2^{-1}) + (0 \times 2^0) + (1 \times 2^1) = 2.5_{10}$
A radix point (the period in the examples above) denotes the boundary between the whole and fractional parts of a multi-digit numeric value.
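A sketch of the same rule with a radix point, assuming the digit string uses "." as the point. The function name is hypothetical. Digits to the right of the point get negative powers of the base. The result is a Python float, so a decimal fraction such as 45.6 is only approximated, while binary fractions such as 10.1 come out exactly.

```python
def positional_value_with_point(text: str, base: int) -> float:
    """Interpret a digit string that may contain a radix point ('.')."""
    whole, _, fraction = text.partition(".")
    value = 0.0
    # Digits left of the radix point have weights base**0, base**1, ... (right to left).
    for position, ch in enumerate(reversed(whole)):
        value += int(ch, base) * base ** position
    # Digits right of the radix point have weights base**-1, base**-2, ... (left to right).
    for position, ch in enumerate(fraction, start=1):
        value += int(ch, base) * base ** -position
    return value


print(positional_value_with_point("10.1", 2))  # 2.5
```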

TABLE 3.1 Binary and decimal notations for the values 0 through 10

FIGURE 3.2 Computing the decimal equivalent of a binary number
Courtesy of Course Technology/Cengage Learning

Numeric Range
If we assume that there is no limit on the number of digits in a multi-digit number, then there is no limit on the size (large or small) of values that can be represented. If we assume the number of digits is limited, then numbering systems with larger bases have greater numeric range. For example:
- A three-digit base 10 number can represent 1000 ($10^3$) different values
- A three-digit base 2 number can represent 8 ($2^3$) different values
Since computers process binary numbers, they require many bits to represent very large or very small values. Computer circuitry is greatly simplified if all data of the same type uses the same number of bits!
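A quick Python check of the range arithmetic: n digits in base b give b**n distinct values, and `int.bit_length()` reports how many bits an unsigned binary value needs. Both helper names are hypothetical. The last line shows why binary needs many digits: the largest three-digit decimal value already needs 10 bits.

```python
def distinct_values(base: int, digits: int) -> int:
    """Number of different values a fixed number of digits can represent."""
    return base ** digits


def bits_needed(value: int) -> int:
    """Minimum bits to hold a nonnegative integer as ordinary (unsigned) binary."""
    return max(1, value.bit_length())


print(distinct_values(10, 3))  # 1000
print(distinct_values(2, 3))   # 8
print(bits_needed(999))        # 10
```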

TABLE 3.2 Binary notations for decimal values up to 16 positions

Beyond Binary
Numbering systems based on 8 and 16 digit values are sometimes used by computer programmers because they are:
- More compact than binary values
- Easy to convert to/from binary values (each octal digit stands for exactly 3 bits and each hexadecimal digit for exactly 4 bits)
Even though a programmer may write values in something other than binary, each value must be converted to binary for storage or processing.
Hexadecimal notation:
- Has 16 digit values (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F)
- Is commonly used to represent memory and storage addresses
Octal notation:
- Has 8 digit values (0, 1, 2, 3, 4, 5, 6, 7)
- Was commonly used with older IBM mainframe systems and several generations of later systems that were backward compatible
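Because 8 = 2^3 and 16 = 2^4, converting binary to octal or hexadecimal only requires grouping bits from the right. The sketch below does that grouping by hand. The function name `binary_to_base` is hypothetical. Python's `format(value, "o")` and `format(value, "X")` give the same results and are used to cross-check.

```python
HEX_DIGITS = "0123456789ABCDEF"


def binary_to_base(bits: str, bits_per_digit: int) -> str:
    """Convert a binary string to octal (3 bits per digit) or hexadecimal (4 bits per digit)."""
    # Pad on the left so the length is a multiple of the group size.
    width = -(-len(bits) // bits_per_digit) * bits_per_digit
    bits = bits.zfill(width)
    groups = (bits[i:i + bits_per_digit] for i in range(0, width, bits_per_digit))
    # Each group of bits maps to exactly one digit.
    return "".join(HEX_DIGITS[int(group, 2)] for group in groups)


print(binary_to_base("101101101110", 4))  # B6E
print(binary_to_base("101101101110", 3))  # 5556
```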

TABLE 3.3 Hexadecimal and decimal values

CPU Data Types
The CPUs of most modern computers recognize and process 5 different data types (formats):
- Integer
- Real
- Character
- Boolean
- Memory address
Each data type:
- Is represented using binary digits
- Is stored in one or more formats that vary in the arrangement and interpretation of the digits
Formats are generally chosen by a hardware designer or programmer for their:
- Suitability to the problem being solved
- Efficiency (of processing and communication circuitry)

Goals of Computer Data Representation
The ideal data representation method would be:
- Compact: stored values consume little “space”
- Accurate: any value could be represented and processed without error (e.g., no rounding)
- Infinite in range: infinitely large and small values could be represented
- Easy to manipulate: related processing circuitry would be simple, cheap, and fast
- Standardized: data represented within one computer device could be recognized and processed by any other device
Unfortunately:
- The first 4 ideals/goals are in conflict (e.g., compactness and range)
- Different trade-offs among the goals are needed for different data types (e.g., integer vs. real values) and different processing purposes (e.g., sorting alphabetic data vs. performing calculations for astronomy applications)

Integers
An integer is a whole number value (no fractional part): 1234 is an integer value; 12.34 is not.
Positive and negative values:
- An unsigned integer can hold only a nonnegative value (e.g., 1234 is assumed to be positive)
- A signed integer can hold a positive or negative value (e.g., +1234 or −1234)
How do we represent a sign as a binary digit? Two notation formats are well suited to signed integers:
- Excess notation
- Twos complement
Most computers use twos complement notation.

Excess Notation
Excess notation can be used to represent signed integers. It:
- Divides a range of ordinary binary numbers in half, using the lower half for negative values and the upper half for nonnegative values
- Always uses a fixed number of bits, with the leftmost bit representing the sign (1 for nonnegative and 0 for negative values)

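Excess Notation
To represent a specific integer value in excess notation, you must know how many bits are to be used. With n bits, the range is $-2^{n-1}$ to $2^{n-1} - 1$, and each value is stored as the ordinary binary form of (value $+ 2^{n-1}$).
Exercise: In 8-bit excess notation (range −128 to 127), $(10000000)_2$ represents 0. In ordinary binary, $(25)_{10} = (11001)_2$. Therefore:
- e.g. 1: $(25)_{10} = (10000000)_2 + (11001)_2 = (10011001)_2$
- e.g. 2: $(-25)_{10} = (10000000)_2 - (11001)_2 = (01100111)_2$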
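The exercise above can be checked with a small Python sketch of n-bit excess notation, where the bias is 2^(n-1). The helper names are hypothetical. Encoding adds the bias and writes the sum as ordinary binary. Decoding subtracts the bias.

```python
def to_excess(value: int, n_bits: int) -> str:
    """Encode a signed integer in n-bit excess notation (bias = 2**(n_bits - 1))."""
    bias = 2 ** (n_bits - 1)
    if not -bias <= value <= bias - 1:
        raise ValueError(f"{value} is outside the {n_bits}-bit excess range")
    return format(value + bias, f"0{n_bits}b")


def from_excess(bits: str) -> int:
    """Decode an excess-notation bit string by subtracting the bias."""
    return int(bits, 2) - 2 ** (len(bits) - 1)


print(to_excess(0, 8))    # 10000000
print(to_excess(25, 8))   # 10011001
print(to_excess(-25, 8))  # 01100111
```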

Twos Complement Notation
The complement of a bit value is its opposite: 1 is the complement of 0, and 0 is the complement of 1.
Twos complement notation represents positive values as ordinary binary values, for example (using 4 bits): $7_{10} = 0111$.
Twos complement notation represents negative values as the complement of the positive value plus 1: $-7_{10}$ = complement of (0111) + 1 = 1000 + 1 = 1001.
Note that the leftmost bit is a sign bit (1 if negative).
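The complement-and-add-1 rule can be sketched in Python for an n-bit field. The function names are hypothetical. Decoding uses the fact that, when the leftmost bit is 1, the stored pattern equals the value plus 2^n.

```python
def to_twos_complement(value: int, n_bits: int) -> str:
    """Encode a signed integer as an n-bit twos complement bit string."""
    if not -(2 ** (n_bits - 1)) <= value <= 2 ** (n_bits - 1) - 1:
        raise ValueError(f"{value} does not fit in {n_bits} bits")
    if value >= 0:
        return format(value, f"0{n_bits}b")  # positive: ordinary binary
    positive = format(-value, f"0{n_bits}b")
    complemented = "".join("1" if bit == "0" else "0" for bit in positive)  # flip every bit
    return format(int(complemented, 2) + 1, f"0{n_bits}b")                  # then add 1


def from_twos_complement(bits: str) -> int:
    """Decode: a leading 1 means the pattern is the value plus 2**n_bits."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value


print(to_twos_complement(7, 4))   # 0111
print(to_twos_complement(-7, 4))  # 1001
```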

Numeric Range
The numeric range of a twos complement value is $-2^{n-1}$ to $2^{n-1} - 1$, where n is the number of bits. For 32 bits, the numeric range is approximately ±2 billion (−2,147,483,648 to 2,147,483,647). Any attempt to store a larger value (e.g., 20 billion) in a 32-bit space using twos complement notation results in overflow: the high-order bits are lost. As a practical matter for integers, the problem is disappearing as modern CPUs and software shift to 64-bit data storage. For older processors with 32-bit data storage, double precision data types were sometimes defined, which stored a 64-bit value in two adjacent storage locations.

Two’s Complement Notation
- Compatible with digital electronic circuitry
- Leftmost bit represents the sign
- Fixed number of bit positions
- Only two logic circuits required to perform addition on single-bit values
- Subtraction can be performed as addition of a negative value

Advantages of Two’s Complement
Two’s complement is awkward for people, but:
- It is highly compatible with digital electronic circuitry
- Only two logic circuits are required to perform addition on single-bit values
- Adding two’s complement numbers requires no special processing if the operands have opposite signs: the sign of the result is determined automatically
- Subtraction can be performed as addition of a negative value
e.g. 1 (8 bits): $(15)_{10} + (-5)_{10}$ = 00001111 + 11111011 = 1 00001010; drop the 9th bit “1” to get 00001010 = $(10)_{10}$.
e.g. 2 (8 bits): x − y with x = $(100)_{10}$ = 01100100 and y = $(22)_{10}$ = 00010110. Compute x + (ones’ complement of y) + 1, where adding 1 turns the ones’ complement into the two’s complement: 01100100 + 11101001 + 1 = 1 01001110; drop the 9th bit “1” to get 01001110 = $(78)_{10}$.
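The two 8-bit examples above can be reproduced with a Python sketch that masks every result to 8 bits. Masking is how the 9th (carry) bit gets dropped. The helper names are hypothetical. Subtraction is done only with negation and addition, as described.

```python
MASK8 = 0xFF  # keep only 8 bits


def add8(a: int, b: int) -> int:
    """Add two 8-bit patterns and drop any carry out of bit 8 (the 9th bit)."""
    return (a + b) & MASK8


def negate8(x: int) -> int:
    """Twos complement negation: ones' complement (flip all 8 bits), then add 1."""
    return ((x ^ MASK8) + 1) & MASK8


def as_signed8(bits: int) -> int:
    """Interpret an 8-bit pattern as a signed twos complement value."""
    return bits - 256 if bits & 0x80 else bits


print(format(add8(15, negate8(5)), "08b"))  # 00001010
print(as_signed8(add8(100, negate8(22))))   # 78
```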

Real Numbers
Real numbers can have both whole and fractional parts. To store a real numeric value, we must “store” both a sign and a radix point. A simplistic way to accomplish that is a fixed radix point:
FIGURE 3.3 A 32-bit storage format for real numbers using a fixed radix point
Courtesy of Course Technology/Cengage Learning

Floating Point Notation
Similar to scientific notation, except that 2 is the base: $\text{value} = \text{mantissa} \times 2^{\text{exponent}}$. Many CPU-specific implementations of floating-point notation are possible. IEEE standard 754 defines formats for floating-point data.

Floating Point Notation
Figure 3.3 depicts fixed point notation, in which the location of the radix point never varies:
- Pros: a simple data format that’s easy for humans to understand
- Cons: limited numeric range; extremely large or small values can’t be represented unless a large number of bits are used
Floating point notation is similar to scientific notation (e.g., $13{,}526{,}473 \times 10^{-16}$) except that the radix is 2 instead of 10. As in scientific notation, floating point notation uses both a:
- Mantissa (e.g., 13,526,473 in the above example)
- Exponent (e.g., −16 in the above example)
The Institute of Electrical and Electronics Engineers (IEEE) has defined floating point notations with several different lengths (IEEE 754), and virtually all CPU manufacturers now follow those standards.

FIGURE 3.4 Conversion of scientific notation to decimal notation

FIGURE 3.5 IEEE binary32 floating-point format
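The binary32 fields can be inspected with Python's standard `struct` module. Packing with format `">f"` rounds the value to 4-byte IEEE binary32, and unpacking with `">I"` reads those same bits back as an unsigned integer. The helper name is hypothetical. The example value −6.5 is −1.101₂ × 2², so the sign is 1, the stored exponent is 2 + 127 = 129, and the stored fraction begins 101. The exponent is kept in an excess-127 form, a close relative of the excess notation described earlier.

```python
import struct


def binary32_fields(x: float) -> tuple:
    """Split x (rounded to IEEE 754 binary32) into sign, exponent, and fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an unsigned int
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with a bias (excess) of 127
    fraction = bits & 0x7FFFFF         # 23 bits; normal numbers have an implied leading 1
    return sign, exponent, fraction


sign, exponent, fraction = binary32_fields(-6.5)
print(sign, exponent, format(fraction, "023b"))  # 1 129 10100000000000000000000
# Rebuild the value from the fields (valid for normal, nonzero numbers).
print((-1) ** sign * (1 + fraction / 2 ** 23) * 2 ** (exponent - 127))  # -6.5
```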

Processing Floating Point Numbers
Floating point notation is considerably more complex than twos complement notation:
- The greater format complexity yields greater complexity in processing circuitry
- Computational circuits for real numbers generally have at least twice as many components as those for integers
- More components and wiring yield larger and slower circuits
Floating point notation is also subject to truncation errors. For example, some values that are simple in decimal have a non-terminating mantissa in binary floating point notation, so adding such values can yield a result slightly different from the exact sum. Such errors can accumulate and compound in long computational sequences (e.g., some of the formulas used in MGMT 326). Floating point notation is also subject to overflow and underflow errors.
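A short Python demonstration of truncation, overflow, and underflow. It assumes CPython floats, which are IEEE 754 binary64 (double precision) on common platforms rather than the binary32 format of Figure 3.5. The value 0.1 has a non-terminating binary mantissa, so adding it ten times gives slightly less than 1. The standard library's `math.fsum` tracks the lost low-order bits and recovers the correctly rounded sum.

```python
import math

total = 0.0
for _ in range(10):
    total += 0.1            # each addition rounds the result to 53 significant bits
print(total)                # 0.9999999999999999
print(total == 1.0)         # False
print(math.fsum([0.1] * 10))  # 1.0

print(1e200 * 1e200)        # inf  (overflow: exponent too large)
print(1e-200 * 1e-200)      # 0.0  (underflow: exponent too small)
```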

Character Data
Characters in written human languages have no obvious binary numeric equivalents. But we can represent characters with integer values via a coding table, and store the characters using twos complement or as ordinary binary values (usually the former, so that the number of different data formats is minimized). A coding table is simply a set of correspondences between characters and integer values (for example, see next slide).

ASCII
The American Standard Code for Information Interchange (ASCII) is a coding table for English. It:
- Represents uppercase and lowercase letters and the digits 0-9
- Represents some punctuation marks
- Represents a small set of device control characters (e.g., line feed)
ASCII is a 7-bit coding format.
Burd, Systems Architecture, seventh edition, Table 3-5, Copyright © 2015 Course Technology
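Python's built-in `ord()` returns a character's code from the coding table, which for these characters is the ASCII code. The sketch prints a few codes as 7-bit patterns. It also counts the control entries (codes 0-31 and 127) and the printable entries (codes 32-126) in the 128-entry table.

```python
for ch in ["A", "a", "0", " ", "\n"]:
    code = ord(ch)
    print(repr(ch), code, format(code, "07b"))  # e.g. 'A' 65 1000001

control = [c for c in range(128) if c < 32 or c == 127]
printable = [c for c in range(128) if 32 <= c <= 126]
print(len(control), len(printable))  # 33 95
```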

ASCII Limitations
- Insufficient range: ASCII uses a 7-bit code, providing 128 table entries (33 for device control), so only 95 printable characters can be represented
- English-based
Latin-1 (an 8-bit extension of ASCII):
- Lower 128 entries: ASCII-7 characters
- Upper 128 entries: multinational characters

Unicode
Unicode is an advanced character coding table developed by an international consortium. Unicode incorporates ASCII as a subset (its first 128 code points match ASCII), so it is at least partly backward-compatible. The goal of Unicode is to include the written forms of all human languages (including the dead ones). Unicode has advanced through a number of versions toward that goal but is still under development. Unicode currently defines over 100,000 characters. Its code points range from 0 to 10FFFF hexadecimal (21 bits) and are stored using 8-, 16-, or 32-bit code units (the UTF-8, UTF-16, and UTF-32 encodings).
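A small Python sketch using the standard codecs to show code points and how many bytes each Unicode encoding needs per character. The sample characters are arbitrary choices: a letter from the ASCII subset, a Latin-1 letter, the euro sign, and a musical symbol outside the 16-bit range.

```python
samples = ["A", "é", "€", "𝄞"]   # code points U+0041, U+00E9, U+20AC, U+1D11E
for ch in samples:
    print(f"U+{ord(ch):04X}",
          len(ch.encode("utf-8")),      # 1 to 4 bytes
          len(ch.encode("utf-16-le")),  # 2 or 4 bytes
          len(ch.encode("utf-32-le")))  # always 4 bytes

print((0x10FFFF).bit_length())  # 21
```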

Boolean Data
When a CPU processes a programming language statement such as:
IF AccountBalance < WithdrawalAmount GOTO 1000
it must execute one instruction to perform the comparison and another instruction to perform the conditional branch. The “result” of the comparison is a data value that must be stored in a register by the first instruction and used as input to the second instruction. Thus, CPUs need a Boolean data type that can store the values “true” and “false”. Most CPUs adopt a simplistic approach:
- A multi-digit binary value with all zeros is “false”
- Any other binary value is “true”
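A Python sketch of the two-step pattern, with hypothetical variable values. The first step stores the comparison result as a number, and the second step tests it with the zero-is-false rule. Python's own `bool()` applies the same rule to integers.

```python
def is_true(register_value: int) -> bool:
    """All-zero bit pattern means false; any other pattern means true."""
    return register_value != 0


account_balance, withdrawal_amount = 50, 80   # hypothetical values
comparison_result = 1 if account_balance < withdrawal_amount else 0  # step 1: compare, store result
if is_true(comparison_result):                                        # step 2: conditional branch
    print("branch to 1000")
```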

Memory Address
CPUs constantly read and write to/from memory, so efficient processing of memory addresses is a critical performance issue. Memory is treated as a sequence of storage locations numbered starting at zero. Each storage location is usually assumed to hold 8 bits (one byte). Thus, a memory address is an unsigned integer, usually stored as a binary value with no sign bit. Addressable memory is limited by the number of bits used to store this unsigned integer:
- 32 bits: addressable memory is 4 gigabytes ($2^{32}$ bytes)
- 64 bits: $2^{64}$ bytes
Older Intel CPUs used a more complex segmented memory address format.
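The address-space arithmetic as a quick Python check. The helper name is hypothetical. Because numbering starts at zero, the highest address is 2^n − 1, and it is printed in hexadecimal, the notation commonly used for addresses.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of byte locations reachable with an unsigned address of this width."""
    return 2 ** address_bits


for n in (32, 64):
    highest = addressable_bytes(n) - 1   # addresses are numbered from zero
    print(n, addressable_bytes(n), format(highest, "X"))
# 32 bits: 4294967296 bytes (4 gigabytes), highest address FFFFFFFF
```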

Summary
- Data can be represented in many ways
- Data types are used as building blocks to create more complex data structures (e.g., arrays, records)
- Data representation is key to understanding hardware and software technology
