Integers & Floating Point Numbers: Limits of Representation

CSE 351 Autumn 2016 Section 3

Key Points
Remember that there are limitations!
Memory is finite; numbers/data are not.
We can only represent so much: with w bits we have 2^w distinct bit patterns.
Design decisions:
  Efficient/fast and easy to implement
  Accuracy
  Range
  Precision

Unsigned Integers
Unsigned values follow the base 2 system.
Example of converting from base 2 to base 10:
  b7 b6 b5 b4 b3 b2 b1 b0 = b7*2^7 + b6*2^6 + … + b1*2^1 + b0*2^0
Benefit: add and subtract using the normal "carry" and "borrow" rules, just in binary.

Signed Integers: Two’s Complement
Bit b(w-1) has weight -2^(w-1); the other bits have their usual weights +2^i.
[Figure: bit positions b(w-1), b(w-2), …, b0; number line comparing the 2's complement range (TMin, …, -2, -1, …, TMax) with the unsigned range (…, TMax, TMax + 1, …, UMax - 1, UMax)]
Benefits:
  Roughly the same number of (+) and (-) numbers
  Positive number encodings match unsigned
  Single representation of zero
  All-zeros encoding (000000…) = 0
  Negation is easy: ~x + 1 == -x

Values To Remember!
Unsigned values:
  UMin = 0              000…0
  UMax = 2^w - 1        111…1
Two's complement values:
  TMin = -2^(w-1)       100…0
  TMax = 2^(w-1) - 1    011…1
  Negative one = -1     111…1 (0xF…F)
Values for W = 32:
          Decimal           Hex
  UMax    4,294,967,295     FF FF FF FF
  TMax    2,147,483,647     7F FF FF FF
  TMin    -2,147,483,648    80 00 00 00
  -1      -1                FF FF FF FF
Values for W = 64:
  LONG_MIN  = -2^63
  LONG_MAX  = 2^63 - 1
  ULONG_MAX = 2^64 - 1

Floating Point Numbers: The Vision
What do we want?
  Large range of values: large numbers and very small numbers
  Precise values
  Reflect real arithmetic
  Support values such as +∞, -∞, Not-A-Number (NaN)
  Similar encoding to Two's Complement

Floating Point Numbers
Numerical form: V = (-1)^s * M * 2^E
  Sign bit s determines whether the number is negative or positive
  Significand (mantissa) M is normally a fractional value in the range [1.0, 2.0)
  Exponent E weights the value by a (possibly negative) power of two
Representation in memory: | s | exp | frac |
  MSB s is the sign bit
  exp field encodes E (but is not equal to E); remember the bias
  frac field encodes M (but is not equal to M)

Floating Point Numbers
Value: ±1 × Mantissa × 2^Exponent
Bit fields: (-1)^S × 1.M × 2^(E+bias)
  On this slide, E is the stored exponent field and M is the stored fraction bits.
Bias
  Read the exponent field as unsigned, but with a bias of -(2^(w-1) - 1) = -127 (w = 8 exponent bits in single precision)
  Representable exponents are roughly ½ positive and ½ negative
  Exponent 0 (Exp = 0) is represented as E = 0b 0111 1111
Why?
  Floating point arithmetic is easier
  Somewhat compatible with 2's complement

Floating Point Numbers: Denormalized
No leading 1.
Remember! The implicit exponent is -126 (not -127), even though E = 0x00.
Why? To represent very small numbers that are close to 0.

Floating Point Representation Summary
Exponent      Mantissa    Meaning
0x00          zero        ± 0
0x00          non-zero    ± denorm num
0x01 - 0xFE   anything    ± norm num
0xFF          zero        ± ∞
0xFF          non-zero    NaN

Floating Point Limitations: Math Properties
Exponent overflow yields +∞ or -∞.
Floats with value +∞, -∞, and NaN can be used in operations.
  The result is usually still +∞, -∞, or NaN; sometimes intuitive, sometimes not.
Floating point ops do not work like real math, due to rounding!
  Not associative: (3.14 + 1e100) - 1e100 != 3.14 + (1e100 - 1e100)
  Not distributive: 100 * (0.1 + 0.2) != 100 * 0.1 + 100 * 0.2
  Not cumulative: repeatedly adding a very small number to a large one may do nothing

Distribution of Values
What can't we get?
  Between largest norm and infinity: overflow
  Between zero and smallest denorm: underflow
  Between norm numbers: rounding

Problems Problems Problems!
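Consider the decimal number [value missing]. Give the IEEE-754 representation of this number as a 32-bit floating-point number.
Convert 1.1 x 2^-128 to IEEE 754 single precision. (Note: 1.1 x 2^-128 is in base 2.)
Shift the radix point left by 2 to get 0.011 x 2^-126, which is denormalized. The encoding is therefore s = 0, exp = 0000 0000, frac = 0110 0000 0000 0000 0000 000, i.e., 0x00300000.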

If x and y have type float, give two different reasons that (x+2*y)-y == x+y might evaluate to 0 (i.e., false). (From the Winter 2016 midterm.)
Overflow: if x and y are certain (very large) values, then x+2*y might produce the special value "infinity" (negative infinity is possible too). Subtracting y still produces infinity, while x+y is still small enough not to overflow.
Rounding error: if x and y are the right number of orders of magnitude apart, rounding might leave x+y equal to x while (x+2*y)-y is slightly more than x.

What is the largest positive number we can represent with a 10-bit signed two's complement integer? Bit pattern? Decimal value?
Bit pattern: 01 1111 1111
Decimal value: 2^8 + … + 2^0 = 2^9 - 1 = 511

Assuming unsigned integers, what is the result when you compute UMAX+1?
0 (the value wraps around)
Assuming two's complement signed representation, what is the result when you compute TMAX+1?
TMIN (bit pattern 100…0; 0x80000000 for w = 32)

Is the '==' operator a good test of equality for floating point values? Why or why not?
No, since floating point suffers from rounding issues. Instead, take the difference of the two values and use <= or >= to test whether it falls within a small range around zero.

Give an example of three floating-point numbers x, y, and z such that the distributive property x(y + z) = xy + xz does not hold.
x = large number, y = small number, z = large number.
We might lose y when we add it to z, so x(y + z) can round differently from xy + xz.

How to use GDB
Download calculator.c from the class webpage.
For debugging, we need to compile the file with debugging symbols. This can be done using the -g flag in GCC:
  gcc -Wall -std=gnu99 -g calculator.c -o calculator
To load the binary into GDB, use the following command:
  gdb calculator
You should see a bunch of information, including version and license information.
To run the binary in GDB, use the run command (type run or just r). This executes your program until it exits, hits a breakpoint, or an error occurs. If you want to start stepping through main(), use the start command.
Passing command line arguments in GDB: give them after run (the program name is not repeated):
  run 3 4 +
View source code while debugging: use the list command.
  To look at the main function, type list main
  To list the code around line 45, type list 45
  To display a range of lines such as lines 10-15, type list 10,15

How to use GDB (continued)
Setting breakpoints
  The break command creates a breakpoint (example: break main).
  Each breakpoint is associated with a number. To enable or disable a breakpoint, use the enable or disable command with that number.
  To see a summary of all breakpoints, use the info command (example: info break).
  To continue execution after a breakpoint, use the continue or c command.
Stepping through source code in GDB
  To step one line of source code at a time, stepping over function calls, use the next or n command.
  To step into functions, use the step or s command.
  To step out of the current function, use the finish command.
Printing values while debugging
  Use the print command.
Exiting GDB
  Press Ctrl-D, or type quit or q.
