Number Representation Fixed and Floating Point


1 Number Representation Fixed and Floating Point
No Method Capable of Representing ALL Real Numbers Using Finite Register Lengths
Must Use Approximations to Represent Values
Concentrate on Two Forms:
  Fixed Point
  Floating Point
Others are:
  Rational Number Systems – use ratios of integers
  Logarithmic Number Systems – use signs and logarithms of values

2 Fixed Versus Floating Point
Fixed Point Values
  Represent Values where Any Two Consecutive Values Differ by 1 unit in the last place (ulp)
  Equal Spacing Between Numbers
Floating Point Values
  Use Two Multi-Bit Words: Mantissa and Exponent
Both Forms Must be Capable of Representing Signed Quantities
Fixed Point Values CAN be Used to Represent Fractional Quantities
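The equal-spacing property of fixed point can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical format with 8 fractional bits (the name FRAC_BITS and both helper functions are not from the slides): a value is stored as an integer count of ulps, so fractional quantities are representable and neighbouring codes are always exactly one ulp apart.

```python
from fractions import Fraction

FRAC_BITS = 8  # assumption for illustration: 8 fractional bits, so 1 ulp = 2**-8


def to_fixed(x: float) -> int:
    """Encode x as a signed integer count of ulps (rounded to the nearest code)."""
    return round(x * (1 << FRAC_BITS))


def from_fixed(q: int) -> Fraction:
    """Decode a stored integer back to the exact rational value it represents."""
    return Fraction(q, 1 << FRAC_BITS)


a = to_fixed(1.5)     # signed, fractional quantities are both representable
b = to_fixed(-0.75)

# Neighbouring codes differ by exactly one ulp, whatever the magnitude.
spacing_near_zero = from_fixed(1) - from_fixed(0)
spacing_near_a = from_fixed(a + 1) - from_fixed(a)
```

Addition of two such values is plain integer addition of the stored codes, which is the main attraction of the format.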

3 Floating Point Characteristics
Total Number of Representations = Total Number of Bit Strings
For an n-bit Register we have 2^n
Range of Values is Larger than Fixed Point
Precision of Values is Smaller
Distance Between Two Consecutive Values Increases with Magnitude
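The growing distance between consecutive floating point values can be observed directly. The sketch below assumes IEEE 754 doubles (which is what Python's float uses on common platforms) and finds the next representable value by adding 1 to the bit pattern; the helper names next_up and gap are illustrative only.

```python
import struct


def next_up(x: float) -> float:
    """Next representable double above a positive finite x (bit pattern + 1)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return struct.unpack(">d", struct.pack(">Q", bits + 1))[0]


def gap(x: float) -> float:
    """Distance from x to the next larger representable double."""
    return next_up(x) - x


# The same number of bit patterns is spread over ever wider intervals.
gaps = {x: gap(x) for x in (1.0, 1024.0, 2.0 ** 52)}
```

Each time the exponent increases by one the gap doubles, so by 2^52 consecutive doubles are a whole unit apart.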

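4 Floating Point
Format: s | e | m
s – Sign Bit (signed magnitude)
e – Exponent (stored in biased form)
m – Mantissa (significand or fraction); m in [0,1), mMAX = 1 - ulp; the hidden bit makes the significand 1.m
float – BIAS = 127 (32 bits: 23 for m and 8 for e)
double – BIAS = 1023 (64 bits: 52 for m and 11 for e)
With a bias of 2^(k-1) for a k-bit exponent, the sign of the exponent is the complement of its MSb, so adding/subtracting the bias is just complementation of the MSb; the IEEE biases above are 2^(k-1) - 1, so there the MSb rule holds only up to an offset of 1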
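The s, e, m fields can be pulled out of a stored value with shifts and masks. This is a small sketch using Python's struct module; the function names fields32 and fields64 are illustrative, and e is returned in its stored (biased) form.

```python
import struct


def fields32(x: float):
    """(s, e, m) of x stored as an IEEE 754 single: 1 sign, 8 exponent, 23 fraction bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def fields64(x: float):
    """(s, e, m) of x stored as an IEEE 754 double: 1 sign, 11 exponent, 52 fraction bits."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


one_single = fields32(1.0)        # true exponent 0 is stored as the bias, 127
one_double = fields64(1.0)        # true exponent 0 is stored as the bias, 1023
minus_two_single = fields32(-2.0)
```

For 1.0 the fraction field is all zeros because the leading 1 of the significand is the hidden bit and is not stored.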

5 Floating Point Example
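double = bfe80000 (most significant word, in hex; the remaining fraction bits are 0)
Word order in memory depends on endianness: Little Endian – MSW has Higher Address; Big Endian – MSW has Lower Address
s = 1; e = 1022; m = 0.5
Value = (-1)^1 × 1.5 × 2^(1022-1023)
Value = -(1.5)(0.5) = -0.75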
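The worked example can be checked mechanically. The sketch below assumes the low-order word is all zeros (implied by m = 0.5 exactly, but not printed on the slide) and compares the hand decoding against the machine's own interpretation of the same 64 bits.

```python
import struct

HIGH_WORD = 0xBFE80000  # from the slide
LOW_WORD = 0x00000000   # assumption: m = 0.5 exactly, so the remaining fraction bits are 0

bits = (HIGH_WORD << 32) | LOW_WORD
s = bits >> 63
e = (bits >> 52) & 0x7FF
m = (bits & ((1 << 52) - 1)) / 2 ** 52

# Hand decoding: (-1)^s * (1 + m) * 2^(e - BIAS), with BIAS = 1023 for double.
value = (-1) ** s * (1 + m) * 2.0 ** (e - 1023)

# Cross-check: let the machine interpret the same bit pattern.
hw_value = struct.unpack(">d", bits.to_bytes(8, "big"))[0]

# Byte order in memory on a little-endian machine: most significant byte last.
little_endian_bytes = struct.pack("<d", hw_value).hex()
```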

6 Floating Point Normalization
Redundant Representations are Possible!
Hidden Bit Helps
Out of All Possible Representations, Choose the One With the Fewest Leading Zeros in the Significand
This is Normalization
After Performing Arithmetic, Renormalization May Need to be Accomplished
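Normalization can be sketched as a shift loop. The example assumes a toy format (not from the slides) in which the value is sig × 2^exp with an 8-bit integer significand and no hidden bit, so several (sig, exp) pairs encode the same value until one canonical form is chosen.

```python
def normalize(sig: int, exp: int, width: int = 8):
    """Shift a nonzero `width`-bit significand left until its top bit is 1.

    The represented value is sig * 2**exp, so every left shift of sig is
    compensated by decrementing exp.
    """
    if sig == 0:
        raise ValueError("zero has no normalized form")
    top = 1 << (width - 1)
    while (sig & top) == 0:
        sig <<= 1
        exp -= 1
    return sig, exp


# Three redundant encodings of the same value, 24.
candidates = [(0b00011000, 0), (0b00110000, -1), (0b11000000, -3)]
normalized = [normalize(sig, exp) for sig, exp in candidates]
```

All three candidates collapse to the same pair, which is the one with no leading zeros in the significand.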

7 Floating Point Special Numbers
Value v when exponent e and fraction f are special values (IEEE standard) Note: NaN = Not a Number
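The special cases are selected by the stored exponent and fraction fields alone. A minimal classifier for doubles, following the IEEE 754 encoding rules (the function name classify is illustrative):

```python
import struct


def classify(x: float) -> str:
    """Classify a double by its stored exponent e and fraction f."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    e = (bits >> 52) & 0x7FF
    f = bits & ((1 << 52) - 1)
    if e == 0:
        # All-zero exponent: zero if f == 0, otherwise a denormalized number.
        return "zero" if f == 0 else "denormal"
    if e == 0x7FF:
        # All-one exponent: infinity if f == 0, otherwise Not a Number.
        return "infinity" if f == 0 else "NaN"
    return "normal"


kinds = [classify(v) for v in (0.0, -0.0, 5e-324, 1.0, float("inf"), float("nan"))]
```

The sign bit is ignored here, so +0 and -0 (and the two infinities) fall in the same class.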

8 IEEE/ANSI 754/854 Standard

9 Denormalized Numbers
Allow for Gradual Degradation on Underflow
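Gradual underflow can be observed by repeatedly halving the smallest normalized double. The sketch assumes IEEE 754 doubles with denormals enabled (no flush-to-zero mode), which is the usual Python behaviour.

```python
import sys

smallest_normal = sys.float_info.min  # 2**-1022, smallest normalized double

# Halve until the value finally underflows to zero, counting the steps.
x = smallest_normal
steps = 0
while x > 0.0:
    x /= 2
    steps += 1

smallest_denormal = 2.0 ** -1074  # last nonzero value reached by the loop
```

Without denormals the very first halving would already produce zero; with them, precision is lost one bit at a time over 52 further values before the result reaches zero.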

10 Denormals

11 Operations – Internal Precision

12 Floating Point Addition/Subtraction

13 Floating Point Multiplication/Division

14 Conversions and Roundings

15 Exceptions

16 Rounding Schemes
Signed Magnitude
Two’s Complement

17 Round to Nearest (Signed Magnitude)

18 Rounding Comments

19 Round to Nearest Even/Odd
Round to Nearest Odd (R*)

20 Jamming/von Neumann Rounding

21 ROM Rounding

22 Rounding

23 Rounding Examples Round Towards + Downward Directed Rounding

24 Floating Point Operations

25 Adders/Subtractors

26 Operand Packing/Unpacking

27 Other Key Parts of FP Add/Sub Unit

28 Pre-Shifting

29 Four-stage Combinational Shifter
Pre-shifts Operand by 0 to 15 Bits
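A four-stage shifter covers every amount from 0 to 15 because stage i either passes its input through or shifts it by 2^i, under control of bit i of the shift amount. The sketch below models that behaviour in software; the right-shift direction (as used for significand alignment) and the 16-bit operand width are assumptions, not details from the slide.

```python
def preshift(operand: int, amount: int, width: int = 16) -> int:
    """Right-shift `operand` by 0..15 bits using four conditional stages (1, 2, 4, 8)."""
    if not 0 <= amount <= 15:
        raise ValueError("shift amount must fit in 4 bits")
    value = operand & ((1 << width) - 1)
    for stage in range(4):
        # Stage `stage` is a row of 2-to-1 multiplexers selected by one amount bit.
        if (amount >> stage) & 1:
            value >>= 1 << stage
    return value


results = [preshift(0xFFFF, k) for k in range(16)]
```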

30 Leading Zeros/Ones – Counting vs. Prediction

31 Leading Zeros Prediction

32 Guard Digits
What is the smallest number of extra digits needed for rounding? For post-normalization?
Multiplication – Double Length Result
Add/Sub w/ differing exp. – Can have Double Length Result
FP Unit Provides One Length Result

33 Significand Ranges
Assume Significand M in (0, 1-ulp]
Then Normalized M ranges as: 1/2 <= M <= 1-ulp
Multiplication: prod = M1 × M2, so 1/4 <= prod < 1
For postnormalization need at most one shift left to get: 1/2 <= prod < 1

34 Significand Ranges (cont)
Division: quot=M1M2 Need at most one shift right to get: Conclusion: 1 Extra Digit Needed for Postnormalization 1 Extra Digit Needed for Round-to-Nearest 2 Extra Digits Needed G - guard R - round

35 “Sticky Bit” in std754
Round-to-Nearest-Even Requires 1 Extra Bit: The “sticky bit”, S
Turns out to be Logical-OR of Other Additional Bits
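The role of the sticky bit can be sketched on integer significands. The two helpers below are illustrative, not a description of any particular hardware: round_nearest_even rounds using all discarded bits, and to_grs first collapses everything below the guard and round bits into a single sticky bit by OR-ing. The point is that rounding the collapsed three-bit form gives the same answer.

```python
def round_nearest_even(sig: int, extra_bits: int) -> int:
    """Drop the low `extra_bits` (>= 1) bits of sig; round to nearest, ties to even.

    A carry out of the kept bits is possible and would need renormalization.
    """
    kept = sig >> extra_bits
    first = (sig >> (extra_bits - 1)) & 1                       # first discarded bit
    rest = 1 if (sig & ((1 << (extra_bits - 1)) - 1)) else 0    # OR of the others
    if first and (rest or (kept & 1)):
        kept += 1
    return kept


def to_grs(sig: int, extra_bits: int) -> int:
    """Collapse `extra_bits` (>= 3) low bits into guard, round, and sticky."""
    low = sig & ((1 << (extra_bits - 2)) - 1)  # everything below G and R
    sticky = 1 if low else 0                   # logical OR of those bits
    return ((sig >> (extra_bits - 2)) << 1) | sticky


tie_to_even_up = round_nearest_even(0b1011_100, 3)    # exact tie, kept part odd
tie_to_even_down = round_nearest_even(0b1010_100, 3)  # exact tie, kept part even
above_half = round_nearest_even(0b1010_101, 3)        # sticky breaks the tie upward
```

Only whether the lower bits are all zero matters for the tie decision, so a single OR-ed bit carries all the information needed.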

36 Floating Point Multiplier

37 Floating Point Divider

