Lecture 15 Floating point math Dr. Dimitrios S. Nikolopoulos CSL/UIUC.

Slides:



Advertisements
Similar presentations
CH10 Instruction Sets: Characteristics and Functions
Advertisements

1/1/ / faculty of Electrical Engineering eindhoven university of technology Introduction Part 2: Data types and addressing modes dr.ir. A.C. Verschueren.
Fixed Point Numbers The binary integer arithmetic you are used to is known by the more general term of Fixed Point arithmetic. Fixed Point means that we.
ECE291 Computer Engineering II Lecture 24 Josh Potts University of Illinois at Urbana- Champaign.
Arithmetic in Computers Chapter 4 Arithmetic in Computers2 Outline Data representation integers Unsigned integers Signed integers Floating-points.
Intel’s MMX Dr. Richard Enbody CSE 820. Michigan State University Computer Science and Engineering Why MMX? Make the Common Case Fast Multimedia and Communication.
6/9/2015TUC-N dr. Emil CEBUC Math Coprocessor Also called Floating Point Unit FPU.
1 Lecture 9: Floating Point Today’s topics:  Division  IEEE 754 representations  FP arithmetic Reminder: assignment 4 will be posted later today.
COE 308: Computer Architecture (T041) Dr. Marwan Abu-Amara Integer & Floating-Point Arithmetic (Appendix A, Computer Architecture: A Quantitative Approach,
ECEN 248 Integer Multiplication, Number Format Adopted from Copyright 2002 David H. Albonesi and the University of Rochester.
Outline Floating point arithmetic 80x87 Floating Point Unit.
Assembly Language for x86 Processors 6th Edition
Computer Organization and Architecture Computer Arithmetic Chapter 9.
Computer Arithmetic Nizamettin AYDIN
Computer Arithmetic. Instruction Formats Layout of bits in an instruction Includes opcode Includes (implicit or explicit) operand(s) Usually more than.
Computer Arithmetic.
CEN 316 Computer Organization and Design Computer Arithmetic Floating Point Dr. Mansour AL Zuair.
Machine Instruction Characteristics
Computer Architecture ALU Design : Division and Floating Point
Number Systems So far we have studied the following integer number systems in computer Unsigned numbers Sign/magnitude numbers Two’s complement numbers.
ECE232: Hardware Organization and Design
Dr Mohamed Menacer College of Computer Science and Engineering Taibah University CS-334: Computer.
Computer Architecture and Operating Systems CS 3230 :Assembly Section Lecture 10 Department of Computer Science and Software Engineering University of.
Oct. 18, 2007SYSC 2001* - Fall SYSC2001-Ch9.ppt1 See Stallings Chapter 9 Computer Arithmetic.
Floating Point Arithmetic
Lecture 9: Floating Point
CSC 221 Computer Organization and Assembly Language
MMX technology for Pentium. Introduction Multi Media Extension (MMX) for Pentium Processor Which has built in 80X87 Can be switched for multimedia computations.
FLOATING POINT ARITHMETIC. TOPICS Binary representation of floating point Numbers Computer representation of floating point numbers Floating point instructions.
The x87 FPU Lecture 19 Fri, Mar 26, 2004.
Computer Architecture and Organization
Computer Architecture EKT 422
Introduction to MMX, XMM, SSE and SSE2 Technology
With a focus on floating point.  For floating point (i.e., real numbers), MASM supports:  real4  single precision; IEEE standard; analogous to float.
Floating Point Numbers Representation, Operations, and Accuracy CS223 Digital Design.
Chapter 10 Instruction Sets: Characteristics and Functions Felipe Navarro Luis Gomez Collin Brown.
EEL5708/Bölöni Lec 8.1 9/19/03 September, 2003 Lotzi Bölöni Fall 2003 EEL 5708 High Performance Computer Architecture Lecture 5 Intel 80x86.
Intel Multimedia Extensions and Hyper-Threading Michele Co CS451.
Instruction Sets. Instruction set It is a list of all instructions that a processor can execute. It is a list of all instructions that a processor can.
Introduction to Intel IA-32 and IA-64 Instruction Set Architectures.
ECE291 Lecture 13 Advanced Mathematics. ECE 291 Lecture 13Page 2 of 40 Lecture outline Floating point arithmetic 80x87 Floating Point Unit MMX Instructions.
o History of Floating Point o Defining Floating Point Arithmetic o Floating Point Representation o Floating Point Format o Floating Point Precisions o.
ECE291 Computer Engineering II Lecture 11 Dr. Zbigniew Kalbarczyk University of Illinois at Urbana- Champaign.
Chapter 14: The Arithmetic Coprocessor, MMX, and SIMID Technologies.
Floating Points & IEEE 754.
Floating Point Representations
Integer Division.
Chapter 6 Floating Point
Arithmetic for Computers
Topic 3d Representation of Real Numbers
MMX technology for Pentium
MMX Multi Media eXtensions
Using the 80x87 Math Chip in Assembly
The x87 FPU Lecture 18 Wed, Mar 23, 2005.
Computer Organization and Assembly Languages Yung-Yu Chuang 2005/12/22
Morgan Kaufmann Publishers Arithmetic for Computers
Review In last lecture, done with unsigned and signed number representation. Introduced how to represent real numbers in float format.
Topic 3d Representation of Real Numbers
Computer Architecture and System Programming Laboratory
MMX technology for Pentium
Computer Organization and Assembly Language
Computer Architecture and System Programming Laboratory
Part I Data Representation and 8086 Microprocessors
Chapter 10 Instruction Sets: Characteristics and Functions
Presentation transcript:

Lecture 15 Floating point math Dr. Dimitrios S. Nikolopoulos CSL/UIUC

ECE291 Lecture 152 Outline Floating point arithmetic 80x87 Floating Point Unit MMX Instructions SSE Instructions

ECE291 Lecture 153 Floating-point numbers Components of a floating point number Sign: 0=pos(+); 1=neg(-) Exponent: (with Constant bias added) Mantissa: 1 <= M < 2 Number = (Mantissa) * 2(exponent-bias) Example: (decimal) = (binary) = * 2^3 (normalized)

ECE291 Lecture 154 Floating-point format Single-precision 32 bits, sign, 8 exponent (bias 7Fh), 23 mantissa, mantissa first digit is implied Double-precision, 64 bits, sign, 11 exponent (bias 3FFh), 52 mantissa, mantissa first digit is implied Extended-precision, 80 bits, sign, 15 bits exponent (bias 3FFFh), 64 bits mantissa, mantissa first digit is expilcit

ECE291 Lecture 155 Converting to 32-bit floating-point form Convert the decimal number into binary Normalize the binary number Calculate the biased exponent Store the number in floating point format Example: = = x (7Fh) = (85h) 4. Sign = 0 Exponent = Mantissa = Note mantissa of a number 1.XXXX is the XXXX The 1. is an implied one-bit that is only stored in the extended precision form of the floating-point number as an explicit-one bit

ECE291 Lecture 156 Converting to floating-point form FormatSignExponentMantissa Single(+)7F+6 = 85h (8 bits)Implied 1 + (23 bits) 01000, Double(+)3FFh + 6 = 405h (11 bits)Implied 1 + (52 bits) 0100,0000, … 0000 Extended(+)3FFFh+6 = 4005h (15 bits)Explicit 1 + (63 bits) 0100,0000,0000, … 0000

ECE291 Lecture 157 Special representations Zero: –Exp = All Zero, Mant = All Zero NAN (not a number) an invalid floating-point result –Exp = All Ones, Mant non-zero + Infinity: –Sign = 0, Exp = All Ones, Mant = 0 - Infinity: –Sign = 1, Exp = All Ones, Mant = 0

ECE291 Lecture 158 Converting from Floating-Point Form (example) Sign = 1 Exponent = 1000,0011 Mantissa = 100,1001,0000,0000, = 1000, ,1111 (convert biased exponent) x 2^4(a normalized binary number) (a de-normalized binary number) ( a decimal number)

ECE291 Lecture 159 Variable declaration examples Declaring and Initializing Floating Point Variables Fpvar1dd-7.5; single precision (4 bytes) Fpvar2dq0.5625; double precision (8 bytes) Fpvar3dt-247.6; extended precision (10 bytes) Fpvar4dt345.2; extended precision (10 bytes) Fpvar5dt1.0; extended precision (10 bytes)

ECE291 Lecture 1510 The 80x87 floating point unit On-chip hardware for fast computation on floating point numbers FPU operations run in parallel with integer operations Historically implemented as separate chip – Coprocessor 80486DX-Pentium III contain their own internal and fully compatible versions of the CPU/FPU Exchange data through memory –X87 always converts Single and Double to Extended- precision

ECE291 Lecture 1511 Arithmetic Coprocessor Architecture Control unit –interfaces the coprocessor to the microprocessor-system data bus Numeric execution unit (NEU) –executes all coprocessor instructions NEU has eight-register stack –each register is 80-bit wide (extended-precision numbers) –data appear as any other form when they reside in the memory system –holds operands for arithmetic instructions and the results of arithmetic instructions Other registers - status, control, tag, exception

ECE291 Lecture 1512 Arithmetic Coprocessor FPU Registers The eight 80-bit data registers are organized as a stack As data items are pushed into the top register, previous data items move into higher-numbered registers, which are lower on the stack All coprocessor data are stored in registers in the 10-byte real format Internally, all calculations are done on the greatest precision numbers Instructions that transfer numbers between memory and coprocessor automatically convert numbers to and from the 10-byte real format

ECE291 Lecture 1513 Sample 80x87 FPU Opcodes FLD:Load (lets you load a memory address into an fp reg) FST:Store (moves fp reg to memory address) FADD: Addition FMUL: Multiplication FDIV: Division (divides the destination by the source) FDIVR: Division (divides the source by the destination) FSIN: Sine (uses radians) FCOS: Cosine (uses radians) FSQRT: Square Root FSUB: Subtraction FABS: Absolute Value FYL2X: Calculates Y * log 2 (X) (Really) FYL2XP1: Calculates Y * log 2 (X+1) (Really)

ECE291 Lecture 1514 Programming the FPU FPU Registers = ST0 … ST7 Format: OPCODE source Restrictions: Source can usually be ST# or memory location Use of ST0 as destination is implicit when not specified otherwise FINIT - Execute before starting to initialize FPU registers!

ECE291 Lecture 1515 Programming the FPU (cont.) FLD size[MemVar] –Load ST0 with MemVar & –Push all other STi to ST(i+1) for i=0..6 FSTP size[MemVar] –Move top of stack (ST0) to memory & –Pop all other STi to ST(i-1) for I = 1..7 FST size[MemVar] –Move top of stack to memory –Leave result in ST0

ECE291 Lecture 1516 Classical-Stack Format Instructions treat the coprocessor registers like items on the stack ST0 is used as an implied source/destination ST0 ST1 fld1fldpifadd st Instructions Stack Contents 1.0

ECE291 Lecture 1517 FPU Programming Memory Format The stack top (ST0) is always the implied destination ST0 ST1 fld m1fld m2 fadd m1 1.0 fstp m1 fst m2 m1 m SEGMENT DATA m1 DD 1.0 m2 DD 2.0 SEGMENT CODE …. fld dword [m1] fld dword [m2] fadd dword [m1] fstp dword [m1] fst dword [m2]

ECE291 Lecture 1518 FPU Programming Register Format Instructions treat coprocessor registers as registers rather than stack elements ST0 ST1 fadd st1 st fadd st2 fxch st1 ST

ECE291 Lecture 1519 FPU Programming Register-Pop Format Coprocessor registers are treated as a modified stack –the source register must always be a stack top ST0 ST1 faddp st2, st0 2.0 ST

ECE291 Lecture 1520 FPU Programming Timing Issues A processor instruction following a coprocessor instr –coordinated by assembler for 8086 and 8088 –coordinated by processor on Pentium A processor instruction that accesses memory following a coprocessor instruction that accesses the same memory –for 8087 need to include WAIT or FWAIT to ensure that coprocessor finishes before the processor begins –assembler did this automatically

ECE291 Lecture 1521 Calculating the area of a circle RAD DD 2.34, 5.66, 9.33, 234.5, 23.4 AREA RESD5..START movsi, 0;source element 0 movdi, 0;destination element 0 mov cx, 5;count of 5 finit Main1 fld dword [RAD + si];radius to ST0 fmulst0; square radius fldpi ; pi to ST0 fmul st1 ; multiply ST0 = ST0 x ST1 fstp dword [AREA + di] add si, 4 add di, 4 loop Main1 mov ax, 4C00h int21h

ECE291 Lecture 1522 Floating Point Operations Calculating Sqrt(x 2 + y 2 ) StoreFloat resb94 X dd4.0 Y dd3.0 Z resd %macro% ShowFP fsaveStoreFloat frstorStoreFloat %endmacro%..start movax, cs ; Initialize DS=CS movds, ax finit ShowFP fld dword [X] ShowFP fld st0 ShowFP fmulpst1, st0 ShowFP flddword [Y] ShowFP fld st0 ShowFP fmulpst1, st0 ShowFP faddst0, st1 ShowFP fsqrt ShowFP fstdword [Z] ShowFP movax, 4C00h ;Exit to DOS int21h

ECE291 Lecture 1523 MMX MultiMedia eXtensions Designed to accelerate multimedia and communication applications –motion video, image processing, audio synthesis, speech synthesis and compression, video conferencing, 2D and 3D graphics Includes new instructions and data types to significantly improve application performance Exploits the parallelism inherent in many multimedia and communications algorithms Maintains full compatibility with existing operating systems and applications

ECE291 Lecture 1524 MMX MultiMedia eXtensions Provides a set of basic, general purpose integer instructions Single Instruction, Multiple Data (SIMD) technique –allows many pieces of information to be processed with a single instruction, providing parallelism that greatly increases performance 57 new instructions Four new data types Eight 64-bit wide MMX registers First available in 1997 Supported on: –Intel Pentium-MMX, Pentium II, Pentium III (and later) –AMD K6, K6-2, K6-3, K7 (and later) –Cyrix M2, MMX-enhanced MediaGX, Jalapeno (and later)

ECE291 Lecture 1525 Internal Register Set of MMX Technology Uses 64-bit mantissa portion of 80-bit FPU registers This technique is called aliasing because the floating-point registers are shared as the MMX registers Aliasing the MMX state upon the floating point state does not preclude applications from executing both MMX technology instructions and floating point instructions The same techniques used by FPU to interface with the operating system are used by MMX technology –preserves FSAVE/FRSTOR instructions

ECE291 Lecture 1526 Data Types MMX architecture introduces new packed data types Multiple integer words are grouped into a single 64-bit quantity Eight 8-bit packed bytes (B) Four 16-bit packed words (W) Two 32-bit packed doublewords (D) One 64-bit quadword (Q) Example: consider graphics pixel data represented as bytes. –with MMX, eight of these pixels can be packed together in a 64-bit quantity and moved into an MMX register –MMX instruction performs the arithmetic or logical operation on all eight elements in parallel

ECE291 Lecture 1527 Arithmetic Instructions PADD(B/W/D): Addition PADDB MM1, MM2 adds 64-bit contents of MM2 to MM1, byte-by-byte any carries generated are dropped, e.g., byte A0h + 70h = 10h PSUB(B/W/D): Subtraction

ECE291 Lecture 1528 Arithmetic Instructions PMUL(L/H)W: Multiplication (Low/High Result) –multiplies four pairs of 16-bit operands, producing 32 bit result PMADDWD: Multiply and Add Key instruction to many signal processing algorithms like matrix multiplies or FFTs a3 a2 a1 a0 b3 b2 b1 b0 a3*b3 +a2*b2 a1*b1 +a0*b0

ECE291 Lecture 1529 Logical, Shifting, and Compare Instructions Logical PAND: Logical AND (64-bit) POR: Logical OR (64-bit) PXOR: Logical Exclusive OR (64-bit) PANDN: Destination = (NOT Destination) AND Source Shifting PSLL(W/D/Q): Packed Shift Left (Logical) PSRL(W/D/Q): Packed Shift Right (Logical) PSRA(W/D/Q): Packed Shift Right (Arithmetic) Compare PCMPEQ: Compare for Equality PCMPGT: Compare for Greater Than Sets Result Register to ALL 0's or ALL 1's

ECE291 Lecture 1530 Conversion Instructions Unpacking Unpacking (Increasing data size by 2*n bits) PUNPCKLBW: Reg1, Reg2: Unpacks lower four bytes to create four words. PUNPCKLWD: Reg1, Reg2: Interleaves lower two words to create two doubles PUNPCKLDQ: Reg1, Reg2: Interleaves lower double to create Quadword

ECE291 Lecture 1531 Conversion Instructions Packing Packing (Reducing data size by 2*n bits) PACKSSDW Reg1, Reg2: Pack Double to Word Four doubles in Reg2:Reg1 compressed to Four words in Reg1 PACKSSWB Reg1, Reg2: Pack Word to Byte Eight words in Reg2:Reg1 compressed to Eight bytes in Reg1 The pack and unpack instructions are especially important when an algorithm needs higher precision in its intermediate calculations, e.g., an image filtering

ECE291 Lecture 1532 Data Transfer Instructions MOVQ Dest, Source : 64-bit move –One or both arguments must be a MMX register MOVD Dest, Source : 32-bit move –Zeros loaded to upper MMX bits for 32-bit move

ECE291 Lecture 1533 Saturation/Wraparound Arithmetic Wraparound: carry bit lost (significant portion lost) (PADD) a3 a2 a1 FFFFh b3 b2 b1 8000h a3+b3a2+b2 a1+b17FFFh +

ECE291 Lecture 1534 Saturation/Wraparound Arithmetic (cont.) Unsigned Saturation: add with unsigned saturation (PADDUS) a3 a2 a1 FFF4h b3 b2 b1 1123h a3+b3a2+b2 a1+b1FFFFh + Saturation if addition results in overflow or subtraction results in underflow, the result is clamped to the largest or the smallest value representable for an unsigned, 16-bit word the values are FFFFh and 0000h for a signed 16-bit word the values are 7FFFh and 8000h

ECE291 Lecture 1535 Saturation/Wraparound Arithmetic (cont.) Saturation: add with signed saturation (PADDS) a3 a2 a1 7FF4h b3 b2 b1 0050h a3+b3a2+b2 a1+b17FFFh +

ECE291 Lecture 1536 Adding 8 8-bit Integers with Saturation X0 dq X1 dq FF009033FF MOVQ mm0,X0 PADDSB mm0,X1 Result mm0 = 80807F F54 80h+00h=80h (addition with zero) 80h+90h=80h (saturation to maximum negative value) 55h+33h=7fh (saturation to maximum positive value) 55h+FFh=54h (subtraction by one)

ECE291 Lecture 1537 Parallel Compare PCMPGT[B/W/D] mmxreg, r/m64 PCMPEQ[B/W/D] mmxreg, r/m Eq? h FFFFh0000h Gt? gt/?Gt? hFFFFh0000h

ECE291 Lecture 1538 Use of FPU Registers for Storing MMX Data The EMMS (empty MMX-state) instruction sets (11) all the tags in the FPU, so the floating-point registers are listed as empty EMMS must be executed before the return instruction at the end of any MMX procedure –otherwise any subsequent floating point operation will cause a floating point interrupt error, potentially crashing your application –if you use floating point within MMX procedure, you must use EMMS instruction before executing the floating point instruction Any MMX instruction resets (00) all FPU tag bits, so the floating- point registers are listed as valid

ECE291 Lecture 1539 Streaming SIMD Extensions Intel Pentium III Streaming SIMD defines a new architecture for floating point operations Operates on IEEE-754 Single-precision 32-bit Real Numbers Uses eight new 128-bit wide general-purpose registers (XMM0 - XMM7) Introduced in Pentium III in March 1999 –Pentium III includes floating point, MMX technology, and XMM registers

ECE291 Lecture 1540 Streaming SIMD Extensions Intel Pentium III (cont.) Supports packed and scalar operations on the new packed single precision floating point data types Packed instructions operate vertically on four pairs of floating point data elements in parallel –instructions have suffix ps, e.g., addps Scalar instructions operate on the least-significant data elements of the two operands –instructions have suffix ss, e.g., addss