The Queen’s Tower, Imperial College London, South Kensington, SW7 | 28th Jan 2007 | Ashley Brown
Profiling floating point value ranges for reconfigurable implementation


Slide 1: Profiling floating point value ranges for reconfigurable implementation
Workshop on Reconfigurable Computing at 2007
Ashley Brown, 28th Jan 2007

Slide 2: Floating Point on FPGAs
Two distinct sets of requirements:
• Embedded systems (often as/alongside DSPs)
  – High precision often not important (video/audio processing)
  – Fixed point implementations possible
• Scientific computation
  – High precision extremely important
  – Reduction in precision or conversion to single precision must be done with great care

Slide 3: Our Focus
• Scientific applications
  – MORPHY: “automated topological analysis of a molecular electron density”
  – ‘ydl_pij’ (MMVB): iterative solver for computational chemistry
• SPECFP95 benchmarks
  – Only mildly interesting: they do not have multiple datasets
• SPECFP2000 to follow

Slide 4: The Problem
• Double-precision (D.P.) floating point on FPGAs uses a lot of area
• Density is improving, but we still want to squeeze more in!
  – Re-using hardware can reduce concurrency
• Scientific applications: typically 64-bit floating point
• Often full precision is (believed to be) required
  – Is this really the case? We have more options than single or double

Slide 5: Current Solutions for F.P. Minimisation
• Finding ‘minimal precision’:
  – Tools such as BitSize
  – Select precision for some operands; the tool calculates the rest
  – Test vectors used to gauge errors
• Reducing hardware area:
  – Replacing floating point by fixed point, transparent to the user (Cheung et al.)
  – The solution above would make the scientists cry.
  – Any butchery of the floating point hardware must be justified and checked
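The "test vectors used to gauge errors" idea can be sketched in a few lines of Python. This is not BitSize's algorithm: it is a minimal sketch, assuming a reduced format can be emulated by rounding a double's mantissa to a chosen number of fraction bits while leaving the exponent range untouched. The helper names (`round_mantissa`, `max_relative_error`, `minimal_mantissa_bits`) are hypothetical, and only the inputs and the final result are rounded, so rounding of intermediates inside the kernel is not modelled.

```python
import math

def round_mantissa(x: float, bits: int) -> float:
    """Round x to `bits` fraction bits (round-half-to-even), keeping the double exponent range."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2 ** (bits + 1)       # bits + 1 significant bits, counting the leading 1
    return math.ldexp(round(m * scale) / scale, e)

def max_relative_error(f, test_vectors, bits):
    """Worst relative error of f when its inputs and result are rounded to `bits` fraction bits.

    Crude model: intermediates inside f are still computed in double precision.
    """
    worst = 0.0
    for args in test_vectors:
        ref = f(*args)
        approx = round_mantissa(f(*(round_mantissa(a, bits) for a in args)), bits)
        if ref != 0.0:
            worst = max(worst, abs(approx - ref) / abs(ref))
    return worst

def minimal_mantissa_bits(f, test_vectors, tolerance):
    """Smallest fraction width (up to the 52 bits of a double) that meets the tolerance."""
    for bits in range(1, 53):
        if max_relative_error(f, test_vectors, bits) <= tolerance:
            return bits
    return 52

# Usage: hypothetical kernel and test vectors
vectors = [(1.1, 2.3), (0.5, 3.75), (10.0, 0.1)]
print(minimal_mantissa_bits(lambda a, b: a * b + 1.0, vectors, 1e-6))
```

The result is only as trustworthy as the test vectors, which is exactly why the slide insists that any reduction be justified and checked.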

Slide 6: FloatWatch
• Valgrind-based value profiler
• Can return a number of metrics:
  – Floating point value ranges
  – Variation between 32-bit and 64-bit F.P. executions
  – Difference in magnitude between F.P. operations
• Each metric has uses for optimisation!
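Two of these metrics can be illustrated outside Valgrind. The sketch below (hypothetical names throughout; it is not FloatWatch's code) keeps a per-site histogram of binary exponents as a value-range profile, and estimates 32-bit versus 64-bit variation by re-running a kernel with every result rounded to IEEE single via `struct`. Rounding the double result of an addition, subtraction, multiplication or division of single operands back to single gives the same answer as native single arithmetic, so the emulation is faithful for those operations.

```python
import math
import struct
from collections import Counter, defaultdict

def to_float32(x: float) -> float:
    """Round a double to the nearest IEEE single (raises OverflowError if out of range)."""
    return struct.unpack("f", struct.pack("f", x))[0]

class RangeProfile:
    """Hypothetical per-site profile: histogram of binary exponents plus a zero count."""

    def __init__(self):
        self.hist = defaultdict(Counter)
        self.zeros = Counter()

    def record(self, site, value):
        if value == 0.0:
            self.zeros[site] += 1
        elif math.isfinite(value):
            # frexp gives value = m * 2**e with 0.5 <= |m| < 1, so e - 1 is the
            # unbiased IEEE-style exponent for normal numbers
            self.hist[site][math.frexp(value)[1] - 1] += 1
        return value

    def exponent_range(self, site):
        exps = self.hist.get(site)
        return (min(exps), max(exps)) if exps else None

def sum_of_squares(xs, rnd):
    """Example kernel; `rnd` is applied to every input and every operation result."""
    acc = 0.0
    for x in xs:
        x = rnd(x)
        acc = rnd(acc + rnd(x * x))
    return acc

def divergence_32_vs_64(kernel, xs):
    """Relative difference between a double run and a run rounded to single after each op."""
    ref = kernel(xs, lambda v: v)
    low = kernel(xs, to_float32)
    return abs(ref - low) / abs(ref) if ref != 0.0 else abs(low)
```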


Slide 13: What does this tell us?
• Alpha is constant (but we could have found that from the source)
• Memory operands all fall within the same range
• The result falls within the same range as the memory operands
• Intermediate values result in a shift in the range
• Optimisation: we do not need double precision
  – A custom floating point format would suffice

Slide 14: FloatWatch
• Operates on x86 binaries under Valgrind
  – x86 machine code converted to simplified SSA
  – FloatWatch inserts instrumentation code after floating point operations
  – SSA converted back to x86 and cached
• Outputs a data file with selected metrics
• Processing script produces an HTML+JavaScript report
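The "insert instrumentation after each floating point operation" step can be pictured on a toy three-address IR. This is an assumption-laden illustration, not Valgrind's VEX IR or its tool API: `Insn`, `instrument` and `run` are hypothetical, and the `record` instruction stands in for the helper call a real tool would insert.

```python
import operator
from typing import NamedTuple

class Insn(NamedTuple):
    dest: str
    op: str        # "const", "fadd", "fsub", "fmul", "fdiv" or "record"
    args: tuple

FP_OPS = {"fadd": operator.add, "fsub": operator.sub,
          "fmul": operator.mul, "fdiv": operator.truediv}

def instrument(block):
    """Insert a 'record' instruction after every FP operation; the site is its original index."""
    out = []
    for site, insn in enumerate(block):
        out.append(insn)
        if insn.op in FP_OPS:
            out.append(Insn("", "record", (insn.dest, site)))
    return out

def run(block, log):
    """Interpret the toy IR; each 'record' appends (site, value) to log."""
    env = {}
    for insn in block:
        if insn.op == "const":
            env[insn.dest] = insn.args[0]
        elif insn.op == "record":
            name, site = insn.args
            log.append((site, env[name]))
        else:
            a, b = (env[arg] for arg in insn.args)
            env[insn.dest] = FP_OPS[insn.op](a, b)
    return env

# Example block: t3 = t0 * t1 + t0
example = [Insn("t0", "const", (2.0,)), Insn("t1", "const", (3.0,)),
           Insn("t2", "fmul", ("t0", "t1")), Insn("t3", "fadd", ("t2", "t0"))]
```

Keeping the original instruction index as the site identifier is what later lets results be mapped back to source lines.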

Slide 15: Report
• Dynamic HTML interface
  – Copy the HTML file from the computing cluster to a desktop; no installation required
• Select/deselect source lines, SSA “instructions”
  – Dynamic in-page graph
  – Table for exporting to gnuplot, Excel etc.
• View value ranges at instruction, source line, function, file and application levels
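Viewing ranges at several levels amounts to merging per-instruction histograms by a prefix of their location key. A minimal sketch, assuming a hypothetical profile keyed by `(file, function, line, insn_id)`, with a CSV export of the kind a spreadsheet or plotting tool could read:

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical raw profile: (file, function, line, insn_id) -> Counter{exponent: count}
LEVEL_DEPTH = {"application": 0, "file": 1, "function": 2, "line": 3, "instruction": 4}

def aggregate(profile, level):
    """Merge per-instruction exponent histograms up to the requested level."""
    depth = LEVEL_DEPTH[level]
    merged = defaultdict(Counter)
    for key, hist in profile.items():
        merged[key[:depth]].update(hist)   # Counter.update adds counts
    return merged

def to_csv(merged):
    """One row per (location, exponent) pair."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["location", "exponent", "count"])
    for key in sorted(merged):
        for exp in sorted(merged[key]):
            writer.writerow([":".join(map(str, key)) or "<all>", exp, merged[key][exp]])
    return buf.getvalue()
```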

Slide 16: Optimisation Opportunities
• Reduce the floating point unit
  – Reduced precision
  – Restricted normalisation
• Use an alternative representation
  – Non-standard floating point (e.g. 48-bit)
  – Fixed point
  – Dual fixed-point
• Minimisation of redundancy
  – Remove denormal handling unless required
  – Remove or predict zero-value calculations
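Deciding whether denormal handling or zero-value special cases are worth keeping needs only a classification of the profiled values. A sketch, assuming the profile gives access to the raw result values; `classify` and `redundancy_report` are hypothetical names, and `min_normal` switches the test between double (2^-1022) and single (2^-126) precision:

```python
import math
import sys
from collections import Counter

DOUBLE_MIN_NORMAL = sys.float_info.min   # 2**-1022, smallest positive normal double
SINGLE_MIN_NORMAL = 2.0 ** -126          # smallest positive normal single

def classify(x, min_normal=DOUBLE_MIN_NORMAL):
    if x == 0.0:
        return "zero"
    if math.isnan(x) or math.isinf(x):
        return "special"
    return "subnormal" if abs(x) < min_normal else "normal"

def redundancy_report(values, min_normal=DOUBLE_MIN_NORMAL):
    """Summarise whether denormal support is exercised and how often results are zero."""
    counts = Counter(classify(v, min_normal) for v in values)
    total = sum(counts.values())
    return {
        "needs_denormal_support": counts["subnormal"] > 0,
        "zero_fraction": counts["zero"] / total if total else 0.0,
        "counts": counts,
    }
```

A profile that never produces a subnormal is evidence, not proof, that the denormal path can go; the limitation in Slide 28 applies here too.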

Slide 17: Reduce Hardware
• Example using MORPHY
• F.P. values are interesting
  – Most are confined to a narrow range
  – Different data sets do not vary the range
• The full range of double precision floating point is not required
• Reduce the exponent
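Reducing the exponent means choosing a field just wide enough for the observed range. A sketch, assuming IEEE 754-style conventions (the all-zeros and all-ones codes kept free for zero/denormals and infinity/NaN); `exponent_bits` is a hypothetical helper taking the smallest and largest unbiased exponents seen in a profile:

```python
from typing import Tuple

def exponent_bits(emin: int, emax: int, reserve_special: bool = True) -> Tuple[int, int]:
    """Return (bits, bias) for an exponent field covering unbiased exponents emin..emax.

    With reserve_special, codes 0 and 2**bits - 1 stay free and emin maps to code 1.
    """
    if emin > emax:
        raise ValueError("emin must not exceed emax")
    codes = emax - emin + 1 + (2 if reserve_special else 0)
    bits = max(1, (codes - 1).bit_length())   # ceil(log2(codes))
    bias = (1 - emin) if reserve_special else -emin
    return bits, bias
```

With the IEEE double limits (-1022 to 1023) this reproduces the standard 11 bits and bias 1023; a profiled range of -10 to 5 would need only 5 bits.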

Slide 18: Reduce Hardware – Alignment/Normalisation
• Most expensive step: shifting for add/subtract
  – Operand alignment
  – Normalisation
• Set limits on alignment to reduce hardware size
  – Trap to software to perform other alignments
• Provisional results: only shift-by-4 required for some applications
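The alignment shift of an addition is the difference between the operands' exponents, so a profile of exponent differences shows how often a shift-limited adder would have to trap. A minimal sketch, assuming "shift-by-4" means an alignment of at most four bit positions; the function names are hypothetical:

```python
import math
from collections import Counter

def exponent(x: float) -> int:
    """Unbiased binary exponent of a normal, nonzero x."""
    return math.frexp(x)[1] - 1

def alignment_shift(a: float, b: float) -> int:
    """Bit positions the smaller operand's mantissa is shifted right before an add."""
    if a == 0.0 or b == 0.0:
        return 0
    return abs(exponent(a) - exponent(b))

def shift_histogram(operand_pairs):
    return Counter(alignment_shift(a, b) for a, b in operand_pairs)

def hardware_coverage(hist, max_shift):
    """Fraction of additions a unit limited to `max_shift` positions handles without trapping."""
    total = sum(hist.values())
    handled = sum(n for shift, n in hist.items() if shift <= max_shift)
    return handled / total if total else 1.0
```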

Slide 19: Alternative Representations #1: Custom Floating Point
• No need to use 64- or 32-bit
• Use a compromise instead: maybe 48-bit is enough?

  Format        Sign  Exponent  Mantissa  Total
  IEEE Double   1     11        52        64
  Custom        1     9         38        48
  IEEE Single   1     8         23        32

• Maybe we can drop the sign bit?
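A custom format can be prototyped in software by packing sign, biased exponent and mantissa into an integer. The sketch assumes the 48-bit layout in the table (1 + 9 + 38 bits), an IEEE-style bias of 2^(k-1) - 1 and round-to-nearest-even; it handles zero and normal numbers only (no denormals, infinities or NaNs) and uses the hypothetical names `encode`/`decode`:

```python
import math

def encode(x: float, exp_bits: int = 9, man_bits: int = 38) -> int:
    """Pack x as sign | biased exponent | mantissa (zero and normal numbers only)."""
    if not math.isfinite(x):
        raise ValueError("infinities and NaNs are not handled in this sketch")
    bias = (1 << (exp_bits - 1)) - 1
    sign = 1 if math.copysign(1.0, x) < 0 else 0
    if x == 0.0:
        return sign << (exp_bits + man_bits)
    m, e = math.frexp(abs(x))                  # |x| = m * 2**e, 0.5 <= m < 1
    frac = round(m * (1 << (man_bits + 1)))    # man_bits + 1 significant bits, nearest-even
    if frac == 1 << (man_bits + 1):            # rounding carried out: renormalise
        frac >>= 1
        e += 1
    biased = (e - 1) + bias
    if not 1 <= biased <= (1 << exp_bits) - 2:
        raise OverflowError("exponent outside the normal range of this format")
    mantissa = frac - (1 << man_bits)          # drop the hidden leading 1
    return (sign << (exp_bits + man_bits)) | (biased << man_bits) | mantissa

def decode(word: int, exp_bits: int = 9, man_bits: int = 38) -> float:
    """Inverse of encode for the cases encode produces."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = (word >> (exp_bits + man_bits)) & 1
    biased = (word >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = word & ((1 << man_bits) - 1)
    if biased == 0:
        value = 0.0                            # encode only emits this code for zero
    else:
        value = math.ldexp((1 << man_bits) | mantissa, biased - bias - man_bits)
    return -value if sign else value
```

As a consistency check, `encode(x, 11, 52)` yields the IEEE double bit pattern for normal `x`, since the rounding step is then exact.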

Slide 20: Alternative Representations #2: Fixed Point
• For very narrow ranges, fixed point may be an option
  – Must be treated with extreme care
• Dual fixed-point format provides another possibility
  – Two different formats: different fixed point positions
  – 1 bit reserved to switch between formats
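A dual fixed-point word can be sketched as a selector bit plus a two's-complement value whose binary point depends on the selector. The split used here (16-bit words, 12 or 4 fraction bits, trying the more precise format first) is an assumption for illustration, as are the names `dfx_encode`/`dfx_decode`:

```python
def dfx_encode(x: float, n: int = 16, p0: int = 12, p1: int = 4) -> int:
    """Encode x as a selector bit (MSB) plus an (n-1)-bit two's-complement fixed-point value."""
    width = n - 1
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    for sel, frac_bits in ((0, p0), (1, p1)):      # try the more precise format first
        q = round(x * (1 << frac_bits))
        if lo <= q <= hi:
            return (sel << width) | (q & ((1 << width) - 1))
    raise OverflowError("value outside both fixed-point ranges")

def dfx_decode(word: int, n: int = 16, p0: int = 12, p1: int = 4) -> float:
    width = n - 1
    sel = word >> width
    q = word & ((1 << width) - 1)
    if q >= 1 << (width - 1):                      # sign-extend the two's-complement field
        q -= 1 << width
    return q / (1 << (p1 if sel else p0))
```

Small values keep 12 fraction bits; larger ones trade precision for range, which is the point of reserving the switch bit.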

Slide 21: “Pipeline Prediction”
• Similar concept to branch prediction
• Build a selection of pipelines with different performance characteristics
  – Slow but generic version
  – Fast version with limited range, reduced operand alignment
  – Compromise in between
• Predict which version is best to use (how?)
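The slide leaves the prediction mechanism open. One possible scheme, borrowed directly from branch prediction, is a 2-bit saturating counter per instruction address that chooses between a hypothetical fast pipeline (alignment of at most four positions) and the generic one, replaying on the slow path when a fast prediction turns out wrong. Everything here, including the fast-path constraint, is an assumption:

```python
import math
from collections import defaultdict

def fits_fast(a: float, b: float, max_shift: int = 4) -> bool:
    """Hypothetical fast-pipeline constraint: small alignment shift only."""
    if a == 0.0 or b == 0.0:
        return True
    return abs(math.frexp(a)[1] - math.frexp(b)[1]) <= max_shift

class PipelinePredictor:
    """2-bit saturating counter per instruction address; >= 2 means 'use the fast pipeline'."""

    def __init__(self):
        self.counters = defaultdict(lambda: 2)    # start weakly predicting 'fast'

    def predict_fast(self, pc):
        return self.counters[pc] >= 2

    def update(self, pc, fast_was_ok):
        c = self.counters[pc]
        self.counters[pc] = min(c + 1, 3) if fast_was_ok else max(c - 1, 0)

def simulate(trace, predictor):
    """trace: iterable of (pc, a, b). Counts fast hits, slow issues and fast-path replays."""
    stats = {"fast": 0, "slow": 0, "replay": 0}
    for pc, a, b in trace:
        ok = fits_fast(a, b)
        if predictor.predict_fast(pc):
            stats["fast" if ok else "replay"] += 1
        else:
            stats["slow"] += 1
        predictor.update(pc, ok)
    return stats
```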

Slide 22: True Reconfiguration – Temporal Profiling
• Value ranges can vary for different application phases
• Potential to reconfigure hardware as phases change
• Test applications have not shown this behaviour so far
  – Small kernels only
  – Full applications would be expected to show this behaviour
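Temporal profiling can be approximated by splitting a value trace into fixed-size windows and tracking each window's exponent range. A sketch, assuming a simple threshold rule (the hypothetical `tolerance`, in exponent steps) for flagging a phase change:

```python
import math

def windowed_ranges(values, window):
    """(min_exp, max_exp) per fixed-size window of a value trace; zeros are ignored."""
    ranges = []
    for start in range(0, len(values), window):
        exps = [math.frexp(v)[1] - 1 for v in values[start:start + window] if v != 0.0]
        ranges.append((min(exps), max(exps)) if exps else None)
    return ranges

def phase_changes(ranges, tolerance=2):
    """Window indices where either end of the exponent range moves by more than `tolerance`."""
    changes, prev = [], None
    for i, r in enumerate(ranges):
        if r is None:
            continue
        if prev is not None and (abs(r[0] - prev[0]) > tolerance or abs(r[1] - prev[1]) > tolerance):
            changes.append(i)
        prev = r
    return changes
```

Each flagged window is a candidate point at which a reconfigured datapath might pay off; choosing the window size and threshold is left open here.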

Slide 23: Profiling Results – SPECFP95 ‘mgrid’
• Operations producing zero
• Two ranges: similar shapes

Slide 24: Range Close-up

Slide 25: Profiling Results – SPECFP95 ‘swim’
• Sawtooth caused by multiplication

Slide 26: ‘swim’ Close-up

Slide 27: Profiling Results – MMVB
• As with MORPHY, ranges are similar between datasets

Slide 28: Problems with this approach
• No guarantee that values do not occur outside the identified ranges
• Not all applications will demonstrate behaviour similar to MORPHY
  – Value ranges could vary wildly with different datasets
• Valgrind is slow

Slide 29: Future Work
• State-based profiling:
  – Profile functions based on the call stack
  – Allows context-dependent configurations
• Active simulation
  – Test new representations to check for rounding errors
• Use results in practice
  – FPGA implementations for real applications
  – Modelling of large-scale deployments
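State-based profiling can be sketched by keying the value histogram on the current call stack instead of the instruction alone. Here the stack is a hypothetical shadow stack maintained by explicit `frame` context managers, whereas a binary-level tool would have to recover it from the running program; the same kernel called from two places then gets two separate ranges:

```python
import math
from collections import Counter, defaultdict
from contextlib import contextmanager

call_stack = []                   # hypothetical shadow stack maintained by the profiler
profile = defaultdict(Counter)    # call-stack tuple -> Counter{exponent: count}

@contextmanager
def frame(name):
    call_stack.append(name)
    try:
        yield
    finally:
        call_stack.pop()

def record(value):
    if value != 0.0 and math.isfinite(value):
        profile[tuple(call_stack)][math.frexp(value)[1] - 1] += 1
    return value

def kernel(x):
    with frame("kernel"):
        return record(x * x)

def solver_a():
    with frame("solver_a"):
        return kernel(2.0)

def solver_b():
    with frame("solver_b"):
        return kernel(1000.0)

solver_a()
solver_b()
```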

Any Questions?
(Photo: Jezebel, 1916 Dennis ‘N’ Type Fire Engine, Royal College of Science Motor Club, Imperial College Union, SW7)