tomverbeure/math

Introduction

This project contains a library of math-related hardware units.

Right now, it contains only "Fpxx" units: floating point with a user-programmable number of exponent and mantissa bits.

Getting Started

This library is written in SpinalHDL.

If you want to run some of the code here, you first need to install that.

Installation instructions can be found here.

Once that's done, run ./run.sh to generate whichever unit you want to test. Edit this file if you want to run a different test. (All of this could be streamlined with a better Makefile...)

Then run make sim to run a test.

Fpxx

The Fpxx library supports floating point operations for which the sizes of the exponent and mantissa can be specified at compile time.

The primary use of this library is for FPGA projects that need floating point, but don't necessarily need all the features and precision of 32-bit standard floating point operations. By reducing the size of the mantissa and exponent, the hardware of some floating point operations can be made to map directly onto the hardware multipliers of the DSPs that are often present in today's FPGAs, and the maximum clock speed can be increased significantly.

For example, many FPGAs support 18x18 bit multiplications. By restricting the size of the mantissa, a single hardware multiplier may be sufficient to implement the core operation of a floating point multiplier.
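To make the 18x18 argument concrete, here is a minimal C++ sketch of the core of a floating point multiply: the product of the two significands. It is not taken from the library. The names `SigProduct` and `mul_significands` are hypothetical, and the sketch assumes an IEEE-style hidden leading 1 and an unsigned 18x18 multiplier, so at most 17 mantissa bits are stored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the Fpxx library.
struct SigProduct {
    uint64_t mantissa;    // mant_bits-wide result mantissa, truncated (no rounding)
    int      exp_adjust;  // 1 if the product was in [2, 4) and was shifted down
};

// mant_bits is the number of stored mantissa bits (hidden 1 not included).
inline SigProduct mul_significands(uint32_t mant_a, uint32_t mant_b, int mant_bits)
{
    // With 17 stored bits, each significand is 18 bits wide once the hidden 1
    // is added, so the multiply below is a single 18x18 -> 36 bit operation.
    assert(mant_bits >= 1 && mant_bits <= 17);
    assert(mant_a < (1u << mant_bits) && mant_b < (1u << mant_bits));

    const uint64_t sig_a = (1ull << mant_bits) | mant_a;  // 1.mant_a
    const uint64_t sig_b = (1ull << mant_bits) | mant_b;  // 1.mant_b
    const uint64_t prod  = sig_a * sig_b;                 // value in [1, 4)

    // The product has 2*mant_bits fraction bits. If the bit above the
    // integer 1 is set, the value is >= 2 and needs one extra right shift.
    const int overflow = static_cast<int>((prod >> (2 * mant_bits + 1)) & 1u);

    const uint64_t shifted = prod >> (mant_bits + overflow);
    const uint64_t mask    = (1ull << mant_bits) - 1;
    return { shifted & mask, overflow };
}
```

For instance, with 17 mantissa bits, multiplying 1.5 by 1.5 (both mantissas equal to `1u << 16`) gives an exponent adjustment of 1 and a mantissa of `1u << 14`, which is 1.125 * 2 = 2.25. A wider mantissa would make the significand product exceed 18x18 bits and need more than one multiplier.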

Goals:

  • SpinalHDL

    The code is written in SpinalHDL instead of Verilog or VHDL. This makes it much easier to write generic code with programmable widths and pipeline stages. It also cuts back on boilerplate code.

    That said, it's almost trivial to generate the Verilog or VHDL for use in your own project. And if that's too much effort, a number of configurations are pre-generated and stored as Verilog and VHDL in the repository, so they can be copied straight into your own project.

  • Floating point support for all basic operations

    At the minimum, add, multiply and divide should work with acceptable accuracy, whatever that means.

    For additional operations (e.g. sqrt and 1/sqrt), accuracy may very well be completely unacceptable: depending on my use cases, a small lookup table could be sufficient and the library won't have a better solution.

  • User-programmable mantissa and exponent size

    There are some limitations. For example, FpxxDiv currently requires an odd numbered mantissa.

  • User-programmable size of various lookup tables or internal results

    The user may want to specify a particular mantissa, but still restrict the precision for select operations when it's clear that the full precision won't be needed.

    For example, one may want to use a 20-bit size mantissa in general, but restrict multiplications to 17 or 18 bits to map to a single FPGA DSP multiplier.

    Similarly, the divide operation uses a lookup table. For certain input ranges, the size and precision of this lookup table need not be as large as recommended for maximum precision.

    Where possible, the library provides knobs to play with this.

  • Support for NaN, Infinity, and sign checking

    It's important that NaN and Infinity values get propagated through the pipeline, to avoid cases where these kinds of values alias into a real value. A NaN should be generated for operations such as asking for the square root of a negative number. Overflows or division by 0 will result in Infinity.

  • One result per cycle

    The library is initially designed for use cases where one result is needed per clock cycle.

  • User-programmable pipeline depth

    For each instance, the user can control the number of intermediate pipeline stages. This makes it possible to trade off clock speed against pipeline latency.

  • C++ model

    There is a C++ template class with an implementation of the Fpxx modules.

    This can be very useful to first create a C++ proof of concept of your design before implementing it in hardware.

    The goal is for the C++ model and the hardware model to be bit exact (though this might not always be the case.)

  • Testbench

    A testbench with directed and random vectors is provided to verify the results between a model that has a 23-bit mantissa and 8-bit exponent and the standard IEEE fp32 operations of your PC.

    The testbench ignores differences that are due to the limitations of the library (e.g. denormals, rounding differences etc.)
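As an illustration of the NaN and Infinity goals above, the following C++ sketch shows how a template parameterized on exponent and mantissa size might classify special values and short-circuit a divide or square root. This is not the C++ model from the repository: `MiniFp`, `div_special`, `sqrt_special` and the example alias `Fp6_13` are hypothetical names, and the encodings (all-ones exponent for NaN and Infinity, 0/0 and Infinity/Infinity giving NaN) are assumed to follow the IEEE convention.

```cpp
#include <cstdint>

// Hypothetical illustration, NOT the class from this repository.
template <int ExpBits, int MantBits>
struct MiniFp {
    static_assert(ExpBits >= 2 && MantBits >= 1 && ExpBits + MantBits < 32,
                  "format must fit in 32 bits");

    bool     sign;
    uint32_t exp;   // biased exponent, ExpBits wide
    uint32_t mant;  // stored mantissa, MantBits wide, hidden 1 not included

    static constexpr uint32_t exp_max = (1u << ExpBits) - 1u;

    // Assumed IEEE-style encoding: all-ones exponent is Infinity or NaN.
    bool is_nan()  const { return exp == exp_max && mant != 0; }
    bool is_inf()  const { return exp == exp_max && mant == 0; }
    // No denormal support: any value with a zero exponent counts as zero.
    bool is_zero() const { return exp == 0; }

    static MiniFp nan()        { return { false, exp_max, 1u << (MantBits - 1) }; }
    static MiniFp inf(bool s)  { return { s, exp_max, 0u }; }
    static MiniFp zero(bool s) { return { s, 0u, 0u }; }
};

// Example format: 6-bit exponent, 13-bit mantissa (arbitrary choice).
using Fp6_13 = MiniFp<6, 13>;

// Returns true and fills `out` when a / b is decided by special values alone.
template <int E, int M>
bool div_special(const MiniFp<E, M>& a, const MiniFp<E, M>& b, MiniFp<E, M>& out)
{
    using Fp = MiniFp<E, M>;
    const bool sign = a.sign != b.sign;

    if (a.is_nan() || b.is_nan())   { out = Fp::nan();      return true; }
    // 0/0 and Inf/Inf: NaN by IEEE convention (assumption for this sketch).
    if ((a.is_zero() && b.is_zero()) || (a.is_inf() && b.is_inf()))
                                    { out = Fp::nan();      return true; }
    if (a.is_inf() || b.is_zero())  { out = Fp::inf(sign);  return true; }
    if (a.is_zero() || b.is_inf())  { out = Fp::zero(sign); return true; }
    return false;  // regular operands: the normal datapath computes the result
}

// Returns true and fills `out` when sqrt(a) is decided by special values alone.
template <int E, int M>
bool sqrt_special(const MiniFp<E, M>& a, MiniFp<E, M>& out)
{
    using Fp = MiniFp<E, M>;
    if (a.is_nan())  { out = Fp::nan();        return true; }
    if (a.is_zero()) { out = Fp::zero(a.sign); return true; }
    if (a.sign)      { out = Fp::nan();        return true; }  // negative input
    if (a.is_inf())  { out = Fp::inf(false);   return true; }
    return false;
}
```

With `Fp6_13`, the value 1.0 is `{false, 31, 0}` (bias 31). Dividing it by `Fp6_13::zero(false)` yields Infinity, and taking the square root of `{true, 31, 0}` yields NaN. Checking the special cases first and carrying the result alongside the regular datapath is one way to keep these values from aliasing into a real number.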

Non-goals:

  • Support for denormals

    Denormals require quite a bit of additional logic for often little benefit. Support for them may be added later, but it's not there at this time.

    When a denormal is encountered on an input, it is immediately clamped to zero. Denormal results are replaced by a zero as well.

  • (Correct) rounding

    Rounding is a surprisingly expensive operation and hard to get really right. At this moment, it is not supported at all. This definitely has an impact on precision.

  • Correct handling of negative and positive zeros

    For some operations, negative and positive zeros are dealt with correctly, but not all of them.
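The effect of the first two non-goals can be emulated on a PC with a few lines of C++. The sketch below is a hypothetical helper (`reduce_precision` is not part of the library): it takes a standard 32-bit float, clamps denormals to zero, and truncates the mantissa to `mant_bits` bits without rounding. Keeping the sign of a clamped zero is an assumption of this sketch, not a statement about what the hardware does.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: mimics, on IEEE fp32 values, a format with a shorter
// mantissa, no rounding and no denormal support.
inline float reduce_precision(float value, int mant_bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit IEEE float");
    assert(mant_bits >= 1 && mant_bits <= 23);

    uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined bit copy

    const uint32_t sign = bits & 0x80000000u;
    const uint32_t exp  = (bits >> 23) & 0xFFu;

    if (exp == 0) {
        // Zero or denormal: clamp to zero (sign kept, by assumption).
        bits = sign;
    } else if (exp != 0xFFu) {
        // Normal number: drop the low mantissa bits, i.e. truncate toward zero.
        const uint32_t drop = 23u - static_cast<uint32_t>(mant_bits);
        bits &= ~((1u << drop) - 1u);
    }
    // Infinity and NaN (exp == 0xFF) are passed through untouched, so that
    // truncation can never turn a NaN into an Infinity.

    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, `reduce_precision(-1.75f, 1)` returns -1.5f, where round-to-nearest would have produced -2.0f, and a denormal input such as `1e-40f` comes back as zero. Running reference data through a helper like this gives a rough idea of how much precision is lost by skipping rounding.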

FpxxAdd

FpxxMul

FpxxDiv

FpxxSqrt

FpxxRSqrt

Math Related Literature

Reduced Precision Floating Point

Articles on two's complement floating point

Division

Square Root and Reciprocal Square Root

Leading Zero Counter (LZC) and Leading Zero Anticipator (LZA)

Sin/Cos Calculation