Monday, November 28, 2022

APPROXIMATION USING CHEBYSHEV POLYNOMIALS

 https://eclipse.umbc.edu/robucci/cmpeRSD/Lectures/Lecture19__Implementation_of_Elementary_Functions/




Lecture 19 – Implementations of Elementary Functions

Ryan Robucci

References

  • ^\dagger Koren, Israel. Computer Arithmetic Algorithms, 2nd ed. A K Peters/CRC Press, 2001.

Implementation of Functions

  • Computation of elementary functions admits several approaches, including lookup tables (LUTs), approximations, iterative methods, and combinations thereof.

LUT

  • LUT/ROM: store results for all possible inputs; unfortunately, exhaustive storage can be too large
    • 32-bit input implies 4 billion entries!!

Interpolation (+LUT)

  • Interpolation may be linear, cubic spline, etc.
  • finite set of points in a table, e.g. f(x_3),f(x_4)
  • Estimate f(x) in the interval x_i \le x \le x_{i+1} with x_i and x_{i+1} stored in a LUT
  • Linear Interpolation:
    • f(x)\approx f(x_i) + m \cdot (x-x_i);

      m=\frac{f(x_{i+1})-f(x_i)}{x_{i+1}-x_{i}}
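A minimal sketch of LUT-backed linear interpolation (the target function, table size, and all names are illustrative choices, not from the lecture):

```python
import math

# Sketch: LUT + linear interpolation for sin(x) on [0, pi/2]
N = 16
DELTA = (math.pi / 2) / N                      # uniform interval size
SIN_LUT = [math.sin(i * DELTA) for i in range(N + 1)]

def sin_interp(x):
    i = min(int(x / DELTA), N - 1)             # index of x_i (clamped to last interval)
    m = (SIN_LUT[i + 1] - SIN_LUT[i]) / DELTA  # slope over [x_i, x_{i+1}]
    return SIN_LUT[i] + m * (x - i * DELTA)
```

For linear interpolation the worst-case error is bounded by (\Delta^2/8)\max|f''|, which for this 16-interval sine table is about 1.2\times10^{-3}.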

MAPPING AND INTERPOLATION

  • need to design a mapping (scale and offset) for the encoding:
    digital encoding = (x-\rm offset)\cdot \frac{1}{\rm scale} \cdot 2^m
  • Choice of LUT Indexing for Linear Interpolation
    • May use uniformly spaced points to create equal-sized intervals, or non-uniform spacing
    • Irregular spacing requires a search for the bracketing entries x_i \le x \le x_{i+1}, while uniform intervals (\Delta) allow the index to be computed directly from x/\Delta
    • Not all non-uniform spacing is irregular, for instance indexes of logarithmically spaced samples (10 per decade) can also be directly computed
    • A useful approach for uniform interval indexes is to use an interval size such that the first few bits of x are used directly as the index into the LUT
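For example, assuming x is stored as a 16-bit unsigned fixed-point fraction and the LUT has 16 entries (bit widths here are illustrative), the index and the intra-interval offset fall out of a shift and a mask:

```python
# Sketch: top 4 bits of a 16-bit fixed-point fraction directly index a
# 16-entry LUT; the remaining 12 bits give the position within the interval.
FRAC_BITS = 16
INDEX_BITS = 4
OFFSET_BITS = FRAC_BITS - INDEX_BITS

def lut_index_and_offset(x_fixed):
    index = x_fixed >> OFFSET_BITS               # first few bits of x -> LUT index
    offset = x_fixed & ((1 << OFFSET_BITS) - 1)  # remaining bits -> offset in interval
    return index, offset
```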

Polynomial Approximation

  • Polynomials
  • Ratio of Polynomials
  • Piecewise Approximation

TAYLOR SERIES EXPANSION

  • Polynomial approximation in the vicinity of a point
  • Ex: e^x = \displaystyle \sum_{i=0}^\infty\frac{x^i}{i!}
    • requires a large number of terms for accuracy when x is close to 1, though few terms are needed when x is close to 0
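A quick sketch of the partial sums (names are mine) makes the point concrete: near x = 0 three terms already give roughly 10^{-7} error, while at x = 1 the same three terms are off by more than 0.2:

```python
import math

def exp_taylor(x, n_terms):
    """Partial sum of the Taylor series e^x = sum_{i} x^i / i!."""
    total, term = 0.0, 1.0
    for i in range(n_terms):
        total += term
        term *= x / (i + 1)   # next term: x^(i+1) / (i+1)!
    return total
```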

APPROXIMATION USING CHEBYSHEV POLYNOMIALS

  • Use polynomial basis functions to approximate a function over an interval

  • Better than approximating a high-order polynomial by simply omitting its higher-order terms

    • e.g. better than approximating
      k_8 x^8 + k_7 x^7 + k_5x^5 - k_3x^3 +k_1x by
      \cancel{k_8 x^8} + \cancel{k_7 x^7} + k_5x^5 - k_3x^3 +k_1x
  • Example 5^{th}\text{-order} approx. using Chebyshev polynomials of the first kind:

    Available Basis:

    \begin{array}{rl} T_0(x) &= 1 \\ T_1(x) &= x \\ T_2(x) &= 2x^2 - 1 \\ T_3(x) &= 4x^3 - 3x \\ T_4(x) &= 8x^4 - 8x^2 + 1 \\ T_5(x) &= 16x^5 - 20x^3 + 5x \\ \end{array}

    https://en.wikipedia.org/wiki/Chebyshev_polynomials
  • Compute weights according to w_i = \frac{k}{\pi} \int_{-1}^1 \frac{f(x) T_i(x)}{\sqrt{1-x^2}} dx where f(x) is the polynomial or function to approximate, and k is 1 for i=0 and 2 otherwise

  • Create polynomial approximation with computed weights
    f(x)\approx w_0 T_0(x) + w_1 T_1(x) + ... + w_5 T_5(x)
    \begin{aligned} f(x)\approx & w_0 \cdot 1 + \\ & w_1 \cdot (x) +\\ & w_2 \cdot (2x^2 - 1) +\\ & w_3 \cdot (4x^3 - 3x) +\\ & w_4 \cdot (8x^4 - 8x^2 + 1) +\\ & w_5 \cdot (16x^5 - 20x^3 + 5x) \\ \end{aligned}

    \begin{aligned} f(x)\approx & (w_0-w_2+w_4) \cdot (1) + \\ & (w_1-3w_3+5w_5) \cdot (x) +...+ (16w_5) \cdot (x^5) \end{aligned}

Example: \frac{1}{1+e^{-5x}}

  • Taylor Series: \frac{1}{2}+\frac{5}{4} x-\frac{125}{48} x^3+\frac{625}{96} x^5

  • Chebyshev Polynomial Weights: w_0=0.5,w_1=0.587681,w_2=0,w_3=-0.121496,w_4=0,w_5=0.035318

  • additional note for future reference: if evaluating over a range other than -1 to 1, then use an input mapping function to shift the center of the range to 0 and scale the range

    • Ex: to evaluate f_1(x) with x in the range 2 to 5, instead approximate f_2(x)=f_1(1.5x+3.5) over x\in[-1,1]
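The weight integral above can be evaluated numerically with Gauss-Chebyshev quadrature, whose nodes x_j = \cos(\pi(j+1/2)/N) absorb the 1/\sqrt{1-x^2} factor, so T_i(x_j) = \cos(i\theta_j). A sketch (function and variable names are mine) that reproduces the weights quoted above for f(x)=\frac{1}{1+e^{-5x}}:

```python
import math

def chebyshev_weights(f, order, n_nodes=200):
    """w_i = (k/pi) * integral of f(x) T_i(x) / sqrt(1-x^2) over [-1, 1],
    via Gauss-Chebyshev quadrature: integral ~ (pi/N) * sum f(cos t_j) cos(i t_j)."""
    thetas = [math.pi * (j + 0.5) / n_nodes for j in range(n_nodes)]
    w = []
    for i in range(order + 1):
        s = sum(f(math.cos(t)) * math.cos(i * t) for t in thetas)
        k = 1.0 if i == 0 else 2.0
        w.append(k * s / n_nodes)
    return w

def sigmoid5(x):
    return 1.0 / (1.0 + math.exp(-5.0 * x))
```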

RATIONAL POLYNOMIAL APPROX. ^\dagger

  • Approx. using Ratio of Polynomials:
    • Ex: e^x=2^{x \log_2 e}
      • let x \log_2 e =I+f, where I is an integer and f is a fractional value \implies e^x = 2^I \cdot 2^f
      • computing 2^I is trivial
      • 2^f=\frac{(((((a_5f+a_4)f+a_3)f+a_2)f+a_1)f+a_0)}{(((((b_5f+b_4)f+b_3)f+b_2)f+b_1)f+1)} where a_i and b_i are known constants \ddagger
        10 multiplications, 10 additions, 1 division
        \dagger \tiny \text{Israel Koren, Computer Arithmetic Algorithms, Ch. 9}
        \ddagger \tiny \text{J.F. Hart et al., Computer Approximations, Wiley, New York, 1968}
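Hart's tabulated a_i, b_i coefficients are not reproduced here; as a sketch of the same structure (the I+f split, a Horner-form ratio of polynomials, and a final shift), the well-known [2/2] Padé approximant of e^x stands in for the degree-5 ratio, at lower accuracy:

```python
import math

def exp_rational(X):
    """Sketch of e^X = 2^I * 2^f. The [2/2] Pade approximant of e^x,
    e^x ~ (x^2 + 6x + 12) / (x^2 - 6x + 12), replaces Hart's degree-5
    coefficients (same ratio-of-polynomials structure, lower accuracy)."""
    t = X * math.log2(math.e)           # X * log2(e) = I + f
    I = math.floor(t)
    f = t - I                           # fractional part, 0 <= f < 1
    x = f * math.log(2.0)               # 2^f = e^(f ln 2)
    num = (x + 6.0) * x + 12.0          # Horner form of x^2 + 6x + 12
    den = (x - 6.0) * x + 12.0          # Horner form of x^2 - 6x + 12
    return math.ldexp(num / den, int(I))  # multiply by 2^I: just a shift
```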

PIECEWISE APPROXIMATION

  • Piecewise Polynomial Approximation
    • Break function into multiple intervals
    • Create Polynomial Approximation for each interval (e.g. Taylor series expansion around middle of interval)
    • Store polynomial approximation for each interval in a lookup table (LUT)
    • During run:
      • decide interval to use,
      • use LUT to recall parameters for that interval
      • compute approximation
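The steps above can be sketched as follows, using a 2nd-order Taylor expansion of e^x around each interval midpoint on [0, 1) (interval count, order, and names are illustrative choices):

```python
import math

# Precompute: 8 uniform intervals; for each, store the expansion point
# and the Taylor coefficients of e^x about it (all derivatives are e^c).
N_INTERVALS = 8
WIDTH = 1.0 / N_INTERVALS
COEF_LUT = []
for k in range(N_INTERVALS):
    c = (k + 0.5) * WIDTH               # expand around interval midpoint
    e = math.exp(c)
    COEF_LUT.append((c, e, e, e / 2.0))

def exp_piecewise(x):
    k = min(int(x / WIDTH), N_INTERVALS - 1)   # decide which interval to use
    c, a0, a1, a2 = COEF_LUT[k]                # recall that interval's parameters
    d = x - c
    return a0 + a1 * d + a2 * d * d            # compute the approximation
```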

Partial Results LUT

  • Consider Y=e^{X}, where X is a 32-bit UNSIGNED fractional argument, X = \displaystyle\sum_{i=1}^{32} x_{-i} 2^{-i}

  • Y=e^{\left(x_{-1} \cdot 2^{-1}+x_{-2} \cdot 2^{-2}+...+x_{-31} \cdot 2^{-31} + x_{-32} \cdot 2^{-32}\right)}
    = e^{x_{-1} \cdot 2^{-1}} \times e^{x_{-2} \cdot 2^{-2}} \times ... \times e^{x_{-31} \cdot 2^{-31}} \times e^{x_{-32} \cdot 2^{-32}}

  • Y can be iteratively computed as
    Y = Y \times e^{x_{-i} \cdot 2^{-i}} with Y initialized to 1
    One choice for implementing the iteration is to use a lookup table storing 32 precomputed entries T_i=e^{2^{-i}}
    Y = \begin{cases}Y \times T_i & \text{if } x_{-i}==1 \\ Y & \text{otherwise} \end{cases}

  • This requires a lookup and a multiply per iteration

  • But there is a way to construct the iterations and LUT so that the multiplication becomes very cheap (a shift and an add)...

  • Interestingly, it involves a lookup table for \ln() instead of \exp() with a pair of coupled iteration equations
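The straightforward lookup-and-multiply iteration above can be sketched as (names are mine):

```python
import math

N_BITS = 32
# Precomputed LUT: T_EXP[i-1] = e^(2^-i) for i = 1..32
T_EXP = [math.exp(2.0 ** -i) for i in range(1, N_BITS + 1)]

def exp_partial_lut(x_bits):
    """Y = e^X for X a 32-bit unsigned fraction (MSB is bit x_{-1}):
    one table lookup and one multiply per set bit."""
    Y = 1.0
    for i in range(1, N_BITS + 1):
        if (x_bits >> (N_BITS - i)) & 1:   # test bit x_{-i}
            Y *= T_EXP[i - 1]
    return Y
```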

Additive and Multiplicative Normalization

  • Additive and Multiplicative Normalization are two approaches to change a variable towards a target value
  • These algorithms involve recording the decisions/values chosen along the way, which are then used to construct the answer
  • Earlier we saw Multiplicative Normalization in the division through multiplication to move the denominator towards a target value, 1
    • Q=\frac{N}{D}=\frac{N\times m_1 \cdot m_2 \cdot m_3 \cdot m_4 \cdot m_5}{D\times m_1 \cdot m_2 \cdot m_3 \cdot m_4 \cdot m_5}=\frac{Q}{1}
  • Additive Normalization, used in the next approach, changes variable to a target value using add/sub operations.

Exponentiation via Additive Normalization

  • Iterative approximation for Y=e^{X}:
    • Y_{i+1} = Y_i \cdot B(i,s_i)
    • X_{i+1} = X_i - \ln B(i,s_i) = X_i - T(i,s_i)
  • B(i,s_i) in each iteration will be restricted to a set that makes the update Y_{i+1} inexpensive
  • i will be a parameter of the restriction, which makes the possible manipulation of Y unique to each iteration
  • s_i is a selector allowing choice within the allowable set in each iteration, but the decision rule is also based on the X iteration
  • the selector s_i in each iteration is chosen to make the successive approximation update X_{i+1} move towards the desired X
  • avoid the expense of computing \ln B at run time by precomputing all possible values of \ln B that might be used, and storing them in a LUT for use during the run: T(i,s_i)

Restriction on B to simplify the multiplication:

  • B(i,s_i)={1+s_i\cdot2^{-i}};s_i\in\{-1,0,1\}

  • Therefore the Y_{i+1} update requires only shift and add/sub operations
    y_{i+1} = \begin{cases} y_i \cdot \left(1+2^{-i}\right)&\text{if } s_i==1\\ y_i \cdot \left(1+0\right)&\text{if } s_i==0\\ y_i \cdot \left(1-2^{-i}\right)&\text{if } s_i==-1 \end{cases}

  • Remember, for a given input X, the particular B(i,s_i) in a given iteration is selected by s_i, chosen to make the residual X_{i+1} converge toward zero, so that
    X = \displaystyle\sum_{i=0}^{m-1} \ln(1+s_i2^{-i})

RANGE OF REPRESENTABLE VALUES

  • The range of representable values can be found by assuming the selection argument is the same in every iteration
    \displaystyle\sum_{i=1}^{m-1} \ln(1-2^{-i}) \le X \le \displaystyle\sum_{i=0}^{m-1} \ln(1+2^{-i})
    It also depends on the number of bits or iterations. For just 10 iterations we can say X\in [-1.24,1.56], though typically even more bits in the argument are desired
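These bounds are easy to confirm numerically for m = 10 iterations (the figures quoted above):

```python
import math

M_ITERS = 10
# All s_i = +1 gives the upper bound; all s_i = -1 (starting at i=1) the lower.
upper_bound = sum(math.log(1 + 2.0 ** -i) for i in range(M_ITERS))
lower_bound = sum(math.log(1 - 2.0 ** -i) for i in range(1, M_ITERS))
```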

ITERATIVE CONVERGENCE TO X

  • If we restrict X to be POSITIVE fractional values, we can use a binary selection rule to subtract or keep (instead of add,sub,keep)
    • My preferred bookkeeping for additive normalization is to iteratively move a value towards 0, R\to0
    • Initialize R=X
    • Step:
      • if R-\ln(1+2^{-i})>0
        • s_i = 1 , R=R-\ln(1+2^{-i})
      • else
        • s_i=0
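Putting the pieces together, a sketch of the additive-normalization exponential with the binary selection rule above (restricted to s_i \in \{0,1\}, i.e. non-negative X; names are mine):

```python
import math

M = 32
# Precomputed LUT: T_LN[i] = ln(1 + 2^-i), i = 0..31
T_LN = [math.log(1.0 + 2.0 ** -i) for i in range(M)]

def exp_additive(X):
    """e^X for X in [0, ~1.56): drive the residual R = X toward 0 by
    subtracting LUT entries; mirror each subtraction with a multiplicative
    update of Y that is only a shift-and-add in hardware."""
    R = X
    Y = 1.0
    for i in range(M):
        if R - T_LN[i] >= 0.0:        # selection rule: s_i = 1
            R -= T_LN[i]
            Y *= 1.0 + 2.0 ** -i      # Y * (1 + 2^-i): shift and add
    return Y
```

At the end, Y = e^{X - R} with the leftover residual R below roughly 2^{-31}, so the relative error is on that order.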

SIMPLIFYING THE SELECTION RULE

  • Sometimes the iterative steps can be simplified by analyzing the numerical possibilities in the "binary representation" of the hardware domain.
  • Examine the subtractive term:
    • \ln(1+s_i 2^{-i}) = s_i 2^{-i} - s_i^2 \frac{2^{-2i}}{2} + s_i^3 \frac{2^{-3i}}{3} -...
    • Only the first term affects bit -i
    • Therefore, in each iteration we can choose s_i to cancel the -i bit in the remainder R, using only selection values 0,1
    • This simplifies the selection rule to just checking the bit at -i and requires one iteration per bit of the input argument (R=X).

ALLOWING LARGER THAN FRACTIONAL ARGUMENTS

  • Let X be represented by (I+f)\cdot S where

    • I is an integer
    • f a fractional value with range 0\le f \lt 1 and
    • S is a scale
  • X : X=(I+f)\ln 2.

    • Find I and f by scaling X down by S=\ln 2 and splitting the result at the binary point (I.f)
  • Therefore we can compute e^X by augmenting the previous method with a shift:
    e^X = e^{(I+f)\ln 2} = e^{(I\ln 2+f\ln 2)}=e^{\ln 2^I+f\ln 2}=\underbrace{2^I\cdot}_{\rm shift} e^{f \ln 2}

  • set R = f \ln 2; this is a fraction R\in[0,\ln 2\approx0.693)
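The reduction can be sketched as follows, with math.exp standing in for the fractional-argument iteration developed earlier (whose input R lies in [0, \ln 2)):

```python
import math

LN2 = math.log(2.0)

def exp_reduced(X):
    """Range reduction for arbitrary X: X = (I + f) ln 2,
    so e^X = 2^I * e^(f ln 2), and 2^I is just a shift."""
    I = math.floor(X / LN2)
    f = X / LN2 - I                    # fractional part, 0 <= f < 1
    R = f * LN2                        # R in [0, ln 2): fractional kernel input
    return math.ldexp(math.exp(R), int(I))
```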

CORDIC

  • A popular method for iterative computation of trigonometric functions is called CORDIC (COordinate Rotation Digital Computer)
  • reduces trig functions to a series of only shift and add/sub operations with one multiplication gain correction at the end
  • next lecture ...
