Monday, November 28, 2022

APPROXIMATION USING CHEBYSHEV POLYNOMIALS

 https://eclipse.umbc.edu/robucci/cmpeRSD/Lectures/Lecture19__Implementation_of_Elementary_Functions/




Lecture 19 – Implementations of Elementary Functions

Ryan Robucci

References

  • ^\dagger Koren, Israel. Computer Arithmetic Algorithms, 2nd ed. A K Peters/CRC Press, 2001.

Implementation of Functions

  • Computation of elementary functions admits several approaches, including lookup tables (LUTs), approximations, iterative methods, and combinations thereof.

LUT

  • LUT/ROM: store results for all possible inputs; unfortunately, exhaustive storage can be too large
    • 32-bit input implies 4 billion entries!!

Interpolation (+LUT)

  • Interpolation may be linear, cubic spline, etc.
  • finite set of points in a table, e.g. f(x_3),f(x_4)
  • Estimate f(x) in the interval x_i \le x \le x_{i+1} with x_i and x_{i+1} stored in a LUT
  • Linear Interpolation:
    • f(x)\approx f(x_i) + m \cdot (x-x_i);

      m=\frac{f(x_{i+1})-f(x_i)}{x_{i+1}-x_{i}}
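A minimal sketch of LUT-backed linear interpolation (the target function, table size, and all names are illustrative choices, not from the lecture):

```python
import math

# Sketch: LUT + linear interpolation for sin(x) on [0, pi/2]
N = 16
DELTA = (math.pi / 2) / N                      # uniform interval size
SIN_LUT = [math.sin(i * DELTA) for i in range(N + 1)]

def sin_interp(x):
    i = min(int(x / DELTA), N - 1)             # index of x_i (clamped to last interval)
    m = (SIN_LUT[i + 1] - SIN_LUT[i]) / DELTA  # slope over [x_i, x_{i+1}]
    return SIN_LUT[i] + m * (x - i * DELTA)
```

For linear interpolation the worst-case error is bounded by (\Delta^2/8)\max|f''|, which for this 16-interval sine table is about 1.2\times10^{-3}.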

MAPPING AND INTERPOLATION

  • need to design a mapping (scale and offset) for the encoding:
    digital encoding = (x-\rm offset)\cdot \frac{1}{\rm scale} \cdot 2^m
  • Choice of LUT Indexing for Linear Interpolation
    • May use uniformly spaced points to create equal-sized intervals, or non-uniform spacing
    • Irregular spacing requires a search for the bracketing entries x_i \le x \le x_{i+1}, while uniform intervals (\Delta) allow the index to be computed directly from x/\Delta
    • Not all non-uniform spacing is irregular, for instance indexes of logarithmically spaced samples (10 per decade) can also be directly computed
    • A useful approach for uniform interval indexes is to use an interval size such that the first few bits of x are used directly as the index into the LUT
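For example, assuming x is stored as a 16-bit unsigned fixed-point fraction and the LUT has 16 entries (bit widths here are illustrative), the index and the intra-interval offset fall out of a shift and a mask:

```python
# Sketch: top 4 bits of a 16-bit fixed-point fraction directly index a
# 16-entry LUT; the remaining 12 bits give the position within the interval.
FRAC_BITS = 16
INDEX_BITS = 4
OFFSET_BITS = FRAC_BITS - INDEX_BITS

def lut_index_and_offset(x_fixed):
    index = x_fixed >> OFFSET_BITS               # first few bits of x -> LUT index
    offset = x_fixed & ((1 << OFFSET_BITS) - 1)  # remaining bits -> offset in interval
    return index, offset
```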

Polynomial Approximation

  • Polynomials
  • Ratio of Polynomials
  • Piecewise Approximation

TAYLOR SERIES EXPANSION

  • Polynomial approximation in the vicinity of a point
  • Ex: e^x = \displaystyle \sum_{i=0}^\infty\frac{x^i}{i!}
    • requires a large number of terms for accuracy when x is close to 1, though few terms are needed when x is close to 0
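A quick sketch of the partial sums (names are mine) makes the point concrete: near x = 0 three terms already give roughly 10^{-7} error, while at x = 1 the same three terms are off by more than 0.2:

```python
import math

def exp_taylor(x, n_terms):
    """Partial sum of the Taylor series e^x = sum_{i} x^i / i!."""
    total, term = 0.0, 1.0
    for i in range(n_terms):
        total += term
        term *= x / (i + 1)   # next term: x^(i+1) / (i+1)!
    return total
```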

APPROXIMATION USING CHEBYSHEV POLYNOMIALS

  • Use polynomial basis functions to approximate a function over an interval

  • Better than approximating a high-order polynomial by simply omitting its higher-order terms

    • e.g. better than approximating
      k_8 x^8 + k_7 x^7 + k_5x^5 - k_3x^3 +k_1x by
      \cancel{k_8 x^8} + \cancel{k_7 x^7} + k_5x^5 - k_3x^3 +k_1x
  • Example 5^{th}\text{-order} approx. using Chebyshev polynomials of the first kind:

    Available Basis:

    \begin{array}{rl} T_0(x) &= 1 \\ T_1(x) &= x \\ T_2(x) &= 2x^2 - 1 \\ T_3(x) &= 4x^3 - 3x \\ T_4(x) &= 8x^4 - 8x^2 + 1 \\ T_5(x) &= 16x^5 - 20x^3 + 5x \\ \end{array}

    https://en.wikipedia.org/wiki/Chebyshev_polynomials
  • Compute weights according to w_i = \frac{k}{\pi} \int_{-1}^1 \frac{f(x) T_i(x)}{\sqrt{1-x^2}} dx where f(x) is the polynomial or function to approximate, and k is 1 for i=0 and 2 otherwise

  • Create polynomial approximation with computed weights
    f(x)\approx w_0 T_0(x) + w_1 T_1(x) + ... + w_5 T_5(x)
    \begin{aligned} f(x)\approx & w_0 \cdot 1 + \\ & w_1 \cdot (x) +\\ & w_2 \cdot (2x^2 - 1) +\\ & w_3 \cdot (4x^3 - 3x) +\\ & w_4 \cdot (8x^4 - 8x^2 + 1) +\\ & w_5 \cdot (16x^5 - 20x^3 + 5x) \\ \end{aligned}

    \begin{aligned} f(x)\approx & (w_0-w_2+w_4) \cdot (1) + \\ & (w_1-3w_3+5w_5) \cdot (x) +...+ (16w_5) \cdot (x^5) \end{aligned}

Example: \frac{1}{1+e^{-5x}}

  • Taylor Series: \frac{1}{2}+\frac{5}{4} x-\frac{125}{48} x^3+\frac{625}{96} x^5

  • Chebyshev Polynomial Weights: w_0=0.5,w_1=0.587681,w_2=0,w_3=-0.121496,w_4=0,w_5=0.035318

  • additional note for future reference: if evaluating over a range other than -1 to 1, then use an input mapping function to shift the center of the range to 0 and scale the range

    • Ex: to evaluate f_1(x) with x in the range 2 to 5, instead approximate f_2(x)=f_1(1.5x+3.5) over x\in[-1,1]
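The weight integral above can be evaluated numerically with Gauss-Chebyshev quadrature, whose nodes x_j = \cos(\pi(j+1/2)/N) absorb the 1/\sqrt{1-x^2} factor, so T_i(x_j) = \cos(i\theta_j). A sketch (function and variable names are mine) that reproduces the weights quoted above for f(x)=\frac{1}{1+e^{-5x}}:

```python
import math

def chebyshev_weights(f, order, n_nodes=200):
    """w_i = (k/pi) * integral of f(x) T_i(x) / sqrt(1-x^2) over [-1, 1],
    via Gauss-Chebyshev quadrature: integral ~ (pi/N) * sum f(cos t_j) cos(i t_j)."""
    thetas = [math.pi * (j + 0.5) / n_nodes for j in range(n_nodes)]
    w = []
    for i in range(order + 1):
        s = sum(f(math.cos(t)) * math.cos(i * t) for t in thetas)
        k = 1.0 if i == 0 else 2.0
        w.append(k * s / n_nodes)
    return w

def sigmoid5(x):
    return 1.0 / (1.0 + math.exp(-5.0 * x))
```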

RATIONAL POLYNOMIAL APPROX. ^\dagger

  • Approx. using Ratio of Polynomials:
    • Ex: e^x=2^{x \log_2 e}
      • let x \log_2 e =I+f, where I is an integer and f is a fractional value \implies e^x = 2^I \cdot 2^f
      • computing 2^I is trivial
      • 2^f=\frac{(((((a_5f+a_4)f+a_3)f+a_2)f+a_1)f+a_0)}{(((((b_5f+b_4)f+b_3)f+b_2)f+b_1)f+1)} where a_i and b_i are known constants \ddagger
        10 multiplications, 10 additions, 1 division
        \dagger \tiny \text{Israel Koren, Computer Arithmetic Algorithms, Ch. 9}
        \ddagger \tiny \text{J.F. Hart et al., Computer Approximations, Wiley, New York, 1968}
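Hart's tabulated a_i, b_i coefficients are not reproduced here; as a sketch of the same structure (the I+f split, a Horner-form ratio of polynomials, and a final shift), the well-known [2/2] Padé approximant of e^x stands in for the degree-5 ratio, at lower accuracy:

```python
import math

def exp_rational(X):
    """Sketch of e^X = 2^I * 2^f. The [2/2] Pade approximant of e^x,
    e^x ~ (x^2 + 6x + 12) / (x^2 - 6x + 12), replaces Hart's degree-5
    coefficients (same ratio-of-polynomials structure, lower accuracy)."""
    t = X * math.log2(math.e)           # X * log2(e) = I + f
    I = math.floor(t)
    f = t - I                           # fractional part, 0 <= f < 1
    x = f * math.log(2.0)               # 2^f = e^(f ln 2)
    num = (x + 6.0) * x + 12.0          # Horner form of x^2 + 6x + 12
    den = (x - 6.0) * x + 12.0          # Horner form of x^2 - 6x + 12
    return math.ldexp(num / den, int(I))  # multiply by 2^I: just a shift
```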

PIECEWISE APPROXIMATION

  • Piecewise Polynomial Approximation
    • Break function into multiple intervals
    • Create Polynomial Approximation for each interval (e.g. Taylor series expansion around middle of interval)
    • Store polynomial approximation for each interval in a lookup table (LUT)
    • During run:
      • decide interval to use,
      • use LUT to recall parameters for that interval
      • compute approximation
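The steps above can be sketched as follows, using a 2nd-order Taylor expansion of e^x around each interval midpoint on [0, 1) (interval count, order, and names are illustrative choices):

```python
import math

# Precompute: 8 uniform intervals; for each, store the expansion point
# and the Taylor coefficients of e^x about it (all derivatives are e^c).
N_INTERVALS = 8
WIDTH = 1.0 / N_INTERVALS
COEF_LUT = []
for k in range(N_INTERVALS):
    c = (k + 0.5) * WIDTH               # expand around interval midpoint
    e = math.exp(c)
    COEF_LUT.append((c, e, e, e / 2.0))

def exp_piecewise(x):
    k = min(int(x / WIDTH), N_INTERVALS - 1)   # decide which interval to use
    c, a0, a1, a2 = COEF_LUT[k]                # recall that interval's parameters
    d = x - c
    return a0 + a1 * d + a2 * d * d            # compute the approximation
```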

Partial Results LUT

  • Consider Y=e^{X}, where X is a 32-bit UNSIGNED fractional argument, X = \displaystyle\sum_{i=1}^{32} x_{-i} 2^{-i}

  • Y=e^{\left(x_{-1} \cdot 2^{-1}+x_{-2} \cdot 2^{-2}+...+x_{-31} \cdot 2^{-31} + x_{-32} \cdot 2^{-32}\right)}
    = e^{x_{-1} \cdot 2^{-1}} \times e^{x_{-2} \cdot 2^{-2}} \times ... \times e^{x_{-31} \cdot 2^{-31}} \times e^{x_{-32} \cdot 2^{-32}}

  • Y can be iteratively computed as
    Y = Y \times e^{x_{-i} \cdot 2^{-i}} with Y initialized to 1
    One choice for implementing the iteration is to use a lookup table storing 32 precomputed entries T_i=e^{2^{-i}}
    Y = \begin{cases}Y \times T_i & \text{if } x_{-i}==1 \\ Y & \text{otherwise} \end{cases}

  • This requires a lookup and a multiply per iteration

  • But there is a way to construct the iterations and LUT so that the multiplication becomes very cheap (a shift and an add)...

  • Interestingly, it involves a lookup table for \ln() instead of \exp() with a pair of coupled iteration equations
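The straightforward lookup-and-multiply iteration above can be sketched as (names are mine):

```python
import math

N_BITS = 32
# Precomputed LUT: T_EXP[i-1] = e^(2^-i) for i = 1..32
T_EXP = [math.exp(2.0 ** -i) for i in range(1, N_BITS + 1)]

def exp_partial_lut(x_bits):
    """Y = e^X for X a 32-bit unsigned fraction (MSB is bit x_{-1}):
    one table lookup and one multiply per set bit."""
    Y = 1.0
    for i in range(1, N_BITS + 1):
        if (x_bits >> (N_BITS - i)) & 1:   # test bit x_{-i}
            Y *= T_EXP[i - 1]
    return Y
```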

Additive and Multiplicative Normalization

  • Additive and Multiplicative Normalization are two approaches to change a variable towards a target value
  • These algorithms involve recording the decisions/values chosen along the way, which are then used to construct the answer
  • Earlier we saw Multiplicative Normalization in the division through multiplication to move the denominator towards a target value, 1
    • Q=\frac{N}{D}=\frac{N\times m_1 \cdot m_2 \cdot m_3 \cdot m_4 \cdot m_5}{D\times m_1 \cdot m_2 \cdot m_3 \cdot m_4 \cdot m_5}=\frac{Q}{1}
  • Additive Normalization, used in the next approach, changes variable to a target value using add/sub operations.

Exponentiation via Additive Normalization

  • Iterative approximation for Y=e^{X}:
    • Y_{i+1} = Y_i \cdot B(i,s_i)
    • X_{i+1} = X_i - \ln B(i,s_i) = X_i - T(i,s_i)
  • B(i,s_i) in each iteration will be restricted to a set that makes the update Y_{i+1} inexpensive
  • i will be a parameter of the restriction, which makes the possible manipulation of Y unique to each iteration
  • s_i is a selector allowing choice within the allowable set in each iteration, but the decision rule is also based on the X iteration
  • the selector s_i in each iteration is chosen to make the successive approximation update X_{i+1} move towards the desired X
  • avoid the expense of computing \ln B at run time by precomputing all possible values of \ln B that might be used, and storing them in a LUT for use during the run: T(i,s_i)

Restriction on B to simplify the multiplication:

  • B(i,s_i)={1+s_i\cdot2^{-i}};s_i\in\{-1,0,1\}

  • Therefore the Y_{i+1} update requires only shift and add/sub operations
    y_{i+1} = \begin{cases} y_i \cdot \left(1+2^{-i}\right)&\text{if } s_i==1\\ y_i \cdot \left(1+0\right)&\text{if } s_i==0\\ y_i \cdot \left(1-2^{-i}\right)&\text{if } s_i==-1 \end{cases}

  • Remember, for a given input X, the particular B(i,s_i) in a given iteration is selected by s_i, chosen to make the residual X_{i+1} converge toward zero, so that
    X = \displaystyle\sum_{i=0}^{m-1} \ln(1+s_i2^{-i})

RANGE OF REPRESENTABLE VALUES

  • The range of representable values can be found by assuming the selection argument is the same in every iteration
    \displaystyle\sum_{i=1}^{m-1} \ln(1-2^{-i}) \le X \le \displaystyle\sum_{i=0}^{m-1} \ln(1+2^{-i})
    It also depends on the number of bits or iterations. For just 10 iterations we can say X\in [-1.24,1.56], though typically even more bits in the argument are desired
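These bounds are easy to confirm numerically for m = 10 iterations (the figures quoted above):

```python
import math

M_ITERS = 10
# All s_i = +1 gives the upper bound; all s_i = -1 (starting at i=1) the lower.
upper_bound = sum(math.log(1 + 2.0 ** -i) for i in range(M_ITERS))
lower_bound = sum(math.log(1 - 2.0 ** -i) for i in range(1, M_ITERS))
```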

ITERATIVE CONVERGENCE TO X

  • If we restrict X to be POSITIVE fractional values, we can use a binary selection rule to subtract or keep (instead of add,sub,keep)
    • My preferred bookkeeping for additive normalization is to iteratively move a value towards 0, R\to0
    • Initialize R=X
    • Step:
      • if R-\ln(1+2^{-i})>0
        • s_i = 1 , R=R-\ln(1+2^{-i})
      • else
        • s_i=0
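Putting the pieces together, a sketch of the additive-normalization exponential with the binary selection rule above (restricted to s_i \in \{0,1\}, i.e. non-negative X; names are mine):

```python
import math

M = 32
# Precomputed LUT: T_LN[i] = ln(1 + 2^-i), i = 0..31
T_LN = [math.log(1.0 + 2.0 ** -i) for i in range(M)]

def exp_additive(X):
    """e^X for X in [0, ~1.56): drive the residual R = X toward 0 by
    subtracting LUT entries; mirror each subtraction with a multiplicative
    update of Y that is only a shift-and-add in hardware."""
    R = X
    Y = 1.0
    for i in range(M):
        if R - T_LN[i] >= 0.0:        # selection rule: s_i = 1
            R -= T_LN[i]
            Y *= 1.0 + 2.0 ** -i      # Y * (1 + 2^-i): shift and add
    return Y
```

At the end, Y = e^{X - R} with the leftover residual R below roughly 2^{-31}, so the relative error is on that order.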

SIMPLIFYING THE SELECTION RULE

  • Sometimes the iterative steps can be simplified by analyzing the numerical possibilities in the "binary representation" of the hardware domain.
  • Examine the subtractive term:
    • \ln(1+s_i 2^{-i}) = s_i 2^{-i} - s_i^2 \frac{2^{-2i}}{2} + s_i^3 \frac{2^{-3i}}{3} -...
    • Only the first term affects bit -i
    • Therefore, in each iteration we can choose s_i to cancel the -i bit in the remainder R, using only selection values 0,1
    • This simplifies the selection rule to just checking the bit at -i and requires one iteration per bit of the input argument (R=X).

ALLOWING LARGER THAN FRACTIONAL ARGUMENTS

  • Let X be represented by (I+f)\cdot S where

    • I is an integer
    • f a fractional value with range 0\le f \lt 1 and
    • S is a scale
  • X : X=(I+f)\ln 2.

    • Find I and f by scaling X down by S=\ln 2 and splitting the result at the binary point (I.f)
  • Therefore we can compute e^X by augmenting the previous method with a shift:
    e^X = e^{(I+f)\ln 2} = e^{(I\ln 2+f\ln 2)}=e^{\ln 2^I+f\ln 2}=\underbrace{2^I\cdot}_{\rm shift} e^{f \ln 2}

  • set R = f \ln 2; this is a fraction R\in[0,\ln 2\approx0.693)
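The reduction can be sketched as follows, with math.exp standing in for the fractional-argument iteration developed earlier (whose input R lies in [0, \ln 2)):

```python
import math

LN2 = math.log(2.0)

def exp_reduced(X):
    """Range reduction for arbitrary X: X = (I + f) ln 2,
    so e^X = 2^I * e^(f ln 2), and 2^I is just a shift."""
    I = math.floor(X / LN2)
    f = X / LN2 - I                    # fractional part, 0 <= f < 1
    R = f * LN2                        # R in [0, ln 2): fractional kernel input
    return math.ldexp(math.exp(R), int(I))
```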

CORDIC

  • A popular method for iterative computation of trigonometric functions is called CORDIC (COordinate Rotation Digital Computer)
  • reduces trig functions to a series of only shift and add/sub operations with one multiplication gain correction at the end
  • next lecture ...
