Thursday, October 5, 2017

Polynomial Roots - QR algorithm in 1961 - John Francis

http://www.math.cornell.edu/~web6140/TopTenAlgorithms/QRalgorithm.html


The QR algorithm

The QR algorithm has been the algorithm employed for the last 50 years for computing eigenvalues and eigenvectors.

Before 1961

Before 1961, one bad way to compute the eigenvalues of a matrix $A$ was to calculate the roots of the characteristic polynomial, i.e., find the zeros of
$$p(x) = \det(A - xI),$$
where $I$ is the identity matrix. The accuracy of this approach depends on the basis employed for the polynomial as well as the eigenstructure of $A$. For example, a semisimple eigenvalue of $A$ (that should be computed to full accuracy) becomes a root of $p$ with multiplicity $>1$ (that cannot be computed accurately in floating point). Finding eigenvalues of matrices via roots of polynomials is --- regardless of the algorithmic details --- a numerically unstable eigensolver.
# Find eigenvalues of $A$ via roots of the characteristic polynomial:
using Polynomials 
n = 10
A = randn(n,n) 
p(x) = det(A-x*eye(n))
x = collect(linspace(-1,1,n+1))
cfs = polyfit(x, map(p,x))
norm( sort(real(roots(cfs))) - sort(real(eigvals(A))), Inf)
2.2829160783999214e-10

The unshifted QR algorithm

John Francis' idea in 1961 for computing the eigenvalues of A
is (without any bells or whistles) surprisingly simple. We will refer to this as the unshifted QR algorithm. It looks like this:
Set $A_0 = A$
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $A_{k-1} = Q_kR_k$
    Set $A_k = R_kQ_k$
end
That is, compute the QR factorization of $A$, then reverse the factors, then compute the QR factorization of the result, before reversing the factors, and so on.
It turns out that the matrices in the sequence $A_0, A_1, \ldots$ all have the same eigenvalues, and for any large integer $K$ the matrix $A_K$ is usually close to being upper-triangular. Since the eigenvalues of an upper-triangular matrix lie on its diagonal, the iteration above allows us to read off the eigenvalues of $A$ from the diagonal entries of $A_K$. Once we have the eigenvalues, the eigenvectors can be computed, for example, by an inverse power iteration.
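As a minimal sketch of that last step, here is an inverse power iteration (the helper name InversePowerIteration and the fixed iteration count are our own; sigma is assumed to be an eigenvalue estimate read off the diagonal of $A_K$):
# Sketch: recover an eigenvector from an eigenvalue estimate sigma by
# repeatedly solving with the nearly singular matrix A - sigma*I:
function InversePowerIteration( A::Matrix, sigma::Number )
    n = size(A,1)
    v = rand(n)
    for k = 1:50
        v = (A - sigma*eye(n)) \ v   # one linear solve per step
        v = v/norm(v)                # normalize to avoid overflow
    end
    return v
end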
For example, below we take a random $100\times 100$ matrix and plot the sparsity pattern of the matrix $A_{1000}$.
# unshifted QR algorithm:
using PyPlot
A = rand(100,100); 

for k = 1:1000
    (Q,R) = qr(A)
    A = R*Q
end
spy(abs(A).>1e-4)   # plot the sparsity pattern of entries with magnitude above 1e-4
[figure: sparsity pattern of $A_{1000}$]
After $1000$ steps we find that $A_{1000}$ is nearly, but not quite, upper-triangular. After another 1000 iterations, or so, this example will converge to upper-triangular form so we can read off the eigenvalues.

A word of warning

The sequence $A_0, A_1, \ldots$ computed by the unshifted QR algorithm above does not always converge. For example, consider the matrix
$$A = A_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
In this example, $A_k = A_0$ for all $k$ and the unshifted QR algorithm stagnates. Below, we will fix that with Wilkinson shifts. (Note that Rayleigh quotient shifts do not fix this example.)
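We can watch the stagnation directly (a small check of the claim above):
# The unshifted QR algorithm makes no progress on this matrix:
A = [0.0 1.0; 1.0 0.0]
for k = 1:5
    (Q,R) = qr(A)
    A = R*Q
end
A    # still [0 1; 1 0]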

Why does the sequence $A_0, A_1, \ldots$ have the same eigenvalues?

Answer: Similar matrices. Two matrices $A$ and $B$ are similar if there exists an invertible matrix $S$ so that $A = S^{-1}BS$. Similar matrices have the same eigenvalues:
$$Av = \lambda v \iff B(Sv) = \lambda(Sv).$$
The iterates $A_0, A_1, \ldots$ from the QR algorithm are similar matrices since
$$A_k = R_kQ_k = Q_k^{-1}Q_kR_kQ_k = Q_k^{-1}A_{k-1}Q_k.$$
Therefore, $A_0, A_1, \ldots$ all have the same eigenvalues.
A = rand(10,10)
A0 = A
e0 = eigvals(A0)   # Eigenvalues of original matrix
(Q,R) = qr(A0)
e1 = eigvals(R*Q)  # Eigenvalues of A_1
# Same real and imag part:
norm( sort(real(e0)) - sort(real(e1)) ), norm( sort(imag(e0)) - sort(imag(e1)) )
(2.2760418195658854e-15,9.550499576785472e-16)

Why does the sequence $A_0, A_1, \ldots$ usually converge to an upper-triangular matrix?

The secret to why the QR algorithm produces iterates that usually converge to reveal the eigenvalues is that the algorithm is a well-disguised (successive) power method. A very first idea to calculate eigenvalues might be to perform the power iteration on a basis $x_1, \ldots, x_n$ of $\mathbb{R}^n$ instead of just one vector. That is, to consider the sequences
$$x_j,\; Ax_j,\; A^2x_j,\; \ldots.$$
Unfortunately, this does not work, as the vectors $A^kx_j$ all tend to be close to a multiple of the eigenvector of $A$ corresponding to the eigenvalue of largest magnitude. Moreover, $A^kx_j$ usually overflows or underflows for moderate $k$. We can resolve the situation by orthonormalizing the vectors after each application of $A$ to prevent them from all converging to the dominant eigenvector.
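To see this collapse numerically, here is a small sketch of our own: power-iterate two random vectors (with normalization to dodge overflow) and check that they end up parallel.
# Without orthogonalization, every iterate aligns with the dominant eigenvector:
A = randn(5,5); A = A'*A    # symmetric test matrix
x1 = rand(5); x2 = rand(5)
for k = 1:100
    x1 = A*x1; x1 = x1/norm(x1)
    x2 = A*x2; x2 = x2/norm(x2)
end
abs(dot(x1,x2))    # ≈ 1, so the two iterates are (anti)parallel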

The QR algorithm based on Gram-Schmidt and the power iteration

Let $x_1, \ldots, x_n$ be $n$ linearly independent vectors in $\mathbb{R}^n$ and suppose that $v_1, \ldots, v_n$ is an orthonormal basis of Schur vectors for $A$ (see wiki). Consider the following algorithm for computing the vectors $v_1, \ldots, v_n$, based on the Gram-Schmidt procedure and the power method:
$$u_0^{(1)} = \frac{x_1}{\|x_1\|},\quad u_1^{(1)} = \frac{Au_0^{(1)}}{\|Au_0^{(1)}\|},\quad u_2^{(1)} = \frac{Au_1^{(1)}}{\|Au_1^{(1)}\|},\quad \cdots\ \longrightarrow\ \pm v_1,$$
$$u_0^{(2)} = \frac{x_2}{\|x_2\|},\;\; u_0^{(2)} = u_0^{(2)} - (v_1^Tu_0^{(2)})v_1,\quad u_1^{(2)} = \frac{Au_0^{(2)}}{\|Au_0^{(2)}\|},\;\; u_1^{(2)} = u_1^{(2)} - (v_1^Tu_1^{(2)})v_1,\quad \cdots\ \longrightarrow\ \pm v_2,$$
$$\vdots$$
$$u_0^{(r)} = \frac{x_r}{\|x_r\|},\;\; u_0^{(r)} = u_0^{(r)} - \sum_{j=1}^{r-1}(v_j^Tu_0^{(r)})v_j,\quad u_1^{(r)} = \frac{Au_0^{(r)}}{\|Au_0^{(r)}\|},\;\; u_1^{(r)} = u_1^{(r)} - \sum_{j=1}^{r-1}(v_j^Tu_1^{(r)})v_j,\quad \cdots\ \longrightarrow\ \pm v_r.$$
While this algorithm is far better than the power iteration without orthogonalization, it is still numerically unstable (as it is based on Gram-Schmidt). In particular, the computed vectors $v_1, \ldots, v_n$ may not be numerically orthonormal. To try it out, we implement this algorithm below for symmetric matrices, so that the Schur vectors are in fact eigenvectors:
function OrthogonalProjection( v::Vector, Q::Matrix )
    # Project v onto the orthogonal complement of the columns of Q:
    m = size(Q,2)
    for k = 1:m
        c = dot(Q[:,k], v)
        for j = 1:size(v,1)
            v[j] -= c*Q[j,k]
        end
    end
    return v
end

function OrthogonalPowerIteration( A::Matrix, v::Vector, Q::Matrix )
    # Power iteration on v, projecting away the columns of Q at every step:
    for k = 1:1000
        v = OrthogonalProjection( A*v, Q )
        v = v/norm(v)
    end
    return v
end

function OrthogonalPowerIteration( A::Matrix, v::Vector )
    # Plain power iteration on v:
    for k = 1:1000
        v = A*v
        v = v/norm(v)
    end
    return v
end

function GramSchmidtBasedQRAlgorithm( A::Matrix )
    # Use Gram-Schmidt and power iteration to compute the eigenvalues of A:
    n = size(A,1)
    v = rand(n)
    Q = reshape(OrthogonalPowerIteration( A, v ), n, 1)
    for k = 2:n
        v = rand( n )
        v = OrthogonalPowerIteration( A, v, Q )
        Q = hcat(Q, v)
    end
    # Eigenvalue estimates from (columnwise-averaged) ratios of A*Q and Q:
    return sort(mean((A*Q)./Q,1)[:],rev=false)
end
GramSchmidtBasedQRAlgorithm (generic function with 1 method)
Here, we test this code to witness the numerical instability:
# Random matrix: 
srand(1234)
A = rand(40,40)
A = A'*A
# True eigenvalues and eigenvectors:
(Λ,S) = eig( A )
# Compute eigenvalues of A via power iteration and Gram-Schmidt: 
λ = GramSchmidtBasedQRAlgorithm( A )
norm( Λ - λ )
2.3313616336807864e-9

Orthogonal iteration

Of course, we can greatly improve the situation by combining modified Gram-Schmidt with the power method, where we do not wait for the vector $v_1$ before starting the power iteration for $v_2$. Instead, we run the power method simultaneously on every vector and orthogonalize after every step. An even better idea is to use Householder reflections for extra numerical stability. This gives us an algorithm called orthogonal iteration. The algorithm looks like this:
Set $U^{(0)} = [x_1 | \cdots | x_n]$
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $Q^{(k)}R^{(k)} = U^{(k-1)}$
    Set $U^{(k)} = AQ^{(k)}$
end
The matrix iterates $U^{(0)}, U^{(1)}, \ldots$ here are computed so that $(U^{(k)})^TAU^{(k)} \to T$, where $T$ is upper-triangular.
function OrthogonalIteration( A::Matrix )
    # Orthogonal iteration on A:
    n = size(A,1)
    (Q,R) = qr(rand(n,n))
    for k = 1:10000
        (Q,R) = qr(A*Q)
    end
    return Q
end
OrthogonalIteration (generic function with 1 method)
# For orthogonal iteration U^TAU ---> triangular:  (This may not 
# always work because we don't have shifts...)
A = rand(3,3)
U = OrthogonalIteration( A )
triu(U'*A*U) ≈ U'*A*U
false

Orthogonal iteration to QR algorithm

One can view the QR algorithm as a well-designed version of orthogonal iteration with $U^{(0)} = I$. The connection can be seen from the fact that both methods compute QR factorizations of the matrix $A^k$:
QR algorithm:
$$A^k = (Q_1R_1)\cdots(Q_1R_1) = Q_1(R_1Q_1)^{k-1}R_1 = Q_1(Q_2R_2)^{k-1}R_1 = \cdots = (Q_1\cdots Q_k)(R_k\cdots R_1).$$
Orthogonal iteration:
$$A^k = A\,A^{k-1} = A\,Q^{(k-1)}(Q^{(k-1)})^TA^{k-1} = Q^{(k)}R^{(k)}(Q^{(k-1)})^TA^{k-1} = \cdots = Q^{(k)}\left(R^{(k)}\cdots R^{(1)}\right).$$
At the $k$th iteration John Francis' algorithm computes the QR factorization $A_{k-1} = Q_kR_k$, and the iterates converge to an upper-triangular matrix. This is not too surprising: we expect the product $Q_1\cdots Q_k$ to converge to a fixed orthogonal matrix, so $Q_k$ should converge to the identity matrix, and hence $A_{k-1} = Q_kR_k$ will usually converge to an upper-triangular matrix. Since $A_{k-1} = Q_kR_k$ is similar to $A$, the eigenvalues of $R_k$ should converge to the eigenvalues of $A$ as $k \to \infty$. In the next section, we make this intuition precise.
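Here is a quick numerical check of this intuition (our own sketch, on a symmetric positive definite matrix so that the theorem below applies). After many unshifted steps, $Q_k$ is nearly a diagonal matrix of $\pm 1$'s, i.e., the identity up to the column signs chosen by qr:
# Check that the QR factors Q_k of the unshifted iterates approach I (up to signs):
n = 10
A = randn(n,n); A = A'*A
Q = eye(n)
for k = 1:500
    (Q,R) = qr(A)
    A = R*Q
end
norm(abs(Q) - eye(n))    # small: Q_500 is nearly a diagonal matrix of ±1's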

A proof of convergence

Here, we prove the following convergence theorem with lots of assumptions to make the proof as easy as possible. Afterwards, we write down two ways to make the theorem stronger.
Theorem: Let $A$ be an $n\times n$ symmetric positive definite matrix with distinct eigenvalues given by
$$\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0.$$
Assume further that $A = Q\Lambda Q^T$ is the eigenvalue decomposition of $A$, where $Q^T = LU$ has an LU decomposition and the diagonal entries of $U$ are nonnegative. Then, the unshifted QR algorithm on $A$ computes iterates $A_1, A_2, A_3, \ldots$ that converge to a diagonal matrix.
Proof: For simplicity, we will start by assuming that every computed QR factorization of a matrix involves an upper-triangular matrix $R$ with nonnegative diagonal entries. This ensures that the QR factorization is unique (see class exercises). Let
$$A = Q\Lambda Q^T$$
be the eigenvalue decomposition for $A$. Then, $A^k$ can be written as
$$A^k = Q\Lambda^kQ^T = (Q_1\cdots Q_k)(R_k\cdots R_1).$$
Since $Q^T = LU$ exists by assumption (recall that $L$ is unit lower triangular, so it has 1's on the diagonal), we find that
$$Q\Lambda^kL\Lambda^{-k} = (Q_1\cdots Q_k)(R_k\cdots R_1)U^{-1}\Lambda^{-k}.$$
Considering the matrix $\Lambda^kL\Lambda^{-k}$, we observe that
$$\left(\Lambda^kL\Lambda^{-k}\right)_{ij} = \begin{cases} \ell_{ij}\left(\lambda_i/\lambda_j\right)^k, & i > j,\\ 1, & i = j,\\ 0, & \text{otherwise}, \end{cases}$$
so the matrix $\Lambda^kL\Lambda^{-k} \to I$ as $k \to \infty$ (because $\lambda_i < \lambda_j$ when $i > j$). We conclude that $Q\Lambda^kL\Lambda^{-k} \to Q$ and $(Q_1\cdots Q_k)(R_k\cdots R_1)U^{-1}\Lambda^{-k} \to Q$. Since the QR factorization is unique (when the triangular matrix has nonnegative diagonal entries), we have $Q_1\cdots Q_k \to Q$ and $(R_k\cdots R_1)U^{-1}\Lambda^{-k} \to I$. Finally, we find that $Q_k \to I$ and
$$Q_k^T\cdots Q_1^T A\, Q_1\cdots Q_k = A_k = Q_{k+1}R_{k+1} \to \Lambda,$$
where the last convergence result follows because $Q_{k+1} \to I$ and $A_k$ is symmetric and similar to $A$.
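The key step, $\Lambda^kL\Lambda^{-k} \to I$, is easy to check numerically (a sketch with a made-up $\Lambda$ and $L$):
# Λ^k * L * Λ^(-k) -> I when the diagonal of Λ is positive and decreasing
# and L is unit lower triangular; entries decay like (λi/λj)^k for i > j:
Λ = diagm([4.0, 3.0, 2.0, 1.0])
L = tril(rand(4,4), -1) + eye(4)
k = 100
norm( Λ^k * L * inv(Λ)^k - eye(4) )    # tiny: at most about (3/4)^100 ≈ 3e-13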
There are two ways to improve the theorem above, and the reader may like to try them:
  • Keep the assumption that $Q^T = LU$ exists, but remove the assumption that $U$ has nonnegative diagonal entries, and
  • Remove the assumption that $A$ is positive definite, but keep the assumption that every eigenvalue is distinct.

Computational considerations

While the basic QR algorithm can be used to compute eigenvalues, it is (1) computationally expensive, requiring $O(n^3)$ operations per iteration, and (2) potentially very slow to converge, depending on the eigenvalues of $A$. There are three ideas that improve the situation: (1) reduce the matrix $A$ to a similar matrix that is upper-Hessenberg (Hessenberg structure is preserved by the QR algorithm, see class exercises), which reduces the cost per iteration to $O(n^2)$ operations, (2) once an eigenvalue has been computed, deflate the matrix, which greatly speeds up subsequent eigenvalue computations, and (3) use "shifts" in the QR algorithm.

Reduction to a similar upper-Hessenberg matrix

The idea here is to reduce the matrix $A$ to an upper-Hessenberg matrix $H$ so that $A$ and $H$ have the same eigenvalues. We can do this by making sure that $A$ and $H$ are similar matrices. Our first step is to make the first column of $A$ look like the first column of an upper-Hessenberg matrix. We use a Householder reflection for that (if $a_1$ is the first column of $A$, then consider the Householder reflection for the vector $a_1(2\!:\!n)$), and to ensure that the resulting matrix is similar we apply the transpose of the Householder reflection on the right:
$$Q_vAQ_v^T = B,$$
where $B$ agrees with an upper-Hessenberg matrix in its first column. The process now occurs recursively on the $(n-1)\times(n-1)$ principal submatrix obtained after deleting the first row and column. The sequence of similar matrices looks like this:
[figure: the sequence of similar matrices, gaining one Hessenberg column per step]
The upper-Hessenberg structure is preserved by the QR algorithm, and the QR factorization of an upper-Hessenberg matrix can be computed in $O(n^2)$ operations (see class lecture notes). Here is an implementation of the Hessenberg reduction algorithm:
function HessenbergReduction( A::Matrix )
    # Reduce A to an upper-Hessenberg matrix H so that A and H are similar:
    n = size(A, 1)
    if ( n > 2 )
        # Householder reflection that zeros out A[3:n,1]:
        a1 = A[2:n, 1]
        e1 = zeros(n-1); e1[1] = 1
        sgn = sign(a1[1])
        v = (a1 + sgn*norm(a1)*e1); v = v./norm(v)
        Q1 = eye(n-1) - 2*(v*v')
        # Similarity transformation with diag(1, Q1):
        A[2:n,1] = Q1*A[2:n,1]
        A[1,2:n] = Q1*A[1,2:n]
        A[2:n,2:n] = Q1*A[2:n,2:n]*Q1'
        # Recurse on the trailing (n-1)-by-(n-1) block:
        A[2:n,2:n] = HessenbergReduction( A[2:n,2:n] )
    end
    return A
end
    
HessenbergReduction (generic function with 1 method)
Here, we check that the Hessenberg reduction preserves the eigenvalues.
A = rand(4,4)
e = eigvals(A)
H = HessenbergReduction( A )
e ≈ eigvals(H)

true
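As an aside, here is a sketch of why one QR step on an upper-Hessenberg matrix costs only $O(n^2)$ operations: each subdiagonal entry is zeroed by a $2\times 2$ Givens rotation acting on two rows, and the same rotations are then applied on the right to form $RQ$. (The helper HessenbergQRStep is our own illustration, assuming no exact zeros appear on the subdiagonal; it is not the routine Julia's qr calls.)
function HessenbergQRStep( A::Matrix )
    # One unshifted QR step A -> RQ for an upper-Hessenberg A, via n-1
    # Givens rotations; each rotation touches O(n) entries, so O(n^2) total.
    n = size(A,1)
    c = zeros(n-1); s = zeros(n-1)
    # R = G_{n-1}*...*G_1*A: zero out the subdiagonal entries one by one.
    for k = 1:n-1
        r = hypot(A[k,k], A[k+1,k])
        c[k] = A[k,k]/r; s[k] = A[k+1,k]/r
        A[k:k+1, k:n] = [c[k] s[k]; -s[k] c[k]] * A[k:k+1, k:n]
    end
    # RQ = R*G_1'*...*G_{n-1}': apply the rotations to the columns.
    for k = 1:n-1
        A[1:k+1, k:k+1] = A[1:k+1, k:k+1] * [c[k] -s[k]; s[k] c[k]]
    end
    return A
end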
After this, our faster QR algorithm will look like this:
Set $A_0 = Q_H^TAQ_H$ = upper-Hessenberg
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $A_{k-1} = Q_kR_k$
    Set $A_k = R_kQ_k$
end

QR algorithm with shifts

The unshifted QR algorithm with Hessenberg reduction costs only $O(n^2)$ operations per iteration, but the convergence can be painfully slow: it typically requires thousands of iterations! This is too much. We need to do better.
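To get a feel for the slowness, here is a small experiment of our own: count the unshifted QR steps needed to make the last subdiagonal entry of a random symmetric matrix negligible. The count is governed by the eigenvalue ratio $|\lambda_n/\lambda_{n-1}|$ and can easily run into the thousands.
# Count unshifted QR steps until the last subdiagonal entry is below 1e-10:
srand(1)
n = 6
A = randn(n,n); A = A'*A
iter = 0
while abs(A[n,n-1]) > 1e-10 && iter < 100000
    (Q,R) = qr(A)
    A = R*Q
    iter += 1
end
iter    # grows like log(1e-10)/log(λn/λn-1); often hundreds or thousands of steps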

Deflation

Notice that if $A_k$ --- which is converging to upper-triangular --- had the form (notice the bold zero)
$$A_k = \begin{bmatrix} B_{11} & u \\ \mathbf{0}^T & a_{nn} \end{bmatrix},$$
then $a_{nn}$ is an eigenvalue of $A_k$ and hence also of $A$ (since $A_k$ and $A$ are similar). Moreover, the $n-1$ eigenvalues of $B_{11}$ are the remaining $n-1$ eigenvalues of $A$.
Thus, if we happen to be lucky enough for the $A_k[n,n-1]$ entry to become close to zero, then we have approximately found one of the eigenvalues and can continue with the $(n-1)\times(n-1)$ principal submatrix of $A_k$. This process is known as deflation.
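As a sanity check of that block structure (our own sketch): if we zero out $A[n,n-1]$ in an upper-Hessenberg matrix, the spectrum is exactly the eigenvalues of $B_{11}$ together with $A[n,n]$.
# Eigenvalues of a deflated Hessenberg matrix split into eig(B11) and A[n,n]:
n = 5
A = triu(rand(n,n), -1)    # random upper-Hessenberg matrix
A[n,n-1] = 0               # deflated form: last row is [0 ... 0 A[n,n]]
e1 = eigvals(A[1:n-1,1:n-1])
sort(real(eigvals(A))) ≈ sort(real([e1; A[n,n]]))    # true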

The shifted QR algorithm

Instead of waiting for luck to allow for a deflation, Francis used a shifted version of the QR algorithm that is motivated by its origins in the power method. The shifted QR algorithm looks like this:
Set $A_0 = Q_H^TAQ_H$ = upper-Hessenberg
for k = 1, 2, ... (until convergence)
    Compute a shift $\mu_k$
    Compute the QR factorization $A_{k-1} - \mu_kI = Q_kR_k$
    Set $A_k = R_kQ_k + \mu_kI$
end
One can still check that this algorithm preserves the upper-Hessenberg structure and produces a sequence of similar matrices. The idea of the shift is to quickly make $A_k[n,n-1]$ converge to zero. A reasonable choice of shift is the Rayleigh quotient $\mu_k = A_{k-1}[n,n]$, because we would like $\mu_k$ to be an estimate for an eigenvalue of $A$.
The problem with the Rayleigh quotient shift is that it does not always work. Consider, for example,
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Instead, Wilkinson designed a shift that does work on symmetric matrices. It looks at the $2\times 2$ submatrix $A_k[n-1\!:\!n, n-1\!:\!n]$ and finds an estimate for an eigenvalue. If the $2\times 2$ submatrix is
$$\begin{bmatrix} a & b \\ b & c \end{bmatrix},$$
then Wilkinson's shift is given by
$$\mu_k = c - \frac{\operatorname{sign}(\delta)\,b^2}{|\delta| + \sqrt{\delta^2 + b^2}},$$
where $\delta = (a-c)/2$. If $\delta = 0$, then $\operatorname{sign}(\delta)$ can be arbitrarily selected as $+1$ or $-1$.
function WilkinsonShift( a::Number, b::Number, c::Number )
    # Calculate Wilkinson's shift for the symmetric 2x2 block [a b; b c]:
    δ = (a-c)/2
    return c - sign(δ)*b^2/(abs(δ) + sqrt(δ^2+b^2))
end

function QRwithShifts( A::Matrix )
    # The QR algorithm for symmetric A with Wilkinson shifts and Hessenberg
    # reduction. Please use eigvals() in Julia for serious applications.
    n = size(A,1)
    myeigs = zeros(n)
    if ( n == 1 )
        myeigs[1] = A[1,1]
    else
        I = eye( n )
        # Reduction to Hessenberg form:
        A = HessenbergReduction( A )
        # Shifted QR steps until the last subdiagonal entry is negligible:
        while( abs(A[n,n-1]) > 1e-10 )
            mu = WilkinsonShift( A[n-1,n-1], A[n,n-1], A[n,n] )
            # This QR factorization should exploit the Hessenberg structure:
            (Q,R) = qr(A - mu*I)
            # This line needs speeding up, currently O(n^3) operations!:
            A = R*Q + mu*I
        end
        # Deflate and recurse:
        myeigs = [A[n,n] ; QRwithShifts( A[1:n-1, 1:n-1] )]
    end
    return myeigs
end
    
QRwithShifts (generic function with 1 method)
For symmetric matrices, Wilkinson shifts and the shifted QR algorithm will always converge (in exact arithmetic). For example,
n = 100
A = randn(n,n); A = A'*A
er = QRwithShifts(A)
sort(er) ≈ sort(eigvals(A))

true

The QR algorithm on nonsymmetric matrices

Despite there being no general convergence proof for shifted versions of the QR algorithm on nonsymmetric matrices, there is not a single example I know of (at the time of writing) for which the current QR algorithm does not converge. It is extremely robust nowadays and should be used with confidence.

The man behind the QR algorithm

John Francis is an English computer scientist who invented the QR algorithm in 1961. A year later he left the field of numerical analysis and had no idea of the impact of his work. In 2007, Gene Golub and Frank Uhlig managed to contact him. He was the opening speaker at a mini-symposium that marked 50 years of the QR algorithm, held at the 23rd Biennial Conference on Numerical Analysis in Glasgow in June 2009.