Thursday, October 5, 2017

Polynomial Roots - QR algorithm in 1961 - John Francis

http://www.math.cornell.edu/~web6140/TopTenAlgorithms/QRalgorithm.html


The QR algorithm

The QR algorithm has been the algorithm employed for the last 50 years for computing eigenvalues and eigenvectors.

Before 1961

Before 1961, one bad way to compute the eigenvalues of a matrix $A$ was to calculate the roots of the characteristic polynomial, i.e., find the zeros of
$$p(x) = \det(A - xI),$$
where $I$ is the identity matrix. The accuracy of this approach depends on the basis employed for the polynomial as well as the eigenstructure of $A$. For example, a semisimple eigenvalue of $A$ (that should be computed to full accuracy) becomes a root of $p$ with multiplicity $>1$ (that cannot be computed accurately in floating point). Finding eigenvalues of matrices via roots of polynomials is --- regardless of the algorithmic details --- a numerically unstable eigensolver.
# Find eigenvalues of $A$ via roots of the characteristic polynomial:
using Polynomials 
n = 10
A = randn(n,n) 
p(x) = det(A-x*eye(n))
x = collect(linspace(-1,1,n+1))
cfs = polyfit(x, map(p,x))
norm( sort(real(roots(cfs))) - sort(real(eigvals(A))), Inf)
2.2829160783999214e-10

The unshifted QR algorithm

John Francis' idea in 1961 for computing the eigenvalues of A
is (without any bells or whistles) surprisingly simple. We will refer to this as the unshifted QR algorithm. It looks like this:
Set $A_0 = A$
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $A_{k-1} = Q_kR_k$
    Set $A_k = R_kQ_k$
end
That is, compute the QR factorization of $A$, then reverse the factors, then compute the QR factorization of the result, before reversing the factors, and so on.
It turns out that the matrices in the sequence $A_0, A_1, \ldots$ all have the same eigenvalues, and for any large integer $K$ the matrix $A_K$ is usually close to being upper-triangular. Since the eigenvalues of an upper-triangular matrix lie on its diagonal, the iteration above allows us to read off the eigenvalues of $A$ from the diagonal entries of $A_K$. Once we have the eigenvalues, the eigenvectors can be computed, for example, by an inverse power iteration.
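As a minimal sketch of that last step, here is an inverse power iteration (the helper name InversePowerIteration and the fixed iteration count are our own; sigma is assumed to be an eigenvalue estimate read off the diagonal of $A_K$):
# Sketch: recover an eigenvector from an eigenvalue estimate sigma by
# repeatedly solving with the nearly singular matrix A - sigma*I:
function InversePowerIteration( A::Matrix, sigma::Number )
    n = size(A,1)
    v = rand(n)
    for k = 1:50
        v = (A - sigma*eye(n)) \ v   # one linear solve per step
        v = v/norm(v)                # normalize to avoid overflow
    end
    return v
end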
For example, below we take a random $100\times 100$ matrix and plot the sparsity pattern of the matrix $A_{1000}$.
# unshifted QR algorithm:
using PyPlot
A = rand(100,100); 

for k = 1:1000
    (Q,R) = qr(A)
    A = R*Q
end
spy(abs(A).>1e-4)   # plot the sparsity pattern of entries with magnitude above 1e-4
[figure: sparsity pattern of $A_{1000}$]
After $1000$ steps we find that $A_{1000}$ is nearly, but not quite, upper-triangular. After another 1000 iterations, or so, this example will converge to upper-triangular form so we can read off the eigenvalues.

A word of warning

The sequence $A_0, A_1, \ldots$ computed by the unshifted QR algorithm above does not always converge. For example, consider the matrix
$$A = A_0 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
In this example, $A_k = A_0$ for all $k$ and the unshifted QR algorithm stagnates. Below, we will fix that with Wilkinson shifts. (Note that Rayleigh quotient shifts do not fix this example.)
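We can watch the stagnation directly (a small check of the claim above):
# The unshifted QR algorithm makes no progress on this matrix:
A = [0.0 1.0; 1.0 0.0]
for k = 1:5
    (Q,R) = qr(A)
    A = R*Q
end
A    # still [0 1; 1 0]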

Why does the sequence $A_0, A_1, \ldots$ have the same eigenvalues?

Answer: Similar matrices. Two matrices $A$ and $B$ are similar if there exists an invertible matrix $S$ so that $A = S^{-1}BS$. Similar matrices have the same eigenvalues:
$$Av = \lambda v \iff B(Sv) = \lambda(Sv).$$
The iterates $A_0, A_1, \ldots$ from the QR algorithm are similar matrices since
$$A_k = R_kQ_k = Q_k^{-1}Q_kR_kQ_k = Q_k^{-1}A_{k-1}Q_k.$$
Therefore, $A_0, A_1, \ldots$ all have the same eigenvalues.
A = rand(10,10)
A0 = A
e0 = eigvals(A0)   # Eigenvalues of original matrix
(Q,R) = qr(A0)
e1 = eigvals(R*Q)  # Eigenvalues of A_1
# Same real and imag part:
norm( sort(real(e0)) - sort(real(e1)) ), norm( sort(imag(e0)) - sort(imag(e1)) )
(2.2760418195658854e-15,9.550499576785472e-16)

Why does the sequence $A_0, A_1, \ldots$ usually converge to an upper-triangular matrix?

The secret to why the QR algorithm produces iterates that usually converge to reveal the eigenvalues is that the algorithm is a well-disguised (successive) power method. A very first idea to calculate eigenvalues might be to perform the power iteration on a basis $x_1, \ldots, x_n$ of $\mathbb{R}^n$ instead of just one vector. That is, to consider the sequences
$$x_j,\; Ax_j,\; A^2x_j,\; \ldots.$$
Unfortunately, this does not work, as the vectors $A^kx_j$ all tend to be close to a multiple of the eigenvector of $A$ corresponding to the eigenvalue of largest magnitude. Moreover, $A^kx_j$ usually overflows or underflows for moderate $k$. We can resolve the situation by orthonormalizing the vectors after each application of $A$ to prevent them from all converging to the dominant eigenvector.
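To see this collapse numerically, here is a small sketch of our own: power-iterate two random vectors (with normalization to dodge overflow) and check that they end up parallel.
# Without orthogonalization, every iterate aligns with the dominant eigenvector:
A = randn(5,5); A = A'*A    # symmetric test matrix
x1 = rand(5); x2 = rand(5)
for k = 1:100
    x1 = A*x1; x1 = x1/norm(x1)
    x2 = A*x2; x2 = x2/norm(x2)
end
abs(dot(x1,x2))    # ≈ 1, so the two iterates are (anti)parallel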

The QR algorithm based on Gram-Schmidt and the power iteration

Let $x_1, \ldots, x_n$ be $n$ linearly independent vectors in $\mathbb{R}^n$ and suppose that $v_1, \ldots, v_n$ is an orthonormal basis of Schur vectors for $A$ (see wiki). Consider the following algorithm for computing the vectors $v_1, \ldots, v_n$, based on the Gram-Schmidt procedure and the power method:
$$u_0^{(1)} = \frac{x_1}{\|x_1\|},\quad u_1^{(1)} = \frac{Au_0^{(1)}}{\|Au_0^{(1)}\|},\quad u_2^{(1)} = \frac{Au_1^{(1)}}{\|Au_1^{(1)}\|},\quad \cdots\ \longrightarrow\ \pm v_1,$$
$$u_0^{(2)} = \frac{x_2}{\|x_2\|},\;\; u_0^{(2)} = u_0^{(2)} - (v_1^Tu_0^{(2)})v_1,\quad u_1^{(2)} = \frac{Au_0^{(2)}}{\|Au_0^{(2)}\|},\;\; u_1^{(2)} = u_1^{(2)} - (v_1^Tu_1^{(2)})v_1,\quad \cdots\ \longrightarrow\ \pm v_2,$$
$$\vdots$$
$$u_0^{(r)} = \frac{x_r}{\|x_r\|},\;\; u_0^{(r)} = u_0^{(r)} - \sum_{j=1}^{r-1}(v_j^Tu_0^{(r)})v_j,\quad u_1^{(r)} = \frac{Au_0^{(r)}}{\|Au_0^{(r)}\|},\;\; u_1^{(r)} = u_1^{(r)} - \sum_{j=1}^{r-1}(v_j^Tu_1^{(r)})v_j,\quad \cdots\ \longrightarrow\ \pm v_r.$$
While this algorithm is far better than the power iteration without orthogonalization, it is still numerically unstable (as it is based on Gram-Schmidt). In particular, the computed vectors $v_1, \ldots, v_n$ may not be numerically orthonormal. To try it out, we implement this algorithm below for symmetric matrices, so that the Schur vectors are in fact eigenvectors:
function OrthogonalProjection( v::Vector, Q::Matrix )
    # Project v onto the orthogonal complement of the columns of Q:
    m = size(Q,2)
    for k = 1:m
        c = dot(Q[:,k], v)
        for j = 1:size(v,1)
            v[j] -= c*Q[j,k]
        end
    end
    return v
end

function OrthogonalPowerIteration( A::Matrix, v::Vector, Q::Matrix )
    # Power iteration on v, projecting away the columns of Q at every step:
    for k = 1:1000
        v = OrthogonalProjection( A*v, Q )
        v = v/norm(v)
    end
    return v
end

function OrthogonalPowerIteration( A::Matrix, v::Vector )
    # Plain power iteration on v:
    for k = 1:1000
        v = A*v
        v = v/norm(v)
    end
    return v
end

function GramSchmidtBasedQRAlgorithm( A::Matrix )
    # Use Gram-Schmidt and power iteration to compute the eigenvalues of A:
    n = size(A,1)
    v = rand(n)
    Q = reshape(OrthogonalPowerIteration( A, v ), n, 1)
    for k = 2:n
        v = rand( n )
        v = OrthogonalPowerIteration( A, v, Q )
        Q = hcat(Q, v)
    end
    # Eigenvalue estimates from (columnwise-averaged) ratios of A*Q and Q:
    return sort(mean((A*Q)./Q,1)[:],rev=false)
end
GramSchmidtBasedQRAlgorithm (generic function with 1 method)
Here, we test this code to witness the numerical instability:
# Random matrix: 
srand(1234)
A = rand(40,40)
A = A'*A
# True eigenvalues and eigenvectors:
(Λ,S) = eig( A )
# Compute eigenvalues of A via power iteration and Gram-Schmidt: 
λ = GramSchmidtBasedQRAlgorithm( A )
norm( Λ - λ )
2.3313616336807864e-9

Orthogonal iteration

Of course, we can greatly improve the situation by combining modified Gram-Schmidt with the power method, where we do not wait for the vector $v_1$ before starting the power iteration for $v_2$. Instead, we run the power method simultaneously on every vector and orthogonalize after every step. An even better idea is to use Householder reflections for extra numerical stability. This gives us an algorithm called orthogonal iteration. The algorithm looks like this:
Set $U^{(0)} = [x_1 | \cdots | x_n]$
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $Q^{(k)}R^{(k)} = U^{(k-1)}$
    Set $U^{(k)} = AQ^{(k)}$
end
The matrix iterates $U^{(0)}, U^{(1)}, \ldots$ here are computed so that $(U^{(k)})^TAU^{(k)} \to T$, where $T$ is upper-triangular.
function OrthogonalIteration( A::Matrix )
    # Orthogonal iteration on A:
    n = size(A,1)
    (Q,R) = qr(rand(n,n))
    for k = 1:10000
        (Q,R) = qr(A*Q)
    end
    return Q
end
OrthogonalIteration (generic function with 1 method)
# For orthogonal iteration U^TAU ---> triangular:  (This may not 
# always work because we don't have shifts...)
A = rand(3,3)
U = OrthogonalIteration( A )
triu(U'*A*U) ≈ U'*A*U
false

Orthogonal iteration to QR algorithm

One can view the QR algorithm as a well-designed version of orthogonal iteration with $U^{(0)} = I$. The connection can be seen from the fact that both methods compute QR factorizations of the matrix $A^k$:
QR algorithm:
$$A^k = (Q_1R_1)\cdots(Q_1R_1) = Q_1(R_1Q_1)^{k-1}R_1 = Q_1(Q_2R_2)^{k-1}R_1 = \cdots = (Q_1\cdots Q_k)(R_k\cdots R_1).$$
Orthogonal iteration:
$$A^k = A\,A^{k-1} = A\,Q^{(k-1)}(Q^{(k-1)})^TA^{k-1} = Q^{(k)}R^{(k)}(Q^{(k-1)})^TA^{k-1} = \cdots = Q^{(k)}\left(R^{(k)}\cdots R^{(1)}\right).$$
At the $k$th iteration John Francis' algorithm computes the QR factorization $A_{k-1} = Q_kR_k$, and the iterates converge to an upper-triangular matrix. This is not too surprising: we expect the product $Q_1\cdots Q_k$ to converge to a fixed orthogonal matrix, so $Q_k$ should converge to the identity matrix, and hence $A_{k-1} = Q_kR_k$ will usually converge to an upper-triangular matrix. Since $A_{k-1} = Q_kR_k$ is similar to $A$, the eigenvalues of $R_k$ should converge to the eigenvalues of $A$ as $k \to \infty$. In the next section, we make this intuition precise.
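Here is a quick numerical check of this intuition (our own sketch, on a symmetric positive definite matrix so that the theorem below applies). After many unshifted steps, $Q_k$ is nearly a diagonal matrix of $\pm 1$'s, i.e., the identity up to the column signs chosen by qr:
# Check that the QR factors Q_k of the unshifted iterates approach I (up to signs):
n = 10
A = randn(n,n); A = A'*A
Q = eye(n)
for k = 1:500
    (Q,R) = qr(A)
    A = R*Q
end
norm(abs(Q) - eye(n))    # small: Q_500 is nearly a diagonal matrix of ±1's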

A proof of convergence

Here, we prove the following convergence theorem with lots of assumptions to make the proof as easy as possible. Afterwards, we write down two ways to make the theorem stronger.
Theorem: Let $A$ be an $n\times n$ symmetric positive definite matrix with distinct eigenvalues given by
$$\lambda_1 > \lambda_2 > \cdots > \lambda_n > 0.$$
Assume further that $A = Q\Lambda Q^T$ is the eigenvalue decomposition of $A$, where $Q^T = LU$ has an LU decomposition and the diagonal entries of $U$ are nonnegative. Then, the unshifted QR algorithm on $A$ computes iterates $A_1, A_2, A_3, \ldots$ that converge to a diagonal matrix.
Proof: For simplicity, we will start by assuming that every computed QR factorization of a matrix involves an upper-triangular matrix $R$ with nonnegative diagonal entries. This ensures that the QR factorization is unique (see class exercises). Let
$$A = Q\Lambda Q^T$$
be the eigenvalue decomposition for $A$. Then, $A^k$ can be written as
$$A^k = Q\Lambda^kQ^T = (Q_1\cdots Q_k)(R_k\cdots R_1).$$
Since $Q^T = LU$ exists by assumption (recall that $L$ is unit lower triangular, so it has 1's on the diagonal), we find that
$$Q\Lambda^kL\Lambda^{-k} = (Q_1\cdots Q_k)(R_k\cdots R_1)U^{-1}\Lambda^{-k}.$$
Considering the matrix $\Lambda^kL\Lambda^{-k}$, we observe that
$$\left(\Lambda^kL\Lambda^{-k}\right)_{ij} = \begin{cases} \ell_{ij}\left(\lambda_i/\lambda_j\right)^k, & i > j,\\ 1, & i = j,\\ 0, & \text{otherwise}, \end{cases}$$
so the matrix $\Lambda^kL\Lambda^{-k} \to I$ as $k \to \infty$ (because $\lambda_i < \lambda_j$ when $i > j$). We conclude that $Q\Lambda^kL\Lambda^{-k} \to Q$ and $(Q_1\cdots Q_k)(R_k\cdots R_1)U^{-1}\Lambda^{-k} \to Q$. Since the QR factorization is unique (when the triangular matrix has nonnegative diagonal entries), we have $Q_1\cdots Q_k \to Q$ and $(R_k\cdots R_1)U^{-1}\Lambda^{-k} \to I$. Finally, we find that $Q_k \to I$ and
$$Q_k^T\cdots Q_1^T A\, Q_1\cdots Q_k = A_k = Q_{k+1}R_{k+1} \to \Lambda,$$
where the last convergence result follows because $Q_{k+1} \to I$ and $A_k$ is symmetric and similar to $A$.
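The key step, $\Lambda^kL\Lambda^{-k} \to I$, is easy to check numerically (a sketch with a made-up $\Lambda$ and $L$):
# Λ^k * L * Λ^(-k) -> I when the diagonal of Λ is positive and decreasing
# and L is unit lower triangular; entries decay like (λi/λj)^k for i > j:
Λ = diagm([4.0, 3.0, 2.0, 1.0])
L = tril(rand(4,4), -1) + eye(4)
k = 100
norm( Λ^k * L * inv(Λ)^k - eye(4) )    # tiny: at most about (3/4)^100 ≈ 3e-13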
There are two ways to improve the theorem above, and the reader may like to try them:
  • Keep the assumption that $Q^T = LU$ exists, but remove the assumption that $U$ has nonnegative diagonal entries, and
  • Remove the assumption that $A$ is positive definite, but keep the assumption that every eigenvalue is distinct.

Computational considerations

While the basic QR algorithm can be used to compute eigenvalues, it is (1) computationally expensive, requiring $O(n^3)$ operations per iteration, and (2) potentially very slow to converge, depending on the eigenvalues of $A$. There are three ideas that improve the situation: (1) reduce the matrix $A$ to a similar matrix that is upper-Hessenberg (Hessenberg structure is preserved by the QR algorithm, see class exercises), which reduces the cost per iteration to $O(n^2)$ operations, (2) once an eigenvalue has been computed, deflate the matrix, which greatly speeds up subsequent eigenvalue computations, and (3) use "shifts" in the QR algorithm.

Reduction to a similar upper-Hessenberg matrix

The idea here is to reduce the matrix $A$ to an upper-Hessenberg matrix $H$ so that $A$ and $H$ have the same eigenvalues. We can do this by making sure that $A$ and $H$ are similar matrices. Our first step is to make the first column of $A$ look like the first column of an upper-Hessenberg matrix. We use a Householder reflection for that (if $a_1$ is the first column of $A$, then consider the Householder reflection for the vector $a_1(2\!:\!n)$), and to ensure that the resulting matrix is similar we apply the transpose of the Householder reflection on the right:
$$Q_vAQ_v^T = B,$$
where $B$ agrees with an upper-Hessenberg matrix in its first column. The process now occurs recursively on the $(n-1)\times(n-1)$ principal submatrix obtained after deleting the first row and column. The sequence of similar matrices looks like this:
[figure: the sequence of similar matrices, gaining one Hessenberg column per step]
The upper-Hessenberg structure is preserved by the QR algorithm, and the QR factorization of an upper-Hessenberg matrix can be computed in $O(n^2)$ operations (see class lecture notes). Here is an implementation of the Hessenberg reduction algorithm:
function HessenbergReduction( A::Matrix )
    # Reduce A to an upper-Hessenberg matrix H so that A and H are similar:
    n = size(A, 1)
    if ( n > 2 )
        # Householder reflection that zeros out A[3:n,1]:
        a1 = A[2:n, 1]
        e1 = zeros(n-1); e1[1] = 1
        sgn = sign(a1[1])
        v = (a1 + sgn*norm(a1)*e1); v = v./norm(v)
        Q1 = eye(n-1) - 2*(v*v')
        # Similarity transformation with diag(1, Q1):
        A[2:n,1] = Q1*A[2:n,1]
        A[1,2:n] = Q1*A[1,2:n]
        A[2:n,2:n] = Q1*A[2:n,2:n]*Q1'
        # Recurse on the trailing (n-1)-by-(n-1) block:
        A[2:n,2:n] = HessenbergReduction( A[2:n,2:n] )
    end
    return A
end
    
HessenbergReduction (generic function with 1 method)
Here, we check that the Hessenberg reduction preserves the eigenvalues.
A = rand(4,4)
e = eigvals(A)
H = HessenbergReduction( A )
e ≈ eigvals(H)

true
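As an aside, here is a sketch of why one QR step on an upper-Hessenberg matrix costs only $O(n^2)$ operations: each subdiagonal entry is zeroed by a $2\times 2$ Givens rotation acting on two rows, and the same rotations are then applied on the right to form $RQ$. (The helper HessenbergQRStep is our own illustration, assuming no exact zeros appear on the subdiagonal; it is not the routine Julia's qr calls.)
function HessenbergQRStep( A::Matrix )
    # One unshifted QR step A -> RQ for an upper-Hessenberg A, via n-1
    # Givens rotations; each rotation touches O(n) entries, so O(n^2) total.
    n = size(A,1)
    c = zeros(n-1); s = zeros(n-1)
    # R = G_{n-1}*...*G_1*A: zero out the subdiagonal entries one by one.
    for k = 1:n-1
        r = hypot(A[k,k], A[k+1,k])
        c[k] = A[k,k]/r; s[k] = A[k+1,k]/r
        A[k:k+1, k:n] = [c[k] s[k]; -s[k] c[k]] * A[k:k+1, k:n]
    end
    # RQ = R*G_1'*...*G_{n-1}': apply the rotations to the columns.
    for k = 1:n-1
        A[1:k+1, k:k+1] = A[1:k+1, k:k+1] * [c[k] -s[k]; s[k] c[k]]
    end
    return A
end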
After this, our faster QR algorithm will look like this:
Set $A_0 = Q_H^TAQ_H$ = upper-Hessenberg
for k = 1, 2, ... (until convergence)
    Compute the QR factorization $A_{k-1} = Q_kR_k$
    Set $A_k = R_kQ_k$
end

QR algorithm with shifts

The unshifted QR algorithm with Hessenberg reduction costs only $O(n^2)$ operations per iteration, but the convergence can be painfully slow: it typically requires thousands of iterations! This is too much. We need to do better.
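To get a feel for the slowness, here is a small experiment of our own: count the unshifted QR steps needed to make the last subdiagonal entry of a random symmetric matrix negligible. The count is governed by the eigenvalue ratio $|\lambda_n/\lambda_{n-1}|$ and can easily run into the thousands.
# Count unshifted QR steps until the last subdiagonal entry is below 1e-10:
srand(1)
n = 6
A = randn(n,n); A = A'*A
iter = 0
while abs(A[n,n-1]) > 1e-10 && iter < 100000
    (Q,R) = qr(A)
    A = R*Q
    iter += 1
end
iter    # grows like log(1e-10)/log(λn/λn-1); often hundreds or thousands of steps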

Deflation

Notice that if $A_k$ --- which is converging to upper-triangular --- had the form (notice the bold zero)
$$A_k = \begin{bmatrix} B_{11} & u \\ \mathbf{0}^T & a_{nn} \end{bmatrix},$$
then $a_{nn}$ is an eigenvalue of $A_k$ and hence also of $A$ (since $A_k$ and $A$ are similar). Moreover, the $n-1$ eigenvalues of $B_{11}$ are the remaining $n-1$ eigenvalues of $A$.
Thus, if we happen to be lucky enough for the $A_k[n,n-1]$ entry to become close to zero, then we have approximately found one of the eigenvalues and can continue with the $(n-1)\times(n-1)$ principal submatrix of $A_k$. This process is known as deflation.
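As a sanity check of that block structure (our own sketch): if we zero out $A[n,n-1]$ in an upper-Hessenberg matrix, the spectrum is exactly the eigenvalues of $B_{11}$ together with $A[n,n]$.
# Eigenvalues of a deflated Hessenberg matrix split into eig(B11) and A[n,n]:
n = 5
A = triu(rand(n,n), -1)    # random upper-Hessenberg matrix
A[n,n-1] = 0               # deflated form: last row is [0 ... 0 A[n,n]]
e1 = eigvals(A[1:n-1,1:n-1])
sort(real(eigvals(A))) ≈ sort(real([e1; A[n,n]]))    # true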

The shifted QR algorithm

Instead of waiting for luck to allow for a deflation, Francis used a shifted version of the QR algorithm that is motivated by its origins in the power method. The shifted QR algorithm looks like this:
Set $A_0 = Q_H^TAQ_H$ = upper-Hessenberg
for k = 1, 2, ... (until convergence)
    Compute a shift $\mu_k$
    Compute the QR factorization $A_{k-1} - \mu_kI = Q_kR_k$
    Set $A_k = R_kQ_k + \mu_kI$
end
One can still check that this algorithm preserves the upper-Hessenberg structure and produces a sequence of similar matrices. The idea of the shift is to quickly make $A_k[n,n-1]$ converge to zero. A reasonable choice of shift is the Rayleigh quotient $\mu_k = A_{k-1}[n,n]$, because we would like $\mu_k$ to be an estimate for an eigenvalue of $A$.
The problem with the Rayleigh quotient shift is that it does not always work. Consider, for example,
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$
Instead, Wilkinson designed a shift that does work on symmetric matrices. It looks at the $2\times 2$ submatrix $A_k[n-1\!:\!n, n-1\!:\!n]$ and finds an estimate for an eigenvalue. If the $2\times 2$ submatrix is
$$\begin{bmatrix} a & b \\ b & c \end{bmatrix},$$
then Wilkinson's shift is given by
$$\mu_k = c - \frac{\operatorname{sign}(\delta)\,b^2}{|\delta| + \sqrt{\delta^2 + b^2}},$$
where $\delta = (a-c)/2$. If $\delta = 0$, then $\operatorname{sign}(\delta)$ can be arbitrarily selected as $+1$ or $-1$.
function WilkinsonShift( a::Number, b::Number, c::Number )
    # Calculate Wilkinson's shift for the symmetric 2x2 block [a b; b c]:
    δ = (a-c)/2
    return c - sign(δ)*b^2/(abs(δ) + sqrt(δ^2+b^2))
end

function QRwithShifts( A::Matrix )
    # The QR algorithm for symmetric A with Wilkinson shifts and Hessenberg
    # reduction. Please use eigvals() in Julia for serious applications.
    n = size(A,1)
    myeigs = zeros(n)
    if ( n == 1 )
        myeigs[1] = A[1,1]
    else
        I = eye( n )
        # Reduction to Hessenberg form:
        A = HessenbergReduction( A )
        # Shifted QR steps until the last subdiagonal entry is negligible:
        while( abs(A[n,n-1]) > 1e-10 )
            mu = WilkinsonShift( A[n-1,n-1], A[n,n-1], A[n,n] )
            # This QR factorization should exploit the Hessenberg structure:
            (Q,R) = qr(A - mu*I)
            # This line needs speeding up, currently O(n^3) operations!:
            A = R*Q + mu*I
        end
        # Deflate and recurse:
        myeigs = [A[n,n] ; QRwithShifts( A[1:n-1, 1:n-1] )]
    end
    return myeigs
end
    
QRwithShifts (generic function with 1 method)
For symmetric matrices, Wilkinson shifts and the shifted QR algorithm will always converge (in exact arithmetic). For example,
n = 100
A = randn(n,n); A = A'*A
er = QRwithShifts(A)
sort(er) ≈ sort(eigvals(A))

true

The QR algorithm on nonsymmetric matrices

Despite there being no general convergence proof for shifted versions of the QR algorithm on nonsymmetric matrices, there is not a single example I know of (at the time of writing) for which the current QR algorithm does not converge. It is extremely robust nowadays and should be used with confidence.

The man behind the QR algorithm

John Francis is an English computer scientist who invented the QR algorithm in 1961. A year later he left the field of numerical analysis and had no idea of the impact of his work. In 2007, Gene Golub and Frank Uhlig managed to contact him. He was the opening speaker at a mini-symposium that marked 50 years of the QR algorithm, held at the 23rd Biennial Conference on Numerical Analysis in Glasgow in June 2009.