The QR algorithm
The QR algorithm is the algorithm that has been employed for the last 50 years to compute eigenvalues and eigenvectors.
Before 1961
Before 1961, one bad way to compute the eigenvalues of a matrix $A$ was to find the roots of its characteristic polynomial $p(x) = \det(A - xI)$, i.e., to form the coefficients of $p$ and then compute the roots of $p$ from those coefficients (roots that cannot be computed accurately in floating point). Finding
eigenvalues of matrices via roots of polynomials is --- regardless of
the algorithmic details --- a numerically unstable eigensolver.
# Find eigenvalues of $A$ via roots of the characteristic polynomial:
using Polynomials
n = 10
A = randn(n,n)
p(x) = det(A-x*eye(n))
x = collect(linspace(-1,1,n+1))
cfs = polyfit(x, map(p,x))
norm( sort(real(roots(cfs))) - sort(real(eigvals(A))), Inf)
The unshifted QR algorithm
John Francis' idea in 1961 for computing the eigenvalues of $A$ is surprisingly simple: compute a QR factorization of $A$, then reverse the factors, then compute the QR factorization of the result, before reversing the factors, and so on. That is, set $A_0 = A$ and, for $k = 1, 2, \ldots$, compute $A_{k-1} = Q_k R_k$ and set $A_k = R_k Q_k$.
It turns out that the matrices in the sequence $A_0, A_1, A_2, \ldots$ all have the same eigenvalues and, for any large integer $k$, $A_k$ is usually close to an upper-triangular matrix whose diagonal entries are the eigenvalues of $A$. For example, below we take a random $100 \times 100$ matrix and plot the sparsity pattern of the matrix $A_{1000}$.
# unshifted QR algorithm:
using PyPlot
A = rand(100,100);
for k = 1:1000
(Q,R) = qr(A)
A = R*Q
end
spy(abs(A).>1e-4)
After 1000 steps we find that $A_{1000}$ is essentially upper-triangular: almost all of its entries below the diagonal have magnitude below $10^{-4}$.
A word of warning
The sequence $A_0, A_1, \ldots$ computed by the unshifted QR algorithm above does not always converge. For example, consider the matrix
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}.$$
This matrix is orthogonal, so its QR factorization is $Q = A$, $R = I$, and hence $A_1 = RQ = A_0$: the unshifted QR algorithm stagnates. Below, we will fix that with
Wilkinson shifts. (Note that Rayleigh quotient shifts do not fix this
example.)
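Here is a quick numerical check of that stagnation (a minimal sketch of a single unshifted QR step on this matrix):
# One step of the unshifted QR algorithm on the 2x2 exchange matrix:
A = [0.0 1.0; 1.0 0.0]
(Q,R) = qr(A)
R*Q ≈ A    # true: A_1 = A_0, so the iteration never makes progress
# (The Rayleigh quotient shift would be A[2,2] = 0, which does not help.)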
Why does the sequence $A_0, A_1, \ldots$ have the same eigenvalues?
Answer: Similar matrices. Two matrices $A$ and $B$ are similar if $B = P^{-1} A P$ for some invertible matrix $P$, and similar matrices have the same eigenvalues. Since $A_k = R_k Q_k = Q_k^T (Q_k R_k) Q_k = Q_k^T A_{k-1} Q_k$, each iterate is similar to the previous one, and hence every matrix in the sequence has the same eigenvalues as $A$.
A = rand(10,10)
A0 = A
e0 = eigvals(A0) # Eigenvalues of original matrix
(Q,R) = qr(A0)
e1 = eigvals(R*Q) # Eigenvalues of A_1
# Same real and imag part:
norm( sort(real(e0)) - sort(real(e1)) ), norm( sort(imag(e0)) - sort(imag(e1)) )
Why does the sequence $A_0, A_1, \ldots$ usually converge to an upper-triangular matrix?
The secret to why the QR algorithm produces iterates that usually converge to reveal the eigenvalues is the fact that the algorithm is a well-disguised (successive) power method. A very first idea to calculate eigenvalues might be to perform the power iteration on a basis $x_1, \ldots, x_n$ of $\mathbb{R}^n$, i.e., to compute $A^k x_1, \ldots, A^k x_n$ for some large $k$. The problem is that the vectors $A^k x_1, \ldots, A^k x_n$ all tend to be close to a multiple of the dominant eigenvector of $A$, so the computed basis is numerically useless.
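Here is a small experiment illustrating the problem (a sketch; the symmetric test matrix and the 200 steps are my own arbitrary choices): after repeatedly applying $A$ and normalizing, every column of the basis has collapsed onto the dominant eigenvector.
# Power iteration applied to a whole basis at once:
A = randn(6,6); A = A'*A    # a symmetric test matrix
X = eye(6)                  # start from the standard basis e_1,...,e_6
for k = 1:200
X = A*X
for j = 1:6
X[:,j] = X[:,j]/norm(X[:,j])    # normalize each column separately
end
end
minimum(abs(X'*X))    # ≈ 1: every pair of columns is (nearly) parallel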
The QR algorithm based on Gram-Schmidt and the power iteration
Let $q_1$ be the dominant eigenvector of $A$ computed by the power iteration. To avoid the collapse described above, we can compute a second vector $q_2$ by running the power iteration again while projecting out the component along $q_1$ at every step (a Gram-Schmidt step), then a third vector $q_3$ by projecting out the components along $q_1$ and $q_2$, and so on. In exact arithmetic this works, but the Gram-Schmidt projections are numerically unstable and the computed vectors may not be orthonormal numerically. To try it out, we implement this algorithm below for symmetric matrices so that the Schur vectors are in fact eigenvectors:
function OrthogonalProjection( v::Vector, Q::Matrix )
# Project v onto the orthogonal complement of the columns of Q (a Gram-Schmidt step):
m = size(Q,2)
for k = 1:m
c = dot(Q[:,k], v)
for j = 1:size(v,1)
v[j] -= c*Q[j,k]
end
end
return v
end
function OrthogonalPowerIteration( A::Matrix, v::Vector, Q::Matrix )
# Power iteration with A, projecting out the columns of Q at every step:
m = size(Q,2)
for k = 1:1000
v = OrthogonalProjection( A*v, Q )
v = v/norm(v)
end
return v
end
function OrthogonalPowerIteration( A::Matrix, v::Vector )
# Plain power iteration with A (no projection):
for k = 1:1000
v = A*v
v = v/norm(v)
end
return v
end
function GramSchmidtBasedQRAlgorithm( A::Matrix )
# Use Gram-Schmidt and power iteration to compute the eigenvalues of A:
n = size(A,1)
v = rand(n)
Q = (OrthogonalPowerIteration( A, v )')'   # store the first vector as an n-by-1 matrix
for k = 2:n
v = rand( n )
v = OrthogonalPowerIteration( A, v, Q )
Q = hcat(Q, v)
end
# Estimate the eigenvalues from the element-wise ratios (A*Q)./Q, averaged down each column:
return sort(mean((A*Q)./Q,1)[:],rev=false)
end
Here, we test this code to witness the numerical instability:
# Random matrix:
srand(1234)
A = rand(40,40)
A = A'*A
# True eigenvectors and eigenvalues.
(Λ,S) = eig( A )
# Compute eigenvalues of A via power iteration and Gram-Schmidt:
λ = GramSchmidtBasedQRAlgorithm( A )
norm( Λ - λ )
Orthogonal iteration
Of course, we can greatly improve the situation by using a more stable orthogonalization (modified Gram-Schmidt, or a full QR factorization as below) together with the power method, where we do not wait for the vectors to converge before orthogonalizing: we orthogonalize the whole basis at every step. This is called orthogonal iteration. Starting with an orthogonal matrix $Q_0$, we compute the QR factorization $Q_k R_k = A Q_{k-1}$ for $k = 1, 2, \ldots$. The columns of $Q_k$ here are computed so that $Q_k^T A Q_k$ converges to a matrix that
is upper-triangular.
function OrthogonalIteration( A::Matrix )
# Orthogonal iteration on A:
n = size(A,1)
(Q,R) = qr(rand(n,n))
for k = 1:10000
(Q,R) = qr(A*Q)
end
return Q
end
# For orthogonal iteration U^TAU ---> triangular: (This may not
# always work because we don't have shifts...)
A = rand(3,3)
U = OrthogonalIteration( A )
triu(U'*A*U) ≈ U'*A*U
Orthogonal iteration to QR algorithm
One can view the QR algorithm as a well-designed version of orthogonal iteration with $Q_0 = I$. In fact, if $\bar{Q}_k = Q_1 Q_2 \cdots Q_k$ denotes the product of the orthogonal factors computed in the first $k$ steps of the QR algorithm, then $\bar{Q}_k$ is precisely the orthogonal matrix produced at the $k$th step of orthogonal iteration started with $Q_0 = I$ (with consistent sign conventions in the QR factorizations), and $A_k = \bar{Q}_k^T A \bar{Q}_k$.
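As a sanity check, we can verify the two identities behind this connection, $A^k = \bar{Q}_k (R_k \cdots R_1)$ and $A_k = \bar{Q}_k^T A \bar{Q}_k$, numerically (a minimal sketch; the matrix, the variable names, and the number of steps are my own choices):
# The QR algorithm is a disguised power method: accumulate the Q's and R's.
A = randn(5,5)
Ak = copy(A)
Qbar = eye(5); Rprod = eye(5)
k = 8
for j = 1:k
(Q,R) = qr(Ak)
Ak = R*Q
Qbar = Qbar*Q      # Qbar = Q_1*Q_2*...*Q_j
Rprod = R*Rprod    # Rprod = R_j*...*R_1
end
norm(A^k - Qbar*Rprod), norm(Qbar'*A*Qbar - Ak)    # both tiny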
A proof of convergence
Here, we prove the following convergence theorem with lots of assumptions to make the proof as easy as possible. Afterwards, we write down two ways to make the theorem stronger.
Theorem: Let $A$ be an $n \times n$ symmetric positive definite matrix with distinct eigenvalues, and suppose that $Q^T$ has an LU factorization $Q^T = LU$, where $Q$ is the orthogonal matrix of eigenvectors of $A$. Then the iterates $A_k$ of the unshifted QR algorithm converge to a diagonal matrix containing the eigenvalues of $A$.
Proof: For simplicity here we will start by assuming that every computed QR factorization of a matrix involves an upper-triangular matrix $R$ with nonnegative diagonal entries. This ensures that the QR factorization is unique (see class exercises). Let $A = Q \Lambda Q^T$ be an eigenvalue decomposition of $A$ with $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $\lambda_1 > \cdots > \lambda_n > 0$. The key ingredients are the identity $A^k = \bar{Q}_k (R_k \cdots R_1)$ recalled in the previous section and the factorization $A^k = Q \Lambda^k Q^T = Q \Lambda^k L U$: since $\Lambda^k L \Lambda^{-k} \to I$ as $k \to \infty$, the uniqueness of the QR factorization forces $\bar{Q}_k$ to converge (up to column signs) to $Q$, and hence $A_k = \bar{Q}_k^T A \bar{Q}_k$ converges to the diagonal matrix $\Lambda$.
There are two ways to improve the theorem above and the reader may like to try them:
- Keep the assumption that $Q^T$ has an LU factorization $Q^T = LU$, but relax the remaining assumptions on $A$ (for example, drop the requirement that $A$ is symmetric, in which case the iterates converge to an upper-triangular rather than diagonal matrix).
- Remove the assumption that $A$ is positive definite, but keep the assumption that every eigenvalue is distinct.
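The conclusion of the theorem is easy to test numerically. Here is a small illustration (a sketch; the random symmetric positive definite test matrix is my own choice and, with probability 1, satisfies the assumptions):
# Unshifted QR on a symmetric positive definite matrix approaches a diagonal matrix:
B = randn(5,5); A0 = B'*B    # symmetric positive definite test matrix
for k = 1:2000
(Q,R) = qr(A0)
A0 = R*Q
end
# The off-diagonal part decays (slowly if two eigenvalues are close):
norm(A0 - diagm(diag(A0))), norm(sort(diag(A0)) - sort(eigvals(B'*B)))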
Computational considerations
While the basic QR algorithm can be used to compute eigenvalues, it is (1) computationally expensive, requiring $\mathcal{O}(n^3)$ operations per iteration, and (2) capable of painfully slow convergence, depending on the eigenvalues of $A$. The next two ideas, reduction to upper-Hessenberg form and shifts, address these two problems in turn.
Reduction to a similar upper-Hessenberg matrix
The idea here is to reduce the matrix $A$ to an upper-Hessenberg matrix $H$ (a matrix with zeros below its first subdiagonal) that is similar to $A$, using Householder reflections (see class lecture notes). A QR step on an upper-Hessenberg matrix costs only $\mathcal{O}(n^2)$ operations, and the iterates stay upper-Hessenberg. Here is an implementation of the Hessenberg reduction algorithm:
function HessenbergReduction( A::Matrix )
# Reduce A to an upper-Hessenberg matrix H so that A and H are similar:
n = size(A, 1)
H = copy(A)
if ( n > 2 )
# Householder reflector that zeros out H[3:n,1]:
a1 = H[2:n, 1]
e1 = zeros(n-1); e1[1] = 1
sgn = (a1[1] >= 0 ? 1.0 : -1.0)
v = (a1 + sgn*norm(a1)*e1); v = v./norm(v)
Q1 = eye(n-1) - 2*(v*v')
# Apply the similarity transformation [1 0; 0 Q1]*H*[1 0; 0 Q1]':
H[2:n,1] = Q1*H[2:n,1]
H[1,2:n] = Q1*H[1,2:n]
H[2:n,2:n] = Q1*H[2:n,2:n]*Q1'
# Recurse on the trailing (n-1)-by-(n-1) block and keep the result:
H[2:n,2:n] = HessenbergReduction( H[2:n,2:n] )
end
return H
end
Here, we check that the Hessenberg reduction works: $H$ is numerically upper-Hessenberg and has the same eigenvalues as $A$.
A = rand(4,4)
e = eigvals(A)
H = HessenbergReduction( A )
norm(tril(H,-2)) < 1e-13, sort(real(e)) ≈ sort(real(eigvals(H)))
After this, our faster QR algorithm will look like this:
Set $A_0 = Q_H^T A Q_H$ = upper-Hessenberg
for k = 1,2,... (until convergence)
    Compute $A_{k-1} = Q_k R_k$
    Set $A_k = R_k Q_k$
end
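Here is a quick numerical check (a small sketch reusing the HessenbergReduction routine above) that one QR step preserves the upper-Hessenberg structure, which is what makes the cheaper iterations possible:
# A QR step applied to an upper-Hessenberg matrix stays upper-Hessenberg:
A = randn(6,6)
A0 = HessenbergReduction( A )    # upper-Hessenberg and similar to A
(Q,R) = qr(A0)
A1 = R*Q    # one unshifted QR step
norm(tril(A0,-2)), norm(tril(A1,-2))    # both at roundoff level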
QR algorithm with shifts
The unshifted QR algorithm with Hessenberg reduction only costs $\mathcal{O}(n^2)$ operations per iteration, but it can still take a painfully large number of iterations to converge. Shifts, together with deflation, are used to speed up convergence.
Deflation
Notice that if $A_k$ --- which is converging to upper-triangular --- had the form (notice the bold zero)
$$A_k = \begin{pmatrix} \ast & \cdots & \ast & \ast \\ & \ddots & \vdots & \vdots \\ & & \ast & \ast \\ & & \mathbf{0} & a \end{pmatrix},$$
then $a$ would be an eigenvalue of $A_k$, and hence of $A$, and the remaining eigenvalues of $A$ would be the eigenvalues of the leading $(n-1) \times (n-1)$ principal submatrix. Thus, if we happen to be lucky enough for the $(n,n-1)$ entry of $A_k$ to become close to zero, then we have approximately found one of the eigenvalues of $A$ and can continue working with the leading $(n-1) \times (n-1)$ block. This is called deflation.
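As a small illustration (a contrived $3 \times 3$ example of my own, not one of the QR iterates above):
# Deflation: a negligible (3,2) entry splits the eigenvalue problem.
T = [4.0 1.0 2.0; 0.0 3.0 1.0; 0.0 1e-12 1.0]
# T[3,3] is an eigenvalue to within about 1e-12, and the remaining
# eigenvalues are those of the leading 2x2 block:
norm( sort(eigvals(T)) - sort([eigvals(T[1:2,1:2]); T[3,3]]) )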
The shifted QR algorithm
Instead of waiting for luck to allow for a deflation, Francis used a shifted version of the QR algorithm that is motivated by its origins from the power method. The shifted QR algorithm looks like this:
Set $A_0 = Q_H^T A Q_H$ = upper-Hessenberg
for k = 1,2,... (until convergence)
    Pick a shift $\mu_k$
    Compute $A_{k-1} - \mu_k I = Q_k R_k$
    Set $A_k = R_k Q_k + \mu_k I$
end
A good shift makes the $(n,n-1)$ entry of $A_k$ converge to zero quickly. A reasonable choice of the shift is the Rayleigh quotient $\mu_k = A_{k-1}[n,n]$, i.e., the last diagonal entry of the current iterate.
The problem with the Rayleigh quotient shift is: it does not always work. Consider, for example, the matrix $A = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$ from before: the Rayleigh quotient shift is $\mu = A[2,2] = 0$, so the shifted algorithm stagnates exactly like the unshifted one. Wilkinson's shift avoids this problem by taking $\mu_k$ to be the eigenvalue of the trailing $2 \times 2$ principal submatrix of $A_{k-1}$ that is closest to $A_{k-1}[n,n]$.
function WilkinsonShift( a::Number, b::Number, c::Number )
# Calculate Wilkinson's shift for symmetric matrices, where the trailing
# 2x2 block of the matrix is [a b; b c]:
δ = (a-c)/2
s = (δ >= 0 ? 1.0 : -1.0)   # sign(δ), with the convention sign(0) = 1
return c - s*b^2/(abs(δ) + sqrt(δ^2+b^2))
end
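As a quick check (using the sign convention above), on the troublesome $2 \times 2$ matrix from earlier the Wilkinson shift is an exact eigenvalue, so a single shifted QR step diagonalizes it:
# One shifted QR step on A = [0 1; 1 0] with the Wilkinson shift:
A = [0.0 1.0; 1.0 0.0]
mu = WilkinsonShift( A[1,1], A[1,2], A[2,2] )    # equals -1, an exact eigenvalue of A
(Q,R) = qr(A - mu*eye(2))
R*Q + mu*eye(2)    # ≈ [1 0; 0 -1]: deflation after one step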
function QRwithShifts( A::Matrix )
# The QR algorithm for symmetric A with Wilkinson shifts and Hessenberg reduction.
# Please use eigvals() in Julia for serious applications.
n = size(A,1)
myeigs = zeros(n)
if ( n == 1 )
myeigs[1] = A[1,1]
else
I = eye( n )
# Reduction to Hessenberg form:
A = HessenbergReduction( A )
# Run the shifted QR algorithm until the (n,n-1) entry is negligible:
while( norm(A[n,n-1]) > 1e-10 )
mu = WilkinsonShift( A[n-1,n-1], A[n-1,n], A[n,n] )
# This QR factorization should exploit the Hessenberg structure (e.g., Givens rotations) to cost O(n^2):
(Q,R) = qr(A - mu*I)
# This line needs speeding up, currently O(n^3) operations!:
A = R*Q + mu*I
end
# Deflation and recurse:
myeigs = [A[n,n] ; QRwithShifts( A[1:n-1, 1:n-1] )]
end
return myeigs
end
For symmetric matrices, the shifted QR algorithm with Wilkinson shifts always converges (in exact arithmetic). For example,
n = 100
A = randn(n,n); A = A'*A
er = QRwithShifts(A)
sort(er) ≈ sort(eigvals(A))