tl;dr They reported a condition number, but not necessarily the right condition number for this problem: there is more than one, and they can differ enormously.
This is specific to the matrix and the right-hand side vector. If you look at the documentation for *getrs, it says the forward error bound is

$$\frac{\|x - x_0\|_\infty}{\|x\|_\infty} \lesssim \mathrm{cond}(A, x)\, u \le \mathrm{cond}(A)\, u.$$
Here $\mathrm{cond}(A, x)$ is not quite the usual condition number $\kappa_\infty(A)$, but rather

$$\mathrm{cond}(A, x) = \frac{\|\,|A^{-1}|\,|A|\,|x|\,\|_\infty}{\|x\|_\infty}, \qquad \mathrm{cond}(A) = \|\,|A^{-1}|\,|A|\,\|_\infty,$$

where the absolute values inside the norms are taken componentwise. See, for example, Iterative refinement for linear systems and LAPACK by Higham, or Higham's Accuracy and Stability of Numerical Algorithms (Section 7.2).
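Both quantities are easy to evaluate directly from these definitions once you form $A^{-1}$ explicitly. A minimal sketch (skeel_cond is just a name I'm using here, not a library function):

using LinearAlgebra

# cond(A, x) = ‖ |A⁻¹| |A| |x| ‖_∞ / ‖x‖_∞   (componentwise / Skeel-style)
skeel_cond(A, x) = norm(abs.(inv(A)) * abs.(A) * abs.(x), Inf) / norm(x, Inf)

# cond(A) = ‖ |A⁻¹| |A| ‖_∞, the x-independent upper bound
skeel_cond(A) = opnorm(abs.(inv(A)) * abs.(A), Inf)

Note that $\mathrm{cond}(A) \le \kappa_\infty(A)$ always holds, because the $\infty$-norm only sees absolute values; the interesting question is how much smaller it can be.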
For your example, I took a pseudospectral differential operator for a similar problem with $n = 128$, and there is in fact a big difference between $\|\,|A^{-1}|\,|A|\,\|_\infty$ and $\kappa_\infty(A)$: I computed $7 \times 10^3$ and $2.6 \times 10^7$, respectively. That gap is enough to explain the observation that this happens for all right-hand sides, because the orders of magnitude roughly match what is seen in Table 3.1 (errors better by 3-4 orders of magnitude). The same experiment with just a random ill-conditioned matrix does not reproduce this, so it has to be a property of $A$.
An explicit example for which the two condition numbers don't match, due to Kahan, which I took from Higham (Problem 7.17, p. 124), is

$$A = \begin{pmatrix} 2 & -1 & 1 \\ -1 & \epsilon & \epsilon \\ 1 & \epsilon & \epsilon \end{pmatrix}, \qquad b = \begin{pmatrix} 2 + 2\epsilon \\ -\epsilon \\ \epsilon \end{pmatrix}.$$
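Plugging this pair into the definitions above makes the gap visible (a quick check; $\epsilon = 10^{-8}$ is my arbitrary choice, and the exact solution $(\epsilon, -1, 1)$ follows by hand):

using LinearAlgebra

ϵ = 1e-8                        # any small ϵ shows the effect
A = [2.0 -1 1; -1 ϵ ϵ; 1 ϵ ϵ]
b = [2 + 2ϵ, -ϵ, ϵ]
x = A \ b                       # exact solution is (ϵ, -1, 1)

κ   = cond(A, Inf)              # grows like 2/ϵ
cAx = norm(abs.(inv(A)) * abs.(A) * abs.(x), Inf) / norm(x, Inf)
@show κ cAx                     # cond(A, x) stays O(1) (≈ 5/2) for this b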
Another example I found is just the plain Vandermonde matrix on [1:10] with random b. I also went through MatrixDepot.jl, and some other ill-conditioned matrices produce this type of result as well, like triw and moler (see the quick scan below).
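For instance, something along these lines (the size 16 is my arbitrary pick):

using MatrixDepot, LinearAlgebra, Printf

for name in ("triw", "moler")
    A = matrixdepot(name, 16)
    κ = cond(A, Inf)                           # normwise condition number
    c = opnorm(abs.(inv(A)) * abs.(A), Inf)    # componentwise cond(A)
    @printf "%-6s κ = %.3e   cond(A) = %.3e\n" name κ c
end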
Essentially, what's going on is that when you analyze the stability of solving linear systems with respect to perturbations, you first have to specify which perturbations you are considering. When solving linear systems with LAPACK, this error bound considers componentwise perturbations in $A$, but no perturbation in $b$. So this is different from the usual $\kappa(A) = \|A^{-1}\|\,\|A\|$, which considers normwise perturbations in both $A$ and $b$.
Consider (as a counterexample) also what would happen if you didn't make this distinction. We know that iterative refinement in double precision (see the link above) achieves the best possible forward relative error of $O(u)$ for matrices with $\kappa(A) \ll 1/u$. So if it were true that linear systems can't be solved to accuracy better than $\kappa(A)\,u$, how could refining a solution possibly work?
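For concreteness, here is a minimal sketch of that refinement loop, with the residual accumulated in BigFloat as a stand-in for extended precision (the function name and step count are my choices; the factorization of $A$ is reused, as in practice):

using LinearAlgebra

function refine(A, b; steps=2)
    F = lu(A)                             # factor once in working precision
    x = F \ b
    for _ in 1:steps
        r = big.(b) - big.(A) * big.(x)   # residual in extended precision
        x += F \ Float64.(r)              # correction in working precision
    end
    return x
end

With $\kappa(A) \ll 1/u$, the refined $x$ converges to the correctly rounded solution, which is the point of the argument above.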
P.S. It matters that ?getrs says the computed solution is the true solution of $(A + E)\,x = b$ with a perturbation $E$ in $A$, but no perturbation in $b$. Things would be different if perturbations were allowed in $b$.
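This perturbation model can be checked numerically: by the Oettli-Prager theorem, the smallest $\omega$ such that the computed $\hat{x}$ exactly solves $(A + E)\hat{x} = b$ with $|E| \le \omega |A|$ and $b$ untouched is $\omega = \max_i |r_i| / (|A|\,|\hat{x}|)_i$ with $r = b - A\hat{x}$. A sketch (the function name is mine; rows with zero denominator are skipped):

using LinearAlgebra

# Smallest ω with (A+E)x̂ = b, |E| ≤ ω|A|, and b unperturbed (Oettli-Prager)
function backward_error_A_only(A, x̂, b)
    r = b - A * x̂
    d = abs.(A) * abs.(x̂)
    maximum((abs(r[i]) / d[i] for i in eachindex(r) if d[i] > 0); init=0.0)
end

Whenever this $\omega$ is of order $u$, the forward error bound $\mathrm{cond}(A, x)\,u$ above applies.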
Edit Here is a more direct demonstration, in code, that this is not a fluke or a matter of luck, but rather the (unusual) consequence of two condition numbers being very different for some specific matrices, i.e.,

$$\mathrm{cond}(A, x) \approx \mathrm{cond}(A) \ll \kappa(A).$$
using MatrixDepot, LinearAlgebra, Printf

function main2(m=128)
    # squared Chebyshev spectral differentiation matrix, with
    # boundary conditions imposed on the first and last rows
    A = matrixdepot("chebspec", m)^2
    A[1,:] .= 0; A[end,:] .= 0
    A[1,1] = A[end,end] = 1
    best, worst = Inf, -Inf
    for k = 1:2^5
        b = randn(m)
        x = A \ b
        x_exact = Float64.(big.(A) \ big.(b))   # reference solution via BigFloat
        err = norm(x - x_exact, Inf) / norm(x_exact, Inf)
        best, worst = min(best, err), max(worst, err)
    end
    @printf "Best relative error: %.3e\n" best
    @printf "Worst relative error: %.3e\n" worst
    @printf "Predicted error κ(A)*ε: %.3e\n" cond(A, Inf)*eps()
    @printf "Predicted error cond(A)*ε: %.3e\n" opnorm(abs.(inv(A))*abs.(A), Inf)*eps()
end
julia> main2()
Best relative error: 2.156e-14
Worst relative error: 2.414e-12
Predicted error κ(A)*ε: 8.780e-09
Predicted error cond(A)*ε: 2.482e-12
Edit 2 Here is another example of the same phenomenon, where the different condition numbers unexpectedly differ by a lot. This time,

$$\mathrm{cond}(A, x) \ll \mathrm{cond}(A) \approx \kappa(A).$$

Here $A$ is the $10 \times 10$ Vandermonde matrix on 1:10. When $x$ is chosen randomly, $\mathrm{cond}(A, x)$ is noticeably smaller than $\kappa(A)$, and the worst-case $x$ is given by $x_i = i^a$ for some $a$.
using MatrixDepot, LinearAlgebra, Printf

function main4(m=10)
    A = matrixdepot("vand", m)
    F = lu(A)               # Float64 factorization (renamed from lu to avoid shadowing)
    F_big = lu(big.(A))     # BigFloat factorization for the reference solution
    AA = abs.(inv(A))*abs.(A)
    for k = 1:12
        # b = randn(m)      # good case
        b = (1:m).^(k-1)    # worst case
        x, x_exact = F \ b, F_big \ big.(b)
        err = norm(x - x_exact, Inf) / norm(x_exact, Inf)
        predicted = norm(AA*abs.(x), Inf)/norm(x, Inf)*eps()
        @printf "relative error[%2d] = %.3e (predicted cond(A,x)*ε = %.3e)\n" k err predicted
    end
    @printf "predicted κ(A)*ε = %.3e\n" cond(A)*eps()
    @printf "predicted cond(A)*ε = %.3e\n" opnorm(AA, Inf)*eps()
end
Average case (errors almost 9 orders of magnitude better than the $\kappa(A)\,u$ prediction):
julia> T.main4()
relative error[1] = 6.690e-11 (predicted cond(A,x)*ε = 2.213e-10)
relative error[2] = 6.202e-11 (predicted cond(A,x)*ε = 2.081e-10)
relative error[3] = 2.975e-11 (predicted cond(A,x)*ε = 1.113e-10)
relative error[4] = 1.245e-11 (predicted cond(A,x)*ε = 6.126e-11)
relative error[5] = 4.820e-12 (predicted cond(A,x)*ε = 3.489e-11)
relative error[6] = 1.537e-12 (predicted cond(A,x)*ε = 1.729e-11)
relative error[7] = 4.885e-13 (predicted cond(A,x)*ε = 8.696e-12)
relative error[8] = 1.565e-13 (predicted cond(A,x)*ε = 4.446e-12)
predicted κ(A)*ε = 4.677e-04
predicted cond(A)*ε = 1.483e-05
Worst case ($a = 1, \ldots, 12$):
julia> T.main4()
relative error[ 1] = 0.000e+00 (predicted cond(A,x)*ε = 6.608e-13)
relative error[ 2] = 1.265e-13 (predicted cond(A,x)*ε = 3.382e-12)
relative error[ 3] = 5.647e-13 (predicted cond(A,x)*ε = 1.887e-11)
relative error[ 4] = 8.895e-74 (predicted cond(A,x)*ε = 1.127e-10)
relative error[ 5] = 4.199e-10 (predicted cond(A,x)*ε = 7.111e-10)
relative error[ 6] = 7.815e-10 (predicted cond(A,x)*ε = 4.703e-09)
relative error[ 7] = 8.358e-09 (predicted cond(A,x)*ε = 3.239e-08)
relative error[ 8] = 1.174e-07 (predicted cond(A,x)*ε = 2.310e-07)
relative error[ 9] = 3.083e-06 (predicted cond(A,x)*ε = 1.700e-06)
relative error[10] = 1.287e-05 (predicted cond(A,x)*ε = 1.286e-05)
relative error[11] = 3.760e-10 (predicted cond(A,x)*ε = 1.580e-09)
relative error[12] = 3.903e-10 (predicted cond(A,x)*ε = 1.406e-09)
predicted κ(A)*ε = 4.677e-04
predicted cond(A)*ε = 1.483e-05
Edit 3 Another example is the Forsythe matrix, which is a perturbed Jordan block of any size, of the form

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ \epsilon & 0 & 0 & 0 \end{pmatrix}.$$

This has $\|A\|_\infty = 1$ and $\|A^{-1}\|_\infty = \epsilon^{-1}$, so $\kappa_\infty(A) = \epsilon^{-1}$, but $|A^{-1}| = A^{-1} = |A|^{-1}$, hence $|A^{-1}|\,|A| = I$ and $\mathrm{cond}(A) = 1$. And as can be verified by hand, solving linear systems $Ax = b$ with pivoting is extremely accurate, despite the potentially unbounded $\kappa_\infty(A)$. So this matrix too will yield unexpectedly precise solutions.
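A quick numerical check ($n$ and $\epsilon$ below are arbitrary choices of mine):

using LinearAlgebra

n, ϵ = 8, 1e-12
A = diagm(1 => ones(n-1)); A[n, 1] = ϵ     # perturbed Jordan block
cond(A, Inf)                               # = 1/ϵ, huge
opnorm(abs.(inv(A)) * abs.(A), Inf)        # = 1, since |A⁻¹| |A| = I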
Edit 4 Kahan matrices are also like this, with $\mathrm{cond}(A) \ll \kappa(A)$:

A = matrixdepot("kahan", 48)
κ, c = cond(A, Inf), opnorm(abs.(inv(A)) * abs.(A), Inf)
@printf "κ=%.3e c=%.3e ratio=%g\n" κ c (c/κ)
κ=8.504e+08 c=4.099e+06 ratio=0.00482027