## Franklin's Notes

### Householder triangularization

Householder triangularization is a technique for computing a QR Factorization of a matrix $A$, which can be described as an alternative to the Gram-Schmidt process . Whereas Gram-Schmidt applies a sequence of elementary triangular matrices to $A$ in order to reduce it to a unitary matrix ... ...Householder applies a sequence of elementary unitary matrices to $A$ in order to reduce it to a triangular matrix: Trefethen and Bau sums this up succinctly by describing Gram-Schmidt as triangular orthogonalization and Householder as orthogonal triangularization. The matrices $Q_i$ are reflection matrices designed in such a way that $Q_j\cdots Q_1 A$ has zeroes under the $j$th diagonal entry, and applying each $Q_i$ eliminates the nonzero entries under the $i$th diagonal entry without affecting the entries in the rows above row $i$ or the zeroes below the diagonal in the previous columns.

In particular, the matrix $Q_i$ takes the form

where $I_i$ is the $(i-1)\times (i-1)$ identity matrix and $F_i$ is an $(n-i+1)\times (n-i+1)$ matrix. In order for $Q_i$ to eliminate nonzero entries below the $i$th diagonal entry, we would need $F$ to transform $(n-i+1)\times 1$ vectors as follows:

where the topmost entry of the image is required to be $\pm\lVert x\rVert$ because $F$ must be unitary and therefore preserve length. If $F$ reflects $x$ across the hyperplane $H$ in such a way that $x\mapsto \pm\lVert x\rVert e_1$, then $\pm\lVert x\rVert e_1 - x$ must be orthogonal to the hyperplane $H$. Thus, if $v = \pm\lVert x\rVert e_1 - x$, then the following matrix $P$ is a projector that projects vectors onto $H$:

and therefore the reflector $F_i$ that we desire is given by

Recall that we had two possible choices for $v$: either $+\lVert x\rVert e_1 - x$ or $-\lVert x\rVert e_1 - x$. If we want the best numerical stability possible, we should choose the value of the sign that maps $x$ as far as possible under the reflection $F_i$, in order to minimize the relative size of any roundoff errors that may occur. This means that we should choose the same sign as $x_i$, since $x_i + \text{sgn}(x_i)\lVert x\rVert$ is always greater (or equal) in magnitude compared to $x_i - \text{sgn}(x_i)\lVert x\rVert$. Thus, we should let

If we do not make this choice, catastrophic cancellation can occur when $x$ is very nearly parallel to the x-axis, so that $x\approx \lVert x \rVert e_1$ and $\lVert x \rVert e_1-x\approx 0$.