Get real! Solving a complex-valued linear system using real arithmetic

Consider a linear system of complex-valued equations of the kind {\mathbf{AX}=\mathbf{B}}, where {\mathbf{A},\mathbf{B}} are complex square and rectangular matrices, respectively, and {\mathbf{X}} is the unknown complex matrix.

To compute {\mathbf{X}}, a natural option is to take any algorithm available in the literature originally conceived for real systems, such as Gaussian elimination or LU/Cholesky decomposition, and translate its operations to complex arithmetic.

However, there may exist a couple of scenarios where we must or should do otherwise:

  1. Our favorite software is not complex-friendly. For instance, the IEEE-754 standard for floating-point arithmetic does not natively support complex numbers. Also, TensorFlow has limited support for complex numbers in graph (XLA) mode;
  2. The matrix of coefficients {\mathbf{A}=\mathbf{A}_{\mathrm{R}}+i \mathbf{A}_{\mathrm{I}}} is ill-conditioned, but its real and imaginary parts {\mathbf{A}_{\mathrm{R}},\mathbf{A}_{\mathrm{I}}}, respectively, are individually well-conditioned.

In such cases, one 1) must or 2) should first rewrite the problem in real arithmetic and then apply popular numerical linear algebra techniques operating in the real field.

1. From complex to real systems of equations

As a warm-up, we start by considering the simple case with just one (complex) unknown {x}:

\displaystyle a x = b

or equivalently,

\displaystyle (a_{\mathrm{R}}+i a_{\mathrm{I}})(x_{\mathrm{R}}+i x_{\mathrm{I}}) = b_{\mathrm{R}}+i b_{\mathrm{I}}

\displaystyle a_{\mathrm{R}} x_{\mathrm{R}} - a_{\mathrm{I}} x_{\mathrm{I}} + i(a_{\mathrm{I}} x_{\mathrm{R}} + a_{\mathrm{R}} x_{\mathrm{I}}) = b_{\mathrm{R}}+i b_{\mathrm{I}}
which, by equating the real and imaginary parts of the two sides, leads to the real system of equations:

\displaystyle \begin{bmatrix} a_{\mathrm{R}} & -a_{\mathrm{I}} \\ a_{\mathrm{I}} & a_{\mathrm{R}} \end{bmatrix} \begin{bmatrix} x_{\mathrm{R}} \\ x_{\mathrm{I}} \end{bmatrix} = \begin{bmatrix} b_{\mathrm{R}} \\ b_{\mathrm{I}} \end{bmatrix}.
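As a quick numerical sanity check, the scalar case can be solved through this 2×2 real system and compared against direct complex division (a NumPy sketch; the values of {a, b} are arbitrary):

```python
import numpy as np

# Solve a*x = b for a complex scalar x using only real 2x2 arithmetic,
# following the block system above. The chosen a, b are illustrative.
a, b = 2.0 + 1.0j, 1.0 - 3.0j

A2 = np.array([[a.real, -a.imag],
               [a.imag,  a.real]])
rhs = np.array([b.real, b.imag])
x_re, x_im = np.linalg.solve(A2, rhs)

x = x_re + 1j * x_im  # should agree with the direct complex solution b / a
```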

A similar expression applies to the general case {\mathbf{AX}=\mathbf{B}} with multiple variables and equations. In fact, if we rewrite {\mathbf{AX}=\mathbf{B}} as:

\displaystyle \sum_{n} A[m,n] X[n,\ell] = B[m,\ell], \qquad \forall\, \ell,m

and we express the real and imaginary parts of each term:

\displaystyle \sum_{n} (A_{\mathrm{R}}[m,n]+i A_{\mathrm{I}}[m,n]) (X_{\mathrm{R}}[n,\ell] + i X_{\mathrm{I}}[n,\ell]) = B_{\mathrm{R}}[m,\ell] + i B_{\mathrm{I}}[m,\ell] , \qquad \forall\, \ell,m

we then obtain:

\displaystyle \left\{ \begin{array}{ll} \sum_n A_{\mathrm{R}}[m,n] X_{\mathrm{R}}[n,\ell] - A_{\mathrm{I}}[m,n] X_{\mathrm{I}}[n,\ell] = B_{\mathrm{R}}[m,\ell] \\ \sum_n A_{\mathrm{I}}[m,n] X_{\mathrm{R}}[n,\ell] + A_{\mathrm{R}}[m,n] X_{\mathrm{I}}[n,\ell] = B_{\mathrm{I}}[m,\ell] \\ \end{array} \right. , \quad \forall \,\ell, m

which, rewritten in matrix form, takes on the shape of a real-valued linear system:

\displaystyle \begin{bmatrix} \mathbf{A}_{\mathrm{R}} & -\mathbf{A}_{\mathrm{I}} \\ \mathbf{A}_{\mathrm{I}} & \mathbf{A}_{\mathrm{R}} \end{bmatrix} \begin{bmatrix} \mathbf{X}_{\mathrm{R}} \\ \mathbf{X}_{\mathrm{I}} \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{\mathrm{R}} \\ \mathbf{B}_{\mathrm{I}} \end{bmatrix}. \ \ \ \ \ (1)
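The embedding (1) can be tried out directly (a NumPy sketch; the matrix sizes and random seed are arbitrary):

```python
import numpy as np

# Embed the complex system A X = B into the real block system (1) and solve it.
rng = np.random.default_rng(0)
n, k = 4, 2
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))

# [[A_R, -A_I], [A_I, A_R]] [[X_R], [X_I]] = [[B_R], [B_I]]
M = np.block([[A.real, -A.imag],
              [A.imag,  A.real]])
rhs = np.vstack([B.real, B.imag])
sol = np.linalg.solve(M, rhs)
X = sol[:n] + 1j * sol[n:]  # reassemble the complex unknown
```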

2. Expressing the unknown {\mathbf{X}}

To find an expression for {\mathbf{X}} we must invert the relation (1). In other words, we want to find those matrices { (\mathbf{A})^{-1}_{\mathrm{R}}, (\mathbf{A})^{-1}_{\mathrm{I}}} for which:

\displaystyle \begin{bmatrix} \mathbf{X}_{\mathrm{R}} \\ \mathbf{X}_{\mathrm{I}} \end{bmatrix} = \begin{bmatrix} (\mathbf{A})^{-1}_{\mathrm{R}} & - (\mathbf{A})^{-1}_{\mathrm{I}} \\ (\mathbf{A})^{-1}_{\mathrm{I}} & (\mathbf{A})^{-1}_{\mathrm{R}} \end{bmatrix} \begin{bmatrix} \mathbf{B}_{\mathrm{R}} \\ \mathbf{B}_{\mathrm{I}} \end{bmatrix}. \ \ \ \ \ (2)

The notation above is not misleading: { (\mathbf{A})^{-1}_{\mathrm{R}}, (\mathbf{A})^{-1}_{\mathrm{I}}} are indeed the real and imaginary parts of {\mathbf{A}^{-1}}! To convince ourselves, it is sufficient to compare (2) with (1).

Next, it is instructive, although a bit tedious, to find the expression of { (\mathbf{A})^{-1}_{\mathrm{R}}, (\mathbf{A})^{-1}_{\mathrm{I}}} ourselves. First, we express {\mathbf{X}_{\mathrm{R}}} as a function of {\mathbf{X}_{\mathrm{I}}} via the first block of equations in (1):

\displaystyle \mathbf{X}_{\mathrm{R}} = \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{A}_{\mathrm{I}} \mathbf{X}_{\mathrm{I}} + \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{B}_{\mathrm{R}}. \ \ \ \ \ (3)

We then replace (3) into the second block of equations in (1) and obtain:

\displaystyle (\mathbf{A}_{\mathrm{R}} + \mathbf{A}_{\mathrm{I}} \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{A}_{\mathrm{I}}) \mathbf{X}_{\mathrm{I}} = \mathbf{B}_{\mathrm{I}} - \mathbf{A}_{\mathrm{I}} \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{B}_{\mathrm{R}}

or equivalently, in matrix form:

\displaystyle \begin{bmatrix} \mathbf{X}_{\mathrm{R}} \\ \mathbf{X}_{\mathrm{I}} \end{bmatrix} = \begin{bmatrix} \dots & \dots \\ - \mathbf{S}^{-1} \mathbf{A}_{\mathrm{I}} \mathbf{A}_{\mathrm{R}}^{-1} & \mathbf{S}^{-1} \end{bmatrix} \begin{bmatrix} \mathbf{B}_{\mathrm{R}} \\ \mathbf{B}_{\mathrm{I}} \end{bmatrix} \ \ \ \ \ (4)


where {\mathbf{S} := (\mathbf{A}_{\mathrm{R}} + \mathbf{A}_{\mathrm{I}} \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{A}_{\mathrm{I}})} is the so-called Schur complement of block {\mathbf{A}_{\mathrm{R}}}.

A reader may wonder why we did not bother filling in the dots. The answer is that we simply do not need to! In fact, by comparing (4) with (2), it is already clear that:

\displaystyle (\mathbf{A})^{-1}_{\mathrm{R}} = \mathbf{S}^{-1} \ \ \ \ \ (5)

\displaystyle (\mathbf{A})^{-1}_{\mathrm{I}} = - \mathbf{S}^{-1} \mathbf{A}_{\mathrm{I}} \mathbf{A}_{\mathrm{R}}^{-1} = - \mathbf{A}_{\mathrm{R}}^{-1} \mathbf{A}_{\mathrm{I}} \mathbf{S}^{-1} \ \ \ \ \ (6)

We are finally ready to derive the solution {\mathbf{X}}. By equating the real and imaginary part of expression {\mathbf{X}=\mathbf{A}^{-1}\mathbf{B}} we obtain:

\displaystyle \mathbf{X}_{\mathrm{R}} = (\mathbf{A})^{-1}_{\mathrm{R}} \mathbf{B}_{\mathrm{R}} - (\mathbf{A})^{-1}_{\mathrm{I}} \mathbf{B}_{\mathrm{I}} \ \ \ \ \ (7)

\displaystyle \mathbf{X}_{\mathrm{I}} = (\mathbf{A})^{-1}_{\mathrm{I}} \mathbf{B}_{\mathrm{R}} + (\mathbf{A})^{-1}_{\mathrm{R}} \mathbf{B}_{\mathrm{I}} \ \ \ \ \ (8)

where { (\mathbf{A})^{-1}_{\mathrm{R}}} and { (\mathbf{A})^{-1}_{\mathrm{I}}} are defined in (5) and (6), respectively. The formulae above are generally referred to as Frobenius inversion.
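We can verify (5) and (6) numerically (a sketch; a random Gaussian matrix is used, so {\mathbf{A}_{\mathrm{R}}} is almost surely invertible):

```python
import numpy as np

# Numerical check of (5)-(6): the real and imaginary parts of inv(A)
# obtained via the Schur complement S match numpy's complex inverse.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
AR, AI = A.real, A.imag

S = AR + AI @ np.linalg.solve(AR, AI)               # Schur complement of A_R
invA_R = np.linalg.inv(S)                           # eq. (5)
invA_I = -np.linalg.inv(S) @ AI @ np.linalg.inv(AR) # eq. (6)
```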

3. How to actually compute {\mathbf{X}}?

The first method that may come to our mind to derive {\mathbf{X}_{\mathrm{I}},\mathbf{X}_{\mathrm{R}}} is to compute explicitly all the terms in (7), involving the (costly!) inversion of matrices {\mathbf{A}_{\mathrm{R}}} and {\mathbf{S}}. Yet, it is common numerical linear algebra knowledge that matrix inversion is to be avoided whenever possible, and replaced by more efficient and accurate methods originally developed for… solving linear systems, once again! In other words, instead of explicitly computing terms like {\mathbf{F}^{-1}\mathbf{G}}, one should rather solve the linear system {\mathbf{F}\mathbf{Y}=\mathbf{G}} for the unknown {\mathbf{Y}}.

But hey… wasn’t the solution of linear systems our problem in the first place? The answer is positive, but with a caveat: we have now succeeded in converting a problem in complex arithmetic, which we have trouble with, into one in real arithmetic, which we can solve by hypothesis.

For instance, a popular method for the solution of {\mathbf{F}\mathbf{Y}=\mathbf{G}} in the unknown real variable {\mathbf{Y}} is the so-called LU decomposition with partial pivoting, which we recall below.

  • i) Decompose {\mathbf{F}} as {\mathbf{F}=\mathbf{P}^T\mathbf{LU}}, where {\mathbf{P}} is a permutation matrix, {\mathbf{L}} is lower triangular (with unit diagonal) and {\mathbf{U}} is upper triangular
  • ii) Solve {\mathbf{LZ} = \mathbf{PG}} for the unknown {\mathbf{Z}}. This can be achieved for every column {\mathbf{z}} of matrix {\mathbf{Z}} by iteratively forward-substituting the result {z_i} obtained in the {i}-th equation into the ({i+1})-th equation (note that {z_1} is already available!)
  • iii) Solve {\mathbf{UY}=\mathbf{Z}} for the unknown {\mathbf{Y}} by backward-substitution (here, the last element of each column of {\mathbf{Y}} is known from the start).
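The three steps above can be sketched with SciPy's LU routines (note that scipy.linalg.lu returns a permutation {\mathbf{P}} such that {\mathbf{F}=\mathbf{PLU}}, i.e., the transpose of the convention used here):

```python
import numpy as np
from scipy.linalg import lu, solve_triangular

rng = np.random.default_rng(2)
F = rng.standard_normal((4, 4))
G = rng.standard_normal((4, 2))

# i) Decompose: scipy returns P, L, U with F = P @ L @ U
P, L, U = lu(F)
# ii) Forward substitution: L Z = P^T G (L has unit diagonal)
Z = solve_triangular(L, P.T @ G, lower=True, unit_diagonal=True)
# iii) Backward substitution: U Y = Z
Y = solve_triangular(U, Z, lower=False)
```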

We conveniently denote by {\mathtt{solve}(\mathbf{F},\mathbf{G})} any method, e.g., LU, for solving a linear system {\mathbf{FY}=\mathbf{G}} in the unknown {\mathbf{Y}}.

With this in mind, we are now ready to detail the algorithm to efficiently compute the solution {\mathbf{X}=\mathbf{X}_{\mathrm{R}}+i \mathbf{X}_{\mathrm{I}}} in real arithmetic. One should compare the steps below with expressions (7),(8).

Procedure.

  • 1) Compute {\mathbf{A}_{\mathrm{R}}^{-1}\mathbf{A}_{\mathrm{I}}} as {\mathtt{solve}(\mathbf{A}_{\mathrm{R}},\mathbf{A}_{\mathrm{I}})}
  • 2) Compute {\mathbf{S}=\mathbf{A}_{\mathrm{R}} + \mathbf{A}_{\mathrm{I}} (\mathbf{A}_{\mathrm{R}}^{-1} \mathbf{A}_{\mathrm{I}})}, involving only matrix addition and multiplication
  • 3) Compute {\mathbf{S}^{-1}\mathbf{B}_{\mathrm{R}}} as {\mathtt{solve}(\mathbf{S},\mathbf{B}_{\mathrm{R}})}
  • 4) Compute {\mathbf{S}^{-1}\mathbf{B}_{\mathrm{I}}} as {\mathtt{solve}(\mathbf{S},\mathbf{B}_{\mathrm{I}})}
  • 5) Compute {(\mathbf{A}_{\mathrm{R}}^{-1}\mathbf{A}_{\mathrm{I}}) (\mathbf{S}^{-1}\mathbf{B}_{\mathrm{R}})} by multiplying the results produced in 1) and 3)
  • 6) Compute {(\mathbf{A}_{\mathrm{R}}^{-1}\mathbf{A}_{\mathrm{I}})( \mathbf{S}^{-1}\mathbf{B}_{\mathrm{I}})} by multiplying the results produced in 1) and 4)
  • 7) Return \displaystyle \mathbf{X}_{\mathrm{R}}=\mathbf{S}^{-1}\mathbf{B}_{\mathrm{R}} + \mathbf{A}_{\mathrm{R}}^{-1}\mathbf{A}_{\mathrm{I}} \mathbf{S}^{-1}\mathbf{B}_{\mathrm{I}} and \displaystyle \mathbf{X}_{\mathrm{I}} = \mathbf{S}^{-1}\mathbf{B}_{\mathrm{I}} - \mathbf{A}_{\mathrm{R}}^{-1}\mathbf{A}_{\mathrm{I}}\mathbf{S}^{-1}\mathbf{B}_{\mathrm{R}}

which coincide with expressions (7),(8).
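The full procedure can be written as a single routine relying only on real solves (a sketch; the function name is mine, and {\mathbf{A}_{\mathrm{R}}} and {\mathbf{S}} are assumed invertible):

```python
import numpy as np

def complex_solve_real(AR, AI, BR, BI):
    """Solve (AR + i AI) X = (BR + i BI) using only real arithmetic,
    following steps 1)-7) above. `solve` stands for any real linear
    solver; here we use numpy's LAPACK-backed routine."""
    solve = np.linalg.solve
    W = solve(AR, AI)       # 1) A_R^{-1} A_I
    S = AR + AI @ W         # 2) Schur complement S
    SBR = solve(S, BR)      # 3) S^{-1} B_R
    SBI = solve(S, BI)      # 4) S^{-1} B_I
    XR = SBR + W @ SBI      # 5-7) real part, eq. (7)
    XI = SBI - W @ SBR      # 5-7) imaginary part, eq. (8)
    return XR, XI

# Illustrative usage with random data (sizes and seed are arbitrary)
rng = np.random.default_rng(3)
n, k = 6, 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k))
XR, XI = complex_solve_real(A.real, A.imag, B.real, B.imag)
```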

For further results on the complexity analysis of the solution above we refer to the comprehensive work in [2].

References

[1] L. Tornheim. Inversion of a complex matrix. Comm. ACM, 4:398, 1961.

[2] Z. Dai, L.-H. Lim, K. Ye. Complex matrix inversion via real matrix inversions. arXiv preprint arXiv:2208.01239, 2023.

