Inverse of a matrix after a row-and-column removal
Is using the matrix inverse faster than calculating the inverse of the smaller matrix?
This is a generalization and a detailed explanation of a neat little trick I came across in the paper Optimal Brain Compression. The solution uses the same approach as this answer but also extends it to a more general case. A Python implementation is also provided for both the cases.
The Original Problem
You are given an invertible n-by-n matrix \(M\) and its inverse \(M^{-1}\). Now let's say the i-th row and i-th column of the matrix \(M\) are removed. Let's call this matrix \(M_{-i}\). We would like to calculate the inverse of this matrix \(M_{-i}\) efficiently.
import numpy as np
def remove_row(matrix: np.matrix, index: int):
return np.delete(matrix, index, axis=0)
def remove_column(matrix: np.matrix, index: int):
return np.delete(matrix, index, axis=1)
def remove_row_and_column(matrix: np.matrix, row_index: int, col_index: int):
return remove_column(remove_row(matrix, row_index), col_index)
matrix = np.random.randn(5,5)
matrix_inverse = np.linalg.inv(matrix)
index = 3
smaller_matrix = remove_row_and_column(matrix, index, index)
If you do not care about efficiency, the problem is trivial - you simply calculate the inverse of \(M_{-i}\). But because we also know \(M^{-1}\), can we somehow make use of it? In other words, can we somehow manipulate \(M^{-1}\) to calculate the inverse of \(M_{-i}\), and if so, is it more efficient (or faster) compared to the trivial method?
# a solution that is easy to come up with implement but inefficient:
smaller_matrix_inverse = np.linalg.inv(smaller_matrix)
def computer_smaller_matrix_inverse_efficiently(matrix_inverse, index):
# can we only use "matrix_inverse" alongwith the index of the row and column removed
# to calculate the inverse of the smaller matrix?
...
A more general problem
We can extend this idea to a case where the indices of the removed row and column are different - if the i-th row and j-th column of the matrix \(M\) are removed, can we calculate the inverse of this (n-1)x(n-1) matrix by directly manipulating \(M^{-1}\)? We will explore this idea in a bit.
row_index = 3
col_index = 1
smaller_matrix = remove_row_and_column(matrix, row_index, col_index)
def computer_smaller_matrix_inverse_efficiently(matrix_inverse, row_index, col_index):
# can we only use "matrix_inverse" alongwith the indices of the row and column removed
# to calculate the inverse of the smaller matrix?
...
The Solution
Solving a specific problem first
Consider the case in which the last row and last column are removed. In other words, we want to calculate the inverse of \(M_{-n}\). Note that it is a (n-1)x(n-1) matrix. We can write it in block matrix form as:
\( M = \)\(
\begin{bmatrix}
A & b \\ c & d
\end{bmatrix}
\)Note that:
\(A\) is a (n-1)x(n-1) matrix.
\(b\) is a (n-1)x1 matrix (aka a column vector).
\(c\) is a 1x(n-1) matrix (aka a row vector).
\(d\) is a 1x1 matrix (aka a scalar).
Similarly, we can write \(M^{-1}\) in the block matrix form as:
\( M^{-1} = \)\(
\begin{bmatrix}
P & q \\ r & s
\end{bmatrix}
\)Note that:
\(P\) is a (n-1)x(n-1) matrix.
\(q\) is a (n-1)x1 matrix (aka a column vector).
\(r\) is a 1x(n-1) matrix (aka a row vector).
\(s\) is a 1x1 matrix (aka a scalar).
Also note that \(M_{-n}\) is actually equal to \(A\). We would like to find \(A^{-1}\) by manipulating \(M^{-1}\).
Now let's do some elementary algebra:
\( MM^{-1} = M^{-1}M = I_{n} \Rightarrow \)\(
\begin{bmatrix}
A & b \\ c & d
\end{bmatrix}
\)\(
\begin{bmatrix}
P & q \\ r & s
\end{bmatrix}
\)\( = I_n\)\( \Rightarrow AP + br = I_{n-1} \text{ and } Aq + bs = 0 \)\( \Rightarrow b = \frac{-1}{s} Aq \)\( \Rightarrow AP + \frac{-1}{s} (Aq)r = AP - A \frac{1}{s} (qr) = I_{n-1} \)\( \Rightarrow A(P - \frac{1}{s} qr) = I_{n-1} = AA^{-1} \)\( \Rightarrow A^{-1} = P - \frac{1}{s} qr \)Note that:
\( AP \text{, } qr \text{ and } br\) are (n-1)x(n-1) matrices.
\( Aq \text{ and } bs\) are (n-1)x1 matrices (aka column vectors).
Now let's denote that i-th column of the matrix \( M \) as \( M_{:,i} \) and j-th row of the matrix as \( M_{j,:} \). Then notice that \( q \) is actually equal to \( M_{:,n} \) with the last element missing. Similarly \( r \) is actually equal to \( M_{n,:} \) with the last element missing. Then we make the following series of observations:
\( qr \) is actually equal to the matrix \( [(M^{-1})_{:,n} * (M^{-1})_{n,:}] \) with the last column and row missing.
In other words, \( qr = [(M^{-1})_{:,n} * (M^{-1})_{n,:}]_{-n} \).
Similarly \( P \) is the same as \( M^{-1} \) with the last row and column missing: \( P = (M^{-1})_{-n} \)
Finally we can write: \( P - \frac{1}{s} qr = (M^{-1})_{-n} - \frac{1}{s}([(M^{-1})_{:,n} * (M^{-1})_{n,:}])_{-n} = (M^{-1} - \frac{1}{s}[(M^{-1})_{:,n} * (M^{-1})_{n,:}])_{-n} \)
In simple words, the inverse of a matrix \( M \) with the last row and column removed is equal to the inverse of \( M \) minus the product of last column and last row divided by the last diagonal value of \( M^{-1} \).
Is it faster than just calculating the inverse again?
Well of course it is! Note that we already had access to \( M^{-1} \). All we had to calculate was the product of the n-dimensional column vector with a n-dimensional row vector. Calculating this product has a complexity of \( O(n^2) \). Calculating the inverse of a matrix has complexity \( O(n^3) \).
Solving the original problem
We want to calculate the inverse of \( M_{-i} \) for any i. We just solved the problem when \( i = n \). To solve for any i, we will deploy the following strategy:
Swap the i-th and n-th columns. Similarly swap the i-th and n-th rows. Let's call this matrix \( N \). This can be done using a permutation matrix \( P \). Note that \( P \) is obtained by swapping the i-th and n-th columns of \( I_n \). Thus \( P \) is symmetric.
\( N = P^TMP \)\( \text{, } \)\( M_{-i} = N_{-n} \Rightarrow (M_{-i})^{-1} = (N_{-n})^{-1} \)
It is easy to calculate \( N^{-1} \) from \( M^{-1} \) using the identity \( P^{-1} = P^{T} = P \).
\( N^{-1} = PM^{-1}P \Rightarrow M^{-1} = PN^{-1}P \).
Calculate the inverse of \( N_{-n} \) by using \( N^{-1} \) only. We know how to do this by now.
\( (N_{-n})^{-1} = (N^{-1})_{-n} - (\frac{1}{s}[(N^{-1})_{:,n} * (N^{-1})_{n,:}])_{-n} = (M_{-i})^{-1} \) Interpret the manipulation of \( N^{-1} \) in terms of \( M^{-1} \) since we know how they are connected:
\( (N^{-1})_{-n} = (M^{-1})_{-i} \)
\( (N^{-1})_{:,n} = (M^{-1})_{:,i} \)
\( (N^{-1})_{n,:} = (M^{-1})_{i,:} \)
This gives us the answer:
\( (M_{-i})^{-1} = (M^{-1})_{-i} - \frac{1}{M_{ii}}([M^{-1}_{:,i} * M^{-1}_{i,:}])_{-i} = (M^{-1} - \frac{1}{M_{ii}}[M^{-1}_{:,i} * M^{-1}_{i,:}])_{-i} \)
Here \( M_{ii} \) is the i-th value of the diagonal of \( M \).
def calculate_smaller_matrix_inverse(matrix_inverse: np.matrix, index: int):
'''
a faster way to solve
np.linalg.inv( remove_row_and_column(matrix, index, index) )
'''
to_subtract = matrix_inverse[:, index].reshape(-1,1) @ matrix_inverse[index, :].reshape(1,-1)
to_subtract /= matrix_inverse[index, index]
result = matrix_inverse - to_subtract
return remove_row_and_column(result, index, index)
Solution to a more general problem
Now that we know how to calculate \( (M_{-i})^{-1} \), we can use the same approach to calculate the inverse of a matrix whose i-th row and j-th column have been removed. We use the notation \( M_{-ij} \) to denote the matrix \( M \) with i-th row and j-th column removed.
Use a permutation matrix to swap rows i and n. Call this row permutation matrix \( P_r \). Similarly use another permutation matrix to swap columns j and n. Call this column permutation matrix \( P_c \).
\( N = P_rMP_c \)
Note that this gives us \( N_{-n} = M_{-ij} \). Next we establish the relationship between the inverse matrices \( M^{-1}, N_{-1} \).
\( N^{-1} = P_c^TM^{-1}P_r^T \Rightarrow M^{-1} = P_cN^{-1}P_r \)
The inverse of \( M_{-ij} \) is equal to the inverse of \( N_{-n} \). We know how to use \( N^{-1} \) to get to \( (N_{-n})^{-1} \):
\( (N_{-n})^{-1} = (N^{-1})_{-n} - (\frac{1}{N^{-1}_{nn}}[(N^{-1})_{:,n} * (N^{-1})_{n,:}])_{-n} = (M_{-ij})^{-1} \) Interpret the manipulation of \( N^{-1} \) in terms of \( M^{-1} \) since we know how they are connected:
\( (N^{-1})_{-n} = (M^{-1})_{-ji} \)In other words, removal of the last row and column of \( N^{-1} \) is equivalent to removal of j-th row and i-th column of \( M^{-1} \).
\( (N^{-1})_{:,n} = (M^{-1})_{:,i} \)
\( (N^{-1})_{n,:} = (M^{-1})_{j,:} \)
Similarly, the last column of \( N^{-1} \) is the same as the i-th column of \( M^{-1} \) and the last row of \( N^{-1} \) is the same as the j-th row of \( M^{-1} \). Using similar logic, we can see that:
\( N^{-1}_{nn} = M^{-1}_{ji} \)
Thus the final solution is given as:
\( (M_{-ij})^{-1} = (M^{-1})_{-ji} - \frac{1}{M^{-1}_{ji}}([(M^{-1})_{:,i} * (M^{-1})_{j,:}])_{-ji} \)
\( \Rightarrow (M_{-ij})^{-1} = (M^{-1} - \frac{1}{M^{-1}_{ji}}[(M^{-1})_{:,i} * (M^{-1})_{j,:}])_{-ji} \)
def calculate_smaller_matrix_inverse(matrix_inverse: np.matrix, row_index: int, col_index: int):
'''
a faster way to solve:
np.linalg.inv( remove_row_and_column(matrix, row_index, col_index) )
'''
to_subtract = matrix_inverse[:, row_index].reshape(-1,1) @ matrix_inverse[col_index, :].reshape(1,-1)
to_subtract /= matrix_inverse[col_index, row_index]
result = matrix_inverse - to_subtract
return remove_row_and_column(result, col_index, row_index)
Read the e-book 📖
to learn linear algebra from scratch through a series of visual essays.