This question came up when trying to compare kernels, as well as when studying how low-dimensional projections change over time. Intuitively, it isn't obvious how to define a distance between subspaces. For example, suppose we are in $\mathbb{R}^3$ and we have two $2$-dimensional subspaces $A, B \subset \mathbb{R}^3$. To this end, I will try to explain the Grassmann distance.
The idea of principal angles is to iteratively find pairs of vectors in $A$ and $B$ with the smallest angle between them. More formally, at iteration $1$ we solve the following optimization problem:
$$ \arg \max_{a_1 \in A,\; b_1 \in B} a_1^\top b_1 \quad \text{s.t.} \quad \Vert a_1 \Vert = \Vert b_1 \Vert = 1 $$
We repeat this procedure in subsequent iterations with the additional constraint that, at iteration $i$, $a_i^\top a_j = 0$ and $b_i^\top b_j = 0$ for all $j < i$. We are therefore progressively finding the most aligned orthonormal vectors in $A$ and $B$; the angles $\theta_i = \cos^{-1}(a_i^\top b_i)$ between successive pairs are called the principal angles.
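To build some intuition for this greedy procedure, here is a minimal sketch (the function name and iteration count are my own) that finds the first principal pair by alternating maximization: with $a$ fixed, the $b \in B$ maximizing $a^\top b$ is the normalized projection of $a$ onto $B$, and vice versa. It assumes each subspace is given as a matrix with orthonormal columns, as constructed in the next paragraph.

import torch

def first_principal_pair(Q_A, Q_B, iters=100):
    # start from a random unit vector in the subspace spanned by Q_A's columns
    a = Q_A @ torch.randn(Q_A.shape[1])
    a = a / a.norm()
    for _ in range(iters):
        # best b for fixed a: project a onto B, then renormalize
        b = Q_B @ (Q_B.T @ a)
        b = b / b.norm()
        # best a for fixed b: project b onto A, then renormalize
        a = Q_A @ (Q_A.T @ b)
        a = a / a.norm()
    # a^T b converges to the largest cosine, i.e. cos(theta_1)
    return a, b, a @ b

This is just power iteration on $Q_A^\top Q_B (Q_A^\top Q_B)^\top$, which foreshadows the SVD derivation below (note the sketch breaks down if the subspaces are orthogonal, since the projection is then zero).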
Let $Q_A, Q_B$ be orthonormal bases for $A, B$ respectively. Then, from our previous iterative procedure, we can write $a_1 = Q_A x_1$ and $b_1 = Q_B y_1$ with $\Vert x_1 \Vert = \Vert y_1 \Vert = 1$, so we want to maximize $(Q_A x_1)^\top (Q_B y_1) = x_1^\top Q_A^\top Q_B y_1$. Let's compute the SVD $Q_A^\top Q_B = U \Sigma V^\top$. Substituting yields $x_1^\top U \Sigma V^\top y_1$. Therefore, if we align $x_1$ with the top singular vector in $U$ and $y_1$ with the top singular vector in $V$, the resulting value is the largest singular value $\sigma_1$. In fact, the singular values are exactly the cosines of the principal angles, i.e., the ordered solutions to the iterative optimization procedure above (think about why that is the case).
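As a quick sanity check on this argument (a sketch with my own variable names), the SVD of $Q_A^\top Q_B$ really does recover the principal pairs:

import torch

# two random 2-dimensional subspaces of R^5
Q_A, _ = torch.linalg.qr(torch.randn(5, 2))
Q_B, _ = torch.linalg.qr(torch.randn(5, 2))
U, S, Vh = torch.linalg.svd(Q_A.T @ Q_B)
P_A = Q_A @ U     # columns are the principal vectors a_i
P_B = Q_B @ Vh.T  # columns are the principal vectors b_i
# the i-th pair's inner product is the i-th singular value, cos(theta_i)
print(torch.allclose(torch.diag(P_A.T @ P_B), S, atol=1e-6))  # True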
The Grassmann distance is now defined as:
$$ G(A,B) = \left( \sum_{i=1}^{k} \left[ \cos^{-1}(\sigma_i) \right]^2 \right)^{\frac{1}{2}} = \left( \sum_{i=1}^{k} \theta_i^2 \right)^{\frac{1}{2}} $$
where $k$ is the (common) dimension of the two subspaces.
I am not going to prove that the triangle inequality holds (which is what makes this a proper metric); you can look at some of the resources I linked below. I might revisit it and add a proof to the blog later.
import torch
import math

def grassman(A, B):
    # orthonormal bases for the column spaces of A and B
    Q_A, _ = torch.linalg.qr(A)
    Q_B, _ = torch.linalg.qr(B)
    # singular values of Q_A^T Q_B are the cosines of the principal angles
    S = torch.linalg.svdvals(Q_A.T @ Q_B)
    # clamp guards against floating-point values slightly outside [-1, 1]
    thetas = torch.arccos(torch.clamp(S, -1.0, 1.0))
    # Grassmann distance: 2-norm of the vector of principal angles
    return torch.norm(thetas, p=2)
A = torch.tensor([
[1, 0],
[0, 1],
[0, 0]
]).float()
B = torch.tensor([
[math.sqrt(0.5), 0],
[0, 1],
[math.sqrt(0.5), 0]
]).float()
C = torch.tensor([
[0, 0],
[1, 0],
[0, 1]
]).float()
>>> grassman(A,B)
tensor(0.7854)
>>> grassman(A,C)
tensor(1.5708)
>>> grassman(C,B)
tensor(0.7854)
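Note that the outputs are $\pi/4$, $\pi/2$, and $\pi/4$, so in this particular example the triangle inequality holds with equality: $G(A,C) = G(A,B) + G(B,C) = \pi/4 + \pi/4 = \pi/2$.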
https://en.wikipedia.org/wiki/Grassmannian
https://kristianeschenburg.netlify.app/post/comparing-subspaces/
https://web.ma.utexas.edu/users/vandyke/notes/deep_learning_presentation/presentation.pdf