Kernel PCA
Centered case analysis
Let $\mathbf{S}$ be the $D\times D$ sample covariance matrix from conventional PCA (assume the data set $\mathbf{X}$ is centered):
\[\begin{aligned} \mathbf{S} = \frac 1N \sum_{n=1}^N \boldsymbol{x}_n\boldsymbol{x}_n^T \end{aligned}\]
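As a quick sanity check of this definition (the variable names here are illustrative, not from the text), numpy's `np.cov` with `bias=True` reproduces $\mathbf{S}$ for centered data:

```python
import numpy as np

X = np.random.default_rng(1).normal(size=(100, 3))  # toy data, N=100, D=3
X = X - X.mean(axis=0)                              # center the data set
S = X.T @ X / X.shape[0]                            # S = (1/N) sum_n x_n x_n^T
assert np.allclose(S, np.cov(X, rowvar=False, bias=True))
```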
Recall that the principal components are defined by the eigenvectors $\boldsymbol{u}_i$ of the covariance matrix:
\[\begin{aligned} \mathbf{S}\boldsymbol{u}_i = \lambda_i \boldsymbol{u}_i \end{aligned}\]
In the $M$-dimensional feature space, the sample covariance matrix is (assume the mapped data $\{\phi(\boldsymbol{x}_n)\}$ are also centered):
\[\begin{aligned} \mathbf{C} = \frac 1N \sum_{n=1}^N \phi(\boldsymbol{x}_n)\phi(\boldsymbol{x}_n)^T \end{aligned}\]
Its eigenvectors $\boldsymbol{v}_i$ are defined by:
\[\begin{aligned} \mathbf{C}\boldsymbol{v}_i = \lambda_i \boldsymbol{v}_i \end{aligned}\]
Our goal is to solve this eigenvalue problem without having to work explicitly in the feature space.
Substituting the definition of $\mathbf{C}$ into the eigenvector equation shows that each $\boldsymbol{v}_i$ is a linear combination of the feature vectors, so we can write $\boldsymbol{v}_i = \sum_{n=1}^N a_{in}\phi(\boldsymbol{x}_n)$. Substituting this expansion back and left-multiplying both sides by $\phi(\boldsymbol{x}_l)^T$ expresses everything in terms of the kernel $k(\boldsymbol{x}_n, \boldsymbol{x}_m) = \phi(\boldsymbol{x}_n)^T\phi(\boldsymbol{x}_m)$.
Matrix form:
\[\begin{aligned} \mathbf{K}^2 \boldsymbol{a}_i = \lambda_i N \mathbf{K} \boldsymbol{a}_i \end{aligned}\]
where $\mathbf{K}$ is the $N\times N$ Gram matrix with elements $K_{nm} = k(\boldsymbol{x}_n, \boldsymbol{x}_m)$ and $\boldsymbol{a}_i = (a_{i1},\dots,a_{iN})^T$.
We can find a solution for $\boldsymbol{a}_i$ by solving the following eigenvalue problem (assume $\lambda_i\neq 0$):
\[\begin{aligned} \mathbf{K}\boldsymbol{a}_i = \lambda_i N \boldsymbol{a}_i \end{aligned}\]
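As a minimal numpy sketch of this step (function and variable names are my own, not from the text): the $\boldsymbol{a}_i$ are eigenvectors of $\mathbf{K}$, with $\lambda_i$ equal to the corresponding eigenvalue of $\mathbf{K}$ divided by $N$, rescaled so that the feature-space eigenvectors have unit norm ($\boldsymbol{v}_i^T\boldsymbol{v}_i = \boldsymbol{a}_i^T\mathbf{K}\boldsymbol{a}_i = \lambda_i N\,\boldsymbol{a}_i^T\boldsymbol{a}_i = 1$):

```python
import numpy as np

def kernel_pca_coefficients(K, n_components):
    """Solve K a_i = (lambda_i N) a_i for the expansion coefficients a_i.

    K : (N, N) centered Gram matrix, K[n, m] = k(x_n, x_m).
    Returns the top `n_components` coefficient vectors as columns, plus lambdas.
    """
    N = K.shape[0]
    # eigh returns eigenvalues in ascending order for a symmetric matrix
    eigvals, eigvecs = np.linalg.eigh(K)
    # keep the largest eigenvalues, i.e. the leading principal components
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    lambdas = eigvals / N                  # lambda_i in the text's notation
    # rescale so that a_i^T K a_i = (lambda_i N) a_i^T a_i = 1
    a = eigvecs / np.sqrt(np.maximum(eigvals, 1e-12))
    return a, lambdas
```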
The projection of a point $\boldsymbol{x}$ onto eigenvector $\boldsymbol{v}_i$ is then:
\[\begin{aligned} y_i(\boldsymbol{x}) &= \phi(\boldsymbol{x})^T\boldsymbol{v}_i \\&= \sum_{n=1}^N a_{in} \phi(\boldsymbol{x})^T\phi(\boldsymbol{x}_n) \\&= \sum_{n=1}^N a_{in} k(\boldsymbol{x}, \boldsymbol{x}_n) \\&= \mathbf{k}^T(\boldsymbol{x})\boldsymbol{a}_i \end{aligned}\]
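Projecting a new point therefore needs only kernel evaluations against the training set; continuing the sketch above (again, illustrative names):

```python
import numpy as np

def project(kernel_fn, X_train, x_new, a):
    """Compute y_i(x) = sum_n a_in k(x, x_n) = k(x)^T a_i for each column a_i."""
    k_x = np.array([kernel_fn(x_new, x_n) for x_n in X_train])  # vector k(x)
    return k_x @ a  # shape (n_components,)
```

For example, with an RBF kernel one could pass `kernel_fn = lambda u, v: np.exp(-np.sum((u - v)**2))`.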
We can see that kernel PCA only involves the eigenvector expansion of the $N\times N$ matrix $\mathbf{K}$. Equivalently, we can use the SVD of the feature matrix $\mathbf{\Phi}$ (with rows $\phi(\boldsymbol{x}_n)^T$) to compute it:
\[\begin{aligned} & \mathbf{K} = \mathbf{\Phi}\mathbf{\Phi}^T \\ & \mathbf{K} = \mathbf{U}\mathbf{\Sigma}^2\mathbf{U}^T \end{aligned}\]
The components $\boldsymbol{a}_i$ are then the columns of $\mathbf{U}$.
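When $\mathbf{\Phi}$ is available explicitly, this relationship is easy to verify numerically (a toy check with made-up dimensions):

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(6, 4))          # toy feature matrix, N=6, M=4
K = Phi @ Phi.T                        # Gram matrix K = Phi Phi^T

U, S, Vt = np.linalg.svd(Phi)          # Phi = U Sigma V^T
eigvals, eigvecs = np.linalg.eigh(K)

# nonzero eigenvalues of K equal the squared singular values of Phi
assert np.allclose(np.sort(S**2), np.sort(eigvals[eigvals > 1e-10]))
```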
Uncentered case analysis
From the preceding analysis we know that kernel PCA first maps the $D$-dimensional data into an $M$-dimensional feature space via the function $\phi(\boldsymbol{x})$ and then performs PCA in that feature space, under the assumption that the mapped data are already centered.
In general, however, we do not want to center the data explicitly in the feature space. So let $\widetilde{\phi}(\boldsymbol{x})$ denote the uncentered feature vectors; the centered features are:
\[\begin{aligned} \phi(\boldsymbol{x}) &= \widetilde{\phi}(\boldsymbol{x}) - \frac 1N \sum_{l=1}^N \widetilde{\phi}(\boldsymbol{x}_l) \end{aligned}\]Then the Gram matrix element is:
\[\begin{aligned} K_{nm} &= \phi(\boldsymbol{x}_n)^T\phi(\boldsymbol{x}_m) \\ &= \widetilde{\phi}(\boldsymbol{x}_n)^T\widetilde{\phi}(\boldsymbol{x}_m) - \frac 1N \sum_{l=1}^N \widetilde{\phi}(\boldsymbol{x}_l)^T\widetilde{\phi}(\boldsymbol{x}_m) - \frac 1N \sum_{l=1}^N \widetilde{\phi}(\boldsymbol{x}_n)^T\widetilde{\phi}(\boldsymbol{x}_l) \\ &\quad + \frac {1}{N^2} \sum_{j=1}^N\sum_{l=1}^N \widetilde{\phi}(\boldsymbol{x}_j)^T\widetilde{\phi}(\boldsymbol{x}_l) \end{aligned}\]The matrix form is then:
\[\begin{aligned} \mathbf{K} = \widetilde{\mathbf{K}} - \mathbf{1}_N\widetilde{\mathbf{K}} - \widetilde{\mathbf{K}}\mathbf{1}_N + \mathbf{1}_N \widetilde{\mathbf{K}} \mathbf{1}_N \end{aligned}\]where $\mathbf{1}_N$ denotes the $N\times N$ matrix in which every element equals $1/N$. This brings us back to the earlier problem: solving the eigenvalue problem for $\mathbf{K}$ yields the kernel principal components.
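In code this double centering is a few lines of numpy (a minimal sketch with illustrative names):

```python
import numpy as np

def center_gram(K_tilde):
    """Center an uncentered Gram matrix:
    K = K~ - 1_N K~ - K~ 1_N + 1_N K~ 1_N,
    where 1_N is the N x N matrix with all entries 1/N.
    """
    N = K_tilde.shape[0]
    one_N = np.full((N, N), 1.0 / N)
    return K_tilde - one_N @ K_tilde - K_tilde @ one_N + one_N @ K_tilde @ one_N
```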
Attention: in kernel PCA the maximum number of principal components is $N$, not $D$, because the feature-space covariance matrix $\mathbf{C}$ is built from only $N$ feature vectors and therefore has rank at most $N$.