
PCA And Kernel Basis Example

This page shows the two transformer classes in their natural habitats: PCA for linear low-rank structure, and KernelBasis for nonlinear geometry. The second half uses a swiss-roll manifold, where a Gaussian kernel basis followed by a linear projection preserves the latent ordering better than ordinary PCA on the raw coordinates.

1 Imports

import matplotlib.pyplot as plt
import numpy as np

from crabbymetrics import KernelBasis, PCA

np.set_printoptions(precision=4, suppress=True)

2 PCA On A Linear Factor Model

rng = np.random.default_rng(2029)

n = 350
factors = rng.normal(size=(n, 2))
loadings = np.array(
    [
        [1.4, -0.2, 0.9, 0.0, 0.7, -0.1],
        [0.1, 1.2, -0.5, 1.1, 0.0, 0.8],
    ]
)
x_linear = factors @ loadings + 0.04 * rng.normal(size=(n, 6))

pca = PCA(2)
scores = pca.fit_transform(x_linear)
reconstructed = pca.inverse_transform(scores)
summary = pca.summary()

print("explained variance ratio:", np.asarray(summary["explained_variance_ratio"]))
print("reconstruction MSE:", round(float(np.mean((x_linear - reconstructed) ** 2)), 6))
explained variance ratio: [0.5787 0.4213]
reconstruction MSE: 0.001031

fig, axes = plt.subplots(1, 2, figsize=(12, 4.8), constrained_layout=True)

axes[0].scatter(
    x_linear[:, 0],
    x_linear[:, 1],
    c=factors[:, 0],
    cmap="viridis",
    s=20,
    alpha=0.8,
)
axes[0].set_title("Observed coordinates")
axes[0].set_xlabel("x[:, 0]")
axes[0].set_ylabel("x[:, 1]")

axes[1].scatter(
    scores[:, 0],
    scores[:, 1],
    c=factors[:, 0],
    cmap="viridis",
    s=20,
    alpha=0.8,
)
axes[1].set_title("Two-component PCA scores")
axes[1].set_xlabel("PC1")
axes[1].set_ylabel("PC2")

plt.show()

The six observed regressors were generated from two latent factors plus small noise, so linear PCA is exactly the right preprocessing move here: a 2D score matrix retains almost all of the signal and reconstructs the original design with tiny error.
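To see why two components suffice, a plain-numpy cross-check (not part of the crabbymetrics API) can inspect the singular values of the centered design directly: with two latent factors and noise at scale 0.04, the third singular value should be tiny relative to the first two.

```python
import numpy as np

# Regenerate the same design as above.
rng = np.random.default_rng(2029)
n = 350
factors = rng.normal(size=(n, 2))
loadings = np.array(
    [
        [1.4, -0.2, 0.9, 0.0, 0.7, -0.1],
        [0.1, 1.2, -0.5, 1.1, 0.0, 0.8],
    ]
)
x_linear = factors @ loadings + 0.04 * rng.normal(size=(n, 6))

# Singular values of the centered design: the first two carry the
# factor signal, the rest only the small additive noise.
centered = x_linear - x_linear.mean(axis=0)
singular_values = np.linalg.svd(centered, compute_uv=False)
print("singular values:", singular_values)
print("s[2] / s[0]:", round(float(singular_values[2] / singular_values[0]), 4))
```

The ratio of the third to the first singular value is on the order of the noise scale, which is exactly why the two-component reconstruction MSE above is so small.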

3 Kernel Features On A Swiss Roll

rng = np.random.default_rng(2030)

n_roll = 220
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, size=n_roll)
height = rng.uniform(-8.0, 8.0, size=n_roll)
x_roll = np.column_stack([t * np.cos(t), height, t * np.sin(t)])

linear_roll = PCA(2).fit_transform(x_roll)

kernel = KernelBasis("gaussian", bandwidth=10.0)
kernel_features = kernel.fit_transform(x_roll)
kernel_features_centered = kernel_features - kernel_features.mean(axis=0, keepdims=True)
nonlinear_roll = PCA(2).fit_transform(kernel_features_centered)

linear_corr = max(abs(np.corrcoef(linear_roll[:, i], t)[0, 1]) for i in range(2))
kernel_corr = max(abs(np.corrcoef(nonlinear_roll[:, i], t)[0, 1]) for i in range(2))

print("best raw-PCA correlation with latent roll parameter:", round(float(linear_corr), 4))
print("best kernel-feature projection correlation:", round(float(kernel_corr), 4))
best raw-PCA correlation with latent roll parameter: 0.2031
best kernel-feature projection correlation: 0.2845

fig = plt.figure(figsize=(14, 4.8), constrained_layout=True)
ax0 = fig.add_subplot(1, 3, 1, projection="3d")
ax1 = fig.add_subplot(1, 3, 2)
ax2 = fig.add_subplot(1, 3, 3)

scatter0 = ax0.scatter(
    x_roll[:, 0],
    x_roll[:, 1],
    x_roll[:, 2],
    c=t,
    cmap="viridis",
    s=18,
    alpha=0.85,
)
ax0.set_title("Swiss roll in raw coordinates")
ax0.set_xlabel("x")
ax0.set_ylabel("height")
ax0.set_zlabel("z")

ax1.scatter(linear_roll[:, 0], linear_roll[:, 1], c=t, cmap="viridis", s=18, alpha=0.85)
ax1.set_title("Ordinary PCA on raw coordinates")
ax1.set_xlabel("PC1")
ax1.set_ylabel("PC2")

ax2.scatter(
    nonlinear_roll[:, 0],
    nonlinear_roll[:, 1],
    c=t,
    cmap="viridis",
    s=18,
    alpha=0.85,
)
ax2.set_title("KernelBasis + PCA")
ax2.set_xlabel("component 1")
ax2.set_ylabel("component 2")

fig.colorbar(scatter0, ax=[ax0, ax1, ax2], shrink=0.9, label="latent roll parameter")
plt.show()

KernelBasis is not an estimator by itself. It supplies a richer feature map, here based on local Gaussian similarity across swiss-roll points. Once the data live in that nonlinear basis, a plain linear projection tracks the latent roll parameter more closely than PCA on the original 3D coordinates, as the correlation comparison above shows.
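For intuition about what such a feature map computes, here is a minimal plain-numpy sketch of one plausible Gaussian kernel basis, where column j of the feature matrix is exp(-||x_i - x_j||² / (2h²)) with the training points themselves as centers. This is an illustration only; the exact formula and conventions inside crabbymetrics' KernelBasis are assumptions here.

```python
import numpy as np

def gaussian_kernel_features(x, centers, bandwidth):
    """One common Gaussian kernel feature map (hypothetical sketch,
    not necessarily KernelBasis' exact internals)."""
    # Pairwise squared distances between rows of x and rows of centers.
    sq_dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
features = gaussian_kernel_features(x, x, bandwidth=10.0)
print(features.shape)  # (5, 5): one feature column per training point
```

Because each point is its own center, the diagonal entries are exactly 1 and off-diagonal entries decay with distance; a linear method applied to these columns can then express smooth nonlinear functions of the original coordinates.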