ML for Science and Engineering
More features should mean more information, right? Not always.
The Iris dataset has 4 features per flower — measurements of two structures:
| # | Feature | Unit |
|---|---|---|
| 1 | Sepal length | cm |
| 2 | Sepal width | cm |
| 3 | Petal length | cm |
| 4 | Petal width | cm |
Sepal = outer leaf-like part | Petal = inner colorful part
Pick two features: Sepal Length vs Petal Length.
| Sample | Sepal L. | Petal L. | Species |
|---|---|---|---|
Showing first 6 of 150 samples
The data is spread out more in some directions than others.
Before finding principal components, we preprocess each feature: subtract its mean (centering), and optionally divide by its standard deviation so features on different scales contribute comparably.
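As a sketch, with a tiny toy matrix standing in for the Iris features:

```python
import numpy as np

# Toy stand-in for the feature matrix (n samples x d features)
X = np.array([[5.1, 1.4],
              [4.9, 1.4],
              [6.3, 6.0],
              [5.8, 5.1]])

# Center: subtract the per-feature mean so each column has mean zero
X_centered = X - X.mean(axis=0)

# Optionally standardize: divide by the per-feature standard deviation
X_standardized = X_centered / X_centered.std(axis=0)
```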
Data can be projected onto the axis of highest variation: $\mathbf{u}_1$.
The error is the distance from each original point to its projection.
Rotating a candidate line changes the projections; the direction that maximizes the variance of the projections is the first principal component.
Given centered data $\{\mathbf{x}^{(1)}, \dots, \mathbf{x}^{(n)}\}$ in $\mathbb{R}^d$, find the unit vector $\mathbf{u}$ along which projections have maximum variance.
The scalar projection is $z^{(i)} = \mathbf{x}^{(i)\top}\mathbf{u}$. Since the data are centered, $z$ has zero mean, and we want the $\mathbf{u}$ that maximizes its variance:

$$\frac{1}{n}\sum_{i=1}^{n}\left(z^{(i)}\right)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(\mathbf{x}^{(i)\top}\mathbf{u}\right)^2$$

Rewrite the sum in matrix form:

$$\frac{1}{n}\sum_{i=1}^{n}\mathbf{u}^\top\mathbf{x}^{(i)}\mathbf{x}^{(i)\top}\mathbf{u} = \mathbf{u}^\top\left(\frac{1}{n}\sum_{i=1}^{n}\mathbf{x}^{(i)}\mathbf{x}^{(i)\top}\right)\mathbf{u} = \mathbf{u}^\top\mathbf{C}\mathbf{u},$$

where $\mathbf{C} = \frac{1}{n}\mathbf{X}^\top\mathbf{X}$ is the covariance matrix of the centered data.
Constrained optimization: maximize $\mathbf{u}^\top\mathbf{C}\mathbf{u}$ subject to $\mathbf{u}^\top\mathbf{u} = 1$.
Form the Lagrangian $\mathcal{L}(\mathbf{u}, \lambda) = \mathbf{u}^\top\mathbf{C}\mathbf{u} - \lambda\left(\mathbf{u}^\top\mathbf{u} - 1\right)$, take the derivative with respect to $\mathbf{u}$, and set it to zero:

$$2\mathbf{C}\mathbf{u} - 2\lambda\mathbf{u} = \mathbf{0} \quad\Longrightarrow\quad \mathbf{C}\mathbf{u} = \lambda\mathbf{u}$$

So $\mathbf{u}$ must be an eigenvector of $\mathbf{C}$, and the variance it captures is $\mathbf{u}^\top\mathbf{C}\mathbf{u} = \lambda$ — maximized by the eigenvector with the largest eigenvalue.
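This can be checked numerically (a sketch on random centered data, not the Iris set): the top eigenvector of $\mathbf{C}$ satisfies $\mathbf{C}\mathbf{u} = \lambda\mathbf{u}$, and no random unit vector beats its projected variance.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
X = X - X.mean(axis=0)            # center

C = X.T @ X / len(X)              # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # ascending order for symmetric C

u1 = eigvecs[:, -1]               # eigenvector of the largest eigenvalue
lam1 = eigvals[-1]

# Stationarity condition from the Lagrangian: C u = lambda u
assert np.allclose(C @ u1, lam1 * u1)

# No random unit vector achieves higher projected variance (Rayleigh quotient)
for _ in range(100):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)
    assert v @ C @ v <= lam1 + 1e-12
```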
Iris dataset: 4 eigenvalues
Given $\mathbf{x} \in \mathbb{R}^d$, project to $\mathbb{R}^k$ using the top $k$ eigenvectors: $\mathbf{z} = \mathbf{U}_k^\top(\mathbf{x} - \bar{\mathbf{x}})$, where $\mathbf{U}_k = [\mathbf{u}_1 \cdots \mathbf{u}_k] \in \mathbb{R}^{d \times k}$.
Variance retained by keeping $k$ of $d$ components: $\dfrac{\sum_{i=1}^{k}\lambda_i}{\sum_{i=1}^{d}\lambda_i}$
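A minimal sketch of projection and retained variance, using random data whose columns are scaled so the first two directions dominate (an assumption for illustration, not the Iris values):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4)) @ np.diag([3.0, 2.0, 0.5, 0.1])
X = X - X.mean(axis=0)

C = X.T @ X / len(X)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]          # sort descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
U_k = eigvecs[:, :k]                       # top-k eigenvectors as columns
Z = X @ U_k                                # project R^4 -> R^2

retained = eigvals[:k].sum() / eigvals.sum()   # fraction of variance kept
```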
Any matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$ can be factored as $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$, where $\mathbf{U} \in \mathbb{R}^{m \times m}$ and $\mathbf{V} \in \mathbb{R}^{n \times n}$ are orthogonal and $\boldsymbol{\Sigma} \in \mathbb{R}^{m \times n}$ is diagonal with non-negative entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$, the singular values.
Start from the SVD of the centered data matrix $\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top$:
Compare with the eigendecomposition of the covariance $\mathbf{C} = \frac{1}{n}\mathbf{X}^\top\mathbf{X}$:

$$\mathbf{C} = \frac{1}{n}\mathbf{V}\boldsymbol{\Sigma}^\top\mathbf{U}^\top\mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^\top = \frac{1}{n}\mathbf{V}\boldsymbol{\Sigma}^\top\boldsymbol{\Sigma}\mathbf{V}^\top$$

So the right singular vectors (columns of $\mathbf{V}$) are the eigenvectors of $\mathbf{C}$, with eigenvalues $\lambda_i = \sigma_i^2 / n$.
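The identity $\lambda_i = \sigma_i^2 / n$ can be verified directly (a sketch on random centered data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)
n = len(X)

# SVD of the centered data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Eigenvalues of the covariance matrix, sorted descending
C = X.T @ X / n
eigvals = np.sort(np.linalg.eigvalsh(C))[::-1]

# Squared singular values over n match the covariance eigenvalues
assert np.allclose(S**2 / n, eigvals)
```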
The SVD comes in three progressively more compact flavors: full, economy (thin), and truncated. Assume $m \geq n$ (more rows than columns).
Outer-product form: $\mathbf{X} = \sum_{i=1}^{r}\sigma_i\,\mathbf{u}_i\mathbf{v}_i^\top$, where $r = \operatorname{rank}(\mathbf{X})$.
Keep only the first $r$ singular values (set the rest to zero): $\mathbf{X}_r = \sum_{i=1}^{r}\sigma_i\,\mathbf{u}_i\mathbf{v}_i^\top$. By the Eckart–Young theorem, this is the best rank-$r$ approximation of $\mathbf{X}$.
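A sketch of truncation on a small random matrix: the matrix form and the sum of outer products agree, and the approximation error equals the energy in the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(8, 5))

U, S, Vt = np.linalg.svd(X, full_matrices=False)

r = 3
X_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r]   # rank-r truncation

# Same thing as a sum of r outer products sigma_i u_i v_i^T
X_sum = sum(S[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
assert np.allclose(X_r, X_sum)

# Frobenius error equals the energy in the discarded singular values
err = np.linalg.norm(X - X_r)
assert np.isclose(err, np.sqrt((S[r:]**2).sum()))
```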
import numpy as np

# 1. Center the data
X_centered = X - X.mean(axis=0)
# 2. Compute the SVD
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
# 3. Project onto top k components
X_pca = X_centered @ Vt[:k].T
`np.linalg.svd` does everything.
The Iris dataset: 150 flowers, 4 measurements each (sepal length, sepal width, petal length, petal width).
An image is a matrix. Apply SVD and keep only the first $r$ singular values.
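A minimal sketch, using a synthetic gradient "image" in place of a real photo (the real pixel data is an assumption here):

```python
import numpy as np

# Synthetic "image": a smooth low-rank gradient plus a little noise
rng = np.random.default_rng(4)
img = np.outer(np.linspace(0, 1, 64), np.linspace(1, 0, 64))
img = img + 0.01 * rng.normal(size=(64, 64))

U, S, Vt = np.linalg.svd(img, full_matrices=False)

r = 5
img_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r]     # keep first r singular values

rel_err = np.linalg.norm(img - img_r) / np.linalg.norm(img)

# Storage: r*(m + n + 1) numbers instead of m*n
compression = r * (64 + 64 + 1) / (64 * 64)
```

Because the underlying image is nearly rank one, a handful of singular values reconstructs it almost exactly at a fraction of the storage.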
Brunton & Kutz, Data-Driven Science and Engineering, Cambridge UP, 2019
2,410 face images, each $192 \times 168 = 32{,}256$ pixels. Each face is a point in 32,256-dimensional space.
Goal: represent any face as a combination of a few basis faces (eigenfaces). PCA finds these basis directions.
Average Face
$\bar{\mathbf{x}} = \frac{1}{n}\sum \mathbf{x}^{(i)}$
First 8 Eigenfaces (Principal Components)
Each eigenface captures a different mode of variation (lighting, pose, expression).
Reconstruct a face using $k$ eigenfaces: $\hat{\mathbf{x}} = \bar{\mathbf{x}} + \sum_{i=1}^k c_i\,\mathbf{u}_i$
Original
Reconstruction
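The reconstruction formula can be sketched in NumPy. The "faces" below are a synthetic low-rank stand-in for the real image set (an assumption for illustration); with $k$ equal to the data's true rank, the reconstruction is exact.

```python
import numpy as np

# Synthetic stand-in for the face dataset: n "images" flattened to d pixels
rng = np.random.default_rng(5)
n, d = 200, 300
faces = rng.normal(size=(n, 5)) @ rng.normal(size=(5, d))   # rank-5 data

mean_face = faces.mean(axis=0)                 # average face
A = faces - mean_face                          # centered data

# Eigenfaces = right singular vectors of the centered data
U, S, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
x = faces[0]
c = (x - mean_face) @ Vt[:k].T                 # coefficients c_i = u_i^T (x - mean)
x_hat = mean_face + c @ Vt[:k]                 # x_hat = mean + sum_i c_i u_i
```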
The scree plot shows singular values (or eigenvalues) in decreasing order. Look for the "elbow".
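A sketch of the numbers behind a scree plot, on synthetic data with a built-in elbow (the column scales are an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 6)) @ np.diag([5.0, 3.0, 1.0, 0.2, 0.1, 0.05])
X = X - X.mean(axis=0)

S = np.linalg.svd(X, compute_uv=False)     # singular values, descending
energy = S**2 / (S**2).sum()               # fraction of variance per component
cumulative = np.cumsum(energy)

# Smallest k capturing, say, 95% of the variance
k = int(np.searchsorted(cumulative, 0.95)) + 1
```

Plotting `S` (or `energy`) against the component index gives the scree plot; the elbow sits where `cumulative` flattens out.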
What if the data is a time series?
What if we want to decompose dynamics, not just static data?
Next lecture: Linear Dynamical Systems & DMD