ML for Science and Engineering - Lecture 8
"I'm an engineer. Someone gives me data. Why can't I just use machine learning and forget about ODEs?"
You might be right. Let's start over. You have:
Before any model, any algorithm: visualize. Then look again.
Every model encodes assumptions about the data. The central thread of this lecture:
The choice of feature vector $\phi$ is the inductive bias.
| Feature map $\phi$ | Model |
|---|---|
| $\phi = [\sin(\omega_k t), \cos(\omega_k t)]$ | Fourier decomposition |
| $\phi = [u_i, u_{i-1}, \dots, u_{i-p}]$ | Autoregressive model |
| $\phi = [1, x, x^2, x^3, \dots]$ | Polynomial regression |
| $\phi = [1, u, u^2, \sin(u), \dots]$ | SINDy (coming soon) |
Before predicting the future, understand the present.
Your voice is a time series. If someone sings into a microphone, what do you do with the data?
Decompose into interpretable components:
Not predicting future values. Just a representation.
Choose $\phi_k$ to be sines and cosines: $\phi(t) = [\sin(\omega_1 t), \cos(\omega_1 t), \dots, \sin(\omega_K t), \cos(\omega_K t)]$
This is just $\boldsymbol{\theta}^\top \phi(t)$, with $\boldsymbol{\theta}$ collecting the amplitude of each sine and cosine.
Design matrix $\Phi$ of size $N \times 2K$, solve least squares: $\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \|\Phi \boldsymbol{\theta} - \mathbf{u}\|_2^2$
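A minimal sketch of this fit in NumPy (the toy two-tone signal and the frequency grid are illustrative assumptions, not the lecture's data):

```python
import numpy as np

# Fit theta^T phi(t) with phi(t) = [sin(k t), cos(k t)], k = 1..K,
# on an assumed toy signal with two active components.
N, K = 500, 10
t = np.linspace(0, 2 * np.pi, N, endpoint=False)
u = 1.5 * np.sin(3 * t) + 0.5 * np.cos(7 * t)

k = np.arange(1, K + 1)
Phi = np.column_stack([np.sin(k[None, :] * t[:, None]),
                       np.cos(k[None, :] * t[:, None])])   # N x 2K
theta, *_ = np.linalg.lstsq(Phi, u, rcond=None)
# theta is nonzero only at the sin(3t) and cos(7t) columns
```

On a full period with equally spaced samples the basis is orthogonal, so least squares recovers the amplitudes exactly.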
[Interactive demo: drag the frequency bars to match the target signal (dashed: target, solid: your approximation)]
What if the frequency content changes over time? Spectrogram of a chirp signal:
Slide a window along the signal; at each position, compute the Fourier transform:
Step by step:
1. Multiply the signal by a window $w$ centered at time $t$.
2. Fourier-transform the windowed segment.
3. Slide the window forward and repeat, stacking the spectra into a time-frequency image.
The window $w$:
A localized bump (e.g. Gaussian, Hann) that selects a short segment of the signal near time $t$.
Wide window → good frequency resolution, poor time resolution
Narrow window → good time resolution, poor frequency resolution
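The sliding-window procedure can be sketched directly (the sampling rate, chirp, window length, and hop size below are illustrative assumptions):

```python
import numpy as np

# Minimal short-time Fourier transform of an assumed chirp signal.
fs = 1000                                      # sampling rate (Hz)
t = np.arange(0, 2, 1 / fs)
x = np.sin(2 * np.pi * (50 * t + 50 * t**2))   # instantaneous freq: 50 + 100 t

win_len, hop = 256, 64
w = np.hanning(win_len)                        # Hann window: the localized bump
starts = range(0, len(x) - win_len, hop)
frames = np.array([x[s:s + win_len] * w for s in starts])
S = np.abs(np.fft.rfft(frames, axis=1))        # rows: time, columns: frequency
```

Each row of `S` is the spectrum of one windowed segment; as the chirp's frequency rises, the peak moves to higher bins in later rows.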
Sines and cosines are one choice. What others exist?
| Basis | $\phi_k(t)$ | Use case |
|---|---|---|
| Fourier | $\sin(k\omega t),\;\cos(k\omega t)$ | Periodic signals |
| Polynomials | $1,\; t,\; t^2,\; t^3,\; \dots$ | Smooth trends |
| Wavelets | localized in time and frequency | Transients, edges |
| Legendre | $P_k(t)$ | Orthogonal on $[-1,1]$ |
Where do sines and cosines come from? The wave equation: $\partial_t^2 u = c^2\, \partial_x^2 u$
Separation of variables: assume $u(x,t) = X(x)\,T(t)$
Boundary conditions $u(0,t)=u(L,t)=0$ force $\lambda_n = (n\pi/L)^2$:
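Written out, the separation step gives (standard derivation, with $c$ denoting the wave speed):

```latex
\frac{1}{c^2}\frac{T''(t)}{T(t)} = \frac{X''(x)}{X(x)} = -\lambda
\quad\Longrightarrow\quad
X_n(x) = \sin\!\left(\frac{n\pi x}{L}\right), \qquad
\omega_n = \frac{n\pi c}{L}.
```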
$u(x,t) = \sum_n a_n \sin(n\pi x/L)\cos(\omega_n t)$

[Figure: thick curve = superposition, thin curves = individual modes]
$\alpha = 0$: standard wave equation, modes decouple, superposition holds
$\alpha > 0$: nonlinear coupling transfers energy between modes $\omega_n$
Initial condition: all energy in mode $\omega_1$. What happens?
Expected: energy spreads to all modes (thermalization, equipartition)
Observed: energy returns to $\omega_1$ — near-perfect recurrence
Fourier tells us what's in the signal. But can we predict the next value?
The feature vector is now $\phi_i = [u_i, u_{i-1}, \dots, u_{i-p+1}]$.

The autoregressive model of order $p$:

$u_{i+1} = w_1 u_i + w_2 u_{i-1} + \dots + w_p u_{i-p+1} + b$

In matrix form: $\;\mathbf{u} = \Phi \mathbf{w}$
Given $u_i = \sin(\omega \cdot i)$ for $i = 0, \dots, 199$ with $\omega = 0.3$:
```python
import numpy as np

N, omega = 200, 0.3
u = np.sin(omega * np.arange(N))

# Design matrix: predict u_i from u_{i-1}, u_{i-2}, and a bias
Phi = np.column_stack([
    u[1:-1],          # u_{i-1}
    u[:-2],           # u_{i-2}
    np.ones(N - 2),   # bias
])
y = u[2:]             # target u_i

# Least squares
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(f"w1={w[0]:.4f}, w2={w[1]:.4f}")
```
Trigonometric identity: $\sin(\omega(i+1)) + \sin(\omega(i-1)) = 2\cos(\omega)\,\sin(\omega i)$

Therefore: $u_{i+1} = 2\cos(\omega)\,u_i - u_{i-1}$, so the fit recovers $w_1 = 2\cos(0.3) \approx 1.9107$ and $w_2 = -1$.
[Figure: true signal vs AR(2) prediction]
ARIMA = AR + Integration + Moving Average
What if the relationship is nonlinear?
Replace $\phi = [u_i, u_{i-1}]$ with polynomial features: $\phi = [1,\; u_i,\; u_{i-1},\; u_i^2,\; u_i u_{i-1},\; u_{i-1}^2,\; \dots]$
Still $\boldsymbol{\theta}^\top \phi$. Still linear regression! (Linear in $\theta$, nonlinear in $u$.)
Start with a next-step model: $u_{i+1} = f(u_i)$

Rewrite as: $u_{i+1} - u_i = f(u_i) - u_i$

Recognize: $\;u_{i+1} - u_i \approx \dot{u} \cdot \Delta t$, so: $\;\dot{u} \approx \dfrac{f(u) - u}{\Delta t} \equiv g(u)$
AR(p) with $p > 1$: $u_{i+1}$ depends on $u_i, u_{i-1}, \dots$ (non-local in time)
ODE: $\dot{u} = g(u)$ depends only on current state (local)
But higher-order effects can be captured by enlarging the state: with $\mathbf{v} = (u, \dot{u})$, a second-order ODE becomes a first-order system, local in time again.
Population modeling: how does a population grow with limited resources?
Continuous (ODE): $\dot{x} = r\,x\,(1 - x)$

Discrete (map): $x_{n+1} = r\,x_n\,(1 - x_n)$
Robert May (1976) showed this simple equation produces remarkably complex behavior:
Suppose you observe $x_0, x_1, \dots, x_N$ but don't know the governing equation.
Since $rx(1-x) = rx - rx^2$, try features $\phi(x) = [1, x, x^2]$:
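A quick numerical check of this idea (the value $r = 3.7$ and the initial condition are illustrative):

```python
import numpy as np

# Simulate the logistic map, then recover its coefficients by
# linear regression on the features [1, x, x^2].
r, N = 3.7, 300          # r = 3.7: chaotic regime (assumed for the demo)
x = np.empty(N)
x[0] = 0.4
for n in range(N - 1):
    x[n + 1] = r * x[n] * (1 - x[n])

Phi = np.column_stack([np.ones(N - 1), x[:-1], x[:-1]**2])
theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
# theta ≈ [0, r, -r]: exactly the terms of r x (1 - x)
```

The data satisfy the polynomial relation exactly, so least squares recovers $\theta = [0, r, -r]$ up to floating-point error.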
You will:
Harder question: what if you don't know the right features?
Use a richer library: $\phi = [1, x, x^2, x^3, \sin(x), \dots]$
Most entries of $\theta$ should be zero. This is the idea behind sparse regression.
So far: learned discrete maps $u_{i+1} = f(u_i)$.
Alternative: learn the continuous ODE directly.
How to get $\dot{u}$ from data? Numerical differentiation: $\dot{u}_i \approx \dfrac{u_{i+1} - u_{i-1}}{2\Delta t}$ (central differences)
What is $f$? We don't know. But we can guess candidate functions:
Build the design matrix and solve: $\;\dot{\mathbf{u}} = \Phi\,\boldsymbol{\theta}$ by least squares.
Most candidates are wrong. We want $\boldsymbol{\theta}$ to be sparse. This is the SINDy framework.
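The core loop, sequentially thresholded least squares, can be sketched on synthetic data (the decay rate, candidate library, and threshold below are illustrative assumptions):

```python
import numpy as np

# Data from the known ODE du/dt = -2u; the discovered model should be sparse.
dt = 0.01
t = np.arange(0, 5, dt)
u = np.exp(-2 * t)
du = np.gradient(u, dt)

# Drop endpoints, where np.gradient falls back to one-sided differences
u_in, du_in = u[1:-1], du[1:-1]

# Candidate library: [1, u, u^2, u^3]
Phi = np.column_stack([np.ones_like(u_in), u_in, u_in**2, u_in**3])
theta, *_ = np.linalg.lstsq(Phi, du_in, rcond=None)

for _ in range(10):                       # zero out small terms, refit the rest
    small = np.abs(theta) < 0.1           # threshold 0.1 is an assumed choice
    theta[small] = 0.0
    keep = ~small
    theta[keep] = np.linalg.lstsq(Phi[:, keep], du_in, rcond=None)[0]
# theta ≈ [0, -2, 0, 0]: only the u term survives
```

Thresholding then refitting is what drives most coefficients to exactly zero; plain least squares alone would leave small nonzero entries everywhere.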
Next lectures:
| Method | Feature vector $\phi$ | Goal |
|---|---|---|
| Fourier | $[\sin(\omega_k t), \cos(\omega_k t)]$ | Representation |
| AR(p) | $[u_i, u_{i-1}, \dots]$ | Prediction |
| Logistic Map | $[1, x, x^2]$ | Discover map |
| SINDy | $[1, u, u^2, \sin(u), \dots]$ | Discover ODE |