Time Series Analysis

Starting from Data

ML for Science and Engineering - Lecture 8

What If We Throw Away Differential Equations?

"I'm an engineer. Someone gives me data. Why can't I just use machine learning and forget about ODEs?"

You might be right. Let's start over. You have:

Data: $(u_1, t_1), \; (u_2, t_2), \; \dots, \; (u_n, t_n)$
The question: Can we learn models directly from data, without knowing the physics?

First Rule: LOOK at the Data

Before any model, any algorithm: visualize. Then look again.

(Figure: four example series, from periodic and trending to noisy and chaotic.)
What you look for: periodicity, trends, noise level, outliers, non-stationarity, correlations.

Inductive Bias

Every model encodes assumptions about the data. The central thread of this lecture:

$f(\mathbf{x}) = \boldsymbol{\theta}^\top \phi(\mathbf{x})$

The choice of feature vector $\phi$ is the inductive bias.

  • $\phi = [\sin(\omega_k t), \cos(\omega_k t)]$ → Fourier decomposition
  • $\phi = [u_i, u_{i-1}, \dots, u_{i-p}]$ → autoregressive model
  • $\phi = [1, x, x^2, x^3, \dots]$ → polynomial regression
  • $\phi = [1, u, u^2, \sin(u), \dots]$ → SINDy (coming soon)

Representation, not Prediction

Before predicting the future, understand the present.

Your voice is a time series. If someone sings into a microphone, what do you do with the data?

Spectrogram: decompose the signal into frequency components at each point in time.

Decompose into interpretable components:

$u(t) \approx \sum_{k=1}^{K} a_k \phi_k(t)$

Not predicting future values. Just a representation.

Fourier = Linear Model

Choose $\phi_k$ to be sines and cosines:

$u(t) \approx \sum_{k=1}^{K} \left[ w_k \sin(2\pi f_k t) + v_k \cos(2\pi f_k t) \right]$

This is just $\boldsymbol{\theta}^\top \phi(t)$ with:

$\phi(t) = [\sin(2\pi f_1 t),\; \cos(2\pi f_1 t),\; \dots,\; \sin(2\pi f_K t),\; \cos(2\pi f_K t)]$

Design matrix $\Phi$ of size $N \times 2K$, solve least squares:

$\hat{\boldsymbol{\theta}} = \arg\min_{\boldsymbol{\theta}} \| \mathbf{u} - \Phi \boldsymbol{\theta} \|^2$
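In code, this fit is a direct least-squares solve. A minimal NumPy sketch, assuming a synthetic two-tone signal and a fixed grid of integer candidate frequencies (both illustrative choices, not lecture data):

```python
import numpy as np

# Synthetic signal: two known tones, so the recovered coefficients can be checked
N = 500
t = np.linspace(0.0, 1.0, N, endpoint=False)
u = 1.5 * np.sin(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)

# Candidate frequencies f_1..f_K on a fixed grid (illustrative choice)
freqs = np.arange(1, 11)  # K = 10

# Design matrix Phi of size N x 2K: sine columns, then cosine columns
Phi = np.column_stack(
    [np.sin(2 * np.pi * f * t) for f in freqs]
    + [np.cos(2 * np.pi * f * t) for f in freqs]
)

# Least squares for theta
theta, *_ = np.linalg.lstsq(Phi, u, rcond=None)

w = theta[:10]  # sine coefficients
v = theta[10:]  # cosine coefficients
print(w[2], v[6])  # ≈ 1.5 and 0.5, the true amplitudes at f = 3 and f = 7
```

Because the integer frequencies complete whole periods on this grid, the columns of $\Phi$ are orthogonal and the amplitudes are recovered exactly.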

Interactive: Fourier Superposition

(Interactive demo: drag the frequency bars until your approximation matches the dashed target signal.)

Short-Time Fourier Transform

What if the frequency content changes over time? Spectrogram of a chirp signal:

$\displaystyle S(t, f) = \left| \int_{-\infty}^{\infty} u(\tau)\, w(\tau - t)\, e^{-2\pi i f \tau}\, d\tau \right|^2$
$u(\tau)$ — signal   $w(\tau - t)$ — window at time $t$   $e^{-2\pi i f\tau}$ — Fourier kernel at freq $f$   $|\cdot|^2$ — power

How the STFT Works

Slide a window along the signal; at each position, compute the Fourier transform:

$\displaystyle S(t, f) = \left| \int_{-\infty}^{\infty} u(\tau)\, w(\tau - t)\, e^{-2\pi i f \tau}\, d\tau \right|^2$

Step by step:

  1. Pick a time $t$. Center the window $w(\tau - t)$ there
  2. Multiply: $u(\tau) \cdot w(\tau - t)$ isolates a local segment
  3. The kernel $e^{-2\pi i f\tau}$ probes for frequency $f$ in that segment
  4. $|\cdot|^2$ gives the power → one pixel of the spectrogram
  5. Repeat for all $t$ and $f$ → full 2D map $S(t,f)$

The window $w$:

A localized bump (e.g. Gaussian, Hann) that selects a short segment of the signal near time $t$.

Wide window → good frequency resolution, poor time resolution
Narrow window → good time resolution, poor frequency resolution

This is the uncertainty principle: you cannot have perfect resolution in both time and frequency simultaneously.
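The sliding-window recipe above can be sketched directly in NumPy; the chirp signal, window length, and hop size here are illustrative choices:

```python
import numpy as np

# Discrete STFT sketch: slide a Hann window along a chirp and FFT each segment
fs = 1000                                    # sample rate in Hz (illustrative)
t = np.arange(0, 2.0, 1 / fs)
u = np.sin(2 * np.pi * (50 + 100 * t) * t)   # chirp: frequency rises over time

win, hop = 256, 64                           # window length and hop (illustrative)
w = np.hanning(win)

# One spectrogram column per window position: power |FFT(windowed segment)|^2
starts = range(0, len(u) - win, hop)
S = np.array([np.abs(np.fft.rfft(w * u[s:s + win]))**2 for s in starts]).T

freqs = np.fft.rfftfreq(win, 1 / fs)
peak_start = freqs[S[:, 0].argmax()]         # dominant frequency, first window
peak_end = freqs[S[:, -1].argmax()]          # dominant frequency, last window
print(peak_start, peak_end)                  # the peak moves up: it's a chirp
```

The trade-off in the window length is visible here: `win = 256` gives frequency bins of width $f_s/\text{win} \approx 3.9$ Hz, at the cost of blurring events shorter than 256 samples.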

General Modal Decomposition

Sines and cosines are one choice. What others exist?

Basis | $\phi_k(t)$ | Use case
Fourier | $\sin(k\omega t),\; \cos(k\omega t)$ | Periodic signals
Polynomials | $1,\; t,\; t^2,\; t^3,\; \dots$ | Smooth trends
Wavelets | localized in time and frequency | Transients, edges
Legendre | $P_k(t)$ | Orthogonal on $[-1,1]$
Key property: orthogonality $\langle \phi_i, \phi_j \rangle = \delta_{ij}$ means each coefficient can be computed by an independent projection onto its basis function. The choice of basis IS your inductive bias.

Physics of Harmonics

Where do sines and cosines come from? The wave equation:

$\displaystyle\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2}$

Separation of variables: assume $u(x,t) = X(x)\,T(t)$

$\displaystyle\frac{X''}{X} = \frac{1}{c^2}\frac{T''}{T} = -\lambda \quad\Rightarrow\quad X'' + \lambda X = 0,\;\; T'' + c^2\lambda T = 0$

Boundary conditions $u(0,t)=u(L,t)=0$ force $\lambda_n = (n\pi/L)^2$:

$X_n(x) = \sin\!\left(\tfrac{n\pi x}{L}\right),\quad T_n(t) = \cos(\omega_n t),\quad \omega_n = \tfrac{n\pi c}{L}$
General solution by superposition: $\displaystyle u(x,t) = \sum_{n=1}^{\infty} a_n \sin\!\left(\tfrac{n\pi x}{L}\right) \cos(\omega_n t)$
Linear PDE → each mode satisfies the equation → their sum does too.

Standing Waves on a String

(Interactive demo: $u(x,t) = \sum a_n \sin(n\pi x/L)\cos(\omega_n t)$ with adjustable mode amplitudes $a_n$; thick curve = superposition, thin curves = individual modes.)

The Fermi-Pasta-Ulam-Tsingou Problem

1953, Los Alamos. Fermi, Pasta, Ulam, and Tsingou simulated a chain of masses connected by nonlinear springs on the MANIAC I computer. They expected energy to spread across all modes (thermalization). Instead, the energy returned almost perfectly to the initial mode.
$\ddot{x}_i = \underbrace{(x_{i+1} - 2x_i + x_{i-1})}_{\text{linear}} + \;\alpha\!\underbrace{\left[(x_{i+1}\! - x_i)^2 - (x_i\! - x_{i-1})^2\right]}_{\text{nonlinear}}$

$\alpha = 0$: standard wave equation, modes decouple, superposition holds

$\alpha > 0$: nonlinear coupling transfers energy between modes $\omega_n$

Initial condition: all energy in mode $\omega_1$. What happens?

Expected: energy spreads to all modes (thermalization, equipartition)

Observed: energy returns to $\omega_1$ — near-perfect recurrence

FPUT: Energy Recurrence

(Interactive demo: N = 32 masses, $\alpha = 0.25$, all energy initially in mode $\omega_1$.)
One of the earliest examples of computational scientific discovery: a numerical experiment revealed behavior that no one predicted, launching the study of nonlinear dynamics and solitons.
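The FPUT chain can be reproduced in a few lines. A minimal sketch with leapfrog integration and projection onto the normal modes, using N = 32 and $\alpha = 0.25$ as in the demo; the run here is far shorter than the recurrence time, so it only shows energy beginning to leak out of mode 1:

```python
import numpy as np

N, alpha, dt, steps = 32, 0.25, 0.05, 4000   # dt and steps are illustrative

j = np.arange(1, N)                 # interior masses; ends held fixed at zero
x = np.sin(np.pi * j / N)           # all energy in the first mode
v = np.zeros(N - 1)

def accel(x):
    xp = np.concatenate(([0.0], x, [0.0]))   # fixed boundaries x_0 = x_N = 0
    d = np.diff(xp)                          # spring extensions x_{i+1} - x_i
    return (d[1:] - d[:-1]) + alpha * (d[1:]**2 - d[:-1]**2)

for _ in range(steps):              # leapfrog (velocity Verlet) time stepping
    v += 0.5 * dt * accel(x)
    x += dt * v
    v += 0.5 * dt * accel(x)

# Harmonic energy per normal mode: E_k = (p_k^2 + omega_k^2 q_k^2) / 2
k = np.arange(1, N)
S = np.sqrt(2 / N) * np.sin(np.pi * np.outer(k, j) / N)  # mode transform
q, p = S @ x, S @ v
omega = 2 * np.sin(np.pi * k / (2 * N))
E = 0.5 * (p**2 + omega**2 * q**2)
print(E[:4] / E.sum())  # mode 1 still dominates; higher modes now hold energy
```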

From Representation to Prediction

Fourier tells us what's in the signal. But can we predict the next value?

$u_{i+1} = f(u_i, u_{i-1}, \dots, u_{i-p+1})$

The feature vector is now:

$\phi = [u_i, \; u_{i-1}, \; \dots, \; u_{i-p+1}, \; 1]$
Still $\boldsymbol{\theta}^\top \phi(\mathbf{u})$. Still linear regression!

AR(p) Model

The autoregressive model of order $p$:

$u_i = w_1 u_{i-1} + w_2 u_{i-2} + \dots + w_p u_{i-p} + w_0$

In matrix form: $\;\mathbf{u} = \Phi \mathbf{w}$

$\begin{bmatrix} u_p \\ u_{p+1} \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} u_{p-1} & u_{p-2} & \cdots & u_0 & 1 \\ u_p & u_{p-1} & \cdots & u_1 & 1 \\ \vdots & & \ddots & & \vdots \\ u_{n-1} & u_{n-2} & \cdots & u_{n-p} & 1 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_p \\ w_0 \end{bmatrix}$

In-Class Exercise: AR(2) on a Sine Wave

Given $u_i = \sin(\omega \cdot i)$ for $i = 0, \dots, 199$ with $\omega = 0.3$:

  1. Build design matrix: columns $[u_{i-1},\; u_{i-2},\; 1]$
  2. Target vector: $u_i$ for $i = 2, \dots, N-1$
  3. Solve via least squares
import numpy as np

N, omega = 200, 0.3
u = np.sin(omega * np.arange(N))

# Design matrix
Phi = np.column_stack([
    u[1:-1],     # u_{i-1}
    u[:-2],      # u_{i-2}
    np.ones(N-2) # bias
])
y = u[2:]        # target

# Least squares
w = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(f"w1={w[0]:.4f}, w2={w[1]:.4f}")

AR(2): Analytical Result

Trigonometric identity:

$\sin(\omega i) = 2\cos(\omega)\sin(\omega(i-1)) - \sin(\omega(i-2))$

Therefore:

$w_1 = 2\cos(\omega), \quad w_2 = -1, \quad w_0 = 0$
Recover frequency: $\;\omega = \arccos(w_1 / 2)$. AR learns the frequency without us telling it!
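A quick numerical check of this result, reusing the exercise setup: fit AR(2) to the noise-free sine and invert $w_1 = 2\cos(\omega)$:

```python
import numpy as np

# Fit AR(2) to u_i = sin(0.3 i) and recover omega from w1 = 2*cos(omega)
N, omega = 200, 0.3
u = np.sin(omega * np.arange(N))

Phi = np.column_stack([u[1:-1], u[:-2], np.ones(N - 2)])  # [u_{i-1}, u_{i-2}, 1]
w = np.linalg.lstsq(Phi, u[2:], rcond=None)[0]

omega_hat = np.arccos(w[0] / 2)
print(omega_hat, w[1], w[2])  # ≈ 0.3, -1, 0: the analytical values
```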

Interactive: AR(2) Prediction

(Interactive demo: adjustable $\omega$; solid curve = true signal, dashed = AR(2) prediction.)

ARIMA and Extensions

ARIMA = AR + Integration + Moving Average

Differencing (the I in ARIMA):
$\Delta u_i = u_i - u_{i-1}$
Removes trends from the data.
Moving Average (MA):
$u_i = \dots + \theta_1 \varepsilon_{i-1} + \theta_2 \varepsilon_{i-2}$
Models dependence on past errors.
ARIMA is a workhorse of classical time series analysis: econometrics, demand forecasting, resource planning.
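Differencing is one line of NumPy. A small sketch (the synthetic trend and noise level are illustrative) showing that $\Delta u$ turns a strong linear trend into a nearly constant series:

```python
import numpy as np

# Differencing (the I in ARIMA): first differences remove a linear trend
rng = np.random.default_rng(0)
n = 300
trend = 0.5 * np.arange(n)                  # linear trend, slope 0.5
u = trend + rng.normal(scale=0.1, size=n)   # trend plus small noise

du = np.diff(u)                             # Delta u_i = u_i - u_{i-1}
print(u.std(), du.std())  # spread collapses once the trend is differenced away
```

The differenced series fluctuates around the slope 0.5, so it is (approximately) stationary and ready for an AR or MA fit.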

Nonlinear AR Extensions

What if the relationship is nonlinear?

Replace $\phi = [u_i, u_{i-1}]$ with polynomial features:

$\phi = [u_i, \; u_{i-1}, \; u_i^2, \; u_i u_{i-1}, \; u_{i-1}^2, \; \dots]$

Still $\boldsymbol{\theta}^\top \phi$. Still linear regression! (Linear in $\theta$, nonlinear in $u$.)

More features = more parameters = need more data (curse of dimensionality).
This idea of a "library of candidate functions" will return soon...

From Discrete Maps to ODEs

Start with a next-step model:

$u_{i+1} = f(u_i)$

Rewrite as:

$u_{i+1} = u_i + \underbrace{[f(u_i) - u_i]}_{g(u_i) \cdot \Delta t}$

Recognize: $\;u_{i+1} - u_i \approx \dot{u} \cdot \Delta t$, so:

$u_{i+1} = u_i + g(u_i)\,\Delta t$
This is the forward Euler method. Discrete AR models are discrete-time dynamical systems. ODEs are their continuous cousins.
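A minimal forward Euler sketch, using the illustrative choice $g(u) = -u$ so the discrete map can be checked against the exact solution $e^{-t}$:

```python
import numpy as np

# Forward Euler for u' = g(u): the discrete map u_{i+1} = u_i + g(u_i)*dt
def euler(g, u0, dt, steps):
    u = [u0]
    for _ in range(steps):
        u.append(u[-1] + g(u[-1]) * dt)
    return np.array(u)

g = lambda u: -u                 # illustrative ODE with known solution exp(-t)
u = euler(g, 1.0, 0.01, 100)     # integrate to t = 1
print(u[-1], np.exp(-1.0))       # ≈ 0.366 vs the true 0.3679
```

Halving $\Delta t$ roughly halves the error: forward Euler is first-order accurate, which is why the recovered AR coefficients depend on the sampling rate.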

$\Delta t$ Matters

Uniform spacing: standard AR works fine.

Non-uniform spacing: the AR coefficients change with $\Delta t$!
The ODE $\dot{u} = g(u)$ is independent of sampling rate. This is one reason to prefer differential equation models.

Multi-Step vs Local

AR(p) with $p > 1$: $u_{i+1}$ depends on $u_i, u_{i-1}, \dots$ (non-local in time)

ODE: $\dot{u} = g(u)$ depends only on current state (local)

But higher-order effects can be captured:

AR(2) for sine: $\;u_{i+1} = 2\cos(\omega)\,u_i - u_{i-1}$ $\quad\longleftrightarrow\quad$ ODE: $\;\ddot{u} + \omega^2 u = 0$
Autoregressive models and differential equations are two views of the same underlying dynamics.

✎ Your Turn

The Logistic Map

Population modeling: how does a population grow with limited resources?

Continuous (ODE):

$\dot{x} = rx(1 - x)$

Discrete (map):

$x_{n+1} = r \cdot x_n(1 - x_n)$

Robert May (1976) showed this simple equation produces remarkably complex behavior:

  • $r < 3$: stable fixed point
  • $3 < r < 3.57$: period-doubling cascade
  • $r > 3.57$: chaos (interspersed with periodic windows)
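Both regimes are easy to see numerically. A sketch (parameter values are illustrative): for $r = 2.5$ the orbit settles onto the fixed point $x^* = 1 - 1/r$; for $r = 3.9$ two starts only $10^{-9}$ apart end up on completely different orbits:

```python
import numpy as np

def iterate(r, x0, n):
    """Iterate the logistic map x_{n+1} = r * x_n * (1 - x_n)."""
    xs = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)
        xs[i] = x
    return xs

# r = 2.5 < 3: the orbit converges to the stable fixed point x* = 1 - 1/r
xs = iterate(2.5, 0.2, 500)
print(xs[-1], 1 - 1 / 2.5)        # both ≈ 0.6

# r = 3.9: chaotic; sensitive dependence on initial conditions
a = iterate(3.9, 0.2, 80)
b = iterate(3.9, 0.2 + 1e-9, 80)
print(np.max(np.abs(a - b)))      # the two trajectories have visibly diverged
```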

Interactive: Logistic Map

(Interactive demo: adjustable parameters, defaults $r = 2.5$, $x_0 = 0.2$.)

Given Data, Find the Map

Suppose you observe $x_0, x_1, \dots, x_N$ but don't know the governing equation.

Since $rx(1-x) = rx - rx^2$, try features $\phi(x) = [1, x, x^2]$:

$\begin{bmatrix} 1 & x_0 & x_0^2 \\ 1 & x_1 & x_1^2 \\ \vdots & \vdots & \vdots \\ 1 & x_{N-1} & x_{N-1}^2 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix}$
Should recover $\theta_0 = 0$, $\theta_1 = r$, $\theta_2 = -r$. We discover the governing equation from data alone!
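A minimal verification on noise-free data (the value $r = 3.7$ and the initial condition are illustrative choices):

```python
import numpy as np

# Generate a logistic-map trajectory with a known r, then recover it
r, N = 3.7, 100
x = np.empty(N + 1)
x[0] = 0.2
for n in range(N):
    x[n + 1] = r * x[n] * (1 - x[n])

# Features phi(x) = [1, x, x^2]; targets are the next-step values
Phi = np.column_stack([np.ones(N), x[:-1], x[:-1]**2])
theta, *_ = np.linalg.lstsq(Phi, x[1:], rcond=None)
print(theta)  # ≈ [0, 3.7, -3.7], i.e. theta_0 = 0, theta_1 = r, theta_2 = -r
```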

Homework Preview

You will:

  • Generate logistic map data with noise
  • Build the design matrix and recover $r$
  • Study how noise level affects parameter recovery

Harder question: what if you don't know the right features?

Use a richer library: $\phi = [1, x, x^2, x^3, \sin(x), \dots]$

Most entries of $\theta$ should be zero. This is the idea behind sparse regression.

Learning $\dot{u} = f(u)$

So far: learned discrete maps $u_{i+1} = f(u_i)$.

Alternative: learn the continuous ODE directly.

$\dot{u} = f(u)$

How to get $\dot{u}$ from data? Numerical differentiation:

Forward: $\;\dot{u}(t_i) \approx \dfrac{u_{i+1} - u_i}{\Delta t}$
Centered: $\;\dot{u}(t_i) \approx \dfrac{u_{i+1} - u_{i-1}}{2\Delta t}$
Numerical differentiation amplifies noise. Smoothing or regularization is often needed.
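The amplification is easy to demonstrate: on samples of $\sin(t)$, centered differences are $O(\Delta t^2)$ accurate on clean data, but noise of size $\sigma$ inflates the derivative error to roughly $\sigma/\Delta t$. A sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.01
t = np.arange(0, 10, dt)
u_clean = np.sin(t)
u_noisy = u_clean + rng.normal(scale=1e-3, size=t.size)  # tiny noise

def centered(u, dt):
    # Centered difference: (u_{i+1} - u_{i-1}) / (2 dt)
    return (u[2:] - u[:-2]) / (2 * dt)

# Compare against the exact derivative cos(t) at the interior points
err_clean = np.max(np.abs(centered(u_clean, dt) - np.cos(t[1:-1])))
err_noisy = np.max(np.abs(centered(u_noisy, dt) - np.cos(t[1:-1])))
print(err_clean, err_noisy)  # noise inflates the error by orders of magnitude
```

Noise of size $10^{-3}$ in the data becomes error of size roughly $10^{-3}/\Delta t = 0.1$ in the derivative, which is why smoothing before differentiating matters.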

The Library Idea

What is $f$? We don't know. But we can guess candidate functions:

Library: $\;\phi(u) = [1, \; u, \; u^2, \; u^3, \; \sin(u), \; \cos(u), \; \dots]$

Build the design matrix and solve:

$\dot{\mathbf{u}} = \Phi(\mathbf{u}) \, \boldsymbol{\theta}$
Pipeline: Data $\{u_i, t_i\}$ $\rightarrow$ differentiate to get $\dot{u}_i$ $\rightarrow$ build library $\Phi(u)$ $\rightarrow$ sparse regression $\rightarrow$ discovered equation

Most candidates are wrong. We want $\boldsymbol{\theta}$ to be sparse. This is the SINDy framework.
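One common way to get a sparse $\boldsymbol{\theta}$ is sequential thresholded least squares: fit, zero out small coefficients, refit on the survivors, repeat. A minimal sketch on $\dot{u} = u - u^2$ (logistic growth with $r = 1$), using exact derivatives for clarity; in practice $\dot{u}$ would come from numerical differentiation:

```python
import numpy as np

# Sampled states and their derivatives from the true equation u' = u - u^2
u = np.linspace(0.05, 0.95, 200)
du = u - u**2

# Candidate library (illustrative choice of terms)
names = ["1", "u", "u^2", "u^3", "sin(u)"]
Phi = np.column_stack([np.ones_like(u), u, u**2, u**3, np.sin(u)])

# Sequential thresholded least squares: fit, threshold, refit survivors
theta, *_ = np.linalg.lstsq(Phi, du, rcond=None)
for _ in range(10):
    small = np.abs(theta) < 0.1          # threshold (illustrative value)
    theta[small] = 0.0
    active = ~small
    theta[active] = np.linalg.lstsq(Phi[:, active], du, rcond=None)[0]

print({n: round(c, 3) for n, c in zip(names, theta) if c != 0})
# → {'u': 1.0, 'u^2': -1.0}
```

The wrong candidates ($1$, $u^3$, $\sin u$) are pruned and only the true terms survive; this fit-threshold-refit loop is the core of the SINDy algorithm.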

Looking Ahead

Next lectures:

  • Lecture 9: System Identification: fitting coefficients for known equation structures
  • Lecture 10: SINDy: discovering the equation structure itself from data
Read: Brunton, Proctor, Kutz. "Discovering governing equations from data by sparse identification of nonlinear dynamical systems." (2016)
The central thread: the design matrix $\Phi$ and feature vector $\phi$ connect Fourier, AR, logistic maps, and SINDy. It's all linear regression with the right features.

Summary

Method | Feature vector $\phi$ | Goal
Fourier | $[\sin(2\pi f_k t),\; \cos(2\pi f_k t)]$ | Representation
AR(p) | $[u_i,\; u_{i-1},\; \dots]$ | Prediction
Logistic map | $[1,\; x,\; x^2]$ | Discover map
SINDy | $[1,\; u,\; u^2,\; \sin(u),\; \dots]$ | Discover ODE
Everything is $\boldsymbol{\theta}^\top \phi$. The art is choosing $\phi$.