Complex Systems and Probabilistic Modeling

ML for Science - Lecture 6

Chaos, uncertainty, and the bridge from randomness to determinism

Where We Are

So far:

  • Empirical laws, linear regression
  • Differential equations (ODEs, PDEs)
  • Numerical methods: discretization, stability
  • Simulating physical systems

Today:

  • What are complex systems?
  • Chaos and predictability limits
  • Probability as a tool
  • From randomness to determinism
Key question: If we know the laws of physics, what's left to discover?

What is a Complex System?

"A system where the whole is much more than the sum of its parts."

Key properties:

  • Emergence: Global patterns arise from local interactions
  • Nonlinearity: Effects are not proportional to causes, so understanding the parts doesn't mean understanding the whole
  • Multiple scales: Different behavior at different scales
  • Sensitivity: Small changes can have large effects

Examples of Complex Systems

Physical

  • Weather & climate
  • Fluid turbulence
  • Granular materials
  • Protein folding

Biological

  • Brain / neural systems
  • Ecosystems
  • Immune system
  • Cell signaling

Social/Tech

  • Social networks
  • Financial markets
  • Internet/routing
  • Cities & traffic
Common thread: We know the rules for the parts, but can't easily predict the whole.

Why Complex Systems Matter for ML

When applying ML to science, you often encounter:

Challenges:
  • High-dimensional, noisy data
  • Multiple interacting scales
  • Chaotic dynamics
  • Missing measurements
Opportunities:
  • Statistics may be predictable
  • Patterns emerge at right scale
  • Neural networks are themselves complex systems!

Rayleigh-Bénard Convection

Fluid heated from below, cooled from above — creates convection cells

Basic mechanism of atmospheric and oceanic circulation

From 7 Equations to 3: The Saltzman-Lorenz Story

Barry Saltzman (Yale, 1961) developed a 7-equation model for convection.

He showed it to Edward Lorenz at MIT; one solution "refused to settle down."

Lorenz noticed: 4 variables quickly became tiny. Only 3 were "keeping each other going."

Lorenz's insight:

"Barry gave me the go-ahead signal, and back at MIT the next morning I put the three equations on the computer..."

"...and sure enough, there was the same lack of periodicity."

Saltzman-Lorenz Exchange, 1961

The Lorenz Equations

Saltzman's 7 equations reduced to 3:

$\dot{x} = \sigma(y - x)$

$\dot{y} = x(\rho - z) - y$

$\dot{z} = xy - \beta z$
Variables:
  • $x$ = intensity of convection
  • $y$ = horizontal temperature difference
  • $z$ = vertical temperature deviation
Parameters (chaotic regime):

$\sigma = 10$, $\rho = 28$, $\beta = 8/3$

Just 3 coupled ODEs, yet the dynamics are incredibly complex
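These three ODEs are easy to integrate numerically. A minimal sketch in pure Python with a hand-rolled RK4 step (the step size and step count are illustrative choices, not from the slides):

```python
# A hand-rolled RK4 integrator for the Lorenz system (parameters from the slide).
def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def rk4_step(f, state, dt):
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def trajectory(state=(1.0, 1.0, 1.0), dt=0.01, steps=1500):
    points = [state]
    for _ in range(steps):
        state = rk4_step(lorenz, state, dt)
        points.append(state)
    return points
```

Starting from $(1, 1, 1)$ with the chaotic parameters, the trajectory stays bounded on the attractor but never settles into a periodic orbit.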

The Lorenz Attractor

Interactive demo: trajectory starting from $(x, y, z) = (1, 1, 1)$

The trajectory never repeats but stays on this strange "butterfly" shape

The Accident

Lorenz wanted to extend a simulation. Instead of starting over, he typed in values from a printout:

Printout showed:

0.506
Computer stored:

0.506127

A difference of about $10^{-4}$; surely that can't matter?

He goes for coffee, comes back, and...

The Weather is Completely Different!

The two simulations start nearly identical, then completely diverge.

Wait... what's happening here?
The equations are deterministic. Same input should give same output, right?

Sensitivity to Initial Conditions

Two trajectories: the original, and one perturbed by $\varepsilon = 10^{-4}$

No matter how small $\varepsilon$ is, the trajectories eventually diverge completely.
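This divergence is easy to reproduce. A sketch using simple forward-Euler integration (step size and integration time are illustrative assumptions): two runs whose initial $x$ differs by $10^{-4}$ end up on completely different parts of the attractor.

```python
import math

# Forward-Euler sketch: two Lorenz trajectories whose initial x differs by 1e-4.
def euler_step(s, dt=0.002, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 1.0)
b = (1.0 + 1e-4, 1.0, 1.0)        # perturb x by epsilon = 1e-4
separation = []
for _ in range(12500):            # integrate to t = 25
    a, b = euler_step(a), euler_step(b)
    separation.append(math.dist(a, b))
# Early on the separation is tiny; eventually it is of the attractor's own size.
```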

The Butterfly Effect

"Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?"
- Edward Lorenz, 1972

Lorenz's discovery:

  • Deterministic $\neq$ Predictable
  • Tiny errors grow exponentially fast
  • Long-term weather prediction has a fundamental limit (~2 weeks)
This is chaos: sensitivity to initial conditions in deterministic systems.

Ensemble of Initial Conditions

What happens to a distribution of initial conditions in the Lorenz system?


A tight cluster of initial conditions spreads across the entire attractor

From Trajectories to Distribution

100 trajectories starting within $10^{-6}$ of each other — sample at time $t$ to get a distribution


The histogram shows a Probability Mass Function (PMF) — counts in discrete bins

From PMF to PDF

What if $\Delta x \to 0$?

As we use more bins (smaller $\Delta x$), the histogram approaches a smooth curve:

In the limit $\Delta x \to 0$, the PMF becomes a Probability Density Function (PDF)

Estimating PDF from Samples

Click to add samples — each gets a Gaussian "kernel", and they sum to form the estimate

$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$   where   $K(u) = \frac{1}{\sqrt{2\pi}} e^{-u^2/2}$ (Gaussian kernel)
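The estimator above fits in a few lines (the bandwidth $h = 0.3$ is an arbitrary illustrative choice):

```python
import math

# Kernel density estimate with a Gaussian kernel, as in the formula above:
# f_hat(x) = (1 / (n h)) * sum_i K((x - x_i) / h)
def kde(samples, h=0.3):
    n = len(samples)
    norm = n * h * math.sqrt(2.0 * math.pi)
    def f_hat(x):
        return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples) / norm
    return f_hat
```

Each sample contributes one Gaussian bump of width $h$; the sum is normalized so the estimate integrates to 1 and is therefore a valid PDF.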

Probability Review

(Optional background material)

Key concepts we'll use throughout the course:

  • Random variables, PMF, PDF
  • Mean, variance, expectation
  • Joint & conditional probability
  • Independence & Bayes' rule
  • Central Limit Theorem

Random Variables

A random variable $X$ maps outcomes to numbers:

$X: \Omega \to \mathbb{R}$   (sample space to real numbers)
Example: Coin flip
$\Omega = \{\text{Heads}, \text{Tails}\}$
$X(\text{Heads}) = 1$
$X(\text{Tails}) = 0$
Example: Temperature
$\Omega = $ all possible states
$X = $ temperature reading
$X \in \mathbb{R}$ (continuous)

Probability Mass Function (PMF)

For discrete random variables:

$P(X = x)$ = probability that $X$ takes value $x$
Example: a fair die has $P(X = x) = 1/6$ for each $x \in \{1, 2, 3, 4, 5, 6\}$
Properties:
  • $P(X = x) \geq 0$
  • $\sum_x P(X = x) = 1$

Probability Density Function (PDF)

For continuous random variables, the PDF $f(x)$ gives probability via integration:

$P(a \leq X \leq b) = \int_a^b f(x) \, dx$ = shaded area
Key points:

  • $f(x)$ is a density, not a probability
  • $f(x) \geq 0$ and $\int_{-\infty}^{\infty} f(x) \, dx = 1$
  • $P(X = x) = 0$ for any single point; only areas under the curve are probabilities

Mean and Variance

Mean (Expected Value):

$\mu = E[X] = \int x \cdot f(x) \, dx$

The "center of mass" of the distribution

Variance:

$\sigma^2 = E[(X - \mu)^2]$

How spread out the distribution is

Why this matters: Instead of predicting a single value, we can predict the mean AND quantify our uncertainty (variance).

Joint & Conditional Probability

Joint Probability:

$P(X, Y)$ = probability of both $X$ and $Y$
Conditional Probability:

$P(X | Y) = \frac{P(X, Y)}{P(Y)}$
Example:
$X$ = it rains, $Y$ = cloudy

$P(\text{rain} | \text{cloudy})$ = probability of rain given it's cloudy

Usually $P(\text{rain} | \text{cloudy}) > P(\text{rain})$
Marginalization: $P(X) = \sum_y P(X, Y=y) = \sum_y P(X|Y=y) P(Y=y)$

Independence

Two random variables are independent if knowing one tells you nothing about the other:

$\begin{aligned} X \perp Y \;&\iff\; P(X, Y) = P(X) \cdot P(Y) \\ &\iff\; P(X|Y) = P(X) \end{aligned}$
Independent:
  • Coin flip 1 and coin flip 2
  • Weather in Tokyo and weather in Paris
NOT Independent:
  • Temperature and ice cream sales
  • Parent height and child height

Independence is a strong assumption — rarely true in real data!

Bayes' Rule

Inverting conditional probabilities:

$P(A | B) = \frac{P(B | A) \cdot P(A)}{P(B)}$
$P(A)$ — Prior
What we believed before seeing data

$P(B|A)$ — Likelihood
How likely is the data given our hypothesis?
$P(A|B)$ — Posterior
Updated belief after seeing data

$P(B)$ — Evidence
Normalizing constant
Key idea: Bayes' rule tells us how to update beliefs with new evidence. Foundation of Bayesian inference and many ML algorithms.
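A worked example, reusing the rain/cloudy variables from earlier (all the numbers below are hypothetical, invented for illustration):

```python
# Hypothetical numbers: A = rain, B = cloudy.
p_rain = 0.2                  # prior P(A)
p_cloudy_given_rain = 0.9     # likelihood P(B|A)
p_cloudy_given_dry = 0.4      # P(B|not A)

# Evidence via marginalization: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_cloudy = p_cloudy_given_rain * p_rain + p_cloudy_given_dry * (1 - p_rain)

# Bayes' rule: posterior P(A|B) = P(B|A) P(A) / P(B)
p_rain_given_cloudy = p_cloudy_given_rain * p_rain / p_cloudy
# ~ 0.36: seeing clouds raises the belief in rain from 0.20 to 0.36
```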

Video: 3Blue1Brown, "Bayes theorem, the geometry of changing beliefs"

Central Limit Theorem

One of the most important results in probability:

The sum (or average) of many independent random variables tends toward a Gaussian distribution, regardless of the original distribution.
If $X_1, X_2, \ldots, X_n$ are i.i.d. with mean $\mu$ and variance $\sigma^2$, then:

$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{d} \mathcal{N}\left(\mu, \frac{\sigma^2}{n}\right)$ as $n \to \infty$
Why it matters:
  • Explains why Gaussians are everywhere
  • Justifies normal approximations
  • Foundation of statistical inference
Examples:
  • Measurement errors
  • Heights of people
  • Stock price changes (approx)
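A quick numerical check (sample sizes are illustrative): averaging $n = 50$ uniform draws per trial gives means clustered around $\mu = 0.5$ with spread $\sigma/\sqrt{n} = \sqrt{1/(12 \cdot 50)} \approx 0.041$, even though the underlying draws are uniform, not Gaussian.

```python
import random
import statistics

random.seed(0)
n, trials = 50, 2000
# Each trial averages n uniform(0, 1) draws; a uniform has mu = 1/2 and
# sigma^2 = 1/12, so the CLT predicts the means are roughly N(0.5, 1/(12 n)).
means = [statistics.fmean(random.random() for _ in range(n))
         for _ in range(trials)]
```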

Brownian Motion

Robert Brown (1827):

Observed pollen grains moving erratically in water under a microscope.

Albert Einstein (1905):

Explained it as evidence for atoms! Tiny molecules randomly bump the particle.

Imagine being pushed randomly by a crowd: you end up doing a "random walk."

Random Walk

At each step, move randomly — left: trajectories, right: density estimate (KDE)


Deriving the Diffusion Equation

Consider a 1D random walk on a grid with spacing $\Delta x$ and time step $\Delta t$:

Diagram: a particle at $x - \Delta x$ or $x + \Delta x$ hops to $x$, each with probability $\tfrac{1}{2}$
Master Equation:   $p(x, t + \Delta t) = \frac{1}{2} p(x - \Delta x, t) + \frac{1}{2} p(x + \Delta x, t)$
Probability at $x$ comes from particles jumping in from both neighbors

The Master Equation

Probability at position $x$ at time $t + \Delta t$ comes from neighbors:

$p(x, t + \Delta t) = \frac{1}{2} p(x - \Delta x, t) + \frac{1}{2} p(x + \Delta x, t)$

Subtracting $p(x, t)$ on both sides:

$\underbrace{p(x, t + \Delta t) - p(x, t)}_{\text{change in time}} = \tfrac{1}{2} \underbrace{\left[ p(x + \Delta x, t) - 2p(x, t) + p(x - \Delta x, t) \right]}_{\text{curvature in space}}$

Taking the Continuum Limit

Taylor expand for small $\Delta x$ and $\Delta t$:

$p(x, t + \Delta t) \approx p + \frac{\partial p}{\partial t} \Delta t$
$p(x \pm \Delta x, t) \approx p \pm \frac{\partial p}{\partial x} \Delta x + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} (\Delta x)^2$

Substituting and simplifying:

$\frac{\partial p}{\partial t} \Delta t = \frac{1}{2} \cdot \frac{\partial^2 p}{\partial x^2} (\Delta x)^2$
The Diffusion Equation:   $\displaystyle\frac{\partial p}{\partial t} = D \frac{\partial^2 p}{\partial x^2}$   where   $D = \frac{(\Delta x)^2}{2 \Delta t}$
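A sanity check of this result for the simple $\pm\Delta x$ walk (walker and step counts are illustrative): the empirical variance of walker positions should grow as $2Dt$.

```python
import random
import statistics

random.seed(1)
dx, dt, steps, walkers = 1.0, 1.0, 300, 2000
D = dx ** 2 / (2 * dt)                   # D = (dx)^2 / (2 dt)

final_positions = []
for _ in range(walkers):
    x = 0.0
    for _ in range(steps):
        x += random.choice((-dx, dx))    # hop left or right with prob 1/2
    final_positions.append(x)

t = steps * dt
# The diffusion equation predicts Var[X(t)] = 2 D t for a point release at 0.
empirical_var = statistics.pvariance(final_positions)
```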

What if $\Delta x \to 0$? Random $dx$ at every $dt$?

The same physics can be written as a Stochastic Differential Equation:

PDE (density)

$\frac{\partial p}{\partial t} = D \frac{\partial^2 p}{\partial x^2}$

Evolution of probability density

SDE (trajectory)

$dX = \sqrt{2D}\, dW$

Langevin equation for single particle

$dW$ = Wiener process increment (Gaussian noise with $\langle dW \rangle = 0$, $\langle dW^2 \rangle = dt$)
Key insight: The PDE describes the ensemble, the SDE describes individual realizations. Both are equivalent!
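The SDE side can be simulated with the Euler-Maruyama scheme, the stochastic analogue of forward Euler (all numerical parameters here are illustrative):

```python
import random
import statistics

def euler_maruyama(D=0.5, dt=1e-3, steps=1000, x0=0.0):
    """One sample path of dX = sqrt(2D) dW (Euler-Maruyama scheme)."""
    x = x0
    path = [x]
    for _ in range(steps):
        dW = random.gauss(0.0, dt ** 0.5)   # <dW> = 0, <dW^2> = dt
        x += (2.0 * D) ** 0.5 * dW
        path.append(x)
    return path

random.seed(2)
endpoints = [euler_maruyama()[-1] for _ in range(2000)]
# Ensemble variance at t = 1 should match the PDE's spreading: 2 D t = 1.0
```

Each path is erratic, but the ensemble of endpoints reproduces the deterministic Gaussian spreading of the diffusion equation.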

From Randomness to Determinism

Microscopic

Individual particles walk randomly
Completely unpredictable!
Macroscopic

Density evolves deterministically
Perfectly predictable!

This is how we go from randomness at small scales to determinism at large scales.

The Right Scale Matters

A profound lesson for modeling:

If something seems unpredictable, maybe you're looking at the wrong scale.
Individual Scale
  • Molecules → chaotic
  • Neuron spikes → noisy
  • Individual trades → random
Statistical Scale
  • Temperature/pressure → smooth
  • Population activity → structured
  • Market trends → regularities

Complex Systems on Networks

Many complex systems have a network structure:

Internet
Routers, packets
Congestion, packet loss
Social
People, connections
Information spread
Neural
Neurons, synapses
Signal propagation
Common questions: What's the probability of packet loss? How does information spread? How do signals propagate?

How About Neurons to Brains?

The FitzHugh-Nagumo model: a 2D simplification of Hodgkin-Huxley (realized as a circuit by Nagumo; it generalizes the Van der Pol oscillator)

$\dot{V} = V - \tfrac{V^3}{3} - W + I$
$\dot{W} = \varepsilon(V + a - bW)$
  • $V$: membrane potential
  • $W$: recovery variable
  • $I$: input current
  • $\varepsilon$: time-scale separation
$I$ controls the firing frequency.

Flow field shows state evolution. Cubic shape creates excitability: small push → large spike.
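A minimal sketch of the model with forward Euler and the common parameter choices $a = 0.7$, $b = 0.8$, $\varepsilon = 0.08$ (these values and the step size are assumptions, not from the slide):

```python
# Forward-Euler integration of the FitzHugh-Nagumo equations.
def fhn_step(V, W, I, dt, eps=0.08, a=0.7, b=0.8):
    dV = V - V ** 3 / 3.0 - W + I      # fast membrane potential
    dW = eps * (V + a - b * W)         # slow recovery variable
    return V + dt * dV, W + dt * dW

def simulate(I=0.5, dt=0.02, steps=10000, V=-1.0, W=-0.5):
    Vs = []
    for _ in range(steps):
        V, W = fhn_step(V, W, I, dt)
        Vs.append(V)
    return Vs
```

With $I = 0.5$ the resting state is unstable and the neuron fires repeatedly, $V$ swinging between roughly $-2$ and $+2$.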

Coupled Neurons

Neurons interact through synaptic connections:

$\displaystyle\frac{dV_k}{dt} = V_k - \frac{V_k^3}{3} - W_k + I_k + \sum_{j \in \mathcal{N}(k)} g_{jk}(V_j - V_k)$
Network diagram: six neurons $V_1, \ldots, V_6$ connected by coupling strengths $g_{jk}$ (e.g. $g_{12}$, $g_{23}$, $g_{35}$)

Coupling strength $g_{jk}$ determines influence of neuron $j$ on neuron $k$
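A two-neuron sketch of this coupling (the diffusive term $g(V_j - V_k)$ from the equation above; parameters and step size are illustrative assumptions): with $g = 0$ the two membrane potentials drift apart in phase, while strong coupling pulls them together.

```python
def coupled_fhn(I=(0.4, 0.6), g=0.5, eps=0.08, a=0.7, b=0.8,
                dt=0.02, steps=60000):
    """Two diffusively coupled FitzHugh-Nagumo neurons (forward Euler)."""
    V, W = [-1.0, 1.0], [-0.5, 0.5]
    history = []
    for _ in range(steps):
        dV = [V[k] - V[k] ** 3 / 3.0 - W[k] + I[k] + g * (V[1 - k] - V[k])
              for k in (0, 1)]
        dW = [eps * (V[k] + a - b * W[k]) for k in (0, 1)]
        V = [V[k] + dt * dV[k] for k in (0, 1)]
        W = [W[k] + dt * dW[k] for k in (0, 1)]
        history.append((V[0], V[1]))
    return history

def mean_mismatch(history):
    """Average |V1 - V2| over the second half of the run."""
    half = history[len(history) // 2:]
    return sum(abs(v1 - v2) for v1, v2 in half) / len(half)
```

With $g = 0$ the mismatch stays large (different $I_k$ means different natural frequencies); at $g = 1$ the two potentials lock together.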

Synchronization

Explore how coupling strength affects synchronization:

Try it: At $g=0$ neurons oscillate at their natural frequencies (set by $I_k$). Increase $g$ to see them synchronize.

Preview: Artificial Neural Networks

Artificial neurons are a simplification:

Real neurons:
  • Fire in time (spikes)
  • Complex ion dynamics
  • Many neurotransmitters
  • Stochastic
Artificial neurons:
  • Static input/output
  • Simple: $y = \sigma(Wx + b)$
  • Deterministic
  • But still a complex system!
Key insight: Neural networks learn because they're complex systems that can adapt to data.

Summary

  1. Complex systems: The whole is more than the sum of parts
  2. Chaos: Deterministic $\neq$ Predictable (Lorenz system)
  3. Probability: A tool for dealing with uncertainty and complexity
  4. Uncertainty propagation: Distributions evolve deterministically
  5. Scale matters: Randomness at small scales can become determinism at large scales (Brownian motion $\to$ diffusion)
  6. Neural networks: Are themselves complex systems that learn from data
Next lecture: Neural networks as function approximators