Sequential Bayesian Learning: Linear Regression

9 Jun 2026

Introduction

Linear model: \[ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i}, \quad \epsilon \overset{\text{iid}}{\sim} \mathrm{N}(0, \sigma^{2}) \] for \(i = 1, \ldots, N\) data points.

Similarly, we can write: \[ y_{i} \overset{\text{ind}}{\sim} \mathrm{N}(\beta_{0} + \beta_{1} x_{i}, \sigma^{2}) \]

Simulate data

We generate synthetic data (Bishop 2006) from the function:

\[ \begin{align} y = f(x; \beta_{0}, \beta_{1}) &= \beta_{0} + \beta_{1} x \\ &= -0.3 + 0.5 x \end{align} \]

For \(i = 1, \ldots, N\):

Choose a value \(x_{i} \sim \mathrm{U}(x \mid -1, 1)\)
Compuate \(y_{i} = f(x_{i})\)
Add Gaussian noise with standard deviation \(\sigma = 0.2\)

Simulate data

Code

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
from scipy.stats import norm, multivariate_normal
plt.rcParams.update({
    "figure.constrained_layout.use": True,
    "axes.spines.top": False,
    "axes.spines.right": False
})
np.random.seed(123)

# Figure settings
xlim, ylim = (-1., 1.), (-1., 1.)
xx = np.linspace(-1, 1, 101, endpoint=True)

# True parameters
b0, b1 = -0.3, 0.5    # intercept and slope
sigma = 0.2           # standard deviation of the noise
tau = (1 / sigma)**2  # precision

# Sample N=20 data points (but process them sequentially)
N = 20
xs = np.random.uniform(low=-1, high=1, size=N)
ys = b0 + b1 * xs
ys = ys + np.random.normal(loc=0, scale=sigma, size=len(xs))

# Plot
_, ax = plt.subplots(figsize=(3, 3))
ax.plot(xx, b0 + b1 * xx, c="lightgray", lw=2)
ax.plot(xs, ys, "o", mec="dodgerblue", mfc="none", mew=2)
# ax.spines["bottom"].set_position("zero")
ax.set(xlabel="x", ylabel="y", xlim=xlim, ylim=ylim)
ax.set_aspect("equal")
plt.show()

Sequential learning

Before (i.e. “prior” to) observing any data, we belief that the intercept (\(\beta_{0}\)) and slope (\(\beta_{1}\)) can take many values, but they are centered around 0 with some uncertainty. We model this by assuming that they are derived from a normal distribution.

Code

# Prior
m = np.zeros(2)
C = (1 / 2.0) * np.eye(2)  # precision parameter alpha = 2, Eq. (3.52), p. 153

XX, YY = np.mgrid[-1:1.01:0.01, -1:1.01:0.01]
pts = np.dstack((XX, YY))

_, ax = plt.subplots(figsize=(3, 3))
ax.contourf(XX, YY, multivariate_normal.pdf(pts, mean=m, cov=C))
ax.set(xlabel="intercept, $\\beta_{0}$", ylabel="slope, $\\beta_{1}$")
ax.set_aspect("equal")
plt.show()

Sample parameters

We can sample some random combinations of intercept and slope (\(\beta_{0}\), \(\beta_{1}\)) from the joint distribution.

Code

betas = multivariate_normal.rvs(mean=m, cov=C, size=10)

_, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(6, 3))
axs[0].contourf(XX, YY, multivariate_normal.pdf(pts, mean=m, cov=C), alpha=0.5)
axs[0].plot(betas[0][0], betas[0][1], "x", mec="k")
axs[0].set(xlabel="intercept, $\\beta_{0}$", ylabel="slope, $\\beta_{1}$")

axs[1].plot(xx, betas[0][0] + betas[0][1] * xx, c="r")
axs[1].plot(xx, b0 + b1 * xx, c="lightgray")
axs[1].set(xlabel="$x$", ylabel="$y$")

for ax in axs:
    ax.set(xlim=xlim, ylim=ylim)

plt.show()

Sample parameters

We can sample some random combinations of intercept and slope (\(\beta_{0}\), \(\beta_{1}\)) from the joint distribution.

Code

_, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(6, 3))
axs[0].contourf(XX, YY, multivariate_normal.pdf(pts, mean=m, cov=C))
axs[0].plot(betas[:, 0], betas[:, 1], "x", mec="k")
axs[0].set(xlabel="intercept, $\\beta_{0}$", ylabel="slope, $\\beta_{1}$")

axs[1].plot(xx, np.stack((np.ones(len(xx)), xx)).T @ betas.T, c="r")
axs[1].plot(xx, b0 + b1 * xx, c="lightgray")
axs[1].set(xlabel="$x$", ylabel="$y$")

for ax in axs:
    ax.set(xlim=xlim, ylim=ylim)

plt.show()

Get some data

We observe our first data point, (\(x_{1}, y_{1}\))
We compute the likelihood of that data point given (\(\beta_{0}, \beta_{1}\)): \[ p(y_{1} \mid \beta_{0}, \beta_{1}, x_{1}) \]

Code

# Compute the likelihood of the first data point
# for all combinations of slope and intercept
likelihood = norm.pdf(ys[0], loc=XX + YY * xs[0], scale=sigma)

# Plot the data point, and the likelihood
_, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(6, 3))
axs[0].plot(xx, np.stack((np.ones(len(xx)), xx)).T @ betas.T, c="r", alpha=0.2)
axs[0].plot(xx, b0 + b1 * xx, c="gray")
axs[0].plot(xs[0], ys[0], "o", mfc="none", mec="dodgerblue", mew=2)
axs[0].set(xlabel="$x$", ylabel="$y$")

axs[1].contourf(XX, YY, likelihood, alpha=0.7)
axs[1].set(xlabel="intercept, $\\beta_0$", ylabel="slope, $\\beta_1$")

for ax in axs:
    ax.set(xlim=xlim, ylim=ylim)
plt.show()

Update our belief

After (“posterior”) observing the data point, we can narrow down the uncertainties of the intercept and slope using Bayes’ rule: \[ p(\beta_{0}, \beta_{1} \mid y_{1}) \propto p(y_{1} \mid \beta_{0}, \beta_{1})\ p(\beta_{0}, \beta_{1}) \]

Code

def update_params(m_prior, C_prior, x_i, y_i, beta=25.0):
    """Update the model parameters.

    Parameters
    ----------
    m_prior : np.ndarray
        The prior for the mean.
    C_prior: np.ndarray
        The prior for the variance-covariance matrix.
    x_i, y_i : float
        The data point.
    beta : float
        The precision.
    
    Returns
    -------
    m_post : np.ndarray
        The posterior of the mean.
    C_post : np.ndarray
        The posterior of the variance-covariance matrix.
    """

    # Design vector `phi` for a linear fit: [1, x]
    phi = np.array([1.0, x_i])

    # Compute the inverse of the prior covariance
    C_prior_inv = np.linalg.inv(C_prior)

    # Update the variance-covariance matrix
    C_post_inv = C_prior_inv + beta * np.outer(phi, phi)
    C_post = np.linalg.inv(C_post_inv)

    # Update the mean
    m_post = C_post @ (C_prior_inv @ m_prior + beta * y_i * phi)
    return m_post, C_post

ms, Cs = [], []
for i, (x_i, y_i) in enumerate(zip(xs, ys)):
    if i == 0:
        m_post, C_post = update_params(m_prior=m, C_prior=C, x_i=x_i, y_i=y_i, beta=tau)
        ms.append(m_post)
        Cs.append(C_post)
    else:
        m_post, C_post = update_params(
            m_prior=ms[-1],
            C_prior=Cs[-1],
            x_i=x_i,
            y_i=y_i,
            beta=tau
        )
        ms.append(m_post)
        Cs.append(C_post)

_, axs = plt.subplots(1, 3, sharex=True, sharey=True, figsize=(9, 3))
axs[0].contourf(XX, YY, multivariate_normal.pdf(pts, mean=m, cov=C))
axs[1].contourf(XX, YY, likelihood)
axs[2].contourf(XX, YY, multivariate_normal.pdf(pts, mean=ms[0], cov=Cs[0]))
for ax in axs:
    ax.set(xlabel=f"intercept, $\\beta_0$", ylabel=f"slope, $\\beta_1$", xlim=xlim, ylim=ylim)
axs[0].set_title("prior")
axs[1].set_title("likelihood")
axs[2].set_title("posterior")
plt.show()

Sample parameters

Code

betas = multivariate_normal.rvs(mean=ms[0], cov=Cs[0], size=10)

_, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(6, 3))
axs[0].contourf(XX, YY, multivariate_normal.pdf(pts, mean=ms[0], cov=Cs[0]))
axs[0].plot(betas[:, 0], betas[:, 1], "x", mec="k")
axs[0].set(xlabel="intercept, $\\beta_{0}$", ylabel="slope, $\\beta_{1}$")

axs[1].plot(xx, np.stack((np.ones(len(xx)), xx)).T @ betas.T, c="r")
axs[1].plot(xx, b0 + b1 * xx, c="lightgray")
axs[1].set(xlabel="$x$", ylabel="$y$")

for ax in axs:
    ax.set(xlim=xlim, ylim=ylim)
plt.show()

Get some more data

We observe our next data point, (\(x_{2}, y_{2}\))
We compute the likelihood of that data point given (\(\beta_{0}, \beta_{1}\)): \[ p(y_{2} \mid \beta_{0}, \beta_{1}, (x_{1}, y_{1}), x_{2}) \]

Code

# Compute the likelihood of the first data point
# for all combinations of slope and intercept
likelihood = norm.pdf(ys[1], loc=XX + YY * xs[1], scale=sigma)

# Plot the data point, and the likelihood
_, axs = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(6, 3))
axs[0].plot(xx, np.stack((np.ones(len(xx)), xx)).T @ betas.T, c="r", alpha=0.2)
axs[0].plot(xx, b0 + b1 * xx, c="gray")
axs[0].plot(xs[:2], ys[:2], "o", mfc="none", mec="dodgerblue", mew=2)
axs[0].set(xlabel="$x$", ylabel="$y$")

axs[1].contourf(XX, YY, likelihood, alpha=0.7)
axs[1].set(xlabel="intercept, $\\beta_0$", ylabel="slope, $\\beta_1$")

for ax in axs:
    ax.set(xlim=xlim, ylim=ylim)
plt.show()

Update our beliefs

We update our beliefs using the prior and the likelihood

Code

_, axs = plt.subplots(1, 3, sharex=True, sharey=True, figsize=(9, 3))
axs[0].contourf(XX, YY, multivariate_normal.pdf(pts, mean=ms[0], cov=Cs[0]))
axs[1].contourf(XX, YY, likelihood)
axs[2].contourf(XX, YY, multivariate_normal.pdf(pts, mean=ms[1], cov=Cs[1]))
for ax in axs:
    ax.set(xlabel=f"intercept, $\\beta_0$", ylabel=f"slope, $\\beta_1$", xlim=xlim, ylim=ylim)
axs[0].set_title("prior")
axs[1].set_title("likelihood")
axs[2].set_title("posterior")
plt.show()

Get some more data

We continue this process iteratively

Code

_, axs = plt.subplots(2, 4, sharex=True, sharey=True, figsize=(12, 3))

for col_idx, n in enumerate([3, 8, 12, 19]):
    axs[0, col_idx].contourf(XX, YY, multivariate_normal.pdf(pts, mean=ms[n], cov=Cs[n]))

    betas = multivariate_normal.rvs(mean=ms[n], cov=Cs[n], size=10)
    axs[0, col_idx].plot(betas[:, 0], betas[:, 1], "x", mec="k")

    axs[1, col_idx].plot(xx, np.stack((np.ones(len(xx)), xx)).T @ betas.T, c="r")
    axs[1, col_idx].plot(xx, b0 + b1 * xx, c="k")
    axs[1, col_idx].plot(xs[:n], ys[:n], "o", mfc="none", mec="dodgerblue", mew=2)

    axs[0, col_idx].set_xlim(xlim)
    axs[0, col_idx].set_ylim(ylim)
    axs[0, col_idx].set_aspect("equal")
    axs[1, col_idx].set_ylim(ylim)
    axs[1, col_idx].set_aspect("equal")
plt.show()

Using `pymc`

Code

import pymc as pm
import arviz as az

with pm.Model() as lin_model:
    # Priors
    alpha = 2.0
    beta_0 = pm.Normal("beta_0", mu=0, tau=alpha)
    beta_1 = pm.Normal("beta_1", mu=0, tau=alpha)

    # Likelihood
    mu = beta_0 + beta_1 * xs
    y_obs = pm.Normal("y_obs", mu=mu, sigma=sigma, observed=ys)

    # Infernece
    trace = pm.sample(draws=5000, tune=2000, return_inferencedata=True)

_, axs = plt.subplots(2, 2)
az.plot_trace(trace, axes=axs)
axs[0, 0].axvline(b0, c="k")
axs[1, 0].axvline(b1, c="k")
for ax in axs[0, :]:
    ax.set_ylabel(f"intercept, $\\beta_0$")
for ax in axs[1, :]:
    ax.set_ylabel(f"slope, $\\beta_1$")
plt.show()

Discussion

Foot placement

Stable gait is defined as gait that does not lead to falls (Bruijn and Dieën 2018).
Foot placement is one control mechanism of to establish stable gait (Lang et al. 2026).
Foot placement models typically model step width (SW) and step length (SL).

\[ \begin{align} \mathrm{SW} &= \beta_{0} + \beta_{\mathrm{pos}_{\mathrm{ml}}} \mathrm{CoM}_{\mathrm{pos}_{\mathrm{ml}}} + \beta_{\mathrm{vel}_{\mathrm{ml}}} \mathrm{CoM}_{\mathrm{vel}_{\mathrm{ml}}} + \epsilon_{\mathrm{ml}} \\ \mathrm{SL} &= \beta_{0} + \beta_{\mathrm{pos}_{\mathrm{ap}}} \mathrm{CoM}_{\mathrm{pos}_{\mathrm{ap}}} + \beta_{\mathrm{vel}_{\mathrm{ap}}} \mathrm{CoM}_{\mathrm{vel}_{\mathrm{ap}}} + \epsilon_{\mathrm{ap}} \end{align} \]

References

Bishop, Christopher M. 2006. “Chapter 3: Linear Models for Regression.” In Pattern Recognition and Machine Learning, 137–78. Springer Science+Business Media, LLC.

Bruijn, Sjoerd M., and Jaap H. van Dieën. 2018. “Control of Human Gait Stability Through Foot Placement.” Journal of The Royal Society Interface 15 (143): 20170816. https://doi.org/10.1098/rsif.2017.0816.

Lang, Charlotte, Deepak K. Ravi, Sjoerd M. Bruijn, Jeffrey M. Hausdorff, Jaap H. van Dieën, and Anina Moira van Leeuwen. 2026. “How Do We Tread? Differences in Stability-Related Foot Placement Control Between Overground and Treadmill Walking in Young Adults.” PLOS One 21 (3): e0344704. https://doi.org/10.1371/journal.pone.0344704.

Sequential Bayesian Learning: Linear Regression

Introduction

Simulate data

Simulate data

Sequential learning

Sample parameters

Sample parameters

Get some data

Update our belief

Sample parameters

Get some more data

Update our beliefs

Get some more data

Using pymc

Discussion

Foot placement

References

Using `pymc`