Probabilistic Machine Learning: Advanced Topics

Kevin P. Murphy · MIT Press · 2023

Authors: Kevin P. Murphy (primary), with contributing sections/chapters by Alex Alemi, Jeff Bilmes, Marco Cuturi, Alexander D'Amour, Finale Doshi-Velez, Roy Frostig, Justin Gilmer, Been Kim, Durk Kingma, Simon Kornblith, Balaji Lakshminarayanan, Lihong Li, George Papamakarios, Ben Poole, Mihaela Rosca, Vinayak Rao, Yang Song, Victor Veitch, Andrew Wilson, and others
Year: 2023 (first printing); 2026 (second printing)
Tags: probabilistic-machine-learning, bayesian-inference, generative-models, variational-inference, reinforcement-learning, textbook

TL;DR

Graduate-level textbook (36 chapters, ~1,250 pages) covering advanced probabilistic ML across six parts: fundamentals, inference, prediction, generation, discovery, and action. Sequel to Murphy's introductory PML volume, it extends the scope beyond supervised function approximation to generative modeling, latent variable discovery, causal inference, and sequential decision-making, unified under a Bayesian/probabilistic framework.

First pass — the five C's

Category. Comprehensive graduate textbook / survey reference — not a research paper reporting novel empirical results.

Context. Sequel to Murphy (2022) PML: An Introduction; positions itself against Pearl's "glorified curve fitting" critique of supervised ML. Draws on Bishop's PRML, Goodfellow et al.'s Deep Learning, and a very large cited literature across inference, generative modeling, causality, and RL. Lakshminarayanan, Papamakarios, Song, Kingma, Veitch, and others contribute frontier chapters from their own research specialties.

Correctness. Load-bearing assumptions: (1) the probabilistic/Bayesian framing is the right unifying lens for all six parts — this is a philosophical stance, not demonstrated; (2) readers enter with solid probability, statistics, linear algebra, and optimization; (3) JAX-based code on Google Colab adequately reproduces conceptual figures. These assumptions appear internally consistent but limit the book's accessibility and its claim to cover "all the basics."

Contributions.
  • Unified single-volume treatment spanning inference algorithms, modern deep generative models (VAEs, flows, diffusion, GANs), unsupervised discovery, causality, and RL.
  • Research-frontier chapters co-authored by domain specialists (e.g., normalizing flows by Papamakarios & Lakshminarayanan; GANs by Mohamed, Lakshminarayanan & Rosca; causality by D'Amour & Veitch).
  • Reproducible figures via linked Jupyter notebooks (probml.github.io) using JAX.
  • Explicit integration of causality (do-calculus, IV strategies, DiD) and interpretability into an ML curriculum, topics often absent from competing texts.

Clarity. Writing is generally precise and mathematically careful; the multi-author origin creates noticeable stylistic variation across chapters, and two chapters (30 and 31) are acknowledged stubs with minimal content.

Second pass — content

Main thrust: A unified probabilistic framework — Bayesian inference over parsimonious generative models — is applied systematically to prediction, generation, discovery, and control; the book argues this framework yields more robust and data-efficient ML systems than pure function approximation.
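The claimed unification rests on one mechanism: posterior updating over a generative model. A minimal conjugate Beta-Bernoulli update (a standard textbook example, sketched here for illustration; function names are mine, not code from the book) makes the mechanism concrete:

```python
# Beta-Bernoulli conjugate update: a minimal illustration of Bayesian
# inference over a generative model (illustrative sketch, not book code).
def beta_bernoulli_update(alpha, beta, observations):
    """Return posterior (alpha, beta) after observing 0/1 outcomes."""
    heads = sum(observations)
    tails = len(observations) - heads
    return alpha + heads, beta + tails

def posterior_mean(alpha, beta):
    return alpha / (alpha + beta)

# Start from a uniform Beta(1, 1) prior and observe 7 heads in 10 flips.
a, b = beta_bernoulli_update(1.0, 1.0, [1, 1, 1, 0, 1, 1, 0, 1, 1, 0])
print(a, b)                  # 8.0 4.0
print(posterior_mean(a, b))  # 8/12 ≈ 0.667
```

The same predict-condition pattern recurs, with harder integrals, throughout the book's inference, generation, and decision-making parts.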

Supporting evidence:
  • No novel empirical experiments; all figures are pedagogical illustrations or reproductions of published results, generated via JAX notebooks.
  • Part II covers exact and approximate inference (Kalman filtering, message passing, VI, MCMC, SMC) with algorithmic derivations and complexity analyses.
  • Part IV covers six major generative model families (VAEs, autoregressive models, normalizing flows, EBMs, diffusion models, GANs), each with training objectives and known failure modes (e.g., posterior collapse in VAEs, mode collapse in GANs).
  • Part VI covers MDPs, RL (value-based, policy-based, model-based), contextual bandits, and causal inference (RCTs, IV, DiD, do-calculus) with worked examples.
  • Python/JAX code for nearly all figures is publicly linked; notebook links are embedded in PDF captions.
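Of the Part II inference algorithms, the Kalman filter is compact enough to sketch. The scalar predict/update step below is the generic linear-Gaussian textbook form (variable names and noise values are my choices, not the book's code):

```python
# Scalar Kalman filter step for the 1-D linear-Gaussian state-space model
#   x_t = a * x_{t-1} + N(0, q),   y_t = x_t + N(0, r).
def kalman_step(mean, var, y, a=1.0, q=0.1, r=0.5):
    # Predict: propagate the Gaussian belief through the dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: condition on the observation y via the Kalman gain k.
    k = var_pred / (var_pred + r)
    mean_new = mean_pred + k * (y - mean_pred)
    var_new = (1.0 - k) * var_pred
    return mean_new, var_new

mean, var = 0.0, 1.0  # prior belief over the latent state
for y in [1.2, 0.9, 1.1]:
    mean, var = kalman_step(mean, var, y)
print(mean, var)  # posterior mean moves toward the observations; variance shrinks
```

Each step is exact Bayesian conditioning in closed form, which is why the book treats it as the canonical example before moving to approximate inference.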

Figures & tables: Figures are programmatically generated and captioned with notebook names (e.g., gauss_plot_2d.ipynb). Axes are generally labeled. Because the book contains no original empirical benchmarks, there are no error bars, confidence intervals, or statistical significance reports on novel results — all quantitative claims originate from cited papers. Visualization quality varies across chapters due to multi-contributor figure production.

Follow-up references:
  • Murphy (2022), PML: An Introduction. Required prerequisite; covers the supervised learning foundations this volume assumes.
  • Guttman (2022), Exercises in PML. Companion exercises and solutions for both PML volumes.
  • Pearl & Mackenzie (2018), The Book of Why. Motivation for the causality chapters and the "glorified curve fitting" critique.
  • Lin et al. (2021) and Deisenroth et al. (2020). Cited as alternative mathematical background sources.

Third pass — critique

Implicit assumptions:
  • Bayesian/probabilistic framing is assumed universally superior; there is no systematic comparison with frequentist or non-probabilistic deep learning practices, which often outperform Bayesian alternatives on benchmarks.
  • JAX is assumed as the implementation language; readers using PyTorch or TensorFlow gain no directly runnable code.
  • The "advanced topics" scope implicitly assumes fluency with the prequel; the book is not self-contained for most readers.
  • The frontier coverage reflects a ~2022 knowledge cutoff; rapidly evolving areas (LLMs, diffusion models, RLHF) are acknowledged as undercovered.

Missing context or citations:
  • Section 22.5 on large language models is a single-section stub that does not engage with transformer scaling laws, instruction tuning, or RLHF; prominent omissions given the 2023 publication date.
  • Chapters 30 (graph learning) and 31 (nonparametric Bayesian models) are explicitly incomplete; readers are not warned of this in the table of contents.
  • Conformal prediction (Section 14.3) is introduced but disconnected from the rest of the prediction chapters; its relationship to Bayesian credible intervals is not analyzed.
  • Adversarial robustness (Section 19.8) does not engage with the certified-defenses literature or the broader ML security literature.
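On the conformal-prediction point: split conformal builds prediction intervals from held-out calibration residuals, with a finite-sample coverage guarantee and no posterior in sight, which is exactly why its relationship to Bayesian credible intervals merits the analysis the chapter omits. A minimal sketch of the standard split-conformal recipe (my illustration, not Section 14.3's code; the residual values are made up):

```python
import math

def split_conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Interval y_pred +/- q, where q is the ceil((n+1)(1-alpha))-th
    smallest absolute calibration residual (the conformal quantile)."""
    n = len(cal_residuals)
    scores = sorted(abs(r) for r in cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = scores[min(k, n) - 1]
    return y_pred - q, y_pred + q

# Residuals from a held-out calibration set; 90% interval for a new prediction.
residuals = [0.2, -0.5, 0.1, 0.4, -0.3, 0.6, -0.1, 0.25, -0.45, 0.35]
lo, hi = split_conformal_interval(residuals, y_pred=2.0)
print(lo, hi)  # (1.4, 2.6)
```

The guarantee is marginal frequentist coverage over exchangeable data, not a statement about posterior belief, so the two interval notions can disagree arbitrarily.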

Possible experimental / analytical issues:
  • No original benchmarks, ablations, or reproducibility studies: all quantitative claims are inherited from cited works and cannot be independently verified within the book.
  • The multi-author structure creates uneven depth: some chapters (Gaussian processes, causality) are thorough, research-level treatments; others (graph learning, nonparametric Bayes) are placeholders that could mislead readers into thinking the topic is well covered.
  • Code reproducibility depends on external infrastructure (Google Colab, probml.github.io) that is not under the author's permanent control; link rot is a real risk for a reference text.
  • The "parsimonious representations" thesis is stated motivationally but not tested; no chapter compares the practical benefits of probabilistic vs. non-probabilistic approaches on held-out benchmarks.

Ideas for future work:
  • Complete the stub chapters (30, 31) and expand the LLM section to cover instruction tuning, RLHF, and scaling laws.
  • Add a systematic empirical comparison chapter contrasting Bayesian and non-Bayesian methods on calibration and OOD benchmarks, grounding the book's central thesis in data.
  • Develop a PyTorch-native code companion to broaden accessibility beyond JAX users.
  • Extend the causality coverage to emerging ML-for-causality methods (causal representation learning, neural IV estimators) that have proliferated since the 2022 writing cutoff.

Methods

  • variational inference
  • Markov chain Monte Carlo
  • sequential Monte Carlo
  • Kalman filtering
  • Gaussian processes
  • normalizing flows
  • diffusion models
  • variational autoencoders
  • generative adversarial networks
  • message passing
  • expectation propagation
  • Hamiltonian Monte Carlo
  • stochastic gradient descent
  • natural gradient descent
  • Bayesian optimization
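Several of the listed methods admit compact illustrations. As one hedged example (my sketch, not the book's notebooks), a random-walk Metropolis sampler, the simplest instance of the MCMC entry above, targeting a standard normal:

```python
import math, random

def metropolis_normal(n_samples, step=1.0, seed=0):
    """Random-walk Metropolis targeting the standard normal
    (unnormalized log-density -x**2 / 2). Illustrative sketch only;
    names and defaults are mine."""
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        log_ratio = 0.5 * (x * x - proposal * proposal)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # keep the current state either way
    return samples

xs = metropolis_normal(20_000)
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / len(xs)
print(mean, var)  # sample moments approach the target's (0, 1)
```

Because only a density ratio is needed, the normalizing constant never appears, which is the property the book's MCMC chapters build on.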

Claims

  • The book expands the scope of ML beyond function approximation to encompass generation, discovery, and decision making under uncertainty using probabilistic models.
  • Adopting a model-based approach with parsimonious representations of the data generating process enables more robust and data-efficient systems.
  • Bayesian inference applied to probabilistic models provides a unifying framework for learning, inference, and decision making across diverse ML tasks.
  • Generative AI and latent variable models allow uncovering meaningful hidden structure in high-dimensional data such as images and text.
  • Causal inference and decision making under uncertainty represent essential extensions beyond standard supervised learning.