Algorithms for Decision Making
decision-making, reinforcement-learning, probabilistic-reasoning, markov-decision-processes, multiagent-systems, planning-under-uncertainty
Authors: Mykel J. Kochenderfer, Tim A. Wheeler, Kyle H. Wray Year: 2022 Tags: decision-making-under-uncertainty, markov-decision-processes, pomdp, reinforcement-learning, multiagent-systems, probabilistic-reasoning
TL;DR
A graduate-level textbook unifying algorithms for decision making under four sources of uncertainty — outcome, model, state, and interaction — spanning Bayesian networks, MDPs, POMDPs, reinforcement learning, and multiagent games. All algorithms are implemented in Julia with readability prioritized over runtime efficiency. Serves advanced undergraduates, graduate students, and professionals in CS, engineering, statistics, and operations research.
First pass — the five C's
Category. Textbook / unified pedagogical survey of a research area.
Context. AI and operations research subfields of sequential decision making and probabilistic reasoning. Explicitly builds on: Russell & Norvig (Artificial Intelligence: A Modern Approach, 4th ed.) as the broader AI reference; Sutton & Barto (Reinforcement Learning: An Introduction, 2nd ed.) for the RL foundation; Kochenderfer (Decision Making Under Uncertainty, MIT Press 2015) as a predecessor volume; von Neumann & Morgenstern (Theory of Games and Economic Behavior) for game-theoretic grounding.
Correctness. Load-bearing assumptions are pedagogical rather than empirical: (1) the MDP/POMDP hierarchy is the right abstraction for the problem classes covered; (2) discrete-time formulation is sufficient (continuous-time control theory is explicitly out of scope); (3) multivariable calculus, linear algebra, and basic probability are adequate prerequisites. These assumptions are reasonable given the book's scope but do bound what it covers.
Contributions.
- Single-volume treatment connecting probabilistic inference, sequential planning, model-free RL, belief-state planning, and multiagent systems under one organizational framework (four types of uncertainty).
- Complete Julia implementations of every algorithm, prioritizing human readability.
- Chapter 14 on policy validation covering rare-event simulation, robustness analysis, trade analysis, and adversarial analysis — uncommon in competing texts.
- Coverage of Dec-POMDPs and partially observable Markov games, extending multiagent treatment beyond what most RL textbooks include.
Clarity. Consistently structured (summary + exercises per chapter) with Tufte-inspired wide-margin sidenotes for citations and asides; writing is accessible and well-paced for the intended audience, though the breadth (27 chapters) means some topics receive only introductory depth.
Second pass — content
Main thrust: The book progresses through five parts — probabilistic reasoning, sequential MDPs, model-uncertain RL, state-uncertain POMDPs, and multiagent systems — providing mathematical formulations and Julia-implemented algorithms for each, organized around the principle of maximum expected utility.
Supporting evidence:
- Part I (Ch. 2–6): Bayesian networks, exact and approximate inference (variable elimination, belief propagation, Gibbs sampling, particle methods), parameter/structure learning, utility theory, decision networks, value of information.
- Part II (Ch. 7–14): exact MDP solvers (policy iteration, value iteration, LP formulation), approximate value functions (nearest neighbor through neural regression), online planning (MCTS, heuristic search), policy search (cross-entropy, evolution strategies), policy gradient (REINFORCE, reward-to-go, PPO-style clipped surrogate), actor-critic, and policy validation.
- Part III (Ch. 15–18): bandit exploration, model-based RL (Bayes-adaptive MDPs, posterior sampling), Q-learning, Sarsa, eligibility traces, experience replay, and imitation learning (behavioral cloning through GAIL).
- Part IV (Ch. 19–23): Kalman/particle filters, exact POMDP value iteration (alpha vectors), offline approximations (PBVI, sawtooth bounds), online belief-state planning (POMCP, DESPOT-style), and finite state controller abstractions.
- Part V (Ch. 24–27): normal-form games (Nash, correlated equilibria, fictitious play), Markov games, POMGs, and Dec-POMDPs with dynamic programming and heuristic search.
- Appendices: mathematical foundations, probability distributions, complexity theory, neural network architectures, search algorithms, 15 benchmark problems (HexWorld through collaborative predator-prey), and a Julia language primer.
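The book's reference implementations are in Julia; as a rough illustration of the Part II core, here is a minimal Python sketch of value iteration on a hypothetical two-state, two-action MDP (the transition probabilities and rewards below are invented for illustration, not taken from the book):

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP (all numbers invented).
# T[s, a, s'] = transition probability, R[s, a] = expected reward.
T = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.05, 0.95]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
gamma = 0.9  # discount factor

def value_iteration(T, R, gamma, tol=1e-8):
    """Apply the Bellman optimality backup until the value function converges."""
    V = np.zeros(T.shape[0])
    while True:
        # Q(s, a) = R(s, a) + gamma * sum_s' T(s, a, s') * V(s')
        Q = R + gamma * T @ V
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)  # optimal values and greedy policy
        V = V_new

V, policy = value_iteration(T, R, gamma)
```

The same backup structure underlies policy iteration and the LP formulation the chapter list mentions; only the way the fixed point is reached differs.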
Figures & tables: The preface states figures, examples, and exercises are provided per chapter; Tufte-style layout with sidenotes is used throughout. From the excerpt alone, axis labels, error bars, and statistical significance of figures cannot be assessed — the body of algorithmic content dominates over empirical plots.
Follow-up references:
- Sutton & Barto, Reinforcement Learning: An Introduction (2nd ed.) — the standard RL reference this book assumes familiarity with or complements.
- Russell & Norvig, Artificial Intelligence: A Modern Approach (4th ed.) — broader AI context the book situates itself against.
- Kochenderfer, Decision Making Under Uncertainty (MIT Press, 2015) — predecessor volume with more focused POMDP depth, cited directly for the collision avoidance application.
- Hillier, Introduction to Operations Research — cited for OR foundations, including dynamic and linear programming underpinnings.
Third pass — critique
Implicit assumptions:
- Discrete-time Markov structure is assumed throughout; problems with continuous-time dynamics or non-Markovian history dependence are out of scope, without acknowledgment of when this is limiting.
- Tabular or low-dimensional state spaces are implicitly assumed for many exact methods; scalability gaps to high-dimensional continuous spaces are noted but not quantified.
- Julia readability is assumed to transfer to understanding; readers unfamiliar with Julia may find the code appendix insufficient for practical implementation.
- The maximum expected utility framework is adopted without deep treatment of its behavioral critiques (prospect theory, ambiguity aversion), though irrationality is briefly noted in Ch. 6.
Missing context or citations:
- No engagement with Bertsekas (Dynamic Programming and Optimal Control) or Puterman (Markov Decision Processes), the canonical technical references in this space.
- Deep RL at scale (DQN, A3C, SAC) appears only partially, through actor-critic and experience replay; no discussion of sample-efficiency benchmarks or the Atari/MuJoCo results that define the empirical landscape.
- Constrained MDPs and safe RL are absent from the table of contents despite being active research areas with practical safety relevance.
- Model-based deep RL (Dyna, World Models, MuZero) is not covered.
Possible experimental / analytical issues:
- As a textbook, it makes no original empirical claims, so standard experimental critique does not apply. However, the appendix benchmark problems (e.g., HexWorld, Cart-Pole, Mountain Car) are illustrative only; no algorithm-comparison tables or convergence plots appear to be provided, making it hard to build intuition for when one method outperforms another.
- The "interpretability over efficiency" design philosophy of the Julia code means timing and scaling results are not presented; practitioners cannot gauge real-world feasibility from the implementations.
- No discussion of hyperparameter sensitivity or practical tuning guidance for the implemented algorithms.
Ideas for future work:
- A companion empirical chapter benchmarking the implemented algorithms head-to-head on the appendix problems would make algorithm-selection guidance concrete.
- Extension to continuous-time formulations (stochastic differential equations, Hamilton-Jacobi-Bellman) would close the gap with control-theory audiences.
- A chapter on constrained/safe MDPs (CMDPs, shielding, Lagrangian methods) would address the growing safety-critical deployment context the introduction motivates.
- Scaling discussions connecting the tabular methods to deep function approximation (e.g., how PBVI connects to deep POMDP solvers) would bridge the gap to current research practice.
Methods
- Bayesian networks
- value iteration
- policy iteration
- Monte Carlo tree search
- Q-learning
- policy gradient
- actor-critic
- particle filter
- Kalman filter
- POMDP solvers
- imitation learning
- inverse reinforcement learning
- evolution strategies
- cross-entropy method
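Of the methods listed above, the cross-entropy method is perhaps the simplest to convey in a few lines: sample candidates from a distribution, keep the best-scoring elites, and refit the distribution to them. A toy Python sketch maximizing a hypothetical one-dimensional objective (the objective and all hyperparameters are invented for illustration; the book applies the idea to policy parameters):

```python
import numpy as np

def cross_entropy_method(f, mu, sigma, n_samples=100, n_elite=10, n_iters=50, seed=0):
    """Derivative-free maximization of f: sample from a Gaussian,
    select the top-scoring elites, and refit the Gaussian to them."""
    rng = np.random.default_rng(seed)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=n_samples)
        elites = samples[np.argsort(f(samples))[-n_elite:]]  # best n_elite candidates
        mu, sigma = elites.mean(), elites.std() + 1e-6  # refit; floor sigma to avoid collapse
    return mu

# Toy objective with its maximum at x = 3 (invented).
best = cross_entropy_method(lambda x: -(x - 3.0) ** 2, mu=0.0, sigma=5.0)
```

In the policy-search setting, `x` would be a policy parameter vector and `f` a rollout-based return estimate; the mechanics are unchanged.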
Claims
- Decision making under uncertainty can be systematically addressed through four categories of uncertainty: outcome, model, state, and interaction uncertainty.
- Markov decision processes (MDPs) and partially observable MDPs (POMDPs) provide rigorous mathematical frameworks for sequential decision problems.
- Reinforcement learning algorithms enable agents to learn optimal behavior without requiring a fully known model by balancing exploration and exploitation.
- Multiagent decision problems can be formalized via Markov games, partially observable Markov games, and decentralized POMDPs.
- The maximum expected utility principle, grounded in utility theory, forms the foundation for rational decision making under uncertainty.
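The last claim can be stated in one line: a rational agent selects the action maximizing probability-weighted utility over outcomes. A minimal Python sketch with invented probabilities and utilities (the action and outcome names are hypothetical, not from the book):

```python
# Maximum expected utility: choose the action whose expected utility
# over outcomes is largest. All numbers below are invented.
P = {"act":  {"good": 0.6, "bad": 0.4},   # P(outcome | action)
     "wait": {"good": 0.9, "bad": 0.1}}
U = {"good": 10.0, "bad": -5.0}           # utility of each outcome

def expected_utility(action):
    """EU(a) = sum over outcomes o of P(o | a) * U(o)."""
    return sum(p * U[o] for o, p in P[action].items())

best_action = max(P, key=expected_utility)  # "wait": EU 8.5 vs. 4.0 for "act"
```

Decision networks (Part I) generalize exactly this computation to structured probability models, and the MDP/POMDP machinery of the later parts extends it to sequences of decisions.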