Author: Lorenzo Maggi
-
Understanding PPO from first principles
The Proximal Policy Optimization (PPO) algorithm is arguably the default choice in modern reinforcement learning (RL) libraries. In this post we derive PPO from first principles. First, we brush up our memory on the underlying Markov Decision Process (MDP) model. 1. Preliminaries on Markov Decision Process (MDP) In an MDP, an agent (say,…
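The clipped surrogate objective at the heart of PPO can be sketched in a few lines of NumPy (an illustrative sketch, not the post's own derivation; the clipping value 0.2 is just a common default):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: advantage estimates for the same samples
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# A ratio beyond the clip range earns no extra credit when the advantage is positive:
print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))  # 1.2, not 1.5
```

The pointwise minimum makes the objective pessimistic: the policy cannot profit from moving the ratio outside [1 - eps, 1 + eps], which is what keeps updates "proximal".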
-
The plumber’s secret weapon: The Divergence Theorem
This post explores Gauss's divergence theorem through intuitive and visual reasoning. To engage the reader's imagination, we use water flux as our running example, although the reasoning applies to any vector field, e.g., an electric, magnetic, heat or gravity field. Moreover, to keep things simple we work in two dimensions, although the same principles…
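As a quick numerical illustration of the theorem in 2D (a toy check, not taken from the post): for the field F(x, y) = (x, y), whose divergence is 2 everywhere, the outward flux through the boundary of the unit square must equal the integral of the divergence over its area, i.e. 2.

```python
import numpy as np

def line_integral(f_vals, t):
    """Trapezoidal rule along a parameterized edge."""
    return np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(t))

# Field F(x, y) = (x, y); its divergence is 2 everywhere.
Fx = lambda x, y: x + 0.0 * y   # the +0*other term makes the output an array
Fy = lambda x, y: 0.0 * x + y

t = np.linspace(0.0, 1.0, 1001)
flux = (
    line_integral(Fx(1.0, t), t)     # right edge,  outward normal ( 1, 0)
    + line_integral(Fy(t, 1.0), t)   # top edge,    outward normal ( 0, 1)
    - line_integral(Fx(0.0, t), t)   # left edge,   outward normal (-1, 0)
    - line_integral(Fy(t, 0.0), t)   # bottom edge, outward normal ( 0,-1)
)

area_integral = 2.0 * 1.0            # div F = 2 over a square of unit area
assert abs(flux - area_integral) < 1e-9
```

The same check works for any smooth field; for non-constant integrands the two sides agree only up to quadrature error.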
-
Lagrangian multipliers, normal cones and KKT optimality conditions
We consider constrained optimization problems of the kind: minimize f(x) over a feasibility region that is a polytope, i.e., the set of points x such that Ax ≤ b and Cx = d, where A and C are real matrices of size m×n and p×n, respectively, and b, d are column vectors. Equivalently, we can rewrite (1) row by row, where a_i and b_i are the i-th row of A and the i-th entry of b, respectively, and ⟨·,·⟩ denotes the scalar product. In this post we…
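A minimal sketch of the KKT conditions on a toy instance (projecting a point onto the nonnegative orthant; the notation is illustrative, not the post's):

```python
import numpy as np

# Project p onto the polytope {x : x >= 0}, i.e. minimize 0.5*||x - p||^2
# subject to the inequality constraints -x_i <= 0.
p = np.array([1.0, -2.0])
x_star = np.maximum(p, 0.0)             # the projection is the minimizer

# KKT conditions with multipliers lam for the constraints -x <= 0:
lam = x_star - p                        # stationarity: (x - p) - lam = 0
assert np.all(lam >= 0)                 # dual feasibility
assert np.allclose(lam * x_star, 0.0)   # complementary slackness
print(x_star, lam)                      # [1. 0.] [0. 2.]
```

Only the active constraint (the coordinate clipped at zero) gets a positive multiplier, which is exactly the normal-cone picture: the negative gradient at the optimum lies in the cone spanned by the active constraint normals.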
-
Get real! Solving a complex-value linear system using real arithmetic
Consider a linear system of complex-valued equations of the kind AX = B, where A and B are complex square and rectangular matrices, respectively, and X is the unknown complex matrix. To compute X, a natural option is to take any algorithm available in the literature originally conceived for real systems, such as Gauss elimination or LU/Cholesky decomposition, and translate its operations to complex arithmetic. However, there may…
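The real-arithmetic route can be sketched with the classic block construction: writing the system as (A + iB)(x + iy) = c + id and separating real and imaginary parts gives a 2n×2n real system. An illustrative NumPy check (assuming the overall complex matrix is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
c, d = rng.standard_normal(n), rng.standard_normal(n)

# (A + iB)(x + iy) = c + id  splits into  Ax - By = c  and  Bx + Ay = d:
# [ A  -B ] [x]   [c]
# [ B   A ] [y] = [d]
M = np.block([[A, -B], [B, A]])
sol = np.linalg.solve(M, np.concatenate([c, d]))
z_real = sol[:n] + 1j * sol[n:]

# Sanity check against a direct complex solve:
z_complex = np.linalg.solve(A + 1j * B, c + 1j * d)
assert np.allclose(z_real, z_complex)
```

The doubled system uses only real arithmetic, at the price of a 2n×2n matrix; whether this pays off is precisely the kind of trade-off the post examines.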
-
Oldies but goldies: MMSE estimator
In signal processing, a classic problem consists in estimating a signal, in the form of a complex column vector x, by observing a related signal y, which has been produced by multiplying the unknown x by a known matrix H and adding noise n: y = Hx + n. 1. Assumptions The random signal x and noise n are independent of each other, Gaussian…
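Under the linear-Gaussian setup sketched above, the (L)MMSE estimator and its defining orthogonality property can be checked numerically (an illustrative sketch with made-up real-valued dimensions and covariances, not the post's own code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 5, 100_000

H = rng.standard_normal((m, n))   # known mixing matrix
Rx = np.eye(n)                    # prior covariance of the signal x
Rn = 0.5 * np.eye(m)              # noise covariance

# LMMSE estimator: x_hat = Rx H^T (H Rx H^T + Rn)^{-1} y
W = Rx @ H.T @ np.linalg.inv(H @ Rx @ H.T + Rn)

x = rng.standard_normal((n, N))
noise = np.sqrt(0.5) * rng.standard_normal((m, N))
y = H @ x + noise
x_hat = W @ y

# Orthogonality principle: the estimation error is uncorrelated with y.
cross = (x_hat - x) @ y.T / N
assert np.abs(cross).max() < 0.05
```

The near-zero cross-correlation is the empirical fingerprint of optimality: any remaining correlation with y could be exploited to reduce the error further.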
-
A from-scratch implementation of Kolmogorov-Arnold Networks (KAN)…and MLP
Kolmogorov-Arnold networks (KAN) are generating significant interest in the AI community due to their potential for accuracy and interpretability. We implement KAN (and MLPs, incidentally) from scratch in simple Python (no Torch / TensorFlow). The code is available on GitHub.
-
Backpropagation
Chain rule to the rescue! To train a deep neural network (DNN), one needs to repeatedly compute the gradient of the empirical loss with respect to all the DNN parameters. Since their number can be huge, it is crucial that the cost of gradient computation grows "slowly" with it. Back-propagation achieves this by shrewdly exploiting…
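A minimal illustration of the chain rule behind back-propagation (a toy two-layer network in made-up notation, not the post's), with the hand-derived gradient checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
W1 = rng.standard_normal((4, 3))
w2 = rng.standard_normal(4)

def loss(W1, w2, x):
    h = np.tanh(W1 @ x)            # hidden layer
    return 0.5 * (w2 @ h) ** 2     # scalar loss

# Backward pass: apply the chain rule layer by layer, reusing intermediates.
h = np.tanh(W1 @ x)
out = w2 @ h
g_out = out                        # d loss / d out
g_h = g_out * w2                   # d loss / d h
g_pre = g_h * (1 - h ** 2)         # through tanh'
g_W1 = np.outer(g_pre, x)          # d loss / d W1

# Check one entry against a finite difference:
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
fd = (loss(W1p, w2, x) - loss(W1, w2, x)) / eps
assert abs(fd - g_W1[0, 0]) < 1e-4
```

The key point: the backward pass costs roughly one extra forward pass, regardless of how many parameters the gradient has.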
-
Fifty (four, actually) shades of conformal prediction
In this post we review different methods to compute prediction intervals, which contain the next (unknown) observation with high probability and lie at the heart of Conformal Prediction (CP). We will highlight that each method strikes a different and non-trivial trade-off between computational complexity, coverage properties and the size of the prediction interval. Scenario. We are…
-
Conformalized quantile regression
Pimp quantile regression with strong coverage guarantees Suppose that we are given a historical dataset containing samples of the form (x_i, y_i), where x_i and y_i are the i-th realizations of the (predictor) variable X and of the (predicted) variable Y, respectively. As a running example, let us consider the following dataset: Our goal #1 is to estimate the trend of variable…
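The conformalization step can be sketched as follows (an illustrative implementation of the standard CQR recipe; `q_lo` and `q_hi` stand in for any pair of fitted quantile predictors):

```python
import numpy as np

def cqr_interval(q_lo, q_hi, X_cal, y_cal, X_test, alpha=0.1):
    """Conformalized quantile regression (CQR) band."""
    # Conformity score: how far each calibration point falls outside the band.
    scores = np.maximum(q_lo(X_cal) - y_cal, y_cal - q_hi(X_cal))
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]          # finite-sample corrected quantile
    return q_lo(X_test) - q, q_hi(X_test) + q

# Toy demo: crude constant "quantile predictors" get fixed by conformalization.
rng = np.random.default_rng(3)
y_cal, y_test = rng.standard_normal(2000), rng.standard_normal(2000)
X_cal = X_test = np.zeros(2000)
q_lo = lambda X: np.full(len(X), -1.0)
q_hi = lambda X: np.full(len(X), 1.0)

lo, hi = cqr_interval(q_lo, q_hi, X_cal, y_cal, X_test)
coverage = np.mean((y_test >= lo) & (y_test <= hi))   # ~0.90 despite the bad band
assert coverage > 0.85
```

Even with deliberately miscalibrated quantile predictors, the conformal correction restores (marginal) 90% coverage; what cannot be repaired this way is the width of the band.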
-
Quantile regression
An expressive and robust alternative to least squares For regression problems, least square regression (LSR) arguably gets the lion's share of data scientists' attention. The reasons are several: LSR is taught in virtually every introductory statistics course, it is intuitive, and it is readily available in most software libraries. LSR estimates the mean of the predicted variable…
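A quick way to see what quantile regression optimizes (an illustrative check, not the post's code): the constant that minimizes the pinball loss at level tau is the empirical tau-quantile of the data.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Pinball (quantile) loss at level tau in (0, 1)."""
    diff = y - pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(4)
y = rng.standard_normal(10_001)

# Minimize the pinball loss over a grid of constant predictions:
grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([pinball_loss(y, c, 0.9) for c in grid])]

# The minimizer coincides with the empirical 0.9-quantile (up to grid spacing):
assert abs(best - np.quantile(y, 0.9)) < 0.05
```

Replacing the constant with a parametric model of the inputs turns this into full quantile regression, just as replacing the absolute loss with the squared loss would recover mean (least squares) regression.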
-
Retirement, Stopping times and Bandits: The Gittins index
“A colleague of high repute asked an equally well-known colleague:— What would you say if you were told that the multi-armed bandit problem had been solved?— Sir, the multi-armed bandit problem is not of such a nature that it can be solved.” Peter Whittle In our busy daily life, while multi-tasking we are constantly faced…
-
Make any predictor uncertainty-aware via conformal prediction
The importance of being uncertainty-aware. When making a prediction (in a regression or a classification setting) based on observed inputs, it is often important to know, with high confidence, which range the true value will fall in. For instance, a trader would be interested in knowing within which boundaries the stock price will remain, rather than a…
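The basic split-conformal recipe for wrapping any point predictor can be sketched as follows (illustrative code with a toy identity predictor; the predictor is a stand-in for any fitted model):

```python
import numpy as np

def split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction band around an arbitrary point predictor."""
    scores = np.abs(y_cal - predict(X_cal))        # absolute residuals
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]                     # finite-sample corrected quantile
    mid = predict(X_test)
    return mid - q, mid + q

rng = np.random.default_rng(5)
X_cal = rng.uniform(0, 1, 2000)
y_cal = X_cal + 0.3 * rng.standard_normal(2000)
X_test = rng.uniform(0, 1, 2000)
y_test = X_test + 0.3 * rng.standard_normal(2000)

lo, hi = split_conformal(lambda X: X, X_cal, y_cal, X_test)  # toy predictor
coverage = np.mean((y_test >= lo) & (y_test <= hi))          # ~0.90 by construction
assert coverage > 0.85
```

The guarantee is distribution-free and requires only exchangeability of calibration and test data, which is what makes the wrapper applicable to any predictor.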
-
The geometry of Gaussian variables
Gaussian (or normal) variables are all over the place. Their expressive power is certified by the Central Limit Theorem, which states that the mean of many independent (and not necessarily Gaussian!) random variables tends to a Gaussian variable. And even when a variable is definitely not Gaussian, it is sometimes convenient to approximate it as one, via Laplace…
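The Central Limit Theorem is easy to check by simulation (a toy illustration with uniform variables, which are decidedly non-Gaussian individually):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 100, 20_000

# Means of n i.i.d. Uniform(0, 1) variables:
means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

# The CLT predicts means ~ N(1/2, 1/(12 n)) for large n:
sigma = np.sqrt(1 / (12 * n))
assert abs(means.mean() - 0.5) < 1e-3
assert abs(means.std() - sigma) < 1e-3

# About 68% of the mass within one predicted standard deviation:
frac = np.mean(np.abs(means - 0.5) < sigma)
assert abs(frac - 0.68) < 0.02
```

The variance 1/(12n) comes from the variance 1/12 of a single Uniform(0, 1) variable divided by the number of averaged terms.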
-
Is Reinforcement Learning all you need?
When attacking a new problem, the algorithm designer typically follows 3 main steps: When reporting her/his work, the algorithm designer will proudly focus on step 3), briefly mention 2) and likely sweep 1) under the carpet. Yet, skimming off alternatives is a crucial step that inevitably impacts (positively or negatively) months of hard work on…
-
Policy gradient for black-box optimization
Policy gradient methods are widely used in the Reinforcement Learning setting. In this post we build policy gradient from the ground up, starting from the simpler static scenario, where we maximize a reward function that depends solely on our control variable. In subsequent posts, we will turn our attention to the contextual bandit setting,…
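The static scenario can be illustrated with a score-function (REINFORCE-style) gradient estimate for a Gaussian sampling policy (a toy sketch with a made-up quadratic reward, not the post's own example):

```python
import numpy as np

rng = np.random.default_rng(7)
f = lambda x: -(x - 3.0) ** 2       # black-box reward, maximized at x = 3

mu, sigma = 0.0, 1.0                # policy: sample x ~ N(mu, sigma^2)
for _ in range(2000):
    x = mu + sigma * rng.standard_normal(64)
    # Score-function gradient of E[f(x)] w.r.t. mu, with a mean baseline
    # to reduce variance: grad = E[(f(x) - b) * (x - mu) / sigma^2].
    grad = np.mean((f(x) - f(x).mean()) * (x - mu) / sigma**2)
    mu += 0.05 * grad               # stochastic gradient ascent

assert abs(mu - 3.0) < 0.3          # the policy mean converges near the optimum
```

Note that the update never differentiates f itself: only the log-density of the sampling policy is differentiated, which is what makes the method applicable to black-box objectives.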