Author: Lorenzo Maggi
-
Understanding PPO from first principles
The Proximal Policy Optimization (PPO) algorithm is arguably the default choice in modern reinforcement learning (RL) libraries. In this post we derive PPO from first principles. First, we brush up our memory on the underlying Markov Decision Process (MDP) model. 1. Preliminaries on Markov Decision Process (MDP) In an MDP, an agent (say,…
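The clipped surrogate objective at the heart of PPO can be sketched in a few lines of NumPy (an illustrative sketch, not the post's own derivation; the clipping value 0.2 is just a common default):

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    ratio: pi_new(a|s) / pi_old(a|s) for the sampled actions
    advantage: advantage estimates for the same samples
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# A ratio beyond the clip range earns no extra credit when the advantage is positive:
print(ppo_clip_objective(np.array([1.5]), np.array([1.0])))  # 1.2, not 1.5
```

The pointwise minimum makes the objective pessimistic: the policy cannot profit from moving the ratio outside [1 - eps, 1 + eps], which is what keeps updates "proximal".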
-
The plumber’s secret weapon: The Divergence Theorem
This post explores Gauss's divergence theorem through intuitive and visual reasoning. To engage the reader's imagination, we use water flux as our running example, although the reasoning applies to any vector field, e.g., an electric, magnetic, heat or gravity field. Moreover, to keep things simple we work in two dimensions, although the same principles…
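As a quick numerical illustration of the theorem in 2D (a toy check, not taken from the post): for the field F(x, y) = (x, y), whose divergence is 2 everywhere, the outward flux through the boundary of the unit square must equal the integral of the divergence over its area, i.e. 2.

```python
import numpy as np

def line_integral(f_vals, t):
    """Trapezoidal rule along a parameterized edge."""
    return np.sum(0.5 * (f_vals[1:] + f_vals[:-1]) * np.diff(t))

# Field F(x, y) = (x, y); its divergence is 2 everywhere.
Fx = lambda x, y: x + 0.0 * y   # the +0*other term makes the output an array
Fy = lambda x, y: 0.0 * x + y

t = np.linspace(0.0, 1.0, 1001)
flux = (
    line_integral(Fx(1.0, t), t)     # right edge,  outward normal ( 1, 0)
    + line_integral(Fy(t, 1.0), t)   # top edge,    outward normal ( 0, 1)
    - line_integral(Fx(0.0, t), t)   # left edge,   outward normal (-1, 0)
    - line_integral(Fy(t, 0.0), t)   # bottom edge, outward normal ( 0,-1)
)

area_integral = 2.0 * 1.0            # div F = 2 over a square of unit area
assert abs(flux - area_integral) < 1e-9
```

The same check works for any smooth field; for non-constant integrands the two sides agree only up to quadrature error.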
-
Lagrangian multipliers, normal cones and KKT optimality conditions
We consider constrained optimization problems of the kind: minimize f(x) over a feasibility region that is a polytope, i.e., the set of points x such that Ax ≤ b and Cx = d, where A and C are real matrices of size m×n and p×n, respectively, and b, d are column vectors. Equivalently, we can rewrite (1) row by row, where a_i and b_i are the i-th row of A and the i-th entry of b, respectively, and ⟨·,·⟩ denotes the scalar product. In this post we…
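A minimal sketch of the KKT conditions on a toy instance (projecting a point onto the nonnegative orthant; the notation is illustrative, not the post's):

```python
import numpy as np

# Project p onto the polytope {x : x >= 0}, i.e. minimize 0.5*||x - p||^2
# subject to the inequality constraints -x_i <= 0.
p = np.array([1.0, -2.0])
x_star = np.maximum(p, 0.0)             # the projection is the minimizer

# KKT conditions with multipliers lam for the constraints -x <= 0:
lam = x_star - p                        # stationarity: (x - p) - lam = 0
assert np.all(lam >= 0)                 # dual feasibility
assert np.allclose(lam * x_star, 0.0)   # complementary slackness
print(x_star, lam)                      # [1. 0.] [0. 2.]
```

Only the active constraint (the coordinate clipped at zero) gets a positive multiplier, which is exactly the normal-cone picture: the negative gradient at the optimum lies in the cone spanned by the active constraint normals.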
-
Get real! Solving a complex-value linear system using real arithmetic
Consider a linear system of complex-valued equations of the kind AX = B, where A and B are complex square and rectangular matrices, respectively, and X is the unknown complex matrix. To compute X, a natural option is to take any algorithm available in the literature originally conceived for real systems, such as Gauss elimination or LU/Cholesky decomposition, and translate its operations to complex arithmetic. However, there may…
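The real-arithmetic route can be sketched with the classic block construction: writing the system as (A + iB)(x + iy) = c + id and separating real and imaginary parts gives a 2n×2n real system. An illustrative NumPy check (assuming the overall complex matrix is invertible):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
c, d = rng.standard_normal(n), rng.standard_normal(n)

# (A + iB)(x + iy) = c + id  splits into  Ax - By = c  and  Bx + Ay = d:
# [ A  -B ] [x]   [c]
# [ B   A ] [y] = [d]
M = np.block([[A, -B], [B, A]])
sol = np.linalg.solve(M, np.concatenate([c, d]))
z_real = sol[:n] + 1j * sol[n:]

# Sanity check against a direct complex solve:
z_complex = np.linalg.solve(A + 1j * B, c + 1j * d)
assert np.allclose(z_real, z_complex)
```

The doubled system uses only real arithmetic, at the price of a 2n×2n matrix; whether this pays off is precisely the kind of trade-off the post examines.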
-
Oldies but goldies: MMSE estimator
In signal processing, a classic problem consists in estimating a signal, in the form of a complex column vector x, by observing a related signal y, which has been produced by multiplying the unknown x by a known matrix H and adding noise n: y = Hx + n. 1. Assumptions The random signal x and noise n are independent of each other, Gaussian…
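Under the linear-Gaussian setup sketched above, the (L)MMSE estimator and its defining orthogonality property can be checked numerically (an illustrative sketch with made-up real-valued dimensions and covariances, not the post's own code):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, N = 3, 5, 100_000

H = rng.standard_normal((m, n))   # known mixing matrix
Rx = np.eye(n)                    # prior covariance of the signal x
Rn = 0.5 * np.eye(m)              # noise covariance

# LMMSE estimator: x_hat = Rx H^T (H Rx H^T + Rn)^{-1} y
W = Rx @ H.T @ np.linalg.inv(H @ Rx @ H.T + Rn)

x = rng.standard_normal((n, N))
noise = np.sqrt(0.5) * rng.standard_normal((m, N))
y = H @ x + noise
x_hat = W @ y

# Orthogonality principle: the estimation error is uncorrelated with y.
cross = (x_hat - x) @ y.T / N
assert np.abs(cross).max() < 0.05
```

The near-zero cross-correlation is the empirical fingerprint of optimality: any remaining correlation with y could be exploited to reduce the error further.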
-
A from-scratch implementation of Kolmogorov-Arnold Networks (KAN)…and MLP
Kolmogorov-Arnold networks (KAN) are generating significant interest in the AI community due to their potential for accuracy and interpretability. We implement KAN (and MLPs, incidentally) from scratch in simple Python (no Torch / TensorFlow). The code is available on GitHub.
-
Backpropagation
Chain rule to the rescue! To train a deep neural network (DNN), one needs to repeatedly compute the gradient of the empirical loss with respect to all the DNN parameters. Since their number can be huge, it is crucial that the cost of gradient computation grows "slowly" with it. Back-propagation achieves this by shrewdly exploiting…
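A minimal illustration of the chain rule behind back-propagation (a toy two-layer network in made-up notation, not the post's), with the hand-derived gradient checked against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(3)
W1 = rng.standard_normal((4, 3))
w2 = rng.standard_normal(4)

def loss(W1, w2, x):
    h = np.tanh(W1 @ x)            # hidden layer
    return 0.5 * (w2 @ h) ** 2     # scalar loss

# Backward pass: apply the chain rule layer by layer, reusing intermediates.
h = np.tanh(W1 @ x)
out = w2 @ h
g_out = out                        # d loss / d out
g_h = g_out * w2                   # d loss / d h
g_pre = g_h * (1 - h ** 2)         # through tanh'
g_W1 = np.outer(g_pre, x)          # d loss / d W1

# Check one entry against a finite difference:
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
fd = (loss(W1p, w2, x) - loss(W1, w2, x)) / eps
assert abs(fd - g_W1[0, 0]) < 1e-4
```

The key point: the backward pass costs roughly one extra forward pass, regardless of how many parameters the gradient has.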
-
Fifty (four, actually) shades of conformal prediction
In this post we review different methods to compute prediction intervals, which contain the next (unknown) observation with high probability and lie at the heart of Conformal Prediction (CP). We will highlight that each method strikes a different and non-trivial trade-off between computational complexity, coverage properties and the size of the prediction interval. Scenario. We are…
-
Conformalized quantile regression
Pimp quantile regression with strong coverage guarantees Suppose that we are given a historical dataset containing samples of the form (x_i, y_i), where x_i and y_i are the i-th realizations of the (predictor) variable X and of the (predicted) variable Y, respectively. As a running example, let us consider the following dataset: Our goal #1 is to estimate the trend of variable…
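The conformalization step can be sketched as follows (an illustrative implementation of the standard CQR recipe; `q_lo` and `q_hi` stand in for any pair of fitted quantile predictors):

```python
import numpy as np

def cqr_interval(q_lo, q_hi, X_cal, y_cal, X_test, alpha=0.1):
    """Conformalized quantile regression (CQR) band."""
    # Conformity score: how far each calibration point falls outside the band.
    scores = np.maximum(q_lo(X_cal) - y_cal, y_cal - q_hi(X_cal))
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]          # finite-sample corrected quantile
    return q_lo(X_test) - q, q_hi(X_test) + q

# Toy demo: crude constant "quantile predictors" get fixed by conformalization.
rng = np.random.default_rng(3)
y_cal, y_test = rng.standard_normal(2000), rng.standard_normal(2000)
X_cal = X_test = np.zeros(2000)
q_lo = lambda X: np.full(len(X), -1.0)
q_hi = lambda X: np.full(len(X), 1.0)

lo, hi = cqr_interval(q_lo, q_hi, X_cal, y_cal, X_test)
coverage = np.mean((y_test >= lo) & (y_test <= hi))   # ~0.90 despite the bad band
assert coverage > 0.85
```

Even with deliberately miscalibrated quantile predictors, the conformal correction restores (marginal) 90% coverage; what cannot be repaired this way is the width of the band.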
-
Quantile regression
An expressive and robust alternative to least squares For regression problems, least square regression (LSR) arguably gets the lion's share of data scientists' attention. The reasons are several: LSR is taught in virtually every introductory statistics course, it is intuitive, and it is readily available in most software libraries. LSR estimates the mean of the predicted variable…
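A quick way to see what quantile regression optimizes (an illustrative check, not the post's code): the constant that minimizes the pinball loss at level tau is the empirical tau-quantile of the data.

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Pinball (quantile) loss at level tau in (0, 1)."""
    diff = y - pred
    return np.mean(np.maximum(tau * diff, (tau - 1) * diff))

rng = np.random.default_rng(4)
y = rng.standard_normal(10_001)

# Minimize the pinball loss over a grid of constant predictions:
grid = np.linspace(-3, 3, 601)
best = grid[np.argmin([pinball_loss(y, c, 0.9) for c in grid])]

# The minimizer coincides with the empirical 0.9-quantile (up to grid spacing):
assert abs(best - np.quantile(y, 0.9)) < 0.05
```

Replacing the constant with a parametric model of the inputs turns this into full quantile regression, just as replacing the absolute loss with the squared loss would recover mean (least squares) regression.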
-
Retirement, Stopping times and Bandits: The Gittins index
“A colleague of high repute asked an equally well-known colleague:— What would you say if you were told that the multi-armed bandit problem had been solved?— Sir, the multi-armed bandit problem is not of such a nature that it can be solved.” Peter Whittle In our busy daily life, while multi-tasking we are constantly faced…
-
Make any predictor uncertainty-aware via conformal prediction
The importance of being uncertainty-aware. When making a prediction (in a regression or a classification setting) based on observed inputs, it is often important to know, with high confidence, which range the true value will fall in. For instance, a trader would be interested in knowing within which boundaries the stock price will remain, rather than a…
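The basic split-conformal recipe for wrapping any point predictor can be sketched as follows (illustrative code with a toy identity predictor; the predictor is a stand-in for any fitted model):

```python
import numpy as np

def split_conformal(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split conformal prediction band around an arbitrary point predictor."""
    scores = np.abs(y_cal - predict(X_cal))        # absolute residuals
    n = len(y_cal)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.sort(scores)[k - 1]                     # finite-sample corrected quantile
    mid = predict(X_test)
    return mid - q, mid + q

rng = np.random.default_rng(5)
X_cal = rng.uniform(0, 1, 2000)
y_cal = X_cal + 0.3 * rng.standard_normal(2000)
X_test = rng.uniform(0, 1, 2000)
y_test = X_test + 0.3 * rng.standard_normal(2000)

lo, hi = split_conformal(lambda X: X, X_cal, y_cal, X_test)  # toy predictor
coverage = np.mean((y_test >= lo) & (y_test <= hi))          # ~0.90 by construction
assert coverage > 0.85
```

The guarantee is distribution-free and requires only exchangeability of calibration and test data, which is what makes the wrapper applicable to any predictor.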
-
The geometry of Gaussian variables
Gaussian (or normal) variables are all over the place. Their expressive power is certified by the Central Limit Theorem, which states that the mean of many independent (and not necessarily Gaussian!) random variables tends to a Gaussian variable. And even when a variable is definitely not Gaussian, it is sometimes convenient to approximate it as one, via Laplace…
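The Central Limit Theorem is easy to check by simulation (a toy illustration with uniform variables, which are decidedly non-Gaussian individually):

```python
import numpy as np

rng = np.random.default_rng(6)
n, trials = 100, 20_000

# Means of n i.i.d. Uniform(0, 1) variables:
means = rng.uniform(0, 1, size=(trials, n)).mean(axis=1)

# The CLT predicts means ~ N(1/2, 1/(12 n)) for large n:
sigma = np.sqrt(1 / (12 * n))
assert abs(means.mean() - 0.5) < 1e-3
assert abs(means.std() - sigma) < 1e-3

# About 68% of the mass within one predicted standard deviation:
frac = np.mean(np.abs(means - 0.5) < sigma)
assert abs(frac - 0.68) < 0.02
```

The variance 1/(12n) comes from the variance 1/12 of a single Uniform(0, 1) variable divided by the number of averaged terms.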
-
Is Reinforcement Learning all you need?
When attacking a new problem, the algorithm designer typically follows 3 main steps: When reporting her/his work, the algorithm designer will proudly focus on step 3), briefly mention 2) and likely sweep 1) under the carpet. Yet, skimming off alternatives is a crucial step that inevitably impacts (positively or negatively) months of hard work on…
-
Policy gradient for black-box optimization
Policy gradient methods are widely used in the Reinforcement Learning setting. In this post we build policy gradient from the ground up, starting from the simpler static scenario, where we maximize a reward function that depends solely on our control variable. In subsequent posts, we will turn our attention to the contextual bandit setting,…
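The static scenario can be illustrated with a score-function (REINFORCE-style) gradient estimate for a Gaussian sampling policy (a toy sketch with a made-up quadratic reward, not the post's own example):

```python
import numpy as np

rng = np.random.default_rng(7)
f = lambda x: -(x - 3.0) ** 2       # black-box reward, maximized at x = 3

mu, sigma = 0.0, 1.0                # policy: sample x ~ N(mu, sigma^2)
for _ in range(2000):
    x = mu + sigma * rng.standard_normal(64)
    # Score-function gradient of E[f(x)] w.r.t. mu, with a mean baseline
    # to reduce variance: grad = E[(f(x) - b) * (x - mu) / sigma^2].
    grad = np.mean((f(x) - f(x).mean()) * (x - mu) / sigma**2)
    mu += 0.05 * grad               # stochastic gradient ascent

assert abs(mu - 3.0) < 0.3          # the policy mean converges near the optimum
```

Note that the update never differentiates f itself: only the log-density of the sampling policy is differentiated, which is what makes the method applicable to black-box objectives.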