-
Variational Inference
July 6, 2022
This is a primer on variational inference in machine learning, based on sections of Jordan et al., “An Introduction to Variational Methods for Graphical Models” (1999). I go over the mathematical forms of variational inference and include a discussion of what it means for something to be “variational.” I hope this conveys some of the generating ideas that give rise to the various forms of variational inference. …
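For orientation (a standard identity, not excerpted from the post itself): variational inference replaces posterior computation with optimization over a family of distributions $q$, using the decomposition

$$ \log p(x) = \mathbb{E}_{q(z)}\left[\log \frac{p(x, z)}{q(z)}\right] + D_{KL}\left(q(z) \,\middle\|\, p(z \mid x)\right), $$

so maximizing the first term (the evidence lower bound) over $q$ is equivalent to minimizing the KL divergence between $q$ and the true posterior.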
-
Modular Neural Networks
April 13, 2021
I wrote up these notes in preparation for my guest lecture in Tom Dean’s Stanford course, CS379C: Computational Models of the Neocortex. Selected papers: Towards Modular Algorithm Induction (Abolafia et al.…
-
Variational Solomonoff Induction
February 18, 2021
The free energy principle is a variational Bayesian method for approximating posteriors. Can free energy minimization combined with program synthesis methods from machine learning tractably approximate Solomonoff induction (i.e. universal inference)? In these notes, I explore what the combination of these ideas looks like. Machine learning: I want to make an important clarification about “Bayesian machine learning”.…
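As a quick sketch (a standard formulation, not excerpted from the post): the variational free energy of an approximate posterior $q$ is

$$ F(q) = \mathbb{E}_{q(\theta)}\left[\log \frac{q(\theta)}{p(x, \theta)}\right] = D_{KL}\left(q(\theta) \,\middle\|\, p(\theta \mid x)\right) - \log p(x), $$

so minimizing $F$ over $q$ both approximates the posterior and lower-bounds the log evidence; the post explores what this looks like when the hypotheses $\theta$ range over programs, as in Solomonoff induction.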