My general interest in single-motivation theories stems from the belief that a common ancestor for all multi-cellular organisms might imply common principles of intelligent behaviour. It’s a somewhat reductive hypothesis and, as I argued last week, some of these theories might be too reductive, but I think it’s a useful working hypothesis that can take behavioural scientists very far¹. However, until recently I wasn’t properly acquainted with the free energy principle, which, from a distance, appears to be one of the more plausible single-motivation theories.

The free energy principle is a theory developed by Karl Friston and others to explain how biological systems tend to avoid disorder by limiting themselves to a small number of favourable states. It comes across as a rather abstract mathematical theory, but a critical thought experiment proposed by Romain Brette gave me an opportunity to take a closer look at it. In fact, I promised Brette that I would run a computer simulation demonstrating that his thought experiment rests upon flawed assumptions (code here).

In this context, the goal of this blog post is to explain the main idea of the free energy principle and dissect Romain Brette’s thought experiment in order to develop a practical understanding of this theory.

The Free Energy Principle:

In [1], Karl Friston proposes that the Free Energy principle may be a rough guide to the brain and makes the following points:

  1. The free energy principle basically applies to any biological system that resists a tendency to disorder.
  2. The free energy principle rests upon the fact that self-organising biological systems resist a tendency to disorder and therefore minimise the entropy of their sensory states.
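  3. Assuming that $m$ corresponds to a generative model describing the biological system and $s$ refers to the system’s sensory states, then under ergodic assumptions the entropy is:

     $$H(s) = -\int p(s \mid m)\,\ln p(s \mid m)\,ds = \lim_{T \to \infty} \frac{1}{T}\int_0^T -\ln p(s(t) \mid m)\,dt$$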

Now, given that entropy is the long-term average of surprise (think of a Monte Carlo simulation), agents must avoid surprising states, where surprise is defined relative to the homeostatic conditions of that particular organism².

The three points above are sufficient to understand Romain Brette’s thought experiment, though I must emphasise that surprisal here is defined in terms of the agent’s homeostatic conditions, so minimisation of surprisal corresponds to minimisation of both epistemic uncertainty (i.e. unknown unknowns) and statistical uncertainty (i.e. known unknowns).
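The claim that entropy is the long-term average of surprise can be checked with a quick Monte Carlo sketch. Purely as an illustrative assumption, the sensory state here is a binary food/no-food signal drawn independently each step with $p(\text{food}) = 0.2$ (a number borrowed from Brette’s example below). The time-averaged surprisal $-\ln p(s)$ over many samples should approach the exact entropy.

```python
import math
import random

def entropy(probs):
    """Exact Shannon entropy in nats, skipping zero-probability outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def average_surprisal(p_food, n_steps, seed=0):
    """Time-average of -ln p(s) along one sampled sequence of sensory states.

    Assumption: an i.i.d. binary sensory process, the simplest ergodic case.
    """
    rng = random.Random(seed)
    probs = {"food": p_food, "no food": 1.0 - p_food}
    total = 0.0
    for _ in range(n_steps):
        s = "food" if rng.random() < p_food else "no food"
        total += -math.log(probs[s])  # surprisal of this single observation
    return total / n_steps

p_food = 0.2  # illustrative value, not part of Friston's formulation
exact = entropy([p_food, 1 - p_food])
estimate = average_surprisal(p_food, n_steps=100_000)
print(f"exact entropy    = {exact:.4f} nats")
print(f"average surprise = {estimate:.4f} nats")
```

The exact entropy is about 0.5004 nats, and the sampled average lands close to it; a longer run tightens the agreement, which is the ergodic identity in the equation above.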

Romain Brette’s thought experiment:

In Brette’s article, he summarises the free energy principle in the following manner:

The free energy principle is the theory that the brain manipulates a probabilistic generative model of its sensory inputs, which it tries to optimise by either changing the model (learning) or changing the inputs (by acting).

Although I haven’t mentioned anything about the human brain so far, this is a relatively good summary, and Brette proceeds with the following food vs. no food thought experiment:

  1. An agent has two kinds of observations/stimuli: food and the absence of food.
  2. This agent has two possible actions: seek food or don’t seek food.
  3. When the agent seeks food there’s a 20% probability of getting food.
  4. When the agent doesn’t seek food there’s a 100% probability of getting no food.

What should a surprise minimising agent do? Romain presents the following argument:

What does the free energy principle tell us? To minimize surprise, it seems clear that I should sit: I am certain to not see food. No surprise at all. The proposed solution is that you have a prior expectation to see food. So to minimize the surprise, you should put yourself into a situation where you might see food, ie to seek food. This seems to work. However, if there is any learning at all, then you will quickly observe that the probability of seeing food is actually 20%, and your expectations should be adjusted accordingly. Also, I will also observe that between two food expeditions, the probability to see food is 0%. Once this has been observed, surprise is minimal when I do not seek food. So, I die of hunger. It follows that the free energy principle does not survive Darwinian competition.
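Romain’s reasoning can be reproduced in a few lines if, as he assumes, surprise is measured purely against the agent’s learned statistics of each action’s outcomes. In this sketch (the function and dictionary names are hypothetical), the expected surprise of an action is simply the entropy of its learned outcome distribution, so sitting wins because its outcome is certain.

```python
import math

def expected_statistical_surprise(p_food):
    """Expected -ln p(s) when the agent's model matches the true outcome odds.

    Under this (Brette's) reading, it is just the entropy of the outcome, in nats.
    """
    return sum(-p * math.log(p) for p in (p_food, 1.0 - p_food) if p > 0)

# Learned probability of observing food under each action (Brette's numbers).
learned_p_food = {"seek food": 0.2, "don't seek food": 0.0}

surprise = {a: expected_statistical_surprise(p) for a, p in learned_p_food.items()}
best_action = min(surprise, key=surprise.get)

for action, h in surprise.items():
    print(f"{action}: {h:.4f} nats")
print("surprise-minimising choice:", best_action)
```

With this purely statistical notion of surprise, "don't seek food" scores 0 nats and wins, which is exactly the starvation outcome Romain describes.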

Basically, Romain argues that surprise is minimal when the organism doesn’t seek food, on the assumption that minimising Friston’s surprisal amounts to minimising statistical uncertainty. Given that Friston’s surprisal is defined in terms of the agent’s homeostatic conditions, this assumption is precisely where Romain’s analysis breaks down. It also helps to simulate such toy problems on a computer, if possible, because a simulation forces you to make every modelling assumption explicit.
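To see why the homeostatic definition changes the answer, here is a small sketch that scores Brette’s two extreme policies against a Gaussian preference over the number of meals per day. The target of three meals a day, the unit variance, and the treatment of each of 24 hours as an independent 20% chance of food when seeking are all illustrative assumptions, not values taken from Brette’s post.

```python
import math

def gaussian_surprisal(x, mu, sigma=1.0):
    """-ln N(x; mu, sigma^2): surprise relative to a homeostatic set point."""
    return 0.5 * ((x - mu) / sigma) ** 2 + math.log(sigma * math.sqrt(2 * math.pi))

def expected_daily_surprisal(p_food_per_hour, hours=24, target_meals=3.0):
    """Expected homeostatic surprisal of a policy that seeks food every hour.

    Assumption: meals per day ~ Binomial(hours, p_food_per_hour), and
    target_meals = 3 is a hypothetical homeostatic set point.
    """
    total = 0.0
    for k in range(hours + 1):
        prob_k = (math.comb(hours, k)
                  * p_food_per_hour**k
                  * (1 - p_food_per_hour) ** (hours - k))
        total += prob_k * gaussian_surprisal(k, target_meals)
    return total

sit = expected_daily_surprisal(0.0)   # never seek: zero meals, every day
seek = expected_daily_surprisal(0.2)  # always seek: 20% chance of food each hour
print(f"never seek : {sit:.3f} nats")
print(f"always seek: {seek:.3f} nats")
```

Sitting still costs about 5.419 nats per day while the crude "always seek" policy costs about 4.459 nats, so under a homeostatic definition of surprise, starving is not the surprise-minimising choice, even before any policy optimisation.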

A reasonable model of Brette’s problem:

To simulate Romain’s problem, I made the following assumptions:

  1. We have an organism which has to eat $N$ times per day on average and can eat at most once per hour.
  2. The homeostatic conditions of our organism are given by a Gaussian distribution centred at $N$ with unit variance, a Gaussian food critic if you will. This specifies that our organism shouldn’t eat much less or much more than $N$ times a day. In fact, this may explain why living organisms tend to have masses that are normally distributed during adulthood.
  3. A food policy consists of a 24-dimensional vector whose values range from 0.0 to 1.0, and we want to minimise the negative log probability (i.e. the surprisal) of the total consumption under the Gaussian food critic.
  4. Food policies are the output of a generative neural network (set up using TensorFlow) whose input is either one or zero to indicate a survival prior, with one indicating a preference for survival.
  5. Gradient-based training, here backpropagation with the Adagrad optimiser [5], functions as a homeostatic regulator by updating the network weights using the gradient of the negative log-likelihood loss (i.e. surprisal).
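The assumptions above can be sketched without TensorFlow. This version is a deliberate simplification, not the original notebook: instead of a generative network conditioned on a survival prior, it optimises 24 logits directly, squashes them into per-hour values in (0, 1) with a sigmoid, and applies a hand-written Adagrad update to the surprisal of total consumption under the Gaussian food critic. The target `n_target = 3`, the learning rate and the step count are hypothetical choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def surprisal(total, n_target, sigma=1.0):
    """-ln N(total; n_target, sigma^2) under the Gaussian food critic."""
    return 0.5 * ((total - n_target) / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))

def train_food_policy(n_target=3.0, hours=24, lr=0.5, steps=2000, eps=1e-8):
    """Minimise the surprisal of total daily consumption with Adagrad.

    Simplification (assumption): logits are optimised directly instead of
    being produced by a generative network with a survival-prior input.
    """
    theta = np.zeros(hours)        # one logit per hour
    grad_sq_sum = np.zeros(hours)  # Adagrad accumulator of squared gradients
    for _ in range(steps):
        policy = sigmoid(theta)
        total = policy.sum()
        # d(surprisal)/d(theta_i) = (total - N) * sigmoid'(theta_i), with sigma = 1
        grad = (total - n_target) * policy * (1.0 - policy)
        grad_sq_sum += grad**2
        theta -= lr * grad / (np.sqrt(grad_sq_sum) + eps)  # Adagrad step
    policy = sigmoid(theta)
    return policy, surprisal(policy.sum(), n_target)

policy, final_surprisal = train_food_policy()
print("expected meals per day:", round(float(policy.sum()), 3))
print("final surprisal:", round(float(final_surprisal), 4))
```

Total consumption settles at the homeostatic target, so the surprisal bottoms out at $\tfrac{1}{2}\ln 2\pi \approx 0.919$ rather than rewarding inaction. Because every hour starts from the same logit and receives the same gradient, all 24 values converge to $N/24$ here; a network-based policy is free to distribute meals less uniformly.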

Assuming a specific value of $N$, I ran a simulation in the following notebook and found that the discovered food policy differs significantly from Romain’s expectation that the agent would choose not to look for food in order to minimise surprisal. In fact, our simple agent manages to get three meals per day on average, so it survives.

Overall, this is a relatively simple problem with a fixed prior (i.e. a fixed belief), as the organism doesn’t have to do more than eat, so I can minimise surprise directly. In general, however, if we have adjustable beliefs (e.g. models of physics and their physical parameters/constants), then we have a much harder problem, and that’s where I would need to use the KL-divergence and invoke free energy minimisation rather than directly minimising surprisal. These models and their parameters would still be evaluated with respect to homeostatic constraints, which guarantees that the organism isn’t simply trying to minimise statistical uncertainty.
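To make the role of the KL-divergence concrete, here is a minimal sketch of the standard variational bound. With a hidden parameter $\theta$ and an approximate belief $q(\theta)$, the free energy $F(q) = \sum_\theta q(\theta)\,[\ln q(\theta) - \ln p(s, \theta)]$ equals the surprisal $-\ln p(s)$ plus $\mathrm{KL}(q \,\|\, p(\theta \mid s))$, so it upper-bounds surprise and is tight when $q$ is the exact posterior. The two-hypothesis prior and likelihoods below are made-up numbers for illustration only.

```python
import numpy as np

# Illustrative (hypothetical) model: two candidate beliefs about the world.
prior = np.array([0.5, 0.5])       # p(theta)
likelihood = np.array([0.2, 0.9])  # p(s = food | theta)

joint = prior * likelihood         # p(s = food, theta)
evidence = joint.sum()             # p(s = food)
posterior = joint / evidence       # p(theta | s = food)
surprisal = -np.log(evidence)      # -ln p(s = food)

def free_energy(q):
    """F(q) = E_q[ln q(theta) - ln p(s, theta)] for the observed s = food.

    Assumes q has no zero entries (so the logarithm is finite).
    """
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * (np.log(q) - np.log(joint))))

for q in ([0.5, 0.5], [0.9, 0.1], posterior):
    print(f"q = {np.round(q, 3)}  F = {free_energy(q):.4f}  surprisal = {surprisal:.4f}")
```

Every belief gives $F \geq -\ln p(s)$, with equality at the posterior, which is why minimising $F$ over beliefs and actions is a tractable stand-in for minimising surprise once beliefs are adjustable.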


Until recently, the free energy principle was a constant source of mockery from neuroscientists who misunderstood it, so I hope that by growing a collection of free-energy-motivated reinforcement learning examples on GitHub we may finally have a constructive discussion between scientists. Moreover, I have been asked whether it isn’t immodest for Karl Friston to suggest that his theory might be a model for human behaviour. Well, my answer to that question is the same answer I would give to the critics of Empowerment [6].

Let’s see how far ingenious implementations (i.e. experiments) using these formalisms can take us. That’s the only way we’ll know what the limitations of these theories are.

References:

  1. The free-energy principle: a rough guide to the brain? (K. Friston. 2009.)
  2. The Markov blankets of life: autonomy, active inference and the free energy principle (M. Kirchhoff, T. Parr, E. Palacios, K. Friston and J. Kiverstein. 2018.)
  3. Free-Energy Minimization and the Dark-Room Problem (K. Friston, C. Thornton and A. Clark. 2012.)
  4. What is computational neuroscience? (XXIX) The free energy principle (R. Brette. 2018.)
  5. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization (J. Duchi et al.)
  6. Empowerment: An Introduction (C. Salge et al. 2013.)
  7. Reward, Motivation, and Reinforcement Learning (P. Dayan and B. Balleine. 2002.)

Notes:

  1. For example, the notion of utility maximisation in economics, though limited, has been very useful.

  2. In [2], the homeostatic conditions of an organism are defined in terms of Markov blankets, which are equivalent to the boundaries of a system in a statistical sense. I would encourage the reader to read that paper after going through this blog post, but this concept isn’t essential for understanding Romain’s thought experiment, so we’ll ignore this formalism for now.