Kolmogorov Complexity and the Invariance Theorem

May 5, 2021


It is a truth universally acknowledged that any good scientific theory is necessarily computable. But, how should we make sense of the historical effectiveness of Occam’s razor otherwise known as the unreasonable effectiveness of mathematics in the natural sciences?

In order to make progress in this direction, we first need to clarify what we mean by the simplicity of a computer program.

Conditional Kolmogorov Complexity:

Let’s define the space of computer programs \(P:=\{0,1\}^*\). The conditional Kolmogorov Complexity of a string \(y \in P\) relative to \(x \in P\) is defined as the length of the shortest program \(f\) expressed in a Turing-complete language \(\mathcal{L}\) which when given \(x\) as input, outputs \(y\):

\begin{equation} \forall x,y \in P, K_{\mathcal{L}}(y \lvert x) = \min \{|f|:f \in P \land \mathcal{A}(f \circ x) = y\} \end{equation}

where \(\mathcal{A}\) represents a Universal Turing Machine such as the Babbage Analytical engine.

In particular, if \(x\) is the empty string \(\epsilon\) such that \(\lvert \epsilon \rvert = 0\):

\begin{equation} \forall y \in P, K_{\mathcal{L}}(y) := K_{\mathcal{L}}(y | \epsilon) \end{equation}

which we simply call the Kolmogorov Complexity of \(y\).

Existence and uniqueness of the shortest program:


Given the description language \(\mathcal{L}\), for any program \(x \in P\) we may define the countable set:

\begin{equation} F = \{f \in P: \mathcal{A}(f)=x\} \end{equation}

and program length \(\lvert \cdot \rvert\) induces a partial order on \(F\) since for any \(f,g \in F\) we have:

\begin{equation} 0 \leq |f| \leq |g| \quad \text{XOR} \quad 0 \leq |g| < |f| \end{equation}


\begin{equation} \forall x,y \in P, x \neq y \implies \mathcal{A}(p_x) \neq \mathcal{A}(p_y) \end{equation}

where \(p_x\) and \(p_y\) are the shortest programs that output \(x\) and \(y\) respectively.

Given the existence and uniqueness of the shortest program sought by the Kolmogorov Complexity operator, we may conclude that \(K_{\mathcal{L}}(x)\) is well-defined.

Kolmogorov Complexity is not computable:

Let’s suppose that \(K_{\mathcal{L}}(\cdot)\) is computable. For any program \(f\) we would be able to determine whether \(f\) halts on the input \(x\) by computing:

\begin{equation} K_{\mathcal{L}}(f \circ x | x) \end{equation}

So we would be able to solve the halting problem.

Kolmogorov Complexity is limit-computable:

For any given string \(x \in P\), we may define the auxiliary constant \(M = \lvert x \rvert + \lvert \mathcal{L}\rvert\) where \(\lvert \mathcal{L} \rvert\) is the size of the description language \(\mathcal{L}\). Now, if we define the set of computer programs \(P_n \subset {0,1}^M\) which are time-bounded by \(n\) operations then we may define the sequence:

\begin{equation} \phi_n(x) = \{|p|: p \in P_n \land \mathcal{A}(p) = x\} \end{equation}

\begin{equation} \phi_{n+1}(x) \leq \phi_n(x) \end{equation}

and we may deduce that:

\begin{equation} \lim_{n \to \infty} \phi_n(x) \sim K_{\mathcal{L}}(x) \end{equation}

The Invariance Theorem or the fundamental theorem of Algorithmic Information Theory:

Given two Turing-complete languages \(\mathcal{L}\) and \(\mathcal{L}'\),

\begin{equation} \exists c \in \mathbb{N} \forall x \in P, \lvert K_{\mathcal{L}}(x) - K_{\mathcal{L’}}(x) \rvert \leq c \end{equation}

which implies that:

\begin{equation} \forall x \in P, K_{\mathcal{L}}(x) \sim K_{\mathcal{L’}}(x) \end{equation}


Let’s suppose \(p_x\) is the shortest program expressible in \(\mathcal{L}\) that outputs \(x\) and \(c\) is the size of the virtual machine \(V\)(expressible in \(\mathcal{L}'\)) that simulates any program expressible in \(\mathcal{L}\). Then we may conclude that:

\begin{equation} K_{\mathcal{L’}}(x) \leq K_{\mathcal{L}}(x) + c \end{equation}

so we have:

\begin{equation} \lvert K_{\mathcal{L}}(x) - K_{\mathcal{L’}}(x) \rvert \leq c \end{equation}


\(1.\) A. N. Kolmogorov Three approaches to the quantitative definition of information. Problems of Information and Transmission, 1(1):1–7, 1965

\(2.\) G. J. Chaitin On the length of programs for computing finite binary sequences: Statistical considerations. Journal of the ACM, 16(1):145–159, 1969.

\(3.\) R. J. Solomonoff A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7:1–22 and 224–254, 1964.

\(4.\) Paul Vitányi. How Incomputable Is Kolmogorov Complexity?. Entropy. 2020.

Kolmogorov Complexity and the Invariance Theorem - May 5, 2021 - Pauli Space