# Learnability and PAC-learnability

April 23, 2021

## Consistent learning algorithms:

Given the spaces \(X\), \(Y\), \(Z=X \times Y\) a hypothesis space \(H\) is a space of functions:

\begin{equation} \forall h \in H, h: X \rightarrow Y \end{equation}

and a learning algorithm \(\mathcal{A}\) over \(H\) is a computable map from \(Z' \subset Z\) to \(H\).

Now, let’s define a loss function:

\begin{equation} \mathcal{L}: Y \times Y \rightarrow \mathbb{R}_+ \end{equation}

which may be squared loss for example.

For a given distribution \(\rho\) on \(Z\), the expected risk of a hypothesis \(h \in H\) is:

\begin{equation} \mathcal{E}(h) := \mathbb{E}_{\rho}[\mathcal{L}(h(x),y)] \end{equation}

where given a learning algorithm \(\mathcal{A}\), ex. empirical risk minimisation, and \(S_n= \{(x_i,y_i)\}_{i=1}^n \sim \rho^n\) we have:

\begin{equation} h_n = \mathcal{A}(S_n) \end{equation}

It follows that if we define the optimal risk:

\begin{equation} \mathcal{E^*}[H] := \inf_{h \in H} \mathcal{E}(h) \end{equation}

then \(\mathcal{A}\) is consistent if \(\mathcal{E}(h_n)\) converges to \(\mathcal{E^*}[H]\):

\begin{equation} \forall \epsilon, \delta > 0 \exists N \in \mathbb{N} \forall n \geq N, P_{\rho^n}[\mathcal{E}(h_n)-\mathcal{E^*}[H] \geq \epsilon] < \delta \end{equation}

## Sample complexity:

The sample complexity of a consistent algorithm \(\mathcal{A}\) is the smallest integer \(N\), i.e. smallest number of samples, as a function of \(\rho, \epsilon, \delta\).

## Learnability:

\(H\) is learnable if there exists a consistent algorithm \(\mathcal{A}\) with finite sample complexity:

\begin{equation} \forall \epsilon, \delta > 0, N(\rho, \epsilon, \delta) < \infty \end{equation}

## Probably-Approximately-Correct(PAC) learnability:

\(H\) is PAC-learnable if \(N(\rho,\epsilon,\delta)\) is polynomial in \(\frac{1}{\epsilon}\) and \(\frac{1}{\delta}\).

## References:

\(1.\) Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. 2014.

\(2.\) Kakade, Sham (2003), On the Sample Complexity of Reinforcement Learning (PDF), PhD Thesis, University College London: Gatsby Computational Neuroscience Unit.

\(3.\) Leslie Valiant. Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World. Basic Books. 2013.