# Learnability and PAC-learnability

April 23, 2021

## Consistent learning algorithms:

Given the spaces $$X$$, $$Y$$, $$Z=X \times Y$$ a hypothesis space $$H$$ is a space of functions:

\begin{equation} \forall h \in H, h: X \rightarrow Y \end{equation}

and a learning algorithm $$\mathcal{A}$$ over $$H$$ is a computable map from $$Z' \subset Z$$ to $$H$$.

Now, let’s define a loss function:

\begin{equation} \mathcal{L}: Y \times Y \rightarrow \mathbb{R}_+ \end{equation}

which may be squared loss for example.

For a given distribution $$\rho$$ on $$Z$$, the expected risk of a hypothesis $$h \in H$$ is:

\begin{equation} \mathcal{E}(h) := \mathbb{E}_{\rho}[\mathcal{L}(h(x),y)] \end{equation}

where given a learning algorithm $$\mathcal{A}$$, ex. empirical risk minimisation, and $$S_n= \{(x_i,y_i)\}_{i=1}^n \sim \rho^n$$ we have:

\begin{equation} h_n = \mathcal{A}(S_n) \end{equation}

It follows that if we define the optimal risk:

\begin{equation} \mathcal{E^*}[H] := \inf_{h \in H} \mathcal{E}(h) \end{equation}

then $$\mathcal{A}$$ is consistent if $$\mathcal{E}(h_n)$$ converges to $$\mathcal{E^*}[H]$$:

\begin{equation} \forall \epsilon, \delta > 0 \exists N \in \mathbb{N} \forall n \geq N, P_{\rho^n}[\mathcal{E}(h_n)-\mathcal{E^*}[H] \geq \epsilon] < \delta \end{equation}

## Sample complexity:

The sample complexity of a consistent algorithm $$\mathcal{A}$$ is the smallest integer $$N$$, i.e. smallest number of samples, as a function of $$\rho, \epsilon, \delta$$.

## Learnability:

$$H$$ is learnable if there exists a consistent algorithm $$\mathcal{A}$$ with finite sample complexity:

\begin{equation} \forall \epsilon, \delta > 0, N(\rho, \epsilon, \delta) < \infty \end{equation}

## Probably-Approximately-Correct(PAC) learnability:

$$H$$ is PAC-learnable if $$N(\rho,\epsilon,\delta)$$ is polynomial in $$\frac{1}{\epsilon}$$ and $$\frac{1}{\delta}$$.

$$1.$$ Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press. 2014.

$$2.$$ Kakade, Sham (2003), On the Sample Complexity of Reinforcement Learning (PDF), PhD Thesis, University College London: Gatsby Computational Neuroscience Unit.

$$3.$$ Leslie Valiant. Probably Approximately Correct: Nature’s Algorithms for Learning and Prospering in a Complex World. Basic Books. 2013.