\documentclass[a4paper, twoside, 12pt, fleqn]{article}
\linespread{1.4}% line spacing: the value 1.3 gives one-and-a-half spacing; 1.6 gives double spacing
\usepackage[english]{babel}
\usepackage{graphicx} %this allows including various graphics files. Better not to specify the driver
%\usepackage{epsf} %this allows using EPS graphics
\usepackage{latexsym} %this provides some of the most common symbols
\usepackage{amsfonts}
\usepackage{float}
\restylefloat{figure} %this and the previous line allow saying ``H'' (here, absolutely!) for figures
\pagestyle{myheadings} %this puts only the page numbers in the header, and no footer. The header can also be customized
\setlength{\parindent}{0pt} %this sets the initial paragraph indentation to zero
\usepackage[applemac]{inputenc}%this command allows using accented letters on Apple Macintosh
\usepackage{layout}%this allows changing layout parameters, such as the four below
%\setlength{\voffset}{-3cm}
\setlength{\hoffset}{-1cm}
\addtolength{\textwidth}{2cm}
%\addtolength{\textheight}{3.5cm}
\newtheorem{definizione}{Definizione}[section] %this provides the ``definizione'' environment, so that definitions are numbered correctly
\newtheorem{teorema}{Teorema}[section]
\usepackage[colorlinks=true]{hyperref} %this enables links
\begin{document}
\begin{titlepage}
%\begin{center}
%\vspace{1.5 cm}
%\Huge{\textbf{Notes from the lectures by prof Suhov}}\\
%\large{\textbf{Trieste - june 2006}}\\
%\vspace{16 cm}
%\end{center}
%\textbf{Fabio Grazioso}
\title{\textbf{Classical Information Theory} \\Notes from the lectures by Prof.\ Suhov\\ Trieste - June 2006}
\author{Fabio Grazioso \and ...}
\maketitle
\end{titlepage}
\tableofcontents
\newpage
\section{Lecture 1, Entropy}
\subsection{Random variable}
Let $X$ be a random variable, and let
\begin{equation}
\mathbb{P}(X=x_{i})
\end{equation}
be the probability that its value is $x_{i}$.
In some cases, for brevity we will write
\begin{equation}
\mathbb{P}(X=x_{i}) \equiv p(x_{i}).
\end{equation}
\subsection{Random string}
Let's now consider a random variable that consists of a sequence of random variables: we can say it is a random string, or a random array.
For example, let's say that $X$ is the outcome of the toss of a coin, so $x_{1} = ``tail''$ and $x_{2} = ``head''$, or $x_{1} = 0$ and $x_{2} = 1$ for simplicity.
Let's toss the coin $n$ times. We will use the following formalism:
\begin{equation} \label{rnd-sring}
X_{1}^{n} \equiv \left(X_{1}, X_{2},\ldots,X_{n}\right)
\end{equation}
We will write the possible outcomes as:
\begin{equation} \label{rnd-sring-out}
X_{1}^{n} = (x_{1}, x_{2}, \ldots, x_{n}) = \left\{
\begin{array}{l}
(0,0,0,\ldots,0)\\
(0,0,0,\ldots,1)\\
\ldots\\
(1,1,1,\ldots,1)
\end{array}
\right.
\end{equation}
and the number of possible outcomes is $2^{n}$ (cf. appendix (\ref{disp-rep})).
There exists a particular way of ordering these outcomes, called ``lexicographic'', and then several other possible orderings.
In the example of the coin, each single random variable can take only two values.
If the coin is \emph{fair}, then each string has the same probability, namely $2^{-n}$.
If the coin is unfair, the probability of a string depends on the number of $1$s (and so on the number of $0$s) that it contains. If we define:
\begin{eqnarray*}
n_{1} \equiv \mathrm{number \ of\ ``}1\mathrm{''\ in\ the\ string}\\
n_{0} \equiv \mathrm{number \ of\ ``}0\mathrm{''\ in\ the\ string}.
\end{eqnarray*}
then the probability of a string to occur is
\begin{equation}
\mathbb{P}\left(X_{1}^{n} = x_{1}^{n}\right) = p^{n_{1}}\,(1-p)^{n_{0}}
\end{equation}
where $p$ is the probability of a single ``1''.
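As a quick numerical sketch: a string containing $n_{1}$ ones and $n_{0}$ zeros has probability $p^{n_{1}}(1-p)^{n_{0}}$, where $p$ is the probability of ``1'' in a single toss (the value of $p$ and the example string below are made up):

```python
# Probability of one particular outcome string of n independent tosses of a
# biased coin.  Assumption (made up for illustration): "1" has probability p.
p = 0.7
string = "1101001101"
n1 = string.count("1")      # number of 1s in the string
n0 = string.count("0")      # number of 0s in the string
prob = p**n1 * (1 - p)**n0
print(n1, n0, prob)         # the probability depends only on n1 and n0
```

Note that the probability depends only on the counts $n_{1}$ and $n_{0}$, not on where the 1s sit in the string.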
\subsection{Markov chains}
Gambling has been the motivation for the progress of the classical probability theory.
Let's now introduce a more general concept, which will be useful afterwards: the Markov chain.
We have a Markov chain if the result of a trial depends on the result of the previous one:
\begin{equation}
\mathbb{P}(X_{1}^{n}= j_{1}^{n}) = \lambda_{j_{1}}\, p_{j_{1}j_{2}}\, p_{j_{2}j_{3}} \cdots p_{j_{n-1}j_{n}}, \qquad \forall j_{1}^{n} \in I_{m}^{n}
\end{equation}
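As a sketch of the formula above, with made-up numbers (the initial distribution $\lambda$, the transition matrix and the path are invented for illustration):

```python
# Probability of a particular path j_1, ..., j_n of a Markov chain:
# lambda_{j_1} * p_{j_1 j_2} * p_{j_2 j_3} * ... * p_{j_{n-1} j_n}.
lam = [0.5, 0.5]            # initial distribution lambda (made up)
P = [[0.9, 0.1],            # transition probabilities p_{ij}; rows sum to 1
     [0.2, 0.8]]
path = [0, 0, 1, 1, 1]      # one particular outcome j_1^n

prob = lam[path[0]]
for prev, nxt in zip(path, path[1:]):
    prob *= P[prev][nxt]
print(prob)                 # 0.5 * 0.9 * 0.1 * 0.8 * 0.8
```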
Let's suppose we have a language, for example a text in an unknown language []
\subsubsection{Stationary Markov chain}
\begin{equation}
H(X) = -\sum_{j=1}^{m} p_{j}\,\, \log_{2} p_{j}
\end{equation}
\begin{equation}
H(X) \geq 0
\end{equation}
\begin{equation}
h(X) = \lim_{n \rightarrow \infty} \frac{1}{n} H(X_{1}^{n})
\end{equation}
\begin{equation}
H(X) \leq \log_{2}m \,\, \mathrm{attained \,\, iff} \,\,\mathbb{P}(X=x_{i})=\frac{1}{m}
\end{equation}
\begin{equation}
H(X_{1}^{n})=-\sum_{x_{1}^{n}} p(x_{1}^{n})\, \log_{2} p(x_{1}^{n})
\end{equation}
\begin{equation}
H(X) = h(X) \equiv -\sum_{j=1}^{m} p_{j} \log_{2} p_{j}= -\mathbb{E} \log_{2}p_{X}(X)
\end{equation}
\begin{equation}
x \to \log_{2}p_{X}(x) = \log_{2} \mathbb{P}(X=x)
\end{equation}
\subsection{Physical interpretation of entropy (t=27' 25'')}
The entropy is defined as
\begin{displaymath}
H = -\sum_{j}\,p_{j}\,\,\log_{2}p_{j}.
\end{displaymath}
Now, how can we interpret and memorize the essence of this definition?
Here I sum $p_{j}\,\log p_{j}$. It should remind you of a mean value.\\
If I have a random variable, I can call it $\xi$, or I can call it $X$. I can write down the \emph{mean value} of this random variable as:
\begin{displaymath}
\mathbb{E} [\xi] = \sum_{i} x_{i}\, \mathbb{P}(\xi=x_{i})
\end{displaymath}
where $x_{i}$ are the values that the variable can take, and $\mathbb{P}(\xi=x_{i})$ are the respective probabilities.
But I can also write:
\begin{displaymath}
\mathbb{E} [X] = \sum_{i} x_{i}\, \mathbb{P}(X=x_{i})
\end{displaymath}
So you can notice that the way we denote the random variable is not important! What is important is that the \emph{possible values it can take} are assigned, and that the \emph{probabilities with which they occur} are assigned.
Now, of which random variable is the entropy $-\sum_{j}\,\mathbb{P}(x_{j})\,\,\log_{2}\mathbb{P}(x_{j})$ the mean value? It is quite a difficult question, but it is difficult psychologically. This is (except for a sign) the expectation value of the random variable ``$\log_{2}\mathbb{P}(X)$'', and \textbf{not} of $\log_{2}X$ !!
The notation ``$\log_{2}\mathbb{P}(X)$'' is self-explanatory: $\log_{2}\mathbb{P}(X)$ is the (logarithm of the) \emph{probability} that $X$ takes a value!
What is psychologically difficult, here?
We have a random variable, $X$, and it ``produces'' another random variable, that is $\mathbb{P}(X)$, which is ``the probability of $X$''.
$X$ can take the values $\{x_{1},\,x_{2}, \ldots, x_{m}\}$, and each of them has a probability $\{p(x_{1}),\,p(x_{2}), \ldots, p(x_{m})\}$. The first is the set of the possible values of $X$, and the other is the set of the possible values of $\mathbb{P}(X)$.
Of course the random variable $\mathbb{P}(X)$ is related to, depends on, the random variable $X$. It may seem that we have to know the value that $X$ has taken in order to define $\log_{2}\mathbb{P}(X)$, but it's not so!
We only need to know the probability of (all) the possible values that $X$ can take, and this is an ``a priori'' property of the random variable $X$.
In other words we define $\log_{2}\mathbb{P}(X)$ just knowing the probability distribution of the random variable $X$, and we \emph{don't need to know the actual outcomes} of $X$.
If we replace $X$ with another variable with the same probability distribution, the entropy is the same. It is as if we changed the name of the variable in the previous example.
Since, as we have seen, to define the mean value of a random variable we just need the possible values it takes and their probabilities, we can write down the mean value of $\log_{2}\mathbb{P}(X)$.
(t 32'20'') With this said, it is now clear that the entropy is the \emph{expected amount of information which you gain} from this random variable, because from each value $x_{i}$ you gain an amount of information $-\log_{2} p(x_{i})$.
That is: the entropy is the mean value of the variable $- \log_{2} p(x_{i})$.
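As a numerical sketch of this point (the distribution below is made up): the entropy is the mean value of $-\log_{2}\mathbb{P}(X)$, so it depends only on the probabilities, not on the names of the outcomes:

```python
from math import log2

# A made-up distribution: outcome labels with their probabilities.
probs = {"a": 0.5, "b": 0.25, "c": 0.25}

# Entropy as the mean value of the random variable -log2 P(X):
H = sum(p * -log2(p) for p in probs.values())
print(H)                    # 1.5 bits

# Renaming the outcomes changes nothing, since only the distribution matters:
renamed = {"x": 0.5, "y": 0.25, "z": 0.25}
assert H == sum(p * -log2(p) for p in renamed.values())
```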
\subsubsection{Other thoughts about entropy}
Here I write down some thoughts developed by Alejandro and me.
Let's suppose we have a random variable, with different outcomes:
\begin{displaymath}
X = \left\{
\begin{array}{l}
x_{1} = 1001010001001\\
x_{2} = 101\\
x_{3} = 10100101\\
\ldots\\
x_{n} = 10010.
\end{array}
\right.
\end{displaymath}
The different outcomes have different lengths, and so a different amount of ``means'' is needed to store or transmit them (amount of memory to store, or amount of ``bandwidth'' to transmit).
Let's say we want to ``encode'' the possible outcomes by numbering them: instead of storing the actual outcome, we store a numerical ``label''.
Which is the smartest way to choose this encoding, so as to save as much space as possible?
The labels themselves have different lengths, i.e. each needs a different amount of memory (or bandwidth), so the best thing is to first order the possible outcomes in \emph{decreasing probability order}, and then number the most probable with the shortest numerical label, and the less probable with the longest numerical label!!
If $p_{i}=p(x_{i})=p(X=x_{i})$ is the probability of the outcome $x_{i}$, the ``length of the label'' will be something proportional to the inverse of the probability (the higher the probability, the shorter the label):
\begin{displaymath}
\iota(x_{i}) \propto p(x_{i})^{-1}
\end{displaymath}
So, if we use this ``smartest'' way of storing the information, this quantity represents the amount of information ``needed to store the outcome $x_{i}$''. We can say it is the smallest amount of memory with which it is possible to store it.
Once we have understood the reason why the amount of information is directly proportional to the inverse of the probability, in the notes by Suhov (p.\ 2) we can also find a reason to use the logarithmic dependence.
The reason is that for two independent events, that is two outcomes of two independent random variables, we expect the amount of information of the joint event to be the sum of the amounts of information of the single events:
\begin{equation}
\iota(x_{i} \land y_{j}) = \iota(x_{i}) + \iota(y_{j})
\end{equation}
and since $\iota(x_{i}) \propto p(x_{i})^{-1}$, we need a continuous monotone function $\iota (x_{i}) = \phi[p(x_{i})]$ such that
\begin{displaymath}
\phi[p(x_{i} \land y_{j})] = \phi[p(x_{i})] + \phi[p(y_{j})]
\end{displaymath}
but since the joint probability of independent events is the product of the probabilities of the single events, we have
\begin{displaymath}
\phi[p(x_{i}) p(y_{j})] = \phi[p(x_{i})] + \phi[p(y_{j})]
\end{displaymath}
Another property we want is that if the event $x_{i}$ has probability $p(x_{i}) = \frac{1}{2}$, then $\iota(x_{i}) = 1$.
The function that satisfies all these properties is the $\log$. The only arbitrary choice is the base of the $\log$, and we choose $2$ because we think of binary random variables:
\begin{displaymath} \label{information}
\iota(x_{i}) = \log_{2} p(x_{i})^{-1}.
\end{displaymath}
Finally, entropy is the expected value of this quantity, because we sum over all the possible outcomes, and we ``weight'' the sum with the probabilities:
\begin{displaymath}
H(X) = \sum_{i} p(x_{i})\, \iota(x_{i}) = -\sum_{i} p(x_{i})\, \log_{2} p(x_{i}).
\end{displaymath}
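A small numerical check of the two requirements above, additivity for independent events and one bit for probability $\frac{1}{2}$ (the probabilities are made up):

```python
from math import log2

def info(p):
    """Amount of information iota of an outcome of probability p, in bits."""
    return -log2(p)          # equivalently: log2(1 / p)

# Made-up probabilities of two outcomes of two independent random variables:
px, py = 0.25, 0.1

# For independent events the joint probability is the product,
# and iota turns the product into a sum:
assert abs(info(px * py) - (info(px) + info(py))) < 1e-9
# An event of probability 1/2 carries exactly one bit:
assert info(0.5) == 1.0
```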
\subsubsection{Properties of entropy}
\begin{equation}
0 \leq H(X) \leq \log_{2}m
\end{equation}
where $m$ is the number of possible outcomes.
\begin{equation}
H(X) = 0 \Leftrightarrow X = const
\end{equation}
$X=const$ means that the probability of one of the outcomes is $1$, and the others are (of course) $0$.
\begin{equation}
H(X) = \log_{2}m \Leftrightarrow X \mathrm{\ is\ equiprobable}
\end{equation}
indeed in this case we have
\begin{eqnarray*}
H(X)& =& -\sum_{x_{i}} p(x_{i}) \log_{2}p(x_{i}) \\
&=& -\sum_{x_{i}} \frac{1}{m} \log_{2}\frac{1}{m}\\
&=& m\,\frac{1}{m}\,\log_{2}m \;=\; \log_{2}m.
\end{eqnarray*}
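These extreme cases can be checked numerically (the alphabet size $m$ is made up):

```python
from math import log2

def entropy(probs):
    # by convention 0 * log2(0) = 0, so zero-probability outcomes are skipped
    return -sum(p * log2(p) for p in probs if p > 0)

m = 8                        # made-up number of outcomes
# an equiprobable variable attains the maximum log2(m):
assert abs(entropy([1 / m] * m) - log2(m)) < 1e-12
# a constant variable (one outcome of probability 1) has zero entropy:
assert entropy([1.0] + [0.0] * (m - 1)) == 0.0
```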
\subsection{Joint Entropy}
Let's consider a ``string'', or ``vector'' of random variables
\begin{displaymath}
X_{1}^{n}\equiv\left\{X^{(1)}, X^{(2)}, \ldots, X^{(n)}\right\}.
\end{displaymath}
The Entropy of $X_{1}^{n}$ is
\begin{equation}
H(X_{1}^{n}) = -\sum _{ \left\{x_{i}^{(1)},x_{j}^{(2)},\ldots,x_{k}^{(n)}\right\} } p(X^{(1)}, X^{(2)}, \ldots, X^{(n)})\,\, \log_{2} p(X^{(1)}, X^{(2)}, \ldots, X^{(n)}).
\end{equation}
For the simple case of a vector of just two random variables $(X,Y)$ we have
\begin{equation}
H(X,Y) = -\sum_{x_{i},y_{j}} p(X, Y)\,\, \log_{2} p(X, Y).
\end{equation}
Here the probability $p(X,Y) \equiv p(X=x_{i},Y=y_{j})$ means the probability that a certain couple of outcomes occurs.
[]
\subsubsection{Properties of the joint entropy}
\begin{equation} \label{joint-property}
H(X) \, ,\, H(Y)\, \leq \, H(X,Y)\, \leq \, H(X) \, + \, H(Y)
\end{equation}
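The bounds (\ref{joint-property}) can be verified on a small made-up joint distribution:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# A made-up joint distribution p(x, y) of two binary random variables:
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# marginal distributions p(x) and p(y):
px = [pxy[(0, 0)] + pxy[(0, 1)], pxy[(1, 0)] + pxy[(1, 1)]]
py = [pxy[(0, 0)] + pxy[(1, 0)], pxy[(0, 1)] + pxy[(1, 1)]]

HX, HY = entropy(px), entropy(py)
HXY = entropy(pxy.values())

# H(X), H(Y) <= H(X,Y) <= H(X) + H(Y)
assert HX <= HXY <= HX + HY
assert HY <= HXY
print(HX, HY, HXY)
```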
\subsection{Conditional Entropy}
Let's consider now the conditional probability of a couple of variables:
\begin{equation}
p(X|Y).
\end{equation}
The meaning of this probability is ``the probability of the random variable $X$ in the case when the variable $Y$ is given''.
This means that we consider the situation in which we already know the outcome of the r.v. $Y$, since the ``reading'' of it has happened, and we express the probability of the outcomes of the r.v. $X$. The point is that the fact that we know the outcome of $Y$ can influence the probability distribution of $X$.
About this kind of probability, let's write down a couple of relations that will be useful in the following:
\begin{enumerate}
\item relation between joint probability and conditional probability:
\begin{equation} \label{prob-cond}
p(X,Y) = p(X|Y)\,p(Y)
\end{equation}
\item a ``strange'' way to write a probability (the marginal as a sum over the joint):
\begin{equation} \label{nested-sum}
p(Y) = \sum_{x_{i}}p(X,Y)
\end{equation}
\end{enumerate}
This stated, let's consider the following relation:
\begin{equation} \label{cond-entr}
H(X|Y) = H(X,Y) - H(Y).
\end{equation}
This relation is intuitive enough: if we already know the r.v. $Y$, the entropy, i.e. the expected amount of information, is that of the couple $(X,Y)$, from which we have to subtract the entropy of $Y$.
It is interesting to point out that here the classical theory diverges from the quantum one!
Indeed in the classical case this quantity is non-negative, because of the previous relation (\ref{joint-property}), but in the quantum case we will see it is not!
This is one of the interesting things of quantum information theory.
Let's try to expand the right-hand side of relation (\ref{cond-entr}):
\begin{displaymath}
H(X,Y) - H(Y) \equiv -\sum_{x_{i},y_{j}} p(X, Y)\,\, \log_{2} p(X, Y) + \sum_{y_{j}} p(Y)\,\, \log_{2} p(Y)
\end{displaymath}
now we use the relation (\ref{nested-sum}):
\begin{eqnarray*}
\phantom{H(X,Y) - H(Y)} &=& -\sum_{x_{i},y_{j}} p(X, Y)\,\, \log_{2} p(X, Y) + \sum_{y_{j}} \left[\sum_{x_{i}} p(X , Y)\right]\,\, \log_{2} p(Y)\\
&=& -\sum_{x_{i},y_{j}} p(X, Y)\,\, \log_{2} p(X, Y) + \sum_{x_{i},y_{j}} p(X , Y)\,\, \log_{2} p(Y)\\
&=& -\sum_{x_{i},y_{j}} p(X, Y)\,\left[ \log_{2} p(X, Y) - \log_{2} p(Y)\right]
\end{eqnarray*}
and using a property of the $\log$:
\begin{displaymath}
\phantom{H(X,Y) - H(Y)} = -\sum_{x_{i},y_{j}} p(X, Y)\, \log_{2} \frac{p(X, Y)}{ p(Y)}
\end{displaymath}
and using relation (\ref{prob-cond}):
\begin{equation} \label{}
\phantom{H(X,Y) - H(Y)} = -\sum_{x_{i},y_{j}} p(X, Y)\, \log_{2} p(X| Y).
\end{equation}
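The identity just derived can be checked numerically on a made-up joint distribution; both routes, $H(X,Y)-H(Y)$ and $-\sum_{x,y} p(x,y)\log_{2} p(x|y)$, must agree:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up joint distribution of two binary random variables:
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
py = {0: 0.6, 1: 0.4}        # marginal p(y) = sum over x of p(x, y)

# Route 1: the definition H(X|Y) = H(X,Y) - H(Y)
H_cond = entropy(pxy.values()) - entropy(py.values())

# Route 2: the expanded form -sum p(x,y) log2 p(x|y), with p(x|y) = p(x,y)/p(y)
H_cond2 = -sum(p * log2(p / py[y]) for (x, y), p in pxy.items())

assert abs(H_cond - H_cond2) < 1e-12
assert H_cond >= 0           # classically the conditional entropy is non-negative
```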
\subsubsection{Analysis of a wrong intuition}
If you state that
\begin{displaymath}
H(X) = -\sum_{x_{i}} p(X) \,\log_{2}p(X)
\end{displaymath}
and then that
\begin{displaymath}
H(X,Y) = -\sum_{x_{i},y_{j}} p(X,Y) \,\log_{2}p(X,Y)
\end{displaymath}
when you finally arrive at the conditional entropy you could be tempted to write:
\begin{equation}
H(X|Y) = -\sum_{x_{i},y_{j}} p(X|Y) \,\log_{2}p(X|Y).
\end{equation}
Let's see what this quantity is, and in which way it is different from the conditional entropy.
When we write $(X|Y)$ we read it ``$X$, known $Y$''. This means that we know that $Y=y_{j}$ and we have the random variable $X$, but $X$ has been conditioned somehow (its probability distribution has changed) by the knowledge of $Y$.
In some sense, when we write $(X|Y)$ we should think of $X$ as the variable, and of $Y$ as a ``parameter''.
So, let's write
\begin{equation}
\tilde{H}_{y_{j}}(X) = -\sum_{x_{i}} p(X|Y) \,\log_{2}p(X|Y).
\end{equation}
Notice that the sum is only over $x_{i}$, and $Y$ is a parameter.
So this isn't the ``erroneous quantity'' $-\sum_{x_{i},y_{j}} p(X|Y) \,\log_{2}p(X|Y)$ (sum over both $X$ and $Y$).
(Question: if you sum over the possible values of the variable $X$, does $\tilde{H}_{y_{j}}(X)$ still depend on $X$? Or not?!?)
What is this quantity $\tilde{H}_{y_{j}}(X)$? I still don't have a good idea, but Alejandro noticed that if you look at the expected value of it, you obtain the (correct) expression of $H(X|Y)$:
\begin{eqnarray*}
\mathbb{E} [\tilde{H}_{y_{j}}(X)] &=& \sum_{y_{j}}p(Y)\, \tilde{H}_{y_{j}}(X) \\
&=& -\sum_{y_{j}} p(Y) \left[\sum_{x_{i}} p(X|Y) \,\log_{2}p(X|Y)\right]\\
&=& -\sum_{x_{i},y_{j}} p(Y)\, p(X|Y) \,\log_{2}p(X|Y)\\
&=& -\sum_{x_{i},y_{j}} p(X,Y) \,\log_{2}p(X|Y)
\end{eqnarray*}
where, in the last passage, I have used the relation (\ref{prob-cond}).
I have listened to the recording of Suhov's lecture, and he says the same thing... :-)
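This can also be verified numerically: computing $\tilde{H}_{y_{j}}(X)$ for each $y_{j}$ of a made-up joint distribution and averaging with weights $p(y_{j})$ reproduces $H(X|Y) = H(X,Y) - H(Y)$:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up joint distribution p(x, y) of two binary random variables:
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
py = {0: 0.6, 1: 0.4}

# H~_y(X): entropy of the conditional distribution p(.|y), for each fixed y
H_tilde = {y: entropy([pxy[(x, y)] / q for x in (0, 1)]) for y, q in py.items()}

# its expectation over y ...
expectation = sum(py[y] * H_tilde[y] for y in py)
# ... equals the conditional entropy H(X|Y) = H(X,Y) - H(Y):
H_cond = entropy(pxy.values()) - entropy(py.values())
assert abs(expectation - H_cond) < 1e-12
```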
\subsection{Mutual entropy}
Let's define the \emph{mutual} entropy by writing:
\begin{eqnarray} \label{mutual-entropy}
H(X:Y) &\equiv& H(X) + H(Y) - H(X,Y)\\
&=& -\sum_{x_{i},y_{j}}\mathbb{P}(X=x_{i},Y=y_{j})\,\log_{2}\frac{\mathbb{P}(X=x_{i})\,\mathbb{P}(Y=y_{j})}{\mathbb{P}(X=x_{i},Y=y_{j})}.
\end{eqnarray}
Since the definition is symmetrical, it's easy to see that:
\begin{equation}
H(X:Y) = H(Y:X).
\end{equation}
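A numerical sketch (the joint distribution is made up) checking that the two expressions in (\ref{mutual-entropy}) agree, and that the quantity is non-negative:

```python
from math import log2

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# Made-up joint distribution and its marginals:
pxy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
px = {0: 0.5, 1: 0.5}
py = {0: 0.6, 1: 0.4}

# Definition: H(X:Y) = H(X) + H(Y) - H(X,Y), manifestly symmetric in X and Y
I1 = entropy(px.values()) + entropy(py.values()) - entropy(pxy.values())
# Direct sum: -sum p(x,y) log2 [ p(x) p(y) / p(x,y) ]
I2 = -sum(p * log2(px[x] * py[y] / p) for (x, y), p in pxy.items())

assert abs(I1 - I2) < 1e-12
assert I1 >= 0
```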
\subsection{Relative entropy}
\begin{equation}
H(X||Y) \equiv -\sum_{x_{i}}\mathbb{P}(X=x_{i})\, \log_{2}\frac{\mathbb{P}(Y=x_{i})}{\mathbb{P}(X=x_{i})}
\end{equation}
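A sketch with two made-up distributions on the same outcomes; the relative entropy is non-negative, vanishes when the distributions coincide, and (unlike the mutual entropy) is not symmetric:

```python
from math import log2

# Two made-up distributions over the same three outcomes:
pX = [0.5, 0.3, 0.2]
pY = [0.4, 0.4, 0.2]

def rel_entropy(p, q):
    # H(X||Y) = -sum_i p(x_i) log2( q(x_i) / p(x_i) )
    return -sum(pi * log2(qi / pi) for pi, qi in zip(p, q))

D = rel_entropy(pX, pY)
assert D >= 0                        # always non-negative
assert rel_entropy(pX, pX) == 0      # zero when the distributions coincide
assert D != rel_entropy(pY, pX)      # not symmetric in general
```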
\subsection{Entropy rate}
Given a random string of length $n$: $X_{1}^{n} = \{ X^{1}, X^{2}, \ldots, X^{n} \}$ we define
\begin{equation}
h \equiv \lim_{n \to \infty} \frac{1}{n} \, H(X_{1}^{n}).
\end{equation}
\subsubsection{particular cases}
The entropy rate of a sequence of independent random variables is:
\begin{equation}
h = H(X^{1}).
\end{equation}
The entropy rate of a stationary Markov chain is:
\begin{equation}
h = H(X^{2}|X^{1}).
\end{equation}
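The first particular case can be checked directly on a small made-up example: for independent, identically distributed variables the joint entropy is $n\,H(X^{1})$, so dividing by $n$ gives the rate:

```python
from math import log2
from itertools import product

def entropy(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

p1 = [0.7, 0.3]      # made-up distribution of a single variable X^1
n = 5                # made-up string length

# Joint distribution of n independent copies: the probability of a string
# is the product of the single-symbol probabilities.
joint = []
for outcome in product([0, 1], repeat=n):
    pr = 1.0
    for x in outcome:
        pr *= p1[x]
    joint.append(pr)

# H(X_1^n) = n * H(X^1), hence h = lim H(X_1^n)/n = H(X^1).
assert abs(entropy(joint) - n * entropy(p1)) < 1e-9
```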
\subsection{Comments}
\begin{itemize}
\item The concept of entropy is not very useful in the case of a single random variable. Its use is mainly in the study of a sequence of random variables.
\item following the concept of the entropy of a single random variable, similar concepts are introduced for random strings; we have \begin{itemize} \item joint entropy \item conditional entropy \item mutual entropy \item relative entropy. \end{itemize} One of the most elusive concepts, although one of the most useful, is the conditional entropy.
\item the key to understanding the properties of these quantities is relation (\ref{joint-property}).
\item there are relations about the conditional, mutual and relative entropies similar to (\ref{joint-property}), which is about the joint entropy (see Suhov's notes).
\end{itemize}
\section{Lecture 2 - Shannon's $1\protect^{st}$ coding theorem}
Entropy rate
\begin{equation}
h = \lim_{n \rightarrow \infty} \frac{1}{n} H(X_{1}^{n})
\end{equation}
For example, in coin tossing
\begin{equation}
p>\frac{1}{2} > 1-p
\end{equation}
then
\begin{equation}
x_{1}^{n} = (1,1,\ldots,1)\qquad \textrm{(all 1s)}
\end{equation}
\begin{equation}
\lim_{n \to \infty} \frac{1}{n} \log_{2} M_{n}(\varepsilon)=h
\end{equation}
\begin{equation}
\frac{1}{n} \log_{2}M_{n}(\varepsilon) \to h \Leftrightarrow M_{n}(\varepsilon) \sim 2^{n\,h} \ll 2^{n\, \log_{2} m} = m^{n}
\end{equation}
\begin{equation}
\mathbb{P}\left(X_{1}^{n}=x_{1}^{n}\right) \sim 2^{-n\,h}
\end{equation}
$M_{n}(\varepsilon) =$ number of strings that you have to select in order that the total probability of the strings left out is less than $\varepsilon$
$2^{n\,h} =$ number of binary strings of length $n\,h$
by the law of large numbers
\begin{equation}
\frac{n_{1}}{n} \to p\,,\frac{n_{0}}{n} \to (1-p)
\end{equation}
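The meaning of $M_{n}(\varepsilon)$ can be illustrated by brute force for a biased coin (the values of $p$, $n$ and $\varepsilon$ below are made up): keep the most probable strings until the probability left out drops below $\varepsilon$, then compare $\frac{1}{n}\log_{2}M_{n}(\varepsilon)$ with $h$:

```python
from math import log2, comb

p, n, eps = 0.8, 20, 0.05     # made-up parameters for a biased coin
h = -(p * log2(p) + (1 - p) * log2(1 - p))    # entropy rate of the coin

# All strings with k ones share the same probability p^k (1-p)^(n-k),
# and there are comb(n, k) of them; sort these groups by decreasing probability.
groups = sorted(
    ((p**k * (1 - p)**(n - k), comb(n, k)) for k in range(n + 1)),
    reverse=True,
)

# Keep whole groups of most probable strings until the probability left
# out is below eps (adding whole groups slightly overshoots the minimal M).
covered, M = 0.0, 0
for prob, count in groups:
    if covered >= 1 - eps:
        break
    covered += prob * count
    M += count

print(M, log2(M) / n, h)      # (1/n) log2 M is well below the trivial bound 1
assert log2(M) / n < 1.0
```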
Shannon:
You have a string of length $n$. Suppose that by increasing the length to $R^{-1}\,n$, you have a chance to beat the errors in the channel.
If this is possible, then $R$ is called a \emph{reliable transmission rate}.
The maximum value of $R$ is the channel capacity:
take $C=\sup[R : R \ \mathrm{reliable}]$;
then $C =$ the channel capacity.
$\bullet$ Memoryless channels:
they distort every digit independently.
It is then clear that such a channel is characterized by the channel matrix $\Pi$
\begin{displaymath}
\mathbf{\Pi} =
\left( \begin{array}{cc}
(1-p) & p \\
p & (1-p)
\end{array} \right)
\end{displaymath}
Binary Symmetric Channel (BSC).
For a BSC
\begin{equation}
C=1+p\,\log_{2}p \,\, +(1-p)\log_{2}(1-p)
\end{equation}
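A numerical sketch of the BSC capacity $C = 1 + p\log_{2}p + (1-p)\log_{2}(1-p)$, i.e. one minus the binary entropy of the flip probability $p$ (the values of $p$ below are made up):

```python
from math import log2

def bsc_capacity(p):
    """Capacity of a binary symmetric channel with flip probability p."""
    if p in (0.0, 1.0):
        return 1.0          # no uncertainty: the output determines the input
    return 1.0 + p * log2(p) + (1 - p) * log2(1 - p)   # = 1 - H(p)

assert bsc_capacity(0.0) == 1.0          # perfect channel: one bit per use
assert abs(bsc_capacity(0.5)) < 1e-12    # pure noise: zero capacity
assert 0 < bsc_capacity(0.1) < 1         # partially noisy channel
```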
$X_{1}^{n}$: we want to increase the length $n$ to $R^{-1}\,n$;
if there exists an encoding and a decoding such that the error probability can be made small, then $R$ is called a reliable transmission rate.
\appendix
\section{Combinatory Theory} \label{app-combinatory}
\subsection{Dispositions without repetitions (or simple dispositions)}
A \emph{disposition without repetitions} is a sequence of objects, taken from a set of possible symbols, where it is not possible to repeat the same object more than once. The order matters, i.e. sequences with the same elements in a different order are different dispositions.
The number of possible dispositions without repetitions, with length $n$, choosing from a set of $k$ possible objects, is:
\begin{equation}
D_{k}^{n} = P_{k}^{n} = \frac{k!}{(k-n)!}
\end{equation}
\subsubsection{proof}
The number of possible choices for the first symbol is $k$. The number of possible choices for the second symbol is $k-1$ (since one symbol is gone and we cannot repeat it). For a sequence of $n$ symbols the possibilities are the product of these possibilities:
\begin{displaymath}
k \, (k-1) \cdots (k-n+1)
\end{displaymath}
we can represent this quantity as a ratio of factorials: we write $k! = k\, (k-1)\, (k-2) \cdots 1$, and then we divide by the product of the ``unwanted terms'':
\begin{displaymath}
k \, (k-1) \cdots (k-n+1) = \frac{k\, (k-1)\, (k-2) \cdots (k-n+1)\, (k-n) \cdots 1}{(k-n)\, (k-n-1) \cdots 1} = \frac{k!}{(k-n)!}.
\end{displaymath}
QED
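As a cross-check of the proof, Python's itertools.permutations generates exactly these dispositions (the values of $k$ and $n$ are made up):

```python
from itertools import permutations
from math import factorial

k, n = 6, 3     # made up: length-n sequences from k distinct symbols, no repeats
# itertools.permutations(iterable, n) generates exactly these dispositions
count = len(list(permutations(range(k), n)))
assert count == factorial(k) // factorial(k - n)    # k! / (k-n)!
print(count)    # 6 * 5 * 4 = 120
```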
It's possible to show that permutations are special cases of simple dispositions.
\subsection{Dispositions with repetitions} \label{disp-rep}
A \emph{disposition with repetitions} is a sequence of objects, taken from a set of possible symbols, where it is possible to repeat the same object more than once. The order matters, i.e. sequences with the same elements in a different order are different dispositions.
The number of possible dispositions with repetitions, with length $n$, choosing from a set of $k$ possible objects, is:
\begin{equation}
DR_{k}^{n} = k^{n}
\end{equation}
\subsubsection{proof}
The possibilities for the choice of the first element are $k$, the possibilities for the choice of the second element are $k$, and so on. For a sequence of $n$ symbols the possibilities are the product of these possibilities. QED
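Again a brute-force cross-check with made-up $k$ and $n$, using itertools.product to enumerate all length-$n$ sequences:

```python
from itertools import product

k, n = 4, 3     # made up: length-n sequences over k symbols, repetition allowed
count = len(list(product(range(k), repeat=n)))
assert count == k**n
print(count)    # 4**3 = 64
```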
\subsection{Permutations}
\subsection{Combinations without repetitions (or simple combinations)}
A combination is a sequence of symbols taken from a set of possible ones, where the order doesn't matter, i.e. sequences with the same elements in a different order are considered the same combination. A simple one is when it is not allowed to repeat a symbol.
The number of possible simple combinations, with length $n$, choosing from a set of $k$ possible objects, is:
\begin{equation}
C_{k}^{n} = \frac{k!}{n!\,(k-n)!} \equiv \left( \begin{array}{c}k\\n \end{array} \right)
\end{equation}
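A brute-force cross-check of the standard count $\binom{k}{n}$ of $n$-element subsets of a $k$-element set (with made-up $k$ and $n$); each unordered choice corresponds to $n!$ ordered simple dispositions:

```python
from itertools import combinations
from math import comb, factorial

k, n = 6, 3     # made up: choose n of k symbols, order irrelevant, no repeats
count = len(list(combinations(range(k), n)))
assert count == comb(k, n)      # k! / (n! (k-n)!)
# each unordered choice corresponds to n! ordered simple dispositions:
assert comb(k, n) * factorial(n) == factorial(k) // factorial(k - n)
print(count)
```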
\subsubsection{proof}
\subsection{Combinations with repetitions}
A combination is a sequence of symbols taken from a set of possible ones, where the order doesn't matter, i.e. sequences with the same elements in a different order are considered the same combination. In this case it is allowed to repeat a symbol.
The number of possible combinations with repetitions, with length $n$, choosing from a set of $k$ possible objects, is:
\begin{equation}
CR_{k}^{n} = \frac{(n+k-1)!}{n!\,(k-1)!} \equiv \left( \begin{array}{c}n+k-1\\n \end{array} \right)
\end{equation}
\subsubsection{proof}
\end{document}