I'm supposing $f \in C^1(U)$, and that $U$ is convex, so that the line segment between any two points of $U$ stays inside $U$. As you noted, the Mean Value Theorem then shows that for every $x, y \in U$ we have

$$|f(x) - f(y)| \le \sup_{z \in U} \|Df(z)\| \, |x-y|.$$

For the version of the Mean Value Theorem used, see the section "Mean value theorem for vector-valued functions" at http://en.wikipedia.org/wiki/Mean_value_theorem. For $x \ne y$ this of course implies the inequality

$$\frac{|f(x) - f(y)|}{|x-y|} \le \sup_{z \in U} \|Df(z)\|.$$

To show the other direction, let $\varepsilon > 0$ be arbitrary and let $z_n \in U$, $n=1,2,\dots$, be a sequence of points so that $\|Df(z_n)\| \to \sup_{z \in U} \|Df(z)\|.$ By the definition of differentiability, for every $n \in \mathbb{N}$ there exists a neighbourhood $V_n$ of $0$ so that for every nonzero $t_n \in V_n$ we have

$$\frac{|f(z_n + t_n) - f(z_n) - Df(z_n)t_n|}{|t_n|} \le \varepsilon.$$

By the reverse triangle inequality we hence have

$$\left|\frac{|f(z_n + t_n) - f(z_n)|}{|t_n|} - \frac{|Df(z_n)t_n|}{|t_n|}\right| \le \varepsilon.$$

The operator norm is attained on the unit sphere and the quotient $\frac{|Df(z_n)t_n|}{|t_n|}$ does not change when $t_n$ is rescaled, so we can choose a nonzero $t_n \in V_n$ with $\frac{|Df(z_n)t_n|}{|t_n|} = \|Df(z_n)\|$ and $z_n + t_n \in U$. We have thus found $x_n = z_n + t_n$ and $y_n = z_n$ so that

$$\left|\frac{|f(x_n) - f(y_n)|}{|x_n - y_n|} - \|Df(z_n)\|\right| \le \varepsilon.$$

In particular, $\sup_{x \ne y} \frac{|f(x) - f(y)|}{|x-y|} \ge \|Df(z_n)\| - \varepsilon$ for every $n$. Because $\varepsilon > 0$ is arbitrary and $\|Df(z_n)\| \to \sup_{z \in U} \|Df(z)\|$, we get the result.
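
As a quick sanity check, here is a one-dimensional worked example. The choices $f(x) = x^2$, $U = (0,1)$, $z_n = 1 - \frac{1}{n}$ and $t_n = \frac{1}{2n}$ are my own illustrative assumptions, not something taken from the question. Here $Df(z) = 2z$, so $\sup_{z \in U} \|Df(z)\| = 2$, and this supremum is not attained in $U$. For $x \ne y$ in $U$ the difference quotient is

$$\frac{|f(x) - f(y)|}{|x-y|} = \frac{|x^2 - y^2|}{|x-y|} = x + y < 2.$$

With $y_n = z_n = 1 - \frac{1}{n}$ and $x_n = z_n + t_n = 1 - \frac{1}{2n}$ for $n \ge 2$, both points lie in $U$ and

$$\frac{|f(x_n) - f(y_n)|}{|x_n - y_n|} = x_n + y_n = 2 - \frac{3}{2n}, \qquad \|Df(z_n)\| = 2 - \frac{2}{n},$$

so the two quantities differ by exactly $\frac{1}{2n} = |t_n|$ and both tend to $2$. The difference quotients therefore come arbitrarily close to $\sup_{z \in U} \|Df(z)\|$ without reaching it, which is why the argument works with a sequence $z_n$ rather than a single maximizing point.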