
I was reading the paper "Neural Networks for Optimal Approximation of Smooth and Analytic Functions" by H. N. Mhaskar, and the third paragraph says:

> A very common assumption about the function class is defined in terms of the number of derivatives that a function possesses. For example, one is interested in approximating all functions of $s$ real variables having a continuous gradient. By a suitable normalization, one may assume that the gradient is bounded by 1.

My question is mainly about the last sentence:

> By a suitable normalization, one may assume that the gradient is bounded by 1.

I am trying to understand what that sentence means and what its implications are. It seems to say that, given a class of functions $\mathcal{F}$, one can choose some "normalization" $N$ (whatever that means exactly; presumably dividing by a suitable norm, or applying some map built from such a norm) so that every $f \in \mathcal{F}$ is sent to a "normalized" function $\hat f = N(f)$ whose derivatives we control. In particular, it seems we can always choose $N$ so that the gradient of $\hat f$ is bounded by 1. Is that what it means?
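To make my reading concrete (this is my own guess at the intended construction, and $B$ is notation I am introducing, not the paper's): if every $f \in \mathcal{F}$ satisfies $\|\nabla f\|_\infty \le B$ for some finite $B > 0$, then the normalization could simply be rescaling,

$$\hat f = N(f) = \frac{f}{B}, \qquad \text{so that} \qquad \|\nabla \hat f\|_\infty = \frac{\|\nabla f\|_\infty}{B} \le 1.$$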

Also, the sentence seems to claim that one can always normalize the gradient to be bounded by 1 in a function class without changing the strength of one's approximation result. Is this true? Do the conclusions of approximation theory remain unchanged under such a normalization? If so, why is that? Why would we even normalize if it is optional? What is wrong with working with the original function class?
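For what it's worth, here is my tentative reasoning for why the conclusions might be unaffected (again my own sketch, with $\mathcal{G}$ denoting the class of approximants, e.g. networks of a fixed architecture): if $\mathcal{G}$ is closed under scalar multiplication, then the approximation error seems to just rescale by the factor $B$ from above,

$$\inf_{g \in \mathcal{G}} \|f - g\|_\infty = \inf_{g \in \mathcal{G}} \|B \hat f - g\|_\infty = B \inf_{g \in \mathcal{G}} \Big\| \hat f - \frac{g}{B} \Big\|_\infty = B \inf_{h \in \mathcal{G}} \|\hat f - h\|_\infty.$$

Is this the right way to see it?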
