I have been reading about the absolute condition number of a problem and have some questions. I understand the overall idea, but I'm confused about the following:
Suppose $\delta x$ denotes a small perturbation of $x$, and define
$$ \delta f = f(x + \delta x) - f(x) $$
to be the corresponding change in $f$.
Here $f : X \to Y$, where $x \in X$ is an $n$-dimensional vector and $y \in Y$ is an $m$-dimensional vector.
I understand that the absolute condition number $k(x)$ of the problem $f$ is
$$ k(x) = \lim_{\delta \to 0} \, \sup_{\|\delta x\| \le \delta} \frac{\|\delta f\|}{\|\delta x\|}. $$
Now, from what the author says, we also have (where $J$ is the Jacobian):
$$ \delta f \approx J(x)\,\delta x \quad \text{as } \|\delta x\| \to 0. \qquad \text{(equation 1)} $$
And finally the author concludes that
$$ k(x) = \|J(x)\|, \qquad \text{(equation 2)} $$
where $\|J(x)\|$ is the induced matrix norm.
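To convince myself of equation 2, here is a minimal numerical sketch (the map $f(x_1, x_2) = (x_1^2,\; x_1 x_2)$ and the point $x = (1, 2)$ are my own toy example, not from the book): the largest ratio $\|\delta f\| / \|\delta x\|$ over many tiny random perturbations should come out close to $\|J(x)\|$ in the induced 2-norm.

```python
import numpy as np

# Toy map f(x1, x2) = (x1^2, x1*x2) -- my own example for illustration.
def f(x):
    return np.array([x[0]**2, x[0] * x[1]])

def J(x):
    # Analytic Jacobian: row i holds the partial derivatives of f_i.
    return np.array([[2 * x[0], 0.0],
                     [x[1],     x[0]]])

x = np.array([1.0, 2.0])

# The claimed condition number: the induced 2-norm of the Jacobian.
k = np.linalg.norm(J(x), 2)

# Approximate the supremum of ||df|| / ||dx|| by sampling many
# random perturbations of fixed tiny length eps.
rng = np.random.default_rng(0)
eps = 1e-7
ratios = []
for _ in range(2000):
    d = rng.standard_normal(2)
    d *= eps / np.linalg.norm(d)          # enforce ||dx|| = eps
    df = f(x + d) - f(x)
    ratios.append(np.linalg.norm(df) / eps)

approx_sup = max(ratios)                  # should be close to k
```

The sampled supremum slightly undershoots $\|J(x)\|$ (no random direction hits the maximizing one exactly), but the two agree to several digits.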
My first question is: why does the dot product of each row of $J(x)$ with a small change in $x$ yield the corresponding component of $\delta f$?
Each row of $J(x)$ holds the partial derivatives of one component of $f$, evaluated at the point $x$ (so its entries are numbers), and we multiply those by a perturbation amount. How should "a derivative at a point times a very small value" be interpreted? My thoughts (single-variable version):

Suppose the derivative of $f$ at a point $x$ has value $D$. Then $D$ is the slope of the tangent line at $x$, which we can read as $D/1$: a vertical step of size $D$ for every horizontal step of size $1$ along the $x$-axis. Multiplying this slope by the perturbation amount gives the corresponding fraction of that vertical step, and since the perturbation tends to $0$, this product is approximately the change $f$ makes over the small interval $\delta x$. The product becomes more accurate as $\delta x$ shrinks, because the tangent slope then more closely resembles the instantaneous behavior of $f$ at $x$.
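The single-variable picture above can be checked numerically (the choice $f = \sin$ at $x = 1$ is mine, purely for illustration): the linear prediction $f'(x)\,\delta x$ tracks $f(x + \delta x) - f(x)$, and its relative error shrinks as $\delta x$ does.

```python
import math

# Single-variable check: f(x + dx) - f(x) vs. the tangent-line
# prediction f'(x) * dx, for shrinking step sizes dx.
f = math.sin
fprime = math.cos       # derivative of sin
x = 1.0

errors = []
for dx in (1e-1, 1e-2, 1e-3):
    df = f(x + dx) - f(x)          # actual change in f
    linear = fprime(x) * dx        # slope times step size
    errors.append(abs(df - linear) / dx)   # error relative to dx
```

The error divided by $\delta x$ itself goes to zero, which is exactly the statement that the tangent-line product is a first-order approximation.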
My second question is: how would one show that $k(x) = \|J(x)\|$?
My thoughts: this makes sense if what I said above about equation 1 is true. If $J(x)$ has large partial derivatives, then multiplying it by a perturbation vector $\delta x$ produces a large output vector, and whatever the least upper bound $C$ on the ratio of the two vector norms is would be $k(x)$ at that point $x$. Namely:
$$ \|J(x)\,\delta x\| \le \|J(x)\|\,\|\delta x\| = C\,\|\delta x\|. $$
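This least-upper-bound reading can also be sketched numerically (the matrix below is my own example): the induced 2-norm is the supremum of $\|J v\|$ over unit vectors $v$, and sampling unit vectors around the circle recovers the value NumPy reports for `norm(J, 2)`.

```python
import numpy as np

# Induced 2-norm as a supremum: max of ||J v|| over unit vectors v.
# The matrix J is my own example, not from the text.
J = np.array([[3.0, 1.0],
              [0.0, 2.0]])

# NumPy's ord=2 matrix norm is the largest singular value,
# i.e. the induced 2-norm.
induced = np.linalg.norm(J, 2)

# Sample unit vectors v = (cos t, sin t) densely around the circle
# and take the largest ||J v||.
thetas = np.linspace(0.0, 2.0 * np.pi, 10000)
vs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
sampled_sup = np.linalg.norm(vs @ J.T, axis=1).max()
```

Since $\|\delta x\|$ only scales the ratio, restricting to unit vectors loses nothing, which is why the supremum of $\|\delta f\| / \|\delta x\|$ collapses to the induced norm.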