If we play around with the graph in our head, we can look at it from two perspectives: $(U,V(U))$ and $(V,U(V))$. Looking at it the way it is presented, we see that the area of the second box in the equation is $\int_{p}^{q}V(U)\,dU.$ In their notation, that's exactly $\int_{p}^{q}v\,du$. This follows from the definition of a definite integral.
Now, if we flip the graph (turn your head sideways), reflect it to the left, and look at it from the perspective of $(V,U(V))$, we see that the area underneath the flipped curve is $\int_{r}^{s}U(V)dV$. This is, in their notation, $\int_{r}^{s}udv$.
This is why the total area is $\int_{r}^{s}udv+\int_{p}^{q}vdu.$
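As a quick numerical sanity check of the area picture, here is a minimal sketch. The curve $v(u)=u^2$ on $[p,q]=[1,2]$ and the helper `midpoint_integral` are my own choices for illustration, not something from the diagram. For an increasing curve running from $(p,r)$ to $(q,s)$, the two areas should add up to $qs-pr$, which is the usual integration by parts identity.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Assumed example curve (not from the original diagram): v(u) = u**2.
p, q = 1.0, 2.0
v_of_u = lambda u: u ** 2      # the curve seen as (U, V(U))
u_of_v = math.sqrt             # the same curve seen as (V, U(V))
r, s = v_of_u(p), v_of_u(q)    # r = 1, s = 4

area_under = midpoint_integral(v_of_u, p, q)   # integral of v du from p to q
area_left = midpoint_integral(u_of_v, r, s)    # integral of u dv from r to s
total = area_under + area_left

# Compare the summed areas with the difference of the two rectangles.
print(total, q * s - p * r)
```

The two printed numbers should agree up to the error of the midpoint rule. Swapping in a different increasing curve only requires changing `v_of_u` and `u_of_v` together.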
There is no loss of rigor whatsoever in this diagram. It is quite unsettling at first, but all we're really doing is this:
Let $v: U \to V$ be a map where, for each $u \in U$, we have $u \mapsto v(u)$. The graph of $v$ is the set of ordered pairs $\{(u,v(u)): u\in U\}$. Assuming $v$ is one-to-one and $V$ is its image, our other map, $v^{-1}:V \to U$, is the flip-then-reflect-left map with the mapping $v(u) \mapsto u$. Its graph is the set of ordered pairs $\{(v(u),u): u\in U\}$. This is the inverse map of $v$ by definition.
Note, however, that having $u=f(x_0)$ for some $x_0$ and $v(u)=g(x_1)$ for some $x_1$ does not necessarily imply that $g$ is the inverse of $f$.
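To make the distinction concrete, here is a small sketch under assumed choices: the parametrization $f(x)=2x$, $g(x)=x^2$ on $[1,2]$ is hypothetical and picked only for illustration. Swapping the ordered pairs gives the graph of $v^{-1}$, yet $g$ is not the inverse of $f$.

```python
import math

# Hypothetical parametrization (an assumption): u = f(x), v = g(x), x in [1, 2].
f = lambda x: 2 * x
g = lambda x: x ** 2

xs = [1 + i / 10 for i in range(11)]
graph_v = [(f(x), g(x)) for x in xs]                 # pairs (u, v(u))
graph_v_inv = [(vv, uu) for (uu, vv) in graph_v]     # flipped pairs (v(u), u)

# For this choice v(u) = (u / 2) ** 2, so its inverse is w -> 2 * sqrt(w).
v_inverse = lambda w: 2 * math.sqrt(w)
swap_matches_inverse = all(
    math.isclose(v_inverse(vv), uu) for (vv, uu) in graph_v_inv
)

# g would be the inverse of f only if g(f(x)) == x; here g(f(x)) = 4 * x**2.
g_inverts_f = all(math.isclose(g(f(x)), x) for x in xs)

print(swap_matches_inverse, g_inverts_f)
```

Here the flipped pairs do lie on the graph of $v^{-1}$, while the composition $g(f(x))$ does not return $x$. The inverse relationship is between $v$ and $v^{-1}$, not between $f$ and $g$.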