While this is seldom emphasized, certain tensors already appear quite naturally in the context of multi-variable calculus on $\mathbb{R}^n$. If one wants to treat all the higher-order derivatives of a function $f$ in a unified, basis-independent and consistent way, one is led naturally to the notion of (symmetric) tensors of various orders. From this perspective, it is natural to discuss tensors of arbitrary order together and bundle them up, because higher-order tensors appear naturally as the derivatives of lower-order tensors. For example, if you care about the second derivative (also known as the Hessian) of a scalar function (a $(0,0)$-tensor), you should care about $(0,2)$-tensors.
Let me demonstrate how this works:
Let $f = (f_1, \dots, f_m) \colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ be a smooth map (in the sense that all possible partial derivatives of the $f_i$ of all orders exist). Then:
- The first derivative (or differential) of $f$ at a point $p \in \mathbb{R}^n$ is defined as the unique linear map $(Df)(p) = Df|_p \colon \mathbb{R}^n \rightarrow \mathbb{R}^m$ which satisfies
$$ \lim_{h \to 0} \frac{f(p + h) - f(p) - (Df|_p)(h)}{\| h \|_{\mathbb{R}^n}} = 0. $$
When $m = 1$, the scalar $Df|_p(h)$ is the directional derivative $\frac{d}{dt} f(p + th)|_{t = 0}$ of $f$ at the point $p$ in the direction $h$. In general, the linear map $Df|_p$ can be represented with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ as an $m \times n$ matrix
$$ \begin{pmatrix} \frac{\partial f_1}{\partial x^1} & \dots & \frac{\partial f_1}{\partial x^n} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x^1} & \dots & \frac{\partial f_m}{\partial x^n} \end{pmatrix} $$
but when we move to the context of manifolds, we won't have a notion of "standard bases", so it is best to think of the first derivative $Df|_p$ as a linear map and not as a matrix.
Finally, a linear map $\mathbb{R}^n \rightarrow \mathbb{R}^m$ can be identified naturally with an element of the tensor product $\left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$ and so the total first derivative $p \mapsto Df(p)$ is a smooth map from $\mathbb{R}^n$ to $\left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$.
- The second derivative $(D^2f)(p) = D^2f|_p$ of $f$ at a point $p$ is the first derivative of the map $q \mapsto Df(q)$ at $p$. This map is a smooth map from $\mathbb{R}^n$ to $\operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m)$ (which, by choosing bases, we can identify with $M_{m \times n}(\mathbb{R}) \cong \mathbb{R}^{m \times n}$), so $(D^2f)(p)$ is a linear map with signature
$$(D^2f)(p) \colon \mathbb{R}^n \rightarrow \operatorname{Hom}(\mathbb{R}^n, \mathbb{R}^m). $$
Such maps are naturally identified with bilinear maps $\mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^m$, which, again, can be identified with elements of $\left( \mathbb{R}^n \right)^{*} \otimes \left( \mathbb{R}^n \right)^{*} \otimes \mathbb{R}^m$.
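To make the identification above concrete, here is a minimal numerical sketch (my own illustration, not part of the argument; the function $f$ and the step size are hypothetical choices). For a scalar $f \colon \mathbb{R}^2 \rightarrow \mathbb{R}$, the second derivative $D^2f|_p$ acts as the symmetric bilinear map $(h, k) \mapsto h^{\mathsf{T}} H k$, where $H$ is the Hessian matrix at $p$; we approximate $H$ by central finite differences and check bilinearity/symmetry:

```python
import numpy as np

def f(x):
    # example scalar function f(x, y) = x^2 * y + sin(y) (a hypothetical choice)
    return x[0] ** 2 * x[1] + np.sin(x[1])

def hessian(f, p, eps=1e-4):
    """Central finite-difference Hessian of a scalar function at p."""
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.eye(n)[i] * eps
            ej = np.eye(n)[j] * eps
            H[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                       - f(p - ei + ej) + f(p - ei - ej)) / (4 * eps ** 2)
    return H

p = np.array([1.0, 2.0])
h = np.array([0.3, -0.7])
k = np.array([1.1, 0.4])
H = hessian(f, p)

# D^2 f|_p as a bilinear map: (h, k) -> h . (H k); symmetry of mixed
# partials makes this symmetric in h and k.
print(H)
print(h @ H @ k, k @ H @ h)
```

The matrix `H` is only a basis-dependent representation; the bilinear map $(h, k) \mapsto h^{\mathsf{T}} H k$ is the basis-independent object the text describes.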
More generally, the $k$-th derivative of $f$ turns out to be a smooth map from $\mathbb{R}^n$ to the space
$$\underbrace{\left( \mathbb{R}^n \right)^{*} \otimes \dots \otimes \left( \mathbb{R}^n \right)^{*}}_{k\text{ times}} \otimes \mathbb{R}^m $$
which, in your notation, would be a $(0,k)$ ($\mathbb{R}^m$-valued) tensor field on $\mathbb{R}^n$. If $m = 1$, this is just a $(0,k)$-tensor field.
If $n = m$ (so that we can think of $f$ as a vector field on $\mathbb{R}^n$), the $k$-th derivative of $f$ is a $(1,k)$-tensor field on $\mathbb{R}^n$.
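For the $n = m$ case, here is a similar sketch (again my own illustration, with a hypothetical vector field): the first derivative $Df|_p$ of a vector field on $\mathbb{R}^2$ is a $(1,1)$-tensor, i.e. a linear map $\mathbb{R}^2 \rightarrow \mathbb{R}^2$ represented by the Jacobian matrix, eating one vector $h$ and returning the directional derivative $\frac{d}{dt} f(p + th)|_{t=0}$ of the field:

```python
import numpy as np

def f(x):
    # example vector field on R^2 (a hypothetical choice)
    return np.array([x[0] * x[1], x[0] ** 2 - x[1]])

def jacobian(f, p, eps=1e-6):
    """Central finite-difference Jacobian (the (1,1)-tensor Df|_p) at p."""
    n = len(p)
    cols = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        cols.append((f(p + e) - f(p - e)) / (2 * eps))
    return np.stack(cols, axis=1)  # n x n matrix of partials

p = np.array([1.0, 2.0])
h = np.array([0.5, -0.3])
J = jacobian(f, p)

# Df|_p(h) agrees with the directional derivative of the field at p:
directional = (f(p + 1e-6 * h) - f(p - 1e-6 * h)) / (2 * 1e-6)
print(J @ h)
print(directional)
```

Iterating this construction (differentiating the map $p \mapsto Df|_p$, and so on) produces the $(1,k)$-tensor fields described above.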