Let me sketch the answer at the level of the differential geometry of surfaces in $\mathbb{R}^3$. I follow the framework of W. Klingenberg's "A Course in Differential Geometry", where you can find all the necessary details if you wish.
Consider a (regular) surface $S$ parametrized by a smooth immersion $f:U \rightarrow \mathbb{R}^3$, where $U \subset \mathbb{R}^2$ is open with coordinates $(u^1,u^2)$. The vectors $f_i := \operatorname{d}{f}(\partial_i) = \frac{\partial f}{\partial u^i}$ form a basis of the tangent plane to the surface $S$ at every point $u \in U$ (visualize this as the tangent plane to $S$ passing through the point $p = f(u)$). We can complete this basis to a basis of the tangent space to $\mathbb{R}^3$ at the point $p$ by adding a vector orthogonal to the tangent plane $T_p S$. It is convenient to use the unit normal $n = \frac{f_1 \times f_2}{|f_1 \times f_2|}$. The triple $(f_1,f_2,n)$ is called a local frame along $f$. Notice that we can write any vector field $X$ tangent to $S$ as $X = X^i f_i$ (from now on I use the Einstein summation convention).
Let us write $f_{i j} := \frac{\partial^2 f}{\partial u^i \partial u^j}$ for the second partial derivatives of $f$. These are vectors in $\mathbb{R}^3 \equiv T_p{\mathbb{R}^3}$, and they can be expanded in terms of $(f_1,f_2,n)$ as
$$ f_{i j} = \Gamma^k_{i j} f_k + II_{i j} n \tag{1} $$
where $\Gamma^k_{i j}$ and $II_{i j}$ are smooth functions defined on $U$.
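To see an expansion of this kind concretely, here is a worked example on the unit sphere. The parametrization is chosen here purely for illustration: take $u^1 = \theta$, $u^2 = \varphi$ with $0 < \theta < \pi$, and $f(\theta,\varphi) = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$. Then
$$ f_1 = (\cos\theta\cos\varphi, \cos\theta\sin\varphi, -\sin\theta), \qquad f_2 = (-\sin\theta\sin\varphi, \sin\theta\cos\varphi, 0), \qquad n = \frac{f_1 \times f_2}{|f_1 \times f_2|} = f $$
and differentiating once more (here $f_{i j}$ denotes the second partial derivative $\frac{\partial^2 f}{\partial u^i \partial u^j}$) gives
$$ f_{1 1} = -n, \qquad f_{1 2} = f_{2 1} = \cot\theta \, f_2, \qquad f_{2 2} = -\sin\theta\cos\theta \, f_1 - \sin^2\theta \, n $$
Reading off the coefficients, the only nonzero $\Gamma^k_{i j}$ are $\Gamma^2_{1 2} = \Gamma^2_{2 1} = \cot\theta$ and $\Gamma^1_{2 2} = -\sin\theta\cos\theta$, while $II_{1 1} = -1$, $II_{1 2} = 0$, $II_{2 2} = -\sin^2\theta$.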
Using the fact that $f_k \cdot n = 0$, we get from (1) that
$$ f_{i j} \cdot f_k = \Gamma^l_{i j} f_l \cdot f_k = \Gamma^l_{i j} g_{l k} =: \Gamma_{i j k} $$
where $g_{i j} := f_i \cdot f_j$ are the coefficients of the first fundamental form. Since $f_{i j} = f_{j i}$, we immediately see that
$$ \Gamma^k_{i j} = \Gamma^k_{j i} \quad \text{or} \quad \Gamma_{i j k} = \Gamma_{j i k} \tag{2} $$
With $i,j,k = 1,2$ there are eight quantities $\Gamma_{i j k}$, and the symmetry (2) leaves six independent ones. Further equations are obtained by differentiating the first fundamental form:
$$ g_{i j, k} = \frac{\partial}{\partial u^k} g_{i j} = \frac{\partial}{\partial u^k} (f_i \cdot f_j) = f_{i k} \cdot f_j + f_i \cdot f_{j k} = \Gamma_{i k j} + \Gamma_{j k i} \tag{3} $$
Cyclically permuting the indices $i,j,k$ in (3) gives two more equations, which close the system; it is linear. Solving it (add the two permuted equations, subtract (3), and use (2)) we find
$$ \Gamma_{i j k} = \frac{1}{2}(g_{i k,j} + g_{j k,i} - g_{i j,k}) $$
or, raising the index $k$ with the inverse matrix $(g^{k l})$ of $(g_{k l})$,
$$ \Gamma^k_{i j} = \frac{1}{2}g^{k l}(g_{i l,j} + g_{j l,i} - g_{i j,l}) $$
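As a sanity check, the formula in terms of the metric can be compared numerically with the direct definition $\Gamma_{i j k} = f_{i j} \cdot f_k$, where $f_{i j}$ is the second partial derivative of $f$. The following is only a minimal sketch: the unit sphere, the sample point, the finite-difference step and all helper names (`d1`, `d2`, `metric`, and so on) are my own illustrative choices, and the derivatives are approximated rather than exact.

```python
import math

def f(u):
    """Unit sphere, u = (theta, phi); an illustrative choice of surface."""
    th, ph = u
    return (math.sin(th) * math.cos(ph), math.sin(th) * math.sin(ph), math.cos(th))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shift(u, i, h):
    """Return u with its i-th coordinate shifted by h."""
    v = list(u)
    v[i] += h
    return tuple(v)

def d1(F, u, i, h=1e-4):
    """Central difference of a vector-valued F with respect to u^i."""
    a, b = F(shift(u, i, h)), F(shift(u, i, -h))
    return tuple((x - y) / (2 * h) for x, y in zip(a, b))

def d2(F, u, i, j, h=1e-4):
    """Second partial derivative f_ij as a central difference of d1."""
    return d1(lambda v: d1(F, v, i, h), u, j, h)

def metric(u):
    """First fundamental form g_ij = f_i . f_j as a 2x2 nested list."""
    fi = [d1(f, u, i) for i in range(2)]
    return [[dot(fi[i], fi[j]) for j in range(2)] for i in range(2)]

def dmetric(u, k, h=1e-4):
    """g_{ij,k}: central difference of the metric with respect to u^k."""
    gp, gm = metric(shift(u, k, h)), metric(shift(u, k, -h))
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(2)] for i in range(2)]

def gamma_lower_from_metric(u):
    """Gamma_{ijk} = (g_{ik,j} + g_{jk,i} - g_{ij,k}) / 2, stored as G[i][j][k]."""
    dg = [dmetric(u, k) for k in range(2)]  # dg[k][i][j] = g_{ij,k}
    return [[[0.5 * (dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
              for k in range(2)] for j in range(2)] for i in range(2)]

def gamma_lower_from_embedding(u):
    """Gamma_{ijk} = f_ij . f_k, read off directly from the expansion of f_ij."""
    fk = [d1(f, u, k) for k in range(2)]
    return [[[dot(d2(f, u, i, j), fk[k]) for k in range(2)]
             for j in range(2)] for i in range(2)]

def raise_last(G, g):
    """Gamma^k_{ij} = g^{kl} Gamma_{ijl}; the result is stored as out[i][j][k]."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    ginv = [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]
    return [[[sum(ginv[k][l] * G[i][j][l] for l in range(2)) for k in range(2)]
             for j in range(2)] for i in range(2)]

u = (1.0, 0.3)  # sample point (theta, phi), away from the poles
g = metric(u)
from_metric = raise_last(gamma_lower_from_metric(u), g)
from_embedding = raise_last(gamma_lower_from_embedding(u), g)
max_gap = max(abs(from_metric[i][j][k] - from_embedding[i][j][k])
              for i in range(2) for j in range(2) for k in range(2))
print("max discrepancy between the two computations:", max_gap)
```

Both routes should agree up to finite-difference error. For the sphere they can also be compared with the closed forms $\Gamma^1_{2 2} = -\sin\theta\cos\theta$ and $\Gamma^2_{1 2} = \cot\theta$ (indices shifted by one in the zero-based arrays).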
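Now observe that $\Gamma^k_{i j} f_k$ is the tangential part of $f_{i j}$, that is, the derivative of $f_j$ in the direction $u^i$ projected onto the tangent plane. It behaves as a good derivative operation on tangent vectors (what "good" means is formalized by the notion of a connection). In particular, once extended to arbitrary tangent vector fields, it satisfies the product rule.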
It now makes sense to introduce the notation
$$ \nabla_{f_i}{f_j} := \Gamma^k_{i j} f_k $$
This operation is called the covariant derivative of the vector $f_j$ in the direction of the vector $f_i$. It can be extended to arbitrary tangent vector fields by (requiring!) linearity.
(A few remarks on the notation. Notice that we do not distinguish between $\partial_i \equiv \frac{\partial}{\partial u^i}$ and $f_i$ (we identify them!), so $\nabla_{f_i}{f_j}$ is just the same thing as $\nabla_{\frac{\partial}{\partial u^i}}{\frac{\partial}{\partial u^j}}$, which we can write even more simply as $\nabla_i \partial_j$.)
Correction. The operation $\nabla_{f_i}{f_j} := \Gamma^k_{i j} f_k$ is defined on the coordinate frame and again produces a tangent vector. To extend this operation to all tangent vector fields and make it behave as a derivation, we need to define it appropriately: we require that the product rule holds and extend by linearity. Of course, we also need to require that $\nabla_{i}{\phi} = \partial_{i}{\phi}$ for any smooth function $\phi:U \rightarrow \mathbb{R}$.
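Spelled out in coordinates, these requirements lead to the following formula. Here $X = X^i f_i$ and $Y = Y^j f_j$ are tangent vector fields, and $\nabla_X Y$ is the usual shorthand for the extended operation, which I have not introduced above. Linearity in the direction argument and the product rule give
$$ \nabla_X Y = X^i \nabla_{f_i}(Y^j f_j) = X^i \left( (\partial_i Y^j) f_j + Y^j \Gamma^k_{i j} f_k \right) = X^i \left( \partial_i Y^k + \Gamma^k_{i j} Y^j \right) f_k $$
Comparing with (1), this is exactly the tangential part of the ordinary derivative of $Y$ in the direction $X$ taken in $\mathbb{R}^3$:
$$ X^i \partial_i (Y^j f_j) = \nabla_X Y + II_{i j} X^i Y^j \, n $$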