This answer is a pitch for the 'Weak Solution Techniques' especially intended for the case where your sampled data might have noise (like in the real world). You could choose just about any basis: $[\phi_k(x)| k \in 0:N]$ (one of my favorites is B-splines). The great part about such a basis is that you know the derivatives of all the basis vectors in advance. 
In this approach you would represent $y(x) \approx f(x;a) = \sum_{k=0}^{N} a_k \phi_k(x)$ and seek to find the best coefficients $a_k$ in the sense that they minimize an energy functional. I usually take a variational approach to this problem - this way we can accommodate any noise that might be in the data $y_k$.  In particular, we can know exactly how well we are are approximating the data by setting up a cost functional such as:
$$ J(a) = \Bigl\|y-\sum_{k=0}^{N} a_k \phi_k(x) \Bigr\|_2^2 + \beta_1\biggl\|\frac{dy}{dx}-\sum_{k=0}^{N} a_k \frac{d\phi_k}{dx}  \biggr\|_2^2 + \cdots $$
the dots are because you could keep on adding penalties for each kind of information that you have. At this point you need to enforce the necessary condition for extrema: namely, for each $k$, $$\frac{\partial J(a)}{\partial a_k} = 0$$
Lets call the matrix of sampled basis functions $[\phi_0(x_k), \phi_1(x_k), \phi_2(x_k),\cdots ] =\Phi$
and similarly $\bigl[\frac{d\phi_0}{dx},\frac{d\phi_1}{dx},\frac{d\phi_2}{dx},\cdots \bigr] = d\Phi$
then the necessary condition yields the Least Squares style equation:
$(\Phi^*\Phi+\beta_1d\Phi^*d\Phi)a = \Phi^*y + \beta_1d\Phi^*\frac{dy}{dx}$
where $\Phi^*$ and $d\Phi^*$ are the adjoints and $y$ and $dy/dx$ are the sampled data.
There is a lot of literature on how to choose the regularization parameter $\beta_1$ as well as other kinds of penalties (Tikhonov, $L_1$, TV) that you might add (as Bayesian priors)