You already have a couple of good, relevant points, so I'm just going to add one I haven't seen so far among the answers.
- The result of an operator with a well-defined center pixel lies on the same grid as the input, whereas you could argue that the result of a forward or backward difference is offset by half a sample in the dimension being differenced (relative to the input grid). This can be impractical for many reasons and in many circumstances.
Here we can see how the grid moves from before to after a forward x-difference and a forward y-difference, respectively. None of the three grids are the same!

And for the central difference scheme, all of them align nicely on top of each other:
And, as with many things in engineering and science, estimating a derivative is one of many operations that need to be compatible, or aligned in some sense, for them to be useful.
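The half-sample offset can be checked numerically in 1D. The following is a minimal sketch, assuming NumPy and a test signal of my own choosing, $f(x) = x^2$ sampled on the integers, for which both difference schemes happen to be exact:

```python
import numpy as np

# Assumed test signal (not from the answer above): f(x) = x^2 on an integer grid,
# with known derivative f'(x) = 2x.
x = np.arange(8, dtype=float)
f = x**2

# Forward difference: f[n+1] - f[n], defined for n = 0 .. N-2
forward = f[1:] - f[:-1]

# Central difference: (f[n+1] - f[n-1]) / 2, defined for n = 1 .. N-2
central = (f[2:] - f[:-2]) / 2.0

# The forward difference matches the derivative at the half-sample positions n + 1/2 ...
forward_matches_half_grid = np.allclose(forward, 2 * (x[:-1] + 0.5))
# ... but not at the original sample positions n.
forward_matches_grid = np.allclose(forward, 2 * x[:-1])
# The central difference matches the derivative on the original grid.
central_matches_grid = np.allclose(central, 2 * x[1:-1])

print(forward_matches_half_grid, forward_matches_grid, central_matches_grid)
```

For a general signal neither scheme is exact, but the same picture holds: the forward difference is best read as an estimate located between two samples, while the central difference is located on a sample.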
EDIT
As proposed by @Ian in the comments, to avoid the misaligned grids one could use (expressed as the function's Z-transform):
$$\mathcal{Z}\{d_k(x_1,x_2)\} = 1-{z_k}^2$$
But we see that this is simply the central difference combined with a lazy filter that shifts by one grid step:
$$\mathcal{Z}\{d_k(x_1,x_2)\} = \underset{\text{lazy}}{\underbrace{z_k}}\underset{\text{central diff.}}{\underbrace{({z_k}^{-1}-{z_k}^1)}}$$
So in practice we would be calculating the exact same thing, just moving the result one step to the left (or down).
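This factorization can be verified numerically. The sketch below is only an illustration under two assumptions of mine: $z_k^{-1}$ is read as a delay by one sample, and the boundaries are treated as periodic (via `np.roll`) so that the shifts are exact everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.standard_normal(16)  # arbitrary 1D test signal

# Assumed convention: z^{-1} delays by one sample, so (z^{-1} f)[n] = f[n-1],
# which is np.roll(f, 1); likewise (z f)[n] = f[n+1] is np.roll(f, -1).
# np.roll wraps around, i.e. periodic boundaries are assumed.

# Central difference z^{-1} - z^{1}:  c[n] = f[n-1] - f[n+1]
central = np.roll(f, 1) - np.roll(f, -1)

# Proposed operator 1 - z^2:  d[n] = f[n] - f[n+2]
shifted = f - np.roll(f, -2)

# d = z * c, i.e. d[n] = c[n+1]: the same values, moved one step to the left.
same_up_to_shift = np.allclose(shifted, np.roll(central, -1))
same_without_shift = np.allclose(shifted, central)

print(same_up_to_shift, same_without_shift)
```

The two outputs contain the same numbers; only the index they are assigned to differs.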