Here is a rough sketch of what I have in mind. Note that it is only one way of formulating the problem.
Firstly, note that a real machine learning problem is discrete, in the sense that some data points with corresponding labels (classes) are given to you, and you have to guess the label of a new data point according to the $k$-nearest-neighbour rule
$$\hat{Y}(X)=\frac{1}{k}\sum_{X_{i} \in N_{k}(X)} Y_{i}$$
So we are given labelled data points $T=\{(X_i,Y_i) \mid X_i \in \mathbb{R}^n,\; Y_i \in\{0,1,\dots,c-1\}\}$, where we have $c$ classes.
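As a concrete illustration of the rule above, here is a minimal $k$-NN sketch; `knn_predict` and the toy data are my own illustrative choices, not part of the formulation:

```python
import numpy as np

def knn_predict(X_train, Y_train, x, k):
    """The rule Yhat(X) above: average the labels of the k nearest
    training points to x."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nearest = np.argsort(d)[:k]               # indices of the k nearest X_i
    return Y_train[nearest].mean()            # (1/k) * sum of their Y_i

# Toy data: two classes on the real line (n = 1, c = 2)
X_train = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
Y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, Y_train, np.array([1.5]), k=3))   # 0.0
```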
At this point, we would like to define a continuous function $f:\mathbb{R}^n \rightarrow \mathbb{R}$ such that its floor, $\lfloor f(X)\rfloor$, gives the correct class whenever an input $X_i$ from the set $T$ is fed to it.
For any optimisation, you need to define a cost function. One can be defined as below.
$$Cost=\sum_{i}\Big(\hat{Y}(X_i)-\lfloor f(X_i)\rfloor\Big)^2=\sum_{i}\bigg(\frac{1}{k}\sum_{X_{j} \in N_{k}(X_i)} Y_{j}-\lfloor f(X_i)\rfloor\bigg)^2$$
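The discrete cost can be evaluated directly; the sketch below assumes the convention that $N_k(X_i)$ includes $X_i$ itself (names are illustrative):

```python
import numpy as np

def knn_cost(X, Y, f, k):
    """Discrete cost: sum_i (Yhat(X_i) - floor(f(X_i)))^2, where
    Yhat(X_i) averages the labels of the k nearest points to X_i
    (X_i itself included)."""
    cost = 0.0
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)   # distances from X_i
        nearest = np.argsort(d)[:k]            # its k nearest neighbours
        y_hat = Y[nearest].mean()              # (1/k) * sum of their Y_j
        cost += (y_hat - np.floor(f(X[i]))) ** 2
    return cost

# Toy check: when floor(f) matches the labels, the cost vanishes at k = 1
X = np.array([[0.0], [1.0], [10.0], [11.0]])
Y = np.array([0, 0, 1, 1])
f = lambda x: 0.5 if x[0] < 5 else 1.5         # floor: 0 on the left, 1 on the right
print(knn_cost(X, Y, f, k=1))   # 0.0
```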
In order to use calculus to analyse the problem, we need to make further modifications.
- Firstly, the set $T$ is finite in practice; the labelling should instead be defined for all inputs $X \in \mathbb{R}^n$.
- The second thing is to get rid of the discontinuity produced by the floor function $\lfloor f(X)\rfloor$. To that end, we allow any number in $\mathbb{R}$ to be a possible class. In other words, we now have an infinite number of classes, so we no longer need the floor function.
- The last thing is to have a density defined on the data space. For simplicity, take the distribution to be uniform. The result is that the cost function can be written as below.
$$Cost=\int_{X \in \mathbb{R}^n} \bigg(\frac{1}{V(S_r(X))}\int_{T \in S_r(X)}f(T)dT -f(X) \bigg )^2 dX$$
$S_r(X)$ is the open ball centred at $X$ with radius $r$, and $V(S_r(X))$ is its volume; $V(S_r(X))$ is the continuous counterpart of $k$. It is given by
$$V(S_r(X))=\alpha r^n,$$
where $\alpha$ is a constant depending only on $n$.
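For reference, the constant is $\alpha = \pi^{n/2}/\Gamma(n/2+1)$, the volume of the unit ball in $\mathbb{R}^n$; a quick check:

```python
import math

def ball_volume(n, r):
    """V(S_r) = alpha * r^n, with alpha = pi^(n/2) / Gamma(n/2 + 1),
    the volume of the unit ball in R^n."""
    alpha = math.pi ** (n / 2) / math.gamma(n / 2 + 1)
    return alpha * r ** n

print(ball_volume(2, 1.0))   # pi: area of the unit disc
print(ball_volume(3, 1.0))   # 4*pi/3: volume of the unit ball
```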
We may rewrite the cost function as
$$Cost=\int_{X \in \mathbb{R}^n} \bigg( \frac{1}{\alpha r^n} \int_{T \in S_r(X)}f(T)dT -f(X) \bigg )^2 dX$$
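To see what the integrand measures: in one dimension the inner term is the average of $f$ over $(X-r, X+r)$ minus $f(X)$, and for smooth $f$ this gap shrinks like $r^2$. A small numerical sketch (function names and grid parameters are my own choices):

```python
import numpy as np

def local_average_gap(f, x, r, m=10000):
    """(1/(2r)) * integral_{x-r}^{x+r} f(t) dt  -  f(x): the
    (unsquared) 1-D integrand of the cost above, computed with a
    midpoint rule on m subintervals."""
    t = x - r + (np.arange(m) + 0.5) * (2 * r / m)   # midpoints of the grid
    return f(t).mean() - f(x)

# For smooth f the gap scales like r^2 (roughly f''(x) * r^2 / 6):
print(local_average_gap(np.cos, 0.5, 0.1))
print(local_average_gap(np.cos, 0.5, 0.01))   # about 100x smaller
```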
The term $\int_{T \in S_r(X)}f(T)\,dT$ can be approximated by
$$\int_{T \in \mathbb{R}^n} \delta_{X,r}(T)\,f(T)\,dT,$$
where $\delta_{X,r}(T)$ is a Gaussian bump centred at $X$ with width $r$, depending on $T$ only through the Euclidean distance $d(T,X)$ between $T$ and $X$; it serves as a smooth stand-in for the indicator of $S_r(X)$. As $r$ goes to $0$, the integral goes to $\alpha r^n f(X)$.
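Dividing by $\alpha r^n$ turns the smoothed term into a weighted average of $f$ around $X$, which tends to $f(X)$ as $r \to 0$, so each integrand of the cost tends to $0$. Below is a one-dimensional numerical sketch using a normalised Gaussian; `gaussian_average`, the grid size, and the test function are my own illustrative choices:

```python
import numpy as np

def gaussian_average(f, x, r, m=200001, span=10.0):
    """integral of N(t; mean x, std r) * f(t) dt, computed on a grid
    spanning `span` standard deviations on each side of x."""
    t = np.linspace(x - span * r, x + span * r, m)
    w = np.exp(-((t - x) ** 2) / (2 * r ** 2)) / (r * np.sqrt(2 * np.pi))
    dt = t[1] - t[0]
    return np.sum(w * f(t)) * dt

# For f(t) = t^2 the exact value is x^2 + r^2, which tends to f(x) as r -> 0:
f = lambda t: t ** 2
print(gaussian_average(f, 2.0, 0.5))    # ~ 4.25
print(gaussian_average(f, 2.0, 0.05))   # ~ 4.0025
```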
From the last two equations, one concludes that $r=0$ gives the minimum. Note that $r=0$ is the analogue of $k=1$: you consider only the point $X$ itself when evaluating the label $\hat{Y}(X)$.
Alternatively, you can take the derivative of the cost function with respect to $r$ and set it equal to $0$, which is a tedious process.