The intuition, and the engineering rule of thumb, is that
we need to observe at least half a period of a sine wave to determine its amplitude and frequency.
Of course, as a rule of thumb it is a sharp cutoff, but it is quite intuitive and close to what happens in practice.
Now take the sum of two sinusoids with nearby frequencies: $ \sin \left( {2\pi \left( {f - \Delta f/2} \right)t} \right) + \sin \left( {2\pi \left( {f + \Delta f/2} \right)t} \right) = 2\cos \left( {2\pi \,\Delta f/2\,t} \right)\sin \left( {2\pi \,f\,t} \right) $. We obtain a sine of frequency $f$, amplitude-modulated by a cosine of frequency $\Delta f/2$
(the well-known phenomenon of beats).
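As a quick numerical sanity check of the identity above, here is a minimal sketch in Python with NumPy. The values of `f` and `df` are arbitrary choices for illustration, not taken from the question.

```python
import numpy as np

f, df = 50.0, 2.0                    # assumed carrier and separation, in Hz
t = np.linspace(0.0, 2.0, 20001)     # two seconds of time samples

# Left-hand side: sum of the two nearby sinusoids.
two_tones = np.sin(2*np.pi*(f - df/2)*t) + np.sin(2*np.pi*(f + df/2)*t)

# Right-hand side: sine at f, modulated by a cosine at df/2.
envelope = 2*np.cos(2*np.pi*(df/2)*t)
modulated = envelope*np.sin(2*np.pi*f*t)

# The two sides should agree up to floating-point rounding.
max_error = np.max(np.abs(two_tones - modulated))
print(max_error)
```

Plotting `two_tones` together with `envelope` shows the slow modulation riding on the fast oscillation.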
According to the rule above, we need to "see" at least half a period of the modulating wave to be able to reconstruct the signal, and thus the difference in frequency. If the duration, i.e. the "persistence" of the signals, is much shorter than that, we will only see one sinusoid with the average frequency and double amplitude.
Calling $T$ the duration of the observation window, and noting that the modulating wave has period $1/(\Delta f/2)$, we must have $ {1 \over 2}{1 \over {\Delta f/2}} < T\quad \Rightarrow \quad 1 < T\Delta f $.
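To put numbers on this, the sketch below measures how much the envelope $2\cos(\pi\,\Delta f\,t)$ changes across a window of duration $T$. It assumes, for simplicity, that the window is centered on a maximum of the envelope; the helper name `envelope_drop` is made up for this example.

```python
import numpy as np

def envelope_drop(T, df):
    """Relative drop of the envelope 2*cos(pi*df*t) over the window |t| <= T/2.

    Assumes the window is centered on an envelope maximum (t = 0).
    """
    t = np.linspace(-T/2, T/2, 1001)
    env = 2*np.cos(np.pi*df*t)
    return (env.max() - env.min())/env.max()

df = 1.0                              # assumed separation, in Hz
for T in (0.1, 0.5, 1.0):             # window durations, in seconds
    print(f"T*df = {T*df:.1f}: envelope drop = {envelope_drop(T, df):.3f}")
```

For $T\,\Delta f \ll 1$ the envelope is almost flat, so the window shows what looks like a single sinusoid of double amplitude. At $T\,\Delta f = 1$ the envelope falls all the way to zero at the window edges, which is the "half a modulating wave" of the rule.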
Passing to a more rigorous mathematical analysis, consider the box function (a window of duration $T$) $ R(t,T) = \left\{ {\matrix{ 1 & { - T/2 \le t \le T/2} \cr 0 & {{\rm otherwise}} \cr } } \right. = U(t + T/2) - U(t - T/2) $, where $U$ is the unit step function. Its frequency spectrum (two-sided Fourier transform) is $ G(f,T) = T\,{\rm sinc}(f\,T) = {{{\rm sin}(\pi f\,T)} \over {\pi f}} $.
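The closed form for $G(f,T)$ can be checked numerically. The sketch below approximates $\int_{-T/2}^{T/2} e^{-i 2\pi f t}\,dt$ with a midpoint sum and compares it with $T\,{\rm sinc}(fT)$. It relies on `np.sinc` being the normalized sinc, $\sin(\pi x)/(\pi x)$; the function names and the test frequencies are illustrative only.

```python
import numpy as np

def box_spectrum_numeric(f, T, n=20000):
    """Midpoint-rule approximation of the Fourier integral of the box of width T."""
    dt = T/n
    t = -T/2 + (np.arange(n) + 0.5)*dt        # midpoints of n sub-intervals
    return np.sum(np.exp(-2j*np.pi*f*t))*dt

def box_spectrum_closed(f, T):
    """Closed form T*sinc(f*T), with np.sinc(x) = sin(pi*x)/(pi*x)."""
    return T*np.sinc(f*T)

T = 2.0                                       # assumed window duration
for f in (0.0, 0.3, 0.5, 1.7):
    print(f, box_spectrum_numeric(f, T).real, box_spectrum_closed(f, T))
```

The first zero of the spectrum falls at $f = 1/T$ (here $0.5$), so the main lobe is $2/T$ wide from zero to zero.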

A sinusoid of duration $T$ is the product of the box function and a sine function.
Its spectrum is therefore the convolution of a Dirac delta at frequency $f$ (plus one at $-f$) with the function $G(f,T) =T{\rm sinc}(f\,T) $, i.e. the same function centered at $f$.
Then, approximating $G(f,T)$ by a box function with cutoffs at $f \pm 1/(2T)$, we find that we can distinguish between frequencies separated by at least $1/T$, which is the formula given above.
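The effect can be seen in a small FFT experiment. The sketch below windows the two-tone signal to a duration $T$, zero-pads it to get a finely sampled spectrum, and counts the spectral peaks that exceed half of the maximum. The sampling rate, the frequencies, the centered window and the half-maximum threshold are all assumptions made for this illustration, and the two cases are deliberately chosen well away from $T\,\Delta f = 1$, where the outcome depends on the relative phase of the tones.

```python
import numpy as np

def count_peaks(T, f=100.0, df=2.0, fs=1000.0, n_fft=1 << 16):
    """Count spectral peaks above half the maximum for two tones seen for T seconds."""
    n = int(round(T*fs))
    t = (np.arange(n) - (n - 1)/2)/fs          # sample times, centered on t = 0
    x = np.sin(2*np.pi*(f - df/2)*t) + np.sin(2*np.pi*(f + df/2)*t)

    # Zero-padding interpolates the spectrum; it does not add resolution.
    mag = np.abs(np.fft.rfft(x, n=n_fft))

    # A peak is a local maximum that is also above half of the global maximum,
    # which excludes the side lobes of the sinc-like kernel.
    mid = mag[1:-1]
    is_peak = (mid > mag[:-2]) & (mid >= mag[2:]) & (mid > 0.5*mag.max())
    return int(np.count_nonzero(is_peak))

print(count_peaks(1.0))     # T*df = 2.0, window long compared with 1/df
print(count_peaks(0.25))    # T*df = 0.5, window short compared with 1/df
```

With the long window the two tones show up as separate peaks; with the short one they merge into a single peak at the average frequency, as the rule of thumb suggests.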