Whether a job starts running without any other jobs in the queue depends only on the durations of the previous jobs, not on its own duration. Thus we can begin the analysis at the point when a job starts running without any other jobs in the queue. The density for encountering it with running time $\tau$ is proportional to the product of the density $\mu\mathrm e^{-\mu\tau}$ for it to have running time $\tau$ and the probability $1-\mathrm e^{-\lambda\tau}$ of another job arriving before it finishes, where $\lambda$ is the rate of arrival. Then normalization yields the density
$ (\lambda+\mu)\frac\mu\lambda\mathrm e^{-\mu\tau}\left(1-\mathrm e^{-\lambda\tau}\right)\;. $
Alternatively, we can use the reversibility of the process by noting that the point at which the job started is the last occurence of either an arrival or a departure of the reversed process, so the distribution for the previous duration of the running job is exponential with parameter $\lambda+\mu$; then a convolution with the exponential distribution for the remaining duration of the running job with parameter $\mu$ also yields the above result.
Whereas the first approach would be cumbersome to generalize to $k$ jobs present upon arrival, since the probability for a job of a given length to be encountered in that state would need to be calculated, the second approach generalizes quite readily: If there are $k$ jobs, the point at which the currently running job started is the last departure of the reversed process or the $k$-th last arrival of the reversed process, whichever is more recent. We can get the tail (i.e. the complement of the cumulative distribution function) of the distribution for the duration since this event by multiplying the tails of the distributions for one departure and for $k$ arrivals. The tail for one departure of the reverse process is $\mathrm e^{-\lambda\tau}$; the tail for $k$ arrivals of the reverse process is
$\mathrm e^{-\mu\tau}\sum_{n=0}^{k-1}\frac{(\mu\tau)^n}{n!}$
(see here). Thus the density for the previous duration of the running job given that $k$ jobs are encountered is
$-\frac{\partial}{\partial \tau}\left(\mathrm e^{-(\lambda+\mu)\tau}\sum_{n=0}^{k-1}\frac{(\mu\tau)^n}{n!}\right)\;.$
Convolving this density with the Erlang density for $k$ departures of the forward process yields the distribution you want:
$ -\int_0^\tau\frac{\mu^k(\tau-t)^{k-1}\mathrm e^{-\mu(\tau-t)}}{(k-1)!}\frac{\partial}{\partial t}\left(\mathrm e^{-(\lambda+\mu) t}\sum_{n=0}^{k-1}\frac{(\mu t)^n}{n!}\right)\mathrm dt\;. $
Unfortunately I don't see a way to simplify this significantly; neither moving the derivative to the other factor by integration by parts nor using the fact that the expression being differentiated is $\mathrm e^{-\lambda t}$ times an integral seems to help much.
We can recover the above result for $k=1$,
$ \begin{align} -\int_0^\tau\mu\mathrm e^{-\mu(\tau-t)}\frac{\partial}{\partial t}\left(\mathrm e^{-(\lambda+\mu) t}\right)\mathrm dt &=(\lambda+\mu)\int_0^\tau\mu\mathrm e^{-\mu(\tau-t)}\left(\mathrm e^{-(\lambda+\mu) t}\right)\mathrm dt \\ &=(\lambda+\mu)\mu\mathrm e^{-\mu\tau}\int_0^\tau\mathrm e^{-\lambda t}\mathrm dt \\ &=(\lambda+\mu)\frac\mu\lambda\mathrm e^{-\mu\tau}\left(1-\mathrm e^{-\lambda\tau}\right)\;, \end{align} $
but already for $k=2$ the result is rather complicated:
$ \begin{align} &-\int_0^\tau\mu^2(\tau-t)\mathrm e^{-\mu(\tau-t)}\frac{\partial}{\partial t}\left(\mathrm e^{-(\lambda+\mu) t}\sum_{n=0}^1\frac{(\mu t)^n}{n!}\right)\mathrm dt \\ =& -\mu^2\mathrm e^{-\mu\tau}\int_0^\tau(\tau-t)\mathrm e^{\mu t}\frac{\partial}{\partial t}\left(\mathrm e^{-(\lambda+\mu) t}(1+\mu t)\right)\mathrm dt \\ =& \mu^2\mathrm e^{-\mu\tau}\left(\tau+\frac1{\lambda^3}\left(\lambda\mu(\lambda+\mu)\tau\left(\mathrm e^{-\lambda\tau}+1\right)+\left((\lambda+\mu)^2+\mu^2\right)\left(\mathrm e^{-\lambda\tau}-1\right)\right)\right) \end{align} $
(computation). The first two moments of this distribution are
$ \langle\tau\rangle=\frac2\mu+\frac{\lambda+2\mu}{(\lambda+\mu)^2} $
and
$ \langle\tau^2\rangle=\frac6{\mu^2}+2\frac{2\lambda^2+7\lambda\mu+7\mu^2}{\mu(\lambda+\mu)^3}\;. $
I've confirmed these in simulations with $\lambda=2$ and $\mu=3$, where $\langle\tau\rangle=74/75\approx0.987$ and $\langle\tau^2\rangle=476/375\approx1.269$. Here's the code.