I have a very simple question about the LASSO estimator (I am a beginner). The LASSO problem is:
$$ \hat{\beta}_n:=\operatorname*{argmin}_{\beta \in B} \|Y-X\beta\|_2^2+\lambda\|\beta\|_1, $$
where $Y$ is an $n\times 1$ vector, $X$ is an $n\times k$ matrix, $\beta$ is a $k\times 1$ vector, $n$ is the sample size and $k$ is the number of regressors.
I was trying to understand why the LASSO estimator has no closed form when $X$ is non-orthogonal. One way to see this is by taking the first-order conditions (F.O.C.), as explained here.
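For concreteness, here is a minimal sketch (NumPy assumed) of the orthogonal vs. non-orthogonal contrast. It takes $B=\mathbb{R}^k$ and solves the problem by cyclic coordinate descent, which is one standard approach and not necessarily the one used in the linked explanation. Because the objective above has no $1/2$ or $1/n$ scaling, each coordinate update soft-thresholds at $\lambda/2$. The function names (`soft_threshold`, `lasso_objective`, `lasso_cd`) and the simulated data are illustrative only. When $X^\top X = I$ the minimizer is $S(X^\top Y, \lambda/2)$ in closed form, where $S(z,t)=\operatorname{sign}(z)\max(|z|-t,0)$; with correlated columns the coordinate updates have to be repeated until they settle.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_objective(Y, X, beta, lam):
    """||Y - X beta||_2^2 + lam * ||beta||_1, as in the problem above (B = R^k assumed)."""
    r = Y - X @ beta
    return r @ r + lam * np.sum(np.abs(beta))


def lasso_cd(Y, X, lam, n_iter=500, tol=1e-10):
    """Cyclic coordinate descent for the unscaled objective (threshold lam / 2)."""
    n, k = X.shape
    beta = np.zeros(k)
    col_sq = np.sum(X ** 2, axis=0)  # X_j' X_j for each column j
    r = Y - X @ beta                  # current full residual Y - X beta
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(k):
            # X_j' (Y - X_{-j} beta_{-j}): correlation of X_j with the partial residual
            rho = X[:, j] @ r + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            r -= X[:, j] * (new_bj - beta[j])  # keep the residual in sync
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:  # stop once a full sweep barely moves beta
            break
    return beta


rng = np.random.default_rng(0)
n, k, lam = 100, 5, 2.0
beta_true = np.array([3.0, 0.0, 0.0, 1.5, 0.0])  # hypothetical population coefficients

# Orthonormal design (X'X = I): the minimizer is S(X'Y, lam/2) in closed form.
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
Y_orth = Q @ beta_true + rng.standard_normal(n)
closed_form = soft_threshold(Q.T @ Y_orth, lam / 2.0)
print(np.allclose(lasso_cd(Y_orth, Q, lam), closed_form))  # expected: True

# Correlated design: a shared factor makes the columns non-orthogonal,
# so no single soft-thresholding step gives the answer.
Z = rng.standard_normal((n, k))
X_corr = Z + 0.8 * rng.standard_normal((n, 1))
Y_corr = X_corr @ beta_true + rng.standard_normal(n)
beta_hat = lasso_cd(Y_corr, X_corr, lam)
print(beta_hat, lasso_objective(Y_corr, X_corr, beta_hat, lam))
```

With an orthonormal design one sweep already reproduces the closed form, because each coordinate's update does not depend on the others; the printed estimate for the correlated design depends on the simulated draw, so no output is claimed for it.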
My question is: when taking the F.O.C., we obtain the term $\frac{\partial |\beta_j|}{\partial \beta_j}\big|_{\hat{\beta}_{n,j}}$ for $j=1,\dots,k$. Writing this term requires assuming that $\beta_j\neq 0$ for $j=1,\dots,k$, because otherwise the absolute value function is not differentiable. However, the econometric motivation for the LASSO estimator is that $\beta_j=0$ at the population level for some $j\in\{1,\dots,k\}$. How can these two facts be reconciled? Is it because $\beta_j\neq 0$ at the sample level as a consequence of sampling error?