In many applications without strict accuracy requirements, it is common to use $(A^{\text T}A+\lambda I)^{-1}A^{\text T}$ or $A^{\text T}(AA^{\text T}+\lambda I)^{-1}$ (with $\lambda$ small) to approximate the Moore-Penrose pseudoinverse $A^{\dagger}$. But given the properties of the Moore-Penrose pseudoinverse, how do we know that $(A^{\text T}A+\lambda I)^{-1}A^{\text T}$ and $A^{\text T}(AA^{\text T}+\lambda I)^{-1}$ will be close to the exact $A^{\dagger}$?
It seems that $\lim_{\lambda \rightarrow 0}(A^{\text T}A+\lambda I)^{-1}A^{\text T} = A^{\dagger}$, and I want to know why this holds (you can verify it numerically in Matlab). Besides, it also seems possible for $A^{\text T}A + \lambda I$ to be singular. The whole thing confuses me.
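For readers who want to reproduce the numerical check, here is a minimal NumPy sketch (Python rather than Matlab). The helper names `ridge_pinv_left` and `ridge_pinv_right`, the random rank-deficient test matrix, and the chosen values of $\lambda$ are all illustrative assumptions, not part of the question. It assumes $\lambda > 0$, in which case $A^{\text T}A+\lambda I$ is symmetric positive definite and the linear solve is well defined.

```python
import numpy as np

def ridge_pinv_left(A, lam):
    """Left form (A^T A + lam*I)^{-1} A^T, computed with a linear solve."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)

def ridge_pinv_right(A, lam):
    """Right form A^T (A A^T + lam*I)^{-1}, using symmetry of A A^T + lam*I."""
    m = A.shape[0]
    return np.linalg.solve(A @ A.T + lam * np.eye(m), A).T

# Hypothetical test matrix: 6x4 with rank 2, so A^T A itself is singular.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

A_pinv = np.linalg.pinv(A)
lams = [1e-1, 1e-3, 1e-5]

# Spectral-norm distance to the exact pseudoinverse for each lambda.
errs = [np.linalg.norm(ridge_pinv_left(A, lam) - A_pinv, 2) for lam in lams]

# Largest entrywise difference between the left and right forms.
gaps = [np.max(np.abs(ridge_pinv_left(A, lam) - ridge_pinv_right(A, lam)))
        for lam in lams]

for lam, err, gap in zip(lams, errs, gaps):
    print(f"lambda={lam:.0e}  error={err:.3e}  left/right gap={gap:.3e}")
```

In this sketch the printed error should shrink as $\lambda$ decreases, and the two forms should agree up to rounding. Very small $\lambda$ makes the regularized matrix ill-conditioned, so floating-point error eventually limits how far the check can be pushed.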
A follow-up question: if we want to keep the error of $(A^{\text T}A+\lambda I)^{-1}A^{\text T}$ as an approximation of $A^{\dagger}$ below a given tolerance, is there a criterion that gives an upper bound on $\lambda$?
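One way to make both questions concrete is through the singular value decomposition. The following is a sketch under stated assumptions: $\lambda > 0$, the error is measured in the spectral norm, $A = U\Sigma V^{\text T}$ has nonzero singular values $\sigma_1 \ge \dots \ge \sigma_r > 0$ with corresponding singular vectors $u_i, v_i$, and $\varepsilon > 0$ is a tolerance. This notation is introduced here for illustration and is not part of the question.

$$
(A^{\text T}A+\lambda I)^{-1}A^{\text T} = \sum_{i=1}^{r} \frac{\sigma_i}{\sigma_i^2+\lambda}\, v_i u_i^{\text T},
\qquad
A^{\dagger} = \sum_{i=1}^{r} \frac{1}{\sigma_i}\, v_i u_i^{\text T}
$$

$$
\left\| (A^{\text T}A+\lambda I)^{-1}A^{\text T} - A^{\dagger} \right\|_2
= \max_{1 \le i \le r} \left| \frac{\sigma_i}{\sigma_i^2+\lambda} - \frac{1}{\sigma_i} \right|
= \frac{\lambda}{\sigma_r(\sigma_r^2+\lambda)}
\le \frac{\lambda}{\sigma_r^{3}}
$$

Each coefficient $\sigma_i/(\sigma_i^2+\lambda)$ tends to $1/\sigma_i$ as $\lambda \rightarrow 0$, and the zero singular values contribute nothing on either side, which is consistent with the observed limit. Under these assumptions, choosing $\lambda \le \varepsilon\,\sigma_r^{3}$ is sufficient to keep the spectral-norm error at most $\varepsilon$, so the admissible $\lambda$ is governed by the smallest nonzero singular value of $A$.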