I have some basic questions about the practical meaning of Markov chain Monte Carlo (MCMC) methods, such as Gibbs sampling.
Suppose I have a random matrix $\underbrace{Y}_{n\times n}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$ with support $\{A_i\}_{i=1}^3$, where $A_i\in \mathbb{R}^{n\times n}$ for $i=1,2,3$.
Additionally, $\mathbb{P}(Y=A_i)=f(A_i;\theta)$ where $\theta\in \Theta\subseteq \mathbb{R}$, for $i=1,2,3$.
Assume $\theta$ is known.
Suppose that $f(A_i;\theta)$ is particularly hard to compute for $i=1,2,3$. One way to overcome this issue is to simulate draws of $Y$ using an MCMC method:
1) I start with a certain realisation of $Y$
2) I update $Y$ using a certain transition probability (suppose I am able to compute this)
3) I repeat step 2) $T$ times with $T$ very large
The outcome of this algorithm is for example $$ A_1, A_2, A_1, A_3, A_1, A_1, A_1, A_2,..., A_3 $$
(*) The magic is that if I plot the empirical probability mass function of this sequence, I get something very close to $f(A_1;\theta)$, $f(A_2;\theta)$, $f(A_3;\theta)$. Moreover, if I compute the average of the sequence, I get something very close to $E(Y)$.
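For concreteness, here is a toy simulation of the scheme above. It is a minimal sketch under assumptions I made up for illustration: a 3-state chain stands in for the support $\{A_1, A_2, A_3\}$, and the transition matrix `P` is an arbitrary choice (in practice it would come from the MCMC construction). The stationary distribution plays the role of $f(A_i;\theta)$, and the empirical state frequencies after $T$ steps should come close to it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition matrix over 3 states standing in for A_1, A_2, A_3.
# Chosen arbitrarily for illustration; each row sums to 1.
P = np.array([
    [0.6, 0.3, 0.1],
    [0.4, 0.4, 0.2],
    [0.5, 0.2, 0.3],
])

# Stationary distribution pi: the left eigenvector of P with eigenvalue 1,
# normalised to sum to 1. It plays the role of f(A_i; theta).
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi /= pi.sum()

# Steps 1)-3): start from some state, then update T times
# using the transition probabilities.
T = 200_000
state = 0
counts = np.zeros(3)
for _ in range(T):
    state = rng.choice(3, p=P[state])
    counts[state] += 1

# Empirical frequencies of the visited states; these should be
# close to pi, which is statement (*).
freq = counts / T
print("empirical:", freq)
print("stationary:", pi)
```

Replacing the print at the end with a weighted average of the matrices $A_i$ (weights `freq`) would likewise approximate $E(Y)$.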
Questions: Is my summary above correct?
I am confused by the wording often used in other sources to state (*), such as "the distribution of $Y$ converges to $f$ as $T\rightarrow \infty$", or "the sampler converges to a stable (stationary) distribution", etc. Are these equivalent ways of stating (*)?
What is meant by "stationary distribution"? "Stationary" reminds me of something that does not change, but I don't see the relation here.