I can't (yet) describe the theory behind it, but I wrote an article on fitting Leslie matrices to data: http://arxiv.org/abs/1203.2313
The article above presents the method as given in Caswell's book on matrix population models, and it only really works for a specific format of transition matrix. The method can actually be made quite general when framed in terms of Kronecker products and the column-stacking $vec$ operator. I am working on that article currently, and will post a draft if you ask me to.
To get you started, consider a data matrix $D_t$ at time $t$, where the data at time $t+1$ result from multiplication by an unknown transition matrix $X$:
$D_{t+1}=XD_{t}$
This is equivalent to the following with $I$ a conformable identity matrix:
$D_{t+1}=IXD_{t}$ (1)
Now you can use the following formula (from various textbooks):
$vec(ABC)=(C^{T}\otimes A)vec(B)$
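This identity is easy to check numerically. A minimal sketch (the matrix sizes are arbitrary; $vec()$ stacks columns, which is Fortran-order reshaping in numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

def vec(M):
    # column-stacking vec operator: stack the columns of M into one vector
    return M.reshape(-1, order="F")

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)   # (C^T kron A) vec(B)
assert np.allclose(lhs, rhs)
```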
In our case, $B$ is the unknown transition matrix, $A$ is the identity matrix, and $C$ is the data matrix. Taking $vec()$ of both sides of (1):
$vec(D_{t+1}) = vec(IXD_{t}) = (D_{t}^{T} \otimes I)vec(X)$
This is a standard linear system in $vec(X)$. If $(D_{t}^{T} \otimes I)$ were square and invertible, you could write
$vec(X) = (D_{t}^{T} \otimes I)^{-1} vec(D_{t+1})$
and solve directly. In general it is not, and there is rarely a unique exact solution (see textbooks etc.), so instead minimize the least-squares residual over $vec(X)$:
$|| (D_{t}^{T} \otimes I)\,vec(X) - vec(D_{t+1}) ||_2$ (2)
Then de-vec $vec(X)$ (stack the entries back into columns) to recover the transition matrix $X$.
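The whole recipe can be sketched in a few lines of numpy: build the Kronecker coefficient matrix, solve the least-squares problem, and reshape back. The setup below (a known `X_true`, several population vectors as columns of `D_t`) is an illustrative assumption so that the recovery can be checked:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5                        # 3 stages, 5 population vectors (assumed sizes)
X_true = rng.random((n, n))        # "unknown" transition matrix, to be recovered
D_t = rng.random((n, m))
D_t1 = X_true @ D_t                # data at time t+1

K = np.kron(D_t.T, np.eye(n))      # coefficient matrix (D_t^T kron I)
vec_D_t1 = D_t1.reshape(-1, order="F")

# least-squares minimizer of || K vec(X) - vec(D_{t+1}) ||_2
vec_X, *_ = np.linalg.lstsq(K, vec_D_t1, rcond=None)
X_hat = vec_X.reshape(n, n, order="F")   # de-vec back to an n-by-n matrix
assert np.allclose(X_hat, X_true)
```

With $m \ge n$ linearly independent columns in $D_t$, the system has full column rank and the recovery is exact; with fewer, you land in the underdetermined case discussed next.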
Caveats: so far I have found that you need to constrain the heck out of $X$ to make it usable and enforce sparsity, and when the system is underdetermined you will probably also want the minimum-norm minimizer of (2). I do the former in the paper, and I am working on articulating how to do the latter with the SVD.
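For the underdetermined case, the Moore-Penrose pseudoinverse (computed via the SVD) picks out the minimum-norm solution among all exact fits. A sketch with a single population snapshot, where $n$ equations cannot pin down $n^2$ unknowns (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
X_true = rng.random((n, n))
d_t = rng.random(n)                      # one snapshot: n equations, n^2 unknowns
d_t1 = X_true @ d_t

K = np.kron(d_t[None, :], np.eye(n))     # (d_t^T kron I), shape 4 x 16
# pinv (SVD-based) returns the minimum-norm vec(X) among exact solutions
vec_X = np.linalg.pinv(K) @ d_t1
X_minnorm = vec_X.reshape(n, n, order="F")
assert np.allclose(X_minnorm @ d_t, d_t1)   # fits the data exactly
```

This will not recover `X_true` from a single snapshot; it only selects one well-defined member of the solution set, which is why the sparsity and structural constraints above matter so much in practice.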
If you are using this, please cite me, and feel free to IM or ask questions here.