One example arises in the study of superprocesses. I know only a little bit about these, so what follows may be a bit sketchy and possibly wrong. Alison Etheridge's book An Introduction to Superprocesses looks like a promising place to read more.
For this example, let's take $X$ to be some nice compact manifold, say for instance a sphere $S^d$ or torus $(S^1)^d$, on which we know how to run Brownian motion. ($X = \mathbb{R}^d$ might be more natural but its non-compactness complicates things slightly.) Also, to start, I actually would rather consider positive finite measures which need not have total mass 1. So let $\mathcal{M}(X)$ denote the space of all such measures on $X$. Under the weak topology, this is still a nice space; it is Polish (and if I am not mistaken, even locally compact), and so its Borel $\sigma$-field gives it a nice measurable structure as well.
A typical example is branching Brownian motion. Imagine that you start with $n$ particles in $X$, each moving independently according to Brownian motion. However, after an exponentially distributed amount of time, a particle splits into a random number of new "offspring" particles. Each new particle starts at the time and place where the split occurred, and evolves according to its own (conditionally) independent branching Brownian motion (and may itself split). (One could allow the number of new particles to be zero, in which case the particle can be seen as having died.)
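To make the dynamics concrete, here is a rough simulation sketch on the circle $S^1$, identified with $[0, 1)$. Several things here are my own assumptions and not part of the model as described: time is discretized with step `dt` (so the exponential lifetimes are only approximated), the offspring law is "0 or 2 children with probability 1/2 each", and the names `simulate_bbm`, `rate` and `seed` are made up for illustration.

```python
import math
import random

def simulate_bbm(n_initial, t_end, dt=0.01, rate=1.0, seed=0):
    """Approximate branching Brownian motion on the circle [0, 1).

    Returns the list of positions of the particles alive at time t_end.
    """
    rng = random.Random(seed)
    # Assumption: initial particles are placed uniformly at random.
    particles = [rng.random() for _ in range(n_initial)]
    # Probability that an exponential(rate) clock rings within one step.
    p_branch = 1.0 - math.exp(-rate * dt)
    steps = int(round(t_end / dt))
    for _ in range(steps):
        next_gen = []
        for x in particles:
            # Brownian increment over dt, wrapped back onto the circle.
            x = (x + rng.gauss(0.0, math.sqrt(dt))) % 1.0
            if rng.random() < p_branch:
                # Assumed offspring law: 0 or 2 children, equally likely.
                # Children start where the parent split.
                next_gen.extend([x] * rng.choice([0, 2]))
            else:
                next_gen.append(x)
        particles = next_gen
    return particles
```

Calling `simulate_bbm(100, 1.0)` returns a list whose length is the (random) number of particles alive at time 1; with `rate=0.0` no branching occurs and the list always has `n_initial` entries.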
If we want to consider branching Brownian motion as a stochastic process $Y_t$, we need to decide in what state space it should take its values: it has to be able to represent the location of all the particles alive at a given moment, however many that happens to be. Our first guess might be to represent the state of the process as a (finite) subset of $X$, so that our state space becomes the set of all finite subsets of $X$; let me call it $\mathcal{F}(X)$. However, this is a bit awkward:
First, we would have to decide how to define an appropriate topology and measurable structure on $\mathcal{F}(X)$, which inherits enough of the topology of $X$ that we can keep track of the fact that each particle moves continuously.
Second, it doesn't account for the possibility of two particles at the same location, as happens when a particle has just split, or if two particles have collided (though in this model they just pass through each other without interacting). So maybe we need multisets or something.
Third and most seriously, $\mathcal{F}(X)$ is not a vector space. We cannot make sense of the expectation of $Y_t$. Also, if we want to study the behavior as the initial number of particles goes to infinity, we will want to be able to rescale $Y_t$ somehow.
The solution to these problems is as follows: instead of representing the state of the process by a subset of $X$, we represent it by a measure, where the locations of the particles are marked by unit point masses. So the state where there are particles at locations $\{x_1, \dots, x_m\}$ corresponds to the measure $\sum_{i=1}^m \delta_{x_i}$. We can thus think of $Y_t$ as a stochastic process taking its values in $\mathcal{M}(X)$. This solves all the above problems:
$\mathcal{M}(X)$ has a nice natural topology and measurable structure, as discussed above. We can also show, for example, that the process $Y_t$ is càdlàg; the jumps correspond to times when a particle splits (so if a single particle at location $x$ splits into two offspring, the process jumps from $\delta_x$ to $2 \delta_x$).
This model naturally counts locations with multiplicity, by just putting more mass at a point if it's occupied by several particles.
$\mathcal{M}(X)$ is (almost) a vector space (okay, it's a cone in the vector space of signed measures); we can scale and take positive linear combinations, and that's enough to handle expectations and scaling limits.
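The measure representation above can be sketched in a few lines. This is only an illustration for finitely supported measures; the helper names (`point_measure`, `integrate`, `scale`, `add`) are hypothetical, and a measure is stored simply as a dictionary from locations to masses.

```python
from collections import Counter

def point_measure(locations):
    """The measure sum_i delta_{x_i}, stored as {location: mass}.

    Repeated locations accumulate mass, so multiplicity is automatic.
    """
    return Counter(locations)

def integrate(mu, f):
    """Integral of the test function f against the discrete measure mu."""
    return sum(mass * f(x) for x, mass in mu.items())

def scale(mu, c):
    """The measure c * mu, for a constant c >= 0."""
    return {x: c * mass for x, mass in mu.items()}

def add(mu, nu):
    """The measure mu + nu."""
    out = dict(mu)
    for x, mass in nu.items():
        out[x] = out.get(x, 0) + mass
    return out
```

For instance, `point_measure([0.25, 0.25, 0.5])` puts mass 2 at 0.25 and mass 1 at 0.5, and `add(scale(mu, 0.5), scale(nu, 0.5))` is the average of two states, which is the kind of positive linear combination one needs for expectations and rescaling.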
So if $Y_t$ is a process valued in $\mathcal{M}(X)$, then the law of $Y_t$ at any fixed time $t$ is a probability measure on $\mathcal{M}(X)$, i.e. an element of $\mathcal{P}(\mathcal{M}(X))$. (The law of the entire process is a probability measure on the Skorohod space $\mathcal{D}([0, \infty), \mathcal{M}(X))$ of càdlàg paths in $\mathcal{M}(X)$, which is rather more complicated, but still a Polish space.) We get other nice properties of $Y_t$ as well; for instance, thanks to the exponential reproduction times and independence of the offspring, $Y_t$ is Markov.
There are other models where the number of particles remains constant; in this case, we can renormalize our measures to have total mass 1, and get a process valued in $\mathcal{P}(X)$, whose one-dimensional distributions are elements of $\mathcal{P}(\mathcal{P}(X))$. Or, we could say each particle starts with a certain amount of mass, and when it splits, that mass is divided among its offspring so that the total amount of mass in the system is conserved. Then we can represent the state with particles at $x_1, \dots, x_m$ with respective masses $c_1, \dots, c_m$ by the measure $\sum_{i=1}^m c_i \delta_{x_i}$; if we take the total mass of the system to be 1 we again have a $\mathcal{P}(X)$-valued process. (Here, to preserve the Markov property I guess we have to assume that if two particles collide they coalesce.)
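As a small sanity check of the mass-conserving variant, here is a sketch of a single split. It assumes, purely for illustration, that the parent's mass is shared equally among its offspring (the description above does not say how the mass is divided), and it uses exact fractions so that conservation can be checked without rounding error. The names `split` and `total_mass` are hypothetical.

```python
from fractions import Fraction

def split(state, i, k):
    """Replace particle i by k >= 1 offspring at the same location.

    state is a list of (location, mass) pairs. Assumption: the parent's
    mass is divided equally among the k offspring.
    """
    x, c = state[i]
    offspring = [(x, c / k)] * k
    return state[:i] + offspring + state[i + 1:]

def total_mass(state):
    """Total mass of the measure sum_i c_i delta_{x_i}."""
    return sum(c for _, c in state)
```

Starting from `[(0.1, Fraction(1, 2)), (0.7, Fraction(1, 2))]` and splitting the first particle into three gives four particles, three of them with mass 1/6 at location 0.1, and the total mass is still exactly 1.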