Assuming $M$ is a differentiable manifold, this can be shown by choosing normal coordinates about any point $P\in M$. The following is expanding on the method suggested by Jason DeVito in the comments.
Putting a metric on the manifold, we can define the exponential map $\exp_P\colon T_PM\to M$ and then, choosing an orthonormal basis $e_1,\ldots,e_n$ for $T_PM$, we get a map $ \begin{align} f\colon\mathbb{R}^n&\to M,\\ (x^1,\ldots,x^n)&\mapsto\exp_P(x^ie_i) \end{align} $ (using the summation convention). As the manifold is compact, this is an onto mapping (only completeness is required). The idea, then, is that there is a circled open subset $U\subseteq\mathbb{R}^n$ on which this becomes a coordinate map with dense image. That $U$ is circled (aka balanced) means that the line segment joining any point $x\in U$ to the origin is in $U$. It can be seen that any circled open set is diffeomorphic to $\mathbb{R}^n$ (also see star domain).
Let $D$ be the set of $x\in\mathbb{R}^n$ such that $[0,1]\to M$, $t\mapsto f(tx)$ is a minimizing geodesic. Equivalently, $D$ is the set of $x\in\mathbb{R}^n$ with $d(P,f(x))=\Vert x\Vert$, from which we see that it is closed. Also, let $U$ be the interior of $D$. The set $\partial D\equiv D\setminus U$ is called the cut locus (more precisely, the cut locus is the subset of $T_PM$ corresponding to $\partial D$).
Fixing some $x\in\mathbb{R}^n\setminus\{0\}$, let us consider the line $t\mapsto f(tx)$ for $t\ge0$. Define $r > 0$ to be the maximum real number with $rx\in D$. For any $0 < r_0 < r$, we can show the following.
- On $[0,r_0]$, $t\mapsto f(tx)$ is the unique minimizing geodesic joining $P$ to $f(r_0x)$.
- The derivative of $f$ at $r_0x$ is invertible, so $f$ is nonsingular at $r_0x$.
- $r_0x$ is in $U$.
- $f(D)=M$ and $f(\partial D)\subseteq M$ has zero measure.
Property (3) implies that $U$ is circled, so it is diffeomorphic to $\mathbb{R}^n$. Property (1) says that $f$ is one-to-one on $U$ and, by (2), it is a diffeomorphism. Then, by (4), $f(U)\supseteq M\setminus f(\partial D)$ is dense, so $f$ gives a diffeomorphism from $U$ to a dense subset of $M$.
I'll give proofs of these statements now, although they do seem quite standard. See also these notes and, in particular, Lemma 5.3 for the proofs of (1) and (2) and Lemma 5.4 for the proof of (4).
Proof of (1): Consider a minimizing geodesic $\gamma\colon[0,r_0]\to M$ joining P to $f(r_0x)$, which will have length no more than $r_0\Vert x\Vert$. Then, we can extend $\gamma(t)$ to $r_0\le t\le r$ by setting $\gamma(t)=f(tx)$. As this curve joins $P$ to $f(rx)$ and is of length no more than $r\Vert x\Vert$, it must be a geodesic. So, $\gamma(t)=f(tx)$ for all $t\le r$, and $t\mapsto f(tx)$ is a unique minimizing geodesic on $[0,r_0]$.
Proof of (2): Choose a nonzero $y\in\mathbb{R}^n$ and consider the vector field $t\mapsto Y_t$ given by $Y_t=t\nabla_y f(tx)=\frac{\partial}{\partial s}f(t(x+sy))\vert_{s=0}$. By geodesic deviation, this is a Jacobi field. Also, $Y_0=0$ so, if $\nabla_yf(tx)$ was zero, $P$ and $f(r_0x)$ would be conjugate points along $t\mapsto f(tx)$. Then, it is a standard result that $t\mapsto f(tx)$ is not a minimizing geodesic on $[0,r]$ for any $r > r_0$ (see characterization of the cut locus), contradicting the choice of $r$. So, $\nabla_yf(r_0x)\not=0$, and $f$ is nonsingular at $r_0x$.
Proof of (3): If not, then there would be a sequence $x_i\not\in D$ tending to $r_0x$. By the definition of $D$, there exist $y_i\in\mathbb{R}^n$ with $f(y_i)=f(x_i)$ and $\Vert y_i\Vert < \Vert x_i\Vert$. Passing to a subsequence, we can suppose that $y_i$ tends to a limit $y$. So, by continuity, $\Vert y\Vert\le r_0\Vert x\Vert$ and $f(y)=f(r_0x)$. By (1), this means that $y=r_0x$. But, then, setting $a_i=\Vert y_i-x_i\Vert$, (2) contradicts the limit $\nabla_{(y_i-x_i)/a_i}f(r_0x)\sim (f(y_i)-f(x_i))/a_i=0$.
Proof of (4): By completeness, for any $Q\in M$, there is at least one minimizing geodesic joining $P$ to $Q$. So, $f(x)=Q$ for some $x\in D$, and $f(D)=M$. Next, (3) implies that, for any $x\in\mathbb{R}^n\setminus\{0\}$, the radial line $t\mapsto tx$ ($t\ge0$) intersects $\partial D$ at a single point. This means that $\partial D$ has zero measure. As $f$ is locally Lipschitz, it maps zero measure sets to zero measure sets, so $f(\partial D)$ has zero measure.
Finally, I'll admit that the details are a bit trickier than I initially thought when I commented that the method can be made to work "without too much trouble".