For every $t\geqslant0$, let $Y_t=\int\limits_0^tX_s\mathrm ds$. As explained by @Sasha, $Z_t=(X_t,Y_t)$ is Gaussian, hence the values
$$
x(t)=\mathrm E(X_t),\quad
y(t)=\mathrm E(Y_t),\quad
u(t)=\mathrm E(X_t^2),\quad
v(t)=\mathrm E(Y_t^2),\quad
w(t)=\mathrm E(X_tY_t),
$$
fully determine its distribution. To compute these five functions, a convenient method is to write down directly the ordinary differential equations that such expectations solve, as follows.
First, $(X_t)$ and $(Y_t)$ solve the system $X_0=x_0$, $Y_0=0$, and
$$
\mathrm dX_t=-aX_t\mathrm dt+\sigma\mathrm dW_t,\quad
\mathrm dY_t=X_t\mathrm dt,
$$
and, since $(W_t)$ is a martingale, taking expectations yields
$$
x'(t)=-ax(t),\quad
y'(t)=x(t).
$$
Since one knows that $x(0)=x_0$ and $y(0)=0$, this linear differential system determines $x(t)$ and $y(t)$ for every $t\geqslant0$.
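For the record, this linear system solves explicitly (assuming $a\neq0$) as
$$
x(t)=x_0\mathrm e^{-at},\qquad y(t)=\frac{x_0}a\left(1-\mathrm e^{-at}\right).
$$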
Turning to the second-order quantities, let us first recall a general form of Itô's formula: for every $n\geqslant1$, every regular enough real-valued function $\varphi$ defined on (an open set of) $\mathbb R^n$, and every $n$-dimensional diffusion $(\xi_t)$,
$$
\mathrm d\varphi(\xi_t)=\mathrm{grad}\, \varphi(\xi_t)\cdot\mathrm d\xi_t+\frac12H(\varphi)(\xi_t)\cdot\mathrm d\langle\xi,\xi\rangle_t,
$$
where $\mathrm{grad}\, \varphi(\xi)$ is the gradient of $\varphi$ at $\xi=(\xi^i)_{1\leqslant i\leqslant n}$ hence
$$
\mathrm{grad}\, \varphi(\xi_t)\cdot\mathrm d\xi_t=\sum\limits_{i=1}^n\partial_i\varphi(\xi_t)\mathrm d\xi^i_t,
$$
and $H(\varphi)(\xi)$ is the Hessian matrix of $\varphi$ at $\xi$ hence
$$
H(\varphi)(\xi_t)\cdot\mathrm d\langle\xi,\xi\rangle_t=\sum\limits_{i=1}^n\sum\limits_{j=1}^n\partial^2_{ij}\varphi(\xi_t)\mathrm d\langle\xi^i,\xi^j\rangle_t.
$$
First application: for $\varphi(\xi)=\xi^2$ and $\xi_t=X_t$, $\mathrm d\langle X,X\rangle_t=\sigma^2\mathrm dt$ yields
$$
\mathrm d(X_t^2)=2X_t\mathrm dX_t+\mathrm d\langle X,X\rangle_t=-2aX_t^2\mathrm dt+\mathrm d\text{(MG)}_t+\sigma^2\mathrm dt,
$$
where $\mathrm d\text{(MG)}_t$ is the increment of a martingale, whose exact form is irrelevant here since its expectation vanishes. Hence
$$
u'(t)=-2au(t)+\sigma^2.
$$
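Since $u(0)=x_0^2$, this equation solves explicitly (still assuming $a\neq0$) as
$$
u(t)=\frac{\sigma^2}{2a}+\left(x_0^2-\frac{\sigma^2}{2a}\right)\mathrm e^{-2at}.
$$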
Second application: for $\varphi(\xi)=\xi^2$ and $\xi_t=Y_t$, $\mathrm d\langle Y,Y\rangle_t=0$ yields
$$
\mathrm d(Y_t^2)=2Y_t\mathrm dY_t=2Y_tX_t\mathrm dt,
$$
hence
$$
v'(t)=2w(t).
$$
Third application: for $\varphi(\xi^1,\xi^2)=\xi^1\xi^2$ and $\xi_t=(X_t,Y_t)$, $\mathrm d\langle X,Y\rangle_t=0$ yields
$$
\mathrm d(X_tY_t)=X_t\mathrm dY_t+Y_t\mathrm dX_t=X_t^2\mathrm dt-aX_tY_t\mathrm dt+\mathrm d\text{(MG)}_t,
$$
hence
$$
w'(t)=u(t)-aw(t).
$$
The three differential equations written above form a first-order linear system in $(u(t),v(t),w(t))$. Since $(u(0),v(0),w(0))=(x_0^2,0,0)$, the values of $u(t)=\mathrm E(X_t^2)$, $v(t)=\mathrm E(Y_t^2)$ and $w(t)=\mathrm E(X_tY_t)$ follow, for every $t\geqslant0$.
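As an aside, if one wishes to check this numerically, here is a minimal sketch in Python (not part of the argument): it integrates the three moment ODEs with scipy and compares the result at time $T$ with Euler–Maruyama Monte Carlo estimates; the values of $a$, $\sigma$, $x_0$, $T$ and of the discretization parameters are arbitrary illustrative choices.

```python
import numpy as np
from scipy.integrate import solve_ivp

a, sigma, x0, T = 1.5, 0.8, 1.0, 2.0

# Moment ODEs: u' = -2a u + sigma^2, v' = 2w, w' = u - a w
def rhs(t, m):
    u, v, w = m
    return [-2 * a * u + sigma**2, 2 * w, u - a * w]

sol = solve_ivp(rhs, (0.0, T), [x0**2, 0.0, 0.0], rtol=1e-10, atol=1e-12)
u_T, v_T, w_T = sol.y[:, -1]

# Euler-Maruyama scheme for dX = -a X dt + sigma dW, dY = X dt
rng = np.random.default_rng(0)
n_steps, n_paths = 2000, 200_000
dt = T / n_steps
X = np.full(n_paths, x0)
Y = np.zeros(n_paths)
for _ in range(n_steps):
    Y += X * dt                                              # dY_t = X_t dt
    X += -a * X * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)

print(f"E(X_T^2):   ODE {u_T:.4f}   MC {np.mean(X**2):.4f}")
print(f"E(Y_T^2):   ODE {v_T:.4f}   MC {np.mean(Y**2):.4f}")
print(f"E(X_T Y_T): ODE {w_T:.4f}   MC {np.mean(X*Y):.4f}")
```

The two columns should agree up to Monte Carlo and discretization error.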
To answer the OP's question directly, one can also use the expectations $x(t)=\mathrm E(X_t)$ and $y(t)=\mathrm E(Y_t)$ computed above and write the differential system satisfied by the entries of the covariance matrix $C(t)=\mathrm E((X_t,Y_t)^T(X_t,Y_t))-(\mathrm E(X_t),\mathrm E(Y_t))^T(\mathrm E(X_t),\mathrm E(Y_t))$, namely
$$
U(t)=\mathrm{var}(X_t)=u(t)-x(t)^2,\quad
V(t)=\mathrm{var}(Y_t)=v(t)-y(t)^2,
$$
and
$$
W(t)=\mathrm{cov}(X_t,Y_t)=w(t)-x(t)y(t).
$$
One gets
$$
U'(t)=-2aU(t)+\sigma^2,\quad
V'(t)=2W(t),\quad
W'(t)=U(t)-aW(t).
$$
Together with the initial condition that $U(0)=V(0)=W(0)=0$, this fully determines the covariance matrix $C(t)=\begin{pmatrix}U(t) & W(t)\\ W(t) & V(t)\end{pmatrix}$ for every $t\geqslant0$.
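For completeness, this last system solves in closed form (assuming $a\neq0$) as
$$
U(t)=\frac{\sigma^2}{2a}\left(1-\mathrm e^{-2at}\right),\qquad
W(t)=\frac{\sigma^2}{2a^2}\left(1-\mathrm e^{-at}\right)^2,
$$
and
$$
V(t)=\frac{\sigma^2}{a^2}\,t-\frac{\sigma^2}{2a^3}\left(3-4\mathrm e^{-at}+\mathrm e^{-2at}\right),
$$
as one can check by differentiation.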