For every $t\geqslant0$, let $Y_t=\int\limits_0^tX_s\mathrm ds$. As explained by @Sasha, $Z_t=(X_t,Y_t)$ is Gaussian, hence the values $ x(t)=\mathrm E(X_t),\quad y(t)=\mathrm E(Y_t),\quad u(t)=\mathrm E(X_t^2),\quad v(t)=\mathrm E(Y_t^2),\quad w(t)=\mathrm E(X_tY_t), $ fully determine its distribution. To compute these, a convenient method is to write down directly the ordinary differential equations solved by the expectations of solutions of stochastic differential equations, as follows.
First, $(X_t)$ and $(Y_t)$ solve the system $X_0=x_0$, $Y_0=0$, and $ \mathrm dX_t=-aX_t\mathrm dt+\sigma\mathrm dW_t,\quad \mathrm dY_t=X_t\mathrm dt. $ Since $(W_t)$ is a martingale, taking expectations yields $ x'(t)=-ax(t),\quad y'(t)=x(t). $ Since one knows that $x(0)=x_0$ and $y(0)=0$, this linear differential system determines $x(t)$ and $y(t)$ for every $t\geqslant0$.
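For the record (this step is routine, but it makes the formulas below concrete), integrating this triangular system, first in $x$ and then in $y$, gives $ x(t)=x_0\mathrm e^{-at},\quad y(t)=\frac{x_0}a\left(1-\mathrm e^{-at}\right), $ as one can check by differentiation.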
Turning to the second-order quantities, let us first recall a general form of Itô's formula: for every $n\geqslant1$, every regular enough real-valued function $\varphi$ defined on (an open set of) $\mathbb R^n$, and every $n$-dimensional diffusion $(\xi_t)$, $ \mathrm d\varphi(\xi_t)=\mathrm{grad}\, \varphi(\xi_t)\cdot\mathrm d\xi_t+\frac12H(\varphi)(\xi_t)\cdot\mathrm d\langle\xi,\xi\rangle_t, $ where $\mathrm{grad}\, \varphi(\xi)$ is the gradient of $\varphi$ at $\xi=(\xi^i)_{1\leqslant i\leqslant n}$, hence $ \mathrm{grad}\, \varphi(\xi_t)\cdot\mathrm d\xi_t=\sum\limits_{i=1}^n\partial_i\varphi(\xi_t)\mathrm d\xi^i_t, $ and $H(\varphi)(\xi)$ is the Hessian matrix of $\varphi$ at $\xi$, hence $ H(\varphi)(\xi_t)\cdot\mathrm d\langle\xi,\xi\rangle_t=\sum\limits_{i=1}^n\sum\limits_{j=1}^n\partial^2_{ij}\varphi(\xi_t)\mathrm d\langle\xi^i,\xi^j\rangle_t. $ First application: for $\varphi(\xi)=\xi^2$ and $\xi_t=X_t$, $\mathrm d\langle X,X\rangle_t=\sigma^2\mathrm dt$ yields $ \mathrm d(X_t^2)=2X_t\mathrm dX_t+\mathrm d\langle X,X\rangle_t=-2aX_t^2\mathrm dt+\mathrm d\text{(MG)}_t+\sigma^2\mathrm dt, $ where $\mathrm d\text{(MG)}_t$ is a martingale term whose value is irrelevant. Hence $ u'(t)=-2au(t)+\sigma^2. $ Second application: for $\varphi(\xi)=\xi^2$ and $\xi_t=Y_t$, $\mathrm d\langle Y,Y\rangle_t=0$ yields $ \mathrm d(Y_t^2)=2Y_t\mathrm dY_t=2Y_tX_t\mathrm dt, $ hence $ v'(t)=2w(t). $ Third application: for $\varphi(\xi^1,\xi^2)=\xi^1\xi^2$ and $\xi_t=(X_t,Y_t)$, $\mathrm d\langle X,Y\rangle_t=0$ yields $ \mathrm d(X_tY_t)=X_t\mathrm dY_t+Y_t\mathrm dX_t=X_t^2\mathrm dt-aX_tY_t\mathrm dt+\mathrm d\text{(MG)}_t, $ hence $ w'(t)=u(t)-aw(t). $ The three differential equations written above form a first-order linear system in $(u(t),v(t),w(t))$. Since $(u(0),v(0),w(0))=(x_0^2,0,0)$, the values of $u(t)=\mathrm E(X_t^2)$, $v(t)=\mathrm E(Y_t^2)$ and $w(t)=\mathrm E(X_tY_t)$ follow, for every $t\geqslant0$.
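Since the system is triangular in the order $u$, $w$, $v$, one can solve it successively; writing $c=x_0^2-\dfrac{\sigma^2}{2a}$ for brevity, this yields $ u(t)=\frac{\sigma^2}{2a}+c\,\mathrm e^{-2at},\quad w(t)=\frac{\sigma^2}{2a^2}\left(1-\mathrm e^{-at}\right)+\frac ca\left(\mathrm e^{-at}-\mathrm e^{-2at}\right), $ and $ v(t)=\frac{\sigma^2}{a^2}\left(t-\frac{1-\mathrm e^{-at}}a\right)+\frac c{a^2}\left(1-\mathrm e^{-at}\right)^2, $ as one can check by differentiating and comparing with the three equations above.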
To answer the OP's question directly, one can also keep in mind the expectations $x(t)=\mathrm E(X_t)$ and $y(t)=\mathrm E(Y_t)$ and write the differential system satisfied by the entries of the covariance matrix $C(t)=\mathrm E((X_t,Y_t)^T(X_t,Y_t))-(\mathrm E(X_t),\mathrm E(Y_t))^T(\mathrm E(X_t),\mathrm E(Y_t))$, namely $ U(t)=\mathrm{var}(X_t)=u(t)-x(t)^2,\quad V(t)=\mathrm{var}(Y_t)=v(t)-y(t)^2, $ and $ W(t)=\mathrm{cov}(X_t,Y_t)=w(t)-x(t)y(t). $ One gets $ U'(t)=-2aU(t)+\sigma^2,\quad V'(t)=2W(t),\quad W'(t)=U(t)-aW(t). $ Together with the initial conditions $U(0)=V(0)=W(0)=0$, this fully determines the covariance matrix $C(t)=\begin{pmatrix}U(t) & W(t)\\ W(t) & V(t)\end{pmatrix}$ for every $t\geqslant0$.
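Explicitly, this centered system integrates to $ U(t)=\frac{\sigma^2}{2a}\left(1-\mathrm e^{-2at}\right),\quad W(t)=\frac{\sigma^2}{2a^2}\left(1-\mathrm e^{-at}\right)^2, $ and $ V(t)=\frac{\sigma^2}{a^2}\left(t-\frac2a\left(1-\mathrm e^{-at}\right)+\frac1{2a}\left(1-\mathrm e^{-2at}\right)\right), $ in particular $V(t)\sim\sigma^2t/a^2$ when $t\to\infty$.

If one wants a numerical sanity check, here is a minimal Euler–Maruyama simulation comparing Monte Carlo estimates with the closed forms above (the values of $a$, $\sigma$, $x_0$, the horizon $T$ and the discretization are arbitrary illustrative choices, not part of the question):

```python
import numpy as np

# Monte Carlo sanity check of the closed forms above (the parameters a, sigma,
# x0, T and the step/path counts are arbitrary illustrative choices).
a, sigma, x0 = 1.5, 0.8, 1.0
T, n_steps, n_paths = 2.0, 2000, 100_000
dt = T / n_steps

rng = np.random.default_rng(0)
X = np.full(n_paths, x0)   # X_0 = x0
Y = np.zeros(n_paths)      # Y_0 = 0
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    Y += X * dt                       # dY_t = X_t dt (uses X before its update)
    X += -a * X * dt + sigma * dW     # dX_t = -a X_t dt + sigma dW_t

e = np.exp(-a * T)
U = sigma**2 / (2 * a) * (1 - e**2)                                  # var(X_T)
W = sigma**2 / (2 * a**2) * (1 - e)**2                               # cov(X_T, Y_T)
V = sigma**2 / a**2 * (T - 2 * (1 - e) / a + (1 - e**2) / (2 * a))   # var(Y_T)

print("var(X_T):  MC %.5f  exact %.5f" % (X.var(), U))
print("var(Y_T):  MC %.5f  exact %.5f" % (Y.var(), V))
print("cov(X,Y):  MC %.5f  exact %.5f" % (np.cov(X, Y)[0, 1], W))
```

The Monte Carlo values should match the exact ones up to the usual $O(1/\sqrt{n_\text{paths}})$ statistical error plus a small discretization bias.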