Since this is homework, I probably shouldn't write down a complete solution. But let's at least write down a definition for the convolution general enough for the situation described above (taken from my lecture notes of the course "Distribution et équations aux derivées partiélles" by André Cérezo):
Théorème Soient $S,T \in \mathcal{D}'(\mathbb{R}^n)$, $F=(\operatorname{supp} S_x)\times(\operatorname{supp} T_y)\subset \mathbb{R}^{2n}$, et $\Delta=\{ (x,-x)|x\in \mathbb{R}^n\}\subset \mathbb{R}^{2n}$. Supposons que, pour tout $K\Subset\mathbb{R}^n$, le fermé $(K\times\{0\}+\Delta)\cap F$ soit un compact de $\mathbb{R}^{2n}$. Alors la formule $(*)\qquad\forall \varphi\in \mathcal{D}(\mathbb{R}^n)\qquad \langle S*T,\varphi\rangle=\langle S_x\otimes T_y,\varphi(x+y)\rangle$ définit une distribution sur $\mathbb{R}^n$, appelée "produit de convolution" de $S$ et $T$.
Here $K\Subset\mathbb{R}^n$ means that $K$ is a compact subset of $\mathbb{R}^n$. Since $\mathcal{S}'(\mathbb R)\subset\mathcal{D}'(\mathbb R)$, the first step is to verify the support condition of the theorem for $S=u$ and $T=v$; the theorem then gives $u*v\in\mathcal{D}'(\mathbb R)$. All that is left to show is that $u*v$ is in fact tempered, that is, $u*v\in\mathcal{S}'(\mathbb R)$.
Edit (the requested translation of the cited theorem)
Theorem Let $S,T \in \mathcal{D}'(\mathbb{R}^n)$, $F=(\operatorname{supp} S_x)\times(\operatorname{supp} T_y)\subset \mathbb{R}^{2n}$, and $\Delta=\{ (x,-x)|x\in \mathbb{R}^n\}\subset \mathbb{R}^{2n}$. Assume that for every $K\Subset\mathbb{R}^n$, the closed set $(K\times\{0\}+\Delta)\cap F$ is a compact subset of $\mathbb{R}^{2n}$. Then the formula $(*)\qquad\forall \varphi\in \mathcal{D}(\mathbb{R}^n)\qquad \langle S*T,\varphi\rangle=\langle S_x\otimes T_y,\varphi(x+y)\rangle$ defines a distribution on $\mathbb{R}^n$, called the "convolution" of $S$ and $T$.
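It may help to unpack the support condition before checking it. The following is only a sketch; the one-dimensional example with supports in $[0,\infty)$ and the bound $R$ are my own illustrative assumptions and are not taken from the lecture notes or from the question. Writing a point of $K\times\{0\}+\Delta$ as $(k+x,-x)$ with $k\in K$, the sum of its two coordinates is $k$, and conversely every $(a,b)$ with $a+b\in K$ has this form (take $k=a+b$, $x=-b$). Hence

$$
K\times\{0\}+\Delta=\{(x,y)\in\mathbb{R}^{2n} \mid x+y\in K\},
\qquad
(K\times\{0\}+\Delta)\cap F=\{(x,y)\in \operatorname{supp} S\times\operatorname{supp} T \mid x+y\in K\}.
$$

For instance, if $n=1$, $\operatorname{supp} S,\operatorname{supp} T\subset[0,\infty)$ and $K\subset[-R,R]$, then

$$
(K\times\{0\}+\Delta)\cap F\subset\{(x,y) \mid x\ge 0,\ y\ge 0,\ x+y\le R\}\subset[0,R]^2,
$$

so the set is closed and bounded, hence compact. By contrast, if $\operatorname{supp} S=\operatorname{supp} T=\mathbb{R}$, the set is the whole strip $\{x+y\in K\}$, which is unbounded for nonempty $K$, and the condition fails.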