Without loss of generality (by replacing $f$ by $f - \alpha$ you can assume that $\iint_{\mathbb{R}_1\times\mathbb{R}_2} f \mathrm{d}(\mu_1 \otimes\mu_2) = 0$.
Let $g(x) = \int_{\mathbb{R}_2} f(x,y)\mathrm{d}\mu_2$ as you did, we note that by the above assumption $\int g(x) \mathrm{d}\mu_1 = 0$. Now we compute
$ \begin{align} \iint |f(x,y)|^2 \mathrm{d}(\mu_1\otimes\mu_2) & \leq 2\iint |f(x,y) - g(x)|^2 + |g(x)|^2 \mathrm{d}(\mu_1\otimes \mu_2) \tag{1}\\ & = 2 \int_{\mathbb{R}_1} \int_{\mathbb{R}_2} \left| f(x,y) - \left(\int_{\mathbb{R}_2} f(x,y')\mathrm{d}\mu_2\right)\mathrm{d}\mu_2\right|^2 \mathrm{d}\mu_1 + 2\int_{\mathbb{R}_1} \left|g(x) \right|^2 \mathrm{d}\mu_1 \tag{2} \\ & \leq C_2 \int_{\mathbb{R}_1} \int_{\mathbb{R}_2} \left|\nabla_y f(x,y)\right|^2 \mathrm{d}\mu_2\mathrm{d}\mu_1 + C_1\int_{\mathbb{R}_1} \left| \nabla_x g(x)\right|^2 \mathrm{d}\mu_1 \tag{3} \end{align} $
where we used
- The triangle inequality $|a + b| \leq |a| + |b|$ and the arithmetric-mean-geometric-mean inequality which implies $2|a||b| \leq |a|^2 + |b|^2$.
- The definition of $g$ and the fact that $\mu_2$ is a probability measure (so has total mass 1).
- In both terms we use Poincare inequality. In the first we use it on the function $f(x,y)$ in the $y$ variable. In the second we use it on the function $g(x)$ in the $x$ variable, noting that $\int g \mathrm{d}\mu_1= 0$.
To finish the proof, we note that by Minkowski's inequality
$ \left(\int_{\mathbb{R}_1} |\nabla g|^2\mathrm{d}\mu_1\right)^\frac12 \leq \int_{\mathbb{R}_2} \sqrt{ \int_{\mathbb{R}_1} |\nabla_x f(x,y)|^2 \mathrm{d}\mu_1}\mathrm{d}\mu_2 $
and by Holder's inequality on a probability space
$ \int_{\mathbb{R}_2} \sqrt{ \int_{\mathbb{R}_1} |\nabla_x f(x,y)|^2 \mathrm{d}\mu_1}\mathrm{d}\mu_2 \leq \left(\int_{\mathbb{R}_2} \int_{\mathbb{R}_1} |\nabla_x f(x,y)|^2 \mathrm{d}\mu_1 \mathrm{d}\mu_2 \right)^\frac12 $
which leads to our final conclusion that
$ \iint |f(x,y)|^2 \mathrm{d}(\mu_1\otimes\mu_2) \leq \max(C_1,C_2) \iint |\nabla f(x,y)|^2 \mathrm{d}(\mu_1\otimes \mu_2) $
In the $L^2$ case using the properties of the inner product we can get rid of a factor of 2 if we replace step (1) by
$\begin{align} \iint |f(x,y)|^2 \mathrm{d}(\mu_1\otimes \mu_2) &= \iint \left( f(x,y) - g(x) + g(x) \right)^2 \mathrm{d}(\mu_1 \otimes \mu_2) \\ &= \iint |f - g|^2 + |g|^2 + 2 g(f-g) \mathrm{d}(\mu_1\otimes \mu_2) \end{align} $
Now, observe that $\int_{\mathbb{R}_2} f(x,y)-g(x) \mathrm{d}\mu_2 = 0$ by definition, and hence also $\int_{\mathbb{R}_2} g(x)\cdot (f(x,y) - g(x)) \mathrm{d}\mu_2$. This implies that $ \iint |f(x,y)|^2 \mathrm{d}(\mu_1\otimes \mu_2) = \iint |f - g|^2 + |g|^2 \mathrm{d}(\mu_1\otimes \mu_2) $ which is one factor of 2 better than (1).