Given matrices $\mathrm A_1 \in \mathbb R^{m \times n}$, $\mathrm A_2 \in \mathbb R^{p \times q}$, $\mathrm B \in \mathbb R^{m \times q}$ and an integer $0 \leq r \leq \min \{n,p\}$, we have the following rank-constrained least-squares problem in matrix $\mathrm X \in \mathbb R^{n \times p}$
$\begin{array}{ll} \text{minimize} & \| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_{\text{F}}^2\\ \text{subject to} & \mbox{rank} (\mathrm X) \leq r\end{array}$
The nice case
If $\mathrm A_1$ and $\mathrm A_2$ are square (i.e., $m = n$ and $q = p$) and orthogonal, then
$\| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_{\text{F}} = \| \mathrm A_1 (\mathrm X - \mathrm A_1^{\top} \mathrm B \mathrm A_2^{\top}) \mathrm A_2 \|_{\text{F}} = \| \mathrm X - \mathrm A_1^{\top} \mathrm B \mathrm A_2^{\top} \|_{\text{F}}$
The rank-constrained least-squares problem
$\begin{array}{ll} \text{minimize} & \| \mathrm X - \mathrm A_1^{\top} \mathrm B \mathrm A_2^{\top} \|_{\text{F}}^2\\ \text{subject to} & \mbox{rank} (\mathrm X) \leq r\end{array}$
is a classical low-rank matrix approximation problem that can be solved by truncating the SVD of $n \times p$ matrix $\mathrm A_1^{\top} \mathrm B \mathrm A_2^{\top}$, keeping only the $r$ largest singular values. If $\mbox{rank} (\mathrm A_1^{\top} \mathrm B \mathrm A_2^{\top}) \leq r$, there is no need to truncate the SVD, of course.
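For concreteness, here is a minimal NumPy sketch of this closed-form solution (the helper name `low_rank_solution` is an illustrative choice, not from the original):

```python
import numpy as np

def low_rank_solution(A1, A2, B, r):
    """Minimize ||A1 X A2 - B||_F^2 subject to rank(X) <= r,
    assuming A1 and A2 are square orthogonal matrices."""
    M = A1.T @ B @ A2.T
    # Eckart-Young: truncating the SVD gives the best rank-r approximation.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s[r:] = 0.0  # zero out all but the r largest singular values
    return (U * s) @ Vt  # U * s scales column i of U by s[i]
```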
Matrix equations
We introduce matrix variables $\mathrm Y_1 \in \mathbb R^{n \times r}$ and $\mathrm Y_2 \in \mathbb R^{p \times r}$ such that
$\mathrm X = \mathrm Y_1 \mathrm Y_2^{\top}$
whose rank is at most $r$ by construction. Consider the bilinear matrix equation
$\mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 = \mathrm B$
We have a system of $m q$ bilinear equations in $(n + p) r$ unknowns. If this system is feasible, then the minimum is zero. If this system is infeasible, then the minimum is positive. However, how do we solve this bilinear matrix equation?
Let us look for a least-squares solution. The cost function is
$\| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_{\text{F}}^2 = \| \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 - \mathrm B \|_{\text{F}}^2 = \mbox{tr} (\mathrm A_2^{\top} \mathrm Y_2 \mathrm Y_1^{\top} \mathrm A_1^{\top} \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2) - 2 \langle \mathrm B , \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 \rangle + \| \mathrm B \|_{\text{F}}^2$
Differentiating this cost function with respect to matrices $\mathrm Y_1$ and $\mathrm Y_2$ (a routine matrix-calculus computation), we obtain the gradients
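$\nabla_{\mathrm Y_1} f = 2 \, \mathrm A_1^{\top} \left( \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 - \mathrm B \right) \mathrm A_2^{\top} \mathrm Y_2, \qquad \nabla_{\mathrm Y_2} f = 2 \, \mathrm A_2 \left( \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 - \mathrm B \right)^{\top} \mathrm A_1 \mathrm Y_1$
Setting both gradients to zero, dropping the factors of $2$, and transposing the second equation, we obtain the following coupled cubic matrix equations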
$\boxed{\begin{array}{rl} \quad\mathrm A_1^{\top} \left( \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 - \mathrm B \right) \mathrm A_2^{\top} \mathrm Y_2 &= \mathrm O_{n \times r}\\ \mathrm Y_1^{\top} \mathrm A_1^{\top} \left( \mathrm A_1 \mathrm Y_1 \mathrm Y_2^{\top} \mathrm A_2 - \mathrm B \right) \mathrm A_2^{\top} \quad &= \mathrm O_{r \times p} \end{array}}$
We have a system of $nr + pr = (n+p) r$ coupled cubic equations in the $n r$ entries of matrix $\mathrm Y_1$ and in the $p r$ entries of matrix $\mathrm Y_2$. Note that the total number of unknowns is $(n + p) r$, the same as the number of equations. How do we solve this system of cubic equations?
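One practical answer (a suggestion on my part; the question is left open above) is alternating least squares: with $\mathrm Y_2$ fixed the cost is an ordinary linear least-squares problem in $\mathrm Y_1$, and vice versa, so each half-step zeroes the corresponding boxed gradient equation. A minimal NumPy sketch, using the column-stacking identity $\mbox{vec}(\mathrm A \mathrm X \mathrm B) = (\mathrm B^{\top} \otimes \mathrm A) \, \mbox{vec}(\mathrm X)$; the function name `als`, the random initialization, and the fixed iteration count are illustrative choices:

```python
import numpy as np

def als(A1, A2, B, r, iters=100, seed=0):
    """Alternating least squares for min ||A1 Y1 Y2^T A2 - B||_F^2.
    At a fixed point, both coupled cubic equations above hold."""
    m, n = A1.shape
    p, q = A2.shape
    rng = np.random.default_rng(seed)
    Y1 = rng.standard_normal((n, r))
    Y2 = rng.standard_normal((p, r))
    b = B.flatten(order="F")      # vec(B), column stacking
    bt = B.T.flatten(order="F")   # vec(B^T)
    for _ in range(iters):
        # Fix Y2 and solve for Y1: with C = Y2^T A2,
        # vec(A1 Y1 C) = (C^T kron A1) vec(Y1).
        C = Y2.T @ A2
        y1, *_ = np.linalg.lstsq(np.kron(C.T, A1), b, rcond=None)
        Y1 = y1.reshape((n, r), order="F")
        # Fix Y1 and solve for Y2: with D = A1 Y1, transpose the residual so
        # vec(A2^T Y2 D^T) = (D kron A2^T) vec(Y2) targets vec(B^T).
        D = A1 @ Y1
        y2, *_ = np.linalg.lstsq(np.kron(D, A2.T), bt, rcond=None)
        Y2 = y2.reshape((p, r), order="F")
    return Y1, Y2
```

The cost is non-increasing across half-steps, but the iteration can stall at a local minimum, so a few random restarts are prudent; for large sizes one would avoid forming the Kronecker products explicitly.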
Semidefinite programs
Using the $2$-norm (i.e., the spectral norm, the largest singular value) instead of the Frobenius norm, we have
$\begin{array}{ll} \text{minimize} & \| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_2\\ \text{subject to} & \mbox{rank} (\mathrm X) \leq r\end{array}$
Do note that minimizing the norm $\| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_2$ is equivalent to minimizing $t$ subject to the two inequality constraints $\| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_2 \leq t$ and $t \geq 0$. Hence, we have the following optimization problem in $\mathrm X \in \mathbb R^{n \times p}$ and $t \geq 0$
$\begin{array}{ll} \text{minimize} & t\\ \text{subject to} & \| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_2 \leq t\\ & t \geq 0\\ & \mbox{rank} (\mathrm X) \leq r\end{array}$
Using the Schur complement test for positive semidefiniteness, the inequality $\| \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B \|_2 \leq t$ can be replaced by a linear matrix inequality (LMI) [0]: for $t > 0$, the block matrix below is positive semidefinite if and only if $t \, \mathrm I_q - t^{-1} (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B) \succeq \mathrm O_q$, i.e., if and only if $\sigma_{\max} (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B) \leq t$. Hence, we obtain
$\begin{array}{ll} \text{minimize} & t\\ \text{subject to} & \begin{bmatrix} t \, \mathrm I_m & \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B\\ (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} & t \, \mathrm I_q\end{bmatrix} \succeq \mathrm O_{m+q}\\ & t \geq 0\\ & \mbox{rank} (\mathrm X) \leq r\end{array}$
Dropping the rank constraint and penalizing the rank in the objective function, we obtain
$\begin{array}{ll} \text{minimize} & t + \gamma \, \mbox{rank} (\mathrm X)\\ \text{subject to} & \begin{bmatrix} t \, \mathrm I_m & \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B\\ (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} & t \, \mathrm I_q\end{bmatrix} \succeq \mathrm O_{m+q}\\ & t \geq 0\end{array}$
where $\gamma \geq 0$. Unfortunately, the objective function is now non-convex. Using the nuclear norm as a convex proxy for the rank, we obtain
$\begin{array}{ll} \text{minimize} & t + \gamma \, \| \mathrm X \|_*\\ \text{subject to} & \begin{bmatrix} t \, \mathrm I_m & \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B\\ (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} & t \, \mathrm I_q\end{bmatrix} \succeq \mathrm O_{m+q}\\ & t \geq 0\end{array}$
where $\gamma \geq 0$ is a parameter to be adjusted to "modulate" the rank of $\mathrm X$ until we (hopefully) satisfy the constraint $\mbox{rank} (\mathrm X) \leq r$. Nuclear norm minimization can be done via semidefinite programming (SDP) [0]. Thus, we obtain the following SDP in $\mathrm X \in \mathbb R^{n \times p}$, $\mathrm W_1 \in \mathbb R^{n \times n}$, $\mathrm W_2 \in \mathbb R^{p \times p}$, and $t \geq 0$
$\boxed{\begin{array}{ll} \text{minimize} & t + \frac{\gamma}{2} \mbox{tr} (\mathrm W_1) + \frac{\gamma}{2} \mbox{tr} (\mathrm W_2)\\ \text{subject to} & \begin{bmatrix} \mathrm W_1 & \mathrm X\\ \mathrm X ^{\top} & \mathrm W_2\end{bmatrix} \succeq \mathrm O_{n+p}\\ & \begin{bmatrix} t \, \mathrm I_m & \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B\\ (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} & t \, \mathrm I_q\end{bmatrix} \succeq \mathrm O_{m+q}\\ & t \geq 0\end{array}}$
All the constraints can be written as a single LMI, as follows
$\begin{array}{ll} \text{minimize} & t + \frac{\gamma}{2} \mbox{tr} (\mathrm W_1) + \frac{\gamma}{2} \mbox{tr} (\mathrm W_2)\\ \text{subject to} & \begin{bmatrix} \mathrm W_1 & \mathrm X & & & \\ \mathrm X ^{\top} & \mathrm W_2 & & & \\ & & t \, \mathrm I_m & \mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B & \\ & & (\mathrm A_1 \mathrm X \mathrm A_2 - \mathrm B)^{\top} & t \, \mathrm I_q & \\ & & & & t\end{bmatrix} \succeq \mathrm O_{n+p+m+q+1}\end{array}$
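As a sanity check, the relaxation can be handed to an off-the-shelf conic solver. Below is a minimal CVXPY sketch with made-up dimensions; instead of spelling out the two LMIs, it uses the atoms `cp.sigma_max` and `cp.normNuc`, which CVXPY compiles internally into semidefinite constraints of exactly the kind shown above:

```python
import numpy as np
import cvxpy as cp

# Hypothetical problem data; sizes chosen only for illustration.
m, n, p, q, r = 6, 5, 4, 7, 2
rng = np.random.default_rng(0)
A1 = rng.standard_normal((m, n))
A2 = rng.standard_normal((p, q))
B = rng.standard_normal((m, q))

gamma = 1.0  # to be adjusted until (hopefully) rank(X) <= r

X = cp.Variable((n, p))
# sigma_max is the 2-norm of the residual; normNuc is the nuclear-norm
# surrogate for the rank penalty.
objective = cp.Minimize(cp.sigma_max(A1 @ X @ A2 - B) + gamma * cp.normNuc(X))
problem = cp.Problem(objective)
problem.solve(solver=cp.SCS)

print("optimal value :", problem.value)
print("rank of X     :", np.linalg.matrix_rank(X.value, tol=1e-6))
```

In practice one sweeps $\gamma$ (e.g., by bisection) until the numerical rank of the minimizer drops to $r$ or below.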
Reference
[0] Benjamin Recht, Maryam Fazel, Pablo A. Parrilo, Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization, July 19, 2007.