I think the Schur complement is the right way to go:
Let us assume we would like to minimize a function $F$ for which Gauss-Newton-type methods such as LM (Levenberg-Marquardt) are applicable. Thus, we have a function $F(\mathbf x) = f(\mathbf x)^\top f(\mathbf x)$ with $f( \bar{\mathbf{x}} ) \approx \mathbf 0$ at the minimum $\bar{\mathbf{x}}$.
In standard LM, we would repeatedly solve the normal equation $\mathtt H\delta = -\frac{\partial f(\mathbf x)}{\partial \mathbf x}^\top f(\mathbf x)$ with $\mathtt H=\left(\frac{\partial f(\mathbf x)}{\partial \mathbf x}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x} + \lambda \mathtt I\right)$ and update our estimate: $\mathbf{x}' = \mathbf{x}+\delta$.
Now let us assume further that $\mathbf x = \begin{bmatrix}\mathbf x_1\\\mathbf x_2\end{bmatrix}$ and that we can solve $\mathbf x_2' = \arg\min_{\mathbf x_2}F(\mathbf x_1,\mathbf x_2)$ for any fixed $\mathbf x_1$ very efficiently.
Now the Jacobian is $\frac{\partial f(\mathbf x)}{\partial \mathbf x}= \left(\frac{\partial f(\mathbf x)}{\partial \mathbf x_1}, \frac{\partial f(\mathbf x)}{\partial \mathbf x_2} \right)$ and the augmented Hessian is $\mathtt H = \begin{bmatrix}\mathtt A & \mathtt B\\ \mathtt B^\top & \mathtt C\end{bmatrix}$ with $\mathtt A = \frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_1}+\lambda \mathtt I$, $\mathtt B = \frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}$ and $\mathtt C = \frac{\partial f(\mathbf x)}{\partial \mathbf x_2}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}+\lambda \mathtt I$. Thus, we get the following normal equation:
$ \begin{bmatrix}\mathtt A & \mathtt B\\ \mathtt B^\top & \mathtt C\end{bmatrix}\cdot \begin{bmatrix}\delta_1\\\delta_2\end{bmatrix} = \begin{bmatrix}\mathbf b_1\\\mathbf b_2 \end{bmatrix}$ with $\mathbf b_1 = -\frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top f(\mathbf x)$ and $\mathbf b_2 = -\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}^\top f(\mathbf x)$.
Using the Schur complement, we can solve for $\delta_1$:
$(\mathtt A - \mathtt B\mathtt C^{-1} \mathtt B^\top)\delta_1 = \mathbf b_1-\mathtt B\mathtt C^{-1} \mathbf b_2$
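(To see this: the second block row gives $\delta_2 = \mathtt C^{-1}(\mathbf b_2 - \mathtt B^\top \delta_1)$; substituting this into the first block row, $\mathtt A\delta_1 + \mathtt B\delta_2 = \mathbf b_1$, yields exactly the reduced system above.)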
Afterwards, we can minimize $F$ over $\mathbf x_2$ for the fixed $\mathbf x_1' = \mathbf x_1+\delta_1$.
Thus, we can apply the following LM scheme:
$\text{repeat until convergence}$
$\quad\quad\mathbf b_1 := -\frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top f(\mathbf x)$
$\quad\quad\mathbf b_2 := -\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}^\top f(\mathbf x)$
$\quad\quad\mathtt A := \frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_1}+\lambda \mathtt I$
$\quad\quad\mathtt B := \frac{\partial f(\mathbf x)}{\partial \mathbf x_1}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}$
$\quad\quad\mathtt C := \frac{\partial f(\mathbf x)}{\partial \mathbf x_2}^\top\frac{\partial f(\mathbf x)}{\partial \mathbf x_2}+\lambda \mathtt I$
$\quad\quad\delta_1 := (\mathtt A - \mathtt B\mathtt C^{-1} \mathtt B^\top)^{-1}( \mathbf b_1-\mathtt B\mathtt C^{-1} \mathbf b_2)$
$\quad\quad \mathbf x_1' := \mathbf x_1 + \delta_1$
$\quad\quad \mathbf x_2' := \arg\min_{\mathbf x_2}F(\mathbf x_1',\mathbf x_2)$
$\quad\quad \text{if } F(\mathbf x_1',\mathbf x_2') < F(\mathbf x_1,\mathbf x_2)$
$\quad\quad\quad\quad \mathbf x_1 := \mathbf x_1'$
$\quad\quad\quad\quad \mathbf x_2 := \mathbf x_2'$
$\quad\quad\quad\quad \text{decrease } \lambda$
$\quad\quad \text{else}$
$\quad\quad\quad\quad \text{increase } \lambda$
$\quad\quad \text{end}$
$\text{end}$
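For concreteness, here is a minimal numpy sketch of one iteration of the scheme above. The callables `residual`, `jac1`, `jac2` and `solve_x2` (the efficient inner minimization over $\mathbf x_2$) are problem-specific placeholders I made up for illustration, not part of any library:

```python
import numpy as np

def lm_schur_step(x1, x2, residual, jac1, jac2, solve_x2, lam):
    # residual(x1, x2) -> f(x), jac1/jac2 -> Jacobian blocks df/dx1, df/dx2,
    # solve_x2(x1) -> argmin_{x2} F(x1, x2); all assumed to be supplied by the problem.
    r = residual(x1, x2)
    J1, J2 = jac1(x1, x2), jac2(x1, x2)

    b1 = -J1.T @ r                               # negated gradient blocks
    b2 = -J2.T @ r

    A = J1.T @ J1 + lam * np.eye(J1.shape[1])    # augmented Hessian blocks
    B = J1.T @ J2
    C = J2.T @ J2 + lam * np.eye(J2.shape[1])

    # Schur complement: eliminate delta_2 and solve the reduced system for delta_1.
    BCinv = B @ np.linalg.inv(C)                 # in practice, factorize C instead
    delta1 = np.linalg.solve(A - BCinv @ B.T, b1 - BCinv @ b2)

    x1_new = x1 + delta1
    x2_new = solve_x2(x1_new)                    # cheap inner minimization over x2

    r_new = residual(x1_new, x2_new)
    if r_new @ r_new < r @ r:                    # accept the step, relax damping
        return x1_new, x2_new, lam / 10.0
    return x1, x2, lam * 10.0                    # reject the step, increase damping
```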
Update: Alternatively (and equivalently), we can simply solve the whole normal equation, ignore $\delta_2$, and minimize over $\mathbf x_2$ afterwards:
$\text{repeat until convergence}$
$\quad\quad$ solve $\begin{bmatrix}\mathtt A & \mathtt B\\ \mathtt B^\top & \mathtt C\end{bmatrix}\cdot \begin{bmatrix}\delta_1\\\delta_2\end{bmatrix} = \begin{bmatrix}\mathbf b_1\\\mathbf b_2 \end{bmatrix}$
$\quad\quad \mathbf x_1' := \mathbf x_1 + \delta_1$
$\quad\quad \mathbf x_2' := \arg\min_{\mathbf x_2}F(\mathbf x_1',\mathbf x_2)$
$\quad\quad \text{if } F(\mathbf x_1',\mathbf x_2') < F(\mathbf x_1,\mathbf x_2)$
$\quad\quad\quad\quad \mathbf x_1 := \mathbf x_1'$
$\quad\quad\quad\quad \mathbf x_2 := \mathbf x_2'$
$\quad\quad\quad\quad \text{decrease } \lambda$
$\quad\quad \text{else}$
$\quad\quad\quad\quad \text{increase } \lambda$
$\quad\quad \text{end}$
$\text{end}$
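A corresponding sketch of this variant, reusing the blocks $\mathtt A, \mathtt B, \mathtt C, \mathbf b_1, \mathbf b_2$ computed as above and the same placeholder `solve_x2`:

```python
import numpy as np

def lm_full_step(x1, A, B, C, b1, b2, solve_x2):
    # Solve the full normal equation, keep only delta_1, then re-minimize over x_2.
    H = np.block([[A, B], [B.T, C]])
    b = np.concatenate([b1, b2])
    delta = np.linalg.solve(H, b)
    delta1 = delta[:x1.shape[0]]                 # delta_2 is simply discarded
    x1_new = x1 + delta1
    return x1_new, solve_x2(x1_new)
```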
Which of the two formulations is more efficient depends on the specific problem (sparsity structure and dimensionality of $\mathtt A$, $\mathtt B$, $\mathtt C$, etc.).
Note that this scheme is much better than plain alternation, since the update $\delta_1$ already takes the subsequent update of $\mathbf x_2$ into account.
Update2: Of course, I did not invent this scheme. I am not aware of any textbook mentioning it, but I have seen something like this in the context of bundle adjustment: http://research.microsoft.com/pubs/131806/Jeong-CVPR10.pdf (embedded point iteration).