
I'd like to ask if somebody could help me parametrize a $5\times 5$ doubly stochastic matrix, say $P$, i.e., a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1. (I need to estimate the matrix $P$ and plan to use an unconstrained estimation procedure. Hence, I am trying to reparametrize $P$ so that the parametrization accounts for the full set of constraints.)

I tried to represent an element $p_{ij}$ of $P$, where $i=1:4$, $j=1:4$, as \begin{equation} p_{ij} = \frac{\exp(\alpha_{ij})}{1 + \sum_{k=1}^4 \exp(\alpha_{ik}) + \sum_{k=1}^4 \exp(\alpha_{kj}) - \exp(\alpha_{ij})}, \end{equation} an element $p_{i5}$ of $P$, where $i=1:4$, as \begin{equation} p_{i5} = 1 - \sum_{k=1}^4 p_{ik}, \end{equation} and an element $p_{5j}$ of $P$, where $j=1:4$, as \begin{equation} p_{5j} = 1 - \sum_{k=1}^4 p_{kj}, \end{equation} where $\alpha_{11}, \alpha_{12}, \ldots, \alpha_{44} \in \mathbb{R}$. These 16 parameters could then be estimated by unconstrained MLE. Unfortunately, the above parametrization doesn't guarantee that the remaining entry $p_{55}$ is positive, so unconstrained optimization doesn't apply. Thus, I'm looking for an alternative way of parametrizing the matrix $P$.
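For concreteness, here is a quick NumPy check (function name is mine), taking the bordering entries as the row/column complements $p_{i5}=1-\sum_k p_{ik}$ and $p_{5j}=1-\sum_k p_{kj}$; already in the symmetric case $\alpha_{ij}\equiv 0$ the corner entry comes out negative:

```python
import numpy as np

def parametrize(alpha):
    """Build the candidate 5x5 matrix from a 4x4 real parameter matrix."""
    E = np.exp(alpha)
    P = np.zeros((5, 5))
    for i in range(4):
        for j in range(4):
            P[i, j] = E[i, j] / (1 + E[i].sum() + E[:, j].sum() - E[i, j])
    P[:4, 4] = 1 - P[:4, :4].sum(axis=1)  # row complements
    P[4, :4] = 1 - P[:4, :4].sum(axis=0)  # column complements
    P[4, 4] = 1 - P[4, :4].sum()          # forced by the constraints
    return P

P = parametrize(np.zeros((4, 4)))
print(P[4, 4])  # -1.0, so P is not a valid doubly stochastic matrix
```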

1 Answer


Adams & Zemel (see https://arxiv.org/pdf/1106.1925) devised a technique, based on the Sinkhorn-Knopp theorem, which they call Sinkhorn propagation. In essence: an $n \times n$ doubly stochastic matrix (DSM) can be parametrized by a strictly positive $n \times n$ parameter matrix $M$. By the Sinkhorn-Knopp theorem, if $M$ is alternately row-normalized and column-normalized, the sequence of resulting normalized matrices $M_1, M_2, \ldots$ converges to a DSM. By truncating this sequence after a finite number of row/column normalization steps, you obtain a tight approximation to a DSM.
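A minimal NumPy sketch of the truncated normalization (function names are mine):

```python
import numpy as np

def sinkhorn_normalize(M, n_iters=30):
    """Alternately row- and column-normalize a strictly positive matrix.
    By the Sinkhorn-Knopp theorem the iterates converge to a doubly
    stochastic matrix; truncating the iteration gives a close approximation."""
    P = M.astype(float).copy()
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)  # row-normalize
        P = P / P.sum(axis=0, keepdims=True)  # column-normalize
    return P

rng = np.random.default_rng(0)
M = rng.uniform(0.1, 1.0, size=(5, 5))  # strictly positive parameter matrix
P = sinkhorn_normalize(M)
# rows and columns of P each sum to (approximately) 1
```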

For optimizing over DSMs, you can define a function $f=n_{row}(n_{col}(n_{row}(\ldots(M)\ldots)))$, where $n_{row}$ is a row-wise normalization and $n_{col}$ is a column-wise normalization, that maps any strictly positive $M$ to a DSM. By approximating $f$ as the composition of a finite number of normalizations, you can then use gradient descent on $f$ with respect to $M$ as part of an MLE procedure.
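A toy sketch of that procedure (the synthetic count data and all names are mine; a finite-difference gradient stands in for the automatic differentiation one would use in practice, and $M=\exp(A)$ keeps the parameter matrix strictly positive while $A$ stays unconstrained):

```python
import numpy as np

def sinkhorn(A, n_iters=30):
    """Map an unconstrained real matrix A to (approximately) a DSM:
    exponentiate to get a strictly positive M, then alternate row and
    column normalizations (truncated Sinkhorn-Knopp)."""
    P = np.exp(A)
    for _ in range(n_iters):
        P = P / P.sum(axis=1, keepdims=True)
        P = P / P.sum(axis=0, keepdims=True)
    return P

def neg_log_lik(A, freq):
    """Toy MLE objective: freq[i, j] is the observed frequency of the
    (i, j) transition; sinkhorn(A) is the implied DSM."""
    return -(freq * np.log(sinkhorn(A))).sum()

def num_grad(f, A, eps=1e-6):
    """Central finite-difference gradient (stand-in for autodiff)."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        Ap, Am = A.copy(), A.copy()
        Ap[idx] += eps
        Am[idx] -= eps
        G[idx] = (f(Ap) - f(Am)) / (2 * eps)
    return G

rng = np.random.default_rng(1)
freq = rng.integers(1, 20, size=(5, 5)).astype(float)
freq /= freq.sum()  # synthetic transition frequencies

A = np.zeros((5, 5))
for _ in range(200):  # plain gradient descent on the unconstrained A
    A -= 0.1 * num_grad(lambda A_: neg_log_lik(A_, freq), A)

P_hat = sinkhorn(A)  # estimated (approximately) doubly stochastic matrix
```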