9
$\begingroup$

I have $n$ data blocks and $k$ parity blocks distributed across $m$ boxes where each box can contain atmost $b$ blocks. Each parity block is Ex-or of some data blocks (for ease of understanding we can assume each data/parity block as a single bit) and each data block is involved in atleast one Ex-or operation to get some parity block. Each data block can be involved in more than one parity block operation, infact will be as that will increase the fault tolerance of the code. So, we have $k$ parity equations as follows:

$$\begin{aligned} P_1 &= D_{11} \oplus D_{12} \oplus \cdots \\ \;\vdots \\ P_k &= D_{k1} \oplus D_{k2} \oplus \cdots \end{aligned}$$

Now if some data or parity block fails and we cannot recover the lost data blocks from the non-failed data/parity blocks we call this situation a "dataloss" situation. Now, I will give a simple example to explain this. Assume that we have $4$ data blocks and $3$ parity blocks where the parity equations are as follows: $$\begin{aligned} P_1 &= D_{1} \oplus D_{3} \oplus D_{4}\ \\ P_2 &= D_{1} \oplus D_{2} \oplus D_{3}\ \\ P_3 &= D_{2} \oplus D_{3} \oplus D_{4} \end{aligned}$$

We represent them by a matrix called generator matrix which is a $4$x$7$ matrix which is as follows:

$$ G=\left(\begin{array}{ccccccc} 1&0&0&0&1&1&0\\ 0&1&0&0&0&1&1\\ 0&0&1&0&1&1&1\\ 0&0&0&1&1&0&1 \end{array}\right) $$

Now, suppose $D1$,$D2$,$D3$ fails and we want to check whether that causes dataloss. So, we remove the columns corresponding to $D1$,$D2$,$D3$ from G and call the matrix we get "recovery matrix" which is

$$ G'=\left(\begin{array}{cccc} 0&1&1&0\\ 0&0&1&1\\ 0&1&1&1\\ 1&1&0&1 \end{array}\right) $$ here. Now, we check the rank of $G'$. If this is less than $n$ then we have $dataloss$, otherwise not. Here $rank$(G')=4 and $n$=4 and we don't have any $dataloss$. Basically, if we Ex-or $P1$,$P2$,$P3$ then we can recover $D3$, from that we can recover $D1$,$D2$.

A data/parity block fails if and only if the box containing it fails and if a box fails then all the data and parity blocks inside it fails. Now we want to formulate the following optimization problem:

Given $n$, $k$, $m$ and the $k$ parity equations (we are not allowed to choose which data blocks are xor-ed to form a check block, we are given an erasure code system) we want an assignment of the disk and parity blocks across $m$ boxes such that we can minimise the number cases where a box failure causes "dataloss"; i.e I want to minimise the function $G=\sum_{i=0}^m\;g(i)$, where

$$g(i) = \begin{cases} 1 & \text{if failure of box }i\text{ causes "dataloss"} \\ 0 & \text{otherwise} \end{cases}$$

Now the assignment of disk/parity blocks across $m$ boxes can be represented by a matrix of dimension $(n+k) \times m$, lets call it $B$. The $(i,j)$-th entry of the matrix is $1$ if the $j$-th box contains $i$-th data block ($(i-n)$-th parity block) for $i \le n$ (for $i>n$), otherwise it is $0$. So, the function $G$ is a function of all those matrix entries and the constraints we have are:

  1. each matrix entry is either 0 or 1
  2. sum of the entries in a row is always 1 (i.e each data/parity block is present in exactly one box)
  3. sum of the entries in each column is at least 1 (i.e. no box is empty)
  4. summation of all the matrix entries is $n+k$

Now, the function $G$ is coming as a non-linear function of the matrix entries. Also the domain set of $G$ is set of all $n$-tuples where elements of a tuple are 0 or 1. So, the domain set is not a convex set, so convex optimization techniques can't be applied here. How to solve this optimization problem?

If I refine problem like this: How much of the $n+k$ blocks can be covered using m subsets each of size atmost b then I think it is possible to formulate it as a maximum independent set problem. We will form a graph of n+k vertices where each vertex corresponds to one block(data/parity) but the definition of edges is not clear to me (this is the main problem I am trying to solve). Then the maximum independent set of this graph will give us the maximum subset (of the n+k columns) that can be covered.

Somehow I feel that the problem is NP-Complete.

Hope I have described the problem clearly.

======

Edit (comment from Jyrki Lahtonen): This was also posted at MO. I try to redirect eventual interested MO-posters here, because at the moment I think we have a better description of the question here.

  • 0
    In "Each parity block is Ex-or of some data block", I think you mean "data blocks" in the plural? The singular doesn't seem consistent with what follows. Also, in "each data block is involved in some Ex-or operation to get some parity block", is the implication that each data block is involved in *exactly* one parity block operation?2011-08-14
  • 0
    I think that you should tell a little bit more: How many blocks fit into a single box? Are we allowed to try to be clever in choosing which data blocks are XORed to form a check block? (that has an effect on the sets of blocks that we need to have in order to recover a data block) If I understood it correctly, you are optimizing a code for distributed storage. Then your luck is in, because P. Vijay Kumar from IISc, Bangalore, is one of the leading experts of the field (AFAIK). You may also benefit from a study of ideas underlying [fountain codes](http://en.wikipedia.org/wiki/Fountain_code)?2011-08-14
  • 0
    @all Thanks for yor reply. I forgot to mention one constraint that "each box can contain at most b blocks".2011-08-14
  • 0
    @ joriki: each data block can be involved in more than one parity block operation, infact will be as that will increase the fault tolerance of the code2011-08-14
  • 0
    @Jyrki: no we are not allowed to choose which data blocks are xor-ed to form a check block. We are given an erasure code system and we want to optimize the configuration for that code. So, the code is fixed and we want to find which configuration will give me maximum reliability rather than what you are saying.2011-08-14
  • 0
    @Prasenjit: Ok, so you want to find the best way to split the bits of the erasure code into boxes? Is that the question?2011-08-14
  • 0
    @ Jyrki: yes, you are right2011-08-14
  • 0
    @Prasenjit: Is data recovery by Gaussian elimination (see my toy example) ok, or would you insist on a converging message passing algorithm on the Tanner graph or something like that?2011-08-14
  • 0
    @Jyrki: data recovery is by Gaussian elimination because the decoding algorithm I know for a Tanner graph is not giving correct result all time.2011-08-14
  • 0
    @Jyrki: I don't have the required reputation to talk on chat, let's talk here.2011-08-14
  • 0
    @all: due to the problem of solving this optimization problem I think that it is better to search for some algorithm/approximation algorithm that will give me good result.2011-08-15
  • 0
    It seems that the minimization problem is NP-complete even if we assume that the B matrix entries are in [0,1] rather than in {0,1}2011-08-23

3 Answers 3