2
$\begingroup$

Suppose, $G$ is a $k \times n$ binary matrix with $\operatorname{rank}(G) = k$. The first $k$ columns of $G$ are linearly independent and the next $n-k$ columns are linear combinations of the first $k$ columns.

$G$ matrix can be a generator matrix of a linear code where the fist $k$ columns are the $k \times k$ identity matrix and the next $n-k$ column contain the parity bits and these columns are linear combinations if the first $k$ columns.

I am choosing $u$ number of columns randomly from $G$ to form the matrix $G_u$, where $u<=k$.

I want to find the probability that $\operatorname{rank}(G_u) = u$ for any random sub-matrix.

Really appreciate any suggestions to solve this problem.

Thanks in advance.

  • 0
    I don't think there's enough information to give an answer. For example, if the last $n-k$ columns are all zeros then the probability is very small; for a different matrix of the same size, the probability could be 1.2012-04-03
  • 0
    I am actually talking about a generator matrix of a linear channel code in which the first k columns are the identity matrix and the next n-k columns contain the parity bits. G = [ I | P ]. I have updated my question as well. Thanks for your comment.2012-04-03
  • 1
    I don't know what restrictions there are on $P$, but I suspect it's still the case that even if you fix $k,n,u$ the answer will depend on $P$. Take the case where $n-k=1$; the answer will (for some values of $u$) depend on how many ones there are in $P$.2012-04-03
  • 0
    Let's say P does not contain any all zero column vectors. Each column of P will be a linear combination of any of the k columns. I do not understand how the answer will depend on the number of ones in P. Thanks again...2012-04-03

1 Answers 1

2

As an example showing, why the answer depends on $P$, consider the matrices ($n=4$, $k=u=2$) $$ G_1=\pmatrix{1&0&1&1\cr0&1&0&0\cr},\quad G_2=\pmatrix{1&0&1&0\cr0&1&0&1},\quad G_3=\pmatrix{1&0&1&1\cr0&1&0&1}. $$ Two columns of $G_1$ will be linearly independent, iff the second column is among them. There the probability of a full rank submatrix is $3/6=1/2$. OTOH two columns of $G_2$ will be linearly independent unless you happened to pick 1st and 3rd or 2nd and 4th, so the probability of full rank is $4/6=2/3$. As the last case, two columns of $G_3$ will be linearly independent iff they are distinct, so the only pair violating that condition is 1st and 3rd. Therefore the probability of full rank has gone up to $5/6$.

It is easy to prove that with this set of parameters: $N=4, k=u=2$ it is impossible that the probability of a full rank submatrix would be equal to $1.$ Probability of full rank can be $1$ only, if the minimum Hamming distance of the dual code is $>u$ (and this is not possible for many choices of $n,k,u$). The way to see this is to view $G$ as a parity check matrix. If some $u$ columns are linearly dependent, then some subset of those columns sums to zero. Putting 1s in the positions of that subset we get a non-zero word of the dual code that has weight $\le u$. If the minimum Hamming distance of the dual code is $>u$, such sets do not exist, and the probability of full rank is equal to $1$.

So it all comes down to finding the minimum Hamming distance of the dual code.

  • 0
    I understand above. Thanks. The probability of rank does depend on P. But for large k and n, when I chose u columns randomly then there is a fair possibility of having Pr(rank(Gu)=u) = 1. I want to find a way to prove this analytically. My simulation results show that this is true.2012-04-03
  • 0
    My simulation results show that for N=100, k =50 and if I run for u = 1 to 50 Pr(rank(Gu)=u)=1 for all u2012-04-03
  • 0
    @anne, I doubt it very much. The dual code also has parameters $n=100,k=50$. I don't remember the bounds on the minimum Hamming distance of such codes, but I very much doubt that it could be as high as $25$. And that would likely require a very specific choice of $G$! But surely you haven't done an exhaustive search of the set of $u$ columns either! The low weight words of the dual code may be relatively rare, and if your simulation has just been lucky. It is very hard to prove by simulation that the probability is exactly $1$ :-)2012-04-03
  • 0
    I select the u number of columns over 100 trials and in each trials the column selection is different. If I select 25 column randomly during each trials and check the rank of that matrix, then the matrix has rank = 25. This is true for any random matrix created in Matlab. When I think of the hamming distance property you state, the generator matrix of the dual code will be sparse and therefore the minimum distance will not be > u for sure.2012-04-03
  • 1
    Let's take your $n=100$, $k=50$ case. Your first 50 columns form an identity matrix. Suppose the 51st column just has two ones in it. Say $u=20$. If your 20 columns include the 51st and include the two columns of the first 50 that have their ones in the same places as the 51st column, you have linear dependence, and rank less than 20. The probability of 20 chosen columns including those three particular ones is low, but it isn't zero.2012-04-03
  • 0
    @anne: You are aware that (here $n=100, u=20$) $${100\choose 20}>10^{20},$$ so 100 samples is not gonna cut it! It probably didn't happen that the dual of your code had a high minimum Hamming distance. The sample was just too small.2012-04-03
  • 0
    I agree with you Gerry Myerson... Do you think there is a way to analytically compute this probability?2012-04-03
  • 0
    If you choose 20 columns from 100, the probability of including any 3 particular columns is 97-choose-17 divided by 100-choose-20, which is 19/2695.2012-04-04
  • 0
    @anne, it is known that the problem of deciding, whether a given code has a non-zero word of Hamming weight below a prescribed bound is NP-hard. That pretty much kills all hope of deciding, whether your probability is exactly one or something less than that. If you would be satisfied with a rough estimate, then that it is easier. The weight distribution of a random linear code follows the binomial distribution very closely. That might enable you to give a ballpark figure. But taking everything into account is still quite a task.2012-04-04