7
$\begingroup$

Suppose we are given $k$ points in $\{0,1\}^n$ (using Hamming distance as metric). Consider the two points that have the smallest distance between them. Does there exist any results bounding this distance? In particular I am interested in an explicit function $f(k,n)$ bounding this distance from above. Another way to ask would be to give a lower bound on how many entries the pair with the smallest distance between them are sharing, or how far away from each other can the closest pair be?

Any reference to such a function or related problems will help.

  • 4
    Well you need to add some precision : are you looking for a probabilistic bound? Asymptotic bound? Because an obvious bound would be $D \ge 1$ (if $D$ is the distance between those two points) which leads me to my next question : what is your metric? The Hamming distance? $\mathbb R^n$'s standard norm? We can't do better than $1$ as a lower bound since it can be attained for some set of $k$ points... (I'm assuming that two points who differ only in one component have distance $1$).2011-07-26
  • 0
    Not quite sure what you want - the minimum distance is at least 1, but it seems that you are thinking about a different problem.2011-07-26
  • 1
    Maybe some motivation on the question would help. Why do you require this bound?2011-07-26
  • 0
    @Patrick: The euclidean distance between two vectors $u,v$ in $\{0,1\}^n$ is indeed different from their hamming distance, but they are connected by a surprisingly tight relation: $d_{euc}(u,v) = \sqrt{d_{ham}(u,v)}$. So, I guess the answers for euclidean distance and standard distance are the same, up to a square root factor.2011-07-26
  • 5
    I don't know if you intended this, but one can also ask how _large_ can the minimum distance be. This is one of the questions underlying [coding theory](http://en.wikipedia.org/wiki/Coding_theory). Did you have something like this in mind?2011-07-26
  • 0
    I felt that the tag (analytic-geometry) was inappropriate, so I went ahead and removed it. Hope that was ok. Also, as Patrick and Mark point out, the question does not make sense as stated (we cannot say anything better than $f(n,k)=1$). I am down-voting it temporarily.2011-07-26
  • 0
    @Srivatsan: “one can also ask how large can the minimum distance be.” The question asks for an (upper) bound of the minimum distance, so the question in your comment is the same as the question being asked here. It is indeed the question in coding theory.2011-07-26
  • 1
    Have you looked at any books on coding theory that give plenty of upper bounds on the size of code of length $n$ and minimum Hamming distance $d$? IIRC J. van Lint's book Introduction to Coding Theory in Springer's GTM series describes several such bounds and methods of obtaining them. I'm supposed to work on something else at this seminar I am, so cannot say more at this time. Will check back later...2011-07-26
  • 0
    @Tsuyoshi It seems that the question has been edited subsequent to my comment. The original question asked for a lower bound (see the comments of Patrick and Mark also in this regard).2011-07-26
  • 0
    @Srivatsan: Ah, I see!2011-07-26
  • 0
    Finally, we have a question. =) Bound the minimal distance above using euclidean norm. ^_^!2011-07-26
  • 0
    Thanks for the edit - it makes more sense now.2011-07-26
  • 0
    @Patrick Da Silva Indeed thinking of the vectors as real vectors to bound the distance is a useful idea, and gives several non-trivial elementary bounds. But the best bounds for this problem seem to exploit the inherent combinatorial structure.2011-08-01

1 Answers 1

4

I think this is a nice question. As mentioned in the comments, this question underlies the theory of error-correcting codes. A number of nice texts are available on this subject, e.g.: MacWilliams & Sloane and van Lint. Some lecture notes, from the CS perspective, are available here and here.

Equip the boolean hypercube $\{0,1\}^n$ with the hamming metric as usual. A "code" is just a set $C$ of points in the hypercube. (More generally, one can also consider "$q$-ary codes", which are just codes for $\{0,1,\ldots, q-1\}^n$.) It's size is $|C|$ and it's minimum distance is $\min_{x,y \in C, x \neq y} d(x,y)$. The question is now to understand the trade-off between the size and the minimum distance of the code. The exact trade-off is still open, but we do know many bounds.

Lower bounds on the size. I think the Gilbert-Varshamov (GV) bound is the only non-trivial lower bound for the boolean case. See also here. One can show this bound by either considering a random code or a greedy packing of codes. (Actually, we can do slightly better; see the Jiang-Vardy paper for details.)

Upper bounds on the size. Here, there a number of interesting bounds. The most prominent are the Singleton bound (which is of interest mainly when $q$ is "large"), Griesmer bound, Hamming bound, Plotkin bound, Johnson & Elias-Bassalygo bound, and finally the McEliece-Rodemich-Rumsey-Welch bounds (there are two MRRW bounds). I cannot find a suitable link or reference, but see the Navon-Samorodnitsky paper for a proof of the first MRRW bound; this paper gives a delightful application of Fourier transforms to codes. The proofs of some of the elementary bounds in the list can be found in the lecture notes linked above also.

While the there has been no improvement in the lower bound on the rate over random codes, the upper bound has been improved several times, until the MRRW bound (1977). For this reason, it is sometimes conjectured that the GV bound is indeed the best possible for binary codes. It is worth pointing out here that we do have codes beating the GV bound for $q \geq 100$. (The $100$ is just a convenient number. I will find and post the exact result and their reference later.)

I regret that my answer has turned out to be just a bibliography of the known bounds, without any explanation. The question was too broad, and I can attempt a better job if the question is made narrow in scope and more focused (for instance, a separate question on only the elementary bounds, or only the MRRW bounds).

  • 0
    My original answer had some comment about the question not being research-level; I removed it since it was out-of-place and unnecessary. (Actually, I misunderstood that the question was posted on SE CSTheory, hence the comment.)2011-07-26
  • 0
    +1: Quite a comprehensive list. For some elementary discussion of these bounds see http://cstheory.stackexchange.com/q/6218/59322011-07-26
  • 1
    The GV can bound be improved for alphabets $q$ that are even powers of prime numbers at least 49. That result depends on relatively deep results from algebraic geometry (modular curves). The game played there is about the number of rational points vs genus of the curve "asymptotic optimality". Once the existence of curves with suitable parameters is shown, then the codes are constructed by evaluating elements of its function field with suitably bounded pole divisors. The Riemann-Roch theorem gives the dimension (=size) of the code.2011-07-26
  • 0
    Thanks a lot Srivatsan for the nice answer for the nice references. I know that the question is very broad and a little wavy.2011-07-27