1
$\begingroup$

What is a good hash function for small nonsingular matrices over a field $\mathbb{F}_p$ for $p$ prime? I'm looking for an integer function which is close to being injective (but not necessarily perfectly injective) and doesn't take too long to compute in Matlab.

Things I've tried include the trace, the 3 diagonal elements (this can be turned into an integer function $A \mapsto A_{11}+pA_{22}+p^2A_{33}+...$), 4 randomly chosen elements, and the 3 column sums. But all of these make my program take an unreasonably long time for groups of order more than a few thousand.

Many thanks for any help with this!

  • 1
    Does the hash value need to be in $\mathbb F_p$?2012-08-28
  • 0
    No, not necessarily. It should be an integer though (which is why I ruled out e.g. sum of eigenvalues).2012-08-28
  • 0
    What exactly does "close to being injective" mean here? IOW how large should/can the values be. For example, there are $(p^2-1)(p^2-p)$ (or close to $p^4$) non-singular 2x2 matrices, so will you allow values up to $p^4$ or is something smaller ok? (for large $p$ nearly all matrices are non-singular, so...)2012-08-28
  • 0
    Oh, and sum of eigenvalues (counted suitably, i.e. with multiplicities as roots of the characteristic polynomial, and in a suitable extension field of $\mathbb{F}_p$) is always the trace, so it is an element of $\mathbb{F}_p$.2012-08-28
  • 0
    @JyrkiLahtonen: values up to $p^4$ should be fine. The primes I'm considering are all less than 15. By 'close to being injective', I mean it doesn't matter if two or three matrices have the same hash function, but not twenty matrices.2012-08-28
  • 0
    The number of non-singular $n\times n$ matrices is $$\prod_{i=0}^{n-1}(p^n-p^i).$$ So it looks like your integers need to be of size close to $p^{n^2-1}$ to be somewhere close to injective (that $p^4$ was for two by two matrices only).2012-08-28
  • 0
    Or in other words, the hash needs to depend on (at least) all but one of the matrix entries. Is that a problem?2012-08-28
  • 0
    For example, if your matrices are 4x4, and you use three column sums $S_1$, $S_2$, $S_3$, then there will be at least $p^3(p-1)$ repetitions, because you can fill in the fourth column in that many ways. In practice there will be much more collisions, because several different columns give rise to same column sums...2012-08-28
  • 0
    @JyrkiLahtonen: in practice I'm looking at subgroups of $GL_d(\mathbb{F}_p)$ of size up to about 15000, each generated by 2 matrices. Does this mean there's a simpler hash function? I don't really mind how many matrix entries it depends on, as long as it's quick to compute!2012-08-28
  • 0
    Oh. I misunderstood. I assumed that you wanted a hash function defined on the group of **all** non-singular matrices. 15000 or so is not too large for a full table. I think that there are relatively efficient algorithms for doing that in Magma or GAP.2012-08-28

1 Answers 1

1

There is a very simple algorithm which gives you a hash function, assuming you store your matrix entries with integers in $\{0,1,...,p-1\}$:

hash = 0 for i=1..n   for j=1..n     hash=p*hash+A[i][j] 

Also please note that the examples you provide are not really good hash functions. There are way too many collisions, which would give serious performance hits if used in e.g. a hash table. The example I provided above is a perfect hash function (i.e. an injective map from the $\mathbb{F}_p$ matrices to $\mathbb{Z}$), so no collisions can exist here (unless $n$ and $p$ are large enough to cause overflow, but even then the probability is very small).

I don't think one can do much better than this, since any good hash would take all matrix entries into account, and this only does one addition and one multiplication per entry. If this is still not working for you, chances are that you don't just need a hash.

  • 0
    Thanks - this is more or less what I ended up using!2012-10-14