The problem is motivated by hashing for data structures in programming, but I am seeking a formal treatment. I hope that studying it will give me insight into the fundamental limitations of hashing, so the purpose is educational.
Let's consider a family of functions $f_i$ applied to a random variable $X$ that takes values in some finite set $P$, with unknown weight (probability mass) function $w$.
$f_i: P \rightarrow Q$
$\Vert P\Vert=n>\Vert Q\Vert =m$ (therefore the $f_i$ are irreversible)
The entropy of $X$ is fixed:
$H(X) = q$
What is the function $f_i$ that maximizes
$\min_w \{H(f_i(X))\}$?
Clarification: the objective is to maximize the minimum entropy of $f_i(X)$, where the minimum is taken over all possible weight functions $w$ for $X$.
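To make the objective concrete, here is a small Python sketch (the sets, the function, and the two weight functions are toy choices of mine, not part of the question): for a fixed $f$ and a fixed input entropy $q$, different weight functions $w$ can leave very different amounts of entropy in $f(X)$, and the inner $\min_w$ is exactly what captures the worst case.

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pushforward(w, f):
    """Distribution of f(X) when X is distributed according to w."""
    out = defaultdict(float)
    for x, p in w.items():
        out[f(x)] += p
    return dict(out)

# Toy instance: P = {0,...,7}, Q = {0,...,3}, f(x) = x mod 4.
f = lambda x: x % 4

# Two weight functions with the same input entropy q = H(X) = 1 bit:
w1 = {0: 0.5, 1: 0.5}   # mass on points that f keeps apart
w2 = {0: 0.5, 4: 0.5}   # mass on points that f collides

h1 = entropy(pushforward(w1, f).values())   # 1 bit: entropy preserved
h2 = entropy(pushforward(w2, f).values())   # 0 bits: entropy destroyed
```

For this $f$ and $q=1$, the adversarial $w_2$ drives $H(f(X))$ to zero, so $\min_w H(f(X)) = 0$ even though $H(X) = 1$.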
To paraphrase: what is the transformation that preserves the most entropy in its output, regardless of the input's distribution (as long as the entropy of the input is some fixed $q$)? The question can also be raised with the ratio
$\min_w \left\{\dfrac{H(f_i(X))}{H(X)}\right\}\qquad(\max)$,
or the difference
$\min_w \{H(f_i(X)) - H(X)\}\qquad(\max)$,
and then $q$ can be permitted to vary. I suspect there are subtleties that I am not trained well enough to see (such as the case $q=0$).
The asymptotic behavior is interesting: when $P$ is much larger than $Q$ and $Q$ is itself large, that is, $\frac n m\to+\infty$ and $m\to+\infty$. In particular, finding a sequence of functions that preserves the entropy completely as $m$ and $n$ grow indefinitely, or at least preserves it up to the trivial upper bound $\min(q, \log_2 m)$, would be of great significance. If no such functions exist, is there a proof of that?
I would be happy with such a result for the continuous case, if that simplifies things.
Also, I am interested in literature that comparatively studies the effect of certain well-known functions (modulo, division, etc.) on the entropy of their input, based on general properties of the input's distribution (not tied to particular distributions).
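While I have not found such a study, the phenomenon is easy to probe numerically. A sketch (distributions and functions are again toy choices of mine) showing that modulo and integer division preserve entropy for complementary families of input distributions:

```python
import math
from collections import defaultdict

def entropy(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pushforward(w, f):
    """Distribution of f(X) when X is distributed according to w."""
    out = defaultdict(float)
    for x, p in w.items():
        out[f(x)] += p
    return dict(out)

# P = {0,...,15} mapped onto Q = {0,...,3} in two ways:
mod4 = lambda x: x % 4    # keeps the low bits
div4 = lambda x: x // 4   # keeps the high bits

low  = {x: 0.25 for x in (0, 1, 2, 3)}    # mass varies only in the low bits
high = {x: 0.25 for x in (0, 4, 8, 12)}   # mass varies only in the high bits

# Both inputs have H(X) = 2 bits, but each function preserves
# the entropy of one family and destroys that of the other:
h_low_mod  = entropy(pushforward(low,  mod4).values())  # 2 bits preserved
h_low_div  = entropy(pushforward(low,  div4).values())  # all entropy lost
h_high_mod = entropy(pushforward(high, mod4).values())  # all entropy lost
h_high_div = entropy(pushforward(high, div4).values())  # 2 bits preserved
```

This is only an illustration on particular distributions, but it suggests why any general statement would have to be phrased in terms of structural properties of $w$ (here, where in the bit representation the mass varies) rather than a single scalar like $q$.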
My background is post-graduate CS level. I am not the most math-savvy, but I have passed introductory courses in probability and statistics and in information theory.
Thanks and Regards,