5
$\begingroup$

Introduce the function $g(n)$ defined for positive integer arguments $n$ as the count of squares of positive integers among the numbers that can be formed by taking some subsequence of the digits of the binary representation of $n$ and interpreting this subsequence as a binary number. All subsequences contribute.

For example, $5 = (101)_2$ and the subsequences of digits are $ (1)_2 = 1, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (11)_2 = 3, (01)_1 = 1, (101)_2 = 5$ and this includes the square "one" three times, so $g(5) = 3.$

When $n = 9 = (1001)_2$, we obtain the following subsequences: $ (1)_2 = 1, (0)_2 = 0, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (10)_2 = 2, (11)_2 = 3, (00)_2 = 0, (01)_2 = 1, (01)_2 = 1$ and $ (100)_2 = 4, (101)_2 = 5, (101)_2 = 5, (001)_2 = 1, (1001)_2 = 9$ and this includes seven squares, so $g(9) = 7.$

The behavior of $g(n)$ is highly erratic, ranging from linear for $n=2^k$ to logarithmic for $n=2^k-1,$ which motivates us to ask the following question. What is the first term of the asymptotic expansion of the average order of $g(n)$? I.e. find the asymptotics of $ \frac{1}{n} \sum_{k=1}^n g(k).$

It might be useful to establish and prove closed form expressions for special values of $g(n)$, e.g. we have $g(2^k) = 2^{k-1}$ and $g(2^k-1) = k$ as pointed out earlier (prove these). Here are the first few terms of $g(n):$ $1, 1, 2, 2, 3, 2, 3, 4, 7, 4, 5, 4, 4, 3, 4, 8, 15, 9, 12, 8, 9, 6, 7, 8, 11$

This problem is not suited to numeric experimentation because $g(n)$ fluctuates so rapidly. For those who want to try anyway here is some C code. (I was going to memoize this the same way I memoized the Maple routine, but never got around to it. This is why the code looks as it does.) C Code for the square indicator sum over binary digit subsequences

  #include  #include  #include  #include   unsigned long long isqrt(unsigned long long n) {   if(!n) return 0;    unsigned long long a = 1, b = n, mid;    do {     mid = (a+b)/2;     if(mid*mid>n){       b = mid;     }     else{       a = mid;     }   } while(b-a>1);    return a; }  unsigned long long g(unsigned long long n) {   int bits[256], len = 0;   while(n>0){     bits[len++] = n%2;     n >>= 1;   }    unsigned long long res = 0, ind;   for(ind=0; ind<(1<0){       if(varind%2){         subseq[pos++] = bits[srcpos];       }       srcpos++;       varind >>= 1;     }      unsigned long long val = 0;     int seqpos = 0;     while(seqpos0 && cs*cs == val){       res++;     }   }    return res; }  unsigned long long gavg(unsigned long long max) {   unsigned long long res = 0, n;    for(n=1; n<=max; n++){     res += g(n);   }    return res; }  int main(int argc, char *argv) {   unsigned long long max;   char *line;    while((line = readline("> ")) != NULL){     if(sscanf(line, "%llu", &max)==1){       unsigned long gsum = gavg(max);        long double asympt = (long double)gsum;       asympt = asympt/(long double)max;        printf("%llu %lle\n", gsum, asympt);     }     free(line);   }    exit(0); } 

This is the Maple code, which is noticeably slower.Maple Code for the square indicator sum

  g := proc(n)         option remember;         local dlist, ind, flind, sseq, len, sel, val, r, res;          dlist := convert(n, base, 2);         len := nops(dlist);          res := 0;         for ind from 0 to 2^len-1 do             sel := convert(ind, base, 2);              sseq := [];             for flind to nops(sel) do                 if sel[flind] = 1 then                    sseq := [op(sseq), dlist[flind]];                 fi;             od;              val := add(sseq[k]*2^(k-1), k=1..nops(sseq));              r := floor(sqrt(val));             if val>0 and r*r = val then                res := res+1;             fi;         od;          res; end;  avg := n -> add(g(k), k=1..n); 
  • 1
    @ErickWong I would prefer seeing a solution and then comment on that so that stackexchange.com won't complain about this turning into a chat. I do have some relevant ideas myself. Publishing yours might enable another reader to help clinch it where the constant is concerned. I suspect if you were to post C code for an efficient computation of $\sum g(n)$ that would really deepen a reader's understanding of the problem and possibly even enable the computation of additional terms in the asymptotic expansion. Thanks! Back tomorrow.2012-12-11

1 Answers 1

1

This is an answer to the (corrected) bounty question, but I would still like to know if there is an asymptotic formula for the average value of $g(k)$ or whether it fluctuates between multiples of $n^b$.

Let $n = 2^m$ be a power of two, and let $F(n)$ be the number of squares found as subsequences of all binary strings of length $m$, and let $G(n)$ be the exact value of $\sum_{k=0}^{n-1} g(k)$. (It is convenient to start counting at $0$ rather than $1$ but it does not affect the asymptotics.)

It is certainly not true that $F(n) = G(n)$, but it is a useful approximation in two ways. On one hand, $F(n) \ge G(n)$, because every binary string of length $m$ corresponds to some integer in $[0,2^m-1]$ but it has strictly more subsequences because of zero-padding. On the other hand, $F(n) \le \sum_{k=n}^{2n-1} g(k) = G(2n) - G(n)$, because adding a leading $1$ to each binary string of length $m$ yields a legitimate binary integer in $[2^m,2^{m+1}-1]$.

The benefit of this simplified model is that it is very easy to estimate $F(n)$. For each length $1 \le \ell \le m$, there are $m\choose\ell$ subsequences of length $\ell$ within any $m$-bit string. For each choice there will be about $\sqrt{2^\ell}$ (more precisely, $\lfloor \sqrt{2^\ell-1}\rfloor+1$) ways to choose a perfect square to fill in that subsequence. Then there are $2^{m-\ell}$ ways to fill in the remaining bits. Therefore

$F(n) \approx \sum_{\ell=1}^m {m\choose\ell} (\sqrt{2})^{\ell} 2^{m-\ell} = (\sqrt{2}+2)^m - 2^m,$ by the binomial formula. Thus $F(n) \sim n^c$ where $c = \log_2(2+\sqrt{2})$, and $\tfrac1n F(n) \sim n^b$ where $b = c-1 = 0.77155\ldots$.

Recall that $G(n) \le F(n) \le G(2n)-G(n)$. Since $F(n) \sim n^c$, the left inequality easily gives $G(n) = O(n^c)$, while the right inequality yields $G(n) \ge G(n)-G(n/2) \ge F(n/2) = \Omega(n^c)$. Therefore $G(n) = \Theta(n^c)$ when $n$ is a power of $2$, and since it is monotonically increasing, this growth rate can be seen to extend to all $n$. The desired function $q(n)$ is thus $n^b$.

  • 0
    @MarkoRiedel I will think about a good way to calculate $g(n)$ when I get a chance. My current impression is that it could be much faster to compute $\sum_{k=1}^n g(k)$ than it is to find $g(1), \ldots, g(n)$.2012-12-17