5
$\begingroup$

Introduce the function $g(n)$ defined for positive integer arguments $n$ as the count of squares of positive integers among the numbers that can be formed by taking some subsequence of the digits of the binary representation of $n$ and interpreting this subsequence as a binary number. All subsequences contribute.

For example, $5 = (101)_2$ and the subsequences of digits are $$ (1)_2 = 1, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (11)_2 = 3, (01)_1 = 1, (101)_2 = 5$$ and this includes the square "one" three times, so $g(5) = 3.$

When $n = 9 = (1001)_2$, we obtain the following subsequences: $$ (1)_2 = 1, (0)_2 = 0, (0)_2 = 0, (1)_2 = 1, (10)_2 = 2, (10)_2 = 2, (11)_2 = 3, (00)_2 = 0, (01)_2 = 1, (01)_2 = 1$$ and $$ (100)_2 = 4, (101)_2 = 5, (101)_2 = 5, (001)_2 = 1, (1001)_2 = 9$$ and this includes seven squares, so $g(9) = 7.$

The behavior of $g(n)$ is highly erratic, ranging from linear for $n=2^k$ to logarithmic for $n=2^k-1,$ which motivates us to ask the following question. What is the first term of the asymptotic expansion of the average order of $g(n)$? I.e. find the asymptotics of $$ \frac{1}{n} \sum_{k=1}^n g(k).$$

It might be useful to establish and prove closed form expressions for special values of $g(n)$, e.g. we have $g(2^k) = 2^{k-1}$ and $g(2^k-1) = k$ as pointed out earlier (prove these). Here are the first few terms of $g(n):$ $$1, 1, 2, 2, 3, 2, 3, 4, 7, 4, 5, 4, 4, 3, 4, 8, 15, 9, 12, 8, 9, 6, 7, 8, 11$$

This problem is not suited to numeric experimentation because $g(n)$ fluctuates so rapidly. For those who want to try anyway here is some C code. (I was going to memoize this the same way I memoized the Maple routine, but never got around to it. This is why the code looks as it does.) C Code for the square indicator sum over binary digit subsequences

  #include  #include  #include  #include   unsigned long long isqrt(unsigned long long n) {   if(!n) return 0;    unsigned long long a = 1, b = n, mid;    do {     mid = (a+b)/2;     if(mid*mid>n){       b = mid;     }     else{       a = mid;     }   } while(b-a>1);    return a; }  unsigned long long g(unsigned long long n) {   int bits[256], len = 0;   while(n>0){     bits[len++] = n%2;     n >>= 1;   }    unsigned long long res = 0, ind;   for(ind=0; ind<(1<0){       if(varind%2){         subseq[pos++] = bits[srcpos];       }       srcpos++;       varind >>= 1;     }      unsigned long long val = 0;     int seqpos = 0;     while(seqpos0 && cs*cs == val){       res++;     }   }    return res; }  unsigned long long gavg(unsigned long long max) {   unsigned long long res = 0, n;    for(n=1; n<=max; n++){     res += g(n);   }    return res; }  int main(int argc, char *argv) {   unsigned long long max;   char *line;    while((line = readline("> ")) != NULL){     if(sscanf(line, "%llu", &max)==1){       unsigned long gsum = gavg(max);        long double asympt = (long double)gsum;       asympt = asympt/(long double)max;        printf("%llu %lle\n", gsum, asympt);     }     free(line);   }    exit(0); } 

This is the Maple code, which is noticeably slower.Maple Code for the square indicator sum

  g := proc(n)         option remember;         local dlist, ind, flind, sseq, len, sel, val, r, res;          dlist := convert(n, base, 2);         len := nops(dlist);          res := 0;         for ind from 0 to 2^len-1 do             sel := convert(ind, base, 2);              sseq := [];             for flind to nops(sel) do                 if sel[flind] = 1 then                    sseq := [op(sseq), dlist[flind]];                 fi;             od;              val := add(sseq[k]*2^(k-1), k=1..nops(sseq));              r := floor(sqrt(val));             if val>0 and r*r = val then                res := res+1;             fi;         od;          res; end;  avg := n -> add(g(k), k=1..n); 
  • 0
    Have you checked the oeis?2012-12-08
  • 0
    Yes I checked the OEIS for $g(n)$ and several non-trivial subsequences, with no luck yet. Back in a couple of hours.2012-12-08
  • 0
    why have you posted a picture of source code.....2012-12-08
  • 0
    Sorry about that, I tried pasting the code as text but I could not get the formatting to look right! I do understand that this inhibits cut and paste. I will readily supply the code if anyone is interested. Better yet, once I master how to post C code and preserve the formatting I will of course use that. I wonder if one can attach a code file the same way as one attaches a picture?2012-12-08
  • 0
    Then again it probably isn't good manners to post too much code to a math website, you want to keep it simple and useful.2012-12-08
  • 0
    @MarkoRiedel code is posted here, especially when it is directly relevant to the problem. Not only would code as text be preferable, it is more portable (others can run it and verify), it can be indexed by search algorithms, and it is has a lifetime longer than the image hosting site.2012-12-10
  • 0
    @Hooked thanks for clarifying. In the meantime I discovered the function sgml-quote that Emacs offers in sgml-mode. I applied it to my code and it quoted all the HTML special characters. I pasted the result into the message, which looks correct now.2012-12-10
  • 1
    To all: I made a mistake in my bounty comment, and now it won't let me fix the comment. Of course $q(n)$ should be such that $\frac{1}{n} \sum_{k=1}^n g(k) \in \theta(q(n))$, the case for $O(q(n))$ is easy.2012-12-10
  • 0
    If there is an average order of magnitude, it would have to be about $n^{0.27155\ldots}$, since that is the average value when summed up to powers of $2$. Here the exponent is equal to $\log_2(1+\sqrt{2})-1$. I suppose this implies $\Theta(n^{0.27155})$, but it would be nice to know if there is a multiplicative constant.2012-12-10
  • 0
    @ErickWong I am not quite sure where you are getting your numeric results. Under the hypothesis that $q(n) = n^b$, with $b$ some constant and up to $2^{14}$, I get a total sum of $15220639$ with $b\sim 7.042518e-01$. for $2^{15}$, I get $52842848$ and $b\sim 7.103470e-01$ and for $2^{17}$, I get $632292251$ and $b\sim 7.197657e-01$. Maybe your implementation is different from mine? Should we compare them? I mentioned earlier $g(n)$ fluctuates so much that even for large values the average does not readily converge. Comparing the values of the sum can help ensure that the implementations match.2012-12-10
  • 0
    For what it's worth when $max = 2^{18}$ I get a total sum of $2180840462$ and a hypothetical exponent $b\sim 7.234576e-01$. Maybe you can use the sum value to check your implementation, but it does not look like your hypothesis holds. The timing on this shows that the computation that we would need for a few more significant digits of $b$ assuming this is the right model, is definitely out of reach. It looks more promising to instead investigate relations between $g(n)$ for different $n$ to speed up the computation that way.2012-12-11
  • 0
    @MarkoRiedel Ah, after going over it more carefully I see I had a misplaced factor of $2$. The exponent should be $\log_2(1+\sqrt{2}) - \tfrac12$, which is about $0.77155$. Glad to see the value of computational verification.2012-12-11
  • 0
    @ErickWong That sounds very promising, I hope you post some of your calculations. I will award the bounty to a solution that includes the multiplicative constant.2012-12-11
  • 0
    @MarkoRiedel I suspect it is possible to compute exact values of $\sum g(n)$ quite efficiently by partitioning $[1,n]$ into dyadic blocks. Each block might be calculated by a more elaborate form of my answer: the idea is it's nicer to work with sums over ranges of $n$ where the first $r$ bits are frozen and the remaining bits are independent. This is a fairly common trick seen in algorithmic programming contests.2012-12-11
  • 1
    @ErickWong I would prefer seeing a solution and then comment on that so that stackexchange.com won't complain about this turning into a chat. I do have some relevant ideas myself. Publishing yours might enable another reader to help clinch it where the constant is concerned. I suspect if you were to post C code for an efficient computation of $\sum g(n)$ that would really deepen a reader's understanding of the problem and possibly even enable the computation of additional terms in the asymptotic expansion. Thanks! Back tomorrow.2012-12-11

1 Answers 1

1

This is an answer to the (corrected) bounty question, but I would still like to know if there is an asymptotic formula for the average value of $g(k)$ or whether it fluctuates between multiples of $n^b$.

Let $n = 2^m$ be a power of two, and let $F(n)$ be the number of squares found as subsequences of all binary strings of length $m$, and let $G(n)$ be the exact value of $\sum_{k=0}^{n-1} g(k)$. (It is convenient to start counting at $0$ rather than $1$ but it does not affect the asymptotics.)

It is certainly not true that $F(n) = G(n)$, but it is a useful approximation in two ways. On one hand, $F(n) \ge G(n)$, because every binary string of length $m$ corresponds to some integer in $[0,2^m-1]$ but it has strictly more subsequences because of zero-padding. On the other hand, $F(n) \le \sum_{k=n}^{2n-1} g(k) = G(2n) - G(n)$, because adding a leading $1$ to each binary string of length $m$ yields a legitimate binary integer in $[2^m,2^{m+1}-1]$.

The benefit of this simplified model is that it is very easy to estimate $F(n)$. For each length $1 \le \ell \le m$, there are $m\choose\ell$ subsequences of length $\ell$ within any $m$-bit string. For each choice there will be about $\sqrt{2^\ell}$ (more precisely, $\lfloor \sqrt{2^\ell-1}\rfloor+1$) ways to choose a perfect square to fill in that subsequence. Then there are $2^{m-\ell}$ ways to fill in the remaining bits. Therefore

$$F(n) \approx \sum_{\ell=1}^m {m\choose\ell} (\sqrt{2})^{\ell} 2^{m-\ell} = (\sqrt{2}+2)^m - 2^m,$$ by the binomial formula. Thus $F(n) \sim n^c$ where $c = \log_2(2+\sqrt{2})$, and $\tfrac1n F(n) \sim n^b$ where $b = c-1 = 0.77155\ldots$.

Recall that $G(n) \le F(n) \le G(2n)-G(n)$. Since $F(n) \sim n^c$, the left inequality easily gives $G(n) = O(n^c)$, while the right inequality yields $G(n) \ge G(n)-G(n/2) \ge F(n/2) = \Omega(n^c)$. Therefore $G(n) = \Theta(n^c)$ when $n$ is a power of $2$, and since it is monotonically increasing, this growth rate can be seen to extend to all $n$. The desired function $q(n)$ is thus $n^b$.

  • 0
    I have studied this response in detail and it appears to be sound, so I will likely award it the bounty. Could you clarify two more things -- when you say "it has more subsequences because of zero padding" do you mean that the squares being counted by $F(n)$ include subsequences that start with zero padding on the left, none of which would contribute to $g(n)$? For the upper bound you consider integers of $m+1$ bits starting with $2^m$. This time there is no loss due to zero padding because of the $1$ on the left? And subsequences including the leftmost bit make it an upper bound?2012-12-13
  • 0
    @MarkoRiedel That's exactly right.2012-12-13
  • 0
    I will award the bounty. I do not think that we have an asymptotic expansion, it looks like it keeps on fluctuating around $n^b.$ As I noted in my original post there are some simple identities for $g(n)$ for certain values of $n$, like $n=2^k.$ I wonder if it would be interesting to establish more of these and perhaps gain some insight into the sequence that way. I suspect this can be done algorithmically.2012-12-13
  • 1
    Essentially the problem is how to compute $g(n)$ for $n$ odd. There are efficient recurrences when $n = q 2^k$, $q$ odd. Let $g_0(n) = g(n)$ and let $g_1(n)$ be like $g(n)$ except that it counts numbers of the form $2m^2$ instead of squares. Then we have $$g_0(q 2^k) = g_0(q) \sum_{r\ge 0} \binom{k}{2r} + g_1(q) \sum_{r\ge 0} \binom{k}{2r+1} $$ and $$g_1(q 2^k) = g_0(q) \sum_{r\ge 0} \binom{k}{2r+1} + g_1(q) \sum_{r\ge 0} \binom{k}{2r}.$$ As pointed out these are very efficient, but there does not appear anything of the sort for odd values.2012-12-14
  • 0
    It is worth noting that the above recurrence holds for bases other than two that are not squares. When the base (call it $d$) is a square, like four, the recurrence becomes $$g(q d^k) = g(q) 2^k$$ (with $q$ not a multiple of $d$). It seems there are many interesting properties here that can be investigated.2012-12-15
  • 0
    The above recurrence actually simplifies to $$ g_0(q2^k) = g_0(q)2^{k-1} + g_1(q)2^{k-1}$$ and $$ g_1(q2^k) = g_0(q)2^{k-1} + g_1(q)2^{k-1}.$$2012-12-16
  • 0
    @MarkoRiedel I will think about a good way to calculate $g(n)$ when I get a chance. My current impression is that it could be much faster to compute $\sum_{k=1}^n g(k)$ than it is to find $g(1), \ldots, g(n)$.2012-12-17