4
$\begingroup$

If $A$ is a set of natural numbers and $k$ is a natural number greater than 1. We define: $B_k(A) = \{w|w $ is the representation of $x$ in base $k$ for $x\in A\}$.

Note that we do not allow leading $0$s. Find such a set $A$ that $B_2(A)$ is a regular language but $B_3(A)$ is not.

My insights: I wanted to try the usual approach using the pumping lemma. The problem is, the numbers that we choose to put into the set, should have a pattern in base $3$ that we can use when applying the pumping lemma and they also should have a pattern in base $2$ so that we know $B_2(A)$ is regular. For example I tried multiples of $6$ but I found no patterns.

  • 0
    What are your thoughts?2017-02-19
  • 1
    @ErikWong, I wanted to try trivial sets like $2^n - 1$. But the pumping length of the machine can be large and there is no nice pattern when you convert it to base $3$. So I'm trying to find numbers that have a pattern in both bases.2017-02-19
  • 1
    One way to prove that $\{2^n-1\}$ is not regular in base $3$ is to look at the **lengths** of the words in the language. The fact that $(\ln 2)/(\ln 3)$ is irrational comes into play with this approach.2017-02-19
  • 2
    That's actually a very nice idea. You should definitely enrich the question with these insights you already had. It shows that you have actually put thought into the question and it also makes the question sound more interesting than a typical pumping-lemma exercise. Without that additional detail, people are voting to close as off-topic (lacking context).2017-02-19
  • 2
    The multiples of $k$ in base $b$ always form a regular language.2017-02-20
  • 1
    @ErickWong, I thought about what you said, I can't find a relation between regularity and the lengths of the words. Is there one?2017-02-21
  • 0
    @J.Doe Yeah there is, I will write up a different answer that uses this method.2017-02-21

2 Answers 2

3

$\def\<#1>{[#1]_3}$Let $\$ be the number represented by the base-3 numeral $z$. Since the set of numbers $A = \{2^n - 1 \mid n \in \mathbb{N}\}$ is represented in binary by the regular expression $1^*$, if we prove that

$$L = B_3(A) = \{ z \in \{0,1,2\}^* \mid \exists n \in \mathbb{N} \,.\,\ = 2^n-1\} \cap \{1,2\}\cdot \{0,1,2\}^*$$

is not regular we are done.

We are going to use the pumping lemma for regular languages. Suppose $L$ is regular and $z$ is a word in $L$ longer than the pumping length $p$ of $L$. As such, it can be written as $uvw$ so that $|uv| \leq p$, $|v| \geq 1$, and for $i \in \mathbb{N}$, $uv^iw \in L$.

We write:

$$\ = \ 3^{|w|}3^{|v|} + \3^{|w|} + \ \enspace,$$

and also:

$$\ = \ 3^{|w|}3^{i|v|} + \3^{|w|}\Big(\sum_{0 \leq j < i} 3^{j|v|}\Big) + \ \enspace.$$

Subtraction and some algebra yield, for $i \geq 1$,

$$\ - \ = 3^{|w|+|v|} \,\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}\Big(\(3^{|v|}-1) + \\Big) \enspace.$$

Suppose $\ = 2^n-1$; for $m > n$, $2^m - 1 - (2^n - 1) = 2^n(2^{m-n} -1)$. Therefore all the differences $\ - \$ for $i \geq 1$ must be multiples of $2^n = \ + 1$.

However, when $\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}$ is odd (which happens for alternating values of $i$ because it is the summation of terms that are all odd) the largest power of $2$ that is a factor of $\ - \$ is bounded by the value of

$$ \ (3^{|v|} -1) + \ \enspace, $$

which is less than $3^p$. On the other hand, $z$ has at least $p+1$ ternary digits and no leading zeros, which means that $\+1$ is greater than $3^p$. We have reached a contradiction and we have to abandon the assumption that $L$ is regular.

  • 0
    Thank you very much for your elaborate solution.2017-02-21
  • 0
    $\def\<#1>{[#1]_3}$There is a typo in equation $\ - \ = 3^{|w|+|u|} \,\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}\Big(\(3^{|v|}-1) + \\Big)$. It should be $\ - \ = 3^{|w|+|v|} \,\frac{3^{(i-1)|v|} - 1}{3^{|v|}-1}\Big(\(3^{|v|}-1) + \\Big)$ instead.2017-04-08
  • 0
    Thanks for the heads-up and for the careful reading!2017-04-08
2

Here is an argument that doesn't use the pumping lemma in a conventional way. Instead it draws on a simpler version of the ideas surrounding Parikh's theorem. Here's a lemma:

Let $L$ be a regular language and let $S = \{ |w| : w \in L\}$ be the set of possible lengths of words belonging to $L$. Then there exists positive integers $m$, $n$ and a (possibly empty) set $A$ of residues modulo $n$ such that, for all $a > m$, $a \in S$ iff the remainder of $a$ modulo $n$ belongs to $A$.

To see this, note that we can replace every letter in the alphabet with '$a$' without affecting the word lengths, so it suffices to look at an arbitrary DFA for a language over the singleton alphabet $\{a\}$. Then every state has exactly one out-arrow, so (assuming the DFA has the minimal number of states) it consists of a directed path that leads into a cycle (this will be familiar to anyone who's studied Pollard's $\rho$-algorithm or Floyd's algorithm for cycle detection).

The parameter $m$ may be taken as the length of the initial path segment, and $n$ as the cycle length. The set $A$ is naturally determined by which states in the cycle are accepting states.

Aside: In fact this even holds more generally for context-free languages (it's a standard exercise to show that any CFL on a singleton alphabet is also regular).

From the lemma, we can see that $S$ always has a natural density $\lim_{N\to \infty} \frac1N | S \cap [1,N] |$, in fact it is equal to $|A|/n$. In other words, there is a rational number $q$ such that the asymptotic proportion of integers belonging to $S$ is $q$.

Punchline: Now with all that machinery aside (which is not specific to this problem), we have a very easy way to see that $\{4^n - 1\}$ is not regular in base $3$ (sorry, just realized I had to adjust the example slightly). That's because the length of $4^n - 1$ in base $3$ is asymptotically $n (\log 4) / (\log 3)$, so the set of lengths will have natural density equal to $(\log 4) / (\log 3)$. This number is irrational (by unique factorization), which contradicts the above observation. It took a lot of effort to get this far but the final conclusion is very light on calculation and makes for a pretty simple "big picture" summary.