87

Consider the following number.

$1631310734315390891207403279946696528907777175176794464896666909137684785971138$ $2649033004075188224$

This is a $98$-digit decimal number, yet it can be represented as $424^{37}$, which has just $5$ digits.
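This claim is easy to sanity-check in a couple of lines of Python:

```python
# 424**37 should reproduce the 98-digit number quoted above.
n = 424 ** 37
print(len(str(n)))   # 98
print(str(n)[:6])    # 163131 (the leading digits)
```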

Or consider this number:

$1690735149233357049107817709433863585132662626950821430178417760728661153414605$ $2484795771275896190661372675631981127803129649521785142469703500691286617000071$ $8058938908895318046488014239482587502405094704563355293891175819575253800433524$ $5277559791129790156439596789351751130805731546751249418933225268643093524912185$ $5914917866181252548011072665616976069886958296149475308550144566145651839224313$ $3318400757678300223742779393224526956540729201436933362390428757552466287676706$ $382965998179063631507434507871764226500558776264$

This $200$-digit decimal number can be expressed very simply: take the decimal digits of $\log_e 56$, discard the first $6$ digits, and keep the next $200$.
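That construction can be checked with Python's arbitrary-precision decimal module:

```python
from decimal import Decimal, getcontext

getcontext().prec = 230                           # more digits of ln(56) than we need
digits = str(Decimal(56).ln()).replace(".", "")   # "402535169073514923..."
tail = digits[6:206]                              # drop the first 6 digits, keep 200
print(tail[:18])                                  # 169073514923335704
```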

Now the question is: is it theoretically possible to represent any and every huge random number using as few characters as possible?

Also, is there any standard way to reduce such a number mathematically?

  • 0
    but it is looking the same as before (if you see the lines only) 2016-12-22

5 Answers

157

No. The problem is very simple: there are more huge numbers than abbreviated forms. Suppose that I allow you to use the numerals 0-9, the English alphabet, and spaces to describe whatever numbers you want; that's still only $37$ characters. There are only $37^{140}$ expressions you can write down using these characters that are $140$ characters or shorter (the length of a tweet!), so there are only $37^{140}$ numbers that you could conceivably tweet to someone else. In particular, since $\log_{10} 37^{140} \approx 219.5$, this implies that there is at least one number with $220$ digits which is impossible to tweet to someone else without using something other than numerals and English.

All right, I'll be even more generous: I'll let you use all $127$ ASCII characters. That means there are $127^{140}$ possible tweets, some of which mathematically describe some kind of number, and since $\log_{10} 127^{140} \approx 294.5$, there is at least one number with $295$ digits which is impossible to tweet to someone else (again, assuming that Twitter uses ASCII). I could give you the entire Unicode alphabet and you still wouldn't be able to tweet, for example, the kind of prime numbers that public-key cryptosystems use.
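Both digit counts come from the same one-line logarithm computation:

```python
from math import log10

# log10 of the number of possible 140-character tweets over each alphabet
for alphabet in (37, 127):
    print(alphabet, round(140 * log10(alphabet), 1))
# 37 219.5
# 127 294.5
```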

These kinds of questions are studied in computer science and information theory; you will find the term Kolmogorov complexity tossed around, as well as the term entropy.

  • 0
    Thanks, makes sense now. I wasn't thinking of it in terms of the pigeonhole principle before, but it seems fairly obvious now. 2017-03-16
28

It follows from the pigeonhole principle that any lossless compression algorithm that decreases the size of at least one number must increase the size of another; otherwise the compression wouldn't be invertible. For deeper results, consult any textbook on algorithmic information theory (Kolmogorov complexity), e.g. Li and Vitányi, An Introduction to Kolmogorov Complexity and Its Applications.
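The counting behind this is elementary: there are $2^n$ bitstrings of length $n$, but only $2^n - 1$ bitstrings of length strictly less than $n$, so some length-$n$ input has no shorter image. In Python:

```python
# Any injective compressor must map the 2**n inputs of length n into the
# pool of strings of length < n, which is one string too small.
n = 10
inputs = 2 ** n                          # bitstrings of length exactly n
shorter = sum(2 ** k for k in range(n))  # bitstrings of length 0 .. n-1
print(inputs, shorter)                   # 1024 1023
```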

5

As in other answers, at the very least a pigeon-hole principle shows that "most" numbers cannot be "described" in fewer characters than their decimal (or whatever-you-pick) expression...

To my mind, the relevant developed body of ideas is "Solomonoff-Chaitin-Kolmogorov" complexity, which is about descriptional (or program-length) complexity, rather than run-time "complexity".

This does remind one of the "smallest boring number" pseudo-paradox, which argues that the first non-interesting number has some interest because it is the first...

The bottom line is that "size" of numbers is not reliably comparable to "descriptional complexity", in general, although most large-ish numbers are also descriptionally complex, by pigeon-hole.

There is a book by Li and Vitanyi which is not only authoritative, but fascinatingly readable...

  • 0
    Solomonoff (Ray). 2013-02-15
1

The way I understand this question, the answer is "yes." Simply write the number in, say, hexadecimal notation, and it will always be shorter than in decimal (provided the number is at least $100000$). If you are looking for the shortest representation, however, that is well-defined only if you specify exactly what counts as a valid representation of a number. After all, every number you come up with can technically be represented in just one digit if you choose a sufficiently large base (at least one larger than the number itself).
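Comparing the two lengths directly in Python (using the $98$-digit number from the question as the third example):

```python
# Decimal length vs. hexadecimal length
for n in (99999, 100000, 424 ** 37):
    print(len(str(n)), len(format(n, "x")))
# 5 5    (99999 = 0x1869f: a tie)
# 6 5    (100000 = 0x186a0: strictly shorter)
# 98 81  (hex saves 17 digits)
```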

About the standard way to reduce it: I'd say the decimal system is the standard way of reducing numbers from just writing the appropriate number of dashes. :-) In the sense pointed out by Bill Dubuque, there is no representation that is more space-efficient in principle, that is, if you transfer the problem to a more abstract domain. If, instead, you are looking for the shortest representation using a given set of mathematical symbols, then no, there is no standard way to find it. I also suspect no algorithm can do better than brute-force search (that is, first check all numbers representable using $1$ character, then those representable in $2$ characters, and so on). In any case, nobody would write a number as "the digits of $\log_e 56$ after discarding the first $6$ digits and keeping the next $200$," so no conceivable set of standard symbols will account for that anyway.

  • 0
    The Turing machine description is compact in the sense that one can easily replace $200$ by any positive integer. The resulting, say, $400{,}000$-digit number can be described by a Turing machine much shorter than $400{,}000$ characters long, but the point is that this can't always be done. As for your objection to J. Mangaldan's answer, the more important point here is that we can make a choice of encoding relative to which a shortest representation exists, and regardless of this choice of encoding the arguments given in the answers continue to hold. 2010-08-28
1

It follows that, for any singular or batched compression scheme, the pigeonhole arguments in the other answers are correct: not every number can be shortened.

However, it equally follows that, efficiency aside, for any particular number that can be reduced to some relation in the infinite space of all possible relations, the answer is a solid yes. For example, we understand your notation to denote a point $\log_e 56$ units away from $0$ on the real line, and we might exploit this property to build a similar relational partitioning scheme expressed in fewer bytes of local data.

The difference is, in the case of an infinite heuristic space, you're hiding rules of greater complexity than your input data behind the scenes. This can work very well in limited computational contexts (for example, in situations where it's more expensive to send a message over the wire than to perform a computation locally), but in the domain of pure expression it is decidedly less efficient.

Putting this concept into practice, you can always get away with elegant, tweet-length expressions of your data, provided the reference is not contained within the text body. The right approach is to provide your simplification to establish context and supply a pointer (i.e., in the form of a link) to the reference. As long as it can reasonably be assumed that the space for such pointers is unbounded (i.e., by providing an ever-widening character space), you'll always be able to provide a suitably terse expression.

  • 1
    This is something that was definitely on my mind. The key is "hiding rules of greater complexity than your input data behind the scenes." For example, I can tweet, "Distance from where I am standing to the Statue of Liberty." The answer varies based on my current location. 2011-11-05