
I was trying to convert a file size from bytes to a human-readable value and found an interesting solution. I will give it in PHP, with an explanation.

    function bytesConvert($size) {
        $base = log($size) / log(1024);
        $suffix = array('', 'Kb', 'Mb', 'Gb', 'Tb');

        return round(pow(1024, $base - floor($base)), 2) . $suffix[floor($base)];
    }

Where:

  • log - Natural logarithm;
  • round - Rounds a float with specified precision;
  • pow - Exponential expression; Returns base raised to the power of exp;
  • floor - Round fractions down;

I use this solution and it works, but it feels like cargo cult to me: I understand every single operation in this function, but I have no clue why it works. I would be very grateful if somebody could explain it.

2 Answers

2

In points:

  • The prefixes Kilo, Mega, Giga, Tera, etc. stand for $1000^1$, $1000^2$, $1000^3$, $1000^4$, etc.
    • Strictly speaking, the given algorithm is wrong: it uses the powers $1024^i$ and should therefore use the prefixes Kibi, Mebi, Gibi, etc. (see Wikipedia);
    • On the other hand, those are not very popular, and using them might be counter-productive.
    • Still, it is good to be aware of the issue ;-)
  • The logarithm $\log_b n$ is the number $k$ such that $b^k = n$;
    • So $\log_{1000}1000 = 1$, $\log_{1000}1000000 = 2$, and so on,
    • Also, the logarithm function is continuous and monotonic, hence $1 < \log_{1000}1234 < 2 < \log_{1000}7654321 < \log_{1000}87654321 < 3$
      \begin{align*} \log_{1000}1234 &\approx 1.030438\ldots & 1000^1 &= 1000 & 1000^{0.030438\ldots} &= 1.234 \\ \log_{1000}7654321 &\approx 2.29463\ldots & 1000^2 &= 1000000 & 1000^{0.29463\ldots} &= 7.654321 \end{align*}
  • So, by separating the integral and fractional parts of $\log_{1000}n$, you know how many triples of zeros you should append (that is, which suffix to use)...

    • {'', 'K', 'M', 'G', 'T'}[$\lfloor\log_{1000}42\rfloor$] == ''

    • {'', 'K', 'M', 'G', 'T'}[$\lfloor\log_{1000}1234\rfloor$] == 'K'

    • {'', 'K', 'M', 'G', 'T'}[$\lfloor\log_{1000}7654321\rfloor$] == 'M'

    • {'', 'K', 'M', 'G', 'T'}[$\lfloor\log_{1000}1234567890\rfloor$] == 'G'

  • ...and what the leading number (the value in front of the suffix) should be:

    • $\log_{1000}42 - 0 = \log_{1000}42 - \lfloor\log_{1000}42\rfloor \approx 0.541\ldots$, $\quad1000^{0.541\ldots} = 42$
    • $\log_{1000}1234 - 1 \approx 0.030438\ldots$, $\quad1000^{0.030438\ldots} = 1.234$
  • This all works for any base like 10, 16, 1000, 1024, whatever, and so your algorithm follows ;-)
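
The points above can be tried out directly. What follows is only a minimal sketch: `splitMagnitude` is a hypothetical helper name (it is not part of the original function), and base 1000 with the suffixes `K`, `M`, `G`, `T` is assumed, as in the examples above.

```php
<?php
// Hypothetical helper (not from the original post): splits log_base(n)
// into its integer part (which suffix to use) and the leading number.
function splitMagnitude(float $n, float $base): array
{
    $exponent = log($n) / log($base);            // log_base(n) via change of base
    $index    = (int) floor($exponent);          // integral part: picks the suffix
    $leading  = pow($base, $exponent - $index);  // fractional part: 1 <= leading < base
    return [$index, $leading];
}

$suffixes = ['', 'K', 'M', 'G', 'T'];
foreach ([42, 1234, 7654321] as $n) {
    [$index, $leading] = splitMagnitude($n, 1000);
    echo $n, ' -> ', round($leading, 3), $suffixes[$index], PHP_EOL;
}
```

For 1234 the integral part is 1 (suffix `K`) and the leading number is about 1.234, matching the bullets above. Passing 1024 as the base instead of 1000 gives the decomposition that the function in the question relies on.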

I'm sorry if the explanation is too basic for you, but I didn't know your background; I hope it helps ;-)

  • @dtldarek Your explanation works well for me. Only one thing: I always thought that the suffixes Kilo, Mega, Giga, Tera, etc. correspond to 1024^i, and Kibi, Mebi, Gibi to 1000^i. (2012-11-30)
0

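The first line gives $\text{base}=\log_{1024} \text{size}$, the power to which $1024$ must be raised to give the size. The suffix is chosen by $\lfloor \text{base} \rfloor$, because each time base increases by $1$ we want the next suffix. For example, if $2 \lt \text{base} \lt 3$, the size lies between $1024^2$ and $1024^3$, so it is in the range of Megabytes. The round part just divides the size by the size of a Megabyte (or whatever unit we are using), since $1024^{\text{base} - \lfloor \text{base} \rfloor} = \text{size} / 1024^{\lfloor \text{base} \rfloor}$, and then rounds the result to two decimals.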
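
To make the "it just divides" reading concrete, here is a hedged sketch that performs the division explicitly instead of raising 1024 to the fractional part. The name `formatBytesByDivision` is hypothetical, the binary suffixes `KiB`, `MiB`, ... are swapped in for the original `Kb`, `Mb`, ..., and the guards for sizes below 1 and beyond the suffix table are my own assumptions; none of this is in the function from the question.

```php
<?php
// Sketch only: formatBytesByDivision is a hypothetical name, and the
// binary suffixes replace the 'Kb', 'Mb', ... of the original function.
function formatBytesByDivision(int $size): string
{
    $suffixes = ['B', 'KiB', 'MiB', 'GiB', 'TiB'];
    if ($size < 1) {
        return '0B'; // log(0) is undefined, so handle it separately (assumption)
    }
    $index = (int) floor(log($size) / log(1024)); // floor(log_1024(size))
    $index = min($index, count($suffixes) - 1);   // stay inside the suffix table
    // pow(1024, base - floor(base)) equals size / 1024^floor(base)
    return round($size / pow(1024, $index), 2) . $suffixes[$index];
}

echo formatBytesByDivision(5000000), PHP_EOL; // 5000000 / 1024^2, rounded
```

For 5000000 bytes the logarithm is about 2.23, so the index is 2 and the value is 5000000 / 1024^2, which rounds to 4.77. Sizes that are exact powers of 1024 sit right on a boundary of `floor`, so floating-point rounding in `log` could in principle pick the neighbouring suffix; that case is worth testing separately before relying on either version.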