
You are given a large number of LEGO blocks of size 1. You can build blocks of other sizes using smaller blocks. For example, you can build a block of size 2 using two size-1 blocks, and then build a block of size 6 using three size-2 blocks. However, you can't mix blocks of different sizes to build a block, nor can you use a fraction of a block. For instance, to build a block of size 3, you can't use a block of size 2 and a block of size 1, nor can you use one and a half blocks of size 2. You must use three blocks of size 1.

You are asked to build n blocks, each of which has a different size greater than 1. A block of size 1 costs \$1. Once you have built a block of a larger size, using one costs only \$1, regardless of its size. For example, it costs 1 * 5 = \$5 to build a block of size 5 using five blocks of size 1. Now that you have a block of size 5, you can use copies of it to build an even larger block, say a block of size 10, using two of them for a cost of 2 * 1 = \$2.

The question is to find an algorithm to build all these n blocks with the least amount of money.

Example 1: You're asked to build a block of size 7 and a block of size 10.

Solution: Build a block of size 7 using seven blocks of size 1 for \$7. Build a block of size 5 using five size-1 blocks for \$5 and build a size-10 block with two of these for \$2. The total cost will be 7 + 5 + 2 = \$14.

Example 2: You're asked to build a block of size 10 and a block of size 20.

Solution: First build a block of size 5 with five blocks of size 1 for \$5. Then use two size-5 blocks to build a block of size 10 for \$2 and use two size-10 blocks to build a block of size 20 for a total cost of 5 + 2 + 2 = \$9, which is much cheaper than building each from size-1 blocks, which would cost 10 + 20 = \$30.

Example 3: You're asked to build a block of size 6, a block of size 40, and a block of size 60.

Solution: Build a size-3 block with three blocks of size 1 for a cost of \$3 and build a block of size 6 with two of these for \$2. Build a block of size 4 for \$4 and use five of these to build a size-20 block for \$5. Finally, use two size-20 blocks to build a size-40 block and three to build a size-60 block for \$2 and \$3, respectively. The total cost then will be 3 + 2 + 4 + 5 + 2 + 3 = \$19.

  • @Henning Makholm: Could this be reduced to the vertex cover problem? Vertices = primes, edges = we build a block of size $\text{prime}_1 \cdot \text{prime}_2$? (2012-02-03)

2 Answers


As Patrick87 points out, $\forall m,n\ge 2 : mn \ge m+n$, so we always want to construct using prime factors.

Denote the target block sizes as $\{t_i\}$. We can express the construction costs as a directed graph with vertices corresponding to $\mathbb{Z}^+$ and edges $n\to pn$ with weight $p$ for all $n$ and all prime $p$. The task is to find the connected subgraph containing $\{t_i\} \cup \{1\}$ of least total weight.

We can make the following observations:

  1. If there is only one target, $n$, the optimal solution is $\text{sopfr}(n)$ where sopfr (A001414) is the sum of prime factors with repetition. In fact we can claim a stronger property: any path from $1$ to $n$ will have total weight $\text{sopfr}(n)$, differing only in the order of the primes.

  2. If the targets can be partitioned into $\{a_j\}, \{b_k\}$ such that $\forall j,k : \gcd(a_j, b_k) = 1$ then the problem can be split into independent subproblems for the $\{a_j\}$ and $\{b_k\}$. Proof: if a vertex other than $1$ is included in the optimal subgraph for the $\{a_j\}$ then it is not a factor of any of the $\{b_k\}$, and hence were it in the optimal subgraph for the $\{b_k\}$ it could be removed, contradicting the optimality.

  3. If there is a common divisor $d$ of all the $\{t_i\}$ then we can reduce to the problem with targets $\left\{\frac{t_i}{d}\right\}$ and add $\text{sopfr}(d)$ to get an optimal solution. As Ross Millikan observes, this is intuitively clear; it's easy to prove when there are only two targets, and when there are more I believe a proof by contradiction is possible, although not a very elegant one.

This is already a complete solution for $|\{t_i\}| \le 2$.
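For concreteness, here is a minimal Python sketch of how observations 1–3 combine for at most two targets (the helper names `sopfr` and `cost_up_to_two` are my own, not anything standard):

```python
from math import gcd

def sopfr(n):
    """Sum of prime factors with repetition (OEIS A001414); sopfr(1) = 0."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + (n if n > 1 else 0)

def cost_up_to_two(targets):
    """Optimal cost for one or two targets: build the gcd once, then extend it to each."""
    if len(targets) == 1:
        return sopfr(targets[0])
    a, b = targets
    d = gcd(a, b)
    return sopfr(d) + sopfr(a // d) + sopfr(b // d)

print(cost_up_to_two([7, 10]))    # 14, as in Example 1 (the gcd is 1, so nothing is shared)
print(cost_up_to_two([10, 20]))   # 9, as in Example 2 (the gcd 10 is built once and reused)
```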

In the non-trivial case with three targets they must be decomposable as $axy, bxz, cyz$ where $x,y,z>1$ and $\gcd(ay,bz) = \gcd(ax,cz) = \gcd(bx, cy) = 1$. Then suppose wlog that we construct the first pairwise GCD, $x$. This leads to total cost $$\begin{aligned}C &= \text{sopfr}(x) + \text{sopfr}(ay) + \text{sopfr}(bz) + \text{sopfr}(cyz) \\ &= \text{sopfr}(x) + 2\,\text{sopfr}(y) + 2\,\text{sopfr}(z) + \text{sopfr}(a) + \text{sopfr}(b) + \text{sopfr}(c)\end{aligned}$$ Since $\text{sopfr}(x) + \text{sopfr}(y) + \text{sopfr}(z)$ is the same whichever GCD we pick, we minimise the cost by paying double for the two smallest, i.e. it is optimal to construct the pairwise GCD with the greatest sopfr in this case. (NB not necessarily the greatest pairwise GCD: consider $\text{sopfr}(8) = 6 < \text{sopfr}(7)$.)
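To see the "greatest sopfr" rule in action, here is a small sketch of my own, using the hypothetical targets $6, 10, 15$ (i.e. $x,y,z = 2,3,5$ and $a=b=c=1$), which prices each plan that builds exactly one pairwise GCD first:

```python
from math import gcd

def sopfr(n):
    """Sum of prime factors with repetition; sopfr(1) = 0."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + (n if n > 1 else 0)

def single_gcd_plans(t1, t2, t3):
    """Cost of each plan that builds one pairwise GCD, extends it to the
    two targets sharing it, and builds the remaining target from scratch."""
    costs = {}
    for a, b, c in [(t1, t2, t3), (t1, t3, t2), (t2, t3, t1)]:
        g = gcd(a, b)
        costs[g] = sopfr(g) + sopfr(a // g) + sopfr(b // g) + sopfr(c)
    return costs

# Pairwise GCDs are 2, 3, 5; the plan built on 5 (the greatest sopfr) is cheapest.
print(single_gcd_plans(6, 10, 15))   # {2: 18, 3: 17, 5: 15}
```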

If you just need a better-than-trivial approximation, the following greedy strategy is better than nothing (a rough code sketch follows the list):

  1. Apply the three observations above, if applicable.
  2. Otherwise find the pair of targets $t_j$ and $t_k$ whose pairwise GCD $d$ has the greatest sopfr; solve for targets $\{t_i \mid i\ne j \wedge i\ne k\} \cup \{d\}$; and add $\text{sopfr}\left(\frac{t_j}{d}\right) + \text{sopfr}\left(\frac{t_k}{d}\right)$.
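Here is a rough Python transcription of step 2 (my own sketch of the heuristic; it makes no attempt at the exact optimum):

```python
from math import gcd

def sopfr(n):
    """Sum of prime factors with repetition; sopfr(1) = 0."""
    total, p = 0, 2
    while p * p <= n:
        while n % p == 0:
            total += p
            n //= p
        p += 1
    return total + (n if n > 1 else 0)

def greedy_cost(targets):
    """Upper bound: repeatedly merge the pair whose pairwise GCD has the
    greatest sopfr, paying to extend that GCD to both targets, then treat
    the GCD itself as a new target."""
    ts = sorted(set(t for t in targets if t > 1))
    cost = 0
    while len(ts) > 1:
        pairs = [(i, j) for i in range(len(ts)) for j in range(i + 1, len(ts))]
        i, j = max(pairs, key=lambda ij: sopfr(gcd(ts[ij[0]], ts[ij[1]])))
        d = gcd(ts[i], ts[j])
        cost += sopfr(ts[i] // d) + sopfr(ts[j] // d)
        ts = [t for k, t in enumerate(ts) if k not in (i, j)] + [d]
    if ts:
        cost += sopfr(ts[0])   # build the last remaining block from scratch
    return cost

print(greedy_cost([7, 10]))    # 14 (Example 1)
print(greedy_cost([10, 20]))   # 9 (Example 2)
```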

But there's a reduction from vertex cover which proves that the problem is NP-complete, so don't hold out much hope for a tractable general solution.


With credit to user946850 for the key idea:

We sketch an argument that the problem is NP-complete by reduction from the vertex cover problem. Since NP is formally a class of decision problems, we must restate the problem in a decision form:

Given target block sizes $\{t_i\}$ and a target cost $T$, is it possible to construct all of the target sizes with total cost not exceeding $T$?

The vertex cover problem is stated as:

Given an undirected graph $G = (V, E)$ and a target cover size $k$, does there exist a subset $C \subseteq V$ such that $|C| \le k$ and $\forall (v_i, v_j)\in E : v_i \in C \vee v_j \in C$?

We can assume wlog that the graph is connected, so (abusing notation by allowing the sets in a scalar context to be interpreted as their sizes) $E = \Omega(V)$ and $E = O(V^2)$.

The reduction strategy we take is to map each $v_i \in V$ to a distinct prime $p_i$ and each edge $(v_i, v_j) \in E$ to a target block size $p_i p_j$. Then any vertex cover $C'$ can be represented as building the primes corresponding to the vertices in $C'$ and from these building the targets. The cost will be the sum of $|C'| + |E|$ non-distinct primes drawn from $\{p_i\}$. If we call the smallest of the primes used $p_{min}$ and the greatest $p_{max}$, we can set a target cost of $(k + E)p_{max}$ and it is sufficient that $(k+E+1)p_{min}$ is greater for the answer given by the reduced problem to be the answer to the original problem.

Therefore we need to find a prime $p_{min}$ such that there are $V$ primes in the half-open interval $[\;p_{min},\;(1+\epsilon)p_{min})$ where $\epsilon^{-1} = k + E$. We also need to identify all $V$ primes. Here we will keep things simple (hence "sketch an argument" rather than "prove") by merely using the simplest form of the prime number theorem. We're going to check $\Theta(\epsilon\; p_{min})$ numbers for primality, and we need $V$ hits, so we want $\frac{p_{min}}{\log p_{min}} = \Theta(\epsilon^{-1} V )$. This will give $p_{min} = O(E^2 V^2)$ (which can be strengthened to $p_{min} = O((EV)^{1+\alpha})$ for any $\alpha > 0$), which is polynomial in the size of the input to the vertex cover problem, that input size being $\Theta(E \log V)$. The number of candidate numbers which must be checked is also polynomially bounded, and since their sizes are polynomially bounded and primality testing is in P the total cost of performing the reduction is polynomial.
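Purely as an illustration of this construction (my own code, using the naive primality test and window search described above, and assuming the graph is connected so that $k + |E| \ge 1$), here is a Python sketch of the reduction:

```python
def is_prime(n):
    if n < 2:
        return False
    p = 2
    while p * p <= n:
        if n % p == 0:
            return False
        p += 1
    return True

def reduce_vertex_cover(vertices, edges, k):
    """Map each vertex to a prime from a window [p_min, (1+eps)*p_min) with
    eps = 1/(k + |E|), map each edge to the product of its endpoints' primes,
    and return the target block sizes plus the cost threshold (k + |E|)*p_max."""
    eps_inv = k + len(edges)
    p_min = 2
    while True:
        if is_prime(p_min):
            window = [n for n in range(p_min, p_min + p_min // eps_inv)
                      if is_prime(n)]
            if len(window) >= len(vertices):
                break
        p_min += 1
    prime_of = dict(zip(vertices, window))        # first |V| primes of the window
    targets = [prime_of[u] * prime_of[v] for u, v in edges]
    p_max = max(prime_of.values())
    return targets, (k + len(edges)) * p_max

# Triangle graph with k = 2 (it does have a cover of size 2).
targets, threshold = reduce_vertex_cover(['a', 'b', 'c'],
                                         [('a', 'b'), ('b', 'c'), ('a', 'c')], 2)
print(targets, threshold)
```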

  • @PeterTaylor: Thank you. I have found the essential arguments in the article on [prime gaps](http://en.wikipedia.org/wiki/Prime_gap#Upper_bounds), section "Upper bounds". To me, this means that the gap is below a certain limit for primes large enough; the process of collecting primes then becomes a fully deterministic process with guaranteed asymptotic run time. (2012-02-07)

Here is what I believe to be an algorithm that should work; it's not very good in the sense of computational complexity, but (if it's correct) at least you have a starting point to optimize (maybe through dynamic programming?) or do something smarter.

The algorithm is based on the following observation: an optimal solution is one for which the cost of building the largest block, plus the cost of any blocks required to build the largest block and all of the other blocks, is minimal. Additionally, the cost of a block of size 1 is 1; the cost of a block of size $n$ is $n$ if $n$ is a prime number; and the cost of a block of size $n$ can be the cost of building a block of size $a$ and replicating it $b$ times, for any $a$ and $b$ such that $ab = n$. Note that, for composite numbers, it never makes sense to build one out of all 1-blocks; this is because if $n = ab$ with $n \ge 4$ and neither $a = 1$ nor $b = 1$, then $a + b \le n$, a statement I leave without proof, but in which I feel fairly confident.

That being said, here's a candidate algorithm:

  1. If no number needs to be built, the answer is that it costs zero and that no further work needs to be done, so the algorithm returns.
  2. Otherwise, start with the biggest number $n_{0}$ you need to build. Determine the set $S_{0}$ of all ordered pairs $(a, b)$ such that $n_{0} = ab$.
  3. If $|S_{0}| = 1$, then $n_{0}$ is one, and the cheapest way to build $n_{0}$ is to use a single block of size 1; solve the problem with this largest number removed, and the overall solution is the solution to the subproblem plus building a single 1-block.
  4. Otherwise, if $|S_{0}| = 2$, then $n_{0}$ is prime, and the cheapest way to build $n_{0}$ is to use $n_{0}$ blocks of size 1; solve the problem with this largest number removed, and the overall solution is the solution to the subproblem plus building an $n_{0}$ block out of 1-blocks.
  5. Otherwise, if $|S_{0}| > 2$, then $n_{0}$ is composite, and we can build it by building a cheaper block and replicating it. Specifically, one of the ordered pairs in $S_{0}$ will give the proper method: building a block of size $a$ and replicating it $b$ times. So, for each ordered pair in $S_{0}$, compute $b$ plus the minimal cost of solving the subproblem where you no longer have to build $n_{0}$ but must build $a$ instead. Once you have done this for all ordered pairs, simply select a pair for which a minimal cost was returned; the solution to the problem is this cost (if you want the construction method, either store it as you solve the subproblem for each ordered pair, or, once a minimal cost is determined, re-solve the subproblem corresponding to a pair with minimal cost and print out the series of decisions). A rough code transcription of these steps follows.
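Here is a direct Python transcription of the steps above (my own sketch; it is exponential as written, and memoising on frozensets of targets would be the obvious "dynamic programming" improvement mentioned earlier). I skip the trivial factorisations $(1, n_0)$ and $(n_0, 1)$, since, as noted above, building a composite out of all 1-blocks is never necessary, and re-adding $n_0$ as a target would recurse forever.

```python
def divisor_pairs(n):
    """All ordered pairs (a, b) with a * b == n."""
    return [(a, n // a) for a in range(1, n + 1) if n % a == 0]

def build_cost(targets):
    """Steps 1-5 above: recurse on the largest outstanding block size."""
    targets = set(targets)
    if not targets:                       # step 1: nothing left to build
        return 0
    n0 = max(targets)
    rest = targets - {n0}
    pairs = divisor_pairs(n0)             # step 2: the set S_0
    if len(pairs) == 1:                   # step 3: n0 == 1, a single unit block
        return 1 + build_cost(rest)
    if len(pairs) == 2:                   # step 4: n0 is prime, all unit blocks
        return n0 + build_cost(rest)
    # step 5: n0 is composite; try each nontrivial factorisation a * b,
    # paying b to replicate a block of size a, which becomes a new target.
    return min(b + build_cost(rest | {a})
               for a, b in pairs if a != 1 and b != 1)

print(build_cost({7, 10}))     # 14 (Example 1)
print(build_cost({10, 20}))    # 9  (Example 2)
```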
  • @Ross, yes, the interesting question is which GCDs to build when you have several targets. (2012-02-04)