3

Any wine can be made up of many components. Each component is basically a bunch of grapes from a specific place and time, so each component has three attributes: vintage, varietal, and appellation.

While grapes come into a winery as single components (e.g. 2016 Russian River Valley Cabernet Sauvignon), a finished wine may be composed of many components after blending.

Given a summary of a finished wine's properties, how can you find a set of components — and their percentage contribution to the finished wine — that add up to that wine?

Here's what we have:

VINTAGE       %           VARIETY           %         APPELLATION               %
2014        98.00         Cab Sauv.       78.95       Russian River Valley    37.02
2015         1.50         Petit Sirah      9.12       Lake County             35.81
2016         0.50         Syrah            7.34       Sonoma Coast            22.79
                          Petit Verdot     4.42       Sonoma                   4.07
                          Sauv. Blanc      0.14       Napa                     0.31
                          Concentrate      0.03

Ideally we would find the smallest set of components, but any set of components that solves this would be acceptable.

What are the answers for this specific data set, and what is the right approach to this problem in general?

  • 0
    Apologies if the tags are off! Any advice on tagging would be appreciated. 2017-01-18
  • 2
    Do you know what the possible inputs are? Or can the inputs be any of the possible combinations (of which there are a lot, already 60 in this small example)? If any combination is allowed, then you're looking at solving 4 linear equations and 3 linear inequalities in 60 unknowns. 2017-01-18
  • 0
    @ian It's all possible combinations, unfortunately. 2017-01-18
  • 1
    My previous comment had some serious counting errors, though I've cleaned those up in my answer. 2017-01-18

2 Answers

2

One solution is to take all $3\times 6 \times 5 = 90$ possible combinations and simply multiply the relevant percentages together to give an overall percentage for each combination. For your example of a 2016 Cabernet Sauvignon from Russian River Valley, that would give $0.50\%\times 78.95\% \times 37.02\% = 0.14613645\%$.
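As a rough illustration of this product rule, here is a minimal Python sketch. The dictionaries are copied from the question's table; the function name `product_blend` is just a label chosen for this example.

```python
from itertools import product

# Marginal percentages copied from the question's table.
vintage = {"2014": 98.00, "2015": 1.50, "2016": 0.50}
variety = {"Cab Sauv.": 78.95, "Petit Sirah": 9.12, "Syrah": 7.34,
           "Petit Verdot": 4.42, "Sauv. Blanc": 0.14, "Concentrate": 0.03}
appellation = {"Russian River Valley": 37.02, "Lake County": 35.81,
               "Sonoma Coast": 22.79, "Sonoma": 4.07, "Napa": 0.31}


def product_blend(vintage, variety, appellation):
    """Return {(vintage, variety, appellation): percent} for every combination.

    Each component's share is the product of its three marginal shares;
    dividing by 100**2 turns a product of three percentages into a percentage.
    """
    return {
        (y, v, a): py * pv * pa / 100.0 ** 2
        for (y, py), (v, pv), (a, pa) in product(
            vintage.items(), variety.items(), appellation.items()
        )
    }


blend = product_blend(vintage, variety, appellation)
print(len(blend))                                            # number of components
print(blend[("2016", "Cab Sauv.", "Russian River Valley")])  # share in percent
print(sum(blend.values()))                                   # total, up to rounding
```

This reproduces every marginal in the table because each column sums to $100\%$, but it uses all $90$ components, so it is far from the smallest set.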

Another solution is to draw from the current vintage, variety, and appellation until one of them is used up, and then move on to the next entry in that column. This gives at most $3 + 6 + 5 - 2 = 12$ combinations. For example, start by assigning $37.02\%$ to 2014 Cabernet Sauvignon from Russian River Valley; that uses up all of the Russian River Valley, so the next appellation is Lake County, and so on, to give something like

VINTAGE VARIETY         APPELLATION               %   NOTE
2014    Cab Sauv.       Russian River Valley    37.02 Russian River Valley used up 
2014    Cab Sauv.       Lake County             35.81 Lake County used up   
2014    Cab Sauv.       Sonoma Coast             6.12 Cab Sauv. used up  
2014    Petit Sirah     Sonoma Coast             9.12 Petit Sirah used up 
2014    Syrah           Sonoma Coast             7.34 Syrah used up 
2014    Petit Verdot    Sonoma Coast             0.21 Sonoma Coast used up 
2014    Petit Verdot    Sonoma                   2.38 2014 used up 
2015    Petit Verdot    Sonoma                   1.50 2015 used up 
2016    Petit Verdot    Sonoma                   0.19 Sonoma used up  
2016    Petit Verdot    Napa                     0.14 Petit Verdot used up 
2016    Sauv. Blanc     Napa                     0.14 Sauv. Blanc used up 
2016    Concentrate     Napa                     0.03 all used up 

If the percentages are random real numbers, you are unlikely to be able to produce a shorter list than this, though many other lists of $12$ are possible and can be generated by reordering the input data. Deciding whether a shorter list exists for a particular set of data may be computationally hard (my guess is that it could be NP-hard).
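The draw-until-used-up procedure described above can be sketched in Python as follows. This is only an illustration: the name `greedy_blend` is made up for this example, amounts are stored as integer hundredths of a percent so the arithmetic is exact, and the sketch assumes every amount is positive and the three columns have the same total.

```python
def greedy_blend(vintage, variety, appellation):
    """Draw from the current vintage, variety and appellation until one runs out.

    Each argument is a list of (name, amount) pairs with amounts in integer
    hundredths of a percent. Assumes positive amounts and equal column totals.
    Returns a list of (vintage, variety, appellation, amount) rows.
    """
    attrs = [list(vintage), list(variety), list(appellation)]
    pos = [0, 0, 0]                      # current entry in each column
    left = [a[0][1] for a in attrs]      # amount left in each current entry
    rows = []
    while True:
        take = min(left)                 # largest draw that overdraws nothing
        rows.append(tuple(attrs[k][pos[k]][0] for k in range(3)) + (take,))
        for k in range(3):
            left[k] -= take
            if left[k] == 0:             # entry used up: move to the next one
                pos[k] += 1
                if pos[k] == len(attrs[k]):
                    return rows          # equal totals: all columns end together
                left[k] = attrs[k][pos[k]][1]


# Data from the question, in hundredths of a percent.
vintages = [("2014", 9800), ("2015", 150), ("2016", 50)]
varieties = [("Cab Sauv.", 7895), ("Petit Sirah", 912), ("Syrah", 734),
             ("Petit Verdot", 442), ("Sauv. Blanc", 14), ("Concentrate", 3)]
appellations = [("Russian River Valley", 3702), ("Lake County", 3581),
                ("Sonoma Coast", 2279), ("Sonoma", 407), ("Napa", 31)]

greedy_rows = greedy_blend(vintages, varieties, appellations)
for year, grape, place, amount in greedy_rows:
    print(f"{year}  {grape:<14}{place:<22}{amount / 100:6.2f}")
```

Every draw except the last uses up at least one entry, and the last draw uses up the final entry of all three columns at once, which is where the bound of $3 + 6 + 5 - 2 = 12$ rows comes from. Reordering the input lists produces other valid lists.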

1

Your problem is essentially to solve $15$ linear equations and $180$ linear inequalities in $90$ unknowns. $181$ of these are easy to understand:

  • The sum of all the proportions adds to $1$ (or $100$, if you prefer percentages, but I will stick with $1$).
  • Each proportion is nonnegative.
  • Each proportion is less than or equal to $1$.

The remaining $14$ equations, say we index them by $i=1,\dots,14$, look like $\sum_{j \in S_i} p_j=b_i$, where $S_i$ is the set of wine combinations with the attribute $i$ (e.g. being 2014), $p_j$ is the proportion of wine $j$, and $b_i$ is the known fraction of the final wine which has attribute $i$.

So for example, taking the lexicographic ordering on the set of wine combinations (vintage first, so the first $30 = 6 \times 5$ combinations are the 2014 ones), the first equation reads $\sum_{j=1}^{30} p_j=0.98$, and represents the fact that $98\%$ of your final wine is from 2014.

The solution to this system is highly non-unique; it has $78$ free parameters. (Three of the equations are redundant, one for each wine property.) You may seek out the most sparse possible solution (i.e. the one with the largest number of zeros), though I don't know how to easily do this.
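The count of free parameters can be checked mechanically. The sketch below builds the $15 \times 90$ coefficient matrix of the equality constraints (ignoring the inequalities) and computes its rank by exact Gaussian elimination in plain Python. The helper name `matrix_rank` and the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

sizes = (3, 6, 5)  # number of vintages, varieties, appellations
# All 90 combinations as index triples, in lexicographic order.
combos = list(product(*(range(n) for n in sizes)))

# Row 0: all proportions sum to 1. Then one row per attribute value,
# with a 1 for every combination that has that value.
equations = [[1] * len(combos)]
for axis, n in enumerate(sizes):
    for value in range(n):
        equations.append([1 if c[axis] == value else 0 for c in combos])


def matrix_rank(matrix):
    """Rank of a matrix via exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            if m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


n_equations = len(equations)
rank = matrix_rank(equations)
free_parameters = len(combos) - rank
print(n_equations, rank, free_parameters)
```

The rank comes out as $12$, i.e. $15$ equations minus the $3$ redundant ones, which leaves $90 - 12 = 78$ free parameters. A rank of $12$ is also consistent with the $12$-row list in the other answer.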