Not long ago someone sent me an email about my SubsetSum problem post, asking how to reconstruct the set of numbers that were used to create the sum. This is pretty straightforward, until you consider scenarios in which there is more than one correct subset and you want a specific one or all of them.

But, first thing’s first. Let’s consider the easy case – we assume there is only one correct result. For example consider the follwing set { 15, 20, 35, 40 } and the sum of 90 (15+35+40). Using the standard version of the algorithm, we’ll receive the table as follows (top row has values instead of indexes to increase readability, also, I skip non-relevant rows:

X 0 15 20 35 40
0 true true true true true
15 false true true true true
20 false false true true true
35 false false true true true
40 false false false false true
50 false false false true true
55 false false false false true
90 false false false false true

And so, to reconstruct values used, we’re going to determine which number was used to create each subset.

we start with our final value of 90, the first column in which we have ‘true’ is the last one, representing the value ’40’. We add 40 to the result set and substract this value from our final sum.And so recursively, we search for numbers until we reach 0. Starting from 90 we’ll find 40, then search for 90-40 = 50, 50 is true for column representing 35, so we’ll look for 50-35 = 15, we’ll find 15 in the second column and will not search for 15-15 = 0 (and we’re done)

The code can look something like this:

 

So what will happen if we want to find out what numbers to use to create 35? It can be either 15+20, or just 35. Algorithm in current form will return answer using as little numbers as possible , so it will be just 35. What to do to make it return both ?

Well, we need to remember in how many ways we can create any number from previous numbers. To do this, we’ll use int[][] instead of boolean[][] and increment the value each time we can find a new way to create a number

So the new table will look like this

X 0 15 20 35 40
0 1 1 1 1 1
15 0 1 1 1 1
20 0 0 1 1 1
35 0 0 1 2 2

We can see from this table that 35 can be created from either 35 (leading to 35-35 = 0 and end of algorithm)  or from 20 (leading to 35-20 = 15).

We’ll need to branch on every such occasion, so it’s easiest to use recursion. It may look something like this:

 

There are two things I believe are worth noticing:

1. It’s relatively easy to apply the same optimizations as in the original post – directly if we have unique sums or more indirectly by creating lists of starting positions if we have multiple subsets giving the same sum.

2.It’s also relatively easy to count all possible subsets without enumerating them (just one line change, instead of table[i][j]++ do table[i][j] += table[i – numbers[j – 1]][j – 1];, table[sum].length – 1] will hold result after algorithm is done)

There is a lot of fun in modifying this to achieve results like ‘subset using least numbers’ or ‘subset minimizing the use of even numbers’, etc. most of which require just small changes. I encourage you to to play with it as a fun exercise.

The Algorithm (the problem)

Yesterday I’ve encountered the following problem – from a set of positive integer numbers, determine all positive integers that can be constructed from any subset of that set.

For example set {1,3,5} can produce the following numbers {1,3,5,4,6,8,9}. In general case it’s a NP complete problem solvable by determining all combinations (this is what I implemented). However for anything larger than ~25 numbers in the set it’s strating to get too long to find the answer. Moreover, a ridiculous set like this {1,1,1,1,1,1,1,1,1,1,1,1…} that can be enumerated by fifth-grader becomes imposible to process.

So of course I went on looking for a smarter solution and quickly enough found this: http://en.wikipedia.org/wiki/Subset_sum_problem

Admittedly, yesterday was a long day, but I was sitting in front of the screen staring at this and couldn’t for the life of me understand how and why it works. So – let me put it out here is somewhat more elaborate but kind of step-by-step way (Once rested I understood it today and even managed to improve it a bit).

This isn’t exactly the problem as stated (find if subset sums to X or find all things subsets sum up to), but the algorithm used is the same.

 

How it works (on example)

Wikipedia states it’s pseudo-polynomial, the reason for it is simply that it doesn’t grow polynomialy with the number of items in the set, but with the range of numbers. So your problem with set like {1,2,3,4} will not take the same amount of time as set {1,2,3,99999999} (we’ll see why)

So – the general idea is to go through all numbers from 1 to sum({set}) (so for set {1,2,3,4} we’ll go through all numbers 1..(1+2+3+4 = 10)) and try to determine if this particular number can be constructed using first 1…X elements of the set. While trying to determine this we’ll construct an array storing the values for each number.

The array to store the results will be of type boolean and sizes [SUM_OF_NUMBERS_IN_SET+1][COUNT_OF_NUMBERS_IN_SET+1]

let’s work on an example – a trivial array of {1,3} the sum of elements is 4, the number of elements is 2.

We’ll produce an empty boolean array that will look like this

X 0 1 2
0 true true true
1 false false false
2 false false false
3 false false false
4 false false false

 

this is just initially set up table, the true,true in the first row indicates, that with no elements, or with only first item or with all items from the set, we can create a subset that sums up to zero (empty set) .

Now the algorithm kicks in – for each number (1…4) we’ll determine if with first 1…2 elements of the set, we can create this number. So – how can we determine that – there are just two simple cases – either we can create this number from a smaller subset (without including current number) or  we can create this number from currently analysed number and we know that the remainder we can create from previously seen numbers. The boolean array will help us keep track of that. For each element, we’ll check if either an element to the left is true (first case) or an element higher up (by the amount of currently analysed element) and adjacent to left is true (second case).

In pseudocode that will be: table[i][j] = table[i][j-1] | table[i-set[j-1]][j-1] where set[j-1]  is the value of element from the set (indexed from zero, hence j-1)

So – how will our example unfold:

row 1:

X 0 1 2
0 true true true
1 false true true
2 false false false
3 false false false
4 false false false

[1][1]  we set to true because table[1-1][0] (that is table[0][0]) is true (second case)
[1][2]  we set to true because table[1][1] is true (first case)

for next iteration, on row indexed by 2, we’ll not set anything to true:

[2][1] we’ll not set to true because neither [2][0] nor [1][0] is true
[2][2] we’ll not set to true because neither [2][1] nor [-1][0] is true

for the third row:

[3][1] we’ll not set to true because neither [3][0] nor [2][0] is true
[3][2] we’ll set to true because [0][1] is true

the last one is interesting :

[4][1] we’ll not set to true, because neither [4][0] nor [3][0] (this is the equivalent of saying, with just the first element from the set, we can’t construct 4)
[4][2] we’ll set to true because [1][1] is set to true (this is the equivalent of saying, with the first two elements, we can construct 4 – take 3, and somehow constructed 1 from first element from the set)

So the final table will look like this:

X 0 1 2
0 true true true
1 false true true
2 false false false
3 false true true
4 false false true

 

The answer to question ‘can we construct X’ from the set will be in table[X][set_size -1] (last element of the row)

Code

The actual code looks like this:

 

And no we can print what values are possible to create from such set:

 

 Improvement

The improvement isn’t that big (not big-O kind of thing), but in practical applications it matters. The main observation is that we only set the value to true at certain index for a number and then we keep it true. So we only need to track at which index we found we can make this number (if). The improved code (memory footprint is way smaller) can go like this:

That’s it! Hope you’ll find this instructive and won’t waste too much time trying to understand how exactly are those indices set and why on a sunday evening.