# Find all distinct subsets that sum to a given number

Posted on

Problem

The function wants to achieve to find all distinct subsets that sum up to an given number.

``````results = []

def rec_subset_sum(current_elements, available_elements):
for i, inspected_element in enumerate(current_elements):
# Get list of all smaller available elements
smaller_elements = [q for q in available_elements if q < inspected_element]
sum_smaller = sum(smaller_elements)
# Check if they can replace the inspected_element
if inspected_element <= sum_smaller:
new_replace_elements = []
r = inspected_element
# Since available_elements are sorted get from the behind of list all elements so long until they sum up to inspected_element
while r != 0:
val = smaller_elements[-1]
new_replace_elements.append(val)
del smaller_elements[-1]
r -= val
next_available_elements = [a for a in available_elements]
# For the next list remove all the new used elements fromt the current available_elements
for a in new_replace_elements:
next_available_elements.remove(a)
# For the next current elements replace the inspected_elements with the new choosen one
next_elements = [a for a in current_elements]
next_available_elements += [next_elements[i]] # if we have replaced it, we can use it again
del next_elements[i]
next_elements += new_replace_elements
results.append(next_elements)
# Start the recursion
rec_subset_sum(sorted(next_elements), sorted(next_available_elements))

mx = [1, 2, 4, 8, 128]
cx = [2, 4, 4, 8, 8, 8, 16, 16, 16, 16, 16, 32, 32, 32, 32, 32, 64, 64, 64, 64, 64]
rec_subset_sum(mx, cx)

print "redundant ones", len(results)
print "this is what we want", len(set(map(tuple,results)))
``````

The function wants to achieve to find all distinct subsets that sum up to `sum([1, 2, 4, 8, 128]) = 143`. It is allowed to create new subsets by taking an element from `mx` and replace that element with elements from `cx`, that have the same value. (Then the replaced element could be used again)

It is assumed that `cx` contains only powers of 2 and `mx` is the binary representation (so basically the representation that uses the biggest powers). So the algorithm basically loops through `mx` checks if one element can be replaced and if yes replaces it in a new `mx`,`cx` (next_elements, next_available_elements) and calls itself recursively. If not it checks the next element in `mx` (current_elements) if it can be replaced.

The function is slow, one of the problems is that it returns many redundant subsets, which are not needed and are filtered out after that. How can the performance be improved?

Larger Testcases:

``````mx = [1, 16, 32, 256, 512, 8192]
cx = [2, 2, 4, 4, 4, 8, 8, 8, 8, 16, 16, 16, 16, 32, 32, 32, 32, 32, 64, 64, 64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 128, 128, 128, 256, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096]

Testcase 2:
mx = [1, 8, 16, 32, 256, 512, 2048, 4096, 65536, 131072, 262144, 524288]
cx = [2, 2, 4, 4, 4, 8, 8, 8, 16, 16, 16, 16, 32, 32, 32, 32, 32, 64, 64, 64, 64, 64, 64, 64, 128, 128, 128, 128, 128, 128, 128, 128, 256, 256, 256, 256, 256, 256, 256, 256, 512, 512, 512, 512, 512, 512, 512, 512, 512, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 2048, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 8192, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 16384, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 32768, 65536, 65536, 65536, 65536, 65536, 65536, 65536, 65536, 65536, 65536, 131072, 131072, 131072, 131072, 131072, 131072, 131072, 131072, 131072, 131072, 262144, 262144, 262144, 262144, 262144, 262144, 262144, 262144, 262144, 262144]
``````

If even larger are needed i will add it.

Solution

1. You wrote about subsets. The wanted number should be probably
`print ("this is what we want", len(set([tuple(sorted(x)) for x in results])))`
2. What’s about subset with 0 changes ‘set(mx)’?
3. Number of subsets can be very large. Do you really need all subsets or only need number of subsets?
4. With first three points the problem could be reduce to known one: How many ways you can make change for amount `sum(mx)` with `mx + cx` coins?