# Get subsets of a set given as a list

Posted on

Problem

This code is meant to create a list of subsets of a given set which is represented as a list.

I’m doing this to improve my style and to improve my knowledge of fundamental algorithms/data structures for an upcoming coding interview.

def subsets(s):
if s == []:
return [s]
sets = [s]
for i in range(0, len(s)):
tmp_subsets = subsets(s[:i] + s[i+1:])
for subset in tmp_subsets:
if subset not in sets:
sets.append(subset)
return sets


I’m also not sure if using the set() data structure would be considered cheating in an interview context.

If there are any ideas about how to improve its performance without significantly compromising on readability I would also be open to that.

Solution

This is unnecessary:

if s == []:
return [s]


You can safely remove it, the program will still work.

This step is not so great:

if subset not in sets:
sets.append(subset)


For two reasons:

• Checking if a list contains some items is an O(n)$O\left(n\right)$$O(n)$ operation
• Comparing sets is not free either

A more efficient solution is to count from 0 until 1 << n, and use the bitmap of the count to decide the elements that are part of a subset.

def subsets(s):
sets = []
for i in range(1 << len(s)):
subset = [s[bit] for bit in range(len(s)) if is_bit_set(i, bit)]
sets.append(subset)
return sets

def is_bit_set(num, bit):
return num & (1 << bit) > 0


That is, in a subset, each element may be either present, or not. So each element has only 2 possible states: in or out. 1 or 0. If we look at the binary representation of numbers from 0 to 2^n -1, where n is the number of elements, for example when n=3, we have:

    cba
0 = 000
1 = 001
2 = 010
3 = 011
4 = 100
5 = 101
6 = 110
7 = 111


There are 8 possible subsets, and the bits represent whether an element is in the subset or not.
This is the idea used by the program:

• The outer loop goes from 0 until 2^n - 1.
• The inner loop goes from 0 until n - 1.
• 1 << bit is 1 shifted to the left bit times.

For example, when i = 3, that corresponds to bits 011.
We loop from 0 until 2, comparing i against 001, 010, and 100.
For these values, the expression i & (1 << bit) will be evaluated as
011 & 001 = 001, 011 & 010 = 010, and 011 & 100 = 000, respectively.
The first two are greater than 0, the last one is not.

No need to reinvent the wheel here.
You can get subsets with length r as tuples of a set s by using itertools.combinations.
Doing this for all possible subset lengths:

def subsets(s):
for cardinality in range(len(s) + 1):
yield from combinations(s, cardinality)


If you want the subsets as sets instead of tuples and within a list you can invoke the method above via:

sub_sets = [set(sub_set) for sub_set in subsets(s)]


@janos Answer is great, but it was easier for me to think about it with one less bitwise operations.

def subsets(s):
"""
:type s: list[]
"""
sets = []
for i in range(2**len(s)):
subset = []
for j in range(len(s)):
if i & (1 << j) > 0:
subset.append(s[j])
sets.append(subset)
return sets


Explanation:

So we know there are $$2n2n2^n$$ subsets in a set where $$nnn$$ is the size of the set. So we loop through all those numbers: for i in range(2**len(s)):.

This gives us a range $$0−2n0−2n0 - 2^n$$ which corresponds to these binary numbers (example of a set of size 3):

0 = 000
1 = 001
2 = 010
3 = 011
4 = 100
5 = 101
6 = 110
7 = 111
...
n - 1 = 0b(n-1)


As you can see, the binary representation is every subset possible where 1 means it is in the set and 0 means it is not.

Now we just need to use these indexes to find out what should be in each set.
(While under the hood the numbers are stored as binary, if you try printing them you’ll see an int.)

So we’ll do a nested for loop: for j in range(len(s)):

Here we do need to do a bitwise operation: i & (1 << j) > 0 where i is the current index of the loop described earlier. j is the index of which element in the second loop we’re at.

1 << j just converts the index to the binary representation. For example, the first element (at index 0): 1 << 0 = 001, or the second 1 << 1 = 010, etc.

The & operator just makes sure the indexes match up. 000 & 101 = 0 but 100 & 101 = 100. So anytime it is greater than 0, it means we’ve come across an element that belongs in our set.

Let’s take the set {a, b, c} and loop through our range. At 0, 0 & anything = 0, so we have the empty set.

Let’s skip to i = 6: 6 & (1 << 0) > 0 returns false because in binary: 110 & 001 = 0. But the second iteration (j = 1) 6 & (1 << 1) > 0 returns true (110 & 010 = 1). And it will for the 3rd element too. Thus giving you the subset {b, c}.

The runtime of this algorithm is $$O(2n)O(2n)O(2^n)$$ if that isn’t clear! Hope this helps!