Recently I’ve encountered a problem that required me to examine all possible partitions of a given set. A set partition is a collection of disjoint, non-empty subsets whose union is the whole set. For example, the set {1,2,3} (which is a partition by itself) can be divided in the following ways:

  • {1,2,3}
  • {1,2}, {3}
  • {1,3}, {2}
  • {1}, {2,3}
  • {1}, {2}, {3}

The problem was how to generate those partitions efficiently. Moreover, I wanted to generate only the partitions that contain a specific number of subsets. For {1,2,3} there is only one partition into 1 subset, only one into 3 subsets, and 3 partitions into 2 subsets. The total number of partitions of a set of a given size is given by the Bell number.
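To sanity-check these counts, here is a minimal Python sketch (not part of the original implementation; the names stirling2 and bell are my own). It counts the partitions into exactly k subsets with the standard recurrence for Stirling numbers of the second kind and sums them to get the Bell number.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Count the partitions of an n-element set into exactly k non-empty subsets."""
    if n == k:
        return 1  # also covers the empty set: S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The last element either forms a subset of its own,
    # or joins one of the k subsets built from the other n - 1 elements.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n: int) -> int:
    """Count all partitions of an n-element set (the Bell number)."""
    return sum(stirling2(n, k) for k in range(n + 1))


if __name__ == "__main__":
    print([stirling2(3, k) for k in (1, 2, 3)], bell(3))
```

For a 3-element set this gives 1, 3 and 1 partitions into 1, 2 and 3 subsets, 5 in total, which matches the list above.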

There is a related concept called restricted growth functions, or restricted growth strings (RGS): a string a[1]…a[n] that starts with a[1] = 0 and satisfies the inequality a[i] <= 1 + max(a[1]…a[i-1]) for i = 2,…,n. The two are connected because each can be expressed by the other. For example, the string 010 maps to the set partition {1,3}, {2}: the first and third digits are zeroes, so the first and third elements of the set belong to subset zero; the second digit is one, so the second element belongs to subset one, and so on.

The string 0121123 expresses the partition {1}, {2,4,5}, {3,6}, {7}.
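The mapping from a string to a partition can be sketched in a few lines of Python. This is only an illustration; the function name rgs_to_partition is my own, and it rejects any input that breaks the RGS inequality.

```python
def rgs_to_partition(rgs, items):
    """Group items into subsets according to the digits of a restricted growth string."""
    if len(rgs) != len(items):
        raise ValueError("rgs and items must have the same length")
    subsets = []
    for digit, item in zip(rgs, items):
        if digit == len(subsets):
            # A digit one above the current maximum opens a new subset.
            subsets.append([item])
        elif 0 <= digit < len(subsets):
            subsets[digit].append(item)
        else:
            raise ValueError("not a restricted growth string")
    return subsets


if __name__ == "__main__":
    print(rgs_to_partition([0, 1, 0], [1, 2, 3]))
    print(rgs_to_partition([0, 1, 2, 1, 1, 2, 3], list(range(1, 8))))
```

The two calls reproduce the examples above: {1,3}, {2} for 010 and {1}, {2,4,5}, {3,6}, {7} for 0121123.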

A naive solution repeats a great deal of work because it does not take the mapping between set partitions and RGS into account.
Generating RGS, on the other hand, is fairly easy with a simple recursion.
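A minimal recursive sketch in Python, assuming digits are plain integers starting at 0 (the function name all_rgs is my own and this is not the listing from the attached file):

```python
def all_rgs(n):
    """Return every restricted growth string of length n, in lexicographic order."""
    results = []

    def extend(prefix, highest):
        if len(prefix) == n:
            results.append(list(prefix))
            return
        # Allowed digits are 0 .. highest + 1, which is exactly the RGS inequality.
        # With highest == -1 (empty prefix) only the digit 0 is allowed.
        for digit in range(highest + 2):
            prefix.append(digit)
            extend(prefix, max(highest, digit))
            prefix.pop()

    extend([], -1)
    return results


if __name__ == "__main__":
    for rgs in all_rgs(3):
        print("".join(map(str, rgs)))
```

For n = 3 this yields five strings, one per partition of {1,2,3}.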

Lazy generation with an iterative approach is somewhat more difficult: it relies on keeping track of the largest value used so far and iterating through all possibilities that satisfy the RGS inequality, just as an integer partitioning algorithm would. I’ll refer you to the attached file for the implementation; let me just mention one thing here that I find interesting.
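To give an idea of what the iterative version involves, here is a minimal Python generator. It is a sketch under my own assumptions rather than the code from the attached file: the name iter_rgs and the helper array prefix_max are hypothetical. It keeps, for every position, the largest digit used to its left, and steps from one string to its lexicographic successor.

```python
def iter_rgs(n):
    """Lazily yield every restricted growth string of length n, without recursion."""
    if n == 0:
        yield []
        return
    digits = [0] * n
    # prefix_max[i] is the largest digit in digits[0..i-1]; prefix_max[0] is unused.
    prefix_max = [0] * n
    while True:
        yield list(digits)
        # Find the rightmost position that can still be incremented
        # without breaking digits[i] <= 1 + prefix_max[i].
        i = n - 1
        while i > 0 and digits[i] > prefix_max[i]:
            i -= 1
        if i == 0:
            return  # the first digit is always 0, so we are done
        digits[i] += 1
        new_max = max(prefix_max[i], digits[i])
        # Reset everything to the right and refresh the running maximum.
        for j in range(i + 1, n):
            digits[j] = 0
            prefix_max[j] = new_max


if __name__ == "__main__":
    for rgs in iter_rgs(3):
        print("".join(map(str, rgs)))
```

Because it is a generator, the caller can stop consuming strings at any point without the whole set ever being built in memory.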

The simple recursive approach generates all possible partitions with any number of subsets, from N one-element subsets to a single N-element subset, which is not exactly what I wanted.

I started out with an implementation that generated all the RGSes iteratively, including those strings that didn’t satisfy the minGroups/maxGroup criteria (it skipped them instead of returning them from the iterator). This worked fine, but it did an awful lot of unnecessary skipping. Generating the partitions of a 14-element set into 3 subsets took 1.5 s.

The first observation I made was that we can quickly determine whether a string will end up with enough groups by examining the highest group used so far. If at any point the number of ‘digits’ left to generate is exactly the desired number of groups minus the number of groups already in use, then the remaining digits are fixed: they start at the current highest + 1 and increase by one at each position. This way we can ensure that we’ll always have the required number of groups.

The second observation concerns the other end of the spectrum, maxGroup: all we need to do is check that the group number we’re assigning does not exceed the maximum number of groups, and if it does, disallow it just as we disallow any digit that would break the RGS inequality.

With those two observations acted upon in the implementation, we generate only the RGSes that satisfy the conditions, in linear time, which reduced the example of partitioning a 14-element set into 3 subsets from 1.5 s to 0.078 s.
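The two observations can be sketched as follows. This is a minimal recursive Python generator, not the iterative class from the attached file; rgs_with_group_bounds and its parameters min_groups and max_groups are hypothetical names, and the sketch assumes n >= 1.

```python
def rgs_with_group_bounds(n, min_groups, max_groups):
    """Yield only the RGSes of length n that use between min_groups and max_groups groups."""
    if n < 1 or min_groups > max_groups or min_groups > n:
        return

    def extend(prefix, used):
        # used = number of groups opened so far (highest digit + 1).
        remaining = n - len(prefix)
        if remaining == 0:
            yield list(prefix)
            return
        if remaining == min_groups - used:
            # First observation: every remaining digit must open a new group,
            # so the tail is forced to be used, used + 1, ...
            yield prefix + list(range(used, used + remaining))
            return
        # Second observation: never assign a group number beyond max_groups - 1.
        highest_allowed = min(used, max_groups - 1)
        for digit in range(highest_allowed + 1):
            prefix.append(digit)
            yield from extend(prefix, max(used, digit + 1))
            prefix.pop()

    yield from extend([], 0)


if __name__ == "__main__":
    for rgs in rgs_with_group_bounds(3, 2, 2):
        print("".join(map(str, rgs)))
```

For n = 3 and exactly 2 groups this produces 001, 010 and 011, the three two-subset partitions of {1,2,3}, and no rejected string is ever completed and thrown away.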

I attach the code for the partition-generating class without detailed explanations (other than the ones above). If you find something difficult to understand (I’ve tried to keep the implementation concise), do let me know in the comments and I’ll be sure to follow up.
