**The Balanced Partition Problem**

The other night, at about 11pm, a friend of mine decided to start asking me for help on his homework that was due that night. I normally just try to provide general hints and not do the problems myself (it’s his homework), when I see an interesting problem I can play around with in Python, it’s tough to ignore.

The problem was essentially something like: ‘you have N buckets which each have a different weight. give an algorithm that separates them into two piles as close to the same weight as possible’. Which is one way to state ‘The Balanced Partition Problem’, and Python is a great language for quickly throwing together and testing algorithms.

First things first: build the different sized ‘buckets’:

buckets = [] for x in xrange(0,40): y = random.randint(2,500) while y in buckets: y = random.randint(2,500) buckets.append(y)

**Attempt #1:**

The first idea that occurred to me was to solve it like you would a simple root solver. This was the code I came up with:

t = time.time() # method 1 suba = list(buckets) subb = list() suma = sum(suba) sumb = sum(subb) diff = abs(suma-sumb) it = 0 while min(suba)*2 < diff: (v,i) = min(izip((abs(x*2-diff) for x in suba) ,count())) v = suba[i] suba.remove(v) subb.append(v) suma -= v sumb += v diff = suma-sumb it += 1 print 'meth1: {0}s'.format(time.time()-t) print '{0} iterations.'.format(it) print 'suma: {0}\nsumb: {1}'.format(suma,sumb) print ' diff: {0}, min ele: {1}'.format(abs(suma-sumb),min(buckets))

The code starts with all buckets in ‘set a’, and each iteration it looks for the element in ‘set a’ that will come closest to reducing the difference to zero and moving it to ‘set b’. This stops when the difference is too small to be fixed by the smallest element in ‘set a’.

This algorithm is very fast (linear), and gives two sets that are pretty close to equal. Unfortunately this doesn’t always produce the best result, as there may be cases where the difference between a single element in one set and a subset of elements in the other set is smaller than the smallest element in either set. This method doesn’t account for these different types of combinations, but for a simple and (very) quick method, it gets pretty close to equal.

**Attempt #2: Brute Force**

How close is method 1? Well, I guess we have to find the best possible solution to figure that out, and if you can’t think of a clever way to do it, there is always the brute force method. Since order doesn’t matter in a sum, we just need to iterate over every possible combination of subsets and pick the one which has the sum closest to half that of the sum of the full set. Since we’re using Python, we can use the combinations function (since 2.6), which will improve our speed a bit:

t = time.time() ##### method 2 best = sum(buckets) target = best/2 for i in xrange(0, len(buckets)): for s in combinations(buckets,i): x = sum(s) if abs(x-target) < abs(best-target): best = x bests = s suba = sorted(bests) subb = sorted(set(buckets)-set(bests)) suma = sum(suba) sumb = sum(subb) print 'meth2: {0}s'.format(time.time()-t) print 'suba: {0}\nsubb: {1}\nsuma: {2}\nsumb: {3}'.format(sorted(suba),sorted(subb),suma,sumb) print ' diff: {0}, min ele: {1}'.format(abs(suma-sumb),min(buckets))

Now I was originally using a set size of 20, which this method does in a couple seconds, but because of the complexity of the combination function, bumping that size up to just 30 takes method 2 over 12 minutes on my desktop (method 1 takes 1.5e-5s). Making the size any larger makes this method unfeasible.

**Attempt #3: A Better Solution**

Before asking me for help, my friend found this clever solution to the problem that’s much better than the brute force one:

http://people.csail.mit.edu/bdean/6.046/dp/dp_4.swf

Pretty nice, but then what do you expect from MIT?

Looking at his method, he says that checking for all possible will take time. This is true, but why check them all? From the recursive equation he gives:

and the initial condition:

The recursive formula can be used to generate all the cases where . Additionally, because I am using Python, I decided to make P a dictionary (actually a dictionary of dictionaries) to keep track of the subsets. Using this method, we need to generate considerably less than subsets. The code is:

t = time.time() ##### method 3 best = sum(buckets) target = best/2 bests = () P = {} P[-1] = {0:()} for i in xrange(len(buckets)): x = buckets[i] P[i] = {} for j in P[i-1].keys(): P[i][j] = P[i-1][j] P[i][j+x] = P[i-1][j] + (x,) if abs((j+x)-target)<abs(best-target): best = j+x bests = P[i][j+x] suba = sorted(bests) subb = sorted(set(buckets)-set(suba)) suma = sum(suba) sumb = sum(subb) print 'meth3: {0}s'.format(time.time()-t) print 'suma: {0}\nsumb: {1}'.format(suma,sumb) print ' diff: {0}, min ele: {1}'.format(abs(suma-sumb),min(buckets))

This method yields the best answer very fast, and the python code could probably be optimized to improve the speed. On my desktop it takes ~0.1s for 30 ‘buckets’ (vs. 12+ minutes for method 2), and still only ~20s for 200 ‘buckets’. It is interesting to note that method 1 produces very close to the best result (usually single different differences for 200 buckets, if not exactly zero) and still takes 5e-3s for 200 buckets, so for an approximation it is quite good.

The final improvement can be made by realizing that since your target is half the sum of the full set, then one of the subsets must be equal to or less than this target, so we can skip computing and checking sums over this value. This two line change cuts the execution time in half:

t = time.time() ##### method 3 best = sum(buckets) target = best/2 bests = () P = {} P[-1] = {0:()} for i in xrange(len(buckets)): x = buckets[i] P[i] = {} for j in P[i-1].keys(): P[i][j] = P[i-1][j] if j+x > target: continue P[i][j+x] = P[i-1][j] + (x,) if abs((j+x)-target)<abs(best-target): best = j+x bests = P[i][j+x] suba = sorted(bests) subb = sorted(set(buckets)-set(suba)) suma = sum(suba) sumb = sum(subb) print 'meth3: {0}s'.format(time.time()-t) print 'suma: {0}\nsumb: {1}'.format(suma,sumb) print ' diff: {0}, min ele: {1}'.format(abs(suma-sumb),min(buckets))

Ok, I know I said the final improvement, but I thought to myself, ‘what if I check if the target is reached early, then bail out’. This requires a little refactoring of the code into a function so that I can use return to break out of it early:

t = time.time() ##### method 3 best = sum(buckets) bests = () def m3(): global best, bests target = best/2 P = {} P[-1] = {0:()} for i in xrange(len(buckets)): x = buckets[i] P[i] = {} for j in P[i-1].keys(): P[i][j] = P[i-1][j] if j+x > target: continue P[i][j+x] = P[i-1][j] + (x,) if abs((j+x)-target)<abs(best-target): best = j+x bests = P[i][j+x] if best == target: return m3() suba = sorted(bests) subb = sorted(set(buckets)-set(suba)) suma = sum(suba) sumb = sum(subb) print 'meth3: {0}s'.format(time.time()-t) print 'suma: {0}\nsumb: {1}'.format(suma,sumb) print ' diff: {0}, min ele: {1}'.format(abs(suma-sumb),min(buckets))

As it happens, after running several tests, it seems that the function reaches the target early quite often, and this change cuts the runtime, on average, down to of what it was.

Thats all I’ve done with it so far. It can probably be improved further, but for a just-for-fun little exercise with Python, it’s pretty interesting (to me at least).

February 7, 2012 at 3:01 am |

excellent work dude! totally loved it! 🙂

May 24, 2012 at 11:07 am |

How would you expand this to a problem to dividing N buckets into multiple piles?

June 27, 2012 at 8:36 am |

Traceback (most recent call last):

(v,i) = min( izip((abs(x*2-diff) for x in suba), count()) )

NameError: name ‘count’ is not defined

June 27, 2012 at 9:22 am |

Kevin: I guess I should have posted the whole file. The imports are:

import random

from itertools import izip, count, combinations

import time

June 27, 2012 at 8:06 pm |

Thanks a lot.

November 6, 2012 at 2:35 am |

I tried method 3 #1 with this test data:

[604,1569,554,1192,963,1096,565,531,1387,462,1899,1934,1752,409,1410,1790,623,1480,165,516,571,1241,747,1048,1585,480,854,145,1785,320,1837,1866,398,1836,1470,771,152,899,1785,1239,1132,438,1899,71,686,242,690,1760,1933,524,386,1831,1724,345,583,621,199,1677,795,106,1440,584,654,142,146,1713,994,1702,376,1656,1648,443,1920,827,207,1348,1492,486,1594,1040,577,280,175,597,1752,1537,940,281,1554,1685,1744,1006,637,680,518,176,1218,315,1360,488,1678,674,1719,761,493,1807,51,1883,944,606,1532,81,1637,296,1941,1809,1456,731,1768,1533,87,1497,1058,1309,657,1032,1403,1193,1856,1518,816,1582,789,318,1988,1929,1272,1926,155,1946,109,1699,264,1067,1067,1424,137,56,869,163,951,101,445,642,1996,1010,585,200,250,550,1172,928,1543,1675,1220,677,1444,1877,797,1651,439,1836,415,354,1375,32,1663,506,792,725,722,895,1671,1955,1718,672,1668,993,1171,1542,1510,1152,1920,1536,213,1327,1046,1623,1448,1162]

meth3: 35.846351862s

suma: 102942

sumb: 98120

diff: 4822, min ele: 32

39520020

Is it ok 🙂

November 7, 2012 at 12:05 am |

Yea, I noticed this bug when I tried to update the code to do 3 buckets (which I should probably get around to posting). The problem isn’t in the method itself, but after I find a solution, I use:

to get all the elements in bests that are not in suba. This obviously fails if there are duplicate entries (sets only hold unique entries), thus subb drops duplicate numbers, making its total smaller than it really is.

I fixed this by doing it the ‘manual way’:

November 7, 2012 at 12:15 am

Yes, I created test data with duplicated numbers. Sorry for not inform you about that.

November 30, 2014 at 10:37 am |

Attempt 3 worked great.

But is it possible to have both lists a and b be of the same size? Not sure what code to change…thought I made a change but it did not work.