I am attempting to write a program for a competition (we're allowed to consult Stack Exchange, as long as I'm not given physical code) that takes in a list of 5000 people's names (distributed fairly evenly from A-Z, but are real names - not random character strings, and each name is unique) and sorts this list of people into "rooms" of exactly 4 people each, with each person being placed in exactly one room only (cannot reuse names, cannot omit names).
Each "solution" consists of 1250 of such rooms (I store it in a 2d array and am using java), and the solution's "score" is defined to be the sum of each room's score, where the room score is the number of common letters held in each of the 4 peoples' names. A given letter is only counted more than once if it appears exactly that many times in each person's name.
I began with algorithms that simply placed the names into random rooms just to test the waters and see what the low-end of scores might look like (around 650). I then tried doing a simple alphabetical sort of the list containing the names (using Collections.sort()), which considerably improved the score (around 3800), but still had quite a bit of room for improvement.
My latest algorithm has been a slightly modified version of simulated annealing, where I begin with the alphabetically sorted list and continuously switch two people from two rooms (by picking two random room numbers and two random indices in the rooms). Inside each outer level iteration, I update the annealing temperature and conduct an inner loop search of 1000 iterations (at the temperature of the outer loop) where I get 1000 random swaps, pick the best one (no temperature-probability selection in the inner loop), and use that in the outer loop, accepting it as the new solution if its score is higher and accepting it with some probability if it is lower.
This SA approach has gotten me a score in the low 6000's, and runs quite quickly, but it seems to have asymptoted. Are there any other algorithms or optimizations I should consider? Should I move away from a stochastic approach and look for construction approaches that make use of the alphabetical nature of the words themselves? Thanks!