Problem
Overview
I’ve created an asynchronous and recursive method of generating word squares, of any size (known as the order of the square), from a known word list. If you’re unfamiliar with word squares:
A word square is a special type of acrostic. It consists of a set of words written out in a square grid, such that the same words can be read both horizontally and vertically. The number of words, which is equal to the number of letters in each word, is known as the “order” of the square. For example, this is an order 5 square:
H E A R T E M B E R A B U S E R E S I N T R E N D
Dependencies
I’m presently using my hard coded copy of YAWL (Yet Another Word List), which is available on my GitHub. In the code that follows, this is what provides WordLists.AllWords
and the using
directive using BruteForceDictionary
.
The Code
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using BruteForceDictionary;
namespace puzzlr {
class Program {
#region Fields
private static int _registeredSquares = 0;
private static Dictionary<int, List<List<string>>> _wordSquares = new Dictionary<int, List<List<string>>>();
#endregion
#region Entry
static void Main(string[] args) {
int[] orders = { 3, 5 };
foreach (int order in orders) {
_wordSquares.Add(order, new List<List<string>>());
BuildSquares(order);
}
Console.WriteLine("Done");
Console.ReadKey();
}
#endregion
#region Processing
private static async Task BuildSquares(int order) {
Task.Run(async () => {
var wordList = WordLists.AllWords.Where(w => w.Length == order);
foreach (var word in wordList) {
Task.Run(async () => {
var existingWords = new List<string> { word };
await TraverseSquare(order, wordList, existingWords);
});
}
});
}
private static async Task TraverseSquare(int order, IEnumerable<string> wordList, List<string> existingWords) {
string startsWith = string.Empty;
for (int i = 0; i < existingWords.Count; i++)
startsWith += existingWords.ToArray()[i][existingWords.Count];
if (existingWords.Count + 1 == order) {
string lastWord = wordList.FirstOrDefault(w => !existingWords.Any(a => a.Equals(w)) && w.StartsWith(startsWith));
if (!string.IsNullOrWhiteSpace(lastWord)) {
existingWords.Add(lastWord);
await RegisterSquare(order, existingWords);
}
} else {
foreach (var word in wordList.Where(w => !existingWords.Any(a => a.Equals(w)) && w.StartsWith(startsWith))) {
Task.Run(async () => {
var candidateList = new List<string>(existingWords);
candidateList.Add(word);
await TraverseSquare(order, wordList, candidateList);
});
}
}
}
#endregion
#region Reporting
private static async Task RegisterSquare(int order, List<string> words) {
string filename = $@"C:UsersJamieDavisDesktopSquaressquares_{ order}x{ order}.txt";
string data = $"{string.Join(", ", words)}";
await WriteSquareAsync(filename, data);
if (_registeredSquares++ % 25 == 0)
Console.WriteLine(data);
}
private static async Task WriteSquareAsync(string filename, string square) {
using (var stream = new FileStream(filename, FileMode.Append, FileAccess.Write, FileShare.None, 4096, true))
using (var sw = new StreamWriter(stream))
await sw.WriteLineAsync(messaage);
}
#endregion
}
}
My Concern
For squares of order 5 and higher, this seems incredibly slow. I’ve had it running for 8 hours and it’s still presenting 5 letter words starting with “ab” and my output file has added about 1MB since I went to bed 6 hours ago. My thought on how this would process is that it would start a new asynchronous process to work on all words possible for a given word set, recursively. So if I use the example square above, the processing would:
- Look at all possible 2nd words for “heart“.
- Look at all possible 3rd words for “heart” and the specified second word.
- Look at all possible 4th words for “heart” and the specified second and third words.
- Get the first word that fits for “heart” and the specified second, third and fourth words.
Is this really what’s going on though? I’ve been unable to prove it, and I’m still new to asynchronous processing, especially in the aspect of async/await
.
How can I speed up the generation of squares with higher orders?
I understand that the search chain gets exponentially larger as the order of the square increases, but I believe this can still be improved from a performance aspect.
Solution
Algorithm
It is always better to start out with a good decent algorithm before we try to optimize by vectorizing etc. Let’s look at what we need to do.
-
Iterate through all the words (List<string>)
For every word in the dictionary we can start with a matrix like this and fill in the first column and row.
-
To fill in the last column and row we can use a dictionary of words starting with the same letter as the last one if our initial word (SortedDictionary<char, List<string>>)
-
Now we need to fill in the middle rows. To help with this we can use a dictionary of words starting with a value and ending with a value (SortedDictionary<char, SortedDictionary<char, List<string>>>)
Implementation
- Parsing and building dictionaries. This reads the word list from a file and adds all words with the correct length to the dictionaries:
class WordSearch {
List<string> words = new List<string>();
SortedDictionary<char, List<string>> firstCharDict = new SortedDictionary<char, List<string>>();
SortedDictionary<char, SortedDictionary<char, List<string>>> firstLastCharDict = new SortedDictionary<char, SortedDictionary<char, List<string>>>();
public void parse(string path, int length)
{
int finalIndex = length - 1;
var lines = File.ReadAllLines(path);
foreach (var line in lines)
{
if (line.Length == length)
{
char firstChar = line[0];
char lastChar = line[finalIndex];
// add to word list
words.Add(line);
// add to first char dictionary
List<string> stringlist;
if (firstCharDict.TryGetValue(firstChar, out stringlist) == false)
{
stringlist = new List<String>();
firstCharDict[firstChar] = stringlist;
}
stringlist.Add(line);
// add to first + last char dictionary
SortedDictionary<char, List<string>> subdict;
if (firstLastCharDict.TryGetValue(firstChar, out subdict) == false)
{
subdict = new SortedDictionary<char, List<string>>();
firstLastCharDict[firstChar] = subdict;
}
if (subdict.TryGetValue(lastChar, out stringlist) == false)
{
stringlist = new List<String>();
subdict[lastChar] = stringlist;
}
stringlist.Add(line);
}
}
}
- Creating the matrix and iterate through words:
public void Find(int length)
{
char[,] matrix = new char[length, length];
int finalIndex = length - 1;
foreach (var word in words)
{
// set first row and column
for (int i = 0; i < length; i++)
{
matrix[0, i] = word[i];
matrix[i, 0] = word[i];
}
var crosses = firstCharDict[word[finalIndex]];
foreach (var cross in crosses)
{
// set last row and column
for (int c = 1; c < length; c++)
{
matrix[finalIndex, c] = cross[c];
matrix[c, finalIndex] = cross[c];
}
InnerRows(matrix, 1, length);
}
}
}
- Calculating inner rows with a recursive function:
public void InnerRows(char[,] matrix, int row, int length)
{
int finalIndex = length - 1;
if (row < finalIndex)
{
var firstChar = matrix[row, 0];
var lastChar = matrix[row, finalIndex];
List<String> words;
if (firstLastCharDict[firstChar].TryGetValue(lastChar, out words))
{
foreach (var word in words)
{
for (int c = 1; c < finalIndex; c++)
matrix[row, c] = word[c];
InnerRows(matrix, row + 1, length);
}
}
}
else
{
TestInnerSymetric(matrix, length, finalIndex);
}
}
- Test for symmetry and add output if valid
public void TestInnerSymetric(char[,] matrix, int length, int finalIndex)
{
for (int r = 1; r < finalIndex; r++)
{
for (int c = 1; c < finalIndex; c++)
{
if (matrix[r, c] != matrix[c, r])
return;
}
}
string text = "";
for (int r = 0; r < length; r++)
{
for (int c = 0; c < length; c++)
{
text += matrix[r, c];
}
text += "n";
}
found.Add(text);
}
Multithreading
It is only at this point that you should start to consider vectorizing. A simple strategy is to create a parallel loop over the first char- dictionary:
public void Find(int length)
{
int finalIndex = length - 1;
Parallel.ForEach(firstCharDict.Keys, key =>
{
char[,] matrix = new char[length, length];
foreach (var word in firstCharDict[key])
{
// set first row and column
for (int i = 0; i < length; i++)
{
matrix[0, i] = word[i];
matrix[i, 0] = word[i];
}
var crosses = firstCharDict[word[finalIndex]];
foreach (var cross in crosses)
{
// set last row and column
for (int c = 1; c < length; c++)
{
matrix[finalIndex, c] = cross[c];
matrix[c, finalIndex] = cross[c];
}
InnerRows(matrix, 1, length);
}
}
});
}
Putting it all together
This contains additional optimization and bug fixes to allow it to work with order > 13 squares.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Threading.Tasks;
class WordSearch
{
private List<string> words = new List<string>();
private SortedDictionary<char, List<string>> firstCharDict = new SortedDictionary<char, List<string>>();
private SortedDictionary<char, SortedDictionary<char, List<string>>> firstLastCharDict = new SortedDictionary<char, SortedDictionary<char, List<string>>>();
public ConcurrentBag<string> found = new ConcurrentBag<string>();
public void parse(string path, int length)
{
int finalIndex = length - 1;
var lines = File.ReadAllLines(path);
foreach (var line in lines)
{
if (line.Length == length)
{
char firstChar = line[0];
char lastChar = line[finalIndex];
// add to word list
words.Add(line);
// add to first char dictionary
List<string> stringlist;
if (firstCharDict.TryGetValue(firstChar, out stringlist) == false)
{
stringlist = new List<String>();
firstCharDict[firstChar] = stringlist;
}
stringlist.Add(line);
// add to first + last char dictionary
SortedDictionary<char, List<string>> subdict;
if (firstLastCharDict.TryGetValue(firstChar, out subdict) == false)
{
subdict = new SortedDictionary<char, List<string>>();
firstLastCharDict[firstChar] = subdict;
}
if (subdict.TryGetValue(lastChar, out stringlist) == false)
{
stringlist = new List<String>();
subdict[lastChar] = stringlist;
}
stringlist.Add(line);
}
}
}
public void Find(int length)
{
int finalIndex = length - 1;
Parallel.ForEach(firstCharDict.Keys, key =>
//foreach (var key in firstCharDict.Keys)
{
char[,] matrix = new char[length, length];
foreach (var word in firstCharDict[key])
{
// set first row and column
for (int i = 0; i < length; i++)
{
matrix[0, i] = word[i];
matrix[i, 0] = word[i];
}
List<string> crosses;
if (firstCharDict.TryGetValue(word[finalIndex], out crosses))
{
foreach (var cross in crosses)
{
// set last row and column
for (int c = 1; c < length; c++)
{
matrix[finalIndex, c] = cross[c];
matrix[c, finalIndex] = cross[c];
}
InnerRows(matrix, 1, length);
}
}
Console.WriteLine(found.Count);
}
});
}
public void InnerRows(char[,] matrix, int row, int length)
{
int finalIndex = length - 1;
if (row < finalIndex)
{
var firstChar = matrix[row, 0];
var lastChar = matrix[row, finalIndex];
SortedDictionary<char, List<String>> lastCharDict;
if (firstLastCharDict.TryGetValue(firstChar, out lastCharDict))
{
List<String> words;
if (lastCharDict.TryGetValue(lastChar, out words))
{
foreach (var word in words)
{
for (int c = 1; c < finalIndex; c++)
matrix[row, c] = word[c];
// check if symetric
bool symetric = true;
for (int r = 1; r <= row; r++)
{
for (int c = 1; c <= row; c++)
{
if (matrix[r, c] != matrix[c, r])
symetric = false;
}
}
if (symetric)
InnerRows(matrix, row + 1, length);
}
}
}
}
else
{
Output(matrix, length);
}
}
public void Output(char[,] matrix, int length)
{
string text = "";
for (int r = 0; r < length; r++)
{
for (int c = 0; c < length; c++)
{
text += matrix[r, c];
}
text += "n";
}
found.Add(text);
}
}
class Program
{
static int Main()
{
int length = 5;
Stopwatch sw = new Stopwatch();
sw.Start();
string path = @"c:sharedword.list";
var ws = new WordSearch();
ws.parse(path, length);
ws.Find(length);
Console.WriteLine("found = " + ws.found.Count);
Console.WriteLine("ms = " + sw.ElapsedMilliseconds);
return 0;
}
}
Currently, it takes about 6 minutes to calculate all the order-5 word squares.
Next stop, the holy grail, 10×10..
Profiling results and algorithmic suggestions
I ran the program under the profiler built into Visual Studio. Most of the time, about 75%, is spent in this line:
string lastWord = wordList.FirstOrDefault(w => !existingWords.Any(a => a.Equals(w)) && w.StartsWith(startsWith));
And most of the rest of the time is spent in the similar line which foreach
-es over the matches.
Which is both good and bad. It’s good because this line is actually doing real work, so it’s not the case that all time is being spent on overhead such as the repeated existingWords.ToArray()
(which nevertheless bothers me, you can after all just index into a List<string>
). On the other hand it’s bad because “find a word that starts with some letters” is a problem that could be solved more efficiently with a different algorithm.
For example, if the list of words is sorted (which it is, but in any case you could sort it at a low cost since it only happens once), then you could use binary search to find the first candidate word, then loop over the words until a match is found, and crucially you can stop the loop when the word from the wordlist does not start with the prefix because no word that comes after that can start with that prefix (thanks to the list being sorted). That loop will be pretty short, typically one iteration, since it usually either finds the word immediately or concludes that there won’t be a match immediately. It still needs to be a loop, because a word that is in existingWords
might be found, and then the next word must be selected.
For the other case, where the words are iterated over, a similar thing can be done except in that case of course the loop would often not be short, because it has to go over all matching words.
Using that, I got well into the squares that start with a “b” in a handful of seconds, and generated megabytes worth of squares. By the way I had to reduce the frequency of progress reporting from % 25
to % 2500
, otherwise that became the bottleneck.
Another trick is that you can expand the prefix check to positions that are not just the “current” index. Once it is fast to ask “is there any word that starts with this prefix” (which with binary search, it is), we may as well use it. Checking it for all positions prevents the algorithm from searching for, for example, valid 3rd and 4th words, when it could already know that the 5th word will be impossible to find based on the first two letters alone. This gave me a speedup of somewhere between 1.5x and 2x, but I stopped it after getting into the B’s and this trick has a variable speedup depending on which words the square starts with, so it may not be the same across the entire range. I definitely expect it to help though.
Fire-and-forget uses of async
are not great
The implication of Task.Run
-ing an async
thing (and then not doing anything with the resulting Task
object) is that the async code will run but you never gets anything back from it (it just goes off and performs side-effects). Including exceptions. If they happen, they just disappear. That’s a dangerous bug-hiding “feature”, if anything goes wrong you just won’t know about it – and that actually happened, more about that below. I expect this is also why you added the unusual Console.ReadKey
, the program would just exit otherwise. Printing “Done” as the first line is also a consequence of this.
All of those things can be solved at once by actually awaiting the tasks. You can still create all the tasks first, and then await them, to give them the chance to run in parallel. But do await them, somehow, although getting this right is not that simple. An additional problem is that if you await the tasks using await Task.WhenAll(tasks)
(which is normally recommended), the tasks here take so long that realistically you wouldn’t get to see the exceptions anyway. Using multiple await
s sort of works, but leaves tasks running when others have failed. Unfortunately it’s not an easy thing to get right, but you can find implementations of WhenAllFailFast
in various places that will both:
- Actually give you exceptions when they happen, and
- Cancel the other tasks, so they don’t keep spewing out results after an exception occured
By the way I actually wouldn’t recommend doing this kind of parallelism on top of async
, but judging by the title of this question, you wanted to do it. So, it’s fine. But normally I would recommend non-async task-parallelism for heavy duty computing, and async mainly to avoid blocking the UI while doing IO. Plain old non-async task-parallelism would have had similar issues with exceptions though.
Multi-threaded file IO
The reason that a ton of exceptions is thrown is that multiple threads (Task.Run
runs code in parallel on threadpool threads) are repeatedly trying to open a file in exclusive mode. Opening the file in non-exclusive mode is not a good solution either. Really the file should only be opened once, and then writes should be synchronized to avoid smashing them together.
Ideally the writes should even be batched, which could be accomplished by enqueuing the found squares into a multiple-producer single-consumer queue and dedicating a task to pulling the results and writing them in larger chunks. I didn’t go so far as to do that, but based on the performance difference between doing the IO and not doing it all, I expect it may help, but it won’t help tons: optimistically assuming that the cost of IO approaches zero, maybe a 1.3x speedup could be achieved, but it couldn’t get better than that because even without IO that’s the best it gets (this is relative to the fast algorithm after my changes, with a slower algorithm it would make less of a difference, and if there are even better tricks than what I mentioned then the difference could become more significant).