Using a LinkedList type of adjacency list to create the word game: WordLadder

Posted on

Problem

The user puts in a starting word and an ending word. My code creates a ladder between these words of words that are different by one letter.

Start: gain

End: fire

Output:

gain
gait
wait
wart
ware
wire
fire

I created a linked list of linked lists and a hash map that uses the word length as a key so that it checks only possible words that can be in the wordladder.

public class PathDictionary {

private static final int MAX_WORD_LENGTH = 4;
private static final int MIN_WORD_LENGTH = 2;
private static final int MAX_PATH_LENGTH = 7;

private static HashSet<String> words                           = new HashSet<>();
private static ArrayList<String> wordList                      = new ArrayList<>();
private static HashMap<Integer, ArrayList<String>> wordLengths = new HashMap<>();
private static ArrayList<LinkedList<String>> adjDictionary     = new ArrayList<>();

public PathDictionary(InputStream inputStream) throws IOException {
    if (inputStream == null) {
        return;
    }

    BufferedReader in = new BufferedReader(new InputStreamReader(inputStream));
    String line = null;
    Log.i("TAGGER", "Loading dict");

    while((line = in.readLine()) != null) {
        String word = line.trim();
        if (word.length() > MAX_WORD_LENGTH && word.length() < MIN_WORD_LENGTH) {
            continue;
        }

        words.add(word);
        wordList.add(word);

        int size = word.length();

        if(wordLengths.containsKey(size) == false){
            ArrayList<String> newTemp = new ArrayList<>();
            newTemp.add(word);
            wordLengths.put(size, newTemp);
        } else {
            ArrayList<String> addToTemp = wordLengths.get(size);
            addToTemp.add(word);
            wordLengths.put(size, addToTemp);
        }

    }

    for(int i = 0; i < wordList.size(); i++){
        // add into path
        String input = wordList.get(i);

        LinkedList<String> neighbours = neighbours(input);
        adjDictionary.add(new LinkedList<String>());
        adjDictionary.get(i).add(input); // adds root node

        String display = "";
        for(int j = 0; j < neighbours.size(); j++){
            adjDictionary.get(i).add(neighbours.get(j)); // adds neighbors to the root node

            display += " " + neighbours.get(j);
        }
      //  Log.d("TAGGER", "Vertice: " + adjDictionary.get(i).get(0) + " Edges: " + adjDictionary.get(i));


    }


}

public boolean isWord(String word) {
    return words.contains(word.toLowerCase());
}

private LinkedList<String> neighbours(String word) {

    int size = word.length();
    LinkedList<String> neighbours = new LinkedList<>();
    ArrayList<String> possibleNeighbours  = wordLengths.get(size);

    for(String validWord: possibleNeighbours){
        int count = 0;

        // the root
        if(validWord.equals(word)) continue;

        for(int i = 0; i < size; i++){

            if(!validWord.contains("" + word.charAt(i))){
                count++;
            }
        }

        if(!(count > 1)) neighbours.add(validWord);
    }


    return neighbours;
}

public String[] findPath(String start, String end) {
    ArrayDeque pathsExplored = new ArrayDeque();
    ArrayList<String> path   = new ArrayList<String>();
    int pathSize = 0;

    path.add(start);
    pathsExplored.add(path);

    while(!pathsExplored.isEmpty()){
        pathSize++;

        if(pathSize > MAX_PATH_LENGTH) return null;

        //While the queue is not empty, get the path from the head of the queue and consider all of its neighbours.
        ArrayList<String> tempPath = (ArrayList) pathsExplored.remove();
        int lastElementInPath = tempPath.size();
        String currentWord    = tempPath.get(lastElementInPath - 1);

        int root = wordList.indexOf(currentWord);
        LinkedList<String> neighbors = adjDictionary.get(root);

        //If the neighbour is the desired end, return the corresponding path.
        if(neighbors.contains(end)){
            String [] pathFound = new String [path.size()];
            Log.i("TAGGER", "PATH FOUND!");
            return tempPath.toArray(pathFound);
        }

        //Otherwise, create new paths representing the current path plus each of the neighbours (avoiding loops) and add them to the queue.
        for(String word: neighbors){
            //stops loops last word in should not be considered
            if(word.equals(tempPath.get(tempPath.size() - 1))){
                continue;
            }

            //create new paths with all neighbors using old path
            ArrayList<String> newPath = new ArrayList<>(tempPath);
            newPath.add(word);
            pathsExplored.add(newPath);

        }



    }

    return null;
   }
}

My code is really messy; it feels like I’m doing too much, but it gets the job done. I’m only using a “sample.txt” file with only words to test the path from gain to fire. When I use a bigger/better dictionary file (65,000 words) my code runs for a very long time simply trying to initialize all the words into the graph. I’d like some refactoring advice and if possible help on how best to tackle this.

Unit 7: Graph Traversal (Android Studio)

Solution

Well, the runtime of your code can be maybe improved a bit by cleaning up the current code a bit, but the main bottleneck is the inefficient algorithm, so I will not comment on your particular code and style, and I will focus on the algorithm only (if you still insist on code clean-up of current version, let me know in comments, I may try to rewrite few bits of it more to my style to show you a thing or two, but I think reviewing better algorithm would make more sense).

Also I would focus on the search algorithm, not on the init. I’m afraid with huge word list even init can be very costly, and worth optimization, but the initial data may be preprocessed ahead, so you then read not only word list, but complete definition of graph with edges between them. You should have posted, if the init itself is important for you as well (can be worth of effort for application where input word list changes often, so graph has to be rebuilt often).

Imagine the words as nodes of graph, with edges between words different only in single letter (obviously words of different length form a completely disconnected sub graph, but even with words of the same length the graph can have several disconnected sub-groups).

If a starting word has a ladder to the ending one, both should belong to the same sub group of graph.

So first optimization may be to store each sub group separately (for example: having for 4-letter words two or more graphs). Then in O(subgroup_size*letters) you can tell if starting word belongs to the group under scrutiny. (with global hash map of words containing index to subgroup and index within subgroup the O(…) can get even lower, depends on hash map search implementation, but the initialization will get longer).

Now the ending word must belong to the same subgroup, otherwise the ladder path does not exist at all, so another search of word ( O(subgroup_size*letters) ) will tell you, if the solution does exist. And also you will have both nodes (starting and ending word).

Now you should do a “path between nodes” search, I’m not sure if the shortest one is required, or any will do. For these situations something like A* path-finding algorithm can be used.

I didn’t check A* lately, so just from my head some idea about such algorithm, basically searching the graph from both ends, in a deep/wide way by word_distance:

You have to create some int distance[subgroup_size] = { -1 }; and set distance[start_node_index] = distance[end_node_index] = 0;. distance will be distance (number of nodes) from starting point.
Also set int color[subgroup_size] = { -1 }; and color[start_node_index] = 0; color[end_node_index] = 1; color marks whether distance belongs to the starting node, or to the ending node.

Put both indices in the deque.

Now till the deque is not empty, pick the index out of it and it’s color.

For all it’s neighbours:

  • first check if the neighbour has already distance set for different color. If yes, you have found some path (a node somewhere between starting and ending node). If you don’t need shortest one, you can start building the answer by searching from current node for it’s color neighbour with distance – 1, and from the different color neighbour search it’s distance – 1 neighbour (till you find both start and end nodes on each end.
  • if you need shortest path, you should store as best_found the current middle node, with the total distance (distance of current node + distance of neighbour) – if already set best_found is not shorter, then continue.
  • if neighbour distance is already set, but with same color, either continue (any path), or check if you came here with better distance (for shortest path search). If better distance can be achieved, reset the distance to new one, and put the neighbour again into deque.
  • distance of neighbour not set yet case:
  • color the neighbour (with same color as current node)
  • set distance to current distance + 1.
  • calculate word_distance from opposite color word (= how many letters the neighbour word share with ending word (for color 0), or with starting word (color 1)), insert the node into deque ahead of all words with worse word_distance (as you have 4 letter words, it would make sense to use 5 deques for each possible word_distance, actually just 4 of them, because having 4 letters same means you have found the ending word, so that case is not possible).

Continue with other neighbours of current node, then go back to loop by fetching next current node from deque (if you have 4 of them, then empty the one with best word_distance first), till the (shortest) path is found.

The word_distance is supplying some orientation in the path finding (absolute coordinate distance in A* when used with geometric coordinates) – it’s more likely you will find shorter path with words which are becoming closer to the other end of path.

I will end here, as it already took some time to write this, unfortunately I will not provide you with working source.


Hmm… ok, one bit of source just to make it look a bit less like “wall of text” (although I’m afraid it’s too late for that 😀 ).

private LinkedList<String> neighbours(String word) {
    int size = word.length();
    LinkedList<String> neighbours = new LinkedList<>();
    for(String validWord: wordLengths.get(size)){
        if (validWord.equals(word)) continue;
        int differentLetterCount = 0;
        for(int i = 0; i < size; i++) {
            // In original source you have "contains" test.
            // So is "tire" neighbour with "rate"?!
            // I think this is bug, and you wanted this:
            if ((word.charAt(i) != validWord.charAt(i)) &&
                (2 == ++differentLetterCount)) {
                break; // early exit optimization
            }
        }
        if (1 == differentLetterCount) neighbours.add(validWord);
    }
    return neighbours;
}

I was thinking whether searching for neighbours for new word can be optimized by searching only trough “neighbours of neighbours of first-found-neighbour”, but after short tinkering with it I think this is not valid, this would maybe work for all 4-letter permutations forming complete graph, not for regular words when plenty of possible 4 letter permutations are not available.


I found another bug in your source:
if (word.length() > MAX_WORD_LENGTH && word.length() < MIN_WORD_LENGTH) { will be always false.

Unrelated to your bug, I have read an advice somewhere to always use only “<“, “<=”, “==” and “!=” in comparisons. It felt strange for few weeks, but once you get used to it, it really makes easier to read sources, as you know the values should only increase from left to right, if the expression is true.

So I would write your expression as: if (word.length() < MIN_WORD_LENGTH || MAX_WORD_LENGTH < word.length()) {.


Example implementation of those ideas in this post and comments:
ped.7gods.org/WordLadder.zip

Meanwhile, lured by the fun of figuring out algorithm, and writing the code, I completely went off the path of reviewing your code instead. Sorry Jonathan.

And to make it even worse, my source is not as clean as to be proper example in every possible way, but I have it now for few days sitting on disc and can’t get to cleaning it up more, so I’m releasing it at least for the sake of algorithm example, as is.

Leave a Reply

Your email address will not be published. Required fields are marked *