A Trie data structure in Java

Posted on

Problem

this is the first time I’ implementing a Trie. My goal was to make it as simple as possible. So here is the result:
Can I simplify anything else? Are there some things I can change to increase performance ? Also any feedback on code quality/design is welcome 🙂
Node.java

package trie;

public class Node {
    final Node[] children = new Node[26];
    boolean isEnd = false;
}

Trie.java

package trie;

import java.util.ArrayList;
import java.util.List;

public class Trie {
    static final int ALPHABET_SIZE = 26;
    final Node root = new Node();

    void add(String str) {
        Node curr = root;
        for (int i = 0; i < str.length(); i++) {
            int idx = str.charAt(i) - 'a';
            if (curr.children[idx] == null) {
                curr.children[idx] = new Node();
            }
            curr = curr.children[idx];
        }
        curr.isEnd = true;
    }

    Node getNode(String str) {
        Node node = root;
        for (int i = 0; i < str.length(); i++) {
            Node child = node.children[str.charAt(i) - 'a'];
            if (child == null) {
                return null;
            }
            node = child;
        }
        return node;
    }

    boolean contains(String str) {
        Node node = getNode(str);
        return node != null && node.isEnd;
    }

    List<String> prefixedWords(String str) {
        Node curr = getNode(str);
        List<String> prefixedWords = new ArrayList<>();
        DFS(curr, str, prefixedWords);
        return prefixedWords;
    }

    void DFS(Node root, String prefix, List<String> list) {
        if (root != null) {
            if (root.isEnd) {
                list.add(prefix);
            }
            for (int i = 0; i < ALPHABET_SIZE; i++) {
                if (root.children[i] != null) {
                    DFS(root.children[i], prefix + (char) (i + 'a'), list);
                }
            }
        }
    }
}

Solution

Magic numbers

    final Node[] children = new Node[26];

Consider changing this to

    final Node[] children = new Node[Trie.ALPHABET_SIZE];

Now you don’t have to manually change the array size if you change the alphabet size. It may not matter here (we are unlikely to add a letter to the alphabet and you may not need to change alphabets), but it is good practice to set things up so that the system makes them the same.

Range-based for loops

        for (int i = 0; i < str.length(); i++) {
            int idx = str.charAt(i) - 'a';

This could be

        for (char c : str.toCharArray()) {
            int idx = c - 'a';

Then you don’t have to maintain i manually.

You also may want to check that idx is in the range 0 to 25. You currently just trust that that will be so.

Early exit

    void DFS(Node root, String prefix, List<String> list) {
        if (root != null) {

This could be

    static void DFS(Node root, String prefix, List<String> list) {
        if (root == null) {
            return;
        }

This way you don’t have to enclose the entire method inside an if. You can just bail out immediately.

The method should be static because it doesn’t use any of the state of the trie. In fact, one of its parameters has the same name as an object field.

Performance

I don’t see anything that leaps out at me performance wise. Tries are best at prefix searches and relatively slow and memory intensive to build. For many uses, a HashSet (for full word searches) or even an ArrayList is faster. We need a long run to amortize out the build time enough to give faster prefix lookups.

Anyway, time your actual usage to see if this solution is faster than other solutions.

Leave a Reply

Your email address will not be published. Required fields are marked *