Problem
A trie for handling strings for an autocomplete dictionary. This passes my fairly casual tests, though it’s always possible that there are broken edge cases, but I’m mainly concerned about design and efficiency: is this a sensible way to implement this data structure? (In particular, is it sensible to do the insert method recursively?). It feels a little gnarly to me.
struct Trie {
var root: Node
init() {
root = Node()
}
func search(_ word: String) -> Bool {
let letters = Array(word)
var curnode = root
for letter in letters {
guard let match = curnode.children.first(where: {(key, _) in
key == letter
})
else {
return false
}
curnode = match.value
}
if curnode.contained {
return true
}
return false
}
func remove(_ word: String) {
let letters = Array(word)
var curnode = root
for letter in letters {
if !curnode.children.contains(where: {(key, _) in
key == letter
}) {
break
} else {
curnode = curnode.children[letter]!
}
}
curnode.contained = false
}
func insert(_ letters: [Character], parent: Node) -> Node {
if letters.count == 1 {
let letter = letters[0]
if parent.children.contains(where: {(key, _) in
key == letter
}) {
let newNode = parent
newNode.children[letter]!.contained = true
return newNode
} else {
let newNode = Node(letter, final: true)
return newNode
}
} else {
let first = letters[0]
let rest = Array(letters.dropFirst())
if let subtree = parent.children.first(where: {(key, _) in
key == first
}) {
let newNode = Node(first, final: subtree.value.contained, kids: subtree.value.children)
newNode.children[rest[0]] = insert(rest, parent: newNode)
return newNode
} else {
let newNode = Node(first, final: false)
newNode.children[rest[0]] = insert(rest, parent: newNode)
return newNode
}
}
}
mutating func insert(_ word: String) {
let new_subtree = insert(Array(word), parent: root)
root.children[new_subtree.char!] = new_subtree
}
}
class Node {
var char: Character?
var children: [Character:Node]
var contained: Bool
init() {
char = nil
children = [:]
contained = false
}
init(_ c: Character, final: Bool) {
children = [:]
contained = final
char = c
}
init(_ c: Character, final: Bool, kids: [Character:Node]) {
children = kids
contained = final
char = c
}
}
Clarification edit: this isn’t meant to be production ready, is only meant to be a concise and straightforward implementation of the data structure. So I intentionally left off checks for problem input; it should only be able to handle sensible inputs right now, where “sensible” means “input that someone using the data structure correctly to add, look for, and remove data would intentionally provide.” So, not empty strings, removing stuff that isn’t there, inserting the same thing twice, etc. etc.
The casual test cases I used, meant to represent sensible behavior, are:
var testTrie = Trie()
testTrie.insert("cat")
testTrie.search("cat") // T
testTrie.search("car") // F
testTrie.insert("car")
testTrie.search("car") // T
testTrie.search("cat") // T
testTrie.search("ca") // F
testTrie.search("cad") // F
testTrie.search("carburetor") // F
testTrie.insert("carburetor")
testTrie.search("carburetor") // T
testTrie.search("car") // T
testTrie.search("cat") // T
testTrie.search("ca") // F
testTrie.remove("car")
testTrie.search("carburetor") // T
testTrie.search("car") // F
testTrie.search("cat") // T
Solution
String traversal
At several places in your code you convert a string to an array of its characters in order to iterate over it:
let letters = Array(word)
for letter in letters {
// ...
}
These intermediate arrays are not needed. A Swift string is a collection of characters, so that you can iterate over it simply with
for letter in word {
// ...
}
Dictionary access
In the search()
method you look up the node for a character with
guard let match = curnode.children.first(where: {(key, _) in
key == letter
})
else {
return false
}
curnode = match.value
Similar patterns are also in the other methods. This dictionary lookup can be simplified using a subscript:
guard let node = curnode.children[letter] else {
return false
}
curnode = node
Returning boolean values
Code like
if someCondition {
return true
}
return false
can always be simplified to
return someCondition
which is shorter and clearer. The search method then looks like this:
func search(_ word: String) -> Bool {
var curnode = root
for letter in word {
guard let node = curnode.children[letter] else {
return false
}
curnode = node
}
return curnode.contained
}
Removing non-existent strings
Removing a string which has never been inserted currently has undesired side effects:
var trie = Trie()
trie.insert("a")
trie.remove("ab")
print(trie.search("a")) // false
That is easy to fix: As soon as the traversal does not find a node for the next character it should return instead of setting curnode.contained = false
on the last node encountered:
func remove(_ word: String) {
var curnode = root
for letter in word {
guard let node = curnode.children[letter] else {
return // <--- HERE
}
curnode = node
}
curnode.contained = false
}
Mutating (or not?) methods
The mutating
keyword in
mutating func insert(_ word: String)
is not needed: Node
is a reference type so that the properties of root
can be modified without making the method mutating. For the same reason, the property can be declared as a constant:
struct Trie {
let root: Node
// ...
}
Use substrings!
The main insert method create a array of all characters:
let new_subtree = insert(Array(word), parent: root)
and the recursive helper methods repeatedly creates more arrays of the remaining characters:
let rest = Array(letters.dropFirst())
That is very inefficient. The better approach is that the helper method takes a Substring
argument:
func insert(_ letters: Substring, parent: Node) -> Node
so that it can call itself with
let rest = letters.dropFirst()
insert(rest, parent: newNode)
This is called “slicing” in Swift and very efficient because the substrings share the element storage with the original string and no copies are made.
The main insert method then calls the helper method with a substring comprising all its characters:
func insert(_ word: String) {
let new_subtree = insert(word[...], parent: root)
// ...
}
Simplify the insertion method (and the Node type)
I found the insertion code difficult to understand. It also has some problems (which you are already aware of):
- It is not possible to insert an empty string.
- It is not possible to insert the same string twice.
To be honest: I cannot see which cases are handled correctly and which are not.
What I also do not like is the var char: Character?
property of Node
. Apparently this is needed to insert a newly created subtree at the right position of the parent’s children
dictionary. But
- it introduces some redundancy,
- it is not clear in which cases it can be
nil
(only in the root node?), - it requires forced unwrapping.
Doing the insertion recursively is fine. But if we create new nodes before the recursive call with the rest of the string then everything becomes much simpler:
func insert(_ word: Substring, node: Node) {
if let letter = word.first {
if let nextnode = node.children[letter] {
insert(word.dropFirst(), node: nextnode)
} else {
let newnode = Node()
node.children[letter] = newnode
insert(word.dropFirst(), node: newnode)
}
} else {
node.contained = true
}
}
func insert(_ word: String) {
insert(word[...], node: root)
}
The char
property is not needed anymore, i.e. that type simplifies to
class Node {
var children: [Character: Node] = [:]
var contained: Bool = false
}
More advantages:
- The recursion terminates when the string is empty, not when it is a single character. As a consequence, inserting an empty string works now.
- Inserting the same string twice works as well.
The same can be done with iteration instead of recursion:
func insert(_ word: String) {
var curnode = root
for letter in word {
if let nextnode = curnode.children[letter] {
curnode = nextnode
} else {
let newnode = Node()
curnode.children[letter] = newnode
curnode = newnode
}
}
curnode.contained = true
}
That is a matter of taste, but it is shorter and makes even the substrings obsolete.
Naming
You use different naming conventions in your code:
curnode, newNode, new_subtree
The Swift naming convention is camelcase (upper camelcase for types, and lower camelcase for everything else):
currentNode, newNode, newSubtree
I would also prefer char
or character
over letter
: A Swift string can contain arbitrary Unicode characters, not only “letters.”