Problem
Implement a MapSum class with insert, and sum methods.
For the method insert, you’ll be given a pair of (string, integer).
The string represents the key and the integer represents the value. If
the key already existed, then the original key-value pair will be
overridden to the new one.For the method sum, you’ll be given a string representing the prefix,
and you need to return the sum of all the pairs’ value whose key
starts with the prefix.
Example 1: Input: insert("apple", 3), Output: Null Input: sum("ap"), Output: 3 Input: insert("app", 2), Output: Null Input: sum("ap"), Output: 5
Please comment on performance and style
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;
using Microsoft.VisualStudio.TestTools.UnitTesting;
namespace TrieQuestions
{
[TestClass]
public class TrieMapSum
{
[TestMethod]
public void MapSumTest()
{
MapSum mapSum = new MapSum();
mapSum.Insert("apple", 3);
Assert.AreEqual(3, mapSum.Sum("ap"));
mapSum.Insert("app", 2);
Assert.AreEqual(5, mapSum.Sum("ap"));
}
}
public class MapSum
{
private TrieMSNode Head;
/** Initialize your data structure here. */
public MapSum()
{
Head = new TrieMSNode();
}
public void Insert(string key, int val)
{
var current = Head;
foreach (var letter in key)
{
if (!current.Edges.ContainsKey(letter))
{
current.Edges.Add(letter, new TrieMSNode());
}
current = current.Edges[letter];
}
current.IsTerminal = true;
current.Value = val;
}
public int Sum(string prefix)
{
int sum = 0;
var current = Head;
foreach (var letter in prefix)
{
if (!current.Edges.ContainsKey(letter))
{
return sum;
}
current = current.Edges[letter];
}
// we use dfs for each edges of the trie;
return DFS(current);
}
private int DFS(TrieMSNode current)
{
if (current == null)
{
return 0;
}
int sum = current.IsTerminal ? current.Value : 0;
foreach (var edge in current.Edges.Values)
{
sum += DFS(edge);
}
return sum;
}
}
internal class TrieMSNode
{
public Dictionary<char, TrieMSNode> Edges { get; set; }
public bool IsTerminal;
public int Value;
public TrieMSNode()
{
Edges = new Dictionary<char, TrieMSNode>();
IsTerminal = false;
Value = 0;
}
}
}
Solution
In Insert(...)
you should use TryGetValue
instead of ContainsKey
for efficiency:
foreach (var letter in key)
{
if (!current.Edges.TryGetValue(letter, out var edge))
{
edge = current.Edges[letter] = new TrieMSNode();
}
current = edge;
}
Name your methods after what they do, not after their implementation:
DFS(...)
could be GetSumFrom(...)
public int Sum(string prefix)
{
int sum = 0;
var current = Head;
foreach (var letter in prefix)
{
if (!current.Edges.ContainsKey(letter))
{
return sum;
}
current = current.Edges[letter];
}
Here the sum
variable is redundant because it is never changed so you could just return 0
from the loop
Again the loop can be optimized:
foreach (char letter in prefix)
{
if (!current.Edges.TryGetValue(letter, out current))
return 0;
}
DFS()
can be simplified using LINQ:
private int GetSumFrom(TrieMSNode current)
{
if (current == null)
{
return 0;
}
return current.Edges.Aggregate(current.Value, (sum, kvp) => sum + GetSumFrom(kvp.Value));
}
You really don’t have to check for IsTerminal
because you know that only terminal nodes have a values different from 0
.
You should test this LINQ-approach against your own plain foreach
-loop, and you’ll maybe find the latter fastest.
I don’t have much to add to Henrik’s answer at a low level, but I think you’ve missed the point of the exercise at a high level.
The spec could be implemented as simply as
public class MapSum
{
private readonly IDictionary<string, int> data = new Dictionary<string, int>();
public void Insert(string key, int val) => data[key] = val;
public int Sum(string prefix) => data.Sum(kvp => kvp.Key.StartsWith(prefix) ? kvp.Value : 0);
}
The reason you instead used a trie was to avoid looping over all of the entries in Sum
. But in the worst case DFS
will do exactly that. The efficient solution is to add a field to TrieMSNode
which caches the sum of the entire subtree rooted at that node. Updating it doesn’t affect the asymptotic cost of Insert
, just the constant. And returning it allows you to ditch DFS
entirely, so that Sum
is guaranteed to run in time linear in the length of the prefix.
The time complexity of Sum
is O(mn),
because in the worst case it may need to enumerate all nodes.
An O(m) implementation is possible with O(n) extra space, and a bit of extra work in Insert
:
- Add a
Dictionary<string, int>
to track the current values of the keys, let’s call itvalues
. - Use
TrieMSNode.Value
to store the sum of all values below each node - When inserting:
- First check if the key replaces an existing key. Let’s store in
int prev
the previous value of the key, or 0 if there was no previous value (the key is new). - Traverse the trie letter by letter as before, and for each node traversed, subtract
prev
and addval
. - Set the new value in
values
.
- First check if the key replaces an existing key. Let’s store in
Notice that the added work in Insert
does not increase the time or space complexity.
public void Insert(string key, int val)
{
var prev = values.ContainsKey(key) ? values[key] : 0;
values[key] = val;
var current = Head;
foreach (var letter in key)
{
if (!current.Edges.ContainsKey(letter))
{
current.Edges.Add(letter, new TrieMSNode(val));
}
else
{
current.Edges[letter].Value -= prev;
current.Edges[letter].Value += val;
}
current = current.Edges[letter];
}
}
And now, the implementation of Sum
can become simpler and faster:
public int Sum(string prefix)
{
var current = Head;
foreach (var letter in prefix)
{
if (!current.Edges.TryGetValue(letter, out current))
{
return 0;
}
}
return current.Value;
}
Also notice that in this implementation the TrieMSNode.IsTerminal
is no longer necessary.