Problem
For string similarity, it is defined as longest common prefix length. For example, “abc” and “abd” is 2, and “aaa” and “aaab” is 3.
The problem is calculate the similarity of string S and all its suffixes, including itself as the first suffix.
For example, for S=”ababaa”, suffixes are “ababaa”, “babaa”, “abaa”,”baa”,”aa”
and “a”, the similarity are 6+0+3+0+1+1=11.
My code works and I am looking for smarter ideas to be more efficient.
# Complete the function below.
from collections import defaultdict
class TrieNode:
def __init__(self):
self.children=defaultdict(TrieNode)
self.isEnd=False
class TrieTree:
def __init__(self):
self.root=TrieNode()
def insert(self, word):
node = self.root
for w in word:
node = node.children[w]
node.isEnd = True
def search(self, word):
node = self.root
count = 0
for w in word:
node = node.children.get(w)
if not node:
break
else:
count += 1
return count
def StringSimilarity(inputs):
resultFormat=[]
for word in inputs:
# build Trie tree
index = TrieTree()
index.insert(word)
result = 0
# search for suffix
for i in range(len(word)):
result += index.search(word[i:])
print result
resultFormat.append(result)
return resultFormat
Solution
One thing I’d suggest is to allow a word to be inserted inside __init__
, rather than force the user to always call .insert(word)
. You can still allow for creating an empty TrieTree
. Just make word
an optional parameter, and ignore it if it’s empty:
def __init__(self, word=''):
self.root = TrieNode()
if word:
self.insert(word)
now you could create your tree in one line:
index = TrieTree(word)