# String similarity calculation

Posted on

Problem

For string similarity, it is defined as longest common prefix length. For example, “abc” and “abd” is 2, and “aaa” and “aaab” is 3.

The problem is calculate the similarity of string S and all its suffixes, including itself as the first suffix.

For example, for S=”ababaa”, suffixes are “ababaa”, “babaa”, “abaa”,”baa”,”aa”
and “a”, the similarity are 6+0+3+0+1+1=11.

My code works and I am looking for smarter ideas to be more efficient.

``````# Complete the function below.
from collections import defaultdict

class TrieNode:
def __init__(self):
self.children=defaultdict(TrieNode)
self.isEnd=False
class TrieTree:
def __init__(self):
self.root=TrieNode()
def insert(self, word):
node = self.root
for w in word:
node = node.children[w]
node.isEnd = True
def search(self, word):
node = self.root
count = 0
for w in word:
node = node.children.get(w)
if not node:
break
else:
count += 1
return count

def  StringSimilarity(inputs):
resultFormat=[]
for word in inputs:
# build Trie tree
index = TrieTree()
index.insert(word)
result = 0
# search for suffix
for i in range(len(word)):
result += index.search(word[i:])
print result
resultFormat.append(result)

return resultFormat
``````

Solution

One thing I’d suggest is to allow a word to be inserted inside `__init__`, rather than force the user to always call `.insert(word)`. You can still allow for creating an empty `TrieTree`. Just make `word` an optional parameter, and ignore it if it’s empty:

``````def __init__(self, word=''):
self.root = TrieNode()
if word:
self.insert(word)
``````

now you could create your tree in one line:

``````index = TrieTree(word)
``````