String similarity calculation

Posted on

Problem

For string similarity, it is defined as longest common prefix length. For example, “abc” and “abd” is 2, and “aaa” and “aaab” is 3.

The problem is calculate the similarity of string S and all its suffixes, including itself as the first suffix.

For example, for S=”ababaa”, suffixes are “ababaa”, “babaa”, “abaa”,”baa”,”aa”
and “a”, the similarity are 6+0+3+0+1+1=11.

My code works and I am looking for smarter ideas to be more efficient.

# Complete the function below.
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children=defaultdict(TrieNode)
        self.isEnd=False
class TrieTree:
    def __init__(self):
        self.root=TrieNode()
    def insert(self, word):
        node = self.root
        for w in word:
            node = node.children[w]
        node.isEnd = True
    def search(self, word):
        node = self.root
        count = 0
        for w in word:
            node = node.children.get(w)
            if not node:
                break
            else:
                count += 1
        return count

def  StringSimilarity(inputs):
    resultFormat=[]
    for word in inputs:
        # build Trie tree
        index = TrieTree()
        index.insert(word)
        result = 0
        # search for suffix
        for i in range(len(word)):
            result += index.search(word[i:])
        print result
        resultFormat.append(result)

    return resultFormat

Solution

One thing I’d suggest is to allow a word to be inserted inside __init__, rather than force the user to always call .insert(word). You can still allow for creating an empty TrieTree. Just make word an optional parameter, and ignore it if it’s empty:

def __init__(self, word=''):
    self.root = TrieNode()
    if word:
        self.insert(word)

now you could create your tree in one line:

index = TrieTree(word)

Leave a Reply

Your email address will not be published. Required fields are marked *