# Finding Isogram Word

Posted on

Problem

An isogram (also known as a “nonpattern word”) is a word or phrase without a repeating letter, however spaces and hyphens are allowed to appear multiple times.

The `IsIsogram()` method takes a string and returns boolean value.

1. INPUT: anik OUTPUT: true
2. INPUT: A nika OUTPUT: false

Approach 01: `Using Dictionary<TKey, TValue>`

``````static bool IsIsogram(string str)
{
//create a dictionary with 26 keys and assign false as default value
var storeOccurance = new Dictionary<int, bool>();
for (int i = 0; i <= 25; i++)
{
storeOccurance[i] = false;
}

str = Regex.Replace(str, "[ -]", ""); //ignore whitespace and dash

//iterate over each character of the string
foreach (var ch in str.ToLower())
{
//(ch - 'a') is used to calculate the key. Check if the character key has false as it's value,
//meaning they were never stored
if (!storeOccurance[ch - 'a'])
{
storeOccurance[ch - 'a'] = true; //assign true when they are stored in the dictionary
}
else //otherwise, if a character key has true as it's value then it means that it exists in the dictionary
{
return false; //return false and exits the iteration..Multiple occurance in dictionary
//means the string has repeating character so it's not Isogram
}
}
return true;
}
``````

Approach 02: `Using HashSet<T>`

``````static bool IsIsogram(string str)
{
//otherwise returns true and add the character to the datastructure

var charHash = new HashSet<char>();

foreach (var ch in str.ToLower())
{
//check if the charHash.Add() returns false, if it does then terminate
//as it means the character already exists
return false;
}
return true; //otherwise return true
}
``````

Approach 03: Using LINQ

``````static bool IsIsogram(string str)
{                                    //Suppose "anik a" is a given string
return str.Where(char.IsLetter) //filter only the letters ||here- only a, n, i, k, a will be taken
.GroupBy(char.ToLower) //create group by characters(and transform them to lowercase) || here, there will be a group for each character a, n, i, k
.All(g => g.Count() == 1); //for every group, count the number of it's element and check if it's 1
//|| here, 'a' group has 2 elements so it return false though other groups has only one element in their group
}
``````

Given those above approaches:

1. Which approach would you recommend regarding readability, clean code and performance?
2. Is there any scope for improvement in the existing approaches?

Solution

I like the second approach since it allows you to exit as soon as a false condition is reached and is simpler than the first. I would offer one minor optimization, keep the check for isletter separate and only check for duplicate if isletter is true:

``````static bool IsIsogram(string str)
{
//otherwise returns true and add the character to the datastructure

var charHash = new HashSet<char>();

foreach (var ch in str.ToLower())
{
//check if the charHash.Add() returns false, if it does then terminate
//as it means the character already exists
if(char.IsLetter(ch))
{
return false;
}

}
return true; //otherwise return true
}
``````

### Choice of Map Key

Ultimately, as you are querying your map based upon the character, you should really make your key as the character. This makes your regular subtraction of `'a'` unnecessary.

``````    var storeOccurance = new Dictionary<char, bool>();
for (char c = 'a'; c <= 'z'; c++)
{
storeOccurance[c] = false;
}
``````

### Inconsistent behaviour for non-letter characters

Given a string such as `anik00`, the first approach will produce a `false` response, as the duplicate 0s are detected like any other letter. The other two approaches will produce `true`, as 0s are ignored.

I don’t think these comments are necessary, as the code is clear enough on its own. If you feel the need to add comments to explain what something is doing, you should extract methods instead to make these intentions clear.

### Which approach would I choose?

The litmus test is how easy it is to understand. I’ve not seen any serious LINQ for a number of years, and despite that I can understand your LINQ query without difficulty. If you can create understandable code in fewer lines, it’s a good thing.

I think i would go for LINQ approach. For this simple operation, it is short and easy to understand. Don’t reinvent the wheel, use LINQ.

Here is another shorter version of LINQ :

``````public static bool IsIsogram(string input)
{
var preprocessedInput = Regex.Replace (input.ToLower(), "[ -]", "").Dump();
return preprocessedInput.Length == preprocessedInput.Distinct().Count();
}
``````

But i would like to introduce another approach which is recursive. There is no need to use HashSets or Dictionaries within the implementation.

``````public static bool IsIsogram( string input){
if (string.IsNullOrEmpty(input))
{
throw new ArgumentNullException(nameof(input));
//or we can return true , no character,  no repeated character :)
}
var preprocessedInput = Regex.Replace(input.ToLower(), "[ -]", "");
if (input.Length==1)
{
return true;
}
var firstHalf = preprocessedInput.Substring(0,preprocessedInput.Length/2);
var secondHalf = preprocessedInput.Remove(0,preprocessedInput.Length/2);

if (firstHalf.Intersect(secondHalf).Any())
{
return false;
}

return IsIsogram(firstHalf) && IsIsogram(secondHalf);
}
``````

The input string is divided into two string and checked for any intersected characters. If there is an intersecting character then, returns false.
If there is not any intersected character then, each string divided into two substrings and called the method for each recursively.

I suggest using an array of boolean, so instead of

``````var storeOccurance = new Dictionary<int, bool>();
``````

use simply

``````var storeOccurance = new bool[26];
``````

Although if you want to allow a large character set instead of just a-z, say the whole of unicode, the HashSet approach may be appropriate, although in that case you would have to consider surrogates ( the situation where a character is represented by two chars ).

The third approach is definitely the one to start with. It’s the simplest one and unless you don’t have to analyze an entire library of texts it would probably be sufficient in 99.99% use-cases.

Without measuring/benchmarking every presented approach it’s not possible to say which one would be the fastest one. It also rarely matters. Don’t overengineer it if you don’t have a good explanation for doing so.

Posted in C#