How to replace the given email address with a value in a given input?

Posted on

Problem

I have written the below method to replace some of the email domains like @gmail.com and @yahoo.com with a given text.

public static string RemovePersonalInfo(string input)
{
    string[] tokens = input.Split(new char[] { ' ', 't', 'r', 'n' });
    string output = string.Empty;
    foreach (string token in tokens)
    {
        if (token.Contains("@gmail.com"))
        {
            output += " SOMETEXT";
        }
        else
        {
            output += " " + token;
        }
    }
    tokens = output.Split(new char[] { ' ', 't', 'r', 'n' });
    output = string.Empty;
    foreach (string token in tokens)
    {
        if (token.Contains("@yahoo.com"))
        {
            output += " SOMETEXT";
        }
        else
        {
            output += " " + token;
        }
    }
    return output;
}

It is working as expected for the below input.

But I don’t think it is a good solution, I can see the improvements in the code like it is not scalable, let’s see tomorrow some other email domain comes, I will have to again modify the code and write another if condition. the second improvement is that I am running the loop twice, it can be done in one loop. so performance can be improved.

Or if there is any better approach than this, please suggest.

Input: test@gmail.com test@abc.com @teest@yahoo.com

Output: SOMETEXT test@abc.com SOMETEXT

Note: I am not supposed to use the Replace method. So the only intention here is to use the same logic in basic programming languages like C and C++ as well.

Solution

  • Performance wise using string concatination (output += ) inside a loop isn’t the best choice because for each output += a new string is created because strings are immutable. Using a StringBuilder would be the way to go. But instead of returning a string, you could return an IEnumerable<string> which would make the code more flexible.

  • Passing an IEnumerable<string> as method parameter holding the not wanted domains will make the code open for needed changes.

  • Passing an IEnumerable<string> as method parameter holding the tokens would enable the method to only do one thing.

  • Because the method is public you should have proper parameter validation.

  • Using a little bit of Linq would compact the code a lot.

  • Instead of hardcoding “SOMETEXT” you could pass an optional parameter.

Implementing the mentioned changes using string as return-type would lead to

public static string RemovePersonalInfo(string input)
{
    if (input == null) { throw new NullReferenceException(nameof(input)); }
    if (string.IsNullOrWhiteSpace(input)) { return input; }

    return RemovePersonalInfo(input.Split(new char[] { ' ', 't', 'r', 'n' }), new string[] { "@gmail.com", "@yahoo.com" });

}
private static string RemovePersonalInfo(IEnumerable<string> tokens, IEnumerable<string> domains, string replacement = "SOMETEXT")
{
    return string.Join(" ", tokens.Select(token => (domains.Any(domain => token.Contains(domain)) ? replacement : token)));
}  

and could be called like before.

I am the one who added the Homework tag to the post. I will answer your question under the premise that you are a student and that perhaps your instructor has not yet covered some C# features to you but will as the curriculum progresses.

@Hescacher gave the standard advice that continually modifying the same string is slow and inefficient. Instead, a StringBuilder object should be used. This is sound advice.

However, there is an alternative to using StringBuilder. Since you use String.Split to create an intermediate string array tokens, I will just modify the element in the array. Later, a String.Join will produce the output. Note that in order to modify an individual token, the loop cannot be a foreach. Instead it must use a for with an index variable.

Let’s briefly cover one of the many strong principles of good coding: DON’T REPEAT YOURSELF, or DRY. You posted here because you obviously saw the duplicated code in your method. Your small test has only 2 email domains but what if you have 200? Were you planning to just repeat the code 200 times?

As you attempt to keep your code DRY, you will discover that you can do so by passing in parameters. This makes your code more modular, with smaller methods singular in purpose. This touches upon the Single Responsibility Principle.

public static string[] EmailDomainFilters => new string[] { "@gmail.com", "@yahoo.com" };

public static string RemovePersonalInfo(string input, string[] filters, bool ignoreCase = false, string replacement = "SOMETEXT")
{
    if (string.IsNullOrWhiteSpace(input))
    {
        return string.Empty;
    }
    if (filters == null)
    {
        throw new ArgumentNullException(nameof(filters));
    }
    string[] tokens = input.Split(new char[] { ' ', 't', 'r', 'n' });
    StringComparison comparer = ignoreCase ? StringComparison.OrdinalIgnoreCase : StringComparison.Ordinal;
    // Cannot use a foreach tokens since the token may be modified inside the loop.
    for (int i = 0; i < tokens.Length; i++)
    {
        foreach (string filter in filters)
        {
            if (tokens[i].IndexOf(filter, comparer) >= 0)
            {
                tokens[i] = replacement;
                break; 
            }
        }
    }
    return string.Join(" ", tokens);
}

public static void TestFilters()
{
    string input = "test@gmail.com test@abc.com @teest@yahoo.com";
    string output = RemovePersonalInfo(input, EmailDomainFilters, ignoreCase: true);
    Console.WriteLine(output);
}

Let’s review what we did to make it DRY.

One, the email domain filters have been saved to a read-only property, and they are passed into the RemovePersonalInfo method. There is now an inner foreach loop to check each email domain. Once we find one, we apply the replacement and continue to the next token.

A reality of a developer’s life is some data will be bad. What if someone input “TEST@GMAIL.COM”? Should we replace it’s text or not? To account for string casing, I added a parameter on whether you should ignore or honor it. This keeps the method flexible.

Another flexible feature is the parameter for the replacement text. Maybe one day you want it to be “REDACTED” or “OBFUSCATED”. I get it that a student tries to hit exact requirements of the lesson. But you can hit it and still be flexible in how its coded.

Intermediate or advanced … your instructor has a lesson plan and some topics are to be covered in the future. My answer here uses a lot of string arrays. If you were not a student, my answer have changed slightly with:

public static IList<string> EmailDomainFilters => new string[] { "@gmail.com", "@yahoo.com" };

public static string RemovePersonalInfo(string input, IEnumerable<string> filters, bool ignoreCase = false, string replacement = "SOMETEXT")
{ . . . }

But those may be topics that your instructor does not want you exposed to just yet.

SUMMARY

  • Keep it DRY.
  • Use Single Responsibility Principle for modular code.
  • Use parameters for flexibility.
  • Expect bad data and have checks for it.

UPDATE

My original answer used a continue for the inner foreach loop. It has been changed to break.

Leave a Reply

Your email address will not be published. Required fields are marked *