Composite GetHashCode function

Posted on

Problem

I’ve built a small library for comparing objects based on their members here (related SO question).

There’s a need to compute hash function in order to conform to IEqualityComparer<T> interface. In my case this function has to be computed based on the hash functions of members. So the main problem is how to compose them.

I use the following approach currently:

public int GetHashCode(T obj)
{
    VerifyHaveMembersSetup();

    if (TypeTester<T>.IsNull(obj))
    {
        return 0;
    }

    if (getHashCodeBehavior == GetHashCodeBehavior.Cache && cachedHashCode.HasValue)
    {
        return cachedHashCode.Value;
    }

    int hashCode = 0;

    for (int i = 0; i < memberSetups.Count; ++i)
    {
        int leftShift = i % 32;
        int rightShift = 32 - leftShift;

        int memberHashCode = memberSetups[i].GetMemberHashCode(obj);

        hashCode = hashCode ^ ((memberHashCode << leftShift) | (memberHashCode >> rightShift));
    }

    if (getHashCodeBehavior == GetHashCodeBehavior.ImmutableCheck
        && cachedHashCode.HasValue && cachedHashCode.Value != hashCode)
    {
        throw new InvalidOperationException("Hash code value changed");
    }

    cachedHashCode = hashCode;

    return hashCode;
}
  1. Are there any problems with this code?

  2. I don’t like the way many people implement composite functions. With multiplication and addition, significant bits are shifted away and are lost.

That’s why I use bitshift operators << and >>. Still I’m not completely sure how good would be a hash function generated such way. Especially when it’s computed over small collection of memberSetups of bools for example (since bool has only 2 hash values in .NET: 1 and 0).

Solution

Shifting the values before xor:ing them is somewhat better than the worst thinkable ways of combining the hash codes, but it’s not very good. You will easily get combinations that cancel each other out, so you get hash codes with a distribution heavily skewed towards zero.

Multiplying values by a prime number on the other hand gives a pretty good distribution. Back in the day it was actually used to produce random numbers, partly because of how it spreads the values reasonably even over the number range.

A method that would give an even better distribution would be to use an advanced hashing algorithm like MD5 or CRC32, but the drawback would be that they are a lot slower. You would lose much of the advantage of using a hash code in the first place.

All in all, multiplying by a prime number gives a very good distribution in relation to the processing time needed. It’s a compromise well suited for the GetHashCode method.

If I’m not wrong, your hashcode starts using the most significant bit only when memberHashCode << leftShift is using it. This may very well never be the case, if memberSetups.Count is low, and the hashcode of the object you’re using is not perfect. For example, ints or bool.

As so, in the worst cases, your hashes are only spread a bit better than your inputs over al the integers, which is not 100% satisfying. Maybe instead of ++iin the loop, you could go with i+=32/memberSetups.Countto fasten the process of using the most significant bits (even if that’s still not perfect).

Leave a Reply

Your email address will not be published. Required fields are marked *