Develop custom Hash Map

Posted on

Problem

I’ve got a task to implement custom hash map with only get, put, size functions without remove and implementing base Map interface in Java. I’ve made one but not sure if it is good one.

Can You check it out please and maybe give some tips? https://github.com/AlistratenkoNikita/CustomHashMap

package com.alistratenko.customMap;

/**
 * An interface for custom hashmap with open addressing
 * An object that maps int keys to long values.
 *
 * @author Nikita
 * @version 1.0
 * @since 26.08.2017
 */
public interface CustomMap {
    /**
     * Returns the value to which the specified key is mapped, or 0 if this map contains no mapping for the key.
     *
     * @param key the key whose associated value is to be returned
     * @return the value to which the specified key is mapped, or null if this map contains no mapping for the key
     */
    long get(int key);

    /**
     * Associates the specified value with the specified key in this map. If the map previously contained a
     * mapping for the key, the new value will be put in free spot in the array which will be found with linear probing.
     *
     * @param key   key with which the specified value is to be associated
     * @param value value to be associated with the specified key
     */
    void put(int key, long value);

    /**
     * Returns the number of key-value mappings in this map.
     *
     * @return the number of key-value mappings in this map
     */
    int size();
}


package com.alistratenko.customMap;

/**
 * Base implementation of CustomMap interface
 *
 * @author Nikita
 * @version 1.0
 * @since 26.08.2017
 */
public class CustomMapOpenAddressing implements CustomMap {
    private Pair[] array;
    private static final int DEFAULT_CAPACITY = 8;
    private int size;

    public CustomMapOpenAddressing() {
        array = new Pair[DEFAULT_CAPACITY];
        size = 0;
    }

    /**
     * {@inheritDoc}
     */
    public long get(int key) {
        return array[hash(key)] != null ? array[hash(key)].getValue() : 0;
    }

    /**
     * {@inheritDoc}
     */

    public void put(int key, long value) {
        //the index of the array cell to store the value
        int index = hash(key);
        //if the standard index of the array cell is busy linear probing takes place
        if (array[index] != null) {
            int tryNumber = 1;
            do {
                index = linearProbing(index, tryNumber);
            } while (array[index] != null);
        }
        array[index] = new Pair(key, value);
        size++;
        //rehashes the map if the number of elements crosses 0.75 times the capacity
        if (array.length * 0.75 < size) {
            rehashing();
        }
    }

    /**
     * {@inheritDoc}
     */
    public int size() {
        return size;
    }

    /**
     * Returns the hashcode(array index) for a key in this array
     *
     * @param key a value to get hashcode for
     * @return the hashcode(array index) for a key in this array
     */
    private int hash(int key) {
        return Integer.hashCode(key) % array.length;
    }

    /**
     * Rehashes the {@link CustomMap}, increases its capacity in 1.5 times and copies elements from old array to a new one
     */
    private void rehashing() {
        int newCapacity = (int) (this.array.length * 1.5);
        Pair[] oldArr = this.array;
        this.array = new Pair[newCapacity];
        //setting number of the elements in the new array to zero
        //the value will be restored after copying all elements from old array to the new one
        size = 0;
        for (Pair pair : oldArr) {
            if (pair != null) {
                put(pair.getKey(), pair.getValue());
            }
        }
    }

    /**
     * Looks up array cell one by one
     *
     * @param hash the hash of the value
     * @param i    the number of the linear probing element
     * @return the index of the array cell
     */
    private int linearProbing(int hash, int i) {
        return (hash + i) % array.length;
    }

    /**
     * Represents a pair of key-value set for a hashmap
     */
    protected class Pair {
        private final int key;
        private long value;

        Pair(int key, long value) {
            this.key = key;
            this.value = value;
        }

        int getKey() {
            return key;
        }

        long getValue() {
            return value;
        }

        @Override
        public String toString() {
            return "Pair{" +
                    "key=" + key +
                    ", value=" + value +
                    '}';
        }

    }
}


public class MapTest {

    private CustomMap map;

    @Before
    public void init() {
        map = new CustomMapOpenAddressing();
    }

    @Test
    public void testMapForCasualInput() {
        map.put(0, 0);
        map.put(1, 1);
        map.put(2, 2);
        assertEquals(3, map.size());
    }

    @Test
    public void testMapForDuplicates1() {
        map.put(0, 0);
        map.put(8, 8);
        assertEquals(map.size(), 2);
    }

    @Test
    public void testMapForDuplicates2() {
        map.put(0, 0);
        map.put(1, 1);
        map.put(8, 8);
        assertEquals(map.size(), 3);
        assertEquals(map.get(0), 0);
        assertEquals(map.get(1), 1);
        assertEquals(map.get(8), 0);
    }

    @Test
    public void testMapForEmptyValues() {
        map.put(0, 0);
        assertTrue(map.get(1) == 0);
    }

    @Test
    public void testMapForRehashing() {
        for (int i = 0; i < 100_000; i++) {
            map.put(i, i);
        }
        assertEquals(100_000, map.size());
    }
}

Solution

Code review

Advice 1
package com.alistratenko.customMap;: According to general Java conventions, the package names should not use uppercase characters. Consider renaming to, say, package com.alistratenko.custom_map;.

Advice 2

public long get(int key) {
    return array[hash(key)] != null ? array[hash(key)].getValue() : 0;
}

Here, you return a long 0 whenever the key is not present in the map. Basically, that arrangement may confuse the possible user since if they get 0, they are not quite sure which is true: 1) the key was explicitly mapped to zero, 2) the key is not present. In order to circumvent, I would throw an exception whenever the key is not in the map, and would not rely on returning a sentinel value.

Advice 3
class CustomMapOpenAddressing; I would rephrase the class name a bit, how about OpenAddressingCustomMap?

Advice 4

private int hash(int key) {
    return Integer.hashCode(key) % array.length;
}

You can write:

private int hash(int key) {
    return key % array.length;
}

since Integer.hashCode returns its argument unmodified.

Speaking of which, your implementation will throw ArrayIndexOutOfBoundsException on most negative keys. Just put the key into Math.abs and you are ready to go. However, as Roland Illig pointed out, the Math.abs of Integer.MIN_VALUE is negative, so we have to deal with this corner case:

private int hash(int key) {
    // Math.abs of Integer.MIN_VALUE < 0.
    if (key == Integer.MIN_VALUE) {
        return 0;
    }

    return Math.abs(key) % array.length;
}

Advice 5

protected class Pair: redeclare as private static final class Pair. The problem here is the fact that each Pair will contain an explicity reference to the outer object. static removes those unnecessary references and helps to save memory a little bit. Also, I believe Entry is a more conventional name.

First of all, it’s good to see some documentation and tests! I don’t use Java a lot so I won’t say much about that specifically, but let’s have a look:

  • The documentation of an interface should describe what implementations need to do, not how they need to do it. Things like ‘open addressing’ and ‘linear probing’ are implementation details that belong to the CustomMapOpenAddressing class, they should not be mentioned in the CustomMap documentation.
  • It would be useful to see the reasoning behind certain decisions, like when to resize (you’ve chosen 75% capacity) or how to resolve collisions (you’ve chosen open addressing and linear probing).
  • Some method names aren’t very clear:
    • the main purpose of rehashing is to resize the hash map, so I’d call it resize.
    • Likewise, I’d rename testMapForRehashing to testMapForResizing.
    • hash is used to determine an entry’s index, so I’d probably call that getIndex instead.
    • testMapForDuplicates1 is testing for hash collisions, not duplicate keys or values, so I’d call it testMapForHashCollisions.
  • The linearProbing method doesn’t actually do any probing, the loop that calls it does. I’d refactor this into a getFirstEmptyIndex(int desiredIndex) method.
  • coderodde already said this, but returning 0 if a key doesn’t exist is probably not a good idea: it’s a valid value, so calling code cannot distinguish between ‘key does not exist’ and ‘key is mapped to 0’. Throwing an exception and/or providing a containsKey method would solve that problem.
  • There’s a bug in get: it doesn’t take the open addressing/probing into account. This means that hash collisions cause wrong results. The testMapForDuplicates2 test method actually triggers this bug, but the test is also broken, because it expects map.get(8) to return 0, while it clearly called map.put(8, 8).
  • Besides a broken test, some of the other tests can also be improved:
    • The documentation of put specifically mentions that calling it again for the same key will overwrite the existing value, but there is no test for that.
    • The resize test does not verify that get still returns the correct value for each key – just checking the size is not enough.
    • All tests always use the same value for both key and value. This means that a bug that mixes up keys and values won’t be detected.
    • I would remove the testMapForDuplicates1 test, because testMapForDuplicates2 does the same thing, and more.

Leave a Reply

Your email address will not be published. Required fields are marked *