# Hash Table implementation in C using linear probing for collisions

Posted on

Problem

I’ve written a simple Hash Table implementation in C, in order to use it in an IRC bot, so mostly storing nicknames, channel names, etc (small strings). I’m using linear probind to resolve collision.

Here is the public API (`hash.h`):

``````#pragma once

#include <stdbool.h>

typedef struct hash {
struct hash_node* nodes;
size_t capacity;
size_t size;
} hash_t;

typedef struct hash_node {
char *key;
void *value;
} hash_node_t ;

hash_t* hash_create(size_t capacity);
void hash_destroy(hash_t *hash);

bool hash_set(hash_t *hash, const char *key, void *value);
void* hash_get(hash_t *hash, const char *key);

``````

And here is the implementation (`hash.c`):

``````#include <stddef.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

#include "hash.h"

// Implementation of the 64 bits version of the FNV1-a hashing algorithm
static uint64_t hash_fnv1a(const char *str) {
uint64_t hash = UINT64_C(14695981039346656037);
while (*str) {
hash ^= (uint64_t) *str++;
hash *= UINT64_C(1099511628211);
}
return hash;
}

static hash_node_t *hash_node_get(hash_node_t *nodes, size_t nodes_size, const char *key) {
uint64_t index = hash_fnv1a(key) % nodes_size;
hash_node_t *node = &nodes[index];
while (node->key != NULL) {
if (strcmp(node->key, key) == 0) {
return node;
}
if (++index == nodes_size)
index = 0;
node = &nodes[index];
}
// If we did not find this key, we return the node which can be used to store this key.
return node;
}

bool hash_grow(hash_t* hash, size_t new_capacity) {
hash_node_t *new_nodes = calloc(new_capacity, sizeof(hash_node_t));
if (new_nodes == NULL)
return false;
for (size_t i = 0; i < hash->capacity; i++) {
hash_node_t *node = &hash->nodes[i];
if (node->key != NULL) {
hash_node_t *new_node = hash_node_get(new_nodes, new_capacity, node->key);
*new_node = *node;
}
}
free(hash->nodes);
hash->nodes = new_nodes;
hash->capacity = new_capacity;
return true;
}

hash_t *hash_create(size_t capacity) {
hash_t *hash = malloc(sizeof(hash_t));
if (hash == NULL)
return NULL;
hash->size = 0;
hash->capacity = capacity;
hash->nodes = calloc(capacity, sizeof(hash_node_t));
if (hash->nodes == NULL) {
free(hash);
return NULL;
}
return hash;
}

void hash_destroy(hash_t *hash) {
for (size_t i = 0; i < hash->capacity; i++) {
hash_node_t *node = hash->nodes + i;
if (node->key != NULL) {
free(node->key);
}
}
free(hash->nodes);
free(hash);
}

bool hash_set(hash_t *hash, const char *key, void *value) {
hash_node_t *node = hash_node_get(hash->nodes, hash->capacity, key);
if (node->key != NULL) {
node->value = value;
return true;
}

if (hash->size >= hash->capacity / 2) {
if (!hash_grow(hash, hash->capacity * 2)) {
return false;
}
// After growing the hash, the node reference has been freed,
// so we get it again from the new nodes array.
node = hash_node_get(hash->nodes, hash->capacity, key);
}

node->key = strdup(key);
if (node->key == NULL) {
return NULL;
}
node->value = value;
hash->size++;
return true;
}

void *hash_get(hash_t *hash, const char *key) {
return hash_node_get(hash->nodes, hash->capacity, key)->value;
}
``````

I’m mostly satisfied with it, except the `hash_grow` function which I find a bit complicated to understand. So I asked myself a few questions regarding this implementation:

1. Should I use `size_t` for my nodes array ? In my case this hash table will never contain a lot of entries, so I could have use a `uint32_t` because I don’t need the extra space on 64 bits architecture. In this case I can use the 32 bits implement of FNV1-a.

2. Is storing the result of the `hash_fnv1a` hash inside the `hash_node_t` is useful in order to avoid computing it again in `hash_grow` ? In this case I would just need to recompute the “modulo” because the key didn’t change.

3. I used linear probing to resolve collisions, it’s easier and I don’t think that for my use case it would make a big difference.

4. My structs are public because I have unit tests (with cmocka) and I need to check internal
states in my tests.

I’m a professional developer, but not in C. I do this just for fun, so please don’t be too rude with me. Thanks for your comments, I’m really excited to get a review.

Solution

Just a few points for now, I will extend my answer over the weekend:

1. I think you should think about the max size of your table anyway. I mean you can’t have 4 billion nodes in there.
2. This is connected to the size. If you only have to resize your table maybe once or twice, it’s probably not worth storing the hash value. It’s a trade off between speed and size.
3. I think keeping things simple is key
4. you could have a file hash_priv.h to keep the structure private.

Two more things:

• letting the user choose the table size doesn’t make sense. I would let your lib choose the table size because with the table size you can improve the collision rate. To reduce collision rate you want the table size to be a prime number. Maybe the user could give you a hint for the size and you choose the next higher prime number as the actual size.
• check the arguments for NULL in the public functions