# Convert hex string to byte array

Posted on

Problem

The goal is to convert a hex string to a byte array with the following requirements:

• O(1)

$O\left(1\right)$

$O(1)$ additional space apart from input and output.

• O(n)

$O\left(n\right)$

$O(n)$ runtime

This mostly just prohibits creating a new string with a 0 prepended to avoid having to deal with odd strings.

private static byte[] ConvertHexToBytes(string input) {
var result = new byte[(input.Length + 1) / 2];
var offset = 0;
if (input.Length % 2 == 1) {
// If length of input is odd, the first character has an implicit 0 prepended.
result = (byte)Convert.ToUInt32(input + "", 16);
offset = 1;
}
for (int i = 0; i < input.Length / 2; i++) {
result[i + offset] = (byte)Convert.ToUInt32(input.Substring(i * 2 + offset, 2), 16);
}
return result;
}


Can we write this nicer, possibly using Linq (and some chained iterables) in C#?

Test code that covers I think all requirements:

private static void TestConversions() {
var inputs = new[] { "",
"0",
"10",
"f",
"0f",
"010",
"0ff",
"f2ab"
};
var i = 0;
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] {  }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0 }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0x10 }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0xf }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0xf }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0x0, 0x10 }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0x0, 0xff }));
Debug.Assert(ConvertHexToBytes(inputs[i++]).SequenceEqual(new byte[] { 0xf2, 0xab }));
Debug.Assert(i == inputs.Length);
}


Solution

In situations like this, when dealing with odd-offset values, and byte-manipulation, I recommend four things:

1. use a logical frame of reference.
2. get familiar with bit-wise operations.
3. pre-computing results at compile time is very efficient at runtime.
4. switch statements are high-performance lookup tables

## End-of-data frame of reference

What I mean by frame-of-reference, is that your common reference point between the input string, and the output array, is the last character, and the last member of the byte[] array. You should line up your reference points, and work out from there. In this case, it means working backwards.

The next thing that you have as a frame of reference, is your input value. You use this value to drive the calculation of the output array size, and also to drive the iteration in the loop. In this case, what it means is that you should be using the input chars to drive the loop, not the size / 2.

Putting this together, consider a loop that iterates from the last char, to the first char, and then populates the last byte, to the first byte. We can then do math from the end of the input/output arrays based on the loop.

## Bitwise manipulations

There are some bitwise manipulation tricks you can do here that help:

• Because exactly 2 input chars are required to populate a byte, and because we are working in a base-2 (binary) system, we can take advantage of a trick in counting…. As we count from the end of the input chars to the beginning, we notice that every time we count 2, we move another output byte. If we divide the count by 2, we get the relative position from the output end.

• Remember, a right-shift is the same as dividing by 2 (integer division):

int x = 10;
x = x >> 1; // right shift once...
// x now is 5.

• Bit-wise ANDing of a value with 1 tells you what the last bit is.

int x = 5;
x = x & 1;
// x now is 1

• Bit-wise ORing of two values sets the respective output bits to 1 if either of the input bits are 1.

int a = 0x0a;
int b = 0xb0;
int c = a | b;
// c now is 0xba;


## switch statements for fast lookups

The compiler is able to optimize a switch statement very effectively. Consider the following switch statement that converts a character in to the corresponding hex value. This is very efficient because it is based on fancy compile-time logic. It takes some effort to code, but the results are worth it:

private static int HexToInt(char c)
{
switch (c) {
case '0':
return 0;
case '1':
return 1;
case '2':
return 2;
case '3':
return 3;
case '4':
return 4;
case '5':
return 5;
case '6':
return 6;
case '7':
return 7;
case '8':
return 8;
case '9':
return 9;
case 'a':
case 'A':
return 10;
case 'b':
case 'B':
return 11;
case 'c':
case 'C':
return 12;
case 'd':
case 'D':
return 13;
case 'e':
case 'E':
return 14;
case 'f':
case 'F':
return 15;
default:
throw new FormatException("Unrecognized hex char " + c);
}
}


On a character-by-character basis this will be faster than the use of ToUInt32()

## Lookup Tables in memory

Again, there are only 16 hex values, this makes the memory usage very small, and you can do some memory lookup tricks.

## Putting it together

1. use a loop that counts back from the end of the input/output. Then you don’t need special odd-length input handling.
2. identify whether an input char is going to be a high or low nibble in the result.
3. use a lookup table (low/high nibble) to find a byte value for that input.
4. use a bit-wise OR to add the low and high nibbles together
5. use bitwise AND and bitwise right-shift to convert the input character to the output byte/nibble position.

The following code is a ‘simple’ loop, that is efficient, and works for your input cases:

private static readonly byte[,] ByteLookup = new byte[,]
{
// low nibble
{0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f},
// high nibble
{0x00, 0x10, 0x20, 0x30, 0x40, 0x50, 0x60, 0x70, 0x80, 0x90, 0xa0, 0xb0, 0xc0, 0xd0, 0xe0, 0xf0}
};

private static byte[] ConvertHexToBytesX(string input) {
var result = new byte[(input.Length + 1) >> 1];
int lastcell = result.Length - 1;
int lastchar = input.Length - 1;
// count up in characters, but inside the loop will
// reference from the end of the input/output.
for (int i = 0; i < input.Length; i++) {
// i >> 1    -  (i / 2) gives the result byte offset from the end
// i & 1     -  1 if it is high-nibble, 0 for low-nibble.
result[lastcell - (i >> 1)] |= ByteLookup[i & 1, HexToInt(input[lastchar - i])];
}
return result;
}


Thus, you can accomplish the conversion with efficient switches, array lookups, and ‘simple’ math in the loops.

I have put together an Ideone implementation of this code so you can compare it with your code.