# Converting from strings to byte arrays with LINQ in C#


## Problem

I’m taking in a string of input from the command line and, when it is prefixed by `0o` or `8#`, interpreting it as an octal string. I’d like to convert it to a byte array more directly, but I’m not sure how to perform the bit carrying in LINQ.

All three of these methods are fully working; you can check out the repository or just download the built executable and run it from the command line if need be.

I’d like a review of all three working methods, but more specifically I’d like the Octal method below to not use a `BitArray` intermediary, similar to the Binary and Hex methods.

Here’s how I’m doing it for hexadecimal (mostly LINQ):

```csharp
public static byte[] GetHexBytes(this string hex, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        hex = hex.Trim();
        if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            hex = hex.Substring(2);
        else if (hex.StartsWith("16#"))
            hex = hex.Substring(3);
    }

    if (hex.Length % 2 != 0) hex = hex.PadLeft(hex.Length + 1, '0');

    return Enumerable.Range(0, hex.Length)
        .Where(x => x % 2 == 0)
        .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
        .ToArray();
}
```
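As a quick illustration (the input values here are hypothetical, not from the post; the `GetHexBytes` extension method above is assumed to be in scope), odd-length input is zero-padded and each pair of hex digits becomes one byte:

```csharp
// Hypothetical usage sketch for the hex method above.
byte[] a = "0xDEAD".GetHexBytes();  // { 0xDE, 0xAD }
byte[] b = "16#FFF".GetHexBytes();  // padded to "0FFF" -> { 0x0F, 0xFF }
```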

And here’s binary (mostly LINQ):

```csharp
public static byte[] GetBinaryBytes(this string binary, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        binary = binary.Trim();
        if (binary.StartsWith("0b", StringComparison.OrdinalIgnoreCase) || binary.StartsWith("2#"))
            binary = binary.Substring(2);
    }

    if (binary.Length % 8 != 0) binary = binary.PadLeft(binary.Length + 8 - binary.Length % 8, '0');

    return Enumerable.Range(0, binary.Length)
        .Where(x => x % 8 == 0)
        .Select(x => Convert.ToByte(binary.Substring(x, 8), 2))
        .ToArray();
}
```
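Similarly (again with hypothetical inputs, assuming the `GetBinaryBytes` method above is in scope), input is padded up to a multiple of 8 bits and consumed 8 characters at a time:

```csharp
// Hypothetical usage sketch for the binary method above.
byte[] a = "0b1010".GetBinaryBytes();              // padded to "00001010" -> { 0x0A }
byte[] b = "2#1111111100000001".GetBinaryBytes();  // { 0xFF, 0x01 }
```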

And here’s what I’ve got for Octal (LINQ, then a `BitArray`, then more LINQ):

```csharp
public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        octal = octal.Trim();
        if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
            octal = octal.Substring(2);
    }

    octal = octal.TrimStart('0');
    if (octal.Length == 0)
        octal = "0";

    BitArray bits = new BitArray(octal
        .Reverse()
        .SelectMany(x =>
        {
            byte value = (byte)(x - '0');
            return new bool[] { (value & 0x01) == 1, (value & 0x02) == 2, (value & 0x04) == 4 };
        })
        .ToArray());

    byte[] bytes = new byte[bits.Length / 8 + 1];
    bits.CopyTo(bytes, 0);

    bytes = bytes.Reverse().SkipWhile(b => b == 0x00).ToArray();
    if (bytes.Length == 0)
        bytes = new byte[] { 0x00 };

    return bytes;
}
```

I don’t like using the `BitArray` intermediary, but I don’t know how to do it without one. If possible, I’d like the whole conversion in a single LINQ statement like the hex and binary methods.

This is part of a C# console application for computing hashes. Here’s a link to the relevant source file on GitHub.

## Solution

Edit: corrected an issue with leading zero bytes in the result.

I’ll focus on the `GetOctalBytes` method in this review.

Also, for the explanation I assume that the string is processed right-to-left (in reverse order) and, similarly, that the resulting byte array starts with the lowest byte at index 0. These assumptions are accounted for and corrected in the final code.

For octal strings, a group of 8 characters (24 bits) maps to exactly 3 result bytes, so each complete group takes 3 bytes. The following table shows the important properties for the different character counts a (partial) group can have.

| in-group character count | bit count | bytes needed | affected byte indices in group |
| ------------------------ | --------- | ------------ | ------------------------------ |
| 1                        | 3         | 1            | 0                              |
| 2                        | 6         | 1            | 0                              |
| 3                        | 9         | 2            | 0, 1                           |
| 4                        | 12        | 2            | 1                              |
| 5                        | 15        | 2            | 1                              |
| 6                        | 18        | 3            | 1, 2                           |
| 7                        | 21        | 3            | 2                              |
| 8                        | 24        | 3            | 2                              |

The total `bytes needed` can be calculated as `(octal.Length * 3 + 7) / 8`. However, when the leading (partial) group has 3 or 6 characters, the actual byte count can be one lower if the leading character needs only 2 or 1 of its 3 bits, so for these cases the number of bytes can be reduced. Since a new group starts every 8 characters and each complete group fills 3 bytes, the starting index of the current byte group in the result array can be calculated as `(character_index / 8) * 3`. The affected in-group byte indices are derived from the remainder:

```
remainder = (character_index % 8);
if (remainder < 3)                   group-index 0 affected
if (remainder >= 2 && remainder < 6) group-index 1 affected
if (remainder >= 5)                  group-index 2 affected
```
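These rules can be sanity-checked against the table above with a throwaway sketch (not part of the original answer; `i` here is the zero-based in-group character index, i.e. the table's character count minus one):

```csharp
using System;

class IndexCheck
{
    static void Main()
    {
        // For each character index i (counted from the right), its 3 bits
        // occupy bit positions [3*i, 3*i + 2] within the 24-bit group.
        for (int i = 0; i < 8; i++)
        {
            int shift = (i % 8) * 3;        // bit offset within the group
            int lowByte = shift / 8;        // first affected group-index
            int highByte = (shift + 2) / 8; // last affected group-index
            Console.WriteLine($"i={i}: group-indices {lowByte}..{highByte}");
        }
    }
}
```

This prints `0..0` for i = 0–1, `0..1` for i = 2, `1..1` for i = 3–4, `1..2` for i = 5, and `2..2` for i = 6–7, matching the table rows.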

In order to avoid leading zero bytes, the value of the leading character additionally has to be considered: for a partial group of 3 characters it decides whether the second byte is needed, and for a group of 6 characters whether the third byte is needed.

I hope these explanations are enough introduction for the following code suggestion:

Note: I ditched the LINQ solution in favor of a more readable solution with less array re-creation.

```csharp
public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        octal = octal.Trim();
        if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
            octal = octal.Substring(2);
    }

    octal = octal.TrimStart('0');
    if (octal.Length == 0)
        return new byte[] { 0 };

    // 8 octal characters (24 bits) fill exactly 3 bytes; round up for a partial group.
    var arrayLength = (octal.Length * 3 + 7) / 8;

    // For leading groups of 3 or 6 characters the top byte may be unused
    // if the leading character needs only 2 or 1 of its 3 bits.
    var inGroup = (octal.Length % 8);
    if ((inGroup == 3 && octal[0] < '4') ||
        (inGroup == 6 && octal[0] < '2'))
    {
        --arrayLength;
    }

    var result = new byte[arrayLength];

    for (int i = 0; i < octal.Length; i++)
    {
        var baseIndex = (i / 8) * 3;  // start of the 3-byte group in the result
        var shift = (i % 8) * 3;      // bit offset of this character within the group
        var valueInGroup = (octal[octal.Length - i - 1] - '0') << shift;

        result[result.Length - baseIndex - 1] |= (byte)(valueInGroup & 0xff);
        if (valueInGroup > 0xff)
            result[result.Length - baseIndex - 2] |= (byte)((valueInGroup >> 8) & 0xff);
        if (valueInGroup > 0xffff)
            result[result.Length - baseIndex - 3] |= (byte)((valueInGroup >> 16) & 0xff);
    }

    return result;
}
```
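To see the suggested method in action (hypothetical inputs, assuming the `GetOctalBytes` extension method above is in scope):

```csharp
// Hypothetical usage sketch for the suggested octal method.
byte[] a = "0o777".GetOctalBytes();  // 511 decimal -> { 0x01, 0xFF }
byte[] b = "8#17".GetOctalBytes();   // 15 decimal  -> { 0x0F }
byte[] c = "0o000".GetOctalBytes();  // all zeros   -> { 0x00 }
```

Note that `"0o777"` yields two bytes rather than three: the string is only three characters long (9 bits), so the leading-group adjustment keeps the result free of a redundant leading zero byte.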