Converting from strings to byte arrays with LINQ in C#


Problem

I’m taking a string of input from the command line and, when it is prefixed by 0o or 8#, interpreting it as an octal string. I’d like to convert it to a byte array more directly, but I’m not sure how to perform the bit carrying in LINQ.

All three of these methods are fully working; you can check out the repository or just download the built executable and run it from the command line if need be.

I’d like a review of all three working methods, but more specifically I’d like to have the Octal method, below, not use a BitArray intermediary, similar to the Binary and Hex methods.

Here’s how I’m doing it for hexadecimal (mostly LINQ):

public static byte[] GetHexBytes(this string hex, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        hex = hex.Trim();
        if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
            hex = hex.Substring(2);
        else if (hex.StartsWith("16#"))
            hex = hex.Substring(3);
    }

    if (hex.Length % 2 != 0) hex = hex.PadLeft(hex.Length + 1, '0');

    return Enumerable.Range(0, hex.Length)
            .Where(x => x % 2 == 0)
            .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
            .ToArray();
}

And here’s binary (mostly LINQ):

public static byte[] GetBinaryBytes(this string binary, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        binary = binary.Trim();
        if (binary.StartsWith("0b", StringComparison.OrdinalIgnoreCase) || binary.StartsWith("2#"))
            binary = binary.Substring(2);
    }

    if (binary.Length % 8 != 0) binary = binary.PadLeft(binary.Length + 8 - binary.Length % 8, '0');

    return Enumerable.Range(0, binary.Length)
            .Where(x => x % 8 == 0)
            .Select(x => Convert.ToByte(binary.Substring(x, 8), 2))
            .ToArray();
}

And here’s what I’ve got for Octal (LINQ, then a BitArray, then more LINQ):

public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        octal = octal.Trim();
        if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
            octal = octal.Substring(2);
    }

    octal = octal.TrimStart('0');
    if (octal.Length == 0)
        octal = "0";

    BitArray bits = new BitArray(octal
        .Reverse()
        .SelectMany(x =>
            {
                byte value = (byte)(x - '0');
                return new bool[] { (value & 0x01) == 1, (value & 0x02) == 2, (value & 0x04) == 4 };
            })
        .ToArray());

    byte[] bytes = new byte[bits.Length / 8 + 1];
    bits.CopyTo(bytes, 0);

    bytes = bytes.Reverse().SkipWhile(b => b == 0x00).ToArray();
    if (bytes.Length == 0)
        bytes = new byte[] { 0x00 };

    return bytes;
}

I don’t like using the BitArray intermediary, but I don’t know how to do it without it. If possible, I’d like the whole conversion in a single LINQ statement, like the hex and binary methods.

This is part of a C# console application for computing hashes. Here’s a link to the relevant source file on GitHub.

Solution

Edit: corrected an issue with leading zero bytes in the result.

I’ll focus on the GetOctalBytes method in this review.

Also, for the explanation I assume that the string is processed right-to-left (reverse order) and, similarly, that the resulting byte array starts with the lowest byte at index 0. The final code handles both: it reads the string from the right and writes the bytes so that the highest byte ends up at index 0.

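For octal strings, each character carries 3 bits, so a complete group of 8 characters (24 bits) maps to exactly 3 result bytes, and an incomplete group needs up to 3. The following table shows, for each character count within a group, how many bits and bytes are needed and which byte indices of the group the last added character affects.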

(in-group character count) (bits count) (bytes needed) (affected byte indices in group)
1                           3           1              0
2                           6           1              0
3                           9           2              0, 1
4                          12           2                 1
5                          15           2                 1
6                          18           3                 1, 2
7                          21           3                    2
8                          24           3                    2
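To make the 8-characters-to-3-bytes relationship concrete, here is a minimal sketch that packs a single group into its three bytes, lowest byte first as in the explanation. The class and method names (OctalGroupSketch, PackGroup) are hypothetical and not part of the reviewed code; the sketch relies on Convert.ToInt32 with base 8.

```csharp
using System;

public static class OctalGroupSketch
{
    // Hypothetical helper: packs one group of 1 to 8 octal digits into 3 bytes,
    // lowest byte first (the order the explanation assumes).
    public static byte[] PackGroup(string group)
    {
        if (group.Length < 1 || group.Length > 8)
            throw new ArgumentException("A group holds 1 to 8 octal digits.", nameof(group));

        // 8 digits * 3 bits = 24 bits, so the value always fits in an int.
        int value = Convert.ToInt32(group, 8);

        return new[]
        {
            (byte)(value & 0xff),          // group byte 0: bits 0..7
            (byte)((value >> 8) & 0xff),   // group byte 1: bits 8..15
            (byte)((value >> 16) & 0xff)   // group byte 2: bits 16..23
        };
    }
}
```

For example, PackGroup("377") yields { 0xff, 0x00, 0x00 }, while PackGroup("400") yields { 0x00, 0x01, 0x00 }: the third character is the first one that can spill into byte 1, which matches the table row for a character count of 3.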

The total number of bytes needed can be calculated as (octal.Length * 3 + 7) / 8 using integer division, which rounds the bit count up to whole bytes. However, when the leftmost (possibly incomplete) group holds 3 or 6 characters, the byte count actually needed can be one lower: the leading character straddles a byte boundary, and if its value fits in its lower 2 bits (3 characters) or its lowest bit (6 characters), the extra byte would stay zero. In these cases the byte count can be reduced by one.

Since a new group starts every 8 characters and each group needs 3 bytes, the starting index of the current byte group in the result array can be calculated as (character_index / 8) * 3. The affected in-group indices are calculated from the remainder:

remainder = (character_index % 8);
if (remainder < 3) group-index 0 affected
if (remainder >= 2 && remainder < 6) group-index 1 affected
if (remainder >= 5) group-index 2 affected
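The rules above can be written as a small runnable sketch. The names OctalIndexSketch, GroupStart and AffectedGroupIndices are hypothetical helpers for illustration only. Note that character_index is zero-based and counted from the right end of the string, so a remainder of 2 corresponds to the table row with a character count of 3.

```csharp
using System.Collections.Generic;

public static class OctalIndexSketch
{
    // Start of the 3-byte group that the character at characterIndex belongs to.
    // characterIndex counts from the right end of the string (0 = last character).
    public static int GroupStart(int characterIndex) => (characterIndex / 8) * 3;

    // In-group byte indices (0 = lowest byte) that the character's 3 bits can reach.
    public static List<int> AffectedGroupIndices(int characterIndex)
    {
        int remainder = characterIndex % 8;
        var affected = new List<int>();

        if (remainder < 3) affected.Add(0);
        if (remainder >= 2 && remainder < 6) affected.Add(1);
        if (remainder >= 5) affected.Add(2);

        return affected;
    }
}
```

For instance, AffectedGroupIndices(2) returns 0 and 1, AffectedGroupIndices(5) returns 1 and 2, and GroupStart(8) returns 3, the first byte of the second group.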

To avoid leading zero bytes, the character’s value also has to be taken into account for the second and third byte of a group: a higher byte should be touched only if the shifted value actually reaches it.
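As an illustration of the byte-count formula together with the leading-zero adjustment, the following sketch computes the result length on its own. OctalLengthSketch and ByteCount are hypothetical names, and the input is assumed to be already stripped of its prefix and leading zeros.

```csharp
public static class OctalLengthSketch
{
    // Number of result bytes for an octal digit string without prefix or leading zeros.
    public static int ByteCount(string octal)
    {
        // Assumption: empty input is treated like "0" and needs a single zero byte.
        if (octal.Length == 0) return 1;

        // 3 bits per character, rounded up to whole bytes.
        int count = (octal.Length * 3 + 7) / 8;
        int inGroup = octal.Length % 8;

        // With 3 or 6 characters in the leftmost group, the leading character straddles
        // a byte boundary. If its upper bits are zero, the extra byte would be a
        // leading zero byte, so it is dropped.
        if ((inGroup == 3 && octal[0] < '4') || (inGroup == 6 && octal[0] < '2'))
            count--;

        return count;
    }
}
```

ByteCount("377") gives 1 (decimal 255 fits in one byte), while ByteCount("400") gives 2 (decimal 256 needs two). Likewise ByteCount("177777") gives 2 and ByteCount("200000") gives 3.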

I hope these explanations are enough of an introduction to the following code suggestion:

Note: I ditched the LINQ solution in favor of a more readable solution with less array re-creation.

public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
{
    if (!preTrimmed)
    {
        octal = octal.Trim();
        if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
            octal = octal.Substring(2);
    }

    octal = octal.TrimStart('0');
    if (octal.Length == 0)
        return new byte[] { 0 };

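    // Round the bit count (3 bits per character) up to whole bytes.
    var arrayLength = (octal.Length * 3 + 7) / 8;
    var inGroup = (octal.Length % 8);
    // With 3 or 6 characters in the leftmost group, the leading character straddles
    // a byte boundary; if its upper bits are zero, drop the would-be leading zero byte.
    if ((inGroup == 3 && octal[0] < '4') ||
        (inGroup == 6 && octal[0] < '2'))
    {
        --arrayLength;
    }

    var result = new byte[arrayLength];

    // i counts characters from the right end of the string.
    for (int i = 0; i < octal.Length; i++)
    {
        var baseIndex = (i / 8) * 3;   // start of this character's 3-byte group
        var shift = (i % 8) * 3;       // bit offset of the character within its group
        var valueInGroup = (octal[octal.Length - i - 1] - '0') << shift;

        // The lowest byte goes last in the array, so groups are written from the end.
        result[result.Length - baseIndex - 1] |= (byte)(valueInGroup & 0xff);
        // Touch the higher bytes only when the shifted value actually reaches them.
        if (valueInGroup > 0xff)
            result[result.Length - baseIndex - 2] |= (byte)((valueInGroup >> 8) & 0xff);
        if (valueInGroup > 0xffff)
            result[result.Length - baseIndex - 3] |= (byte)((valueInGroup >> 16) & 0xff);
    }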

    return result;
}
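One way to sanity-check the method above is to compare its output against an independent reference. The sketch below is not the approach used in this answer: it swaps in System.Numerics.BigInteger and a single LINQ Aggregate call, and the names OctalReferenceSketch and ToBytes are hypothetical. It expects digits only (no 0o or 8# prefix) and, unlike the code above, rejects characters outside 0 to 7.

```csharp
using System;
using System.Linq;
using System.Numerics;

public static class OctalReferenceSketch
{
    // Hypothetical reference implementation; expects digits only, no prefix.
    public static byte[] ToBytes(string octalDigits)
    {
        // Accumulate the digits into one arbitrary-size integer: value = value * 8 + digit.
        BigInteger value = octalDigits.Aggregate(
            BigInteger.Zero,
            (acc, c) =>
            {
                if (c < '0' || c > '7')
                    throw new FormatException($"'{c}' is not an octal digit.");
                return acc * 8 + (c - '0');
            });

        // ToByteArray() is little-endian and may append a 0x00 sign byte,
        // so reverse it and drop the leading zero bytes.
        byte[] bytes = Enumerable.Reverse(value.ToByteArray())
            .SkipWhile(b => b == 0)
            .ToArray();

        // Zero (or an empty string) is represented as a single zero byte.
        return bytes.Length == 0 ? new byte[] { 0 } : bytes;
    }
}
```

For any valid digit string, OctalReferenceSketch.ToBytes(digits) should produce the same array as digits.GetOctalBytes(true). For example, "377" gives { 0xff } and "400" gives { 0x01, 0x00 }. This is likely slower than the loop above for long inputs, since each step multiplies a growing BigInteger, so it is best suited to tests.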
