Problem
I’m taking in a string of input from the command line and, when it is prefixed by 0o or 8#, interpreting it as an octal string. I’d like to convert it to a byte array more directly, but I’m not sure how to perform the bit carrying in LINQ.
All three of these methods are fully working; you can check out the repository or just download the built executable and run it from the command line if need be.
I’d like a review of all three working methods, but more specifically I’d like the Octal method, below, to avoid the BitArray intermediary, similar to the Binary and Hex methods.
Here’s how I’m doing it for hexadecimal (mostly LINQ):
    public static byte[] GetHexBytes(this string hex, bool preTrimmed = false)
    {
        if (!preTrimmed)
        {
            hex = hex.Trim();
            if (hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase))
                hex = hex.Substring(2);
            else if (hex.StartsWith("16#"))
                hex = hex.Substring(3);
        }
        if (hex.Length % 2 != 0) hex = hex.PadLeft(hex.Length + 1, '0');
        return Enumerable.Range(0, hex.Length)
            .Where(x => x % 2 == 0)
            .Select(x => Convert.ToByte(hex.Substring(x, 2), 16))
            .ToArray();
    }
And here’s binary (mostly LINQ):
    public static byte[] GetBinaryBytes(this string binary, bool preTrimmed = false)
    {
        if (!preTrimmed)
        {
            binary = binary.Trim();
            if (binary.StartsWith("0b", StringComparison.OrdinalIgnoreCase) || binary.StartsWith("2#"))
                binary = binary.Substring(2);
        }
        if (binary.Length % 8 != 0) binary = binary.PadLeft(binary.Length + 8 - binary.Length % 8, '0');
        return Enumerable.Range(0, binary.Length)
            .Where(x => x % 8 == 0)
            .Select(x => Convert.ToByte(binary.Substring(x, 8), 2))
            .ToArray();
    }
And here’s what I’ve got for Octal (LINQ, then a BitArray, then more LINQ):
    public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
    {
        if (!preTrimmed)
        {
            octal = octal.Trim();
            if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
                octal = octal.Substring(2);
        }
        octal = octal.TrimStart('0');
        if (octal.Length == 0)
            octal = "0";
        BitArray bits = new BitArray(octal
            .Reverse()
            .SelectMany(x =>
            {
                byte value = (byte)(x - '0');
                return new bool[] { (value & 0x01) == 1, (value & 0x02) == 2, (value & 0x04) == 4 };
            })
            .ToArray());
        byte[] bytes = new byte[bits.Length / 8 + 1];
        bits.CopyTo(bytes, 0);
        bytes = bytes.Reverse().SkipWhile(b => b == 0x00).ToArray();
        if (bytes.Length == 0)
            bytes = new byte[] { 0x00 };
        return bytes;
    }
I don’t like using the BitArray intermediary, but I don’t know how to do it without it. If possible, I’d like the whole conversion in a single LINQ statement like the hex and binary.
This is part of a C# console application for computing hashes. Here’s a link to the relevant source file on GitHub.
Solution
Edit: corrected an issue with leading zero bytes in the result.
I’ll focus on the GetOctalBytes method in this review.
Also, for the explanation I assume that the string is processed right-to-left (reverse order) and, similarly, that the resulting byte array starts with the lowest byte at index 0. Those assumptions will be reflected / corrected in the final code.
For octal strings, each character contributes 3 bits, so a group of 8 characters (24 bits) maps to a group of up to 3 result bytes. Each complete group therefore takes exactly 3 bytes. The following table shows the important properties for the different character counts within a group.
    In-group character count   Bit count   Bytes needed   Affected byte indices in group
    1                          3           1              0
    2                          6           1              0
    3                          9           2              0, 1
    4                          12          2              1
    5                          15          2              1
    6                          18          3              1, 2
    7                          21          3              2
    8                          24          3              2
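The table rows can be derived from the bit positions each character occupies. The following is a minimal sketch of that derivation; the class and method names are hypothetical and only serve the illustration.

```csharp
using System;

public static class OctalGroupLayout
{
    // Bits occupied by the first `count` characters of a group (3 bits each).
    public static int BitCount(int count) => count * 3;

    // Bytes needed to hold that many bits, rounded up to whole bytes.
    public static int BytesNeeded(int count) => (BitCount(count) + 7) / 8;

    // In-group byte indices touched by the character at 1-based position `count`.
    public static int[] AffectedBytes(int count)
    {
        int firstBit = (count - 1) * 3;   // lowest bit written by this character
        int lastBit = firstBit + 2;       // highest bit written by this character
        int firstByte = firstBit / 8;
        int lastByte = lastBit / 8;
        return firstByte == lastByte
            ? new[] { firstByte }
            : new[] { firstByte, lastByte };
    }
}
```

Only the third and sixth character of a group straddle a byte boundary, which is why those are the only rows listing two indices.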
The total number of bytes needed can be calculated as (octal.Length * 3 + 7) / 8. However, when the highest (incomplete) group contains 3 or 6 characters, i.e. octal.Length % 8 is 3 or 6, the byte count actually needed can be lower: the leading character is the only one that reaches into the highest byte, and if its value needs only 2 of its bits (3 characters, value below 4) or 1 of its bits (6 characters, value below 2), that byte stays empty. In these cases the number of bytes can be reduced by one. Since a new group starts every 8 characters and each group needs 3 bytes, the starting index of the current byte group in the result array can be calculated as (character_index / 8) * 3. The affected in-group indices are calculated from the remainder:
    remainder = (character_index % 8);
    if (remainder < 3)                   group-index 0 affected
    if (remainder >= 2 && remainder < 6) group-index 1 affected
    if (remainder >= 5)                  group-index 2 affected
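As a sketch of how these rules combine with the group start index, the helper below lists the result indices a given character touches. It keeps the convention of the explanation (character index 0 is the rightmost character, result index 0 is the lowest byte); the names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class OctalIndexSketch
{
    // Returns the result-array indices (lowest byte first) touched by the
    // character at characterIndex, where 0 is the rightmost character.
    public static List<int> AffectedResultIndices(int characterIndex)
    {
        int groupStart = (characterIndex / 8) * 3; // first byte of this group
        int remainder = characterIndex % 8;        // position inside the group

        var indices = new List<int>();
        if (remainder < 3) indices.Add(groupStart);
        if (remainder >= 2 && remainder < 6) indices.Add(groupStart + 1);
        if (remainder >= 5) indices.Add(groupStart + 2);
        return indices;
    }
}
```

For character index 10, for instance, the group starts at byte 3 and the remainder is 2, so bytes 3 and 4 are touched.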
In order to avoid leading zero bytes, the character value has to be considered additionally for the second and third byte.
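One way to look at the byte count is to count only the significant bits of the leading character. The sketch below does that; it is a reformulation for illustration rather than the code proposed in this answer, `RequiredBytes` is a hypothetical name, and it assumes a non-empty string of digits 0 to 7 with leading zeros already removed.

```csharp
using System;

public static class OctalByteCount
{
    // Number of bytes needed for an octal digit string.
    // Assumption: `digits` is non-empty, contains only '0'..'7',
    // and has no leading zeros (except the single string "0").
    public static int RequiredBytes(string digits)
    {
        int bits = digits.Length * 3;

        // The leading digit may need fewer than its 3 bits:
        // 4..7 use all three, 2..3 use two, 0..1 use one.
        int lead = digits[0] - '0';
        int unusedBits = lead >= 4 ? 0 : lead >= 2 ? 1 : 2;

        return Math.Max(1, (bits - unusedBits + 7) / 8);
    }
}
```

For inputs without leading zeros this should give the same count as (octal.Length * 3 + 7) / 8 with the reduction for 3 or 6 in-group characters: "377" needs one byte, while "400" needs two.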
I hope these explanations are enough introduction for the following code suggestion:
Note: I ditched the LINQ solution in favor of a more readable solution that re-creates fewer arrays.
    public static byte[] GetOctalBytes(this string octal, bool preTrimmed = false)
    {
        if (!preTrimmed)
        {
            octal = octal.Trim();
            if (octal.StartsWith("0o", StringComparison.OrdinalIgnoreCase) || octal.StartsWith("8#"))
                octal = octal.Substring(2);
        }
        octal = octal.TrimStart('0');
        if (octal.Length == 0)
            return new byte[] { 0 };

        // 3 bits per character, rounded up to whole bytes.
        var arrayLength = (octal.Length * 3 + 7) / 8;

        // With 3 or 6 characters in the highest group, the leading character
        // may not reach into the highest byte; drop that byte if it stays empty.
        var inGroup = (octal.Length % 8);
        if ((inGroup == 3 && octal[0] < '4') ||
            (inGroup == 6 && octal[0] < '2'))
        {
            --arrayLength;
        }

        var result = new byte[arrayLength];
        for (int i = 0; i < octal.Length; i++)
        {
            // i counts characters from the right; the result is filled from its end.
            var baseIndex = (i / 8) * 3;
            var shift = (i % 8) * 3;
            var valueInGroup = (octal[octal.Length - i - 1] - '0') << shift;

            result[result.Length - baseIndex - 1] |= (byte)(valueInGroup & 0xff);
            // Touch the higher bytes only if the shifted value actually reaches them.
            if (valueInGroup > 0xff)
                result[result.Length - baseIndex - 2] |= (byte)((valueInGroup >> 8) & 0xff);
            if (valueInGroup > 0xffff)
                result[result.Length - baseIndex - 3] |= (byte)((valueInGroup >> 16) & 0xff);
        }
        return result;
    }
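To cross-check results from the method above, an independent reference can help. The sketch below swaps in a different technique: it accumulates the digits into a System.Numerics.BigInteger with a single LINQ Aggregate call and lets the framework produce the bytes. It is not part of the original code; `ReferenceOctalBytes` is a hypothetical name, the input is assumed to be already stripped of its prefix and to contain only digits 0 to 7, and the `ToByteArray(isUnsigned, isBigEndian)` overload assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Linq;
using System.Numerics;

public static class OctalReference
{
    // Reference conversion for comparison purposes only.
    // Assumption: `digits` contains only '0'..'7' (no prefix, no whitespace).
    public static byte[] ReferenceOctalBytes(string digits)
    {
        return digits
            // value = value * 8 + digit, processed left to right
            .Aggregate(BigInteger.Zero, (acc, c) => acc * 8 + (c - '0'))
            // unsigned: no extra sign byte; big-endian: highest byte first
            .ToByteArray(isUnsigned: true, isBigEndian: true);
    }
}
```

With this sketch, "377" yields the single byte FF and "400" yields the two bytes 01 00, which are the values the loop-based method is expected to produce as well. The BigInteger arithmetic allocates more than the loop, so it is better suited to tests than to hot paths.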