Problem
I’d like to make parsing small samples of data more efficient (from the developer’s point of view). Instead of writing the parsing logic from scratch each time I encounter something as simple as Car4 or Ticket#123/22APR, it’d be great to have something that I can reuse.
So I experimented with dynamics and tuples. I didn’t like dynamics because you need to cast every property and you lose compile-time checking. Tuples seemed like a better choice, but they are not perfect (yet): you cannot use reflection to access their element names, which would have made the following code much prettier. Because of that, I had to pass the names separately as params string[].
Example
The idea is to have a general-purpose Parse extension that takes a regex and maps matches to tuple properties based on the group names. The generic arguments specify the variable types, and the names after the pattern specify the order of the properties (the regex groups might be in a different order).
var (none, _) = "".Parse<string, object>(@"(?<Name>(?i:[a-z]+))", "name", "count");
var (name, count) = "John3".Parse<string, int?>(@"(?<Name>(?i:[a-z]+))(?<Count>\d+)?", "name", "count");
none.Dump(); // null
name.Dump(); // John
count.Dump(); // 3
Implementation
The user API is simple: just a couple of Parse extensions. Internally they try to parse the input and then pass the result to the appropriate Deconstructor.
public static class StringExtensions
{
    public static Deconstructor<T1, T2> Parse<T1, T2>(this string input, string pattern, params string[] propertyNames)
    {
        return new Deconstructor<T1, T2>(input.Parse(pattern), propertyNames);
    }
    public static Deconstructor<T1, T2, T3> Parse<T1, T2, T3>(this string input, string pattern, params string[] propertyNames)
    {
        return new Deconstructor<T1, T2, T3>(input.Parse(pattern), propertyNames);
    }
    private static IDictionary<string, string> Parse(this string input, string pattern)
    {
        var match = Regex.Match(input, pattern, RegexOptions.ExplicitCapture);
        return
            match.Success
                ? match
                    .Groups
                    .Cast<Group>()
                    // First group is the entire match. We don't need it.
                    .Skip(1)
                    .Where(g => g.Success)
                    .ToDictionary(
                        g => g.Name,
                        g => string.IsNullOrEmpty(g.Value) ? null : g.Value
                    )
                : new Dictionary<string, string>();
    }
}
Deconstructors are types that consume the dictionary created from the groups and a list of names specifying the order of the properties (they have to match the generic types). They then use the Deconstruct method to create the final tuple. The first Deconstructor also provides a method for converting strings to the target type.
public class Deconstructor<T1, T2> : Dictionary<string, string>
{
    private readonly IList<string> _itemNames;
    public Deconstructor(IDictionary<string, string> data, IList<string> itemNames) : base(data, StringComparer.OrdinalIgnoreCase)
    {
        // Shift items to the right to use indexes that are compatible with items later.
        _itemNames = itemNames.Prepend(null).ToList();
    }
    public void Deconstruct(out T1 item1, out T2 item2)
    {
        Convert<T1>(1, out item1);
        Convert<T2>(2, out item2);
    }
    protected void Convert<T>(int itemIndex, out T result)
    {
        if (this.TryGetValue(_itemNames[itemIndex], out var value))
        {
            if (value is null)
            {
                result = default;
            }
            else
            {
                var isNullable =
                    typeof(T).IsGenericType &&
                    typeof(T).GetGenericTypeDefinition() == typeof(Nullable<>);
                var targetType =
                    isNullable
                        ? typeof(T).GetGenericArguments().Single()
                        : typeof(T);
                result = (T)System.Convert.ChangeType(value, targetType);
            }
        }
        else
        {
            result = default;
        }
    }
}
Each further Deconstructor is based on the one that takes one generic parameter fewer.
public class Deconstructor<T1, T2, T3> : Deconstructor<T1, T2>
{
    public Deconstructor(IDictionary<string, string> data, IList<string> names) : base(data, names) { }
    public void Deconstruct(out T1 item1, out T2 item2, out T3 item3)
    {
        base.Deconstruct(out item1, out item2);
        Convert<T3>(3, out item3);
    }
}
This prototype works quite well but perhaps it can still be made better. What do you think?
Solution
As usual, anything I have to note about your implementation is pretty minor:
- The IList<string> itemNames parameter of the Deconstructor constructor can be IEnumerable<string> itemNames, as it’s not using any IList<string>-specific methods. The .ToList() in the constructor still allows it to be assigned to the private member _itemNames.
- No need to specify the generic parameters in the calls to Convert in the Deconstruct method, as they’ll be handily inferred from the type of the second parameter.
- No need to specify base. in the call to Deconstruct in the three-generic-parameter version of the class, as it’s not overriding the base class version. It’s a different signature due to the generics.
- Can simplify the if..else block in the base Convert method with an OR:
    if (!this.TryGetValue(_itemNames[itemIndex], out var value) || value is null)
    {
        result = default;
    }
    else
    {
        var isNullable =
            typeof(T).IsGenericType &&
            typeof(T).GetGenericTypeDefinition() == typeof(Nullable<>);
        var targetType =
            isNullable
                ? typeof(T).GetGenericArguments().Single()
                : typeof(T);
        result = (T)System.Convert.ChangeType(value, targetType);
    }
- Maybe add a parameter to the Parse extension method to optionally compile and cache a generated Regex in a Dictionary? May not be a factor, but that’s my usual go-to when a Regex shows up.
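The caching idea from the last point could look roughly like the sketch below. RegexCache and GetOrAdd are hypothetical names, not part of the reviewed code, and a ConcurrentDictionary stands in for the plain Dictionary so the cache is safe to call from several threads. Note that the static Regex.Match already keeps a small internal cache of recently used patterns (sized by Regex.CacheSize), so an explicit cache mainly pays off when RegexOptions.Compiled is wanted or many distinct patterns are in play.

```csharp
using System;
using System.Collections.Concurrent;
using System.Text.RegularExpressions;

// Hypothetical helper: caches one compiled Regex per (pattern, options) pair.
public static class RegexCache
{
    private static readonly ConcurrentDictionary<(string Pattern, RegexOptions Options), Regex> Cache =
        new ConcurrentDictionary<(string Pattern, RegexOptions Options), Regex>();

    public static Regex GetOrAdd(string pattern, RegexOptions options = RegexOptions.None)
    {
        // The key includes the options because the same pattern text
        // behaves differently under different options.
        return Cache.GetOrAdd(
            (pattern, options),
            key => new Regex(key.Pattern, key.Options | RegexOptions.Compiled));
    }
}
```

Inside the private Parse method, the call Regex.Match(input, pattern, ...) would then become RegexCache.GetOrAdd(pattern, options).Match(input).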
I’ve been experimenting with other designs and after a couple refactorings I completely rewrote the API (and of course incorporated the suggestions too).
This is how it looks now:
var (success, (name, count)) = "John4".Parse<string, int?>(@"(?<T1Name>(?i:[a-z]+))(?<T2Count>\d+)?");
I removed the list of names. They are now part of the regex itself. Each group name must start with Tx, where x is the ordinal of the corresponding generic T parameter. Additionally, at index 0 there is a flag with the parsing result. This is necessary because you cannot create a TryParse method that returns a named tuple via something like out var (name, count); this does not compile, so as a workaround I had to add the flag to the result. I didn’t want to throw exceptions.
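For comparison, a TryParse shape remains possible if the out parameter is a single tuple that is deconstructed after the call. The following is only a minimal sketch: TryParseSketch and TryParseNameCount are hypothetical names, and the pattern is hard-coded for illustration, so this is not the generic API described here.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical, non-generic illustration of a TryParse that hands back a named tuple.
public static class TryParseSketch
{
    public static bool TryParseNameCount(string input, out (string Name, int? Count) result)
    {
        var match = Regex.Match(
            input ?? string.Empty,
            @"^(?<name>[a-z]+)(?<count>[0-9]+)?$",
            RegexOptions.IgnoreCase);

        if (!match.Success)
        {
            result = default;
            return false;
        }

        // An unmatched optional group has an empty Value, so TryParse fails and Count stays null.
        result = (
            match.Groups["name"].Value,
            int.TryParse(match.Groups["count"].Value, out var n) ? n : (int?)null);
        return true;
    }
}
```

The caller receives the tuple through out var and deconstructs it in a second step, for example: if (TryParseSketch.TryParseNameCount("John4", out var r)) { var (name, count) = r; }. The cost compared with the nested-tuple result is that extra line.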
The names after each Tx are optional, so this call is also valid:
var (success, (name, count)) = "John4".Parse<string, int?>(@"(?<T1>(?i:[a-z]+))(?<T2>\d+)?");
It was also possible to simplify StringExtensions and turn other methods into extensions too. Only the Parse method got more complex because now it has to parse the group names to extract the ordinal number of each group.
public static class StringExtensions
{
    public static Tuple<bool, Tuple<T1, T2>> Parse<T1, T2>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2>();
    }
    public static Tuple<bool, Tuple<T1, T2, T3>> Parse<T1, T2, T3>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2, T3>();
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4>> Parse<T1, T2, T3, T4>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2, T3, T4>();
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5>> Parse<T1, T2, T3, T4, T5>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2, T3, T4, T5>();
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5, T6>> Parse<T1, T2, T3, T4, T5, T6>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2, T3, T4, T5, T6>();
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5, T6, T7>> Parse<T1, T2, T3, T4, T5, T6, T7>(this string input, string pattern, RegexOptions options = RegexOptions.None)
    {
        return input.Parse(pattern, options).Tupleize<T1, T2, T3, T4, T5, T6, T7>();
    }
    private static IDictionary<int, object> Parse(this string input, string pattern, RegexOptions options)
    {
        if (string.IsNullOrEmpty(input)) throw new ArgumentException($"{nameof(input)} must not be null or empty.");
        if (string.IsNullOrEmpty(pattern)) throw new ArgumentException($"{nameof(pattern)} must not be null or empty.");
        var inputMatch = Regex.Match(input, pattern, RegexOptions.ExplicitCapture | options);
        var result =
            inputMatch.Success
                ? inputMatch
                    .Groups
                    .Cast<Group>()
                    // First group is the entire match. We don't need it.
                    .Skip(1)
                    .Where(g => g.Success)
                    .Select(g =>
                    {
                        // Group names look like "T1" or "T2Count"; the digits after 'T' are the ordinal.
                        var ordinal = Regex.Match(g.Name, @"^(?:T(?<ordinal>\d+))").Groups["ordinal"];
                        return
                        (
                            Ordinal:
                                ordinal.Success
                                    ? int.TryParse(ordinal.Value, out var x) && x > 0 ? x : throw new ArgumentException($"Invalid 'Tx'. 'x' must be greater than 0.")
                                    : throw new ArgumentException("Invalid group name. It must start with 'Tx' where 'x' is the ordinal of the T parameter and must be greater than 0."),
                            Value:
                                string.IsNullOrEmpty(g.Value)
                                    ? null
                                    : g.Value
                        );
                    })
                    .ToDictionary(
                        g => g.Ordinal,
                        g => (object)g.Value
                    )
                : new Dictionary<int, object>();
        // Index 0 is reserved for the success flag.
        result[0] = inputMatch.Success;
        return result;
    }
}
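The group-name convention used by Parse above (a leading T, then a positive ordinal, then an optional descriptive name) can also be looked at in isolation. The sketch below assumes a hypothetical GroupNames.TryGetOrdinal helper that reports failure through its return value instead of throwing, which is a deliberate difference from the code above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: extracts the ordinal from group names such as "T1" or "T2Count".
public static class GroupNames
{
    private static readonly Regex OrdinalPattern = new Regex(@"^T(?<ordinal>\d+)");

    public static bool TryGetOrdinal(string groupName, out int ordinal)
    {
        var match = OrdinalPattern.Match(groupName);
        if (match.Success &&
            int.TryParse(match.Groups["ordinal"].Value, out ordinal) &&
            ordinal > 0)
        {
            return true;
        }

        // Either the name does not start with "T<digits>" or the ordinal is not positive.
        ordinal = 0;
        return false;
    }
}
```

With this, "T2Count" yields ordinal 2, while "Name" and "T0" are rejected.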
I removed the Deconstructor and replaced it with the Tupleizer. It now maps dictionaries to tuples and takes care of converting the parsed data.
internal static class Tupleizer
{
    public static Tuple<bool, Tuple<T1, T2>> Tupleize<T1, T2>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2)
                )
            );
    }
    public static Tuple<bool, Tuple<T1, T2, T3>> Tupleize<T1, T2, T3>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2),
                    data.GetItemAt<T3>(3)
                )
            );
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4>> Tupleize<T1, T2, T3, T4>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2),
                    data.GetItemAt<T3>(3),
                    data.GetItemAt<T4>(4)
                )
            );
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5>> Tupleize<T1, T2, T3, T4, T5>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2),
                    data.GetItemAt<T3>(3),
                    data.GetItemAt<T4>(4),
                    data.GetItemAt<T5>(5)
                )
            );
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5, T6>> Tupleize<T1, T2, T3, T4, T5, T6>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2),
                    data.GetItemAt<T3>(3),
                    data.GetItemAt<T4>(4),
                    data.GetItemAt<T5>(5),
                    data.GetItemAt<T6>(6)
                )
            );
    }
    public static Tuple<bool, Tuple<T1, T2, T3, T4, T5, T6, T7>> Tupleize<T1, T2, T3, T4, T5, T6, T7>(this IDictionary<int, object> data)
    {
        return
            Tuple.Create(
                data.GetItemAt<bool>(0),
                Tuple.Create(
                    data.GetItemAt<T1>(1),
                    data.GetItemAt<T2>(2),
                    data.GetItemAt<T3>(3),
                    data.GetItemAt<T4>(4),
                    data.GetItemAt<T5>(5),
                    data.GetItemAt<T6>(6),
                    data.GetItemAt<T7>(7)
                )
            );
    }
    private static T GetItemAt<T>(this IDictionary<int, object> data, int itemIndex)
    {
        if (!data.TryGetValue(itemIndex, out var value) || value is null)
        {
            return default;
        }
        else
        {
            var isNullable =
                typeof(T).IsGenericType &&
                typeof(T).GetGenericTypeDefinition() == typeof(Nullable<>);
            var targetType =
                isNullable
                    ? typeof(T).GetGenericArguments().Single()
                    : typeof(T);
            return (T)System.Convert.ChangeType(value, targetType);
        }
    }
}
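The conversion step in GetItemAt can be written more compactly. The sketch below is an alternative formulation under two assumptions that differ from the code above: it uses Nullable.GetUnderlyingType in place of the IsGenericType check, and it passes CultureInfo.InvariantCulture so that parsing values like "1.5" does not depend on the current culture. StringConversion and ConvertTo are hypothetical names.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: converts a captured string to T, treating null/empty as default(T).
public static class StringConversion
{
    public static T ConvertTo<T>(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            return default(T);
        }

        // For int? this yields int; for non-nullable types it returns null,
        // so we fall back to T itself.
        var targetType = Nullable.GetUnderlyingType(typeof(T)) ?? typeof(T);

        // Unboxing a boxed int to int? is allowed, so the cast works for nullable targets too.
        return (T)Convert.ChangeType(value, targetType, CultureInfo.InvariantCulture);
    }
}
```

For example, ConvertTo<int?>("3") gives 3 and ConvertTo<int?>("") gives null. Values that cannot be converted still throw (for instance a FormatException for ConvertTo<int>("abc")), just as Convert.ChangeType does in the original code.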