Problem
So I’ve been trying to check through a string, looping for, [
(start) and ]
(end).
In this, I also have to check for nested brackets.
My expected output, given input [depth1[depth2]]
is as follows:
depth2
depth1[depth2]
To achieve this, I wrote this little snippet of code, which uses a Dictionary<int, int>
to store the initial bracket’s position at a given depth. If it finds another opening bracket, the depth will increase (and the position will be recorded). It will look for the next ending bracket (with respect to any other opening brackets), and get the initial position of the bracket for that depth. (Thus we get both the beginning and ending of that nested string)
var depthItems = new Dictionary<int, int>();
int depth = 0;
string input = "[depth1[depth2]]";
for (int i = 0; i < input.Length; i++) {
if (input[i] == '[') {
depth++;
depthItems.Add(depth, i);
} else if (input[i] == ']') {
int item = depthItems[depth];
depthItems.Remove(depth);
depth--;
Console.Write($"Found item at ({item}, {i}) -> ");
Console.WriteLine(input.Substring(item + 1, i - item - 1));
}
}
I’m not sure how viable this would be when looking up against large strings with lots of nested brackets – I think it’s a start, at least.
I will note that I was first surprised that higher “depths” were returned first, but then realised that in order for the lower ones to complete, the higher ones should have been returned first. Not a bad thing, but curious to note.
So how could this be improved, especially in terms of memory (i.e. large strings)?
Solution
I’m not sure how viable this would be when looking up against large strings with lots of nested brackets […]
“how viable” is a very vague question.
Two common performance metrics are CPU load and memory load,
perhaps that’s what you were trying to get at.
It’s good to clarify what exactly you want to know.
In terms of CPU, how will this program behave as the input grows?
- The algorithm does a single pass over all characters of the input.
When the input doubles, the number of operations doubles, roughly.- => In other words, it’s linear.
- Can it be any better than linear?
- => No, because we cannot know the positions of the brackets without looking through each character once.
In terms of memory, how will this program behave as the input grows?
- Aside from the memory used by the input string,
what consumes memory in this program?- => The dictionary storing the starting positions of brackets.
- How do brackets on the same nesting level impact the memory consumption?
That is, in a string like[][][][]
?- => Since the storage for a
[
is removed when the corresponding]
is found, brackets at the same nesting level use a single entry in the dictionary.
- => Since the storage for a
- How do deeply nested brackets impact the memory consumption?
- => The dictionary has as many entries as the deepest nesting level. As closing brackets are found, entries get deleted. The maximum memory consumed by the dictionary throughout the program is proportional to the maximum depth of nesting encountered.
So how could this be improved, especially in terms of memory (i.e. large strings)?
The current performance of the program is on the expected order of complexity, so it’s fine.
However, a dictionary is not the most natural choice for this purpose,
and its actually overkill.
A simpler data structure, a stack, would have been enough to solve this.
Instead of storing (depth, index)
pairs in a dictionary,
you could store just index
in a stack:
- When you see a
[
, push theindex
onto the stack. - When you see a
]
, pop the last pushedindex
from the stack.
The result will be equivalent, the depth
variable becomes unnecessary.
When you have a starting bracket without a closing bracket, your program ignores the problem entirely.
Now ask yourself this, what happens when you get a closing bracket without an opening bracket?
The answer is a negative depth and an exception from the dictionary for trying to remove something that doesn’t exist… oops.
The easiest way to solve that is to check that the depth is greater than 0 and handle the error before trying to index or remove an item or decrease the depth.
In terms of improvements, I’d suggest turning your algorithm into a proper function and not mixing console code with it. Doing so would make this behaviour reusable instead of having the algorithm tightly coupled to the output method.
Additionally, it would be best to provide a list of pairs of positions (a start position and an end position), and as an added bonus you could then turn this into a generator function (i.e. it returns IEnumerable<BracketPair>
).
I apologise in advance for using some ‘outdated’ methods like string.Format
instead of $""
, I haven’t caught up with the techniques introduced after C# 4.5.
// Using a struct because this is a small and cheap immutable object
public struct BracketPair
{
private int startIndex;
private int endIndex;
private int depth;
public BracketPair(int startIndex, int endIndex, int depth)
{
if (startIndex > endIndex)
throw new ArgumentException("startIndex must be less than endIndex");
this.startIndex = startIndex;
this.endIndex = endIndex;
this.depth = depth;
}
public int StartIndex
{
get { return this.startIndex; }
}
public int EndIndex
{
get { return this.endIndex; }
}
public int Depth
{
get { return this.depth; }
}
}
public static class TextHelper
{
public static IEnumerable<BracketPair> ParseBracketPairs(string text)
{
var startPositions = new Stack<int>();
for (int index = 0; index < text.Length; index++)
if (text[index] == '[')
{
startPositions.Push(index);
}
else if (text[index] == ']')
{
if (startPositions.Count == 0)
throw new ArgumentException(string.Format("mismatched end bracket at index {0}", index));
var depth = startPositions.Count - 1;
var start = startPositions.Pop();
yield return new BracketPair(start, index, depth);
}
if (startPositions.Count > 0)
throw new ArgumentException(string.Format("mismatched start brackets, {0} total", startPositions.Count));
}
// You can even go one step further and handle TextReaders
// Remember you need using System.IO
public static IEnumerable<BracketPair> ParseBracketPairs(TextReader reader)
{
var startPositions = new Stack<int>();
for (int index = 0; reader.Peek() != -1; ++index)
{
// Detect overflow
if (index < 0)
throw new ArgumentException(string.Format("input text too long, must be shorter than {0} characters", int.MaxValue));
var c = (char)reader.Read();
if (c == '[')
{
startPositions.Push(index);
}
else if (c == ']')
{
// Error on mismatch
if (startPositions.Count == 0)
throw new ArgumentException(string.Format("mismatched end bracket at index {0}", index));
// Depth tends to be zero-based
var depth = startPositions.Count - 1;
var start = startPositions.Pop();
yield return new BracketPair(start, index, depth);
}
}
// Error on mismatch
if (startPositions.Count > 0)
throw new ArgumentException(string.Format("mismatched start brackets, {0} total", startPositions.Count));
}
}
After which using the following:
static void Main(string[] args)
{
for (string input = Console.ReadLine(); !string.IsNullOrWhiteSpace(input); input = Console.ReadLine())
{
foreach (var pairs in TextHelper.ParseBracketPairs(input))
Console.WriteLine("Start: {0}, End: {1}, Depth: {2}", pairs.StartIndex, pairs.EndIndex, pairs.Depth);
}
}
For the input [a][b[c[d]e]][f]
you get:
Start: 0, End: 2, Depth: 0
Start: 7, End: 9, Depth: 2
Start: 5, End: 11, Depth: 1
Start: 3, End: 12, Depth: 0
Start: 13, End: 15, Depth: 0
By providing an IEnumerable<BracketPair>
you get access to all the useful extension methods from LINQ which will allow you to process your bracket pairs however you see fit.