Problem
I’m learning regular expressions as part of my Java course. I know that sometimes the best use of regex is to not use regex at all. But since I have to use it for this course, I figure I might as well learn it properly.
I am matching (##/##) or (##), where # is a digit 0-9. All whitespace is ignored, except between the two digits of ##. In context, I’m matching a fraction (10/20) or a percentage (50).
For example, ( ## / ## ) and ( ## ) are valid; (# #/# #) and (# #) are not.
Code / Explanation:
(?:[(])(?:[ ]*)?([\d][\d])(?:[ ]*)?(?:[/])?(?:[ ]*)?([\d][\d])?(?:[ ]*)?(?:[)])
(?:[(]) Opening parenthesis (
(?:[ ]*)? Any number of spaces
([\d][\d]) Two digits for the first number ##
(?:[ ]*)? Any number of spaces
(?:[/])? Optional forward slash /
(?:[ ]*)? Any number of spaces
([\d][\d])? Optional two digits for the second number ##
(?:[ ]*)? Any number of spaces
(?:[)]) Closing parenthesis )
Example usage in Java:
String regex = "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])";
String test = "(20/50)";
if (test.matches(regex)) { // true
    System.out.println("Valid.");
} else {
    System.out.println("Invalid.");
}
Everything is set up using non-capturing groups, except for the two numbers. This is so I can reference the capture groups in my code (and simply check whether group 2 is null before trying to use it, which indicates (##) rather than (##/##)).
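A minimal sketch of how the two capture groups might be consumed with java.util.regex, assuming the question's regex with each backslash doubled for a Java string literal. The class name GroupCheckDemo and the method describe are hypothetical, chosen only for illustration; the key point is that Matcher.group(2) returns null when the optional second group did not take part in the match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: reads the question's two capture groups.
class GroupCheckDemo {
    // The question's regex, with backslashes doubled for a Java string literal.
    static final Pattern FRACTION_OR_PERCENT = Pattern.compile(
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])");

    static String describe(String input) {
        Matcher m = FRACTION_OR_PERCENT.matcher(input);
        if (!m.matches()) {
            return "invalid";
        }
        // group(2) is null when the optional second number did not participate.
        if (m.group(2) == null) {
            return "percentage " + m.group(1);
        }
        return "fraction " + m.group(1) + "/" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("(20/50)")); // fraction 20/50
        System.out.println(describe("( 50 )"));  // percentage 50
        System.out.println(describe("(5 0)"));   // invalid
    }
}
```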
This is basically my first time writing a regex from scratch. Some questions:
- Should I be wrapping everything in [ ] even when it can be left out, i.e. (?: *) instead of (?:[ ]*)?
- Is my use of non-capturing groups the right way to do things (it feels verbose to me)?
Generally, what can I improve on?
Solution
Your regex accepts (20/) as valid input, and I suspect that you didn’t intend to consider it valid.
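A quick way to confirm this, assuming the question's regex with backslashes doubled for a Java string literal (the class LooseMatchDemo is hypothetical). Because the slash and the second number are each optional independently of the other, the same regex also appears to accept input with no slash at all, such as (2050).

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows inputs the question's regex lets through.
class LooseMatchDemo {
    // The question's regex, with backslashes doubled for a Java string literal.
    static final String QUESTION_REGEX =
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])";

    static boolean accepts(String input) {
        return Pattern.matches(QUESTION_REGEX, input);
    }

    public static void main(String[] args) {
        // Slash present but no second number.
        System.out.println(accepts("(20/)"));  // true
        // Second number present but no slash.
        System.out.println(accepts("(2050)")); // true
    }
}
```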
Your regex rejects one-digit numbers. If that is intentional, you might want to write a comment about it in the code.
As you suspected, this is a very “noisy” regex, nearly unreadable. This expression would do the job:
String regex = "\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)";
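A small sketch that runs the simplified expression against the examples from the question, plus (20/). The class SimplifiedRegexDemo and the method isValid are hypothetical names for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: checks the simplified expression on the question's examples.
class SimplifiedRegexDemo {
    // The suggested expression, written as a Java string literal.
    static final Pattern SIMPLE = Pattern.compile("\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)");

    static boolean isValid(String input) {
        return SIMPLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"( 10 / 20 )", "( 50 )", "(1 0/2 0)", "(5 0)", "(20/)"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

The first two inputs should print true and the remaining three false: the slash and the second number now live in one optional group, so neither can appear without the other.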
There is no need to wrap everything in [ ]. In this particular problem, there aren’t even any character classes to speak of. The only place where the square brackets might be justified is as a hack to write the expression with fewer backslashes:
String regex = "[(] *(\\d{2}) *(?:/ *(\\d{2}) *)?[)]";
In (?:[ ]*)?, the square brackets are pointless, and the ? is redundant with *. So you can just write a space followed by *.
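A tiny sanity check of that equivalence, assuming full-string matching with Matcher.matches(); the class SpaceQuantifierDemo and its sample inputs are hypothetical. Both patterns accept exactly zero or more spaces, so they should agree on every input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: compares the verbose and plain space patterns.
class SpaceQuantifierDemo {
    static final Pattern VERBOSE = Pattern.compile("(?:[ ]*)?");
    static final Pattern PLAIN = Pattern.compile(" *");

    // True when both patterns give the same full-match result for the input.
    static boolean sameResult(String input) {
        return VERBOSE.matcher(input).matches() == PLAIN.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"", " ", "   ", "x", " x "};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" -> " + sameResult(s));
        }
    }
}
```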
In ([\d][\d]), the square brackets are pointless, and I would slightly prefer seeing (\d{2}) to (\d\d), since it can more easily be modified to accommodate different numbers of digits.
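For instance, if one-digit numbers should be accepted after all, the quantifier form makes that a small change. This sketch assumes {1,2} is the desired range, which the question does not specify; the class DigitCountDemo is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: same shape as the suggested expression,
// but each number may have one or two digits (an assumed requirement).
class DigitCountDemo {
    static final Pattern ONE_OR_TWO =
        Pattern.compile("\\( *(\\d{1,2}) *(?:/ *(\\d{1,2}) *)?\\)");

    static boolean isValid(String input) {
        return ONE_OR_TWO.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("(5/10)")); // true
        System.out.println(isValid("( 7 )"));  // true
        System.out.println(isValid("(100)"));  // false: three digits
    }
}
```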
You might want to consider naming the capture groups for clarity:
String regex = "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)";
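With named groups, the null check from the question reads more clearly. A minimal sketch, assuming Matcher.group(String), which returns null for a named group that did not participate; the class NamedGroupDemo, the method describe, and its output strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: reads the numerator and denominator by name.
class NamedGroupDemo {
    static final Pattern FRACTION = Pattern.compile(
        "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)");

    static String describe(String input) {
        Matcher m = FRACTION.matcher(input);
        if (!m.matches()) {
            return "invalid";
        }
        // null when the optional "/ ##" part is absent.
        String denominator = m.group("denominator");
        if (denominator == null) {
            return "percentage " + m.group("numerator");
        }
        return "fraction " + m.group("numerator") + "/" + denominator;
    }

    public static void main(String[] args) {
        System.out.println(describe("( 10 / 20 )")); // fraction 10/20
        System.out.println(describe("(50)"));        // percentage 50
        System.out.println(describe("(20/)"));       // invalid
    }
}
```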