Fraction or percentage regular expression

Problem

I’m learning regular expressions as part of my Java course. I know that sometimes the best use of regex is to not use regex at all, but since I have to use it for this course, I figure I might as well learn it properly.

I am matching (##/##) or (##) where # is a number 0-9. All white space is ignored (except for in between the digits ##). In context, I’m matching a fraction (10/20) or a percentage (50).

For example ( ## / ## ) and ( ## ) are valid. (# #/# #) and (# #) are not.

Code / Explanation:

(?:[(])(?:[ ]*)?([\d][\d])(?:[ ]*)?(?:[\/])?(?:[ ]*)?([\d][\d])?(?:[ ]*)?(?:[)])

(?:[(])          Opening parenthesis (
(?:[ ]*)?        All white space
([\d][\d])       Two digits for the first number ##
(?:[ ]*)?        All white space
(?:[\/])?        Forward slash /
(?:[ ]*)?        All white space
([\d][\d])?      Two digits for the second number ##
(?:[ ]*)?        All white space
(?:[)])          Closing parenthesis )

Example usage in Java:

// Backslashes are doubled because this is a Java string literal.
String regex = "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])";
String test = "(20/50)";

if (test.matches(regex)) { // true
    System.out.println("Valid.");
} else {
    System.out.println("Invalid.");
}

Everything is set up using non-capturing groups, except for the two pairs of digits. This lets me reference the capture groups in my code (and simply check whether group 2 is null before trying to use it, which indicates (##) rather than (##/##)).
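
To make the group-2 check concrete, here is a minimal sketch of how the capture groups could be read with java.util.regex.Matcher. The class name GroupCheckDemo and the helper describe() are made up for this illustration and are not part of the course code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // The regex from the question, with backslashes doubled for a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])");

    // describe() is a hypothetical helper used only for this illustration.
    static String describe(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return "invalid";
        }
        // group(2) is null when the optional second number took no part in the match.
        if (m.group(2) == null) {
            return "percentage " + m.group(1);
        }
        return "fraction " + m.group(1) + "/" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("(20/50)")); // fraction 20/50
        System.out.println(describe("( 50 )"));  // percentage 50
        System.out.println(describe("(5 0)"));   // invalid
    }
}
```

Only the two digit groups are numbered, so group(1) and group(2) line up with the first and second number regardless of how many non-capturing groups surround them.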

This is basically my first time writing regex from complete scratch. Some questions are:

  • Should I be wrapping everything in [ ] even when they can be left out? i.e. (?: *) instead of (?:[ ]*).
  • Is my use of non-capture groups the right way to do things (feels verbose to me)?

Generally, what can I improve on?

Solution

Your regex accepts (20/) as valid input, and I suspect that you didn’t intend to consider it valid.

Your regex rejects one-digit numbers. If that is intentional, you might want to write a comment about it in the code.
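
Both observations are easy to confirm with a small harness. The sketch below runs the original expression against a few inputs; the class name OriginalRegexEdgeCases and the helper accepts() are invented for the example. It also shows a related case: because the slash and the second number are independently optional, (2050) is accepted as well.

```java
import java.util.regex.Pattern;

class OriginalRegexEdgeCases {
    // The original regex, with backslashes doubled for a Java string literal.
    static final Pattern ORIGINAL = Pattern.compile(
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])");

    // accepts() is a hypothetical helper: true if the whole input matches.
    static boolean accepts(String input) {
        return ORIGINAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"(20/50)", "(20/)", "(2050)", "(5)", "(5/10)"};
        for (String s : inputs) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```

The inputs (20/) and (2050) come out as accepted, while (5) and (5/10) are rejected.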


As you suspected, this is a very “noisy” regex — nearly unreadable. This expression would do the job:

String regex = "\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)";
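
As a rough check of the simplified expression, the following sketch tries it on the valid and invalid shapes from the question. The names SimplifiedRegexDemo and isValid() are assumptions for the example, and the backslashes are doubled as a Java string literal requires.

```java
import java.util.regex.Pattern;

class SimplifiedRegexDemo {
    // Slash and second number sit in ONE optional group, so they appear together or not at all.
    static final Pattern SIMPLE = Pattern.compile("\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)");

    // isValid() is a hypothetical helper: true if the whole input matches.
    static boolean isValid(String input) {
        return SIMPLE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"(20/50)", "( 20 / 50 )", "(50)", "(20/)", "(2 0)"};
        for (String s : inputs) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Grouping the slash with the second number is what makes (20/) fail here: once the slash is consumed, two digits must follow.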

There is no need to wrap everything in [ ]. In this particular problem, there aren’t even any character classes to speak of. The only place where the square brackets might be justified is as a hack to formulate the expression with fewer backslashes:

String regex = "[(] *(\\d{2}) *(?:/ *(\\d{2}) *)?[)]";

In (?:[ ]*)?, the square brackets are pointless. The ? is redundant with *. So, you can just write a space followed by *.

In ([\d][\d]), the square brackets are pointless, and I would slightly prefer seeing (\d{2}) to (\d\d), since it can more easily be modified to accommodate different numbers of digits.
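
To illustrate why the counted form is easier to adjust, here is a sketch that builds the same pattern for a configurable digit range. Accepting anything other than exactly two digits is an assumption made for the example, not a requirement from the question, and the names FlexibleDigitsDemo and forDigits() are hypothetical.

```java
import java.util.regex.Pattern;

class FlexibleDigitsDemo {
    // Builds the pattern with \d{min,max} in place of \d{2}; only the quantifier changes.
    static Pattern forDigits(int min, int max) {
        String number = "(\\d{" + min + "," + max + "})";
        return Pattern.compile("\\( *" + number + " *(?:/ *" + number + " *)?\\)");
    }

    public static void main(String[] args) {
        Pattern oneToThree = forDigits(1, 3); // assumed range, for illustration only
        Pattern exactlyTwo = forDigits(2, 2); // behaves like \d{2}

        System.out.println(oneToThree.matcher("(5/100)").matches());
        System.out.println(oneToThree.matcher("(1234)").matches());
        System.out.println(exactlyTwo.matcher("(5)").matches());
    }
}
```

With (\d\d) the same change would mean rewriting the group by hand for every digit count.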

You might want to consider naming the capture groups for clarity:

String regex = "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)";
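
A short sketch of how the named groups might be read back with Matcher.group(String). Treating the single-number form as a percentage out of 100 is an assumption based on the question's description, and the names NamedGroupDemo and toRatio() are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupDemo {
    static final Pattern PATTERN = Pattern.compile(
        "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)");

    // toRatio() is a hypothetical helper: (##/##) becomes n/d, (##) becomes n/100 (assumed).
    static double toRatio(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not (##) or (##/##): " + input);
        }
        int numerator = Integer.parseInt(m.group("numerator"));
        // A named group that did not take part in the match returns null, like a numbered one.
        String denominatorText = m.group("denominator");
        int denominator = (denominatorText == null) ? 100 : Integer.parseInt(denominatorText);
        // Note: a denominator of 00 is not rejected in this sketch.
        return (double) numerator / denominator;
    }

    public static void main(String[] args) {
        System.out.println(toRatio("(20/50)"));
        System.out.println(toRatio("( 50 )"));
    }
}
```

Compared with group(1) and group(2), the names keep working if another capture group is later added in front of them.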
