# Fraction or percentage regular expression

Problem

I’m learning regular expressions as part of my Java course. I know that sometimes the best use of regex is to not use regex at all, but since I have to use it for this course, I figure I might as well learn it properly.

I am matching `(##/##)` or `(##)` where `#` is a digit `0-9`. Spaces are ignored everywhere except between the two digits of a number `##`. In context, I’m matching a fraction `(10/20)` or a percentage `(50)`.

For example `( ## / ## )` and `( ## )` are valid. `(# #/# #)` and `(# #)` are not.

Code / Explanation:

``````(?:[(])(?:[ ]*)?([\d][\d])(?:[ ]*)?(?:[\/])?(?:[ ]*)?([\d][\d])?(?:[ ]*)?(?:[)])
``````

``````(?:[(])          Opening parenthesis (
(?:[ ]*)?        Any number of spaces
([\d][\d])       Two digits for the first number ##
(?:[ ]*)?        Any number of spaces
(?:[\/])?        Forward slash /
(?:[ ]*)?        Any number of spaces
([\d][\d])?      Two digits for the second number ##
(?:[ ]*)?        Any number of spaces
(?:[)])          Closing parenthesis )
``````

Example usage in Java:

``````// Regex backslashes are doubled because this is a Java string literal.
String regex = "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])";
String test = "(20/50)";

if (test.matches(regex)) { // true
    System.out.println("Valid.");
} else {
    System.out.println("Invalid.");
}
``````

Everything is set up using non-capture groups, except for the two numbers. This is so I can reference the capture groups in my code (and simply check whether group 2 is null before trying to use it, which indicates `(##)` rather than `(##/##)`).
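The group check described above could look roughly like the sketch below. It uses `Pattern` and `Matcher` from `java.util.regex` with the pattern from the question (backslashes doubled for a Java string literal). The class name `GroupCheckDemo` and the helper `describe` are made up for illustration and are not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCheckDemo {
    // The question's pattern, with backslashes doubled for a Java string literal.
    static final Pattern PATTERN = Pattern.compile(
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])");

    // Hypothetical helper: reports which form the input took.
    static String describe(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            return "invalid";
        }
        // group(2) is null when the optional second number did not take part in the match.
        if (m.group(2) == null) {
            return "percentage " + m.group(1);
        }
        return "fraction " + m.group(1) + "/" + m.group(2);
    }

    public static void main(String[] args) {
        System.out.println(describe("(20/50)"));
        System.out.println(describe("( 50 )"));
        System.out.println(describe("(5 0)"));
    }
}
```

Compiling the pattern once and reusing the `Matcher` API is what gives access to the groups; `String.matches` only returns a boolean.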

This is basically my first time writing regex from complete scratch. Some questions are:

• Should I be wrapping everything in `[ ]` even when they can be left out? i.e. `(?: *)` instead of `(?:[ ]*)`.
• Is my use of non-capture groups the right way to do things (feels verbose to me)?

Generally, what can I improve on?

Solution

Your regex accepts `(20/)` as valid input, and I suspect that you didn’t intend to consider it valid.
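This is easy to confirm with a small check, sketched below with the question's pattern (backslashes doubled for a Java string literal). The class name `LooseMatchDemo` and the method `accepts` are hypothetical. Because the slash and the second number are each optional on their own, the same pattern also lets `(2050)` through, where the slash is missing.

```java
class LooseMatchDemo {
    // The question's original pattern, escaped for a Java string literal.
    static final String ORIGINAL =
        "(?:[(])(?:[ ]*)?([\\d][\\d])(?:[ ]*)?(?:[\\/])?(?:[ ]*)?([\\d][\\d])?(?:[ ]*)?(?:[)])";

    // Hypothetical helper: true when the whole input matches the original pattern.
    static boolean accepts(String input) {
        return input.matches(ORIGINAL);
    }

    public static void main(String[] args) {
        // Intended inputs.
        System.out.println("(20/50) -> " + accepts("(20/50)"));
        System.out.println("(50)    -> " + accepts("(50)"));
        // Probably unintended: slash without a second number, and two numbers without a slash.
        System.out.println("(20/)   -> " + accepts("(20/)"));
        System.out.println("(2050)  -> " + accepts("(2050)"));
    }
}
```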

Your regex rejects one-digit numbers. If that is intentional, you might want to write a comment about it in the code.

As you suspected, this is a very “noisy” regex, to the point of being nearly unreadable. This expression would do the job:

``````String regex = "\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)";
``````
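As a sanity check, here is a small harness that runs the suggested expression against the examples from the question. It assumes the pattern is written as a Java string literal, so every regex backslash is doubled. The class name `SimplifiedRegexDemo` and the method `isValid` are hypothetical.

```java
class SimplifiedRegexDemo {
    // Suggested pattern; the slash and the second number are optional only as a unit.
    static final String REGEX = "\\( *(\\d{2}) *(?:/ *(\\d{2}) *)?\\)";

    // Hypothetical helper: true when the whole input matches.
    static boolean isValid(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "(20/50)", "( 20 / 50 )", "(50)", "( 50 )",   // expected to be valid
            "(20/)", "(2050)", "(2 0/5 0)", "(5 0)", "(5)" // expected to be invalid
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```

Grouping the slash together with the second number is what rules out `(20/)` and `(2050)`.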

There is no need to wrap everything in `[ ]`. In this particular problem, there aren’t even any character classes to speak of. The only place where the square brackets might be justified is as a hack to formulate the expression with fewer backslashes:

``````String regex = "[(] *(\\d{2}) *(?:/ *(\\d{2}) *)?[)]";
``````

In `(?:[ ]*)?`, the square brackets are pointless, and the `?` is redundant because `*` already allows zero occurrences. So, you can just write a space followed by `*`.

In `([\d][\d])`, the square brackets are pointless, and I would slightly prefer seeing `(\d{2})` to `(\d\d)`, since it can more easily be modified to accommodate different numbers of digits.
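To illustrate that last point, the sketch below relaxes the digit count to one or two digits by changing only the quantifier. This is an assumed variation for demonstration, not a requirement from the question, and the names `FlexibleDigitsDemo` and `isValid` are hypothetical.

```java
class FlexibleDigitsDemo {
    // Same shape as the two-digit pattern, but {2} is replaced by {1,2}.
    static final String REGEX = "\\( *(\\d{1,2}) *(?:/ *(\\d{1,2}) *)?\\)";

    // Hypothetical helper: true when the whole input matches.
    static boolean isValid(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"(5)", "(5/10)", "(20/50)", "(100)", "(1 0)"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```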

You might want to consider naming the capture groups for clarity:

``````String regex = "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)";
``````
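Here is one way the named groups might be read back, as a rough sketch. The class name `NamedGroupsDemo` and the method `toRatio` are hypothetical, and treating `(##)` as a percentage out of 100 is an assumption based on the question's description.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsDemo {
    static final Pattern PATTERN = Pattern.compile(
        "\\( *(?<numerator>\\d{2}) *(?:/ *(?<denominator>\\d{2}) *)?\\)");

    // Hypothetical helper: converts "(10/20)" or "(50)" to a ratio.
    static double toRatio(String input) {
        Matcher m = PATTERN.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a fraction or percentage: " + input);
        }
        int numerator = Integer.parseInt(m.group("numerator"));
        // The named group is null when the optional "/##" part is absent.
        String denominatorText = m.group("denominator");
        if (denominatorText == null) {
            return numerator / 100.0; // assumption: (##) means a percentage
        }
        // A denominator of 00 is not handled in this sketch (it yields Infinity or NaN).
        return numerator / (double) Integer.parseInt(denominatorText);
    }

    public static void main(String[] args) {
        System.out.println(toRatio("(10/20)"));
        System.out.println(toRatio("( 50 )"));
    }
}
```

Reading `m.group("denominator")` says more at the call site than `m.group(2)`, and it keeps working if another capture group is later added in front of it.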