Problem
I’m writing a Ruby script to reverse-engineer message log files. They come from an external system with the following characteristics:
- Each line of the log file has at least one message.
- Each line of the log file can have multiple messages.
- Each message consists of a set of numbers separated by spaces (e.g. 30 0 -1 1 2 1).
- Each message can have one of many different templates (e.g. some contain five numbers, others contain six).
The approach I’m using is to process each line, one at a time, via a method that takes the string to work on as an argument. It saves a copy of the initial input (for later comparison), then tries to match known patterns. When a pattern matches, the matched text is removed from the string. If there is nothing left, or if no more matches are found, the method exits. Otherwise, it calls itself with the remainder of the string. Here’s the code I came up with, along with an example.
#!/usr/bin/env ruby
def parse_line remainder_of_line
  puts "Processing: #{remainder_of_line}"
  # Save a copy of the initial input for later comparison
  initial_snapshot = remainder_of_line.dup
  # Look for known pattern matches, removing them if found
  if remainder_of_line.gsub!(/^(\d+) 0 -1 1 (\d+) \d+\s*/, '')
    puts " - Matched format 1 - found: #{$1} - #{$2}\n\n"
  elsif remainder_of_line.gsub!(/^\d+ 0 -1 2 (\d+) \d+\s*/, '')
    puts " - Matched format 2 - found: #{$1}\n\n"
  ### More patterns here.
  end
  # If nothing changed, then no matches were found.
  if initial_snapshot.eql? remainder_of_line
    puts " - Line still has data but no matches found. (Left with: #{remainder_of_line})\n\n"
  # Keep going if there is anything left.
  elsif !remainder_of_line.empty?
    parse_line remainder_of_line
  end
end
line = "11 0 -1 2 13560 2 11 0 -1 2 13564 2 11 0 -1 1 36880 106 91 0 -1 1 36881 106 36881 106 91 1 13556 2 36880 106 36880 106 11 1 734 11 0 -1 1 36884 106 91 0 -1 1 36885 106 36885 106 91 1 13556 2 36884 106 36884 106 11 1 735 13556 2 31 18 799 13556 2 31 25 799 "
parse_line line
This works but I’m wondering if there is a better way.
Solution
- Because you’re using the “bang” version of gsub, parse_line modifies the string you pass to it, which is generally not a good idea. I wouldn’t expect a parsing method to “eat” my input.
- Since there’s only one line and your regexes are anchored to the start of it, there’s little point in using gsub (i.e. global substitution), since you’ll only ever match one occurrence of the pattern.
- Don’t bother with all the newline literals. puts will automatically add one, and if you want an extra one, you should be able to just say puts with no argument in a strategic location (i.e. after having tried all the patterns).
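To make the first two points concrete, here is a minimal sketch of how the bang and non-bang substitution methods behave. The sample string is shortened from the question's example line, and the variable names are only illustrative.

```ruby
# Sample input, shortened from the question's example line.
original = "11 0 -1 2 13560 2 11 0 -1 1 36880 106 "
pattern  = /^\d+ 0 -1 2 (\d+) \d+\s*/

# gsub! edits its receiver in place, so whoever owns the string sees the change.
eaten = original.dup
eaten.gsub!(pattern, '')
puts eaten.inspect     # "11 0 -1 1 36880 106 "

# sub returns a new string and leaves the receiver untouched. Because the
# pattern is anchored, it removes exactly the same text as gsub would.
rest = original.sub(pattern, '')
puts original.inspect  # "11 0 -1 2 13560 2 11 0 -1 1 36880 106 "
puts rest.inspect      # "11 0 -1 1 36880 106 "

# The bang variants return nil when nothing was replaced.
no_match = "no match".dup.sub!(pattern, '')
puts no_match.inspect  # nil
```

The nil return value is what lets the original if/elsif chain work, but it comes at the cost of mutating the caller's string.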
This seems like a good fit for Ruby’s case statement (aka switch), since you can match against regexes directly. Ruby also sets other magic variables besides $1 and $2 whenever you match a regex; $', for example, holds the part of the string after the match. There’s no reason to make the method recursive, though. A simple loop would do nicely.
For instance:
def parse_line(line)
  puts "Processing: #{line}"
  # Loop until the string's empty (or we hit the return below)
  until line.empty?
    # Try matching the line
    case line
    when /^(\d+) 0 -1 1 (\d+) \d+\s*/
      puts " - Matched format 1 - found: #{$1} - #{$2}"
    when /^\d+ 0 -1 2 (\d+) \d+\s*/
      puts " - Matched format 2 - found: #{$1}"
    # more patterns...
    else # no match
      puts " - Line still has data but no matches found. (Left with: #{line})"
      return # stop here
    end
    line = $' # set line to the *unmatched* part, i.e. the remainder
    puts # output an extra blank line
  end
  puts "Entire line matched, yay"
end
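If you'd rather collect the messages than print them, the same loop could be driven by a table of patterns. This is only a sketch of one possible variation, not part of the code above: the names parse_messages and patterns are made up for illustration, it uses MatchData#post_match in place of the $' global, and it anchors with \A rather than ^, on the assumption that each input is a single line.

```ruby
# Hypothetical pattern table: label => regex. Only the two formats shown
# in the question are listed; real logs would need more entries.
patterns = {
  format_1: /\A(\d+) 0 -1 1 (\d+) \d+\s*/,
  format_2: /\A\d+ 0 -1 2 (\d+) \d+\s*/
}

# Returns [messages, leftover]. messages is an array of [label, captures];
# leftover is whatever could not be matched (empty if the whole line parsed).
def parse_messages(line, patterns)
  messages = []
  rest = line
  until rest.empty?
    label = nil
    match = nil
    # Try each known format against the start of what remains.
    patterns.each do |name, regex|
      match = regex.match(rest)
      if match
        label = name
        break
      end
    end
    break unless match            # nothing matched: stop and report leftover
    messages << [label, match.captures]
    rest = match.post_match       # continue with the unmatched remainder
  end
  [messages, rest]
end

messages, leftover = parse_messages("11 0 -1 2 13560 2 11 0 -1 1 36880 106 91 1 13556 2 ", patterns)
p messages  # [[:format_2, ["13560"]], [:format_1, ["11", "36880"]]]
p leftover  # "91 1 13556 2 "
```

Returning the leftover text instead of printing it leaves it up to the caller to decide how to report unknown formats, and the input string is never modified.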