Name/word generator using DTMC in Ruby


I just wrote a basic discrete-time Markov chain (DTMC) algorithm focused on generating names, though it could be used to generate lots of other things. It’s designed to be run from the command line: input is taken from a file, and a couple of options are specified on the command line.

I’m specifically looking for these things, though any other advice is also of course welcome:

  • Ways to make it faster or more memory-efficient
  • Ways to make it more idiomatic for Ruby

I’m perfectly fine with changing the input format, though I’d prefer it to stay the same. I’m also fine with making the code a bit less readable if that makes it more efficient or shorter, as long as it’s still reasonably understandable and easy to maintain.

This is my full code (version with -h help available here):


# End with the standard error message format -- makes it easy to stay consistent
def error(code, message)
  puts "Error #{code}: #{message}"
  puts 'Run this script with -e to see a list of error codes.'
  # abort writes the message to stderr and exits with a non-zero status
  abort "name_gen.rb: Error #{code}: #{message}"
end

# Keys are candidates, values are probabilities (integer weights)
def weighted_random_choice(picking_from)
  current = 0
  max = picking_from.values.inject :+
  r_val = rand max
  picking_from.each { |candidate, probability|
    current += probability
    return candidate if r_val < current
  }
  # Only reached if no candidate was picked, e.g. when the weights are inconsistent
  raise "r_val>max? #{r_val>max}. Error while picking a weighted random value from #{picking_from}"
end

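# Parsing the command-line arguments and suchlike
# The last argument may be -d<separator>; the one before it may be the name count
syllable_separator = (/-d./ === ARGV[-1]) ? ARGV.pop[2..-1] : ''
name_count = (/\d+/ === ARGV[-1]) ? ARGV.pop.to_i : DEFAULT_NAME_COUNT
# Whatever is left is the file name (which may contain spaces)
file = ARGV.join ' '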

# Parsing the file
syllables = {}         # syllable => { next syllable (or false for "end") => weight }
start = Hash.new(0)    # syllable => weight of starting a name with it
begin
  IO.foreach(file) { |line|
    name, start_prob, end_prob, links = line.split LINE_PART_DELIMITER
    error 2, "`#{line}`" if links.nil? || end_prob.nil? || start_prob.nil? || name.nil?
    links = links.split(LINK_DELIMITER).each_with_object(Hash.new(0)) { |current_pair, memo|
      syl, prob = current_pair.split LINK_HALF_MARK
      (prob = Integer prob) rescue error 3, "`#{name}`: `#{syl}`, `#{prob}`"
      memo[syl.to_sym] += prob
    }
    links[false] = end_prob.to_i
    syllables[name.to_sym] = links
    start[name.to_sym] += start_prob.to_i
  }
# StandardError rather than Exception, so the SystemExit raised by `error` above is not swallowed
rescue StandardError => message
  puts message
  error 1, "`#{file}`"
end

# Validate that all syllables referenced actually exist!
syllables.each { |syllable, links|
  links.each { |link, _|
    # The false key marks the end of a name, so it is skipped
    error(4, "`#{link}` in `#{syllable}`") if link && syllables[link.to_sym].nil?
  }
}

# Generating and printing the names
name_count.times {
  current_syllable = weighted_random_choice start
  name_so_far = [current_syllable.to_s]
  # Walk the chain until the false ("end of name") key is picked
  while (current_syllable = weighted_random_choice syllables[current_syllable.to_sym])
    name_so_far << current_syllable.to_s
  end
  puts name_so_far.join syllable_separator
}

And this is an example ‘dictionary’ file:


And here’s a sample of ten names it can generate (with the above dictionary file):



I don’t like how you’ve mixed the concerns of calculation and output. I know that right now you only want to output to a file, but what if you decide later that you want to work with this data in some other program? Writing to the file system is expensive and slow. Why write to a file and then read it back in?

I would split this into two parts: one class that generates the names from the dictionary, and one that uses that class to write the output to a file. This leaves things open to writing the output to standard output, to some other UI, or to letting another program simply work with the data. The idea is that each class should do one thing and do it well.
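As a rough illustration of that split, here is a minimal sketch rather than a rewrite of the original script. The class name NameGenerator, its constructor arguments, and the tiny two-syllable dictionary are all made up for the example. Generation returns plain data and printing is left to the caller:

```ruby
# Hypothetical sketch: generation is kept separate from output.
class NameGenerator
  # start:     { syllable => weight } for choosing the first syllable
  # syllables: { syllable => { next_syllable_or_false => weight } }
  #            (a false key means "end the name here", as in the original)
  def initialize(start, syllables, rng: Random.new)
    @start = start
    @syllables = syllables
    @rng = rng
  end

  # Returns an array of syllable strings; nothing is printed here.
  def generate
    current = weighted_choice(@start)
    name = [current.to_s]
    while (current = weighted_choice(@syllables[current]))
      name << current.to_s
    end
    name
  end

  private

  # Picks a key with probability proportional to its integer weight.
  def weighted_choice(weights)
    target = @rng.rand(weights.values.sum)
    running = 0
    weights.each do |candidate, weight|
      running += weight
      return candidate if target < running
    end
    raise ArgumentError, "no candidate could be picked from #{weights}"
  end
end

# Output is a separate concern: the caller decides where the names go.
start = { ka: 1 }
syllables = { ka: { ri: 1 }, ri: { false => 1 } }
generator = NameGenerator.new(start, syllables)
puts generator.generate.join('-') # this dictionary has only one path: "ka-ri"
```

Because generate only returns an array, the same object could feed a file writer, a test, or another program without any change to the generation logic.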

On this note, each of your comments indicates a missed opportunity to extract a well-named method that does one and only one thing.

#Parsing the commandline arguments and suchlike
syllable_separator = (/-d./ === ARGV[-1]) ? ARGV.pop[2..-1] : ''
name_count = (/\d+/ === ARGV[-1]) ? ARGV.pop.to_i : DEFAULT_NAME_COUNT
file = ARGV.join ' '

This is just begging to be a method called parse_cmd_args. It should return some object that represents those three values in a sensible, well-named way.
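One possible shape for such a method is sketched below, under some assumptions. The Options struct and its field names are invented for illustration, and the default count of 10 is a placeholder because the value of DEFAULT_NAME_COUNT is not shown in the question. The patterns are also anchored a little more strictly than in the original.

```ruby
# Hypothetical container for the three parsed values.
Options = Struct.new(:file, :name_count, :syllable_separator)

# Consumes arguments from the end, mirroring the original logic:
#   [file name words...] [count] [-d<separator>]
# default_count: 10 is an assumed placeholder, not the original constant.
def parse_cmd_args(argv, default_count: 10)
  args = argv.dup # leave the caller's array (e.g. ARGV) untouched
  separator = args.last.to_s.match?(/\A-d./) ? args.pop[2..-1] : ''
  count = args.last.to_s.match?(/\A\d+\z/) ? args.pop.to_i : default_count
  Options.new(args.join(' '), count, separator)
end

options = parse_cmd_args(%w[names.txt 5 -d-])
puts options.file               # names.txt
puts options.name_count         # 5
puts options.syllable_separator # -
```

Taking the argument array as a parameter instead of reading ARGV directly keeps the method easy to call from tests or from another script.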

I apologize that I’ve left a critique without any code examples. Normally I would provide some, but it’s been a while since I’ve written any Ruby. It’s better in this case that I leave you to attempt a cleanup yourself.

Try to:

  • Write methods that do one and only one thing.
  • Create useful abstractions by way of classes, even if they’re just simple data structures to hold information.
  • Add some vertical whitespace around logic that’s too trivial or too intertwined to be its own method.
