A platform-independent implementation of “which” in Ruby

Posted on

Problem

I like to know if there’s anything that I may have missed in this implementation. The method tries to provide the absolute path of the binary or script which is likely to execute based on the provided path. I carefully based it from how cmd.exe chooses the file to execute. It intends to be platform-independent but so far it’s only tested with an NT-based Windows and Linux. I don’t plan to support legacy Windows or DOS that may not have PATHEXT just in case Ruby supports them.

You may also wonder why some declarations happen within a condition line but I find it hard seeing anything in the code like test flags that run even when they are not necessary so please just avoid talking about it or anything that’s only about coding style or philosophy. Some may argue that it’s not Ruby-like but it’s simply functionally wrong or programmatically wrong to me. I also avoid significant redundant queries. Despite this I still balance efficiency and readability. I’m also strict to uniformity in coding style. Ruby may be an expressive language as some may say but I believe it’s not restricted from showing correct logic structure and choosing not to compromise it for the sake of readability is an option.

module Bin
  def self.which(cmd)
    exts = (pathext = ENV['PATHEXT']) ? 
        pathext.split(';').select{ |e| e[0] == '.' } : []
    if (cmd[File::SEPARATOR] or (File::ALT_SEPARATOR and cmd[File::ALT_SEPARATOR])) 
    or (paths = ENV['PATH']).nil? 
    or (paths = paths.split(File::PATH_SEPARATOR).select{ |e| File.directory?(e) }).empty?
      if not exts.empty?
        return File.absolute_path(cmd) 
            if not (ext = File.extname(cmd)).empty? 
            and exts.any?{ |e| e.casecmp(ext) } 
            and File.file? cmd and File.executable? cmd
        exts.each do |ext|
          exe = "#{cmd}#{ext}"
          return File.absolute_path(exe) if File.file? exe and File.executable? exe
        end
      elsif File.file? cmd and File.executable? cmd
        return File.absolute_path(cmd)
      end
    elsif not exts.empty?
      has_valid_ext = (not (ext = File.extname(cmd)).empty? and exts.any?{ |e| e.casecmp(ext) })
      paths.unshift('.').each do |path|
        if has_valid_ext 
        and File.file? (exe = File.join(path, "#{cmd}")) 
        and File.executable? exe
          return File.absolute_path(exe)
        end
        exts.each do |ext|
          exe = File.join(path, "#{cmd}#{ext}")
          return File.absolute_path(exe) if File.file? exe and File.executable? exe
        end
      end
    else
      paths.each do |path|
        exe = File.join(path, cmd)
        return File.absolute_path(exe) if File.file? exe and File.executable? exe
      end
    end
    nil
  end
end

Solution

I see at least two issues, aside from the atrocious code style:

  • String#casecmp returns an integer, much like strcasecmp does in C. But in Ruby, everything other than false and nil is truthy. So if a PATHEXT environment variable is present and has at least one extension, and cmd has an extension, has_valid_ext will always be true — regardless of whether cmd‘s extension is valid. The only thing keeping it from affecting results is the low chance of, say, whatever.zip being both executable and in the search path.

  • Windows and Linux differ in how they handle the current directory when searching. Windows checks it first before resorting to a path search, and Linux won’t check it at all unless the path either includes it, or is empty. If $PATH were set to blah, for example, Linux wouldn’t even bother looking at ..

    Since you eliminate invalid directories before checking for emptiness, though, you don’t catch the case where the path contains only inaccessible locations — so you end up checking the CWD, making the result in Linux potentially different from what which would report. Not sure how significant it is, since it’s rather uncommon to have such a broken PATH variable…but it doesn’t help.

Aside from that, there’s a difference between “multi-platform” and “platform-independent”. This code can pretty much be considered the former (once it’s fixed), but by definition is not the latter. Anything that depends solely on system environment variables, has made assumptions about the platform that might not hold true.

For example, you assume that if PATHEXT exists, that (1) it has a certain significance, and (2) it is case insensitive. You assume that iff there is a list of directories to search for files in, it will always be called PATH. Without knowing what OS you’re running in, neither assumption can be supported; at best, they’re educated guesses.

There’s a lot going on here, so I’m going to put all my feedback in a list as I go through your code and put a new version of it with my suggested changes at the end.

Keep in mind that I’m not testing these changes and that the code I come up with may be broken or incorrect. It’s simply the result of me going through this and changing things to bring them up to a level of quality I’d expect within a codebase I was responsible for.

Feedback

  • I used the extend self shorthand/pattern to declare all methods within
    the bin module as class methods. I generally don’t use this too much unless
    I’m working on singleton classes/modules like the one you’re provided us.
  • There are a lot of logical checks here that could easily be broken out into
    private methods on the Bin module. doing so lets you break up many of the
    if/else checks into their own contexts.
  • I had trouble reading variables like ‘pathext’ thinking it was ‘path text’,
    so I renamed such variables to include underscores to help me avoid skimming
    through words thinking they meant or referred to things they did not.
  • I like to avoid implicitly referencing global or environment variables
    within the body of a method. The main motivation for this is that by
    allowing arbitrary strings within the input for methods like
    Bin.extensions, I’m now able to more easily test the method by providing
    it input without needing to resort to polluting global state within
    my unit tests.
  • Avoid the keywords ‘and’ and ‘or’ unless you are VERY sure you need them.
    The operator precedence for these keywords is LOWER than their short-hand
    equivalents ‘&&’ and ‘||’. Using these without understanding this little
    fact can lead to surprising behavior.
  • I saw a few places where you were using .select { ... }.empty? to check
    if any or all items in the select block passed a test. The Enumerable
    module within the ruby standard library has you covered here:
    use .any?, all?, or .none? to do similar tests if all you need is a
    boolean result.
  • Following up on my last point, you can call methods like .any? on
    without a block and they will implicitly test the truthyness of a value;
    ie: [1,2].any? is equivalent to [1,2].any? { |num| num ? true : false }
  • Avoid constructs like if not; ruby provided the unless keyword
    for negative assertions. Use it.
  • Document your methods thoroughly, comment on complex bits of code,
    and explain any design decisions that are significant to your code.
    I don’t trust myself to remember my thought process for code
    I wrote an hour ago, let alone that of someone who wrote something
    long, long ago in a place far, far away.

    I’m personally a fan of YARD for documentation of ruby code, but the tools
    you use for this (if any) are unimportant so long as the poor sod who comes
    after you has some help figuring out what’s going on. You’ll feel even
    better about making such efforts when that poor sod will probably be you.

Revised Code

# `Bin.which` is (hopefully) a operating system agnostic implementation of
# Linux's `which` command.
module Bin
  extend self

  private

  # All paths availble in `ENV['PATH']` as an array
  # @return [Array<String>] an array of paths
  def paths
    ENV['PATH'] ? ENV['PATH'].split(File::PATH_SEPARATOR) : []
  end

  # Returns all files within a string that contain a '.'
  #
  # @param path_ext [String] a semicolon delimited string of valid filename
  #                          extensions for executables
  # @return [Array<String>] an array containing valid executable file extensions
  # @example
  #   extensions('file1.txt; fizzbuzz ; file3.jpg')
  #   # => ['file1.txt', 'file3.jpg']
  def extensions(path_ext = ENV['PATHEXT'])
    return [] unless path_ext # no-op if PATHEXT is not set/present
    path_ext.split(';').select { |segment| segment[0] == '.' }
  end

  # Determine if the provided string appears to be a filename or a directory
  #
  # @param string [String] a file path or filename
  # @return [Boolean]
  def directory?(path)
    if path[File::SEPARATOR] || File::ALT_SEPARATOR && path[File::ALT_SEPARATOR]
      return true

    else
      paths.any? { |p| File.directory?(p) }
    end
  end

  # Determine if the provided string is an known file extension for executable
  # files on this system.
  #
  # @param ext [String] a file extension
  # @return [Boolean]
  def executable_file_extension?(ext)
    return false if ext.empty?
    extensions.any? { |extension| extension.casecmp(ext) }
  end

  # Given a path to a file, determine if the provided path leads to a
  # file with a known executable file extension
  #
  # @param path [String] the path to a file
  def path_with_executable_file_extension?(path)
    ext = File.extname(path)

    return false if ext.empty?
    executable_file_extension?(ext)
  end

  # Given a path, determine if the path is that of an executable file.
  #
  # If `dir` is provided, assume that the provided path is relative to `dir`
  # and attempt to find an executable file there.
  #
  # @param filename [String] the path to a file
  # @param dir [String] a directory within which to find `filename`
  # @return [Boolean]
  #
  # @example Basic usage
  #   executable_file?('/bin/sh') #=> true
  #
  # @example Finding an executable within a directory
  #   executable_file?('sh', '/usr/local') #=> false
  #   executable_file?('sh', '/bin') #=> true
  def executable_file?(filename, dir = nil)
    path = File.join(dir, filename) if dir
    path ||= filename

    File.file?(path) && File.executable?(path)
  end

  public

  # Platform independent implementation of the linux `which` command
  # @param cmd [String] The name of a command to find
  # @return [String, Nil] Returns the aboslute path to the desired command
  #                       or `nil` if the command could not be found
  # @example
  #   which('vim') # => '/usr/local/bin/vim'
  #   which('/usr/local/bin/vim') # => '/usr/local/bin/vim'
  #   which('foobar') # => nil
  def which(cmd)
    # Check to see if there is an executable file that shares the name
    # of a directory at the same path (ie: `/foo/bar/` and `/foo/bar.exe`)
    if directory?(cmd)
      if extensions.any?
        if path_with_executable_file_extension?(cmd) && executable_file?(cmd)
          return File.absolute_path(cmd)
        end

        extensions.each do |extension|
          exe = format('%s%s', cmd, extension)
          return File.absolute_path(exe) if executable_file?(cmd)
        end

      elsif executable_file?(cmd)
        return File.absolute_path(cmd)
      end

    # Search for the provided command within all
    # of the designated PATH locations
    elsif extensions.any?
      paths.delete_if { |path| path == '.' }.each do |path|
        if path_with_executable_file_extension?(cmd) && executable_file?(cmd, path)
          return File.absolute_path File.join(path, cmd.to_s)
        end

        extensions.each do |extension|
          exe = File.join(path, format('%s%s', cmd, extension))
          return File.absolute_path(exe) if executable_file?(exe)
        end
      end

    else
      paths.each do |path|
        exe = File.join(path, cmd)
        return File.absolute_path(exe) if executable_file?(exe)
      end
    end

    nil
  end
end

Leave a Reply

Your email address will not be published. Required fields are marked *