Problem
kexec is a way for a Linux kernel to directly boot another Linux kernel without going through the usual BIOS startup sequence, which can take several minutes on enterprise servers. The big problem with kexec is that Linux distributions don’t set it up for you when you install a kernel, and setting it up yourself is manual and error-prone, so not many people actually use it when rebooting their servers.
I wanted to learn some Ruby (I’ve worked through this Rails tutorial, but I haven’t previously written anything substantial in Ruby), so I wrote a Ruby script to help automate kexec somewhat. It can either stage the latest installed kernel for kexec, or run in an interactive mode where you choose a kernel from a list. In both cases it searches for the GRUB configuration file, parses it to get the kernel, initrd and kernel command line, and then calls kexec with these arguments.
My big concern here is with the obvious duplicate code in process_grub_config and the functions it calls, load_kernels_grub and load_kernels_grub2. I know these bits need to be refactored, but I’m not familiar enough with the language to know the best way to go about it. In particular, it’s necessary to parse GRUB 1 and GRUB 2 style configuration files differently, and these files can be in different locations depending on Linux distribution.
I also literally wrote this yesterday evening, and I might not have had enough coffee while writing it, so I’m open to suggestions on any other part of the code that might need improvement.
(Note: Because this code is meant to be part of the process of rebooting, I suggest testing in a virtual machine. Run the script with no arguments for a usage statement. I’ve personally tested it on EL6, EL7, Ubuntu 10.04, 12.04, and Debian wheezy, and it should work properly on any Linux distribution that uses GRUB 1 or GRUB 2.)
#!/usr/bin/env ruby
# kexec-reboot - Easily choose a kernel to kexec

require 'optparse'

# Find a mount point given the device special
def device_to_mount_point(device)
  if File.ftype(device) != "blockSpecial" then
    STDERR.puts("Device #{device} isn't a block device\n")
    return nil
  end
  mount_point = nil
  mounts = open("/proc/mounts").each_line do |mount|
    line = mount.split
    if line[0] == device then
      mount_point = line[1]
      break
    end
  end
  mount_point = "" if mount_point == "/" # Eliminate double /
  if mount_point.nil? then
    STDERR.puts "Can't find the mount point for device #{device}\n"
    return nil
  end
  mount_point
end

# Find a mount point given the GRUB device and device map
def device_map_to_mount_point(device, device_map)
  dev = device.match(/(hd\d+)/)
  part = device.match(/hd\d+,(\d+)/)
  mount_point = device_map.match(/\(#{dev[1]}\)\s+(.+)$/)
  mount_point_part = 1 + Integer(part[1]) if !part.nil?
  device_path = "#{mount_point[1]}#{mount_point_part}"
  if !File.exists?(device_path) then
    STDERR.puts("Can't find the device #{device_path} from #{device}\n")
    return nil
  end
  device_to_mount_point("#{mount_point[1]}#{mount_point_part}")
end

# Find a mount point given the device UUID
def uuid_to_mount_point(uuid)
  begin
    device = File.realpath("/dev/disk/by-uuid/#{uuid}")
  rescue Errno::ENOENT
    STDERR.puts "No such file or directory, uuid #{uuid}\n"
    return nil
  end
  device_to_mount_point(device)
end

# Load the available kernels from the given GRUB 1 configuration file
def load_kernels_grub(config)
  device_map = open("/boot/grub/device.map").read
  entries = Array.new
  config.scan(/title (.+?$).+?root \(([^\)]+)\).+?kernel ([^ ]+) (.+?)$.+?initrd (.+?$)/m).each do |entry|
    mount_point = device_map_to_mount_point(entry[1], device_map)
    name = entry[0].strip
    kernel = "#{mount_point}#{entry[2]}"
    initrd = "#{mount_point}#{entry[4]}"
    cmdline = entry[3].strip
    # Sanity check the kernel and initrd; they must be present
    if !File.readable?(kernel) then
      STDERR.puts "Kernel #{kernel} is not readable\n"
      next
    end
    if !File.readable?(initrd) then
      STDERR.puts "Initrd #{initrd} is not readable\n"
      next
    end
    entries.push({
      "name" => name,
      "kernel" => kernel,
      "initrd" => initrd,
      "cmdline" => cmdline,
    })
  end
  entries
end

# Load the available kernels from the given GRUB 2 configuration file
def load_kernels_grub2(config)
  entries = Array.new
  config.scan(/menuentry '([^']+)'.+?\{.+?search.+?([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}).+?linux(16)?\s+([^ ]+) (.+?)$.+?initrd(16)?\s+(.+?)$.+?\}/m).each do |entry|
    mount_point = uuid_to_mount_point(entry[1])
    name = entry[0].strip
    kernel = "#{mount_point}#{entry[3]}"
    initrd = "#{mount_point}#{entry[6]}"
    cmdline = entry[4].strip
    # Sanity check the kernel and initrd; they must be present
    if !File.readable?(kernel) then
      STDERR.puts "Kernel #{kernel} is not readable\n"
      next
    end
    if !File.readable?(initrd) then
      STDERR.puts "Initrd #{initrd} is not readable\n"
      next
    end
    entries.push({
      "name" => name,
      "kernel" => kernel,
      "initrd" => initrd,
      "cmdline" => cmdline,
    })
  end
  entries
end

# Load a grub configuration file and process it
def process_grub_config
  # TODO: Duplicate code smells, refactor this
  # First, locate the grub configuration file
  # We try GRUB 1 files first
  ["/boot/grub/menu.lst"].each do |file|
    begin
      entries = load_kernels_grub(open(file).read)
      if !entries.empty? then
        if $verbose then
          puts "Read GRUB configuration from #{file}\n"
        end
        return entries
      end
    rescue Errno::EACCES
      STDERR.puts("#{$!}\nYou must be root to run this utility.\n")
      exit 1
    rescue Errno::ENOENT
      next
    end
  end
  # Then we try GRUB 2 files
  ["/boot/grub2/grub.cfg", "/boot/grub/grub.cfg"].each do |file|
    begin
      entries = load_kernels_grub2(open(file).read)
      if !entries.empty? then
        if $verbose then
          puts "Read GRUB configuration from #{file}\n"
        end
        return entries
      end
    rescue Errno::EACCES
      STDERR.puts("#{$!}\nYou must be root to run this utility.\n")
      exit 1
    rescue Errno::ENOENT
      next
    end
  end
  STDERR.puts("Couldn't find a grub configuration anywhere!\n")
  exit 1
end

def kexec(entry)
  if $verbose then
    print "Staging kernel #{entry['name']}\n"
  end
  fork do
    exec "/sbin/kexec", "-l", "#{entry['kernel']}", "--append=#{entry['cmdline']}", "--initrd=#{entry['initrd']}"
  end
end

def interactive_select_kernel
  entries = process_grub_config
  selection = nil
  loop do
    puts "\nSelect a kernel to stage:\n\n"
    entries.each_with_index do |entry, index|
      selection_number = index + 1
      puts "#{selection_number}: #{entry['name']}\n"
    end
    print "\nYour selection: "
    selection = gets.chomp
    begin
      selection = Integer(selection)
    rescue ArgumentError
      return nil
    end
    break if selection.between?(0, entries.count)
  end
  return nil if selection == 0
  entries[selection - 1]
end

def select_latest_kernel
  entries = process_grub_config
  entries.first
end

options = {}
opts = OptionParser.new do |opts|
  opts.banner = "Usage: kexec-reboot [options]"
  opts.on("-i", "--interactive", "Choose the kernel to stage from a list") do |i|
    options[:interactive] = i
  end
  opts.on("-l", "--latest", "Stage the latest kernel") do |l|
    options[:latest] = l
  end
  opts.on("-r", "--reboot", "Reboot immediately after staging the kernel") do |r|
    options[:reboot] = r
  end
  opts.on("-v", "--[no-]verbose", "Extra verbosity.") do |v|
    $verbose = v
  end
end
opts.parse!

if (options[:interactive]) then
  entry = interactive_select_kernel
  if (entry.nil?) then
    STDERR.puts "Canceled.\n"
    exit 1
  end
elsif (options[:latest]) then
  entry = select_latest_kernel
else
  STDERR.puts opts.help
  exit 1
end

if !entry.nil? then
  entry = kexec(entry)
  if options[:reboot] then
    `shutdown -r now`
  end
end
This code is now available on github and future changes will be published there.
After this was posted, these changes have been made (which can be seen in the github version):
- A bug which caused kexec to fail if a previous kernel had already been staged (e.g. via kexec or a kdump crash kernel) has been fixed.
- A bug which caused the script to fail to find the boot partition on certain older HP ProLiant servers has been fixed.
- Ruby hashes have been changed to use symbols as keys, rather than strings.
- Support was added for systems that boot with UEFI.
A number of further changes have been made and are now in github, including most of the suggestions given by 200_success.
In addition, after more extensive testing on a variety of servers (thanks to ewwhite) the following change was made:
- When processing a grub 1 configuration, first assume the kernel can be reached in either / or /boot, before trying to read the device.map file, because device.map is very frequently wrong due to post-installation hardware changes. This issue doesn’t affect systems which boot with grub 2.
I’ll be doing some more cleanup, and after I’ve incorporated the rest of the suggestions I’ll post the new version for review.
Solution
The Ruby code looks quite good.
You have a couple of filehandle leaks. A typical way to process a file is open(…) { |file| … }. If you call open without a block, then you should also close the resulting filehandle.
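As a minimal sketch of the difference, the snippet below uses a temporary file as a stand-in for a real file such as /proc/mounts. The Tempfile and all variable names are assumptions made for illustration and are not part of the script under review.

```ruby
require 'tempfile'

# Stand-in for a real file such as /proc/mounts (illustrative only).
sample = Tempfile.new('mounts')
sample.write("/dev/sda1 / ext4 rw 0 0\n")
sample.close

# Without a block, the caller is responsible for closing the handle.
leaky = File.open(sample.path)
first_line = leaky.gets
leaky.close # easy to forget, which is how handles leak

# With a block, the handle is closed when the block exits,
# even if the block raises or breaks out early.
handle = nil
block_line = File.open(sample.path) do |file|
  handle = file
  file.gets
end

# One-shot helpers open, read and close in a single call.
all_text = File.read(sample.path)
```

After the block form returns, handle.closed? is true without any explicit close call.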
An even simpler approach would be to call static methods such as IO::readlines. For example, in device_to_mount_point, the following code
mounts = open("/proc/mounts").each_line do |mount|
  line = mount.split
  if line[0] == device then
    mount_point = line[1]
    break
  end
end
could be simplified with
proc_mounts = Hash[IO.readlines('/proc/mounts').collect { |line| line.split[0..1] }]
mount_point = proc_mounts[device]
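One difference may be worth keeping in mind when swapping the loop for the hash: if a device appears on more than one line, the hash keeps the last mount point, while the loop stops at the first. The sketch below illustrates this on made-up sample text in the /proc/mounts format; the helper name first_mount_point is hypothetical.

```ruby
# Sample text in the /proc/mounts format (made up for illustration).
sample_mounts = <<~MOUNTS
  /dev/sda1 / ext4 rw,relatime 0 0
  /dev/sda2 /boot ext4 rw,relatime 0 0
  /dev/sda2 /mnt/bootcopy ext4 rw,relatime 0 0
MOUNTS

# Hash approach: device => mount point. Later lines overwrite earlier ones.
by_device = Hash[sample_mounts.lines.collect { |line| line.split[0..1] }]

# Loop approach (hypothetical helper): stop at the first matching line.
def first_mount_point(text, device)
  text.each_line do |line|
    fields = line.split
    return fields[1] if fields[0] == device
  end
  nil
end

last_wins  = by_device["/dev/sda2"]
first_wins = first_mount_point(sample_mounts, "/dev/sda2")
not_found  = by_device["/dev/sdz9"]
```

Here last_wins is "/mnt/bootcopy" and first_wins is "/boot", and a device that is not mounted yields nil in both approaches.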
You should avoid returning nil to indicate an error. That just burdens the caller with the responsibility to handle a nil result properly. If it’s not actually an error, then return an empty string. If it is an error, you should raise an exception instead:
raise ArgumentError.new("Device #{device} isn't a block device")
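A rough sketch of how that could look end to end, with the error handled once near the top level rather than checked at every call site. The names require_block_device and check_device are hypothetical, and the wrapper returns a status and message only so the behaviour is easy to see; the real script would presumably print to STDERR and exit.

```ruby
# Hypothetical helper: raise instead of returning nil on bad input.
def require_block_device(device)
  unless File.exist?(device) && File.ftype(device) == "blockSpecial"
    raise ArgumentError, "Device #{device} isn't a block device"
  end
  device
end

# Hypothetical wrapper: rescue the failure in one place and turn it
# into an exit status plus a message.
def check_device(device)
  require_block_device(device)
  [0, "ok"]
rescue ArgumentError => e
  [1, e.message]
end

status, message = check_device("/nonexistent/device")
```

Intermediate callers no longer need their own nil checks, because the exception propagates past them.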
It is unusual to see string-to-number conversions written as Integer(part[1]) in Ruby. A more common expression would be part[1].to_i.
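The two forms are not interchangeable on malformed input, which may matter where the script relies on ArgumentError (as the interactive menu does). A small sketch of the difference, using made-up input strings:

```ruby
# Integer() is strict: it raises ArgumentError when the string is not a number.
# String#to_i is lenient: it parses any leading digits and otherwise returns 0.
strict_ok      = Integer("3")
lenient_ok     = "3".to_i
lenient_junk   = "abc".to_i
lenient_prefix = "12abc".to_i

strict_junk = begin
  Integer("abc")
rescue ArgumentError
  :rejected
end
```

With to_i, non-numeric input silently becomes 0, so any code that treats 0 as a meaningful value needs to account for that.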
Here is one way to eliminate the code duplication in process_grub_config:
def process_grub_config
  possible_grub_configs = [
    ["/boot/grub/menu.lst", :load_kernels_grub],
    ["/boot/grub2/grub.cfg", :load_kernels_grub2],
    ["/boot/grub/grub.cfg", :load_kernels_grub2],
  ]
  possible_grub_configs.each do |filename, handler|
    begin
      entries = method(handler).call(IO::read(filename))
      if !entries.empty? then
        if $verbose then
          puts "Read GRUB configuration from #{filename}\n"
        end
        return entries
      end
    rescue Errno::EACCES
      STDERR.puts("#{$!}\nYou must be root to run this utility.\n")
      exit 1
    rescue Errno::ENOENT
      next
    end
  end
  # No usable configuration was found in any candidate location
  STDERR.puts("Couldn't find a grub configuration anywhere!\n")
  exit 1
end
I consider load_kernels_grub and load_kernels_grub2 to be misnamed, as they aren’t actually loading anything, at least not in the kexec sense. I suggest a name like grub1_cfg_kernel_entries instead.
In kexec, fork and exec could just be a system call:
system "/sbin/kexec", "-l", entry['kernel'], "--append=#{entry['cmdline']}", "--initrd=#{entry['initrd']}"
The entry['kernel'] parameter does not need string interpolation.
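Unlike a bare fork and exec with no wait, system waits for the command to finish and reports how it went, so the script could decide whether it is safe to proceed to the reboot. The sketch below shows the three possible return values; it runs the current Ruby interpreter as a stand-in for /sbin/kexec, and the nonexistent path is deliberately made up.

```ruby
require 'rbconfig'

# Stand-in command: the running Ruby interpreter exiting with a chosen status.
# (The real script would invoke /sbin/kexec here.)
ruby = RbConfig.ruby

# With several arguments, system runs the program directly, without a shell.
succeeded = system(ruby, "-e", "exit 0") # true when the exit status is 0
failed    = system(ruby, "-e", "exit 3") # false when the exit status is non-zero
status    = $?.exitstatus                # exit status of the most recent command

# nil when the command could not be executed at all.
missing = system("/nonexistent/kexec-stand-in", "-l")
```

A caller could then skip the shutdown step unless the return value is true.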
In accordance with the suggestion in the kexec(8) man page, you could just call kexec with no option parameter, which loads the specified kernel and calls shutdown.
The task is painful because it really is difficult. It’s difficult because grub.cfg is written in GRUB’s built-in scripting language, which has a syntax quite similar to that of GNU Bash and other Bourne shell derivatives.
For example, from my Debian squeeze server, here is an excerpt from grub.cfg:
### BEGIN /etc/grub.d/10_linux ###
menuentry 'Debian GNU/Linux, with Linux 2.6.32-5-amd64' --class debian --class gnu-linux --class gnu --class os {
    set gfxpayload=1024x768
    insmod lvm
    insmod part_gpt
    insmod ext2
    set root='(vg-root1)'
    search --no-floppy --fs-uuid --set 84cc28cc-e54f-43f2-9e62-182d5e6af329
    echo 'Loading Linux 2.6.32-5-amd64 ...'
    linux /boot/vmlinuz-2.6.32-5-amd64 root=/dev/mapper/vg-root1 ro console=tty0 console=ttyS1,115200n8r quiet vga=791 text
    echo 'Loading initial ramdisk ...'
    initrd /boot/initrd.img-2.6.32-5-amd64
}
… and device.map:
(hd0) /dev/disk/by-id/cciss-3600508b100104439535547344832000b
Two of the complications are:
- The configuration opted to use the commands set root=… and linux rather than root … and kernel ….
- Due to the use of LVM, you won’t be able to easily correlate the GRUB device name (vg-root1) with the mountpoint by looking in device.map.
To claim completeness, you would need to be able to do the inverse of everything that grub-mkconfig_lib is capable of generating. A fully general solution would be even more difficult, as it would involve reimplementing a huge chunk of GRUB itself.
Perhaps it would be more advantageous to avoid trying to interpret GRUB’s device nomenclature altogether and stay entirely within Linux’s device-naming scheme by looking for the root=… kernel command-line parameter. (The rdev(8) command from util-linux may be of interest here if no kernel parameters are passed, which is a rare practice these days.)
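As a rough sketch of that idea, the root=… value can be pulled out of an already-parsed kernel command line with a single pattern. The helper name root_parameter is hypothetical, and the sketch stops at extracting the value; mapping it to the filesystem that actually holds the kernel and initrd is left open.

```ruby
# Hypothetical helper: return the value of the root=... parameter from a
# kernel command line, or nil when the parameter is absent.
def root_parameter(cmdline)
  # Require start-of-string or whitespace before "root=" so that
  # parameters merely ending in "root=" are not picked up.
  match = cmdline.match(/(?:\A|\s)root=(\S+)/)
  match && match[1]
end

lvm_root  = root_parameter("root=/dev/mapper/vg-root1 ro console=tty0 quiet")
uuid_root = root_parameter("ro root=UUID=84cc28cc-e54f-43f2-9e62-182d5e6af329 quiet")
no_root   = root_parameter("ro quiet")
```

The value may be a device path or a UUID=… specifier, so a caller would still need to handle both forms.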
Considering the enormity of the task of writing a complete interpreter for grub.cfg, I’d be happy if you handled just a limited subset of the configuration language properly. Ignoring #Comments would be a good start.
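One simple way to do that is to filter the configuration text before running the scan patterns over it, so that a commented-out title or menuentry line cannot match. This is only a sketch: strip_grub_comments is a hypothetical helper, and it deliberately handles whole-line comments only, not trailing comments or a # inside quotes.

```ruby
# Hypothetical helper: drop blank lines and lines whose first non-blank
# character is '#'. Trailing comments and quoted '#' are not handled.
def strip_grub_comments(config)
  config.each_line.reject { |line| line =~ /\A\s*(#|\z)/ }.join
end

# Made-up sample input for illustration.
sample = <<~CFG
  ### BEGIN /etc/grub.d/10_linux ###
  # menuentry 'Disabled entry' {

  menuentry 'Debian' {
    linux /boot/vmlinuz ro
  }
CFG

cleaned = strip_grub_comments(sample)
```

The cleaned text keeps only the three lines of the active menuentry block.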
Use symbols as keys in your hashes. The script is inconsistent with its use of symbols or strings as hash keys, and using symbols saves both time and memory as Ruby symbols are immutable.
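A brief sketch of what the change looks like for a kernel entry. The paths and names below are placeholders, not values from a real system.

```ruby
# String keys, as in the original script (placeholder values).
entry_with_strings = {
  "name"   => "Example kernel",
  "kernel" => "/boot/vmlinuz-example",
}

# Symbol keys, matching the style already used for the options hash.
entry = {
  name:    "Example kernel",
  kernel:  "/boot/vmlinuz-example",
  initrd:  "/boot/initrd-example.img",
  cmdline: "ro quiet",
}

# The two key types are distinct, so lookups must use the same type
# that was used when the hash was built.
by_symbol = entry[:name]
by_string = entry["name"]

# Every occurrence of a given symbol refers to the same object.
same_object = :name.equal?(:name)
```

Because entry["name"] is nil once the keys are symbols, every lookup such as entry['kernel'] would need to change along with the hash literals.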
When processing a grub 1 configuration, first assume the kernel can be reached in either / or /boot, before trying to read the device.map file, because device.map is very frequently wrong due to post-installation hardware changes. This issue doesn’t affect systems which boot with grub 2.
# Scan directories to find the one containing the given path
def locate_kernel(kernel)
  ["", "/boot"].each do |dir|
    STDERR.puts "Looking for #{dir}#{kernel}\n" if $verbose
    return dir if File.exist?("#{dir}#{kernel}")
  end
  raise Errno::ENOENT
end

# Load the available kernels from the given GRUB 1 configuration file
def grub1_kernel_entries(config)
  device_map = IO.read("/boot/grub/device.map")
  entries = Array.new
  config.scan(/title (.+?$).+?root \(([^\)]+)\).+?kernel ([^ ]+) (.+?)$.+?initrd (.+?$)/m).each do |entry|
    begin
      # Try hard-coded locations, works 99.9% of the time
      mount_point = locate_kernel(entry[2])
    rescue Errno::ENOENT
      # Fallback to reading grub1 device.map, which is often wrong
      mount_point = device_map_to_mount_point(entry[1], device_map)
    end
    #.....
I get that the objective here is to do it with Ruby, but if all you are looking to do is a kexec-based reboot, this is a little more complicated than necessary. You may find the script in this article handy for a simple fast reboot using kexec. The bash is trivial enough to convert to Ruby easily.