Problem
I have the following Ruby code that’s designed to update item, price and stock data for items in a MSSQL database. It’s running on a Ruby 1.8.6/Rails 1.2.3 installation, in its own controller (for now)
What I’m looking for is ways to optimize performance.
Right now each item takes about 0,2 seconds (200ms) to process. Edit: There may be 10,000+ items in the XML file, and 10,000,000+ items in the “Items” and “Price” SQL tables.
I’ve been told that
-
composing my own SQL queries directly instead of using the models will be faster, since I’m doing quite a few lookups, how much performance would that lend me?
-
and selecting many rows at once, doing processing, and inserting at once should be faster, instead of doing it one by one. (How would this be done in practice?)
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product|
itemid = product.attributes['id'].to_i
itemprice = product.elements['price'].text.to_i
item = Item.find_by_id(itemid)
if item.nil?
skips += 1
next
end
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})
if item_stock_location.nil?
item_stock_location = ItemStockLocationCount.new
item_stock_location.item_stock_location_id = item_stock_location_id
item_stock_location.item_id = itemid
end
item_stock_location.stock_qty = location.elements['stock'].text.to_i
item_stock_location.save
}
# update onsite_stock + offsite_stock
item_stock_count_on_site = 0
item_stock_count_off_site = 0
item_stock_loc_qtys = ItemStockLocationQuantity.find_all({ :item_id => itemid.to_s })
item_stock_loc_qtys.each { |stock_loc_qty|
item_stock_count_on_site += stock_loc_qty.stock_qty
item_stock_count_off_site += stock_loc_qty.stock_qty unless stock_loc_qty.item_stock_location_id == 1
}
item.onsite_stock = item_stock_count_on_site
item.offsite_stock = item_stock_count_off_site
# update price lock
item.price_lock = product.elements['price_locked']
if item.price_lock == 1
item.vat_price = product.elements['price']
item.price = (itemprice - ((itemprice * 25)/ (100 + 25)))
end
item.save
count += 1
# progress bar
if count%100 == 0
printf '.'
end
}
puts 'Item Count: ' + count.to_s
puts 'Items skipped: ' + skips.to_s
Solution
As for performance, I definitely think you want to use SQL’s Bulk Insert from XML functionality. I don’t know how to use it from Ruby, so I’ll leave that to another reviewer.
I do want to point out something else though. I had to scroll down two pages to find the end of the loop that this opens up. doc.elements.each('products') { |product|
. In fact, the entirety of the code is inside of this loop. Blocks of code should fit on the screen in their entirety. I call it the “Single Page Principle”.
So, the refactoring step that should be taken is to define a method that takes product
as a parameter and put all of the logic necessary to handle a single doc.element
inside of it.
def update_product!(product)
#ALL THE CODEZ
end
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }
The next step would be to break down the steps inside of the new update_product
method into smaller methods as well. A good start would be to take the code between the comments and place them inside of a method with a good name. This replaces your comment and breaks the code down into logical chunks.
For example:
Instead of this:
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})
Define this:
def update_price
# update prices
price = Price.find(:first, :conditions => { :item_id => itemid })
# create new price if price not found
if price.nil?
price = Price.new
price.item_id = itemid
end
price.price = product.elements['price'].text.to_i
price.save
# find + update item stock data
product.elements.each('stocks/location') { |location|
item_stock_location_id = location.attributes['location_code']
item_stock_location = ItemStockLocationCount.find(:first,
:conditions => {
:item_stock_location_id => item_stock_location_id,
:item_id => itemid.to_s
})
end
And call it from update_product!
. Continuing doing this until each method has exactly one responsibility. You should end up with code that looks something like this psuedocode.
def update_product(product)
update_price
update_item_stock_data
update_stock_site
update_price_lock
count += 1
# progress bar
if count%100 == 0
printf '.'
end
end
# counters for statistics
count = 0
skips = 0
# load + parse XML
file = File.read(XML_PATH)
doc = REXML::Document.new file
# loop through products in XML file
doc.elements.each('products') { |product| update_product!(product) }
puts 'Item Count: ' + count.to_s
puts 'Items skipped: ' + skips.to_s