work.rowanhick.com

How to avoid hanging yourself with Rails
Using ActiveRecord right the first time

1
Discussion tonight
• Intended for new Rails Developers
• People that think Rails is slow
• Focus on simple steps to improve
common :has_many performance problems
• Short - 15mins
• All links/references up on
http://work.rowanhick.com tomorrow

2
About me
• New Zealander (not Australian)

• Product Development Mgr for a startup in Toronto

• Full time with Rails for 2 years

• Previously PHP/MySQL for 4 years

• Before that, 6 years in QA/BA/PM for an enterprise CAD/CAM software dev company

3
Disclaimer
• For the sake of brevity and understanding, the SQL
shown here is cut down to “pseudo SQL”

• This is not an exhaustive in-depth analysis, just meant as a heads up

• Times were measured using ApacheBench through mongrel in production mode

• ab -n 1000 http://127.0.0.1/orders/test_xxxx

4
ActiveRecord lets you get in
trouble far too quickly.

• Super easy syntax comes at a cost.

@orders = Order.find(:all)
@orders.each do |order|
  puts order.customer.name
  puts order.customer.country.name
end

✴ Congratulations, you just overloaded your DB with (total number of Orders x 2) unnecessary SQL calls

5
What happened there?
• One query to get the orders
@orders = Order.find(:all)
“SELECT * FROM orders”

• For every item in the orders collection
customer.name:
“SELECT * FROM customers WHERE id = x”
customer.country.name:
“SELECT * FROM countries WHERE id = y”
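
The query count can be checked with a toy model. This is a sketch in plain Ruby, not ActiveRecord: FakeDb and its methods are made-up stand-ins that only count lookups, and the 100 orders are invented sample data.

```ruby
# Toy stand-in for a database (hypothetical, not ActiveRecord):
# every lookup bumps a counter, like one SQL round trip.
class FakeDb
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # "SELECT * FROM table"
  def all(table)
    @query_count += 1
    @tables[table].values
  end

  # "SELECT * FROM table WHERE id = x"
  def find(table, id)
    @query_count += 1
    @tables[table][id]
  end
end

# Invented sample data: 100 orders that all belong to one customer.
orders = (1..100).each_with_object({}) do |i, rows|
  rows[i] = { :id => i, :customer_id => 1 }
end

db = FakeDb.new({
  :countries => { 1 => { :id => 1, :name => "New Zealand" } },
  :customers => { 1 => { :id => 1, :name => "Acme", :country_id => 1 } },
  :orders    => orders
})

names = db.all(:orders).map do |order|
  customer = db.find(:customers, order[:customer_id])   # order.customer
  country  = db.find(:countries, customer[:country_id]) # order.customer.country
  [customer[:name], country[:name]]
end

puts db.query_count # 1 query for the orders + 2 per order
```

With 100 orders that is 201 lookups, even though every order points at the same customer and country.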

6
Systemic Problem in
Web development
I’ve seen:
- 15 Second page reloads
- 10000 queries per page
“<insert name here> language performs
really poorly, we’re going to get it
redeveloped in <insert new language
here>”

7
A typical root cause
• Failure to build the application with *real* data
• i.e. “It worked fine on my machine”, but the
developer never loaded up 100,000 records
to see what would happen
• Using Rake tasks to build realistic data sets
• Test, test, test
• tail -f log/development.log
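
Watching the log by eye gets tedious, so counting the SQL lines can help. A rough sketch: count_sql_statements and SQL_LINE are hypothetical names, the first three sample lines follow the log format shown later in this deck, and the "Rendering" line is invented to show that non-SQL lines are skipped.

```ruby
# Matches any log line that contains a SQL statement keyword.
SQL_LINE = /\b(SELECT|INSERT|UPDATE|DELETE)\b/

# Hypothetical helper: number of log lines that look like SQL statements.
def count_sql_statements(log_text)
  log_text.each_line.count { |line| line =~ SQL_LINE }
end

sample = <<-LOG
Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10
Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))
Rendering orders/index
Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))
LOG

puts count_sql_statements(sample)
```

Against a real application the same helper could be fed File.read("log/development.log"), ideally after clearing the log and loading a single page.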
8
Faker to the rescue
• in lib/tasks/xchain.rake
namespace :xchain do
  desc "Load fake customers"
  task :load_customers => :environment do
    require 'faker'
    # Remove customers created by earlier runs (tagged with the XCHAIN_ email prefix)
    Customer.find(:all, :conditions => "email LIKE('%XCHAIN_%')").each { |c| c.destroy }
    # Create 300 customers with random but realistic-looking data
    300.times do
      c = Customer.new
      c.status_id = rand(3) + 1
      c.country_id = rand(243) + 1
      c.name = Faker::Company.name
      c.alternate_name = Faker::Company.name
      c.phone = Faker::PhoneNumber.phone_number
      c.email = "XCHAIN_" + Faker::Internet.email
      c.save
    end
  end
end

$ rake xchain:load_customers

9
Eager loading
• Using :include in a find makes ActiveRecord build SQL joins
• Pull all required records in one query
find(:all, :include => [ :customer, :order_lines ])

✓ order.customer, order.order_lines

find(:all, :include => [ { :customer => :country }, :order_lines ])

✓ order.customer, order.customer.country, order.order_lines

10
Improvement
• Let’s start optimising ...
@orders = Order.find(:all, :include => {:customer => :country} )

• Resulting SQL ...

“SELECT orders.*, customers.*, countries.* FROM orders LEFT JOIN
customers ON ( customers.id = orders.customer_id )
LEFT JOIN countries ON ( countries.id =
customers.country_id )”

✓ 7.70 req/s 1.4x faster

11
Select only what you
need
• Using the :select parameter in the find
options, you can limit the columns you are
requesting back from the database
• No point grabbing all columns, if you only
want :id and :name
Order.find(:all, :select => 'orders.id, orders.name')

12
The last slide was very
important
• Not using selects is *okay* provided you
have very small columns, and never any
binary, or large text data
• Otherwise you can suddenly saturate your DB
connection.
• Imagine our Orders table had an Invoice
column on it storing a pdf of the invoice...

13
Oops
• Can’t show a benchmark
• :select and :include don’t work together! The query
reverts to selecting all columns
• For a long time the core team has not
included patches to make it work
• One little sentence in ActiveRecord rdoc
“Because eager loading generates the SELECT
statement too, the :select option is ignored.”

14
‘mrj’ to the rescue

• http://dev.rubyonrails.org/attachment/ticket/7147/init.5.rb
• Monkey patch to fix select/include problem
• Produces much more efficient SQL

15
Updated finder

• Now :select and :include playing nice:


@orders = Order.find(:all,
  :select => 'orders.id, orders.created_at, customers.name, countries.name, order_statuses.name',
  :include => [{:customer[:name] => :country[:name]}, :order_status[:name]],
  :conditions => conditions,
  :order => 'order_statuses.sort_order ASC, order_statuses.id ASC, orders.id DESC')

✓15.15 req/s 2.88x faster

16
r8672 change
• http://blog.codefront.net/2008/01/30/living-on-the-edge-of-rails-5-better-eager-loading-and-more/

• The following uses the new, improved association loading (12 req/s)
@orders = Order.find(:all, :include => [{:customer => :country}, :order_status] )

• The following does not, because the :order refers to an included table
@orders = Order.find(:all, :include => [{:customer => :country}, :order_status], :order => 'order_statuses.sort_order' )

17
r8672 output...

• Here’s the SQL


Order Load (0.000837) SELECT * FROM `orders` WHERE (order_status_id < 100) LIMIT 10

Customer Load (0.000439) SELECT * FROM `customers` WHERE (customers.id IN (2106,2018,1920,2025,2394,2075,2334,2159,1983,2017))

Country Load (0.000324) SELECT * FROM `countries` WHERE (countries.id IN (33,17,56,150,194,90,91,113,80,54))

OrderStatus Load (0.000291) SELECT * FROM `order_statuses` WHERE (order_statuses.id IN (10))
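
The log above shows one query per table, each using an IN list of ids collected from the previous result. The idea can be sketched in plain Ruby. This is a toy model, not the Rails implementation: FakeDb, where_id_in, and the sample rows are all invented for illustration.

```ruby
# Toy stand-in for a database (hypothetical): counts each lookup as one query.
class FakeDb
  attr_reader :query_count

  def initialize(tables)
    @tables = tables
    @query_count = 0
  end

  # "SELECT * FROM table"
  def all(table)
    @query_count += 1
    @tables[table].values
  end

  # "SELECT * FROM table WHERE id IN (...)"
  def where_id_in(table, ids)
    @query_count += 1
    @tables[table].values_at(*ids).compact
  end
end

# Invented sample data: 100 orders spread over two customers.
orders = (1..100).each_with_object({}) do |i, rows|
  rows[i] = { :id => i, :customer_id => (i % 2) + 1 }
end

db = FakeDb.new({
  :countries => { 1 => { :id => 1, :name => "New Zealand" },
                  2 => { :id => 2, :name => "Canada" } },
  :customers => { 1 => { :id => 1, :name => "Acme", :country_id => 1 },
                  2 => { :id => 2, :name => "Globex", :country_id => 2 } },
  :orders    => orders
})

# One query per table: collect the foreign keys, then load them in one go.
all_orders = db.all(:orders)
customers  = db.where_id_in(:customers, all_orders.map { |o| o[:customer_id] }.uniq)
countries  = db.where_id_in(:countries, customers.map { |c| c[:country_id] }.uniq)

# Stitch the rows together in memory instead of going back to the database.
customers_by_id = customers.each_with_object({}) { |c, h| h[c[:id]] = c }
countries_by_id = countries.each_with_object({}) { |c, h| h[c[:id]] = c }

rows = all_orders.map do |order|
  customer = customers_by_id[order[:customer_id]]
  [order[:id], customer[:name], countries_by_id[customer[:country_id]][:name]]
end

puts db.query_count # stays at 3 however many orders there are
```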

18
But I want more

• Okay, this still isn’t blazing fast. I’m building the next killr web2.0 app
• Forget about associations and just load it via
SQL. Depending on the application, this makes a
huge difference
• Concentrate on commonly used pages

19
Catch 22
• Hard coding SQL is the fastest solution
• No construction of SQL, no generation of
ActiveRecord associated classes
• If your DB changes, you have to update
SQL

‣ Keep SQL with models where possible

20
It ain’t pretty.. but it’s
fast
• Find by SQL
class Order < ActiveRecord::Base
  # Hand-written SQL: selects only the columns the page needs
  def self.find_current_orders
    find_by_sql("SELECT orders.id, orders.created_at, customers.name
      as customer_name, countries.name as country_name, order_statuses.name
      as status_name FROM orders LEFT OUTER JOIN `customers` ON
      `customers`.id = `orders`.customer_id LEFT OUTER JOIN `countries` ON
      `countries`.id = `customers`.country_id LEFT OUTER JOIN
      `order_statuses` ON `order_statuses`.id = `orders`.order_status_id
      WHERE order_status_id < 100 ORDER BY order_statuses.sort_order
      ASC, order_statuses.id ASC, orders.id DESC")
  end
end

• 28.90 req/s ( 5.49x faster )


21
And the results
find(:all)                        5.26 req/s
find(:all, :include)              7.70 req/s   1.4x
find(:all, :select, :include)    15.15 req/s   2.88x
find_by_sql()                    28.90 req/s   5.49x
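
The multipliers are each result divided by the plain find(:all) baseline. A short check of the arithmetic, using only the req/s figures from the table (variable names are illustrative):

```ruby
# req/s figures from the results table; the first row is the baseline.
results = [
  ["find(:all)",                     5.26],
  ["find(:all, :include)",           7.70],
  ["find(:all, :select, :include)", 15.15],
  ["find_by_sql()",                 28.90]
]

baseline = results.first.last

# Speedup = requests per second relative to the baseline, rounded to 2 places.
speedups = results.map { |label, rps| [label, (rps / baseline).round(2)] }

speedups.each do |label, factor|
  puts format("%-32s %.2fx", label, factor)
end
```

The :include figure works out to about 1.46x, which the table shows as 1.4x.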

22
Don’t forget indexes
• 64000 orders
OrderStatus.find(:all).each { |os| puts os.orders.count }

• Avg 0.61 req/s no indexes

• EXPLAIN your SQL

ALTER TABLE `xchain_test`.`orders` ADD INDEX order_status_idx(`order_status_id`);

• Avg 23 req/s after index (37x improvement)
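
Why the index helps can be sketched without a database. This is only an analogy in plain Ruby, not how MySQL stores an index: the 64000 rows and 10 statuses are invented, and count_by_scan and count_by_index are hypothetical helpers.

```ruby
# Invented sample data: 64000 orders spread evenly over 10 statuses.
orders = (1..64_000).map { |i| { :id => i, :order_status_id => (i % 10) + 1 } }

# Without an index: every count walks the whole table.
def count_by_scan(orders, status_id)
  orders.count { |o| o[:order_status_id] == status_id }
end

# With an index: group the ids once, then each count is a single lookup.
index = Hash.new { |hash, key| hash[key] = [] }
orders.each { |o| index[o[:order_status_id]] << o[:id] }

def count_by_index(index, status_id)
  index.key?(status_id) ? index[status_id].size : 0
end

puts count_by_scan(orders, 3)
puts count_by_index(index, 3)
```

Both calls give the same answer. The scan touches all 64000 rows per status, while the lookup touches one hash entry.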

23
Avoid .count

• It’s damned slow

OrderStatus.find(:all).each { |os| puts os.orders.count }

• Add column orders_count + update code

OrderStatus.find(:all).each { |os| puts os.orders_count }

✓ 34 req/s with .count vs 108 req/s with orders_count (3x faster)
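
The "update code" part means keeping orders_count in step whenever orders are added or removed. Rails can maintain such a column through the :counter_cache option on belongs_to. The plain-Ruby sketch below only illustrates the bookkeeping. This OrderStatus class is a toy, not the ActiveRecord model, and add_order, remove_order, and recount are made-up names.

```ruby
# Toy model (hypothetical, not ActiveRecord) of a cached counter column.
class OrderStatus
  attr_reader :name, :orders_count

  def initialize(name)
    @name = name
    @orders = []
    @orders_count = 0
  end

  def add_order(order_id)
    @orders << order_id
    @orders_count += 1 # update the cached count on every insert
  end

  def remove_order(order_id)
    # Only decrement when the order was actually present.
    @orders_count -= 1 if @orders.delete(order_id)
  end

  # Stand-in for SELECT COUNT(*): walks the rows on every call.
  def recount
    @orders.count { |_| true }
  end
end

status = OrderStatus.new("Shipped")
[101, 102, 103].each { |id| status.add_order(id) }
status.remove_order(102)
status.remove_order(999) # unknown id, count must not change

puts status.orders_count # read straight from the cached value
```

Reading orders_count is a plain attribute read, but every code path that adds or removes orders has to keep it up to date.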

24
For the speed freaks

• Merb - http://merbivore.com
• 38.56 req/s - 7x performance improvement
• Nearly identical code
• Blazingly fast

25
work.rowanhick.com

The End

26
