I recently ran across a rather bare post espousing some generic “optimization” techniques for Rails apps. It offered no education, no explanation, no benchmarks. So, I thought, why not put those claims to the test?
find_by_sql versus find_by_x
First, Konstantin claims that Model#find_by_field is slower than Model#find_by_sql. This one is hard to dispute; the former will invoke method_missing and spend time generating SQL, while the latter simply executes the statement as given. Is cutting the knees out from under your ORM worth the time saved? Let's see!
require 'benchmark'

def measure_find_by_sql_vs_orm(num = 1000)
  puts "find_by_sql (#{num}x)"
  puts Benchmark.measure {
    num.times { User.find_by_sql "select * from users where id = 123" }
  }

  puts "find_by_id (#{num}x)"
  puts Benchmark.measure {
    num.times { User.find_by_id 123 }
  }
end

measure_find_by_sql_vs_orm(10000)
Let’s run this a few times.
[chris@polaris benchmarks]$ script/runner benchmark.rb
find_by_sql (10000x)
2.290000 0.540000 2.830000 ( 4.452150)
find_by_id (10000x)
4.660000 0.400000 5.060000 ( 6.766629)
[chris@polaris benchmarks]$ script/runner benchmark.rb
find_by_sql (10000x)
2.300000 0.480000 2.780000 ( 4.473950)
find_by_id (10000x)
4.520000 0.560000 5.080000 ( 6.837272)
[chris@polaris benchmarks]$ script/runner benchmark.rb
find_by_sql (10000x)
2.170000 0.540000 2.710000 ( 4.419207)
find_by_id (10000x)
4.580000 0.540000 5.120000 ( 6.881676)
find_by_sql: Averages 4.44 sec for 10,000 queries
find_by_id: Averages 6.83 sec for 10,000 queries
Conclusion the first: Using the ORM to build SQL adds some overhead; in my tests, about 2.4 seconds per 10,000 queries, or roughly 0.00024 seconds per query. Is this worth optimizing out? Yeah, probably not. In fact, the productivity lost by using find_by_sql everywhere is likely going to end up costing the project more than it saves.
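For context, here is roughly what that dynamic finder is buying you. This is a sketch of the equivalent old-style find call, not ActiveRecord's actual internals:

# find_by_id is handled by method_missing: the finder name is parsed and
# turned into an ordinary find call, which then generates the SQL.
User.find_by_id 123
# ...is roughly equivalent to:
User.find(:first, :conditions => ["id = ?", 123])
# ...which builds and runs something like:
#   SELECT * FROM users WHERE (id = 123) LIMIT 1
# find_by_sql skips the name parsing and SQL generation and runs the string as-is:
User.find_by_sql "select * from users where id = 123"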
IDs and numbers in quotes
Second, they claim that quoting values in your SQL statements slows down your queries. This one struck me as just a little out there. Let’s see what the benchmarks say.
require 'benchmark'

def measure_select_with_quotes(num = 1000)
  # Grab the raw MySQL connection underneath ActiveRecord so we benchmark
  # only the queries themselves, not the ORM.
  db = ActiveRecord::Base.connection.instance_variable_get :@connection

  puts "Without quotes (#{num}x):"
  puts Benchmark.measure {
    num.times { db.query("select * from users where id = 123") {} }
  }

  puts "With quotes (#{num}x):"
  puts Benchmark.measure {
    num.times { db.query("select * from users where id = \"123\"") {} }
  }
end

measure_select_with_quotes(10000)
And the results:
[chris@polaris benchmarks]$ script/runner benchmark.rb
Without quotes (10000x):
0.690000 0.340000 1.030000 ( 2.639554)
With quotes (10000x):
0.670000 0.290000 0.960000 ( 2.655049)
[chris@polaris benchmarks]$ script/runner benchmark.rb
Without quotes (10000x):
0.570000 0.320000 0.890000 ( 2.654003)
With quotes (10000x):
0.550000 0.400000 0.950000 ( 2.617369)
Well, that's certainly interesting. Over 10,000 queries, the difference works out to a few hundredths of a second at most, and it even flips direction between runs; it's lost in the noise. Certainly not worth combing through your codebase as an optimization point.
Conclusion the second: The performance difference between quoted and non-quoted field values is so small as to be inconsequential.
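Worth noting: when you let ActiveRecord build the SQL, it decides the quoting for you via the connection's quote method. A quick sketch of what that looks like (old connection adapter API; exact output can vary by adapter):

conn = ActiveRecord::Base.connection
conn.quote(123)    # => "123"    (numerics are emitted without quotes)
conn.quote("123")  # => "'123'"  (strings are quoted and escaped)
conn.quote(nil)    # => "NULL"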
On a side note, there is a very interesting subtlety here. Observe the difference between
num.times { db.query("select * from users where id = 123") {} }
and
num.times { db.query("select * from users where id = 123") }
The former passes the Mysql::Result object to a block, and frees it after the block terminates. The latter does not, and the returned Mysql::Result object remains in scope for the entire pass of the benchmark. This subtlety makes a massive difference.
require 'benchmark'

def measure_select_with_free(num = 1000)
  db = ActiveRecord::Base.connection.instance_variable_get :@connection

  puts "Query with block, result immediately freed"
  puts Benchmark.measure {
    num.times { db.query("select * from users where id = 123") {} }
  }

  puts "Query without block, result remains in scope"
  puts Benchmark.measure {
    num.times { db.query("select * from users where id = 123") }
  }
end

measure_select_with_free
Results:
[chris@polaris benchmarks]$ script/runner benchmark.rb
Query with block, result immediately freed
0.060000 0.040000 0.100000 ( 0.267983)
Query without block, result remains in scope
5.040000 0.050000 5.090000 ( 5.266476)
Whoa damn. Ruby’s GC is slaughtering performance there. Just adding a pair of curly braces makes the benchmark run 20 times faster.
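If the block form doesn't fit your code, you can get the same effect by freeing the result yourself. A sketch against the old mysql driver's Mysql::Result API:

db = ActiveRecord::Base.connection.instance_variable_get :@connection
result = db.query("select * from users where id = 123")
begin
  result.each_hash { |row| row["email"] } # do something with each row
ensure
  result.free # release the result set now instead of waiting on the GC
end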
It's better to request only specific columns
Finally, Konstantin mentions that selecting only specific fields from a table is faster. This is true both in MySQL and in the ActiveRecord ORM, for a number of reasons. However, he says:
Person.find_by_name("Name").phone_number. It would be much faster if you use: Person.find_by_sql("SELECT persons.phone_number WHERE persons.name = 'Name'")
Why not just use the :select option that ActiveRecord provides?
Person.find_by_name("Name", :select => "phone_number")
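For reference, those two calls should issue roughly the following SQL (assuming a conventional people table; check your development log to confirm the exact statements):

Person.find_by_name("Name")
# SELECT * FROM people WHERE (people.name = 'Name') LIMIT 1

Person.find_by_name("Name", :select => "phone_number")
# SELECT phone_number FROM people WHERE (people.name = 'Name') LIMIT 1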
Let’s test those assumptions.
require 'benchmark'

def measure_single_field_select(num = 1000)
  puts "Find with all fields"
  puts Benchmark.measure {
    num.times { User.find_by_id(123) }
  }

  puts "Find with one field, with :select"
  puts Benchmark.measure {
    num.times { User.find_by_id(123, :select => "email") }
  }
end

measure_single_field_select
Results:
[chris@polaris benchmarks]$ script/runner benchmark.rb
Find with all fields
0.720000 0.060000 0.780000 ( 0.963273)
Find with one field, with :select
0.310000 0.010000 0.320000 ( 0.364554)
[chris@polaris benchmarks]$ script/runner benchmark.rb
Find with all fields
0.710000 0.110000 0.820000 ( 1.014548)
Find with one field, with :select
0.260000 0.020000 0.280000 ( 0.351761)
A very significant difference, nearly three times faster in these runs, and we didn't have to bypass the ORM to get it, either.