Making Haml faster

Haml is one of my favorite parts of the Rails technology stack. Its clean, self-correcting templates mean less typing and more doing for me. I love it.

Unfortunately, a number of performance regressions have been introduced into Haml recently. That sucks, because Rails spends a lot of time building views, and I’d really like those numbers to be smaller.

Over the past couple of weeks, I’ve been on-and-off profiling Haml and working on various performance patches. I mentioned one of them in my previous post - avoiding exceptions as flow control. There are a couple more we need to watch out for, though.

Problem #1: Lots and lots of extra string parsing

There’s a utility method that lets Haml compare the current version of Ruby with some arbitrary version string, to find out whether certain features are supported. You can look at the implementation of version_gt if you’d like, but it’s relatively complex, and we’re invoking it a lot in any given template. The version_geq wrapper calls it up to twice per comparison.

In Haml::Util

    def version_geq(v1, v2)
      version_gt(v1, v2) || !version_gt(v2, v1)
    end

Memoizing these values results in significantly less string parsing and much faster templates.

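    def version_geq(v1, v2)
      @@version_comparison_cache ||= {}
      # Key on the pair itself; plain concatenation lets distinct pairs
      # collide ("1.8" + "7.1" and "1.87" + ".1" both give "1.87.1").
      k = [v1, v2]
      return @@version_comparison_cache[k] unless @@version_comparison_cache[k].nil?
      @@version_comparison_cache[k] = ( version_gt(v1, v2) || !version_gt(v2, v1) )
    end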
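
To make the saving concrete, here is a minimal, self-contained sketch of the same memoization idea. The VersionSketch module and its simplified version_gt are hypothetical stand-ins (Haml’s real version_gt is not reproduced here); the call counter only exists to show how often the parsing path runs.

```ruby
# Hypothetical stand-in for Haml::Util; not Haml's actual code.
module VersionSketch
  @parse_count = 0
  @cache = {}

  class << self
    attr_reader :parse_count

    # Simplified comparison that splits and converts both strings on
    # every call, which is the work the cache is meant to avoid.
    def version_gt(v1, v2)
      @parse_count += 1
      a = v1.split('.').map { |part| part.to_i }
      b = v2.split('.').map { |part| part.to_i }
      (a <=> b) == 1
    end

    # Memoized wrapper: each distinct (v1, v2) pair is computed once.
    def version_geq(v1, v2)
      key = [v1, v2]
      return @cache[key] if @cache.key?(key)
      @cache[key] = version_gt(v1, v2) || !version_gt(v2, v1)
    end
  end
end

1000.times { VersionSketch.version_geq('1.9.2', '1.9.0') }
puts "version_gt calls: #{VersionSketch.parse_count}"
```

In this sketch the thousand lookups hit the parsing path only on the first call; every later call is a hash lookup.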

Problem #2: Extraneous block creation

Haml::Compiler#compile is what compiles your Haml soup down into HTML. It also creates a bunch of extra closures: a proc is built for every node, even for leaf tags with no children, where it is thrown away without ever being passed along.

    def compile(node)
      parent, @node = @node, node
      block = proc {node.children.each {|c| compile c}}
      send("compile_#{node.type}", &(block unless node.children.empty?))
    ensure
      @node = parent
    end

Let’s just change that so that the block is only created and passed if there are children to iterate:

    def compile(node)
      parent, @node = @node, node
      if node.children.empty?
        send("compile_#{node.type}")
      else
        send("compile_#{node.type}",  &proc {node.children.each {|c| compile c}} )
      end
    ensure
      @node = parent
    end

It’s worth noting that I tried a compacted single-line send, but it seems faster to just check #empty? than to conditionally create the block and pass &(block if block).
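
Here is a rough sketch of the difference, using a toy tree rather than Haml’s real parse tree. Node, CountingCompiler, and visit are hypothetical names for illustration only, and the counter is bumped by hand next to each proc call so the two shapes can be compared.

```ruby
# Hypothetical node type; not Haml's parse tree class.
Node = Struct.new(:type, :children)

class CountingCompiler
  attr_reader :procs_created

  def initialize
    @procs_created = 0
  end

  # Original shape: always build the proc, then decide whether to pass it.
  def compile_eager(node)
    @procs_created += 1
    block = proc { node.children.each { |c| compile_eager(c) } }
    visit(&(block unless node.children.empty?))
  end

  # Patched shape: only build the proc when there are children to walk.
  def compile_lazy(node)
    if node.children.empty?
      visit
    else
      @procs_created += 1
      visit(&proc { node.children.each { |c| compile_lazy(c) } })
    end
  end

  private

  # Stand-in for the compile_<type> methods: runs the block if one was given.
  def visit
    yield if block_given?
  end
end

# One root with three leaf children: four nodes in total.
leaves = Array.new(3) { Node.new(:tag, []) }
tree = Node.new(:root, leaves)

eager = CountingCompiler.new
eager.compile_eager(tree)
lazy = CountingCompiler.new
lazy.compile_lazy(tree)
puts "eager: #{eager.procs_created}, lazy: #{lazy.procs_created}"
```

The eager shape builds one proc per node, while the lazy shape builds one per node that has children. On a real template most nodes are leaves, so the gap grows with document size.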

Problem #3: Exceptions as flow control

Haml::Helpers has a couple of instances where it checks for the presence of _hamlout in a block binding by just eval’ing _hamlout and catching NameError to discover that it doesn’t exist. I’ve refactored that to use more idiomatic constructs.

@@ -337,7 +337,7 @@ MESSAGE
     # @yield [args] A block of Haml code that will be converted to a string
     # @yieldparam args [Array] `args`
     def capture_haml(*args, &block)
-      buffer = eval('_hamlout', block.binding) rescue haml_buffer
+      buffer = eval('if defined? _hamlout then _hamlout else nil end', block.binding) || haml_buffer
       with_haml_buffer(buffer) do
         position = haml_buffer.buffer.length

@@ -540,10 +540,7 @@ MESSAGE
     # @param block [Proc] A Ruby block
     # @return [Boolean] Whether or not `block` is defined directly in a Haml template
     def block_is_haml?(block)
-      eval('_hamlout', block.binding)
-      true
-    rescue
-      false
+      eval('!!defined?(_hamlout)', block.binding)
     end
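
For anyone who wants to poke at the two approaches outside of Haml, this is a small standalone sketch. The local _marker is a hypothetical stand-in for _hamlout, and the helper names are made up for the example.

```ruby
# Returns a block whose binding contains the local variable _marker.
def block_with_marker
  _marker = :buffer
  proc { _marker }
end

# Returns a block whose binding has no _marker at all.
def block_without_marker
  proc { nil }
end

# Old approach: reference the name and let NameError signal absence.
def marker_via_rescue?(block)
  eval('_marker', block.binding)
  true
rescue NameError
  false
end

# New approach: ask whether the name is defined; nothing is raised.
def marker_via_defined?(block)
  eval('!!defined?(_marker)', block.binding)
end

[block_with_marker, block_without_marker].each do |blk|
  puts "rescue: #{marker_via_rescue?(blk)}, defined?: #{marker_via_defined?(blk)}"
end
```

Both helpers give the same answer for both blocks. The difference is that the defined? version never builds and unwinds an exception when the variable is missing.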

Results

To test, I have my branch in ./haml and the current origin master in ./haml-upstream. The input is a 900-line Haml template with no inline Ruby (just an html2haml-converted webpage).

The benchmark script just includes the appropriate library, then parses and renders the template 100 times.

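# Swap the commented-out require to switch between my branch and upstream.
# The leading ./ resolves against the current directory, which Ruby 1.9.2
# no longer has on the load path.
require './haml/lib/haml'
#require './haml-upstream/lib/haml'
require 'benchmark'

TIMES = 100
source = File.read("formatted_email.haml")

Benchmark.bmbm do |x|
    x.report("Render time") do
        TIMES.times do
            # Build a fresh engine each time so parsing and compiling are measured too.
            engine = Haml::Engine.new source
            engine.render :ugly => true
        end
    end
end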

REE-1.8.7-2011.03

Upstream:

[chris@luna repos]$ ruby haml-bench.rb
Rehearsal -----------------------------------------------
Render time   7.010000   0.770000   7.780000 (  7.777076)
-------------------------------------- total: 7.780000sec

                  user     system      total        real
Render time   6.990000   0.710000   7.700000 (  7.721977)

And my branch:

[chris@luna repos]$ ruby haml-bench.rb
Rehearsal -----------------------------------------------
Render time   5.180000   0.460000   5.640000 (  5.703304)
-------------------------------------- total: 5.640000sec

                  user     system      total        real
Render time   5.170000   0.450000   5.620000 (  5.621875)

Improvement: 27% less render time

JRuby 1.6.0

Upstream:

[chris@luna repos]$ jruby --server haml-bench.rb
Rehearsal -----------------------------------------------
Render time  13.254000   0.000000  13.254000 ( 13.254000)
------------------------------------- total: 13.254000sec

                  user     system      total        real
Render time   6.183000   0.000000   6.183000 (  6.183000)

My branch:

[chris@luna repos]$ jruby --server haml-bench.rb
Rehearsal -----------------------------------------------
Render time  11.726000   0.000000  11.726000 ( 11.726000)
------------------------------------- total: 11.726000sec

                  user     system      total        real
Render time   4.856000   0.000000   4.856000 (  4.856000)

Improvement: 21.5% less render time

Ruby MRI 1.9.2

Upstream:

[chris@luna repos]$ ruby haml-bench.rb
Rehearsal -----------------------------------------------
Render time   6.990000   0.610000   7.600000 (  7.599568)
-------------------------------------- total: 7.600000sec

                  user     system      total        real
Render time   6.920000   0.660000   7.580000 (  7.577772)

My branch:

[chris@luna repos]$ ruby haml-bench.rb
Rehearsal -----------------------------------------------
Render time   5.110000   0.470000   5.580000 (  5.573496)
-------------------------------------- total: 5.580000sec

                  user     system      total        real
Render time   5.150000   0.440000   5.590000 (  5.577480)

Improvement: 26.3% less render time
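
The improvement figures are the reduction in total time between the upstream run and my branch. The same numbers can also be read as a speedup factor. Here is a quick sketch of the arithmetic, using the "total" column from the second (non-rehearsal) pass of each run above. The helper names are made up for the example.

```ruby
# [upstream total, my branch total] in seconds, copied from the runs above.
RUNS = {
  'REE-1.8.7-2011.03' => [7.70, 5.62],
  'JRuby 1.6.0'       => [6.183, 4.856],
  'Ruby MRI 1.9.2'    => [7.58, 5.59]
}

# Share of the original time that was saved, as a percentage.
def percent_less_time(before, after)
  ((before - after) / before * 100).round(1)
end

# How many times faster the new run is than the old one.
def speedup_factor(before, after)
  (before / after).round(2)
end

RUNS.each do |name, (before, after)|
  puts "#{name}: #{percent_less_time(before, after)}% less time, " \
       "#{speedup_factor(before, after)}x as fast"
end
```

Read as speedup factors, the three runs come out at roughly 1.37x, 1.27x and 1.36x.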

Final Words

I’ve got an open pull request, but it’s been ignored thus far. Make some noise and get this pulled into master, so we can make Rails apps everywhere faster!