Skip to content

Handle nil in Rack::Utils.escape_html#2202

Merged
jeremyevans merged 2 commits intorack:mainfrom
Earlopain:html-escape-non-strings
Jun 12, 2024
Merged

Handle nil in Rack::Utils.escape_html#2202
jeremyevans merged 2 commits intorack:mainfrom
Earlopain:html-escape-non-strings

Conversation

@Earlopain
Copy link
Copy Markdown
Contributor

This previously worked but unintentionally changed in #2099. It previously called to_s which made this accept almost anything (but especially nil)

This is of course a bit slower than the current implementation. Here's a benchmark:

require 'benchmark/ips'
require 'cgi'

# Generated by ChatGPT
HTML_EXAMPLES = <<HTML.lines(chomp: true)
<p>It's a beautiful day.</p>
<a href="https://example.com/page?id=123&category=5">Link</a>
<button onclick="alert('Hello, world!')">Click me</button>
<input type="text" value="It's my text">
<div class='container'>Content goes here</div>
<span>5 &times; 10 = 50</span>
<img src='image.jpg' alt="Nature's beauty">
<p style='color: red;'>This text is red.</p>
<a href='javascript:alert("XSS attack!");'>Click here</a>
<input type='text' placeholder='Enter your name & email'>
HTML
targets = (HTML_EXAMPLES * 10).shuffle

num_iteration = 10_000

module Utils
  module_function

  ESCAPE_HTML = {
    "&" => "&amp;",
    "<" => "&lt;",
    ">" => "&gt;",
    "'" => "&#x27;",
    '"' => "&quot;"
  }

  ESCAPE_HTML_PATTERN = Regexp.union(*ESCAPE_HTML.keys)

  def escape_html_old(string)
    string.to_s.gsub(ESCAPE_HTML_PATTERN, ESCAPE_HTML)
  end

  define_method(:escape_html_define_method, CGI.method(:escapeHTML))

  def escape_html_or(string)
    CGI.escapeHTML(string || "")
  end

  def escape_html_to_s(string)
    CGI.escapeHTML(string.to_s)
  end
end

Benchmark.ips do |x|
  x.report "old" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_old(string)
      end
    end
  end

  x.report "define_method" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_define_method(string)
      end
    end
  end

  x.report "to_s" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_to_s(string)
      end
    end
  end

  x.report "or" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_or(string)
      end
    end
  end

  x.compare!
end

Results for my machine:

ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux]
Warming up --------------------------------------
                 old     1.000 i/100ms
       define_method     1.000 i/100ms
                to_s     1.000 i/100ms
                  or     1.000 i/100ms
Calculating -------------------------------------
                 old      0.353 (± 0.0%) i/s -      2.000 in   5.664835s
       define_method      8.733 (± 0.0%) i/s -     44.000 in   5.040108s
                to_s      6.946 (± 0.0%) i/s -     35.000 in   5.040787s
                  or      7.438 (± 0.0%) i/s -     38.000 in   5.111461s

Comparison:
       define_method:        8.7 i/s
                  or:        7.4 i/s - 1.17x  slower
                to_s:        6.9 i/s - 1.26x  slower
                 old:        0.4 i/s - 24.74x  slower

With YJIT:

ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [x86_64-linux]
Warming up --------------------------------------
                 old     1.000 i/100ms
       define_method     1.000 i/100ms
                to_s     1.000 i/100ms
                  or     1.000 i/100ms
Calculating -------------------------------------
                 old      0.364 (± 0.0%) i/s -      2.000 in   5.494063s
       define_method     10.271 (± 0.0%) i/s -     52.000 in   5.064240s
                to_s      9.982 (± 0.0%) i/s -     50.000 in   5.009911s
                  or     10.031 (± 0.0%) i/s -     51.000 in   5.087295s

Comparison:
       define_method:       10.3 i/s
                  or:       10.0 i/s - 1.02x  slower
                to_s:       10.0 i/s - 1.03x  slower
                 old:        0.4 i/s - 28.21x  slower

With YJIT there is barely any difference between these variants which I think is pretty great. | "" would be an option and it does handle nil but not non-strings.

Copy link
Copy Markdown
Member

@ioquatix ioquatix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this, but I'd like @jeremyevans's opinion on it.

Copy link
Copy Markdown
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid the slowdown, and I think I have a way (at least for some Ruby versions).

Comment thread lib/rack/utils.rb Outdated
@Earlopain Earlopain force-pushed the html-escape-non-strings branch 2 times, most recently from fc474c2 to 2b2b6e7 Compare June 11, 2024 15:32
@ioquatix
Copy link
Copy Markdown
Member

Are you happy that ERB::Escape is in the same ball-park performance-wise as our current CGI.escape_html?

Copy link
Copy Markdown
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, one small change requested.

Comment thread lib/rack/utils.rb
Comment thread lib/rack/utils.rb
Comment thread lib/rack/utils.rb
@Earlopain
Copy link
Copy Markdown
Contributor Author

I was about to post some numbers (:

ERB::Escape is of course still slower than plain CGI but it gets some nice wins when the text doesn't need escaping.
Both codeblocks contain two benchmark runs, one where the input is html, and one where the input is just boring text. I think, the numbers for this are pretty nice. I also included to_s (my initial proposal) and the current define_method version.

Benchmark script

require 'benchmark/ips'
require 'cgi'
require 'erb'

# Generated by ChatGPT
HTML_EXAMPLES = <<HTML.lines(chomp: true)
<p>It's a beautiful day.</p>
<a href="https://example.com/page?id=123&category=5">Link</a>
<button onclick="alert('Hello, world!')">Click me</button>
<input type="text" value="It's my text">
<div class='container'>Content goes here</div>
<span>5 &times; 10 = 50</span>
<img src='image.jpg' alt="Nature's beauty">
<p style='color: red;'>This text is red.</p>
<a href='javascript:alert("XSS attack!");'>Click here</a>
<input type='text' placeholder='Enter your name & email'>
HTML
HTML_NO_ESCAPING = <<HTML.lines(chomp: true)
Lorem ipsum dolor sit amet, consectetur adipiscing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua. Ut enim ad minim veniam, quis nostrud exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit
esse cillum dolore eu fugiat nulla pariatur. Excepteur sint
occaecat cupidatat non proident, sunt in culpa qui officia
deserunt mollit anim id est laborum. Lorem ipsum dolor sit
amet, consectetur adipiscing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad
HTML
targets = (HTML_EXAMPLES * 10).shuffle
target_no_escaping = (HTML_NO_ESCAPING * 10).shuffle

num_iteration = 10_000

module Utils
  module_function

  define_method(:escape_html_current, CGI.method(:escapeHTML))
  define_method(:escape_html_new, ERB::Escape.instance_method(:html_escape))

  def escape_html_new_fallback(string)
    CGI.escapeHTML(string.to_s)
  end

  def escape_html_to_s(string)
    CGI.escapeHTML(string.to_s)
  end

end

Benchmark.ips do |x|
  x.report "current" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_current(string)
      end
    end
  end

  x.report "to_s" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_to_s(string)
      end
    end
  end

  x.report "erb" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_new(string)
      end
    end
  end

  x.report "erb fallback" do
    num_iteration.times do
      targets.each do |string|
        Utils.escape_html_new_fallback(string)
      end
    end
  end

  x.compare!
end;


puts "TEXT ONLY"

Benchmark.ips do |x|
  x.report "current" do
    num_iteration.times do
      target_no_escaping.each do |string|
        Utils.escape_html_current(string)
      end
    end
  end

  x.report "to_s" do
    num_iteration.times do
      target_no_escaping.each do |string|
        Utils.escape_html_to_s(string)
      end
    end
  end

  x.report "erb" do
    num_iteration.times do
      target_no_escaping.each do |string|
        Utils.escape_html_new(string)
      end
    end
  end

  x.report "erb fallback" do
    num_iteration.times do
      target_no_escaping.each do |string|
        Utils.escape_html_new_fallback(string)
      end
    end
  end

  x.compare!
end

ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux]
Warming up --------------------------------------
             current     1.000 i/100ms
                to_s     1.000 i/100ms
                 erb     1.000 i/100ms
        erb fallback     1.000 i/100ms
Calculating -------------------------------------
             current      8.741 (± 0.0%) i/s -     44.000 in   5.034073s
                to_s      7.041 (± 0.0%) i/s -     36.000 in   5.112778s
                 erb      8.751 (± 0.0%) i/s -     44.000 in   5.027805s
        erb fallback      7.040 (± 0.0%) i/s -     36.000 in   5.114978s

Comparison:
                 erb:        8.8 i/s
             current:        8.7 i/s - 1.00x  slower
                to_s:        7.0 i/s - 1.24x  slower
        erb fallback:        7.0 i/s - 1.24x  slower

TEXT ONLY
ruby 3.3.1 (2024-04-23 revision c56cd86388) [x86_64-linux]
Warming up --------------------------------------
             current     1.000 i/100ms
                to_s     1.000 i/100ms
                 erb     1.000 i/100ms
        erb fallback     1.000 i/100ms
Calculating -------------------------------------
             current     10.404 (± 0.0%) i/s -     53.000 in   5.094110s
                to_s      8.340 (± 0.0%) i/s -     42.000 in   5.035849s
                 erb     14.559 (± 0.0%) i/s -     73.000 in   5.014386s
        erb fallback      8.326 (± 0.0%) i/s -     42.000 in   5.044305s

Comparison:
                 erb:       14.6 i/s
             current:       10.4 i/s - 1.40x  slower
                to_s:        8.3 i/s - 1.75x  slower
        erb fallback:        8.3 i/s - 1.75x  slower

YJIT

ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [x86_64-linux]
Warming up --------------------------------------
             current     1.000 i/100ms
                to_s     1.000 i/100ms
                 erb     1.000 i/100ms
        erb fallback     1.000 i/100ms
Calculating -------------------------------------
             current     10.327 (± 0.0%) i/s -     52.000 in   5.036097s
                to_s      9.741 (± 0.0%) i/s -     49.000 in   5.031537s
                 erb     10.289 (± 0.0%) i/s -     52.000 in   5.057790s
        erb fallback     10.077 (± 0.0%) i/s -     51.000 in   5.061175s

Comparison:
             current:       10.3 i/s
                 erb:       10.3 i/s - 1.00x  slower
        erb fallback:       10.1 i/s - 1.02x  slower
                to_s:        9.7 i/s - 1.06x  slower

TEXT ONLY
ruby 3.3.1 (2024-04-23 revision c56cd86388) +YJIT [x86_64-linux]
Warming up --------------------------------------
             current     1.000 i/100ms
                to_s     1.000 i/100ms
                 erb     1.000 i/100ms
        erb fallback     1.000 i/100ms
Calculating -------------------------------------
             current     12.746 (± 0.0%) i/s -     64.000 in   5.021159s
                to_s     12.025 (± 0.0%) i/s -     61.000 in   5.072948s
                 erb     18.742 (± 0.0%) i/s -     94.000 in   5.016059s
        erb fallback     12.225 (± 0.0%) i/s -     62.000 in   5.071884s

Comparison:
                 erb:       18.7 i/s
             current:       12.7 i/s - 1.47x  slower
        erb fallback:       12.2 i/s - 1.53x  slower
                to_s:       12.0 i/s - 1.56x  slower

This previously worked but unintentionally changed in rack#2099
This previously called `to_s` which made this accept almost anything (but especially `nil`)
@Earlopain Earlopain force-pushed the html-escape-non-strings branch from 2b2b6e7 to 9a3bcfc Compare June 11, 2024 15:44
@ioquatix ioquatix requested a review from jeremyevans June 11, 2024 15:52
Copy link
Copy Markdown
Contributor

@jeremyevans jeremyevans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One difference between ERB::Escape#html_escape and CGI::Escape#escapeHTML is that if no changes are needed, ERB::Escape#html_escape does not allocate, it returns the argument. Are we OK with that change in behavior?

@ioquatix
Copy link
Copy Markdown
Member

That seems reasonable to me.

@Earlopain
Copy link
Copy Markdown
Contributor Author

The only problem is that which implementation we choose now depends on load order. If ERB is not loaded before rack, and is loaded AFTER rack, the behaviour may change. That doesn't bother me but maybe worth documenting it?

erb is required at the top of the file so if the required method is available (i.e. recent version of ruby, or erb in the gemfile) it should be deterministic. I don't think the situation you are describing is possible, or am I missing something here?

@ioquatix
Copy link
Copy Markdown
Member

Sorry, I deleted my comment, it was wrong. Yes, you are correct.

@jeremyevans jeremyevans merged commit d40896d into rack:main Jun 12, 2024
@Earlopain Earlopain deleted the html-escape-non-strings branch June 12, 2024 05:01
@ioquatix ioquatix added this to the v3.1.3 milestone Jun 12, 2024
@Earlopain
Copy link
Copy Markdown
Contributor Author

One thing that is additionally nice here is that this function is now again documented. https://www.rubydoc.info/gems/rack/3.1.3/Rack%2FUtils:escape_html

It didn't cope with define_method. I'm sure there would have been annotations to make that work though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants