Skip to content

Thread safety of require and implications on stdlib string.normalize() #9231

@mrckzgl

Description

@mrckzgl

Consider the following code:

value = value.normalize(:nfc)

which runs parallel in multiple threads. With JRuby 9.4.10.0 we just sometimes get this error:

/usr/share/rvm/rubies/jruby-9.4.10.0/lib/ruby/stdlib/unicode_normalize.rb:41:in `unicode_normalize': undefined method `normalize' for UnicodeNormalize:Module (NoMethodError)

presumabely on the first (or second see below) invocation of that code. Later invocations run without the error. It is hard to debug as it is purely undeterministic. Source of unicode_normalize.rb looks like this:

  def unicode_normalize(form = :nfc)
    require 'unicode_normalize/normalize.rb' unless defined? UnicodeNormalize
    ## The following line can be uncommented to avoid repeated checking for
    ## UnicodeNormalize. However, tests didn't show any noticeable speedup
    ## when doing this. This comment also applies to the commented out lines
    ## in String#unicode_normalize! and String#unicode_normalized?.
    # String.send(:define_method, :unicode_normalize, ->(form = :nfc) { UnicodeNormalize.normalize(self, form) } )
    UnicodeNormalize.normalize(self, form)
  end

It took me a while to get following hypothesis. The wiki states:

You should take care in the following situations: concurrent requires, or lazy requires that may happen at runtime in a parallel thread. If objects are in flight while classes are being modified, or constants/globals/class variables are set before their referrents are completely initialized, other threads could have problems.

Is it possible that this kicks in here? First invocation of normalize on first thread does require. Second thread does that in parallel, but sees UnicodeNormalize to be defined, but not all its instance methods being loaded as first thread is still inside the require call? Is it possible that this is the source of the error message?

If so, I wonder if the standard lib should be modified to not do the lazy require? It is one thing to check your own code for thread safety issues, but I would not ever have consider this kind of concurrency problem in a standard library string method.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions