Skip to content

PERFORMANCE[30%+ improvement!]: filter_func can optionally take an array of events to make batched filters much faster#8428

Closed
original-brownbear wants to merge 1 commit intoelastic:masterfrom
original-brownbear:batch-on-filters-correctly
Closed

PERFORMANCE[30%+ improvement!]: filter_func can optionally take an array of events to make batched filters much faster#8428
original-brownbear wants to merge 1 commit intoelastic:masterfrom
original-brownbear:batch-on-filters-correctly

Conversation

@original-brownbear
Copy link
Copy Markdown
Contributor

@original-brownbear original-brownbear commented Oct 3, 2017

This one is kinda frustrating when seen in relation to #8357 :( but a huge and trivial performance gain :)
Just thought I'd level the playing field a little when comparing Java and Ruby exec and well this is the result.

Simply fixing the batched filter execution in the Ruby execution gets us within to about 90% (I can add more scientific numbers tomorrow or so, but it's basically this for the Apache example: master 35k/s, this 50k/s and the full Java exec 56k/s ) of the throughput of the pure java exec in #8357.
For baseline (and simple filters like only UA filter) this version wins over master and the Java exec by a wide margin. Here it's master and Java exec both at ~300k/s and this one at ~380k/s.

To better understand this one, take a look at a compiled filter_func:

example:

  define_singleton_method :filter_func do |event|
    events = [event]
    @logger.debug? && @logger.debug("filter received", "event" => event.to_hash)
              events = @generated_objects[:filter_dlq_commit_2].multi_filter(events)

    events
  end

=> there is no point turning the events into arrays one by one if we can simply turn the batch into an array and input that.
=> trivial fix by allowing Array to be passed through filter_func and adding the to_a method from #8357 to the batch implementations

@original-brownbear
Copy link
Copy Markdown
Contributor Author

@andrewvc @ph @jordansissel FYI not sure if I should be happy or sad here after all the work that went into Java exec ... but this looks pretty safe to merge to give 6.1 a massive boost imo ... obviously way diminishing people's incentive to try out the Java exec ...

@original-brownbear
Copy link
Copy Markdown
Contributor Author

Also, this version beats the Java exec by quite a margin in baseline (17% faster) stdin->stdout or just a single fast filter like the UA (10% faster) one in place :(

@original-brownbear original-brownbear changed the title PERFORMANCE: filter_func can optionally take an array of events to make batched filters much faster PERFORMANCE[30%+ improvement!]: filter_func can optionally take an array of events to make batched filters much faster Oct 4, 2017
@original-brownbear original-brownbear force-pushed the batch-on-filters-correctly branch from c3eab94 to 9626c4e Compare October 4, 2017 10:13
@ph
Copy link
Copy Markdown
Contributor

ph commented Oct 5, 2017

Woa, good catch! This change is low risk we could release it in 6.0.x and event 5.6.x if we ever do another release. I am doing some testing, will report soon.

@colinsurprenant
Copy link
Copy Markdown
Contributor

oh sweet <3 🥇
also doing testing.

@ph
Copy link
Copy Markdown
Contributor

ph commented Oct 5, 2017

obviously way diminishing people's incentive to try out the Java exec

Concerning the java execution, we will make it faster eventually, that should not block us from merging it with a feature flag.

@original-brownbear
Copy link
Copy Markdown
Contributor Author

original-brownbear commented Oct 5, 2017

@ph "should" or "should not"? :D

@ph
Copy link
Copy Markdown
Contributor

ph commented Oct 5, 2017

should not :)

I did some testing with a really simple pipeline configuration

filter {
    mutate { uppercase => [ "message" ] }
}

I got a 7% improvements with the code changes, but as soon as you add more plugins to the filter stage, I get the 30% speed improvements.

filter {
    mutate { uppercase => [ "message" ] }
    mutate { lowercase => [ "message" ] }
    mutate { uppercase => [ "message" ] }
    mutate { lowercase => [ "message" ] }
}

Copy link
Copy Markdown
Contributor

@ph ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, This can go in multiple branch, master, 6.x 6.0 and even 5.6. I don't see any reason not to do it.

@original-brownbear original-brownbear removed the request for review from jordansissel October 5, 2017 18:53
@original-brownbear
Copy link
Copy Markdown
Contributor Author

@ph alrighty thanks, merging :)

@elasticsearch-bot
Copy link
Copy Markdown

Armin Braun merged this into the following branches!

Branch Commits
6.x 179eaff
master 116ed1d

elasticsearch-bot pushed a commit that referenced this pull request Oct 5, 2017
@original-brownbear
Copy link
Copy Markdown
Contributor Author

needs manual backport to 6.0 and 5.6, will do later

@jordansissel
Copy link
Copy Markdown
Contributor

Thank you for working on this :)

@original-brownbear
Copy link
Copy Markdown
Contributor Author

np :)

elasticsearch-bot pushed a commit that referenced this pull request Oct 6, 2017
original-brownbear added a commit to original-brownbear/logstash that referenced this pull request Oct 6, 2017
elasticsearch-bot pushed a commit that referenced this pull request Oct 6, 2017
@original-brownbear original-brownbear deleted the batch-on-filters-correctly branch March 8, 2018 09:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants