Tablets of brand bq: regexp causes Catastrophic backtracking. #605

@pantaluna

Hi,
I would like to report this specific issue.

The following regexp for detecting tablets of the brand "bq" consistently causes catastrophic backtracking when it is fed the User Agent of a well-known Google smartphone bot (the one for the Nexus 5X; other bots exist as well).

Catastrophic backtracking means that the regexp library must perform a massive number of checks (hundreds of thousands or millions) for this regexp + user-agent combination, which is typically something to avoid when writing regexps. It consumes a large amount of CPU, which is why I started this lengthy investigation.

libpcre3 detects these situations and aborts the match once its configurable thresholds are exceeded.

Regexp: (?i)Android.*(bq)?.*(Elcano|Curie|Edison|Maxwell|Kepler|Pascal|Tesla|Hypatia|Platon|Newton|Livingstone|Cervantes|Avant|Aquaris [E|M]10)|Maxwell.*Lite|Maxwell.*Plus
User Agent: Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

You can see for yourself how the regexp is executed at https://regex101.com/r/rC3MDK/1. Note the error message at the top: "catastrophic backtracking".

I'm using Mobile Detect within a Varnish Cache 4.1 configuration to detect smartphones and tablets on Ubuntu 14.04 LTS. Varnish Cache uses the latest version of libpcre3 (libpcre3:amd64/trusty 2:8.39-1+deb.sury.org~trusty+1). For this regexp + user-agent combination, Varnish consistently returns the VCL_Error "Regexp matching returned -8" once the number of checks has skyrocketed (and CPU usage has gone through the roof), and then it skips that line of code.

My workaround was to modify my transformation script, which takes your https://raw.github.com/serbanghita/Mobile-Detect/master/Mobile_Detect.json and converts it into a Varnish configuration script, so that it excludes the regexp for the "bq" tablets before the conversion. Now everything is fine.
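For anyone who wants to apply the same workaround, here is a minimal sketch of such a filter step. It is not my actual script: the nested category -> name -> regexp layout, the key names, and the shortened patterns in the sample are assumptions made up for illustration and may not match the real Mobile_Detect.json.

```python
import json

def drop_patterns(rules, banned):
    """Return a copy of `rules` without any entry whose regexp is in `banned`.

    `rules` is assumed to be a mapping of category -> {name: regexp}.
    """
    return {
        category: {
            name: pattern
            for name, pattern in entries.items()
            if pattern not in banned
        }
        for category, entries in rules.items()
    }

# Hypothetical sample data; the real file layout and contents may differ.
SAMPLE_JSON = """
{
  "tablets": {
    "exampleTablet": "ExamplePad [0-9]+",
    "bqTablet": "Android.*(bq)?.*(Elcano|Curie)"
  },
  "phones": {
    "examplePhone": "ExamplePhone"
  }
}
"""

rules = json.loads(SAMPLE_JSON)
banned = {"Android.*(bq)?.*(Elcano|Curie)"}
filtered = drop_patterns(rules, banned)
print(json.dumps(filtered, indent=2))
```

Filtering on the regexp text rather than on a key name means the step does not depend on how the entry happens to be named in the JSON file.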

I would advise improving (or removing) this regexp within Mobile_Detect as well. I cannot help with improving the regexp because I do not know which user-agents it is supposed to match. I gave it a try and found some possible solutions, but I do not know if they are 100% correct; for example, you could change the optional group "(bq)?" into a mandatory group "(bq)".
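To show why I am unsure about that change, here is a small sketch in Python (an assumption on my side; Varnish uses libpcre3, and Python's `re` engine has no match limit, so it does not reproduce the -8 error). It compares the reported regexp with the "(bq)" variant. The two tablet user agents are made up for illustration; only the Googlebot one is real.

```python
import re

ALTERNATIVES = (
    "Elcano|Curie|Edison|Maxwell|Kepler|Pascal|Tesla|Hypatia|Platon|Newton|"
    "Livingstone|Cervantes|Avant|Aquaris [E|M]10"
)
TAIL = r"|Maxwell.*Lite|Maxwell.*Plus"

# The regexp as reported, and the variant where "bq" is no longer optional.
ORIGINAL = re.compile(r"(?i)Android.*(bq)?.*(" + ALTERNATIVES + ")" + TAIL)
MODIFIED = re.compile(r"(?i)Android.*(bq).*(" + ALTERNATIVES + ")" + TAIL)

GOOGLEBOT_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 "
    "Mobile Safari/537.36 (compatible; Googlebot/2.1; "
    "+http://www.google.com/bot.html)"
)
# Hypothetical user agents, invented for this comparison.
UA_WITH_BQ = "Mozilla/5.0 (Linux; Android 4.4.2; bq Edison 3 Build/KOT49H)"
UA_WITHOUT_BQ = "Mozilla/5.0 (Linux; Android 4.4.2; Edison 3 Build/KOT49H)"

def matches(pattern, user_agent):
    """True if the pattern is found anywhere in the user agent."""
    return pattern.search(user_agent) is not None

for label, ua in [
    ("googlebot", GOOGLEBOT_UA),
    ("with bq", UA_WITH_BQ),
    ("without bq", UA_WITHOUT_BQ),
]:
    print(label, matches(ORIGINAL, ua), matches(MODIFIED, ua))
```

Neither pattern matches the Googlebot user agent, but the variant stops matching Android user agents that contain a model name without "bq", so it is only safe if every real bq tablet user agent includes that string.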

Thanks for your time.
