Skip to content

Short-circuit the check-hooks-apply hook #2935

@mxr

Description

@mxr

search you tried in the issue tracker

check-hooks-apply

describe your actual problem

I was looking at the implementation of check_hooks_apply and noticed that while the filenames_for_hook function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:

I prototyped a function Classifier.any_filenames_for_hook which short-circuits:

	def any_filenames_for_hook(self, hook: Hook) -> bool:
	    include_re, exclude_re = re.compile(hook.files), re.compile(hook.exclude)
	    types, types_or, exclude_types = (
	        frozenset(hook.types),
	        frozenset(hook.types_or),
	        frozenset(hook.exclude_types),
	    )
	
	    for i, name in enumerate(self.filenames):
	        if include_re.search(name) and not exclude_re.search(name):
	            return True
	
	        tags = self._types_for_file(name)
	        if (
	            tags >= types
	            and (not types or tags & types_or)
	            and not tags & exclude_types
	        ):
	            return True
	
	    return False
     for hook in all_hooks(config, Store()):
         if hook.always_run or hook.language == 'fail':
             continue
-        elif not classifier.filenames_for_hook(hook):
+        elif not classifier.any_filenames_for_hook(hook):
             print(f'{hook.id} does not apply to this repository')
             retv = 1

Just to try it out, I applied this change in my pre-commit install directory and profiled with hyperfine on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.

Old

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):      1.253 s ±  0.066 s    [User: 0.546 s, System: 0.695 s]
  Range (min … max):    1.193 s …  1.419 s    10 runs

New

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):     969.9 ms ±  65.6 ms    [User: 429.0 ms, System: 535.0 ms]
  Range (min … max):   873.1 ms … 1067.6 ms    10 runs

pre-commit --version

3.2.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions