
BUG: Fix weak hash function in np.isin().#30840

Merged
ngoldbaum merged 1 commit into numpy:main from hawkinsp:hash on Feb 16, 2026


@hawkinsp
Contributor

In a build of NumPy against libcxx, we found that the following code regressed from running almost instantaneously with NumPy 2.3 to taking over 40 s with NumPy 2.4:

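```
import numpy as np

# 162,143 datetime64[ns] values spaced 4 hours apart, starting 2015-12-01
full_list = np.array("2015-12-01T00:00:00.000000000", 'datetime64[ns]') + np.arange(162143) * np.timedelta64(4, 'h').astype('timedelta64[ns]')
sampled_dates = full_list[10:28]
# Near-instant on NumPy 2.3; over 40 s on NumPy 2.4 built against libcxx
np.isin(sampled_dates, full_list)
```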

Our belief is the following:

  • std::unordered_set in libcxx uses a power-of-two number of hash buckets.
  • std::hash for integers is the identity function.
  • In this particular example, we are hashing integers (datetime64 values) separated by multiples of a power of two.
  • The net result is that all of the integers end up in the same hash bucket.
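The arithmetic behind the last two points, assuming an identity hash and a table of 2^b buckets indexed by the low b bits of the hash: the 4-hour spacing in nanoseconds is 2^15 times an odd number, so every key shares the same low 15 bits.

```math
\begin{aligned}
4\ \mathrm{h} &= 4 \cdot 3600 \cdot 10^{9}\ \mathrm{ns} = 1.44 \times 10^{13}\ \mathrm{ns} = 2^{15} \cdot 439\,453\,125\ \mathrm{ns} \\
v_k &= v_0 + k \cdot 2^{15} m, \quad m = 439\,453\,125 \text{ (odd)} \\
\Rightarrow\ v_k \bmod 2^{b} &\text{ takes at most } 2^{\max(b-15,\,0)} \text{ distinct values}
\end{aligned}
```

So a table with at most 2^15 buckets places every key in a single bucket, and even 2^18 buckets would leave only 8 in use.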

We can make the code more robust simply by using the npy_fnv1a hash already used elsewhere in the same file, since it does a much better job of distributing hash bits.
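To make this hypothesis concrete, here is a small pure-Python simulation; it is a sketch, not NumPy's C++ code. It assumes a hypothetical table of 2**18 buckets selected by masking the low bits, treats the keys as raw int64 nanoseconds, and assumes npy_fnv1a behaves like standard 64-bit FNV-1a over the element's bytes. The helper names (`identity_hash`, `fnv1a_64`, `buckets_used`) are made up for illustration.

```python
from datetime import datetime, timezone

# Keys from the reproducer: datetime64[ns] values starting at 2015-12-01,
# spaced 4 hours apart, expressed as raw int64 nanoseconds since the epoch.
BASE_NS = int(datetime(2015, 12, 1, tzinfo=timezone.utc).timestamp()) * 10**9
STEP_NS = 4 * 3600 * 10**9  # a multiple of 2**15
N = 162143
keys = [BASE_NS + k * STEP_NS for k in range(N)]

# Hypothetical table size: the next power of two above N. This is an
# assumption for illustration, not libcxx's documented growth policy.
NBUCKETS = 1 << 18
MASK = NBUCKETS - 1
U64 = 0xFFFFFFFFFFFFFFFF


def identity_hash(v: int) -> int:
    # Models an identity std::hash for integers: the key is the hash.
    return v & U64


def fnv1a_64(v: int) -> int:
    # Standard 64-bit FNV-1a over the 8 little-endian bytes of the int64.
    h = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
    for byte in v.to_bytes(8, "little", signed=True):
        h ^= byte
        h = (h * 0x100000001B3) & U64  # multiply by the FNV prime, mod 2**64
    return h


def buckets_used(hash_fn) -> int:
    # A power-of-two table picks the bucket by masking off the low bits.
    return len({hash_fn(k) & MASK for k in keys})


print("identity:", buckets_used(identity_hash))  # identity: 8
print("fnv1a:   ", buckets_used(fnv1a_64))
```

Under these assumptions the identity hash reaches only 8 of the 262,144 buckets, so each holds roughly 20,000 keys and every insert or lookup degenerates into a long scan of one bucket. FNV-1a folds the varying high bytes into the low bits, so the same keys spread across far more buckets.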

@hawkinsp changed the title from BUG: Fix weak hash function in np.unique(). to BUG: Fix weak hash function in np.isin(). on Feb 16, 2026
@hawkinsp
Contributor Author

@math-hiyoko

@math-hiyoko
Contributor

Nice catch! Using npy_fnv1a seems like a solid fix.

@ngoldbaum
Member

> We can make the code more robust simply by using the same npy_fnv1a hash used elsewhere in the same file since it will do a better job of distributing hash bits.

> Nice catch! Using npy_fnv1a seems like a solid fix.

Agreed, thanks for the detailed analysis and fix. I think it was probably an oversight not to use fnv1a in the first place.

@ngoldbaum added the 09 - Backport-Candidate label Feb 16, 2026
@ngoldbaum added this to the 2.4.3 release milestone Feb 16, 2026
@ngoldbaum merged commit 552c46f into numpy:main Feb 16, 2026
77 checks passed
charris pushed a commit to charris/numpy that referenced this pull request Feb 19, 2026
@charris removed the 09 - Backport-Candidate label Feb 19, 2026
@charris removed this from the 2.4.3 release milestone Feb 19, 2026
charris added a commit that referenced this pull request Feb 19, 2026
BUG: Fix weak hash function in np.isin(). (#30840)
sabasiddique1 pushed a commit to sabasiddique1/numpy that referenced this pull request Mar 4, 2026

4 participants