
BUG: Fix weak hash function in np.isin(). (#30840) #30850

Merged
charris merged 1 commit into numpy:maintenance/2.4.x from charris:backport-30840 on Feb 19, 2026

@charris (Member) commented Feb 19, 2026

Backport of #30840.

In a build of NumPy against libcxx, we found that the following code regressed from running almost instantaneously with NumPy 2.3 to taking over 40 s with NumPy 2.4:

import numpy as np

# 162143 timestamps spaced 4 hours apart, at nanosecond resolution.
full_list = np.array("2015-12-01T00:00:00.000000000", 'datetime64[ns]') + np.arange(162143) * np.timedelta64(4, 'h').astype('timedelta64[ns]')
sampled_dates = full_list[10:28]
np.isin(sampled_dates, full_list)
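To check the timing on a particular build, the reproducer can be wrapped in a simple timer. This is a minimal sketch using only `time.perf_counter`; the `np.datetime64(..., "ns")` start value is an equivalent rewrite of the original construction, and since the slowdown depends on the C++ standard library NumPy was built against, the elapsed time is printed rather than checked.

```python
import time

import numpy as np

# Same data as the reproducer: 162143 timestamps spaced 4 hours apart.
start = np.datetime64("2015-12-01T00:00:00", "ns")
step = np.timedelta64(4, "h").astype("timedelta64[ns]")
full_list = start + np.arange(162143) * step
sampled_dates = full_list[10:28]

t0 = time.perf_counter()
mask = np.isin(sampled_dates, full_list)
elapsed = time.perf_counter() - t0

# Every sampled date is taken from full_list, so every entry should be True.
print(f"all found: {bool(mask.all())}, elapsed: {elapsed:.3f} s")
```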

Our belief is the following:

  • std::unordered_set in libcxx uses power-of-two bucket counts.
  • std::hash for integer types is the identity function.
  • In this particular example, we are hashing integers (datetime64 values) that differ by multiples of a power of two.
  • The net result is that all of the integers end up in the same hash bucket.
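The arithmetic behind these bullets can be checked from the int64 nanosecond values themselves. The sketch below assumes, as described above, a power-of-two bucket count and an identity hash, so the bucket index is just the low bits of each value; the two table sizes are illustrative and are not read from libcxx. The start value is a multiple of 2**17 and the 4-hour step a multiple of 2**15, so a table with 2**15 buckets puts every value in bucket 0, and a 2**18-bucket table (the next power of two above 162143) still uses only 8 buckets.

```python
import numpy as np

# Rebuild the int64 nanosecond values that np.isin ends up hashing.
start = np.datetime64("2015-12-01T00:00:00", "ns")
step = np.timedelta64(4, "h").astype("timedelta64[ns]")
values = (start + np.arange(162143) * step).view(np.int64)


def trailing_zero_bits(x: int) -> int:
    """Count the low-order zero bits of a positive integer."""
    return (x & -x).bit_length() - 1


start_tz = trailing_zero_bits(int(values[0]))
step_tz = trailing_zero_bits(int(step.astype(np.int64)))
print(f"start is a multiple of 2**{start_tz}, step is a multiple of 2**{step_tz}")


def buckets_used(n_buckets: int) -> int:
    """Distinct buckets hit under an identity hash with a power-of-two table.

    Assumption: bucket = hash % n_buckets, which for a power of two is
    simply the low bits of the value.
    """
    return np.unique(values & (n_buckets - 1)).size


for n_buckets in (1 << 15, 1 << 18):
    print(f"{buckets_used(n_buckets)} of {n_buckets} buckets used")
```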

We can make the code more robust by hashing with npy_fnv1a, the same hash already used elsewhere in this file, since it distributes the input bits across the hash far better than the identity function does.
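To see why a mixing hash helps, here is a pure-Python sketch of the standard 64-bit FNV-1a hash applied to the same values, compared with the identity hash under the same power-of-two bucket model as above. We assume `npy_fnv1a` behaves like textbook FNV-1a over the value's bytes; NumPy's C helper may differ in details such as byte order or hash width, and `fnv1a_64` is a hypothetical name used only in this sketch.

```python
import numpy as np

# Standard 64-bit FNV-1a constants (offset basis and prime).
FNV_OFFSET = 0xCBF29CE484222325
FNV_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(value: int) -> int:
    """Hypothetical helper: FNV-1a over the 8 little-endian bytes of an int64."""
    h = FNV_OFFSET
    for byte in (value & MASK64).to_bytes(8, "little"):
        h ^= byte                      # fold the next byte into the low bits
        h = (h * FNV_PRIME) & MASK64   # multiply to spread it upward
    return h


start = np.datetime64("2015-12-01T00:00:00", "ns")
step = np.timedelta64(4, "h").astype("timedelta64[ns]")
values = [int(v) for v in (start + np.arange(162143) * step).view(np.int64)]

# Assumed table size: next power of two >= number of values (2**18 here).
n_buckets = 1 << (len(values) - 1).bit_length()
identity_used = len({v & (n_buckets - 1) for v in values})
fnv_used = len({fnv1a_64(v) & (n_buckets - 1) for v in values})
print(f"identity hash: {identity_used} buckets; FNV-1a: {fnv_used} buckets")
```

Because FNV-1a XORs every input byte into the low byte of its state before multiplying, bits that vary only in the high part of the value still influence the low bits that select the bucket, which an identity hash cannot do.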

@charris added this to the 2.4.3 release milestone on Feb 19, 2026
@charris added the 00 - Bug and 08 - Backport labels on Feb 19, 2026
@charris merged commit c5685a6 into numpy:maintenance/2.4.x on Feb 19, 2026 (74 checks passed)
@charris deleted the backport-30840 branch on February 19, 2026 at 02:48
2 participants