
BUG: IntervalIndex.unique() only contains the first interval if all interval borders are negative #61920


Open
wants to merge 9 commits into
base: main
from


khemkaran10
Contributor

Fixes: #61917
Before Fix ❌:

import pandas as pd
idx_neg = pd.IntervalIndex.from_tuples([(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)])
print(idx_neg.unique())

# Output:
# IntervalIndex([(-4, -3]], dtype='interval[int64, right]')

After Fix ✅:

import pandas as pd
idx_neg = pd.IntervalIndex.from_tuples([(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)])
print(idx_neg.unique())

# Output:
# IntervalIndex([(-4, -3], (-3, -2], (-2, -1]], dtype='interval[int64, right]')
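For context, a sketch of why the original code collapsed these intervals. Judging from the comment in the diff ("Using .view("complex128") with negatives causes issues"), unique() deduplicated a complex128 view of the combined (left, right) array. The NumPy-only snippet below assumes that array holds int64 pairs: viewing each 16-byte row as one complex128 reinterprets the bits without converting the values, and a small negative int64 has its sign bit and all exponent bits set, so read as a float64 it is NaN. If the unique step then treats all NaNs as equal, only the first interval survives. Positive endpoints become tiny subnormal floats instead, which stay distinct, which matches the bug only appearing when all borders are negative.

```python
import numpy as np

# Assumption: the int64 (left, right) pairs that the combined array
# holds for the example in the PR description.
combined = np.array(
    [[-4, -3], [-4, -3], [-3, -2], [-3, -2], [-2, -1], [-2, -1]],
    dtype=np.int64,
)

# Bit-level view: each 16-byte row becomes one complex128, with no value conversion.
as_complex = combined.view("complex128")[:, 0]

# Small negative int64 values have the sign bit and every exponent bit set,
# so read as float64 they are NaN (real part = left, imaginary part = right).
print(np.isnan(as_complex.real).all(), np.isnan(as_complex.imag).all())
# True True

# Positive endpoints become tiny subnormal floats instead, not NaN.
positive = np.array([[1, 2], [3, 4]], dtype=np.int64).view("complex128")[:, 0]
print(np.isnan(positive).any())
# False
```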

simonjayhawkins added the Bug, Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) and Interval (Interval data type) labels on Jul 22, 2025
simonjayhawkins changed the title from "FIX BUG: IntervalIndex.unique() only contains the first interval if all interval borders are negative" to "BUG: IntervalIndex.unique() only contains the first interval if all interval borders are negative" on Jul 22, 2025

il1sf4 commented Jul 31, 2025

Hello @khemkaran10, I tried your fix and ran into the following issue. When I run this code on your branch:
import pandas as pd
idx_neg = pd.IntervalIndex.from_tuples([(-4, -3), (-4, -3), (-3, -2), (-3, -2), (-2, -1), (-2, -1)])
print(idx_neg.unique())
the last line raises:
E ValueError: left side of interval must be <= right side
I debugged the problem and found the following: unique() in interval.py ends with return self._from_combined(nc), and _from_combined() (also in interval.py) computes nc = combined.view("i8").reshape(-1, 2). That is where it goes wrong: view("i8") reinterprets the raw bytes of the complex values instead of converting them. For example,
array([[-4.-3.j],
       [-3.-2.j],
       [-2.-1.j]])
gets transformed to:
array([[-4607182418800017408, -4609434218613702656],
       [-4609434218613702656, -4611686018427387904],
       [-4611686018427387904, -4616189618054758400]])
and the method then tries to build an invalid IntervalIndex in which each left endpoint is greater than its right endpoint.
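The conversion described above can be reproduced with NumPy alone. This sketch assumes nc holds the deduplicated intervals as genuine complex values (left + right·j), which is what the printed array shows. view("i8") then yields the float64 bit patterns of -4.0, -3.0 and so on read as signed integers, and for negative floats a larger magnitude gives a larger bit pattern, hence left > right. Reading the real and imaginary parts and casting them recovers the intended endpoints; that last step only illustrates the difference between a bit view and a value conversion and is not necessarily the fix the PR adopted.

```python
import numpy as np

# Deduplicated intervals stored as actual complex values: left + right*1j.
nc = np.array([[-4 - 3j], [-3 - 2j], [-2 - 1j]])

# What _from_combined does: reinterpret the raw bytes as int64 (left, right) pairs.
bits = nc.view("i8").reshape(-1, 2)
print(bits)
# [[-4607182418800017408 -4609434218613702656]
#  [-4609434218613702656 -4611686018427387904]
#  [-4611686018427387904 -4616189618054758400]]

# Every row now has left > right, so building the IntervalArray raises ValueError.
print((bits[:, 0] > bits[:, 1]).all())
# True

# A value conversion (not a bit view) recovers the intended endpoints.
values = np.column_stack([nc.real.ravel(), nc.imag.ravel()]).astype(np.int64)
print(values.tolist())
# [[-4, -3], [-3, -2], [-2, -1]]
```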

Could you run the code again and confirm that idx_neg.unique() really returns "IntervalIndex([(-4, -3], (-3, -2], (-2, -1]], dtype='interval[int64, right]')"? In my tests it instead tries to build "IntervalIndex([(-4607182418800017408, -4609434218613702656], (-4609434218613702656, -4611686018427387904], (-4611686018427387904, -4616189618054758400]], dtype='interval[int64, right]')", which triggers the error "E ValueError: left side of interval must be <= right side".

khemkaran10
Contributor Author

@il1sf4 Thanks for pointing that out. I have updated the PR.

@@ -1985,6 +1985,9 @@ def _from_combined(self, combined: np.ndarray) -> IntervalArray:
)._from_sequence(nc[:, 1], dtype=dtype)
else:
assert isinstance(dtype, np.dtype)
nc = np.hstack(
Member

Is the hstack really necessary here, since the next two lines just split them back up?

Contributor Author

Thanks @jbrockmendel, it's not required. I'll change this.

nc = unique(
# Using .view("complex128") with negatives causes issues.
# GH#61917
(np.array(self._combined[:, 0], dtype=complex))
Member

If this is more correct, should we just patch _combined directly? The only other place it is used is isin; will there be an analogous bug there?

Contributor Author

Yes, agreed; we can add this logic directly in _combined.
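On the isin question, a NumPy-only sketch suggests the NaN problem is not specific to unique(). It does not reproduce pandas' actual isin code path, which is not shown here; it only shows that once negative endpoints are viewed as complex128, an interval cannot be found even in an array containing itself, because NaN never compares equal. It also shows one way to deduplicate the pairs with no complex view at all: np.unique(..., axis=0) with return_index, re-sorted to keep first-appearance order. unique_pairs is a hypothetical helper name, and this is a swapped-in technique for illustration, not what the PR implements.

```python
import numpy as np

# Assumed int64 (left, right) pairs, as in the example from the PR description.
combined = np.array(
    [[-4, -3], [-4, -3], [-3, -2], [-3, -2], [-2, -1], [-2, -1]],
    dtype=np.int64,
)

# The complex128 bit view turns every small negative endpoint into NaN,
# and NaN never compares equal, so nothing is found, not even itself.
viewed = combined.view("complex128")[:, 0]
print(np.isin(viewed, viewed))
# [False False False False False False]


def unique_pairs(pairs: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unique rows of an (n, 2) array, in order of first appearance."""
    # return_index gives the position of each distinct row's first occurrence;
    # sorting those positions restores the original order.
    _, first_idx = np.unique(pairs, axis=0, return_index=True)
    return pairs[np.sort(first_idx)]


print(unique_pairs(combined).tolist())
# [[-4, -3], [-3, -2], [-2, -1]]
```

Comparing whole integer rows avoids any reinterpretation of bits, at the cost of a sort instead of a hash-based pass.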

Labels
Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Interval (Interval data type)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: IntervalIndex.unique() only contains the first interval if all interval borders are negative
5 participants