
Commit e0f1f37

Merge branch 'main' into issue_61863
2 parents: 17fcfc1 + d5eab1b

File tree: 11 files changed (+88 −81 lines)


doc/source/user_guide/io.rst

Lines changed: 27 additions & 36 deletions
@@ -5228,33 +5228,32 @@ languages easy. Parquet can use a variety of compression techniques to shrink th
 while still maintaining good read performance.
 
 Parquet is designed to faithfully serialize and de-serialize ``DataFrame`` s, supporting all of the pandas
-dtypes, including extension dtypes such as datetime with tz.
+dtypes, including extension dtypes such as datetime with timezone.
 
 Several caveats.
 
 * Duplicate column names and non-string columns names are not supported.
-* The ``pyarrow`` engine always writes the index to the output, but ``fastparquet`` only writes non-default
-  indexes. This extra column can cause problems for non-pandas consumers that are not expecting it. You can
-  force including or omitting indexes with the ``index`` argument, regardless of the underlying engine.
+* The DataFrame index is written as separate column(s) when it is a non-default range index.
+  This extra column can cause problems for non-pandas consumers that are not expecting it. You can
+  force including or omitting indexes with the ``index`` argument.
 * Index level names, if specified, must be strings.
 * In the ``pyarrow`` engine, categorical dtypes for non-string types can be serialized to parquet, but will de-serialize as their primitive dtype.
-* The ``pyarrow`` engine preserves the ``ordered`` flag of categorical dtypes with string types. ``fastparquet`` does not preserve the ``ordered`` flag.
-* Non supported types include ``Interval`` and actual Python object types. These will raise a helpful error message
-  on an attempt at serialization. ``Period`` type is supported with pyarrow >= 0.16.0.
+* The ``pyarrow`` engine supports the ``Period`` and ``Interval`` dtypes. ``fastparquet`` does not support those.
+* Non supported types include actual Python object types. These will raise a helpful error message
+  on an attempt at serialization.
 * The ``pyarrow`` engine preserves extension data types such as the nullable integer and string data
-  type (requiring pyarrow >= 0.16.0, and requiring the extension type to implement the needed protocols,
+  type (this can also work for external extension types, requiring the extension type to implement the needed protocols,
   see the :ref:`extension types documentation <extending.extension.arrow>`).
 
 You can specify an ``engine`` to direct the serialization. This can be one of ``pyarrow``, or ``fastparquet``, or ``auto``.
 If the engine is NOT specified, then the ``pd.options.io.parquet.engine`` option is checked; if this is also ``auto``,
-then ``pyarrow`` is tried, and falling back to ``fastparquet``.
+then ``pyarrow`` is used when installed, and falling back to ``fastparquet``.
 
 See the documentation for `pyarrow <https://arrow.apache.org/docs/python/>`__ and `fastparquet <https://fastparquet.readthedocs.io/en/latest/>`__.
 
 .. note::
 
-   These engines are very similar and should read/write nearly identical parquet format files.
-   ``pyarrow>=8.0.0`` supports timedelta data, ``fastparquet>=0.1.4`` supports timezone aware datetimes.
+   These engines are very similar and should read/write nearly identical parquet format files for most cases.
   These libraries differ by having different underlying dependencies (``fastparquet`` by using ``numba``, while ``pyarrow`` uses a c-library).
 
 .. ipython:: python
@@ -5280,24 +5279,21 @@ Write to a parquet file.
 
 .. ipython:: python
 
-   df.to_parquet("example_pa.parquet", engine="pyarrow")
-   df.to_parquet("example_fp.parquet", engine="fastparquet")
+   # specify engine="pyarrow" or engine="fastparquet" to use a specific engine
+   df.to_parquet("example.parquet")
 
 Read from a parquet file.
 
 .. ipython:: python
 
-   result = pd.read_parquet("example_fp.parquet", engine="fastparquet")
-   result = pd.read_parquet("example_pa.parquet", engine="pyarrow")
-
+   result = pd.read_parquet("example.parquet")
    result.dtypes
 
 By setting the ``dtype_backend`` argument you can control the default dtypes used for the resulting DataFrame.
 
 .. ipython:: python
 
-   result = pd.read_parquet("example_pa.parquet", engine="pyarrow", dtype_backend="pyarrow")
-
+   result = pd.read_parquet("example.parquet", dtype_backend="pyarrow")
    result.dtypes
 
 .. note::
@@ -5309,41 +5305,36 @@ Read only certain columns of a parquet file.
 
 .. ipython:: python
 
-   result = pd.read_parquet(
-       "example_fp.parquet",
-       engine="fastparquet",
-       columns=["a", "b"],
-   )
-   result = pd.read_parquet(
-       "example_pa.parquet",
-       engine="pyarrow",
-       columns=["a", "b"],
-   )
+   result = pd.read_parquet("example.parquet", columns=["a", "b"])
    result.dtypes
 
 
 .. ipython:: python
    :suppress:
 
-   os.remove("example_pa.parquet")
-   os.remove("example_fp.parquet")
+   os.remove("example.parquet")
 
 
 Handling indexes
 ''''''''''''''''
 
 Serializing a ``DataFrame`` to parquet may include the implicit index as one or
-more columns in the output file. Thus, this code:
+more columns in the output file. For example, this code:
 
 .. ipython:: python
 
-   df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
+   df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
    df.to_parquet("test.parquet", engine="pyarrow")
 
-creates a parquet file with *three* columns if you use ``pyarrow`` for serialization:
-``a``, ``b``, and ``__index_level_0__``. If you're using ``fastparquet``, the
-index `may or may not <https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write>`_
-be written to the file.
+creates a parquet file with *three* columns (``a``, ``b``, and
+``__index_level_0__`` when using the ``pyarrow`` engine, or ``index``, ``a``,
+and ``b`` when using the ``fastparquet`` engine) because the index in this case
+is not a default range index. In general, the index *may or may not* be written
+to the file (see the
+`preserve_index keyword for pyarrow <https://arrow.apache.org/docs/python/pandas.html#handling-pandas-indexes>`__
+or the
+`write_index keyword for fastparquet <https://fastparquet.readthedocs.io/en/latest/api.html#fastparquet.write>`__
+to check the default behaviour).
 
 This unexpected extra column causes some databases like Amazon Redshift to reject
 the file, because that column doesn't exist in the target table.
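A minimal sketch of the index behaviour the updated docs describe, assuming pandas and pyarrow are installed (the file name mirrors the docs example; this snippet is not part of the commit):

```python
import pandas as pd
import pyarrow.parquet as pq

# A non-default index is written as an extra column by the pyarrow engine
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
df.to_parquet("test.parquet", engine="pyarrow")
print(pq.read_schema("test.parquet").names)  # ['a', 'b', '__index_level_0__']

# Pass index=False to omit it for consumers (e.g. Amazon Redshift)
# that reject columns missing from the target table
df.to_parquet("test.parquet", engine="pyarrow", index=False)
print(pq.read_schema("test.parquet").names)  # ['a', 'b']
```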

doc/source/whatsnew/v3.0.0.rst

Lines changed: 2 additions & 0 deletions
@@ -414,6 +414,7 @@ Other API changes
 - Index set operations (like union or intersection) will now ignore the dtype of
   an empty ``RangeIndex`` or empty ``Index`` with object dtype when determining
   the dtype of the resulting Index (:issue:`60797`)
+- Comparison operations between :class:`Index` and :class:`Series` now consistently return :class:`Series` regardless of which object is on the left or right (:issue:`36759`)
 - Numpy functions like ``np.isinf`` that return a bool dtype when called on a :class:`Index` object now return a bool-dtype :class:`Index` instead of ``np.ndarray`` (:issue:`52676`)
 
 .. ---------------------------------------------------------------------------
@@ -717,6 +718,7 @@ Datetimelike
 Timedelta
 ^^^^^^^^^
 - Accuracy improvement in :meth:`Timedelta.to_pytimedelta` to round microseconds consistently for large nanosecond based Timedelta (:issue:`57841`)
+- Bug in :class:`Timedelta` constructor failing to raise when passed an invalid keyword (:issue:`53801`)
 - Bug in :meth:`DataFrame.cumsum` which was raising ``IndexError`` if dtype is ``timedelta64[ns]`` (:issue:`57956`)
 
 Timezones

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 14 additions & 10 deletions
@@ -2006,6 +2006,20 @@ class Timedelta(_Timedelta):
                            "milliseconds", "microseconds", "nanoseconds"}
 
     def __new__(cls, object value=_no_input, unit=None, **kwargs):
+        unsupported_kwargs = set(kwargs)
+        unsupported_kwargs.difference_update(cls._req_any_kwargs_new)
+        if unsupported_kwargs or (
+            value is _no_input and
+            not cls._req_any_kwargs_new.intersection(kwargs)
+        ):
+            raise ValueError(
+                # GH#53801
+                "cannot construct a Timedelta from the passed arguments, "
+                "allowed keywords are "
+                "[weeks, days, hours, minutes, seconds, "
+                "milliseconds, microseconds, nanoseconds]"
+            )
+
         if value is _no_input:
             if not len(kwargs):
                 raise ValueError("cannot construct a Timedelta without a "
@@ -2014,16 +2028,6 @@ class Timedelta(_Timedelta):
 
         kwargs = {key: _to_py_int_float(kwargs[key]) for key in kwargs}
 
-        unsupported_kwargs = set(kwargs)
-        unsupported_kwargs.difference_update(cls._req_any_kwargs_new)
-        if unsupported_kwargs or not cls._req_any_kwargs_new.intersection(kwargs):
-            raise ValueError(
-                "cannot construct a Timedelta from the passed arguments, "
-                "allowed keywords are "
-                "[weeks, days, hours, minutes, seconds, "
-                "milliseconds, microseconds, nanoseconds]"
-            )
-
         # GH43764, convert any input to nanoseconds first and then
         # create the timedelta. This ensures that any potential
         # nanosecond contributions from kwargs parsed as floats
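The validation now runs before the `value is _no_input` branch, so an unknown keyword raises even when a positional value is given. A quick sketch of the intended behaviour (GH#53801), not part of the commit; the typo keyword is illustrative:

```python
import pandas as pd

# Supported keywords still construct a Timedelta as before
print(pd.Timedelta(days=1, hours=2))  # 1 days 02:00:00

# An unsupported keyword now raises up front, even alongside a value;
# previously this path could silently ignore the typo
try:
    pd.Timedelta("1 day", dayz=2)  # "dayz" is a deliberate typo
except ValueError as err:
    print(err)
```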

pandas/core/arrays/datetimelike.py

Lines changed: 9 additions & 2 deletions
@@ -1486,7 +1486,8 @@ def __rsub__(self, other):
             # GH#19959 datetime - datetime is well-defined as timedelta,
             # but any other type - datetime is not well-defined.
             raise TypeError(
-                f"cannot subtract {type(self).__name__} from {type(other).__name__}"
+                f"cannot subtract {type(self).__name__} from "
+                f"{type(other).__name__}[{other.dtype}]"
             )
         elif isinstance(self.dtype, PeriodDtype) and lib.is_np_dtype(other_dtype, "m"):
             # TODO: Can we simplify/generalize these cases at all?
@@ -1495,8 +1496,14 @@ def __rsub__(self, other):
             self = cast("TimedeltaArray", self)
             return (-self) + other
 
+        flipped = self - other
+        if flipped.dtype.kind == "M":
+            # GH#59571 give a more helpful exception message
+            raise TypeError(
+                f"cannot subtract {type(self).__name__} from {type(other).__name__}"
+            )
         # We get here with e.g. datetime objects
-        return -(self - other)
+        return -flipped
 
     def __iadd__(self, other) -> Self:
         result = self + other
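A sketch of how the two improved messages surface to users, assuming a build with this commit (not an authoritative transcript):

```python
import numpy as np
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)

# timedelta64 ndarray - datetime array: the message now names the
# left operand's dtype, e.g. "... from ndarray[timedelta64[ns]]"
try:
    np.array([np.timedelta64(1, "D")] * 3) - dti
except TypeError as err:
    print(err)

# Timedelta - datetime array hits the new GH#59571 branch: the flipped
# result (self - other) is datetime-like, so the subtraction is rejected
try:
    pd.Timedelta("1 day") - dti
except TypeError as err:
    print(err)  # cannot subtract DatetimeArray from Timedelta
```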

pandas/core/ops/common.py

Lines changed: 5 additions & 11 deletions
@@ -56,20 +56,14 @@ def _unpack_zerodim_and_defer(method: F, name: str) -> F:
     -------
     method
     """
-    stripped_name = name.removeprefix("__").removesuffix("__")
-    is_cmp = stripped_name in {"eq", "ne", "lt", "le", "gt", "ge"}
 
     @wraps(method)
     def new_method(self, other):
-        if is_cmp and isinstance(self, ABCIndex) and isinstance(other, ABCSeries):
-            # For comparison ops, Index does *not* defer to Series
-            pass
-        else:
-            prio = getattr(other, "__pandas_priority__", None)
-            if prio is not None:
-                if prio > self.__pandas_priority__:
-                    # e.g. other is DataFrame while self is Index/Series/EA
-                    return NotImplemented
+        prio = getattr(other, "__pandas_priority__", None)
+        if prio is not None:
+            if prio > self.__pandas_priority__:
+                # e.g. other is DataFrame while self is Index/Series/EA
+                return NotImplemented
 
         other = item_from_zerodim(other)
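For context, a minimal sketch of the user-visible effect (not part of the commit): with the Index-vs-Series special case removed, deference is governed solely by `__pandas_priority__`, so comparisons return a Series from either side (GH#36759).

```python
import pandas as pd

idx = pd.Index([1, 2, 3])
ser = pd.Series([1, 2, 4])

# Series has a higher __pandas_priority__ than Index, so Index.__eq__
# now returns NotImplemented and Python falls back to Series.__eq__;
# previously idx == ser returned a numpy array
print(type(idx == ser).__name__)  # Series
print(type(ser == idx).__name__)  # Series
```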

pandas/tests/arithmetic/common.py

Lines changed: 2 additions & 7 deletions
@@ -111,24 +111,19 @@ def xbox2(x):
             return x.astype(bool)
         return x
 
-    # rev_box: box to use for reversed comparisons
-    rev_box = xbox
-    if isinstance(right, Index) and isinstance(left, Series):
-        rev_box = np.array
-
     result = xbox2(left == right)
     expected = xbox(np.zeros(result.shape, dtype=np.bool_))
 
     tm.assert_equal(result, expected)
 
     result = xbox2(right == left)
-    tm.assert_equal(result, rev_box(expected))
+    tm.assert_equal(result, xbox(expected))
 
     result = xbox2(left != right)
     tm.assert_equal(result, ~expected)
 
     result = xbox2(right != left)
-    tm.assert_equal(result, rev_box(~expected))
+    tm.assert_equal(result, xbox(~expected))
 
     msg = "|".join(
         [

pandas/tests/arithmetic/test_datetime64.py

Lines changed: 20 additions & 10 deletions
@@ -770,11 +770,18 @@ def test_dti_cmp_tdi_tzawareness(self, other):
 
         result = dti == other
         expected = np.array([False] * 10)
-        tm.assert_numpy_array_equal(result, expected)
+        if isinstance(other, Series):
+            tm.assert_series_equal(result, Series(expected, index=other.index))
+        else:
+            tm.assert_numpy_array_equal(result, expected)
 
         result = dti != other
         expected = np.array([True] * 10)
-        tm.assert_numpy_array_equal(result, expected)
+        if isinstance(other, Series):
+            tm.assert_series_equal(result, Series(expected, index=other.index))
+        else:
+            tm.assert_numpy_array_equal(result, expected)
+
         msg = "Invalid comparison between"
         with pytest.raises(TypeError, match=msg):
             dti < other
@@ -955,7 +962,12 @@ def test_dt64arr_add_sub_td64ndarray(self, tz_naive_fixture, box_with_array):
 
         result = dtarr - tdarr
         tm.assert_equal(result, expected)
-        msg = "cannot subtract|(bad|unsupported) operand type for unary"
+        msg = "|".join(
+            [
+                "cannot subtract DatetimeArray from ndarray",
+                "cannot subtract a datelike from a TimedeltaArray",
+            ]
+        )
         with pytest.raises(TypeError, match=msg):
             tdarr - dtarr
 
@@ -1272,7 +1284,7 @@ def test_dt64arr_series_sub_tick_DateOffset(self, box_with_array):
 
         result2 = -pd.offsets.Second(5) + ser
         tm.assert_equal(result2, expected)
-        msg = "(bad|unsupported) operand type for unary"
+        msg = "cannot subtract DatetimeArray from Second"
         with pytest.raises(TypeError, match=msg):
             pd.offsets.Second(5) - ser
 
@@ -1317,9 +1329,7 @@ def test_dti_add_tick_tzaware(self, tz_aware_fixture, box_with_array):
         roundtrip = offset - scalar
         tm.assert_equal(roundtrip, dates)
 
-        msg = "|".join(
-            ["bad operand type for unary -", "cannot subtract DatetimeArray"]
-        )
+        msg = "cannot subtract DatetimeArray from"
         with pytest.raises(TypeError, match=msg):
             scalar - dates
 
@@ -1378,7 +1388,7 @@ def test_dt64arr_add_sub_relativedelta_offsets(self, box_with_array, unit):
         expected = DatetimeIndex([x - off for x in vec_items]).as_unit(exp_unit)
         expected = tm.box_expected(expected, box_with_array)
         tm.assert_equal(expected, vec - off)
-        msg = "(bad|unsupported) operand type for unary"
+        msg = "cannot subtract DatetimeArray from"
         with pytest.raises(TypeError, match=msg):
             off - vec
 
@@ -1494,7 +1504,7 @@ def test_dt64arr_add_sub_DateOffsets(
         expected = DatetimeIndex([offset + x for x in vec_items]).as_unit(unit)
         expected = tm.box_expected(expected, box_with_array)
         tm.assert_equal(expected, offset + vec)
-        msg = "(bad|unsupported) operand type for unary"
+        msg = "cannot subtract DatetimeArray from"
         with pytest.raises(TypeError, match=msg):
             offset - vec
 
@@ -1983,7 +1993,7 @@ def test_operators_datetimelike_with_timezones(self):
         result = dt1 - td1[0]
         exp = (dt1.dt.tz_localize(None) - td1[0]).dt.tz_localize(tz)
         tm.assert_series_equal(result, exp)
-        msg = "(bad|unsupported) operand type for unary"
+        msg = "cannot subtract DatetimeArray from"
         with pytest.raises(TypeError, match=msg):
             td1[0] - dt1

pandas/tests/arithmetic/test_timedelta64.py

Lines changed: 1 addition & 1 deletion
@@ -320,7 +320,7 @@ def test_subtraction_ops(self):
         with pytest.raises(TypeError, match=msg):
             td - dt
 
-        msg = "(bad|unsupported) operand type for unary"
+        msg = "cannot subtract DatetimeArray from Timedelta"
         with pytest.raises(TypeError, match=msg):
             td - dti

pandas/tests/indexes/multi/test_equivalence.py

Lines changed: 2 additions & 2 deletions
@@ -64,8 +64,8 @@ def test_equals_op(idx):
     with pytest.raises(ValueError, match="Lengths must match"):
         index_a == series_b
 
-    tm.assert_numpy_array_equal(index_a == series_a, expected1)
-    tm.assert_numpy_array_equal(index_a == series_c, expected2)
+    tm.assert_series_equal(index_a == series_a, Series(expected1))
+    tm.assert_series_equal(index_a == series_c, Series(expected2))
 
     # cases where length is 1 for one of them
     with pytest.raises(ValueError, match="Lengths must match"):

pandas/tests/indexes/test_old_base.py

Lines changed: 2 additions & 2 deletions
@@ -560,8 +560,8 @@ def test_equals_op(self, simple_index):
         with pytest.raises(ValueError, match=msg):
             index_a == series_b
 
-        tm.assert_numpy_array_equal(index_a == series_a, expected1)
-        tm.assert_numpy_array_equal(index_a == series_c, expected2)
+        tm.assert_series_equal(index_a == series_a, Series(expected1))
+        tm.assert_series_equal(index_a == series_c, Series(expected2))
 
         # cases where length is 1 for one of them
         with pytest.raises(ValueError, match="Lengths must match"):
