DOC: fix doctests for pandas/core/strings/accessor.py for new string dtype #61908

arthurlw · 2025-07-19T18:17:39Z

- ~~closes #xxxx (Replace xxxx with the GitHub issue number)~~
- Tests added and passed if fixing a bug or adding a new feature.
- All code checks passed.
- ~~Added type annotations to new arguments/methods/functions.~~
- ~~Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.~~

arthurlw · 2025-07-19T18:24:14Z

I noticed that some of the docstrings highlight differences between returning NaN and False for boolean operations (e.g., here), which are now outdated. These should be updated, though it might be better to open a separate issue for that.

Other files may have similar cases, though I haven't done a full check yet.

jorisvandenbossche · 2025-07-19T21:12:06Z

> I noticed that some of the docstrings highlight differences between returning NaN and False for boolean operations (e.g., here), which are now outdated. These should be updated, though it might be better to open a separate issue for that.

Good catch. That was an intentional change, see #54805 / #59616, so it is fine to update the docstrings here while updating them to use the string dtype.

jorisvandenbossche · 2025-07-19T21:13:44Z

pandas/core/strings/accessor.py

        >>> ind = pd.Index(["Mouse", "dog", "house and parrot", "23.0", np.nan])
        >>> ind.str.contains("23", regex=False)
-        Index([False, False, False, True, nan], dtype='object')
+        array([False, False, False,  True, False])


The fact that it changes here from Index to array does not seem to be intentional, though. Will look into that.

Apparently all Index operations that return a boolean result (e.g. also `pd.Index([1, 2, 3]) == 2`) return a NumPy bool array, not an Index object. So this change is "expected", given that we decided to return a bool dtype instead of the original object dtype from this operation (because the NaN now propagates as False).
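For reference, a minimal sketch of the Index comparison mentioned above. The variable names are illustrative only and do not come from the PR.

```python
import numpy as np
import pandas as pd

idx = pd.Index([1, 2, 3])

# An elementwise comparison on an Index yields a plain NumPy boolean
# array rather than a new Index object.
mask = idx == 2

print(type(mask).__name__)  # ndarray
print(mask.dtype)           # bool
print(mask.tolist())        # [False, True, False]
```

This is consistent with the doctest output above switching from an `Index([...], dtype='object')` repr to an `array([...])` repr once the result is a bool dtype.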

arthurlw · Jul 26, 2025

Just curious, why did we decide to propagate NaN as False in these boolean operations?

jorisvandenbossche · Jul 26, 2025

See the links I mentioned above in #61908 (comment).
The main reason is that the current "object-dtype with NaN" result is not that useful in practice. For example, it means that boolean filtering like `ser[ser.str.contains("B")]` only works as long as `ser` does not contain missing values.
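A small sketch of the filtering problem described above, assuming an explicitly object-dtype Series so that the missing value is kept as NaN in the result. The data and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Explicit object dtype, so the missing value stays NaN in the result
# instead of following the new string dtype's behaviour.
ser = pd.Series(["AB", "C", np.nan, "B"], dtype=object)

mask = ser.str.contains("B")  # object dtype: True, False, NaN, True

try:
    ser[mask]
    filtering_failed = False
except ValueError:
    # pandas refuses to mask with a non-boolean array that holds NaN.
    filtering_failed = True

# Passing na=False treats missing values as non-matches, so the mask is
# a plain bool array and can be used for filtering directly.
safe_mask = ser.str.contains("B", na=False)
filtered = ser[safe_mask]

print(filtering_failed)
print(filtered.tolist())
```

With the new string dtype, the missing value is reported as False without passing `na=False`, which is the behaviour change the updated doctests reflect.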

jorisvandenbossche · 2025-07-19T21:14:58Z

pandas/core/strings/accessor.py

+        4    False
+        dtype: bool

        Specifying `na` to be `False` instead of `NaN` replaces NaN values


I realized that this example is now also a bit outdated (we could still show it when starting with object dtype, or show it with filling with another value), but it is fine to leave that for another PR.

jorisvandenbossche · 2025-07-19T21:16:34Z

pandas/core/strings/accessor.py

    >>> s3.str.isdigit()
    0     True
-    1     True
+    1    False


This is a behaviour change we should actually fix, see #61466

jorisvandenbossche · 2025-07-19T21:16:59Z

@arthurlw thanks for the PR!

jorisvandenbossche · 2025-07-25T13:28:55Z

Going to merge this, so we can enable the doctests again. ~~Will open an issue for the remaining follow-up task~~ (not actually an issue, see inline comment above)

jorisvandenbossche · 2025-07-25T13:29:11Z

Thanks @arthurlw!

Fix doctest errors related to new string repr

1613ce9

arthurlw added the Docs label Jul 19, 2025

jorisvandenbossche reviewed Jul 19, 2025

jorisvandenbossche added this to the 3.0 milestone Jul 20, 2025

jorisvandenbossche merged commit e954f19 into pandas-dev:main Jul 25, 2025
47 checks passed

eicchen pushed a commit to eicchen/pandas that referenced this pull request Aug 19, 2025

DOC: fix doctests for pandas/core/strings/accessor.py for new string dtype (pandas-dev#61908)

b346c94

eicchen pushed a commit to eicchen/pandas that referenced this pull request Oct 18, 2025

DOC: fix doctests for pandas/core/strings/accessor.py for new string dtype (pandas-dev#61908)

cfa4952
