@@ -188,6 +188,14 @@ let pandas do the inference. But if you want to be specific, you can specify the
 This is actually compatible with pandas 2.x as well, since in pandas < 3,
 ``dtype="str"`` was essentially treated as an alias for object dtype.

+.. attention::
+
+   While using ``dtype="str"`` in constructors is compatible with pandas 2.x,
+   specifying it as the dtype in :meth:`~Series.astype` runs into the issue
+   of also stringifying missing values in pandas 2.x. See the section
+   :ref:`string_migration_guide-astype_str` for more details.
+
+
 The missing value sentinel is now always NaN
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -310,52 +318,69 @@ case.
 Notable bug fixes
 ~~~~~~~~~~~~~~~~~

+.. _string_migration_guide-astype_str:
+
 ``astype(str)`` preserving missing values
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This is a long standing "bug" or misfeature, as discussed in https://github.com/pandas-dev/pandas/issues/25353.
+The stringifying of missing values is a long standing "bug" or misfeature, as
+discussed in https://github.com/pandas-dev/pandas/issues/25353, but fixing it
+introduces a significant behaviour change.

-With pandas < 3, when using ``astype(str)`` (using the built-in :func:`str`, not
-``astype("str")``!), the operation would convert every element to a string,
-including the missing values:
+With pandas < 3, when using ``astype(str)`` or ``astype("str")``, the operation
+would convert every element to a string, including the missing values:

 .. code-block:: python

     # OLD behavior in pandas < 3
-    >>> ser = pd.Series(["a", np.nan], dtype=object)
+    >>> ser = pd.Series([1.5, np.nan])
     >>> ser
-    0      a
+    0    1.5
     1    NaN
-    dtype: object
-    >>> ser.astype(str)
-    0      a
+    dtype: float64
+    >>> ser.astype("str")
+    0    1.5
     1    nan
     dtype: object
-    >>> ser.astype(str).to_numpy()
-    array(['a', 'nan'], dtype=object)
+    >>> ser.astype("str").to_numpy()
+    array(['1.5', 'nan'], dtype=object)

 Note how ``NaN`` (``np.nan``) was converted to the string ``"nan"``. This was
 not the intended behavior, and it was inconsistent with how other dtypes handled
 missing values.

-With pandas 3, this behavior has been fixed, and now ``astype(str)`` is an alias
-for ``astype("str")``, i.e. casting to the new string dtype, which will preserve
-the missing values:
+With pandas 3, this behavior has been fixed, and now ``astype("str")`` will cast
+to the new string dtype, which preserves the missing values:

 .. code-block:: python

     # NEW behavior in pandas 3
     >>> pd.options.future.infer_string = True
-    >>> ser = pd.Series(["a", np.nan], dtype=object)
-    >>> ser.astype(str)
-    0      a
+    >>> ser = pd.Series([1.5, np.nan])
+    >>> ser.astype("str")
+    0    1.5
     1    NaN
     dtype: str
-    >>> ser.astype(str).values
-    array(['a', nan], dtype=object)
+    >>> ser.astype("str").to_numpy()
+    array(['1.5', nan], dtype=object)

 If you want to preserve the old behaviour of converting every object to a
-string, you can use ``ser.map(str)`` instead.
+string, you can use ``ser.map(str)`` instead. If you want to do such a conversion
+while preserving the missing values in a way that works with both pandas 2.x and
+3.x, you can use ``ser.map(str, na_action="ignore")`` (for pandas 3.x only, you
+can do ``ser.astype("str")``).
+
+If you want to convert to object or string dtype for pandas 2.x and 3.x,
+respectively, without needing to stringify each individual element, you will
+have to use a conditional check on the pandas version.
+For example, to convert a categorical Series with string categories to its
+dense non-categorical version with object or string dtype:
+
+.. code-block:: python
+
+    >>> import pandas as pd
+    >>> ser = pd.Series(["a", np.nan], dtype="category")
+    >>> ser.astype(object if pd.__version__ < "3" else "str")


 ``prod()`` raising for string data
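
The last example added in this diff compares ``pd.__version__`` against ``"3"`` as a string. That works at the 2.x/3.x boundary, but it is a lexicographic comparison. The following is a minimal sketch of a numeric alternative. The helper name ``string_target_dtype`` is hypothetical (it is not part of pandas), and it assumes the version string starts with an integer major version followed by a dot.

```python
def string_target_dtype(pandas_version: str):
    """Hypothetical helper: choose the astype() target for a pandas version.

    Returns the built-in ``object`` for pandas < 3 and the ``"str"`` alias
    otherwise, looking only at the numeric major version.
    """
    # "2.3.1" -> 2, "3.0.0rc1" -> 3; assumes the string starts with "<int>."
    major = int(pandas_version.split(".", 1)[0])
    return object if major < 3 else "str"
```

With pandas imported as ``pd``, this could be used as ``ser.astype(string_target_dtype(pd.__version__))`` in place of the inline conditional.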