@millsks millsks commented Aug 23, 2023

Loading the pickled dataframes with the latest version of pandas raises the following error, because the internal pandas.core.indexes.numeric module was removed in pandas 2.0, which broke backwards compatibility with pickles written by pandas 1.x.

>>> import pickle
>>> with open('2018-04-01.pkl', 'rb') as fd:
...   df = pickle.load(fd)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'
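For context, here is a minimal sketch of why a removed module breaks unpickling: pickle stores the import path of each class rather than the class itself, so that path has to resolve again at load time. The `legacy_numeric` module and the toy `Int64Index` class are hypothetical stand-ins, not the real pandas internals.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module that exists in the "old" library version.
legacy = types.ModuleType("legacy_numeric")


class Int64Index:
    """Toy class standing in for an index type defined in the old module."""

    def __init__(self, values):
        self.values = list(values)


# Make the class look as if it lives in the stand-in module.
Int64Index.__module__ = "legacy_numeric"
Int64Index.__qualname__ = "Int64Index"
legacy.Int64Index = Int64Index
sys.modules["legacy_numeric"] = legacy

# The payload records the module path and class name, not the class body.
payload = pickle.dumps(Int64Index([1, 2, 3]))
module_path_is_stored = b"legacy_numeric" in payload

# Simulate upgrading to a release in which the module no longer exists.
del sys.modules["legacy_numeric"]

try:
    pickle.loads(payload)
    error_message = None
except ModuleNotFoundError as exc:
    error_message = str(exc)

print(error_message)
```

The same mechanism applies to any pickle that references an internal module path, which is why files written by one major version can fail to load in the next.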

To fix this issue, the pickled dataframes were rewritten using pandas==2.0.3. This should let end users read them with future versions of pandas.

>>> import pathlib
>>> import pandas as pd
>>> pd.__version__
'2.0.3'
>>> for f in pathlib.Path('.').glob('*.pkl'):
...   pd.read_pickle(f).to_pickle(f)
...
>>> with open('2018-04-01.pkl', 'rb') as fd:
...   df = pickle.load(fd)
...
>>> df
      TRANSACTION_ID         TX_DATETIME  ...  TERMINAL_ID_NB_TX_30DAY_WINDOW  TERMINAL_ID_RISK_30DAY_WINDOW
0                  0 2018-04-01 00:00:31  ...                             0.0                            0.0
1                  1 2018-04-01 00:02:10  ...                             0.0                            0.0
2                  2 2018-04-01 00:07:56  ...                             0.0                            0.0
3                  3 2018-04-01 00:09:29  ...                             0.0                            0.0
4                  4 2018-04-01 00:10:34  ...                             0.0                            0.0
...              ...                 ...  ...                             ...                            ...
9483            9483 2018-04-01 23:56:50  ...                             0.0                            0.0
9484            9484 2018-04-01 23:58:14  ...                             0.0                            0.0
9485            9485 2018-04-01 23:58:31  ...                             0.0                            0.0
9486            9486 2018-04-01 23:59:28  ...                             0.0                            0.0
9487            9487 2018-04-01 23:59:51  ...                             0.0                            0.0

[9488 rows x 23 columns]
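The rewrite step above could also be packaged as a small reusable function. This is only a sketch: the `rewrite_pickles` name, the temporary directory, and the three-row sample frame are made up for illustration. It presumably works on the old files because `pd.read_pickle` applies pandas' own compatibility handling for older pickles, which plain `pickle.load` does not.

```python
import pathlib
import pickle
import tempfile

import pandas as pd


def rewrite_pickles(directory):
    """Re-save every *.pkl file in `directory` with the installed pandas.

    Hypothetical helper: returns the names of the files it rewrote.
    """
    rewritten = []
    for path in sorted(pathlib.Path(directory).glob("*.pkl")):
        frame = pd.read_pickle(path)  # load through pandas, not plain pickle
        frame.to_pickle(path)         # overwrite in the current pandas format
        rewritten.append(path.name)
    return rewritten


# Demonstrate on a throwaway directory with a tiny made-up frame.
with tempfile.TemporaryDirectory() as tmp:
    sample = pd.DataFrame({"TRANSACTION_ID": [0, 1, 2]})
    target = pathlib.Path(tmp) / "sample.pkl"
    sample.to_pickle(target)

    rewritten = rewrite_pickles(tmp)

    # After the rewrite, the standard pickle module can load the file.
    with open(target, "rb") as fd:
        restored = pickle.load(fd)
```

Because the files are overwritten in place, it is worth running this on a copy or a clean git checkout so the originals can be recovered.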

Resolves #2

Signed-off-by: Kevin Mills <millsks@gmail.com>

Successfully merging this pull request may close these issues.

Migrate pickled dataframes to pandas >=2.0
