@cem-anyscale
Contributor

Description

* The Concatenator preprocessor gained new fields between Ray 2.45 and 2.49.
* Ray Data does not currently support forward-compatible preprocessor serialization, so loading a Concatenator serialized with Ray 2.45 fails in newer versions.
* This PR lets legacy Concatenator state deserialize safely by assigning default values to missing fields such as `flatten`.
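A minimal sketch of the pattern the last bullet describes, written as plain Python rather than Ray's actual `Concatenator`. `ToyConcatenator` and `_NEW_FIELD_DEFAULTS` are hypothetical names, and the `False` default for `flatten` is an assumption. When pickle loads a plain object it allocates it with `__new__` (skipping `__init__`) and hands the saved state to `__setstate__`, so filling in missing keys there lets state written before `flatten` existed still produce a usable object.

```python
from typing import Any, Dict, List


class ToyConcatenator:
    """Hypothetical stand-in for a preprocessor whose fields changed across versions."""

    # Defaults for fields added after the legacy release (assumed values).
    _NEW_FIELD_DEFAULTS: Dict[str, Any] = {"flatten": False}

    def __init__(
        self,
        columns: List[str],
        output_column_name: str = "concat_out",
        flatten: bool = False,
    ) -> None:
        self.columns = columns
        self.output_column_name = output_column_name
        self.flatten = flatten

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Restore every field the serialized object actually contained...
        self.__dict__.update(state)
        # ...then fill in any field that older versions never wrote.
        for name, default in self._NEW_FIELD_DEFAULTS.items():
            if name not in state:
                setattr(self, name, default)


# Simulated legacy state: written before `flatten` existed.
legacy_state = {"columns": ["a", "b"], "output_column_name": "concat_out"}

# Mirror what pickle does on load: allocate without __init__, then restore state.
restored = ToyConcatenator.__new__(ToyConcatenator)
restored.__setstate__(legacy_state)
print(restored.flatten)  # False
```

Keeping the defaults in one class-level mapping means a future field only needs one new entry rather than another hand-written `if` branch.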

Signed-off-by: cem <cem@anyscale.com>
@cem-anyscale requested a review from a team as a code owner on October 30, 2025 17:11

@gemini-code-assist bot left a comment

Code Review

This pull request fixes a backward-compatibility issue when deserializing a `Concatenator` preprocessor saved with Ray 2.45. The `Concatenator` gained a new `flatten` field between Ray 2.45 and 2.49, and this PR assigns a default value to `flatten` when it is missing so that older serialized data loads safely in newer versions. The changes:

* add type hints to `__getstate__` and `__setstate__` in `preprocessor.py`;
* add a `__setstate__` to `concatenator.py` that handles the missing `flatten` attribute;
* add a test case to `test_concatenator.py` that verifies backward compatibility.
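The structure the review describes (type-hinted hooks on a base class, a `__setstate__` override in the subclass, and a backward-compatibility test) might look roughly like the sketch below. Everything here is hypothetical: `BasePreprocessor`, `ToyConcatenator`, and the test function names are illustrative rather than Ray's code, the `False` default for `flatten` is assumed, and the legacy state is simulated by a dict that lacks the `flatten` key.

```python
from typing import Any, Dict, List


class BasePreprocessor:
    """Hypothetical base class with type-hinted pickling hooks."""

    def __getstate__(self) -> Dict[str, Any]:
        return self.__dict__.copy()

    def __setstate__(self, state: Dict[str, Any]) -> None:
        self.__dict__.update(state)


class ToyConcatenator(BasePreprocessor):
    """Hypothetical subclass that tolerates state from an older version."""

    def __init__(self, columns: List[str], flatten: bool = False) -> None:
        self.columns = columns
        self.flatten = flatten

    def __setstate__(self, state: Dict[str, Any]) -> None:
        # Let the base class restore the shared fields first.
        super().__setstate__(state)
        # Legacy state has no `flatten`; fall back to the assumed default.
        if not hasattr(self, "flatten"):
            self.flatten = False


def test_loads_legacy_state_without_flatten() -> None:
    # Simulate state serialized by an older version: no "flatten" key.
    legacy_state: Dict[str, Any] = {"columns": ["a", "b"]}
    obj = ToyConcatenator.__new__(ToyConcatenator)
    obj.__setstate__(legacy_state)
    assert obj.columns == ["a", "b"]
    assert obj.flatten is False


def test_round_trip_keeps_new_field() -> None:
    # Current-version state must keep an explicitly set value.
    original = ToyConcatenator(["a"], flatten=True)
    obj = ToyConcatenator.__new__(ToyConcatenator)
    obj.__setstate__(original.__getstate__())
    assert obj.flatten is True


test_loads_legacy_state_without_flatten()
test_round_trip_keeps_new_field()
```

Calling `super().__setstate__` first keeps the base class responsible for restoring shared state, so the subclass only patches the fields it knows were added later.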

@ray-gardener bot added the data label (Ray Data-related issues) on Oct 30, 2025
@cem-anyscale added the go label (add ONLY when ready to merge, run all tests) on Oct 30, 2025
@github-actions

This pull request has been automatically marked as stale because it has not had any activity for 14 days. It will be closed in another 14 days if no further activity occurs.
Thank you for your contributions.

You can always ask for help on our discussion forum or Ray's public Slack channel.

If you'd like to keep this open, just leave any comment, and the stale label will be removed.

@github-actions bot added the stale label (The issue is stale. It will be closed within 7 days unless there are further conversation) on Nov 14, 2025
@github-actions

This pull request has been automatically closed because there has been no more activity in the 14 days since being marked stale.

Please feel free to reopen or open a new pull request if you'd still like this to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public Slack channel.

Thanks again for your contribution!

@github-actions bot closed this on Nov 28, 2025

Labels

* data: Ray Data-related issues
* go: add ONLY when ready to merge, run all tests
* stale: The issue is stale. It will be closed within 7 days unless there are further conversation

2 participants