Skip to content

Conversation

@wprzytula
Copy link
Collaborator

@wprzytula wprzytula commented Oct 23, 2025

As decided in #1452, this PR makes deserialization framework interpret tuple fields that are missing in the encoded data from DB as nulls. It also includes a regression test written by @Lorak-mmk.

Partly fixes: #1452

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • [ ] I have provided docstrings for the public items that I want to introduce.
  • [ ] I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.~~

@wprzytula wprzytula requested a review from Copilot October 23, 2025 17:14
@wprzytula wprzytula self-assigned this Oct 23, 2025
@wprzytula wprzytula added the bug Something isn't working label Oct 23, 2025
@wprzytula wprzytula added this to the 1.4.0 milestone Oct 23, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR enhances tuple deserialization to handle tuples with fewer elements than expected in the database representation, treating missing fields as nulls. This addresses a scenario where tuple columns are altered to add new fields, causing existing rows to have shorter encoded tuples than the metadata indicates.

Key changes:

  • Modified tuple deserialization logic to interpret UnexpectedEof errors as missing (null) tuple elements
  • Added regression test verifying behavior when a tuple column is altered to add fields
  • Enabled vector type integration test for ScyllaDB

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
scylla/tests/integration/types/cql_collections.rs Added test case test_cql_tuple_db_repr_shorter_than_metadata to verify tuple deserialization with missing fields
scylla-cql/src/deserialize/value.rs Enhanced tuple deserialization to handle UnexpectedEof by treating missing elements as null values

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@wprzytula wprzytula requested a review from Lorak-mmk October 23, 2025 17:14
@github-actions
Copy link

github-actions bot commented Oct 23, 2025

cargo semver-checks found no API-breaking changes in this PR.
Checked commit: 79ac7b1

@wprzytula wprzytula force-pushed the accept-shorter-encoded-tuples branch 4 times, most recently from 2376552 to 5d63ad7 Compare October 25, 2025 06:41
@adespawn
Copy link
Contributor

Did you ensure that nested tuples with missing values also work with those changes:

ex. tuple<tuple<int, string, int>, int, int> containing only ((1, "hello world"), 2)?

While I do not know if any sensible driver would create such value under normal circumstances, it is still possible to receive such value from the database. I assume this should be an easy question to answer, knowing how the deserialization behaves, and it shouldn't be nessesery to test this.

@Lorak-mmk
Copy link
Collaborator

and it shouldn't be nessesery to test this.

Writing a test should also not be very problematic, and its better than guessing.

@Lorak-mmk
Copy link
Collaborator

Lorak-mmk commented Oct 28, 2025

A wrapper like this might be useful for testing shorter tuples, as it allows to easily insert them without ALTER TABLE:

struct SerializeValueWithFakeType<T> {
    fake_type: ColumnType<'static>,
    value: T,
}

impl<T: SerializeValue> SerializeValue for SerializeValueWithFakeType<T> {
    fn serialize<'b>(
        &self,
        _typ: &ColumnType,
        writer: scylla_cql::serialize::CellWriter<'b>,
    ) -> Result<scylla::serialize::writers::WrittenCellProof<'b>, scylla::errors::SerializationError>
    {
        <T as SerializeValue>::serialize(&self.value, &self.fake_type, writer)
    }
}

impl<T> SerializeValueWithFakeType<T> {
    fn new(value: T, fake_type: ColumnType<'static>) -> Self {
        Self { fake_type, value }
    }
}

Example test (helper functions defined in cql_collections.rs):

#[tokio::test]
async fn test_shorter_tuples() {
    use ColumnType::*;
    use NativeType::*;
    setup_tracing();
    let session: Session = connect().await;

    let table_name: &str = "test_cql_tuple_tab";
    create_table(&session, table_name, "tuple<int, int, text>").await;

    // Rust tuple ()
    let tuple1: (i32, i32) = (1, 2);
    let tuple1_ser = SerializeValueWithFakeType::new(tuple1, Tuple(vec![Native(Int), Native(Int)]));
    let tuple1_deser = (1, 2, Option::<String>::None);
    insert_and_select(&session, table_name, &tuple1_ser, &tuple1_deser).await;

    session
        .ddl(format!("DROP KEYSPACE {}", session.get_keyspace().unwrap()))
        .await
        .unwrap();
}

When deserializing tuples, if the encoded data is shorter than expected
(i.e., it contains fewer elements than the tuple type specifies),
we should treat the missing elements as nulls instead of returning an
error. This is what other drivers (Python, Java, Node.js) do, and it
improves compatibility with ScyllaDB/Cassandra, which allow inserting
tuples with fewer elements than the defined type.

Fixes: scylladb#1452.
@wprzytula wprzytula force-pushed the accept-shorter-encoded-tuples branch from 5d63ad7 to 2579e40 Compare October 28, 2025 17:23
@wprzytula
Copy link
Collaborator Author

Rebased on main.

wprzytula and others added 3 commits October 28, 2025 19:10
After scylladb#1452 is fixed in a previous commit, now we add a regression test
to ensure that tuples encoded with fewer elements than in the metadata
are accepted and decoded correctly (missing elements are returned as
null). The test would fail before the fix.

Co-authored-by: Karol Baryła <karol.baryla@scylladb.com>
This helper struct allows serializing a value with a different
CQL type than its actual Rust type. This is useful for testing
scenarios where we want to force serializing a value to a CQL column
of a non-matching type (or, more specifically, a type that is not
accepted by the type-checking logic in the driver).

In the next commit, this will be used to insert tuples with fewer
elements than the CQL type expects, to later verify that reading
such tuples does not cause deserialization errors.

Co-authored-by: Karol Baryła <karol.baryla@scylladb.com>
This is similar to the existing test for shorter tuples, but:
1) it tests nested tuples to ensure that the logic works recursively,
2) it uses SerializeValueWithFakeType to create the shorter tuple
   representation, instead of using the `ALTER TABLE` trick.

Co-authored-by: Karol Baryła <karol.baryla@scylladb.com>
@wprzytula wprzytula force-pushed the accept-shorter-encoded-tuples branch from 2579e40 to 79ac7b1 Compare October 28, 2025 18:16
@wprzytula
Copy link
Collaborator Author

Added a new test for nested tuples.

@wprzytula wprzytula requested a review from adespawn October 28, 2025 18:16
Comment on lines +460 to +481

pub(crate) struct SerializeValueWithFakeType<'typ, T> {
fake_type: ColumnType<'typ>,
value: T,
}

impl<T: SerializeValue> SerializeValue for SerializeValueWithFakeType<'_, T> {
fn serialize<'b>(
&self,
_typ: &ColumnType,
writer: scylla_cql::serialize::CellWriter<'b>,
) -> Result<scylla::serialize::writers::WrittenCellProof<'b>, scylla::errors::SerializationError>
{
<T as SerializeValue>::serialize(&self.value, &self.fake_type, writer)
}
}

impl<'typ, T> SerializeValueWithFakeType<'typ, T> {
pub(crate) fn new(value: T, fake_type: ColumnType<'typ>) -> Self {
Self { fake_type, value }
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I posted this util before I knew about the serialization mechanism that you found (driver allowing tuples shorter than in db).
Do we need this util if the driver allows shorter serialization itself?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The util is useful in wider set of cases than just shorter tuples, so I'd keep it anyway.
  2. I don't want to depend on the lenient behaviour of CqlValue::Tuple, as I don't know if this behaviour is desirable or not. If not, we'll eventually alter it, and the tests would then break.

Copy link
Contributor

@adespawn adespawn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a bonus, I noticed that the integration test for vector type is not run for ScyllaDB even though ScyllaDB already support vector type, so I enabled it.

While I do not have any comments on the PR, I couldn't find any changes that enable those tests.

@Lorak-mmk
Copy link
Collaborator

As a bonus, I noticed that the integration test for vector type is not run for ScyllaDB even though ScyllaDB already support vector type, so I enabled it.

While I do not have any comments on the PR, I couldn't find any changes that enable those tests.

I think those were extracted to separate PR. @wprzytula please update PR description.

@wprzytula wprzytula merged commit ff31720 into scylladb:main Oct 29, 2025
12 checks passed
@wprzytula wprzytula deleted the accept-shorter-encoded-tuples branch October 29, 2025 15:05
@Lorak-mmk Lorak-mmk mentioned this pull request Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improper deserialization of tuples with fewer then expected elements

3 participants