Skip to content

SNOW-2174517: save_as_table doesn't use iceberg compatible types #3490

@ctc-andot

Description

@ctc-andot

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?

    Python 3.11.12

  2. What operating system and processor architecture are you using?

    Linux-3.10.0-1160.119.1.el7.x86_64-x86_64-with-glibc2.17

  3. What are the component versions in the environment (pip freeze)?

    snowflake-connector-python    3.15.0
    snowflake-snowpark-python     1.32.0
    pandas                        1.5.3
    
  4. What did you do?

    df = session.create_dataframe(
        [
            [1, {'a': 'b'}],
            [3, {'c': 'd'}],
        ],
        schema=StructType(
            [
             StructField('COL1', IntegerType()),
             StructField('COL2', MapType(StringType(), StringType(), structured=True))
            ]
        )
    )
    iceberg_config = {
        "external_volume": "ICEBERG_TEST",
        "catalog": "SNOWFLAKE",
        "base_location": "DEV_DB/PLAYGROUND/ICEBERG_POC",
        "storage_serialization_policy": "COMPATIBLE",
    }
    df.write.save_as_table(
        "DEV_DB.PLAYGROUND.DEBUG_TYPE_CONVERSION_TABLE",
        iceberg_config=iceberg_config,
        mode="overwrite"
    )
    

    This fails with the following error:

    snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01bd443f-0b0b-1083-001f-e12c598d0a0b: 091386 (42601): SQL Compilation error:
    Unsupported data type 'OBJECT' for iceberg tables.
    

    Inspecting the queries, before the save_as_table call, the df's query is:

    SELECT 
      "COL1",  
      CAST (parse_json("COL2") AS MAP(STRING, STRING)) AS "COL2" 
    FROM ( 
      SELECT 
        $1 AS "COL1", 
        $2 AS "COL2" 
      FROM VALUES 
        (1 :: INT, '{"a": "b"}' :: STRING), 
        (3 :: INT, '{"c": "d"}' :: STRING)
    )
    

    But when constructing the save_as_table call, the query is:

    CREATE  OR  REPLACE    ICEBERG  TABLE  DEV_DB.PLAYGROUND.DEBUG_TYPE_CONVERSION_TABLE("COL1" BIGINT, "COL2" OBJECT) ...
    

    (Note the OBJECT type in the query). This seems to happen because the df.schema[1] isn't kept as MapType(..., structured=True)
    but is instead converted to StructType later on. This then is converted to OBJECT by the type_utils.

  5. What did you expect to see?

    An iceberg table is created with the data provided in the DataFrame, and with the correct data types. Specificially
    for the MapType column, I expected the data to be stored as a MapType with the keys and values as strings.

  6. Can you set logging to DEBUG and collect the logs?

    If necessary I can add, but would need to scrub the logs of our internal information. I looked through and there is not
    much relevant type information in the logs.

Metadata

Metadata

Labels

bugSomething isn't workingstatus-triageIssue is under initial triage

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions