-
Notifications
You must be signed in to change notification settings - Fork 135
Description
Please answer these questions before submitting your issue. Thanks!
-
What version of Python are you using?
Python 3.11.12
-
What operating system and processor architecture are you using?
Linux-3.10.0-1160.119.1.el7.x86_64-x86_64-with-glibc2.17
-
What are the component versions in the environment (
pip freeze
)?snowflake-connector-python 3.15.0 snowflake-snowpark-python 1.32.0 pandas 1.5.3
-
What did you do?
df = session.create_dataframe( [ [1, {'a': 'b'}], [3, {'c': 'd'}], ], schema=StructType( [ StructField('COL1', IntegerType()), StructField('COL2', MapType(StringType(), StringType(), structured=True)) ] ) ) iceberg_config = { "external_volume": "ICEBERG_TEST", "catalog": "SNOWFLAKE", "base_location": "DEV_DB/PLAYGROUND/ICEBERG_POC", "storage_serialization_policy": "COMPATIBLE", } df.write.save_as_table( "DEV_DB.PLAYGROUND.DEBUG_TYPE_CONVERSION_TABLE", iceberg_config=iceberg_config, mode="overwrite" )
This fails with the following error:
snowflake.snowpark.exceptions.SnowparkSQLException: (1304): 01bd443f-0b0b-1083-001f-e12c598d0a0b: 091386 (42601): SQL Compilation error: Unsupported data type 'OBJECT' for iceberg tables.
Inspecting the queries, before the
save_as_table
call, the df's query is:SELECT "COL1", CAST (parse_json("COL2") AS MAP(STRING, STRING)) AS "COL2" FROM ( SELECT $1 AS "COL1", $2 AS "COL2" FROM VALUES (1 :: INT, '{"a": "b"}' :: STRING), (3 :: INT, '{"c": "d"}' :: STRING) )
But when constructing the
save_as_table
call, the query is:CREATE OR REPLACE ICEBERG TABLE DEV_DB.PLAYGROUND.DEBUG_TYPE_CONVERSION_TABLE("COL1" BIGINT, "COL2" OBJECT) ...
(Note the
OBJECT
type in the query). This seems to happen because thedf.schema[1]
isn't kept asMapType(..., structured=True)
but is instead converted toStructType
later on. This then is converted toOBJECT
by the type_utils. -
What did you expect to see?
An iceberg table is created with the data provided in the DataFrame, and with the correct data types. Specificially
for the MapType column, I expected the data to be stored as a MapType with the keys and values as strings. -
Can you set logging to DEBUG and collect the logs?
If necessary I can add, but would need to scrub the logs of our internal information. I looked through and there is not
much relevant type information in the logs.