SNOW-2148589: Wrong results when Summing with Window Function #3459

@emzwys

Description

Please answer these questions before submitting your issue. Thanks!

  1. What version of Python are you using?
    Python 3.11.3 (main, Mar 1 2024, 15:38:32) [GCC 11.4.0]

  2. What are the Snowpark Python and pandas versions in the environment?

pandas==2.3.0
snowflake-snowpark-python==1.31.1

  3. What did you do?
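      def test__snowpark_bug(self) -> None:
          # Imports inlined so the snippet is self-contained.
          from snowflake.snowpark import Window
          from snowflake.snowpark import functions as spf

          col_a: str = "COL_A"
          col_b: str = "COL_B"
          value_col: str = "VAL"
          df = self.snowpark_client.session.create_dataframe(
              [
                  [1, 1, 1],
                  [2, 2, 1],
                  [2, 2, 1],
                  [2, 1, 1],
              ],
              [col_a, col_b, value_col]
          )
          # One window partitioned by COL_A only, one by both columns.
          window_a = Window.partition_by(col_a)
          window_both = Window.partition_by(col_b, col_a)

          # Add both windowed sums in a single with_columns call.
          df = df.with_columns(["over_a", "over_both"],
                              [spf.sum(value_col).over(window_a),
                               spf.sum(value_col).over(window_both)])
          df.show()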
  4. What did you expect to see?
    Output of the online test (expected):
------------------------------------------------------
|"COL_A"  |"COL_B"  |"VAL"  |"OVER_A"  |"OVER_BOTH"  |
------------------------------------------------------
|1        |1        |1      |1         |1            |
|2        |2        |1      |3         |2            |
|2        |2        |1      |3         |2            |
|2        |1        |1      |3         |1            |
------------------------------------------------------

Output of the local test (actual):

------------------------------------------------------
|"COL_A"  |"COL_B"  |"VAL"  |"OVER_A"  |"OVER_BOTH"  |
------------------------------------------------------
|1        |1        |1      |1.0       |1.0          |
|2        |2        |1      |3.0       |1.0          |
|2        |2        |1      |3.0       |2.0          |
|2        |1        |1      |3.0       |2.0          |
------------------------------------------------------

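The values for 'OVER_BOTH' are wrong in the local test. For example, the last row should be 1.0: the window partitions by both COL_A and COL_B, so the last row (COL_A=2, COL_B=1) is its own partition and has sum 1. Likewise, the second row should be 2.0, because both (COL_A=2, COL_B=2) rows fall into the same partition.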
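
As a cross-check that does not depend on Snowpark at all, the expected partition sums can be recomputed in plain Python. This is only a minimal sketch of what a sum over a partition should return for this data; `partition_sum` is a hypothetical helper written for this illustration, not part of Snowpark or its local testing framework.

```python
from collections import defaultdict

# Same rows as in the report: (COL_A, COL_B, VAL)
rows = [
    (1, 1, 1),
    (2, 2, 1),
    (2, 2, 1),
    (2, 1, 1),
]


def partition_sum(rows, key_indexes, value_index):
    """Return, for each row, the sum of the value column over that row's partition.

    Hypothetical helper for illustration only.
    """
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[value_index]
    # Look up each row's partition total, preserving the input row order.
    return [totals[tuple(row[i] for i in key_indexes)] for row in rows]


# Partition by COL_A only, then by (COL_B, COL_A) as in the report.
over_a = partition_sum(rows, key_indexes=(0,), value_index=2)
over_both = partition_sum(rows, key_indexes=(1, 0), value_index=2)

for row, a, both in zip(rows, over_a, over_both):
    print(*row, a, both)
```

The resulting OVER_BOTH values agree with the online test output and differ from the local test output in the second and fourth rows.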

Labels

bug (Something isn't working), local testing (Local Testing issues/PRs), status-triage_done (Initial triage done, will be further handled by the driver team)
