
missing_percent unexpected output after filtering all rows #2407

@migueldoblado

Description

Hi,

I’ve encountered an issue with the missing_percent check when a filter excludes all rows from the dataset. In this scenario, the check unexpectedly fails.

Here’s a minimal reproducible example:

from pyspark.sql import SparkSession
from soda.scan import Scan

spark = SparkSession.builder.appName("SodaScanTest").getOrCreate()

data = [
    (1, "Alice", 29),
    (2, "Bob", 25),
    (3, "Charlie", None),
]
columns = ["id", "name", "age"]
df = spark.createDataFrame(data, columns)
df.createOrReplaceTempView("people")

scan = Scan()
scan.set_scan_definition_name("soda_scan_test")
scan.set_data_source_name("spark_df")
scan.add_spark_session(spark)
scan.set_verbose(True)

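# The check-level filter below ("name = 'Diana'") matches no rows in the
# sample data, so the filtered row_count is 0.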
scan.add_sodacl_yaml_str("""
checks for people:
  - missing_percent(age):
      fail: when < 100
      filter: name = 'Diana'
""")

scan.execute()

if scan.has_check_fails():
    print(scan.get_logs_text())
    print("Scan failed!")
else:
    print("Scan succeeded!")

spark.stop()

Observed output:

INFO | 1/1 check FAILED:
INFO | people in spark_df
INFO | missing_percent(age) fail when < 100 [FAILED]
INFO | check_value: 0.0
INFO | row_count: 0
INFO | missing_count: 0
Scan failed!

Expected behavior:
If the filter excludes every row (row_count: 0), I would expect the check to pass: with no rows left to evaluate, missing_percent is effectively undefined (0 missing values out of 0 rows), yet it is reported as 0.0, which satisfies the fail condition (when < 100). Failing the check in this case seems unintuitive.

Is this the intended behavior? If not, could the check be adjusted to pass when all rows are filtered out?
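
For illustration only, here is a minimal sketch of the kind of zero-row guard I have in mind. The function below is hypothetical and is not Soda's actual implementation; it just shows the convention I would expect when the filter leaves no rows:

def missing_percent_check_value(missing_count: int, row_count: int):
    # Hypothetical helper, not Soda internals: when the check filter removes
    # every row there is nothing to evaluate, so report the metric as
    # undefined (None) and let the check pass / be skipped instead of
    # defaulting to 0.0, which currently trips "fail: when < 100".
    if row_count == 0:
        return None
    return 100.0 * missing_count / row_count

With that convention, the example above (missing_count: 0, row_count: 0) would be treated as passed or not evaluated rather than FAILED.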

Thank you!
