
retrieval_normalized_dcg should compute per-query average when given 2D inputs (IR-standard behavior) #3216

@rintaro121

Description

🐛 Bug

Currently, when a 2D tensor of shape [num_queries, num_documents] is passed to retrieval_normalized_dcg, the function flattens both preds and target and computes DCG/IDCG over the concatenated list.
In effect, all queries are treated as one large ranking problem.

In Information Retrieval (IR) and recommender systems, the standard practice for NDCG is:

  • Compute NDCG per query
  • Then take the macro average over queries
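For reference, the per-query computation and macro average above can be written as follows. This is a sketch that assumes linear gain, a log2 position discount and no top-k cutoff; for the binary labels used in this report that is consistent with the values printed below, though graded-relevance variants (e.g. exponential gain 2^rel - 1) also exist.

```latex
\mathrm{DCG}_q = \sum_{i=1}^{n_q} \frac{\mathrm{rel}_{q,(i)}}{\log_2(i+1)}, \qquad
\mathrm{NDCG}_q = \frac{\mathrm{DCG}_q}{\mathrm{IDCG}_q}, \qquad
\overline{\mathrm{NDCG}} = \frac{1}{Q} \sum_{q=1}^{Q} \mathrm{NDCG}_q
```

Here rel_{q,(i)} is the relevance of the document that query q's predictions place at rank i, n_q is the number of documents for query q, and IDCG_q is DCG_q under the ideal (relevance-sorted) ordering of that same query. Flattening instead computes a single DCG/IDCG ratio over all documents of all Q queries.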

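Flattening across queries changes the meaning of the metric: documents from different queries compete for the same rank positions, and a single IDCG is computed over the relevant documents of all queries. This can lead to inflated or misleading results.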

To Reproduce

from torchmetrics.functional.retrieval import retrieval_normalized_dcg
import torch

# Query 1
p1 = retrieval_normalized_dcg(torch.tensor([0.1, 0.2, 0.3]), torch.tensor([0, 1, 0]))
print(p1)  # tensor(0.6309)

# Query 2
p2 = retrieval_normalized_dcg(torch.tensor([0.8, 0.1, 0.05]), torch.tensor([1, 0, 0]))
print(p2)  # tensor(1.0000)

print("Mean per-query NDCG:", (p1 + p2) / 2)
# tensor(0.8155)

# Batched input (2D)
p_batch = retrieval_normalized_dcg(
    torch.tensor([[0.1, 0.2, 0.3], [0.8, 0.1, 0.05]]),
    torch.tensor([[0, 1, 0], [1, 0, 0]]),
)
print("Batch NDCG:", p_batch)
# tensor(0.9197) <-- Not the mean per-query value

Here, the batch value 0.9197 is different from the expected per-query average 0.8155 because the function flattens both queries before computing NDCG.
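Both numbers can be reproduced without torchmetrics in a small stdlib-only sketch. The helper names `ndcg`, `per_query_mean_ndcg` and `flattened_ndcg` are hypothetical (they are not torchmetrics APIs); the sketch assumes linear gain, a 1/log2(rank + 1) discount, no top-k cutoff, and a score of 0 for a query with no relevant documents.

```python
import math


def ndcg(preds, target):
    """NDCG of one ranked list (assumed: linear gain, log2 discount, no top-k)."""
    # Rank documents by descending score. Python's sort is stable, so tied
    # scores keep input order; this tie handling is a simplifying assumption.
    order = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
    # 0-based position i is discounted by 1 / log2(i + 2), i.e. rank r -> log2(r + 1).
    dcg = sum(target[j] / math.log2(i + 2) for i, j in enumerate(order))
    ideal = sorted(target, reverse=True)
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    # Assumed convention: a list with no relevant documents scores 0.
    return dcg / idcg if idcg > 0 else 0.0


def per_query_mean_ndcg(preds_2d, target_2d):
    """IR-standard behavior: score each query (row) separately, then macro-average."""
    scores = [ndcg(p, t) for p, t in zip(preds_2d, target_2d)]
    return sum(scores) / len(scores)


def flattened_ndcg(preds_2d, target_2d):
    """Behavior described in this issue: concatenate all rows into one ranking."""
    flat_preds = [x for row in preds_2d for x in row]
    flat_target = [x for row in target_2d for x in row]
    return ndcg(flat_preds, flat_target)


preds = [[0.1, 0.2, 0.3], [0.8, 0.1, 0.05]]
target = [[0, 1, 0], [1, 0, 0]]

print(round(per_query_mean_ndcg(preds, target), 4))  # 0.8155
print(round(flattened_ndcg(preds, target), 4))       # 0.9197
```

In the flattened case the merged DCG is 1 + 1/log2(4) = 1.5 and the merged IDCG is 1 + 1/log2(3) ≈ 1.6309, giving ≈ 0.9197. That ratio is not the mean of the per-query ratios, because neither query's DCG is normalized by its own IDCG any more.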

Environment

  • macOS 15.4.1 (Sequoia) on Intel MacBook Pro
  • Python 3.12.6
  • torch==2.2.0
  • torchmetrics==1.8.1

Metadata

Labels

bug / fix (Something isn't working), help wanted (Extra attention is needed)