Labels: bug / fix, help wanted
Description
🐛 Bug
Currently, when passing a 2D tensor of shape `[num_queries, num_documents]` to `retrieval_normalized_dcg`, the function flattens both `preds` and `target` and computes DCG/IDCG on the concatenated list. This treats all queries as a single large ranking problem.
In Information Retrieval (IR) and recommender systems, the standard practice for NDCG is:
- Compute NDCG per query
- Then take the macro average over queries
Flattening across queries changes the interpretation of the metric and can lead to inflated or misleading results.
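To make the difference concrete, the per-query convention can be sketched in plain Python (stdlib only; `dcg` and `ndcg` here are illustrative helpers, not the torchmetrics implementation):

```python
import math

def dcg(rels):
    # DCG with log2 discount over an already-ranked relevance list
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(rels))

def ndcg(preds, target):
    # rank documents by predicted score, then normalize by the ideal DCG
    order = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
    return dcg([target[i] for i in order]) / dcg(sorted(target, reverse=True))

queries = [([0.1, 0.2, 0.3], [0, 1, 0]), ([0.8, 0.1, 0.05], [1, 0, 0])]

# macro average: NDCG per query, then mean over queries
macro = sum(ndcg(p, t) for p, t in queries) / len(queries)  # ~0.8155

# flattening: one big ranking over all documents from all queries
flat_p = [x for p, _ in queries for x in p]
flat_t = [x for _, t in queries for x in t]
flattened = ndcg(flat_p, flat_t)  # ~0.9197
```

The two conventions give genuinely different numbers on the same data, which is the discrepancy reproduced below.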
To Reproduce
import torch
from torchmetrics.functional.retrieval import retrieval_normalized_dcg

# Query 1
p1 = retrieval_normalized_dcg(torch.tensor([0.1, 0.2, 0.3]), torch.tensor([0, 1, 0]))
print(p1)  # tensor(0.6309)

# Query 2
p2 = retrieval_normalized_dcg(torch.tensor([0.8, 0.1, 0.05]), torch.tensor([1, 0, 0]))
print(p2)  # tensor(1.0000)

print("Mean per-query NDCG:", (p1 + p2) / 2)
# tensor(0.8155)

# Batched input (2D)
p_batch = retrieval_normalized_dcg(
    torch.tensor([[0.1, 0.2, 0.3], [0.8, 0.1, 0.05]]),
    torch.tensor([[0, 1, 0], [1, 0, 0]]),
)
print("Batch NDCG:", p_batch)
# tensor(0.9197)  <-- not the mean per-query value
Here, the batch value 0.9197 differs from the expected per-query average 0.8155 because the function flattens both queries into a single ranking before computing NDCG.
Environment
- macOS 15.4.1 (Sequoia) on Intel MacBook Pro
- Python 3.12.6
- torch==2.2.0
- torchmetrics==1.8.1