
yjoonjang

Description

This PR adds support for pylate-based ColBERT models. The existing implementation in main.py performs poorly, particularly in recall, with modern pylate-compatible models. The new main_pylate.py script integrates the pylate library correctly and improves Recall@10 substantially in the benchmark below: from 0.16 to 0.44 for native ColBERT, and from 0.00 to 0.10 for ColBERT + FDE.

Benchmark Results

The following results were obtained using:
Model: ayushexel/colbert-ModernBERT-base-1-neg-1-epoch-gooaq-1995000
Dataset: zeta-alpha-ai/NanoFiQA2018

Before (Current main.py)

=====================================================================================
                                    FINAL REPORT                                     
(Dataset: zeta-alpha-ai/NanoFiQA2018)
=====================================================================================
Retriever                 | Indexing Time (s)    | Avg Query Time (ms)    | Recall@10 
-------------------------------------------------------------------------------------
1. ColBERT (Native)       | 26.55                | 395.44                 | 0.1600    
2. ColBERT + FDE          | 261.37               | 111.83                 | 0.0000    
=====================================================================================

After (This PR's main_pylate.py)

=====================================================================================
                                    FINAL REPORT                                     
(Dataset: zeta-alpha-ai/NanoFiQA2018)
=====================================================================================
Retriever                 | Indexing Time (s)    | Avg Query Time (ms)    | Recall@10 
-------------------------------------------------------------------------------------
1. ColBERT (Native)       | 12.55                | 447.60                 | 0.4400    
2. ColBERT + FDE          | 188.83               | 146.80                 | 0.1000    
=====================================================================================
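
For readers unfamiliar with the Recall@10 column, the following is a minimal sketch of one common way to compute it: the fraction of each query's relevant documents that appear in its top 10 results, averaged over queries. The function name `recall_at_k` and the dictionary layout are hypothetical and chosen for illustration; the scripts in this PR may compute the metric differently (for example, as a hit rate).

```python
from typing import Dict, List, Set


def recall_at_k(
    ranked: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
    k: int = 10,
) -> float:
    """Mean fraction of each query's relevant documents found in its top-k results.

    `ranked` maps a query ID to document IDs ordered best-first.
    `relevant` maps a query ID to the set of document IDs judged relevant.
    Both layouts are assumptions made for this sketch.
    """
    per_query = []
    for query_id, relevant_docs in relevant.items():
        if not relevant_docs:
            # Queries without relevance judgments are skipped.
            continue
        top_k = set(ranked.get(query_id, [])[:k])
        per_query.append(len(top_k & relevant_docs) / len(relevant_docs))
    return sum(per_query) / len(per_query) if per_query else 0.0


# Toy data, not taken from NanoFiQA2018.
ranked = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]}
relevant = {"q1": {"d1", "d4"}, "q2": {"d5"}}
print(recall_at_k(ranked, relevant, k=2))
```

Under this definition, a score of 0.0000 means that no relevant document appeared in the top 10 for any query.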

Additional Notes

pylate has a known issue involving an incompatibility between MUVERA and the score normalization of models trained via distillation. As a result, models trained this way may perform poorly.
For more details, see the related GitHub issue: lightonai/pylate#142
