Tianxiao Wei
Software engineer at MongoDB. I work mostly on vector search using Lucene.
Session
05-07
16:00
45min
When BM25 Scores Disagree: A Corpus-Independent Alternative
Tianxiao Wei
In distributed search, BM25 returns different results across nodes because IDF and average document length vary with each node's corpus state. StableTfl replaces these with a term-length rarity heuristic, eliminating all corpus dependency. On 22 BEIR datasets, it retains ~90% of BM25's NDCG@10 while guaranteeing identical rankings across nodes.
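The corpus dependency described above can be illustrated with a minimal sketch of textbook BM25 for a single query term, using a Lucene-style IDF. This is not the talk's StableTfl heuristic, whose formula is not given here. The documents, node statistics, and the names `bm25_term_score` and `rank` are hypothetical, and the gap between the two nodes' average document lengths is deliberately exaggerated so that the ranking flip is easy to see.

```python
import math

def bm25_term_score(tf, doc_len, n_docs, doc_freq, avg_doc_len, k1=1.2, b=0.75):
    """Textbook BM25 score for one query term in one document."""
    # IDF depends on corpus-wide counts (n_docs, doc_freq).
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalization depends on the corpus-wide average document length.
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm

# Two hypothetical documents that exist on both nodes.
docs = {
    "X": {"tf": 2, "doc_len": 50},   # short, few matches
    "Y": {"tf": 4, "doc_len": 400},  # long, more matches
}

# Hypothetical per-node corpus statistics (exaggerated for illustration).
node_a = {"n_docs": 1_000_000, "doc_freq": 5_000, "avg_doc_len": 100.0}
node_b = {"n_docs": 1_000_000, "doc_freq": 5_000, "avg_doc_len": 1000.0}

def rank(node_stats):
    """Return document ids sorted by descending BM25 score on one node."""
    scores = {
        doc_id: bm25_term_score(**doc, **node_stats)
        for doc_id, doc in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

print("node A ranking:", rank(node_a))
print("node B ranking:", rank(node_b))
```

The same two documents come back in opposite orders on the two nodes, even though only the average document length differs. A scoring function that uses no corpus-level statistics cannot exhibit this kind of disagreement.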
Main Stage