Quality versus eﬃciency in document scoring with learning-to-rank models

Gabriele Capannini, Innovation Design och Teknik (IDT), Mälardalens högskola, Västeras, Sweden
Claudio Lucchese, HPC Lab., ISTI-CNR, Pisa, Italy
Franco Maria Nardini, HPC Lab., ISTI-CNR, Pisa, Italy
Salvatore Orlando, DAIS - Università Ca’ Foscari Venezia, Italy
Raﬀaele Perego, HPC Lab., ISTI-CNR, Pisa, Italy
Nicola Tonellotto, HPC Lab., ISTI-CNR, Pisa, Italy

May 18 2016

Journal paper accepted at Information Processing & Management [1].

Abstract. Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions are able to precisely predict a score for each of the documents, in turn exploited to eﬀectively rank them. Although the scoring eﬃciency of LtR models is critical in several applications - e.g., it directly impacts on response time and throughput of Web query processing – it has received relatively little attention so far.

The goal of this work is to experimentally investigate the scoring eﬃciency of LtR models along with their ranking quality. Speciﬁcally, we show that machine-learned ranking models exhibit a quality versus eﬃciency trade-oﬀ. For example, each family of LtR algorithms has tuning parameters that can inﬂuence both eﬀectiveness and eﬃciency, where higher ranking quality is generally obtained with more complex and expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more eﬀective than other algorithms that induce simpler models like linear combination of features.

We extensively analyze the quality versus eﬃciency trade-oﬀ of a wide spectrum of state-of-the-art LtR, and we propose a sound methodology to devise the most eﬀective ranker given a time budget. To guarantee reproducibility, we used publicly available datasets and we contribute an open source C++ framework providing optimized, multi-threaded implementations of the most eﬀective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the ﬁrst public-domain implementation of Oblivious Lambda-Mart (Ω_λ-MART), an algorithm that induces forests of oblivious regression trees.

We investigate how the diﬀerent training parameters impact on the quality versus eﬃciency trade-oﬀ, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments conducted show that there is not an overall best algorithm, but the optimal choice depends on the time budget.

The source code of algorithms’ implementations is made available as part of QuickRank: http://quickrank.isti.cnr.it/.

References

[1] Gabriele Capannini, Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raﬀaele Perego, and Nicola Tonellotto. Quality versus eﬃciency in document scoring with learning-to-rank models. Information Processing & Management, 2016.

Share on