Post-Learning Optimization of Tree Ensembles for Efficient Ranking

Claudio Lucchese, ISTI–CNR, Pisa, Italy
Franco Maria Nardini, ISTI–CNR, Pisa, Italy
Salvatore Orlando, Univ. of Venice, Italy
Raffaele Perego, ISTI–CNR, Pisa, Italy
Fabrizio Silvestri, Yahoo Labs, London
Salvatore Trani, ISTI–CNR, Pisa, Italy

Mar. 31 2016

Short paper accepted at SIGIR ’16: ACM Conference on Research and Development in Information Retrieval [1].

Abstract. Learning to Rank (LtR) is the machine learning method of choice for producing high-quality document ranking functions from a ground truth of training examples. In practice, efficiency and effectiveness are intertwined, and trading off effectiveness to meet the efficiency constraints typical of large-scale systems is one of the most urgent issues. In this paper we propose a new framework, named CLEaVER, for optimizing machine-learned ranking models based on ensembles of regression trees. The goal is to improve efficiency at document scoring time without affecting quality. Since the scoring cost of an ensemble is linear in its size, CLEaVER first removes a subset of the trees in the ensemble and then fine-tunes the weights of the remaining trees according to any given quality measure. Experiments conducted on two publicly available LtR datasets show that CLEaVER is able to prune up to 80% of the trees and provides a speed-up of up to 2.6x without affecting the effectiveness of the model.
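
The two-step idea in the abstract (prune trees, then re-weight the survivors against a quality measure) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the pruning rule (keep the trees with the largest absolute weights), the greedy coordinate search over a fixed set of scaling factors, and all function and parameter names are assumptions made for illustration. The quality measure is passed in as a function, so any metric where higher is better could be plugged in.

```python
from typing import Callable, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # one row per document, one column per tree


def ensemble_scores(tree_outputs: Matrix, weights: Sequence[float],
                    kept: Sequence[int]) -> List[float]:
    """Score each document as the weighted sum of the kept trees' outputs."""
    return [sum(weights[t] * row[t] for t in kept) for row in tree_outputs]


def prune_lowest_weights(weights: Sequence[float],
                         keep_fraction: float) -> List[int]:
    """Assumed pruning heuristic: keep the trees with the largest |weight|."""
    n_keep = max(1, round(len(weights) * keep_fraction))
    ranked = sorted(range(len(weights)),
                    key=lambda t: abs(weights[t]), reverse=True)
    return sorted(ranked[:n_keep])


def fine_tune_weights(tree_outputs: Matrix, weights: Sequence[float],
                      kept: Sequence[int],
                      quality: Callable[[List[float]], float],
                      factors: Sequence[float] = (0.5, 0.75, 1.0, 1.25, 1.5),
                      passes: int = 2) -> Tuple[List[float], float]:
    """Assumed re-weighting step: greedy coordinate search.

    Each kept weight is rescaled by every candidate factor in turn, and a
    change is retained only if it strictly improves the quality measure.
    """
    tuned = list(weights)
    best = quality(ensemble_scores(tree_outputs, tuned, kept))
    for _ in range(passes):
        for t in kept:
            base = tuned[t]
            best_weight = base
            for factor in factors:
                tuned[t] = base * factor
                q = quality(ensemble_scores(tree_outputs, tuned, kept))
                if q > best:
                    best, best_weight = q, tuned[t]
            tuned[t] = best_weight
    return tuned, best


def negative_mse(labels: Sequence[float]) -> Callable[[List[float]], float]:
    """Toy quality measure (higher is better); a ranking metric could be used instead."""
    def quality(scores: List[float]) -> float:
        return -sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)
    return quality
```

Because scoring only iterates over the kept trees, the per-document cost in this sketch shrinks in proportion to the number of trees removed, which mirrors the linear-cost argument in the abstract.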

The source code is made available as part of QuickRank: http://quickrank.isti.cnr.it/.

References

[1] Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Fabrizio Silvestri, and Salvatore Trani. Post-learning optimization of tree ensembles for efficient ranking. In SIGIR ’16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016.
