Exploiting CPU SIMD Extensions to Speed-up
Document Scoring with Tree Ensembles

Claudio Lucchese, ISTI–CNR, Pisa, Italy
Franco Maria Nardini, ISTI–CNR, Pisa, Italy
Salvatore Orlando, Univ. of Venice, Italy
Raffaele Perego, ISTI–CNR, Pisa, Italy
Nicola Tonellotto, ISTI–CNR, Pisa, Italy
Rossano Venturini, Univ. of Pisa, Italy

Mar. 31, 2016

Short paper accepted at SIGIR ’16: ACM Conference on Research and Development in Information Retrieval [1].

Abstract. Scoring documents with learning-to-rank (LtR) models based on large ensembles of regression trees is currently considered one of the most effective ways to rank the query results returned by large-scale Information Retrieval systems.

This paper investigates how the SIMD capabilities of modern CPUs can be used to evaluate regression tree ensembles efficiently. We propose V-QuickScorer (vQS), which exploits SIMD extensions to vectorize document scoring, i.e., to traverse the ensemble while evaluating multiple documents simultaneously. We provide a comprehensive evaluation of vQS against the state of the art on three publicly available datasets. Experiments show that vQS provides speed-ups of up to 3.2x.
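To make the idea of scoring several documents at once concrete, the following is a minimal sketch of batched ensemble traversal in portable C++. It is not the vQS algorithm and uses no SIMD intrinsics; it only illustrates the lock-step, feature-major access pattern that vectorized scoring relies on. The names `Tree`, `score_batch`, `kBatch`, and `kDepth`, the fixed tree depth, and the array layout are all assumptions made for this illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical batch width: 8 floats would fill one 256-bit register.
constexpr std::size_t kBatch = 8;
// Hypothetical fixed depth: every tree is a complete binary tree of depth 2.
constexpr std::size_t kDepth = 2;

// Array layout assumed for this sketch: internal nodes are numbered 0..2
// (root = 0, children of node n are 2n+1 and 2n+2), leaves are numbered 3..6.
struct Tree {
    std::array<std::size_t, 3> feature;  // feature tested at each internal node
    std::array<float, 3> threshold;      // go left if x[feature] <= threshold
    std::array<float, 4> leaf;           // leaf outputs, left to right
};

// features is feature-major: features[f][d] holds feature f of document d,
// so the values of one feature for the whole batch are contiguous in memory.
std::array<float, kBatch> score_batch(
        const std::vector<Tree>& ensemble,
        const std::vector<std::array<float, kBatch>>& features) {
    std::array<float, kBatch> score{};  // zero-initialized accumulators
    for (const Tree& t : ensemble) {
        std::array<std::size_t, kBatch> node{};  // every document starts at the root
        for (std::size_t level = 0; level < kDepth; ++level) {
            // All documents of the batch descend one level together.
            for (std::size_t d = 0; d < kBatch; ++d) {
                const std::size_t n = node[d];
                const bool go_right = features[t.feature[n]][d] > t.threshold[n];
                node[d] = 2 * n + 1 + (go_right ? 1 : 0);
            }
        }
        // After kDepth levels each node index lies in 3..6, i.e. it is a leaf.
        for (std::size_t d = 0; d < kBatch; ++d) {
            score[d] += t.leaf[node[d] - 3];
        }
    }
    return score;
}
```

Calling `score_batch` with an ensemble and a feature-major batch returns one accumulated score per document. A SIMD version would replace the inner per-document loops with vector comparisons and masked updates; how vQS does this is described in the paper [1] and in the linked source code, not in this sketch.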

The source code is available here: https://github.com/hpclab/vectorized-quickscorer.


[1] Claudio Lucchese, Franco Maria Nardini, Salvatore Orlando, Raffaele Perego, Nicola Tonellotto, and Rossano Venturini. Exploiting CPU SIMD extensions to speed-up document scoring with tree ensembles. In SIGIR ’16: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016.
