Speeding up Document Ranking with Rank-based Features
Accepted as a short paper at the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago de Chile, 9–13 August 2015 [1].
Abstract. Learning to Rank (LtR) is an effective machine learning methodology for inducing high-quality document ranking functions. Given a query and a candidate set of documents, where query-document pairs are represented by feature vectors, a machine-learned function is used to reorder this set. In this paper we propose a new family of rank-based features, which extend the original feature vector associated with each query-document pair. Indeed, since they are derived as a function of the query-document pair and the full set of candidate documents to score, rank-based features provide additional information to better rank documents and return the most relevant ones. We report a comprehensive evaluation showing that rank-based features allow us to achieve the desired effectiveness with ranking models being up to 3.5 times smaller than models not using them, with a scoring time reduction of up to 70%.
Rank-based Features
Table 1. Example training set: three queries (q1, q2, q3), each with four candidate documents, reporting the binary relevance label (rel), BM25, and PageRank (PR).

| q1  |      |     | q2  |      |     | q3  |      |     |
| rel | BM25 | PR  | rel | BM25 | PR  | rel | BM25 | PR  |
| 1   | .80  | .20 | 1   | .60  | .50 | 1   | .65  | .45 |
| 1   | .75  | .15 | 1   | .60  | .47 | 1   | .67  | .40 |
| 0   | .65  | .05 | 1   | .50  | .45 | 0   | .60  | .35 |
| 0   | .65  | .05 | 0   | .45  | .40 | 0   | .40  | .15 |
To introduce the new set of rank-based features, let us consider the example in Table 1. The table illustrates a small training set of query-document feature vectors, made up of three queries with four candidate results each. For each document associated with a query, a binary relevance label rel and two well-known features, BM25 and PageRank (PR), are listed. During learning, a tree-based algorithm should find the rules that best separate relevant from irrelevant results. A simple decision stump (i.e., a tree with a single decision node and two leaves) is not sufficient in this case: no single threshold on BM25 or PR separates the relevant from the irrelevant documents across the three queries. A “minimal” classification tree with perfect accuracy is the following:
if BM25 ≥ .75 then 1 else
  if PR ≥ .45 then 1 else
    if BM25 ≥ .67 then 1 else 0
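As a sanity check, the following minimal Python sketch (not part of the original paper; the data layout and names are our own) encodes Table 1 and verifies that the tree above classifies every candidate correctly:

```python
# Table 1 as per-query lists of (rel, BM25, PR) tuples.
QUERIES = {
    "q1": [(1, .80, .20), (1, .75, .15), (0, .65, .05), (0, .65, .05)],
    "q2": [(1, .60, .50), (1, .60, .47), (1, .50, .45), (0, .45, .40)],
    "q3": [(1, .65, .45), (1, .67, .40), (0, .60, .35), (0, .40, .15)],
}

def baseline_tree(bm25, pr):
    """The three-node classification tree built on the original features only."""
    if bm25 >= .75:
        return 1
    if pr >= .45:
        return 1
    return 1 if bm25 >= .67 else 0

for qid, candidates in QUERIES.items():
    for rel, bm25, pr in candidates:
        assert baseline_tree(bm25, pr) == rel  # perfect accuracy on the toy data
```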
Simpler but still effective trees can be obtained if we enrich the feature set associated with each pair (q, c), c ∈ 𝒞q, with new rank-based features. In our toy example, an optimal decision tree that achieves perfect accuracy is the one that classifies as relevant the documents whose PR score is at most 0.05 below the best PR score in the same candidate set, that is:
if Dist-Max PR ≤ 0.05 then 1 else 0
where Dist-Max PR measures the difference between each value of PR and the largest value of PR in 𝒞q. This second classifier does not improve on the quality of the first one, but it is much simpler, and its evaluation requires much less time once rank-based features are available.
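Again as an illustrative sketch (reusing the hypothetical QUERIES dictionary from the snippet above, with a small tolerance to absorb floating-point noise), the single stump on Dist-Max PR reproduces the relevance labels in every query of Table 1:

```python
EPS = 1e-9  # tolerance: e.g. .20 - .15 evaluates slightly above .05 in floating point

def dist_max_pr(candidates):
    """Dist-Max PR: distance of each PR value from the largest PR in the candidate set."""
    max_pr = max(pr for _, _, pr in candidates)
    return [max_pr - pr for _, _, pr in candidates]

for qid, candidates in QUERIES.items():
    labels = [rel for rel, _, _ in candidates]
    predictions = [1 if d <= 0.05 + EPS else 0 for d in dist_max_pr(candidates)]
    assert predictions == labels  # the single stump separates relevant from irrelevant
```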
We propose four construction strategies to build new rank-based features for a given feature f ∈ ℱ, occurring in the feature vector representing each pair (q, c), where c ∈ 𝒞q (a minimal sketch of all four is given after the list):
- Rank f. Feature Rank f ∈ {1,2,…,|𝒞q|} corresponds to the rank of c after sorting 𝒞q in descending order of f.
- Rev-Rank f. The same as Rank f, but with 𝒞q sorted in ascending order of f.
- Dist-Min f. Feature Dist-Min f ∈ ℝ is given by the absolute value of the difference between the value of f and the smallest value of f in 𝒞q.
- Dist-Max f. The same as Dist-Min f, but we consider the difference with the largest value of f in 𝒞q.
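The following is a minimal sketch of how the four features could be computed for one feature f over a query's candidate set; the function names and the tie-breaking by candidate position are our own assumptions, not prescribed by the paper:

```python
def _positions(values, reverse):
    """1-based position of each candidate after sorting by the feature value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    positions = [0] * len(values)
    for pos, i in enumerate(order, start=1):
        positions[i] = pos
    return positions

def rank_based_features(values):
    """Rank-based features for the values of one feature f over the candidate set C_q."""
    rank = _positions(values, reverse=True)       # Rank f: 1 = largest value of f in C_q
    rev_rank = _positions(values, reverse=False)  # Rev-Rank f: 1 = smallest value of f in C_q
    f_min, f_max = min(values), max(values)
    dist_min = [abs(v - f_min) for v in values]   # Dist-Min f
    dist_max = [abs(v - f_max) for v in values]   # Dist-Max f
    return rank, rev_rank, dist_min, dist_max

# PR values of query q1 in Table 1:
rank, rev_rank, dist_min, dist_max = rank_based_features([.20, .15, .05, .05])
# rank     -> [1, 2, 3, 4]
# rev_rank -> [4, 3, 1, 2]        (ties broken by candidate position)
# dist_min ≈ [.15, .10, .00, .00]
# dist_max ≈ [.00, .05, .15, .15]
```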
We expect these new features to improve the scoring ability of the learned models, since they provide information about the whole candidate set 𝒞q, while the other features describe each single pair (q, c) with respect to the entire collection of indexed documents. We claim that they can capture the notions of relatively better and relatively worse documents within the current candidate set. It is worth noting that Rank f and Rev-Rank f are not mutually exclusive: if higher values of f are better, the former could promote good documents, while the latter should demote bad documents.
Efficiency of learned models
Figure 1 illustrates the quality, in terms of NDCG, of the λ-MART models trained on MSN/Fold 1 by using ℱ, ℱ+40, and ℱ+10 (respectively, the original feature set and the extended feature sets adding 40 or 10 rank-based features). We report the values of NDCG@50 obtained on the test set of MSN/Fold 1 as a function of the number of trees of the generated model. Note that the three curves stop at different numbers of trees: this is because the validation set is used to choose the number of trees in the final model.
The benefit of our rank-based features shows up very early in the learning process. First, we note that, at smaller ensemble sizes, the new feature sets ℱ+40 and ℱ+10 always produce models that are more effective than the one obtained with the original feature set ℱ. More importantly, reaching the maximum quality of the model trained on ℱ requires about three times as many trees as the models trained on either ℱ+40 or ℱ+10 need to obtain the same NDCG@50. This behaviour is very significant, since the number of trees directly impacts ranking efficiency: document scoring requires the traversal of all the trees of the model, and its cost is thus linearly proportional to their number.
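To make the cost argument concrete, here is a minimal sketch of additive-ensemble scoring under a generic pointer-based tree representation (our own illustration, not the implementation evaluated in the paper): each document requires one root-to-leaf traversal per tree, so scoring time grows linearly with the number of trees.

```python
class Node:
    """A node of a regression tree; a leaf stores only an output value."""
    def __init__(self, feature=None, threshold=None, left=None, right=None, value=0.0):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right
        self.value = value

def score(trees, x):
    """Score a feature vector x: one root-to-leaf traversal per tree, summed up."""
    total = 0.0
    for root in trees:
        node = root
        while node.left is not None:  # descend until a leaf is reached
            node = node.left if x[node.feature] <= node.threshold else node.right
        total += node.value
    return total
```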