I’m trying to optimise category diversity in search engine result pages.

Imagine that for a search query there are N search results, arranged in descending order of relevance. Each result belongs to a category, and the N results are split into pages of K results each.

My objective is to have high category diversity on the first pages and lower category diversity on the last pages. The last pages should still keep a minimum level of diversity, though: having only 2 or 3 categories on them is not good. In other words, too many results from the same category on the last pages is undesirable.

I built a hardcoded algorithm along these lines: find the minimum number of pages P needed to keep the average number of results from the same category per page at or below X. Then fill all but the last P pages with only 1 result per category, and fill the last P pages with that average number of results per category.

I think there must be a better algorithm that reranks the results so that diversity changes smoothly from the first page to the last, while capping the number of results from the same category on any page.
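To make the two requirements measurable, here is a minimal Python sketch that slices a ranked list into pages of K and reports, for each page, the number of distinct categories and the largest number of results from any single category. Representing each result only by its category label, and the function name `page_stats`, are assumptions for illustration.

```python
from collections import Counter


def page_stats(categories, k):
    """For each page of k results, return (distinct categories, max results from one category).

    `categories` holds the category label of each result, in ranked order.
    """
    stats = []
    for start in range(0, len(categories), k):
        page = categories[start:start + k]
        counts = Counter(page)
        stats.append((len(counts), max(counts.values())))
    return stats


ranked = ["a", "a", "b", "a", "c", "b", "a", "a"]
print(page_stats(ranked, 4))  # [(2, 3), (3, 2)]
```

A reranking that meets the goal would keep the first number high on early pages and keep the second number at or below the chosen cap on every page.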



Here are some thoughts.

Look at the literature on recommender systems, of which search engines are a part. Diversity is a key aim for them (e.g. see Castells et al., Novelty and Diversity Metrics for Recommender Systems: Choice, Discovery and Relevance), and it is often measured in an information-theoretic fashion.
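As one illustrative information-theoretic measure (a common choice, not necessarily the one used in that paper), the Shannon entropy of the category distribution on a page is 0 when every result shares one category and reaches log2(K) when all K results come from different categories. A minimal Python sketch, with the function name `category_entropy` assumed:

```python
import math
from collections import Counter


def category_entropy(page_categories):
    """Shannon entropy (in bits) of the category distribution on one page."""
    n = len(page_categories)
    counts = Counter(page_categories)
    # p * log2(1/p) for each category's share p = c / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(category_entropy(["a", "b", "c", "d"]))  # 2.0
print(category_entropy(["a", "a", "b", "b"]))  # 1.0
```

Tracking this value page by page gives a single number whose smooth decrease from the first to the last page is what the question asks for.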

One simple idea is to use a weight function to choose which result to add to the page next, and to update it each time a result is chosen. For instance: $$ w_p(s)=\alpha \,R(s) + \beta(p)\sum_{s_i\in p} w(s_i,s) $$ where $p$ is the current page, $s$ is a candidate search result, $R$ is a relevance function, $\beta$ is a weight function of the current page (probably a decreasing one in your case, so that diversity matters less on later pages), the sum runs over the results $s_i$ already placed on page $p$, and $w$ is a pairwise weight that rewards diversity (e.g. $w(a,b)=1-I[a=b]$, where $I[a=b]$ is 1 if $a$ and $b$ belong to the same category and 0 otherwise). The algorithm might then look like this:

    for each page p:
        for i = 1 to K:
            compute w_p(s) for each result s not yet placed
            add argmax_s w_p(s) to page p and remove it from the pool

Alternatively, one could turn the weights into a probability distribution over the remaining results and draw each result stochastically from that distribution.
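Here is a minimal Python sketch of the greedy procedure with $w(a,b)=1-I[a=b]$, plus an optional stochastic mode. Several details are assumptions, not part of the answer: results are `(result_id, category, relevance)` tuples, the default $\beta(p)=1/(p+1)$ is just one decreasing choice, and the stochastic mode uses a softmax with a `temperature` parameter to turn scores into probabilities.

```python
import math
import random


def rerank(results, k, alpha=1.0, beta=lambda p: 1.0 / (p + 1),
           temperature=None, seed=0):
    """Fill pages of k results one at a time by scoring every remaining result.

    results: list of (result_id, category, relevance) tuples (assumed format).
    beta: page-dependent diversity weight (assumed default 1/(p+1), decreasing).
    temperature: None for greedy argmax; otherwise sample with probabilities
        proportional to exp(score / temperature) (assumed softmax choice).
    Returns a list of pages, each a list of result_ids.
    """
    rng = random.Random(seed)
    pool = list(results)
    pages = []
    p = 0
    while pool:
        page = []  # (result_id, category) pairs already placed on page p
        while pool and len(page) < k:
            # w_p(s) = alpha * R(s) + beta(p) * sum over placed s_i of w(s_i, s),
            # where w(a, b) = 1 if the categories differ and 0 otherwise.
            scores = [
                alpha * rel + beta(p) * sum(1 for _, c in page if c != cat)
                for _, cat, rel in pool
            ]
            if temperature is None:
                idx = max(range(len(pool)), key=scores.__getitem__)
            else:
                top = max(scores)  # subtract the max for numerical stability
                weights = [math.exp((s - top) / temperature) for s in scores]
                idx = rng.choices(range(len(pool)), weights=weights)[0]
            rid, cat, _ = pool.pop(idx)  # remove the chosen result from the pool
            page.append((rid, cat))
        pages.append([rid for rid, _ in page])
        p += 1
    return pages


ranked = [("r1", "a", 1.0), ("r2", "a", 0.9), ("r3", "a", 0.8),
          ("r4", "b", 0.7), ("r5", "b", 0.6), ("r6", "c", 0.5)]
print(rerank(ranked, k=3))  # [['r1', 'r4', 'r6'], ['r2', 'r5', 'r3']]
```

Each pick rescans the remaining pool, so the cost is roughly O(N²K), which is fine for a few hundred results. If a hard cap on results per category per page is also needed, one option is to skip candidates whose category has already reached the cap before taking the argmax.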