I’m trying to optimise category diversity in search engine result pages.
Imagine that, for a search query, there are N search results arranged in descending order of relevance. Each result belongs to a category. Those N results are split into pages of K results per page.
My objective is to have high category diversity on the first pages and lower category diversity on the last pages. The last pages should still keep a minimum level of diversity, though: having only 2 or 3 categories on one of the last pages is not good. In other words, too many results from the same category on the last pages is not good.
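To make the objective concrete, here is a minimal sketch of how the per-page diversity could be measured. The function name `page_diversity` and the idea of summarising each page as a pair (number of distinct categories, largest count for a single category) are my own assumptions for illustration, not an established metric.

```python
from collections import Counter


def page_diversity(categories, k):
    """Summarise each page of k results.

    `categories` is the list of category labels in ranked order.
    Returns one (distinct_categories, max_same_category) pair per page.
    """
    stats = []
    for start in range(0, len(categories), k):
        counts = Counter(categories[start:start + k])
        stats.append((len(counts), max(counts.values())))
    return stats


# Hypothetical ranking: 6 results, 3 per page.
print(page_diversity(["a", "b", "c", "a", "a", "b"], 3))
```

With this summary, a "good" ranking would show the first number falling gradually from page to page while the second number never goes above some limit.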
I built a hardcoded algorithm along these lines: find the minimum number of pages P needed to keep the average number of results from the same category per page smaller than or equal to X. Then fill the first N-P pages with only 1 result per category, and fill the following pages with that average number of results from the same category per page.
I think there must be a better algorithm to rerank the results, with a smooth change in diversity from the first page to the last page and a limit on the maximum number of results from the same category on any page.
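To illustrate the kind of behaviour I mean, here is a rough sketch, not something I'm confident is the right approach. It is a greedy pass that fills each page in relevance order under a per-category cap, where the cap grows linearly from `min_cap` on the first page to `max_cap` on the last. The names `rerank`, `cap_for_page`, `min_cap` and `max_cap`, and the linear schedule itself, are assumptions made up for this example.

```python
import math
from collections import Counter


def cap_for_page(page, num_pages, min_cap, max_cap):
    """Per-category cap for a 0-based page, growing linearly (assumed schedule)."""
    if num_pages <= 1:
        return min_cap
    return min_cap + (page * (max_cap - min_cap)) // (num_pages - 1)


def rerank(results, k, min_cap=1, max_cap=3):
    """Greedy rerank. `results` is a list of (doc_id, category) pairs,
    already sorted by descending relevance."""
    remaining = list(enumerate(results))  # keep the original rank for ordering
    num_pages = math.ceil(len(remaining) / k)
    ranked = []
    for page in range(num_pages):
        cap = cap_for_page(page, num_pages, min_cap, max_cap)
        counts = Counter()
        picked, skipped = [], []
        for entry in remaining:
            category = entry[1][1]
            if len(picked) < k and counts[category] < cap:
                picked.append(entry)
                counts[category] += 1
            else:
                skipped.append(entry)
        # If the cap leaves the page short, relax it and top up with the
        # most relevant skipped results so every page is still full.
        shortfall = k - len(picked)
        picked.extend(skipped[:shortfall])
        skipped = skipped[shortfall:]
        picked.sort(key=lambda entry: entry[0])  # relevance order within the page
        ranked.extend(result for _, result in picked)
        remaining = skipped
    return ranked


# Hypothetical input: 6 results, 2 per page, cap growing from 1 to 2.
results = [("r1", "a"), ("r2", "a"), ("r3", "a"),
           ("r4", "b"), ("r5", "b"), ("r6", "c")]
print(rerank(results, k=2, min_cap=1, max_cap=2))
```

Note that the cap is only a soft limit here: when the remaining results cannot fill a page without breaking it, the sketch fills the page anyway. It also trades relevance for diversity greedily, page by page, so it says nothing about whether the overall ordering is optimal.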
Thanks