Recently I’ve been researching search relevance, a problem specific to search. Search relevance means that the content shown to a user must be sufficiently related to the query the user entered. For example, a search for “KFC” shouldn’t return “McDonald’s” content.
In feed scenarios, users have no specific expectation about content, so feed recommendation algorithms can exploit signals such as browsing history and recently popular content, or explore new interests. In search scenarios, by contrast, a user-initiated query carries strong intent. The returned content must match that expectation; otherwise the search fails, which hurts search retention (LT). From the user’s perspective, if a platform’s search works well, they should find the desired content on the first page, so browsing depth per PV is much lower in search than in feed.
These scenario differences lead to different optimization objectives for search and feed. For example, duration isn’t the most important metric when measuring search LT. From a ranking perspective, the added relevance constraint means that fewer candidates are available for ranking under a given query than in feed. In addition, ranking formulas often include relevance factors to meet relevance objectives, which interferes with maximizing the original objective (such as revenue for ads).
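To make the two effects concrete, here is a minimal sketch, not any production formula. It assumes a hypothetical `Candidate` with an `ecpm` field standing in for the original objective and a `relevance` estimate in [0, 1]. The threshold `min_relevance` and the exponent `alpha` are illustrative parameters, and the multiplicative form is only one of many possible ways to blend relevance into a ranking score.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    ecpm: float       # hypothetical original objective, e.g. expected ad revenue
    relevance: float  # hypothetical relevance estimate in [0, 1]


def rank(candidates, min_relevance=0.5, alpha=1.0):
    """Rank candidates under a relevance constraint (illustrative only).

    1. Candidates below `min_relevance` are dropped, shrinking the candidate set.
    2. The remaining ones are sorted by ecpm * relevance ** alpha, so the
       relevance factor can reorder results away from pure ecpm order.
    """
    eligible = [c for c in candidates if c.relevance >= min_relevance]
    return sorted(eligible, key=lambda c: c.ecpm * c.relevance ** alpha, reverse=True)


candidates = [
    Candidate("a", ecpm=10.0, relevance=0.3),
    Candidate("b", ecpm=6.0, relevance=0.9),
    Candidate("c", ecpm=8.0, relevance=0.6),
]

# With the relevance constraint: "a" is filtered out, and "b" outranks "c".
constrained = [c.doc_id for c in rank(candidates)]

# With no threshold and alpha=0, the score reduces to pure ecpm order.
unconstrained = [c.doc_id for c in rank(candidates, min_relevance=0.0, alpha=0.0)]
```

In this toy setup, the candidate with the highest `ecpm` never reaches the ranking stage, and the order of the remaining two is reversed relative to pure `ecpm` order. Both effects trade the original objective for relevance.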
The relevance constraint on ranking is why many ranking iterations that work well in feed are suboptimal or ineffective in search. This is related to the phenomenon raised in “Why are there few search system tech articles but many recommendation system articles?”: an important reason is that, for a given query, the shortage of relevant candidates limits the search space available to ranking, whereas the marginal benefit of ranking should increase with the number of candidates.
This article discusses approaches to solving search relevance. Roughly, relevance involves two parts: relevance modeling, and the mechanism for applying the relevance model’s estimates. This article attempts to discuss both in detail.