The long-tail problem is common in recommendation and advertising systems, mainly on the item side. There are many causes. The author’s understanding is that such systems contain a feedback loop: the training data is generated by the model’s own decisions and is then used to train the next model. Without external intervention, the Matthew effect naturally produces a severe head effect, in which a small fraction of items dominates the system.
For example, in recommendation systems, many videos and articles never get exposure and therefore never appear in the training set, while popular videos and articles rank high for many different users and are recommended again and again. In advertising systems, some campaigns spend very heavily while others cannot spend their budget at all. The result is a poor experience for users or advertisers, and such issues are often grouped under the label of ecosystem problems.
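The feedback loop described above can be illustrated with a toy simulation. This is only a minimal sketch under strong assumptions, not a model of any real system: every item starts with the same count, and each round one item is exposed with probability proportional to the clicks it has already collected. The function names `simulate_feedback_loop` and `head_share` and all parameters are hypothetical, chosen for illustration.

```python
import random


def simulate_feedback_loop(num_items=100, rounds=10_000, seed=0):
    """Toy rich-get-richer loop: exposure is proportional to past clicks."""
    rng = random.Random(seed)
    clicks = [1] * num_items  # assumption: every item starts with one pseudo-click
    for _ in range(rounds):
        # The "model" recommends in proportion to the clicks it has already seen.
        item = rng.choices(range(num_items), weights=clicks, k=1)[0]
        # The exposure is logged and becomes training signal for the next round.
        clicks[item] += 1
    return clicks


def head_share(clicks, top_fraction=0.2):
    """Fraction of all clicks held by the top `top_fraction` of items."""
    k = max(1, int(len(clicks) * top_fraction))
    top = sorted(clicks, reverse=True)[:k]
    return sum(top) / sum(clicks)


if __name__ == "__main__":
    clicks = simulate_feedback_loop()
    print(f"top 20% of items hold {head_share(clicks):.0%} of clicks")
```

Even though all items are identical in this sketch, the top 20% of items typically end up with far more than 20% of the clicks, purely because early random exposure is reinforced by the loop. Real systems add genuine quality differences and position bias on top of this, which the sketch does not model.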
Since the system’s natural dynamics produce a severe head effect (or Pareto effect) when left alone, can deliberately intervening in the system’s distribution solve the problem? The answer is yes, and most current methods do exactly this. Common approaches fall into two groups:
- Strategy level: design rules based on the characteristics of the system and the business, such as dedicated support that forces long-tail items to reach more users.
- Model level: the core idea is to help the model learn better representations of long-tail items. The root cause is that long-tail items have too few samples, so the model learns them poorly. Specific methods are detailed below.
This article mainly introduces model-level papers, because strategy-level methods usually depend on business-specific rules, while model-level methods are more general.