From Bid Shading to Score Shading: Dual Optimization and Game Governance in Mixed Ranking Systems

In Ad Network scenarios, multiple DSPs (Demand Side Platforms) participate in auctions, typically under a First Price Auction (FPA) mechanism. To optimize performance, DSPs often employ Bid Shading techniques to maximize their ROI and revenue while maintaining a competitive level of inventory acquisition.

Bid Shading is a specialized technique developed for first-price auctions. Since the final clearing price equals the submitted bid (i.e., pay-as-you-bid), bidding one's true valuation often leads to "winner's curse" or overpayment, resulting in capital inefficiency. Consequently, Bid Shading leverages historical data to estimate win rates or the "market price" (the second-highest bid) to dynamically shade the bid. This ensures the final payment is lower than the original valuation but higher than the second-highest bid, effectively reducing costs and boosting ROI without sacrificing win probability.

In feed mixed ranking (MixRank) scenarios, various business lines (such as Ads, E-commerce, Live Streaming, Short Video, and Articles) provide a "PK Score" for ranking during each request. From a mechanism standpoint, this PK Score is functionally equivalent to a bid in an advertising system: the PK Score can be viewed as an "implicit bid" submitted by a business entity for an exposure opportunity.

Naturally, this raises a question: Can we adapt the principles of Bid Shading to systematically calibrate PK Scores and optimize overall mixed ranking utility? Theoretically, this is highly feasible. This article introduces Score Shading—a method of applying dynamic perturbations (shading) to PK Scores to identify the globally optimal "bidding point." By doing so, the system maintains competitiveness while maximizing platform-wide revenue, specifically addressing two core objectives:

Problem One: Enhancing the penetration rate (load growth) of specific content formats in high-value traffic by increasing their "participation-to-exposure" ratio.
Problem Two: Reducing the scale (i.e., the cost) of a specific format's PK Score while keeping the overall win rate (load) stable and within a reasonable range.

This article will first review Bid Shading modeling in Ad Networks, then generalize it to mixed ranking Score Shading. We will explore optimization modeling for both objectives, the Lagrangian dual solution, and the regulation mechanism for the pacing parameter λ. Finally, we will discuss critical challenges such as win-rate modeling and incentive compatibility.

Bid Shading Methodology

From the perspective of a DSP in an Ad Network, the Bid Shading problem is typically modeled as the following optimization task: maximizing total delivered value (spend) subject to a target profit-margin constraint, where the margin is defined as 1 - media payout / revenue.

\[ \begin{aligned} \max_{x_i} \quad & \sum_{i} C_i F(x_i) \\ \text{s.t.} \quad & \sum_{i} x_i F(x_i) \le K \sum_{i} C_i F(x_i) \end{aligned} \]

The notation is defined as follows:

  • \(C_i\): The intrinsic value of the \(i\)-th traffic unit (usually synonymous with valuation in Ad Networks).
  • \(x_i\): The bid price submitted by the DSP for the current request.
  • \(F(x_i)\): The win rate (probability of winning) given bid \(x_i\).
  • \(K\): The target profit margin constraint.

Using the method of Lagrange multipliers, we construct the following Lagrangian, where \(\lambda\) is the multiplier (\(\lambda > 0\)):

\[\mathcal{L}(x_i, \lambda) = \sum_i C_i F(x_i) - \lambda \left[ \sum_i x_i F(x_i) - K \sum_i C_i F(x_i) \right]\]

Rearranging the terms yields:

\[\mathcal{L}(x_i, \lambda) = \sum_i F(x_i) \left[ (1 + \lambda K)C_i - \lambda x_i \right]\]

The role of \(\lambda\) is to enforce the constraint. In production environments, \(\lambda\) is adjusted dynamically via a controller (such as a PID) so that the realized payout ratio (TAC, i.e., media payout / revenue) tracks the target \(K\). Within a specific time slice, however, \(\lambda\) remains constant. Thus, for each online request, the system selects the optimal bid \(x_i^*\) that maximizes the Lagrangian \(\mathcal{L}\):

\[x_i^*= \arg\max_{x_i}\;F(x_i)\left[(1+\lambda K)C_i - \lambda x_i\right]\]
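To make the per-request decision concrete, here is a minimal sketch that grid-searches the candidate bid maximizing \(F(x)\left[(1+\lambda K)C - \lambda x\right]\). The logistic win-rate curve and all parameter values are illustrative assumptions, not from the text:

```python
import math

def win_rate(x, mid=1.0, steep=4.0):
    # Illustrative logistic win-rate curve F(x): monotone increasing in the bid.
    return 1.0 / (1.0 + math.exp(-steep * (x - mid)))

def optimal_bid(C, lam, K):
    # Per-request decision: x* = argmax_x F(x) * [(1 + lam*K)*C - lam*x],
    # searched over a discrete grid of candidate bids in (0, 3].
    grid = [i / 100.0 for i in range(1, 301)]
    return max(grid, key=lambda x: win_rate(x) * ((1 + lam * K) * C - lam * x))

# A larger lambda (profit below target) pushes the chosen bid down.
assert optimal_bid(C=2.0, lam=3.0, K=0.2) < optimal_bid(C=2.0, lam=0.5, K=0.2)
```

The grid search only assumes \(F\) can be evaluated pointwise; a parametric \(F\) would allow the analytical solve discussed later instead.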

How should \(\lambda\) be regulated? We can further reformulate the objective as follows:

\[\mathcal{L} = \sum_i F(x_i) \left[C_i + \lambda(KC_i-x_i) \right]\]

If the actual TAC exceeds the target \(K\) (i.e., DSP profit is below expectations), \(\lambda\) is increased. In this state, choosing an excessively high bid \(x_i\) would penalize \(\mathcal{L}\), forcing the system to naturally select a lower bid \(x_i\) to restore the profit margin. Conversely, \(\lambda\) is decreased to enhance the DSP's competitiveness and volume acquisition.
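The regulation rule itself can be sketched as a simple proportional controller (a production PID would add integral and derivative terms; the gain below is an arbitrary illustrative value):

```python
def update_lambda(lam, realized_tac, target_K, gain=0.5, lam_min=1e-6):
    # Proportional controller: if realized TAC exceeds the target K
    # (profit below expectation), raise lambda to push bids down;
    # otherwise lower lambda to regain volume.
    lam = lam * (1.0 + gain * (realized_tac - target_K))
    return max(lam, lam_min)

lam = 1.0
lam = update_lambda(lam, realized_tac=0.30, target_K=0.20)  # overspending
assert lam > 1.0
lam = update_lambda(lam, realized_tac=0.10, target_K=0.20)  # underspending
```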

Score Shading Strategy

In mixed ranking, each business line's PK Score can be treated as its bid for a given request. Building upon the Bid Shading paradigm, we can model the following scenarios. First, we define several core variables:

  • \(V\): The value term—the business utility gained if a specific format wins the exposure, defined by the business entity.
  • \(s\): The PK Score, which serves as the "cost" or "shadow price" for winning the auction.
  • \(F(s)\): The win rate model, representing the probability of winning given a score \(s\).
  • \(C\): The business cost constraint (often a proxy for long-term user value like LT, which is difficult to optimize directly).

Using these symbols, we model the following two problems respectively:
- Problem One: On high-value traffic, increase the conversion of specific formats from "not shown" to "shown," boosting format penetration (load growth).
- Problem Two: While maintaining a stable overall win rate (load), reduce the scale or cost of a format's PK Score to keep it within an efficient range.

Problem One: Improving Penetration

This objective is defined as maximizing the expected utility of each auction while ensuring the aggregate cost (PK Score) remains within a specified budget or constraint.

\[ \begin{aligned} \max_{s_i} \quad & \sum_{i=1}^{n} V_i F(s_i) \\ \text{s.t.} \quad & \frac{1}{n}\sum_{i=1}^{n} s_i F(s_i) \le C \end{aligned} \]

The critical challenge here is defining the constraint \(C\). While Long-Term Value (LTV/LT) is the fundamental constraint, its long feedback loop and high volatility make it unsuitable for real-time control.

Furthermore, since mixed ranking systems often impose limits on the distribution of PK Scores (e.g., ensuring p90 or p99 values remain comparable across business lines), we use the winning PK Score as the direct constraint. Theoretically, mean, P90, and P99 each require individual constraints, but for simplicity, we model the mean constraint here.

Using the Lagrangian dual approach, we derive the following function:

\[\mathcal{L}(s_i,\lambda)=\sum_{i=1}^{n} V_i F(s_i) - \lambda \left(\sum_{i=1}^{n} s_i F(s_i) - nC\right)\]

Rearranging gives:

\[\mathcal{L}=\sum_{i=1}^{n} F(s_i) \left(V_i - \lambda s_i\right)+ \lambda nC\]

Since \(C\) is constant, it can be omitted. For the \(i\)-th request, we simply choose \(s_i^*\) to maximize the following objective, where \(\lambda\) acts as the pacing coefficient to keep the average PK Score within bounds (multiple constraints would imply multiple \(\lambda\) values):

\[s_i^*=\arg\max_{s_i}\;F(s_i)\left(V_i - \lambda s_i\right)\]

Problem Two: Cost Reduction

In this scenario, the goal is to minimize the PK Score while keeping the business utility constant. The problem is formalized as follows, where we minimize the PK Score (mean) subject to a win rate constraint \(T\) (a proxy for load).

\[ \begin{aligned} \min_{s_i} \quad & \frac{1}{n}\sum_{i=1}^{n} s_i F(s_i) \\ \text{s.t.} \quad & \frac{1}{n}\sum_{i=1}^{n} F(s_i) = T \end{aligned} \]

Through Lagrangian derivation, we obtain the dual function:

\[\mathcal{L}(s_i,\lambda)=\frac{1}{n}\sum_{i=1}^{n} s_i F(s_i)-\lambda \left(\frac{1}{n}\sum_{i=1}^{n} F(s_i) - T \right)\]

The decision goal is to select \(s_i^*\) for each request that minimizes the following, where \(\lambda\) is the pacing coefficient for the win rate:

\[s_i^* = \arg\min_{s_i}\;F(s_i) \cdot \left(s_i - \lambda\right)\]

Here, \(\lambda\) effectively functions as the "marginal value of a win" or a "price threshold." Analyzing \(F(s_i) \cdot (s_i - \lambda)\), two cases arise:

  1. If the traffic is "cheap" (\(s_i < \lambda\)): The term \((s_i - \lambda)\) is negative. To minimize this (make it more negative), we want \(F(s_i)\) to be as large as possible—i.e., bid higher to win the "bargain."
  2. If the traffic is "expensive" (\(s_i > \lambda\)): The term \((s_i - \lambda)\) is positive. To minimize this, we want \(F(s_i)\) to be as small as possible (approaching 0)—i.e., bid lower or abstain from the "loss-making" auction.

In practice, \(\lambda\) is also dynamically tuned via a PID controller. If volume is insufficient (\(T\) is not reached), \(\lambda\) is increased; if volume exceeds the target, \(\lambda\) is decreased.
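The two cases can be observed directly in a minimal sketch of the Problem Two decision rule, again with an illustrative logistic win-rate curve and grid search (all parameters are assumptions):

```python
import math

def win_rate(s):
    # Illustrative logistic win-rate curve; monotone increasing in the score.
    return 1.0 / (1.0 + math.exp(-4.0 * (s - 1.0)))

def shaded_score(lam):
    # Problem Two decision: s* = argmin_s F(s) * (s - lam),
    # searched over candidate scores in (0, 3].
    grid = [i / 100.0 for i in range(1, 301)]
    return min(grid, key=lambda s: win_rate(s) * (s - lam))

# Generous lambda ("cheap" traffic): bid high to win the bargain.
# Small lambda ("expensive" traffic): bid at the bottom, effectively abstaining.
assert shaded_score(lam=2.0) > shaded_score(lam=0.2)
```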

Key Technical Challenges

Q1: Win-Rate Modeling

Accurately modeling the win rate \(F(s)\) is the most critical component of this framework.

The most straightforward approach is to directly model the win probability given a bid, which can be implemented in two ways:

  • Method One: Log Replay. Construct <bid, win/loss> pairs from historical logs and train a binary classifier (similar to a CTR model). However, this requires careful handling to ensure \(F(s)\) remains monotonic with respect to \(s\).
  • Method Two: Curve Fitting. Perform parametric or non-parametric regression on historical statistics to better guarantee the monotonicity of \(F(s)\).
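As a quick illustration of Method Two, a monotone fit can be obtained with the Pool Adjacent Violators algorithm; the bucketed win rates below are made-up numbers:

```python
def isotonic_fit(win_rates):
    # Pool Adjacent Violators (PAV): project noisy empirical win rates onto a
    # monotone non-decreasing sequence, guaranteeing F(s) rises with s.
    blocks = []  # each block: [pooled mean, count]
    for y in win_rates:
        blocks.append([y, 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            y2, n2 = blocks.pop()
            y1, n1 = blocks.pop()
            blocks.append([(y1 * n1 + y2 * n2) / (n1 + n2), n1 + n2])
    # Expand pooled blocks back to one fitted value per input bucket.
    fitted = []
    for y, n in blocks:
        fitted.extend([y] * n)
    return fitted

# Empirical win rates per ascending bid bucket (illustrative numbers).
fit = isotonic_fit([0.10, 0.25, 0.20, 0.55, 0.50])
assert all(a <= b for a, b in zip(fit, fit[1:]))  # monotone non-decreasing
```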

A more sophisticated approach is Distribution Estimation—instead of modeling the win probability directly, we model the probability distribution of the highest competing score in the auction, similar to the Deep Landscape Forecasting (DLF) method.

The core difference is that the former is Point Estimation, while the latter is Distribution Estimation. Point estimation only provides \(\mathbb{P}(\text{win}=1 | s, \text{context})\). To determine "how the win rate changes if I bid \(s + 0.1\)," the model must re-run inference, which is computationally expensive. More importantly, it cannot capture the uncertainty of the competitive intensity.

Methods like Deep Landscape Forecasting, by contrast, predict the parameters of the probability distribution of the opponent's highest score (assuming, say, a Gaussian with parameters \(\mu, \sigma\)). Once the distribution function is available, the win rate at any bid follows directly from \(F(s) = \Phi(s; \mu, \sigma)\).

Here, \(\sigma\) has a clear physical meaning: it quantifies the uncertainty of the competition on the current request. If user-side historical features show that this user often watches live streaming (intense, unstable competition), \(\sigma\) grows. In the dual optimization logic, a large \(\sigma\) automatically triggers conservative bidding, preventing volume loss caused by prediction bias.

For point estimation, "a near miss" (opponent score 1.01, your bid 1.0) and "a crushing defeat" (opponent score 5.0, your bid 1.0) look identical to the model, whereas distribution estimation captures both. As a result, distribution estimation characterizes the "competition landscape" far more accurately than a binary classifier; in particular, when you shade scores downward, DLF can tell you more precisely at what point the lowered score runs into the dense region of competitors.

Accordingly, the need for distribution-based modeling such as DLF is strongest in the following situations:

1) Highly uneven score distributions: If competitor scores cluster heavily under specific features (e.g., evening peaks), DLF's descriptive power helps avoid "red ocean" competition.
2) High sensitivity to volume loss: If undershooting the load target due to aggressive shading is unacceptable, DLF's variance \(\sigma\) provides a "safety margin" that binary models cannot.
3) Compute constraints: If you cannot afford multi-point sampling online, the "infer once, solve analytically" nature of DLF is the only viable choice.

Regarding DLF implementation, refer to Deep Landscape Forecasting for Real-time Bidding Advertising. Key technical points include:

The core of DLF is that it does not model the win rate directly; instead, it models the probability distribution \(P(Z|\theta)\) of the opponent's highest score \(Z\). Suppose \(Z\) follows a log-normal distribution and the model outputs the parameters \(\theta = (\mu, \sigma)\). The modeling details are as follows:

1. Core Definitions

  • Probability Density Function (PDF) \(f(z | \theta)\): The density of the highest competing bid at \(z\).
  • Cumulative Distribution Function (CDF) \(F(s | \theta) = P(Z < s)\): The win rate at bid \(s\).
  • Survival Function \(S(s | \theta) = 1 - F(s | \theta) = P(Z > s)\): The probability that bid \(s\) fails to win.

2. Loss Function Derivation

In training, samples are typically Right-Censored—after a loss, we only know \(Z \ge s\). DLF utilizes a Negative Log-Likelihood (NLL) loss:

\[\text{Loss} = \sum_{i \in \text{win}} \underbrace{-\log f(Z_i | \theta_i)}_{\text{Regression on observed second-prices}} + \sum_{i \in \text{loss}} \underbrace{-\log S(s_i | \theta_i)}_{\text{Constraint on censored/lost bids}}\]

  • For winners: We observe the true highest competing score \(Z_i\) (the second price) and minimize the residual via the PDF.
  • For losers: We only know \(Z_i > s_i\). The survival function term forces the model to learn that the opponent's distribution is concentrated above the current bid.
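Under the log-normal assumption, both loss terms have closed forms. The sketch below evaluates the NLL for a toy batch, with hand-set \((\mu, \sigma)\) standing in for a network's per-sample outputs (all numbers are illustrative):

```python
import math

def lognormal_pdf(z, mu, sigma):
    # Density of the highest competing score Z under Log-Normal(mu, sigma).
    return math.exp(-((math.log(z) - mu) ** 2) / (2 * sigma ** 2)) \
        / (z * sigma * math.sqrt(2 * math.pi))

def lognormal_cdf(s, mu, sigma):
    # Win rate F(s) = P(Z < s), via the standard normal CDF.
    return 0.5 * (1 + math.erf((math.log(s) - mu) / (sigma * math.sqrt(2))))

def dlf_nll(samples):
    # samples: (observed Z for wins / own bid s for losses, mu, sigma, won)
    loss = 0.0
    for z_or_s, mu, sigma, won in samples:
        if won:
            loss += -math.log(lognormal_pdf(z_or_s, mu, sigma))      # PDF term
        else:
            loss += -math.log(1 - lognormal_cdf(z_or_s, mu, sigma))  # survival term
    return loss

samples = [
    (0.9, 0.0, 0.5, True),   # won: observed opponent's top score 0.9
    (1.2, 0.0, 0.5, False),  # lost at bid 1.2: only know Z >= 1.2
]
assert dlf_nll(samples) > 0.0
```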

3. Online Analytical Decision-Making

Since \(F(s)\) is parameterized as a known distribution (e.g., \(\Phi\)), we can solve for \(s_i^*\) analytically. For Problem One, the optimal \(s\) is found by taking the first derivative and setting it to zero:

\[\frac{\partial}{\partial s} [F(s)(V - \lambda s)] = 0\]

Simplifying yields:

\[F'(s) \cdot (V - \lambda s) - \lambda F(s) = 0 \implies \frac{F(s)}{F'(s)} + s = \frac{V}{\lambda}\]

The elegance of DLF lies in the fact that \(F(s)\) and \(F'(s)\) are no longer black boxes but concrete formulas with parameters \(\mu, \sigma\). For a Log-Normal distribution:

\(\Phi\) (CDF): \(F(s) = \Phi(\frac{\ln s - \mu}{\sigma})\)

\(\phi\) (PDF): \(F'(s) = \frac{1}{s\sigma} \phi(\frac{\ln s - \mu}{\sigma})\)

This eliminates the need for expensive discrete sampling or grid searches online, allowing the system to locate the optimal shaded score in microseconds.
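To make this concrete, here is a minimal sketch that solves the first-order condition \(F(s)/F'(s) + s = V/\lambda\) by bisection under the log-normal parameterization (the left-hand side is increasing in \(s\), so bisection applies); the \((V, \lambda, \mu, \sigma)\) values are illustrative:

```python
import math

def norm_cdf(u):
    return 0.5 * (1 + math.erf(u / math.sqrt(2)))

def norm_pdf(u):
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

def optimal_score(V, lam, mu, sigma, lo=1e-3, hi=50.0, iters=80):
    # Solve F(s)/F'(s) + s = V/lam. For Log-Normal, F(s) = Phi(u) and
    # F'(s) = phi(u)/(s*sigma) with u = (ln s - mu)/sigma, so
    # F/F' = s * sigma * Phi(u)/phi(u), which is increasing in s.
    target = V / lam
    def g(s):
        u = (math.log(s) - mu) / sigma
        return s * sigma * norm_cdf(u) / norm_pdf(u) + s - target
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_star = optimal_score(V=2.0, lam=1.0, mu=0.0, sigma=0.3)
assert s_star < 2.0  # the optimum always sits below the naive bid V/lam
```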

Q2: Feedback Loop Mitigation

When Score Shading is active, the observed win rate is measured on already-shaded scores. As bids fall, the competitive landscape appears harder to win, so the model revises \(F(s)\) downward, which in turn pushes bids lower still: a "death spiral" of plummeting bids.

The standard solution is an Epsilon-Greedy exploration mechanism: allocating a small percentage of traffic to an "exploration bucket" where scores remain unshaded. This provides the ground truth needed to collect unbiased competition data.
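A minimal sketch of such an exploration split, using deterministic hash bucketing (the 5% ratio and the MD5-based scheme are illustrative choices, not from the text):

```python
import hashlib

EXPLORE_RATIO = 0.05  # fraction of traffic that keeps the unshaded score

def serve_score(request_id: str, true_score: float, shaded_score: float) -> float:
    # Deterministic hash-based split: a request always lands in the same bucket.
    bucket = int(hashlib.md5(request_id.encode()).hexdigest(), 16) % 10_000
    if bucket < EXPLORE_RATIO * 10_000:
        return true_score    # exploration: unshaded, yields unbiased win/loss logs
    return shaded_score      # exploitation: shaded score

# Exploration traffic should hover near the configured ratio.
n = 10_000
explored = sum(serve_score(f"req-{i}", 1.0, 0.8) == 1.0 for i in range(n))
assert 0.03 < explored / n < 0.07
```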

Systemic Risks

From a single business line's perspective, Score Shading is a perfect local optimizer. However, from a platform-wide perspective, if every business line shades independently, it triggers a game-theoretic "race to the bottom," potentially destabilizing the entire distribution ecosystem.

Potential Pitfalls

If the platform allows every format (Short Video, Articles, E-commerce, Live Streaming) to run independent PID controllers for Score Shading, several issues arise:

(1) Score Deflation and the "Broken Window" Effect (Race to the Bottom)

Suppose Articles and Live Streaming compete for the same slot. Live Streaming shades its score from 1.0 to 0.8. Articles, seeing weaker competition, shades its score from 0.9 to 0.7 to win. Live Streaming then finds it can still win at 0.6...

This leads to a severe leftward shift (deflation) of the global ranking scores (MixRank Score). Many platforms have hard "experience protection thresholds" (e.g., filtering content with Score < 0.3). Score deflation can cause high-quality content to fall below these absolute thresholds, resulting in "empty slots" (frontend gaps) or a flood of low-quality fallback content.

(2) Oscillatory Instability via Multi-PID Coupling (Oscillation & Chaos)

In mixed ranking, traffic is often a zero-sum game. When Live Streaming's \(\lambda_1\) and Short Video's \(\lambda_2\) controllers act on the same traffic pool, they form a complex non-linear coupling. If Live Streaming slightly increases its score, Short Video's volume (Load) instantly drops, triggering a violent compensatory reaction (score spike) from the Short Video PID.

This results in high-frequency, pulse-like oscillations in online traffic distribution—where the feed might be entirely Live Streaming one minute and entirely Short Video the next. This instability destroys user experience and creates "compute tides" that can overload backend servers.

Governance Strategies

To mitigate these systemic issues, two primary strategies exist: establishing a global "Score Anchor" or implementing a unified auction mechanism (e.g., VCG) to internalize externalities.

  • Solution One: The "Global Anchor" Strategy

Without an anchor, Score Shading is like countries engaging in currency wars. Since the auction is "relative," absolute scores can collapse infinitely. An anchor acts as the "Gold Standard" or "USD" of the mixed ranking economy. The platform mandates that a dominant core format (e.g., organic recommendation for Short Videos or Articles, accounting for >70% of traffic) is strictly prohibited from shading. Its score must 100% honestly reflect its estimated business value (True Value). Alternatively, Ads can serve as an anchor as their eCPM has a clear physical meaning and is less prone to abrupt shifts.

By fixing this anchor, all other formats (Live, E-commerce) have a "physical floor" for their shading behavior. If they shade below the anchor's true value, they immediately lose volume. This creates a Price Floor that prevents systemic collapse and the "race to the bottom."

  • Solution Two: Unified Auction Mechanism (VCG) for Accounting

Although VCG is typically an ad-tech concept, it can be applied to all business lines to account for the "opportunity cost" one format imposes on the platform. This effectively eliminates the incentive for shading.

The core logic of the VCG mechanism is: business lines no longer pay according to their own "bid," but pay according to the "social cost" they cause to the system. In other words, if you win, you pay for the value lost by those you displaced.

Specifically, if Business A wins a slot, the "price" it pays equals: "the total value other participants would have gained if A were absent" minus "the total value other participants gain when A is present." For example:

Suppose there are 3 slots. With A, the optimal set is {A, B, C} with total value \(W = V_A + V_B + V_C\). The steps for VCG pricing are:

  1. Calculate total value \(W\).
  2. Simulate the system "without A." A new business D enters, and the set becomes {B, C, D}.
  3. A is charged the "Score Cost" \(P_A = (V_B^{new} + V_C^{new} + V_D) - (V_B^{old} + V_C^{old})\).
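The three steps can be sketched directly. For simplicity, the toy below assumes values are position-independent (so \(V^{new} = V^{old}\)), in which case A's charge reduces to the displaced candidate's value \(V_D\); all numbers are illustrative:

```python
def vcg_charge(winner, candidates, values, slots):
    # VCG: winner pays (others' total value without it) minus
    # (others' total value with it).
    def top_alloc(pool):
        # Greedy allocation: top-k candidates by value.
        return sorted(pool, key=lambda c: values[c], reverse=True)[:slots]
    with_w = top_alloc(candidates)
    without_w = top_alloc([c for c in candidates if c != winner])
    others_with = sum(values[c] for c in with_w if c != winner)
    others_without = sum(values[c] for c in without_w)
    return others_without - others_with

# With 3 slots the allocation is {A, B, C}; absent A, candidate D fills the
# freed slot, so A's charge equals V_D.
values = {"A": 5.0, "B": 4.0, "C": 3.0, "D": 2.0}
assert vcg_charge("A", list(values), values, slots=3) == 2.0
```

Note that the charge depends only on the other candidates' values, never on A's own reported score, which is exactly why shading yields no benefit under VCG.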

VCG is the gold standard for achieving Incentive Compatibility. In an incentive-compatible system, "telling the truth" (truthful bidding) is the dominant strategy—any attempt to shade or manipulate scores will harm the participant's own interests. VCG aligns individual and platform interests perfectly:

  1. If Business A overbids (Positive Shading), its cost is determined by those behind it. If its true value is lower than the clearing price, it incurs a loss.
  2. If Business A underbids (Negative Shading), it only reduces its chance of winning without reducing the cost it pays when it does win (since its price is independent of its own bid).

Under VCG, business lines realize that shading yields no extra benefit, and only truthful reporting of estimated value maximizes their surplus.

However, despite being mathematically perfect, VCG faces massive implementation hurdles in production.

First is Computational Complexity. VCG requires simulating the "counterfactual" for every winner. For \(N\) slots, this theoretically requires \(N+1\) full ranking passes. In high-performance recommendation engines, this latency is often prohibitive.

Second is Metric Alignment. VCG assumes \(V_B + V_C\) is additive. But is 1 minute of Live Stream duration truly equivalent to 1 minute of Short Video duration? When Ads (eCPM) and E-commerce (GMV) are added, the units are inherently misaligned. If the global value function \(f(V)\) is poorly defined, VCG will generate misleading charges, causing the ecosystem to tilt dangerously toward one format and collapse.

Consequently, VCG is typically reserved for mature, highly refined stages of platform growth where absolute fairness is the priority (and under which Score Shading is effectively disallowed). In earlier stages, Solution One (Global Anchor) is usually preferred for its simplicity and agility.

Summary

This article explores the transition of Bid Shading from the advertising domain to the feed mixed ranking landscape—Score Shading. By treating PK Scores as "implicit bids," we bridge the gap between auction theory and recommendation algorithms, providing a rigorous mathematical framework for format penetration and scale control.

The success of Score Shading rests on three pillars:

  • Dual Optimization Framework: Using Lagrangian Duality to transform global business constraints into real-time decisions via PID-controlled \(\lambda\).
  • Win-Rate Modeling: Moving beyond binary point estimation to Distribution Estimation (DLF), allowing the system to map the "competitive landscape" and solve for optimal bids analytically and efficiently.
  • Systemic Governance: Guarding against score deflation and oscillations through Global Anchors or VCG-based incentive compatibility to align individual format goals with platform-wide health.

As mixed ranking systems evolve, simple value comparisons are being replaced by sophisticated mechanism design. Score Shading is not just a tuning trick; it is a manifestation of platform governance through mathematical modeling. Future efforts will likely incorporate Reinforcement Learning (RL) to manage even longer-cycle constraints, such as long-term retention (LT), ensuring fairness and growth across a complex multi-format ecosystem.

