Reading Notes: Modeling Delayed Feedback in Display Advertising

In computational advertising, conversions are delayed: a user may convert some time after clicking, and deeper conversion funnels tend to have longer delays. Training CVR or deep-CVR models therefore involves a trade-off: (1) if samples are sent to the model too early, events that will eventually convert but have not yet done so are treated as negative examples, so the model underestimates CVR; (2) if samples are sent too late, after every sample has waited out a sufficiently long time window, the model cannot be updated in a timely way.

It is therefore necessary to model the conversion feedback delay. The paper “Modeling Delayed Feedback in Display Advertising” from Criteo provides a solution. The main idea: samples whose conversion has not been observed yet are not treated directly as negatives; instead, the gradient they contribute is scaled according to how much time has elapsed since the click. The paper validates the method on Criteo’s real data. The way it moves from problem formulation to solution is also worth studying.

Why Model Delay

A topic closely related to conversion feedback is attribution—assigning conversions to specific clicks. The paper uses common last-click attribution with a fixed 30-day attribution window.

Modeling feedback delay often starts with analyzing the delay distribution, which varies significantly across different advertisers and industries. The paper’s statistics (Figure 1) show about 35% of conversions are reported within one hour, 50% within 24 hours.

Additionally, the paper mentions new campaigns: many new campaigns are created daily (Figure 2). If samples for new campaigns aren’t fed to the model quickly, the model won’t estimate well for these new campaigns.

[Figures 1 and 2: distribution of conversion delays; new campaigns created per day]

These two figures correspond to the two reasons for modeling delayed conversion given earlier. Figure 1 shows that if labels are assigned without waiting, events that will eventually convert but have not yet done so are treated as negative examples, causing underestimation. Figure 2 shows that waiting too long before feeding samples prevents timely model updates, which hurts new campaigns in particular.

Problem Formulation

Notation and meanings:

  • \(X\): Features
  • \(Y \in \{0, 1\}\): Whether conversion has occurred at current time
  • \(C \in \{0, 1\}\): Whether conversion will eventually occur
  • \(D\): True delay time for feedback
  • \(E\): Time elapsed so far

Current observation:

  • If conversion hasn’t been observed (\(Y=0\)), two possibilities:
      1. Conversion will never occur (\(C=0\))
      2. Conversion will occur, but not yet, because \(D>E\)
  • If conversion has been observed (\(Y=1\)), then \(C=1\) (\(Y=1\) is a sufficient condition for \(C=1\))
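The observation rules above can be sketched in a few lines of Python. This is only an illustration: the `Observation` type and the `observe` helper are hypothetical names, timestamps are plain floats in arbitrary units, and the attribution window is ignored.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    y: int                  # Y: 1 if a conversion has been observed so far
    elapsed: float          # E: time since the click
    delay: Optional[float]  # D: click-to-conversion delay, known only when y == 1


def observe(click_time: float, conversion_time: Optional[float], now: float) -> Observation:
    """Build the (Y, E, D) view of one click as seen at time `now`."""
    elapsed = now - click_time
    if conversion_time is not None and conversion_time <= now:
        return Observation(1, elapsed, conversion_time - click_time)
    # Either the user never converts (C = 0) or converts later (D > E);
    # the two cases cannot be told apart at this point.
    return Observation(0, elapsed, None)


# The same click, looked at before and after its conversion arrives.
early = observe(click_time=0.0, conversion_time=5.0, now=3.0)
late = observe(click_time=0.0, conversion_time=5.0, now=10.0)
print(early)
print(late)
```

The `early` view has \(Y=0\) even though \(C=1\), which is exactly the case a naive training pipeline would mislabel as a negative.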

The paper assumes that, given the features \(X\), the elapsed time \(E\) is independent of the pair \((C, D)\), that is, of whether a conversion eventually occurs and of how long it takes:

\[P(C,D|X,E) = P(C,D|X)\]

This assumption is reasonable: whether a conversion eventually occurs, and how long it takes, does not depend on how much time has elapsed at the moment we look at the sample.

The paper models the problem in two parts: first, a standard CVR prediction model (formula 1); second, a model of the feedback delay \(D\) using an exponential distribution (formula 2). Besides the exponential distribution, the Weibull, Gamma, and log-normal distributions are commonly used in survival analysis to model the time between events.

\[P(C=1|X=x)=p(x)=\frac{1}{1+\exp(-w_cx)} \tag{1}\]

\[P(D=d|X=x, C=1)=\lambda(x)\exp(-\lambda(x)d) \tag{2}\]

Here \(\lambda(x)\) is what survival analysis calls the hazard function; it represents the event rate (intensity). For an exponential distribution, a larger value means events occur more frequently, with shorter intervals between them, so the probability density function is steeper. To ensure \(\lambda(x) > 0\), let \(\lambda(x) = \exp(w_dx)\).
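Formulas (1) and (2) translate directly into code. The sketch below uses only the standard library; the function names and the example weights are made up for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def conversion_prob(w_c: Sequence[float], x: Sequence[float]) -> float:
    """Formula (1): p(x) = 1 / (1 + exp(-w_c . x))."""
    return 1.0 / (1.0 + math.exp(-dot(w_c, x)))


def hazard(w_d: Sequence[float], x: Sequence[float]) -> float:
    """lambda(x) = exp(w_d . x), positive for any weights."""
    return math.exp(dot(w_d, x))


def delay_density(w_d: Sequence[float], x: Sequence[float], d: float) -> float:
    """Formula (2): density of the delay D at d, given eventual conversion."""
    lam = hazard(w_d, x)
    return lam * math.exp(-lam * d)


# Hypothetical feature vector and weights.
x = [1.0, 0.5]
w_c = [-2.0, 1.0]
w_d = [-1.0, 0.4]
print(conversion_prob(w_c, x), hazard(w_d, x), delay_density(w_d, x, 2.0))
```

Parameterizing the rate as an exponential of a linear score keeps it positive without any constraint on \(w_d\), which is why the weights can be optimized freely.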

[Figure: probability density of the exponential distribution]

The exponential distribution models the time interval between events, while the Poisson distribution models the number of events in a time period. For details, see the Zhihu question “What does the exponential distribution formula mean?” (指数分布公式的含义是什么?).
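One consequence of an exponential delay is that the share of eventual conversions already visible after an elapsed time \(e\) is the exponential CDF, \(1 - \exp(-\lambda e)\). The sketch below evaluates it for a hypothetical rate corresponding to a mean delay of 24 hours; the rate and the `observed_fraction` name are assumptions for illustration, not values from the paper.

```python
import math


def observed_fraction(lam: float, elapsed: float) -> float:
    """P(D <= elapsed | C = 1) for an exponential delay with rate lam."""
    return 1.0 - math.exp(-lam * elapsed)


lam = 1.0 / 24.0  # hypothetical rate: mean delay of 24 hours
for hours in (1, 24, 24 * 7):
    print(hours, round(observed_fraction(lam, hours), 3))
```

Under this assumed rate only a small share of eventual conversions would be visible one hour after the click, which is why labeling unconverted samples as negatives too early biases CVR downward.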

With formulas (1) and (2), we can derive a CVR estimate that accounts for feedback delay. The sample likelihood is written separately for two cases: samples where conversion has been observed and samples where it has not.

Likelihood for Samples with Observed Conversion

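The probability of currently observing a conversion with delay \(d_i\) is derived as follows. An observed conversion means \(d_i \le e_i\), so the event \(Y=1, D=d_i\) is the same as \(C=1, D=d_i\); the independence assumption then removes \(E\):

\[ \begin{aligned} p_1&=p(Y=1, D=d_i|X = x_i, E=e_i) \\ &= p(C=1,D=d_i|X = x_i, E=e_i) \\ &= p(C=1,D=d_i|X = x_i) \\ &= p(D=d_i|X=x_i, C=1)\,p(C=1|X=x_i) \end{aligned} \]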
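Substituting formulas (1) and (2) into the last line gives \(\log p_1 = \log p(x_i) + \log\lambda(x_i) - \lambda(x_i)\,d_i\), where \(\log\lambda(x_i) = w_dx_i\). A minimal sketch of that log-likelihood term follows; the function names and numbers are illustrative only, and the numerically stable `log_sigmoid` helper is an implementation choice, not something the paper prescribes.

```python
import math
from typing import Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def log_sigmoid(z: float) -> float:
    """log(1 / (1 + exp(-z))), computed without overflow for large |z|."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_lik_observed(w_c: Sequence[float], w_d: Sequence[float],
                     x: Sequence[float], d: float) -> float:
    """log p_1 = log p(x) + log lambda(x) - lambda(x) * d."""
    z_d = dot(w_d, x)  # equals log lambda(x), since lambda(x) = exp(w_d . x)
    return log_sigmoid(dot(w_c, x)) + z_d - math.exp(z_d) * d


# Hypothetical sample: features, weights, and an observed delay of 2 time units.
x = [1.0, 0.5]
w_c = [-2.0, 1.0]
w_d = [-1.0, 0.4]
print(log_lik_observed(w_c, w_d, x, d=2.0))
```

Working in log space turns the product of the two factors into a sum, which is the form a gradient-based optimizer would use.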