This research summary is part of our AI for Marketing series, which covers the latest AI and machine learning approaches to the following aspects of marketing automation:
- Attribution
- Optimization
- Personalization
- Analytics
- Content Generation: Images
- Content Generation: Videos
- Content Generation: Text
In this piece, we start by covering the important topic of marketing attribution and how AI approaches improve upon existing techniques.
Attribution is one of the key issues in marketing these days. If a customer is exposed to ads via multiple advertising channels and finally converts, how should we attribute this conversion? The answer to this question is crucial for optimal budget allocation during future advertising campaigns. One of the simplest approaches is to assign all credit to the last ad clicked before a conversion. However, this attribution mechanism neglects the effect of the ads before the last click. So what are the new state-of-the-art approaches to attribution in marketing?
We have summarized seven recently introduced approaches to multi-touch attribution. According to their authors, these methods allocate credit more reasonably than last-click rules, which in turn supports better conversion rate prediction and better-optimized advertising strategies.
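To make the last-click baseline concrete, here is a minimal sketch that compares it with a simple uniform split of credit. The journeys and channel names are hypothetical, and neither rule is taken from the papers below; they are just the heuristics that the multi-touch methods try to improve on.

```python
from collections import defaultdict

def last_click_attribution(paths):
    """Give 100% of each conversion's credit to the final touchpoint."""
    credit = defaultdict(float)
    for touchpoints, converted in paths:
        if converted and touchpoints:
            credit[touchpoints[-1]] += 1.0
    return dict(credit)

def uniform_attribution(paths):
    """Split each conversion's credit equally across all touchpoints."""
    credit = defaultdict(float)
    for touchpoints, converted in paths:
        if converted and touchpoints:
            share = 1.0 / len(touchpoints)
            for channel in touchpoints:
                credit[channel] += share
    return dict(credit)

# Hypothetical journeys: (ordered channel touchpoints, converted?)
paths = [
    (["display", "email", "search"], True),
    (["display", "search"], True),
    (["email"], False),
]
last_click = last_click_attribution(paths)
uniform = uniform_attribution(paths)
```

Under last click, search receives both conversions and display receives nothing, even though display opened both converting journeys.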
If you’d like to skip around, here are the papers we featured:
- Additional Multi-Touch Attribution for Online Advertising
- Revenue-based Attribution Modeling for Online Advertising
- Attribution Modeling Increases Efficiency of Bidding in Display Advertising
- Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising
- Deep Neural Net with Attention for Multi-channel Multi-touch Attribution
- Causally Driven Incremental Multi-Touch Attribution Using a Recurrent Neural Network
- Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising
Important Attribution Research Papers
1. Additional Multi-Touch Attribution for Online Advertising by Wendi Ji, Xiaoling Wang
Original Abstract
Multi-Touch Attribution studies the effects of various types of online advertisements on purchase conversions. It is a very important problem in computational advertising, as it allows marketers to assign credits for conversions to different advertising channels and optimize advertising campaigns. In this paper, we propose an additional multi-touch attribution model (AMTA) based on two obvious assumptions: (1) the effect of an ad exposure is fading with time and (2) the effects of ad exposures on the browsing path of a user are additive. AMTA borrows the techniques from survival analysis and uses the hazard rate to measure the influence of an ad exposure. In addition, we both take the conversion time and the intrinsic conversion rate of users into consideration. Experimental results on a large real-world advertising dataset illustrate that our proposed method is superior to state-of-the-art techniques in conversion rate prediction and the credit allocation based on AMTA is reasonable.
Our Summary
The paper introduces the Additional Multi-Touch Attribution model (AMTA), a novel approach to multi-touch attribution that borrows from survival analysis and uses the hazard rate to model the effect of an ad exposure on conversion. The method is based on two important assumptions: (1) the effect of an ad exposure fades with time and (2) the effects of multiple ad exposures can be added together. The AMTA approach considers both the conversion delay and the intrinsic conversion rate of users. Experiments on a real-world advertising dataset demonstrate that the AMTA model outperforms other state-of-the-art approaches in conversion prediction, which supports the accuracy of the suggested attribution principles.
What’s the core idea of this paper?
- The introduced approach to multi-touch attribution, called the Additional Multi-Touch Attribution model (AMTA), is based on two assumptions:
- the effect of an advertisement fades with time;
- the effects of multiple advertisement exposures on a later conversion are additive.
- The AMTA model combines survival analysis with an exciting point process:
- the exciting point process alone only considers the occurrence of an event (i.e., a conversion) and doesn’t take into account the users who have not converted yet;
- the conversion rate is extremely low in online advertising, so it is necessary to consider the users who haven’t converted yet but might do so in the future thanks to the advertisements they were exposed to during the analyzed time period;
- survival analysis makes it possible to include censored data (cases where the conversion has not occurred yet) in the model.
- Thus, the proposed model considers both:
- the conversion delay;
- the intrinsic conversion rate of a user.
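The two assumptions can be illustrated with a small hazard-rate sketch. The exponential kernel, the parameter names (`strength`, `decay`) and the numbers are our own assumptions for illustration, not the paper's exact parameterization. Each exposure adds a fading term to the hazard, and the probability of converting by a given horizon follows from the cumulative hazard, which is also what lets a survival model use users who have not converted yet.

```python
import math

def hazard_rate(t, exposures, strength, decay):
    """Additive hazard at time t: every earlier exposure contributes an
    exponentially fading term (assumed kernel, for illustration only)."""
    return sum(
        strength[ch] * decay[ch] * math.exp(-decay[ch] * (t - t_i))
        for t_i, ch in exposures
        if t_i <= t
    )

def conversion_probability(horizon, exposures, strength, decay):
    """P(convert by `horizon`) = 1 - exp(-cumulative hazard up to horizon)."""
    cumulative = sum(
        strength[ch] * (1.0 - math.exp(-decay[ch] * (horizon - t_i)))
        for t_i, ch in exposures
        if t_i <= horizon
    )
    return 1.0 - math.exp(-cumulative)

# Hypothetical user: (exposure time in days, channel)
exposures = [(0.0, "display"), (2.0, "search")]
strength = {"display": 0.05, "search": 0.2}   # assumed channel effects
decay = {"display": 0.5, "search": 1.0}       # assumed fading speeds

p7 = conversion_probability(7.0, exposures, strength, decay)
p30 = conversion_probability(30.0, exposures, strength, decay)
```

In a fitted model of this kind, the per-channel terms of the hazard at conversion time are a natural basis for splitting credit between channels.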
What’s the key achievement?
- Introducing a rational and interpretable attribution model.
- Outperforming existing state-of-the-art approaches in predicting conversion within the next 30, 15, and 7 days:
- This is an important indicator as it is usually assumed that more accurate attribution models are likely to generate more accurate conversion predictions.
What does the AI community think?
- The paper was presented at AAAI 2017, one of the key conferences on artificial intelligence.
What are possible business applications?
- Companies can use the suggested approach to get more granular and interpretable insight into the true effects that advertisement exposures have on conversions.
- More accurate attribution will result in better-optimized budget allocation and more successful advertising campaigns.
2. Revenue-based Attribution Modeling for Online Advertising by Kaifeng Zhao, Seyed Hanif Mahboobi, Saeed Bagheri
Original Abstract
This paper examines and proposes several attribution modeling methods that quantify how revenue should be attributed to online advertising inputs. We adopt and further develop relative importance method, which is based on regression models that have been extensively studied and utilized to investigate the relationship between advertising efforts and market reaction (revenue). Relative importance method aims at decomposing and allocating marginal contributions to the coefficient of determination (R²) of regression models as attribution values. In particular, we adopt two alternative submethods to perform this decomposition: dominance analysis and relative weight analysis. Moreover, we demonstrate an extension of the decomposition methods from standard linear model to additive model. We claim that our new approaches are more flexible and accurate in modeling the underlying relationship and calculating the attribution values. We use simulation examples to demonstrate the superior performance of our new approaches over traditional methods. We further illustrate the value of our proposed approaches using a real advertising campaign dataset.
Our Summary
The GroupM research team addresses the attribution problem by decomposing and allocating marginal contributions to the coefficient of determination (R²) of regression models. They experiment with two alternative approaches to decomposition: dominance analysis and relative weight analysis. Simulation examples and experiments with a real advertising campaign dataset demonstrate that the proposed data-driven attribution models capture the interactions among different advertising channels and provide consistent and robust results across resamples.
What’s the core idea of this paper?
- The paper investigates the relationship between revenue and advertising efforts through regression models. With this approach, the research team quantifies how revenue should be attributed to each of the channels by using relative importance methods to decompose the coefficient of determination (R²).
- The researchers experiment with two relative importance methods:
- dominance analysis examines all possible subsets of variables to measure the marginal R², and thus becomes computationally expensive when the number of channels is very high;
- relative weight analysis works through generating a set of auxiliary uncorrelated variables based on singular value decomposition, calculating attribution values for those auxiliary variables, and consequently transforming these variables back to the originals.
- Considering the inherent advantages and limitations of the proposed approaches, we can say that:
- dominance analysis is feasible for the settings with larger volumes (higher number of users) and fewer channels, while
- relative weight analysis is more practical for the settings with smaller volumes (lower number of users) and more channels.
- This research also extends the relative importance methods from linear regressions to additive models to allow for non-linear relationships between advertising channels and revenue.
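As a rough illustration of dominance analysis on a plain linear model, the sketch below averages each channel's marginal gain in R² over all subsets of the other channels (the general dominance weights, which coincide with a Shapley decomposition of R²). The simulated data and function names are ours, and the additive-model extension from the paper is not covered.

```python
from itertools import combinations
from math import comb

import numpy as np

def r_squared(X, y, columns):
    """R^2 of an OLS fit (with intercept) on the given column subset."""
    if not columns:
        return 0.0
    A = np.column_stack([np.ones(len(y)), X[:, list(columns)]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    total = y - y.mean()
    return 1.0 - (residual @ residual) / (total @ total)

def general_dominance(X, y):
    """Average each channel's marginal R^2 over all subsets of the others."""
    n = X.shape[1]
    weights = np.zeros(n)
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            # Average within each subset size, then across the n sizes.
            for subset in combinations(others, size):
                gain = r_squared(X, y, subset + (j,)) - r_squared(X, y, subset)
                weights[j] += gain / (comb(n - 1, size) * n)
    return weights

# Simulated spend for three hypothetical channels; the third has no effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=500)
weights = general_dominance(X, y)
```

The weights sum to the full model's R², so they can be read as shares of explained variance. The loop visits every subset of the other channels for each channel, which is why the cost grows exponentially with the number of channels.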
What’s the key achievement?
- The experiments with simulated and real-world datasets demonstrate that:
- both methods, dominance analysis and relative weight analysis, consistently suggest very close relative importance values;
- additive models usually provide a better fit for the data and higher predictive accuracy.
What are future research areas?
- Extending the suggested approaches to a richer pool of regression models that marketing researchers may find useful. For example:
- dominance analysis can be applied to partially linear additive models, varying coefficient models, and others;
- relative weight analysis works well with models that are linear in coefficients, like marketing mix models, vector autoregressive models, and others.
What are possible business applications?
- Accurate and consistent attribution values produced by the proposed revenue-based approaches can help advertising practitioners make smarter decisions about advertising strategy and planning.
3. Attribution Modeling Increases Efficiency of Bidding in Display Advertising by Eustache Diemert, Julien Meynet, Pierre Galland, Damien Lefortier
Original Abstract
Predicting click and conversion probabilities when bidding on ad exchanges is at the core of the programmatic advertising industry. Two separated lines of previous works respectively address i) the prediction of user conversion probability and ii) the attribution of these conversions to advertising events (such as clicks) after the fact. We argue that attribution modeling improves the efficiency of the bidding policy in the context of performance advertising.
Firstly we explain the inefficiency of the standard bidding policy with respect to attribution. Secondly, we learn and utilize an attribution model in the bidder itself and show how it modifies the average bid after a click. Finally, we produce evidence of the effectiveness of the proposed method on both offline and online experiments with data spanning several weeks of real traffic from Criteo, a leader in performance advertising.
Our Summary
The Criteo research team offers a simple yet effective way to incorporate a proper attribution model into a bidding strategy. The proposed bidder behaves in the opposite way to a baseline bidder that credits only the last click. The baseline bidder treats a click as a strong user engagement signal and raises its bids after a click so as not to lose the last click. The proposed bidder, in contrast, drastically lowers the bid right after a click: the click already implies a high probability of attribution if a conversion follows, so a better strategy may be to reduce post-click exposure and spend the freed budget on users for whom display advertising has a higher impact. Both offline and online experiments demonstrate that this approach increases efficiency for all parties involved: the advertiser, the advertising platform, and the user.
What’s the core idea of this paper?
- The researchers claim that the most efficient bidding strategy should learn attribution from the data and then incorporate it into the bidder itself.
- An attribution-aware bidder will drastically decrease the bid values right after a click because this recent click already gives a high probability of getting the attribution in case of conversion, and the better strategy is to invest the money elsewhere.
- The bid values then rise again as more time passes after the user’s last click without a conversion.
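The bidding behavior described above can be sketched as follows. We assume, purely for illustration, that the credit already secured by the previous click decays exponentially with the time since that click; the `decay` value and the function names are hypothetical and are not the parameters learned in the paper.

```python
import math

def last_click_bid(value, p_conversion):
    """Baseline: expected value of the conversion, ignoring attribution."""
    return value * p_conversion

def attribution_aware_bid(value, p_conversion, hours_since_click, decay=0.1):
    """Scale the bid by the credit a new click could still add.

    Assumes (for illustration) that the previous click keeps a share
    exp(-decay * hours_since_click) of the attribution.
    """
    if hours_since_click is None:  # no earlier click: nothing to discount
        return last_click_bid(value, p_conversion)
    already_secured = math.exp(-decay * hours_since_click)
    return value * p_conversion * (1.0 - already_secured)

# Bids for a hypothetical user at increasing delays after a click.
bids = [attribution_aware_bid(50.0, 0.02, h) for h in (1, 6, 24, 72)]
```

Right after a click the bid is close to zero, and it climbs back toward the baseline bid as the earlier click ages.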
What’s the key achievement?
- The evidence from online and offline experiments demonstrates that the attribution-aware bidding policy introduced in the paper increases efficiency for all parties involved in display advertising:
- an advertiser gets an increased return on investment (ROI) as measured by the number of conversions generated for a given budget;
- an advertising platform reduces its costs related to bidding for ads with low attribution probability;
- a user is exposed to fewer ads from given advertisers after a click on their ads and gets more diverse advertisements.
What does the AI community think?
- The paper was presented at the 2017 AdKDD & Target Ad workshop. The workshop brings together researchers and practitioners in computational advertising.
What are future research areas?
- Studying alternative attribution models and their impact.
- Researching the problem of efficient bidding in display advertising in:
- the reinforcement learning setting where the state would incorporate the attribution evolution;
- the online learning framework where attributed value is learned across repeated auctions.
What are possible business applications?
- If implemented in advertising platforms, this attribution-aware bidder system is likely to increase the efficiency of bidding strategy and thus, benefit both advertisers and advertising platforms.
4. Learning Multi-touch Conversion Attribution with Dual-attention Mechanisms for Online Advertising by Kan Ren, Yuchen Fang, Weinan Zhang, Shuhao Liu, Jiajun Li, Ya Zhang, Yong Yu, Jun Wang
Original Abstract
In online advertising, the Internet users may be exposed to a sequence of different ad campaigns, i.e., display ads, search, or referrals from multiple channels, before led up to any final sales conversion and transaction. For both campaigners and publishers, it is fundamentally critical to estimate the contribution from ad campaign touch-points during the customer journey (conversion funnel) and assign the right credit to the right ad exposure accordingly. However, the existing research on the multi-touch attribution problem lacks a principled way of utilizing the users’ pre-conversion actions (i.e., clicks), and quite often fails to model the sequential patterns among the touch points from a user’s behavior data. To make it worse, the current industry practice is merely employing a set of arbitrary rules as the attribution model, e.g., the popular last-touch model assigns 100% credit to the final touch-point regardless of actual attributions. In this paper, we propose a Dual-attention Recurrent Neural Network (DARNN) for the multi-touch attribution problem. It learns the attribution values through an attention mechanism directly from the conversion estimation objective. To achieve this, we utilize sequence-to-sequence prediction for user clicks, and combine both post-view and post-click attribution patterns together for the final conversion estimation. To quantitatively benchmark attribution models, we also propose a novel yet practical attribution evaluation scheme through the proxy of budget allocation (under the estimated attributions) over ad channels. The experimental results on two real datasets demonstrate the significant performance gains of our attribution model against the state of the art.
Our Summary
The researchers introduce a Dual-attention Recurrent Neural Network model (DARNN) for learning conversion attribution. The idea behind this approach is to capture sequential user patterns with a recurrent neural network while also paying attention to both impression-level and click-level user actions. The experiments show that the DARNN method outperforms other state-of-the-art approaches, producing more accurate conversion estimates and better-optimized budget allocation.
What’s the core idea of this paper?
- A Dual-attention Recurrent Neural Network (DARNN) has two learning objectives:
- sequence-to-sequence modeling of the relationship between the impressions and the click actions;
- forecasting the probability of the user conversion with the attention learned from the sequential modeling.
- The architecture encompasses three parts:
- an encoder to model impression-level behavior;
- a decoder to predict click probability;
- a dual-attention mechanism for jointly modeling impression and click behavior and producing the final conversion estimation.
- Thus, the DARNN model applies the attention mechanism to the original touch points (i.e., impression features) as well as to the predicted click actions, combines both attentions and predicts the final conversion.
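The way two attention distributions can be blended into one set of attribution weights is sketched below. This is not the authors' implementation: the hidden states are random stand-ins for what a trained encoder and decoder would produce, plain dot-product attention is our simplification, and the `mix` weight is a hypothetical parameter.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = np.exp(scores - scores.max())
    return shifted / shifted.sum()

def attention_weights(hidden_states, query):
    """Dot-product attention: one weight per touchpoint."""
    return softmax(hidden_states @ query)

def dual_attention_attribution(impression_states, click_states,
                               impression_query, click_query, mix=0.5):
    """Blend impression-level and click-level attention into one credit vector."""
    impression_attn = attention_weights(impression_states, impression_query)
    click_attn = attention_weights(click_states, click_query)
    return (1.0 - mix) * impression_attn + mix * click_attn

rng = np.random.default_rng(0)
num_touchpoints, hidden_size = 4, 8
# Random stand-ins for the encoder/decoder states of a trained RNN.
impression_states = rng.normal(size=(num_touchpoints, hidden_size))
click_states = rng.normal(size=(num_touchpoints, hidden_size))
credit = dual_attention_attribution(
    impression_states, click_states,
    rng.normal(size=hidden_size), rng.normal(size=hidden_size), mix=0.3,
)
```

Because both attention vectors sum to one, the blended credit also sums to one across the touchpoints of a journey.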
What’s the key achievement?
- Introducing a model that learns the attribution from the final conversion estimation instead of heuristically assigning credits.
- Proposing a novel offline evaluation protocol for measuring the effectiveness of the attribution model based on advertising budget allocation and campaign data replay.
- Outperforming the previous state-of-the-art baselines in terms of:
- conversion estimation accuracy,
- cost-effectiveness.
What does the AI community think?
- This paper was presented at the International Conference on Information and Knowledge Management (CIKM 2018).
What are future research areas?
- Considering the cost of ad impressions in the attention mechanism to improve cost-effectiveness performance.
What are possible business applications?
- This research paper has two contributions that can improve the effectiveness of online advertising campaigns:
- a new attribution model that outperforms other approaches in terms of cost-effectiveness and accuracy of conversion estimations;
- a novel attribution evaluation scheme that can be used for comparing the cost-effectiveness of different attribution models and always choosing the best available option.
Where can you get implementation code?
- The authors provide a TensorFlow implementation of this research paper.
5. Deep Neural Net with Attention for Multi-channel Multi-touch Attribution by Ning Li, Sai Kumar Arava, Chen Dong, Zhenyu Yan, Abhishek Pani
Original Abstract
Customers are usually exposed to online digital advertisement channels, such as email marketing, display advertising, paid search engine marketing, along their way to purchase or subscribe products( aka. conversion). The marketers track all the customer journey data and try to measure the effectiveness of each advertising channel. The inference about the influence of each channel plays an important role in budget allocation and inventory pricing decisions. Several simplistic rule-based strategies and data-driven algorithmic strategies have been widely used in marketing field, but they do not address the issues, such as channel interaction, time dependency, user characteristics. In this paper, we propose a novel attribution algorithm based on deep learning to assess the impact of each advertising channel. We present Deep Neural Net With Attention multi-touch attribution model (DNAMTA) model in a supervised learning fashion of predicting if a series of events leads to conversion, and it leads us to have a deep understanding of the dynamic interaction effects between media channels. DNAMTA also incorporates user-context information, such as user demographics and behavior, as control variables to reduce the estimation biases of media effects. We used a computational experiment of large real world marketing dataset to demonstrate that our proposed model is superior to existing methods in both conversion prediction and media channel influence evaluation.
Our Summary
Ning Li from the University of Washington, together with an Adobe team, proposes a novel approach to solving the attribution problem with deep learning. They present the Deep Neural Net with Attention Multi-touch Attribution model (DNAMTA), which predicts whether a series of events (e.g., display impression, display click, email opening) leads to conversion. The method is based on the Long Short-Term Memory (LSTM) architecture with an attention mechanism. The experiments demonstrate that this approach outperforms several strong baselines with respect to conversion prediction.
What’s the core idea of this paper?
- Introducing a novel approach to evaluating the impact of each advertising channel on the user conversion. The proposed model, called Deep Neural Net with Attention Multi-touch Attribution (DNAMTA):
- is based on LSTM architecture;
- frames the problem as a classification task, i.e., each path is either positive (with conversion) or negative (no conversion);
- incorporates an attention mechanism to identify touchpoints that are more important to the conversion;
- includes a time decay attention layer to reflect the assumption that a touchpoint contribution decreases with time.
- Finally, the researchers also suggest taking into account some of the customer characteristics, including gender, age and other static information (i.e., control variables) as they may impact the conversion engagement. To this end, they propose a fusion model which is built on the original DNAMTA model by introducing another deep neural network for learning the control variables.
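One simple way to express a time-decay attention layer is to penalize each touchpoint's attention score by its age before normalizing. The sketch below does exactly that; the linear penalty, the `decay_rate` value and the example journey are assumptions of ours, not the layer defined in the paper.

```python
import numpy as np

def time_decay_attention(scores, days_before_end, decay_rate=0.2):
    """Softmax attention whose logits are penalized by touchpoint age."""
    ages = np.asarray(days_before_end, dtype=float)
    logits = np.asarray(scores, dtype=float) - decay_rate * ages
    weights = np.exp(logits - logits.max())
    return weights / weights.sum()

def channel_credit(channels, weights):
    """Sum touchpoint-level weights into channel-level credit."""
    credit = {}
    for channel, weight in zip(channels, weights):
        credit[channel] = credit.get(channel, 0.0) + float(weight)
    return credit

# Hypothetical journey with equal raw scores, so only recency matters.
channels = ["display", "email", "display", "search"]
weights = time_decay_attention([0.0, 0.0, 0.0, 0.0], [10, 5, 2, 0])
credit = channel_credit(channels, weights)
```

In a trained model the raw scores would come from the LSTM hidden states, so a highly relevant early touchpoint can still outweigh a recent but uninformative one.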
What’s the key achievement?
- Suggesting a data-driven attribution model with readily interpretable attribution estimates.
- Outperforming the baselines in terms of conversion prediction.
What does the AI community think?
- The paper was published at the AdKDD 2018 workshop, held as part of KDD 2018, the conference on Knowledge Discovery and Data Mining.
What are possible business applications?
- The suggested model can benefit management of online advertising campaigns since it:
- predicts conversion reasonably well;
- gives insight into how each touchpoint contributes to the conversion decision at any specific time, which makes the suggested attribution coefficients easy to interpret.
6. Causally Driven Incremental Multi-Touch Attribution Using a Recurrent Neural Network by Ruihuan Du, Yu Zhong, Harikesh Nair, Bo Cui, Ruyang Shou
Original Abstract
This paper describes a practical system for Multi-Touch Attribution (MTA) for use by a publisher of digital ads. We developed this system for JD.com, an eCommerce company, which is also a publisher of digital ads in China. The approach has two steps. The first step (‘response modeling’) fits a user-level model for purchase of a product as a function of the user’s exposure to ads. The second (‘credit allocation’) uses the fitted model to allocate the incremental part of the observed purchase due to advertising, to the ads the user is exposed to over the previous T days. To implement step one, we train a Recurrent Neural Network (RNN) on user-level conversion and exposure data. The RNN has the advantage of flexibly handling the sequential dependence in the data in a semi-parametric way. The specific RNN formulation we implement captures the impact of advertising intensity, timing, competition, and user-heterogeneity, which are known to be relevant to ad-response. To implement step two, we compute Shapley Values, which have the advantage of having axiomatic foundations and satisfying fairness considerations. The specific formulation of the Shapley Value we implement respects incrementality by allocating the overall incremental improvement in conversion to the exposed ads, while handling the sequence-dependence of exposures on the observed outcomes. The system is under production at JD.com, and scales to handle the high dimensionality of the problem on the platform (attribution of the orders of about 300M users, for roughly 160K brands, across 200+ ad-types, served about 80B ad-impressions over a typical 15-day period).
Our Summary
The researchers from JD.com, a key Chinese eCommerce firm and a publisher of digital ads, introduce a new approach to multi-touch attribution. In particular, they suggest a two-step solution with a recurrent neural network (RNN) used at the first step to fit a user-level model for conversion as a function of the user’s exposure to ads. In the second step, the authors use Shapley values to allocate the incremental part of the purchase to the ads that the user was exposed to over the previous T days. The implementation of the suggested approach in production at JD.com demonstrates that this system is effective and applicable in the real world and can be scaled to a huge ad publishing platform with 300M users, over 200 ad types and about 80B ad impressions over a 15-day period.
What’s the core idea of this paper?
- The research team introduces a novel two-step approach to multi-touch attribution:
- Response modeling. To model a user-level conversion as a function of the user’s exposure to ads, the researchers suggest using a recurrent neural network (RNN):
- The RNN allows capturing heterogeneity across users, and the responsiveness of current purchases to a sequence of past ad exposures, as well as to the intensity, timing, and competitiveness of ad exposures.
- The model outputs the probability that a user buys a certain product within a given time period, given the impressions served to the user and a set of user characteristics.
- Credit allocation. To allocate the incremental part of the observed purchase due to advertising to the advertisements that the user was exposed to over the last T days, the authors suggest computing Shapley values, which have a good theoretical grounding and satisfy fairness considerations.
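The credit allocation step can be sketched with exact Shapley values computed over a response function. Here `toy_response` is a hypothetical stand-in for the fitted RNN, the ad names and lifts are invented, and the sketch ignores the sequence dependence that the paper's formulation handles.

```python
from itertools import combinations
from math import factorial

def shapley_values(ads, conversion_prob):
    """Exact Shapley values; `conversion_prob` maps a frozenset of ads
    to a conversion probability."""
    n = len(ads)
    values = {}
    for ad in ads:
        others = [a for a in ads if a != ad]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = frozenset(subset)
                gain = conversion_prob(without | {ad}) - conversion_prob(without)
                total += weight * gain
        values[ad] = total
    return values

def toy_response(exposed):
    """Hypothetical response model: a baseline purchase probability
    plus independent lifts per ad type."""
    lift = {"search": 0.05, "display": 0.02, "feed": 0.01}
    p_no_purchase = 1.0 - 0.02  # 2% would buy with no ads at all
    for ad in exposed:
        p_no_purchase *= 1.0 - lift[ad]
    return 1.0 - p_no_purchase

ads = ["search", "display", "feed"]
values = shapley_values(ads, toy_response)
```

The values sum to the difference between the conversion probability with all ads and with none, so only the incremental part of the purchase is distributed. Exact enumeration is exponential in the number of ads, so a production system would need a more scalable computation.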
What’s the key achievement?
- Introducing a coherent, theoretically-grounded and data-driven attribution framework that:
- captures the fact that some orders are more advertising-driven while others are likely to occur irrespective of advertising;
- helps identify top advertisements that contribute the most to conversion;
- is scalable to a high-dimensionality advertising platform that serves millions of customers, with these customers exposed to billions of ad impressions.
What does the AI community think?
- The paper was presented at the AdKDD workshop within the KDD 2019 conference.
What are future research areas?
- Developing a method for optimal advertiser budget allocation by incorporating the attribution defined by the proposed model.
What are possible business applications?
- The introduced multi-touch attribution framework can be used by ad publishers, including eCommerce platforms, and also advertisers to assign credit to the various ads they buy.
7. Shapley Meets Uniform: An Axiomatic Framework for Attribution in Online Advertising by Raghav Singal, Omar Besbes, Antoine Desir, Vineet Goyal, Garud Iyengar
Original Abstract
One of the central challenges in online advertising is attribution, namely, assessing the contribution of individual advertiser actions including emails, display ads and search ads to eventual conversion. Several heuristics are used for attribution in practice; however, there is no formal justification for them and many of these fail even in simple canonical settings. The main contribution in this work is to develop an axiomatic framework for attribution in online advertising. In particular, we consider a Markovian model for the user journey through the conversion funnel, in which ad actions may have disparate impacts at different stages. We propose a novel attribution metric, that we refer to as counterfactual adjusted Shapley value, which inherits the desirable properties of the traditional Shapley value. Furthermore, we establish that this metric coincides with an adjusted “unique-uniform” attribution scheme. This scheme is efficiently computable and implementable and can be interpreted as a correction to the commonly used uniform attribution scheme.
Our Summary
In this paper, the authors from Columbia University take a systematic, theoretically sound approach to measuring attribution in online advertising. First, they represent the user journey through the conversion funnel as an abstract Markov chain model in which the user is in one of finitely many states at each period, and the advertiser observes this state and takes an action. The researchers then propose a new attribution metric for this Markovian model of user behavior, called the counterfactual adjusted Shapley value. This metric inherits the benefits of the classical Shapley value (SV), such as efficiency, symmetry, and linearity, but, in contrast to the classical SV, can be easily computed. The authors also demonstrate that the suggested metric coincides with an adjusted “unique-uniform” attribution scheme.
What’s the core idea of this paper?
- Despite the importance of the attribution issue in online advertising, the research community has still not defined the “best” attribution measure:
- Incremental value heuristic (IVH) seems to be the most popular one but lacks theoretical grounding.
- Shapley value (SV) has strong theoretical justification but cannot be estimated exactly, requiring certain assumptions.
- The researchers suggest an abstract Markov chain model for representation of the user journey through the conversion funnel:
- At every period, the user is in one of the finitely many states.
- An advertiser observes the state and takes action.
- Then, instead of the traditional approach of attributing the value only to advertising actions, the authors suggest attributing the values to each state-action pair.
- Finally, the paper suggests measuring attribution in the Markovian model using a counterfactual adjusted Shapley value, a metric that is efficiently computable, implementable and interpretable as a correction to the popular uniform attribution scheme.
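To show how attributing to state-action pairs changes a uniform split, the sketch below contrasts plain uniform attribution with a "unique" variant that counts each distinct state-action pair once per converting path. Reading "unique-uniform" this way is our assumption, the states and actions are hypothetical, and the counterfactual adjustment from the paper is not implemented.

```python
from collections import defaultdict

def uniform_attribution(paths):
    """Split each conversion equally across every step, counting repeats."""
    credit = defaultdict(float)
    for steps, converted in paths:
        if converted and steps:
            for pair in steps:
                credit[pair] += 1.0 / len(steps)
    return dict(credit)

def unique_uniform_attribution(paths):
    """Split each conversion equally across the distinct (state, action) pairs."""
    credit = defaultdict(float)
    for steps, converted in paths:
        if not converted:
            continue
        unique_pairs = set(steps)
        for pair in unique_pairs:
            credit[pair] += 1.0 / len(unique_pairs)
    return dict(credit)

# Hypothetical converting journey: the same email action is repeated
# while the user stays in the "aware" state.
paths = [
    ([("aware", "email"), ("aware", "email"), ("interested", "display")], True),
]
uniform = uniform_attribution(paths)
unique = unique_uniform_attribution(paths)
```

In this example the repeated email step earns two thirds of the credit under the plain uniform rule but only half under the unique variant, so repetition alone no longer inflates a pair's share.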
What’s the key achievement?
- Suggesting a new metric for measuring attribution in online advertising, a counterfactual adjusted Shapley value, that:
- inherits the desirable properties of the classical SV;
- is robust to a mix of network structures;
- coincides with an adjusted unique-uniform attribution scheme;
- can be easily computed.
What does the AI community think?
- The paper was accepted for presentation at The Web Conference 2019.
What are future research areas?
- Developing more canonical settings to verify the appropriateness of the proposed metric.
- Understanding the statistical efficiency of the algorithms used to estimate this metric.
- Comparing the output of the suggested scheme with the alternative ones on a real-world dataset.
- Applying the introduced methodology to other domains.
What are possible business applications?
- The introduced metric can help advertisers get a better understanding of:
- the value of specific channels at a given time;
- the optimal budget allocation.