This research summary is part of our AI for Marketing series, which covers the latest AI and machine learning approaches to five aspects of marketing automation:
- Attribution
- Optimization
- Personalization
- Analytics
- Content Generation: Images
- Content Generation: Videos
- Content Generation: Text
In this piece, we describe approaches to optimizing marketing and advertising campaigns in order to improve targeting and increase the ROI of marketing spend.
Ads are getting very expensive, effective marketing channels such as content marketing are becoming crowded, and coordinating an omnichannel presence is hard. Can artificial intelligence offer working solutions for optimizing marketing campaigns? In fact, top companies already use various machine learning algorithms, including deep neural networks, to choose the right ad to show the right customer at the right time.
Top tech companies, including Google, Amazon, and Alibaba, have implemented state-of-the-art machine learning approaches that prove effective at optimizing marketing campaign allocation and improving customer targeting. We have reviewed the latest breakthroughs and best practices from these leading companies to bring you the key advances introduced by machine learning researchers over the last few years.
If you’d like to skip around, here are the papers we featured:
- Field-aware Factorization Machines in a Real-world Online Advertising System
- Deep & Cross Network for Ad Click Predictions
- Contextual Multi-Armed Bandits for Causal Marketing
- Deep Interest Evolution Network for Click-Through Rate Prediction
- Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning
- AiAds: Automated and Intelligent Advertising System for Sponsored Search
- Time-Aware Prospective Modeling of Users for Online Display Advertising
- A Unified Framework for Marketing Budget Allocation
Important Optimization Research Papers
1. Field-aware Factorization Machines in a Real-world Online Advertising System by Yuchin Juan, Damien Lefortier, Olivier Chapelle
Original Abstract
Predicting user response is one of the core machine learning tasks in computational advertising. Field-aware Factorization Machines (FFM) have recently been established as a state-of-the-art method for that problem and in particular won two Kaggle challenges. This paper presents some results from implementing this method in a production system that predicts click-through and conversion rates for display advertising and shows that this method it is not only effective to win challenges but is also valuable in a real-world prediction system. We also discuss some specific challenges and solutions to reduce the training time, namely the use of an innovative seeding algorithm and a distributed learning mechanism.
Our Summary
The research paper investigates applying Field-aware Factorization Machines (FFM) to predict click-through and conversion rates in real-world production systems. FFMs have already won two Kaggle competitions, but their training may be too slow for a production system. To address this, the researchers introduce two techniques for accelerating training: a pre-mature warm start and a distributed learning mechanism. The experiments demonstrate that the FFM approach combined with these techniques leads to more ad displays and higher ROI while also being fast enough for a real-world online advertising system.
What’s the core idea of this paper?
- Applying Field-aware Factorization Machines to train click-through rate (CTR) and conversion rate (CR) prediction models.
- Reducing the training time by using:
  - a distributed learning mechanism;
  - a pre-mature warm start:
    - warm start: if each training set contains several days of data and the window moves a few hours forward at each step, the new model is initialized with the model trained on the previous step's training set;
    - the model from the previous step must be pre-mature because the FFM method relies on early stopping to prevent overfitting, so warm-starting from a mature model may cause the new model to become post-mature.
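To make the "field-aware" part concrete, here is a minimal NumPy sketch of the FFM pairwise term, φ(w, x) = Σ_{j1<j2} ⟨w_{j1,f2}, w_{j2,f1}⟩ x_{j1} x_{j2}: each feature keeps a separate latent vector per field and uses the one that matches the field of the feature it is interacting with. The (field, index, value) input format and the function names are illustrative assumptions. Training, the pre-mature warm start, and distributed learning are deliberately left out, so this is not the production system described in the paper.

```python
import numpy as np

def ffm_score(features, W):
    """Pairwise FFM term for one example.

    features: list of (field, feature_index, value) for the non-zero features.
    W: array of shape (n_features, n_fields, k); W[j, f] is the latent vector
       feature j uses when it interacts with a feature from field f.
    """
    score = 0.0
    for a in range(len(features)):
        f1, j1, x1 = features[a]
        for b in range(a + 1, len(features)):
            f2, j2, x2 = features[b]
            # Each feature picks the latent vector tied to the *other* feature's field.
            score += np.dot(W[j1, f2], W[j2, f1]) * x1 * x2
    return score

def ffm_predict_ctr(features, W):
    """Map the FFM score to a click probability with a logistic link."""
    return 1.0 / (1.0 + np.exp(-ffm_score(features, W)))
```

With all latent vectors set to ones, the example `[(0, 0, 1.0), (1, 1, 1.0), (1, 2, 0.5)]` scores 4.0. Zeroing `W[0, 1]` removes only the interactions in which feature 0 meets field-1 features, which is exactly the per-field specialization FFM adds over plain factorization machines.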
What’s the key achievement?
- Demonstrating with offline experiments as well as online A/B testing that the FFM approach significantly outperforms a logistic regression model, leading to:
  - an increased number of ad displays at the same cost;
  - increased return on investment (ROI).
- Introducing two successful techniques for increasing the training speed, namely an innovative seeding algorithm (the pre-mature warm start) and a distributed learning mechanism.
What does the AI community think?
- The paper was presented at the 26th International Conference on World Wide Web Companion (WWW '17).
What are future research areas?
- Trying the warm start method on other non-convex problems that are difficult to regularize, such as deep neural networks.
What are possible business applications?
- Field-aware Factorization Machines combined with the suggested solutions for accelerating the training process can be directly applied in a real-world online advertising system leading to increased ROI.
Where can you get implementation code?
- A library for Field-aware Factorization Machines is available online.
2. Deep & Cross Network for Ad Click Predictions by Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang
Original Abstract
Feature engineering has been the key to the success of many prediction models. However, the process is non-trivial and often requires manual feature engineering or exhaustive searching. DNNs are able to automatically learn feature interactions; however, they generate all the interactions implicitly, and are not necessarily efficient in learning all types of cross features. In this paper, we propose the Deep & Cross Network (DCN) which keeps the benefits of a DNN model, and beyond that, it introduces a novel cross network that is more efficient in learning certain bounded-degree feature interactions. In particular, DCN explicitly applies feature crossing at each layer, requires no manual feature engineering, and adds negligible extra complexity to the DNN model. Our experimental results have demonstrated its superiority over the state-of-art algorithms on the CTR prediction dataset and dense classification dataset, in terms of both model accuracy and memory usage.
Our Summary
Researchers from Stanford University and Google propose a new approach to click-through rate (CTR) prediction. The idea is to combine a deep network and a cross network to retain the power of deep neural networks (DNNs) while getting a solution that is more efficient in terms of computational cost. The resulting Deep & Cross Network (DCN) efficiently learns predictive cross features of bounded degree, captures highly nonlinear interactions, requires no manual feature engineering or exhaustive searching, and has nearly an order of magnitude fewer parameters than a DNN. The experiments also show that DCN outperforms state-of-the-art methods, providing more accurate CTR predictions while using less memory.
What’s the core idea of this paper?
- The Deep & Cross Network (DCN) model starts with an embedding and stacking layer, followed by a cross network and a deep network in parallel, and ends with a final combination layer.
- The embedding and stacking layer addresses the excessively high-dimensional feature spaces typical of CTR prediction tasks.
- A key idea behind the cross network is to learn bounded-degree feature interactions explicitly and more efficiently:
  - by design, the highest polynomial degree of the cross features grows with each layer;
  - the network contains all the cross terms of degree up to that highest degree.
- A deep network is added to capture highly nonlinear interactions.
- A final combination layer combines the outputs of the two networks.
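The DCN cross layer computes x_{l+1} = x_0 x_l^T w_l + b_l + x_l. Because x_l^T w_l is a single number per example, each layer is just a cheap rescaling of x_0 plus a residual, which is why the cross network adds so few parameters. Below is a minimal NumPy sketch of the cross network alone (no embedding layer, deep network, or training); the function names are illustrative.

```python
import numpy as np

def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (xl . w) + b + xl.

    x0, xl: arrays of shape (d,) or (batch, d); w, b: arrays of shape (d,).
    The residual term xl keeps all lower-degree crosses, and multiplying by x0
    raises the highest polynomial degree by one.
    """
    return x0 * np.sum(xl * w, axis=-1, keepdims=True) + b + xl

def cross_network(x0, weights, biases):
    """Stack cross layers; each layer adds only 2 * d parameters (w and b)."""
    x = x0
    for w, b in zip(weights, biases):
        x = cross_layer(x0, x, w, b)
    return x
```

For example, with x0 = [1, 2], w = [1, 1], and b = [0, 0], one layer gives 3 * [1, 2] + [1, 2] = [4, 8]. In the full model, this output would be concatenated with the deep network's output before the final combination layer.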
What’s the key achievement?
- DCN outperforms other state-of-the-art models by a wide margin:
  - it performs significantly better than a deep neural network without a cross network while using only 40% of the memory consumed by the DNN;
  - as more cross layers are added, logloss decreases, reflecting improved performance.
What does the AI community think?
- The paper was presented at the AdKDD & TargetAd 2017 workshop.
What are future research areas?
- Exploring the use of cross layers as building blocks in other models.
- Enabling effective training for deeper cross networks.
- Better understanding how the cross network interacts with deep networks during optimization.
What are possible business applications?
- The DCN model can improve the accuracy of CTR prediction while keeping the computational costs relatively low.
- The accuracy of CTR prediction is particularly important in the cost-per-click payment setting, where the publisher’s income relies heavily on the ability to predict CTR accurately.
Where can you get implementation code?
- A TensorFlow implementation of the Deep & Cross Network is available as part of the DeepCTR package.
3. Contextual Multi-Armed Bandits for Causal Marketing by Neela Sawant, Chitti Babu Namballa, Narayanan Sadagopan, Houssam Nassif
Original Abstract
This work explores the idea of a causal contextual multi-armed bandit approach to automated marketing, where we estimate and optimize the causal (incremental) effects. Focusing on causal effect leads to better return on investment (ROI) by targeting only the persuadable customers who wouldn’t have taken the action organically. Our approach draws on strengths of causal inference, uplift modeling, and multi-armed bandits. It optimizes on causal treatment effects rather than pure outcome, and incorporates counterfactual generation within data collection. Following uplift modeling results, we optimize over the incremental business metric. Multi-armed bandit methods allow us to scale to multiple treatments and to perform off-policy policy evaluation on logged data. The Thompson sampling strategy in particular enables exploration of treatments on similar customer contexts and materialization of counterfactual outcomes. Preliminary offline experiments on a retail Fashion marketing dataset show merits of our proposal.
Our Summary
The Amazon team suggests a new approach to optimizing marketing campaigns that draws on causal inference, uplift modeling, and multi-armed bandits. It targets advertising campaigns based on incremental (causal) outcomes rather than pure outcomes. In particular, the approach enables targeting only the persuadable customers who wouldn't have taken the action organically. Preliminary offline experiments suggest that focusing on causal effects leads to higher return on investment (ROI).
What’s the core idea of this paper?
- Combining several techniques for better optimization of campaign allocation:
  - borrowing from the causal trees approach to optimize on the causal treatment effect rather than the pure outcome, and incorporating counterfactual matching when collecting data;
  - following uplift modeling to optimize the incremental business metric directly;
  - using contextual Bayesian multi-armed bandits to scale to multiple treatments and to perform off-policy evaluation:
    - in particular, Thompson sampling is applied to explore treatments on similar customers and to materialize counterfactual outcomes.
- This approach enables targeting only the persuadable customers who wouldn't have taken the action without exposure to the marketing campaign.
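The paper's exact algorithm is not reproduced here. As a rough illustration of the idea, the sketch below runs Beta-Bernoulli Thompson sampling per discrete customer segment, with arm 0 as a no-treatment control. Each treatment is scored by its sampled uplift over the sampled control rate, minus a per-exposure cost. The segment discretization, the cost values (expressed in conversion units), and the class name are hypothetical simplifications; the paper works with richer customer contexts.

```python
import numpy as np

class UpliftThompsonSampler:
    """Thompson sampling on incremental (uplift) value, with one Beta posterior
    per (segment, arm). Arm 0 is the no-treatment control."""

    def __init__(self, n_segments, n_arms, costs, seed=0):
        self.alpha = np.ones((n_segments, n_arms))   # 1 + observed conversions
        self.beta = np.ones((n_segments, n_arms))    # 1 + observed non-conversions
        self.costs = np.asarray(costs, dtype=float)  # costs[0] must be 0 (control)
        self.rng = np.random.default_rng(seed)

    def choose(self, segment):
        # Sample a plausible conversion rate for every arm from its posterior.
        theta = self.rng.beta(self.alpha[segment], self.beta[segment])
        # Score arms by sampled lift over the control, net of exposure cost;
        # the control always scores exactly 0.
        uplift = theta - theta[0] - self.costs
        return int(np.argmax(uplift))

    def update(self, segment, arm, converted):
        self.alpha[segment, arm] += converted
        self.beta[segment, arm] += 1 - converted
```

In a simulated setting with one "persuadable" segment (the treatment lifts conversion well beyond its cost) and one "sure thing" segment (the treatment adds no lift), the sampler learns to treat the first segment and mostly leave the second alone. Because the control arm is itself chosen whenever no treatment's sampled uplift beats zero, the sampler keeps collecting the baseline outcomes needed to estimate incremental effects.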
What’s the key achievement?
- Offline experiments on a retail fashion marketing dataset show that the suggested approach outperforms non-causal alternatives in terms of the incremental outcome among targeted customers relative to a random hold-out group.
What does the AI community think?
- The paper was presented at the ICML 2018 workshop on causal inference, counterfactual prediction, and autonomous action (CausalML).
What are future research areas?
- Utilizing a consistent baseline across all marketing campaigns instead of varying baselines.
- Explicitly modeling the trade-off between short-term and long-term objectives.
- Deploying the introduced system at a larger scale to enable further experimentation.
What are possible business applications?
- Focusing on causal effects, as suggested in this research paper, is likely to result in:
  - better targeting of advertising campaigns and, thus,
  - increased return on investment (ROI).
4. Deep Interest Evolution Network for Click-Through Rate Prediction by Guorui Zhou, Na Mou, Ying Fan, Qi Pi, Weijie Bian, Chang Zhou, Xiaoqiang Zhu, Kun Gai
Original Abstract
Click-through rate (CTR) prediction, whose goal is to estimate the probability of the user clicks, has become one of the core tasks in advertising systems. For CTR prediction model, it is necessary to capture the latent user interest behind the user behavior data. Besides, considering the changing of the external environment and the internal cognition, user interest evolves over time dynamically. There are several CTR prediction methods for interest modeling, while most of them regard the representation of behavior as the interest directly, and lack specially modeling for latent interest behind the concrete behavior. Moreover, little work considers the changing trend of interest. In this paper, we propose a novel model, named Deep Interest Evolution Network~(DIEN), for CTR prediction. Specifically, we design interest extractor layer to capture temporal interests from history behavior sequence. At this layer, we introduce an auxiliary loss to supervise interest extracting at each step. As user interests are diverse, especially in the e-commerce system, we propose interest evolving layer to capture interest evolving process that is relative to the target item. At interest evolving layer, attention mechanism is embedded into the sequential structure novelly, and the effects of relative interests are strengthened during interest evolution. In the experiments on both public and industrial datasets, DIEN significantly outperforms the state-of-the-art solutions. Notably, DIEN has been deployed in the display advertisement system of Taobao, and obtained 20.7% improvement on CTR.
Our Summary
The Alibaba research team argues that capturing users' interests, as well as how those interests change over time, is key to advancing the performance of click-through rate (CTR) prediction. Moreover, they claim that users' explicit behavior doesn't directly reflect their latent interests. Thus, the researchers introduce the Deep Interest Evolution Network (DIEN), which models the interest evolving process and significantly improves the accuracy of CTR predictions in online advertising. The idea is to capture latent interests and their dynamics with an interest extractor layer and an interest evolving layer, respectively. The effectiveness of this approach was confirmed not only in experiments but also in a real-world deployment.
What’s the core idea of this paper?
- The Deep Interest Evolution Network (DIEN) is designed to improve the performance of CTR prediction. To this end, it includes two key modules:
  - an interest extractor layer for capturing latent temporal interests from explicit user behaviors;
  - an interest evolving layer for modeling the interest evolving process.
- At the interest extractor layer, the model follows the principle that interest directly leads to the consecutive behavior, and it includes an auxiliary loss that uses the next behavior to supervise the learning of the current hidden state.
- At the interest evolving layer, the model strengthens the influence of the interests most relevant to the target item and weakens the effect of irrelevant interests to overcome interference from interest drifting.
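The interest evolving layer in DIEN combines attention with a GRU whose update gate is scaled by the attention score (AUGRU), so steps that are irrelevant to the target ad barely move the hidden state. The NumPy sketch below shows the forward pass only. Parameter shapes, the bilinear attention matrix `W`, and the function names are assumptions for illustration, and the extractor GRU and auxiliary loss are not included. The inputs stand in for the interest states produced by the extractor layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_weights(interest_states, target_emb, W):
    """Softmax over time of h_t^T W e_target (bilinear relevance to the target ad)."""
    logits = interest_states @ W @ target_emb
    logits = logits - logits.max()  # numerical stability
    e = np.exp(logits)
    return e / e.sum()

def augru(interest_states, att, params, h0):
    """GRU with attentional update gate: the update gate is multiplied by the
    attention score, so low-attention steps leave the hidden state almost unchanged."""
    Wu, Uu, bu, Wr, Ur, br, Wh, Uh, bh = params
    h = h0
    for i_t, a_t in zip(interest_states, att):
        u = sigmoid(Wu @ i_t + Uu @ h + bu)             # update gate
        r = sigmoid(Wr @ i_t + Ur @ h + br)             # reset gate
        h_cand = np.tanh(Wh @ i_t + r * (Uh @ h) + bh)  # candidate state
        u = a_t * u                                     # attentional update gate
        h = (1.0 - u) * h + u * h_cand
    return h  # final evolved interest, fed to the CTR prediction layers
```

Setting every attention score to zero returns the initial state unchanged, which is the limiting case of "weakening the effect of irrelevant interests."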
What’s the key achievement?
- Experiments on the public Amazon datasets and an industrial Alibaba dataset demonstrate that DIEN significantly outperforms alternative state-of-the-art solutions in CTR prediction.
What does the AI community think?
- The paper was presented at AAAI 2019, one of the key conferences on artificial intelligence.
What are future research areas?
- Constructing a more personalized interest model for CTR prediction.
What are possible business applications?
- The introduced framework has already been deployed in the display advertising system of Taobao, resulting in a 20.7% improvement in CTR.
Where can you get implementation code?
- A TensorFlow implementation of the Deep Interest Evolution Network is available on GitHub.
5. Learning to Advertise with Adaptive Exposure via Constrained Two-Level Reinforcement Learning by Weixun Wang, Junqi Jin, Jianye Hao, Chunjie Chen, Chuan Yu, Weinan Zhang, Jun Wang, Yixi Wang, Han Li, Jian Xu, Kun Gai
Original Abstract
For online advertising in e-commerce, the traditional problem is to assign the right ad to the right user on fixed ad slots. In this paper, we investigate the problem of advertising with adaptive exposure, in which the number of ad slots and their locations can dynamically change over time based on their relative scores with recommendation products. In order to maintain user retention and long-term revenue, there are two types of constraints that need to be met in exposure: query-level and day-level constraints. We model this problem as constrained markov decision process with per-state constraint (psCMDP) and propose a constrained two-level reinforcement learning to decouple the original advertising exposure optimization problem into two relatively independent sub-optimization problems. We also propose a constrained hindsight experience replay mechanism to accelerate the policy training process. Experimental results show that our method can improve the advertising revenue while satisfying different levels of constraints under the real-world datasets. Besides, the proposal of constrained hindsight experience replay mechanism can significantly improve the training speed and the stability of policy performance.
Our Summary
In this paper, researchers from Tianjin University and Alibaba investigate how best to assign the right ad to the right user when the number of ad slots and their locations can change dynamically over time. They assume that, to maintain customer retention and long-term revenue, the approach needs to satisfy two types of constraints, namely day-level and query-level constraints. Thus, they model the problem as a Constrained Markov Decision Process with a per-state constraint (psCMDP). To learn optimal advertising policies that satisfy both day-level and query-level constraints, the authors propose a constrained two-level structured reinforcement learning framework. The experiments demonstrate that the suggested approach has a positive impact on advertising revenue, training speed, and the stability of policy performance.
What’s the core idea of this paper?
- Investigating the advertising with adaptive exposure problem: assigning the right ad to the right customer when the number of ad slots and their locations change dynamically over time based on their scores relative to recommended products.
- Modeling this problem as a Constrained Markov Decision Process with a per-state constraint (psCMDP).
- Proposing a constrained two-level structured reinforcement learning framework to learn optimal advertising policies that satisfy both day-level and query-level constraints:
  - trajectory-level (i.e., day-level) and state-level (i.e., query-level) constraints are separated into different levels of the learning process;
  - at the low level, the model learns an optimal advertising policy under a particular sub-trajectory constraint provided by the high level.
- Proposing Constrained Hindsight Experience Replay (CHER) to accelerate the low-level policy training.
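The details of CHER are not given in this summary. The sketch below only illustrates the general hindsight-relabeling idea applied to a constraint, which is the intuition behind the name. After a sub-trajectory ends, its transitions are stored twice: once with the original exposure target and once with the exposure rate actually achieved. This way, even a trajectory that missed its target provides a useful learning signal for a goal-conditioned low-level policy. The `Step` fields, the terminal penalty, and the penalty weight are hypothetical choices, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class Step:
    state: tuple
    action: int
    revenue: float
    exposed: int  # 1 if an ad was shown in this step (hypothetical encoding)

def relabel_with_hindsight(trajectory, target_rate, penalty=10.0):
    """Return replay tuples (state, goal, action, reward) for the original
    exposure goal and for the exposure rate achieved in hindsight."""
    achieved = sum(s.exposed for s in trajectory) / len(trajectory)
    buffer = []
    for goal in (target_rate, achieved):
        for i, s in enumerate(trajectory):
            reward = s.revenue
            if i == len(trajectory) - 1:
                # Hypothetical terminal penalty for missing the exposure goal.
                reward -= penalty * abs(achieved - goal)
            buffer.append((s.state, goal, s.action, reward))
    return buffer
```

For a four-step sub-trajectory that exposed ads in three steps against a 50% target, the original copy ends with a penalized reward, while the relabeled copy (goal 0.75) counts as a fully satisfied constraint.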
What’s the key achievement?
- Demonstrating through experiments on a real-world dataset that:
  - the suggested approach can significantly increase the final revenue from advertising, and
  - the CHER mechanism speeds up training while also reducing the deviation and variance from the per-state constraint.
What are future research areas?
- Generalizing the introduced framework to e-commerce problems with significantly more constraints.
- Building on the option-critic architecture to learn the length of each sub-trajectory automatically instead of fixing it.
What are possible business applications?
- The introduced approach to online advertising in e-commerce can be a good candidate for real-world application because of its:
  - positive impact on advertising revenue;
  - high training speed;
  - stability of performance.
6. AiAds: Automated and Intelligent Advertising System for Sponsored Search by Xiao Yang, Daren Sun, Ruiwei Zhu, Tao Deng, Zhi Guo, Jiao Ding, Shouke Qin, Zongyao Ding, Yanfeng Zhu
Original Abstract
Sponsored search has more than 20 years of history, and it has been proven to be a successful business model for online advertising. Based on the pay-per-click pricing model and the keyword targeting technology, the sponsored system runs online auctions to determine the allocations and prices of search advertisements. In the traditional setting, advertisers should manually create lots of ad creatives and bid on some relevant keywords to target their audience. Due to the huge amount of search traffic and a wide variety of ad creations, the limits of manual optimizations from advertisers become the main bottleneck for improving the efficiency of this market. Moreover, as many emerging advertising forms and supplies are growing, it’s crucial for sponsored search platform to pay more attention to the ROI metrics of ads for getting the marketing budgets of advertisers.
In this paper, we present the AiAds system developed at Baidu, which use machine learning techniques to build an automated and intelligent advertising system. By designing and implementing the automated bidding strategy, the intelligent targeting and the intelligent creation models, the AiAds system can transform the manual optimizations into multiple automated tasks and optimize these tasks in advanced methods. AiAds is a brand-new architecture of sponsored search system which changes the bidding language and allocation mechanism, breaks the limit of keyword targeting with end-to-end ad retrieval framework and provides global optimization of ad creation. This system can increase the advertiser’s campaign performance, the user experience and the revenue of the advertising platform simultaneously and significantly. We present the overall architecture and modeling techniques for each module of the system and share our lessons learned in solving several key challenges.
Our Summary
The Baidu research team addresses the key challenges of sponsored search by introducing the AiAds system, an automated and intelligent advertising system, which is based on various machine learning techniques. In particular, they suggest an automated bidding engine to solve the problems of traditional keyword-level manual bidding optimization; an intelligent targeting service for direct matching from query to related ads without the mediation of keywords; and an intelligent framework for automated creation of ad templates based on the available information about the product and business. The results from the online A/B tests and the long-term grouping experiment demonstrate the effectiveness of the AiAds system for advertisers, advertising platforms, and users.
What’s the core idea of this paper?
- The traditional approach to sponsored search (manual keyword selection, keyword-level bidding optimization, the pay-per-click pricing model, and manual ad creation) is becoming too cumbersome and inefficient.
- To address the main challenges in sponsored search, the Baidu research team suggests:
  - a new bidding language and a corresponding automated bidding strategy that let advertisers optimize campaign performance directly;
  - an end-to-end retrieval and matching model that selects ads for search queries more effectively, without keywords as intermediaries;
  - a componentized framework for designing and generating ad creatives that automatically optimizes their content and layout.
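This summary does not spell out the AiAds bidding language, so the sketch below is only a generic illustration of how a conversion-oriented bid can be turned into a per-click bid and used to rank ads. The click bid is the conversion bid times the predicted conversion rate, and candidates are ranked by expected value per impression (click bid times predicted CTR). This is a common industry pattern, not necessarily the mechanism Baidu uses; the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AdCandidate:
    ad_id: str
    conversion_bid: float  # what the advertiser will pay per conversion (hypothetical field)
    p_click: float         # predicted click-through rate
    p_conv: float          # predicted conversion rate given a click

def click_bid(ad: AdCandidate) -> float:
    """Translate a per-conversion bid into an equivalent per-click bid."""
    return ad.conversion_bid * ad.p_conv

def expected_value_per_impression(ad: AdCandidate) -> float:
    """Expected payment per impression if the ad is shown (click bid x CTR)."""
    return click_bid(ad) * ad.p_click

def rank_ads(ads):
    """Order candidates for a query by expected value per impression."""
    return sorted(ads, key=expected_value_per_impression, reverse=True)
```

With ad A bidding 50 per conversion (CTR 0.02, CVR 0.1) and ad B bidding 20 (CTR 0.05, CVR 0.2), B ranks first (0.2 vs. 0.1 per impression) despite its lower conversion bid. The advertiser states a goal per conversion, and the system handles per-click bidding.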
What’s the key achievement?
- Introducing an automated and intelligent system for sponsored search that:
  - increases campaign performance for advertisers (56% improvement in conversions);
  - enhances the user experience by providing more relevant ads for the given queries;
  - increases the revenue of the advertising platform (47% improvement in revenue).
What does the AI community think?
- The paper was accepted to KDD 2019, the leading conference in knowledge discovery and data mining.
What are future research areas?
- Optimizing the ad retrieval model by utilizing more data sources and advanced models.
- Addressing open issues in designing a reasonable mechanism for ROI-constrained bidders.
What are possible business applications?
- The elements of the introduced advertising system can be implemented by other advertising platforms to improve conversions, enhance user experience and increase revenue.
7. Time-Aware Prospective Modeling of Users for Online Display Advertising by Djordje Gligorijevic, Jelena Gligorijevic, Aaron Flores
Original Abstract
Prospective display advertising poses a great challenge for large advertising platforms as the strongest predictive signals of users are not eligible to be used in the conversion prediction systems. To that end, efforts are made to collect as much information as possible about each user from various data sources and to design powerful models that can capture weaker signals ultimately obtaining good quality of conversion prediction probability estimates. In this study, we propose a novel time-aware approach to model heterogeneous sequences of users’ activities and capture implicit signals of users’ conversion intents. On two real-world datasets, we show that our approach outperforms other, previously proposed approaches, while providing interpretability of signal impact to conversion probability.
Our Summary
The Yahoo Research team addresses the problem of attracting new users with online display advertising. This is a particularly challenging task because strong signals of user interest, such as visits to the advertiser's website or recent conversions, are not available for prospective customers. Thus, the researchers suggest gathering all available information about the user as a time-ordered sequence of activities (e.g., search sessions, ad clicks, reservations, shopping carts). They then introduce a sequence learning approach to model time-ordered heterogeneous user activities gathered from multiple sources. The approach includes a novel time-aware mechanism to capture the temporal aspect of events. The experiments demonstrate the effectiveness and interpretability of the suggested approach.
What’s the core idea of this paper?
- Advertisers are keen to acquire new customers who have had no previous interactions with them.
- To address this problem, the Yahoo Research team introduces a novel Deep Time-Aware conversIoN (DTAIN) model:
  - the inputs of the model include a sequence of events, the time differences between the events' timestamps, and the time point of prediction;
  - this information passes through five purpose-built blocks: event and temporal information embedding, a temporal attention learning block, a recurrent network block, an attention learning block, and a final classification block.
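The exact DTAIN equations are not given in this summary. The sketch below shows a generic time-aware attention pooling step in the spirit of the temporal attention block. Each event's attention logit combines a content-relevance score with a penalty proportional to the time elapsed before the prediction point, so recent events count more. The linear decay term, the `query` vector, and the function name are assumptions rather than the paper's formulation.

```python
import numpy as np

def time_aware_pooling(event_embs, event_times, predict_time, query, decay):
    """Pool a user's event sequence into one vector with time-aware attention.

    event_embs: (T, d) embeddings of heterogeneous events (search, ad click, ...).
    event_times: (T,) timestamps; predict_time: scalar time of prediction.
    query: (d,) relevance vector; decay: penalty per unit of elapsed time.
    """
    elapsed = predict_time - np.asarray(event_times, dtype=float)
    logits = event_embs @ query - decay * elapsed
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits) / np.exp(logits).sum()
    # The weights double as a per-event interpretability signal.
    return weights @ event_embs, weights
```

For two events with identical content, the one closer to the prediction time receives the larger weight. Inspecting the weights is one simple way such a model can expose which signals drive a conversion estimate.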
What’s the key achievement?
- Experiments with public and proprietary datasets demonstrate that:
  - temporal information is important for predicting conversion;
  - DTAIN significantly outperforms several strong baselines in conversion prediction.
What does the AI community think?
- The paper was presented at the AdKDD workshop within the KDD 2019 conference.
What are future research areas?
- Developing novel techniques to address significant noise that is present in the data collected from many data sources.
What are possible business applications?
- The introduced approach can benefit advertisers and ad publishers by effectively predicting the conversion of prospective customers.
8. A Unified Framework for Marketing Budget Allocation by Kui Zhao, Junhao Hua, Ling Yan, Qi Zhang, Huan Xu, Cheng Yang
Original Abstract
While marketing budget allocation has been studied for decades in traditional business, nowadays online business brings much more challenges due to the dynamic environment and complex decision-making process. In this paper, we present a novel unified framework for marketing budget allocation. By leveraging abundant data, the proposed data-driven approach can help us to overcome the challenges and make more informed decisions. In our approach, a semi-black-box model is built to forecast the dynamic market response and an efficient optimization method is proposed to solve the complex allocation task. First, the response in each market-segment is forecasted by exploring historical data through a semi-black-box model, where the capability of logit demand curve is enhanced by neural networks. The response model reveals relationship between sales and marketing cost. Based on the learned model, budget allocation is then formulated as an optimization problem, and we design efficient algorithms to solve it in both continuous and discrete settings. Several kinds of business constraints are supported in one unified optimization paradigm, including cost upper bound, profit lower bound, or ROI lower bound. The proposed framework is easy to implement and readily to handle large-scale problems. It has been successfully applied to many scenarios in Alibaba Group. The results of both offline experiments and online A/B testing demonstrate its effectiveness.
Our Summary
Online business brings new challenges to the marketing budget allocation process. The environment is very dynamic, and budget adjustments need to be made weekly or even daily. The Alibaba research team introduces a unified framework for marketing budget allocation in online business. They suggest a two-step approach: first, the response in each market segment is learned from historical data, and then budget allocation is optimized based on the learned models. The approach has been applied in many scenarios at Alibaba Group, demonstrating its effectiveness in handling large-scale problems.
What’s the core idea of this paper?
- Companies that operate in a dynamic online environment need to approach the marketing budget allocation problem with new data-driven solutions.
- The Alibaba research team introduces a novel unified framework for marketing budget allocation:
  - first, the market response in each segment is forecast with a semi-black-box model, where the logit demand curve is enhanced by neural networks;
  - then, budget allocation is formulated as an optimization problem:
    - the Lagrange multiplier method is applied to handle the non-convexity of logit demand curves;
  - additional business constraints, such as a cost upper bound, profit lower bound, or ROI lower bound, can also be incorporated into the framework.
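The sketch below shows one standard way a Lagrange multiplier can decouple a budget-constrained allocation across segments in a discrete setting. For a fixed multiplier λ, each segment independently picks the spend level on a grid that maximizes predicted sales minus λ times cost. A bisection search on λ then finds the smallest multiplier whose allocation fits the budget. The plain logistic response curve (without the neural-network enhancement), its parameters, the spend grid, and the multiplier bracket are all illustrative assumptions; this is not the paper's algorithm.

```python
import numpy as np

def logit_response(cost, scale, intercept, slope):
    """Hypothetical logit demand curve: expected sales as a function of spend."""
    return scale / (1.0 + np.exp(-(intercept + slope * cost)))

def best_costs_for_lambda(grid, params, lam):
    """For a fixed multiplier, each segment maximizes sales - lam * cost on the grid."""
    choices = []
    for scale, intercept, slope in params:
        value = logit_response(grid, scale, intercept, slope) - lam * grid
        choices.append(grid[np.argmax(value)])
    return np.array(choices)

def allocate_budget(params, budget, grid, iters=60):
    """Bisection on the multiplier; total spend is non-increasing in lam."""
    lo, hi = 0.0, 1e3  # assumed bracket: hi is large enough that spending nothing is optimal
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if best_costs_for_lambda(grid, params, lam).sum() > budget:
            lo = lam  # overspending: make money "more expensive"
        else:
            hi = lam  # feasible: try a cheaper multiplier
    return best_costs_for_lambda(grid, params, hi)  # hi is always feasible
```

Enumerating a spend grid per segment sidesteps the non-concavity of each logit curve, and each segment's subproblem is independent, which is what makes this style of method scale to many segments. The trade-off is that, with S-shaped response curves, the Lagrangian allocation can leave part of the budget unspent when a segment's optimal spend jumps discontinuously as λ changes.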
What’s the key achievement?
- The proposed framework has been successfully applied to many scenarios in Alibaba Group.
- The online A/B testing demonstrates that the introduced framework for marketing budget allocation can lead to:
  - weekly sales growth of over 6%
  - with 40% less money spent.
What does the AI community think?
- The paper was accepted to KDD 2019, the leading conference in knowledge discovery and data mining.
What are future research areas?
- Exploring the relationship between market cost and contextual variables (e.g., brands, cities, consumption time) in the logit response model.
- Investigating the possibility of supporting boundary constraints on decision variables in the optimization part of the framework.
What are possible business applications?
- The introduced approach can significantly improve the effectiveness of marketing budget allocation for companies operating in online business.