This research summary is part of our AI for Marketing series which covers the latest AI & machine learning approaches to 5 aspects of marketing automation:
- Attribution
- Optimization
- Personalization
- Analytics
- Content Generation: Images
- Content Generation: Videos
- Content Generation: Text
We cover the latest research papers in applying AI to common marketing analytic tasks like customer clustering and sentiment analysis.
How well do you know your customers? Do you know what they like about your product and what they don’t like? Do they interact with your product functionally or emotionally? How does the perception of your brand change over time? To succeed, you need to know your customers very well. Understanding how different groups of your customers interact with your product is essential for building effective marketing campaigns.
We have summarized the latest research breakthroughs that introduce state-of-the-art approaches to clustering customers (products, images, etc.), scaling market research, performing sentiment analysis of customer feedback, and capturing valuable information from social media images containing branded products.
If you’d like to skip around, here are the papers we featured:
- Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM
- Aspect Based Sentiment Analysis with Gated Convolutional Network
- Multimodal Image Captioning for Marketing Analysis
- SpectralNet: Spectral Clustering using Deep Neural Networks
- Ask less – Scale Market Research without Annoying Your Customers
- A Deep Probabilistic Model for Customer Lifetime Value Prediction
- Context-aware Embedding for Targeted Aspect-based Sentiment Analysis
- Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis
Important Marketing Analysis Research Papers
1. Targeted Aspect-Based Sentiment Analysis via Embedding Commonsense Knowledge into an Attentive LSTM by Yukun Ma, Haiyun Peng, Erik Cambria
Original Abstract
Analyzing people’s opinions and sentiments towards certain aspects is an important task of natural language understanding. In this paper, we propose a novel solution to targeted aspect-based sentiment analysis, which tackles the challenges of both aspect-based sentiment analysis and targeted sentiment analysis by exploiting commonsense knowledge. We augment the long short-term memory (LSTM) network with a hierarchical attention mechanism consisting of a target-level attention and a sentence-level attention. Commonsense knowledge of sentiment-related concepts is incorporated into the end-to-end training of a deep neural network for sentiment classification. In order to tightly integrate the commonsense knowledge into the recurrent encoder, we propose an extension of LSTM, termed Sentic LSTM. We conduct experiments on two publicly released datasets, which show that the combination of the proposed attention architecture and Sentic LSTM can outperform state-of-the-art methods in targeted aspect sentiment tasks.
Our Summary
The researchers claim that incorporating commonsense knowledge into a deep neural network can significantly improve targeted aspect-based sentiment analysis by directly contributing to the identification of aspects and sentiment polarity. Additionally, they suggest augmenting the LSTM network with target-level and sentence-level attention. The experiments confirm the effectiveness of the suggested approach for targeted aspect-based sentiment analysis.
What’s the core idea of this paper?
- The proposed neural architecture for targeted aspect-based sentiment analysis includes three key components:
- target-level attention to learn sentiment-salient part of a target expression and generate a more accurate representation of the target (e.g., product);
- sentence-level attention to enable search of the target- and aspect-dependent evidence over the full sentence;
- Sentic LSTM, an extension of the LSTM cell to incorporate affective commonsense knowledge.
- Sentic LSTM has two important roles in this architecture:
- assisting with the filtering of information flowing from one time step to the next;
- providing complementary information to the memory cell.
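The two attention levels described above can be illustrated with a minimal NumPy sketch. It assumes plain dot-product scoring and random stand-in vectors; the query vectors, the target span, and the way the target and aspect are combined are hypothetical choices made for illustration, not the scoring functions used in the paper, and the Sentic LSTM cell itself is not modeled here.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()

def attend(states, query):
    """Dot-product attention: weight each row of `states` by its similarity to `query`."""
    weights = softmax(states @ query)
    return weights @ states, weights

rng = np.random.default_rng(0)
hidden = rng.normal(size=(6, 4))    # stand-in for one LSTM hidden state per word
target_span = slice(1, 3)           # hypothetical: words 1-2 form the target expression
target_query = rng.normal(size=4)   # hypothetical learned query for the target level
aspect_vec = rng.normal(size=4)     # hypothetical aspect embedding

# Target-level attention: summarize the words of the target expression.
target_vec, target_w = attend(hidden[target_span], target_query)

# Sentence-level attention: search the full sentence for evidence that is
# relevant to this target and aspect (combined here by simple addition).
sentence_vec, sentence_w = attend(hidden, target_vec + aspect_vec)
```

In a full model, `sentence_vec` would feed a classifier that predicts the sentiment polarity for the given target and aspect.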
What’s the key achievement?
- Outperforming strong baselines in such tasks as aspect categorization and aspect-based sentiment classification.
- Demonstrating the efficacy of incorporating commonsense knowledge into the LSTM network for targeted aspect-based sentiment analysis.
What does the AI community think?
- The paper was presented at AAAI 2018, one of the key conferences on artificial intelligence.
What are future research areas?
- Analyzing collectively the sentiment of multiple targets co-occurring in the same sentence.
- Investigating the role of commonsense knowledge in modeling the relation between targets.
What are possible business applications?
- The suggested approach can improve the accuracy of sentiment analysis to provide marketers with more reliable information about the customers’ feedback on different products and different aspects of the same product.
2. Aspect Based Sentiment Analysis with Gated Convolutional Network by Wei Xue and Tao Li
Original Abstract
Aspect based sentiment analysis (ABSA) can provide more detailed information than general sentiment analysis, because it aims to predict the sentiment polarities of the given aspects or entities in text. We summarize previous approaches into two subtasks: aspect-category sentiment analysis (ACSA) and aspect-term sentiment analysis (ATSA). Most previous approaches employ long short-term memory and attention mechanisms to predict the sentiment polarity of the concerned targets, which are often complicated and need more training time. We propose a model based on convolutional neural networks and gating mechanisms, which is more accurate and efficient. First, the novel Gated Tanh-ReLU Units can selectively output the sentiment features according to the given aspect or entity. The architecture is much simpler than attention layer used in the existing models. Second, the computations of our model could be easily parallelized during training, because convolutional layers do not have time dependency as in LSTM layers, and gating units also work independently. The experiments on SemEval datasets demonstrate the efficiency and effectiveness of our models.
Our Summary
The authors introduce a novel, accurate, and efficient approach to aspect-based sentiment analysis. They claim that an architecture based on convolutional neural networks (CNNs) and gating mechanisms is simpler and more efficient than traditional approaches built around long short-term memory (LSTM) networks with attention mechanisms. Because convolutional layers have no time dependency, their computations can be parallelized, which considerably reduces training time. The experiments demonstrate both the effectiveness and the efficiency of the proposed approach to aspect-based sentiment analysis.
What’s the core idea of this paper?
- The research paper introduces solutions to both:
- Aspect-Category Sentiment Analysis (ACSA) where the model is asked to predict the sentiment polarity towards a predefined aspect category (e.g., food, service, price).
- Aspect-Term Sentiment Analysis (ATSA) where sentiment analysis is performed toward the aspect terms that are identified in the specific sentence (e.g., Thai food in the sentence “Average to good Thai food, but terrible delivery”).
- The proposed approach is called Gated Convolutional Network with Aspect Embedding (GCAE), and is probably the first CNN-based solution to aspect-based sentiment analysis:
- for the ACSA task, the model includes two separate convolutional layers on top of the embedding layer, whose outputs are combined by gating units; these gating units have two nonlinear gates, each of which is connected to one convolutional layer;
- for the ATSA task, where the aspect terms may contain several words, the model is extended with an additional convolutional layer for the target expressions.
Gated Convolutional Network with Aspect Embedding for ATSA task
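The gating idea for the ACSA case can be sketched in a few lines of NumPy. This is a minimal illustration, assuming one kernel size, random weights, and a tanh branch for sentiment features multiplied by a ReLU branch that also receives the aspect embedding, followed by max-over-time pooling. The function name, parameter names, and shapes are hypothetical and not taken from the authors' code.

```python
import numpy as np

def gtru_features(embeddings, aspect, W_s, W_a, V_a, b_s, b_a, kernel_size):
    """Gated Tanh-ReLU sketch: sentiment features are let through only where
    the aspect-aware ReLU gate is open, then max-pooled over positions."""
    n_words = embeddings.shape[0]
    # Flatten each window of `kernel_size` consecutive word vectors.
    windows = np.stack([
        embeddings[i:i + kernel_size].ravel()
        for i in range(n_words - kernel_size + 1)
    ])
    sentiment = np.tanh(windows @ W_s + b_s)                    # first conv branch
    gate = np.maximum(0.0, windows @ W_a + aspect @ V_a + b_a)  # second conv branch
    return (sentiment * gate).max(axis=0)                       # max-over-time pooling

rng = np.random.default_rng(0)
dim, kernel_size, n_filters, aspect_dim = 8, 3, 4, 5   # toy sizes
embeddings = rng.normal(size=(7, dim))                 # a 7-word sentence
aspect = rng.normal(size=aspect_dim)                   # aspect category embedding
W_s = rng.normal(size=(kernel_size * dim, n_filters))
W_a = rng.normal(size=(kernel_size * dim, n_filters))
V_a = rng.normal(size=(aspect_dim, n_filters))
b_s = np.zeros(n_filters)
b_a = np.zeros(n_filters)

features = gtru_features(embeddings, aspect, W_s, W_a, V_a, b_s, b_a, kernel_size)
```

Because each window is processed independently, the whole computation is a pair of matrix products, which is what makes this kind of model easy to parallelize compared with a recurrent layer.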
What’s the key achievement?
- GCAE outperforms several strong baselines demonstrating higher accuracy in aspect-based sentiment analysis.
- The presented approach performs especially well on the hard test dataset, where a given sentence includes different sentiments towards different aspects.
- In terms of the training time, the experiments confirm that GCAE is much faster than other neural models.
What does the AI community think?
- The paper was presented at ACL 2018, one of the key research conferences on natural language processing.
What are future research areas?
- Leveraging large-scale sentiment lexicons in neural networks.
What are possible business applications?
- The Gated Convolutional Network presented in this research paper can be a good candidate for performing aspect-based sentiment analysis in a business setting because of its:
- high accuracy;
- ability to recognize different sentiments towards different aspects provided within one sentence;
- very fast training.
Where can you get implementation code?
- The authors provide implementation code and data for this research paper on GitHub.
3. Multimodal Image Captioning for Marketing Analysis by Philipp Harzig, Stephan Brehm, Rainer Lienhart, Carolin Kaiser, René Schallner
Original Abstract
Automatically captioning images with natural language sentences is an important research topic. State of the art models are able to produce human-like sentences. These models typically describe the depicted scene as a whole and do not target specific objects of interest or emotional relationships between these objects in the image. However, marketing companies require to describe these important attributes of a given scene. In our case, objects of interest are consumer goods, which are usually identifiable by a product logo and are associated with certain brands. From a marketing point of view, it is desirable to also evaluate the emotional context of a trademarked product, i.e., whether it appears in a positive or a negative connotation. We address the problem of finding brands in images and deriving corresponding captions by introducing a modified image captioning network. We also add a third output modality, which simultaneously produces real-valued image ratings. Our network is trained using a classification-aware loss function in order to stimulate the generation of sentences with an emphasis on words identifying the brand of a product. We evaluate our model on a dataset of images depicting interactions between humans and branded products. The introduced network improves mean class accuracy by 24.5 percent. Thanks to adding the third output modality, it also considerably improves the quality of generated captions for images depicting branded products.
Our Summary
This research paper introduces an approach to image captioning with a specific focus on marketing needs. From the marketing perspective, it is desirable that an image caption targets the consumer product depicted in the image and also evaluates the emotional context of this product. The introduced neural network is trained to generate sentences with words that identify the brand of a product. Furthermore, the model produces three kinds of image ratings that reflect customers’ interaction with a product. The experiments demonstrate that this approach provides image captions that are more accurate and also more useful for marketing purposes.
What’s the core idea of this paper?
- Using a popular Show and Tell model as a basis.
- Implementing a loss function that directly penalizes generated captions in which the brand name does not appear.
- Extending the model with a third output modality to produce three image rating attributes:
- whether the person interacts with the branded product in a positive (0) or negative (4) way;
- if the person in the image is involved (0) with the branded product or uninvolved (4);
- if there is an emotional (0) or a functional (4) interaction with the branded product.
What’s the key achievement?
- Providing a metric to measure if an image is correctly classified with respect to objects of interest in the generated caption.
- Showing that combining multiple tasks in one model helps to get better performance at all tasks of such a model. Thus, including three modalities in the suggested model resulted in better caption quality, brand name detection, and image ratings.
What does the AI community think?
- The paper was presented at the 2018 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR2018).
What are possible business applications?
- The suggested approach enables large-scale capture of valuable information from social media pictures containing branded products, including:
- how people react to and interact with a product;
- how a brand’s popularity and perception change over time;
- whether the customers develop emotional connections with a brand etc.
4. SpectralNet: Spectral Clustering using Deep Neural Networks by Uri Shaham, Kelly Stanton, Henry Li, Boaz Nadler, Ronen Basri, Yuval Kluger
Original Abstract
Spectral clustering is a leading and popular technique in unsupervised data analysis. Two of its major limitations are scalability and generalization of the spectral embedding (i.e., out-of-sample-extension). In this paper we introduce a deep learning approach to spectral clustering that overcomes the above shortcomings. Our network, which we call SpectralNet, learns a map that embeds input data points into the eigenspace of their associated graph Laplacian matrix and subsequently clusters them. We train SpectralNet using a procedure that involves constrained stochastic optimization. Stochastic optimization allows it to scale to large datasets, while the constraints, which are implemented using a special-purpose output layer, allow us to keep the network output orthogonal. Moreover, the map learned by SpectralNet naturally generalizes the spectral embedding to unseen data points. To further improve the quality of the clustering, we replace the standard pairwise Gaussian affinities with affinities learned from unlabeled data using a Siamese network. Additional improvement can be achieved by applying the network to code representations produced, e.g., by standard autoencoders. Our end-to-end learning procedure is fully unsupervised. In addition, we apply VC dimension theory to derive a lower bound on the size of SpectralNet. State-of-the-art clustering results are reported on the Reuters dataset. Our implementation is publicly available at https://github.com/kstant0725/SpectralNet.
Our Summary
In this research paper, the authors address two major limitations of spectral clustering: scalability and generalization. They introduce a deep neural network, called SpectralNet, that overcomes both of these issues. Scalability is achieved through stochastic optimization, while out-of-sample extension is achieved because the trained network directly computes the eigenspace embedding of any input data point. The experiments show the effectiveness of SpectralNet with respect to capturing non-convex clusters.
What’s the core idea of this paper?
- The paper introduces SpectralNet, a deep learning approach to spectral clustering that solves scalability and generalization issues.
- SpectralNet is trained using constrained stochastic optimization:
- stochastic optimization enables scaling to large datasets;
- constraints allow keeping the network output orthogonal.
- Once trained, the model provides a function, implemented as a feed-forward network that maps each input data point to its spectral embedding coordinates, enabling the out-of-sample extension.
- Instead of the standard pairwise Gaussian affinities based on Euclidean distance, the model uses affinities learned from unlabeled data with a Siamese network.
- Finally, the network is applied to transformed data obtained by an autoencoder.
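The constrained objective can be made concrete with a small NumPy sketch for a single mini-batch. It is only an illustration under stated assumptions: the affinities are plain Gaussian affinities on Euclidean distances (not Siamese-learned ones), the raw network outputs are replaced by random numbers, and orthogonality is enforced with a Cholesky-based rescaling, which is one possible way to implement such an output layer. All function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(x, sigma=1.0):
    """Pairwise Gaussian affinities from squared Euclidean distances."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def orthonormalize(y_tilde):
    """Rescale raw outputs so that Y^T Y / m equals the identity matrix."""
    m = y_tilde.shape[0]
    chol = np.linalg.cholesky(y_tilde.T @ y_tilde)   # lower-triangular factor
    return np.sqrt(m) * y_tilde @ np.linalg.inv(chol).T

def spectral_loss(y, w):
    """Affinity-weighted average squared distance between embedded points."""
    m = y.shape[0]
    sq_dists = np.sum((y[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.sum(w * sq_dists) / m ** 2

rng = np.random.default_rng(0)
batch = rng.normal(size=(32, 5))      # one mini-batch of input points
y_tilde = rng.normal(size=(32, 3))    # stand-in for the network's raw outputs

w = gaussian_affinity(batch)
y = orthonormalize(y_tilde)
loss = spectral_loss(y, w)
```

Minimizing such a loss pulls points with high affinity close together in the embedding, while the orthogonality constraint rules out the trivial solution of mapping every point to the same location.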
What’s the key achievement?
- Outperforming existing clustering methods when clusters cannot be contained in non-overlapping convex shapes:
- state-of-the-art results on the Reuters document dataset;
- competitive results on the MNIST dataset of handwritten images.
What does the AI community think?
- The paper was presented at ICLR 2018, one of the key deep learning conferences.
What are future research areas?
- Gaining a better understanding of why Siamese networks outperform the common Euclidean distance approach.
- Examining how stochastic gradient descent can be adapted to improve the convergence rate of SpectralNet.
What are possible business applications?
- SpectralNet is good at capturing non-convex clusters and thus might benefit marketing analysis with regard to clustering customers, products, and images.
Where can you get implementation code?
- The authors provide access to SpectralNet, a Python library for performing spectral clustering using deep neural networks.
5. Ask less – Scale Market Research without Annoying Your Customers by Venkatesh Umaashankar and Girish Shanmugam S
Original Abstract
Market research is generally performed by surveying a representative sample of customers with questions that includes contexts such as psycho-graphics, demographics, attitude and product preferences. Survey responses are used to segment the customers into various groups that are useful for targeted marketing and communication. Reducing the number of questions asked to the customer has utility for businesses to scale the market research to a large number of customers. In this work, we model this task using Bayesian networks. We demonstrate the effectiveness of our approach using an example market segmentation of broadband customers.
Our Summary
The researchers study the problem of conducting market research for customer segmentation and propose using Bayesian networks to reduce the number of questions in the survey and thus scale the research to more customers. They suggest exploiting the key advantage of a Bayesian network: its ability to handle partial information at inference time. The experiments in a real-world setting demonstrate that the proposed approach can help to reduce the number of questions by 50% with only a minor drop in classification performance.
What’s the core idea of this paper?
- The paper introduces a novel Bayesian-based approach to scaling market research for customer segmentation.
- The proposed approach makes it possible to significantly reduce the number of questions in a market research survey.
- This Bayesian-based method is implemented in two phases:
- preparatory phase, where a company rolls out a survey questionnaire to a representative sample of customers and then learns a Bayesian network for segmentation to find the minimum number of required questions;
- scaling phase, where a company asks customers a defined number of random questions instead of going through the whole questionnaire and then assigns a segment based on the results from the Bayesian network model.
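Inference from a partially answered questionnaire can be sketched with the simplest possible Bayesian network, a naive Bayes model in which the segment is the only parent of every question. This structure, the two segments, the three yes/no questions, and all probabilities below are made-up assumptions for illustration; the paper learns its network from real survey data.

```python
import numpy as np

# Hypothetical toy model: 2 customer segments and 3 yes/no questions.
prior = np.array([0.6, 0.4])   # P(segment)
# likelihood[q][segment, answer] = P(answer to question q | segment)
likelihood = np.array([
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.1, 0.9]],
])

def segment_posterior(answers):
    """Posterior over segments given a dict {question_index: answer_index}.

    Unanswered questions are simply left out: summing over their possible
    answers contributes a factor of 1, so they drop out of the posterior.
    """
    log_post = np.log(prior)
    for question, answer in answers.items():
        log_post = log_post + np.log(likelihood[question][:, answer])
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

full = segment_posterior({0: 1, 1: 1, 2: 1})   # all three questions answered
partial = segment_posterior({0: 1})            # only the first question answered
segment = int(np.argmax(partial))              # assigned segment
```

Comparing `full` and `partial` on held-out respondents is one way to check how much segmentation accuracy is lost when fewer questions are asked.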
What’s the key achievement?
- The proposed Bayesian-based approach to scaling market research makes it possible to significantly reduce the number of questions in a survey, and thus:
- saves time on performing market research;
- helps to avoid annoying customers with long questionnaires.
What does the AI community think?
- The paper was presented at the 8th International Conference on Computer Science and Information Technology (CCSIT 2018) and International Conference on Artificial Intelligence, Smart Grid and Smart City Applications (AISGSC 2019).
What are possible business applications?
- The proposed approach to scaling market research can be directly implemented in the business setting to perform high-quality research for accurate customer segmentation and yet avoid customer irritation with long questionnaires.
6. A Deep Probabilistic Model for Customer Lifetime Value Prediction by Xiaojing Wang, Tianqi Liu, Jingang Miao
Original Abstract
Accurate predictions of customers’ future lifetime value (LTV) given their attributes and past purchase behavior enables a more customer-centric marketing strategy. Marketers can segment customers into various buckets based on the predicted LTV and, in turn, customize marketing messages or advertising copies to serve customers in different segments better. Furthermore, LTV predictions can directly inform marketing budget allocations and improve real-time targeting and bidding of ad impressions.
One challenge of LTV modeling is that some customers never come back, and the distribution of LTV can be heavy-tailed. The commonly used mean squared error (MSE) loss does not accommodate the significant fraction of zero value LTV from one-time purchasers and can be sensitive to extremely large LTVs from top spenders. In this article, we model the distribution of LTV given associated features as a mixture of zero point mass and lognormal distribution, which we refer to as the zero-inflated lognormal (ZILN) distribution. This modeling approach allows us to capture the churn probability and account for the heavy-tailedness nature of LTV at the same time. It also yields straightforward uncertainty quantification of the point prediction. The ZILN loss can be used in both linear models and deep neural networks (DNN). For model evaluation, we recommend the normalized Gini coefficient to quantify model discrimination and decile charts to assess model calibration. Empirically, we demonstrate the predictive performance of our proposed model on two real-world public datasets.
Our Summary
In this paper, the Google research team addresses the problem of predicting customers’ future lifetime value (LTV). In particular, they tackle two properties of the LTV distribution: the large share of zero values from one-time purchasers and the heavy tail created by top spenders. To this end, they suggest modeling LTV with the zero-inflated lognormal (ZILN) distribution, which is a mixture of a point mass at zero and a lognormal distribution, and using supervised regression to leverage all customer-level attributes. They also measure a model’s ability to differentiate high-value customers from low-value ones with the normalized Gini coefficient. The experiments on two real-world datasets demonstrate the effectiveness of the suggested approach.
What’s the core idea of this paper?
- Prediction of customer lifetime value is important for a firm’s financial planning, marketing decisions, and customer relationship management.
- When predicting the LTV of new customers, the commonly used frequency and recency characteristics cannot differentiate among customers. Thus, the authors suggest leveraging customer attributes and purchase characteristics by applying a supervised regression using a deep neural network (DNN).
- Further, the authors point out the challenges associated with the LTV distribution, which is usually heavy-tailed and volatile due to the high number of non-returning customers and extremely large LTVs for the top spenders:
- Mean Squared Error (MSE) is not appropriate in this case as it (a) ignores the fact that LTV labels include both zero and continuous values; (b) is highly sensitive to outliers because of the squared term.
- The solution is to model LTV with the zero-inflated lognormal (ZILN) distribution, which handles both the zero values and the extremely large LTVs by design.
- The model is evaluated using the normalized Gini coefficient, which is robust to outliers and allows better business interpretation.
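The ZILN idea can be written down as a loss with two parts: a binary cross-entropy term for whether the customer returns at all, and a lognormal negative log-likelihood for the positive LTV values. The NumPy sketch below assumes the model emits three numbers per customer (a return logit, the lognormal mean, and a raw scale passed through softplus). That parameterization and the function names are assumptions for illustration rather than the authors' released implementation.

```python
import numpy as np

def softplus(x):
    """Smooth positive transform, log(1 + exp(x)), computed stably."""
    return np.logaddexp(0.0, x)

def ziln_loss(labels, logits):
    """Mean ZILN negative log-likelihood.

    labels: (n,) non-negative LTV values.
    logits: (n, 3) columns = [return logit, lognormal mu, raw sigma].
    """
    labels = np.asarray(labels, dtype=float)
    positive = (labels > 0).astype(float)
    p_logit, mu, sigma_raw = logits[:, 0], logits[:, 1], logits[:, 2]
    sigma = np.maximum(softplus(sigma_raw), 1e-6)

    # Binary cross-entropy on "did the customer return", written with logits.
    class_loss = softplus(p_logit) - positive * p_logit

    # Lognormal negative log-likelihood, counted only for positive labels.
    log_x = np.log(np.where(labels > 0, labels, 1.0))
    nll = (log_x + np.log(sigma) + 0.5 * np.log(2.0 * np.pi)
           + (log_x - mu) ** 2 / (2.0 * sigma ** 2))
    return float(np.mean(class_loss + positive * nll))

def ziln_mean(logits):
    """Expected LTV: P(return) times the mean of the lognormal component."""
    p = 1.0 / (1.0 + np.exp(-logits[:, 0]))
    sigma = softplus(logits[:, 2])
    return p * np.exp(logits[:, 1] + 0.5 * sigma ** 2)

labels = np.array([0.0, 0.0, 12.5, 300.0])   # toy LTV values
logits = np.zeros((4, 3))                    # stand-in for model outputs
loss = ziln_loss(labels, logits)
expected_ltv = ziln_mean(logits)
```

Unlike MSE, the lognormal term works on the log scale, so a single very large LTV does not dominate the loss, and the zero values are handled by the classification term instead of being treated as ordinary regression targets.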
What’s the key achievement?
- The experiments demonstrate that both deep neural network architecture and ZILN loss contribute to:
- a higher Spearman’s correlation between true and predicted LTV;
- a higher normalized Gini coefficient.
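The normalized Gini coefficient can be computed in a few lines. The sketch below uses a common formulation: sort customers by predicted LTV in descending order, measure how far the cumulative share of true LTV rises above a random ordering, and divide by the same quantity for a perfect ordering. It is offered as an assumption about a standard definition, so the exact computation in the paper may differ in detail.

```python
import numpy as np

def gini(actual, ranking_scores):
    """Area between the cumulative gain curve and the diagonal of a random ordering."""
    actual = np.asarray(actual, dtype=float)
    scores = np.asarray(ranking_scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # highest score first
    sorted_actual = actual[order]
    cum_share = np.cumsum(sorted_actual) / sorted_actual.sum()
    n = len(actual)
    return cum_share.sum() / n - (n + 1) / (2.0 * n)

def normalized_gini(actual, predicted):
    """Model Gini divided by the Gini of a perfect ranking (requires some non-zero LTV)."""
    return gini(actual, predicted) / gini(actual, actual)

true_ltv = np.array([0.0, 0.0, 5.0, 20.0, 400.0])       # toy values
predicted_ltv = np.array([1.0, 0.5, 3.0, 50.0, 120.0])  # toy predictions
score = normalized_gini(true_ltv, predicted_ltv)
```

Because the metric depends only on how customers are ranked, it is not distorted by the magnitude of a few extreme LTVs, which is why it suits heavy-tailed targets.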
What are future research areas?
- Exploring possible ways to further improve the predictive performance of the introduced approach by experimenting with model architecture and tuning model hyperparameters.
What are possible business applications?
- The suggested approach to predicting customers’ lifetime value can help marketers improve their financial planning and customer relationship management.
Where can you get implementation code?
- The implementation of the suggested approach to predicting customers’ lifetime value is available on GitHub.
7. Context-aware Embedding for Targeted Aspect-based Sentiment Analysis by Bin Liang, Jiachen Du, Ruifeng Xu, Binyang Li, Hejiao Huang
Original Abstract
Attention-based neural models were employed to detect the different aspects and sentiment polarities of the same target in targeted aspect-based sentiment analysis (TABSA). However, existing methods do not specifically pre-train reasonable embeddings for targets and aspects in TABSA. This may result in targets or aspects having the same vector representations in different contexts and losing the context-dependent information. To address this problem, we propose a novel method to refine the embeddings of targets and aspects. Such pivotal embedding refinement utilizes a sparse coefficient vector to adjust the embeddings of target and aspect from the context. Hence the embeddings of targets and aspects can be refined from the highly correlative words instead of using context-independent or randomly initialized vectors. Experiment results on two benchmark datasets show that our approach yields the state-of-the-art performance in TABSA task.
Our Summary
Targeted aspect-based sentiment analysis (TABSA) can be very useful for automated analysis of customers’ reviews and understanding the reviewers’ attitudes to different aspects of a product (e.g., price, service, safety). Attention-based neural networks have demonstrated remarkable progress in the TABSA task but the authors of the current paper note that the existing approaches usually utilize context-independent or randomly initialized vectors for representing targets and aspects. As a result, the semantic information is lost and the interdependence among specific targets, corresponding aspects, and context, is not considered. To address this problem, the researchers propose a novel embedding refinement method to obtain context-aware embeddings for TABSA. Specifically, they suggest reconstructing the vector representation for the target from the context using a sparse coefficient vector. This results in target representation being generated from highly correlative words rather than randomly initialized embeddings. The experiments show that the introduced approach leads to state-of-the-art performance in the TABSA task.
What’s the core idea of this paper?
- The paper introduces a novel embedding refinement method to obtain context-aware embeddings for the TABSA task rather than context-independent or randomly initialized embeddings:
- A sparse coefficient vector is leveraged to select highly correlated words from the sentence.
- The representations of target and aspect are adjusted so that these highly correlated words carry more weight.
- The aspect embedding is fine-tuned so that it is closer to the highly correlated target and further away from the irrelevant targets.
What’s the key achievement?
- The experimental results show that incorporating context-aware embeddings of targets and aspects into the neural models significantly improves:
- aspect detection (by 2.9% in strict accuracy), and
- sentiment classification (by 1.8% in strict accuracy).
What does the AI community think?
- The paper was presented at ACL 2019, the leading conference in natural language processing.
What are future research areas?
- Exploring the extension of the suggested approach to other tasks.
What are possible business applications?
- The introduced approach to obtaining context-aware embeddings for targeted aspect-based sentiment analysis can significantly improve the accuracy of customer reviews analysis.
8. Progressive Self-Supervised Attention Learning for Aspect-Level Sentiment Analysis by Jialong Tang, Ziyao Lu, Jinsong Su, Yubin Ge, Linfeng Song, Le Sun, Jiebo Luo
Original Abstract
In aspect-level sentiment classification (ASC), it is prevalent to equip dominant neural models with attention mechanisms, for the sake of acquiring the importance of each context word on the given aspect. However, such a mechanism tends to excessively focus on a few frequent words with sentiment polarities, while ignoring infrequent ones. In this paper, we propose a progressive self-supervised attention learning approach for neural ASC models, which automatically mines useful attention supervision information from a training corpus to refine attention mechanisms. Specifically, we iteratively conduct sentiment predictions on all training instances. Particularly, at each iteration, the context word with the maximum attention weight is extracted as the one with active/misleading influence on the correct/incorrect prediction of every instance, and then the word itself is masked for subsequent iterations. Finally, we augment the conventional training objective with a regularization term, which enables ASC models to continue equally focusing on the extracted active context words while decreasing weights of those misleading ones. Experimental results on multiple datasets show that our proposed approach yields better attention mechanisms, leading to substantial improvements over the two state-of-the-art neural ASC models. Source code and trained models are available.
Our Summary
The authors note that the existing attention mechanism in aspect-level sentiment classification (ASC) tends to focus on several frequent words with sentiment polarities and ignores infrequent ones. To address this problem, they introduce a novel progressive self-supervised attention learning approach for aspect-level sentiment classification. This approach is based on the idea that the context word with the maximum attention weight has a major impact on sentiment prediction. Thus, if the training instance with the respective context word was predicted correctly, this word should be considered in the model training. Otherwise, it should be ignored as it apparently provides inaccurate information for prediction. The researchers incorporate this approach into the neural model by augmenting the training objective with a corresponding regularizer. The experiments on several benchmark datasets demonstrate the effectiveness of the introduced approach.
What’s the core idea of this paper?
- The existing attention mechanism in aspect-level sentiment classification is prone to overly focus on a few frequent words with sentiment polarities while ignoring the infrequent ones. This often results in poor performance of ASC models.
- To solve this issue, the researchers introduce a novel progressive self-supervised attention learning approach for ASC models:
- The approach follows the idea that the context word with the highest attention weight has the greatest impact on the sentiment prediction of the corresponding sentence. Keeping this in mind, we should consider these context words in model training only if they result in correctly predicted training instances.
- Following this idea, sentiment prediction is iteratively conducted on all training instances.
- Finally, the training objective is augmented with a regularizer that enforces focus on the extracted active context words while decreasing the weights of the misleading context words.
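The mining loop described above can be sketched for a single training instance. The sketch assumes a model exposed as a `predict` function that returns a label and one attention weight per word; the lexicon-based `toy_predict`, the `<mask>` token, and the fixed number of rounds are hypothetical stand-ins. Any retraining between rounds and the final regularization term are left out.

```python
MASK = "<mask>"

def mine_attention_supervision(words, label, predict, max_iters=3):
    """Collect words that supported correct predictions (active) and words
    that drove incorrect predictions (misleading), masking each in turn."""
    words = list(words)
    active, misleading = [], []
    for _ in range(max_iters):
        predicted, weights = predict(words)
        candidates = [i for i, w in enumerate(words) if w != MASK]
        if not candidates:
            break
        top = max(candidates, key=lambda i: weights[i])   # most attended word
        (active if predicted == label else misleading).append(words[top])
        words[top] = MASK                                 # hide it in later rounds
    return active, misleading

# Hypothetical stand-in for a trained ASC model with an attention layer.
LEXICON = {"great": 1.0, "awful": -1.5}

def toy_predict(words):
    scores = [LEXICON.get(w, 0.0) for w in words]
    label = "positive" if sum(scores) > 0 else "negative"
    weights = [abs(s) for s in scores]
    return label, weights

sentence = ["great", "food", "but", "awful", "service"]
active, misleading = mine_attention_supervision(
    sentence, "positive", toy_predict, max_iters=2
)
```

For the aspect "food" with the gold label "positive", the first round mispredicts because "awful" dominates, so that word is recorded as misleading and masked; the second round predicts correctly and records "great" as active.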
What’s the key achievement?
- Proposing a novel approach to automatically extracting attention supervision information for aspect-level classification models.
- Demonstrating the effectiveness of the proposed attention learning approach, which significantly improves the performance of two popular ASC models, Memory Network (MN) and Transformation Network (TNet).
What does the AI community think?
- The paper was presented at ACL 2019, the leading conference in natural language processing.
What are future research areas?
- Extending the presented approach to other NLP tasks with attention mechanisms, including neural document classification and neural machine translation.
What are possible business applications?
- The introduced approach to attention learning for aspect-level sentiment analysis can significantly boost the performance of sentiment classification models applied to the analysis of customer reviews.
Where can you get implementation code?
- The authors provide their source code and trained models on GitHub.