This research summary is part of our Conversational AI series, which covers the latest AI & machine learning approaches to building conversational agents.
In this piece, we cover cutting-edge approaches to building open-domain dialog agents.
2020 is a breakthrough year for open-domain chatbots. Google’s chatbot Meena and Facebook’s chatbot Blender, both introduced this year, achieve close to human-level performance. The developers of these state-of-the-art conversational agents suggested novel approaches to improving conversation quality in terms of the sensibleness and specificity of responses, the empathy of the agent, and the consistency of its personality.
Knowing that Meena is based on a model with 2.6 billion parameters and Blender is trained with a Transformer-based model of up to 9.4 billion parameters, we can conclude that model size is one of the key factors behind the success of these chatbots.
Most companies cannot afford to train and deploy chatbots of that size. Luckily, the research community has plenty of cutting-edge ideas for improving the performance of open-domain conversational agents without training such huge models. We’ve carefully curated and summarized the most interesting research papers that introduce effective solutions for holding meaningful, engaging, persona-consistent, and empathetic conversations with chatbots.
If these accessible AI research analyses & summaries are useful for you, you can subscribe to receive our regular industry updates below.
If you’d like to skip around, here are the papers we featured:
- TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents
- Target-Guided Open-Domain Conversation
- MoEL: Mixture of Empathetic Listeners
- A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data
- You Impress Me: Dialogue Generation via Mutual Persona Perception
- Hierarchical Reinforcement Learning for Open-Domain Dialog
- DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification
- Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue
Open-Domain Dialog Systems
1. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents, by Thomas Wolf, Victor Sanh, Julien Chaumond, Clement Delangue
Original Abstract
We introduce a new approach to generative data-driven dialogue systems (e.g. chatbots) called TransferTransfo which is a combination of a Transfer learning based training scheme and a high-capacity Transformer model. Fine-tuning is performed by using a multi-task objective which combines several unsupervised prediction tasks. The resulting fine-tuned model shows strong improvements over the current state-of-the-art end-to-end conversational models like memory augmented seq2seq and information-retrieval models. On the privately held PERSONA-CHAT dataset of the Conversational Intelligence Challenge 2, this approach obtains a new state-of-the-art, with respective perplexity, Hits@1 and F1 metrics of 16.28 (45 % absolute improvement), 80.7 (46 % absolute improvement) and 19.5 (20 % absolute improvement).
Our Summary
The HuggingFace research team points to common issues with open-domain chatbots, including inconsistent personality, lack of long-term memory, and a tendency to produce generic responses. To address these issues, the researchers introduce a new model architecture together with the associated training and generation algorithms. Their TransferTransfo approach is based on a combination of a transfer-learning-based training scheme and a multilayer Transformer encoder. The evaluation demonstrates that the suggested approach significantly improves over the traditional seq2seq and information-retrieval baselines in terms of answer relevance, persona consistency, fluency, and grammaticality.
What’s the core idea of this paper?
- A novel TransferTransfo approach to generative data-driven chatbots:
- uses a multilayer Transformer encoder based on GPT as a generative model;
- leverages the pre-trained model weights open-sourced by OpenAI;
- The model is fine-tuned on the PERSONA-CHAT dataset with:
- an augmented input representation (see the code sketch after this list):
- first, a sequence of input tokens is constructed by concatenating the persona sentences with the dialog’s previous utterances;
- second, a sequence of input embeddings is constructed from input tokens with the word and positional embeddings learned during pretraining and augmented with dialog-state embeddings;
- a multi-task learning scheme optimized on a combination of two loss functions:
- a next-utterance classification loss;
- a language modeling loss.
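To make the fine-tuning setup more concrete, here is a minimal sketch (in PyTorch, and not the authors’ released code) of how the augmented input representation could be assembled: persona sentences, dialog history, and the candidate reply are concatenated into one token sequence, and word, positional, and dialog-state embeddings are summed token-wise. The special tokens, the speaker-alternation rule, and the dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative special tokens and sizes -- assumptions, not the paper's exact values
SPECIAL = {"<bos>": 0, "<eos>": 1, "<speaker1>": 2, "<speaker2>": 3}
VOCAB_SIZE, MAX_POS, D_MODEL = 30000, 512, 768


def build_inputs(persona_ids, history_ids, reply_ids):
    """Concatenate persona sentences, previous utterances, and the candidate reply
    into one token sequence, with matching dialog-state (token-type) ids."""
    segments = [[SPECIAL["<bos>"]] + persona_ids] + history_ids + [reply_ids + [SPECIAL["<eos>"]]]
    token_types = []
    for i, segment in enumerate(segments):
        # Hypothetical convention: persona and the bot's turns -> speaker2, user turns -> speaker1
        speaker = SPECIAL["<speaker2>"] if i % 2 == 0 else SPECIAL["<speaker1>"]
        token_types += [speaker] * len(segment)
    input_ids = [tok for segment in segments for tok in segment]
    return torch.tensor(input_ids), torch.tensor(token_types)


class AugmentedEmbedding(nn.Module):
    """Word + positional + dialog-state embeddings, summed as in the input scheme above."""

    def __init__(self):
        super().__init__()
        self.word = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.position = nn.Embedding(MAX_POS, D_MODEL)
        self.state = nn.Embedding(len(SPECIAL), D_MODEL)

    def forward(self, input_ids, token_type_ids):
        positions = torch.arange(input_ids.size(-1), device=input_ids.device)
        return self.word(input_ids) + self.position(positions) + self.state(token_type_ids)
```

On top of this input, the multi-task objective combines a language-modeling head over the sequence with a next-utterance classification head that scores the gold reply against distractor utterances.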
What’s the key achievement?
- The experiments demonstrate that TransferTransfo outperforms the existing systems by a significant margin and achieves:
- on the public validation dataset, 51% absolute improvement in perplexity, 35% absolute improvement in Hits@1, and 13% improvement in F1;
- on the private test set (after tuning hyperparameters on the validation set), 45% absolute improvement in perplexity, 46% absolute improvement in Hits@1, and 20% improvement in F1.
What does the AI community think?
- The paper was presented at the NeurIPS 2018 Conversational AI workshop.
What are future research areas?
- Exploring more optimal settings and models.
Where can you get implementation code?
- The HuggingFace team has released the code implementation on GitHub.
2. Target-Guided Open-Domain Conversation, by Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, Zhiting Hu
Original Abstract
Many real-world open-domain conversation applications have specific goals to achieve during open-ended chats, such as recommendation, psychotherapy, education, etc. We study the problem of imposing conversational goals on open-domain chat agents. In particular, we want a conversational system to chat naturally with humans and proactively guide the conversation to a designated target subject. The problem is challenging as no public data is available for learning such a target-guided strategy. We propose a structured approach that introduces coarse-grained keywords to control the intended content of system responses. We then attain smooth conversation transition through turn-level supervised learning, and drive the conversation towards the target with discourse-level constraints. We further derive a keyword-augmented conversation dataset for the study. Quantitative and human evaluations show our system can produce meaningful and effective conversations, significantly improving over other approaches.
Our Summary
Many practical applications of open-domain chatbots require these dialog systems to achieve a specific goal, even though the conversation is open-ended (e.g., therapeutic conversations, persuasion, making recommendations). The current research paper suggests a general approach to creating a conversational AI system that chats naturally with humans on open-domain topics, while proactively guiding the conversation towards a designated target subject. In particular, the authors suggest explicitly modeling and controlling the intended content of each response with coarse-grained utterance keywords. The discourse-level rule encourages the conversation to approach the target topic. Quantitative and human evaluations demonstrate that the introduced agent can generate meaningful and effective conversations while reaching the target topics.
What’s the core idea of this paper?
- The researchers define a new task for open-domain chatbots:
- conversing naturally with humans starting from an arbitrary initial topic and leading the conversation to the target subject in the end;
- a target is defined as a specific word;
- the target is considered to be achieved when either the human or the agent mentions this or a similar word in an utterance;
- the agent balances two objectives: (1) transition smoothness; and (2) target achievement.
- The proposed approach assumes:
- maintaining a smooth conversation transition using turn-level supervised learning;
- injecting target-guided behavior with a rule-based strategy (see the code sketch after this list);
- decoupling the decision-making process and utterance generation by explicitly modeling the intended coarse-grained keywords in the next utterance of a dialog system.
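To illustrate the rule-based, discourse-level strategy in code, here is a minimal sketch (our own formulation, not the authors’ implementation): among the keyword candidates scored by the turn-level model, keep only those that are closer to the target word than the current conversation keyword in embedding space, then pick the highest-scoring one. The `embed` lookup and the fallback behavior are hypothetical placeholders.

```python
import numpy as np


def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))


def choose_next_keyword(candidates, scores, current_kw, target_kw, embed):
    """Discourse-level constraint (sketch): move strictly closer to the target each turn.

    candidates: list[str] keyword candidates for the next system turn
    scores:     list[float] turn-level model scores for each candidate
    embed:      callable str -> np.ndarray, a hypothetical word-embedding lookup
    """
    current_sim = cosine(embed(current_kw), embed(target_kw))
    valid = [
        (kw, score) for kw, score in zip(candidates, scores)
        if cosine(embed(kw), embed(target_kw)) > current_sim  # strictly closer to the target
    ]
    if not valid:  # fall back to the best unconstrained candidate
        valid = list(zip(candidates, scores))
    return max(valid, key=lambda pair: pair[1])[0]
```

The selected keyword then conditions utterance retrieval or generation for the next turn, which is how the decision-making step stays decoupled from response production.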
- To study this new task, the researchers introduce a new large conversation dataset derived from the Persona-Chat corpus.
What’s the key achievement?
- Results of self-play evaluation demonstrate that the introduced agent with Kernel transition outperforms alternative approaches with a success rate of 75% and an average of 4.2 turns.
- Human evaluations also confirm that the Kernel agent outperforms all other systems in terms of success rate and transition smoothness.
What does the AI community think?
- The paper was accepted for oral presentation at ACL 2019, the leading conference in natural language processing.
What are future research areas?
- Exploring more sophisticated modeling to achieve better control at both sentence and dialog levels.
What are possible business applications?
- Target-guided open-domain chatbots might be useful in psychotherapy, education, and making recommendations in different business areas.
Where can you get implementation code?
- The dataset and code implementation are publicly available on GitHub.
3. MoEL: Mixture of Empathetic Listeners, by Zhaojiang Lin, Andrea Madotto, Jamin Shin, Peng Xu, Pascale Fung
Original Abstract
Previous research on empathetic dialogue systems has mostly focused on generating responses given certain emotions. However, being empathetic not only requires the ability of generating emotional responses, but more importantly, requires the understanding of user emotions and replying appropriately. In this paper, we propose a novel end-to-end approach for modeling empathy in dialogue systems: Mixture of Empathetic Listeners (MoEL). Our model first captures the user emotions and outputs an emotion distribution. Based on this, MoEL will softly combine the output states of the appropriate Listener(s), which are each optimized to react to certain emotions, and generate an empathetic response. Human evaluations on empathetic-dialogues (Rashkin et al., 2018) dataset confirm that MoEL outperforms multitask training baseline in terms of empathy, relevance, and fluency. Furthermore, the case study on generated responses of different Listeners shows high interpretability of our model.
Our Summary
The research team from the Center for Artificial Intelligence Research (CAiRE) addresses the challenging task of building a dialog agent that is able to recognize emotions and respond appropriately. They introduce a novel end-to-end empathetic dialog agent, called Mixture of Empathetic Listeners (MoEL), which uses dialog context to recognize emotional state and includes a number of decoders, or listeners, which are optimized to react to each context emotion accordingly. The experimental results demonstrate that the introduced approach outperforms several competitive baselines in terms of both empathy and relevance.
What’s the core idea of this paper?
- To model empathy in the dialog system, the CAiRE research team introduces a novel end-to-end dialog agent, called Mixture of Empathetic Listeners (MoEL):
- the dialog context is encoded and leveraged to recognize the emotional state (i.e., one of n emotions);
- n decoders, denoted as listeners, are optimized to react to each context emotion accordingly;
- the listeners are trained together with a Meta Listener that softly combines the output states of the appropriate listener(s) (see the code sketch below).
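To make the soft combination step concrete, here is a minimal PyTorch sketch under our own assumptions: simple linear layers stand in for the per-emotion listeners (the actual model uses full Transformer-based listeners), and an emotion distribution predicted from the encoded context weights the listeners’ output states before the Meta Listener produces the final representation.

```python
import torch
import torch.nn as nn


class MixtureOfListeners(nn.Module):
    """Sketch of MoEL-style soft combination over per-emotion listeners."""

    def __init__(self, d_model=300, n_emotions=32):
        super().__init__()
        self.emotion_head = nn.Linear(d_model, n_emotions)
        # Lightweight stand-ins for the per-emotion listeners.
        self.listeners = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_emotions)])
        self.meta_listener = nn.Linear(d_model, d_model)

    def forward(self, context_state):  # context_state: (batch, d_model)
        emotion_dist = torch.softmax(self.emotion_head(context_state), dim=-1)
        listener_states = torch.stack(
            [listener(context_state) for listener in self.listeners], dim=1
        )  # (batch, n_emotions, d_model)
        combined = (emotion_dist.unsqueeze(-1) * listener_states).sum(dim=1)
        return self.meta_listener(combined), emotion_dist
```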
What’s the key achievement?
- With respect to emotion detection, the model achieves 38%, 63%, and 74% in top-1, top-3, and top-5 detection accuracy over 32 emotions.
- With respect to response generation, MoEL outperforms other baselines in terms of answer empathy and relevance.
What does the AI community think?
- The paper was accepted for oral presentation at EMNLP 2019, one of the leading conferences in natural language processing.
What are future research areas?
- Incorporating MoEL with persona- and task-oriented dialog systems.
What are possible business applications?
- Adding empathy to conversational AI systems can improve the performance of chatbots in different real-world applications.
Where can you get implementation code?
- The PyTorch implementation of the paper is available on GitHub.
4. A Pre-training Based Personalized Dialogue Generation Model with Persona-sparse Data, by Yinhe Zheng, Rongsheng Zhang, Xiaoxi Mao, Minlie Huang
Original Abstract
Endowing dialogue systems with personas is essential to deliver more human-like conversations. However, this problem is still far from well explored due to the difficulties of both embodying personalities in natural languages and the persona sparsity issue observed in most dialogue corpora. This paper proposes a pre-training based personalized dialogue model that can generate coherent responses using persona-sparse dialogue data. In this method, a pre-trained language model is used to initialize an encoder and decoder, and personal attribute embeddings are devised to model richer dialogue contexts by encoding speakers’ personas together with dialogue histories. Further, to incorporate the target persona in the decoding process and to balance its contribution, an attention routing structure is devised in the decoder to merge features extracted from the target persona and dialogue contexts using dynamically predicted weights. Our model can utilize persona-sparse dialogues in a unified manner during the training process, and can also control the amount of persona-related features to exhibit during the inference process. Both automatic and manual evaluation demonstrates that the proposed model outperforms state-of-the-art methods for generating more coherent and persona consistent responses with persona-sparse data.
Our Summary
While persona-related dialog datasets, like Persona-Chat, include conversations that cover rich persona-related features (i.e., are persona-dense), real-world dialogs are usually persona-sparse. Therefore, the authors of this paper address the problem of delivering human-like conversations using persona-sparse dialog data. They introduce a personalized dialog model that uses data coming from daily conversations on social media. The encoder and decoder are initialized with a pre-trained language model. To capture persona-related features, attribute embeddings are added in the encoder. To incorporate the target persona in the decoding process, an attention routing mechanism is introduced in the decoder. Automatic and manual evaluation demonstrates that the proposed method outperforms state-of-the-art approaches when the dialog data available at the fine-tuning stage are persona-sparse.
What’s the core idea of this paper?
- In contrast to most of the pre-training based approaches that use persona-dense data for fine-tuning a model, the authors of this paper suggest that modeling dialogs with persona-sparse data is preferable:
- Obtaining persona-dense dialogs by requiring speakers to exchange their personas within a limited number of utterances is expensive.
- Models fine-tuned on such data tend to overfit to the pattern of exhibiting persona-related features in every response.
- Real-world dialogs tend to be persona-sparse.
- The paper introduces a pre-training based method that can utilize persona-sparse data to deliver human-like conversations:
- The encoder and decoder follow the Transformer framework and share the same set of weights.
- To model each persona, attribute embeddings are introduced in the encoder.
- To utilize the persona-sparse dialog data effectively, the researchers devise an attention-routing mechanism in the decoder that allows the contribution of the target persona to be controlled (see the code sketch below).
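As an illustration of the attention-routing idea, here is a minimal PyTorch sketch under our own assumptions (single-head attention, a sigmoid-predicted merging weight): features attended from the target persona and from the dialog context are merged with a weight `alpha` that is predicted dynamically during training and can be set manually at inference to control how strongly the persona is exhibited.

```python
import torch
import torch.nn as nn


class AttentionRouter(nn.Module):
    """Sketch of merging persona and context features with a dynamic weight."""

    def __init__(self, d_model=768):
        super().__init__()
        self.persona_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.context_attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.weight_head = nn.Linear(d_model, 1)

    def forward(self, decoder_states, persona_feats, context_feats, alpha=None):
        persona_out, _ = self.persona_attn(decoder_states, persona_feats, persona_feats)
        context_out, _ = self.context_attn(decoder_states, context_feats, context_feats)
        if alpha is None:  # predict the persona weight from the decoder states
            alpha = torch.sigmoid(self.weight_head(decoder_states))
        return alpha * persona_out + (1.0 - alpha) * context_out
```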
What’s the key achievement?
- The automatic evaluation shows that the introduced approach outperforms all the baselines on such metrics as persona accuracy, BLEU score, F1 score, and the Distinct score (i.e., the proportion of unique n-grams in the generated responses).
- The human evaluation also demonstrates that this approach outperforms all the baselines on all the measures, particularly utterance fluency, persona consistency, and context coherency.
What does the AI community think?
- The paper was accepted to AAAI 2020, one of the key research conferences in artificial intelligence.
What are possible business applications?
- The suggested approach can enhance the persona consistency and context coherency of open-domain dialog agents fine-tuned on persona-sparse data.
5. You Impress Me: Dialogue Generation via Mutual Persona Perception, by Qian Liu, Yihong Chen, Bei Chen, Jian-Guang Lou, Zixuan Chen, Bin Zhou, Dongmei Zhang
Original Abstract
Despite the continuing efforts to improve the engagingness and consistency of chit-chat dialogue systems, the majority of current work simply focuses on mimicking human-like responses, leaving understudied the aspects of modeling understanding between interlocutors. The research in cognitive science, instead, suggests that understanding is an essential signal for a high-quality chit-chat conversation. Motivated by this, we propose P2 Bot, a transmitter-receiver based framework with the aim of explicitly modeling understanding. Specifically, P2 Bot incorporates mutual persona perception to enhance the quality of personalized dialogue generation. Experiments on a large public dataset, Persona-Chat, demonstrate the effectiveness of our approach, with a considerable boost over the state-of-the-art baselines across both automatic metrics and human evaluations.
Our Summary
The authors seek to improve the quality of chit-chat conversations by modeling understanding between interlocutors instead of simply mimicking human-like responses. To this end, they introduce Persona Perception Bot (P2 Bot), which leverages a transmitter-receiver framework to explicitly model the understanding between interlocutors. P2 Bot is trained for personalized dialog generation with supervised training and self-play fine-tuning that is guided by reward signals characterizing mutual persona perception. The experiments on the Persona-Chat dataset show that P2 Bot outperforms alternative approaches according to both automatic metrics and human evaluations.
What’s the core idea of this paper?
- The introduced Persona Perception Bot, or P2 Bot, explicitly models understanding between interlocutors.
- The model comprises two components:
- A Transmitter component is responsible for dialog generation.
- It is initialized with GPT.
- The training procedure consists of two steps: (1) supervised dialog generation; (2) self-play model fine-tuning, where the Transmitter learns a policy that maximizes reward signals via reinforcement learning.
- The reward function considers not only language modeling but also mutual persona perception, based on the relevance scores measured by a Receiver (see the code sketch after this list).
- A Receiver component is responsible for mutual persona perception.
- It tries to measure the proximity between the built impressions and the actual personas.
- Obtained relevance scores serve as mutual persona perception rewards and are further incorporated into the training of the Transmitter.
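To illustrate the self-play fine-tuning step, here is a minimal sketch (our own formulation, not the released implementation) of how a reward that blends a language-modeling signal with the Receiver’s persona-relevance score could drive a REINFORCE-style update of the Transmitter. The mixing weight `beta`, the baseline, and the function names are illustrative assumptions.

```python
def self_play_reward(lm_score, persona_relevance, beta=0.5):
    """Blend a fluency signal with the Receiver's persona-relevance score (sketch).

    lm_score:          e.g. the mean log-probability of the generated utterance
    persona_relevance: Receiver's relevance score between the utterance and the speaker's persona
    beta:              hypothetical mixing weight, not a value from the paper
    """
    return beta * lm_score + (1.0 - beta) * persona_relevance


def reinforce_step(log_probs, reward, baseline, optimizer):
    """One REINFORCE-style update for the Transmitter during self-play (sketch).

    log_probs: tensor of log-probabilities of the sampled utterance tokens
    reward:    scalar reward for the whole utterance (from self_play_reward)
    baseline:  scalar baseline (e.g. a running average of rewards) to reduce variance
    """
    loss = -(reward - baseline) * log_probs.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```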
What’s the key achievement?
- According to automatic metrics, P2 Bot outperforms several strong baselines by achieving state-of-the-art performance in terms of F1 score and perplexity, and highly competitive performance in terms of Hits@1 score.
- According to human evaluations, P2 Bot performs significantly better than other baselines by generating responses that are not only interesting and informative but also consistent with the persona of the interlocutor.
What does the AI community think?
- The paper was accepted to ACL 2020, the leading research conference in natural language processing.
What are future research areas?
- Extending Receiver to conversational recommender systems.
What are possible business applications?
- Improving the persona consistency of chit-chat dialog systems.
Where can you get implementation code?
- The PyTorch implementation of the paper is released on GitHub.
6. Hierarchical Reinforcement Learning for Open-Domain Dialog, by Abdelrhman Saleh, Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Rosalind Picard
Original Abstract
Open-domain dialog generation is a challenging problem; maximum likelihood training can lead to repetitive outputs, models have difficulty tracking long-term conversational goals, and training on standard movie or online datasets may lead to the generation of inappropriate, biased, or offensive text. Reinforcement Learning (RL) is a powerful framework that could potentially address these issues, for example by allowing a dialog model to optimize for reducing toxicity and repetitiveness. However, previous approaches which apply RL to open-domain dialog generation do so at the word level, making it difficult for the model to learn proper credit assignment for long-term conversational rewards. In this paper, we propose a novel approach to hierarchical reinforcement learning, VHRL, which uses policy gradients to tune the utterance-level embedding of a variational sequence model. This hierarchical approach provides greater flexibility for learning long-term, conversational rewards. We use self-play and RL to optimize for a set of human-centered conversation metrics, and show that our approach provides significant improvements – in terms of both human evaluation and automatic metrics – over state-of-the-art dialog models, including Transformers.
Our Summary
The research team from Harvard University and MIT Media Lab suggests applying hierarchical reinforcement learning to open-domain dialog generation. In contrast to previous RL-based approaches, they model rewards at the utterance level instead of the word level to learn long-term conversational rewards. In particular, they introduce the Variational Hierarchical Reinforcement Learning (VHRL) approach, which allows for improved learning of conversational rewards, such as reducing toxicity and repetition, improving sentiment and semantic similarity of responses, and asking more questions. Evaluation based on automatic metrics and human judgments reveals that VHRL outperforms state-of-the-art dialog architectures, including Transformer-based models.
What’s the core idea of this paper?
- The authors claim that reinforcement learning is a powerful approach that allows dialog systems to optimize for non-differentiable metrics of conversation quality, such as minimizing repetition and toxicity.
- They propose a novel RL-based approach, Variational Hierarchical Reinforcement Learning (VHRL) that:
- models rewards at the utterance level to improve the flexibility of dialog systems to learn long-term conversational rewards;
- has the context RNN (manager) responsible for utterance-level decisions and the decoder RNN (worker) responsible for word-level decisions (see the code sketch after this list);
- uses self-play to simulate the interactive environment in which the agent learns.
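As a concrete example of an utterance-level, non-differentiable reward and the two-level update, here is a minimal sketch under our own assumptions (the paper’s reward definitions and update rules differ in detail): a repetition penalty computed over whole utterances, and a policy-gradient step that credits the manager once per utterance while spreading the same reward across the worker’s word-level decisions.

```python
def repetition_penalty(utterance_tokens, history_tokens):
    """Utterance-level reward sketch: penalize bigram overlap with the conversation so far.
    The bigram choice and the [0, 1] scaling are illustrative assumptions."""
    bigrams = lambda toks: set(zip(toks, toks[1:]))
    current, previous = bigrams(utterance_tokens), bigrams(history_tokens)
    if not current:
        return 0.0
    return 1.0 - len(current & previous) / len(current)  # 1.0 = no repeated bigrams


def hierarchical_update(manager_log_prob, worker_log_probs, utterance_reward, optimizer):
    """Sketch of a two-level policy-gradient step: the manager (context RNN) is credited
    for its single utterance-level decision, the worker (decoder RNN) for its word-level
    decisions, both with the same utterance-level reward."""
    manager_loss = -utterance_reward * manager_log_prob
    worker_loss = -utterance_reward * worker_log_probs.sum()
    loss = manager_loss + worker_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```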
What’s the key achievement?
- Demonstrating with a series of experiments that:
- proposed rewards (i.e., sentiment, question asking, repetition, semantic similarity, and toxicity) lead to higher conversation quality as judged by human users;
- VHRL provides the most effective approach to learning these rewards;
- chats are longer with VHRL compared to other models, suggesting that the proposed approach leads to more interesting and engaging conversations.
What does the AI community think?
- The paper was accepted for oral presentation at AAAI 2020, one of the key research conferences in artificial intelligence.
What are possible business applications?
- The suggested approach can minimize the toxicity of the conversation, and thus enable deployment of open-domain chatbots in safety-critical applications such as mental health.
- Furthermore, the metrics can be tailored to a particular application domain (e.g., increasing politeness of a technical-support chatbot) to enhance integration of open-domain chatbots with a variety of real-world products.
Where can you get implementation code?
- The code for the evaluation platform and models is released on GitHub.
7. DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification, by Libo Qin, Wanxiang Che, Yangming Li, Minheng Ni, Ting Liu
Original Abstract
In dialog systems, dialog act recognition and sentiment classification are two correlative tasks to capture speakers’ intentions, where dialog act and sentiment can indicate the explicit and the implicit intentions separately (Kim and Kim 2018). Most of the existing systems either treat them as separate tasks or just jointly model the two tasks by sharing parameters in an implicit way without explicitly modeling mutual interaction and relation. To address this problem, we propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly consider the cross-impact and model the interaction between the two tasks by introducing a co-interactive relation layer. In addition, the proposed relation layer can be stacked to gradually capture mutual knowledge with multiple steps of interaction. Especially, we thoroughly study different relation layers and their effects. Experimental results on two public datasets (Mastodon and Dailydialog) show that our model outperforms the state-of-the-art joint model by 4.3% and 3.4% in terms of F1 score on dialog act recognition task, 5.7% and 12.4% on sentiment classification respectively. Comprehensive analysis empirically verifies the effectiveness of explicitly modeling the relation between the two tasks and the multi-steps interaction mechanism. Finally, we employ the Bidirectional Encoder Representation from Transformer (BERT) in our framework, which can further boost our performance in both tasks.
Our Summary
The research team from Harbin Institute of Technology suggests explicitly modeling mutual interaction and relation between the two correlative tasks in a dialog system, namely dialog act recognition and sentiment classification. They introduce a Deep Co-Interactive Relation Network (DCR-Net) that explicitly considers the cross-impact of these two tasks and models the interaction between dialog act recognition and sentiment classification with a co-interactive relation layer. Experiments on two real-world datasets, Mastodon and Dailydialog, show that DCR-Net achieves significant and consistent improvement over all baseline methods. Furthermore, incorporating BERT into the framework further boosts its performance.
What’s the core idea of this paper?
- To control cross-knowledge transfer for dialog act recognition and sentiment classification, it is important to explicitly model relation and interaction between these two tasks.
- To this end, a Deep Co-Interactive Relation Network (DCR-Net) is introduced. It consists of three components:
- a shared hierarchical encoder to get the shared representations of dialog act and sentiment among utterances;
- a stacked co-interactive relation layer to control knowledge transfer for both tasks;
- two separate decoders to predict dialog act and sentiment.
- Several relation layers were explored:
- concatenation;
- multilayer perceptron (MLP);
- co-attention (see the code sketch after this list).
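For illustration, here is a minimal PyTorch sketch of what a co-attention relation layer could look like (our own assumptions about dimensions and single-head attention, not the paper’s exact configuration): each task’s representation attends over the other’s and is updated through a residual connection, and stacking the layer corresponds to multiple steps of interaction.

```python
import torch
import torch.nn as nn


class CoInteractiveLayer(nn.Module):
    """Sketch of a co-attention relation layer between dialog-act and sentiment streams."""

    def __init__(self, d_model=256):
        super().__init__()
        self.act_over_sent = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.sent_over_act = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.norm_act = nn.LayerNorm(d_model)
        self.norm_sent = nn.LayerNorm(d_model)

    def forward(self, act_repr, sent_repr):  # each: (batch, n_utterances, d_model)
        act_ctx, _ = self.act_over_sent(act_repr, sent_repr, sent_repr)
        sent_ctx, _ = self.sent_over_act(sent_repr, act_repr, act_repr)
        return self.norm_act(act_repr + act_ctx), self.norm_sent(sent_repr + sent_ctx)


# Stacking the layer approximates the multi-step interaction described above:
# for layer in relation_layers:
#     act_repr, sent_repr = layer(act_repr, sent_repr)
```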
What’s the key achievement?
- The series of experiments demonstrates that:
- DCR-Net with co-attention achieves:
- on the Mastodon dataset, a 5.7% improvement in F1 score on the sentiment classification task and a 4.3% improvement in F1 score on the dialog act recognition task;
- on the Dailydialog dataset, a 12.4% improvement in F1 score on the sentiment classification task and a 3.4% improvement in F1 score on the dialog act recognition task.
- The co-attention relation layer achieves the best F1 scores of the three relation layers on all datasets, while the MLP relation layer outperforms the concatenation layer.
- The BERT-based model achieves a new state-of-the-art performance on both datasets.
What does the AI community think?
- The paper was accepted for oral presentation at AAAI 2020, one of the key research conferences in artificial intelligence.
What are possible business applications?
- By explicitly considering the cross-impact of dialog act recognition and sentiment classification, it is possible to significantly boost the performance of open-domain chatbots in a variety of real-world applications.
8. Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue, by Byeongchang Kim, Jaewoo Ahn, Gunhee Kim
Original Abstract
Knowledge-grounded dialogue is a task of generating an informative response based on both discourse context and external knowledge. As we focus on better modeling the knowledge selection in the multi-turn knowledge-grounded dialogue, we propose a sequential latent variable model as the first approach to this matter. The model named sequential knowledge transformer (SKT) can keep track of the prior and posterior distribution over knowledge; as a result, it can not only reduce the ambiguity caused from the diversity in knowledge selection of conversation but also better leverage the response information for proper choice of knowledge. Our experimental results show that the proposed model improves the knowledge selection accuracy and subsequently the performance of utterance generation. We achieve the new state-of-the-art performance on Wizard of Wikipedia (Dinan et al., 2019) as one of the most large-scale and challenging benchmarks. We further validate the effectiveness of our model over existing conversation methods in another knowledge-based dialogue Holl-E dataset (Moghe et al., 2018).
Our Summary
The research team from Seoul National University explores the possibility of achieving more engaging and accurate knowledge-based chit-chat by suggesting a novel approach to knowledge selection in multi-turn knowledge-grounded dialog. In particular, they introduce a sequential knowledge transformer (SKT) that can handle the one-to-many relationship between the dialog context and the knowledge to be selected and also sequentially track the topic flow of knowledge in the multi-turn dialog. The experimental results demonstrate that the proposed model achieves a new state-of-the-art performance on Wizard of Wikipedia and a knowledge-annotated version of the Holl-E dataset.
What’s the core idea of this paper?
- The paper addresses the issue of knowledge selection for multi-turn knowledge-grounded dialog, since:
- selection of pertinent topics is important for engaging humans in conversations;
- utterance generation becomes easier with a strong knowledge-selection model in place.
- The authors suggest using a sequential latent variable model, called a sequential knowledge transformer (SKT), for knowledge selection. They identify three major advantages of this approach:
- It can correctly deal with multimodality in knowledge selection when many different knowledge pieces can be selected given the dialog context. With a sequential latent variable model, it is possible to reduce the scope of probable knowledge candidates by sequentially modeling the history of knowledge selection in previous turns.
- It can better leverage the response information by keeping track of the prior and posterior distributions over knowledge, and thus better predict the knowledge by sampling from the posterior (see the code sketch after this list).
- It works even when the knowledge selection labels for the previous dialog are not available (e.g., when multiple people have a discussion about given documents) as it can infer which knowledge others are likely to select and use.
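Below is a minimal sketch of the prior/posterior knowledge-selection idea, under our own assumptions about encoders, shapes, and the history tracker: the prior distribution over knowledge sentences conditions only on the dialog context and the selection history, the posterior additionally conditions on the response, and training samples knowledge from the posterior (here via Gumbel-softmax) while inference falls back to the prior.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SequentialKnowledgeSelector(nn.Module):
    """Sketch of sequential latent knowledge selection with prior/posterior distributions."""

    def __init__(self, d_model=256):
        super().__init__()
        self.history = nn.GRUCell(d_model, d_model)  # tracks knowledge selected in past turns
        self.prior_query = nn.Linear(2 * d_model, d_model)
        self.post_query = nn.Linear(3 * d_model, d_model)

    def forward(self, context, knowledge, state, response=None, tau=0.5):
        # context, response, state: (batch, d_model); knowledge: (batch, n_knowledge, d_model)
        prior_q = self.prior_query(torch.cat([context, state], dim=-1))
        prior_logits = torch.einsum("bd,bkd->bk", prior_q, knowledge)
        if response is None:  # inference: pick knowledge from the prior
            chosen = F.one_hot(prior_logits.argmax(-1), knowledge.size(1)).float()
        else:  # training: sample from the posterior, which also sees the response
            post_q = self.post_query(torch.cat([context, state, response], dim=-1))
            post_logits = torch.einsum("bd,bkd->bk", post_q, knowledge)
            chosen = F.gumbel_softmax(post_logits, tau=tau, hard=True)
        selected = torch.einsum("bk,bkd->bd", chosen, knowledge)
        new_state = self.history(selected, state)  # update the selection history
        return selected, new_state, prior_logits
```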
What’s the key achievement?
- Introducing a novel approach to knowledge selection for knowledge-grounded dialog that not only improves the accuracy of knowledge selection but also the performance of utterance generation.
- Achieving a new state-of-the-art performance on Wizard of Wikipedia and a knowledge-annotated version of Holl-E by outperforming the previous state of the art in all metrics for knowledge selection (accuracy) and utterance generation (unigram F1, bigram F1).
What does the AI community think?
- The paper was presented at ICLR 2020, one of the leading research conferences in artificial intelligence.
What are future research areas?
- Exploring other inference models such as sequential Monte Carlo methods using filtering variational objectives.
- Studying the interpretability of knowledge selection based on the uncertainty of attention.
What are possible business applications?
- The suggested sequential knowledge transformer can significantly boost the performance of knowledge-based chit-chat dialog systems.
Where can you get implementation code?
- The implementation code and dataset are available on GitHub.