UPDATE: We’ve also summarized the top 2020 AI & machine learning research papers.
With the AI industry moving so quickly, it’s difficult for ML practitioners to find the time to curate, analyze, and implement new research being published. To help you quickly get up to speed on the latest ML trends, we’re introducing our research series, in which we curate the key AI research papers of 2019 and summarize them in an easy-to-read bullet-point format.
We’ll start with the top 10 AI research papers that we find important and representative of the latest research trends. These papers will give you a broad overview of research advances in neural network architectures, optimization techniques, unsupervised learning, language modeling, computer vision, and more. We’ve selected these papers based on technical impact, expert opinions, and industry reception. Of course, there is much more research worth your attention, but we hope this is a good starting point.
We will also be publishing the top 10 lists of key research papers in natural language processing, conversational AI, computer vision, reinforcement learning, and AI ethics.
Subscribe to our AI Research mailing list at the bottom of this article to be alerted when we release new summaries.
If you’d like to skip around, here are the papers we featured:
- The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks
- Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations
- Meta-Learning Update Rules for Unsupervised Representation Learning
- On the Variance of the Adaptive Learning Rate and Beyond
- XLNet: Generalized Autoregressive Pretraining for Language Understanding
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems
- A Theory of Fermat Paths for Non-Line-of-Sight Shape Reconstruction
- Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
- Learning Existing Social Conventions via Observationally Augmented Self-Play
10 Important ML Research Papers of 2019
1. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, by Jonathan Frankle and Michael Carbin
Original Abstract
Neural network pruning techniques can reduce the parameter counts of trained networks by over 90%, decreasing storage requirements and improving computational performance of inference without compromising accuracy. However, contemporary experience is that the sparse architectures produced by pruning are difficult to train from the start, which would similarly improve training performance.
We find that a standard pruning technique naturally uncovers subnetworks whose initializations made them capable of training effectively. Based on these results, we articulate the “lottery ticket hypothesis:” dense, randomly-initialized, feed-forward networks contain subnetworks (“winning tickets”) that – when trained in isolation – reach test accuracy comparable to the original network in a similar number of iterations. The winning tickets we find have won the initialization lottery: their connections have initial weights that make training particularly effective.
We present an algorithm to identify winning tickets and a series of experiments that support the lottery ticket hypothesis and the importance of these fortuitous initializations. We consistently find winning tickets that are less than 10-20% of the size of several fully-connected and convolutional feed-forward architectures for MNIST and CIFAR10. Above this size, the winning tickets that we find learn faster than the original network and reach higher test accuracy.
Our Summary
Neural networks are typically initialized larger than strictly necessary and then pruned after training down to a much smaller core of weights. The Lottery Ticket Hypothesis proposes that, given this eventual pruning, there must be a smaller subnetwork which, if initialized with the right weights, could achieve the same level of performance after training. The researchers find such “winning ticket” networks, which match the accuracy of their parent networks at 10-20% of the size, by iteratively training a network, pruning its smallest weights, and resetting the remaining weights to their original initial values. The experiments confirm that the winning tickets found this way train faster and reach higher test accuracy than the original networks.
What’s the core idea of this paper?
- A dense, randomly initialized feed-forward network contains a much smaller subnetwork (“winning ticket”) that, when trained in isolation from its original initial weights, can reach the same accuracy as the full network.
- This subnetwork can be found by iteratively training the full network, pruning its smallest-magnitude weights, and resetting the remaining connections to their original initial values (a minimal sketch of one such round follows this list).
- Iterative pruning, rather than one-shot pruning, is required to find winning ticket networks with the best accuracy at minimal sizes.
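Below is a minimal PyTorch sketch of one round of this procedure (iterative magnitude pruning with weight rewinding). The `train` callback, the 20% per-round pruning fraction, and the helper names are our illustrative assumptions, not the authors’ code; in real use, the masks must also be enforced during subsequent training so that pruned weights stay at zero.

```python
import torch

def lottery_ticket_round(model, init_state, train, prune_fraction=0.2, masks=None):
    """One round of iterative magnitude pruning with weight rewinding (sketch).

    model: a torch.nn.Module; init_state: a copy of its original state_dict;
    train: a user-supplied function that trains the model to completion.
    """
    if masks is None:  # start with every weight kept
        masks = {n: torch.ones_like(p) for n, p in model.named_parameters() if "weight" in n}

    train(model)  # 1. train the (masked) network to completion

    for name, p in model.named_parameters():  # 2. prune the smallest surviving weights per layer
        if name in masks:
            alive = p.data[masks[name].bool()].abs()
            threshold = torch.quantile(alive, prune_fraction)
            masks[name] *= (p.data.abs() > threshold).float()

    model.load_state_dict(init_state)  # 3. rewind weights to their original initialization
    with torch.no_grad():
        for name, p in model.named_parameters():  # 4. zero out pruned connections
            if name in masks:
                p.mul_(masks[name])
    return model, masks
```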
What’s the key achievement?
- Introducing the Lottery Ticket Hypothesis, which provides a new perspective on the composition of neural networks.
- Suggesting a reproducible method for identifying winning ticket subnetworks for a given original, large network.
- Providing inspiration for designing new architectures and initialization schemes that will result in much more efficient neural networks.
What does the AI community think?
- The paper received the Best Paper Award at ICLR 2019, one of the key conferences in machine learning.
- It has sparked follow-up work by several research teams (e.g. Uber).
What are future research areas?
- Finding more efficient ways to reach a winning ticket network so that the hypothesis can be tested on larger datasets.
- Trying out pruning methods other than sparse pruning.
- Investigating the need for learning rate warmup with iterative pruning in deep neural networks.
- Stabilizing the Lottery Ticket Hypothesis, as suggested in the researchers’ follow-up paper.
What are possible business applications?
- Vastly decreasing time and computational requirements for training neural networks.
- Building neural networks that are small enough to be trained on individual devices rather than on cloud computing networks.
Where can you get implementation code?
- An implementation on the MNIST database is available on GitHub.
2. Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations, by Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem
Original Abstract
The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangled representations is fundamentally impossible without inductive biases on both the models and the data. Then, we train more than 12000 models covering most prominent methods and evaluation metrics in a reproducible large-scale experimental study on seven different data sets. We observe that while the different methods successfully enforce properties “encouraged” by the corresponding losses, well-disentangled models seemingly cannot be identified without supervision. Furthermore, increased disentanglement does not seem to lead to a decreased sample complexity of learning for downstream tasks. Our results suggest that future work on disentanglement learning should be explicit about the role of inductive biases and (implicit) supervision, investigate concrete benefits of enforcing disentanglement of the learned representations, and consider a reproducible experimental setup covering several data sets.
Our Summary
Enabling machines to understand high-dimensional data and turn that information into usable representations in an unsupervised manner remains a major challenge for machine learning. In this paper, a joint team of researchers from ETH Zurich, the Max Planck Institute for Intelligent Systems, and Google Research proves theoretically that unsupervised learning of disentangled representations is impossible without inductive biases in both the learning approaches and the datasets. They then perform a large-scale evaluation of recent unsupervised disentanglement learning methods, training more than 12,000 models on seven datasets, to confirm their findings empirically.
What’s the core idea of this paper?
- The research paper theoretically proves that unsupervised learning of disentangled representations is fundamentally impossible without inductive biases.
- The theoretical findings are supported by the results of a large-scale reproducible experimental study, where the researchers implemented six state-of-the-art unsupervised disentanglement learning approaches and six disentanglement measures from scratch on seven datasets:
- Even though all considered methods ensure that the individual dimensions of the aggregated posterior (which is sampled) are uncorrelated, the dimensions of the representation (which is taken to be the mean) are still correlated (illustrated in the toy check after this list).
- Random seeds and hyperparameters often matter more than the choice of model, but tuning seems to require supervision.
- Increased disentanglement doesn’t necessarily imply a decreased sample complexity of learning downstream tasks.
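As a quick illustration of the correlation finding above, here is a toy NumPy check of pairwise correlations between the dimensions of mean representations. The synthetic data and variable names are ours, purely for illustration; in the study, these representations come from trained encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 6
latent_means = rng.normal(size=(n, d))             # stand-in for encoder mean representations
latent_means[:, 1] += 0.8 * latent_means[:, 0]      # inject correlation for illustration

corr = np.corrcoef(latent_means, rowvar=False)      # d x d correlation matrix across dimensions
off_diag = corr[~np.eye(d, dtype=bool)]
print("max absolute off-diagonal correlation:", np.abs(off_diag).max())
```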
What’s the key achievement?
- The authors of the research have challenged common beliefs in unsupervised disentanglement learning both theoretically and empirically.
- Following their findings, the research team suggests directions for future research on disentanglement learning.
- They also release important resources for future work in this research area:
- a new library to train and evaluate disentangled representations;
- over 10,000 trained models that can be used as baselines for future research.
What does the AI community think?
- The paper received the Best Paper Award at ICML 2019, one of the leading conferences in machine learning.
What are future research areas?
- Exploring the role of inductive bias as well as implicit and explicit supervision in the unsupervised learning of disentangled representations.
- Demonstrating the concrete practical benefits of enforcing a specific notion of disentanglement of the learned representations.
- Conducting experiments in a reproducible experimental setup on a wide variety of datasets with different degrees of difficulty to see whether the conclusions and insights are generally applicable.
Where can you get implementation code?
- The library used to create the experimental study is available on GitHub.
- The research team also released more than 10,000 pretrained disentanglement models, also available on GitHub.
3. Meta-Learning Update Rules for Unsupervised Representation Learning, by Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein
Original Abstract
A major goal of unsupervised learning is to discover data representations that are useful for subsequent tasks, without access to supervised labels during training. Typically, this involves minimizing a surrogate objective, such as the negative log likelihood of a generative model, with the hope that representations useful for subsequent tasks will arise as a side effect. In this work, we propose instead to directly target later desired tasks by meta-learning an unsupervised learning rule which leads to representations useful for those tasks. Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm – an unsupervised weight update rule – that produces representations useful for this task. Additionally, we constrain our unsupervised update rule to be a biologically-motivated, neuron-local function, which enables it to generalize to different neural network architectures, datasets, and data modalities. We show that the meta-learned update rule produces useful features and sometimes outperforms existing unsupervised learning techniques. We further show that the meta-learned unsupervised update rule generalizes to train networks with different widths, depths, and nonlinearities. It also generalizes to train on data with randomly permuted input dimensions and even generalizes from image datasets to a text task.
Our Summary
One of the major issues with unsupervised learning is that most unsupervised models produce useful representations only as a side effect, rather than as the direct outcome of the model training. Researchers from Google Brain and the University of California, Berkeley, sought to use meta-learning to tackle the problem of unsupervised representation learning. In particular, they propose to meta-learn an unsupervised update rule by meta-training on a meta-objective that directly optimizes the value of the produced representation. Furthermore, the suggested meta-learning approach can be generalized across input data modalities, across permutations of the input dimensions, and across neural network architectures.
What’s the core idea of this paper?
- Unsupervised learning has typically found useful data representations as a side effect of the learning process, rather than as the result of a defined optimization objective.
- To address this problem, the researchers propose meta-learning an unsupervised update rule that produces representations useful for a specific task (a toy sketch of the overall outer/inner-loop structure follows this list):
- The meta-objective directly reflects the usefulness of a representation generated from unlabeled data for further supervised tasks.
- An unsupervised update rule is constrained to be a biologically-motivated, neuron-local function, enabling generalizability.
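The toy sketch below only illustrates the overall structure: an inner loop that applies a parameterized, local update rule to unlabeled data, and an outer loop that scores the resulting representation with a supervised meta-objective. The hand-written Hebbian-style rule with two coefficients, the least-squares probe, and random search over rule parameters are all our simplifying assumptions; the paper meta-trains a much richer neuron-local MLP update rule with gradient-based meta-optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

def inner_unsupervised_update(W, X, theta, steps=20, lr=0.01):
    """Inner loop: apply a simple Hebbian-style local rule with coefficients theta."""
    for _ in range(steps):
        H = np.tanh(X @ W)                                  # current representations
        dW = theta[0] * (X.T @ H) / len(X) - theta[1] * W   # strengthen co-activations, decay weights
        W = W + lr * dW
    return W

def meta_objective(W, X_labeled, y):
    """Meta-objective: accuracy of a least-squares probe on the learned representation."""
    H = np.tanh(X_labeled @ W)
    Y = np.eye(y.max() + 1)[y]                              # one-hot targets
    probe, *_ = np.linalg.lstsq(H, Y, rcond=None)
    return (np.argmax(H @ probe, axis=1) == y).mean()

# toy data: two Gaussian blobs in 10 dimensions
X = rng.normal(size=(200, 10)); X[100:, :2] += 3.0
y = np.array([0] * 100 + [1] * 100)

best_theta, best_score = None, -1.0
for _ in range(50):                                          # outer loop: search over rule parameters
    theta = rng.uniform(-1, 1, size=2)
    W = inner_unsupervised_update(rng.normal(scale=0.1, size=(10, 5)), X, theta)
    score = meta_objective(W, X, y)
    if score > best_score:
        best_theta, best_score = theta, score

print("best rule coefficients:", best_theta, "probe accuracy:", best_score)
```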
What’s the key achievement?
- Introducing a meta-learning approach with an inner loop consisting of unsupervised learning.
- Demonstrating generalizability across input data modalities, datasets, permuted input dimensions, and neural network architectures.
- Achieving performance that matches or exceeds existing unsupervised learning techniques.
What does the AI community think?
- The paper was presented at ICLR 2019, one of the leading conferences in machine learning.
What are future research areas?
- Further investigating the possibilities for replacing manual algorithm design with architectures designed for learning and learned from data via meta-learning.
4. On the Variance of the Adaptive Learning Rate and Beyond, by Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han
Original Abstract
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam. Here, we study its mechanism in details. Pursuing the theory behind warmup, we identify a problem of the adaptive learning rate (i.e., it has problematically large variance in the early stage), suggest warmup works as a variance reduction technique, and provide both empirical and theoretical evidence to verify our hypothesis. We further propose RAdam, a new variant of Adam, by introducing a term to rectify the variance of the adaptive learning rate. Extensive experimental results on image classification, language modeling, and neural machine translation verify our intuition and demonstrate the effectiveness and robustness of our proposed method.
Our Summary
In this paper, the Microsoft research team investigates the effectiveness of the warmup heuristic used with adaptive optimization algorithms. They show that the adaptive learning rate can cause the model to converge to bad local optima because of its large variance in the early stage of training, when only a limited number of training samples have been seen. This justifies the warmup heuristic, which reduces this variance by using smaller learning rates in the first few epochs of training. To control the warmup behavior automatically, the researchers introduce a new variant of Adam, called Rectified Adam (RAdam), which explicitly rectifies the variance of the adaptive learning rate using an analytically derived rectification term. The experiments demonstrate the effectiveness of the suggested approach on a variety of tasks, including image classification, language modeling, and neural machine translation.
What’s the core idea of this paper?
- Adaptive learning rate algorithms like Adam are prone to falling into suspicious or bad local optima unless they are given a warm-up period with a smaller learning rate in the first few epochs of training.
- A new optimization algorithm, RAdam, rectifies the variance of the adaptive learning rate (a minimal sketch of a single update step follows this list):
- If the variance is tractable (i.e., the length of the approximated simple moving average exceeds 4), the variance rectification term is calculated, and parameters are updated with the adaptive learning rate.
- Otherwise, the adaptive learning rate is inactivated, and RAdam acts as stochastic gradient descent with momentum.
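The following NumPy sketch shows a single RAdam-style update for one parameter, following the rule described in the bullets above. The stand-alone function and variable names are ours; for real training, use the authors’ released implementation linked below.

```python
import numpy as np

def radam_step(p, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam-style update (sketch). t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad                       # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2                  # second moment
    m_hat = m / (1 - beta1 ** t)                             # bias-corrected momentum
    rho_inf = 2 / (1 - beta2) - 1
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)  # approximated SMA length
    if rho_t > 4:                                            # variance tractable: rectified adaptive step
        v_hat = np.sqrt(v / (1 - beta2 ** t))
        r_t = np.sqrt(((rho_t - 4) * (rho_t - 2) * rho_inf) /
                      ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        p = p - lr * r_t * m_hat / (v_hat + eps)
    else:                                                    # otherwise: plain momentum step
        p = p - lr * m_hat
    return p, m, v
```

Applied elementwise, this reproduces the behavior described above: rectified adaptive updates once enough gradients have been observed, and momentum-only updates before that.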
What’s the key achievement?
- The authors provide both empirical and theoretical evidence of their hypothesis that the adaptive learning rate has an undesirably large variance in the early stage of model training due to the limited amount of samples at that point.
- They introduce Rectified Adam, a new variant of Adam, that:
- is theoretically sound;
- outperforms vanilla Adam and achieves similar performance to that of previous state-of-the-art warmup heuristics in image classification, language modeling, and machine translation;
- requires less hyperparameter tuning than Adam with warmup – in particular, it controls the warmup behavior automatically, without the need to hand-tune a warmup schedule.
What does the AI community think?
- “It’s been a long time since we’ve seen a new optimizer reliably beat the old favorites; this looks like a very encouraging approach!” – Jeremy Howard, a founding researcher at fast.ai.
What are future research areas?
- Applying the proposed approach to other applications, including Named Entity Recognition.
What are possible business applications?
- Faster and more stable training of deep learning models used in business settings.
Where can you get implementation code?
- You can find the implementation code for RAdam on GitHub.
5. XLNet: Generalized Autoregressive Pretraining for Language Understanding, by Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
Original Abstract
With the capability of modeling bidirectional contexts, denoising autoencoding based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling. However, relying on corrupting the input with masks, BERT neglects dependency between the masked positions and suffers from a pretrain-finetune discrepancy. In light of these pros and cons, we propose XLNet, a generalized autoregressive pretraining method that (1) enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order and (2) overcomes the limitations of BERT thanks to its autoregressive formulation. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pretraining. Empirically, XLNet outperforms BERT on 20 tasks, often by a large margin, and achieves state-of-the-art results on 18 tasks including question answering, natural language inference, sentiment analysis, and document ranking.
Our Summary
The researchers from Carnegie Mellon University and Google have developed a new model, XLNet, for natural language processing (NLP) tasks such as reading comprehension, text classification, sentiment analysis, and others. XLNet is a generalized autoregressive pretraining method that leverages the best of both autoregressive language modeling (e.g., Transformer-XL) and autoencoding (e.g., BERT) while avoiding their limitations. The experiments demonstrate that the new model outperforms both BERT and Transformer-XL and achieves state-of-the-art performance on 18 NLP tasks.
What’s the core idea of this paper?
- XLNet combines the bidirectional capability of BERT with the autoregressive technology of Transformer-XL:
- Like BERT, XLNet uses a bidirectional context, which means it looks at the words before and after a given token to predict what it should be. To this end, XLNet maximizes the expected log-likelihood of a sequence with respect to all possible permutations of the factorization order (a toy illustration of permutation-based masking follows this list).
- As an autoregressive language model, XLNet doesn’t rely on data corruption, and thus avoids BERT’s limitations due to masking – i.e., pretrain-finetune discrepancy and the assumption that unmasked tokens are independent of each other.
- To further improve architectural designs for pretraining, XLNet integrates the segment recurrence mechanism and relative encoding scheme of Transformer-XL.
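To make the permutation idea concrete, here is a toy NumPy illustration of how a sampled factorization order determines which positions each token may attend to. This is a simplified view under our own naming; XLNet’s actual two-stream attention and partial prediction are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len = 5
order = rng.permutation(seq_len)              # a sampled factorization order, e.g. [3, 0, 4, 2, 1]

position_in_order = np.argsort(order)         # rank of each sequence position in that order
# mask[i, j] == True  ->  position i may attend to position j
# (i.e., j comes earlier than i in the sampled factorization order)
mask = position_in_order[:, None] > position_in_order[None, :]

print("factorization order:", order)
print(mask.astype(int))
```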
What’s the key achievement?
- XLNet outperforms BERT on 20 tasks, often by a large margin.
- The new model achieves state-of-the-art performance on 18 NLP tasks including question answering, natural language inference, sentiment analysis, and document ranking.
What does the AI community think?
- The paper was accepted for oral presentation at NeurIPS 2019, one of the leading conferences in artificial intelligence.
- “The king is dead. Long live the king. BERT’s reign might be coming to an end. XLNet, a new model by people from CMU and Google outperforms BERT on 20 tasks.” – Sebastian Ruder, a research scientist at DeepMind.
- “XLNet will probably be an important tool for any NLP practitioner for a while…[it is] the latest cutting-edge technique in NLP.” – Keita Kurita, Carnegie Mellon University.
What are future research areas?
- Extending XLNet to new areas, such as computer vision and reinforcement learning.
What are possible business applications?
- XLNet may assist businesses with a wide range of NLP problems, including:
- chatbots for first-line customer support or answering product inquiries;
- sentiment analysis for gauging brand awareness and perception based on customer reviews and social media;
- the search for relevant information in document bases or online, etc.
Where can you get implementation code?
- The authors have released the official TensorFlow implementation of XLNet.
- A PyTorch implementation of the model is also available on GitHub.
6. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, by Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Original Abstract
Increasing model size when pretraining natural language representations often results in improved performance on downstream tasks. However, at some point further model increases become harder due to GPU/TPU memory limitations, longer training times, and unexpected model degradation. To address these problems, we present two parameter-reduction techniques to lower memory consumption and increase the training speed of BERT. Comprehensive empirical evidence shows that our proposed methods lead to models that scale much better compared to the original BERT. We also use a self-supervised loss that focuses on modeling inter-sentence coherence, and show it consistently helps downstream tasks with multi-sentence inputs. As a result, our best model establishes new state-of-the-art results on the GLUE, RACE, and SQuAD benchmarks while having fewer parameters compared to BERT-large.
Our Summary
The Google Research team addresses the problem of the continuously growing size of the pretrained language models, which results in memory limitations, longer training time, and sometimes unexpectedly degraded performance. Specifically, they introduce A Lite BERT (ALBERT) architecture that incorporates two parameter-reduction techniques: factorized embedding parameterization and cross-layer parameter sharing. In addition, the suggested approach includes a self-supervised loss for sentence-order prediction to improve inter-sentence coherence. The experiments demonstrate that the best version of ALBERT sets new state-of-the-art results on GLUE, RACE, and SQuAD benchmarks while having fewer parameters than BERT-large.
What’s the core idea of this paper?
- Further improving language models simply by making them larger is impractical because of the memory limitations of available hardware, longer training times, and the unexpected degradation of model performance as the number of parameters grows.
- To address this problem, the researchers introduce the ALBERT architecture that incorporates two parameter-reduction techniques:
- factorized embedding parameterization, where the size of the hidden layers is separated from the size of the vocabulary embeddings by decomposing the large vocabulary-embedding matrix into two smaller matrices (see the parameter-count sketch after this list);
- cross-layer parameter sharing to prevent the number of parameters from growing with the depth of the network.
- The performance of ALBERT is further improved by introducing the self-supervised loss for sentence-order prediction to address BERT’s limitations with regard to inter-sentence coherence.
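A quick back-of-the-envelope sketch of the factorized embedding parameterization mentioned above. The sizes are assumptions roughly following BERT-large (a ~30k vocabulary, hidden size 1024) with ALBERT’s default embedding size of 128.

```python
# V = vocabulary size, H = hidden size, E = smaller embedding size
V, H, E = 30000, 1024, 128

dense_embedding = V * H          # a single V x H embedding matrix, as in BERT
factorized = V * E + E * H       # a V x E matrix followed by an E x H projection, as in ALBERT

print(f"dense: {dense_embedding:,}  factorized: {factorized:,}")
# dense: 30,720,000  factorized: 3,971,072
```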
What’s the key achievement?
- With the introduced parameter-reduction techniques, the ALBERT configuration with 18× fewer parameters and 1.7× faster training compared to the original BERT-large model achieves only slightly worse performance.
- The much larger ALBERT configuration, which still has fewer parameters than BERT-large, outperforms all of the current state-of-the-art language models by achieving:
- an accuracy of 89.4% on the RACE benchmark;
- a score of 89.4 on the GLUE benchmark; and
- an F1 score of 92.2 on the SQuAD 2.0 benchmark.
What does the AI community think?
- The paper has been submitted to ICLR 2020 and is available on the OpenReview forum, where you can read the reviews and comments of NLP experts. The reviewers are largely positive about the paper.
What are future research areas?
- Speeding up training and inference through methods like sparse attention and block attention.
- Further improving the model performance through hard example mining, more efficient model training, and other approaches.
What are possible business applications?
- The ALBERT language model can be leveraged in the business setting to improve performance on a wide range of downstream tasks, including chatbot performance, sentiment analysis, document mining, and text classification.
Where can you get implementation code?
- The original implementation of ALBERT is available on GitHub.
- A TensorFlow implementation of ALBERT is also available here.
- A PyTorch implementation of ALBERT can be found here and here.
7. Transferable Multi-Domain State Generator for Task-Oriented Dialogue Systems, by Chien-Sheng Wu, Andrea Madotto, Ehsan Hosseini-Asl, Caiming Xiong, Richard Socher, Pascale Fung
Original Abstract
Over-dependence on domain ontology and lack of knowledge sharing across domains are two practical and yet less studied problems of dialogue state tracking. Existing approaches generally fall short in tracking unknown slot values during inference and often have difficulties in adapting to new domains. In this paper, we propose a Transferable Dialogue State Generator (TRADE) that generates dialogue states from utterances using a copy mechanism, facilitating knowledge transfer when predicting (domain, slot, value) triplets not encountered during training. Our model is composed of an utterance encoder, a slot gate, and a state generator, which are shared across domains. Empirical results demonstrate that TRADE achieves state-of-the-art joint goal accuracy of 48.62% for the five domains of MultiWOZ, a human-human dialogue dataset. In addition, we show its transferring ability by simulating zero-shot and few-shot dialogue state tracking for unseen domains. TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, and is able to adapt to few-shot cases without forgetting already trained domains.
Our Summary
The research team from the Hong Kong University of Science and Technology and Salesforce Research addresses the problem of over-dependence on domain ontology and lack of knowledge sharing across domains. In a practical scenario, many slots share all or some of their values among different domains (e.g., the area slot can exist in many domains like restaurant, hotel, or taxi), and thus transferring knowledge across multiple domains is imperative for dialogue state tracking (DST) models. The researchers introduce a TRAnsferable Dialogue statE generator (TRADE) that leverages its context-enhanced slot gate and copy mechanism to track slot values mentioned anywhere in a dialogue history. TRADE shares its parameters across domains and doesn’t require a predefined ontology, which enables tracking of previously unseen slot values. The experiments demonstrate the effectiveness of this approach with TRADE achieving state-of-the-art joint goal accuracy of 48.62% on a challenging MultiWOZ dataset.
What’s the core idea of this paper?
- To overcome over-dependence on domain ontology and lack of knowledge sharing across domains, the researchers suggest:
- generating slot values directly instead of predicting the probability of every predefined ontology term;
- sharing all the model parameters across domains.
- The TRADE model consists of three components (a structural skeleton follows this list):
- an utterance encoder to encode dialogue utterances into a sequence of fixed-length vectors;
- a slot gate to predict whether a certain (domain, slot) pair is triggered by the dialogue;
- a state generator to decode multiple output tokens for all (domain, slot) pairs independently to predict their corresponding values.
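The skeleton below sketches how these three components could be laid out in PyTorch. It is purely structural: the class names, dimensions, and the omission of the copy mechanism and context-enhanced gating are our simplifications, not the authors’ implementation (which is linked below).

```python
import torch.nn as nn

class UtteranceEncoder(nn.Module):
    """Encodes the concatenated dialogue history into a sequence of hidden vectors."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True, bidirectional=True)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        outputs, hidden = self.gru(self.embed(tokens))
        return outputs, hidden                        # outputs: (batch, seq_len, 2 * hid_dim)

class SlotGate(nn.Module):
    """Predicts whether a (domain, slot) pair is triggered (e.g., ptr / dontcare / none)."""
    def __init__(self, hid_dim=128, num_classes=3):
        super().__init__()
        self.classifier = nn.Linear(2 * hid_dim, num_classes)

    def forward(self, context_vector):                # (batch, 2 * hid_dim)
        return self.classifier(context_vector)

class StateGenerator(nn.Module):
    """GRU decoder that emits slot-value tokens (copy mechanism omitted in this sketch)."""
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, prev_tokens, hidden):
        output, hidden = self.gru(self.embed(prev_tokens), hidden)
        return self.out(output), hidden
```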
What’s the key achievement?
- On a challenging MultiWOZ dataset of human-human dialogues, TRADE achieves joint goal accuracy of 48.62%, setting a new state of the art.
- Moreover, TRADE achieves 60.58% joint goal accuracy in one of the zero-shot domains, demonstrating its ability to transfer knowledge to previously unseen domains.
- The experiments also demonstrate the model’s ability to adapt to new few-shot domains without forgetting already trained domains.
What does the AI community think?
- The paper received an Outstanding Paper award at the main ACL 2019 conference and the Best Paper Award at NLP for Conversational AI Workshop at the same conference.
What are future research areas?
- Transferring knowledge from other resources to further improve zero-shot performance.
- Collecting a dataset with a large number of domains to facilitate the study of techniques within multi-domain dialogue state tracking.
What are possible business applications?
- The current research can significantly improve the performance of task-oriented dialogue systems in multi-domain settings.
Where can you get implementation code?
- The PyTorch implementation of this study is available on GitHub.
8. A Theory of Fermat Paths for Non-Line-of-Sight Shape Reconstruction, by Shumian Xin, Sotiris Nousias, Kiriakos N. Kutulakos, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan, Ioannis Gkioulekas
Original Abstract
We present a novel theory of Fermat paths of light between a known visible scene and an unknown object not in the line of sight of a transient camera. These light paths either obey specular reflection or are reflected by the object’s boundary, and hence encode the shape of the hidden object. We prove that Fermat paths correspond to discontinuities in the transient measurements. We then derive a novel constraint that relates the spatial derivatives of the path lengths at these discontinuities to the surface normal. Based on this theory, we present an algorithm, called Fermat Flow, to estimate the shape of the non-line-of-sight object. Our method allows, for the first time, accurate shape recovery of complex objects, ranging from diffuse to specular, that are hidden around the corner as well as hidden behind a diffuser. Finally, our approach is agnostic to the particular technology used for transient imaging. As such, we demonstrate mm-scale shape recovery from pico-second scale transients using a SPAD and ultrafast laser, as well as micron-scale reconstruction from femto-second scale transients using interferometry. We believe our work is a significant advance over the state-of-the-art in non-line-of-sight imaging.
Our Summary
In many security and safety applications, the scene hidden from the camera’s view is of great interest. Currently, it is possible to estimate the shape of hidden, non-line-of-sight (NLOS) objects by measuring the intensity of photons scattered from them. However, this method relies on single-photon avalanche photodetectors that are prone to misestimating photon intensities and requires an assumption that reflection from NLOS objects is Lambertian. The researchers propose a new theory of NLOS photons that follow specific geometric paths, called Fermat paths, between the LOS and NLOS scene. The resulting method can reconstruct the surface of hidden objects that are around a corner or behind a diffuser without depending on the reflectivity of the object.
Non-line-of-sight imaging
What’s the core idea of this paper?
- Existing methods for profiling hidden objects depend on measuring the intensities of reflected photons, which requires assuming Lambertian reflection and infallible photodetectors.
- The research team suggests reconstructing non-line-of-sight shapes by relying on geometric constraints imposed by Fermat’s principle:
- Fermat paths correspond to discontinuities in the transient measurements.
- Specifically, the discontinuities in a transient measurement can be identified with the lengths of the Fermat paths that contribute to that transient.
- Given a collection of Fermat pathlengths, the procedure produces an oriented point cloud for the NLOS surface.
What’s the key achievement?
- The Fermat Flow algorithm derived from the introduced theory can successfully reconstruct the surface of the hidden objects independent of the specific transient imaging technology used.
- The Fermat paths theory applies to the scenarios of:
- reflective NLOS (looking around a corner);
- transmissive NLOS (seeing through a diffuser).
What does the AI community think?
- The paper received the Best Paper Award at CVPR 2019, the leading conference on computer vision and pattern recognition.
What are future research areas?
- Exploring the links between the geometric approach described here and newly introduced backprojection approaches for profiling hidden objects.
- Combining geometric and backprojection approaches for other related applications, including acoustic and ultrasound imaging, lensless imaging, and seismic imaging.
What are possible business applications?
- Enhanced security from cameras or sensors that can “see” beyond their field of view.
- Potential use for autonomous vehicles to “see” around corners.
9. Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, by Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
Original Abstract
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents’ actions. Causal influence is assessed using counterfactual reasoning. At each timestep, an agent simulates alternate actions that it could have taken, and computes their effect on the behavior of other agents. Actions that lead to bigger changes in other agents’ behavior are considered influential and are rewarded. We show that this is equivalent to rewarding agents for having high mutual information between their actions. Empirical results demonstrate that influence leads to enhanced coordination and communication in challenging social dilemma environments, dramatically increasing the learning curves of the deep RL agents, and leading to more meaningful learned communication protocols. The influence rewards for all agents can be computed in a decentralized way by enabling agents to learn a model of other agents using deep neural networks. In contrast, key previous works on emergent communication in the MARL setting were unable to learn diverse policies in a decentralized manner and had to resort to centralized training. Consequently, the influence reward opens up a window of new opportunities for research in this area.
Our Summary
In this paper, the authors consider the problem of deriving intrinsic social motivation from other agents in multi-agent reinforcement learning (MARL). The approach is to reward agents for having a causal influence on other agents’ actions to achieve both coordination and communication in MARL. Specifically, it is demonstrated that rewarding actions that lead to a relatively higher change in another agent’s behavior is related to maximizing the mutual information flow between agents’ actions. As a result, such an inductive bias motivates agents to learn coordinated behavior. The experiments confirm the effectiveness of the proposed social influence reward in enhancing coordination and communication between the agents.
A moment of high influence when the purple influencer signals the presence of an apple (green tiles) outside the yellow influencee’s field-of-view (yellow outlined box)
What’s the core idea of this paper?
- The paper addresses a long-standing problem of coordination and communication between multiple agents, including such limitations as centralized training and the sharing of reward functions or policy parameters.
- The authors suggest giving each agent an additional reward for having a causal influence on another agent’s actions (a toy computation of this reward follows the list).
- As the next step, they enhance the social influence reward with the inclusion of explicit communication protocols.
- Finally, they equip each agent with an internal neural network that is trained to predict the actions of other agents. That enables independent training of agents.
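The toy NumPy computation below shows the influence reward for a single timestep: the KL divergence between the other agent’s policy conditioned on the action actually taken and its counterfactual marginal over the influencer’s alternative actions. The hand-made policy tables and variable names are our assumptions; in the paper, these conditional policies come from each agent’s learned model of the others.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions_A, n_actions_B = 3, 4

# p_B_given_a[a] = agent B's action distribution if agent A takes action a (toy values)
p_B_given_a = rng.dirichlet(np.ones(n_actions_B), size=n_actions_A)
p_A = rng.dirichlet(np.ones(n_actions_A))            # A's own policy in this state

a_taken = 1                                          # the action A actually took
# counterfactual marginal: B's response averaged over A's alternative actions
p_B_marginal = p_A @ p_B_given_a

# influence reward = KL( p(b | a_taken)  ||  p(b) marginalized over A's actions )
p_cond = p_B_given_a[a_taken]
influence = np.sum(p_cond * np.log(p_cond / p_B_marginal))
print("influence reward for action", a_taken, "=", influence)
```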
What’s the key achievement?
- Demonstrating that social influence reward eventually leads to significantly higher collective reward and allows agents to learn meaningful communication protocols when this is otherwise impossible.
- Introducing a framework for training the agents independently while still ensuring coordination and communication between them.
What does the AI community think?
- The paper received the Honorable Mention Award at ICML 2019, one of the leading conferences in machine learning.
What are future research areas?
- Using the proposed approach to develop a form of ‘empathy’ in agents so that they can simulate how their actions affect another agent’s value function.
- Applying the influence reward to encourage different modules of the network to integrate information from other networks, for example, to prevent collapse in hierarchical RL.
What are possible business applications?
- Driving coordinated behavior in robots attempting to cooperate in manipulation and control tasks.
10. Learning Existing Social Conventions via Observationally Augmented Self-Play, by Adam Lerer and Alexander Peysakhovich
Original Abstract
In order for artificial agents to coordinate effectively with people, they must act consistently with existing conventions (e.g. how to navigate in traffic, which language to speak, or how to coordinate with teammates). A group’s conventions can be viewed as a choice of equilibrium in a coordination game. We consider the problem of an agent learning a policy for a coordination game in a simulated environment and then using this policy when it enters an existing group. When there are multiple possible conventions we show that learning a policy via multi-agent reinforcement learning (MARL) is likely to find policies which achieve high payoffs at training time but fail to coordinate with the real group into which the agent enters. We assume access to a small number of samples of behavior from the true convention and show that we can augment the MARL objective to help it find policies consistent with the real group’s convention. In three environments from the literature – traffic, communication, and team coordination – we observe that augmenting MARL with a small amount of imitation learning greatly increases the probability that the strategy found by MARL fits well with the existing social convention. We show that this works even in an environment where standard training methods very rarely find the true convention of the agent’s partners.
Our Summary
The Facebook AI research team addresses the problem of AI agents acting in line with existing conventions. Learning a policy via multi-agent reinforcement learning (MARL) results in agents that achieve high payoffs at training time but fail to coordinate with the real group. The researchers suggest solving this problem by augmenting the MARL objective with a small sample of observed behavior from the group. The experiments in three test settings (traffic, communication, and team coordination) demonstrate that this approach greatly increased the probability of the agent finding a strategy that fits with the existing group’s conventions.
What’s the core idea of this paper?
- Without any input from an existing group, a new agent will learn policies that work in isolation but do not necessarily fit with the group’s conventions.
- To solve this problem, the authors propose a novel observationally augmented self-play (OSP) method, where the agent is trained with a joint MARL and behavioral cloning objective (a minimal sketch of the combined objective follows this list). In particular, the researchers suggest providing the agent with a small number of observations of existing social behavior (i.e., samples of (state, action) pairs from the test environment).
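Here is a minimal PyTorch sketch of combining an RL loss with a behavioral-cloning term computed on a small batch of observed (state, action) pairs. The network shape, the `bc_weight` coefficient, and the placeholder RL loss are illustrative assumptions, not the paper’s actual setup.

```python
import torch
import torch.nn.functional as F

# toy policy network over 4 discrete actions from 8-dimensional states
policy = torch.nn.Sequential(torch.nn.Linear(8, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))

def osp_loss(rl_loss, observed_states, observed_actions, bc_weight=0.1):
    """Joint objective: the usual MARL loss plus imitation of the group's observed behavior."""
    logits = policy(observed_states)
    bc_loss = F.cross_entropy(logits, observed_actions)   # behavioral cloning on (state, action) samples
    return rl_loss + bc_weight * bc_loss

# example usage with dummy tensors
rl_loss = torch.tensor(1.0, requires_grad=True)           # placeholder for the MARL objective
states = torch.randn(16, 8)
actions = torch.randint(0, 4, (16,))
loss = osp_loss(rl_loss, states, actions)
loss.backward()
```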
What’s the key achievement?
- The experiments on several multi-agent situations with multiple conventions (a traffic game, a particle environment combining navigation and communication, and a Stag Hunt game) show that OSP can learn relevant conventions with a small amount of observational data.
- Moreover, with this method, the agent can learn conventions that are very unlikely to be learned using MARL alone.
What does the AI community think?
- The paper was awarded the AAAI-AIES 2019 Best Paper Award.
What are future research areas?
- Exploring alternative algorithms for constructing agents that can learn social conventions.
- Investigating the possibility of fine-tuning the OSP training strategies during test time.
- Considering problems where agents have incentives that are partly misaligned, and thus need to coordinate on a convention in addition to solving the social dilemma.
- Extending the work into more complex environments, including interaction with humans.
What are possible business applications?
- This work is a stepping-stone towards developing AI agents that can teach themselves to cooperate with humans. This has positive implications for chatbots, customer support agents and many other AI applications.
If you like these research summaries, you might be also interested in the following articles:
- What Are Major NLP Achievements & Papers From 2019?
- 10 Important Research Papers In Conversational AI From 2019
- 10 Cutting-Edge Research Papers In Computer Vision From 2019
- Top 12 AI Ethics Research Papers Introduced In 2019
- Breakthrough Research In Reinforcement Learning From 2019
Enjoy this article? Sign up for more AI research updates.
We’ll let you know when we release more summary articles like this one.