1) Reinforcement Learning and its real-world applications
When an agent is deployed in the real world, it may be keen to explore its environment, but it must respect certain constraints imposed by that environment. A team from Berkeley AI Research (BAIR) presented their work, Constrained Policy Optimisation (CPO), which incorporates safety-motivated constraints into policy search. It has many applications where safety must be ensured during exploration. BAIR has also published an article explaining their work on CPO.
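CPO builds on the standard constrained MDP formulation, in which the agent maximises expected return subject to bounds on expected auxiliary costs (shown here in generic form; CPO additionally restricts each policy update to a trust region):

```latex
\max_{\pi} \; J(\pi)
\quad \text{s.t.} \quad
J_{C_i}(\pi) \le d_i, \qquad i = 1, \dots, m
```

Here J(π) is the expected discounted return and each J_{C_i}(π) is the expected discounted cost under an auxiliary cost function C_i with limit d_i.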
If an agent/robot is purchased by a non-technical owner, she should be able to train it by providing feedback. MacGlashan et al. proposed Convergent Actor-Critic by Humans (COACH), an algorithm for learning from policy-dependent feedback, so that agents/robots can be trained with feedback from non-technical users. They demonstrate that COACH can learn multiple behaviours on a physical robot, even from noisy image observations.
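The central idea is that human feedback behaves like the advantage term in an actor-critic update. Below is a minimal sketch of such an update for a linear-softmax policy; it omits the eligibility traces and other details of the paper, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax_policy(theta, s):
    # Linear-softmax policy over discrete actions; s is a feature vector.
    logits = s @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def coach_update(theta, s, action, feedback, lr=0.1):
    # One COACH-style step: the human's scalar feedback stands in for
    # the advantage in an actor-critic policy-gradient update.
    p = softmax_policy(theta, s)
    grad_log_pi = np.outer(s, np.eye(len(p))[action] - p)
    return theta + lr * feedback * grad_log_pi

theta = np.zeros((4, 3))                 # 4 state features, 3 actions
s = rng.standard_normal(4)
theta = coach_update(theta, s, action=2, feedback=+1.0)  # "that was good"
```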
In order to perform human activities like cooking or household chores, an RL agent needs to execute long sequences of instructions and generalise to new, unseen subtasks. Sometimes there are unexpected events, such as a low battery, that require a deviation before the remaining subtasks can be completed. To achieve these goals, Oh et al. proposed a generalised approach that takes a sequence of tasks expressed in natural language and executes the subtasks, mostly sequentially. They tackle the problem in two steps: 1) learning the skills to perform subtasks, together with an analogy-based generalisation framework, and 2) a meta-controller that determines the order of execution of subtasks (a toy rendering of this control loop is sketched below). Unlike existing work, their architecture generalises well and also handles unexpected subtasks.
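To make the two-level scheme concrete, here is a toy, heavily simplified control loop: a meta-controller picks the next subtask and can inject a deviation (charging) when the battery runs low, while a low-level routine executes each subtask. The tiny "environment" and all names are illustrative, not the paper's interfaces:

```python
def execute(subtask, state):
    state[subtask] = True                # pretend the learned skill succeeds
    state["battery"] -= 30
    return state

def meta_controller(state, remaining):
    if state["battery"] < 30:            # unexpected event: deviate
        return "charge"
    return remaining[0]                  # otherwise follow the instructions

state = {"battery": 60}
remaining = ["pick cup", "wash cup", "place cup"]
while remaining:
    subtask = meta_controller(state, remaining)
    if subtask == "charge":
        state["battery"] = 100           # handle the deviation, then resume
        continue
    state = execute(subtask, state)
    remaining.pop(0)
print(state)
```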
2) Deep Learning Optimisation
In order to regularise deep neural networks, methods such as batch normalisation and whitening neural networks (WNN) are used. When applying whitening, the computational overhead of building the covariance matrix and solving the SVD becomes a bottleneck. The work by Ping Luo attempts to overcome the limitations of WNN with a new method, Generalised Whitening Neural Networks (GWNN), which reduces the computational overhead through compact representations.
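For context, here is a plain ZCA-style whitening step of the kind WNN applies; the covariance construction and SVD below are exactly the overheads GWNN aims to cut (a minimal numpy sketch, not the paper's method):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    # Whiten activations X of shape (n_samples, n_features). Building the
    # covariance costs O(n d^2) and the SVD costs O(d^3) -- the bottleneck
    # discussed above.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)                        # d x d covariance
    U, S, _ = np.linalg.svd(cov)                     # eigendecomposition
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T    # ZCA whitening matrix
    return Xc @ W                                    # decorrelated output
```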
The hardware limitations of implementing higher-dimensional tensor kernels for ConvNets are studied by Budden et al. They propose a Winograd-style fast computation for higher dimensions, optimised for CPUs (the 1-D base case is sketched below). They benchmarked their algorithm against popular frameworks such as Caffe and TensorFlow, built with AVX and Intel MKL optimised libraries, and drew the interesting conclusion that current CPU limitations are largely due to software rather than hardware.
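To illustrate the flavour of Winograd-style algorithms, the classic F(2,3) kernel computes two outputs of a 1-D convolution with 4 multiplications instead of the naive 6; the paper generalises this idea to higher-dimensional tensors:

```python
import numpy as np

def winograd_f23(d, g):
    # Winograd minimal filtering F(2,3): two outputs of a 1-D convolution
    # of a length-4 input tile d with a length-3 kernel g, using 4
    # multiplications instead of 6.
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m4 = (d[1] - d[3]) * g[2]
    return np.array([m1 + m2 + m3, m2 - m3 - m4])

d = np.array([1.0, 2.0, 3.0, 4.0])
g = np.array([0.5, 1.0, -1.0])
assert np.allclose(winograd_f23(d, g), np.convolve(d, g[::-1], "valid"))
```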
Extending the class of fast convolution algorithms such as FFT and Winograd, Cho and Brand propose Memory-efficient Convolution (MEC), which lowers the memory requirement and speeds up the convolution process. MEC takes rolling subsets of the input's columns and expands them into rows to form a smaller lowered matrix; convolution then proceeds as a sequence of multiplications between slices of this matrix and the flattened kernel.
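A hedged sketch of this lowering for a single-channel, stride-1 case follows; it is my reading of the scheme, not the paper's implementation. Each row of the lowered matrix L is one full-height, kernel-wide slab of the input, so L holds out_w * H * k values versus im2col's out_h * out_w * k * k:

```python
import numpy as np

def mec_conv2d(I, K):
    # 2-D convolution (correlation) via MEC-style lowering.
    H, W = I.shape
    k = K.shape[0]                        # square k x k kernel
    out_h, out_w = H - k + 1, W - k + 1
    # Lowering: rolling subsets of columns, expanded into rows.
    L = np.stack([I[:, j:j + k].reshape(-1) for j in range(out_w)])
    w = K.reshape(-1)                     # flattened kernel, length k*k
    # One small matrix product per output row, reusing slices of L.
    return np.stack([L[:, h * k:(h + k) * k] @ w for h in range(out_h)])

I, K = np.random.randn(6, 6), np.random.randn(3, 3)
ref = np.array([[(I[i:i+3, j:j+3] * K).sum() for j in range(4)]
                for i in range(4)])      # naive reference
assert np.allclose(mec_conv2d(I, K), ref)
```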
3) Meta Learning
Model-Agnostic Meta-Learning (MAML), proposed by Finn et al., creates a meta-learned model whose parameters are trained over random samples from a distribution of tasks. The model can then be adapted quickly to new tasks using only a few training samples and iterations, commonly referred to as few-shot learning (a first-order sketch is given below). The authors also demonstrate MAML on classification, regression and reinforcement learning tasks.
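A minimal runnable sketch of the idea on toy 1-D regression tasks (y = a·x) is shown below. Full MAML differentiates through the inner update, which introduces second-order terms; the first-order variant here drops them for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, alpha, beta = 0.0, 0.1, 0.01      # meta-parameter, inner/outer lrs

def grad(theta, x, y):
    # Gradient of the task loss mean((theta * x - y)^2) w.r.t. theta.
    return 2 * np.mean((theta * x - y) * x)

for step in range(1000):
    meta_grad = 0.0
    for _ in range(5):                   # sample a small batch of tasks
        a = rng.uniform(-2, 2)           # each task: fit y = a * x
        x = rng.uniform(-1, 1, 10); y = a * x
        adapted = theta - alpha * grad(theta, x, y)      # inner step
        x2 = rng.uniform(-1, 1, 10); y2 = a * x2         # held-out data
        meta_grad += grad(adapted, x2, y2)   # first-order approximation
    theta -= beta * meta_grad / 5        # outer (meta) update
```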
An interesting approach to learning both the network structure and the weights is proposed by Cortes et al. The method, called AdaNet, learns the network architecture by incrementally adding depth to the network. The new subnetwork's k-th layer is connected to the existing network's k-th and (k-1)-th layers. Candidate architectures are selected by comparing their performance under an empirical loss with a regularisation term.
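The selection criterion can be pictured as follows: each candidate is scored by its empirical loss plus a complexity penalty, and the best-scoring candidate is kept. This toy uses polynomial degree as a loose stand-in for subnetwork complexity; it is not AdaNet's actual objective:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 50)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(50)

def score(degree, lam=0.02):
    # Empirical loss plus a complexity penalty (here: polynomial degree).
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    return mse + lam * degree

best = min(range(1, 10), key=score)      # keep the best-scoring candidate
print("selected complexity:", best)
```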
Wichrowska et al. introduced a learned gradient-descent optimiser that can generalise to new tasks with reduced memory and computational requirements. They define the optimiser with a hierarchical RNN architecture, and it outperformed RMSprop and Adam on the MNIST dataset (the basic interface is sketched below).
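The interface of such an optimiser can be pictured as an RNN that consumes the current gradient and its own hidden state and emits a parameter update. In this sketch the RNN weights are random stand-ins; in the paper they are meta-trained (with a hierarchical RNN) so that the emitted updates actually minimise the loss:

```python
import numpy as np

rng = np.random.default_rng(0)
Wg = rng.normal(size=(8, 1))             # gradient -> hidden
Wh = rng.normal(size=(8, 8))             # hidden -> hidden
Wo = rng.normal(size=(1, 8))             # hidden -> update

def rnn_optimizer_step(g, h):
    # The optimiser's RNN state summarises the trajectory of gradients.
    h = np.tanh(Wg @ g + Wh @ h)
    return 0.01 * (Wo @ h), h            # proposed parameter update

theta, h = np.array([[5.0]]), np.zeros((8, 1))
for _ in range(100):
    g = 2 * theta                        # gradient of f(theta) = theta^2
    update, h = rnn_optimizer_step(g, h)
    theta = theta - update
```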
4) Machine Learning Optimisation
A team from Microsoft Research India has come up with powerful tree-based models that can run machine learning on resource-constrained devices, such as IoT hardware, with as little as 2 KB of RAM.
For classification problems, Gradient Boosted Decision Trees (GBDT) often perform relatively well. However, when the output space of multilabel classification becomes high-dimensional and sparse, GBDT algorithms suffer from memory issues and long running times. To improve prediction time and reduce model size, Si et al. proposed the GBDT-Sparse algorithm for handling high-dimensional sparse data (its key device is sketched below).
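The essential device is keeping each leaf's prediction over the huge label space sparse, which bounds both model size and prediction cost. The top-k projection below is illustrative only; the paper optimises the leaf values directly under a sparsity (L0) constraint rather than projecting after the fact:

```python
import numpy as np

def sparsify_leaf(dense_scores, k=3):
    # Keep only the k largest-magnitude label scores in a leaf.
    out = np.zeros_like(dense_scores)
    top = np.argpartition(-np.abs(dense_scores), k)[:k]
    out[top] = dense_scores[top]
    return out

print(sparsify_leaf(np.array([0.1, -2.0, 0.05, 1.5, -0.3, 0.7])))
```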
5) Generative Models Applications
A team from Google Brain presented a paper on audio synthesis using WaveNet autoencoders. Their main contribution is a WaveNet autoencoder architecture that includes a temporal encoder built on dilated convolutions, producing a sequence of hidden codes with separate dimensions for time and channel. They also introduced the NSynth dataset, which contains approximately 300k annotated musical notes from approximately 1k instruments.
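The building block of such temporal encoders is the dilated convolution: each layer doubles its dilation, so the receptive field grows exponentially with depth while the sequence length is preserved. A minimal 1-D sketch with random stand-in weights (not a trained encoder):

```python
import numpy as np

rng = np.random.default_rng(0)

def dilated_conv1d(x, w, dilation):
    # Causal 1-D convolution with kernel size 2 and the given dilation:
    # out[t] = w[0] * x[t - dilation] + w[1] * x[t]  (zero-padded).
    pad = np.concatenate([np.zeros(dilation), x])
    return w[0] * pad[:-dilation] + w[1] * pad[dilation:]

x = rng.standard_normal(64)
for dilation in (1, 2, 4, 8, 16):        # dilation doubles at each layer
    x = np.tanh(dilated_conv1d(x, rng.standard_normal(2), dilation))
# each output now summarises a receptive field of 32 input samples
```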
6) Natural Language Generation Architectures
In order to overcome the limitations of discriminative models for natural language text generation, Wen et al. proposed the Latent Intention Dialogue Model, which learns intentions through a latent variable and then composes appropriate machine responses. The key idea is to treat the latent intention distribution as an intrinsic policy reflecting human decision-making, and to learn it with policy-gradient reinforcement learning (a toy illustration follows).
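A toy rendering of the core mechanism: a categorical "intention" distribution acts as a policy, a sampled intention conditions the response, and a scalar reward reinforces intentions that led to good responses via REINFORCE. The real model conditions on dialogue state and generates text; the per-intention rewards here are a synthetic stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(4)                     # 4 candidate latent intentions
true_reward = np.array([0.1, 0.9, 0.3, 0.2])   # synthetic stand-in

for step in range(2000):
    p = np.exp(logits - logits.max()); p /= p.sum()
    z = rng.choice(4, p=p)               # sample an intention (the "action")
    r = true_reward[z] + 0.05 * rng.standard_normal()
    logits += 0.1 * r * (np.eye(4)[z] - p)   # REINFORCE / policy gradient
print(p.round(2))                        # mass concentrates on intention 1
```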
An alternative approach to natural language generation using latent semantic structure was proposed by Hu et al. They use VAEs to generate text samples conditioned on latent attribute codes. The attribute codes are learned with an individual discriminator for each code, which measures the match between the generated samples and the desired attribute using a softmax approximation.
7) Efficient Online Learning
For online multi-class bandit algorithms, the earlier Banditron algorithm, while computationally efficient, achieves only O(T^2/3) expected regret. This is suboptimal, as the Exp4 algorithm achieves O(T^1/2) regret for the 0–1 loss, albeit inefficiently. Beygelzimer et al. proposed an efficient online bandit multi-class learner with O(T^1/2) regret.
The evaluation of contextual multi-armed bandits is a hard problem: online evaluation of different policies is too costly, while off-policy evaluation methods suffer from variance in their estimates. While there are methods such as Inverse Propensity Scoring (IPS) that give good estimates of the MSE, they do not consider the context information when choosing actions. Wang et al. proposed the SWITCH algorithm, which effectively combines a reward model with IPS, yielding variance reduction compared to prior work.
8) Graph based algorithms
Many existing methods for building knowledge graphs from data treat the graph as a static snapshot. In work published by Trivedi et al., they demonstrate that knowledge graphs evolve temporally, and they develop a multidimensional point process to model the evolving knowledge graph (a rough sketch of the intensity idea follows).
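As a loose illustration of modelling timestamped facts with a point process: a bilinear compatibility score for a (subject, relation, object) triple can drive an event intensity, so that more compatible triples are more likely to occur sooner. This is a heavily simplified sketch in the spirit of the paper; the actual parameterisation (including the temporally evolving entity embeddings) differs:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
entity = {"alice": rng.standard_normal(d), "bob": rng.standard_normal(d)}
relation = {"meets": rng.standard_normal((d, d))}

def intensity(s, r, o, t, t_prev):
    # Bilinear compatibility of the triple drives the point-process
    # intensity; larger scores make the event likely to occur sooner
    # after the previous event at t_prev.
    g = entity[s] @ relation[r] @ entity[o]
    return np.exp(g) * (t - t_prev)      # Rayleigh-style modulation

print(intensity("alice", "meets", "bob", t=2.0, t_prev=1.5))
```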