Deep Reinforcement Learning involves using a neural network as a universal function approximator to learn a value function that maps state-action pairs to their expected future reward given a particular reward function. This can be done many different ways. For example, a Monte Carlo based algorithm will observe total rewards following state-action pairs from a complete episode to make build training data for the neural network. Alternatively, a Temporal Difference approach would use incremental rewards from single time-steps and bootstrap off of predicted future rewards from the latest version of the value function model. However, no matter what approach is taken, it is important that the neural network is being efficiently fitted to the data in order to optimize the learning algorithm. There are many factors that determine a neural networks ability to fit to training data. In this post we will examine how scaling our outputs can affect our rate of convergence.
Read the rest of the article at Mindboard’s Medium channel.