Spencer Lyon

# Temporal-Difference methods


This is part 3 of the reinforcement learning for economists notes. For part 1, see Reinforcement Learning Intro. For a list of all entries in the series, go here.

## Temporal Difference methods

We continue our study of applying generalized policy iteration (GPI) to the RL problem by looking now at temporal difference (TD) methods.

### One-step TD (TD(0))

Let’s begin our exploration of TD methods by considering the problem of evaluating or predicting the state-value function $V(s)$. The simplest TD algorithm will update $V(s)$ according to the following rule:

$$V(s) \leftarrow V(s) + \alpha \left[G - V(s) \right],$$

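where $G$ is the target for the return from state $s$ and $\alpha \in (0, 1]$ is a step size. In TD(0) this target is the one-step bootstrapped estimate $G = r + \gamma V(s')$, where $r$ is the reward received on leaving $s$, $s'$ is the next state, and $\gamma$ is the discount factor. The term in the brackets is the difference between the target ($G$) and the current estimate of the return ($V(s)$) and is called the temporal difference.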
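
The update rule above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes the target $G$ is the one-step bootstrapped estimate $r + \gamma V(s')$ commonly used for TD(0), and that the value of a terminal state is zero. The names `td_update` and `td0_target`, the discount factor `gamma`, and the two-state toy chain are illustrative assumptions that do not come from the notes.

```python
def td_update(V, s, G, alpha):
    """Apply V(s) <- V(s) + alpha * (G - V(s)) in place.

    Returns the bracketed term G - V(s), the temporal difference.
    """
    delta = G - V[s]
    V[s] += alpha * delta
    return delta


def td0_target(V, r, s_next, gamma, terminal=False):
    """One-step target r + gamma * V(s').

    Assumption: the value of a terminal state is taken to be 0.
    """
    return r + (0.0 if terminal else gamma * V[s_next])


# Hypothetical toy problem: a deterministic chain 0 -> 1 -> terminal
# that pays a reward of 1 on every transition.
V = {0: 0.0, 1: 0.0}
alpha, gamma = 0.5, 1.0

for _ in range(200):
    # Transition 0 -> 1 with reward 1.
    td_update(V, 0, td0_target(V, 1.0, 1, gamma), alpha)
    # Transition 1 -> terminal with reward 1.
    td_update(V, 1, td0_target(V, 1.0, None, gamma, terminal=True), alpha)

print(V)
```

With no discounting, the true values in this toy chain are $V(1) = 1$ and $V(0) = 2$, and the estimates move toward them as the updates repeat. Keeping the update separate from the construction of the target means the same `td_update` can be reused with a different choice of $G$.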