https://github.com/haje01/gym-tictactoe/blob/master/examples/td_agent.py