In the previous article, Reinforcement Learning Notes (1) - Overview, I introduced how to model reinforcement learning problems as Markov decision processes (MDPs). In reinforcement learning, however, the transition probabilities of the MDP are usually unknown, so value iteration and policy iteration, which rely on those probabilities to solve an MDP, cannot be applied directly. Approximate algorithms have been developed to work around this limitation. This article introduces the Q-Learning family of methods, which grew out of value iteration: Q-Learning, Sarsa, and DQN.
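To make the contrast concrete, here is a minimal Python sketch of the two kinds of update. The function names, the dictionary layout used for the transition model `P` and reward table `R`, and the default values of `alpha` and `gamma` are all assumptions chosen for illustration. The first function performs one value-iteration backup and needs the full transition model; the second performs one tabular Q-Learning update from a single sampled transition and needs no model.

```python
def value_iteration_backup(V, P, R, s, actions, gamma=0.9):
    """One value-iteration backup at state s.

    Requires the model: P[(s, a)] maps next states to probabilities,
    and R[(s, a, s2)] is the reward for that transition (assumed layout).
    """
    return max(
        sum(p * (R[(s, a, s2)] + gamma * V[s2]) for s2, p in P[(s, a)].items())
        for a in actions
    )


def q_learning_update(Q, s, a, r, s2, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-Learning update from a sampled transition (s, a, r, s2).

    No transition probabilities are used: the sampled next state s2
    stands in for the expectation over next states.
    """
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

In the first function the expectation over next states is computed exactly from `P`, while in the second it is replaced by one observed sample and a step size `alpha`. That substitution is what lets the method run when the transition probabilities are unknown.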