The ROC curve and the PR curve are two important tools for evaluating the performance of machine learning algorithms. The two concepts are easily confused, but they suit different scenarios. This article explains what each curve means and when to use it.


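Python's format function formats output so that it meets specific output requirements. This post records some of its common usages, drawing mainly on the blog post "飘逸的 python - 增强的格式化字符串 format 函数".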
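As a rough illustration of the kind of usage meant here, the sketch below shows a few common `str.format` patterns. The variable names and sample values are made up for this example and are not taken from the referenced post.

```python
# Positional and keyword fields (names and values are illustrative only)
greeting = "{0} scored {1}".format("Alice", 93)
named = "{name} scored {score}".format(name="Bob", score=88)

# Number formatting: fixed precision, thousands separator, percentage
pi_text = "{:.2f}".format(3.14159)
big_text = "{:,}".format(1234567)
pct_text = "{:.1%}".format(0.256)

# Alignment and padding within a field of width 8
left = "{:<8}|".format("ab")
right = "{:>8}|".format("ab")
center = "{:^8}|".format("ab")
zero_padded = "{:05d}".format(42)

# Base conversion of integers
binary = "{:b}".format(10)
hexa = "{:x}".format(255)

for text in (greeting, named, pi_text, big_text, pct_text,
             left, right, center, zero_padded, binary, hexa):
    print(text)
```

The part after the colon in each replacement field is the format specification: fill, alignment, width, grouping, precision, and type, in that order.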



In the previous article, Reinforcement Learning Notes (2) - From Q-Learning to DQN, we saw that the Q-Learning family of methods is value-based: these methods compute the value of each state-action pair and then execute the action with the highest value. This is an indirect approach. Is there a more direct one? Yes: update the policy directly. This article introduces Policy Gradient, which is such a policy-based method. It also covers the Actor-Critic method, which combines the policy-based and value-based approaches, as well as DDPG and A3C, which build on Actor-Critic.
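To make the contrast concrete, here is a minimal sketch of the two ways of choosing an action. It assumes a small discrete action set and a softmax policy over per-action preferences. The function names and the softmax parameterization are assumptions for illustration, not the article's implementation.

```python
import math
import random

def greedy_action(q_values):
    """Value-based choice: return the index of the largest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax(preferences):
    """Turn arbitrary real-valued preferences into a probability distribution."""
    m = max(preferences)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in preferences]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(preferences, rng=random):
    """Policy-based choice: sample an action from the policy's distribution."""
    probs = softmax(preferences)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical values for three actions
print(greedy_action([0.1, 0.7, 0.2]))
print(softmax([1.0, 2.0, 3.0]))
print(sample_action([1.0, 2.0, 3.0], random.Random(0)))
```

In the value-based case the policy is implied by the value estimates. In the policy-based case the preferences themselves are the parameters that learning adjusts.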


In the previous article, Reinforcement Learning Notes (1) - Overview, I introduced how to model reinforcement learning problems as an MDP. In reinforcement learning, however, the transition probabilities of the MDP are usually unavailable, so the value iteration and policy iteration algorithms used to solve an MDP cannot be applied directly. Approximate algorithms were developed to address this. This article introduces the Q-Learning family of methods, which grew out of value iteration and includes Q-Learning, Sarsa, and DQN.
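As a minimal sketch of how such a method avoids needing transition probabilities, the tabular Q-Learning update below learns from a single sampled transition (state, action, reward, next state). The function name, the state and action labels, and the values of `alpha` and `gamma` are assumptions chosen for illustration.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-Learning update from a sampled transition.

    q maps (state, action) pairs to estimated values; alpha is the learning
    rate and gamma the discount factor (both hypothetical defaults).
    """
    # Bootstrap from the best action available in the next state
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# One illustrative transition: taking "right" in "s0" gives reward 1.0 and leads to "s1"
q = defaultdict(float)
new_value = q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
print(new_value)
```

The update uses only the observed reward and next state, so no model of the environment's transition probabilities is required.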

