Wu Liangchao's Study Notes

Introduction to CTR Prediction Models - Non-Deep Learning

Posted on 2018-07-15 | Tags: computational advertising, machine learning

This article introduces some commonly used models for CTR prediction, focusing on non-deep learning models: LR, GBDT+LR, FM/FFM, and MLR. For each model, it briefly covers the principles, the source papers, and some open-source implementations.

Read more »

ROC Curve and PR Curve

Posted on 2018-06-16 | Tags: machine learning

The ROC curve and the PR curve are two important curves for evaluating the performance of machine learning algorithms. The two concepts are easy to confuse, but they suit different scenarios. This article explains what each curve means and when to use it.
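As a rough illustration of why the two curves suit different scenarios, the sketch below computes one ROC point (FPR, TPR) and one PR point (recall, precision) at a single threshold on a small imbalanced toy set. The labels, scores, threshold, and function names are made up for illustration and are not taken from the article.

```python
def confusion_counts(labels, scores, threshold):
    """Count TP, FP, FN, TN when predicting positive for score >= threshold."""
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 0 and predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_point(labels, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp, fp, fn, tn = confusion_counts(labels, scores, threshold)
    return fp / (fp + tn), tp / (tp + fn)


def pr_point(labels, scores, threshold):
    """Return (recall, precision); assumes at least one predicted positive."""
    tp, fp, fn, tn = confusion_counts(labels, scores, threshold)
    return tp / (tp + fn), tp / (tp + fp)


# Hypothetical toy data: 2 positives and 18 negatives.
labels = [1, 1] + [0] * 18
scores = [0.9, 0.4, 0.8, 0.7] + [0.1] * 16

fpr, tpr = roc_point(labels, scores, threshold=0.5)
recall, precision = pr_point(labels, scores, threshold=0.5)
print(f"ROC point: FPR={fpr:.3f}, TPR={tpr:.3f}")
print(f"PR point:  recall={recall:.3f}, precision={precision:.3f}")
```

On this toy set the false positive rate stays small because negatives dominate, while precision shows that only one in three predicted positives is correct. This is the kind of gap that makes the PR view more informative on heavily imbalanced data.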

Read more »

Common Syntax of the format Function

Posted on 2018-06-03 | Tags: python

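Python's format function (str.format) formats output so that it meets the required layout. This post records some of its common usages, drawing mainly on the blog post 飘逸的 python - 增强的格式化字符串 format 函数.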
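A few common str.format patterns are sketched below. The sample values and dictionary keys are illustrative assumptions, not examples taken from the post.

```python
def format_examples():
    """Return a few common str.format usages, keyed by a descriptive name."""
    return {
        # Positional and keyword substitution
        "positional": "{0} scored {1}".format("model_a", 0.91),
        "keyword": "{name} scored {auc}".format(name="model_a", auc=0.91),
        # Fixed number of decimal places
        "fixed_precision": "{:.2f}".format(3.14159),
        # Right-align within a field of width 8
        "right_aligned": "{:>8}".format("ctr"),
        # Zero-pad an integer to width 5
        "zero_padded": "{:05d}".format(42),
        # Percentage with one decimal place
        "percent": "{:.1%}".format(0.256),
        # Thousands separator
        "thousands": "{:,}".format(1234567),
        # Binary representation
        "binary": "{:b}".format(10),
    }


for name, text in format_examples().items():
    print(name, "->", repr(text))
```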

Read more »

Reinforcement Learning Notes (3) - From Policy Gradient to A3C

Posted on 2018-05-11 | Tags: machine learning, reinforcement learning

In the previous article, Reinforcement Learning Notes (2) - From Q-Learning to DQN, we saw that the Q-Learning family consists of value-based methods: they estimate the value of each state-action pair and then execute the action with the highest value. This is an indirect approach. A more direct alternative is to update the policy itself. This article introduces Policy Gradient, a policy-based method, along with the Actor-Critic method, which combines the policy-based and value-based approaches, and the DDPG and A3C methods built on Actor-Critic.
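The contrast between the two families can be sketched in a few lines: a value-based agent picks the action with the largest estimated value, while a policy-based agent keeps an explicit distribution over actions and samples from it. The softmax parameterization, the action names, and the numbers below are assumptions chosen for illustration. The sketch shows only action selection, not the gradient update.

```python
import math
import random


def greedy_action(q_values):
    """Value-based selection: take the action with the highest estimated value."""
    return max(q_values, key=q_values.get)


def softmax_policy(preferences):
    """Policy-based view: turn per-action preferences into probabilities."""
    m = max(preferences.values())  # subtract the max for numerical stability
    exps = {a: math.exp(p - m) for a, p in preferences.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def sample_action(policy, rng):
    """Draw one action according to the policy's probabilities."""
    actions = list(policy)
    weights = [policy[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical estimates for a single state with two actions.
q_values = {"left": 0.2, "right": 0.7}
policy = softmax_policy({"left": 0.0, "right": 1.0})

print(greedy_action(q_values))
print(policy)
print(sample_action(policy, random.Random(0)))
```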

Read more »

Reinforcement Learning Notes (2) - From Q-Learning to DQN

Posted on 2018-05-09 | Tags: machine learning, reinforcement learning

In the previous article, Reinforcement Learning Notes (1) - Overview, I described how to model reinforcement learning problems as an MDP. In reinforcement learning, however, the transition probabilities of the MDP are usually unavailable, so value iteration and policy iteration, the standard ways of solving an MDP, cannot be applied directly. Approximate algorithms were developed to get around this. This article introduces the Q-Learning family of methods, which grew out of value iteration: Q-Learning, Sarsa, and DQN.
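A minimal sketch of the tabular Q-Learning update shows how the method works from a sampled transition (state, action, reward, next state) rather than from known transition probabilities. The learning rate, discount factor, state names, and action names are assumed values for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one Q-Learning update from a single sampled transition.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action problem; unseen state-action pairs start at 0.
q = defaultdict(float)
actions = ["left", "right"]

updated = q_learning_update(q, "s0", "right", reward=1.0,
                            next_state="s1", actions=actions)
print(updated)
```

Sarsa differs only in the target: it uses the value of the action actually taken in the next state instead of the maximum over actions.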

Read more »

Reinforcement Learning Notes (1) - Overview

Posted on 2018-05-05 | Tags: machine learning, reinforcement learning

This article introduces some basic concepts of reinforcement learning, including MDPs and the Bellman equations, and describes how to move from an MDP to reinforcement learning.

Read more »