"Quantitative Trading: How to Build Your Own Algorithmic Trading Business" by Ernie Chan is a comprehensive guide that explores the world of quantitative trading and provides practical advice for building an algorithmic trading business, especially for individuals interested in quantitative trading.

It covers essential concepts, methodologies, and practical tips to help readers develop and implement their own algorithmic trading strategies while effectively managing risk and building a sustainable trading business.

Because I benefited significantly from reading this book, I want to write down some of the most important takeaways I got from it, in the hope that they will also be useful to you.

This passage is about the last four chapters, which introduce the execution systems used in actual trading (automated and semi-automated), how to minimize transaction costs, and how to determine the optimal leverage using the Kelly criterion. It also covers some special topics and common sense in trading. Finally, it lists some advantages individual investors have over institutional investors.
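The Kelly criterion mentioned above can be sketched in a few lines. Under the Gaussian-returns simplification Chan works with, the optimal leverage is roughly f* = m/s², the mean excess return divided by the variance of returns; the figures below are made up purely for illustration.

```python
# Kelly-optimal leverage under a Gaussian-returns simplification:
# f* = m / s^2, where m is the mean excess return per period and
# s is the standard deviation of returns per period.
# All numbers below are illustrative, not from any real strategy.

def kelly_leverage(mean_excess_return, return_std):
    """Return the Kelly-optimal leverage f* = m / s**2."""
    return mean_excess_return / return_std ** 2

# e.g. 0.1% mean daily excess return with 1% daily volatility
f_star = kelly_leverage(0.001, 0.01)
print(f_star)  # ≈ 10 → full Kelly suggests ~10x leverage
```

In practice many traders run at half-Kelly or less, since the inputs are estimated with error and full Kelly is very sensitive to them.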

Read more »

"Quantitative Trading: How to Build Your Own Algorithmic Trading Business" by Ernie Chan is a comprehensive guide that explores the world of quantitative trading and provides practical advice for building an algorithmic trading business, especially for individuals interested in quantitative trading.

It covers essential concepts, methodologies, and practical tips to help readers develop and implement their own algorithmic trading strategies while effectively managing risk and building a sustainable trading business.

Because I benefited significantly from reading this book, I want to write down some of the most important takeaways I got from it, in the hope that they will also be useful to you.

This passage is about the first four chapters, which introduce the basic requirements for independent traders, including how to search for strategy ideas, how to perform backtests, and what we need to prepare before trading for real.

Read more »

"The Almanack of Naval Ravikant" is a book that compiles the wisdom and insights of entrepreneur and investor Naval Ravikant. This book mainly talks about two topics, wealth and happiness. It offers practical advice on how to live a more fulfilling and purposeful life.

Through his own experiences and perspectives, Naval provides readers with valuable insights into topics like success, motivation, and personal growth, making the book a useful guide for anyone looking to improve their life and achieve their goals.

I have benefited greatly from reading this book, and I want to write down some important takeaways from it. As the book was written in English, I wanted to try writing this in English, too. I hope it will be beneficial to you. This passage is about the second topic: Happiness.

The passage about the first topic, Wealth, can be found here.

Read more »

"The Almanack of Naval Ravikant" is a book that compiles the wisdom and insights of entrepreneur and investor Naval Ravikant. This book mainly talks about two topics, wealth and happiness. It offers practical advice on how to live a more fulfilling and purposeful life.

Through his own experiences and perspectives, Naval provides readers with valuable insights into topics like success, motivation, and personal growth, making the book a useful guide for anyone looking to improve their life and achieve their goals.

I have benefited greatly from reading this book, and I want to write down some important takeaways from it. As the book was written in English, I wanted to try writing this in English, too. I hope it will be beneficial to you. This passage is about the first topic: Wealth.

Read more »

In search, advertising, and recommendation businesses, besides routine binary classification tasks such as CTR and CVR prediction, there are also a series of regression tasks, such as predicting stay duration, LTV, eCPM, and GMV.

For binary classification tasks like CTR and CVR, the common loss function is cross-entropy. Its underlying assumption is that the event follows a Bernoulli distribution, so what the model ultimately learns to output is the probability of a positive sample. Regression tasks, by contrast, offer many candidate loss functions, such as MSE, MAE, Huber loss, log-normal loss, weighted logistic regression, and softmax.

Each loss function comes with its own assumptions and range of applicability. If the true label distribution deviates substantially from those assumptions, results tend to suffer, so this article focuses on the derivations and assumptions behind these common losses.
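As a concrete illustration of how the choice of loss encodes a distributional assumption, the sketch below (with made-up, heavy-tailed labels of the kind seen in stay-duration prediction) shows that the constant prediction minimizing MSE is the label mean (the Gaussian-noise assumption), while the one minimizing MAE is the median (the Laplace-noise assumption):

```python
import numpy as np

# MSE assumes Gaussian noise and is minimized by the mean of the labels;
# MAE assumes Laplace noise and is minimized by the median.
# On skewed labels (e.g. watch time) the two optima differ a lot.
# The labels below are made up for illustration.

labels = np.array([1.0, 1.0, 2.0, 3.0, 100.0])  # heavy-tailed

preds = np.linspace(0.0, 100.0, 100001)          # candidate constant predictions
mse = ((labels[None, :] - preds[:, None]) ** 2).mean(axis=1)
mae = np.abs(labels[None, :] - preds[:, None]).mean(axis=1)

print(preds[mse.argmin()])  # ≈ 21.4, the mean  → dragged up by the outlier
print(preds[mae.argmin()])  # ≈ 2.0, the median → robust to the tail
```

The gap between 21.4 and 2.0 is exactly why picking a loss whose implied noise model matches the real label distribution matters.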

Read more »

In real-world applications, data often consists of multiple domains. Taking advertising as an example, there are usually multiple conversion goals, and CTR/CVR prediction has to account for them, because the CTR/CVR distributions (e.g. their means and variances) typically differ across conversion goals.

The most straightforward ideas are to add domain-related features or to split the model by domain. The former is an implicit approach: the features must be discriminative enough for the model to pick up, but there is no quantitative standard for "discriminative enough", so in practice you can only judge by experiment results. The latter suffers from high maintenance cost: with n domains you end up maintaining n models.

This article focuses on serving multiple domains with a single model, covering mainly published work that has shown gains in industry. These methods fall roughly into three categories:

  1. multi-head structures
  2. the LHUC mechanism
  3. the GRL mechanism
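As a rough illustration of the second category, here is a minimal NumPy sketch of the LHUC idea (all layer sizes, weights, and function names are illustrative, not taken from any paper's implementation): a small domain-conditioned gate, bounded in (0, 2), rescales the hidden units of a shared tower so one model can adapt its representation per domain.

```python
import numpy as np

# LHUC-style domain adaptation sketch: a domain embedding drives a gate
# (2 * sigmoid, so values lie in (0, 2)) that rescales shared hidden units.
# Shapes and random weights are illustrative only.

rng = np.random.default_rng(0)
input_dim, hidden_dim, n_domains, emb_dim = 16, 8, 3, 4

W_shared = rng.normal(size=(input_dim, hidden_dim))   # shared tower layer
domain_emb = rng.normal(size=(n_domains, emb_dim))    # learned per-domain embeddings
W_gate = rng.normal(size=(emb_dim, hidden_dim))       # gate network

def lhuc_gate(domain_id):
    """Domain-conditioned scaling factors in (0, 2)."""
    return 2.0 / (1.0 + np.exp(-domain_emb[domain_id] @ W_gate))

def forward(x, domain_id):
    h = np.maximum(x @ W_shared, 0.0)   # shared hidden layer (ReLU)
    return h * lhuc_gate(domain_id)     # per-domain rescaling of hidden units

x = rng.normal(size=(input_dim,))
print(forward(x, 0).shape)  # (8,)
```

Compared with splitting the model per domain, only the tiny gate branch is domain-specific, so maintenance cost stays close to that of a single model.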
Read more »

Blending is usually the last stage of a recommender system. At this stage, organic content (hereafter "items") is mixed with monetized content (hereafter "ads") to generate the final list pushed to the user.

From a Long Term Value (LTV) perspective, this is a trade-off between LT and V. If too many ads are shown, they inevitably squeeze the number and positions of items, hurting user experience and retention, i.e. LT, while ad revenue, or Average Revenue Per User (ARPU), rises; and vice versa.

So common industry practice is to set a user-experience constraint and, under that constraint, optimize ad efficiency as far as possible, i.e. maximize revenue. This naturally lends itself to being modeled as an optimization problem, which is exactly what LinkedIn's 2020 paper, Ads Allocation in Feed via Constrained Optimization, does.

Viewed directly, the blending problem has two sub-problems to solve:
(1) How to compute the value of each item or ad at each position: items and ads are ranked separately, with different objectives, so their final scores live on different scales; how to bring the two into a comparable range is a question worth discussing.
(2) How to allocate so that the final list's value is maximized: once item and ad values are determined, where should items and ads be inserted to maximize the value of the whole list?

The LinkedIn paper above focuses on the second problem, with some content touching on the first. This article first walks through the paper's modeling approach, then discusses some ideas for computing item and ad values, along with a few other things to watch out for in blending.
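To make the second sub-problem concrete, here is a toy version (not the LinkedIn paper's actual formulation): choose which of N slots carry ads so as to maximize total list value, subject to a cap of K ads, with position effects modeled by a simple discount factor. All numbers are made up, and the brute-force search only works at toy scale.

```python
from itertools import combinations

# Toy slot-allocation problem: pick at most K of N slots for ads,
# maximizing total list value. Per-slot values are assumed to already
# be calibrated onto one common scale (sub-problem 1); position effects
# are a simple discount. All numbers are illustrative.

N, K = 5, 2
pos_discount = [1.0, 0.8, 0.6, 0.5, 0.4]  # higher slots are worth more
ad_value, item_value = 3.0, 2.0           # per-unit values on a common scale

def list_value(ad_slots):
    return sum((ad_value if i in ad_slots else item_value) * pos_discount[i]
               for i in range(N))

best = max((s for k in range(K + 1) for s in combinations(range(N), k)),
           key=list_value)
print(best)  # (0, 1) — with ads worth more per slot, they take the top positions
```

Real systems replace this enumeration with an LP or a greedy/dual method, and the constraint is typically on user experience rather than a hard ad count, but the structure of the trade-off is the same.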

Read more »

Randomness is like a thorny rose, dangerous yet alluring. It can bring delightful surprises into your life, but it can also deliver disasters powerful enough to destroy it.

In last year's annual review I mentioned that my biggest realization of 2022 was that life contains an enormous amount of randomness, and that it is precisely this randomness that makes the future unpredictable and unknowable. In January, while looking something up, I came across Taleb's "Incerto" series on uncertainty: Fooled by Randomness, The Black Swan, Antifragile, and Skin in the Game. Compared with my own simple realization, these books put great effort into describing randomness.

Looking at randomness alone can easily push us toward nihilism: nothing is certain, or rather nothing can be firmly believed in, because randomness dominates almost everything. Fortunately, the books also offer some methodologies that may be worth referencing, and although the author's writing is sometimes obscure, they are still worth reading overall. This article is mainly my notes and extensions on Fooled by Randomness and The Black Swan. May you benefit from opening it.

Read more »

The last month of 2022 passed amid the nationwide wave of COVID infections. As I write this, I have been recovered for about a week, with basically no symptoms left. But looking back at everything that happened this year, "surreal" is the only word for it, and not just because of the 180-degree turn in pandemic policy: it was also everything that happened to me personally, most of which can be summed up as plans failing to keep up with change, or, no one knows the future.

I have always had the habit of writing annual reviews, and such a surreal year deserves an article to commemorate it. With the New Year's holiday conveniently here, I took these few days to sort through the past year.

Read more »

Current recommendation and advertising systems generally do prediction and optimization at the request level. Alongside maximizing effectiveness, this drives up machine cost, and uneven traffic distribution makes the problem worse. For Douyin or Meituan, for example, daily traffic typically has two peaks, at lunchtime and in the evening, when the number of people ordering food or browsing their phones surges, while traffic drops off considerably during the rest of the day, as shown in the figure below.

This means that if you provision enough machines to handle the peaks, most of them sit idle the rest of the time, i.e. the ROI is low, so at peak times you usually have to either scale out or degrade. Degrading generally means reducing the number of requests by dropping a proportion of traffic, but dropping traffic necessarily hurts overall performance. This gave rise to compute optimization as a research direction: it is essentially a trade-off between effectiveness and machine cost, or in other words, cutting cost as losslessly as possible.

This article introduces common techniques for compute optimization, which I summarize into three categories: drop, cache, and dynamic. Decomposing the compute consumed, it splits naturally into two parts, number of requests × compute per request, and we can optimize from either angle:

  • drop: simply drop traffic, i.e. directly reduce the "number of requests"
  • cache: store previous prediction results in a cache so that each prediction skips actual model inference, reducing the "compute per request"
  • dynamic: dynamically control the compute spent on each request according to its value, again reducing the "compute per request"; DCAF is representative of this class

The methods above are traffic-level optimizations. There are also methods that optimize model inference latency itself, mainly via parallel computation (hardware upgrades) and model compression (quantization, distillation, architecture changes, etc.), which this article will not go into.
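The dynamic category can be sketched as a greedy budget allocator in the spirit of DCAF (the diminishing-returns value model and all numbers here are illustrative, not from the paper): each extra unit of compute goes to whichever request currently has the highest marginal value, until the global budget is exhausted.

```python
import heapq

# Value-aware compute allocation sketch: requests with higher value
# get more ranking quota; marginal value per extra unit of compute is
# assumed to diminish. Base values and the budget are illustrative.

requests = {"req_a": 5.0, "req_b": 2.0, "req_c": 1.0}  # per-request base value
BUDGET = 5                                             # total compute units

def marginal_value(base, k):
    return base / (k + 1)   # assumed diminishing return of the (k+1)-th unit

alloc = {r: 0 for r in requests}
heap = [(-marginal_value(v, 0), r) for r, v in requests.items()]
heapq.heapify(heap)
for _ in range(BUDGET):
    _, r = heapq.heappop(heap)                         # best marginal value
    alloc[r] += 1
    heapq.heappush(heap, (-marginal_value(requests[r], alloc[r]), r))

print(alloc)  # {'req_a': 4, 'req_b': 1, 'req_c': 0}
```

Note how the lowest-value request ends up with zero quota: a per-request "drop" falls out of the same mechanism, which is why dynamic allocation subsumes naive proportional dropping.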

Read more »