This article introduces a technology shared by several important Internet businesses (online advertising, recommendation systems, search engines): semantic understanding, along with several ways to implement it, including matrix factorization and topic models. The original video is here (requires a VPN).

Read more »

This series on distributed machine learning is based on talks given by Wang Yi. As he notes in the talks, distributed machine learning differs significantly from the machine learning we usually hear about, so many of his views run counter to what we learn from textbooks. He has extensive experience in this area. The talks are three years old and some of the technologies they mention may have changed, but a number of the views are still worth considering.

I have doubts about some of the views in the talks. Here I record them as the speaker expressed them; perhaps only after I start working will I have the chance to verify whether they are correct.

This article introduces some important ideas in distributed machine learning: real Internet data follows a long-tail distribution, "big" matters more than "fast," and a framework should not be applied blindly. The corresponding video is here (requires a VPN).

Read more »

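I haven't written a post in over a month. This month I have been swamped by all sorts of hassles: competitions, data collection and filtering, assorted tedious reports, and so on. In the blink of an eye it is 2018. I hadn't planned to write a year-end summary, but on reflection I decided to keep a brief record. First, I have always had the habit of summarizing; otherwise I wouldn't still be writing this blog. Second, without a summary I wouldn't know just how badly my year went (facepalm). Back to the point: below is mainly what I did over the past year.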

Read more »