Why is skip-gram-based word2vec more effective than CBOW on low-frequency words? - SofaSofa

A question from Set 38 of the interview question bank:

For word2vec, which model performs better on low-frequency words: the skip-gram model or the CBOW model?

The answer is skip-gram.

I don't really understand why. Could someone explain?

yayat 2018-08-11 23:27

2 answers

CBOW predicts the center word from its context, i.e. it uses several words to predict one word.

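For example, in the sentence "yesterday was really a [...] day", the blank might be good or nice, while a rarer choice would be delightful. When CBOW predicts the center word, it favors the outcomes the model considers most likely, such as good and nice, so the rare word delightful gets ignored.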
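A minimal sketch of how CBOW turns the sentence above into training examples, assuming a fixed window of 2 words on each side. The function name `cbow_examples` is hypothetical, and the real word2vec tool also samples a smaller random effective window and subsamples frequent words; both are omitted here. Each position yields exactly one example, so delightful is a prediction target only once per occurrence.

```python
from typing import List, Tuple


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """Build CBOW training examples as (context words, center word) pairs."""
    examples = []
    for i, target in enumerate(tokens):
        # Context = up to `window` words on each side, excluding the center word.
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        context = left + right
        if context:  # skip positions with no context at all
            examples.append((context, target))
    return examples


sentence = "yesterday was really a delightful day".split()
for context, target in cbow_examples(sentence, window=2):
    print(context, "->", target)
```

For this sentence the sketch produces six examples, one per position; the one targeting the rare word is `(['really', 'a', 'day'], 'delightful')`.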

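Conversely, in a sentence like "[...] was really a delightful day", where delightful is part of the context, the context words are averaged once they enter the model (each word's input node selects its row of the weight matrix, and these rows are averaged into the hidden layer). Because delightful is rare and appears seldom, its contribution is easily drowned out by this averaging as well.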
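To make the averaging concrete, here is a small sketch of the CBOW hidden layer as the element-wise mean of the context vectors. The 2-dimensional vectors are made-up values and `cbow_hidden` is a hypothetical helper, not a word2vec API. Because the hidden vector is a mean over C context words, delightful's distinctive component is scaled by 1/C. On the backward pass every context word then receives the same error vector (the exact gradient of the mean also scales it by 1/C; implementations differ on whether they apply that factor), so delightful never gets an update of its own.

```python
from typing import Dict, List


def cbow_hidden(context_vecs: List[List[float]]) -> List[float]:
    """CBOW hidden layer: element-wise mean of the context word vectors."""
    c = len(context_vecs)
    # zip(*...) groups the d-th component of every context vector together.
    return [sum(component) / c for component in zip(*context_vecs)]


# Hypothetical 2-d input vectors; "delightful" points in a direction of its own.
vectors: Dict[str, List[float]] = {
    "was": [1.0, 0.0],
    "really": [1.0, 0.0],
    "a": [1.0, 0.0],
    "delightful": [0.0, 4.0],
}
context = ["was", "really", "a", "delightful"]
h = cbow_hidden([vectors[w] for w in context])
print(h)  # [0.75, 1.0]: delightful's 4.0 is diluted to 1.0 by the mean over 4 words
```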

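Skip-gram predicts the context from a single word, i.e. it uses one word to predict several words. Each word is trained on its own as the input, with less interference from high-frequency words. That is why skip-gram-based word2vec has the advantage on rare words.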
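By contrast, a minimal sketch of skip-gram pair generation under the same simplifying assumptions (fixed window of 2, no subsampling; `skipgram_pairs` is a hypothetical name). When delightful is the center word it is the only input, so its vector gets its own undiluted update from each of its context predictions.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Build skip-gram training pairs as (center word, context word)."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # every neighbour becomes a separate prediction target
                pairs.append((center, tokens[j]))
    return pairs


sentence = "yesterday was really a delightful day".split()
pairs = skipgram_pairs(sentence, window=2)
print([p for p in pairs if p[0] == "delightful"])
# [('delightful', 'really'), ('delightful', 'a'), ('delightful', 'day')]
print(len(pairs))  # 18
```

CBOW makes one example per position (6 for this sentence), while skip-gram makes 18 here; this extra work is consistent with the remark, quoted in the next answer, that CBOW is several times faster to train.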

SofaSofa Data Science Community · DS Interview Question Bank · DS Interview Experiences

mrhust 2018-09-24 08:55

On Google Groups, Mikolov wrote:

“Skip-gram: works well with small amount of the training data, represents well even rare words or phrases

CBOW: several times faster to train than the skip-gram, slightly better accuracy for the frequent words

This can get even a bit more complicated if you consider that there are two different ways how to train the models: the normalized hierarchical softmax, and the un-normalized negative sampling. Both work quite differently.”
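To illustrate what "un-normalized" means here: with negative sampling, each (input word, context word) pair is scored only against the true context word and k sampled "negative" words, instead of being normalized over the whole vocabulary (hierarchical softmax keeps a normalized distribution by routing through a binary tree). Below is a minimal sketch of the standard per-pair skip-gram negative-sampling loss, plus the unigram^0.75 noise distribution commonly used to draw negatives. The function names and toy vectors are hypothetical; this is not the word2vec or gensim source.

```python
import math
from typing import Dict, List


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def neg_sampling_loss(v_in: List[float], u_pos: List[float],
                      u_negs: List[List[float]]) -> float:
    """-log s(u_pos . v_in) - sum_k log s(-u_k . v_in); no sum over the vocabulary."""
    loss = -math.log(sigmoid(dot(u_pos, v_in)))
    for u_neg in u_negs:
        loss -= math.log(sigmoid(-dot(u_neg, v_in)))
    return loss


def noise_distribution(counts: Dict[str, int], power: float = 0.75) -> Dict[str, float]:
    """Unigram counts raised to `power` and renormalised; negatives are drawn from this."""
    weights = {w: c ** power for w, c in counts.items()}
    total = sum(weights.values())
    return {w: x / total for w, x in weights.items()}


# With an all-zero input vector every score is s(0) = 0.5, so loss = (1 + k) * ln 2.
print(neg_sampling_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))
# The 0.75 power raises the rare word's share from 1/17 to about 1/9.
print(noise_distribution({"good": 16, "delightful": 1}))
```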

Stack Overflow has a discussion of this passage with respect to training-set size: https://stackoverflow.com/questions/39224236/word2vec-cbow-skip-gram-performance-wrt-training-dataset-size

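However, it says little about why skip-gram works better for low-frequency words. My interpretation works backwards: since CBOW needs more data, it is more sensitive to high-frequency words, and therefore does worse than skip-gram on low-frequency words.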

I don't think this explanation is very satisfying; further contributions are welcome!


lpq29743 2018-09-08 21:18
