Three More Word Embeddings Papers

Ok, one last round of word embeddings papers, then I’m on to research that’s a lot more relevant to my current work. Here I’ll look at three papers, all of which are again related to skip-gram. Two of them look at giving more or different information to neural embedding models, and one looks a little more deeply at the objective function optimized by skip-gram.


Read more...

Tensor Decompositions and Applications; Kolda and Bader; SIREV 2009


Reactions to the skip-gram model (three papers)

Finishing up (for now) my reading about skip-gram, I’ll summarize three papers that offer different reactions or follow-ons to the skip-gram papers. All of them deal with the difference between traditional distributional models, which produce representations from word-count statistics in a corpus, and the newer neural-network-based models, which directly train representations to make predictions about the corpus.
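
To make that distinction concrete, here’s a toy sketch (mine, not from any of the papers) of the first kind of model: a count-based representation just tallies which words show up near each target word in a small window, whereas skip-gram trains a dense vector to predict those same context words.

```python
# Toy sketch (mine, not from the papers) of a count-based distributional
# representation: each word is represented by the counts of the words that
# co-occur with it inside a small window. Corpus and window size are
# placeholders.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the dog sat on the rug".split()
window = 2

cooccurrence = defaultdict(Counter)
for i, target in enumerate(corpus):
    for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
        if j != i:
            cooccurrence[target][corpus[j]] += 1

# The count vector for "sat" is its (sparse) distributional representation;
# skip-gram instead learns a dense vector trained to predict these contexts.
print(cooccurrence["sat"])
```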


Read more...

Distributed Representations of Words and Phrases and their Compositionality; Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean; NIPS 2013

The second word embeddings paper I’ll discuss is the second main skip-gram paper, a follow-on to the original ICLR paper that basically drops the CBOW model and focuses on scaling up the skip-gram model to larger datasets. This paper makes three main contributions, I would say. First, they provide a slightly modified objective function and a few other sampling heuristics that result in a more computationally efficient model. Second, they show that their model works with phrases, too, though they just do this by replacing the individual tokens in a multiword expression with a single symbol representing the phrase - pretty simple, but it works. And lastly, they show what to me was a very surprising additional feature of the learned vector space: some relationships are encoded compositionally, meaning that you can just add the vectors for two words like “Russian” and “capital” to get a vector that is very close to “Moscow”. They didn’t do any kind of thorough evaluation of this, but the fact that it works at all surprised me. They did give a reasonable explanation, however, and I’ve put it into math below.
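
If you want to poke at the compositionality claim yourself, here is a minimal sketch (mine, not from the paper), assuming you have gensim installed and a set of pretrained word2vec vectors on disk; the file name is just a placeholder, and the exact nearest neighbors will depend on the vectors you load.

```python
# Minimal sketch of the additive-compositionality check, assuming pretrained
# word2vec vectors in the standard binary format (file name is a placeholder).
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# most_similar averages the unit-normalized vectors of the "positive" words
# (equivalent, up to scale, to adding them) and returns the nearest neighbors
# of that combined vector by cosine similarity.
for word, similarity in kv.most_similar(positive=["Russian", "capital"], topn=5):
    print(f"{word}\t{similarity:.3f}")
```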


Read more...

Efficient Estimation of Word Representations in Vector Space; Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean; ICLR 2013

I’m a bit late to the word embeddings party, but I just read a series of papers related to the skip-gram model proposed in 2013 by Mikolov and others at Google.


Read more...