Unsupervised Coreference Resolution in a Nonparametric Bayesian Model, by Aria Haghighi and Dan Klein, ACL 2007.

Haghighi and Klein (H&K) have four papers on coreference resolution, and I read all of them in the last couple of days. It turns out that those four papers also comprise Haghighi’s entire dissertation, minus a little introduction and an appendix. The reason I read these papers is that I’m using H&K’s final coreference system in some research I’m doing, looking to expand on what they did. I’ll write a little bit about each paper here, starting with the first.



Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates, by Matthew Gerber and Joyce Y. Chai, ACL 2010.

This paper deals with semantic role labeling. Verbs take arguments: in “The lion ate the monkey,” ate has “the lion” as its “agent” argument and “the monkey” as its “patient” argument. (Verbal) semantic role labeling is the process of determining which parts of a sentence correspond to the arguments of verbs. You can do this for nouns that are derived from verbs as well, such as “shipping costs” - “shipping” has an entity that ships and an entity that was shipped, even though it’s not even the head noun of the noun phrase it’s in. If you really want to understand text, you need to be able to determine these arguments for any kind of predicate that you encounter.
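To make the idea concrete, here is a minimal sketch of how the two examples above could be written down as predicate-argument structures. The `Predicate` class and its field names are my own illustration, not the NomBank or PropBank format and not anything from the paper; the role labels "agent" and "patient" are the informal ones used above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Predicate:
    """A hypothetical container for one predicate and its role fillers."""
    word: str
    kind: str  # "verb" or "noun"
    # Maps a role name to the text span filling it, or None when the
    # phrase itself does not say who or what fills the role.
    roles: Dict[str, Optional[str]] = field(default_factory=dict)

    def unfilled_roles(self) -> List[str]:
        """Roles the predicate has but whose filler is not given locally."""
        return [role for role, filler in self.roles.items() if filler is None]


# "The lion ate the monkey": both arguments sit in the same sentence.
ate = Predicate("ate", "verb", {"agent": "the lion", "patient": "the monkey"})

# "shipping costs": something ships and something is shipped, but the
# noun phrase alone names neither.
shipping = Predicate("shipping", "noun", {"agent": None, "patient": None})

print(ate.unfilled_roles())
print(shipping.unfilled_roles())
```

The roles left as `None` for the nominal predicate are the ones a labeler would have to find somewhere else in the text, if they are stated at all.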



Filling Knowledge Gaps in Text for Machine Reading, by Anselmo Peñas and Eduard Hovy, COLING 2010.

I mentioned that I saw a talk by Ed Hovy, and so I put a bunch of his papers on my reading list. This was one that he mentioned in his talk, and I think it’s pretty fascinating.



Large-Scale Extraction and Use of Knowledge from Text, by Peter Clark and Phil Harrison, K-CAP 2009.

This paper was cited by the Peñas and Hovy paper I just wrote about, and from the citation it looked interesting. It seems clear to me that this paper provided some important foundation for the Peñas and Hovy paper, but it is itself not incredibly interesting to me.



Coreference resolution across corpora: languages, coding schemes, and preprocessing information, by Marta Recasens and Eduard Hovy, ACL 2010.

This paper discusses problems with evaluating coreference resolution systems. There are three standard metrics used (B^3, MUC, and CEAF); Recasens and Hovy did a controlled comparison of these evaluation metrics across a number of different scenarios. The problem is that they all produce different rankings of algorithms, sometimes in really bad ways (the highest scoring B^3 algorithm was the lowest scoring MUC algorithm, for instance).
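To see how two of these metrics can disagree, here is a small sketch of MUC (link-based) and B^3 (mention-based) using their standard definitions, run on a toy document. The toy clusters and the two baseline "systems" (everything a singleton, everything in one cluster) are made up for illustration and are not data from the paper; the sketch also assumes both systems use the gold mentions.

```python
def f1(p, r):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r else 0.0


def muc(key, response):
    """MUC: counts the coreference links needed to rebuild each cluster."""
    def link_score(gold, system):
        owner = {m: i for i, cluster in enumerate(system) for m in cluster}
        num = den = 0
        for cluster in gold:
            # Number of pieces `system` splits this cluster into; a mention
            # missing from `system` counts as its own piece.
            parts = {owner.get(m, ("unmatched", m)) for m in cluster}
            num += len(cluster) - len(parts)
            den += len(cluster) - 1
        return num / den if den else 0.0

    r = link_score(key, response)
    p = link_score(response, key)
    return p, r, f1(p, r)


def b_cubed(key, response):
    """B^3: per-mention precision and recall, averaged over all mentions."""
    key_of = {m: cluster for cluster in key for m in cluster}
    resp_of = {m: cluster for cluster in response for m in cluster}
    mentions = list(key_of)
    p = sum(len(key_of[m] & resp_of[m]) / len(resp_of[m]) for m in mentions)
    r = sum(len(key_of[m] & resp_of[m]) / len(key_of[m]) for m in mentions)
    p, r = p / len(mentions), r / len(mentions)
    return p, r, f1(p, r)


# Toy gold standard: one two-mention entity and four singletons.
key = [{1, 2}, {3}, {4}, {5}, {6}]
all_singletons = [{m} for m in range(1, 7)]
one_cluster = [set(range(1, 7))]

for name, system in [("all singletons", all_singletons),
                     ("one cluster", one_cluster)]:
    print(name, "MUC F1 =", round(muc(key, system)[2], 3),
          "B^3 F1 =", round(b_cubed(key, system)[2], 3))
```

On this toy input B^3 prefers the all-singletons baseline while MUC prefers the one-big-cluster baseline, which is the kind of ranking flip described above, just in miniature.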

