This paper was cited by the Peñas and Hovy paper I just wrote about, and from the citation it looked interesting. It clearly provided some important foundation for the Peñas and Hovy paper, but on its own it isn’t incredibly interesting to me.

This paper follows a paper by Schubert in using part-of-speech sequences, and the words seen in those sequences, as indices into a database of counts, just like the Peñas and Hovy paper. The bulk of the paper describes that system, why it might be useful, and how they created the database. They then try to evaluate it in three ways, none of which impressed me much. Don’t get me wrong, it’s really good that they got us thinking about these kinds of databases. I just don’t think they found the best applications or evaluations for it.
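A rough sketch of what indexing counts by a part-of-speech sequence and its words could look like. This is only an illustration of the general idea, not the paper's actual system: the toy tagged sentences, the `build_counts` helper, and the `("NN", "VBD")` pattern are all made up for the example.

```python
from collections import Counter

# Hypothetical toy input: sentences already tagged as (word, POS) pairs.
TAGGED = [
    [("the", "DT"), ("dog", "NN"), ("chased", "VBD"), ("the", "DT"), ("cat", "NN")],
    [("a", "DT"), ("dog", "NN"), ("chased", "VBD"), ("a", "DT"), ("ball", "NN")],
    [("the", "DT"), ("cat", "NN"), ("ate", "VBD"), ("the", "DT"), ("fish", "NN")],
]


def build_counts(tagged_sentences, pattern):
    """Count the word tuples that fill each occurrence of a POS pattern.

    Keys are (pattern, words), so both the tag sequence and the words
    seen in it act as the index into the table of counts.
    """
    counts = Counter()
    n = len(pattern)
    for sent in tagged_sentences:
        # Slide a window of the pattern's length over the sentence.
        for i in range(len(sent) - n + 1):
            window = sent[i:i + n]
            if tuple(tag for _, tag in window) == pattern:
                words = tuple(word for word, _ in window)
                counts[(pattern, words)] += 1
    return counts


counts = build_counts(TAGGED, ("NN", "VBD"))
```

Looking up `counts[(("NN", "VBD"), ("dog", "chased"))]` then tells you how often "dog chased" filled the noun-verb pattern in the toy data. A real database would be built from a large corpus and many patterns.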

The first evaluation used their counts to improve a parser. Seeing that statement in the Peñas and Hovy paper was what initially made me read this; I hope to do similar things eventually, so I wanted to see what they did. What they did amounts to lexicalization, which has been done for a very long time. They lexicalized a hand-built, rule-based parser, which (unsurprisingly) improved performance, though only slightly. It would have been a much more informative evaluation to compare against other kinds of lexicalization, like what the Collins parser or the Stanford parser do, but they didn’t seem to even notice the connection between what they did and what has been done before. I suspect that their method would be worse than current methods of lexicalizing parsers.
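For anyone unfamiliar with the term, here is a minimal sketch of what lexicalizing a parsing decision with counts can mean in general. It is not the method from this paper, nor what the Collins or Stanford parsers do. The counts, the triple format, and the `choose_attachment` helper are all invented for the example, which uses prepositional-phrase attachment because it is the classic case.

```python
from collections import Counter

# Hypothetical counts of (head, preposition, object) triples,
# as if they had been collected from a large parsed corpus.
COUNTS = Counter({
    ("ate", "with", "fork"): 12,
    ("pizza", "with", "fork"): 1,
    ("ate", "with", "anchovies"): 2,
    ("pizza", "with", "anchovies"): 9,
})


def choose_attachment(verb, noun, prep, pobj, counts):
    """Attach the PP to whichever head was seen with it more often.

    Ties go to the verb, an arbitrary choice made for this sketch.
    """
    verb_score = counts[(verb, prep, pobj)]
    noun_score = counts[(noun, prep, pobj)]
    return "verb" if verb_score >= noun_score else "noun"


fork = choose_attachment("ate", "pizza", "with", "fork", COUNTS)
anchovies = choose_attachment("ate", "pizza", "with", "anchovies", COUNTS)
```

With these made-up numbers, "with fork" attaches to the verb and "with anchovies" attaches to the noun. An unlexicalized rule-based parser would have to make the same choice for both.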

They also tested synonym generation in the context of recognizing textual entailment, but it wasn’t that interesting to me, so I won’t say more about it.

When they did a human evaluation of their database, the precision was something like 70%, which wasn’t all that impressive to me. I wonder how much of that low precision was due to errors in the dependency parser that they used in creating the database.