Monday, May 16, 2011

Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution

"Narrowing the Modeling Gap: A Cluster-Ranking Approach to Coreference Resolution"
Journal of Artificial Intelligence Research 40 (2011) 469–521 Submitted 06/10; published 02/11
Altaf Rahman altaf@hlt.utdallas.edu
Vincent Ng vince@hlt.utdallas.edu
Human Language Technology Research Institute
University of Texas at Dallas
800 West Campbell Road; Mail Station EC31
Richardson, TX 75080-3021 U.S.A.
http://www.jair.org/media/3120/live-3120-5478-jair.pdf

-Long, very thorough paper
-A lot of history; use the citations in this paper
-Learn more about "centering algorithms"
-Describes three different models: mention-pair, entity-mention, and mention-ranking
-Outlines key features and deficiencies of each
-In particular, the mention-pair model does not enforce transitivity, so a separate clustering step is needed
-Mention-ranking outperforms mention-pair
-Describes a cluster-ranking approach that combines the entity-mention and mention-ranking models
-Uses lexicalization and knowledge of anaphoricity
-Uses ACE for experiments
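
To make the transitivity note above concrete, here is a minimal Python sketch. The mention names and the pairwise decisions are made up, and transitive closure (via union-find) is only the simplest possible clustering step, chosen for illustration; it is not necessarily the clustering mechanism used in the paper.

```python
from itertools import combinations


def transitivity_violations(mentions, coreferent_pairs):
    """Return triples where exactly two of the three pairs were linked."""
    linked = {frozenset(p) for p in coreferent_pairs}
    violations = []
    for a, b, c in combinations(mentions, 3):
        edges = [
            frozenset((a, b)) in linked,
            frozenset((b, c)) in linked,
            frozenset((a, c)) in linked,
        ]
        # Two links without the third contradict transitivity.
        if sum(edges) == 2:
            violations.append((a, b, c))
    return violations


def cluster_by_transitive_closure(mentions, coreferent_pairs):
    """Merge every linked pair into one cluster (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return sorted(clusters.values())


# Hypothetical output of a mention-pair classifier:
# (A,B) and (B,C) are classified coreferent, (A,C) is not.
mentions = ["A", "B", "C", "D"]
pairs = [("A", "B"), ("B", "C")]

violations = transitivity_violations(mentions, pairs)
clusters = cluster_by_transitive_closure(mentions, pairs)
print(violations)
print(clusters)
```

The closure step forces A and C into the same cluster even though the classifier said they were not coreferent, which is exactly the kind of contradiction the clustering mechanism has to resolve one way or the other.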

Interesting:



  1. "Specifically, a classifier that is trained on
    coreference-annotated data is used to determine whether a pair of mentions is co-referring
    or not. However, the pairwise classifications produced by this classifier (which is now commonly
    known as the mention-pair model) may not satisfy the transitivity property inherent
    in the coreference relation, since it is possible for the model to classify (A,B) as coreferent,
    (B,C) as coreferent, and (A,C) as not coreferent. As a result, a separate clustering mechanism
    is needed to coordinate the possibly contradictory pairwise classification decisions and
    construct a partition of the given mentions."

  2. Read about Lappin and Leass’s algorithm

  3. Read about centering algorithms

  4. "the distinction between
    classification and ranking applies to discriminative models but not generative models.
    Generative models try to capture the true conditional probability of some event. In the context
    of coreference resolution, this will be the probability of a mention having a particular
    antecedent or of it referring to a particular entity (i.e., preceding cluster). Since these probabilities
    have to normalize, this is similar to a ranking objective: the system is trying to raise
    the probability that a mention refers to the correct antecedent or entity at the expense of
    the probabilities that it refers to any other. Thus, the antecedent version of the generative
    coreference model as proposed by Ge et al. (1998) resembles the mention-ranking model,
    while the entity version as proposed by Haghighi and Klein (2010) is similar in spirit to the
    cluster-ranking model."
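
The normalization argument in the last quote can be illustrated with a small Python sketch. The candidate antecedents and their scores are invented, and the softmax below is just one convenient way to get probabilities that sum to one; it is an assumption for illustration, not the actual model of Ge et al. (1998) or Haghighi and Klein (2010).

```python
import math


def independent_probs(scores):
    """Classification view: each candidate is scored on its own (sigmoid)."""
    return {c: 1.0 / (1.0 + math.exp(-s)) for c, s in scores.items()}


def normalized_probs(scores):
    """Ranking-like view: probabilities over candidates must sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical scores for the candidate antecedents of one mention.
scores = {"Obama": 2.0, "the senator": 1.5, "Chicago": -1.0}

before = normalized_probs(scores)

# Raise the score of one candidate and renormalize.
boosted = dict(scores, Obama=4.0)
after = normalized_probs(boosted)

print(independent_probs(scores))
print(before)
print(after)
```

With independent probabilities, raising the score for one candidate leaves the others untouched. Under normalization, the same change necessarily lowers the probability of every competing candidate, which is why the quote compares generative models to a ranking objective.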