How well a particular algorithm performs depends on the genre of the text
Although much of the data-processing time went into the finer-grained task of differentiating
types of pronouns and noun phrases, the corpora were too small
and the resulting figures not meaningful. Statistically, therefore, I concentrated
on the coarser comparison between pronouns and noun phrases. As noted
above, an exception occurs in News2 where the error
of noun phrase resolution is less than that of pronouns. Thus
I concentrated on reasoning about how this text differs from the other
two, and found that this article is much more
focused. This means that the discourse
entities in the S-List will nearly always be evoked in consecutive utterances.
As a result, the elements will stay in the S-List across many utterances,
increasing the algorithm’s chance of hitting the correct antecedent.
In fact, this observation also explains why the error rates associated
with News2 are on the whole so much lower than those of the other two corpora
under Strube’s algorithm. It suggests that success
rate may depend on finding a “suitable” algorithm-text pair: a given
algorithm may perform well on one genre of text
(of a particular writing style) and poorly on others. Extending this
idea, an algorithm that handles one type of noun phrase may not handle
another. These hypotheses, of course, still have far too little
supporting evidence.
The S-List algorithm is a combination of the Recency Constraint and Centering Theory
One other observation
about the S-List algorithm is that it is implicitly a combination of both
the recency constraint and Centering Theory. Where
two discourse entities are in the same class but in different utterances,
we prioritize the element in the more recently mentioned utterance.
This is analogous to the ranking criterion of the recency
constraint, except that here we apply recency to the utterances
themselves (globally), and not to the discourse entities within an utterance
(locally). On the other hand, if two discourse
entities are in the same class and in the same utterance, we prioritize
the element that appears first in the utterance. This is analogous
to the ranking criterion of Centering Theory
(where subjects >> objects >> others), because subjects tend to appear
near the beginning of a sentence. Individually, both Centering
Theory and the recency constraint are intuitively plausible: they
“make sense” and are correct often enough. Together, however, they give better
performance, as shown by the experiments of Strube (1996). If
the S-List were combined with yet another algorithm, the result might
perform better still. It is tempting to conclude that combining
more algorithms yields better performance, but I would need to conduct more
experiments on larger corpora to support that conclusion.
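As a sketch, the ranking just described might be coded as follows. The Entity fields and the integer class ranks are illustrative assumptions for this note, not Strube’s exact data structures:

```python
from dataclasses import dataclass

# A minimal sketch of the S-List ranking described above.

@dataclass
class Entity:
    name: str
    class_rank: int   # rank of the entity's class (lower = preferred)
    utterance: int    # index of the utterance in which the entity was evoked
    position: int     # left-to-right position within that utterance

def slist_key(e: Entity):
    # 1. the better class first;
    # 2. within the same class, the more recent utterance first
    #    (recency applied to whole utterances, i.e. globally);
    # 3. within the same utterance, the earlier position first
    #    (subjects tend to come first, echoing the centering ranking).
    return (e.class_rank, -e.utterance, e.position)

def rank(entities):
    return sorted(entities, key=slist_key)

entities = [Entity("a", 0, 1, 5), Entity("b", 0, 2, 3), Entity("c", 0, 2, 1)]
[e.name for e in rank(entities)]  # → ['c', 'b', 'a']
```

The three-part sort key makes the combination explicit: the second component is the recency constraint over utterances, the third is the centering-style preference for sentence-initial elements.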
The Recency Constraint fails mainly because subjects are placed at the beginning of a sentence
Whilst applying Hobbs’s
algorithm, I noticed that the most common cause of failure to find the correct
antecedent stemmed from conventional sentence structure. Sentences
are usually structured so that the subject, and other important discourse
entities that the writer wants the reader to remember distinctly, are
placed at the beginning of the sentence to stress their importance.
Less important discourse entities are placed towards the end of the sentence.
Since important entities are those that the writer
wants to stress, they are also the entities that the writer will most likely
continue to use over the next utterances.
This means that entities at the beginning of the sentence are likely to
be the correct antecedents. Hobbs’s algorithm, however, chooses
the most recently evoked entities, which tend to be located at the end of
the previous sentence; that is, the algorithm chooses relatively unimportant
entities. This is one more reason why Hobbs’s algorithm performs
rather poorly, for both pronouns and noun phrases. Although Strube
also applied the recency idea, his algorithm worked better.
This is because instead of applying the constraint to the discourse entities
within an utterance, he applied it to the utterances themselves, reinforcing
the point made above that we need to look more globally for accuracy.
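The contrast between the two readings of recency can be illustrated with a toy comparison. The sample sentence, the token positions, and the selection functions are invented for this sketch:

```python
# Entities are (name, utterance_index, token_position) triples.
# The sample previous sentence is "Wagner admired the conductor
# of the orchestra."

def local_recency(entities):
    # Entity-level recency (the reading Hobbs's search tends toward
    # here): the most recently evoked entity wins, so the last noun
    # phrase of the previous sentence outranks its subject.
    return max(entities, key=lambda e: (e[1], e[2]))

def global_recency(entities):
    # Utterance-level recency (Strube's reading): prefer the most
    # recent utterance, but keep left-to-right order inside it, so
    # the sentence-initial subject still comes first.
    return min(entities, key=lambda e: (-e[1], e[2]))

prev = [("Wagner", 1, 0), ("the conductor", 1, 2), ("the orchestra", 1, 5)]
local_recency(prev)[0]   # → 'the orchestra' (sentence-final, unimportant)
global_recency(prev)[0]  # → 'Wagner' (the subject, the likely antecedent)
```

The difference in the sort keys is exactly the local/global distinction drawn above: one ranks individual entities by how recently they were evoked, the other ranks whole utterances and preserves within-utterance order.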
The noun phrase “A and B” can evoke two different discourse entities: “A and B”, or “A” and “B” separately
However, there are two
problems that neither algorithm handles, causing relatively high error
rates in both. The first is the undetermined
ranking criterion for noun phrases of the form “A and B”, where “A” and
“B” represent any objects. Two kinds of entity can be evoked from
this noun phrase:
(1) “A and B” as a single entity
(2) “A” and “B” as separate entities.
Neither algorithm specifies which should be ranked higher. For my experiment, I assumed that such noun phrases are interpreted as “A” and “B” separately unless the text explicitly indicated otherwise. This eliminated the complication of having to rank “A and B” against “A” and “B” as separate entities. However, I failed to see any pattern as to when one type of discourse entity should be preferred over the other.
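The assumption just described can be made concrete with a small sketch. The splitting heuristic and the function name are my own illustration, not part of either algorithm:

```python
# Sketch of the two readings of a coordinated noun phrase "A and B".
# split_conjuncts=True corresponds to the default I assumed in the
# experiment (separate entities unless the text indicates otherwise).

def evoked_entities(np: str, split_conjuncts: bool):
    """Return the discourse entities evoked by a (possibly coordinated) NP."""
    if " and " not in np:
        return [np]
    if split_conjuncts:
        # reading (2): "A" and "B" as separate entities
        return [part.strip() for part in np.split(" and ")]
    # reading (1): the group "A and B" as a single entity
    return [np]

evoked_entities("John and Mary", split_conjuncts=True)   # → ['John', 'Mary']
evoked_entities("John and Mary", split_conjuncts=False)  # → ['John and Mary']
```

A full treatment would have to rank both readings in the entity list at once, which is precisely the complication the default assumption avoids.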
A pronoun may refer to an entity not yet mentioned
The second problem that
both algorithms experience stems from the fact that both
update their discourse entity lists incrementally,
word/phrase by word/phrase, from left to right. This means that
the algorithms are sensitive to the word order of a sentence.
For example,
12a. Until he composed Parsifal, no one knew Wagner.
This causes a problem because we encounter the pronoun “he” before we encounter its antecedent, “Wagner”. Furthermore, sentences of this kind are not usually used when the center of the previous utterance is, in this case, Wagner, but rather when the center is some other discourse entity. One obvious but somewhat complicated solution is to eliminate this type of sentence structure altogether: first, scan through the corpus for sentences with this kind of structure, then rearrange them so that they become “less complicated”. For example, utterance 12a would become 13a:
13a. No one knew Wagner until he composed Parsifal.
However, this would require
a fair amount of undesirable preprocessing of the corpus. I propose
a solution which does not require any preprocessing and maintains the incremental-update
property.
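To make the failure mode concrete, here is a toy left-to-right pass over utterance 12a. The tagging is hand-coded for the example and no real parser is assumed:

```python
# Each token is paired with a hand-assigned tag: "PRON" for a pronoun,
# "NP" for a noun phrase, None otherwise.
tokens_12a = [("Until", None), ("he", "PRON"), ("composed", None),
              ("Parsifal", "NP"), (",", None), ("no one", "NP"),
              ("knew", None), ("Wagner", "NP"), (".", None)]

def resolve_incrementally(tokens):
    seen = []          # discourse entities evoked so far
    unresolved = []    # pronouns with no candidate antecedent when read
    for word, tag in tokens:
        if tag == "PRON":
            if seen:
                pass   # a candidate antecedent would be picked from `seen`
            else:
                unresolved.append(word)  # "he" is read before "Wagner"
        elif tag == "NP":
            seen.append(word)
        # the entity list is updated word by word, left to right
    return unresolved

resolve_incrementally(tokens_12a)  # → ['he']: the antecedent came too late
```

Because the list grows strictly left to right, “he” is looked up against an entity list that cannot yet contain “Wagner”; on the rearranged 13a the same pass would find the antecedent already in the list.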