ISSN:
1467-8640
Source:
Blackwell Publishing Journal Backfiles 1879-2005
Topics:
Computer Science
Notes:
Structural ambiguity in modifying clauses complicates noun phrase (NP) extraction from running Chinese text. Previous experiments showed that nearly 33% of the errors made by an NP extractor were caused by clause modifiers. For example, consider the sequence “V + NP1 + 的 (of) + NP0.” It can be interpreted in two ways: as a verb phrase (i.e., [V [NP1 + 的 + NP0]NP]VP) or as a noun phrase (i.e., [[V NP1]VP + 的 + NP0]NP). To resolve this ambiguity, the article investigates syntactic, contextual, and semantics-based approaches. It concludes that the problem can be overcome only when semantic knowledge about words is used. A structural disambiguation algorithm based on lexical association is therefore proposed. The algorithm uses the semantic class relation between a word pair, derived from a standard Chinese thesaurus, to work out whether the noun phrase or the verb phrase reading has the stronger lexical association within the collocation. This, in turn, determines the intended phrase structure. With the proposed algorithm, the best accuracy and coverage are 79% and 100%, respectively. The experiment also shows that the backed-off model is more effective for this purpose. With this disambiguation algorithm, parsing performance can be significantly improved.
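The lexical-association idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the article's algorithm: the scoring rule (raw co-occurrence counts), the two back-off levels (word pair, then semantic-class pair), the default reading, and all names and toy data (SEM_CLASS, WORD_PAIRS, CLASS_PAIRS, disambiguate) are assumptions made for illustration, with English glosses standing in for Chinese words and for the thesaurus classes.

```python
from collections import Counter

# Hypothetical word -> semantic class map (stands in for a thesaurus lookup).
SEM_CLASS = {
    "repair": "ACT_FIX", "fix": "ACT_FIX",
    "car": "VEHICLE", "truck": "VEHICLE",
    "worker": "PERSON", "driver": "PERSON",
}

# Hypothetical verb-noun co-occurrence counts, as if collected from a corpus.
WORD_PAIRS = Counter({("repair", "car"): 8, ("repair", "worker"): 1})
CLASS_PAIRS = Counter({("ACT_FIX", "VEHICLE"): 20, ("ACT_FIX", "PERSON"): 2})


def disambiguate(verb, np1_head, np0_head):
    """Choose a reading for the sequence V + NP1 + de + NP0.

    Returns "NP" for [[V NP1]VP de NP0]NP (V associates with NP1)
    or "VP" for [V [NP1 de NP0]NP]VP (V associates with the head NP0).
    """
    # Back-off order (an assumption): exact words first, then semantic classes.
    levels = [
        (WORD_PAIRS, lambda w: w),
        (CLASS_PAIRS, lambda w: SEM_CLASS.get(w)),
    ]
    for counts, key in levels:
        np_score = counts[(key(verb), key(np1_head))]
        vp_score = counts[(key(verb), key(np0_head))]
        if np_score != vp_score:
            return "NP" if np_score > vp_score else "VP"
    # No evidence at any level: fall back to an arbitrary default reading,
    # so that every input receives a decision.
    return "NP"


if __name__ == "__main__":
    print(disambiguate("repair", "car", "worker"))  # decided at the word level
    print(disambiguate("fix", "driver", "truck"))   # decided at the class level
```

Backing off to semantic classes lets the sketch decide for word pairs that never occur in the counts, which is the kind of sparse-data problem a thesaurus-based class relation is meant to address.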
Type of Medium:
Electronic Resource
URL:
http://dx.doi.org/10.1111/1467-8640.00214