ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2013-10-30
    Description: Named entity recognition seeks to locate atomic elements in texts and classify them into predefined categories. It is useful for many applications, including microblog analysis and query suggestion. In recent years, with the explosion of Web 2.0, extracting large-scale, high-quality entities from structured web content has emerged as a promising approach. However, existing studies seldom provide an integrated system for simultaneously extracting and categorizing both head and tail entities, and the identification of ambiguous entities is still a challenging task. In light of this, we propose a system named quasi-Automatic Named Entity Extraction and Categorization (ANEEC) for massive named-entity management. Specifically, ANEEC first identifies representative websites by using a small seed set of entities and the query logs of a search engine, and then extracts high-quality entities from the parallel structures in the webpages. ANEEC then employs the extracted entities and their corresponding atom-level groups to establish an entity taxonomy as well as a hierarchical classifier ensemble. Two problems, namely definition abnormality and granularity unfitness, are also addressed to further improve the quality of the taxonomy. An application case using 932 seed entities and the query logs of the search engine Bing demonstrates that ANEEC can effectively identify over 870 000 named entities in 32 bottom-level categories, and that the resulting taxonomy achieves excellent classification performance, with F1 = 85.17%, provided that the entity features are properly preprocessed and weighted. In particular, ANEEC shows potential for tail entity recognition and ambiguous entity detection.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science