"Unsupervised named-entity extraction from the web: An experimental study", O. Etzioni and M. Cafarella and D. Downey and S. Kok and A. Popescu and T. Shaked and S. Soderland and D. Weld and A. Yates, Artificial Intelligence, 91-134, 2005
This paper describes a system that takes an unsupervised approach to entity extraction. It describes the architecture of the system and defines general principles for extraction. The authors present three ways to improve recall and extraction rate without compromising precision:
Pattern Learning - learns domain-specific extraction rules
Subclass Extraction - identifies sub-classes
List Extraction - locates lists of class instances
The paper starts with a motivation for using an unsupervised approach. It provides background on information extraction and some of the complexities involved. KNOWITALL uses extraction patterns together with pointwise mutual information (PMI) statistics, which are computed from web search hit counts and measure the degree of correlation between pairs of words. The PMI scores are used as features for a classifier. The system obtains its information from search engines.
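As a rough illustration of how hit counts can become classifier features, here is a minimal sketch. It assumes the PMI score is approximated as the ratio of hits for a discriminator phrase containing the candidate to hits for the candidate alone. The function names, the discriminator template, and the hit counts are all made up for illustration; a real system would obtain the counts from a search engine.

```python
def pmi_score(hits_with_discriminator: int, hits_instance: int) -> float:
    """Approximate PMI as hits(discriminator + instance) / hits(instance)."""
    if hits_instance == 0:
        return 0.0  # no evidence for the instance at all
    return hits_with_discriminator / hits_instance


def pmi_features(instance, discriminators, counts):
    """Build one PMI feature per discriminator phrase for a candidate instance."""
    features = []
    for template in discriminators:
        phrase = template.format(instance)
        features.append(pmi_score(counts.get(phrase, 0), counts.get(instance, 0)))
    return features


# Hypothetical hit counts standing in for search engine results.
hit_counts = {
    "Paris": 1_000_000,
    "city of Paris": 50_000,
    "Banana": 800_000,
    "city of Banana": 8,
}

# Hypothetical discriminator phrase for the class City.
DISCRIMINATORS = ["city of {}"]

paris = pmi_features("Paris", DISCRIMINATORS, hit_counts)
banana = pmi_features("Banana", DISCRIMINATORS, hit_counts)
print(paris, banana)
```

With these invented counts, "Paris" gets a much higher score than "Banana" for the City discriminator, which is the kind of signal a classifier could use to separate good extractions from bad ones.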
The system is bootstrapped from a set of predicates that specify what to extract. Each class is given a symbolic name and a set of labels. The Bootstrapper "uses the labels to instantiate extraction rules". The Extractor formulates queries to send to a search engine, and the Assessor evaluates the extractions.
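The rule instantiation step can be pictured with a small sketch. It assumes generic patterns are simple string templates that are filled in with each class label to produce search queries. The templates, the function name, and the example labels are hypothetical and are not the paper's actual rule set.

```python
# Hypothetical generic templates; {label} is replaced by a class label.
GENERIC_PATTERNS = ["{label} such as", "{label} including", "such {label} as"]


def instantiate_rules(class_name, plural_labels):
    """Combine every class label with every generic pattern."""
    rules = []
    for label in plural_labels:
        for pattern in GENERIC_PATTERNS:
            rules.append({"class": class_name, "query": pattern.format(label=label)})
    return rules


# The symbolic class name "City" with two illustrative labels.
rules = instantiate_rules("City", ["cities", "towns"])
for rule in rules:
    print(rule["class"], "->", rule["query"])
```

Each resulting query string is the kind of phrase an extractor could send to a search engine, with the noun phrases that follow it treated as candidate instances for later assessment.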
The paper is quite extensive and describes the system well. Overall this paper is a good read and provides a number of tidbits for new ideas.