Author Snow, Rion Langley
Title Semantic taxonomy induction
ISBN 9781109241075
Description 176 p
Note Source: Dissertation Abstracts International, Volume: 70-07, Section: B, page: 4281
Adviser: Andrew Y. Ng
Thesis (Ph.D.)--Stanford University, 2009
Understanding natural language has been a longstanding dream of artificial intelligence, and machine learning offers a new perspective on this old problem. This work addresses four key problems in automatically reading and understanding text: extracting the knowledge expressed in a body of text in the form of structured relations, reconciling and formalizing that knowledge in a fully consistent, sense-disambiguated hierarchy of knowledge, fluidly transitioning from fine-grained to coarse-grained distinctions between word senses, and applying extracted structured knowledge in applications that depend on deep textual understanding
Textual patterns have frequently been devised to identify specific instances of world knowledge in text. For example, from the text "such fruits as apples and oranges" one might infer the knowledge that "apples and oranges are kinds of fruit". In this work we discuss the use of distant supervision for relation extraction, which applies machine learning techniques to a set of example relation instances and a large body of unannotated text in order to rediscover many of the textual patterns formerly proposed in the information extraction literature, along with hundreds of thousands of previously unconsidered patterns. Further, we apply these automatically discovered patterns to extract structured knowledge from newswire articles and other text, significantly outperforming hand-designed patterns and discovering hundreds of thousands of novel examples of world knowledge not previously encoded in manually-created knowledge bases
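The "such fruits as apples and oranges" example above can be illustrated with a minimal sketch of one hand-written textual pattern. This is only an illustration of the kind of hand-designed pattern the abstract refers to, not the distant-supervision method of the dissertation, which learns its patterns from data. The names SEP, SUCH_AS and extract_hypernym_pairs are hypothetical, and single words stand in for full noun phrases.

```python
import re

# Hypothetical separator between list items: ", ", " and ", " or ", ", and ", ", or ".
SEP = r"(?:,? (?:and|or) |, )"

# Hypothetical hand-written pattern for "such <category> as <item>, <item> and <item>".
# Assumption: each category and item is a single word, which real text often violates.
SUCH_AS = re.compile(r"such (\w+) as ((?:\w+" + SEP + r")*\w+)")


def extract_hypernym_pairs(text):
    """Return (item, category) pairs found by the 'such X as Y' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        category = match.group(1)
        items = re.split(SEP, match.group(2))
        pairs.extend((item, category) for item in items)
    return pairs


print(extract_hypernym_pairs("such fruits as apples and oranges"))
# prints [('apples', 'fruits'), ('oranges', 'fruits')]
```

Each pair reads as "item is a kind of category". A single pattern like this covers only one phrasing, which is the limitation that motivates discovering patterns automatically.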
Many proposed methods for extracting structured knowledge suffer from a critical inability to deal with redundant or contradictory extractions. While modern algorithms can often suggest millions of possible facts extracted from a large body of text, they are unable to reconcile this extracted knowledge into a set of consistent, sense-disambiguated assertions. We propose a probabilistic framework for taxonomy induction that solves each of these problems, taking advantage of the full set of predicted facts and any knowledge already known in an existing taxonomy. This work has resulted in one of the largest automatically-constructed augmentations of the WordNet knowledge base currently in existence
In addition to the automatic augmentation of knowledge resources, we explore the task of automatically creating coarse-grained taxonomies. It has been widely observed that different natural language applications require different sense granularities in order to best exploit word sense distinctions, and that for many applications WordNet senses are too fine-grained. In contrast to previously proposed automatic methods for sense clustering, we formulate sense merging as a supervised learning problem, exploiting human-labeled sense clusterings as training data. Our learned similarity measure outperforms previously proposed automatic methods for sense clustering on the task of predicting human sense merging judgments. Finally, we propose a model for clustering sense taxonomies using the outputs of this classifier, and we make available several automatically sense-clustered WordNets of various sense granularities. These resources offer the capability of tailoring a knowledge resource to the sense granularity most suited to a particular application
Our framework for taxonomy induction lays the groundwork for new semantic applications, including inferring domain-specific hierarchies of knowledge and augmenting foreign-language Wordnets. Finally, we demonstrate that our automatically augmented taxonomies significantly outperform manually-constructed resources across several natural language tasks, including relation prediction, question answering, and text categorization
School code: 0212
Host Item Dissertation Abstracts International 70-07B
Subject Artificial Intelligence
0800
Alt Author Stanford University