Author Ray, Soumya
Title Learning from data with complex interactions and ambiguous labels
Descript 148 p
Note Source: Dissertation Abstracts International, Volume: 66-12, Section: B, page: 6737
Co-Supervisors: Mark Craven; David Page
Thesis (Ph.D.)--The University of Wisconsin - Madison, 2005
In this thesis, we develop and evaluate machine learning algorithms that can learn effectively from data with complex interactions and ambiguous labels. The need for these algorithms is motivated by problems such as protein-protein binding and drug activity prediction.
In the first part of the thesis, we focus on the problem of myopia. This problem arises when greedy learning strategies are applied to learn from data with complex interactions. We present skewing, our approach to alleviating myopia. We describe theoretical and empirical results on Boolean data that show that our approach can learn effectively from data with complex interactions. We investigate the effects of various parameter choices on our approach, as well as the effects of dimensionality and class-label noise. We then propose and evaluate a variant that scales better to high-dimensional data. Finally, we propose and evaluate an extension that is able to learn from non-Boolean data with complex interactions similar to those in the Boolean case.
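The myopia described above can be illustrated with a small sketch. It is not the algorithm from the thesis. It assumes a target of the form x0 XOR x1 with one irrelevant variable x2, uses information gain as the greedy split criterion, and reweights the examples with a hypothetical scheme in which the value 1 of every variable is favored with probability p = 0.75. The function names and the choice of p are assumptions made for illustration only.

```python
import itertools
import math


def entropy(p):
    """Binary entropy in bits; defined as 0 at the endpoints."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def weighted_gain(rows, labels, weights, feature):
    """Information gain of one Boolean feature under example weights."""
    total = sum(weights)
    positive = sum(w for w, y in zip(weights, labels) if y == 1)
    gain = entropy(positive / total)
    for value in (0, 1):
        idx = [i for i, r in enumerate(rows) if r[feature] == value]
        w_sub = sum(weights[i] for i in idx)
        if w_sub == 0:
            continue
        pos_sub = sum(weights[i] for i in idx if labels[i] == 1)
        gain -= (w_sub / total) * entropy(pos_sub / w_sub)
    return gain


def skew_weights(rows, p=0.75):
    """Hypothetical reweighting: each variable favors the value 1 with probability p."""
    return [math.prod(p if v == 1 else 1 - p for v in r) for r in rows]


# All assignments of three Boolean variables; x2 is irrelevant to the label.
rows = list(itertools.product((0, 1), repeat=3))
labels = [r[0] ^ r[1] for r in rows]

uniform = [1.0] * len(rows)
gains_uniform = [weighted_gain(rows, labels, uniform, f) for f in range(3)]
gains_skewed = [weighted_gain(rows, labels, skew_weights(rows), f) for f in range(3)]

print("uniform:", gains_uniform)  # every feature has zero gain
print("skewed: ", gains_skewed)   # x0 and x1 gain about 0.14 bits, x2 stays near zero
```

Under uniform weights a greedy learner cannot tell the relevant variables from the irrelevant one, because each has zero gain in isolation. Under the reweighted distribution the relevant variables become distinguishable while the irrelevant one does not.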
In the second part of the thesis, we focus on the multiple-instance (MI) problem. This problem arises when the class labels or responses of individual instances are unknown, but there are constraints relating the labels of collections of instances (bags). We first describe an empirical evaluation of several multiple-instance and supervised learning methods on several MI datasets. From our study, we derive several useful observations about the accuracy of supervised and MI methods on MI data. We next design and evaluate an approach to learning combining functions from data. These functions are used to combine predictions on each instance into a prediction for a bag. Finally, we consider the problem of regression in a multiple-instance setting. We show that an exact solution to this problem is NP-hard, and develop and evaluate approximation algorithms for MI regression on synthetic and real-world drug activity prediction problems. Our experiments show that there is value in considering the MI setting in regression as well as in learning combining functions from data.
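The role of a combining function can be sketched as follows. The thesis learns combining functions from data; the two fixed functions below (max and noisy-or) are only common examples chosen here for illustration, and all names, the 0.5 threshold, and the instance probabilities are assumptions rather than details from the thesis.

```python
import math
from typing import Callable, Sequence


def combine_max(probs: Sequence[float]) -> float:
    """Bag score is the score of its most positive instance."""
    return max(probs)


def combine_noisy_or(probs: Sequence[float]) -> float:
    """Bag is positive unless every instance independently fails to be positive."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def predict_bag(
    instance_probs: Sequence[float],
    combine: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> bool:
    """Combine per-instance predictions into a single label for the bag."""
    return combine(instance_probs) >= threshold


# A hypothetical bag of three instances, each weakly positive.
bag = [0.3, 0.3, 0.3]
print(predict_bag(bag, combine_max))       # max of the scores is 0.3
print(predict_bag(bag, combine_noisy_or))  # noisy-or accumulates weak evidence
```

The same instance-level predictions yield different bag labels under the two functions, which is one reason the choice of combining function might be worth learning rather than fixing in advance.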
School code: 0262
Host Item Dissertation Abstracts International 66-12B
Subject Computer Science
Artificial Intelligence
0984
0800
Alt Author The University of Wisconsin - Madison