Descript 
179 p 
Note 
Source: Dissertation Abstracts International, Volume: 6409, Section: B, page: 4459 

Adviser: Lawrence B. Holder 

Thesis (Ph.D.)The University of Texas at Arlington, 2003 

This work is concerned with developing a machine learning algorithm for inducing contextfree graph grammars, which is guided by the minimum description length principle. Recognizing the representational power of graphs and the ability of graph grammars to generalize, we develop an approach that extends the expressive power of hypotheses learned from examples over algorithms that learn static patterns only, to equal the expressive power of graph grammars. The induced graph grammar is our hypothesis for concepts. We motivate our work from different angles. First, we point to the need for representing more complex concepts, those that include recursion, variability and relationships among data points. Second, we point out the lack of existing graphbased systems to address these concerns. Third, we recognize the ability of other machine learning approaches to address some of these concerns, and that the combination of graphbased systems with these features will have practical utility. We survey related work, among which are graphbased and inductive logic programming systems. These systems are the closest to our approach, either because they represent data in graph format, or they are able to learn recursive concepts. We describe our approach in detail, which includes the ability to learn recursive concepts, concepts that include variable data points, and those that include relationships among data points. We extend our approach for both discovery and concept learning modes. We find that the class of graph grammars learned by our algorithm is equivalent to the class of regular graph grammars. The algorithm is implemented as an extension to the Subdue knowledge discovery system, and is named SubdueGL. We present a rigorous evaluation methodology used to validate the new algorithm. We find that the system tolerates noise to a certain degree. We also evaluate the system on several realworld examples, among which are an antiterrorism domain, protein sequences of primary and secondary structure, and a breast cancer domain. We compare the system to prominent competing approaches on further examples, which are commonly used in the literature to for the purpose of evaluating machine learning systems. We find that our approach performs better than competing ILP systems on the domains tested, and does better that some statistical and connectionist approaches reported in the literature 

School code: 2502 

DDC 
Host Item 
Dissertation Abstracts International 6409B

Subject 
Computer Science


Artificial Intelligence


0984


0800

Alt Author 
The University of Texas at Arlington

