Author Kosorukoff, Alexander Lvovich
Title Methods for cluster analysis and validation in microarray gene expression data
ISBN 9780542774423
Description 103 p.
Note Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 3903
Adviser: Sylvian Ray
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2006
Motivation. Unsupervised learning, or clustering, is frequently used to explore gene expression profiles for insight into both regulation and function. However, the quality of clustering results is often difficult to assess, and each algorithm has tunable parameters, often with no obvious way to choose appropriate values. Most algorithms also require the number of clusters to be predetermined, yet this value is rarely known and is thus arrived at by subjective criteria. Here we present a method to systematically address these challenges using statistical evaluation.
Method. The method presented compares the quality of clustering results in order to choose the most appropriate algorithm, distance metric, and number of clusters for gene network discovery using objective criteria. In brief, two quality assessment metrics are used: the Consensus Share (CS) and the Feature Configuration Statistic (FCS). CS is the percentage of genes (not gene pairs) that are identically clustered in several clusterings, and FCS is a measure of the randomness of the observed configuration of transcription factor binding sites among clustered genes.
Results. We evaluate this method using both artificial and yeast microarray data. By choosing parameter settings that minimize FCS values and maximize CS values, we show major advantages over other clustering methods, particularly for identifying combinatorially regulated groups of genes. The results provide remarkable enrichment for cis-regulatory elements in clusters of genes known to be regulated by such elements, as well as evidence of extensive combinatorial regulation. Moreover, the method can be generalized to cases where prior information about cis-regulatory sites is absent or where it is desirable to calculate FCS values based on functional categorization.
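The abstract defines CS only in words, so the following is a minimal illustrative sketch, not the dissertation's implementation. It assumes one possible reading of "identically clustered": a gene counts toward CS when the set of genes sharing its cluster is the same in every clustering. The function name `consensus_share`, the dictionary input format, and the toy gene labels are all hypothetical.

```python
from collections import defaultdict


def consensus_share(clusterings):
    """Percentage of genes whose co-cluster members match in every clustering.

    `clusterings` is a list of dicts mapping gene -> cluster label.
    Labels need not agree across clusterings; only group membership is compared.
    This reading of "identically clustered" is an assumption, not the thesis's
    stated definition.
    """
    if not clusterings:
        raise ValueError("at least one clustering is required")
    genes = set(clusterings[0])
    if any(set(c) != genes for c in clusterings):
        raise ValueError("all clusterings must cover the same genes")
    if not genes:
        return 0.0

    def comembers(clustering):
        # Map each gene to the frozen set of genes in its cluster.
        groups = defaultdict(set)
        for gene, label in clustering.items():
            groups[label].add(gene)
        return {g: frozenset(groups[label]) for g, label in clustering.items()}

    member_maps = [comembers(c) for c in clusterings]
    reference = member_maps[0]
    stable = sum(
        1 for g in genes if all(m[g] == reference[g] for m in member_maps)
    )
    return 100.0 * stable / len(genes)


# Toy example with hypothetical genes and two clusterings.
clustering_a = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 2}
clustering_b = {"g1": "a", "g2": "a", "g3": "b", "g4": "c", "g5": "c"}
share = consensus_share([clustering_a, clustering_b])
```

In this toy input only g1 and g2 keep the same cluster membership in both clusterings, so two of five genes count toward the share. Counting genes rather than gene pairs follows the abstract's wording; a pair-based agreement measure would be a different statistic.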
School code: 0090
DDC
Host Item Dissertation Abstracts International 67-07B
Subject Biology, Bioinformatics
Computer Science
0715
0984
Alt Author University of Illinois at Urbana-Champaign