Author Long, Xiaohui
Title Efficient query processing in large Web search engines
ISBN 9780542777905
Description 99 p
Note Source: Dissertation Abstracts International, Volume: 67-07, Section: B, page: 3907
Adviser: Torsten Suel
Thesis (Ph.D.)--Polytechnic University, 2006
Large Web search engines have to answer thousands of queries per second with interactive response times. Because the data sets involved are so large, often in the range of multiple terabytes, a single query may require processing hundreds of megabytes or more of index data. Query processing performance is therefore a critical issue for Web search engines. To keep up with this immense workload, large search engines employ clusters of hundreds or thousands of machines, and use a number of techniques such as caching, index compression, and index and query pruning to increase query throughput and decrease overall cost.
In this thesis, we investigate three techniques: index compression, caching, and query pruning. We demonstrate how these techniques can be used effectively to increase the throughput of query processing in Web search engines. First, we revisit several well-known compression schemes for inverted index structures and compare their compression ratios, decoding overheads, and impact on query processing performance. Next, we present a three-level caching architecture (result cache, list cache, and a new projection cache) and study several cache replacement policies at the different levels. Finally, we propose query pruning algorithms that use a global ordering of Web pages (e.g., PageRank) for optimized query processing. For experimental evaluation, we use a search engine platform that we developed as part of this dissertation research, a large document collection crawled from the Web, and query logs collected by commercial search engines.
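As an illustration of what compressing an inverted index can involve, the sketch below applies variable-byte coding to the gaps between sorted document IDs in a posting list. The abstract does not name the schemes the thesis compares, so variable-byte coding is used here only as one commonly cited example. The function names (`to_gaps`, `vbyte_encode`, `vbyte_decode`, `from_gaps`) and the sample document IDs are hypothetical and not taken from the thesis.

```python
def to_gaps(doc_ids):
    """Turn a strictly increasing list of doc IDs into gaps (first gap is the ID itself)."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps):
    """Inverse of to_gaps: a running sum restores the original doc IDs."""
    doc_ids, total = [], 0
    for g in gaps:
        total += g
        doc_ids.append(total)
    return doc_ids


def vbyte_encode(numbers):
    """Encode non-negative ints, 7 bits per byte, low-order bits first.

    The high bit is set on the last byte of each number (an assumed
    convention; some implementations flag continuation bytes instead).
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)
            n >>= 7
        out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(data):
    """Decode a byte string produced by vbyte_encode."""
    numbers, n, shift = [], 0, 0
    for b in data:
        if b & 0x80:  # final byte of the current number
            numbers.append(n | ((b & 0x7F) << shift))
            n, shift = 0, 0
        else:
            n |= b << shift
            shift += 7
    return numbers


# Hypothetical posting list: small gaps fit in one byte each.
postings = [3, 7, 130, 131, 100000]
encoded = vbyte_encode(to_gaps(postings))
decoded = from_gaps(vbyte_decode(encoded))
```

Storing gaps rather than raw IDs keeps most values small, which is why byte-oriented codes of this kind can shrink a posting list while remaining cheap to decode. Compression ratio and decoding overhead are two of the quantities the abstract says the thesis compares.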
School code: 0179
DDC
Host Item Dissertation Abstracts International 67-07B
Subject Computer Science
0984
Alt Author Polytechnic University