Descript
1 online resource (xv, 122 pages) : illustrations

text rdacontent

electronic isbdmedia

online resource rdacarrier
Series
Synthesis lectures on information concepts, retrieval, and services, 1947-9468 ; # 45

Synthesis digital library of engineering and computer science

Synthesis lectures on information concepts, retrieval, and services ; # 45. 1947-9468

Note
Part of: Synthesis digital library of engineering and computer science

Includes bibliographical references (pages 93-120)
|
1. Introduction -- 1.1 Web search business -- 1.2 Basic search engine architecture -- 1.3 Scalability issues --

2. The web crawling system -- 2.1 Basic web crawling architecture -- 2.2 Extending the web repository -- 2.3 Refreshing the web repository -- 2.4 Managing the web repository -- 2.5 Distributed web crawling -- 2.6 Factors affecting crawling performance -- 2.7 Literature on web crawling -- 2.8 Open issues in web crawling --

3. The indexing system -- 3.1 Basic indexing architecture -- 3.2 Inverted index -- 3.3 Compressing an inverted index -- 3.4 Constructing an inverted index -- 3.5 Updating an inverted index -- 3.6 Partitioning an inverted index -- 3.7 Literature on indexing -- 3.8 Open issues in indexing --

4. The query processing system -- 4.1 Basic query processing architecture -- 4.2 Query processing on a search node -- 4.3 Query processing in a search cluster -- 4.4 Architectural optimizations -- 4.5 Caching -- 4.6 Query processing on multiple search sites -- 4.7 Literature on query processing -- 4.8 Open issues in query processing --

5. Concluding remarks -- Bibliography -- Authors' biographies
|
Abstract freely available; full-text restricted to subscribers or individual document purchasers

Compendex

INSPEC

Google scholar

Google book search

Mode of access: World Wide Web

System requirements: Adobe Acrobat Reader
|
In this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems found in every web-scale search engine: the web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both the practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.
|
Also available in print

Title from PDF title page (viewed on January 22, 2016)
Link |
Print version: 9781627058124
|
Subject
Web search engines
Computer networks -- Scalability
cache invalidation
central broker
compression
content spam
delay attacks
distributed crawling
distributed query processing
DNS cache
document id reassignment
download throughput
dynamic index pruning
early exit optimization
effectiveness
efficiency
forward index
index construction
index maintenance
index partitioning
index replication
indexing
inverted index
inverted list cache
inverted list
link exchange
link farm
link spam
machine-learned ranking
matching
multisite web search
near duplicate detection
page cache
performance
position list
posting list
query-independent feature
query expansion
query forwarding
query interpretation
query processing
query rewriting
relevance
query scheduling
response latency
result cache
result freshness
result preparation
result retrieval
scalability
search center
search cluster
search engine result page
search quality
selective search
shingles
skip pointer
snippet
soft 404 page
spider trap
static index pruning
text processing
throughput
tiering
time-to-live
two-phase ranking
URL-seen test
URL caching
web change
web coverage
web crawler
web frontier
web graph
web repository
web search engine
website mirror

Alt Author
Baeza-Yates, R. (Ricardo), author