Author Cambazoglu, B. Barla, author
Title Scalability challenges in web search engines / B. Barla Cambazoglu, Ricardo Baeza-Yates
Imprint San Rafael, California (1537 Fourth Street, San Rafael, CA 94901 USA) : Morgan & Claypool, 2016
Descript 1 online resource (xv, 122 pages) : illustrations
text rdacontent
electronic isbdmedia
online resource rdacarrier
Series Synthesis lectures on information concepts, retrieval, and services, 1947-9468 ; # 45
Synthesis digital library of engineering and computer science
Synthesis lectures on information concepts, retrieval, and services ; # 45. 1947-9468
Note Part of: Synthesis digital library of engineering and computer science
Includes bibliographical references (pages 93-120)
1. Introduction -- 1.1 Web search business -- 1.2 Basic search engine architecture -- 1.3 Scalability issues --
2. The web crawling system -- 2.1 Basic web crawling architecture -- 2.2 Extending the web repository -- 2.3 Refreshing the web repository -- 2.4 Managing the web repository -- 2.5 Distributed web crawling -- 2.6 Factors affecting crawling performance -- 2.7 Literature on web crawling -- 2.8 Open issues in web crawling --
3. The indexing system -- 3.1 Basic indexing architecture -- 3.2 Inverted index -- 3.3 Compressing an inverted index -- 3.4 Constructing an inverted index -- 3.5 Updating an inverted index -- 3.6 Partitioning an inverted index -- 3.7 Literature on indexing -- 3.8 Open issues in indexing --
4. The query processing system -- 4.1 Basic query processing architecture -- 4.2 Query processing on a search node -- 4.3 Query processing in a search cluster -- 4.4 Architectural optimizations -- 4.5 Caching -- 4.6 Query processing on multiple search sites -- 4.7 Literature on query processing -- 4.8 Open issues in query processing --
5. Concluding remarks -- Bibliography -- Authors' biographies
Abstract freely available; full-text restricted to subscribers or individual document purchasers
Mode of access: World Wide Web
System requirements: Adobe Acrobat Reader
In this book, we aim to provide a fairly comprehensive overview of the scalability and efficiency challenges in large-scale web search engines. More specifically, we cover the issues involved in the design of three separate systems that are commonly available in every web-scale search engine: web crawling, indexing, and query processing systems. We present the performance challenges encountered in these systems and review a wide range of design alternatives employed as solutions to these challenges, specifically focusing on algorithmic and architectural optimizations. We discuss the available optimizations at different computational granularities, ranging from a single computer node to a collection of data centers. We provide some hints to both practitioners and theoreticians involved in the field about the way large-scale web search engines operate and the adopted design choices. Moreover, we survey the efficiency literature, providing pointers to a large number of relatively important research papers. Finally, we discuss some open research problems in the context of search engine efficiency.
Also available in print
Title from PDF title page (viewed on January 22, 2016)
Link Print version: 9781627058124
Subject Web search engines
Computer networks -- Scalability
cache invalidation
central broker
content spam
delay attacks
distributed crawling
distributed query processing
DNS cache
document id reassignment
download throughput
dynamic index pruning
early exit optimization
forward index
index construction
index maintenance
index partitioning
index replication
inverted index
inverted list cache
inverted list
link exchange
link farm
link spam
machine-learned ranking
multisite web search
near duplicate detection
page cache
position list
posting list
query-independent feature
query expansion
query forwarding
query interpretation
query processing
query rewriting
query scheduling
response latency
result cache
result freshness
result preparation
result retrieval
search center
search cluster
search engine result page
search quality
selective search
skip pointer
soft 404 page
spider trap
static index pruning
text processing
two-phase ranking
URL-seen test
URL caching
web change
web coverage
web crawler
web frontier
web graph
web repository
web search engine
website mirror
Alt Author Baeza-Yates, R. (Ricardo), author