Author Prajapati, Vignesh
Title Big Data Analytics with R and Hadoop
Imprint Olton : Packt Publishing, Limited, 2013
©2013
ISBN 9781782163299 (electronic bk.)
9781782163282
Description 1 online resource (267 pages)
text txt rdacontent
computer c rdamedia
online resource cr rdacarrier
Note Intro -- Big Data Analytics with R and Hadoop -- Table of Contents -- Big Data Analytics with R and Hadoop -- Credits -- About the Author -- Acknowledgment -- About the Reviewers -- www.PacktPub.com -- Support files, eBooks, discount offers and more -- Why Subscribe? -- Free Access for Packt account holders -- Preface -- Introducing R -- Understanding features of R -- Studying the popularity of R -- Introducing Big Data -- Getting information about popular organizations that hold Big Data -- Introducing Hadoop -- Exploring Hadoop features -- Studying Hadoop components -- Understanding the reason for using R and Hadoop together -- What this book covers -- What you need for this book -- Who this book is for -- Conventions -- Reader feedback -- Customer support -- Downloading the example code -- Errata -- Piracy -- Questions -- 1. Getting Ready to Use R and Hadoop -- Installing R -- Installing RStudio -- Understanding the features of R language -- Using R packages -- Performing data operations -- Increasing community support -- Performing data modeling in R -- Installing Hadoop -- Understanding different Hadoop modes -- Understanding Hadoop installation steps -- Installing Hadoop on Linux, Ubuntu flavor (single node cluster) -- Installing Hadoop on Linux, Ubuntu flavor (multinode cluster) -- Installing Cloudera Hadoop on Ubuntu -- Understanding Hadoop features -- Understanding HDFS -- Understanding the characteristics of HDFS -- Understanding MapReduce -- Learning the HDFS and MapReduce architecture -- Understanding the HDFS architecture -- Understanding HDFS components -- Understanding the MapReduce architecture -- Understanding MapReduce components -- Understanding the HDFS and MapReduce architecture by plot -- Understanding Hadoop subprojects -- Summary -- 2. Writing Hadoop MapReduce Programs -- Understanding the basics of MapReduce
Introducing Hadoop MapReduce -- Listing Hadoop MapReduce entities -- Understanding the Hadoop MapReduce scenario -- Loading data into HDFS -- Executing the Map phase -- Shuffling and sorting -- Reducing phase execution -- Understanding the limitations of MapReduce -- Understanding Hadoop's ability to solve problems -- Understanding the different Java concepts used in Hadoop programming -- Understanding the Hadoop MapReduce fundamentals -- Understanding MapReduce objects -- Deciding the number of Maps in MapReduce -- Deciding the number of Reducers in MapReduce -- Understanding MapReduce dataflow -- Taking a closer look at Hadoop MapReduce terminologies -- Writing a Hadoop MapReduce example -- Understanding the steps to run a MapReduce job -- Learning to monitor and debug a Hadoop MapReduce job -- Exploring HDFS data -- Understanding several possible MapReduce definitions to solve business problems -- Learning the different ways to write Hadoop MapReduce in R -- Learning RHadoop -- Learning RHIPE -- Learning Hadoop streaming -- Summary -- 3. Integrating R and Hadoop -- Introducing RHIPE -- Installing RHIPE -- Installing Hadoop -- Installing R -- Installing protocol buffers -- Environment variables -- The rJava package installation -- Installing RHIPE -- Understanding the architecture of RHIPE -- Understanding RHIPE samples -- RHIPE sample program (Map only) -- Word count -- Understanding the RHIPE function reference -- Initialization -- HDFS -- MapReduce -- Introducing RHadoop -- Understanding the architecture of RHadoop -- Installing RHadoop -- Understanding RHadoop examples -- Word count -- Understanding the RHadoop function reference -- The hdfs package -- The rmr package -- Summary -- 4. Using Hadoop Streaming with R -- Understanding the basics of Hadoop streaming -- Understanding how to run Hadoop streaming with R
Understanding a MapReduce application -- Understanding how to code a MapReduce application -- Understanding how to run a MapReduce application -- Executing a Hadoop streaming job from the command prompt -- Executing the Hadoop streaming job from R or an RStudio console -- Understanding how to explore the output of MapReduce application -- Exploring an output from the command prompt -- Exploring an output from R or an RStudio console -- Understanding basic R functions used in Hadoop MapReduce scripts -- Monitoring the Hadoop MapReduce job -- Exploring the HadoopStreaming R package -- Understanding the hsTableReader function -- Understanding the hsKeyValReader function -- Understanding the hsLineReader function -- Running a Hadoop streaming job -- Executing the Hadoop streaming job -- Summary -- 5. Learning Data Analytics with R and Hadoop -- Understanding the data analytics project life cycle -- Identifying the problem -- Designing data requirement -- Preprocessing data -- Performing analytics over data -- Visualizing data -- Understanding data analytics problems -- Exploring web pages categorization -- Identifying the problem -- Designing data requirement -- Understanding the required Google Analytics data attributes -- Collecting data -- Preprocessing data -- Performing analytics over data -- Visualizing data -- Computing the frequency of stock market change -- Identifying the problem -- Designing data requirement -- Preprocessing data -- Performing analytics over data -- Visualizing data -- Predicting the sale price of blue book for bulldozers - case study -- Identifying the problem -- Designing data requirement -- Preprocessing data -- Performing analytics over data -- Understanding Poisson-approximation resampling -- Fitting random forests with RHadoop -- Summary -- 6. Understanding Big Data Analysis with Machine Learning
Introduction to machine learning -- Types of machine-learning algorithms -- Supervised machine-learning algorithms -- Linear regression -- Linear regression with R -- Linear regression with R and Hadoop -- Logistic regression -- Logistic regression with R -- Logistic regression with R and Hadoop -- Unsupervised machine learning algorithm -- Clustering -- Clustering with R -- Performing clustering with R and Hadoop -- Recommendation algorithms -- Steps to generate recommendations in R -- Generating recommendations with R and Hadoop -- Summary -- 7. Importing and Exporting Data from Various DBs -- Learning about data files as database -- Understanding different types of files -- Installing R packages -- Importing the data into R -- Exporting the data from R -- Understanding MySQL -- Installing MySQL -- Installing RMySQL -- Learning to list the tables and their structure -- Importing the data into R -- Understanding data manipulation -- Understanding Excel -- Installing Excel -- Importing data into R -- Understanding data manipulation with R and Excel -- Exporting the data to Excel -- Understanding MongoDB -- Installing MongoDB -- Mapping SQL to MongoDB -- Mapping SQL to MongoQL -- Installing rmongodb -- Importing the data into R -- Understanding data manipulation -- Understanding SQLite -- Understanding features of SQLite -- Installing SQLite -- Installing RSQLite -- Importing the data into R -- Understanding data manipulation -- Understanding PostgreSQL -- Understanding features of PostgreSQL -- Installing PostgreSQL -- Installing RPostgreSQL -- Exporting the data from R -- Understanding Hive -- Understanding features of Hive -- Installing Hive -- Setting up Hive configurations -- Installing RHive -- Understanding RHive operations -- Understanding HBase -- Understanding HBase features -- Installing HBase -- Installing thrift -- Installing RHBase
Importing the data into R -- Understanding data manipulation -- Summary -- A. References -- R + Hadoop help materials -- R groups -- Hadoop groups -- R + Hadoop groups -- Popular R contributors -- Popular Hadoop contributors -- Index
Big Data Analytics with R and Hadoop is a tutorial-style book that focuses on the big data tasks that can be achieved by integrating R and Hadoop. This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications over big data with R packages. It would be helpful if readers have basic knowledge of R.
Description based on publisher supplied metadata and other sources
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2020. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries
Link Print version: Prajapati, Vignesh Big Data Analytics with R and Hadoop Olton : Packt Publishing, Limited, c2013 9781782163282
Subject American literature; American poetry
Electronic books