Monitoring Hadoop
()
About this ebook
- Track Hadoop operations, errors, and bottlenecks efficiently
- Employ Hadoop logging features to help manage Hadoop clusters better
- Visualize the data collected and present it in a systematic manner
This book is useful for Hadoop administrators who need to learn how to monitor and diagnose their clusters. Also, the book will prove useful for new users of the technology, as the language used is simple and easy to grasp.
Related to Monitoring Hadoop
Related ebooks
Optimizing Hadoop for MapReduce Rating: 0 out of 5 stars0 ratingsHadoop Cluster Deployment Rating: 0 out of 5 stars0 ratingsApache Oozie Essentials Rating: 0 out of 5 stars0 ratingsApache Spark Graph Processing Rating: 0 out of 5 stars0 ratingsPostgreSQL 11 Administration Cookbook: Over 175 recipes for database administrators to manage enterprise databases Rating: 0 out of 5 stars0 ratingsInstant Redis Optimization How-to Rating: 0 out of 5 stars0 ratingsAdministrating Solr Rating: 0 out of 5 stars0 ratingsPostgreSQL 9 Administration Cookbook: LITE Edition Rating: 3 out of 5 stars3/5HBase Essentials Rating: 0 out of 5 stars0 ratingsMicrosoft SQL Server 2012 with Hadoop Rating: 1 out of 5 stars1/5HDInsight Essentials - Second Edition Rating: 0 out of 5 stars0 ratingsPersistence in PHP with Doctrine ORM Rating: 0 out of 5 stars0 ratingsSecuring Hadoop Rating: 4 out of 5 stars4/5PostgreSQL Development Essentials Rating: 5 out of 5 stars5/5MariaDB High Performance Rating: 0 out of 5 stars0 ratingsLearning Azure DocumentDB Rating: 0 out of 5 stars0 ratingsPractical OneOps Rating: 0 out of 5 stars0 ratingsMastering Scala Machine Learning Rating: 0 out of 5 stars0 ratingsInstant Apache ActiveMQ Messaging Application Development How-to Rating: 0 out of 5 stars0 ratingsLearning Couchbase Rating: 0 out of 5 stars0 ratingsHadoop Essentials Rating: 5 out of 5 stars5/5Cloud Development and Deployment with CloudBees Rating: 0 out of 5 stars0 ratingsScaling Big Data with Hadoop and Solr - Second Edition Rating: 0 out of 5 stars0 ratingsPython for Google App Engine Rating: 0 out of 5 stars0 ratingsExploring Hadoop Ecosystem (Volume 1): Batch Processing Rating: 0 out of 5 stars0 ratingsProfessional Hadoop Solutions Rating: 4 out of 5 stars4/5NiFi A Complete Guide - 2021 Edition Rating: 0 out of 5 stars0 ratingsCloudera A Complete Guide - 2019 Edition Rating: 0 out of 5 stars0 ratingsInstant SQL Server Analysis Services 2012 Cube Security Rating: 0 out of 5 stars0 ratings
Data Modeling & Design For You
Thinking in Algorithms: Strategic Thinking Skills, #2 Rating: 5 out of 5 stars5/5Spreadsheets To Cubes (Advanced Data Analytics for Small Medium Business): Data Science Rating: 0 out of 5 stars0 ratingsData Visualization: a successful design process Rating: 4 out of 5 stars4/5The Secrets of ChatGPT Prompt Engineering for Non-Developers Rating: 5 out of 5 stars5/5Data Analytics for Beginners: Introduction to Data Analytics Rating: 4 out of 5 stars4/5Principles of Data Science Rating: 4 out of 5 stars4/5Data Analytics with Python: Data Analytics in Python Using Pandas Rating: 3 out of 5 stars3/5Raspberry Pi :Raspberry Pi Guide On Python & Projects Programming In Easy Steps Rating: 3 out of 5 stars3/5Mastering Agile User Stories Rating: 4 out of 5 stars4/5Supercharge Power BI: Power BI is Better When You Learn To Write DAX Rating: 5 out of 5 stars5/5Mastering Python Design Patterns Rating: 0 out of 5 stars0 ratingsLearn T-SQL Querying: A guide to developing efficient and elegant T-SQL code Rating: 0 out of 5 stars0 ratingsLearning Cypher Rating: 0 out of 5 stars0 ratingsData Visualization with D3.js Cookbook Rating: 0 out of 5 stars0 ratingsWordPress For Beginners - How To Set Up A Self Hosted WordPress Blog Rating: 0 out of 5 stars0 ratingsNeural Networks: Neural Networks Tools and Techniques for Beginners Rating: 5 out of 5 stars5/5150 Most Poweful Excel Shortcuts: Secrets of Saving Time with MS Excel Rating: 3 out of 5 stars3/5The Esri Guide to GIS Analysis, Volume 3: Modeling Suitability, Movement, and Interaction Rating: 0 out of 5 stars0 ratingsA Concise Guide to Object Orientated Programming Rating: 0 out of 5 stars0 ratingsBayesian Analysis with Python Rating: 5 out of 5 stars5/5Brainstorming and Beyond: A User-Centered Design Method Rating: 0 out of 5 stars0 ratingsDAX Patterns: Second Edition Rating: 5 out of 5 stars5/5Graph Databases in Action: Examples in Gremlin Rating: 0 out of 5 stars0 ratingsAdvanced Deep Learning with Python: Design and implement advanced next-generation AI solutions using TensorFlow and PyTorch Rating: 0 out of 5 stars0 ratings20 Most Powerful Conditional Formatting Techniques Rating: 0 out of 5 stars0 ratings
Reviews for Monitoring Hadoop
0 ratings0 reviews
Book preview
Monitoring Hadoop - Gurmukh Singh
Table of Contents
Monitoring Hadoop
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Errata
Piracy
Questions
1. Introduction to Monitoring
The need for monitoring
The monitoring tools available in the market
Nagios
Nagios architecture
Prerequisites for installing and configuring Nagios
Prerequisites
Installing Nagios
Web interface configuration
Nagios plugins
Verification
Configuration files
Setting up monitoring for clients
Ganglia
Ganglia components
Ganglia installation
System logging
Collection
Transportation
Storage
Alerting and analysis
The syslogd and rsyslogd daemons
Summary
2. Hadoop Daemons and Services
Hadoop daemons
NameNode
DataNode and TaskTracker
Secondary NameNode
JobTracker and YARN daemons
The communication between daemons
YARN framework
Common issues faced on Hadoop cluster
Host-level checks
Nagios server
Configuring Hadoop nodes for monitoring
Summary
3. Hadoop Logging
The need for logging events
System logging
Logging levels
Logging in Hadoop
Hadoop logs
Hadoop log level
Hadoop audit
Summary
4. HDFS Checks
HDFS overview
Nagios master configuration
The Nagios client configuration
Summary
5. MapReduce Checks
MapReduce overview
MapReduce control commands
MapReduce health checks
Nagios master configuration
Nagios client configuration
Summary
6. Hadoop Metrics and Visualization Using Ganglia
Hadoop metrics
Metrics contexts
Named contexts
Metrics system design
Metrics configuration
Configuring Metrics2
Exploring the metrics contexts
Hadoop Ganglia integration
Hadoop metrics configuration for Ganglia
Setting up Ganglia nodes
Hadoop configuration
Metrics1
Metrics2
Ganglia graphs
Metrics APIs
The org.apache.hadoop.metrics package
The org.apache.hadoop.metrics2 package
Summary
7. Hive, HBase, and Monitoring Best Practices
Hive monitoring
Hive metrics
HBase monitoring
HBase Nagios monitoring
HBase metrics
Monitoring best practices
The Filter class
Nagios and Ganglia best practices
Summary
Index
Monitoring Hadoop
Monitoring Hadoop
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2015
Production reference: 1240415
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78328-155-8
www.packtpub.com
Credits
Author
Gurmukh Singh
Reviewers
David Greco
Randal Scott King
Yousuf Qureshi
Acquisition Editor
Meeta Rajani
Content Development Editor
Siddhesh Salvi
Technical Editor
Parag Topre
Copy Editors
Hiral Bhat
Sarang Chari
Tani Kothari
Trishla Singh
Project Coordinator
Nidhi Joshi
Proofreaders
Safis Editing
Paul Hindle
Indexer
Hemangini Bari
Graphics
Disha Haria
Production Coordinator
Melwyn D'sa
Cover Work
Melwyn D'sa
About the Author
Gurmukh Singh has been an infrastructure engineer for over 10 years and has worked on big data platforms in the past 5 years. He started his career as a field engineer, setting up lease lines and radio links. He has vast experience in enterprise servers and network design and in scaling infrastructures and tuning them for performance. He is the founder of a small start-up called Netxillon Technologies, which is into big data training and consultancy. He talks at various technical meetings and is an active participant in the open source community's activities. He writes at http://linuxaddict.org and maintains his Github account at https://github.com/gdhillon.
About the Reviewers
David Greco is a software architect with more than 27 years of experience. He started his career as a researcher in the field of high-performance computing; thereafter, he moved to the business world, where he worked for different enterprise software vendors and two start-ups he helped create. He played different roles, those of a consultant and software architect and even a CTO. He's an enthusiastic explorer of new technologies, and he likes to introduce new technologies into enterprises to improve their businesses. In the past 5 years, he has fallen in love with big data technologies and typed functional programming—Scala and Haskell. When not working or hacking, he likes to practice karate and listen to jazz and classical music.
Randal Scott King is the managing partner of Brilliant Data, a global consultancy specializing in big data, analytics, and network architecture. He has done work for industry-leading clients, such as Sprint, Lowe's Home Improvement, Gulfstream Aerospace, and AT&T. In addition to the current book, he was previously a reviewer for Hadoop MapReduce v2 Cookbook, Second Edition, Packt Publishing.
Scott lives with his children on the outskirts of Atlanta, GA. You can visit his blog at www.randalscottking.com.
Yousuf Qureshi is an early adopter of technology and gadgets, has a lot of experience in the e-commerce, social media, analytics, and mobile apps sectors, and is a Cloudera Certified Developer for Apache Hadoop (CCDH).
His expertise includes development, technology turnaround, consultancy, and architecture. He is an experienced developer