Apache Flume: Distributed Log Collection for Hadoop - Second Edition
Ebook · 355 pages · 2 hours


About this ebook

About This Book
  • Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
  • Configure failover paths and load balancing to remove single points of failure
  • Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS
Who This Book Is For

If you are a Hadoop programmer who wants to learn about Flume to be able to move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.

Language: English
Release date: February 25, 2015
ISBN: 9781784399146

    Book preview

    Apache Flume - Steve Hoffman

    Table of Contents

    Apache Flume: Distributed Log Collection for Hadoop Second Edition

    Credits

    About the Author

    About the Reviewers

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    Why subscribe?

    Free access for Packt account holders

    Preface

    What this book covers

    What you need for this book

    Who this book is for

    Conventions

    Reader feedback

    Customer support

    Downloading the example code

    Errata

    Piracy

    Questions

    1. Overview and Architecture

    Flume 0.9

    Flume 1.X (Flume-NG)

    The problem with HDFS and streaming data/logs

    Sources, channels, and sinks

    Flume events

    Interceptors, channel selectors, and sink processors

    Tiered data collection (multiple flows and/or agents)

    The Kite SDK

    Summary

    2. A Quick Start Guide to Flume

    Downloading Flume

    Flume in Hadoop distributions

    An overview of the Flume configuration file

    Starting up with Hello, World!

    Summary

    3. Channels

    The memory channel

    The file channel

    Spillable Memory Channel

    Summary

    4. Sinks and Sink Processors

    HDFS sink

    Path and filename

    File rotation

    Compression codecs

    Event Serializers

    Text output

    Text with headers

    Apache Avro

    User-provided Avro schema

    File type

    SequenceFile

    DataStream

    CompressedStream

    Timeouts and workers

    Sink groups

    Load balancing

    Failover

    MorphlineSolrSink

    Morphline configuration files

    Typical SolrSink configuration

    Sink configuration

    ElasticSearchSink

    LogStash Serializer

    Dynamic Serializer

    Summary

    5. Sources and Channel Selectors

    The problem with using tail

    The Exec source

    Spooling Directory Source

    Syslog sources

    The syslog UDP source

    The syslog TCP source

    The multiport syslog TCP source

    JMS source

    Channel selectors

    Replicating

    Multiplexing

    Summary

    6. Interceptors, ETL, and Routing

    Interceptors

    Timestamp

    Host

    Static

    Regular expression filtering

    Regular expression extractor

    Morphline interceptor

    Custom interceptors

    The plugins directory

    Tiering flows

    The Avro source/sink

    Compressing Avro

    SSL Avro flows

    The Thrift source/sink

    Using command-line Avro

    The Log4J appender

    The Log4J load-balancing appender

    The embedded agent

    Configuration and startup

    Sending data

    Shutdown

    Routing

    Summary

    7. Putting It All Together

    Web logs to searchable UI

    Setting up the web server

    Configuring log rotation to the spool directory

    Setting up the target – Elasticsearch

    Setting up Flume on collector/relay

    Setting up Flume on the client

    Creating more search fields with an interceptor

    Setting up a better user interface – Kibana

    Archiving to HDFS

    Summary

    8. Monitoring Flume

    Monitoring the agent process

    Monit

    Nagios

    Monitoring performance metrics

    Ganglia

    Internal HTTP server

    Custom monitoring hooks

    Summary

    9. There Is No Spoon – the Realities of Real-time Distributed Data Collection

    Transport time versus log time

    Time zones are evil

    Capacity planning

    Considerations for multiple data centers

    Compliance and data expiry

    Summary

    Index

    Apache Flume: Distributed Log Collection for Hadoop Second Edition

    Copyright © 2015 Packt Publishing

    All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

    Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

    Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

    First published: July 2013

    Second edition: February 2015

    Production reference: 2250315

    Published by Packt Publishing Ltd.

    Livery Place

    35 Livery Street

    Birmingham B3 2PB, UK.

    ISBN 978-1-78439-217-8

    www.packtpub.com

    Credits

    Author

    Steve Hoffman

    Reviewers

    Sachin Handiekar

    Michael Keane

    Stefan Will

    Commissioning Editor

    Dipika Gaonkar

    Acquisition Editor

    Reshma Raman

    Content Development Editor

    Neetu Ann Mathew

    Technical Editor

    Menza Mathew

    Copy Editors

    Vikrant Phadke

    Stuti Srivastava

    Project Coordinator

    Mary Alex

    Proofreader

    Simran Bhogal

    Safis Editing

    Indexer

    Rekha Nair

    Graphics

    Sheetal Aute

    Abhinash Sahu

    Production Coordinator

    Komal Ramchandani

    Cover Work

    Komal Ramchandani

    About the Author

    Steve Hoffman has 32 years of experience in software development, ranging from embedded software development to the design and implementation of large-scale, service-oriented, object-oriented systems. For the last 5 years, he has focused on infrastructure as code, including automated Hadoop and HBase implementations and data ingestion using Apache Flume. Steve holds a BS in computer engineering from the University of Illinois at Urbana-Champaign and an MS in computer science from DePaul University. He is currently a senior principal engineer at Orbitz Worldwide (http://orbitz.com/).

    More information on Steve can be found at http://bit.ly/bacoboy and on Twitter at @bacoboy.

    This is the first update to Steve's first book, Apache Flume: Distributed Log Collection for Hadoop, also published by Packt Publishing.

    I'd again like to dedicate this updated book to my loving and supportive wife, Tracy. She puts up with a lot, and that is very much appreciated. I couldn't ask for a better friend daily by my side.

    My terrific children, Rachel and Noah, are a constant reminder that hard work does pay off and that great things can come from chaos.

    I also want to give a big thanks to my parents, Alan and Karen, for molding me into the somewhat satisfactory human I've become. Their dedication to family and education above all else guides me daily as I attempt to help my own children find their happiness in the world.

    About the Reviewers

    Sachin Handiekar is a senior software developer with over 5 years of experience in Java EE development. He graduated in computer science from the University of Greenwich, London, and currently works for a global consulting company, developing enterprise applications using various open source technologies, such as Apache Camel, ServiceMix, ActiveMQ, and ZooKeeper.

    Sachin has a lot of interest in open source projects. He has contributed code to Apache Camel and developed plugins for Spring Social, which can be found at GitHub (https://github.com/sachin-handiekar).

    He also actively writes about enterprise application development on his blog (http://sachinhandiekar.com).

    Michael Keane has a BS in computer science from the University of Illinois at Urbana-Champaign. He has worked as a software engineer, coding almost exclusively in Java since JDK 1.1, in domains including mission-critical medical device software, e-commerce, transportation, navigation, and advertising. He is currently a development leader for Conversant, where he maintains Flume flows of nearly 100 billion log lines per day.

    Michael is a father of three, and besides work, he spends most of his time with his family and coaching youth softball.

    Stefan Will is a computer scientist with a degree in machine learning and pattern recognition from the University of Bonn, Germany. For over a decade, he has worked for several start-ups in Silicon Valley and Raleigh, North Carolina, in the area of search and analytics. Presently, he leads the development of the search backend and real-time analytics platform at Zendesk, a provider of customer service software.

    www.PacktPub.com

    Support files, eBooks, discount offers, and more

    For support files and downloads related to your book, please visit www.PacktPub.com.

    Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us for more details.

    At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

    https://www2.packtpub.com/books/subscription/packtlib

    Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

    Why subscribe?

    • Fully searchable across every book published by Packt
    • Copy and paste, print, and bookmark content
    • On demand and accessible via a web browser

    Free access for Packt account holders

    If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view nine entirely free books. Simply use your login credentials for immediate access.

    Preface

    Hadoop is a great open source tool for shifting tons of unstructured data into something manageable so that your business can gain better insight into your customers' needs. It's cheap (mostly free), scales horizontally as long as you have space and power in your datacenter, and can handle problems that would crush your traditional data warehouse. That said, a little-known secret is that your Hadoop cluster requires you to feed it data. Otherwise, you just have a very expensive heat generator! You will quickly realize (once you get past the playing around phase with Hadoop) that you will need a tool to automatically feed data into your cluster. In the past, you had to come up with a solution for this problem, but no more! Flume was started as a project out of Cloudera, when its integration engineers had to keep writing tools over and over again for their customers to automatically import data. Today, the project lives with the Apache Foundation, is under active development, and boasts of users who have been using it in their production environments for years.

    In this book, I hope to get you up and running quickly with an architectural overview of Flume and a quick-start guide. After that, we'll dive deep into the details of many of the more useful Flume components, including the very important file channel for the persistence of in-flight data records and the HDFS sink for buffering and writing data into HDFS (the Hadoop Distributed File System). Since Flume comes with a wide variety of modules, chances are that the only tool you'll need to get started is a text editor for the configuration file.
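
    Since that configuration file will come up constantly, here is a minimal sketch of its general shape (the agent name agent and the component names s1, c1, and k1 are illustrative placeholders, not fixed names):

        # A Flume agent is configured in a Java properties file.
        # First, name the components that belong to the agent.
        agent.sources = s1
        agent.channels = c1
        agent.sinks = k1

        # Then wire them together: a source can feed several channels,
        # but each sink drains exactly one channel.
        agent.sources.s1.channels = c1
        agent.sinks.k1.channel = c1

    Note the deliberate asymmetry: sources.s1.channels is plural (a source can fan out to several channels), while sinks.k1.channel is singular (each sink drains exactly one channel).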

    By the time you reach the end of this book, you should know enough to build a highly available, fault-tolerant, streaming data pipeline that feeds your Hadoop cluster.

    What this book covers

    Chapter 1, Overview and Architecture, introduces Flume and the problem space that it's trying to address (specifically with regard to Hadoop). It also gives an architectural overview of the various components covered in later chapters.

    Chapter 2, A Quick Start Guide to Flume, serves to get you up and running quickly. It includes downloading Flume, creating a Hello, World! configuration, and running it.
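
    As a taste of that chapter, a minimal "Hello, World!"-style configuration might wire a netcat source to a logger sink through a memory channel (the file name, port, and component names below are illustrative):

        # hello.conf: echo every line received on a TCP port to the log
        agent.sources = s1
        agent.channels = c1
        agent.sinks = k1

        agent.sources.s1.type = netcat
        agent.sources.s1.bind = 0.0.0.0
        agent.sources.s1.port = 12345
        agent.sources.s1.channels = c1

        agent.channels.c1.type = memory

        agent.sinks.k1.type = logger
        agent.sinks.k1.channel = c1

    Started with something like ./bin/flume-ng agent -n agent -c conf -f conf/hello.conf -Dflume.root.logger=INFO,console, the agent will print each line you send to port 12345 (for example, via telnet or nc).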

    Chapter 3, Channels, covers the two major channels most people will use and the configuration options available for each of them.
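
    As a sketch of the trade-off involved, the memory channel is fast but loses in-flight events if the agent dies, while the file channel persists them to disk (the capacities and paths below are illustrative):

        # Volatile but fast
        agent.channels.c1.type = memory
        agent.channels.c1.capacity = 10000
        agent.channels.c1.transactionCapacity = 100

        # Durable: events survive an agent restart
        agent.channels.c2.type = file
        agent.channels.c2.checkpointDir = /var/flume/checkpoint
        agent.channels.c2.dataDirs = /var/flume/data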

    Chapter 4, Sinks and Sink Processors, goes into great detail on using the HDFS Flume output, including compression options and options for formatting the data. Failover options are also covered so that you can create a more robust data pipeline.
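
    For a flavor of what the chapter configures, here is a hedged sketch of an HDFS sink with time-based paths, time/size-based rotation, and compression, plus a failover sink group (the path, roll values, and priorities are illustrative, not recommendations):

        agent.sinks.k1.type = hdfs
        agent.sinks.k1.channel = c1
        # %Y/%m/%d escapes are filled in from the event's timestamp header
        agent.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y/%m/%d
        agent.sinks.k1.hdfs.rollInterval = 300
        agent.sinks.k1.hdfs.rollSize = 134217728
        agent.sinks.k1.hdfs.rollCount = 0
        agent.sinks.k1.hdfs.codeC = gzip
        agent.sinks.k1.hdfs.fileType = CompressedStream

        # Failover: prefer k1; fall back to k2 if it fails
        agent.sinkgroups = g1
        agent.sinkgroups.g1.sinks = k1 k2
        agent.sinkgroups.g1.processor.type = failover
        agent.sinkgroups.g1.processor.priority.k1 = 10
        agent.sinkgroups.g1.processor.priority.k2 = 5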

    Chapter 5, Sources and Channel Selectors, introduces several of the Flume input mechanisms and their configuration options. Also covered is switching between different channels based on data content, which allows the creation of complex data flows.
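
    To illustrate that content-based switching, a multiplexing channel selector might route events on a header value like this (the env header and channel names are made up for the example):

        agent.sources.s1.channels = c1 c2
        agent.sources.s1.selector.type = multiplexing
        # Route on the value of a hypothetical "env" header
        agent.sources.s1.selector.header = env
        agent.sources.s1.selector.mapping.prod = c1
        agent.sources.s1.selector.mapping.dev = c2
        agent.sources.s1.selector.default = c1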

    Chapter 6, Interceptors, ETL, and Routing, explains how to transform data in-flight as well as extract information from the payload to use with Channel Selectors to make routing decisions. Then this chapter covers tiering Flume agents using Avro serialization, as well as using the Flume command line as a standalone Avro client for testing and importing data manually.
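
    As a brief sketch of both ideas, interceptors are chained on a source, and tiering is done by pointing one agent's Avro sink at the next tier's Avro source (the hostname, port, and header value below are placeholders):

        # Interceptor chain: stamp the time, then tag the data center
        agent.sources.s1.interceptors = i1 i2
        agent.sources.s1.interceptors.i1.type = timestamp
        agent.sources.s1.interceptors.i2.type = static
        agent.sources.s1.interceptors.i2.key = datacenter
        agent.sources.s1.interceptors.i2.value = NYC

        # Tiering: forward events to a downstream collector agent
        agent.sinks.k1.type = avro
        agent.sinks.k1.channel = c1
        agent.sinks.k1.hostname = collector.example.com
        agent.sinks.k1.port = 4141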

    Chapter 7, Putting It All Together, walks you through the details of an end-to-end use case from the web server logs to a searchable UI, backed by Elasticsearch as well as archival storage in HDFS.

    Chapter 8, Monitoring Flume, discusses various options available for monitoring Flume both internally and externally, including Monit, Nagios, Ganglia, and custom hooks.
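
    For instance, Flume's built-in metrics reporting is switched on with Java system properties at startup; a sketch of the internal HTTP server option (the port is arbitrary) looks like this:

        # Expose counters as JSON at http://<agent-host>:41414/metrics
        ./bin/flume-ng agent -n agent -c conf -f conf/agent.conf \
            -Dflume.monitoring.type=http \
            -Dflume.monitoring.port=41414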

    Chapter 9, There Is No Spoon – the Realities of Real-time Distributed Data Collection, is a collection of miscellaneous things to consider that are outside the scope of just configuring and using Flume.

    What you need for this book

    You'll need a computer with a Java Virtual Machine installed, since Flume is written in Java. If you don't have Java on your computer, you can download it from http://java.com/.

    You will also need an Internet connection so that you can download Flume to run the Quick Start example.

    This book covers Apache Flume 1.5.2.

    Who this book is for

    This book is for people responsible for implementing the automatic movement of data from various systems to a Hadoop cluster. If it is your job to load data into Hadoop on a regular basis, this book should help you to code yourself out of manual monkey work or from writing a custom tool you'll be supporting for as long as you work at your company.

    Only basic knowledge of Hadoop and HDFS is required. Some custom implementations are covered, should your needs necessitate them. For this level of implementation, you will need to know how to program in Java.

    Finally, you'll need your favorite text editor, since most of this book covers how to configure various Flume components via an agent's text configuration file.

    Conventions

    In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and explanations of their meaning.
