Ebook690 pages3 hours

Apache Hive Cookbook

Name: Apache Hive Cookbook
Author: Shrey Mehrotra
ISBN: 9781782161097

By Shrey Mehrotra, Hanish Bansal and Saurabh Chauhan

Rating: 0 out of 5 stars

()

Read preview

About this ebook

About This Book

Grasp a complete reference of different Hive topics.
Get to know the latest recipes in development in Hive including CRUD operations
Understand Hive internals and integration of Hive with different frameworks used in today’s world.

Who This Book Is For

This book has covered almost all concepts of Hive. So, if you are a beginner in the big data Hadoop domain, you can start with installing Hive, understanding Hive services and clients, and using Hive data modeling concepts to design your data model. If you have basic knowledge of Hive, you can deep dive into some of the advance concepts covered in the book such as partitions, bucketing, file formats, security, and windowing and analytics.

In a nutshell, this book is helpful for both a Hadoop developer and a Hadoop analyst who want to explore Hive.

Skip carousel

LanguageEnglish

PublisherPackt Publishing

Release dateApr 29, 2016

ISBN9781782161097

Author

Shrey Mehrotra

Shrey Mehrotra has more than 5 years of IT experience, and in the past 4 years, he has gained experience in designing and architecting solutions for cloud and big data domains. Working with big data R&D Labs, he has gained insights into Hadoop, focusing on HDFS, MapReduce, and YARN. His technical strengths also include Hive, PIG, ElasticSearch, Kafka, Sqoop, Flume, and Java. During his free time, he listens to music, watches movies, and enjoys going out with friends.

Related authors

Skip carousel

Related to Apache Hive Cookbook

Related ebooks

Skip carousel

Hadoop Real-World Solutions Cookbook - Second Edition
Ebook
Hadoop Real-World Solutions Cookbook - Second Edition
byDeshpande Tanmay
Rating: 0 out of 5 stars
0 ratings
Apache Spark for Data Science Cookbook
Ebook
Apache Spark for Data Science Cookbook
byPadma Priya Chitturi
Rating: 0 out of 5 stars
0 ratings
Hadoop 2.x Administration Cookbook
Ebook
Hadoop 2.x Administration Cookbook
byGurmukh Singh
Rating: 0 out of 5 stars
0 ratings
PostgreSQL 9 High Availability Cookbook
Ebook
PostgreSQL 9 High Availability Cookbook
byShaun M. Thomas
Rating: 5 out of 5 stars
5/5
Scala Data Analysis Cookbook
Ebook
Scala Data Analysis Cookbook
byManivannan Arun
Rating: 0 out of 5 stars
0 ratings
PostgreSQL 9 Administration Cookbook - Second Edition
Ebook
PostgreSQL 9 Administration Cookbook - Second Edition
bySimon Riggs
Rating: 0 out of 5 stars
0 ratings
MongoDB Cookbook - Second Edition
Ebook
MongoDB Cookbook - Second Edition
byDasadia Cyrus
Rating: 0 out of 5 stars
0 ratings
Neo4j Cookbook
Ebook
Neo4j Cookbook
byAnkur Goel
Rating: 0 out of 5 stars
0 ratings
Hadoop MapReduce v2 Cookbook - Second Edition
Ebook
Hadoop MapReduce v2 Cookbook - Second Edition
byThilina Gunarathne
Rating: 0 out of 5 stars
0 ratings
Talend Open Studio Cookbook
Ebook
Talend Open Studio Cookbook
byRick Barton
Rating: 2 out of 5 stars
2/5
DynamoDB Cookbook
Ebook
DynamoDB Cookbook
byDeshpande Tanmay
Rating: 0 out of 5 stars
0 ratings
Hadoop in Practice
Ebook
Hadoop in Practice
byAlex Holmes
Rating: 0 out of 5 stars
0 ratings
Learning Apache Spark 2
Ebook
Learning Apache Spark 2
byMuhammad Asif Abbasi
Rating: 0 out of 5 stars
0 ratings
Fast Data Processing with Spark 2 - Third Edition
Ebook
Fast Data Processing with Spark 2 - Third Edition
byKrishna Sankar
Rating: 0 out of 5 stars
0 ratings
Apache Hive Essentials
Ebook
Apache Hive Essentials
byDayong Du
Rating: 0 out of 5 stars
0 ratings
Machine Learning with Spark - Second Edition
Ebook
Machine Learning with Spark - Second Edition
byNick Pentreath
Rating: 0 out of 5 stars
0 ratings
Hadoop: Data Processing and Modelling
Ebook
Hadoop: Data Processing and Modelling
byGarry Turkington
Rating: 0 out of 5 stars
0 ratings
Spark in Action: Covers Apache Spark 3 with Examples in Java, Python, and Scala
Ebook
Spark in Action: Covers Apache Spark 3 with Examples in Java, Python, and Scala
byJean-Georges Perrin
Rating: 0 out of 5 stars
0 ratings
Apache Cassandra Essentials
Ebook
Apache Cassandra Essentials
byPadalia Nitin
Rating: 4 out of 5 stars
4/5
Frank Kane's Taming Big Data with Apache Spark and Python
Ebook
Frank Kane's Taming Big Data with Apache Spark and Python
byFrank Kane
Rating: 0 out of 5 stars
0 ratings
Learning Hadoop 2
Ebook
Learning Hadoop 2
byGarry Turkington
Rating: 4 out of 5 stars
4/5
Data Pipelines with Apache Airflow
Ebook
Data Pipelines with Apache Airflow
byJulian de Ruiter
Rating: 0 out of 5 stars
0 ratings
Mastering Scala Machine Learning
Ebook
Mastering Scala Machine Learning
byAlex Kozlov
Rating: 0 out of 5 stars
0 ratings
Mastering Hadoop
Ebook
Mastering Hadoop
bySandeep Karanth
Rating: 0 out of 5 stars
0 ratings
Kafka Streams - Real-time Streams Processing
Ebook
Kafka Streams - Real-time Streams Processing
byPrashant Kumar Pandey
Rating: 5 out of 5 stars
5/5
Data Lake for Enterprises
Ebook
Data Lake for Enterprises
byPankaj Misra
Rating: 0 out of 5 stars
0 ratings
HDInsight Essentials - Second Edition
Ebook
HDInsight Essentials - Second Edition
byRajesh Nadipalli
Rating: 0 out of 5 stars
0 ratings
Azure Databricks A Complete Guide - 2020 Edition
Ebook
Azure Databricks A Complete Guide - 2020 Edition
byGerardus Blokdyk
Rating: 0 out of 5 stars
0 ratings
Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code
Ebook
Mastering Large Datasets with Python: Parallelize and Distribute Your Python Code
byJohn Wolohan
Rating: 0 out of 5 stars
0 ratings
Learning Apache Mahout Classification
Ebook
Learning Apache Mahout Classification
byGupta Ashish
Rating: 0 out of 5 stars
0 ratings

Databases For You

Skip carousel

Python Projects for Everyone
Ebook
Python Projects for Everyone
byMohamad Charara
Rating: 0 out of 5 stars
0 ratings
Artificial Intelligence for Fashion: How AI is Revolutionizing the Fashion Industry
Ebook
Artificial Intelligence for Fashion: How AI is Revolutionizing the Fashion Industry
byLeanne Luce
Rating: 0 out of 5 stars
0 ratings
FileMaker Pro Design and Scripting For Dummies
Ebook
FileMaker Pro Design and Scripting For Dummies
byTimothy Trimble
Rating: 0 out of 5 stars
0 ratings
Excel 2021
Ebook
Excel 2021
byJIAYI SIMONDS
Rating: 4 out of 5 stars
4/5
SQL Clearly Explained
Ebook
SQL Clearly Explained
byJan L. Harrington
Rating: 5 out of 5 stars
5/5
Learn SQL in 24 Hours
Ebook
Learn SQL in 24 Hours
byAlex Nordeen
Rating: 5 out of 5 stars
5/5
Practical Data Analysis
Ebook
Practical Data Analysis
byHector Cuesta
Rating: 4 out of 5 stars
4/5
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
Ebook
SQL QuickStart Guide: The Simplified Beginner's Guide to Managing, Analyzing, and Manipulating Data With SQL
byWalter Shields
Rating: 4 out of 5 stars
4/5
The AI Bible, Making Money with Artificial Intelligence: Real Case Studies and How-To's for Implementation
Ebook
The AI Bible, Making Money with Artificial Intelligence: Real Case Studies and How-To's for Implementation
byJhon Dujardin
Rating: 4 out of 5 stars
4/5
Access 2010 All-in-One For Dummies
Ebook
Access 2010 All-in-One For Dummies
byAlison Barrows
Rating: 4 out of 5 stars
4/5
Schaum’s Outline of Fundamentals of SQL Programming
Ebook
Schaum’s Outline of Fundamentals of SQL Programming
byRamon Mata-Toledo
Rating: 3 out of 5 stars
3/5
COMPUTER SCIENCE FOR ROOKIES
Ebook
COMPUTER SCIENCE FOR ROOKIES
byAngel Bahabwa
Rating: 0 out of 5 stars
0 ratings
Query Store for SQL Server 2019: Identify and Fix Poorly Performing Queries
Ebook
Query Store for SQL Server 2019: Identify and Fix Poorly Performing Queries
byTracy Boggiano
Rating: 0 out of 5 stars
0 ratings
Serverless Architectures on AWS, Second Edition
Ebook
Serverless Architectures on AWS, Second Edition
byPeter Sbarski
Rating: 5 out of 5 stars
5/5
Access 2019 For Dummies
Ebook
Access 2019 For Dummies
byLaurie A. Ulrich
Rating: 0 out of 5 stars
0 ratings
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
Ebook
Beginning Microsoft Power BI: A Practical Guide to Self-Service Data Analytics
byDan Clark
Rating: 0 out of 5 stars
0 ratings
Beginning Microsoft SQL Server 2012 Programming
Ebook
Beginning Microsoft SQL Server 2012 Programming
byPaul Atkinson
Rating: 1 out of 5 stars
1/5
Grokking Algorithms: An illustrated guide for programmers and other curious people
Ebook
Grokking Algorithms: An illustrated guide for programmers and other curious people
byAditya Bhargava
Rating: 4 out of 5 stars
4/5
Learn SQL Server Administration in a Month of Lunches
Ebook
Learn SQL Server Administration in a Month of Lunches
byDon Jones
Rating: 0 out of 5 stars
0 ratings
SQL: Practical Guide for Developers
Ebook
SQL: Practical Guide for Developers
byMichael J. Donahoo
Rating: 2 out of 5 stars
2/5
SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
Ebook
SQL Programming & Database Management For Absolute Beginners SQL Server, Structured Query Language Fundamentals: "Learn - By Doing" Approach And Master SQL
byWilliam Sullivan
Rating: 5 out of 5 stars
5/5
SQL Server: Tips and Tricks - 2
Ebook
SQL Server: Tips and Tricks - 2
byPriyanka Agarwal
Rating: 4 out of 5 stars
4/5
Codeless Data Structures and Algorithms: Learn DSA Without Writing a Single Line of Code
Ebook
Codeless Data Structures and Algorithms: Learn DSA Without Writing a Single Line of Code
byArmstrong Subero
Rating: 0 out of 5 stars
0 ratings
Building a Scalable Data Warehouse with Data Vault 2.0
Ebook
Building a Scalable Data Warehouse with Data Vault 2.0
byDaniel Linstedt
Rating: 4 out of 5 stars
4/5
Relational Database Design and Implementation
Ebook
Relational Database Design and Implementation
byJan L. Harrington
Rating: 5 out of 5 stars
5/5
Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance
Ebook
Data Stewardship: An Actionable Guide to Effective Data Management and Data Governance
byDavid Plotkin
Rating: 4 out of 5 stars
4/5
Access for Beginners: Access Essentials, #1
Ebook
Access for Beginners: Access Essentials, #1
byM.L. Humphrey
Rating: 0 out of 5 stars
0 ratings
Practical SQL
Ebook
Practical SQL
byDavid Perry
Rating: 4 out of 5 stars
4/5
100+ SQL Queries T-SQL for Microsoft SQL Server
Ebook
100+ SQL Queries T-SQL for Microsoft SQL Server
byIFS Harrison
Rating: 4 out of 5 stars
4/5
Learning PostgreSQL
Ebook
Learning PostgreSQL
byJuba Salahaldin
Rating: 1 out of 5 stars
1/5

Related podcast episodes

Skip carousel

Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
Podcast episode
Level Up Your Data Platform With Active Metadata: A conversation with Atlan co-founder Prukalpa Sankar about the idea of active metadata and how it can reduce the toil involved in managing a data platform
byData Engineering Podcast
0 ratings
0% found this document useful
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
Podcast episode
Cloud Dataflow with Eric Anderson: Batch and stream processing systems have been evolving for the past decade. From MapReduce to Apache Storm to Dataflow, the best practices for large volume data processing have become more sophisticated as the industry and open source communities have ...
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
Podcast episode
Reflections On Designing A Data Platform From Scratch: A monologue by Tobias Macey, the host of the show, about the design considerations involved in building a data platform and how the lessons learned from running the Data Engineering Podcast are influencing the choices made.
byData Engineering Podcast
100%
100% found this document useful
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
Podcast episode
433: Falling for FastAPI: Mike's falling in love with FastAPI and gives us a hint at the next project he's building.
byCoder Radio
0 ratings
0% found this document useful
EP 22: What is OAuth 2?
Podcast episode
EP 22: What is OAuth 2?
byPro Coder Show
0 ratings
0% found this document useful
Introduction to Data Mesh
Podcast episode
Introduction to Data Mesh
byThe Cloudcast
0 ratings
0% found this document useful
Kafka Streams with Jay Kreps: Kafka Streams is a library for building streaming applications that transform input Kafka topics into output Kafka topics. In a time when there are numerous streaming frameworks already out there, why do we need yet another?
Podcast episode
Kafka Streams with Jay Kreps: Kafka Streams is a library for building streaming applications that transform input Kafka topics into output Kafka topics. In a time when there are numerous streaming frameworks already out there, why do we need yet another?
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
Podcast episode
Ali Ghodsi – The Past, Present, and Future of Big Data – [Founder’s Field Guide, EP.18]: My Guest today is Ali Ghodsi, founder and CEO of Databricks, a data analytics platform for data scientists and developers. He's also the founder of Apache Spark, the open-source project that Databricks is built on, and is an accomplished researcher at...
byInvest Like the Best with Patrick O'Shaughnessy
0 ratings
0% found this document useful
Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57: Scalable and Stateful Streaming Data With Apache Flink (Interview)
Podcast episode
Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57: Scalable and Stateful Streaming Data With Apache Flink (Interview)
byData Engineering Podcast
0 ratings
0% found this document useful
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
Podcast episode
Taking A Tour Of PostgreSQL with Jonathan Katz - Episode 42: A Whirlwind Tour Of The PostgreSQL Database (Interview)
byData Engineering Podcast
100%
100% found this document useful
Building Real Time Applications On Streaming Data With Eventador - Episode 129: An interview with Eventador CEO Kenny Gorman about the challenges of building a managed service for streaming data to simplify building real time applications
Podcast episode
Building Real Time Applications On Streaming Data With Eventador - Episode 129: An interview with Eventador CEO Kenny Gorman about the challenges of building a managed service for streaming data to simplify building real time applications
byData Engineering Podcast
0 ratings
0% found this document useful
All Things Azure with Dwayne Monroe: Dwayne Monroe is a senior cloud architect at Cloudreach, an organization that helps enterprises maximize their cloud investments, who’s focused on Azure. Prior to joining Cloudreach, Dwayne worked as a senior Microsoft and cloud architect at High Availabi
Podcast episode
All Things Azure with Dwayne Monroe: Dwayne Monroe is a senior cloud architect at Cloudreach, an organization that helps enterprises maximize their cloud investments, who’s focused on Azure. Prior to joining Cloudreach, Dwayne worked as a senior Microsoft and cloud architect at High Availabi
byScreaming in the Cloud
0 ratings
0% found this document useful
Simplifying Data Integration Through Eventual Connectivity - Episode 91: An interview about a new pattern for data integration that reduces the amount of effort required to find connections in numerous data sets
Podcast episode
Simplifying Data Integration Through Eventual Connectivity - Episode 91: An interview about a new pattern for data integration that reduces the amount of effort required to find connections in numerous data sets
byData Engineering Podcast
0 ratings
0% found this document useful
This Week In Machine Learning & AI - 5/20/16: AI at Google I/O, Amazon's Deep Learning DSSTNE: This Week In Machine Learning & AI - May 20, 2016…
Podcast episode
This Week In Machine Learning & AI - 5/20/16: AI at Google I/O, Amazon's Deep Learning DSSTNE: This Week In Machine Learning & AI - May 20, 2016…
byThe TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
0 ratings
0% found this document useful
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
Podcast episode
A Multipurpose Database For Transactions And Analytics To Simplify Your Data Architecture With Singlestore: An interview with Shireesh Thota about how the Singlestore database engine allows you to reduce architectural sprawl in your data systems by combining performant and scalable transactional and analytical capabilities into a single platform
byData Engineering Podcast
0 ratings
0% found this document useful
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
Podcast episode
55: Go on The Web: Summary Andrew Gerrand (@enneff), Developer Advocate at Google & Go core contributor, talks about GoLang and how it is being used in Web Development today as well as the plans for the future of the Go as a platform for the web. Resources Go...
byThe Web Platform Podcast
100%
100% found this document useful
Engineering interview tips & tricks: with Emma Draper & Jonas
Podcast episode
Engineering interview tips & tricks: with Emma Draper & Jonas
byGo Time: Golang, Software Engineering
0 ratings
0% found this document useful
Putting Airflow Into Production With James Meickle - Episode 43: Lessons Learned While Building A Data Science Platform With Airflow (Interview)
Podcast episode
Putting Airflow Into Production With James Meickle - Episode 43: Lessons Learned While Building A Data Science Platform With Airflow (Interview)
byData Engineering Podcast
0 ratings
0% found this document useful
HashiCorp Vault for Kubernetes: Bret is joined by Rosemary Wang from HashiCorp to show off Vault for Kubernetes, an open source secrets provider.
Podcast episode
HashiCorp Vault for Kubernetes: Bret is joined by Rosemary Wang from HashiCorp to show off Vault for Kubernetes, an open source secrets provider.
byDevOps and Docker Talk: Cloud Native Interviews and Tooling
0 ratings
0% found this document useful
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
Podcast episode
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle: The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that you can use to ground your work and avoid getting caught in the hype cycles.
byData Engineering Podcast
0 ratings
0% found this document useful
Running Databases on Kubernetes
Podcast episode
Running Databases on Kubernetes
byThe Cloudcast
0 ratings
0% found this document useful
Software Architecture with Simon Brown: Software architecture address the challenge of communicating and navigating large, complex systems to stakeholders, both technical and non-technical. Over the years software architecture has gone in and out of fashion.
Podcast episode
Software Architecture with Simon Brown: Software architecture address the challenge of communicating and navigating large, complex systems to stakeholders, both technical and non-technical. Over the years software architecture has gone in and out of fashion.
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
#464: Diving deep into Amazon MWAA: Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Air
Podcast episode
#464: Diving deep into Amazon MWAA: Amazon Managed Workflows for Apache Airflow (MWAA) is a managed orchestration service for Apache Air
byAWS Podcast
0 ratings
0% found this document useful
82: How to Get Started with Advanced Analytics R-Python w/ Ryan Wade: Ryan Wade joins us on AOF today to talk about how to use advanced analytics in your organization! Ryan has been in the analytics game for the last 20 years and is now a Senior Solution Consultant at Blue Granite, based in Indianapolis, Indiana. He...
Podcast episode
82: How to Get Started with Advanced Analytics R-Python w/ Ryan Wade: Ryan Wade joins us on AOF today to talk about how to use advanced analytics in your organization! Ryan has been in the analytics game for the last 20 years and is now a Senior Solution Consultant at Blue Granite, based in Indianapolis, Indiana. He...
byAnalytics on Fire
0 ratings
0% found this document useful
#54 Women in Data Science
Podcast episode
#54 Women in Data Science
byDataFramed
0 ratings
0% found this document useful
Architecting for Scale with Lee Atchison: Lee Atchison spent seven years at Amazon working in retail, software distribution, and Amazon Web Services. He then moved to New Relic, where he has spent four years scaling the company’s internal architecture.
Podcast episode
Architecting for Scale with Lee Atchison: Lee Atchison spent seven years at Amazon working in retail, software distribution, and Amazon Web Services. He then moved to New Relic, where he has spent four years scaling the company’s internal architecture.
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful
Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39: Self Service Data Flows With Apache NiFi (Interview)
Podcast episode
Building Data Flows In Apache NiFi With Kevin Doran and Andy LoPresto - Episode 39: Self Service Data Flows With Apache NiFi (Interview)
byData Engineering Podcast
0 ratings
0% found this document useful
Using FoundationDB As The Bedrock For Your Distributed Systems - Episode 80: An interview about the FoundationDB project and how it simplifies the work of building custom distributed systems applications
Podcast episode
Using FoundationDB As The Bedrock For Your Distributed Systems - Episode 80: An interview about the FoundationDB project and how it simplifies the work of building custom distributed systems applications
byData Engineering Podcast
0 ratings
0% found this document useful
SnowflakeDB: The Data Warehouse Built For The Cloud - Episode 110: An interview about how SnowflakeDB was built to provide a performant and flexible data platform for the cloud era
Podcast episode
SnowflakeDB: The Data Warehouse Built For The Cloud - Episode 110: An interview about how SnowflakeDB was built to provide a performant and flexible data platform for the cloud era
byData Engineering Podcast
0 ratings
0% found this document useful
Monitoring Architecture with Theo Schlossnagle: Building a monitoring system is a complex distributed systems problem. Events are produced from different points in an application and must be aggregated in order to form metrics. These events are often ingested by a time series database,
Podcast episode
Monitoring Architecture with Theo Schlossnagle: Building a monitoring system is a complex distributed systems problem. Events are produced from different points in an application and must be aggregated in order to form metrics. These events are often ingested by a time series database,
byCloud Engineering Archives - Software Engineering Daily
0 ratings
0% found this document useful

Skip carousel

Build A Search And Analytic Engine
Linux Format
Article
Build A Search And Analytic Engine
Mar 10, 2020
7 min read
Usability
Linux Format
Article
Usability
Oct 19, 2021
3 min read
KAFKA Build Utilities With The Kafka Server
Linux Format
Article
KAFKA Build Utilities With The Kafka Server
Jul 2, 2019
Nowadays, quite a few data architectures involve both a database and Apache Kafka, which is a distributed streaming platform and the subject of this tutorial. You can also find Kafka described as a publish-subscribe message system, which is a fancy w
7 min read
What is ELT?
Techfastly
Article
What is ELT?
Apr 1, 2021
It stands for extract, load, and transform- the processes a data pipeline uses for replicating the data from a source system into a target system such as a cloud data warehouse. 1. Extraction is the first step in which data is copied from the source
6 min read
Elasticsearch And Kibana Basics
Linux Format
Article
Elasticsearch And Kibana Basics
Dec 15, 2020
1 min read
Grafana Terminology
Linux Format
Article
Grafana Terminology
Jan 14, 2020
A Grafana data source is a database, file or service that provides data to Grafana – it cannot operate without data. A Grafana panel is the basic building block of Grafana. Panels are made of visualisations or queries. A Grafana query is used for req
1 min read
Types Of Databases
Linux Format
Article
Types Of Databases
Aug 27, 2019
NoSQL databases provide the performance, scalability and stability that’s required by the modern data-driven apps we interact with these days. But that is where the similarity between NoSQL systems end. In fact, it wouldn’t be wrong to say that the o
1 min read
Create A RESTful Server In Go
Linux Format
Article
Create A RESTful Server In Go
Oct 19, 2021
8 min read
Rokoko Studio 2.0
3D World
Article
Rokoko Studio 2.0
Feb 23, 2021
1 min read
Grafana, Telegraf And Influxdb
Linux Format
Article
Grafana, Telegraf And Influxdb
Jun 30, 2020
If you don’t like Netdata or if you want to try something else, you can give Grafana (https://grafana.com), Telegraf (www.influxdata.com/time-series-platform/telegraf) and InfluxDB (www.influxdata.com/products/influxdb-overview) a try. Grafana can’t
1 min read
Basic Concepts
Linux Format
Article
Basic Concepts
Jul 2, 2019
A messaging system such as Kafka enables you to send messages between processes, applications and servers. Applications connect to Kafka to send or get data. Strictly speaking, a Kafka ‘topic’ is a unit of storage in Kafka: data in Kafka is stored in
1 min read
An Introduction To Rabbitmq
Linux Format
Article
An Introduction To Rabbitmq
Jun 29, 2021
RabbitMQ is a Message Broker, which means that it can safely hold messages generated by applications and make them available to other applications. The main advantages are reliability, support for clustering and high-availability queues, tracing capa
1 min read
Data Fabric
PC Pro Magazine
Article
Data Fabric
Aug 13, 2020
3 min read
Build Your Own URL Shortening Service
Linux Format
Article
Build Your Own URL Shortening Service
May 4, 2021
7 min read
Build A Dynamic App Security Pipeline
Linux Format
Article
Build A Dynamic App Security Pipeline
Sep 21, 2021
8 min read
Docker vs Podman
APC
Article
Docker vs Podman
Apr 19, 2021
When Cockpit was first developed, it had plug-in support for administering your Docker containers remotely via its user-friendly web interface. But then Red Hat OS became a major backer of Cockpit, and when Red Hat developed its own alternative to Do
1 min read
Metrics & Visuals In Go
Linux Format
Article
Metrics & Visuals In Go
Nov 17, 2020
Mihalis Tsoukalos is a DataOps engineer and a technical writer. He’s the author of Go Systems Programming and Mastering Go, 2nd edition. The subject of this tutorial is two-fold. First, it’s about creating a Go application that exports metrics to P
7 min read
Understanding ELT & ETL
Techfastly
Article
Understanding ELT & ETL
Apr 1, 2021
8 min read
AWS Vs Azure What’s The Difference?
PC Pro Magazine
Article
AWS Vs Azure What’s The Difference?
Sep 11, 2022
7 min read
Ice Cold With Kali
Linux Format
Article
Ice Cold With Kali
May 2, 2023
3 min read
About Kibana
Linux Format
Article
About Kibana
Mar 10, 2020
Kibana offers analytics and a search dashboard for Elasticsearch, as well as visualisation capabilities for data stored in Elasticsearch. Kibana is so handy that it would be a shame to use Elasticsearch without combining it with Kibana. Generally spe
1 min read
Enterprise-grade Monitoring Made Easy
Linux Format
Article
Enterprise-grade Monitoring Made Easy
Mar 10, 2020
9 min read
Set Up A Production- Ready Web Server
APC
Article
Set Up A Production- Ready Web Server
Nov 4, 2019
8 min read
Set Up A Production-ready Web Server
Linux Format
Article
Set Up A Production-ready Web Server
Sep 24, 2019
8 min read
How To Build The Linux Format Server
Linux Format
Article
How To Build The Linux Format Server
Oct 19, 2021
10 min read
Scan And Scrape Websites Using Python
Linux Format
Article
Scan And Scrape Websites Using Python
Nov 14, 2023
David Bolton once accidentally boosted the traffic for his firm’s website by 25% in one day by running a web scraper on it. Luckily, they never found out! Ever since the web made an appearance back in the mid-’90s, programmers have been writing softw
6 min read
Twenty Years Of WordPress Websites!
Linux Format
Article
Twenty Years Of WordPress Websites!
Oct 17, 2023
11 min read
Installing Apache for Linux… on Windows
TechLife
Article
Installing Apache for Linux… on Windows
Jul 27, 2020
5 min read
Observability Of The Kernel And Containers
Linux Format
Article
Observability Of The Kernel And Containers
Apr 4, 2023
Mihalis Tsoukalos is currently working on Time Series. You can reach him at: @mactsouk. For our final delve into eBPF, we’re tackling applications, the kernel and Docker containers. At the end of the day, all Linux machines execute code for applicat
10 min read
FLASK Web Frameworks
Linux Format
Article
FLASK Web Frameworks
Jun 4, 2019
The main focus of Python has always been to get you cracking on with your coding – the language was never made for web programming. However, this has just made it more interesting to extend the language for the web, or to create an interface to web-b
9 min read

Related categories

Skip carousel

Reviews for Apache Hive Cookbook

Rating: 0 out of 5 stars

0 ratings

0 ratings0 reviews

Book preview

Apache Hive Cookbook - Shrey Mehrotra

Apache Hive Cookbook

Credits

About the Authors

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why Subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it…

How it works…

There's more…

See also

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Developing Hive

Introduction

Deploying Hive on a Hadoop cluster

Getting ready

How to do it...

How it works…

Deploying Hive Metastore

Getting ready

How to do it…

Installing Hive

Getting ready

How to do it…

Hive with an embedded metastore

Hive with a local metastore

Hive with a remote metastore

Configuring HCatalog

Getting ready

How to do it...

Understanding different components of Hive

HiveServer

Hive metastore

How to do it...

HiveServer2

How to do it...

Hive clients

Hive CLI

Getting ready

How to do it...

Beeline

Getting ready

How to do it...

Compiling Hive from source

Getting ready

How to do it...

Hive packages

Getting ready

How to do it...

Debugging Hive

Getting ready

How to do it...

Running Hive

Getting ready

How to do it...

Changing configurations at runtime

How to do it...

2. Services in Hive

Introducing HiveServer2

How to do it…

How it works…

See also

Understanding HiveServer2 properties

How to do it…

How it works…

See also

Configuring HiveServer2 high availability

Getting ready

How to do it…

How it works…

See also

Using HiveServer2 clients

Getting ready

How to do it…

Beeline

Beeline command options

JDBC

JDBC client sample code using Eclipse

Running the JDBC sample code from the command-line

JDBC datatypes

Other clients

Introducing the Hive metastore service

How to do it…

How it works…

Configuring high availability of metastore service

How to do it…

Introducing Hue

Getting ready

How to do it…

Prepare dependencies

Downloading and installing Hue

Configuring Hive with Hue

Starting Hue

Accessing Hive with Hue

3. Understanding the Hive Data Model

Introduction

Introducing data types

Primitive data types

Complex data types

Using numeric data types

How to do it…

Using string data types

How to do it…

How it works…

Using Date/Time data types

How to do it…

Using miscellaneous data types

How to do it…

Using complex data types

How to do it…

Using operators

Using relational operators

How to do it…

Using arithmetic operators

How to do it…

Using logical operators

How to do it…

Using complex operators

How to do it…

Partitioning

Getting ready

How to do it…

Partitioning a managed table

How to do it…

Adding new partitions

Renaming partitions

Exchanging partitions

Dropping the partitions

Loading data in a managed partitioned table

Partitioning an external table

How to do it…

Bucketing

Getting ready

How to do it…

How it works…

4. Hive Data Definition Language

Introduction

Creating a database schema

Getting ready

How to do it…

Dropping a database schema

Getting ready

How to do it…

Altering a database schema

Getting ready

How to do it…

Using a database schema

Getting ready

How to do it…

Showing database schemas

Getting ready

How to do it…

Describing a database schema

Getting ready

How to do it…

Creating tables

How to do it…

Create table LIKE

How it works

Dropping tables

Getting ready

How to do it…

Truncating tables

Getting ready

How to do it…

Renaming tables

Getting ready

How to do it…

Altering table properties

Getting ready

How to do it…

Creating views

Getting ready

How to do it…

Dropping views

Getting ready

How to do it…

Altering the view properties

Getting ready

How to do it…

Altering the view as select

Getting ready

How to do it…

Showing tables

Getting ready

How to do it…

Showing partitions

Getting ready

How to do it…

Show the table properties

Getting ready

How to do it…

Showing create table

Getting ready

How to do it…

HCatalog

Getting ready

How to do it…

HCatalog DMLs

WebHCat

Getting ready

How to do it…

See also…

5. Hive Data Manipulation Language

Introduction

Loading files into tables

Getting ready

How to do it…

How it works…

Inserting data into Hive tables from queries

Getting ready

How to do it…

How it works…

Inserting data into dynamic partitions

Getting ready

How to do it...

How it works…

There's more…

Writing data into files from queries

Getting ready

How to do it…

Enabling transactions in Hive

Getting ready

How to do it…

Inserting values into tables from SQL

Getting ready

How to do it…

How it works…

There's more…

Updating data

Getting ready

How to do it...

How it works…

There's more…

Deleting data

Getting ready

How to do it...

How it works…

6. Hive Extensibility Features

Introduction

Serialization and deserialization formats and data types

How to do it…

LazySimpleSerDe

RegexSerDe

JSONSerDe

CSVSerDe

There's more…

See also

Exploring views

How to do it…

How it works…

Exploring indexes

How to do it…

Hive partitioning

How to do it…

Static partitioning

Dynamic partitioning

Creating buckets in Hive

How to do it…

Metastore view of bucketing

Analytics functions in Hive

How to do it…

See also

Windowing in Hive

How to do it…

LEAD

LAG

FIRST_VALUE

LAST_VALUE

See also

File formats

How to do it…

7. Joins and Join Optimization

Understanding the joins concept

Getting ready

How to do it…

How it works…

Using a left/right/full outer join

How to do it…

How it works…

Using a left semi join

How to do it…

How it works…

Using a cross join

How to do it…

How it works…

Using a map-side join

How to do it…

How it works…

Using a bucket map join

Getting ready

How to do it…

How it works…

Using a bucket sort merge map join

Getting ready

How to do it…

How it works…

Using a skew join

How to do it…

How it works…

8. Statistics in Hive

Bringing statistics in to Hive

How to do it…

Table and partition statistics in Hive

Getting ready

How to do it…

Statistics for a partitioned table

Column statistics in Hive

How to do it…

How it works…

Top K statistics in Hive

How to do it…

9. Functions in Hive

Using built-in functions

How to do it…

Mathematical functions

Collection functions

Type conversion functions

Date functions

String functions

How it works…

Mathematical functions

Collection functions

Type conversion functions

Date functions

String functions

There's more

Conditional functions

Miscellaneous functions

See also

Using the built-in User-defined Aggregation Function (UDAF)

How to do it…

How it works…

See more

Using the built-in User Defined Table Function (UDTF)

How to do it…

How it works…

See also

Creating custom User-Defined Functions (UDF)

How to do it…

How it works…

10. Hive Tuning

Enabling predicate pushdown optimizations in Hive

Getting ready

How to do it…

How it works…

Optimizations to reduce the number of map

Getting ready

How to do it…

Sampling

Getting ready

Sampling bucketed table

Block sampling

Length literal

Row count

How to do it…

How it works…

11. Hive Security

Securing Hadoop

How to do it…

How it works…

Giving read and write access to user mike

Revoking the access of the user mike

See also

Authorizing Hive

How to do it…

Default authorization–legacy mode

Storage-based authorization

SQL standards-based authorization

There's more

Configuring the SQL standards-based authorization

Getting Started

How to do it…

To list out all existing roles

creating a role

Deleting a role

Showing list of current roles

Setting a role

Granting a role

Revoking a role

Checking roles of a user/role

Checking principles of a role

Granting privileges

Revoking privileges

Checking privileges of a user or role

See also

Authenticating Hive

How to do it…

Anonymous with SASL (default no authentication)

Anonymous without SASL

Kerberos

Configuring the JDBC client for Kerberos authentication

Access Hive using the Beeline client

Access Hive using the Hive JDBC client in Java

LDAP

Pluggable Authentication Modules

Custom

12. Hive Integration with Other Frameworks

Working with Apache Spark

Getting ready

How to do it…

How it works…

Working with Accumulo

Getting ready

How to do it…

How it works…

Working with HBase

Getting ready

How to do it…

How it works…

Working with Google Drill

Getting ready

How to do it…

How it works…

Index

Apache Hive Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2016

Production reference: 1260416

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78216-108-0

www.packtpub.com

Credits

Authors

Hanish Bansal

Saurabh Chauhan

Shrey Mehrotra

Reviewer

Aristides Villarreal Bravo

Commissioning Editor

Wilson D'souza

Acquisition Editor

Tushar Gupta

Content Development Editor

Anish Dhurat

Technical Editor

Vishal K. Mewada

Copy Editor

Dipti Mankame

Project Coordinator

Bijal Patel

Proofreader

Safis Editing

Indexer

Priya Sane

Graphics

Kirk D'Penha

Production Coordinator

Shantanu N. Zagade

Cover Work

Shantanu N. Zagade

About the Authors

Hanish Bansal is a software engineer with over 4 years of experience in developing big data applications. He loves to study emerging solutions and applications mainly related to big data processing, NoSQL, natural language processing, and neural networks. He has worked on various technologies such as Spring Framework, Hibernate, Hadoop, Hive, Flume, Kafka, Storm, and NoSQL databases, which include HBase, Cassandra, MongoDB, and search engines such as Elasticsearch.

In 2012, he completed his graduation in Information Technology stream from Jaipur Engineering College and Research Center, Jaipur, India. He was also the technical reviewer of the book Apache Zookeeper Essentials. In his spare time, he loves to travel and listen to music.

You can read his blog at http://hanishblogger.blogspot.in/ and follow him on Twitter at https://twitter.com/hanishbansal786.

I would like to thank my parents for their love, support, encouragement and the amazing chances they've given me over the years.

Saurabh Chauhan is a module lead with close to 8 years of experience in data warehousing and big data applications. He has worked on multiple Extract, Transform and Load tools, such as Oracle Data Integrator and Informatica as well as on big data technologies such as Hadoop, Hive, Pig, Sqoop, and Flume.

He completed his bachelor of technology in 2007 from Vishveshwarya Institute of Engineering and Technology. In his spare time, he loves to travel and discover new places. He also has a keen interest in sports.

I would like to thank everyone who has supported me throughout my life.

Shrey Mehrotra has 6 years of IT experience and, since the past 4 years, in designing and architecting cloud and big data solutions for the governance and financial domains.

Having worked with big data R&D Labs and Global Data and Analytical Capabilities, he has gained insights into Hadoop, focusing on HDFS, MapReduce, and YARN. His technical strengths also include Hive, Pig, Spark, Elasticsearch, Sqoop, Flume, Kafka, and Java.

He likes spending time performing R&D on different big data technologies. He is the co-author of the book Learning YARN, a certified Hadoop developer, and has also written various technical papers. In his free time, he listens to music, watches movies, and spending time with friends.

I would like to thank my mom and dad for giving me support to accomplish anything I wanted. Also, I would like to thank my friends, who bear with me while I am busy writing.

About the Reviewer

Aristides Villarreal Bravo is a Java developers, a member of the NetBeans Dream Team, and a Java User Groups leader.

He has organized and participated in various conferences and seminars related to Java, JavaEE, NetBeans, NetBeans Platform, free software, and mobile devices, nationally and internationally.

He has written tutorials and blogs about Java, NetBeans, and web development. He has participated in several interviews on sites such as NetBeans, NetBeans Dzone, and JavaHispano. He has developed plugins for NetBeans. He has been a technical reviewer for the book PrimeFaces Blueprints.

Aristides is the CEO of Javscaz Software Developers. He lives in Panamá

To my mother, father, and all family and friends.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Preface

Hive is an open source big data framework in the Hadoop ecosystem. It provides an SQL-like interface to query data stored in HDFS. Underlying it runs MapReduce programs corresponding to the SQL query. Hive was initially developed by Facebook and later added to the Hadoop ecosystem.

Hive is currently the most preferred framework to query data in Hadoop. Because most of the historical data is stored in RDBMS data stores, including Oracle and Teradata. It is convenient for the developers to run similar SQL statements in Hive to query data.

Along with simple SQL statements, Hive supports wide variety of windowing and analytical functions, including rank, row num, dense rank, lead, and lag.

Hive is considered as de facto big data warehouse solution. It provides a number of techniques to optimize storage and processing of terabytes or petabytes of data in a cost-effective way.

Hive could be easily integrated with a majority of other frameworks, including Spark and HBase. Hive allows developers or analysts to execute SQL on it. Hive also supports querying data stored in different formats such as JSON.

What this book covers

Chapter 1, Developing Hive, helps you out in configuring Hive on a Hadoop platform. This chapter explains a different mode of Hive installations. It also provides pointers for debugging Hive and brief information about compiling Hive source code and different modules in the Hive source code.

Chapter 2, Services in Hive, gives a detailed description about the configurations and usage of different services provided by Hive such as HiveServer2. This chapter also explains about different clients of Hive, including Hive CLI and Beeline.

Chapter 3, Understanding the Hive Data Model, takes you through the details of different data types provided by Hive in order to be helpful in data modeling.

Chapter 4, Hive Data Definition Language, helps you understand the syntax and semantics of creating, altering, and dropping different objects in Hive, including databases, tables, functions, views, indexes, and roles.

Chapter 5, Hive Data Manipulation Language, gives you complete understanding of Hive interfaces for data manipulation. This chapter also includes some of the latest features in Hive related to CRUD operations in Hive. It explains insert, update, and delete at the row level in Hive available in Hive 0.14 and later versions.

Chapter 6, Hive Extensibility Features, covers a majority of advance concepts in Hive. This chapter explain some concepts such as SerDes, Partitions, Bucketing, Windowing and Analytics, and File Formats in Hive with the detailed examples.

Chapter 7, Joins and Join Optimization, gives you a detailed explanation of types of Join supported by Hive. It also provides detailed information about different types of Join optimizations available in Hive.

Chapter 8, Statistics in Hive, allows you to capture and analyze tables, partitions, and column-level statistics. This chapter covers the configurations and commands use to capture these statistics.

Chapter 9, Functions in Hive, gives you the detailed overview of the extensive set of inbuilt functions supported by Hive, which can be used directly in queries. This chapter also covers how to create a custom User-Defined Function

Enjoying the preview?

Page 1 of 1

Apache Hive Cookbook

About this ebook

Shrey Mehrotra

Related authors

Related to Apache Hive Cookbook

Related ebooks

Databases For You

Related podcast episodes

Related articles

Related categories

Reviews for Apache Hive Cookbook

What did you think?

Book preview

Apache Hive Cookbook - Shrey Mehrotra

Table of Contents

Apache Hive Cookbook

Apache Hive Cookbook

Credits

About the Authors

About the Reviewer

eBooks, discount offers, and more

Why Subscribe?

Preface

What this book covers