
HBase

Extending with coprocessors


Paval Ambrozie 244
Short introduction - HBase

Distributed database built on top of the Hadoop file system.


Part of the Hadoop ecosystem with the role of providing real-time
random read/write access to the data from the Hadoop file system
Open-source project written in Java; easy API for clients
Modeled after Google's Bigtable
Provides automatic failover support
HBase data storage

A table is a collection of rows


A row is a collection of column families
A column family is a collection of columns
A column is a collection of key-value pairs
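
As a concrete illustration of this model, here is a minimal client-side sketch (0.94-era HBase API, the same generation of API the later slides refer to; the table, family and qualifier names are made up) that writes one cell addressed by row key, column family and column qualifier:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "users");   // a table is a collection of rows
    Put put = new Put(Bytes.toBytes("row-1"));  // the row key identifies the row
    // Each cell is addressed by (row, column family, column qualifier) -> value.
    put.add(Bytes.toBytes("info"),              // column family
            Bytes.toBytes("name"),              // column qualifier
            Bytes.toBytes("Ada"));              // value
    table.put(put);
    table.close();
  }
}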
HBase Coprocessor
The idea of introducing coprocessors to HBase was inspired by the coprocessors in Google's Bigtable
Coprocessors can be loaded globally, in which case they apply to all hosted tables and regions
HBase provides two types of coprocessors
Observers: like triggers in conventional databases
Endpoints (the only type that returns results to the client): resemble stored procedures
Coprocessor types overview
Observers
Like triggers in conventional databases
The idea behind observers is that we can insert user code by overriding upcall methods provided by the coprocessor framework. The callback functions are executed from core HBase code when certain events occur.

Endpoints
Dynamic RPC endpoints that resemble stored procedures
One can call an endpoint at any time from the client. The implementation will be executed remotely at the target region/regions, and the results will be returned to the requesting client.
Observer coprocessor types
RegionObserver
Provides hooks for data manipulation events: Get, Put, Delete, etc.
There is an instance for every table region (there can be more than one region in a RegionServer)
WALObserver
Provides hooks for WAL (write ahead log) operations
Runs in the context of a RegionServer (one per region server)
MasterObserver
Provides hooks for DDL-type operations: create, delete, modify (ex: postDeleteTable())
It runs on the Master node
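
As a small illustration of a MasterObserver hook, a minimal sketch (hypothetical class name, 0.94-era API matching the rest of the deck) that logs each table deletion from the postDeleteTable() hook:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.util.Bytes;

public class TableAuditObserver extends BaseMasterObserver {
  private static final Log LOG = LogFactory.getLog(TableAuditObserver.class);

  // DDL hook: invoked on the Master after a table has been deleted.
  @Override
  public void postDeleteTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      byte[] tableName) throws IOException {
    LOG.info("Table deleted: " + Bytes.toString(tableName));
  }
}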
Endpoint coprocessor

Resembles stored procedures from a conventional database


It comes as an extension to the HBase RPC protocol
Invoked by the user at any time from the client
The call is made through the HTableInterface, which runs the code on the target
region(s) in the cluster, and the results are collected
The execution result is returned to the client
Observer implementation
Source: blogs.apache.org
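
In the spirit of the example credited to blogs.apache.org above, here is a minimal RegionObserver sketch (hypothetical class and row names, 0.94-era API) that short-circuits Get requests against a protected row:

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

public class AccessControlObserver extends BaseRegionObserver {

  private static final byte[] PROTECTED_ROW = Bytes.toBytes("admin");

  // preGet is the upcall invoked by core HBase code before a Get is served.
  @Override
  public void preGet(ObserverContext<RegionCoprocessorEnvironment> ctx,
      Get get, List<KeyValue> results) throws IOException {
    if (Bytes.equals(get.getRow(), PROTECTED_ROW)) {
      // bypass() tells the framework to skip the default Get handling,
      // so the protected row is never returned to the client.
      ctx.bypass();
    }
  }
}

Such an observer is packaged in a jar and registered either globally in hbase-site.xml or per table on the table descriptor.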
Observer coprocessor
An Observer sits between the client and HBase and behaves like a trigger in a conventional database, so user code runs when certain events occur
(ex: adding data to the result of a Get call)
Can have more than one observer coprocessor at once
Can be set to run with certain priorities
There are 3 types of observer coprocessors (interfaces provided):
RegionObserver
WALObserver
MasterObserver
Endpoint implementation
Steps to build an endpoint:

Create a new protocol interface that extends CoprocessorProtocol
Implement the new endpoint interface (this will be loaded and executed in the region context)
Invoke the endpoint from the client side; here we have 2 APIs:
HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row)
This is executed on a single region
HTableInterface.coprocessorExec(Class<T>, byte[] startKey, byte[] endKey, Batch.Call<T, R> callable)
This is executed over a range of regions
Aggregate Endpoint example
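
A minimal sketch of an aggregation-style endpoint following the three steps above, assuming the pre-0.96 CoprocessorProtocol API listed on the previous slide (class, method and table names are illustrative, and each piece would normally live in its own source file). The endpoint counts the rows of each region it runs in, and the client sums the per-region partial counts. Note that HBase 0.96 and later replaced this API with protobuf-based Services.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;

// --- Step 1: the protocol interface (RowCountProtocol.java) ---
interface RowCountProtocol extends CoprocessorProtocol {
  long getRowCount() throws IOException;
}

// --- Step 2: the endpoint implementation, executed in the region context ---
class RowCountEndpoint extends BaseEndpointCoprocessor implements RowCountProtocol {
  @Override
  public long getRowCount() throws IOException {
    RegionCoprocessorEnvironment env =
        (RegionCoprocessorEnvironment) getEnvironment();
    // Scan only the data hosted by this region and count its rows.
    InternalScanner scanner = env.getRegion().getScanner(new Scan());
    long count = 0;
    try {
      List<KeyValue> row = new ArrayList<KeyValue>();
      boolean hasMore;
      do {
        hasMore = scanner.next(row);
        if (!row.isEmpty()) {
          count++;
        }
        row.clear();
      } while (hasMore);
    } finally {
      scanner.close();
    }
    return count;
  }
}

// --- Step 3: client-side invocation over a range of regions ---
class RowCountClient {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "users");
    // Null start/end keys ask for execution on every region of the table;
    // the result is a map of per-region partial counts.
    Map<byte[], Long> partials = table.coprocessorExec(
        RowCountProtocol.class, null, null,
        new Batch.Call<RowCountProtocol, Long>() {
          public Long call(RowCountProtocol instance) throws IOException {
            return instance.getRowCount();
          }
        });
    long total = 0;
    for (Long partial : partials.values()) {
      total += partial;
    }
    System.out.println("Total rows: " + total);
    table.close();
  }
}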
Thank you!
