Distributed database built on top of the Hadoop file system (HDFS).
Part of the Hadoop ecosystem; its role is to provide real-time random read/write access to data stored in the Hadoop file system
Open-source project written in Java; easy API for clients
Modeled after Google's Bigtable
Provides automatic failover support

HBase data storage
A table is a collection of rows
A row is a collection of column families
A column family is a collection of columns
A column is a collection of key-value pairs
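To make these four levels concrete, here is a minimal sketch of writing and reading one cell through the Java client API; it assumes the older HTable-based API generation that the rest of these slides use, and the table name "users", row key "row-1", column family "info", and column qualifier "name" are all illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class DataModelSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "users");     // table (hypothetical name)

    Put put = new Put(Bytes.toBytes("row-1"));    // row, addressed by row key
    // Column family "info", column (qualifier) "name", value "Ada":
    // one key-value pair inside the row.
    put.add(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
    table.put(put);

    Get get = new Get(Bytes.toBytes("row-1"));
    Result result = table.get(get);
    byte[] value = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
    System.out.println(Bytes.toString(value));    // prints "Ada"

    table.close();
  }
}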
HBase Coprocessor
The idea of introducing coprocessors to HBase was inspired by the coprocessors of Google's Bigtable
Coprocessors are loaded globally on all hosted tables and regions
HBase provides two types of coprocessors:
Observers: like triggers in conventional databases
Endpoints (the only type that returns results): resemble stored procedures

Coprocessor types overview
Observers: like triggers in conventional databases. The idea behind observers is that we can insert user code by overriding upcall methods provided by the coprocessor framework. The callback functions are executed from core HBase code when certain events occur.
Endpoints: dynamic RPC endpoints that resemble stored procedures. One can call an endpoint at any time from the client; the implementation will be executed remotely at the target region/regions and the results will be returned to the requesting client.

Observer coprocessor types
RegionObserver: provides hooks for data manipulation events (Get, Put, Delete, etc.). There is an instance for every table region (so there can be more than one per RegionServer)
WALObserver: provides hooks for WAL (write-ahead log) operations. Runs in the context of a RegionServer (one per RegionServer)
MasterObserver: provides hooks for DDL-type operations: create, delete, modify (e.g. postDeleteTable()). It runs on the Master node
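As a concrete illustration of the RegionObserver hooks above, here is a minimal sketch against the same 0.9x-era coprocessor API the rest of these slides use (BaseRegionObserver and the List<KeyValue>-based preGet upcall). The class name and the injected "audit" family/qualifier are hypothetical; it implements the "add data to a Get call" scenario mentioned later in these slides.

import java.io.IOException;
import java.util.List;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Runs inside every region of the table it is loaded on (registration,
// e.g. via the table descriptor or hbase.coprocessor.region.classes,
// is not shown here).
public class AuditObserver extends BaseRegionObserver {
  @Override
  public void preGet(ObserverContext<RegionCoprocessorEnvironment> c,
      Get get, List<KeyValue> result) throws IOException {
    // Upcall executed by core HBase code before each Get is served.
    // Here we inject an extra key-value into the result that the
    // client will receive ("add data to a Get call").
    result.add(new KeyValue(get.getRow(), Bytes.toBytes("audit"),
        Bytes.toBytes("seen"), Bytes.toBytes("true")));
  }
}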
Endpoint coprocessor
Resembles stored procedures from a conventional database
It comes as an extension to the HBase RPC protocol
Invoked by the user at any time from the client
The execution is passed through the HTableInterface, which runs the code on the cluster and collects the results
The execution result is returned to the client

Observer implementation
(diagram omitted; source: blogs.apache.org)

Observer coprocessor
An Observer sits between the client and HBase; it behaves like a trigger in a conventional database, so user code runs when certain events occur (e.g. adding data to the result of a Get call)
There can be more than one observer coprocessor at once
Observers can be set to run with certain priorities
There are 3 types of observer coprocessors (interfaces provided): RegionObserver, WALObserver, MasterObserver

Endpoint implementation
Steps to build an endpoint:
1. Create a new protocol interface that extends CoprocessorProtocol
2. Implement the new Endpoint interface (this will be loaded and executed in the region context)
3. Invoke the Endpoint from the client side; here we have 2 APIs:
HTableInterface.coprocessorProxy(Class<T> protocol, byte[] row): executed on a single region (the one holding the given row)
HTableInterface.coprocessorExec(Class<T> protocol, byte[] startKey, byte[] endKey, Batch.Call<T, R> callable): executed over a range of regions

Aggregate Endpoint example (a sketch follows below)
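Since the original aggregate example is not reproduced in this text, the following is a minimal sketch of the three steps under the 0.9x-era API named above (CoprocessorProtocol, BaseEndpointCoprocessor, coprocessorExec), loosely following the Apache blog post cited earlier; the protocol name ColumnAggregationProtocol, the table "demo_table", and the "cf"/"amount" column are illustrative.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.Batch;
import org.apache.hadoop.hbase.coprocessor.BaseEndpointCoprocessor;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;
import org.apache.hadoop.hbase.regionserver.InternalScanner;
import org.apache.hadoop.hbase.util.Bytes;

// Step 1: the protocol interface, shared by client and region server.
interface ColumnAggregationProtocol extends CoprocessorProtocol {
  long sum(byte[] family, byte[] qualifier) throws IOException;
}

// Step 2: the endpoint implementation, loaded into each region.
class ColumnAggregationEndpoint extends BaseEndpointCoprocessor
    implements ColumnAggregationProtocol {
  public long sum(byte[] family, byte[] qualifier) throws IOException {
    Scan scan = new Scan();
    scan.addColumn(family, qualifier);
    long sumResult = 0;
    // Scan only the data hosted by this region.
    InternalScanner scanner = ((RegionCoprocessorEnvironment) getEnvironment())
        .getRegion().getScanner(scan);
    try {
      List<KeyValue> curVals = new ArrayList<KeyValue>();
      boolean hasMore;
      do {
        curVals.clear();
        hasMore = scanner.next(curVals);
        for (KeyValue kv : curVals) {
          sumResult += Bytes.toLong(kv.getValue());
        }
      } while (hasMore);
    } finally {
      scanner.close();
    }
    return sumResult; // partial sum for this region
  }
}

// Step 3: client-side invocation over a range of regions.
class SumClient {
  public static void main(String[] args) throws Throwable {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "demo_table");
    // null start/end keys = run the endpoint on every region of the table.
    Map<byte[], Long> partialSums = table.coprocessorExec(
        ColumnAggregationProtocol.class, null, null,
        new Batch.Call<ColumnAggregationProtocol, Long>() {
          public Long call(ColumnAggregationProtocol instance)
              throws IOException {
            return instance.sum(Bytes.toBytes("cf"), Bytes.toBytes("amount"));
          }
        });
    long total = 0;
    for (Long partial : partialSums.values()) {
      total += partial; // the client combines the per-region results
    }
    System.out.println("sum = " + total);
    table.close();
  }
}

For a single-region call, HTableInterface.coprocessorProxy(ColumnAggregationProtocol.class, rowKey) would instead return a local proxy whose sum() executes only on the region holding rowKey.

Thank you!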