Data Lab: HBase Table MapReduce Basics

Wednesday, 16 October 2013

HBase Table MapReduce Basics

In this post I explain how to use HBase tables as the source and target of a MapReduce program.

When writing MapReduce jobs on HBase tables, follow the guidelines below:

1. Mapper Class

The Mapper class should extend TableMapper.
The input key to the mapper is an ImmutableBytesWritable object, which holds the rowkey of the current HBase table row.
The input value is a Result object (org.apache.hadoop.hbase.client.Result), which contains the requested columns of that row. The required columns/column families are defined in the Scan.
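
To make the mapper rules above concrete, here is a minimal sketch, assuming the HBase 1.x/2.x client API. The class name CityCountMapper, the column family "info" and the qualifier "city" are made up for this illustration and do not come from any real schema. The mapper emits (city, 1) for every row that has an info:city value.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical mapper: emits (city, 1) for each row that has an info:city cell.
// TableMapper fixes the input types (ImmutableBytesWritable, Result);
// only the output key/value types are chosen here.
class CityCountMapper extends TableMapper<Text, IntWritable> {

    // Assumed column names, for illustration only.
    static final byte[] FAMILY = Bytes.toBytes("info");
    static final byte[] QUALIFIER = Bytes.toBytes("city");

    private static final IntWritable ONE = new IntWritable(1);
    private final Text city = new Text();

    // Returns the info:city value of a row, or null if the column is absent.
    static String cityOf(Result row) {
        byte[] value = row.getValue(FAMILY, QUALIFIER);
        return value == null ? null : Bytes.toString(value);
    }

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
            throws IOException, InterruptedException {
        // rowKey.copyBytes() would return the rowkey bytes; this example does not need them.
        String value = cityOf(row);
        if (value != null) {
            city.set(value);
            context.write(city, ONE);
        }
    }
}
```

Keeping the column lookup in a small static helper (cityOf) makes it possible to check that logic without starting a job.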

2. Reducer Class

The Reducer class should extend TableReducer.
The output key can be null, because TableOutputFormat ignores it.
The output value is a Put (org.apache.hadoop.hbase.client.Put) object; a Delete is also accepted.
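
A reducer following these rules could look roughly like the sketch below, again assuming the HBase 1.x/2.x client API (Put.addColumn; older releases used Put.add). The class name CityCountReducer and the target column stats:count are hypothetical. The reducer sums the integer values for each key and writes the total as one Put, with a null output key.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// Hypothetical reducer: sums the counts per key and stores the total in stats:count.
// The third type parameter is the output key type; the output value type is fixed by TableReducer.
class CityCountReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

    // Assumed target column, for illustration only.
    static final byte[] FAMILY = Bytes.toBytes("stats");
    static final byte[] QUALIFIER = Bytes.toBytes("count");

    // Builds the Put for one key; the key becomes the rowkey of the output table.
    // Note: Put rejects an empty rowkey, so keys are assumed to be non-empty.
    static Put toPut(String city, int count) {
        Put put = new Put(Bytes.toBytes(city));
        put.addColumn(FAMILY, QUALIFIER, Bytes.toBytes(count));
        return put;
    }

    @Override
    protected void reduce(Text city, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        // The output key is null; the rowkey is taken from the Put itself.
        context.write(null, toPut(city.toString(), sum));
    }
}
```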

3. MapReduce Driver

Configure a Scan (org.apache.hadoop.hbase.client.Scan) object. On this Scan we can set many parameters, such as:
- Start row.
- Stop row.
- Row filter.
- Column family(s) to retrieve.
- Column(s) to retrieve.
Define the input table using TableMapReduceUtil.initTableMapperJob. This method takes the input table, the Scan, the Mapper class, the map output key and value classes, and the Job.
Define the output table using TableMapReduceUtil.initTableReducerJob. This method takes the output table, the Reducer class, the Job and, optionally, a Partitioner.
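
The driver steps above can be sketched as follows. This is only an outline under some assumptions: it targets the HBase 2.x client API (Scan.withStartRow/withStopRow; older releases used setStartRow/setStopRow), and the class name TableJobDriver, the rowkeys "row-0100"/"row-0200", the column info:city and the job name are invented for the example. The mapper and reducer classes are passed in as parameters, so any TableMapper/TableReducer pair can be plugged in.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver helper: wires an input table, a Scan and an output table into one Job.
class TableJobDriver {

    // Builds the Scan that limits which rows and columns the mappers receive.
    static Scan buildScan() {
        Scan scan = new Scan();
        scan.withStartRow(Bytes.toBytes("row-0100")); // start row, inclusive (assumed key)
        scan.withStopRow(Bytes.toBytes("row-0200"));  // stop row, exclusive (assumed key)
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("city")); // single column (assumed)
        scan.setCaching(500);       // fetch more rows per RPC than the default
        scan.setCacheBlocks(false); // a full scan from MapReduce should not fill the block cache
        return scan;
    }

    // Configures the job; the caller still has to run it.
    static Job buildJob(Configuration conf, String inputTable, String outputTable,
            Class<? extends TableMapper> mapper, Class<?> mapOutputKey, Class<?> mapOutputValue,
            Class<? extends TableReducer> reducer) throws IOException {
        Job job = Job.getInstance(conf, "hbase-table-mapreduce");
        job.setJarByClass(mapper);
        // Input side: table, Scan, Mapper, map output key/value classes.
        TableMapReduceUtil.initTableMapperJob(
                inputTable, buildScan(), mapper, mapOutputKey, mapOutputValue, job);
        // Output side: table and Reducer (an overload also accepts a Partitioner class).
        TableMapReduceUtil.initTableReducerJob(outputTable, reducer, job);
        return job;
    }
}
```

A main method would typically call buildJob with HBaseConfiguration.create(), the two table names and the mapper/reducer classes, and then run the job with job.waitForCompletion(true).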

In my next post I shall give an example MapReduce program using HBase tables as input and output.
