Wednesday 16 October 2013

HBase Table MapReduce Basics

HBase Table MapReduce Basics

HBase Table MapReduce Basics

In this post I am going to explain how we can use HBase tables as source and target for MapReduce program.

For writing MapReduce on HBase tables we should follow below guidelines:

1. Mapper Class

  • Mapper class should extend TableMapper.
  • Input key to mapper is ImmutableBytesWritable object which has rowkey of HBase table.
  • Input value is Result object (org.apache.hadoop.hbase.client.Result) which contains the requested column families (define the required columns/column families in Scan) from HBase table.

2. Reducer Class

  • Reducer class should extend TableReducer.
  • Output key is NULL.
  • Output value is Put (org.apache.hadoop.hbase.client.Put) object.

3. MapReduce Driver

  • Configure a Scan (org.apache.hadoop.hbase.client.Scan) object. For this scan object we can define many parameters like:
    • Start row.
    • Stop row.
    • Row filter.
    • Column Familiy(s) to retrieve.
    • Column(s) to retrieve.
  • Define input table using TableMapReduceUtil.initTableMapperJob. In this method we can define input table, Mapper, MapOutputKey, MapOutputValue, etc.
  • Define output table using TableMapReduceUtil.initTableReducerJob. In this method we can define output table, Reducer and Partitioner.
In my next post I shall give an example MapReduce program using HBase tables as input and output.

No comments:

Post a Comment