HBase Table MapReduce Basics
In this post I am going to explain how we can use HBase tables as the source and target of a MapReduce program.
When writing MapReduce against HBase tables, we should follow these guidelines:
1. Mapper Class
- The Mapper class should extend TableMapper.
- The input key to the mapper is an ImmutableBytesWritable object holding the rowkey of the HBase table.
- The input value is a Result object (org.apache.hadoop.hbase.client.Result) containing the requested columns/column families (define the required columns/column families in the Scan).
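The mapper guidelines above can be sketched as follows. This is a minimal illustration, not taken from a specific job: the table layout (column family "cf", column "count") and the class name are hypothetical, and the map output types (Text/IntWritable) are just one common choice.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// TableMapper<KEYOUT, VALUEOUT>: the input key/value types are fixed by the
// framework as ImmutableBytesWritable (rowkey) and Result (the row's cells).
public class MyTableMapper extends TableMapper<Text, IntWritable> {

    private static final byte[] CF  = Bytes.toBytes("cf");     // hypothetical column family
    private static final byte[] COL = Bytes.toBytes("count");  // hypothetical column

    @Override
    protected void map(ImmutableBytesWritable rowKey, Result result, Context context)
            throws IOException, InterruptedException {
        // Result only contains the cells requested by the Scan in the driver.
        byte[] value = result.getValue(CF, COL);
        if (value != null) {
            context.write(new Text(Bytes.toString(rowKey.get())),
                          new IntWritable(Integer.parseInt(Bytes.toString(value))));
        }
    }
}
```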
2. Reducer Class
- The Reducer class should extend TableReducer.
- The output key is ignored by the framework, so NULL can be written.
- The output value is a Put object (org.apache.hadoop.hbase.client.Put); its rowkey determines the row written to the output table.
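A matching reducer sketch, again with hypothetical names: it sums the values for a key and emits a single Put per output row. Note that Put.addColumn is the method name in HBase 1.0+; older releases used Put.add.

```java
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

// TableReducer<KEYIN, VALUEIN, KEYOUT>: the output value type is fixed (a
// Mutation such as Put); the output key is not used when writing to HBase.
public class MyTableReducer extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

    private static final byte[] CF  = Bytes.toBytes("cf");     // hypothetical column family
    private static final byte[] COL = Bytes.toBytes("total");  // hypothetical column

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        // The Put carries the rowkey; the output key can simply be null.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(CF, COL, Bytes.toBytes(sum));
        context.write(null, put);
    }
}
```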
3. MapReduce Driver
- Configure a Scan object (org.apache.hadoop.hbase.client.Scan). On this Scan we can set many parameters, such as:
- Start row.
- Stop row.
- Row filter.
- Column family(s) to retrieve.
- Column(s) to retrieve.
- Define the input table using TableMapReduceUtil.initTableMapperJob. This method takes the input table, the Scan, the Mapper class, the map output key/value classes, etc.
- Define the output table using TableMapReduceUtil.initTableReducerJob. This method takes the output table, the Reducer class, and optionally a Partitioner.
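Putting the driver steps together, a sketch might look like the following. The table names, row-key range, and filter are hypothetical placeholders; setStartRow/setStopRow are the classic Scan setters (HBase 2.x renamed them withStartRow/withStopRow). The caching settings are a common tuning for full-table MapReduce scans, not a requirement.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class MyDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-table-mr");
        job.setJarByClass(MyDriver.class);

        // Configure the Scan: start/stop row, row filter, families and columns.
        Scan scan = new Scan();
        scan.setStartRow(Bytes.toBytes("row-0001"));              // start row (inclusive)
        scan.setStopRow(Bytes.toBytes("row-1000"));               // stop row (exclusive)
        scan.setFilter(new PrefixFilter(Bytes.toBytes("row-"))); // row filter
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("count")); // single column
        scan.setCaching(500);        // fetch rows in batches for the mappers
        scan.setCacheBlocks(false);  // don't fill the block cache during a full scan

        // Input side: table, Scan, Mapper, map output key/value classes.
        TableMapReduceUtil.initTableMapperJob(
                "input_table",        // hypothetical source table
                scan,
                MyTableMapper.class,
                Text.class,
                IntWritable.class,
                job);

        // Output side: table and Reducer (an overload also accepts a Partitioner).
        TableMapReduceUtil.initTableReducerJob(
                "output_table",       // hypothetical target table
                MyTableReducer.class,
                job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```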
In my next post I will walk through a complete example MapReduce program that uses HBase tables as both input and output.