Tuesday 30 September 2014

Hive UDF to get Latitude and Longitude

In my previous post I explained about Hive GenericUDF.
In this post I will give an example of Hive GenericUDF to get Latitude and Longitude of a given location using Google Geocoding API. Lets call this Hive function as GeoEncodeUDF. GeoEncodeUDF function takes a String location and returns an array of Float containing latitude and longitude.


For obtaining latitude and longitude information I am using Google geocode API which is available here http://maps.googleapis.com/maps/api/geocode/json?address=<address>, this returns a JSON objects containg matching places and their latitude and longitude. This might return multiple address but for sake of simplicity I am taking the first address's latitude and longitude. I have created a helper method getLatLng in class GeoLatLng which takes location string and returns latitude and longitude in an array of float. This class is given below -
The GenericUDF is GeoEncodeUDF
I have overwritten initialize(), evaluate() and getDisplayString() methods which I have already described in my previous post.

Now to use this UDF in Hive we need to create a jar file of this UDF and add it to Hive. The commands to add this UDF to Hive are -
ADD JAR /path/to/HiveUDF.jar;
CREATE TEMPORARY FUNCTION geo_points AS 'com.rishav.hadoop.hive.ql.udf.generic.GeoEncodeUDF';
Now we can use geo_points function on any table having address string like this -
hive> select geo_points("india") from test.x limit 1;
[20.593683,78.96288]
This HQL will return an array containing lat-lng, to get them as separate columns use -
hive> select latlng[0], latlng[1] FROM (select geo_points("india") as latlng from test.x) tmp limit 1;
20.593683    78.96288

No comments:

Post a Comment