In this post I will go through the process of creating custom UDFs.
Difference between UDF and GenericUDF
Hive UDFs are written in Java. To create a Hive UDF you extend one of two classes: UDF or GenericUDF. GenericUDF is a bit more complex to develop than UDF, but it offers better performance and supports non-primitive types (such as arrays, maps and structs) as both input parameters and return types.
To write a custom UDF by extending GenericUDF we need to override 3 methods: initialize(), evaluate() and getDisplayString().
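For contrast, here is a minimal sketch of the simpler route, extending the plain UDF class. The class name SimpleUpperUDF and the upper-casing logic are assumptions made up for illustration. Note that the UDF base class is marked deprecated in newer Hive releases, so check the version you are targeting.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical example class; the name and behaviour are illustrative only.
public class SimpleUpperUDF extends UDF {

    // With the plain UDF class, Hive finds evaluate() by reflection and
    // matches it against the argument types used in the query.
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // propagate SQL NULL
        }
        return new Text(input.toString().toUpperCase());
    }
}
```

Compiling this needs the hive-exec and Hadoop jars on the classpath. There is no argument validation step here, which is part of what GenericUDF adds.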
initialize()
This method is called once per GenericUDF instance, before any call to evaluate(), to initialize the UDF. initialize() is used to validate the number and types of the arguments the UDF takes. It returns an ObjectInspector corresponding to the return type of the UDF.
evaluate()
This method is called once for every row of data being processed. The actual transformation/processing logic for each row goes here. It returns an object containing the result of that logic.
getDisplayString()
A simple method that returns the display string shown for the UDF when EXPLAIN is used.
Apart from these methods, we can also add these annotations:
- @UDFType(deterministic = true)
- @Description(name="my_udf", value="This will be the result returned by the describe function statement.", extended="This will be the result returned by the describe function extended statement.")
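Putting the three methods and the two annotations together, a GenericUDF could look roughly like the sketch below. The function name my_upper, the class name MyUpperGenericUDF and the upper-casing logic are all assumptions chosen for illustration. The Hive classes used are from the hive-exec and hive-serde jars, which must be on the classpath.

```java
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.UDFType;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;

// Hypothetical example UDF; names and logic are illustrative only.
@UDFType(deterministic = true)
@Description(name = "my_upper",
        value = "_FUNC_(str) - returns str in upper case",
        extended = "Example:\n  > SELECT _FUNC_('hive');\n  HIVE")
public class MyUpperGenericUDF extends GenericUDF {

    // Inspector for the single string argument, captured in initialize().
    private transient StringObjectInspector inputOI;

    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments)
            throws UDFArgumentException {
        // Validate the number of arguments.
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("my_upper takes exactly one argument");
        }
        // Validate the argument type.
        if (!(arguments[0] instanceof StringObjectInspector)) {
            throw new UDFArgumentTypeException(0, "my_upper expects a string argument");
        }
        inputOI = (StringObjectInspector) arguments[0];
        // Declare the return type: a plain Java String.
        return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
    }

    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // Called for each row; arguments are evaluated lazily via get().
        Object raw = arguments[0].get();
        if (raw == null) {
            return null;
        }
        String value = inputOI.getPrimitiveJavaObject(raw);
        return value == null ? null : value.toUpperCase();
    }

    @Override
    public String getDisplayString(String[] children) {
        // Text shown for this function call in EXPLAIN output.
        return "my_upper(" + String.join(", ", children) + ")";
    }
}
```

Keeping the input ObjectInspector in a field lets evaluate() read the value regardless of whether Hive passes a Java String or a Hadoop Text for that row.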
In my next post I will give an example of a GenericUDF for the latitude and longitude of a location.