We will look at three ways to use Avro for serialization/deserialization:
- Using Avro command line tools.
- Using Avro Java API without code generation.
- Using Avro Java API with code generation.
Sample Data
We will use the sample data below (StudentActivity.json):
Note that the JSON records are nested.
Defining a schema
Avro schemas are defined using JSON. The Avro schema for our sample data is defined as below (StudentActivity.avsc):
1. Serialization/Deserialization using Avro command line tools
Avro provides a jar file named avro-tools-<version>.jar, which provides many command line tools as listed below:
To convert the JSON sample data to Avro binary format, use the "fromjson" option; to get JSON data back from Avro files, use the "tojson" option.
Command for serializing json
Without any compression
java -jar avro-tools-1.7.5.jar fromjson --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.avro
With snappy compression
java -jar avro-tools-1.7.5.jar fromjson --codec snappy --schema-file StudentActivity.avsc StudentActivity.json > StudentActivity.snappy.avro
Command for deserializing to json
The same command is used for deserializing both compressed and uncompressed data:
java -jar avro-tools-1.7.5.jar tojson StudentActivity.avro
java -jar avro-tools-1.7.5.jar tojson StudentActivity.snappy.avro
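The same tojson command works on both files because an Avro data file records its compression codec, along with its schema, in the file header. The following is a minimal sketch of reading that header with only the JDK, assuming the container-file layout described in the Avro specification (the magic bytes "Obj" followed by 0x01, then a metadata map with the keys avro.schema and avro.codec). The class name AvroHeaderPeek and its helper methods are hypothetical, written for illustration; real code would normally go through the Avro library (for example DataFileReader) instead.

```java
import java.io.BufferedInputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of Avro): peeks at the header of an Avro data file.
public class AvroHeaderPeek {

    // Reads one Avro "long": a variable-length integer with zigzag encoding.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        int shift = 0;
        while (true) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated variable-length integer");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break;          // high bit clear: last byte
            shift += 7;
            if (shift > 63) throw new IOException("variable-length integer too long");
        }
        return (raw >>> 1) ^ -(raw & 1);         // undo zigzag encoding
    }

    // Reads exactly len bytes or fails.
    static byte[] readExactly(InputStream in, int len) throws IOException {
        byte[] buf = new byte[len];
        int off = 0;
        while (off < len) {
            int n = in.read(buf, off, len - off);
            if (n < 0) throw new EOFException("unexpected end of file");
            off += n;
        }
        return buf;
    }

    // Reads a length-prefixed byte sequence and decodes it as UTF-8 text.
    static String readText(InputStream in) throws IOException {
        int len = (int) readLong(in);
        return new String(readExactly(in, len), StandardCharsets.UTF_8);
    }

    // Checks the magic bytes and returns the header metadata map.
    static Map<String, String> readHeaderMetadata(InputStream in) throws IOException {
        byte[] magic = readExactly(in, 4);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro data file");
        }
        Map<String, String> meta = new LinkedHashMap<>();
        // A map is written as blocks of key/value pairs; a zero count ends it.
        for (long count = readLong(in); count != 0; count = readLong(in)) {
            if (count < 0) {
                count = -count;
                readLong(in);                    // block size in bytes, not needed here
            }
            for (long i = 0; i < count; i++) {
                String key = readText(in);
                String value = readText(in);
                meta.put(key, value);
            }
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            Map<String, String> meta = readHeaderMetadata(in);
            // A missing avro.codec entry means the data is uncompressed ("null" codec).
            System.out.println("codec:  " + meta.getOrDefault("avro.codec", "null"));
            System.out.println("schema: " + meta.get("avro.schema"));
        }
    }
}
```

Running it as `java AvroHeaderPeek StudentActivity.snappy.avro` should report the codec chosen when the file was written, which is the information a reader needs in order to decompress the data blocks without being told the codec on the command line.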
As the Avro data file also contains the schema, we can retrieve it using this command:
java -jar avro-tools-1.7.5.jar getschema StudentActivity.avro
java -jar avro-tools-1.7.5.jar getschema StudentActivity.snappy.avro
In our next post we will use the Avro Java API for serialization/deserialization.