To convert CSV data to Avro using Hive, follow these steps:
- Create a Hive table stored as textfile, specifying your CSV delimiter.
- Load the CSV file into that table using the "load data" command.
- Create another Hive table using AvroSerDe.
- Insert the data from the first table into the new Avro table using the "insert overwrite" command.
To demonstrate this, I will use the data below (student.csv):
0,38,91
0,65,28
0,78,16
1,34,96
1,78,14
1,11,43
Now execute the queries below in Hive:
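The four statements map one-to-one onto the steps listed at the top. What follows is a minimal sketch rather than the exact queries: the database name test and the table name avro_table are taken from the warehouse path used later in this post, while the staging table name csv_table, the column names (student_id, subject_id, marks), the Avro namespace, and the local path to student.csv are assumptions made for illustration.

```sql
-- Assumed setup: the warehouse path used later implies a database named test.
CREATE DATABASE IF NOT EXISTS test;
USE test;

-- Step 1: plain-text staging table with a comma delimiter.
-- Column names are assumptions (student.csv has no header row).
CREATE TABLE csv_table (
  student_id INT,
  subject_id INT,
  marks      INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Step 2: load the CSV file (replace the placeholder path with your own).
LOAD DATA LOCAL INPATH '/path/to/student.csv' OVERWRITE INTO TABLE csv_table;

-- Step 3: Avro-backed table; the columns are derived from the Avro schema.
CREATE TABLE avro_table
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "namespace": "com.example",
  "name": "student",
  "type": "record",
  "fields": [
    {"name": "student_id", "type": "int"},
    {"name": "subject_id", "type": "int"},
    {"name": "marks",      "type": "int"}
  ]
}');

-- Step 4: copy the rows, so Hive rewrites them as Avro.
INSERT OVERWRITE TABLE avro_table
SELECT student_id, subject_id, marks FROM csv_table;
```

On Hive 0.14 and later, step 3 can probably be shortened to a regular column list followed by STORED AS AVRO; the longer SerDe form is shown here because it also works on older releases.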
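The data is now available in Avro format in the Hive warehouse folder. To dump it to the local file system, use the command below. Note that this produces a valid Avro file only if the table directory contains a single data file, because Avro container files cannot be joined by plain concatenation.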
hadoop fs -cat /path/to/warehouse/test.db/avro_table/* > student.avro
If you want JSON data from this Avro file, you can use the avro-tools command:
java -jar avro-tools-1.7.5.jar tojson student.avro > student.json
So, with just four HiveQL statements, we can easily convert CSV to Avro, and from there to JSON.
Nice post, Rohit. With Avro, can you change the schema after the table is created? Let's say I get a new column in the existing data source; can I update the Avro schema?
Not very sure if we can alter avro_table. You can try this out.