In my last post I described how we can define our first Apache Falcon process; in this post I will describe how we can schedule (execute) that process.
As a first step towards scheduling this process, we need to submit our cluster to Falcon using the command below:
$ falcon entity -type cluster -submit -file test-primary-cluster.xml
falcon/default/Submit successful (cluster) test-primary-cluster
We can verify all the clusters registered with Falcon using the command below:
$ falcon entity -type cluster -list
(CLUSTER) test-primary-cluster
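We can also ask Falcon to print back the definition it stored for an entity, which is a quick way to confirm that the XML we submitted is the one Falcon registered:
$ falcon entity -type cluster -name test-primary-cluster -definition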
After the cluster is submitted, we need to submit our feed and process entities:
$ falcon entity -type feed -submit -file feed-01-trigger.xml
falcon/default/Submit successful (feed) feed-01-trigger
$ falcon entity -type process -submit -file process-01.xml
falcon/default/Submit successful (process) process-01
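As with the cluster, we can list the registered feeds and processes to verify both submissions; the output follows the same (TYPE) name format we saw above:
$ falcon entity -type feed -list
$ falcon entity -type process -list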
Now we have to upload the Oozie workflow referenced by the Falcon process to HDFS:
$ hadoop fs -mkdir -p /tmp/oozie_workflow
$ hadoop fs -put workflow.xml /tmp/oozie_workflow/
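The workflow.xml itself was covered in my last post; for reference, a minimal shell-action workflow along the lines below would be enough for this demo. The workflow name, the action name, and the demo.sh script are placeholders here, assuming a simple script that writes a line to /tmp/demo.out:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="demo-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- demo.sh is a placeholder script that writes to /tmp/demo.out -->
            <exec>demo.sh</exec>
            <file>/tmp/oozie_workflow/demo.sh#demo.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
If you go this route, the script has to be uploaded next to the workflow as well:
$ hadoop fs -put demo.sh /tmp/oozie_workflow/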
After submitting all the Falcon entities, we need to schedule our feed and process:
$ falcon entity -type feed -name feed-01-trigger -schedule
$ falcon entity -type process -name process-01 -schedule
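We can confirm that both entities have moved from the submitted to the scheduled state using the -status switch of the entity CLI:
$ falcon entity -type feed -name feed-01-trigger -status
$ falcon entity -type process -name process-01 -status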
Once the Falcon process is scheduled, the instances of this process will be in the waiting state, since the feed file is not yet present on HDFS, as shown in the screenshot below:
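If you prefer the command line to the Falcon UI, the instance CLI shows the same information; the start/end window below is an assumption based on the feed instance we are about to create:
$ falcon instance -type process -name process-01 -status -start 2015-09-07T00:00Z -end 2015-09-08T00:00Z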
Let us create one instance of this feed file; the directory name follows the date pattern in the feed's location path:
$ hadoop fs -mkdir -p /tmp/feed-01/2015-09-07
On refreshing the Falcon UI for this process, we can see that the first instance of the process has been triggered and is in the running state; after some time the status changes to successful, as shown in the screenshot below:
We can also check the /tmp/demo.out file to confirm that our script executed successfully.
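Assuming the script writes its output to HDFS, a quick look at the file is enough; if it writes to the local filesystem of the worker node instead, use a plain cat on that node:
$ hadoop fs -cat /tmp/demo.out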
If you want to delete these Falcon feed and process entities, execute the commands below:
$ falcon entity -type process -name process-01 -delete
$ falcon entity -type feed -name feed-01-trigger -delete
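Note that the order matters here: Falcon will not delete an entity that is still referenced by another one, so the process goes first, then the feed, and finally the cluster if you no longer need it:
$ falcon entity -type cluster -name test-primary-cluster -delete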