Logstash, ElasticSearch and Kibana Integration for Clickstream Weblog Ingestion
In this post I will showcase how to build a quick and easy demo application for clickstream weblog ingestion, search and visualization. We will use Logstash to ingest the logs, ElasticSearch to store them and Kibana to build a dashboard on top.
For the clickstream weblog I am using log data from the ECML/PKDD 2005 Discovery Challenge. You can download the complete weblogs after registering there.
These weblogs are delimited by semicolons (;) and contain the following fields, in order:
- shop_id
- unixtime
- client ip
- session
- visited page
- referrer
Here are some sample log lines:
15;1075658406;212.96.166.162;052ecba084545d8348806f087b6e09bb;/ls/?&id=77&view=2,6,31&pozice=20;http://www.shop5.cz/ls/?id=77
12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
17;1075658408;194.108.232.234;be0970125c4eb3ee4fc380be05b3c58f;/ls/?id=155&sort=45;http://www.shop7.cz/ls/?id=155&sort=45
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/
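To make the field layout concrete, here is a minimal Python sketch that splits one log line into the six fields listed above. The function name parse_line and the choice to cap the number of splits (so that a stray semicolon inside the referrer would not break the record) are my own assumptions, not something the dataset documentation specifies.

```python
from typing import Dict

# Field names in the order given above; "page" is the visited page.
FIELDS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_line(line: str) -> Dict[str, str]:
    """Split one semicolon-delimited clickstream line into named fields."""
    # Split at most 5 times so everything after the fifth ';' stays in referrer
    # (an assumption about how to treat semicolons inside the last field).
    parts = line.rstrip("\r\n").split(";", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


if __name__ == "__main__":
    sample = ("12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;"
              "/ct/?c=153;http://www.shop3.cz/")
    print(parse_line(sample))
```

All values come back as strings; converting unixtime or shop_id to numbers is left to a later step.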
For this demo we need to create a Logstash configuration file (let's name it clickstream.conf) that specifies inputs, filters and outputs.
The clickstream.conf file looks like this:
input {
  file {
    # path for clickstream log
    path => "/home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain {charset => "ISO-8859-1"}
  }
}
filter {
  csv {
    # define columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get visited page and page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # as we are getting unixtime field in epoch seconds we will convert it to normal timestamp
    match => [ "unixtime", "UNIX" ]
  }
  geoip {
    # this will convert ip to longitude-latitude using GeoLiteCity database from Maxmind
    source => "client_ip"
    fields => ["latitude","longitude"]
    target => "geoip"
    add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
    add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
  }
  mutate {
    # this will convert geoip.coordinates to float values
    convert => [ "[geoip][coordinates]", "float" ]
  }
}
output {
  # store output in local elasticsearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}
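To show what the grok and date filters do to a single event, here is a rough Python sketch of the same two transformations. It is only an approximation under my own assumptions: the helper names split_page, epoch_to_timestamp and daily_index are hypothetical, and splitting at the first "?" is a simplification of the URIPATH and URIPARAM grok patterns, which are regular expressions with stricter character rules.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def split_page(page: str) -> Tuple[str, Optional[str]]:
    """Approximate the grok step: path before the first '?', params from '?' on."""
    path, sep, query = page.partition("?")
    # Keep the leading '?' in the params, and use None when there is no query.
    return path, (sep + query) if sep else None


def epoch_to_timestamp(unixtime: str) -> str:
    """Approximate the date step: epoch seconds to an ISO-8601 UTC timestamp."""
    dt = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def daily_index(unixtime: str) -> str:
    """Name of a per-day index in the logstash-YYYY.MM.DD style, based on event time."""
    dt = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return dt.strftime("logstash-%Y.%m.%d")


if __name__ == "__main__":
    print(split_page("/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez"))
    print(epoch_to_timestamp("1075658407"))
    print(daily_index("1075658407"))
```

Because the date filter sets the event timestamp from the log itself, events land in an index named after the day the click happened (2004 here), not the day the file was ingested.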
To start the Logstash agent, we run the following command:
java -jar logstash-1.2.2-flatjar.jar agent -f clickstream.conf
A sample record in ElasticSearch looks like this:
{
  _index: logstash-2004.02.01
  _type: logs
  _id: I1N0MboUR0O1O3RZ-qXqnw
  _version: 1
  _score: 1
  _source: {
    message: [
      14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
    ]
    @timestamp: 2004-02-01T18:00:07.000Z
    @version: 1
    type: weblog
    host: HMECL000315.happiestminds.com
    path: /home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log
    shop_id: 14
    unixtime: 1075658407
    client_ip: 80.188.85.210
    session: f07f39ec63abf67f965684f3fa5729c4
    referrer: http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
    page_visited: /findp/
    page_params: ?&id=63&view=1,2,3,14,20,15&p_14=nerez
    geoip: {
      latitude: 50.08330000000001
      longitude: 14.466700000000003
      coordinates: [
        14.466700000000003
        50.08330000000001
      ]
    }
  }
}
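The coordinates array in this record is worth a closer look: the two add_field lines in the geoip filter append longitude first and latitude second, and the mutate filter then turns the resulting strings into floats. A small Python sketch of that behaviour, assuming a hypothetical helper build_coordinates and a plain dict standing in for the geoip lookup result:

```python
from typing import Dict, List


def build_coordinates(geoip: Dict[str, float]) -> List[float]:
    """Mimic add_field (twice) followed by mutate/convert on geoip.coordinates."""
    # add_field interpolates the values as strings and appends them in order:
    # longitude first, then latitude.
    as_strings = [str(geoip["longitude"]), str(geoip["latitude"])]
    # mutate { convert => ... "float" } then converts each element to a float.
    return [float(value) for value in as_strings]


if __name__ == "__main__":
    print(build_coordinates({"latitude": 50.0833, "longitude": 14.4667}))
```

The longitude-first order follows the GeoJSON convention, which as far as I know is also what Kibana's map panel expects for a coordinates field.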
So we have parsed a complex log message into simpler components: the unixtime field has been converted to a datetime, the client IP to latitude and longitude, and the visited page has been extracted along with its parameters.
Now, using Kibana, we can quickly build a dashboard with panels over these fields.