(Updated) Java Example Code using HBase Data Model Operations

This is the updated version to previous post on 2012: Java Example Code using HBase Data Model Operations

This time we will use latest stable Hbase (1.2.5), IntelliJ and Maven.

  • My local test environment: Virtual Box of Ubuntu 16 Desktop version.
  • Install HBase: https://hbase.apache.org/book.html#quickstart
    • Note: I use stable version 1.2.5 instead of 2.0.0
    • Open a terminal
      1. start HBase in terminal: bin/start-hbase.sh
      2. start HBase shell: bin/hbase shell
      3. create a table: create ‘myLittleHBaseTable’, ‘myLittleFamily’
  • To install JDK: sudo apt install openjdk-8-jre-headless
  • To install Maven: sudo apt install maven
  • Create a maven project (replace “autofei” to whatever you like): mvn archetype:generate -DgroupId=autofei -DartifactId=autofei -DarchetypeArtifactId=maven-archetype-quickstart
  • Download IntelliJ community version

Open the project in IntelliJ and add following dependencies to the pom.xml:


Enable auto-import function for IntelliJ to easy your life 🙂

The example code in the old post will still work but with some deprecation warning.


Logstash grok sample

Of cause, the best place is the official guide: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

Instead of reinventing the wheel, check out existing patterns: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

You can also define your own pattern, here is the online tool for you to test it: http://grokdebug.herokuapp.com/

Here is an example:

  • log record: 09:33:45,416 (metrics-logger-reporter-1-thread-1) type=GAUGE, name=notifications.received, value=2
  • pattern: (?<logtime>%{HOUR}:%{MINUTE}:%{SECOND}) (?<logthread>[()a-zA-Z0-9-]+) type=(?<type>[A-Z]+), name=(?<name>[A-Za-z.]*), value=(?<value>[0-9]+)

More example: https://www.elastic.co/guide/en/logstash/current/config-examples.html

Have fun to play it!


Visualize Geo location of log using Elasticsearch + Logstash + Kibana

Visualize Geo location of log using Elasticsearch + Logstash + Kibana

Here is a visualization of an access log based on the sample access log data.

So it looks pretty cool and if you have ELK stack in your local. It will take only a little time for you to achieve this.

Please first refer to this article: http://gro.solr.pl/elasticsearch-logstash-kibana-to-geo-identify-our-users/

If everything works fine for you, that is great! If the visualization doesn’t load, please continue your reading.

Here is the software version, just in case you want to know:

  • elasticsearch-5.1.1
  • kibana-5.1.1-darwin-x86_64
  • logstash-5.1.1

I guess you might get error:No Compatible Fields: The “logs_*” index pattern does not contain any of the following field types: geo_point

The reason is there is no template to match this index. But logstach load a default template to elasticsearch which actually contain the geo mapping.  In Kibana “Dev Tools”, inside Console, type: “GET /_template/” and you will see “logstach” contains “geoip” section. So make sure the output index has “logstash-” as the prefix.

Also, if you want to use the latest Geo IP data, instead of the preload one. You can download “GeoLite2-City.mmdb.gz” from here: http://dev.maxmind.com/geoip/geoip2/geolite2/

So finally, here is my logstach config file:

input {
  file {
    path => "path to your log, for example: ~/Downloads/Tools/log/apache/*.log"
    type => "apache"
    start_position => "beginning"

filter {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"

   geoip {
    source => "clientip"
    database => "path to your Geo IP data file, for example: ~/Downloads/Tools/GeoLite2-City.mmdb"

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-logs_%{+YYYY.MM.dd}"
    manage_template => true

Update [2017-01-13]

Dig a little more on this issue and here are some new founding. In the Kibana UI, it is looking for  Geohash -> geoip.location in the buckets.  (if you know how to change the config, please let me know, thanks!)

So you have to have that field in the index. Otherwise, the tile map can’t find any record. This explains why it will work with index “ligstash-” prefix. In logstach log, you can find this template:

[2017-01-13T09:46:35,015][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}


some thoughts on Microservice

Be in the Horzion project for months now, learn lots of new things and it might be a good time to document some learnings.

To switch from monolithic service to micro service means a total mightset change. The frist thing is to become a DevOps. You need to write/debug/test your code, build/deploy the code locally and remotely and monitor the service. This is very chanllenging in mature company because thses functions are isolated into different teams/organizations.

Here are some components in building microservice:

  • service register/discover
  • config management
  • (distributed) cache
  • log/alert/monitor
  • (distrobued) database
  • authentication for internal and external users

something need special handling:

  • how to handle exception: you can throws an exception in one service and hope another service catch it in old fasion.
  • how to pass data between services: if several services need to process the same request at different stage. if you need to merge the reply from couple services? or some service need the same intermedian data?
  • how to resolve service dependency: the business logic can be sync or async.
  • how to handle failover: if a service not available, will it retry and how?
  • how to document your service: this will impact how user can easily use the APIs

More to add later…