Rule-based, Machine Learning, or a Hybrid?

Fraud detection and prevention is always a battlefield: fraudsters keep finding new ways to game the system. On the other hand, that makes it a big business opportunity. Let's see how crowded the market is:

[Image: notable players in the fraud detection and prevention market]

This is a hand-picked list of some notable players in this market. Here are a couple of interesting observations:

  • Machine learning (or AI) is a must-have feature of these products.
  • Big data is still a treasure chest with big potential.
  • Rule-based systems are still popular (are they old-fashioned?).
  • A hybrid system combining rules and AI looks promising.
  • Visualization can be helpful and user-friendly for customers.

Anyway, here are two nice summaries from seon.io and unfraud.com:

[Screenshots: summaries from seon.io and unfraud.com]

Fraud detection for eCommerce

If you work in e-commerce, besides making sure online payments go smoothly, another critical task is dealing with fraud! Fraud shares a lot of characteristics with (information) security: the good guys and the bad guys fight endlessly, just like in Marvel comics, superheroes vs. supervillains.

[Image credit: Marvel Comics]

Most companies either build an in-house solution or use one of the solutions available on the market. So what is the core of a fraud management system?

Before answering this question, let's walk through a simple e-commerce checkout flow (in reality, it will be much more complex):

[Diagram: a simple e-commerce checkout flow]

In the last five years, I have worked on three fraud management systems. In plain words, we gather all possible "evidence" about fraudsters and try to convict the "crime". Translated into technical terms: a payment transaction comes to a rule engine, which runs a bunch of rules in real time and then outputs a decision such as reject, approve, or review. Based on the configuration and the business model (Merchant of Record or not, etc.), the payment system will take the corresponding action.

To build the rule engine, Drools is a popular choice. Of course, we can build a similar in-house version too. The key is the rules. Here is a list of some rules:

Any single item above could be a post of its own, but I hope you get the basic idea.
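
To make the rule-engine idea a bit more concrete, here is a minimal Java sketch. The two rules (an amount threshold that sends the transaction to review, and a card-BIN blacklist that rejects it), the thresholds, and the class names are made-up placeholders for illustration only, not rules from any of the systems or products mentioned above.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal rule-engine sketch: each rule inspects a transaction and may flag it.
public class SimpleRuleEngine {

    enum Decision { APPROVE, REVIEW, REJECT }

    // Placeholder transaction; real systems carry far more "evidence" (device, history, ...).
    static class Transaction {
        final String cardBin;
        final double amount;
        Transaction(String cardBin, double amount) {
            this.cardBin = cardBin;
            this.amount = amount;
        }
    }

    interface Rule {
        Decision evaluate(Transaction txn);   // null means "no opinion"
    }

    // Example rule 1: unusually large amounts go to manual review.
    static final Rule HIGH_AMOUNT = txn -> txn.amount > 1000 ? Decision.REVIEW : null;

    // Example rule 2: card BINs on a blacklist are rejected outright.
    static final Set<String> BLACKLISTED_BINS = new HashSet<>(Arrays.asList("123456", "999999"));
    static final Rule BLACKLISTED_BIN = txn ->
            BLACKLISTED_BINS.contains(txn.cardBin) ? Decision.REJECT : null;

    // Run all rules and keep the most severe decision (REJECT > REVIEW > APPROVE).
    static Decision decide(Transaction txn, List<Rule> rules) {
        Decision result = Decision.APPROVE;
        for (Rule rule : rules) {
            Decision d = rule.evaluate(txn);
            if (d == Decision.REJECT) return Decision.REJECT;
            if (d == Decision.REVIEW) result = Decision.REVIEW;
        }
        return result;
    }

    public static void main(String[] args) {
        Transaction txn = new Transaction("123456", 59.99);
        System.out.println(decide(txn, Arrays.asList(HIGH_AMOUNT, BLACKLISTED_BIN))); // prints REJECT
    }
}

In a real system the rules typically live in a rule language such as Drools (or an in-house DSL), are configurable at runtime, and draw on much richer evidence: device fingerprints, velocity counters, account history, and so on.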

(Updated) Java Example Code using HBase Data Model Operations

This is an updated version of my previous post from 2012: Java Example Code using HBase Data Model Operations

This time we will use the latest stable HBase (1.2.5), IntelliJ, and Maven.

  • My local test environment: a VirtualBox VM running Ubuntu 16 Desktop.
  • Install HBase: https://hbase.apache.org/book.html#quickstart
    • Note: I use stable version 1.2.5 instead of 2.0.0
    • Open a terminal
      1. start HBase in terminal: bin/start-hbase.sh
      2. start HBase shell: bin/hbase shell
      3. create a table: create 'myLittleHBaseTable', 'myLittleFamily'
  • To install the JDK (the headless JRE alone cannot compile): sudo apt install openjdk-8-jdk
  • To install Maven: sudo apt install maven
  • Create a Maven project (replace "autofei" with whatever you like): mvn archetype:generate -DgroupId=autofei -DartifactId=autofei -DarchetypeArtifactId=maven-archetype-quickstart
  • Download IntelliJ community version

Open the project in IntelliJ and add the following dependencies to the pom.xml:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.7.3</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase</artifactId>
  <version>1.2.5</version>
</dependency>
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>1.2.5</version>
</dependency>

Enable the auto-import feature in IntelliJ to make your life easier 🙂

The example code in the old post will still work, just with some deprecation warnings.
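
For reference, here is a minimal sketch of the basic data model operations (put, get, scan) against the table created above, written with the 1.x client API (ConnectionFactory/Table rather than the old, deprecated HTable style). The row key, qualifier, and value below are just placeholders.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("myLittleHBaseTable"))) {

            // Put: write one cell into the "myLittleFamily" column family
            Put put = new Put(Bytes.toBytes("myLittleRow"));
            put.addColumn(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"),
                          Bytes.toBytes("Some Value"));
            table.put(put);

            // Get: read the same row back
            Result result = table.get(new Get(Bytes.toBytes("myLittleRow")));
            byte[] value = result.getValue(Bytes.toBytes("myLittleFamily"),
                                           Bytes.toBytes("someQualifier"));
            System.out.println("Get: " + Bytes.toString(value));

            // Scan: iterate over all rows in the table
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result row : scanner) {
                    System.out.println("Scan: " + row);
                }
            }
        }
    }
}

If you run it against the local standalone HBase started above, you should see the value come back from the Get and at least one row from the Scan.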

 

Logstash grok sample

Of course, the best place to start is the official guide: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html

Instead of reinventing the wheel, check out existing patterns: https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns

You can also define your own patterns; here is an online tool to test them: http://grokdebug.herokuapp.com/

Here is an example:

  • log record: 09:33:45,416 (metrics-logger-reporter-1-thread-1) type=GAUGE, name=notifications.received, value=2
  • pattern: (?<logtime>%{HOUR}:%{MINUTE}:%{SECOND}) (?<logthread>[()a-zA-Z0-9-]+) type=(?<type>[A-Z]+), name=(?<name>[A-Za-z.]*), value=(?<value>[0-9]+)
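
Under the hood, a grok pattern is essentially a regular expression with named capture groups. As a rough illustration only (the %{HOUR}:%{MINUTE}:%{SECOND} part is replaced here by a simplified digit pattern, and the class name is made up), the same extraction in plain Java would look like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GrokLikeExample {
    public static void main(String[] args) {
        String log = "09:33:45,416 (metrics-logger-reporter-1-thread-1) "
                   + "type=GAUGE, name=notifications.received, value=2";

        // Roughly what the grok pattern expands to: a regex with named capture groups.
        Pattern p = Pattern.compile(
            "(?<logtime>\\d{2}:\\d{2}:\\d{2},\\d{3}) "
          + "(?<logthread>[()a-zA-Z0-9-]+) "
          + "type=(?<type>[A-Z]+), name=(?<name>[A-Za-z.]*), value=(?<value>[0-9]+)");

        Matcher m = p.matcher(log);
        if (m.matches()) {
            System.out.println("logtime   = " + m.group("logtime"));
            System.out.println("logthread = " + m.group("logthread"));
            System.out.println("type      = " + m.group("type"));
            System.out.println("name      = " + m.group("name"));
            System.out.println("value     = " + m.group("value"));
        }
    }
}

Running this against the sample record prints logtime = 09:33:45,416, type = GAUGE, name = notifications.received, value = 2 — the same fields the grok filter would add to the event.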

More examples: https://www.elastic.co/guide/en/logstash/current/config-examples.html

Have fun playing with it!

Reference:

Visualize Geo location of log using Elasticsearch + Logstash + Kibana

Visualize Geo location of log using Elasticsearch + Logstash + Kibana

Here is a visualization of an access log based on the sample access log data.

It looks pretty cool, and if you already have the ELK stack running locally, it will take you only a little time to achieve this.

Please first refer to this article: http://gro.solr.pl/elasticsearch-logstash-kibana-to-geo-identify-our-users/

If everything works fine for you, that is great! If the visualization doesn’t load, please continue your reading.

Here are the software versions, just in case you want to know:

  • elasticsearch-5.1.1
  • kibana-5.1.1-darwin-x86_64
  • logstash-5.1.1

I guess you might get this error: No Compatible Fields: The "logs_*" index pattern does not contain any of the following field types: geo_point

The reason is that there is no template matching this index. However, Logstash loads a default template into Elasticsearch which actually contains the geo mapping. In Kibana "Dev Tools", inside the Console, type "GET /_template/" and you will see that the "logstash" template contains a "geoip" section. So make sure the output index has "logstash-" as its prefix.

Also, if you want to use the latest GeoIP data instead of the preloaded one, you can download "GeoLite2-City.mmdb.gz" from here: http://dev.maxmind.com/geoip/geoip2/geolite2/

So finally, here is my Logstash config file:

input {
  file {
    path => "path to your log, for example: ~/Downloads/Tools/log/apache/*.log"
    type => "apache"
    start_position => "beginning"
  }
}

filter {
    grok {
      match => {
        "message" => "%{COMBINEDAPACHELOG}"
      }
    }

   geoip {
    source => "clientip"
    database => "path to your Geo IP data file, for example: ~/Downloads/Tools/GeoLite2-City.mmdb"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    # keep the "logstash-" prefix so the default template (with the geo_point mapping) applies
    index => "logstash-logs_%{+YYYY.MM.dd}"
    manage_template => true
  }
}

Update [2017-01-13]

I dug a little more into this issue and here are some new findings. In the Kibana UI, the tile map is looking for Geohash -> geoip.location in the buckets. (If you know how to change that config, please let me know, thanks!)

So you have to have that field in the index; otherwise, the tile map can't find any records. This explains why it only works with the "logstash-" index prefix. In the Logstash log, you can find this template:

[2017-01-13T09:46:35,015][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>50001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"_all"=>{"enabled"=>true, "norms"=>false}, "dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword"}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date", "include_in_all"=>false}, "@version"=>{"type"=>"keyword", "include_in_all"=>false}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
