Install and run Mahout on a single Linux box

This post shows how to install and run Mahout on a stand-alone Linux box.

Prerequisites for Building Mahout

  • Java JDK >=1.6
  • Maven
  • SVN

Steps:

  • svn co http://svn.apache.org/repos/asf/mahout/trunk
  • change to the checked-out directory
  • mvn install
  • change to the core directory
  • mvn compile
  • mvn install
  • change to the examples directory
  • mvn compile
  • mvn install

Download the test data from http://www.grouplens.org/node/73 and choose the “MovieLens 1M” data set.
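
For orientation: the README shipped with MovieLens 1M describes each line of ratings.dat as UserID::MovieID::Rating::Timestamp. The plain-Java sketch below only illustrates that layout by parsing a single line. The class name MovieLensLine and the sample line are illustrative and not part of Mahout; the GroupLens example reads the file by itself, so this code is not needed to run it.

```java
/** Hypothetical helper (not part of Mahout): one parsed line of ratings.dat. */
final class MovieLensLine {
    final long userId;
    final long movieId;
    final float rating;
    final long timestamp;

    private MovieLensLine(long userId, long movieId, float rating, long timestamp) {
        this.userId = userId;
        this.movieId = movieId;
        this.rating = rating;
        this.timestamp = timestamp;
    }

    /** Parses a line of the form UserID::MovieID::Rating::Timestamp. */
    static MovieLensLine parse(String line) {
        String[] fields = line.trim().split("::");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields: " + line);
        }
        return new MovieLensLine(
                Long.parseLong(fields[0]),
                Long.parseLong(fields[1]),
                Float.parseFloat(fields[2]),
                Long.parseLong(fields[3]));
    }

    /** Renders the record as a comma-separated line: userId,movieId,rating. */
    String toCsv() {
        return userId + "," + movieId + "," + rating;
    }

    public static void main(String[] args) {
        // Illustrative sample line in the documented layout.
        MovieLensLine line = parse("1::1193::5::978300760");
        System.out.println(line.toCsv());
    }
}
```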

Run the test example

Note: replace the test data file path with your own.

  • mvn -e exec:java -Dexec.mainClass="org.apache.mahout.cf.taste.example.grouplens.GroupLensRecommenderEvaluatorRunner" -Dexec.args="-i /home/hduser/trunk/examples/ml-1m/ratings.dat"
    + Error stacktraces are turned on.
    [INFO] Scanning for projects...
    [INFO] Searching repository for plugin with prefix: 'exec'.
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Mahout Examples
    [INFO]    task-segment: [exec:java]
    [INFO] ------------------------------------------------------------------------
    [INFO] Preparing exec:java
    [INFO] No goals needed for project - skipping
    [INFO] [exec:java {execution: default-cli}]
    12/03/28 14:08:33 INFO file.FileDataModel: Creating FileDataModel for file /tmp/ratings.txt
    12/03/28 14:08:33 INFO file.FileDataModel: Reading file info...
    12/03/28 14:08:34 INFO file.FileDataModel: Processed 1000000 lines
    12/03/28 14:08:34 INFO file.FileDataModel: Read lines: 1000209
    12/03/28 14:08:35 INFO model.GenericDataModel: Processed 6040 users
    12/03/28 14:08:35 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation using 0.9 of GroupLensDataModel
    12/03/28 14:08:35 INFO model.GenericDataModel: Processed 1753 users
    12/03/28 14:08:36 INFO slopeone.MemoryDiffStorage: Building average diffs...
    12/03/28 14:09:36 INFO eval.AbstractDifferenceRecommenderEvaluator: Beginning evaluation of 1719 users
    12/03/28 14:09:36 INFO eval.AbstractDifferenceRecommenderEvaluator: Starting timing of 1719 tasks in 1 threads
    12/03/28 14:09:36 INFO eval.StatsCallable: Average time per recommendation: 343ms
    12/03/28 14:09:36 INFO eval.StatsCallable: Approximate memory used: 448MB / 798MB
    12/03/28 14:09:36 INFO eval.StatsCallable: Unable to recommend in 0 cases
    12/03/28 14:09:43 INFO eval.StatsCallable: Average time per recommendation: 7ms
    12/03/28 14:09:43 INFO eval.StatsCallable: Approximate memory used: 510MB / 798MB
    12/03/28 14:09:43 INFO eval.StatsCallable: Unable to recommend in 13 cases
    12/03/28 14:09:52 INFO eval.AbstractDifferenceRecommenderEvaluator: Evaluation result: 0.7149488038906546
    12/03/28 14:09:52 INFO grouplens.GroupLensRecommenderEvaluatorRunner: 0.7149488038906546
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESSFUL
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 1 minute 26 seconds
    [INFO] Finished at: Wed Mar 28 14:09:53 PDT 2012
    [INFO] Final Memory: 53M/761M
    [INFO] ------------------------------------------------------------------------
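
The evaluation result near the end of the log (0.7149...) is printed by a class named AbstractDifferenceRecommenderEvaluator. Assuming the runner scores the recommender by the average absolute difference between predicted and actual ratings (an assumption based on that logger name, not something the log states), the number would mean predictions were off by about 0.71 stars on average. A minimal plain-Java sketch of that metric, with made-up ratings and a hypothetical class name:

```java
/** Hypothetical sketch (not Mahout code): average absolute difference. */
final class AverageAbsoluteDifference {

    /** Returns the mean of |predicted[i] - actual[i]| over all pairs. */
    static double of(double[] predicted, double[] actual) {
        if (predicted.length != actual.length || predicted.length == 0) {
            throw new IllegalArgumentException("Need two non-empty arrays of equal length");
        }
        double total = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            total += Math.abs(predicted[i] - actual[i]);
        }
        return total / predicted.length;
    }

    public static void main(String[] args) {
        // Made-up ratings; the absolute differences are 0.5, 0.0, 2.0 and 1.0.
        double[] predicted = {4.5, 3.0, 2.0, 5.0};
        double[] actual = {5.0, 3.0, 4.0, 4.0};
        System.out.println(of(predicted, actual)); // prints 0.875
    }
}
```

Lower is better: a value of 0 would mean every held-out rating was predicted exactly.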
    

Creating a simple recommender

Create a Maven project

mvn archetype:create -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=com.autofei -DartifactId=mahoutrec

This creates an empty project called mahoutrec with the package namespace com.autofei. Now change to the mahoutrec directory. You can try out the new project by running:

mvn compile
mvn exec:java -Dexec.mainClass="com.autofei.App"

Set the project dependencies
Edit pom.xml and set the Mahout version to the one you built; in my case it is 0.7-SNAPSHOT. An example file:

<?xml version="1.0"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <artifactId>mahout</artifactId>
    <groupId>org.apache.mahout</groupId>
    <version>0.7-SNAPSHOT</version>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <groupId>com.autofei</groupId>
  <artifactId>mahoutrec</artifactId>
  <version>1.0-SNAPSHOT</version>
  <name>mahoutrec</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-core</artifactId>
      <version>0.7-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>0.7-SNAPSHOT</version>
    </dependency>
    <dependency>
      <groupId>org.apache.mahout</groupId>
      <artifactId>mahout-math</artifactId>
      <version>0.7-SNAPSHOT</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
  </dependencies>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
</project>

Test data
Put the following data into a file named dummy-bool.csv in a datasets directory inside the project directory:

#userId,itemId
1,3
1,4
2,44
2,46
3,3
3,5
3,6
4,3
4,5
4,11
4,44
5,1
5,2
5,4
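
The file has no rating column: each line only records that a user is associated with an item. To make the layout concrete, here is a small plain-Java sketch (Java 8 or later) that groups the item IDs per user. It does not use Mahout; the class name BoolPrefs is hypothetical, and skipping lines that start with # is an assumption about how the header line is meant to be treated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch (not Mahout code): reads userId,itemId lines. */
final class BoolPrefs {

    /** Groups item IDs by user ID, ignoring blank lines and '#' comment lines. */
    static Map<Long, List<Long>> parse(List<String> lines) {
        Map<Long, List<Long>> itemsByUser = new TreeMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // assumption: '#' marks a comment or header line
            }
            String[] fields = line.split(",");
            long userId = Long.parseLong(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            itemsByUser.computeIfAbsent(userId, k -> new ArrayList<>()).add(itemId);
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        // First few lines of the test data above.
        List<String> lines = Arrays.asList("#userId,itemId", "1,3", "1,4", "2,44", "2,46");
        System.out.println(parse(lines)); // prints {1=[3, 4], 2=[44, 46]}
    }
}
```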

Create a Java file named UnresystBoolRecommend.java under src/main/java/com/autofei/:

package com.autofei;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.CachingRecommender;
import org.apache.mahout.cf.taste.impl.recommender.slopeone.SlopeOneRecommender;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class UnresystBoolRecommend {

    public static void main(String... args) throws TasteException, IOException {

        // create data source (model) from the csv file
        File ratingsFile = new File("datasets/dummy-bool.csv");
        DataModel model = new FileDataModel(ratingsFile);

        // create a simple recommender on our data
        CachingRecommender cachingRecommender = new CachingRecommender(new SlopeOneRecommender(model));

        // for all users
        for (LongPrimitiveIterator it = model.getUserIDs(); it.hasNext();) {
            long userId = it.nextLong();

            // get up to 10 recommendations for the user
            List<RecommendedItem> recommendations = cachingRecommender.recommend(userId, 10);

            // if empty, say so
            if (recommendations.isEmpty()) {
                System.out.println("User " + userId + ": no recommendations");
            }

            // print the list of recommendations for each user
            for (RecommendedItem recommendedItem : recommendations) {
                System.out.println("User " + userId + ": " + recommendedItem);
            }
        }
    }
}

Run the code

  • mvn compile
  • mvn exec:java -Dexec.mainClass="com.autofei.UnresystBoolRecommend"
    
    [INFO] Scanning for projects...
    [INFO] Searching repository for plugin with prefix: 'exec'.
    [INFO] ------------------------------------------------------------------------
    [INFO] Building mahoutrec
    [INFO]    task-segment: [exec:java]
    [INFO] ------------------------------------------------------------------------
    [INFO] Preparing exec:java
    [INFO] No goals needed for project - skipping
    [INFO] [exec:java {execution: default-cli}]
    SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
    SLF4J: Defaulting to no-operation (NOP) logger implementation
    SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
    User 1: RecommendedItem[item:5, value:1.0]
    User 2: RecommendedItem[item:5, value:1.0]
    User 2: RecommendedItem[item:3, value:1.0]
    User 3: no recommendations
    User 4: no recommendations
    User 5: RecommendedItem[item:5, value:1.0]
    User 5: RecommendedItem[item:3, value:1.0]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESSFUL
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 3 seconds
    [INFO] Finished at: Wed Mar 28 16:18:31 PDT 2012
    [INFO] Final Memory: 14M/35M
    [INFO] ------------------------------------------------------------------------
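
Every recommendation above has the value 1.0. That is what Slope One arithmetic gives when all known preferences share the same value: the average difference between any two items is 0, so each prediction equals the user's existing value. The sketch below is a simplified plain-Java version of weighted Slope One (Java 8 or later) that shows this arithmetic. It is not Mahout's SlopeOneRecommender, the class and method names are hypothetical, and it assumes that each line of the two-column file counts as a preference of 1.0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not Mahout code): simplified weighted Slope One. */
final class SlopeOneSketch {

    /** Records one rating: ratings[user][item] = value. */
    static void rate(Map<Long, Map<Long, Double>> ratings, long user, long item, double value) {
        ratings.computeIfAbsent(user, k -> new HashMap<>()).put(item, value);
    }

    /** Predicts the user's rating for target, or NaN if no item pair overlaps. */
    static double predict(Map<Long, Map<Long, Double>> ratings, long user, long target) {
        Map<Long, Double> own = ratings.get(user);
        if (own == null) {
            return Double.NaN;
        }
        double numerator = 0.0;
        int totalCount = 0;
        for (Map.Entry<Long, Double> rated : own.entrySet()) {
            long other = rated.getKey();
            if (other == target) {
                continue;
            }
            // Sum of (target - other) over every user who rated both items.
            double diffSum = 0.0;
            int count = 0;
            for (Map<Long, Double> prefs : ratings.values()) {
                Double a = prefs.get(target);
                Double b = prefs.get(other);
                if (a != null && b != null) {
                    diffSum += a - b;
                    count++;
                }
            }
            if (count > 0) {
                // (average diff + own rating) * count == diffSum + own rating * count
                numerator += diffSum + rated.getValue() * count;
                totalCount += count;
            }
        }
        return totalCount == 0 ? Double.NaN : numerator / totalCount;
    }

    public static void main(String[] args) {
        // Part of the test data, with an assumed value of 1.0 per line.
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        long[][] pairs = {{1, 3}, {1, 4}, {3, 3}, {3, 5}, {3, 6}, {4, 3}, {4, 5}};
        for (long[] p : pairs) {
            rate(ratings, p[0], p[1], 1.0);
        }
        System.out.println(predict(ratings, 1L, 5L)); // prints 1.0
    }
}
```

With real ratings the differences are no longer zero, and the prediction shifts the user's own ratings by the average gap between items.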

From here, you can try the other algorithms in Mahout.
