HTablePool example in java

As the name suggests, the pool makes different Htable instances share resources like Zookeeper connection.
Suppose you have following table:

hbase(main):002:0> scan 'blogposts'
ROW                                       COLUMN+CELL
post1                                    column=image:bodyimage, timestamp=1333409506149, value=image2.jpg
post1                                    column=image:header, timestamp=1333409504678, value=image1.jpg
post1                                    column=post:author, timestamp=1333409504583, value=The Author
post1                                    column=post:body, timestamp=1333409504642, value=This is a blog post
post1                                    column=post:title, timestamp=1333409504496, value=Hello World
1 row(s) in 7.1920 seconds

Java Example Code:

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.HTablePool;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

import org.junit.*;
import static org.junit.Assert.*;

public class HTablePoolTest {

protected static String TEST_TABLE_NAME = "blogposts";
protected static String ROW1_STR = "post1";
protected static String COLFAM1_STR = "image";
protected static String QUAL1_STR = "bodyimage";

private final static byte[] ROW1 = Bytes.toBytes(ROW1_STR);
private final static byte[] COLFAM1 = Bytes.toBytes(COLFAM1_STR);
private final static byte[] QUAL1 = Bytes.toBytes(QUAL1_STR);

private final static int MAX = 10;
private static HTablePool pool;

@Before
public void runBeforeClass() throws IOException {
Configuration conf = HBaseConfiguration.create();
pool = new HTablePool(conf, MAX);

HTableInterface[] tables = new HTableInterface[10];
for (int n = 0; n < MAX; n++) {
tables[n] = pool.getTable(TEST_TABLE_NAME);
}
for (HTableInterface table : tables) {
table.close();
}
}

@Test
public void testHTablePool() throws IOException, InterruptedException,
ExecutionException {

Callable<Result> callable = new Callable<Result>() {
public Result call() throws Exception {
return get();
}
};

FutureTask<Result> task1 = new FutureTask<Result>(callable);

FutureTask<Result> task2 = new FutureTask<Result>(callable);

Thread thread1 = new Thread(task1, "THREAD-1");
thread1.start();
Thread thread2 = new Thread(task2, "THREAD-2");
thread2.start();

Result result1 = task1.get();
System.out.println("Thread1: "
+ Bytes.toString(result1.getValue(COLFAM1, QUAL1)));
assertEquals(Bytes.toString(result1.getValue(COLFAM1, QUAL1)),
"image2.jpg");

Result result2 = task2.get();
System.out.println("Thread2: "
+ Bytes.toString(result2.getValue(COLFAM1, QUAL1)));
assertEquals(Bytes.toString(result2.getValue(COLFAM1, QUAL1)),
"image2.jpg");
}

private Result get() {
HTableInterface table = pool.getTable(TEST_TABLE_NAME);
Get get = new Get(ROW1);
try {
Result result = table.get(get);
return result;
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
return null;
} finally {
try {
table.close();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

}

Reference:

Diagram for Java Memory Structure and Garbage Collection

All Memory for JVM:

  • Heap Memory, which is the storage for Java objects
  • Non-Heap Memory, which is used by Java to store loaded classes and other meta-data, String Pool, etc.
  • JVM code itself, JVM internal structures, loaded profiler agent code and data, etc.

When JVM starts, it reserves memory for the Heap. Sometime, not all the memory is used at the beginning. But it will not be used by others for sure. You can use -Xmx parameter to set the max JVM memory size and -Xms for initial memory size.

By default:

  • If empty heap size < 40%, JVM will increase heap size to -Xmx size.
  • If empty heap size > 70%, JVM will decrease heap size to -Xms size.

You also can use -XX:MaxPermSize to set Non-Heap memory size.

Location for Object at different age

Big version: http://bit.ly/Hy1Fku

If -Xms is smaller than -Xmx, the difference is the Virtual part. With programs running, Yound, Tenured and Perm will gradually use this part.

How Age Algorithm works?

In JVM memory model, there are two parts in Heap. One is New Generation and the other is Pld Generation. In New Generation, there is Eden part for new Object. there are two Survivor Spaces(from and to), which are alwayse the same size. They are used to store survival Objects after GC(Garbarge Collection). In Old Generation, long live time Objects stay there.

In New Generation, GC normally use Copying Algorithm, which is fast. Everytime, survival Objects are copied to one of the Survivor Space. If Survivor Space is full, rest live Objects are directly copoed to Old Generation. So, after GC, Eden memory will be cleaned up. In Old Generation, GC nornally use Mark-Compact Algorithm, which is slow but requires less memory.

There are several level of GC, 0 level is Full, which clean garbage in OLD Generation, 1 or above levels are partial GC, which clean New Generation garbage. Out of Memory will happen where is no space for new Object even GC done for OLD or Perm part.

How to claim memory?

  1. JVM tries to find space in Eden
  2. if there is enough space, memory claim is done. Otherwise, go to next step
  3. JVM tries to release not active objects (1 or higher level GC). If there is still not enough space, put active objects into Survivor.
  4. Survivor part is used as exchange space for Eden and Old. If Old space is big enough, objects in Survivor will be moved into Old space.
  5. if Old part does not have enough space, JVm will perform full GC (0 level)
  6. after full GC, if Survivor and Old parts still can not hold the objects from Eden, there is the “Out of Memory Error”.

Object move during GC: eden -> survivor -> tenured

There is a Chinese version: http://blog.csdn.net/autofei/article/details/7456213

Java Example Code using HBase Data Model Operations

Please refer to the updated version: https://autofei.wordpress.com/2017/05/23/updated-java-example-code-using-hbase-data-model-operations/

The code is based on HBase version 0.92.1

The four primary data model operations are Get, Put, Scan, and Delete. Operations are applied via HTable instances.

First you need to install HBase. For testing, you can install it at a single machine by following this post.

Create a Java project inside Eclips and following libraries into ‘lib’ subdirectory are necessary:

hadoop@ubuntu:~/workspace/HBase$ tree lib
lib
├── commons-configuration-1.8.jar
├── commons-lang-2.6.jar
├── commons-logging-1.1.1.jar
├── hadoop-core-1.0.0.jar
├── hbase-0.92.1.jar
├── log4j-1.2.16.jar
├── slf4j-api-1.5.8.jar
├── slf4j-log4j12-1.5.8.jar
└── zookeeper-3.4.3.jar

Libraries locations

  • copy hbase-0.92.1.jar from HBase installation directory
  • copy rest jar files from “lib” subdirectory of HBase installation directory

Then you need to copy your HBase configuration hbase-site.xmlfile from “conf” subdirectory of HBase installation directory into the Java project directory.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.rootdir</name>
<value>file:///home/hduser/hbase</value>
</property>
</configuration>

The whole directory looks like:

hadoop@ubuntu:~/workspace/HBase$ tree
.
├── bin
│   └── HBaseConnector.class
├── hbase-site.xml
├── lib
│   ├── commons-configuration-1.8.jar
│   ├── commons-lang-2.6.jar
│   ├── commons-logging-1.1.1.jar
│   ├── hadoop-core-1.0.0.jar
│   ├── hbase-0.92.1.jar
│   ├── log4j-1.2.16.jar
│   ├── slf4j-api-1.5.8.jar
│   ├── slf4j-log4j12-1.5.8.jar
│   └── zookeeper-3.4.3.jar
└── src
└── HBaseConnector.java

Open a terminal

  • start HBase in terminal: bin/start-hbase.sh
  • start HBase shell: bin/hbase shell
  • create a table: create ‘myLittleHBaseTable’, ‘myLittleFamily’

Now you can run the code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseConnector {
public static void main(String[] args) throws IOException {
// You need a configuration object to tell the client where to connect.
// When you create a HBaseConfiguration, it reads in whatever you've set
// into your hbase-site.xml and in hbase-default.xml, as long as these
// can be found on the CLASSPATH
Configuration config = HBaseConfiguration.create();

// This instantiates an HTable object that connects you to the
// "myLittleHBaseTable" table.
HTable table = new HTable(config, "myLittleHBaseTable");

// To add to a row, use Put. A Put constructor takes the name of the row
// you want to insert into as a byte array. In HBase, the Bytes class
// has utility for converting all kinds of java types to byte arrays. In
// the below, we are converting the String "myLittleRow" into a byte
// array to use as a row key for our update. Once you have a Put
// instance, you can adorn it by setting the names of columns you want
// to update on the row, the timestamp to use in your update, etc.
// If no timestamp, the server applies current time to the edits.
Put p = new Put(Bytes.toBytes("myLittleRow"));

// To set the value you'd like to update in the row 'myLittleRow',
// specify the column family, column qualifier, and value of the table
// cell you'd like to update. The column family must already exist
// in your table schema. The qualifier can be anything.
// All must be specified as byte arrays as hbase is all about byte
// arrays. Lets pretend the table 'myLittleHBaseTable' was created
// with a family 'myLittleFamily'.
p.add(Bytes.toBytes("myLittleFamily"), Bytes.toBytes("someQualifier"),
Bytes.toBytes("Some Value"));

// Once you've adorned your Put instance with all the updates you want
// to make, to commit it do the following
// (The HTable#put method takes the Put instance you've been building
// and pushes the changes you made into hbase)
table.put(p);

// Now, to retrieve the data we just wrote. The values that come back
// are Result instances. Generally, a Result is an object that will
// package up the hbase return into the form you find most palatable.
Get g = new Get(Bytes.toBytes("myLittleRow"));
Result r = table.get(g);
byte[] value = r.getValue(Bytes.toBytes("myLittleFamily"), Bytes
.toBytes("someQualifier"));
// If we convert the value bytes, we should get back 'Some Value', the
// value we inserted at this location.
String valueStr = Bytes.toString(value);
System.out.println("GET: " + valueStr);

// Sometimes, you won't know the row you're looking for. In this case,
// you use a Scanner. This will give you cursor-like interface to the
// contents of the table. To set up a Scanner, do like you did above
// making a Put and a Get, create a Scan. Adorn it with column names,
// etc.
Scan s = new Scan();
s.addColumn(Bytes.toBytes("myLittleFamily"), Bytes
.toBytes("someQualifier"));
ResultScanner scanner = table.getScanner(s);
try {
// Scanners return Result instances.
// Now, for the actual iteration. One way is to use a while loop
// like so:
for (Result rr = scanner.next(); rr != null; rr = scanner.next()) {
// print out the row we found and the columns we were looking
// for
System.out.println("Found row: " + rr);
}

// The other approach is to use a foreach loop. Scanners are
// iterable!
// for (Result rr : scanner) {
// System.out.println("Found row: " + rr);
// }
} finally {
// Make sure you close your scanners when you are done!
// Thats why we have it inside a try/finally clause
scanner.close();
}
}
}

Another great Java example from [4]:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.MasterNotRunningException;
import org.apache.hadoop.hbase.ZooKeeperConnectionException;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseTest {

	private static Configuration conf = null;
	/**
	 * Initialization
	 */
	static {
		conf = HBaseConfiguration.create();
	}

	/**
	 * Create a table
	 */
	public static void creatTable(String tableName, String[] familys)
			throws Exception {
		HBaseAdmin admin = new HBaseAdmin(conf);
		if (admin.tableExists(tableName)) {
			System.out.println("table already exists!");
		} else {
			HTableDescriptor tableDesc = new HTableDescriptor(tableName);
			for (int i = 0; i < familys.length; i++) {
				tableDesc.addFamily(new HColumnDescriptor(familys[i]));
			}
			admin.createTable(tableDesc);
			System.out.println("create table " + tableName + " ok.");
		}
	}

	/**
	 * Delete a table
	 */
	public static void deleteTable(String tableName) throws Exception {
		try {
			HBaseAdmin admin = new HBaseAdmin(conf);
			admin.disableTable(tableName);
			admin.deleteTable(tableName);
			System.out.println("delete table " + tableName + " ok.");
		} catch (MasterNotRunningException e) {
			e.printStackTrace();
		} catch (ZooKeeperConnectionException e) {
			e.printStackTrace();
		}
	}

	/**
	 * Put (or insert) a row
	 */
	public static void addRecord(String tableName, String rowKey,
			String family, String qualifier, String value) throws Exception {
		try {
			HTable table = new HTable(conf, tableName);
			Put put = new Put(Bytes.toBytes(rowKey));
			put.add(Bytes.toBytes(family), Bytes.toBytes(qualifier), Bytes
					.toBytes(value));
			table.put(put);
			System.out.println("insert recored " + rowKey + " to table "
					+ tableName + " ok.");
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	/**
	 * Delete a row
	 */
	public static void delRecord(String tableName, String rowKey)
			throws IOException {
		HTable table = new HTable(conf, tableName);
		List<Delete> list = new ArrayList<Delete>();
		Delete del = new Delete(rowKey.getBytes());
		list.add(del);
		table.delete(list);
		System.out.println("del recored " + rowKey + " ok.");
	}

	/**
	 * Get a row
	 */
	public static void getOneRecord (String tableName, String rowKey) throws IOException{
        HTable table = new HTable(conf, tableName);
        Get get = new Get(rowKey.getBytes());
        Result rs = table.get(get);
        for(KeyValue kv : rs.raw()){
            System.out.print(new String(kv.getRow()) + " " );
            System.out.print(new String(kv.getFamily()) + ":" );
            System.out.print(new String(kv.getQualifier()) + " " );
            System.out.print(kv.getTimestamp() + " " );
            System.out.println(new String(kv.getValue()));
        }
    }
	/**
	 * Scan (or list) a table
	 */
	public static void getAllRecord (String tableName) {
        try{
             HTable table = new HTable(conf, tableName);
             Scan s = new Scan();
             ResultScanner ss = table.getScanner(s);
             for(Result r:ss){
                 for(KeyValue kv : r.raw()){
                    System.out.print(new String(kv.getRow()) + " ");
                    System.out.print(new String(kv.getFamily()) + ":");
                    System.out.print(new String(kv.getQualifier()) + " ");
                    System.out.print(kv.getTimestamp() + " ");
                    System.out.println(new String(kv.getValue()));
                 }
             }
        } catch (IOException e){
            e.printStackTrace();
        }
    }

	public static void main(String[] agrs) {
		try {
			String tablename = "scores";
			String[] familys = { "grade", "course" };
			HBaseTest.creatTable(tablename, familys);

			// add record zkb
			HBaseTest.addRecord(tablename, "zkb", "grade", "", "5");
			HBaseTest.addRecord(tablename, "zkb", "course", "", "90");
			HBaseTest.addRecord(tablename, "zkb", "course", "math", "97");
			HBaseTest.addRecord(tablename, "zkb", "course", "art", "87");
			// add record baoniu
			HBaseTest.addRecord(tablename, "baoniu", "grade", "", "4");
			HBaseTest.addRecord(tablename, "baoniu", "course", "math", "89");

			System.out.println("===========get one record========");
			HBaseTest.getOneRecord(tablename, "zkb");

			System.out.println("===========show all record========");
			HBaseTest.getAllRecord(tablename);

			System.out.println("===========del one record========");
			HBaseTest.delRecord(tablename, "baoniu");
			HBaseTest.getAllRecord(tablename);

			System.out.println("===========show all record========");
			HBaseTest.getAllRecord(tablename);
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}

Reference:

  1. http://hbase.apache.org/docs/current/api/index.html
  2. http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/package-summary.html
  3. http://hbase.apache.org/book/data_model_operations.html
  4. http://lirenjuan.iteye.com/blog/1470645