Xu’s Collection of Apache Projects

This collection includes projects I am interested in. For the Hadoop project, I have several separate posts.

A detailed list of all projects is at: http://projects.apache.org/indexes/quick.html

Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Tutorials:

Understanding information content with Apache Tika

Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.
