Thursday 26 March 2009

Tika and Solr

This is just a quick note to document another experience with Solr.

Background: To index Word, Excel, PDF and other "unstructured" documents, Solr uses Tika, another Apache project. Tika comes bundled with Solr and works there without any extra setup. However, if you want to run Tika on its own (e.g. you don't trust your installation, or you're just curious), you have to copy a few .jar files around. (Java experts who can manage class paths will probably tell me there's a better way to do this.)

I ran:
cd [Your path]/apache-solr-nightly/lib
cp commons-io-1.4.jar commons-codec-1.3.jar [Your path]/apache-solr-nightly/example/solr/lib
cp ~/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar [Your path]/apache-solr-nightly/example/solr/lib
(I have no idea where ~/.m2 came from; it may have been created when I ran the Tika build.) Then I could run
java -jar tika-0.2.jar
in that directory.
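
For anyone repeating the copy steps above, here is a rough sketch of how they could be wrapped in a small shell function that reports any jar it cannot find instead of failing partway through. The function name copy_jars and the SOLR_HOME variable are my own inventions for illustration (SOLR_HOME stands in for "[Your path]/apache-solr-nightly"); the jar names and locations are simply the ones listed above and may differ on your machine.

```shell
#!/bin/sh
# Sketch only: copy_jars and SOLR_HOME are hypothetical names, not part of
# Solr or Tika. Adjust the paths to match your own checkout.

# copy_jars DEST JAR...
# Copies each JAR into DEST. Prints a message for every jar that is missing
# and returns non-zero if at least one could not be found.
copy_jars() {
    dest=$1
    shift
    missing=0
    for jar in "$@"; do
        if [ -f "$jar" ]; then
            cp "$jar" "$dest"/
        else
            echo "missing: $jar" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (commented out so that sourcing this file copies nothing):
# SOLR_HOME="/path/to/apache-solr-nightly"
# copy_jars "$SOLR_HOME/example/solr/lib" \
#     "$SOLR_HOME/lib/commons-io-1.4.jar" \
#     "$SOLR_HOME/lib/commons-codec-1.3.jar" \
#     "$HOME/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar"
```

The only real advantage over typing the cp commands by hand is that a missing jar (for example, if the Maven cache does not contain jempbox) shows up as a clear message rather than a half-finished copy.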

1 comment:

Anonymous said...

.m2/repository is the Maven 2 cache directory, where Maven stores jar files downloaded during your build (and those created by your build).