Background: To index Word, Excel, PDF and other "unstructured" documents, Solr uses Tika, another Apache project. Tika comes bundled with Solr and works there out of the box. However, if you want to run Tika on its own (e.g. you don't trust your installation, or you're just curious), you have to copy a few .jar files around. (Java experts who can manage class paths will probably tell me there's a better way to do this.)
I did

cd [Your path]/apache-solr-nightly/lib
cp commons-io-1.4.jar commons-codec-1.3.jar [Your path]/apache-solr-nightly/example/solr/lib
cp ~/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar [Your path]/apache-solr-nightly/example/solr/lib

(I have no idea where ~/.m2 came from. It may have been when I ran the Tika build.) Then I could run

java -jar tika-0.2.jar

in that directory.
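If you end up repeating the copy step, it could be wrapped in a small shell function that reports any jar it cannot find instead of failing silently. This is only a sketch: copy_jars is a made-up helper name, not part of Solr or Tika, and the paths in the example call are placeholders for your own install.

```shell
#!/bin/sh
# Hypothetical helper (not part of Solr or Tika):
#   copy_jars DEST JAR [JAR...]
# Copies each listed jar into the existing directory DEST.
# Returns non-zero if DEST is missing or any jar could not be copied.
copy_jars() {
    dest=$1
    shift
    if [ ! -d "$dest" ]; then
        echo "no such directory: $dest" >&2
        return 1
    fi
    missing=0
    for jar in "$@"; do
        if [ -f "$jar" ] && cp "$jar" "$dest"/; then
            echo "copied: $jar"
        else
            echo "not copied: $jar" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (placeholder paths; adjust to your own setup):
# SOLR=/path/to/apache-solr-nightly
# copy_jars "$SOLR/example/solr/lib" \
#     "$SOLR/lib/commons-io-1.4.jar" \
#     "$SOLR/lib/commons-codec-1.3.jar" \
#     "$HOME/.m2/repository/org/jempbox/jempbox/0.2.0/jempbox-0.2.0.jar"
```

The function only copies files; running Tika afterwards is still the same java -jar command as above.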
.m2/repository is the Maven 2 cache directory, where Maven stores jar files downloaded during your build (and those created by your build).