Sunday 22 March 2009

Solr and Rails

Well, after a long diversion I have Solr working in some simple test cases with Rails. The diversion was partly caused by not understanding what the Rails Solr plug-in offers, so I'm going to give an overview here, and a link to detailed instructions for Solr in Rails at the end of this post.

The Rails plug-in for Solr from git://github.com/mattmatt/acts_as_solr.git includes a complete installation of Solr. You don't need to install Solr separately. (My "long diversion" was that I rushed off and installed Solr separately, and spent a fair bit of time getting it running because I didn't know how it worked.)

If you want to index Word, Excel, PDF, and other types of documents, there is a bit of additional configuration to do. To index those file types you have to get a nightly build of Solr from here, and copy some files and directories as described in the link at the end of this post. You also have to add the following lines to example/solr/conf/solrconfig.xml:
  <requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <str name="ext.map.Last-Modified">last_modified</str>
      <bool name="ext.ignore.und.fl">true</bool>
    </lst>
  </requestHandler>
The plug-in also includes rake tasks to start and stop instances of the Solr server for development, test and production, which is very handy. Just type
  rake solr:start RAILS_ENV=test
to start the test Solr server (the default environment is development). It also gives you a YAML file in your environment directory to configure the ports that each instance of Solr will use (as installed: production on 8983, test on 8981 and development on 8982).
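
To make the port setup concrete, here is a minimal Ruby sketch that looks up the Solr port for a given Rails environment. The YAML layout (one "url" entry per environment) is my assumption about what the plug-in's file looks like, and solr_port_for is a hypothetical helper, not part of the plug-in. Only the three port numbers come from the defaults listed above.

```ruby
require "yaml"
require "uri"

# Assumed layout of the plug-in's YAML file: one "url" entry per environment.
# The port numbers are the as-installed defaults mentioned in this post.
SAMPLE_SOLR_YAML = <<~YAML
  development:
    url: http://localhost:8982/solr
  test:
    url: http://localhost:8981/solr
  production:
    url: http://localhost:8983/solr
YAML

# Hypothetical helper: returns the port configured for the given environment.
def solr_port_for(env, yaml_text = SAMPLE_SOLR_YAML)
  config = YAML.load(yaml_text)
  entry = config.fetch(env.to_s) do
    raise ArgumentError, "no Solr settings for environment #{env}"
  end
  URI.parse(entry.fetch("url")).port
end
```

With this in place, solr_port_for("test") picks out the port that rake solr:start RAILS_ENV=test would use, provided the real file matches the assumed layout.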

One thing I learned on my diversion is that Solr comes with an administration user interface that shows how many documents are in the Solr database, and lets you try ad-hoc queries. It's a good way to test if Solr is actually running. For example, after running the rake task to start Solr for development, you can browse to localhost:8982/solr/admin and you should get the Solr administration page.
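
If you would rather check from a script than from a browser, something like the following Ruby sketch should do. It only asks whether the administration page answers on a given port. The helper names solr_admin_uri and solr_running? are hypothetical, and I'm assuming the server is on localhost and that a success or redirect response means Solr is up.

```ruby
require "net/http"
require "uri"

# Builds the administration page address for a Solr instance on the given port.
def solr_admin_uri(port, host = "localhost")
  URI::HTTP.build(host: host, port: port, path: "/solr/admin")
end

# Returns true if the administration page answers with a success or a redirect,
# and false if nothing is listening or the request fails or times out.
def solr_running?(port, host = "localhost")
  uri = solr_admin_uri(port, host)
  response = Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 5) do |http|
    http.get(uri.path)
  end
  response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
rescue SystemCallError, SocketError, Timeout::Error, IOError
  false
end

if __FILE__ == $PROGRAM_NAME
  # 8982 is the as-installed development port.
  puts "development Solr up? #{solr_running?(8982)}"
end
```

This says nothing about whether any documents are indexed; for that the administration page itself is still the place to look.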

So that's the overview. The detailed write-up is here. It's good. I just wish I had had this overview first, so I knew what I was getting and where I was going.

1 comment:

John Clark said...

TY, this helped me understand what he was doing; our RoR box & db are split on two servers, so the local w/jetty didn't fit. I finally found a script to solr-tomcat which had to be edited to work (I used tomcat6). Works on 8080 & solr cells no longer causes crashing (couldn't log w/5.5). Funny, I saw his post to the full article; small world running this stuff. For my development, I'll stick to Drupal -- any problem I have has already been fixed 1000 times already.