Sunday, October 22, 2006

acts_as_ferret - Sort by something other than rank

The Old

While rebuilding audiri.com I decided that I needed to revamp the old search system. The site currently lets you search for artists by name (useless, since you probably don't know the name of the band you are looking for) and also for artists who sound similar to an artist you already know. So if you entered "rage against the machine", it would return all of the bands who had put rage against the machine into their "similar artists" field. Previously the search was as simple as that: I used a MySQL full-text index and matched the query against the fields.

The New
I decided acts_as_ferret was an easy way to get a powerful full-text index. To make a model searchable, simply specify:
acts_as_ferret
in the model.

You can also specify which fields to index, and how much weight each one gets, as follows:
acts_as_ferret :fields => {
'artist_name' => {:boost => 5},
'similar_artists' => {:boost => 2},
'influences' => {},
'description' => {},
'short_description' => {},
'style' => {:boost => 0.5},
'history' => {:boost => 0.5},
'genre_tags' => {},
:comments => {:boost => 3},
'total_plays' => {:index => :yes, :store => :no},
'total_plays_today' => {:index => :yes, :store => :no},
'plays_this_day' => {:index => :yes, :store => :no},
'plays_this_week' => {:index => :yes, :store => :no},
'plays_this_month' => {:index => :yes, :store => :no},
'plays_ever' => {:index => :yes, :store => :no},
'hidden' => {:index => :yes, :store => :no}
}

One thing that I thought would be useful was searching all of the comments people left on an artist's page. It turns out this wasn't too hard. To index a has_many relationship with acts_as_ferret, you basically just include a method that returns a dump of the text for each artist.

def comments
artist_comments.map {|ac| ac.body}.join(' ')
end
Then in the fields area say:

:comments => {},
The fact that it's a symbol tells acts_as_ferret that it's a function.
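
To see what actually ends up in the index for that field, here is a minimal sketch that runs without Rails. The Struct and the hand-written Artist class are stand-ins I am assuming in place of the real ActiveRecord models; only the comments method is taken from above.

```ruby
# Stand-in for the ArtistComment model; a plain Struct so this runs without Rails.
ArtistComment = Struct.new(:body)

# Stand-in for the Artist model, holding its comments in a plain array.
class Artist
  attr_reader :artist_comments

  def initialize(artist_comments)
    @artist_comments = artist_comments
  end

  # Same method as above: flatten the associated records into one string to index.
  def comments
    artist_comments.map { |ac| ac.body }.join(' ')
  end
end

artist = Artist.new([
  ArtistComment.new("great live show"),
  ArtistComment.new("sounds like early punk")
])
puts artist.comments
```

The index only ever sees that single joined string, so a match on any one comment counts as a match on the artist.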

This all worked out great. I could quickly query acts_as_ferret using:

Artist.find_by_contents("query")
and get back some pretty good results.

So the next question I had was how to sort the results by something other than just relevance. Normally on audiri we give people the option to sort by things like total_plays. Doing this required pulling away at the acts_as_ferret model and delving underneath it into ferret itself. Fortunately for us, any model made with acts_as_ferret gives us the .ferret_index structure, which lets us make method calls on the actual ferret index. Another issue I was trying to tackle was pagination natively in ferret. A lot of solutions suggest pulling out all of the ferret results, then sorting and paging with SQL or by hand. This was not my ideal solution.

Borrowing code from various sources, I was able to write a function in the model that let me query the index the way I wanted. It returns the total number of hits along with the records for the requested page.
def self.full_text_search(q, options = {})
  return nil if q.nil? or q == ""
  default_options = {:limit => 10, :page => 1, :sort => ''}
  options = default_options.merge options
  # Pages are 1-based, so page 1 starts at offset 0
  options[:offset] = options[:limit] * (options[:page].to_i - 1)
  results = []

  # Only pass :sort to ferret when a sort string was given; otherwise rank by relevance
  search_options = {:limit => options[:limit], :offset => options[:offset]}
  search_options[:sort] = options[:sort] unless options[:sort] == ''
  search_map = self.ferret_index.search('*:(' + q + ') ', search_options)

  # Load the matching record for each hit on this page
  search_map.hits.each { |t|
    results << self.find(self.ferret_index[t.doc]["id"])
  }

  # total_hits counts every match, not just the ones on this page
  num = search_map.total_hits

  return [num, results]
end
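
The paging arithmetic in that function is easy to check on its own. This is a rough sketch, not part of the model: page_window and page_count are hypothetical helper names of mine, and clamping bad page numbers to page 1 is an extra assumption that the function above does not make.

```ruby
# Hypothetical helper: turn a 1-based page number into the :limit/:offset
# pair that gets handed to the ferret search.
def page_window(page, limit = 10)
  page = [page.to_i, 1].max  # assumption: treat nil, "" and 0 as page 1
  { :limit => limit, :offset => limit * (page - 1) }
end

# Hypothetical helper: number of pages needed for total_hits results,
# using integer arithmetic to round up.
def page_count(total_hits, limit = 10)
  (total_hits + limit - 1) / limit
end

window = page_window(3)
puts "offset #{window[:offset]}, limit #{window[:limit]}"
puts "#{page_count(41)} pages for 41 hits"
```

Since the function returns the total hit count alongside the records, something like page_count can be fed directly from the first element of its return value.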

getOrder is a helper function (not shown here) that returns sort strings like:

total_plays DESC

Note that ferret's sort string never uses ASC, only DESC. Leaving DESC off sorts in ascending order.
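
For anyone who wants a starting point, here is one way such a helper could look. This is not my getOrder, just a hypothetical sketch: the sort_string name, the whitelist constant and the choice of fields in it are all assumptions made for the example.

```ruby
# Hypothetical whitelist of indexed fields that users are allowed to sort by.
SORTABLE_FIELDS = %w[total_plays plays_this_week plays_this_month plays_ever]

# Hypothetical helper: build a sort string such as "total_plays DESC".
# Unknown fields give back '', which matches the default :sort value.
def sort_string(field, descending = true)
  return '' unless SORTABLE_FIELDS.include?(field.to_s)
  descending ? "#{field} DESC" : field.to_s
end

puts sort_string(:total_plays)
puts sort_string('plays_this_week', false)
```

Checking the field against a whitelist means a sort parameter taken straight from the URL can't ask ferret to sort on a field that isn't indexed.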

Hope this helps people get started.
