Wednesday, December 13, 2006

So cool

While the JVM is a beast when it comes to load time and applets are on the verge of death, this is cool.

Friday, November 17, 2006

Height of the Page, no questions asked

When creating a popover for a web app I was writing, I needed to get the internal height of the page so I could set the popover to the page's height. There are other techniques to achieve this, such as setting


html {
overflow: hidden;
}

body {
overflow: auto;
}


then absolutely positioning the div, but this can result in rendering bugs in IE any time you try to relatively position an element (usually the element doesn't move when you scroll the page).
So the simple solution is to use JavaScript to get the page's height when the popover is supposed to be shown. For the popover I was working on, I did not want to change the underlying page's doctype (or lack thereof). The problem was that document.documentElement.clientHeight only works with a doctype. To make a long story short, I came up with this line of code that should return the page's complete height no matter what.


Math.max((window.innerHeight || 0), document.body.clientHeight, document.documentElement.clientHeight, document.body.scrollHeight)


It's simple, and parts of it might be repetitive, but it works.

Sunday, November 12, 2006

WWW::Mechanize get current page

I have always done a lot of screen scraping, and typically, whatever language I was working with, I would build a framework to get the job done. I built one for Java, which was a nightmare. Next I created one in PHP. It was a lot simpler, but just took too much time to really do right. When I moved to Ruby I was surprised to find the WWW::Mechanize library. It did everything I had been building into these other frameworks. The nice thing about Mechanize is that it takes care of following redirects and parsing the HTML into an easy-to-follow structure. Something I would always build into my frameworks was the ability to pseudo-submit forms on the page, typically in the form of (PHP example):

$cForm = $page->forms[2];
$cForm->login = 'bob';
$cForm->password = 'testpass';
$cForm->submit();

You can do very similar things in Mechanize, but the thing that stumped me for too long was how to get the current URL of the page. It turns out it isn't that hard, but it is poorly documented.

(ruby example)

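require 'rubygems'
require 'mechanize'

agent = WWW::Mechanize.new
agent.user_agent = 'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)'
# browse is the page returned by an earlier agent.get call
form = browse.forms[1]
form.fields.find {|f| f.name == 'location'}.value = 'MT'
page1 = agent.submit(form, form.buttons.first)
# the URL the agent ended up on, after any redirects
agent.page.uri.to_s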

Note the last line. This should always return the URL of the page that the agent is "at" in the browser paradigm, even after multiple redirects.
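To make "the page the agent is at" concrete, here is a minimal sketch of the bookkeeping involved in following redirects: each Location header is resolved against the URL of the page before it. This is not Mechanize's implementation; final_uri is a hypothetical helper, and the example.com URLs are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper, not part of Mechanize: resolve a chain of
# Location header values against a starting URL, one hop at a time.
def final_uri(start_url, locations)
  locations.inject(URI.parse(start_url)) do |current, location|
    # a Location value may be absolute or relative to the current page
    URI.join(current.to_s, location)
  end
end

# Made-up redirect chain: one relative hop, then one absolute hop
final = final_uri('http://example.com/search/form',
                  ['results?state=MT', 'http://www.example.com/listing'])
puts final.to_s
```

With Mechanize you never have to do this yourself; agent.page.uri already reflects the last hop.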

Sunday, October 22, 2006

acts_as_ferret - Sort by something other than rank

The Old

While rebuilding audiri.com I decided that I needed to revamp the old search system. The site currently lets you search for artists by name (useless, since you probably don't know the name of the band that you are looking for) and also for artists who sound similar to an artist you already know. So if you entered rage against the machine, it would return all of the bands who had put rage against the machine into their "similar artists" field. Previously the search was as simple as that. I used a MySQL full-text index and matched the query to the fields.

The New
I decided acts_as_ferret was an easy way to get a powerful full-text index. To make a model searchable, simply specify:
acts_as_ferret
in the model.

You can also specify which fields to index, as follows:
acts_as_ferret :fields => {
'artist_name' => {:boost => 5},
'similar_artists' => {:boost => 2},
'influences' => {},
'description' => {},
'short_description' => {},
'style' => {:boost => 0.5},
'history' => {:boost => 0.5},
'genre_tags' => {},
:comments => {:boost => 3},
'total_plays' => {:index => :yes, :store => :no},
'total_plays_today' => {:index => :yes, :store => :no},
'plays_this_day' => {:index => :yes, :store => :no},
'plays_this_week' => {:index => :yes, :store => :no},
'plays_this_month' => {:index => :yes, :store => :no},
'plays_ever' => {:index => :yes, :store => :no},
'hidden' => {:index => :yes, :store => :no}
}

One thing that I thought would be useful was searching all of the comments people left on the artist's page. It turns out this wasn't too hard. To index a has_many relationship with acts_as_ferret, you basically just include a function that returns a dump of the text for each artist.

def comments
artist_comments.map {|ac| ac.body}.join(' ')
end
Then in the fields area say:

:comments => {},
The fact that it's a symbol tells acts_as_ferret that it's a function.
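To see what that function hands to the indexer, here is a small sketch that runs without Rails. The Struct and the plain Artist class are stand-ins for the real ActiveRecord models, and the comment bodies are made up; only the comments method itself is taken from above.

```ruby
# Stand-ins for the ActiveRecord models (assumptions for illustration only)
ArtistComment = Struct.new(:body)

class Artist
  attr_reader :artist_comments

  def initialize(artist_comments)
    @artist_comments = artist_comments
  end

  # Same shape as the method in the post: one string for the indexer
  def comments
    artist_comments.map { |ac| ac.body }.join(' ')
  end
end

artist = Artist.new([ArtistComment.new('great live show'),
                     ArtistComment.new('sounds like early punk')])
puts artist.comments
```

Every comment body ends up in a single space-separated string, so a search for a word in any one comment can match the artist.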

This all worked out great. I could quickly query acts_as_ferret using:

Artist.find_by_contents("query")
and get back some pretty good results.

So the next question I had was how to sort the results by something other than just relevance. Normally on audiri we give people the option to sort by things like total_plays and the like. Doing this really required pulling away from the acts_as_ferret model and delving under it to ferret itself. Fortunately for us, any model made with acts_as_ferret gives us the .ferret_index structure. This lets us make method calls on the actual ferret index. Another issue I was trying to tackle was pagination natively in ferret. A lot of solutions suggest pulling out all of the ferret results, then sorting and paging with SQL or by hand. This was not my ideal solution.

Borrowing code from various sources, I was able to write a function in the model that let me query the index how I wanted.
def self.full_text_search(q, options = {})
  return nil if q.nil? or q == ""
  default_options = {:limit => 10, :page => 1, :sort => ''}
  options = default_options.merge options
  # convert the 1-based page number into a result offset
  options[:offset] = options[:limit] * (options[:page].to_i - 1)
  results = []

  if options[:sort] == ''
    # no sort requested: ferret orders the hits by relevance
    search_map = self.ferret_index.search('*:(' + q + ') ', {:limit => options[:limit], :offset => options[:offset]})
  else
    search_map = self.ferret_index.search('*:(' + q + ') ', {:limit => options[:limit], :offset => options[:offset], :sort => options[:sort]})
  end

  # load the model object for each hit
  search_map.hits.each { |t|
    results << self.find(self.ferret_index[t.doc]["id"])
  }

  num = search_map.total_hits

  return [num, results]
end
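The paging arithmetic in that function can be checked on its own. The sketch below pulls the offset calculation into a hypothetical helper, page_offset, and adds a second one, total_pages, which is my assumption about how a caller might use the returned hit count; neither name comes from the original code.

```ruby
# Hypothetical helper mirroring the offset arithmetic in full_text_search
def page_offset(limit, page)
  limit * (page.to_i - 1)
end

# Assumed caller-side helper: pages needed to show every hit
def total_pages(total_hits, limit)
  (total_hits + limit - 1) / limit # integer ceiling division
end

puts page_offset(10, 1)  # page 1 starts at the first hit (offset 0)
puts page_offset(10, 3)  # page 3 skips the first 20 hits
puts total_pages(25, 10) # 25 hits at 10 per page need 3 pages
```

Because ferret takes :limit and :offset directly, only the hits for the requested page are ever loaded from the database.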

getOrder is a helper function (not shown above) that returns sort strings like:

total_plays DESC

Note that in Lucene's query language, ASC is not used, only DESC.
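Since getOrder itself isn't shown, here is a hypothetical stand-in in the same spirit. The name sort_string, the list of allowed fields, and the choice to fall back to an empty string are all my assumptions; the only thing taken from the post is the output shape, with DESC for descending and nothing at all for ascending.

```ruby
# Hypothetical stand-in for getOrder; the field list is an assumption
SORTABLE_FIELDS = %w[total_plays plays_this_week plays_this_month plays_ever]

def sort_string(field, descending = true)
  # unknown fields fall back to '', the default :sort in full_text_search
  return '' unless SORTABLE_FIELDS.include?(field.to_s)
  descending ? "#{field} DESC" : field.to_s
end

puts sort_string('total_plays')      # descending
puts sort_string(:plays_ever, false) # ascending, so no ASC suffix
```

Restricting the field names to a known list also keeps user input from reaching the index as an arbitrary sort expression.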

Hope this helps people get started.

Friday, September 08, 2006

How can you be a developer and not have a blog

The question of the day, I guess. Truth be told, I don't have a lot of things that a developer should have. I just recently got a cell phone. So why, if I hold out on all of these other things, should I start a blog? Let me clear the air: I have no illusions of grandeur. I am well aware that the average blog has 0.5 readers (including the author). I do, however, feel that this serves as a good place to give people an idea of what I do and maybe save people a little bit of code or searching in some cases. So without further ado, here's my first blog post.