Speedy Search

Of all the things I could’ve been doing over the holidays, I’ve spent my time rewriting the Blogwise search engine from scratch.

Blogwise search is a bit of an embarrassment at the moment - search results take forever to appear (9 seconds or more), and it consistently holds the spot of the slowest search on Grabperf.

The reason for this is threefold: lack of scalability, lack of decent hardware and lack of time. Over the past few months, requests to the search pages have at least trebled, and in the same period the database itself has doubled in size. The search currently live on Version 2 is a bit of a kludge. It has its own database system, which takes the load off the main server (a huge problem with Version 1), but it completely lacks any kind of scalability: when you run a search, you’re effectively tying up an entire computer for the few seconds it takes to deal with your search.

Although I originally had three servers load-balancing the search, they weren’t sharing the load very well, so a search could take 9 seconds on one server while the other two sat idle.

Version 3 was a first stab at resolving this: it broke the database up into three chunks (assuming three servers) and had each server deal with a third of the database. With a database of 60,000 blogs, each server served results for 20,000 blogs - in theory, a much better division of the load.
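A minimal sketch of the Version 3 idea, assuming blogs are identified by integer IDs and that each server simply scans its own share. The function names (`partition`, `search_chunk`, `search_all`) are hypothetical, and the threads stand in for separate machines; a real setup would send each chunk's query over the network and merge the replies.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(blog_ids, n_servers):
    """Split blog IDs into n_servers contiguous chunks of (near-)equal size."""
    size, extra = divmod(len(blog_ids), n_servers)
    chunks, start = [], 0
    for i in range(n_servers):
        # The first `extra` chunks take one leftover ID each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(blog_ids[start:end])
        start = end
    return chunks


def search_chunk(chunk, blogs, term):
    """Hypothetical per-server search: scan only this server's share of blogs."""
    term = term.lower()
    return [b for b in chunk if term in blogs[b].lower()]


def search_all(chunks, blogs, term):
    """Query every chunk in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = pool.map(lambda c: search_chunk(c, blogs, term), chunks)
        return sorted(b for part in partials for b in part)


if __name__ == "__main__":
    chunks = partition(list(range(60_000)), 3)
    print([len(c) for c in chunks])  # [20000, 20000, 20000]
```

The catch, and the reason a later version needs smarter balancing, is that a fan-out like this is only as fast as its slowest chunk.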

I had to drop the rewrite suddenly due to the usual lack of time and never really got back to it. However, with the glory of 9 straight days at home, I’ve been able to get back in front of the computer and rewrite the entire search system as Version 4.

Results are looking promising. Thanks to the redesigned database structure and algorithms, the search is already returning results in under 1.2 seconds on a good day. That may not sound like much, but it’s before the new load balancing is in place. The breakdown of that time is the key bit: gathering the search results takes almost all of it, while the final arrangement and rendering take a minuscule 0.1 seconds at most.

A good load-balancing system should bring that time down every time I add a server. With the three servers back in action on Version 4 code, search results should come in at around 0.4 seconds. That’ll move me from among the twenty slowest sites on Grabperf to just below the twenty fastest - neat!
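The 0.4-second figure assumes the whole 1.2 seconds divides evenly across three servers. If, as an assumption, the final arrangement and rendering (up to 0.1 seconds) still happen once on a single machine after the results come back, only the gathering phase shrinks, which is essentially Amdahl's law. A rough estimate, with a hypothetical `overhead` term standing in for network and merging costs:

```python
def estimated_latency(total, serial, n_servers, overhead=0.0):
    """Rough response-time estimate when only the gathering phase is split.

    total     -- single-server response time in seconds
    serial    -- part that does not parallelise (arrangement and rendering)
    n_servers -- number of search servers sharing the gathering work
    overhead  -- assumed fixed cost for network and merging (hypothetical)
    """
    gather = total - serial
    return gather / n_servers + serial + overhead


for n in (1, 3, 6):
    print(n, round(estimated_latency(1.2, 0.1, n), 3))
```

With the post's own figures, three servers give about 1.1 / 3 + 0.1 ≈ 0.47 seconds rather than a flat 0.4, still comfortably under half a second, and each extra server helps a little less than the one before.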

The load-balancing system is already mapped out on paper. Every few hours the index will be refreshed and then divided up according to each server’s demand and available resources. The new data is shipped to each search server, and the aggregator is then updated with a new map of the indexes. Give or take TCP and mapping overheads, this should crudely mean that more servers = faster searches. I like that kind of scalability!
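One way to sketch the refresh-and-divide step, assuming each server is summarised by a single relative capacity score. The post doesn't say how demand and resource availability are measured, so `capacities`, the server names and both functions are hypothetical. Each server gets a contiguous slice of the index proportional to its score, and the resulting `{server: (start, end)}` dictionary plays the role of the aggregator's map.

```python
from math import floor


def build_index_map(n_docs, capacities):
    """Split n_docs index entries across servers in proportion to capacity.

    capacities -- hypothetical relative capacity score per server name.
    Returns {server: (start, end)} half-open ranges covering 0..n_docs.
    """
    total = sum(capacities.values())
    shares = {s: n_docs * c / total for s, c in capacities.items()}
    counts = {s: floor(v) for s, v in shares.items()}
    # Hand out the entries lost to rounding, largest fractional remainder first.
    leftover = n_docs - sum(counts.values())
    by_remainder = sorted(shares, key=lambda s: shares[s] - counts[s], reverse=True)
    for s in by_remainder[:leftover]:
        counts[s] += 1
    # Lay the ranges out in the order the servers were given.
    index_map, start = {}, 0
    for s in capacities:
        index_map[s] = (start, start + counts[s])
        start += counts[s]
    return index_map


def server_for(index_map, doc_id):
    """Aggregator-side lookup: which server holds a given index entry."""
    for server, (start, end) in index_map.items():
        if start <= doc_id < end:
            return server
    raise KeyError(doc_id)
```

For example, `build_index_map(60000, {"search1": 2, "search2": 1, "search3": 1})` gives `search1` the range (0, 30000) and the other two servers 15,000 entries each. Rebuilding the map on every refresh is what lets a new or faster server pick up a bigger share without touching the search code itself.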

The search rewrite also coincides with a huge increase in the amount of data being searched. One thing I failed to mention is that the 1.2 seconds covers both the previous keyword index and a new index of full-text RSS feeds, i.e. the search will be indexing content as well as metadata (finally).
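The post doesn't describe the new index structure, but a common way to search metadata and full text together is an inverted index whose postings record which field each term came from. A minimal sketch, assuming a naive tokenizer and hypothetical field names (`"metadata"`, `"content"`):

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return TOKEN.findall(text.lower())


class BlogIndex:
    """Inverted index mapping each term to the (blog, field) pairs it appears in."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {(blog_id, field)}

    def add(self, blog_id, field, text):
        for term in tokens(text):
            self.postings[term].add((blog_id, field))

    def search(self, query, fields=None):
        """Return blog IDs matching every query term, optionally in given fields only."""
        sets = [
            {b for b, f in self.postings.get(t, ()) if fields is None or f in fields}
            for t in tokens(query)
        ]
        return set.intersection(*sets) if sets else set()
```

For instance, after `idx.add(1, "metadata", "Photography blog")`, `idx.add(1, "content", "Notes on shooting film in low light")` and `idx.add(2, "metadata", "Film reviews")`, a search for `"film"` matches both blogs, while restricting it to `fields={"metadata"}` matches only blog 2, which is exactly the extra reach a full-text index buys.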

As I roll this thing out, I’ll write more here. In the meantime, I hope you’re having a good time!

Responses to “Speedy Search”

  1. GrabPERF News » GrabPERF: State of the System, Feb 2006 Says:

    [...] GrabPERF has been used in various places to serve as an indicator or motivator for performance improvement, including: Bloglines, Ping-O-Matic, Technorati, PubSub (1 and 2), Blogwise, and others. [...]