Making my music search engine faster

My music search engine takes quite a while to load (typically 40 seconds). That’s an unusually long time for a page, given that most of the people that access it are on broadband connections, and are listening to music online.

The reason is, firstly, that I’m loading a lot of data. Literally every single song on that you can search comes through as Javascript. All the downloadable Hindi songs, for instance, occupy 1.3 megabytes before compression. On average, this takes about 20 seconds to load.

The second reason is, I’m doing a lot of processing with the data. Two things take the bulk of the time: uncompressing the data, and massaging it to allow for different spellings. On average, that takes about 20 seconds.

40 seconds is too long to wait. Actually, the perceived waiting time is 8 seconds, because I start showing the search and the results after 1/5th of the data is downloaded. But 8 seconds is still too long.

I managed to cut this time by half with two things I just learned.

Firstly, Javascript loads and executes sequentially. As soon as a Javascript block is loaded, it is executed. While it is being executed, the rest of the HTML file is not parsed. So any subsequent Javascript blocks (or images, CSS, any other external links) are not loaded. Bandwidth is just sitting idle while the browser is busy at work on the Javascript.

Since I’ve split all my songs into five equal-sized Javascript files, my search engine loads the Javascript files one after another! The problem is even worse — the calculations I do in Javascript take up as much time as the load time. If the loading went on in parallel, by the time the first calculation is done, the second script would have loaded.

This problem can be solved for Internet Explorer and Opera. The “defer” attribute loads the scripts in the background, and defers their execution. This reduces the loading time to nearly zero for all the Javascript files except for the first one or two, because by the time my calculations are done, the next script is already loaded.

Javascript loading sequence before and after 'defer'

These Javascript files contain a list of songs as a long string, which I then split into individual songs. Then I modify each song using regular expressions so that approximate matches will still work. (For e.g., typing “aa” is the same as typing “a” on this search engine.) The performance of regular expressions is critical to me.

Originally, I did this:

  1. Split the string into songs
  2. Modify each song using regular expressions

Now, I changed the sequence to this:

  1. Modify the string using regular expressions
  2. Split the string into songs

When I timed the speed of this change, I realised browsers differ a lot in the speed of processing regular expressions. Here is the time (in seconds) for each browser to process the Javascript before and after the change.

Browser Before After
Internet Explorer 6.3 5.0
Firefox 17.7 4.7
Opera 93.8 19.9

Internet Explorer wasn’t affected much by this change. But Firefox and Opera were heavily impacted. I’m guessing this is because Firefox and Opera have a high setup time for matching regular expressions. Before, I was matching thousands of strings. After, I was matching only one (long) string. IE didn’t care much. Firefox and Opera speeded up dramatically. (I suspect my Opera data is skewed by a few people using a rather slow machine… but then, that’s probably why they should use Opera in the first place — it works rather well old machines.)

With these changes, my total load time fell from about 40 seconds to about 20 seconds. Pretty decent improvement.

There are two further steps I could take from here on.

Compress the songs files further

Currently, I’m doing two things to compress the songs.

  1. Instead of sending the list as a (movie-song),(movie-song),... combination for every song, I send it as a (movie-song,song,song,...)(movie-song,song,song,...)... combination. So I don’t repeat the movie names.
  2. Secondly, I compress them via HTTP/1.1. (My server doesn’t let me do that with Javascript, actually, because Netscape 4.x doesn’t accept compressed Javascript. But since it’s such an old browser and none of my viewers use it, I trick the server by renaming my Javascript files as .html, and it passes them on compressed.

What I could do additionally is:

  1. Remove duplicate songs. If songs are spelt differently, I include them both. But I can knock off the duplicate ones.
  2. Compress the text. Though I’m gzipping the text before sending it, I suspect there may be ways of storing the data in a more compressed form. For example, many songs are repeated with the (male) and (female) versions. These could be clubbed (somehow…)

Speed up the Javascript calculations further

There are two steps I’m doing right now:

  1. Modify the string using regular expressions
  2. Split the string into songs

Here’s how much time each step takes (in seconds) across browsers. (Note: these were based on one day’s data, sampling about 2,000 page views)

Browser Regular expression Split
Internet Explorer 0.88 3.04
Firefox 3.76 1.08
Safari 0.47 0.46
Opera 4.96 29.78

Firefox takes a lot of time with regular expressions. IE and Opera take a lot longer to split (which involves creating many objects). I may have to think up different optimisations for the two.

The code is available in the HTML of the song search engine. It’s readable. The stuff I’m talking about is in the function sl(p,s). Any thoughts are welcome!