Advanced Google Reader

I’ve stopped visiting websites. No, really. There’s only one website I visit these days. Google Reader. Google Reader is a feed reader. If you want to just catch up on the new stuff on a site, you can add the site to Google Reader. Anything new that is published on the site appears in Google Reader. Right now, I’ve subscribed to over 50 feeds. There’s no way I can remember to visit 50 sites – so I’m actually able to read more and miss less. ...

Default camera ISO setting

In those early days, when all I had was an analog SLR, I had to make choices up-front. Do I buy an ISO 100 film for daytime shooting? (It’s cheaper, besides.) Do I go in for the expensive ISO 1600 film for my fancy night shots? Do I lug around the tripod? Do I use the flash? Do I even bother taking indoor shots? etc. With my new digital camera, at least the ISO choice vanishes. The ISO range varies from 64 to 1600. And so, I don’t need flash or a tripod most of the time. ...

Making my music search engine faster

My music search engine takes quite a while to load (typically 40 seconds). That's an unusually long time for a page, given that most of the people that access it are on broadband connections, and are listening to music online. The reason is, firstly, that I'm loading a lot of data. Literally every single song that you can search comes through as Javascript. All the downloadable Hindi songs, for instance, occupy 1.3 megabytes before compression. On average, this takes about 20 seconds to load. ...

Reducing the server load

I’ve been using a shared hosting service with 100 WebSpace over the last 7 years. It’s an ad-free account that offers 100MB of space and 3GB of bandwidth per month. Things were fine until two months ago, which was when my song search engines started attracting an audience. I had anticipated that I might run out of bandwidth, so I used a different server (that has a 5GB per month bandwidth quota) for loading the songs. But what I didn’t anticipate was that my server load would run over the allotted CPU limit. ...

Tamil spelling corrector

The Internet has a lot of Tamil song lyrics in English. Finding them is not easy, though. Two problems. The lyrics are fragmented: there’s no one site to search them. And Google doesn’t help. It doesn’t know that alaipaayudhe, alaipaayuthe and alaipayuthey are the same word. This is similar to the problem I faced with Tamil audio. The solution, as before, is to make an index, and provide a search interface that is tolerant of English spellings of Tamil words. But I want to go a step further. Is it possible to display these lyrics in Tamil? ...
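
A minimal sketch of spelling-tolerant matching, assuming a small set of normalisation rules that are purely illustrative (they are not taken from the post): treat "dh" as "th", collapse doubled letters, and drop a trailing "y". The function name `spelling_key` is hypothetical.

```python
import re

def spelling_key(word):
    """Reduce an English spelling of a Tamil word to a rough canonical key.

    The rules below are illustrative assumptions, not a complete scheme.
    """
    key = word.lower()
    key = key.replace("dh", "th")          # alaipaayudhe -> alaipaayuthe
    key = re.sub(r"(.)\1+", r"\1", key)    # collapse doubled letters: aa -> a
    key = re.sub(r"y$", "", key)           # drop a trailing y: ...they -> ...the
    return key

variants = ["alaipaayudhe", "alaipaayuthe", "alaipayuthey"]
keys = {spelling_key(v) for v in variants}
```

An index built on such keys would let a search for any one of the three spellings find lyrics stored under the others. Real data would need more rules, and some rules may merge words that should stay distinct.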

Splitting a sentence into words

I often need to extract words out of sentences. It’s one of the things I used to build the Statistically Improbable Phrases for Calvin and Hobbes. But splitting a sentence into words isn’t as easy as you think. Think about it. What is a word? Something that has spaces around it? OK, let’s start with the simplest way to get words: split by spaces. Consider this piece: "I'd look at McDonald's," he said. "They sell over 3,000,000 burgers a day -- at $1.50 each." High-fat foods were the rage. For e.g., margins in fries were over 50%... and (except for R&M & Dyana [sic]) everyone was at ~30% net margin; growing at 25% too! Splitting this by spaces (counting new lines, tabs, etc. as spaces too), we get the following: ...
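
The "split by spaces" starting point can be sketched in Python as below. This is only the naive first step described above, not the approach the post ends up with; the function name `split_on_whitespace` is made up for illustration.

```python
def split_on_whitespace(text):
    """Split text on runs of spaces, tabs and new lines."""
    # str.split() with no argument treats any whitespace run as one separator.
    return text.split()

sample = ('"I\'d look at McDonald\'s," he said. '
          '"They sell over 3,000,000 burgers a day -- at $1.50 each."')
tokens = split_on_whitespace(sample)
```

The tokens keep their punctuation attached (for example `"I'd` and `McDonald's,"`), and `--` comes out as a "word" of its own, which is why a plain whitespace split is not enough.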

HTTP download speeds

In some of the Web projects I’m working on, I have a choice of many small files vs few big files to download. There are conflicting arguments. I’ve read that many small files are better, because you can choose to use only the required files, and they’ll be cached across the site. (These are typically CSS or Javascript files.) On the other hand, a single large file takes less time to download than the sum of many small files, because there’s less latency. (Latency is more important than bandwidth these days.) ...
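
The latency argument can be made concrete with a rough model, assuming files are fetched one after another and each request pays one fixed latency cost. The numbers below are hypothetical, and the model ignores caching and parallel connections.

```python
def download_time(num_files, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds to fetch num_files sequentially, totalling total_bytes."""
    # One latency cost per request, plus the raw transfer time for all bytes.
    return num_files * latency_s + total_bytes / bandwidth_bytes_per_s

# Hypothetical numbers: 100 KB in total, 100 ms latency, 1 MB/s bandwidth.
many_small = download_time(10, 100_000, 0.1, 1_000_000)  # ten 10 KB files
one_big = download_time(1, 100_000, 0.1, 1_000_000)      # one 100 KB file
```

Under these assumptions the ten small files cost ten latency hits against one for the single file, so the gap grows with the number of files rather than with their size.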

Making a Media PC

Two weeks ago, I pulled together a Media PC. This has been a long-term ambition. I’ve always wanted to have a PC as the centre of all my media. To use it as a radio, TV, stereo system, CD player, DVD player, etc. I finally did it, for just under 1000 pounds. At the centre of the setup is my 42" Plasma TV (LG 42PC1D). I was debating between a plasma and LCD TV. The differences, as I understand them, are: ...

Hindi songs online

Click here to search for Hindi songs. This is an article on how I wrote the search engine. I find it a nuisance to have to go to Raaga, search for a song, not find it, then go to MusicIndiaOnline, not find it, then go to Musicplug.in, and so on until Google. So I got the list of songs from some of these sites, put it together in one place, and implemented a find-as-you-type. ...

Statistically improbable phrases 2

My earlier list of statistically improbable phrases in Calvin and Hobbes is technically just a list of “Statistically Improbable Words”. I re-did the same analysis using phrases. Here are the top 20 statistically improbable phrases (2-4 words only): baby sitter, chocolate frosted sugar bombs, comic books, doing homework, fearless spaceman spiff, good night, hamster huey, ice cream, miss wormwood, new year, peanut butter, really think, slimy girls, spaceman spiff, stuffed tiger, stupendous man, sugar bombs, susie derkins, watch tv, water balloon. That is, these are the 2-4 word phrases whose frequency in Calvin and Hobbes is substantially (at least 5 times) higher than in the other books I have. ...
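
A minimal sketch of the frequency comparison, assuming both texts are already split into lowercase words. The add-one smoothing for phrases missing from the reference text and the `min_count` cut-off are assumptions added for illustration; the post only states the "at least 5 times higher" threshold.

```python
from collections import Counter

def ngrams(words, n_min=2, n_max=4):
    """Yield every phrase of n_min to n_max consecutive words."""
    for n in range(n_min, n_max + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def improbable_phrases(target_words, reference_words, min_ratio=5.0, min_count=2):
    """Return phrases whose relative frequency in the target is at least
    min_ratio times their relative frequency in the reference."""
    target = Counter(ngrams(target_words))
    reference = Counter(ngrams(reference_words))
    target_total = sum(target.values()) or 1
    reference_total = sum(reference.values()) or 1
    ratios = {}
    for phrase, count in target.items():
        if count < min_count:
            continue  # ignore one-off phrases (assumed cut-off)
        target_freq = count / target_total
        # Add-one smoothing (assumption) avoids dividing by zero for unseen phrases.
        reference_freq = (reference[phrase] + 1) / reference_total
        ratio = target_freq / reference_freq
        if ratio >= min_ratio:
            ratios[phrase] = ratio
    return sorted(ratios, key=ratios.get, reverse=True)

# Toy data, not the real corpus.
target = ["the", "cat"] * 10 + ["spaceman", "spiff"] * 10
reference = ["the", "cat"] * 20
phrases = improbable_phrases(target, reference)
```

With the toy data, "spaceman spiff" passes the threshold while "the cat", which is common in both texts, does not.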

Most bookmarked pages

These are the most bookmarked pages on my site: My home page; Excel tips; Calvin & Hobbes quotes (I typed them all); Indian torrents (I have a search engine for Indian torrents); Tamil Transliterator (lets you type Tamil in English); Tamil songs quiz; Movie quote quiz; My best links; Top 10 lists. But this post is not about these links. It’s about how I found this out. Think about it… how could I know what pages have been bookmarked? The browser doesn’t send any information about bookmarks. ...

Google custom search engine

I didn’t realise the power of Google Coop’s custom search engines (CSE) until I watched Scoble interviewing Google’s Shashi Seth. In a nutshell, CSE lets you create a search engine that focuses on specific sites, like UK blogs or Photoshop sites. Anyone can create these. You can edit other people’s search engines too. There are a huge number of custom search engines you can volunteer to edit. I’ve created a bunch of search engines myself: ...

Wishlist for movies

I watch a lot of movies. Over the last year, I’ve watched over 250 movies (and read 50 books, but that’s another story). Other than making time to watch movies, my biggest problem is figuring out what to watch next. The IMDb top 250 is a good guideline, and I’m running my way down the list. Twofifty.org has been useful to track what I’ve seen as well. But I have interests outside of the IMDb Top 250, and I need a way of tracking these. ...

My Fuji Finepix S5600

My digital camera conked off. The cover that holds the battery fell off, and I can’t use it any more. I went back to my buying principles, and prepared an Excel sheet to choose my next camera. Here’s what I was looking for: (1) Low-light photography. Flashes are lousy. This effectively means I need ISO control. (2) Shutter speed control. I sometimes take really long exposure (3-10s) snaps, and sometimes can’t afford the blur (1/250s). (3) Long battery life. My current camera consumed batteries like crazy. (4) Fast start-up. By the time I got my earlier camera out and it started, it was too late. (5) RAW mode. Gives me more control in Photoshop. I didn’t care about: ...

Google search in Tamil

When I wrote my Tamil song lyrics quizzes, I had two problems: (1) I can't write in Tamil (neither on paper nor on a computer), and (2) I can't spell right in Tamil (ந vs ன, ர vs ற). I overcame the first using a Tamil transliterator. I write in English, and you see it in Tamil. The problem of ந vs ன was simple. ந occurs as the first letter of a word, and just before த. Nowhere else. (Is this always true?) ...
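
The ந vs ன rule as stated can be sketched on Tamil text directly. This implements only that rule (ந at the start of a word or just before த, ன everywhere else), and it assumes "just before த" means the consonant carries a pulli (்) and is immediately followed by த. The function name `fix_n` is hypothetical.

```python
NA = "\u0BA8"      # ந
NNNA = "\u0BA9"    # ன
TA = "\u0BA4"      # த
PULLI = "\u0BCD"   # ் (the dot that marks a bare consonant)

def fix_n(word):
    """Rewrite every ந/ன in a single word according to the stated rule."""
    out = []
    for i, ch in enumerate(word):
        if ch in (NA, NNNA):
            # "Just before த" is taken to mean: pulli, then த (assumption).
            before_ta = word[i + 1:].startswith(PULLI + TA)
            ch = NA if (i == 0 or before_ta) else NNNA
        out.append(ch)
    return "".join(out)

# வன்தான் (misspelt) becomes வந்தான்
corrected = fix_n("\u0BB5\u0BA9\u0BCD\u0BA4\u0BBE\u0BA9\u0BCD")
```

Since the rule itself is posed as an open question, any exceptions to it would need a word list on top of this.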

Automated resume filtering

I had to screen resumes from a leading MBA school. I’m lazy, and there were hundreds of CVs. So after procrastinating until this morning, I decided on 2 principles: (1) I will not spend more than 45 minutes on this. (That’s the duration of my train ride to office.) (2) I will not read a single CV. (I would write a program.) The CVs were in a single PDF file. I saved it as text (it shrank from 66MB to 1.6MB without the photos). Then I wrote a Perl program to filter CVs by keywords. We were looking for people with an interest and/or experience in IT consulting, so I picked “technology”, “consulting”, “SAP”, “IBM”, “Accenture”, “Deloitte”, etc. ...
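
The original program was in Perl and is not shown here. As a rough Python sketch of the same idea, the code below counts whole-word keyword matches per CV and ranks the CVs by total hits. How the text file was split into individual CVs is not stated, so the form-feed separator is an assumption, as are the function names.

```python
import re

# Keywords named in the post (the original list continued with "etc.").
KEYWORDS = ["technology", "consulting", "SAP", "IBM", "Accenture", "Deloitte"]

def keyword_hits(cv_text, keywords=KEYWORDS):
    """Return {keyword: count} for keywords found as whole words, ignoring case."""
    hits = {}
    for keyword in keywords:
        pattern = r"\b" + re.escape(keyword) + r"\b"
        count = len(re.findall(pattern, cv_text, flags=re.IGNORECASE))
        if count:
            hits[keyword] = count
    return hits

def rank_cvs(all_text, separator="\f"):
    """Split the text into CVs and return (score, index) pairs, best first.

    The separator is an assumption; a real export may need a different rule.
    """
    cvs = [cv for cv in all_text.split(separator) if cv.strip()]
    scored = [(sum(keyword_hits(cv).values()), index) for index, cv in enumerate(cvs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

sample = "Alice\nWorked at IBM on SAP consulting.\fBob\nPainter and sculptor."
ranking = rank_cvs(sample)
```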

Playing sounds backwards

You can play a video backwards and still recognise the scenes quite well. Can you do that with sound? I tried it on this Bryan Adams clip of Summer of ‘69 (mp3). When played backwards (mp3), it almost sounds like Arabic! Instruments sound weird backwards too, like the guitar played backwards and drums played backwards. It seems obvious once you see the wave file. The picture below shows the guitar. The sounds are clearly not symmetric left to right. ...
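
One way to reverse a clip, sketched with Python's standard `wave` module. It assumes the audio has first been converted to an uncompressed WAV file (the clips in the post are mp3s), and the function names and file paths are placeholders.

```python
import wave

def reverse_frames(frames, frame_size):
    """Reverse the order of fixed-size frames without scrambling the bytes
    inside each frame (one frame = one sample for every channel)."""
    chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
    return b"".join(reversed(chunks))

def reverse_wav(src_path, dst_path):
    """Write a copy of the WAV file at src_path that plays backwards."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(frames, frame_size))
```

Reversing whole frames rather than raw bytes matters: flipping the bytes of a 16-bit stereo file would swap the channels and corrupt every sample.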

Google searches that lead to my site

I stopped using Google Analytics when I redesigned my site. I track my own statistics. This gives me access to raw data, and I can do my own analyses. I wanted to know the keywords on Google that led to my site. (Google Analytics only gives you phrases.) I also wanted independent words. If you search for “Calvin and Hobbes”, I want to count only “Calvin”, knowing that it’s in the context of “Hobbes”. ...
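
A minimal sketch of pulling search words out of referrer URLs, assuming the log holds full referrer strings and that Google passes the query in the `q` parameter. It only counts individual words; it does not attempt the context handling described above, and the function names are made up.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def search_words(referrer):
    """Return the lowercase words of the Google query in a referrer URL."""
    parts = urlparse(referrer)
    if "google." not in parts.netloc:
        return []  # not a Google referrer
    query = parse_qs(parts.query).get("q", [""])[0]
    return query.lower().split()

def count_keywords(referrers):
    """Count how often each word appears across all Google referrers."""
    counts = Counter()
    for referrer in referrers:
        counts.update(search_words(referrer))
    return counts

log = [
    "http://www.google.com/search?q=Calvin+and+Hobbes&hl=en",
    "http://www.google.co.uk/search?q=calvin+quotes",
    "http://example.com/?q=ignored",
]
counts = count_keywords(log)
```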

Experiments in sound

Wikipedia says the human voice frequency for speech is from 85 to 155 Hz for men, and from 165 to 255 Hz for women. That set me thinking. What is the limit to our hearing? How do sounds differ? How can we synthesise speech? What are the limits to our hearing? Kids can hear frequencies from 20 Hz to 20 kHz, while adults hear only up to 12-14 kHz (Frequency Range of Human Hearing). ...

Link to a Google search rather than a site

When you make a link, there’s no guarantee that the link will work 5 years later. Sites change their URL structure. I’m finding that many of my blog entries from 2000 are invalid. Sometimes you want to link to a concept rather than a site. In such cases, it’s better to link to a Google query. For example, rather than link to a site that defines SVG, I could link to the Google search define:SVG. ...
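
Building such a link is a one-liner. The sketch below assumes Google's usual `/search?q=` URL form; the function name is made up.

```python
from urllib.parse import urlencode

def google_search_url(query):
    """Return a Google search URL for the given query string."""
    # urlencode escapes characters such as ':' and spaces for use in a URL.
    return "https://www.google.com/search?" + urlencode({"q": query})

svg_link = google_search_url("define:SVG")
```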