Year: 2006

Visualisation of data

I have managed to fill hard disks of all capacities within a few months. My first PC had 10MB of disk space, while I work on 140GB today (remember: that’s 14 thousand times more capacity in 14 years). Both were filled within 2 months. (An aside: the number of files / folders hasn’t grown 14,000-fold. The files themselves have grown in size. I have roughly the same number of files/folders today on my machine as I had 14 years ago.)

To regain space, I used to go through every file and delete the unnecessary ones. My favourite tool was the UNIX utility du (Disk Usage). It lists the disk space used by every subdirectory. I would sort the result and find big, useless stuff. Here are the first few lines of a sorted du output:

1342507 ./Books
1188020 ./Non-Fiction
1047607 ./Comics
842832 ./Non-Fiction.Magazines
594939 ./Audio
298737 ./Books/kokona – Business
172166 ./Books/Terry Pratchett
164246 ./Books/Terry Pratchett/Discworld
162287 ./Calvin
142274 ./Books/S
77407 ./Scripts
74858 ./Science

It would take 5 minutes to create the list, and 15 minutes to read.
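For readers without du at hand, here is a rough Python sketch of the same idea: total up every subdirectory and list the biggest first. It is not how du itself works. It sums file sizes in bytes rather than disk blocks, and the function names (directory_sizes, largest) and the choice to print kilobytes are my own assumptions for illustration.

```python
import os


def directory_sizes(root):
    """Return {directory path: total bytes of all files beneath it}."""
    sizes = {}
    # Walk bottom-up so every subdirectory is totalled before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # unreadable or vanished file: skip it
        for name in dirnames:
            # Symlinked directories are never walked, so they count as 0.
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes


def largest(root, count=12):
    """Return the `count` biggest directories as (path, bytes) pairs."""
    sizes = directory_sizes(root)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:count]


if __name__ == "__main__":
    for path, size in largest("."):
        print(f"{size // 1024:>10} {path}")
```

Run from the folder you want to inspect, it prints a list shaped like the one above: size first, then path, with the root folder itself at the top.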

Nowadays I use WinDirStat, which shows every file and folder in an intuitive, graphical manner.

Treemap from WinDirStat

This view is called a Treemap. Each small block is a file. Bigger blocks are folders. Colours indicate the type of file (MP3s are blue, AVIs are red, WMVs are yellow, JPGs are green, etc.). This view has many advantages:

  • I can see the relative sizes of files and folders.
  • I can get an idea of the % of free space (grey block).
  • I can see what type of files occupy the most space.
  • etc. etc.

But the most important thing is, I see the useful stuff at a single glance.

That’s the key in visualisation: conveying a complex topic so people get it in a second.
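To see why block sizes in a treemap track file sizes, here is a minimal sketch of the simplest treemap layout, usually called slice-and-dice: split the rectangle among the children in proportion to their sizes, and alternate between horizontal and vertical splits at each level of nesting. This is an assumption-laden illustration, not WinDirStat's actual layout code; the dictionary structure and the names total_size and layout are made up for the example.

```python
def total_size(node):
    """Size of a file, or the sum of everything inside a folder."""
    if "children" in node:
        return sum(total_size(child) for child in node["children"])
    return node["size"]


def layout(node, x, y, w, h, horizontal=True):
    """Return one (name, x, y, w, h) rectangle per file under `node`."""
    if "children" not in node:
        return [(node["name"], x, y, w, h)]
    rects = []
    total = total_size(node)
    offset = 0.0
    for child in node["children"]:
        # Each child gets a slice proportional to its share of the folder.
        share = total_size(child) / total if total else 0.0
        if horizontal:
            rects += layout(child, x + offset, y, w * share, h, False)
            offset += w * share
        else:
            rects += layout(child, x, y + offset, w, h * share, True)
            offset += h * share
    return rects


if __name__ == "__main__":
    # Hypothetical folder tree; the sizes are arbitrary units.
    tree = {"name": "root", "children": [
        {"name": "A", "children": [
            {"name": "a1", "size": 30},
            {"name": "a2", "size": 30},
        ]},
        {"name": "B", "size": 40},
    ]}
    for rect in layout(tree, 0, 0, 100, 100):
        print(rect)
```

Every file ends up with an area proportional to its size, and the files of one folder stay together inside the folder's rectangle. Slice-and-dice tends to produce long thin strips, which is why real tools generally use layouts that keep blocks closer to square.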

(Incidentally, Google has a TechTalk on visualisation, including treemaps.)

Google searches that lead to my site

I stopped using Google Analytics when I redesigned my site. I track my own statistics. This gives me access to raw data, and I can do my own analyses.

I wanted to know which Google keywords led to my site. (Google Analytics only gives you whole phrases.) I wanted individual words, each counted along with its context. If you search for “Calvin and Hobbes”, I want to count only “Calvin”, knowing that it’s in the context of “Hobbes”.

So I did this analysis. Here are the keywords that lead to my site. (This is based on 3 weeks of data.)

  1. excel in the context of cell, formula, function, leading to my Excel tips. People mostly want to know how to remove errors like #N/A.
  2. calvin in the context of hobbes, fight, club. (There was a great article on how Fight Club is really Calvin and Hobbes.) Most of these queries are searches for specific quotes, and I’ve typed out all the Calvin and Hobbes quotes.
  3. indian in the context of torrents, tv. One of my most popular posts is Indian Torrents. I simply linked to a couple of Google searches, so its popularity is unjustified.
  4. tamil in the context of songs, lyrics, movie. This is mostly thanks to the recent Tamil quizzes I’ve put up.
  5. mumbai in the context of local, schedule, train. A shockingly large number of people search for Mumbai bus and train schedules, landing on my link to the IIT-B Mumbai Navigator.
  6. anand in the context of s anand, bcg, infosys. This is people searching for me.
  7. irr in the context of calculating, excel, formula. Calculating IRR turned out to be another unexpectedly popular post.
  8. interview in the context of lehman brothers, bcg, leading to some of my interview experiences.
  9. mckinsey in the context of ppt, presentation. Most of these people are looking for presentations, while I have a link to the McKinsey pre-placement talk at LBS. Interesting that BCG is not in the top 10.
  10. google in the context of engedu, types, authors@google. Though I have several posts about Google, the ones about Google video like Meet the author and on Google TechTalks are the most popular.

Having read the actual queries, I’ve concluded that only the keywords excel, mumbai, anand, irr and interview definitely lead to relevant hits. The rest are debatable. Maybe I should reduce the priority of the less relevant posts in my sitemap file.
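One way to count a keyword "in the context of" other words can be sketched in Python as below. The post doesn't describe the method actually used, so this is only a guess at the idea: each query is credited to its single most common word, and the remaining words are recorded as that keyword's context. The stopword list, the tie-breaking rule and the sample queries are all assumptions for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed stopword list; a real one would be longer.
STOPWORDS = {"a", "and", "for", "how", "in", "of", "the", "to"}


def keyword_contexts(queries):
    """Credit each query to one keyword; return (counts, contexts)."""
    tokenised = [
        [w for w in re.findall(r"[a-z0-9]+", query.lower())
         if w not in STOPWORDS]
        for query in queries
    ]
    # How many queries does each word appear in?
    frequency = Counter(w for words in tokenised for w in set(words))

    keyword_counts = Counter()
    contexts = defaultdict(Counter)
    for words in tokenised:
        if not words:
            continue
        unique = set(words)
        # The most frequent word wins; ties are broken alphabetically.
        keyword = min(unique, key=lambda w: (-frequency[w], w))
        keyword_counts[keyword] += 1
        contexts[keyword].update(unique - {keyword})
    return keyword_counts, contexts


if __name__ == "__main__":
    # Hypothetical queries, not taken from the real logs.
    sample = [
        "Calvin and Hobbes",
        "calvin fight club",
        "calvin hobbes quotes",
        "excel formula",
    ]
    counts, contexts = keyword_contexts(sample)
    for keyword, count in counts.most_common():
        context = ", ".join(w for w, _ in contexts[keyword].most_common(3))
        print(f"{keyword} ({count}) in the context of {context}")
```

With this rule, a search for "Calvin and Hobbes" adds one to calvin and nothing to hobbes, while hobbes still shows up as context for calvin.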

An honest in-flight announcement

What would an honest in-flight announcement sound like? Among other things, it would say…

Please switch off all mobile phones, since they can interfere with the aircraft’s navigation systems. At least, that’s what you’ve always been told. The real reason to switch them off is because they interfere with mobile networks on the ground, but somehow that doesn’t sound quite so good.

English movie romances

I’ve listed the lead pairs of famous English romantic movies. How many titles can you guess?

Cary Grant and Deborah Kerr
Hugh Grant and Julia Roberts
Richard Gere and Julia Roberts
Cary Grant and Audrey Hepburn
Cary Elwes and Robin Wright Penn
Humphrey Bogart and Ingrid Bergman
Clark Gable and Claudette Colbert
Yul Brynner and Deborah Kerr
Leonardo DiCaprio and Kate Winslet
Jack Lemmon and Marilyn Monroe
Cary Grant and Grace Kelly
Tom Hanks and Meg Ryan (1st film)
Peter O’Toole and Audrey Hepburn
Billy Crystal and Meg Ryan
Christopher Plummer and Julie Andrews
James Stewart and Donna Reed
Rex Harrison and Audrey Hepburn
Ewan McGregor and Nicole Kidman
Bill Murray and Andie MacDowell
Steve Martin and Daryl Hannah
Patrick Swayze and Demi Moore
Gregory Peck and Audrey Hepburn
Jack Lemmon and Shirley MacLaine
Cary Grant and Katharine Hepburn
Harrison Ford and Julia Ormond