Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail. For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter just using your e-mail (you’ll have to register once via Twitter, though). Now, you can send an e-mail to [email protected] with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets. ...

Visualising student performance

I’ve been helping with visualising student scores for ReportBee, and here’s what we’ve currently come up with. Each row is a student’s performance across subjects. Let’s walk through each element here. The first column shows their relative performance across different subjects. Each dot is their rank in a subject. The dots are colour coded based on the subject (and you can see the colours on the image at the top: English is black, Mathematics is dark blue, etc.) ...

What does India search for?

Over the last couple of years, I’ve been tracking the top 5 hot searches in India on Google Trends (http://www.google.co.in/trends). Here are the results: If you're interested in making visualisations out of it, please feel free. But there's one particular thing I'm trying out, which is to categorise these searches and see if there's a trend around that. I've added a "Tag" column. Could you please help me tag the spreadsheet: https://spreadsheets.google.com/ccc?key=0Av599tR_jVYgdE5zTU5QWjcxVWVCaTBuY3d0NkUtc1E&hl=en_GB It’s publicly editable, no special access required. If you could stick to the tags I already have (Business, Education, Entertainment, News, Politics, Sports, Technology), that would be great. If not, that’s fine as well. And if you’ve made any visualisations or done any analysis using this data, please do drop a comment. ...

Visualising the Wilson score for ratings

Reddit’s new comment sorting system (charmingly explained by Randall Munroe) uses what’s called a Wilson score confidence interval. I’ll wait here while you read those articles. If you ever want to implement user-ratings, you need to read them. The summary is: don’t use the average rating. Use something else, which in this case is the Wilson score. It says that if you got 3 negative ratings and no positive ratings, your average rating shouldn’t be zero. Rather, you can be 95% sure that it’ll end up at 0.47 or below, given a chance, so rate it as 0.47. ...
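Here is a minimal Python sketch of the Wilson score interval, for anyone who wants to play with the numbers. The function name and the default z value are my own choices for illustration, not something taken from Reddit's code. With z = 1.645 (a one-sided 95% bound), the upper end for 0 positive ratings out of 3 works out to roughly 0.47, which appears to be the figure quoted above.

```python
import math

def wilson_interval(positive, total, z=1.96):
    """Return (lower, upper) Wilson score bounds for the true positive fraction.

    z is the normal quantile: 1.96 for a two-sided 95% interval,
    1.645 for a one-sided 95% bound.
    """
    if total == 0:
        return (0.0, 1.0)  # no ratings yet, so nothing is ruled out
    p = positive / total
    z2 = z * z
    denom = 1 + z2 / total
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    # Clamp to [0, 1] to absorb tiny floating-point overshoot
    lower = max(0.0, (centre - margin) / denom)
    upper = min(1.0, (centre + margin) / denom)
    return (lower, upper)

# 3 negative ratings, no positive ones
print(wilson_interval(0, 3, z=1.645))
```

Sorting by the lower bound is the conservative choice: one positive rating out of one ranks below 90 out of 100, even though its plain average is higher.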

Yahoo Clues API

Yahoo Clues is like Google Insights for Search. It has one interesting thing that the latter doesn’t though: search flows. It doesn’t have an official API, so I thought I’d document the unofficial one. The API endpoint is http://clues.yahoo.com/clue
The query parameters are:
q1 – the first query string
q2 – the second query string
ts – the time span. 0 = today, 1 = past 7 days, 2 = past 30 days
tz – time zone? Not sure how it works. It’s just set to “0” for me
s – the format? No value other than “j” seems to work
So a search for “gmat” for the last 30 days looks like this: ...
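As a rough sketch, the parameters listed above could be assembled in Python like this. The helper name, the parameter order and the choice to leave out q2 when there is no second query are all assumptions of mine, so the result may not match the truncated example exactly. It only builds the URL and does not call the service.

```python
from urllib.parse import urlencode

CLUES_ENDPOINT = "http://clues.yahoo.com/clue"

# Time span codes as documented above
TIME_SPANS = {"today": 0, "past 7 days": 1, "past 30 days": 2}

def clues_url(q1, q2=None, span="past 30 days"):
    """Build a Yahoo Clues query URL (hypothetical helper, unofficial API)."""
    params = {"q1": q1}
    if q2:
        params["q2"] = q2          # assumption: q2 can be omitted
    params["ts"] = TIME_SPANS[span]
    params["tz"] = 0               # the only value observed
    params["s"] = "j"              # the only format that seems to work
    return CLUES_ENDPOINT + "?" + urlencode(params)

print(clues_url("gmat"))
```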

Automated image enhancement

There are some standard enhancements that I apply to my photos consistently: auto-levels, increase saturation, increase sharpness, etc. I’d also read that Flickr sharpens uploads (at least, the resized ones) so that they look better. So last week, I took 100 of my photos and created 4 versions of each image:
The base image itself (example)
A sharpened version (example). I used a sharpening factor of 200%
A saturated version (example). I used a saturation factor of 125%
An auto-levelled version (example)
I created a test asking people to compare these. The differences between these are not always noticeable when placed side-by-side, so the test flashed two images at the same place. ...
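To make two of these enhancements concrete, here is a small pure-Python sketch that works on single pixels and channel values. It is not the tool used for the photos above. Treating "125% saturation" as multiplying the HSV saturation by 1.25, and auto-levels as a linear stretch to the 0–255 range, are both assumptions on my part. Sharpening is left out because it needs a neighbourhood of pixels.

```python
import colorsys

def saturate(rgb, factor=1.25):
    """Scale the HSV saturation of one (r, g, b) pixel with 0-255 channels."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * factor)  # saturation cannot exceed 1
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

def auto_levels(values):
    """Stretch channel values linearly so the darkest maps to 0, the brightest to 255."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)  # flat channel, nothing to stretch
    return [round((v - lo) * 255 / (hi - lo)) for v in values]

print(saturate((200, 100, 100)))
print(auto_levels([50, 100, 150]))
```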

Surviving in prison

As promised, here are some tips from the trenches on surviving in prison. (For those who don’t follow my blog, prison is where your Internet access is restricted.) There are two things you need to know better: software and people. I’ll try and cover the software in this post, and the more important topic in the next. Portable apps You’re often not in control of your laptops / PCs. You don’t have administrator access. You can’t install software. The solution is to install Portable Apps. Most popular applications have been converted into Portable Apps that you can install on to a USB stick. Just plug them into any machine and use them. I use Firefox and Skype quite extensively this way, but increasingly, I have a preference for Portable Apps for just about everything. It makes my bloated Start Menu a lot more manageable. Some of the other portable apps I have are: Audacity, Camstudio, GIMP, Inkscape and Notepad++. ...

Shortening sentences

When writing Mixamail, I wanted tweets automatically shortened to 140 characters – but in the most readable manner. Some steps are obvious. Removing redundant spaces, for example. And URL shortening. I use bit.ly because it has an API. I’ll switch to Goo.gl, once theirs is out. I tried a few more strategies:
Replace words with short forms. “u” for “you”, “&” for “and”, etc.
Remove articles – a, an, the
Remove optional punctuation – comma, semicolon, colon and quotes, in particular
Replace “one” with “1”, “to” or “too” with “2”, etc. “Before” becomes “Be4”, for example
Remove spaces after punctuation. So “a, b” becomes “a,b” – the space after the comma is removed
Remove vowels in the middle. nglsh s lgbl wtht vwls.
How did they pan out? I tested these on the English sentences in the Tanaka Corpus, which has about 150,000 sentences. (No, they’re not typical tweets, but hey…). By just doing these, independently, here is the percentage reduction in the size of text: ...
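A few of the strategies above can be sketched in Python as follows. This is an illustration, not Mixamail's actual code: the function names are made up, and the vowel rule here keeps the first and last letter of each word, which is one possible reading of "vowels in the middle".

```python
import re

ARTICLES = re.compile(r"\b(a|an|the)\s+", re.IGNORECASE)

def collapse_spaces(text):
    """Remove redundant spaces."""
    return re.sub(r"\s+", " ", text).strip()

def remove_articles(text):
    """Drop a, an, the (along with the space that follows them)."""
    return collapse_spaces(ARTICLES.sub("", text))

def tighten_punctuation(text):
    """Remove spaces after punctuation: 'a, b' becomes 'a,b'."""
    return re.sub(r"([,;:.!?])\s+", r"\1", text)

def drop_inner_vowels(text):
    """Remove vowels inside each word, keeping its first and last letter."""
    def squeeze(match):
        word = match.group(0)
        if len(word) <= 2:
            return word
        return word[0] + re.sub(r"[aeiouAEIOU]", "", word[1:-1]) + word[-1]
    return re.sub(r"[A-Za-z]+", squeeze, text)

def reduction(before, after):
    """Percentage reduction in length."""
    return 100 * (len(before) - len(after)) / len(before)

sentence = "The cat sat on a mat, without vowels"
for strategy in (remove_articles, tighten_punctuation, drop_inner_vowels):
    shorter = strategy(sentence)
    print(strategy.__name__, shorter, round(reduction(sentence, shorter), 1))
```

Applying each strategy on its own, as the loop does, mirrors the "independently" measurement described above.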

HTML5: Up and Running

HTML5: Up and Running is the book version of Mark Pilgrim’s comprehensive introduction to HTML5 at DiveIntoHTML5.org. Whether you buy the book or read it online, it’s the best introduction to the topic you’ll find. Mark begins with the history of HTML5 (using email archaeology, as he calls it). You’d never guess that many of the problems we have with XHTML, MIME types, etc. have roots in discussions over 20 years ago. From then on, he moves into feature detection (which uses the Modernizr library), new tags, canvas, video, geo-location, storage, offline web apps, new form features and microdata. Each chapter can be read independently – so if you’re planning to use this as a reference, you may be better off reading the links kept up-to-date at DiveIntoHTML5.org. If you’re interested in learning about the features, it’s a very readable book: terse, simple, and above all, delightfully intelligent. ...

Twitter via e-mail

Since I don’t have Internet access on my BlackBerry (because I’m in prison), I’ve had a pretty low incentive to use Twitter. Twitter’s really handy when you’re on the move, and over the last year, there were dozens of occasions where I really wanted to tweet something, but didn’t have anything except my BlackBerry on hand. Since T-Mobile doesn’t support Twitter via SMS, e-mail is my only option, and I haven’t been able to find a decent service that does what I want it to do. ...

Bayes’ Theorem

I’ve tried understanding Bayes’ Theorem several times. I’ve always managed to get confused. Specifically, I’ve always wondered why it’s better than simply using the average estimate from the past. So here’s a little attempt to jog my memory the next time I forget. Q: A coin shows 5 heads when tossed 10 times. What’s the probability of a heads? A: It’s not 0.5. That’s the most likely estimate. The probability distribution is actually: ...
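One way to picture that distribution is to compute it on a grid. The sketch below assumes a uniform prior over the probability of heads, which is my assumption and not something stated above, and weights each candidate probability by how likely it makes the observed tosses. The function name is made up for illustration.

```python
def posterior(heads, tails, steps=101):
    """Posterior over P(heads) on an evenly spaced grid, assuming a uniform prior."""
    grid = [i / (steps - 1) for i in range(steps)]
    # Likelihood of the observed tosses for each candidate probability p
    weights = [p ** heads * (1 - p) ** tails for p in grid]
    total = sum(weights)
    return grid, [w / total for w in weights]

grid, probs = posterior(heads=5, tails=5)
best = max(range(len(grid)), key=lambda i: probs[i])
print("most likely value:", grid[best])
print("weight on values from 0.4 to 0.6:",
      sum(pr for p, pr in zip(grid, probs) if 0.4 <= p <= 0.6))
```

The distribution peaks at 0.5, but a good share of the weight sits on nearby values, which is the point of the answer above: 0.5 is the most likely estimate, not a certainty.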

R scatterplots

I was browsing through Beautiful Data, and stumbled upon this gem of a visualisation. This is the default plot R provides when supplied with a table of data. A beautiful use of small multiples. Each box is a scatterplot of a pair of variables. The diagonal is used to label the rows. It shows for every pair of variables their correlation and spread – at a glance. Whenever I get any new piece of data, this is going to be the very first thing I do: ...

Modular CSS frameworks

A fair number of the CSS frameworks I’ve seen – Blueprint, Tripoli, YUI, SenCSS – are monolithic. What I’d like is to be able to mix and match specific components of these. For example, 960.gs has a simple grid system that I’d love to combine with the vertical rhythm that SenCSS offers. (Vertical rhythm ensures that sentences align vertically.) I’d love to have a CSS framework that just sets the fonts, for example, and touches nothing else. Or something that defines the colour schemes, and lets you change the theme like Microsoft Office does. ...

Install Mercurial

If you’re jointly writing code with others, use Mercurial or Git. (Not SVN. Linus explains, but the quick version is: you can’t commit offline.) Sites like bitbucket, github and Google Code let you maintain your code online with others editing it. My preference is for Mercurial via TortoiseHg, which integrates well with Windows Explorer. (I use the command prompt, but people I collaborate with prefer this.) Here’s a 2-minute video explaining how to install TortoiseHg and commit your code onto bitbucket. ...

Install MediaWiki

Once you’ve installed XAMPP, download MediaWiki and unzip it into your xampp/htdocs folder. You may need 7-Zip to extract tar.gz files. Rename the mediawiki folder to wiki. You’ll first need to create a database, which you can do by visiting /phpmyadmin/ on your localhost, typing in the database name and pressing ‘Create’. Now go to /wiki/ and fill out the form. Make sure you select “Use superuser account” since you haven’t really created a user for your database. ...

Install Wordpress

Once you’ve installed XAMPP, download Wordpress and unzip it into your xampp/htdocs folder. You’ll first need to create a database, which you can do by visiting /phpmyadmin/ on your localhost, typing in the database name and pressing ‘Create’. Now go to /wordpress/, click the buttons and fill out the form. Type in 'root' for the database username and leave the password blank. Select any password you want for the administrator account. You can now use this administrator password to log into the Wordpress dashboard. ...

Install XAMPP

I’ve been going around setting up open source software a fair bit recently. To minimise the pain of explaining it, I’m putting together short videos that explain the process. Here’s the first, on XAMPP, which is a starting point for most open source applications. It bundles Apache (web server), MySQL (database), Perl and PHP. To install it, search for and download “XAMPP for Windows”, and press enter for every question. Then install your application under C:\xampp\htdocs. That’s it. ...

You are in prison

(I had intended to write this post sarcastically, a bit like my web freedom survey. But sarcasm’s confusing to read. So I’ll just be straight and mild.) If you’re a well-paid professional in an Indian IT services firm, your freedom is limited. (This holds if you’re a student, too.) You clock-in and clock-out. You’re searched on your way in and out. You need your boss’s permission to leave. You work on what you’ve been told to work on. You cannot be trusted to freely access the Internet. The last bit worries me the most. Perhaps because in all the other cases, there are humans I can put to shame or fight, face-to-face. Or because I am a Net addict. Don’t know why. ...

The Calvin and Hobbes search takedown

Eight years ago, I started typing out each of the Calvin and Hobbes strips by hand. Four years ago, I set up a site that let people search for strips. Early this month, I was asked to take it down. This is the story. I can’t quite remember when I started reading Calvin & Hobbes. The earliest reference I can find in my blogs is in July 1999. I remember it didn’t take me long to become a fan. I’d read every strip on the newspaper; hunt them out at bookshops; and spend a fair bit of time searching for archives online. ...

A sense of proportion

A quote from David Heinemeier Hansson: So the problem is, a lot of business managers and especially business owners, they have no sense of probability. They can’t fathom that concept. So they treat the probability of 1 to 10 trillion as the same as a 1 to a 100. And like, “We’ve got to deal with this 1 to a trillion probability, because, what if it happens?” No! Doesn’t matter! I mean, don’t care. ...