Bayes’ Theorem

I’ve tried understanding Bayes’ Theorem several times. I’ve always managed to get confused. Specifically, I’ve always wondered why it’s better than simply using the average estimate from the past. So here’s a little attempt to jog my memory the next time I forget.

Q: A coin shows 5 heads when tossed 10 times. What’s the probability of heads?
A: It’s not simply 0.5. That’s only the most likely estimate. The probability is actually a distribution:

dbeta(x,6,6)

That’s because you don’t really know the probability with which the coin will land heads. It could be any number p between 0 and 1. So let’s say we have a probability distribution for it, f(p).

Initially, you don’t know what this probability distribution is. So assume every value of p is equally likely: a flat function, f(p) = 1.

dbeta(x,1,1)

Now, given this, let’s say the next toss is heads. What’s the revised probability distribution? It’s:

f(p) ← f(p) * prob(heads | p) / prob(heads) = 1 * (p^1 * (1-p)^0) / (1/2) = 2p

Here prob(heads | p) is the chance of heads if the coin’s true probability is p, and prob(heads) is the area under f(p) * prob(heads | p) from 0 to 1 (1/2 for a flat f). Dividing by it only rescales the curve so that its total area stays 1.

dbeta(x,2,1)

Let’s say the next toss is heads again. Now it’s:

f(p) ← f(p) * prob(heads | p) / prob(heads) = 2p * (p^1 * (1-p)^0) / (2/3) = 3p^2

dbeta(x,3,1)

Now if it’s tails, it becomes:

f(p) ← f(p) * prob(tails | p) / prob(tails) = 3p^2 * (p^0 * (1-p)^1) / (1/4) = 12p^2 * (1-p)

dbeta(x,3,2)
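These updates are easy to check numerically. Here is a minimal Python sketch that approximates f(p) on a grid of 1,000 points. Python is my choice here; the plots above come from R. For each of the tosses heads, heads, tails, it multiplies f(p) by prob(toss | p) and then divides by prob(toss), which is the area under that product. The grid size and the update function are assumptions of the sketch, not part of any library. After the three tosses the curve should match dbeta(x,3,2), which is 12p^2 * (1-p).

```python
# Grid approximation of f(p), the distribution of a coin's heads-probability p.
N = 1000
dp = 1 / N
grid = [(i + 0.5) * dp for i in range(N)]  # midpoints of N equal slices of [0, 1]


def update(f, heads):
    """One Bayes step: f(p) * prob(toss | p) / prob(toss)."""
    likelihood = [p if heads else 1 - p for p in grid]
    unnormalised = [fi * li for fi, li in zip(f, likelihood)]
    # prob(toss) is the area under f(p) * prob(toss | p), summed over the grid
    prob_toss = sum(unnormalised) * dp
    return [u / prob_toss for u in unnormalised], prob_toss


f = [1.0] * N  # flat start: f(p) = 1, i.e. dbeta(x,1,1)
evidence = []
for toss in "HHT":
    f, prob_toss = update(f, toss == "H")
    evidence.append(prob_toss)
    print(f"after {toss}: prob(toss) = {prob_toss:.3f}")

# Compare with the exact Beta(3, 2) density, 12 p^2 (1 - p)
max_error = max(abs(fi - 12 * p * p * (1 - p)) for fi, p in zip(f, grid))
print(f"largest gap from 12p^2(1-p): {max_error:.2e}")
```

The three printed values of prob(toss) come out at about 1/2, 2/3 and 1/4. Those are the divisors that keep the area under f(p) equal to 1.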

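… and so on. (This happens to be called a Beta distribution. Starting from the flat f(p) = 1, h heads and t tails give Beta(h+1, t+1), which is the curve dbeta(x, h+1, t+1) describes in R.)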
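Each head adds 1 to the first Beta parameter and each tail adds 1 to the second, so the whole update reduces to counting. Below is a minimal Python sketch of that shortcut; the helper names are mine and do not come from a library. It also answers the question at the top. 5 heads in 10 tosses and 50 heads in 100 tosses both have a mean of 0.5, but the second distribution is much narrower.

```python
import math


def beta_posterior(heads, tails, a=1, b=1):
    """Beta(a, b) prior (flat when a = b = 1) updated by the observed tosses."""
    return a + heads, b + tails


def beta_pdf(p, a, b):
    """Density of Beta(a, b) at p; R's dbeta(p, a, b) computes the same thing."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * p ** (a - 1) * (1 - p) ** (b - 1)


def beta_mean_sd(a, b):
    """Mean and standard deviation of Beta(a, b)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


for heads, tosses in [(5, 10), (50, 100)]:
    a, b = beta_posterior(heads, tosses - heads)
    mean, sd = beta_mean_sd(a, b)
    print(f"{heads}/{tosses} heads -> Beta({a},{b}): mean {mean:.2f}, sd {sd:.3f}")
```

Both cases have a mean of 0.5, which is all a simple average would report. The standard deviations differ (about 0.14 versus about 0.05), and that difference tells you how far to trust the 0.5.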

Now, instead of this being the probability of heads, it could be the probability of a person having high blood pressure, or of a document being spam. As you get more data, the probability distribution of the probability keeps getting revised.

R scatterplots

I was browsing through Beautiful Data, and stumbled upon this gem of a visualisation.

[Image: r-scatterplots, R’s default scatterplot matrix]

This is the default plot R draws when you call plot() on a data frame (a table of data). A beautiful use of small multiples. Each box is a scatterplot of one pair of variables, and the diagonal holds the variable names, labelling each row and column. At a glance, it shows the correlation and spread of every pair of variables.

Whenever I get any new piece of data, this is going to be the very first thing I do:

plot(data)

Modular CSS frameworks

A fair number of the CSS frameworks I’ve seen – Blueprint, Tripoli, YUI, SenCSS – are monolithic. What I’d like is to be able to mix and match specific components of these.

For example, 960.gs has a simple grid system that I’d love to combine with the vertical rhythm that SenCSS offers. (Vertical rhythm keeps lines of text on a consistent baseline grid, so that lines in adjacent columns align.) I’d love to have a CSS framework that just sets the fonts, for example, and touches nothing else. Or something that defines the colour schemes, and lets you change the theme like Microsoft Office does.

LessCSS

LessCSS has been invaluable in helping with this. It extends the CSS language without deviating significantly from it. Compared to SASS and CleverCSS, I’d say it has a better chance of being incorporated into, say, CSS4.

LessCSS offers variables. I can define a variable:

@foreground: #112233;

and use it like this:

h1 { color: @foreground; }
a:hover { background-color: @foreground; }

When I change @foreground, it’s replaced everywhere.
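To make the substitution concrete, here is a toy Python sketch of what a Less compiler does with variables. It is not how less.js works internally. It only understands top-level definitions of the form @name: value; and it ignores scope, operations and imports, all of which real Less supports. The function name expand_variables is my own.

```python
import re

# Matches a top-level definition such as "@foreground: #112233;"
DEFINITION = re.compile(r"^\s*@([\w-]+)\s*:\s*([^;]+);\s*$", re.MULTILINE)


def expand_variables(less_source):
    """Toy sketch: collect @name definitions, drop them, substitute each @name use."""
    variables = {name: value.strip() for name, value in DEFINITION.findall(less_source)}
    body = DEFINITION.sub("", less_source)
    # Leave unknown @-words (e.g. @media) untouched
    return re.sub(
        r"@([\w-]+)", lambda m: variables.get(m.group(1), m.group(0)), body
    ).strip()


source = """
@foreground: #112233;
h1 { color: @foreground; }
a:hover { background-color: @foreground; }
"""
print(expand_variables(source))
```

Changing the one definition changes both rules, which is the whole point of variables.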

LessCSS offers multiple inheritance, through mixins.

.highlight { color: red; }
.button { border-radius: 10px; }
.action {
  .highlight;
  .button;
}

This copies the properties of the highlight and button classes into the action class. Any changes made to the parents are automatically inherited the next time the file is compiled.
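A mixin call is a compile-time copy. The compiler pastes the parent’s declarations into the rule that names it, so a later edit to a parent reaches every class that uses it once the file is recompiled. Here is a toy Python sketch of that copying step. It is my own illustration, not less.js’s implementation. It handles only argument-free calls to rules without nested braces, and the function name expand_mixins is hypothetical.

```python
import re

# A rule with no nested braces, e.g. ".button { border-radius: 10px; }"
RULE = re.compile(r"^\.([\w-]+)\s*\{([^{}]*)\}", re.MULTILINE)
# A mixin call on its own line, e.g. "  .highlight;"
MIXIN_CALL = re.compile(r"^([ \t]*)\.([\w-]+);", re.MULTILINE)


def expand_mixins(less_source):
    """Toy sketch: replace each .name; call with the declarations of rule .name."""
    rules = {name: body.strip() for name, body in RULE.findall(less_source)}

    def inline(match):
        indent, name = match.groups()
        return indent + rules[name] if name in rules else match.group(0)

    return MIXIN_CALL.sub(inline, less_source)


source = """.highlight { color: red; }
.button { border-radius: 10px; }
.action {
  .highlight;
  .button;
}"""
print(expand_mixins(source))
```

Like real Less, the sketch leaves the .highlight and .button rules in the output and expands only the calls inside .action.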

LessCSS has a JavaScript pre-processor. So I can link the .less file directly from the HTML and add the pre-processor script, which converts it into CSS in the browser. (The stylesheet link has to come before the script.)

<link rel="stylesheet/less" href="style.less">
<script src="less.js"></script>

I now use LessCSS as the basis of all new projects.

CSS libraries

My first attempt to consolidate modular CSS libraries is at bitbucket.org/sanand0/csslibs. As far as possible, I’ve tried to avoid creating new libraries, or even tweaking existing ones. Over time, I hope to completely eliminate any new code.

There are two types of libraries. Some just have variable definitions. Others actually define styles. For example, I’ve got three libraries that just define variables:

color_themes.less

Defines a standard set of color themes (based on the Office 2007 color themes)

font_stacks.less

Defines Web-safe font stacks (based on Sitepoint’s article)

backgrounds.less

Transparent background patterns (randomly useful images)

Including the above libraries will have no effect. You need to explicitly use them. For example:

@import "font_stacks.less";         // Does nothing
h1 { font-family: .font[@serif]; }  // Makes H1 a serif font

The following libraries define styles. Including them will define new classes or change the style of tags / classes.

reset.less

Resets default styles consistently across browsers. I chose YUI3 CSS Reset arbitrarily. I think HTML5boilerplate’s CSS reset may be a better choice, though.

grids.less

Defines classes for fixed and fluid grids. I chose YUI3 CSS Grids over 960.gs (which I’ve used for some years) because of its ability to offer fixed as well as fluid layouts, and the sheer brilliance of its minimality.

lineheight.less

Sets font sizes, ensuring that lines have a vertical rhythm. This is a stripped-down version of SenCSS, but over time, I’ll phase this out and use some standard framework someone comes up with.

Between these, I think the base infrastructure for most applications is in place. What’s required next are widgets. Specifically, I’d like:

  • Buttons. A really good, cross-browser, non-image-based button that offers rounded corners, gradients and borders.
  • Forms. Consistent form styling, without forcing me to use a specific form layout.
  • Icons. A standard icon library with replaceable CSS sprite-sets.

I’ll try to keep the code updated as I find these. Do pass me any suggestions you may have.

Install Mercurial

If you’re jointly writing code with others, use Mercurial or Git. (Not SVN. Linus explains why, but the short version is: you can’t commit offline.)

Sites like Bitbucket, GitHub and Google Code let you host your code online and let others edit it.

My preference is for Mercurial via TortoiseHg, which integrates well with Windows Explorer. (I use the command prompt, but people I collaborate with prefer this.)

Here’s a 2-minute video explaining how to install TortoiseHg and commit your code to Bitbucket.

Install MediaWiki

Once you’ve installed XAMPP, download MediaWiki and unzip it into your xampp/htdocs folder. You may need 7-Zip to extract tar.gz files. Rename the mediawiki folder to wiki.

You’ll first need to create a database, which you can do by visiting /phpmyadmin/ on your localhost, typing in the database name and pressing ‘Create’.

Now go to /wiki/ and fill out the form. Make sure you select “Use superuser account” since you haven’t really created a user for your database.

Click on the “Install Mediawiki” button, and you should have a wiki.

Install WordPress

Once you’ve installed XAMPP, download WordPress and unzip it into your xampp/htdocs folder.

You’ll first need to create a database, which you can do by visiting /phpmyadmin/ on your localhost, typing in the database name and pressing ‘Create’.

Now go to /wordpress/, click through the setup screens and fill out the form. Type in ‘root’ for the database username and leave the password blank. Choose any password you want for the administrator account; you can then use it to log into the WordPress dashboard.

Install XAMPP

I’ve been going around setting up open source software a fair bit recently. To minimise the pain of explaining it, I’m putting together a series of short videos that explain the process.

Here’s the first, on XAMPP, which is a starting point for most open source applications. It bundles Apache (web server), MySQL (database), Perl and PHP.

To install it, search for and download “XAMPP for Windows”, and press Enter for every question. Then install your application under C:\xampp\htdocs. That’s it.

You are in prison

(I had intended to write this post sarcastically, a bit like my web freedom survey. But sarcasm’s confusing to read. So I’ll just be straight and mild.)

If you’re a well-paid professional in an Indian IT services firm, your freedom is limited.
(This holds if you’re a student, too.)

The last bit worries me the most. Perhaps because in all the other cases, there are humans I can put to shame or fight, face-to-face. Or because I am a Net addict. Don’t know why.

Anyway, here’s the result of my survey (after de-duplicating and eliminating results where the company or geography was not clear).

[Image: web-freedom-survey, results of the web freedom survey]

Some day, I will follow this up with a post on “Surviving in Prison”, detailing my experiences with the system, and beating it.

The Calvin and Hobbes Search Takedown

Eight years ago, I started typing out each of the Calvin and Hobbes strips by hand. Four years ago, I set up a site that let people search for strips. Early this month, I was asked to take it down.

This is the story.


I can’t quite remember when I started reading Calvin & Hobbes. The earliest reference I can find in my blogs is in July 1999. I remember it didn’t take me long to become a fan. I’d read every strip in the newspaper; hunt them out at bookshops; and spend a fair bit of time searching for archives online.

At some point, I discovered a few archives of the complete Calvin & Hobbes images. These aren’t hard to find, and they’re still around in plenty. So that gave me a few more months of delight.

The trouble, though, was that I never could quite find a strip when I wanted to. A friend would refuse to accept something, and I’d want to pull out that strip where Calvin declares that he resides in the state of “Denial”. Or if they said something fancy, I’d want to pull out the one where Hobbes says “I notice your oeuvre is monochromatic”. Or those strips where Calvin’s Dad explains how things work (“They build bigger and bigger trucks over the bridge until it breaks.”)

There were a few Calvin and Hobbes search engines around. None quite did what I wanted them to – which was to search the text, and show me the strip, with a nice scrollable interface.

So I set out to build one. I can’t remember when, exactly, but it was before Sep 11, 2002.

It took me many years. I’d spend several train rides and evenings typing this stuff out. My friends, employers and family were a bit puzzled, but just added it to my list of eccentricities and carried on. I was halfway there in 2005, pushed further in 2006, and with some help, I managed to finally complete it.

I was able to do a lot of cool stuff with this, like statistically improbable phrases and some amusing posts as well.

It also increased traffic to my site, which was a bit disconcerting. I didn’t want to attract attention. In 2007, I removed the page from Google’s index, which cut the number of hits a fair bit. Since then, the site had been visited only by a few people who knew of it, and the occasional stumbler.

A month ago, I got reddit-ed and MetaFiltered.

It didn’t take me long to figure that a takedown notice would be on its way. It turned out to be quite a friendly mail, actually – scary only in parts. (A bit of a carrot-and-stick approach, perhaps.) Anyway, it took me all of 2 minutes to remove all of the pages and links.

Of course, the reason I went to all of this effort was because the original Calvin & Hobbes site does not have the search feature. I’ve reached out to United Media, offering my transcripts and code. Let’s see what happens.

A sense of proportion

A quote from David Heinemeier Hansson:

So the problem is, a lot of business managers and especially business owners, they have no sense of probability. They can’t fathom that concept. So they treat the probability of 1 to 10 trillion as the same as a 1 to a 100. And like, “We’ve got to deal with this 1 to a trillion probability, because, what if it happens?”

No! Doesn’t matter! I mean, don’t care.

So as soon as that sense of probability spreads, that people can treat that reasonably, I think all this nonsense just goes away.

This lack of proportion, sadly, is at the heart of my everyday problems. (Just watch the video!)