Markdress

This year, I’ve converted the bulk of my content into Markdown – a simple way of formatting text files in a way that can be rendered into HTML.

Not out of choice, really. It was the only solution if I wanted to:

  • Edit files on my iPad / iPhone (I’ve started doing that a lot more recently)
  • Allow the contents to be viewable as HTML as well as text, and
  • Allow non techies to edit the file

As a bonus, it’s already the format Github and Bitbucket use for markup.

If you toss Dropbox into the mix, there’s a powerful solution there. You can share files via Dropbox as Markdown, and publish them as web pages. There are already a number of solutions that let you do this. DropPages.com and Pancake.io let you share Dropbox files as web pages. Calepin.co lets you blog using Dropbox.

My needs were a bit simpler, however. I sometimes publish Markdown files on Dropbox that I want to see in a formatted way – without having to create an account. Just to test things, or share temporarily.

Enter Markdress.org. My project for this morning.

Just add any URL after markdress.org to render it as Markdown. For example, to render the file at http://goo.gl/zTG1q, visit http://markdress.org/goo.gl/zTG1q.

To test it out, create any text file in your Dropbox public folder, get the public link:

… and append it to http://markdress.org/ without the http:// prefix.

Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail.

For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter just using your e-mail (you’ll have to register once via Twitter, though).

Now, you can send an e-mail to twitter@mixamail.com with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets.

It’s the snippets that contain the useful information, as far as I’m concerned. Just yesterday, I managed to find the show timings for Manmadan Ambu at the Ilford Cine World via a search on Mixamail. (Mixamail win, but the movie was a let down, given expectations.)

You don’t need to be registered to use this. So if you’re ever stuck with just e-mail access, just mail twitter@mixamail.com with a subject “Google”.

PS: The code is on Github.

Twitter via e-mail

Since I don’t have Internet access on my BlackBerry (because I’m in prison), I’ve had a pretty low incentive to use Twitter. Twitter’s really handy when you’re on the move, and over the last year, there were dozens of occasions where I really wanted to tweet something, but didn’t have anything except my BlackBerry on hand. Since T-Mobile doesn’t support Twitter via SMS, e-mail is my only option, and I haven’t been able to find a decent service that does what I want it to do.

So, obviously, I wrote one this weekend: Mixamail.com.

I’ve kept it as simple as I could. If I send an email to twitter@mixamail.com, it replies with the latest tweets on my Twitter home page. If I mail it again, it replies with new tweets since the last email.

I can update my status by sending a mail with “Update” as the subject. The first line of the body is the status.

I can reply to tweets. The tweets contain a “reply” link that opens a new mail and replies to it.

I can subscribe to tweets. Sending an email with “subscribe” as the subject sends the day’s tweets back to me every day at the same hour that I subscribed. (I’m keeping it down to daily for the moment, but if I use it enough, may expand it to a higher frequency.)

Soon enough, I’ll add re-tweeting and (update: added retweets on 27 Oct) a few other things. I intend keeping this free. Will release the source as well once I document it. The source code is at Github.

Give it a spin: Mixamail.com. Let me know how it goes!


For the technically minded, here are a few more details. I spent last night scouting for a small, nice, domain name using nxdom. I bought it using Google Apps for $10. The application itself is written in Python and hosted on AppEngine. I use the Twitter API via OAuth and store the user state via Lilcookies. The HTML is based on html5boilerplate, and has no images.

Automating PowerPoint with Python

Writing a program to draw or change slides is sometimes easier than doing it manually. To change all fonts on a presentation to Arial, for example, you’d write this Visual Basic macro:

Sub Arial()
    For Each Slide In ActivePresentation.Slides
        For Each Shape In Slide.Shapes
            Shape.TextFrame.TextRange.Font.Name = "Arial"
        Next
    Next
End Sub

If you didn’t like Visual Basic, though, you could write the same thing in Python:

import win32com.client, sys
Application = win32com.client.Dispatch("PowerPoint.Application")
Application.Visible = True
Presentation = Application.Presentations.Open(sys.argv[1])
for Slide in Presentation.Slides:
     for Shape in Slide.Shapes:
             Shape.TextFrame.TextRange.Font.Name = "Arial"
Presentation.Save()
Application.Quit()

Save this as arial.py and type “arial.py some.ppt” to convert some.ppt into Arial.

Screenshot of Python controlling PowerPoint

Let’s break that down a bit. import win32com.client lets you interact with Windows using COM. You need ActivePython to do this. Now you can launch PowerPoint with

Application = win32com.client.Dispatch("PowerPoint.Application")

The Application object you get here is the same Application object you’d use in Visual Basic. That’s pretty powerful. What that means is, to a good extent, you can copy and paste Visual Basic code into Python and expect it to work with minor tweaks for language syntax.

So let’s try to do something with this. First, let’s open PowerPoint and add a blank slide.

# Open PowerPoint
Application = win32com.client.Dispatch("PowerPoint.Application")
# Create new presentation
Presentation = Application.Presentations.Add()
# Add a blank slide
Slide = Presentation.Slides.Add(1, 12)

That 12 is the code for a blank slide. In Visual Basic, you’d instead say:

Slide = Presentation.Slides.Add(1, ppLayoutBlank)

To do this in Python, run Python/Lib/site-packages/win32com/client/makepy.py and pick “Microsoft Office 12.0 Object Library” and “Microsoft PowerPoint 12.0 Object Library”. (If you have a version of Office other than 12.0, pick your version.)

This creates two Python files. I rename these files as MSO.py and MSPPT.py and do this:

import MSO, MSPPT
g = globals()
for c in dir(MSO.constants):    g[c] = getattr(MSO.constants, c)
for c in dir(MSPPT.constants):  g[c] = getattr(MSPPT.constants, c)

This makes constants like ppLayoutBlank, msoShapeRectangle, etc. available. So now I can create a blank slide and add a rectangle Python just like in Visual Basic:

Slide = Presentation.Slides.Add(1, ppLayoutBlank)
Slide.Shapes.AddShape(msoShapeRectangle, 100, 100, 200, 200)

Incidentally, the dimensions are in points (1/72″). Since the default presentation is 10″ x 7.5″ the size of each page is 720 x 540.

Let’s do something that you’d have trouble doing manually in PowerPoint: a Treemap. The Guardian’s data store kindly makes available the top 50 banks by assets that we’ll use for this example. Our target output is a simple Treemap visualisation.

Treemap

We’ll start by creating a blank slide. The code is as before.

import win32com.client, MSO, MSPPT
g = globals()
for c in dir(MSO.constants):    g[c] = getattr(MSO.constants, c)
for c in dir(MSPPT.constants):  g[c] = getattr(MSPPT.constants, c)
 
Application = win32com.client.Dispatch("PowerPoint.Application")
Application.Visible = True
Presentation = Application.Presentations.Add()
Slide = Presentation.Slides.Add(1, ppLayoutBlank)

Now let’s import data from The Guardian. The spreadsheet is available at http://spreadsheets.google.com/pub?key=phNtm3LmDZEOoyu8eDzdSXw and we can get just the banks and assets as a CSV file by adding &output=csv&range=B2:C51 (via OUseful.Info).

import urllib2, csv
url = 'http://spreadsheets.google.com/pub?key=phNtm3LmDZEOoyu8eDzdSXw&output=csv&range=B2:C51'
# Open the URL using a CSV reader
reader = csv.reader(urllib2.urlopen(url))
# Convert the CSV into a list of (asset-size, bank-name) tuples
data = list((int(s.replace(',','')), b.decode('utf8')) for b, s in reader)

I created a simple Treemap class based on the squarified algorithm — you can play with the source code. This Treemap class can be fed the the data in the format we have, and a draw function. The draw function takes (x, y, width, height, data_item) as parameters, where data_item is a row in the data list that we pass to it.

def draw(x, y, w, h, n):
    # Draw the box
    shape = Slide.Shapes.AddShape(msoShapeRectangle, x, y, w, h)
    # Add text: bank name (asset size in millions)
    shape.TextFrame.TextRange.Text = n[1] + ' (' + str(int(n[0]/1000 + 500)) + 'M)'
    # Reduce left and right margins
    shape.TextFrame.MarginLeft = shape.TextFrame.MarginRight = 0
    # Use 12pt font
    shape.TextFrame.TextRange.Font.Size = 12
 
from Treemap import Treemap
# 720pt x 540pt is the size of the slide.
Treemap(720, 540, data, draw)

Try running the source code. You should have a single slide in PowerPoint like this.

Plain Treemap

The beauty of using PowerPoint as the output format is that converting this into a cushioned Treemap with gradients like below (or changing colours, for that matter), is a simple interactive process.

Treemap in PowerPoint

Random quotes generator

The Random Quotes Generator is a simple tool that creates quotes by mixing up words on a web page. The results are often funny, but sometimes surprisingly insightful.

Monkey Typing Shakespeare

Yes, this is the equivalent of a million monkeys typing Shakespeare, except that they’re using the works of Shakespeare as a starting point. And  it doesn’t have to be Shakespeare. It could be you or your friends.

To try it out, visit this page, select the link and “Add to Favorites” or drag it into your browser’s bookmark toolbar.. Then go to any web page that has a lot of text, and click the link to generate random quotes.

Here’s an example of random text from TechCrunch.

The net will find monetization models of theater and sporting events before them. Indeed, there has to be some way to create websites that do other than Advertising. The expected drop in internet advertising will rapidly lose its value and its impact, for reasons that can easily be understood.

For the technically minded, Programming Pearls has a section on Generating Text that explains the concept. The bookmarklet uses an Order-2 word-level Markov chain. Translated into English, what that means is: I look at every pair of words in and find out what word is likely to follow that.

For example, in the Generating Text page, the pair of words “we can” are followed by the words “extend”, “also”, “get” and “write” with equal probability. We pick one randomly (say “also”) and write “we can also”. Then we look at the word pair “can also”, see what word follows that, pick one at random, and so on.

This is Order-2 because we pick pairs of words. And it’s word-level rather than letter-level because we use words instead of letters as the basic building blocks.

When you’re trying it out, make sure that the page is large enough. If not, you may find that the page’s content is reproduced verbatim.

The bookmarklet is built on top of the excellent Readability bookmarklet by Arc90, which helps identify the main content to be randomized.

Motion charts in Excel

Creating motion charts in Excel is a simple four-step process.

  1. Get the data in a tabular format with the columns [date, item, x, y, size]
  2. Make a “today” cell, and create a lookup table for “today”
  3. Make a bubble chart with that lookup table
  4. Add a scroll bar and a play button linked to the “today” cell

For the impatient, here’s a motion chart spreadsheet that you can tailor to your needs.
For the patient and the puzzled, here’s a quick introduction to bubble and motion charts.

What is a bubble chart?

A bubble chart is a way of capturing 3 dimensions. For example, the chart below could be the birth, literacy rate and population of countries (X-axis, Y-axis and size). Or the growth, margin and market cap of companies.

Example of a bubble chart

It lets you compare three dimensions at a glance. The size dimension is a different from the X and Y axes, though. It’s not easy to compare differences in size. And the eye tends to focus on the big objects. So usually, size is used highlight important things, and the X and Y axes used to measure the performance of these things.

If I were to summarise bubble charts in a sentence, it would be: bubble charts show the performance of important things (in two dimensions). (In contrast, Variwide charts show the same on one dimension.)

Say you’re a services firm. You want to track the productivity of your most expensive groups (“the important things”). Productivity is measured by 2 parameters: utilisation and margin. The bubble chart would then have the expense of each group as the size, and its utilisation and contribution as the X and Y axes.

What is a motion chart?

Motion charts are animated bubble charts. They track the performance of important things over time (in two dimensions). This is chart with 4 dimensions. But not all data with 4 dimensions can be plotted as a motion chart. One dimension has to be time, and another has to be linked to the importance of the item.

 

Motion charts were pioneered by Hans Rosling and his TED Talk shows you the true power of motion charts.

How do I create these charts?

Use the Motion Chart Gadget to display any of your data on a web page. Or use Google Spreadsheets if you need to see the chart on a spreadsheet: motion charts are built in.

If you or your viewer don’t have access to these, and you want to use Excel, here’s how.

1. Get the data in a tabular format

Get the data in the format below. You need the X, Y and size for each thing, for each date.

Date Thing X Y Size
08/02/2009 A 64% 11% 1
08/02/2009 B 14% 33% 2
08/02/2009 C 78% 55% 3
08/02/2009 D 57% 73% 4
08/02/2009 E 39% 32% 5
08/02/2009 F 40% 81% 6
09/02/2009 A 64% 12% 1
09/02/2009 B 14% 33% 2
09/02/2009 C 78% 56% 3
09/02/2009 D 57% 73% 4
09/02/2009 E 39% 32% 5
09/02/2009 F 40% 81% 6
..

To make life (and lookups) easier, add a column called “Key” which concatenates the date and the things. Typing “=A2&B2” will concatenate cells A2 and B2. (Red cells use formulas.)

Date Thing Key X Y Size
08/02/2009 A 39852A 64% 11% 1
08/02/2009 B 39852B 14% 33% 2
08/02/2009 C 39852C 78% 55% 3
08/02/2009 D 39852D 57% 73% 4

2. Make a “today” cell, and create a lookup table for “today”

Create a cell called “Offset” and type in 0 as its value. Add another cell called Today whose value is the start date (08/02/2009 in this case) plus the offset (0 in this case)

Offset 0 (Just type 0)
Today 08/02/2009 Use a formula: =STARTDATE + OFFSET

Now, if you change the offset from 0 to 1, “Today” changes to 09/02/2009. By changing just this one cell, we can create a table that holds the bubble chart details for that day, like below.

Thing X Y Size Formula
A 44% 19% 1

X =VLOOKUP(TODAY & THING, DATA, 2, 0)

Y =VLOOKUP(TODAY & THING, DATA, 3, 0)

Size =VLOOKUP(TODAY & THING, DATA, 4, 0)

B 6% 13% 2
C 90% 71% 3
D 41% 61% 4
E 59% 40% 5
F 16% 77% 6

Check out my motion chart spreadsheet to see how these are constructed.

3. Make a bubble chart with that lookup table

This is a simple Insert – Chart. Go through the chart types and select bubble. Play around with the data selection until you get the X, Y and Size columns right.

Example of a bubble chart

4. Add a scroll bar and a play button linked to the “today” cell

Now for the magic. Add a scroll bar below the chart.
Excel 2007 users: Go to Developer – Insert and add a scroll bar.
Excel 2003 users: Go to View – Toolbars – Control Toolbox and add a scroll bar

Right click on the scroll bar, go to Format Control… and link the scroll bar to the “Offset” cell. Now, as you move the scroll bar, the value in the offset cell will change to reflect it. So the “today” cell will change too. So will the lookup table. And so will the chart.

Next, create a button called “Play” and edit its code.
Excel 2007 users: Right click the button, go to Developer – View Code.
Excel 2003 users: Right click the button and select View Code.

Type in the following code for the button’s click event:

Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
 
Sub Button1_Click()
    Dim i As Integer
    For i = 0 To 40:            ' Replace 40 with your range
        Range("J1").Value = i   ' Replace J1 with your offset cell
        Application.Calculate
        Sleep (100)
    Next
End Sub

Now clicking on the Play button will give you this glorious motion chart in Excel:

twofifty.org

It’s been a good movie month for me, and I’ve managed to nudge closer to my target of watching the IMDb Top 250.

But one tool I had in the past, that I sorely miss, is twofifty.org. It’s a now-defunct site that kept track of the IMDb Top 250, and let you strike off the movies that you had watched. You could see which movies you hadn’t seen, keep score, and discuss the movies.

Since it’s demise, my movie watching slowed down as well.

Earlier this month, I set up a similar site at 250.s-anand.net. It has the same basic function. You can log in, strike out movies that you’ve seen, and keep track of what’s left to see. For the more technically minded, the source-code is at two-fifty.googlecode.com.

Visit 250.s-anand.net

Happy movie tracking, and looking forward to your suggestions.

Dilbert search engine

Wouldn’t it be cool to be able to search through the Dilbert archives using text?

This used to be possible at Dilbert.com some years ago, as a paid service. In late 2003, I needed to find some Dilbert strips for a client, so I’d subscribed for a year. I could then search for the quotes (I happened to be looking for "outsourcing", so you can guess the context).

But I can’t seem to find the feature any more, even as a paid service. The site looks a lot better, of course. But I can’t find strips.

Well, why not type them out? After all, I’d done that with Calvin and Hobbes.

This would be a much larger exercise, though. And I’m hoping to take your help. I’ve set up a site at dilbert-search.appspot.com. You can type in a comic randomly, starting from 2000. These will be made searchable on my Dilbert page. You can export the data and use it yourself, of course.

When typing in Calvin and Hobbes, I did have a few volunteers willing to pitch in, but collaboration tools weren’t easy to set up, and I ended up typing the whole thing myself. This time, I’d be delighted if even 10 people typed in just a strip each.


So, here’s my request, to all you Dilbert fans.

  1. Please go to dilbert-search.appspot.com
  2. Log in using your Google account and type in as many strips as you like
  3. Bookmark it for the future, whenever you’re bored

As I said, the data is readily exportable from the page, so if you’re looking to do cool mash-ups with it, great! And if you want the data exported in other formats, please let me know.

Incidentally, I created the site using Google AppEngine. The source code is at dilbert-search.googlecode.com.

Animated charts in Excel

Watch Hans Rosling‘s TED Talks on debunking third world myths and new insights on poverty and ask yourself: could I do this with my own data?

Yes. Google has a gadget called MotionChart that lets you do this.

Now, you could put this up on your web page, but that’s not quite useful when presenting to a client. (It is shocking, but there are many practical problems getting an Internet connection at a client site. The room doesn’t have a connection. The cable isn’t long enough. You can’t access the LAN. Their proxy requires authentication. The connection is too slow. Whatever.)

So you need this in Excel. Let me explain a variant of the technique I described earlier.

Let’s start by creating a simple bubble chart.

For each item in a bubble chart, you need 3 pieces of data: the X-axis, Y-axis and size. This graph shows three items A, B and C in one year: 2001. To animate this, you need data for more years, so let’s create that.

The first 3 rows contain the same data as before, except that I’ve added a "Year" column and a "Key" column (which is just a concatenation of the Year and the Item). The data now goes on for many more years.

Now we need to create a scroll bar that can be used to change the year. So add a scroll bar below the bubble chart…

… and right click the scroll bar and go to Format Control. Now, select the cell link to some cell ($H$1 in this case). Now, if you move the scroll bar, the cell value will change.

All you need to do is to now change the source data for the chart based on the year. From the table on the left, VLOOKUP the year + item, and put this into the table on the right. When the year in the cell H1 changes, the data updates itself. So now, as you move the scroll bar, cell H1 changes, then so does the data and hence the graph.

This is what the animation looks like.

And here’s the Excel file.

Statistically improbable phrases on Google AppEngine

I read about Google AppEngine early this morning, and applied for an invite. Google’s issuing beta invites to the first 10,000 users. I was pretty convinced I wasn’t among those, but turns out I was lucky.

AppEngine lets you write web apps that Google hosts. People have been highlighting that it give you access to the Google File System and BigTable for the first time. But to me, that isn’t a big deal. (I’m not too worried about reliability, and MySQL / flat files work perfectly well for me as a data store.)

What’s more interesting unlike Amazon’s EC2 and S3, this is free up to a certain quota. And you get a fair bit of processing power and bandwidth for free. One of the reasons I’ve held back on creating some apps was simply because it would take away too much bandwidth / CPU cycles from my site. (I’ve had this problem before.) Google quota is 10 GB of bandwidth per day (which is about 30 times what my site uses). And this is on Google’s incredibly fast servers It also offers 200 million megacycles a day. That’s like a dedicated 2.3 GHz processor (200 million megacycles = 200,000 GHz x 1 second ~ 2.3 GHz x 86,400 seconds/day) — better, because this is the average capacity, not peak capacity. The only restriction that really worries me is that only 3 apps are allowed per developer.

So I decided to give a shot at publishing some code I’d kept in reserve for a long time. You may remember my statistical analysis of Calvin & Hobbes. For this, I’d created a script in Perl that could generate Statistically Improbable Phrases (SIPs) for any text. This is based on (a somewhat limited) 23MB corpus of ebooks that I had. I’d wanted to put that up on my website, but …

AppEngine only uses Python. So the first task was to get Python, and then to learn Python. The only saving grace was that I was just cutting-and-pasting most of the time. Google wasn’t helping:

Google AppEngine Over Quota Error

Anyway, the site is up. You can view it at sip.s-anand.net for now. Just type a URL, and it’ll tell you the improbable words in that site.

Visit sip.s-anand.net

Technical notes

I realise that these are statistically improbable words, not phrases. I’ll get to the phrases in a while.

The logic is simple:

  • Get the frequency of words in a corpus. I pre-generated this file. It has over 100,000 words.
  • Get the URL as text. Rather than muck around with Python, I decided to use the W3 html2txt service.
  • Convert the text to words. Splitting text into words is tricky. For now, I’m simply assuming that any group of letters is a word, and anything that’s not a letter is a word delimiter.
  • Find the relative frequency (improbability) of words. This is the frequency in the URL divided by the frequency in the corpus, normalised (i.e. scale it so that the maximum value is 1.0).
  • Create a tag cloud. I use the word frequency as the size and the improbability as the colour. You need a bit of mathematical jugglery to get the pattern right. Right now, I’m taking the 6th root of the improbability and the logarithm of the frequency to get a reasonably smooth tag cloud.

The source code is at statistically-improbable-phrases.googlecode.com.

Update: 12-Apr-2008. I’ve added some interactivity. You can play with the contrast and font size, the filter out common or infrequent words.

Update: 22-Apr-2008. Added concordance. You can click on a word and see the context in which it appears.