Markdress

This year, I’ve converted the bulk of my content into Markdown – a simple way of formatting text files in a way that can be rendered into HTML.

Not out of choice, really. It was the only solution if I wanted to:

  • Edit files on my iPad / iPhone (I’ve started doing that a lot more recently)
  • Allow the contents to be viewable as HTML as well as text, and
  • Allow non-techies to edit the file

As a bonus, it’s already the format GitHub and Bitbucket use for markup.

If you toss Dropbox into the mix, there’s a powerful solution there. You can share files via Dropbox as Markdown, and publish them as web pages. There are already a number of solutions that let you do this. DropPages.com and Pancake.io let you share Dropbox files as web pages. Calepin.co lets you blog using Dropbox.

My needs were a bit simpler, however. I sometimes publish Markdown files on Dropbox that I want to see in a formatted way – without having to create an account. Just to test things, or share temporarily.

Enter Markdress.org. My project for this morning.

Just add any URL after markdress.org to render it as Markdown. For example, to render the file at http://goo.gl/zTG1q, visit http://markdress.org/goo.gl/zTG1q.

To test it out, create any text file in your Dropbox public folder, get the public link:

… and append it to http://markdress.org/ without the http:// prefix.
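The URL convention is simple enough to sketch in a few lines of Python. This only illustrates the rule described above and is not Markdress's actual code; the function name markdress_url is made up for the example.

```python
def markdress_url(source_url, base="http://markdress.org/"):
    """Return the Markdress address for a publicly reachable text file.

    Hypothetical helper: it only mirrors the convention of dropping the
    http:// prefix and appending what is left to the Markdress base URL.
    """
    prefix = "http://"
    if source_url.startswith(prefix):
        source_url = source_url[len(prefix):]
    return base + source_url


# The example from the post: prints http://markdress.org/goo.gl/zTG1q
print(markdress_url("http://goo.gl/zTG1q"))
```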

GarageBand in Phir Se Ud Chala

A month ago, I was at the theatre watching Ra.One. The movie was terrible, yet enjoyable. But I’m going to talk about something else – a song I heard that caught my imagination.

The song is Phir Se Ud Chala from Rockstar. Around 14 seconds into the video, you’ll hear a guitar start off in the background. That’s what caught my ear first – because I’d heard it before. Listen to this piece below:

Mystic light

I’d created this a couple of months ago with GarageBand on my iPad2. It just plays two Apple Loops one after another.

The first one that you hear – Cheerful Mandolin 07 – is exactly the same background music that you hear in Phir Se Ud Chala. Guess A R Rahman uses GarageBand too!

(The strange thing is, I found no mention of this anywhere on the internet, as of 2 Dec 2011. Thought I’d have a go and be the first… just in case someone searches for Apple Loops or GarageBand in Phir Se Ud Chala from Rockstar.)

Protect static files on Apache with OpenID

I moved from static HTML pages to web applications and back to static HTML files. There’s a lot to be said for the simplicity and portability of a bunch of files. Static site generators like Jekyll are increasingly popular; I’ve built a simple publisher that I use extensively.

Web apps give you something else, though, that is still useful on a static site: access control. I’ve been resorting to htpasswd to protect static files, and it’s far from optimal. I don’t want to know or manage users’ passwords. I don’t want them to remember a new ID. I just want to allow specific people to log in via their Google Accounts. (OpenID is too confusing, and most people use Google anyway.)

The easiest option would be to use Google AppEngine. But their new pricing worries me. Hosting on EC2 is expensive in the long run. All my hosting is now out of a shared Hostgator server that offers Apache and PHP.

So, obviously, I wrote a library that protects static files on Apache/PHP using OpenID.

Download the code

 

Say you want to protect /home/www which is accessible at http://example.com/.

  1. Copy .htaccess and _auth/ under /home/www.
  2. In .htaccess, change RewriteBase to /
  3. In _auth/, copy config.sample.php into config.php, and
    1. change $AUTH_PATH to http://example.com/
    2. add permitted email IDs to function allow()

Now, when you visit http://example.com, you’ll be taken to Google’s login page. Once you log in, if your email ID is allowed, you’ll be able to see the file.

Feel free to try, or fork the code.

Codecasting

The best way to explain code to a group of people is by walking through it. If they’re far away in space or time, then a video is the next best thing.

The trouble with videos, though, is that they’re big. I can’t host them on my server – I’d need YouTube. Editing them is tough. You can’t copy & paste code from videos. And so on.

One interesting alternative is to use presentations with audio. Slideshare, for instance, lets you share slides and sync it with audio. That almost works. But it’s still not good enough. I’d like code to be stored as code.

What I really need is codecasting: a YouTube or Slideshare for code. The closest I’d seen until the day before yesterday was Etherpad or ttyrec – but neither supports audio.

Enter Popcorn. It’s a Javascript library from Mozilla that, among other things, can fire events when an audio/video element reaches a particular point.

Watch a demo of how I used it for codecasting

 

A look at the code will show you that I’m using two libraries: SyntaxHighlighter to highlight the code, and Popcorn. The meat of the code I’ve written is in this subtitle function.

function subtitle(media_node, pre_node, events) {
  var pop = Popcorn(media_node);
  for (var i=0, l=events.length; i<l; i++) {
    for (var j=0, line_selector=[], line_no; line_no=events[i][1][j]; j++) {
      line_selector.push(pre_node + ' .number' + line_no)
    }
    var start = events[i][0]
      , end = i<l-1 ? events[i+1][0] : events[i][0]+999;
    (function(start, end, selector) {
      pop.code({start: start, end:end,
        onStart: function(o) { $(selector).addClass('highlighted'); },
        onEnd: function(o) { $(selector).removeClass('highlighted'); }
      })
    })(start, end, line_selector.join(','));
  }
}

When called like this:

subtitle('#audio', 'pre', [
  [ 1, [1,2,3]],
  [ 5, [4,5,6]],
  [ 9, [7,8]]
])

… it takes the #audio element and, when playback reaches 1 second, highlights lines 1, 2 and 3; at 5 seconds, it highlights lines 4, 5 and 6; and so on.
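The only subtle part is the timing: each highlight ends when the next one starts, and the last one stays on for an arbitrary 999 seconds. Here is that logic on its own, sketched in Python rather than JavaScript so it can be run without a browser. The name highlight_windows is mine for illustration; it is not part of Popcorn.

```python
def highlight_windows(events, tail=999):
    """Turn [start_second, [line numbers]] pairs into (start, end, lines).

    Mirrors the subtitle() function: a window ends where the next begins,
    and the final window is kept open for `tail` seconds.
    """
    windows = []
    for i, (start, lines) in enumerate(events):
        if i < len(events) - 1:
            end = events[i + 1][0]   # next event's start time
        else:
            end = start + tail       # last event: no successor
        windows.append((start, end, lines))
    return windows


for window in highlight_windows([[1, [1, 2, 3]], [5, [4, 5, 6]], [9, [7, 8]]]):
    print(window)
```

Each tuple corresponds to one pop.code({start, end, ...}) call in the JavaScript version.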

Another thing that helped was that my iPad has a much better mic than my laptop, and ClearRecord is a really simple way to create recordings with minimal noise. [Note to self: sampling at 16 kHz and saving as a VBR MP3 (45-85 kbps) seems the best trade-off.]

With these tools, my time to prepare a tutorial went down from 4 hours to half an hour!

Javascript arrays vs objects

Summary: Arrays are a lot smaller than objects, but only slightly faster on newer browsers.

I’m writing an in-memory Javascript app that handles several thousand rows. Each row could be stored either as an array [1,2,3] or an object {"x":1,"y":2,"z":3}. Having read up on the performance of arrays vs objects, I thought I’d do a few tests on storing numbers from 0 to 1 million. The results for Chrome are below. (Firefox 7 was similar.)

| Storage | Time | Size (MB) |
|---|---|---|
| Array: x[i] = i | 2.44s | 8 |
| Object: x[i] = i | 3.02s | 57 |
| Object: x["a_long_dummy_testing_string"+i]=i | 4.21s | 238 |

The key lessons for me were:

  • Browsers used to process arrays MUCH faster than objects. This gap has now shrunk.
  • However, arrays are still better: not for their speed, but for their space efficiency.
  • If you’re processing a million rows or fewer, don’t worry about memory. A million-entry array takes about 8MB, so if you’re storing stuff as arrays, you can store 128 such columns in 1GB of RAM (1024/8=128).
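The arithmetic behind that last point, as a small Python sketch. The sizes are the Chrome measurements from the table above; the helper name columns_that_fit is just for this example, and the calculation assumes memory use scales linearly with the number of columns.

```python
# Sizes in MB for one million entries, taken from the Chrome results above.
SIZE_MB = {
    "array": 8,
    "object": 57,
    "object_long_keys": 238,
}


def columns_that_fit(ram_mb, column_mb):
    """How many million-row columns of column_mb each fit in ram_mb of RAM."""
    return ram_mb // column_mb


for kind, size in SIZE_MB.items():
    print(kind, columns_that_fit(1024, size), "columns per GB")
```

For arrays this gives the 128 columns mentioned above; the same budget holds far fewer columns if each row is an object.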

Software for my new laptop 2

Time for a new laptop, and to replace software. Here’s my new list.

A lot has changed in the last 5 years. Mainly, I use the browser, cygwin and Portable Apps a lot more. (The last is to escape jailers, not registry bloat.)

Media

  • Chrome [new]: For browsing and development. Fast, light, and stays out of the way.
  • Firefox: I keep it just for printing. Chrome sucks at printing.
  • Media Player Classic: Nothing against it, but I decided to stick to just one app, which is…
  • VLC: Continues to be the best media player, IMHO.
  • WinAmp: I just manage my playlists as M3U files, using Python programs.
  • Audacity: Still the easiest way to record audio.
  • Camstudio: The simplest free portable screen capture software I know.
  • PicPick [new]: Lightweight, powerful screenshot grabber
  • VirtualDub: Not the simplest, but still good for what I need: cropping and joining video.
  • MediaCoder [new]: Good for video/audio conversions. Maybe I’ll install this later.
  • Foxit Reader: The simplest free portable PDF reader I know, better than…
  • NitroPDF Reader [new]: … which is good for printing PDFs – better than…
  • Primo PDF: … which has trouble on rare occasions.
  • Microsoft Reader: I have a lot of ebooks in .LIT.
  • Kindle for PC [new]: I don’t own a Kindle, but I’ve bought a few ebooks.
  • Paint.NET: Good enough for cropping and adjusting colours on images.
  • Windows Live Writer [new]: The best way to write this blog WYSIWYG
  • Inkscape [new]: I occasionally edit vector graphics.
  • Google Earth. Google Maps is good enough.
  • ImgBurn: I no longer use CDs/DVDs. Just flash drives and external hard disks.
  • Picasa: I’ve stopped browsing pictures. No time.

Sharing

  • Dropbox [new]: Simplest way of sharing files.
  • Skype: I use it more than my phone.
  • Google Talk: For those friends who have chat enabled on Gmail.
  • TeamViewer [new]: Pretty efficient screen sharing. Works better than Skype, I think.
  • Google Calendar Sync: To keep Outlook in sync with Google Calendar.

Utilities

  • 7-Zip [new]: Covers all compressed formats, and has the best compression ratio.
  • WinRAR: 7-Zip has it covered.
  • AutoHotKey [new]: Shockingly powerful macro functionality. Shockingly underused.
  • Clip [new]: Command line clipboard. dir | clip copies the directory to the clipboard.
  • ClipX [new]: Stores multiple clipboard entries and history. Invaluable.
  • DiskTT [new]: I’m paranoid about disk speed. I keep measuring it.
  • WinDirStat [new]: Best way to find what’s taking up space on disk.
  • ProcessExplorer [new]: Just in case Task Manager doesn’t show you everything.
  • Google Desktop: Well, it’s dead.
  • mDesktop [new]: A Virtual Desktop Manager (multiple screens) for Windows 7.
  • PowerToys: doesn’t work on Windows 7, but I got X-Mouse working.
  • Teracopy: I don’t worry too much about copying files any more. Maybe later.
  • Junction Link Magic [new]: To map folders. But I now use Cygwin, and symlinks rock.
  • uTorrent [new]: For bittorrent.
  • ntlmaps [new]: Turns a proxy that requires a password into a local proxy that doesn’t.
  • Putty [new]: SSH for Windows, but can also act as an SSH tunnel
  • TrueCrypt [new]: To securely back up my bank details on the cloud.

Development

Data Visualisation

  • R [new]. The God of all statistical packages. Install reshape and ggplot2.
  • Gephi [new]: Does network visualisations quite well. 
  • GraphViz [new]: Does network visualisations not quite as well.
  • Google Refine [new]: Helps clean up messy data.
  • qhull [new]: For voronoi treemaps. Don’t ask.
  • wkhtmltopdf [new]: To print web pages as PDF.

What am I missing that you really like?

Faster data crunching

I’ve been playing with big data lately.

The good part is, it’s easy to get interesting results. The data is so unwieldy that even average value calculations provoke an “Amazing! I didn’t know that” response. (No exaggeration. I heard this from two separate ~$1bn businesses this month.)

The bad part is that calculating even that simple average is slow.

For example, take this 40MB file (380MB unzipped) and extract the first column.

The simplest Python script to get the first column looks like this:

import csv, fileinput  # both from the standard library
for row in csv.reader(fileinput.input(), delimiter='\t'):
    if len(row) > 0: print(row[0])  # skip blank lines

That took a good 3 minutes to execute on my laptop.

Since I’m used to UNIX data processing, I tried cut -f1. Weirdly, that’s worse: over 5 minutes. Paradoxically, awk '{print $1}' takes only 17 seconds. That’s about 11 times faster than the Python script. Clearly the tool makes a big difference. And we always knew UNIX was fast.

But I also ran these on an Amazon EC2 server, and a Hostgator server. Here’re the results.

| Machine | python | cut | awk |
|---|---|---|---|
| My Dell E5400 | 3:04 (1x) | 5:42 (0.5x) | 0:17 (11x) |
| EC2 standard | 0:33 (6x) | 0:5.6 (33x) | 0:16 (11x) |
| Hostgator | 0:19 (10x) | 0:2.5 (74x) | 0:0.7 (265x) |

What took 3 minutes with Python on my Dell E5400 took less than a second on Hostgator’s server with awk. Over 250 times faster. (Not 250%. 250 times.)

And it’s not just hardware. A good tool (awk) made things 11x faster on my machine. Good hardware (Hostgator) made the same program 10x faster. But choosing the right combination can make things go faster than 11 x 10 = 110 times. Much faster.
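To make the speed-up figures concrete, here is a rough Python sketch that recomputes a few of them from the timings in the table, using Python on the Dell (3:04) as the baseline. The helper names are mine, and because the printed timings are rounded, the ratios may differ slightly from the ones in the table.

```python
def to_seconds(clock):
    """Convert a 'minutes:seconds' string such as '3:04' or '0:5.6' to seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + float(seconds)


def speedup(baseline, clock):
    """How many times faster `clock` is than `baseline`."""
    return to_seconds(baseline) / to_seconds(clock)


BASELINE = "3:04"  # Python on the Dell E5400
TIMES = {
    ("Dell E5400", "awk"): "0:17",
    ("Hostgator", "python"): "0:19",
    ("Hostgator", "awk"): "0:0.7",
}

for (machine, tool), clock in TIMES.items():
    print(machine, tool, round(speedup(BASELINE, clock), 1))
```

The tool alone and the hardware alone each give roughly a tenfold gain, yet the two together come out well above their product of about 110.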

There are a few things I’m taking away from this.

  1. Good hardware can speed you up as much as (or more than) choosing the right tool.
  2. Good hardware can be rented. From many places. Cheaply.
  3. Always test what’s fast. awk’s fastest on my machine and Hostgator, but not on EC2.

India district map

I put together a district map of India in SVG this weekend.

So what?

You can now plot data available at a district level on a map, like the temperature in India over the last century (via IndiaWaterPortal). The rows are years (1901, 1911, … 2001) and the columns are months (Jan, Feb, … Dec). Red is hot, green is cold.

temperature

(Yeah, the west coast is a great place to live in, but I probably need to look into the rainfall.)

districts.svg has 640 districts (I’ve no idea what the 641st looks like) and is tagged with the State and District names as titles:

<g title="Madhya Pradesh">
  <path title="Alirajpur" d="..." />
  <path title="Jhabua" d="..." />
  ...
</g>
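Because the names are plain title attributes, pulling out the state and district pairs takes only a few lines. Below is a minimal Python sketch using the standard library's ElementTree on an inline sample shaped like the snippet above. The path data in the sample is a placeholder, the function names are made up for the example, and I'm assuming the real file may declare the SVG namespace, so the code strips any namespace before comparing tag names.

```python
import xml.etree.ElementTree as ET

# Inline sample shaped like districts.svg; the d="..." values are placeholders.
SAMPLE = """<svg>
<g title="Madhya Pradesh">
  <path title="Alirajpur" d="M0 0"/>
  <path title="Jhabua" d="M1 1"/>
</g>
</svg>"""


def local_name(tag):
    """Drop a '{namespace}' prefix from a tag name, if there is one."""
    return tag.rsplit("}", 1)[-1]


def districts(svg_text):
    """Return (state, district) pairs from the <g title> / <path title> nesting."""
    root = ET.fromstring(svg_text)
    pairs = []
    for group in root.iter():
        if local_name(group.tag) != "g":
            continue
        for child in group:
            if local_name(child.tag) == "path":
                pairs.append((group.get("title"), child.get("title")))
    return pairs


for state, district in districts(SAMPLE):
    print(state, "/", district)
```

From there, a dictionary keyed on district name is enough to look up a colour for each path when plotting data.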

How?

I made it from the 2011 census map (0.4MB PDF). I opened it in Inkscape, removed the labels, added a layer for the districts, and used the paint bucket to fill each district’s area. I then saved the districts layer, cleaning it up a bit. Then I labelled each district with a title. (Seemed like the easiest way to get this done.)

Thanks to @planemad, @gkjohn, @arjunram for inputs. Play around. Feedback welcome.

Formatting tables

Formatting tables in Excel is a fairly common task, but there are a number of ways to improve on the way it’s done most of the time.

Here are a few tips. Fairly basic stuff, but hopefully useful.

Eating more for less

A couple of years ago, I managed to lose a fair bit of weight. At the start of 2010, I started putting it back on, and the trajectory continues. I’m at the stage where I seriously need to lose weight.

I subscribe to The Hacker’s Diet principle – that you lose weight by eating less, not exercising.

An hour of jogging is worth about one Cheese Whopper. Now, are you going to really spend an hour on the road every day just to burn off that extra burger?

You don't exercise to lose weight (although it certainly helps). You exercise because you'll live longer and you'll feel better.

I’m afraid I’ll live too long anyway, so I won't bother exercising just yet. It's down to eating less.

Sadly, I like food. So to make my “diet” work, I need foods that add fewer calories per gram. Usually, when browsing stores, I check these manually. But being a geek, I figured there’s an easier way.

Below is a graph of some foods (the kind I particularly need to avoid, but still end up eating). The ones at the top add a lot of calories (per 100g), and are better avoided. The ones on the right cost a lot more. Now, I’m no longer at the point where I need to worry about food expenses, but still, I can’t quite kick the habit.

Hover over the foods to see what they are, and click on them to visit the product. (If you’re using an RSS reader and this doesn’t work, read on my site.)

(The data was picked from Tesco.)

It’s interesting that cereals are in the middle of the calorie range. I always thought they’d be low in calories per gram. Turns out that if I want to have such foods, I’m better off with desserts or ice creams (profiterole, lemon meringue or tiramisu). In fact, even jams have fewer calories than cereals.

But there are some desserts to avoid. Nuts are a disaster. So are chocolates. Gums, dates and honey are in the middle – about as good as cereals. Salsa dip seems surprisingly low. Custards seem to hit the sweet spot – cheap, and very low in calories. Same for jellies.

So: custards and jelly. My daughter’s going to be happy.