Auto reloading pages

After watching Bret Victor’s Inventing on Principle, I just had to figure out a way of getting live reloading to work. I know about LiveReload, of course, and everything I’ve heard about it is good. But their Windows version is in alpha, and I’m not about to experiment just yet. This little script does it for me instead:

    (function(interval, location) {
        var lastdate = "";
        function updateIfChanged() {
            // Synchronous HEAD request: fetch only the headers of the current page
            var req = new XMLHttpRequest();
            req.open("HEAD", location.href, false);
            req.send(null);
            var date = req.getResponseHeader("Last-Modified");
            if (!lastdate) {
                // First check: remember the current timestamp
                lastdate = date;
            } else if (lastdate != date) {
                // The page has changed on the server since the first check
                location.reload();
            }
        }
        setInterval(updateIfChanged, interval);
    })(300, window.location);

It checks the current page every 300 milliseconds and reloads it if the Last-Modified header has changed. I usually include it as a minified script: ...

Windows XP virtual machine

Here’s the easiest way I could find to set up a Windows XP virtual machine. (This is useful if you want to try out programs without installing them on your main machine; test your code on a new machine; or test your website on IE6 / IE7 / IE8.)

1. Go to the Virtual PC download site. (I tried VirtualBox and VMWare Player. Virtual PC is better if you’re running Windows on Windows.)
   If you have Windows 7 Starter or Home, select “Don’t need XP Mode and want VPC only? Download Windows Virtual PC without Windows XP Mode.”
   If you have Windows Vista or Windows 7, select “Looking for Virtual PC 2007?”
2. Download it. (You may have to jump through a few hoops like activation.)
3. Download Windows XP and run it to extract the files. (It’s a 400MB download.)
4. Open the “Windows XP.vmc” file – just double-clicking ought to work. At this point, you have a working Windows XP version. (The Administrator password is “Password1”.)
5. Under Tools – Settings – Networking – Adapter 1, select “Shared Networking (NAT)”.

That’s pretty much it. You’ve got a Windows XP machine running inside your other Windows machine. ...

Inspecting code in Python

Lisp users would laugh, since they have macros, but Python supports some basic code inspection and modification. Consider the following piece of code:

    margin = lambda v: 1 - v['cost'] / v['sales']

What if you wanted another function that lists all the dictionary indices used in the function? That is, you wanted to extract cost and sales? This is a real-life problem I encountered this morning. I have 100 functions, each defining a metric. For example, ...
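One possible way to get at those indices is to look at the constants stored in the function's compiled code object. This is only a sketch, not necessarily the approach the full post takes: keys_used is a made-up helper name, and it assumes every string constant in the function is a dictionary index, which holds for simple metric lambdas like margin but not in general.

```python
def keys_used(func):
    """Return the string constants stored in func's compiled code.

    Heuristic (an assumption for this sketch): every string constant in
    the function body is treated as a dictionary index.
    """
    return [c for c in func.__code__.co_consts if isinstance(c, str)]


margin = lambda v: 1 - v['cost'] / v['sales']

# Lists the indices the lambda reads from v
print(keys_used(margin))
```

The same helper can be mapped over a whole collection of metric functions to build a list of the fields each one depends on.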

Restartable and Parallel

When processing data at a large scale, there are two characteristics that make a huge difference to my life. Restartability. When something goes wrong, being able to continue from where it stopped. In my opinion, this is more important than parallelism. There’s nothing as depressing as having to start from scratch every time. Think of it as the ability to save a game as opposed to starting from Level 1 in every life. ...
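A minimal sketch of restartability, assuming the work can be split into independent items and the results fit in a JSON file. The names process_all, work and checkpoint_path are hypothetical, chosen for this illustration.

```python
import json
import os
import tempfile


def process_all(items, work, checkpoint_path):
    """Apply work() to each item, skipping items finished in an earlier run."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as handle:
            done = json.load(handle)
    for item in items:
        key = str(item)
        if key in done:
            continue  # finished in a previous run: the "saved game"
        done[key] = work(item)
        # Write to a temporary file and rename it, so that a crash in the
        # middle of saving cannot corrupt the existing checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(done, handle)
        os.replace(tmp_path, checkpoint_path)
    return done


calls = []


def square(n):
    calls.append(n)  # record which items were actually processed
    return n * n


path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
process_all([1, 2, 3], square, path)
# A second run over a longer list only processes the new item, 4
result = process_all([1, 2, 3, 4], square, path)
```

Saving after every item is the simplest choice; for cheap items, saving every few hundred items would trade a little repeated work for less disk traffic.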

Colour spaces

In reality, a colour is a combination of light waves with frequencies of roughly 430-750THz (wavelengths of about 400-700nm), just like sound is a combination of sound waves with frequencies from 20-20000Hz. Just like mixing various pure notes produces a new sound, mixing various pure colours (like those from a rainbow) produces new colours (like white, which isn’t on the rainbow). Our eyes aren’t like our ears, though. They have 3 sensors that are triggered differently by different frequencies. The sensors roughly peak around red, green and blue. Roughly. ...

Is Protocol Buffers worth it?

Google’s Protocol Buffers is a “language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler”. XML is slow and large. There’s no doubting that. JSON’s my default alternative, though it’s a bit large. CSV’s ideal for tabular data, but ragged hierarchies are a bit difficult. I was trying to see if Protocol Buffers would be smaller and faster, at least when using Python. I took JSON as the base, and checked the write speed, read speed and file sizes. Here’s the comparison: ...
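The kind of measurement described above can be sketched with a small harness. This is an illustration only, not the script behind the comparison: measure and the sample rows are made up here, and only the JSON base case is shown. Another serializer, Protocol Buffers included, could be plugged in by passing its own dumps/loads pair.

```python
import json
import time


def measure(dumps, loads, data):
    """Return (write seconds, read seconds, size in bytes) for one serializer."""
    start = time.perf_counter()
    blob = dumps(data)
    write_time = time.perf_counter() - start

    start = time.perf_counter()
    restored = loads(blob)
    read_time = time.perf_counter() - start

    assert restored == data  # the round trip must be lossless
    return write_time, read_time, len(blob)


# Hypothetical sample data: 10,000 small records
rows = [{"id": i, "name": "row %d" % i, "score": i / 7} for i in range(10000)]

write_s, read_s, size = measure(
    lambda d: json.dumps(d).encode("utf-8"),
    lambda b: json.loads(b.decode("utf-8")),
    rows,
)
print("JSON: write %.4fs, read %.4fs, %d bytes" % (write_s, read_s, size))
```

A single timed run is noisy; repeating each measurement and taking the best time would give steadier numbers.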

Audio data URI

Turns out that you can use data URIs in the <audio> tag. Just upload an MP3 file to http://dataurl.net/#dataurlmaker and you’ll get a long string starting with data:audio/mp3;base64... Insert this into your HTML:

    <audio controls src="data:audio/mp3;base64...">

That’s it – the entire MP3 file is embedded into your HTML page without requiring additional downloads. This takes a bit more bandwidth than the MP3 (base64 encoding adds about a third to the size), and won’t work on Internet Explorer. But for modern browsers, and small audio files, it reduces the overall load time – sort of like CSS sprites. ...
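If you would rather not upload the file anywhere, the same string can be built locally. A minimal sketch, assuming the MP3 is already available as bytes; audio_data_uri is a made-up helper name, and the placeholder bytes below stand in for a real file read with open(..., "rb").

```python
import base64


def audio_data_uri(mp3_bytes, mime="audio/mp3"):
    """Encode raw audio bytes as a data URI usable in an <audio> src."""
    encoded = base64.b64encode(mp3_bytes).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)


# Placeholder bytes standing in for the contents of a real MP3 file
fake_mp3 = b"\x00" * 300

uri = audio_data_uri(fake_mp3)
html = '<audio controls src="%s"></audio>' % uri
```

Base64 turns every 3 bytes into 4 characters, which is where the extra bandwidth comes from: the 300 placeholder bytes become 400 characters of payload.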

Markdress

This year, I’ve converted the bulk of my content into Markdown – a simple way of formatting text files so that they can be rendered into HTML. Not out of choice, really. It was the only solution if I wanted to:

- Edit files on my iPad / iPhone (I’ve started doing that a lot more recently)
- Allow the contents to be viewable as HTML as well as text, and
- Allow non-techies to edit the file

As a bonus, it’s already the format Github and Bitbucket use for markup. ...

Protect static files on Apache with OpenID

I moved from static HTML pages to web applications and back to static HTML files. There’s a lot to be said for the simplicity and portability of a bunch of files. Static site generators like Jekyll are increasingly popular; I’ve built a simple publisher that I use extensively. Web apps give you something else, though, that is still useful on a static site. Access control. I’ve been resorting to htpasswd to protect static files, and it’s far from optimal. I don’t want to know or manage users’ passwords. I don’t want them to remember a new ID. I just want to allow specific people to log in via their Google Accounts. (OpenID is too confusing, and most people use Google anyway.) ...

Codecasting

The best way to explain code to a group of people is by walking through it. If they’re far away in space or time, then a video is the next best thing. You can also recommend that they try out the best coding apps. The trouble with videos, though, is that they’re big. I can’t host them on my server – I’d need YouTube. Editing them is tough. You can’t copy & paste code from videos. And so on. One interesting alternative is to use presentations with audio. Slideshare, for instance, lets you share slides and sync them with audio. That almost works. But it’s still not good enough. I’d like code to be stored as code. What I really need is codecasting: a YouTube or Slideshare for code. The closest I’d seen until the day before yesterday was etherpad or ttyrec – but neither supports audio. Enter Popcorn. It’s a Javascript library from Mozilla that, among other things, can fire events when an audio/video element reaches a particular point. ...

Javascript arrays vs objects

Summary: Arrays are a lot smaller than objects, but only slightly faster on newer browsers. I’m writing an in-memory Javascript app that handles several thousand rows. Each row could be stored either as an array [1,2,3] or an object {"x":1,"y":2,"z":3}. Having read up on the performance of arrays vs objects, I thought I’d do a few tests on storing numbers from 0 to 1 million. The results for Chrome are below. (Firefox 7 was similar.) ...

Server speed benchmarks

Yesterday, I wrote about node.js being fast. Here are some numbers. I ran Apache Benchmark on the simplest Hello World program possible, testing 10,000 requests with 100 concurrent connections (ab -n 10000 -c 100). These are on my Dell E5400, with lots of applications running, so take them with a pinch of salt.

Server | Code | Requests/sec | Notes
PHP5 on Apache 2.2.6 | <?php echo "Hello world" ?> | 1,550/sec | Base case. But this isn’t too bad
Tornado/Python | See Tornadoweb example | 1,900/sec | Over 20% faster
Static HTML on Apache 2.2.6 | Hello world | 2,250/sec | Another 20% faster
Static HTML on nginx 0.9.0 | Hello world | 2,400/sec | 6% faster
node.js 0.4.1 | See nodejs.org example | 2,500/sec | Faster than a static file on nginx!

I was definitely NOT expecting this result… but it looks like serving a static file with node.js could be faster than nginx. This might explain why Markup.io is exposing node.js directly, without an nginx or varnish proxy. ...

Why node.js

I’ve moved from Python to Javascript on the server side – specifically, Tornado to Node.js. Three years ago, I moved from Perl to Python because I got free hosting at AppEngine. Python’s a cleaner language, but that was not enough to make me move. Free hosting was. Initially, my apps were on AppEngine, but that wouldn’t work for corporate apps, so I tried Django. IMHO, Django’s too bulky, has too much “magic”, and templates are restrictive. Then I tried Tornado: small; independent modules; easy to learn. I used it for almost 2 years. ...

HTML 4 & 5: The Complete Reference

HTML 4 & 5: The Complete Reference is an iPhone / iPad app that does exactly what it says: a reference for HTML 4 and 5. It has a list of all tags, clearly demarcated as HTML4, HTML5 or both. The application is fairly easy to scroll through to find the tag or attribute you want. Clicking on a tag, you get:

- a brief description of what it’s for
- what attributes are valid – the good part is you can see clearly which attributes are specific to the element, and which ones are common (like class, id, etc.). You can also see the possible values for the attribute, which helps.
- and an example of how the tag is used. The examples are quite simplistic, and there’s only one per tag, but it does have a rendered version of the code, which helps.

You can also scroll through the list of attributes and see which tags they’re valid for. ...

Yahoo Clues API

Yahoo Clues is like Google Insights for Search. It has one interesting thing that the latter doesn’t though: search flows. It doesn’t have an official API, so I thought I’d document the unofficial one. The API endpoint is http://clues.yahoo.com/clue

The query parameters are:

- q1 – the first query string
- q2 – the second query string
- ts – the time span. 0 = today, 1 = past 7 days, 2 = past 30 days
- tz – time zone? Not sure how it works. It’s just set to “0” for me
- s – the format? No value other than “j” seems to work

So a search for “gmat” for the last 30 days looks like this: ...

Automated image enhancement

There are some standard enhancements that I apply to my photos consistently: auto-levels, increase saturation, increase sharpness, etc. I’d also read that Flickr sharpens uploads (at least, the resized ones) so that they look better. So last week, I took 100 of my photos and created 4 versions of each image:

- The base image itself (example)
- A sharpened version (example). I used a sharpening factor of 200%
- A saturated version (example). I used a saturation factor of 125%
- An auto-levelled version (example)

I created a test asking people to compare these. The differences between these are not always noticeable when placed side-by-side, so the test flashed two images at the same place. ...

Shortening sentences

When writing Mixamail, I wanted tweets automatically shortened to 140 characters – but in the most readable manner. Some steps are obvious. Removing redundant spaces, for example. And URL shortening. I use bit.ly because it has an API. I’ll switch to Goo.gl, once theirs is out. I tried a few more strategies:

- Replace words with short forms. “u” for “you”, “&” for “and”, etc.
- Remove articles – a, an, the
- Remove optional punctuation – comma, semicolon, colon and quotes, in particular
- Replace “one” with “1”, “to” or “too” with “2”, etc. “Before” becomes “Be4”, for example
- Remove spaces after punctuation. So “a, b” becomes “a,b” – the space after the comma is removed
- Remove vowels in the middle. nglsh s lgbl wtht vwls.

How did they pan out? I tested these on the English sentences in the Tanaka Corpus, which has about 150,000 sentences. (No, they’re not typical tweets, but hey…). By just doing these, independently, here is the percentage reduction in the size of text: ...
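A few of these strategies can be sketched with regular expressions. This is a rough illustration rather than Mixamail’s actual code: the function names are made up, the vowel rule here keeps the first and last letter of each word (a milder variant than the example sentence above), and the punctuation rule leaves quotes alone so that apostrophes survive.

```python
import re


def collapse_spaces(text):
    """Replace runs of whitespace with a single space."""
    return re.sub(r"\s+", " ", text).strip()


def remove_articles(text):
    """Drop the words a, an and the."""
    without = re.sub(r"\b(a|an|the)\b", "", text, flags=re.IGNORECASE)
    return collapse_spaces(without)


def remove_optional_punctuation(text):
    """Drop commas, semicolons and colons."""
    return re.sub(r"[,;:]", "", text)


def tighten_punctuation(text):
    """Remove spaces that follow a punctuation mark."""
    return re.sub(r"([,;:.!?]) +", r"\1", text)


def remove_middle_vowels(text):
    """Drop vowels that have a letter on both sides."""
    return re.sub(r"(?<=[A-Za-z])[aeiouAEIOU](?=[A-Za-z])", "", text)


def reduction(original, shortened):
    """Percentage reduction in length from original to shortened."""
    return 100.0 * (len(original) - len(shortened)) / len(original)


sentence = "The cat sat on a mat"
print(remove_articles(sentence), reduction(sentence, remove_articles(sentence)))
```

Applying each function on its own to every sentence in a corpus and averaging reduction() gives one independent figure per strategy.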

HTML5: Up and Running

HTML5: Up and Running is the book version of Mark Pilgrim’s comprehensive introduction to HTML5 at DiveIntoHTML5.org. Whether you buy the book or read it online, it’s the best introduction to the topic you’ll find. Mark begins with the history of HTML5 (using email archaeology, as he calls it). You’d never guess that many of the problems we have with XHTML, MIME types, etc. have roots in discussions over 20 years ago. From then on, he moves into feature detection (which uses the Modernizr library), new tags, canvas, video, geo-location, storage, offline web apps, new form features and microdata. Each chapter can be read independently – so if you’re planning to use this as a reference, you may be better off reading the links kept up-to-date at DiveIntoHTML5.org. If you’re interested in learning about the features, it’s a very readable book, terse, simple, and above all, delightfully intelligent. ...

Modular CSS frameworks

A fair number of the CSS frameworks I’ve seen – Blueprint, Tripoli, YUI, SenCSS – are monolithic. What I’d like is to be able to mix and match specific components of these. For example, 960.gs has a simple grid system that I’d love to combine with the vertical rhythm that SenCSS offers. (Vertical rhythm ensures that sentences align vertically.) I’d love to have a CSS framework that just sets the fonts, for example, and touches nothing else. Or something that defines the colour schemes, and lets you change the theme like Microsoft Office does. ...

Make backgrounds transparent

This is the simplest way that I’ve found to make the background colour of an image transparent.

1. Download GIMP
2. Open your image. I’ll pick this one:
3. Optional: Select Image – Mode – RGB if it’s not RGB.
4. Select Colors – Color to Alpha…
5. Click on the white button next to “From” and select the eye-dropper.
6. Pick the green colour on the image, and click OK

The anti-aliasing is preserved as well. ...