Shopping with Cooliris

I just put together this little demo that scrapes John Lewis’ site and creates a MediaRSS file out of it. Cooliris has got to be the best way to shop. Apart from being really pretty, it’s quite useful when you know what something looks like but don’t quite know how to search for it. For example, I was trying to find a headphone-microphone (you know, the ones that plug into an iPhone or a BlackBerry). I didn’t have a clue what it was called. (TRRS, if you’re interested. I found out later.) The only way I could find it was to browse the wall… ...
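The scraping half depends on the shop’s markup, which the excerpt doesn’t show, so this minimal sketch covers only the output side: turning products you have already scraped into a Media RSS feed that a viewer like Cooliris can load. The product dictionary keys (`name`, `url`, `thumbnail`, `image`) are hypothetical; `media:thumbnail` and `media:content` are elements from the Media RSS namespace.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)  # serialise the namespace as "media:"

def build_media_rss(products, title="Products"):
    """Return a Media RSS document (as a string) for a list of product dicts.

    Each product is assumed to be a dict with the hypothetical keys
    'name', 'url', 'thumbnail' and 'image'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for p in products:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = p["name"]
        ET.SubElement(item, "link").text = p["url"]
        # Small image shown on the wall, and the full-size image behind it.
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", url=p["thumbnail"])
        ET.SubElement(item, f"{{{MEDIA_NS}}}content", url=p["image"], type="image/jpeg")
    return ET.tostring(rss, encoding="unicode")
```

Saving the returned string as, say, `products.rss` gives a feed that can be pointed at from a page or opened directly.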

ImportHtml doesn’t auto-refresh

A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula =importHtml("http://www.imdb.com/chart/top", "table", 1) imports the Internet Movie Database Top 250 table into a Google Spreadsheet. Since you can publish spreadsheets as RSS feeds, this ought, in theory, to be a great way of generating RSS feeds from arbitrary content. There’s just one problem: it doesn’t auto-update. Some claim it updates every hour. Maybe it does when the sheet is open; I don’t know. But it definitely does not when the sheet is closed. To check, I wrote a simple script that logs the time at which it is accessed and prints the whole log every time it is accessed. ...
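A script like the one described might look roughly like this. It is a sketch under assumptions: the log path is hypothetical, and the log is returned as an HTML `<ul>` because importHtml only pulls in tables and lists, so `=importHtml(url, "list", 1)` could import it. Each fetch by the spreadsheet adds one line, so the log shows exactly when (and whether) the sheet refreshed.

```python
import datetime
import html
import os

LOG_FILE = "access.log"  # hypothetical location of the log

def log_access(path=LOG_FILE):
    """Append the current time to the log, then return the log as an HTML list."""
    with open(path, "a") as f:
        f.write(datetime.datetime.now().isoformat(timespec="seconds") + "\n")
    with open(path) as f:
        times = f.read().splitlines()
    # An HTML <ul> is something importHtml(url, "list", 1) can import.
    items = "".join(f"<li>{html.escape(t)}</li>\n" for t in times)
    return "<ul>\n" + items + "</ul>"

# Only emit a response when running under a CGI-capable web server,
# which sets the GATEWAY_INTERFACE environment variable.
if __name__ == "__main__" and "GATEWAY_INTERFACE" in os.environ:
    print("Content-Type: text/html")
    print()
    print(log_access())
```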

Command line alarm

When I’m in front of my laptop, I usually forget the world around me. Sadly, the world around me has important things that need to get done on time, like taking medicines or turning off the washing machine or the hob. The one thing my machine lacked was a simple alarm system. I’d like to set an alarm to remind me to do something in 5 minutes, for example. And it should be dead simple to set up. ...
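Here is one minimal way to get such an alarm from the command line. The script name `alarm.py` and its `DELAY [MESSAGE...]` syntax are assumptions for illustration, not the post’s actual tool; a bare number is treated as minutes, and `s`/`m`/`h` suffixes are accepted.

```python
import sys
import time

UNITS = {"s": 1, "m": 60, "h": 3600}

def parse_delay(text):
    """Convert '5', '5m', '90s' or '2h' to seconds; a bare number means minutes."""
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text) * 60

def alarm(delay_text, message):
    time.sleep(parse_delay(delay_text))
    # "\a" rings the terminal bell on most consoles.
    print("\a" + message)

def main(argv):
    if len(argv) < 2:
        print("usage: alarm.py DELAY [MESSAGE...]   e.g. alarm.py 5 turn off the hob")
        return
    alarm(argv[1], " ".join(argv[2:]) or "Time's up!")

if __name__ == "__main__":
    main(sys.argv)
```

Running `python alarm.py 5 turn off the hob` waits five minutes and then beeps with the message; on Unix shells a trailing `&` keeps the terminal free in the meantime.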

SSH Tunneling through web filters

You can defeat most web filters by spending around 8 cents/hr (now 0 cents/hr) on Amazon EC2. (It’s usually worth the money. It’s a fraction of the cost of a phone call or a sandwich. And I usually end up wasting that money anyway on calling someone or eating my way out of the misery of corporate proxies.) Most web filters and proxies block all ports except the HTTP port (80) and the HTTPS port (443). But port 443 carries encrypted traffic that the filter cannot read, and, as Mark explains: ...
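As a rough sketch of the client side, the helper below builds the `ssh` command for a dynamic (SOCKS) tunnel. It assumes the EC2 instance’s sshd has been configured to listen on port 443 (for example, with a `Port 443` line in `/etc/ssh/sshd_config`); the host name and user are placeholders. `-D` opens a local SOCKS proxy, `-N` skips running a remote command, and `-p` picks the server port. If the filter only lets traffic out through an HTTP proxy, ssh would additionally need a `ProxyCommand`, which this sketch leaves out.

```python
import shlex

def tunnel_command(host, user, ssh_port=443, socks_port=1080, key_file=None):
    """Build an ssh command that opens a SOCKS proxy on localhost:socks_port.

    Assumes the remote sshd listens on ssh_port (443 here, since filters
    usually let that port through).
    """
    cmd = ["ssh", "-N", "-D", str(socks_port), "-p", str(ssh_port)]
    if key_file:
        cmd += ["-i", key_file]  # private key, e.g. the EC2 key pair file
    cmd.append(f"{user}@{host}")
    return cmd

# Placeholder host and user, for illustration only.
print(shlex.join(tunnel_command("ec2-host.example.com", "ubuntu")))
```

Once the tunnel is up, pointing the browser’s SOCKS proxy setting at localhost:1080 sends its traffic through the instance.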

Open source in corporates

Last month, my first application went live. I’ve been writing code for 20 years, and until now not one line of it had been officially deployed in a corporate. (Loser…) It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful.

But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen. I’ve been living in a nightmare since March 2009. That was when I decided that I’d try to get corporates to use open source.

March 2009

It began with a pitch to a VC firm. They were looking to build a content management system (CMS). Normally we’d pull together slides that promise the moon. This time, we put together a demo based on WordPress’ CMS plugins. The meeting went fabulously well. We said, “Here’s a demo we’ve built for you. Do you like it?” The business lead (Stuart) was drooling and declared that it was exactly what they wanted. The IT lead (another Stuart) was happy too, but warned the business users: “Just remember: this isn’t how we do development, so don’t get your hopes up that we can deliver stuff like this :-)”

Time to make my point. I asked, “What’s your policy on open source software?”

The business lead went quiet. “I don’t know,” he finally said. Fair enough. I turned to the IT lead.

“Well, we don’t use it as a matter of policy… there are security concerns…” he said.

“Which web server do you use?”

“Oh, OK. I see what you mean. We use Apache. So on a case-by-case basis, we have exceptions. But generally we have security concerns.”

“Why? Do you believe open source software is more insecure than commercial software?”

He thought about it for a while. “Well… maybe. I don’t know.” We debated this a bit. Then we found the real issue: “It’s just that we don’t have control over the process. We don’t know enough about it to decide.”

A couple of weeks later, I tried pitching to a newspaper company.
This time, it was our sales team that raised the same question. “But… isn’t open source insecure?” I didn’t even bother pitching any open source stuff to them. But I’d learnt my lessons: ...

Inline form validation

A List Apart’s article on Inline Validation is one of the most informative I’ve read in a while, and it’s backed by solid data. Some useful lessons:

- Inline validation can reduce form completion time by 40%.
- Use inline validation where users don’t know whether they’ll get it wrong (e.g. is a username available?). Don’t use it when users know the answer (e.g. their name).
- Validate on blur, not on keypress (it’s distracting, and users can’t multitask).

Round buttons with Python Image Library

After much hunting, I finally settled on Hedger Wang’s simple round CSS links as the most acceptable cross-browser round button implementation. The minified CSS is about 2.5KB, and the syntax is very simple. To make an input button into a round button, just wrap it in a <span class="button">:

    <span class="button"><input type="submit"></span>

… and it’s just as easy to convert a link into a round button:

    <a class="button" href="/"><span>Home</span></a>

It works by using a transparent PNG / GIF that looks like this: ...
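The title points to PIL for generating such an image. To keep this sketch runnable without third-party packages, it swaps PIL for a small hand-rolled PNG writer built on the standard library’s `struct` and `zlib`. It draws an opaque rounded rectangle on a fully transparent background; the size, corner radius and grey colour are arbitrary assumptions, and it requires `width` and `height` to be at least `2 * radius`.

```python
import struct
import zlib

def inside_rounded_rect(x, y, width, height, radius):
    """True if the centre of pixel (x, y) lies inside the rounded rectangle."""
    px, py = x + 0.5, y + 0.5
    # Clamp the point to the inner rectangle whose corners are the circle
    # centres; the pixel is inside if it is within `radius` of that point.
    cx = min(max(px, radius), width - radius)
    cy = min(max(py, radius), height - radius)
    return (px - cx) ** 2 + (py - cy) ** 2 <= radius ** 2

def png_chunk(kind, data):
    # Each PNG chunk is: length, type, data, then a CRC32 over type + data.
    return (struct.pack(">I", len(data)) + kind + data
            + struct.pack(">I", zlib.crc32(kind + data) & 0xFFFFFFFF))

def rounded_button_png(width, height, radius, rgb=(200, 200, 200)):
    """Return PNG bytes: an opaque rounded rectangle on a transparent background."""
    rows = []
    for y in range(height):
        row = bytearray([0])  # filter type 0 (None) starts each scanline
        for x in range(width):
            alpha = 255 if inside_rounded_rect(x, y, width, height, radius) else 0
            row += bytes(rgb) + bytes([alpha])
        rows.append(bytes(row))
    # IHDR: width, height, bit depth 8, colour type 6 (RGBA), default
    # compression and filter methods, no interlacing.
    header = struct.pack(">IIBBBBB", width, height, 8, 6, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + png_chunk(b"IHDR", header)
            + png_chunk(b"IDAT", zlib.compress(b"".join(rows)))
            + png_chunk(b"IEND", b""))
```

Writing `rounded_button_png(200, 40, 12)` to a file opened in `"wb"` mode produces an image of the kind the CSS slides behind the button text.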

Error logging with Google Analytics

A quick note: I blogged earlier about Javascript error logging, saying that you can (automatically) wrap every function in your code in a try{} catch{} block and log the error message in the catch{} block. I used to send the error message to a Perl script. But now I use Google Analytics’ event tracking:

    // Serialise every enumerable property of the caught error as "name=value"
    var s = [];
    for (var i in err) s.push(i + "=" + err[i]);
    // Keep the details to at most 500 characters
    s = s.join(" ").substr(0, 500);
    // Log an event with category "Error", the function's name as the action, and the details as the label
    pageTracker._trackEvent("Error", function_name, s);

The good part is that it makes error monitoring a whole lot easier. Within a day of implementing this, I managed to fix a couple of errors that had been pending for months. ...
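The same wrap-and-report idea translates to Python as a decorator. This is an analogy, not the post’s code: `track_event` is a hypothetical stand-in for `pageTracker._trackEvent`, the 500-character cap mirrors the snippet, and re-raising the exception after reporting it is a design choice of this sketch.

```python
import functools

def log_errors(track_event, limit=500):
    """Wrap a function so any exception is reported via track_event, then re-raised."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as err:
                # Flatten the error's type, message and any extra attributes
                # into "name=value" pairs, like the for-in loop over err.
                parts = ["type=" + type(err).__name__, "message=" + str(err)]
                parts += [f"{k}={v}" for k, v in vars(err).items()]
                track_event("Error", func.__name__, " ".join(parts)[:limit])
                raise
        return wrapper
    return decorator
```

In use, `@log_errors(send_to_analytics)` above a function (with `send_to_analytics` being whatever reporting call you have) reports each failure under that function’s name.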

Short URLs

With all the discussion around URL shorteners, Diggbar, blocking it, and the rev=canonical proposal, I decided to implement a URL shortening service on this blog with the least effort possible. This probably won’t affect you just yet, but as tools become more popular and sophisticated, it should hopefully eliminate the need for tinyurl, bit.ly, etc. Since the blog runs on WordPress, every post has an ID. The short URL for any post will simply be http://www.s-anand.net/the_ID. For example, http://s-anand.net/17 is a link to the post on Ubuntu on a Dell Latitude D420. At 21 characters, it’s roughly as short as most URL shorteners could make it. ...
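The post doesn’t show how the short paths are wired up, so here is only one possible sketch. It relies on WordPress accepting ID-based URLs of the form `/?p=ID` and redirecting them to the post’s full permalink, so a short path just needs rewriting to that form. The function names are hypothetical.

```python
import re

SITE = "http://s-anand.net"

def short_url(post_id):
    """Short link for a WordPress post ID, e.g. 17 -> http://s-anand.net/17."""
    return f"{SITE}/{post_id}"

def resolve(path):
    """Map a numeric path like '/17' to WordPress's ID-based URL '/?p=17'.

    Returns None for non-numeric paths, which should be served as usual.
    """
    match = re.fullmatch(r"/(\d+)/?", path)
    return f"/?p={match.group(1)}" if match else None
```

Checking only for purely numeric paths keeps ordinary pages such as `/about` untouched.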

Automating PowerPoint with Python

Writing a program to draw or change slides is sometimes easier than doing it manually. To change all fonts in a presentation to Arial, for example, you’d write this Visual Basic macro:

    Sub Arial()
        For Each Slide In ActivePresentation.Slides
            For Each Shape In Slide.Shapes
                ' Skip shapes (pictures, lines, etc.) that have no text frame
                If Shape.HasTextFrame Then
                    Shape.TextFrame.TextRange.Font.Name = "Arial"
                End If
            Next
        Next
    End Sub

If you don’t like Visual Basic, though, you could write the same thing in Python:

    import os, sys
    import win32com.client

    Application = win32com.client.Dispatch("PowerPoint.Application")
    Application.Visible = True
    # Pass PowerPoint an absolute path so the file is found regardless of its working directory
    Presentation = Application.Presentations.Open(os.path.abspath(sys.argv[1]))
    for Slide in Presentation.Slides:
        for Shape in Slide.Shapes:
            # Skip shapes (pictures, lines, etc.) that have no text frame
            if Shape.HasTextFrame:
                Shape.TextFrame.TextRange.Font.Name = "Arial"
    Presentation.Save()
    Application.Quit()

Save this as arial.py and type “arial.py some.ppt” to convert some.ppt into Arial. ...

WordPress themes on Live Writer

One of the reasons I moved to WordPress was the ability to write posts offline, for which I use Windows Live Writer most of the time. The beauty of this is that I can preview the post exactly as it will appear on my site. Nothing else I know of is as WYSIWYG, and it’s very useful to be able to type knowing exactly where each word will be. The only hitch is: if you write your own WordPress theme, Live Writer probably won’t be able to detect it, unless you’re an expert theme writer. ...

Client side scraping for contacts

By curious coincidence, just a day after my post on client side scraping, I had a chance to demo this to a client. They were building a contacts database. There are two big problems with managing contacts:

- Getting complete information
- Keeping it up to date

Now, people are happy to fill out information about themselves in great detail. If you look at the public profiles on LinkedIn, you’ll find enough and more details about most people. ...

Client side scraping

“Scraping” is extracting content from a website. It’s often used to build something on top of the existing content. For example, I’ve built a site that tracks movies on the IMDb Top 250 by scraping content. There are libraries that simplify scraping in most languages:

- Perl: WWW::Mechanize
- Python: BeautifulSoup
- Ruby: Hpricot
- PHP: XPath (built-in)
- Javascript: jQuery on env.js on Rhino

But all of these run on the server side. That is, the scraping happens on your machine (the server). Can you write a web page where the viewer’s machine does the scraping? ...
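For reference, a server-side scraper of the kind listed can be small. This sketch swaps in the standard library’s `html.parser` for BeautifulSoup so it needs no installs. It collects each table’s rows as lists of cell text and, like importHtml, numbers tables from 1. It assumes tables are not nested and that cells have explicit closing tags.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the rows of every <table> as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []  # each table is a list of rows
        self.row = None   # cells of the row being read
        self.cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self.row.append(" ".join("".join(self.cell).split()))
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(self.row)
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

def scrape_table(markup, index=1):
    """Return the rows of the index-th table in the markup (1-based)."""
    parser = TableScraper()
    parser.feed(markup)
    return parser.tables[index - 1]
```

Fetching the page (for example, with `urllib.request.urlopen`) and passing its decoded text to `scrape_table` gives a list of rows ready to reuse.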

Infyblogs dashboard

I just finished Stephen Few’s book on Information Dashboard Design. It talks about what’s wrong with the dashboards that most Business Intelligence vendors (Business Objects, Oracle, Informatica, Cognos, Hyperion, etc.) offer, and brings Tuftian principles of chart design to dashboards. So I took a shot at designing a dashboard based on those principles, and made this dashboard for InfyBLOGS. You can try it yourself:

1. Go to http://www.s-anand.net/reco/ (Note: this only works within the Infosys intranet.)
2. Right-click on the “Infyblog Dashboard” link and click “Add to Favourites…” (Non-IE users: drag and drop it to your links bar.)
3. If you get a security alert, say “Yes” to continue.
4. Return to InfyBLOGS, make sure you’re logged in (that’s important), and click on the “Infyblog Dashboard” bookmark.
5. You’ll see a dashboard for your account, with comments and statistics.

The rest of this article discusses design principles and the technology behind the implementation. (It’s long. Skim by reading just the bold headlines.) ...

To Python from Perl

I’ve recently switched to Python, after having programmed in Perl for many years. I’m sacrificing all my knowledge of Perl’s libraries and language quirks. I moved despite that for a somewhat trivial reason, actually: Python doesn’t require a closing brace. Consider this Javascript (or very nearly C or Java) code:

    var s=0;
    for (var i=0; i<10; i++) {
      for (var j=0; j<10; j++) {
        s = s + i * j;
      }
    }

That’s 6 lines, with two lines just containing the closing brace. Or consider Perl. ...
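For comparison, here is the same nested loop written in Python. Indentation alone marks where each block ends, so the six lines shrink to four with no brace-only lines.

```python
s = 0
for i in range(10):
    for j in range(10):
        s = s + i * j
```

Both versions leave `s` at 2025, since the sum of i·j over 0–9 is (0 + 1 + … + 9)² = 45².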

Bound methods in Javascript

The popular way to create a class in Javascript is to define a function and add methods to its prototype. For example, let’s create a class Node that has a method hide().

    var Node = function(id) {
      this.element = document.getElementById(id);
    };
    Node.prototype.hide = function() {
      this.element.style.display = "none";
    };

If you had an element with the id "header", this piece of code will hide it:

    var node = new Node("header");
    node.hide();

If I wanted to hide the element a second later, I’d be tempted to use:

    var node = new Node("header");
    setTimeout(node.hide, 1000);

… except that it won’t work. setTimeout has no idea that the function node.hide has anything to do with the object node. It just runs the function. When setTimeout calls it, this isn’t set to node; it’s set to window. So node.hide() ends up trying to hide window, not node. ...
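Python handles this differently, which makes a useful contrast. Looking up `node.hide` on an instance returns a *bound method* that carries `node` with it, roughly what `node.hide.bind(node)` produces in Javascript. This sketch uses a hypothetical `Node` class with a `hidden` flag in place of a DOM element, and `threading.Timer` in the role of `setTimeout`.

```python
import threading

class Node:
    def __init__(self, element_id):
        self.element_id = element_id
        self.hidden = False  # stands in for style.display = "none"

    def hide(self):
        self.hidden = True

node = Node("header")
callback = node.hide              # a bound method: it remembers `node`
print(callback.__self__ is node)  # True

# Passing the bound method to a timer works, unlike setTimeout(node.hide, 1000).
timer = threading.Timer(0.1, node.hide)
timer.start()
timer.join()                      # wait for the timer thread to finish
print(node.hidden)                # True
```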

Downloading online songs

You know those songs on Raaga, MusicIndiaOnline, etc.? The ones you can listen to but can’t download? Well, you can download them. It’s always been possible to download these files. After all, that’s how you get to listen to them in the first place. What stopped you was security by obscurity. You didn’t know the location where the song was stored, but if you did, you could download it. So how do you figure out the URL to download the file from? ...

Keyword searches as a Web command line

Andre mentions dumping Google Chrome because of its lack of extension support, especially for Ubiquity, and lists 15 useful Ubiquity commands. If you haven’t seen Ubiquity, you should. It’s a great extension that transforms your browser into an Internet command prompt. It is modelled on the Enso Launcher, which is a great piece of work in itself. I wasn’t quite prepared to let go of Chrome that easily. On Task Manager, seeing 10 Chrome processes, the largest of which takes up 60MB, is a lot more comforting, psychologically, than 1 Firefox process taking up 300MB. (I rarely hit my 1GB RAM limit, so it shouldn’t matter either way. Yet the miser in me keeps watching.) ...
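Keyword searches work by mapping a short keyword to a URL template in which `%s` marks where the rest of what you type goes, URL-encoded. Firefox keyword bookmarks and Chrome’s search engine settings both use that placeholder. This sketch imitates the expansion; the keywords and templates are hypothetical examples, not ones from the post.

```python
from urllib.parse import quote_plus

# Hypothetical keyword -> URL templates; %s marks where the query goes.
SEARCHES = {
    "g": "https://www.google.com/search?q=%s",
    "imdb": "https://www.imdb.com/find?q=%s",
    "w": "https://en.wikipedia.org/wiki/Special:Search?search=%s",
}

def expand(command, searches=SEARCHES):
    """Turn 'keyword some query' into a URL, or None if the keyword is unknown."""
    keyword, _, query = command.strip().partition(" ")
    template = searches.get(keyword)
    if template is None:
        return None
    return template.replace("%s", quote_plus(query.strip()))
```

Typing `imdb the matrix` in the address bar is, in effect, what `expand("imdb the matrix")` does before the browser loads the resulting URL.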

Caching pages on Apache

I don’t use any blogging software for my site. I just hand-wired it some years ago. When doing this, one of the biggest problems was caching. Consider each blog entry page. Each page has the same template but different content, and either the template or the content could change. So ideally, blog pages should be served dynamically. That is, every time someone requests a page, I should look up the content, look up the template, and put them together. ...
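One common middle ground between fully dynamic and fully static pages is to cache the rendered page and rebuild it only when one of its sources is newer than the cache. The excerpt stops before describing the actual Apache setup, so this is a generic sketch, not the site’s method. The file layout and the `render` function are assumptions.

```python
import os

def cached_page(content_path, template_path, cache_path, render):
    """Serve cache_path if it is at least as new as both sources; otherwise rebuild it.

    render(content, template) is a hypothetical function that merges the two.
    """
    sources = (content_path, template_path)
    if os.path.exists(cache_path):
        cache_time = os.path.getmtime(cache_path)
        if all(os.path.getmtime(p) <= cache_time for p in sources):
            with open(cache_path, encoding="utf-8") as f:
                return f.read()  # cache hit: nothing changed since the last build
    with open(content_path, encoding="utf-8") as f:
        content = f.read()
    with open(template_path, encoding="utf-8") as f:
        template = f.read()
    page = render(content, template)
    with open(cache_path, "w", encoding="utf-8") as f:
        f.write(page)  # refresh the cache for the next request
    return page
```

Because the check is just two modification times, editing either the template or a post automatically invalidates the affected pages.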

In search of a good editor

It's amazing how hard it is to find a good programming editor. I've played around with more editors/IDEs than I care to remember:

- e Text Editor
- Notepad++
- NoteTab
- SciTE
- Crimson Editor
- Komodo
- Eclipse
- Aptana
- ...

There are four features that are critical to me.

1. Syntax highlighting. Over time, I've found this to increase readability dramatically. Look at a piece of code with and without syntax highlighting (image: the same code shown both ways). Doesn't the structure of the document just jump out with syntax highlighting? Anyway, I've gotten used to that.
2. Column editing. I want to be able to do this (image: typing the same text across several rows at once). Being able to type across rows is incredibly useful. I use it both for programming and to complement data-processing in Excel.
3. Unicode support. I often work with non-ASCII files, particularly in Tamil. Unicode support comes in handy when debugging pages for my songs site.
4. Auto-completion. This is 10 times more productive than having to look up the manual for each function.

(Oh, and it's got to be free too. Except for e Text Editor, all the others qualify.) ...