Motorbike science lab

My cousin’s working on an interesting project at the Agastya Foundation. A group of scientifically inclined volunteers go around on a bike to schools, taking with them a science lab kit, and show children in rural schools a variety of experiments.

Google will award this and 3 other projects (out of 10) Rs 3 crores based on public votes. You can vote for and read more at|vote


We are often subject to body searches, baggage inspections, and identity verifications. At malls. At airports. At offices.

These are to ensure that no one carries ammunition inside, or goods or secrets outside. In other words, to deter terrorists and thieves.

It’s nothing personal, of course. When someone does not know me, I can choose to accept that (or not; the choice is mine).

When I’m invited somewhere, however, I assume that I am not deemed a security threat. Therefore, I expect that:

  • My and my belongings will not be searched or scanned
  • I need not leave behind my personal belongings
  • I need not carry an identity card

Please afford me this courtesy if you are inviting me.

For some months now, I’ve visited many corporate offices. The reception is comprised of security guards, a metal detector and a register. I’m given a tag and an escort.

I’m not fussy. I’m not worried about being greeted, for example. I’m quite happy to plug into a power socket and work on my laptop until logistics are sorted out. But when that happens at the security outpost with no sitting space, or outside the gate in the rain, it inconveniences me.

A few weeks ago, I was at Singapore, and visited a client’s office in slippers. One of them complemented my choice of footwear, and remarked that he had not yet risen high enough in the corporate ladder to afford this luxury. (There’s a series of stories behind my footwear that I’ll get to later.)

That told me something. After a long time, I now can afford this luxury. Especially if someone knows me well enough to invite me to their office.

I hope to point them to this blog post and request that security be arranged so that I can be afforded this small courtesy; be treated with trust rather than as a terrorist or a thief.

(If their organisation’s practice does not permit this, I’m happy to meet outside. Besides, our office is happy to extend warm hospitality.)

Open source in corporates

[This is a post that I’d published internally in InfyBlogs in Dec 2009. Time to share it.]

Last month, my first application went live.

I’ve been writing code for 20 years. Not one line of my code has been officially deployed in a corporate. (Loser…)

It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful.

But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen.

I’ve been living in a nightmare since March 2009. That was when I decided that I’d try and get corporates to use open source.

March 2009

It began with a pitch to a VC firm. They were looking to build a content management system (CMS). Normally we’d pull together slides that say we’ll deliver the moon. This time, we put together demo based on WordPress’ CMS plugins.

The meeting went fabulously well. We said, “Here’s a demo we’ve built for you. Do you like it?” The business lead (Stuart) was drooling and declared that that’s exactly what they wanted. The IT lead (another Stuart) was happy too, but warned the business users: “Just remember: this isn’t how we do development, so don’t get your hopes up that we can deliver stuff like this :-)”

Time to make my point. I asked, “What’s your policy on open source software?”

The business lead went quiet. “I don’t know,” he finally said. Fair enough.

I turned to the IT lead. “Well, we don’t use it as a matter of policy… there are security concerns…” he said.

“Which web server do you use?”

”Oh, OK. I see what you mean. We use Apache. So on a case to case basis, we have exceptions. But generally we have security concerns.“

”Why? Do you believe open source software is more insecure than commercial software?“

He thought about it for a while. “Well… maybe. I don’t know.” We debated this a bit. Then we found the real issue: “It’s just that we don’t have control over the process. We don’t know enough about it to decide.”

A couple of weeks later, I tried pitching to a newspaper. This time, it was our sales team that raised the same question. “But… isn’t open source insecure?”

I didn’t even bother pitching any open source stuff to them. But I’d learnt my lessons:

  1. Demo the application. Don’t talk about it.
  2. Show it to the business first, and then tackle IT.

Aside: June 2009

In June, I got another chance at a client where we were building their new website. The very first thing I did was ask to see the Javascript. Total mess, and filled with browser-incompatible DOM requests. So I went over to their web development team.

“Look, why don’t you guys use a Javascript library? It’ll get you cross browser compatibility and compact maintainable code at the same time.”

And, to their credit, they said, “Sure. Which library?”

I showed them this and we agreed on jQuery. So, if nothing else, I’ve managed to get one open source library into a corporate.

July 2009

I was also looking at payments on the website, and our client was looking to replace their chargeback application. Since I had a week off, I built a working PCI compliant prototype on Django. (I must clarify what I mean by PCI compliant. You see, any application that stores credit card information must pass through a stringent security clearance process. I bypassed the problem by not storing the card information. I’ve realised that I’ve been building PCI compliant applications all my life – and it’s a huge benefit to let people know that.)

This time, I applied the lessons I’d learned, and demo-ed it to the business, who were thrilled. Time to tackle IT.

I started with the architecture team. Matt on the architecture team was the most approachable. So I went over, demo-ed it, and said, “Matt, this took a week to put together. It’s based on some new technologies. Are you game to try these out?”

He was. And quite enthused about it too. So we put together a proposal for the architecture review board, proposing a new technology stack: Django / Python and MySQL. As before, I showed the demo before I talked technology. I had prepared answers to all security related questions upfront (and practically memorised section 3 of the PCI guidelines.) The clincher, though, was the business case. To build it on Java, it would cost ~1,000 person days. On Django, I’d mostly done it in 5. There was no way of justifying 1,000 person days for an application that could save, at best £100,000 a year.

So they said “Go ahead, we’re fine if operations and infrastructure are fine.”

It was time to find a Django developer in Infy. I hunted for a couple of weeks but none was available. (Only 2 people that I knew knew Django in the first place.) So that effort got canned, and we were back to the 1,000 person day solution. (Which got canned too, later.)
But in the process, I’d learned my third lesson.

  1. If you’re trying new technologies, plan on delivering it yourself.

October 2009

Another application popped up that looked like a prime candidate for introducing open source. They were using an Excel application to fraud screen orders, and wanted to make a web app out of it.

I followed the same route as before. Demo it. Show it to business first, then IT. Built it myself. I skipped Architecture, since they’d already approved the technology stack, and took it straight to Infrastructure.

“This application uses Apache as the web server, MySQL as the database, and uses PHP and Javascript for the application logic. Could we get a Linux server to host it?”

Our entire conversation lasted 30 seconds. He said, “No. We use Windows servers” (I was fine)

“… and you’ll need to chance Apache to IIS” (fine again)

“… and we don’t support PHP, so it’ll have to be Java or .NET” (I don’t know .NET or Java… but fine)

“… and we don’t support MySQL, it’ll have to be SQL Server” (fine, I guess)

“… and we don’t have DBAs available until January, so you’ll have to wait.” (definitely not good.)

So back to the drawing board on the technology stack. I needed something in Java (I know very little Java, but nothing at all in .NET) and to avoid the DBA headache, it would have to bundle in a database. I first explored key-value stores like CouchDB, Redis, etc. None of them worked on Java. The only one I found that did was Persevere, and it was a JSON data store, which fit perfectly with my plans.

By this time, I’d also learn my my fourth and most important lesson.

  1. Don’t try to promote open source. Just deliver the application

I said, “This is a custom-built application that runs on Java. Could we get a Windows server to host it?”

The answer was “Yes”, and we had it live the next day.

PS: December 2009

The application’s deployed and running. It has about 10,000 orders fraud screened by now.
And the lessons are well learnt. So when some came over asking if there was any image resizing solution I knew off, I said: “Sure, who’s your business sponsor?” Then I went over and said, “Let me show you this open source application called ImageMagick. It handles aspect ratios correctly, and can crop too. Doesn’t this look professional?” Then I went over to IT and said, “It’s open source, so you can change it. It has Java bindings, so you can integrate it into your environment. It can handle 8 3000×2400 images a second on my puny laptop. It’s used by your competitors. And I can build it for you if you like.”

I might just have my second open source entry into a corporate this year.

The scary Internet

I’m not that difficult to scare, and this log message certainly didn’t help: [] failed - POSSIBLE BREAK-IN ATTEMPT!

That’s the message I saw – one thousand five hundred and seventy times yesterday in /var/log/auth.log on one of my Amazon EC2 instances.

Someone, presumably from China, has been patiently trying out a variety of SSH keys to log into this system.

These were grouped as batches. There were exactly 314 attempts at 8am yesterday, then 314 at 12noon, then 314 at 4pm, then 314 at 8pm, then 232 at 3am today. (All times are in UTC – that is, UK time without daylight saving). Every burst took 9 minutes to run through all 314 attempts.

The worst part was, when I tried using SSH this morning, I wasn’t able to log in. (It turned out that I had made a configuration error, but this is the sort of thing that gets me quite worried.)

Perhaps I shouldn’t be complaining. I’ve written enough scrapers to make most webmasters cringe at their logs. I remember a few years ago, when I was working on a project at Tesco, and was scraping bestsellers lists from most sites. (Here’s a blog post about it.) We were putting together a prototype to see how real-time competitive pricing could help.

The scraper was a pretty mild one. It would visit a hundred links, roughly at the pace of one a second. No images were loaded, of course, just the HTML.

One fine day, a few weeks after this had started, I got a call from Andy.

“Hi Anand, are you running any scrapers on our books website?”

“Yes, why?”

“Oh! The site’s very slow. Could you shut it down immediately?”

Turns out that not a single page on the site loaded, and it had almost crawled to a halt. Now, obviously, my little 100-page script could hardly cause damage, but it’s easy to understand their reactions. No unauthorised scraping! After a few days of trying to figure out what the problem was, they increased the memory and things went back to normal. Not a bad solution, actually – throw hardware at the problem, and if it vanishes, it’s probably the cheapest solution.

But anyway, I’m sure it’s some nice chap who’s just curious to know what I’ve got on my servers. I’d be happy to share some of it. And even if it’s not so nice a chap, there’s little that I can do, is there?

Update (1pm India, 3rd June): Actually, I now realise that this has been happening ever four hours since May 29th, as regular as a clockwork. Wish I knew enough UNIX programming to pull a prank…

Hosting options

I’ve been trying out a number of options for hosting recently, and have settled on Amazon spot instances.

Here were my options:

  • Application hosting, like Google AppEngine. I used this a lot until 2 years ago. Then they changed their pricing, and I realised what “lock-in” means. I can’t just take that code and move it to another server. Besides, I’m a bit wary of Google pulling the plug. Heroku? Same problem. I just want to take the code elsewhere and run it.
  • Shared hosting, like Hostgator. This blog is run on Hostgator and I’m extremely happy with them. But the trouble is, with shared hosting, I don’t get to run long-running processes on any ports I like.
  • Run you own servers. The problem here is quite simple: power cuts in India.
  • Dedicated hosting, like Amazon EC2, Azure, GCE, etc. This remains as pretty much the main hosting option

I’m a price optimisation freak. So I ran the numbers for a year’s worth of usage. I was looking at the CPU cost of a large machine with 7-8GB RAM. Bandwidth and storage are negligible. The cost per hour worked out to:

  • Amazon: $0.32 / hr in Singapore, $0.24 in Virginia
  • Google: $0.29 / hr in Europe
  • Microsoft: $0.32 / hr in US

The price is not all that different, but I need low latency, so Singapore it what it’ll have to be.

EC2 location Latency (ms)
Singapore 139
Oregon, US 334
Japan 517
Ireland 618
Australia 620
California, US 677
Virginia, US 710

Now comes the choice of the right model. At $0.32 per hour, that’s $230 a month.

Amazon offers some ways of getting this down. Instead of on-demand instances, I could go for reserved instances. For a year of usage, that’d get the price down to about $131 a month, nearly halving it. ($739 upfront for a heavy utilisation large reserved instance, with $0.095 * 24 * 365.25 for the year.)

In this case, I know I’ll need the servers for a year. Probably more, but then, I might want to switch later. So this isn’t a bad move. But we can do better. Amazon also offers spot instances. Spot instances might get shut down any time – but in reality, so can on-demand instances. I need to plan for it anyway. I’m not going to host anything that’s so sensitive that if it’s down for a few hours, I’ll have a problem.

But what’s attractive is the pricing. Typically, it’s $0.04 per hour, making it about $29 per month. Even if it shoots up to twice that, at $58, it’s less than a fourth of the on-demand price and less than half the reserved instance price.

I’ve managed to script the entire setup up sequence as shell scripts, and it takes less than an hour to get a new server up and running the software I need. I need to work out a decent backup mechanism. Plus, I could use more reliable storage like like Amazon’s EBS to preserve the data. But on the whole, the pricing is far too attractive and makes the risks worthwhile.

Visualising networks

Some slides from my talks on visualising networks. (These are part of a series of talks I’m giving at a number of forums; the one at The Fifth Elephant is open to public.)


Geocoding in Excel

It’s easy to convert addresses into latitudes and longitudes into addresses in Excel. Here’s the Github project with a downloadable Excel file.

This is via Visual Basic code for a GoogleGeocode function that geocodes addresses.

Function GoogleGeocode(address As String) As String
    Dim xDoc As New MSXML2.DOMDocument
    xDoc.async = False
    xDoc.Load ("" + _
        "xml?address=" + address + "&sensor=false")
    If xDoc.parseError.ErrorCode <> 0 Then
        GoogleGeocode = xDoc.parseError.reason
        xDoc.setProperty "SelectionLanguage", "XPath"
        lat = xDoc.SelectSingleNode("//lat").Text
        lng = xDoc.SelectSingleNode("//lng").Text
        GoogleGeocode = lat & "," & lng
    End If
End Function

Goodbye Google

Google Reader was where I spent most of my browsing time, but now, it’s shutting down.

Time for alternatives, but not just for Reader: for all Google products. I’m not sure when one of these might go down, become paid, or become unusable.

I just uninstalled Google Drive and Google Talk. but I don’t use it much (I use Skype), so no loss. I’ll leave Chrome for the while, but I’m hearing reports that Firefox is improving faster than Chrome is. Or there’s Chromium.

I’m not worried much about search services (including image, video, scholar and books). When needed, I can switch. Scholar might be a bit sad to lose, but I don’t use it much. Google Translate, too, isn’t essential.

Likewise for content. YouTube’s not a problem. There’re enough other video services. Trends are useful, but not critical. Maps might be, so I’ll try and switch to OpenStreetMap. I don’t use News or Picasa much.

I don’t care much for social media anyway, so Blogger, Orkut and Plus can die any time.

Google’s apps are the worrying ones. Mail and Calendar, in particular. I’ll probably migrate away from them last, but the attempt is on. I’ll be documenting the alternatives I find at (safely cloned locally).

Looks like there’s no safe long-term alternative to being able to host your own apps. Pity.