Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien?

Consider this mini-paper.

ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also
find significant evidence of improbable events at round numbers.

Electricity shortage is a serious problem in most Indian states. Part
of this problem is due to the inaccuracy of reporting procedures used
in monitoring meter readings. Our focus here is not to document or
experimentally determine the degree of inaccuracy. We have adopted a
data driven approach to this problem and attempt to model the extent
of inaccuracy using basic statistical analysis techniques such as
histograms and the comparison of means.

Our dataset comprises 12 months of monthly meter readings for
1.8 million customers in the State of Andhra Pradesh.

We find that a histogram of these readings shows unexpectedly high
values at the slab boundaries: 50 (+45.342%, t > 13.431), 100
(+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300
(+42.138%, t > 19.958).

We also detected spikes at round numbers: 10 (+15.341%, t > 5.315),
20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319).

The statistical significance of every deviation listed above is over
99.9%. Further, every deviation has a positive mantissa. This leads us
to confidently declare the existence of a systematic bias in the meter
readings analysed.

You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!”

Well, not quite. It’s a good piece of analysis. I did it myself and there’s a fair bit of effort and care behind these short paragraphs.

The trouble is, if I read it out to my daughter, she’d say “What?” and not understand a word. My wife’d say “So what?” and not care a bit. I might as well not have written it.

It’s like that Zen thing: If a tree falls in a forest and no one hears it, does it make a sound?

If you did a piece of analysis, and no one understands or cares about it, why did you do it in the first place?

Why do you do it?

That last question is important: why do we analyse?

Sometimes, we do it for fun. The knowledge is beautiful. Knowing Tetris is NP-Complete is rewarding, even though my colleague sarcastically remarked, “Thank God! I’m sooo relieved now that I know that Tetris is NP whatever.” If that’s the case with you, great. Write the analysis any which way you’ll enjoy.

Sometimes, we do it because we’re forced to. In class. At work. Wherever. But that’s another way of saying “I don’t know why I’m doing it.” In that case, I’d gently recommend watching 3 Idiots.

Most often, we do it to share knowledge and drive actions. In that case, if no one understands it, or does anything with it, why do it?

Keep it simple

We prerajulisation of Farhanitate flagellated with ...

Would your audience understand that? Or are you just scared that simple words indicate a simple mind?

I was once afraid. 15 years ago, when writing a paper on IBM India’s competitive advantage for the CXOs, I was worried about it being too simple. I didn’t know anything about management. So I filled it with jargon. They politely nodded when I presented it, but I wasn’t fooling anyone. If there’s no content, jargon doesn’t help.

Unfortunately, it’s become polite to accept jargon as a substitute for substance. Why were they not ripping me apart? Or at least, kindly asking me what on earth I wanted to say?

My friend Manoj did that. In his nice, humble way, he asked, “But Anand, what does this mean?” When I explained it to him, I found I didn’t have a clue. He was OK with that. He just wanted to make sure he hadn’t missed something.

(That’s the technique I use these days. Ask people to explain things clearly. It’s OK if they’re just lost in jargon. I just want to make sure I haven’t missed something.)

Don’t cloak your ignorance. No one will think less of you. In the long run, you’ll learn more, and won’t need the jargon.

Part 2 of the article will talk about focusing on people and actions; storylining and the pyramid principle; and the structure of messages.

Birthday matters

Does it matter which month you’re born in?

Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years (via Reportbee), it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200 – or 10%!

Most students who took the Class XII exams in 2011 were born between March 1991 and June 1992. The average marks (out of 1200) of the students born in each month are shown in the graph below.

[Graph: tn-2011, average marks by month of birth, 2011]

Students born in June 1991 scored the lowest – around 720/1200. This suddenly shoots up in July, then in August, and the students born in September score as much as 840/1200 on average. From there on, it’s downhill.

This result is consistent across years. In 2009 and 2010, you see a similar pattern.

[Graphs: tn-2009 and tn-2010, average marks by month of birth, 2009 and 2010]

Why could this be?

Malcolm Gladwell’s book Outliers offers a clue.

Outliers opens, for example, by examining why a hugely disproportionate number of professional hockey and soccer players are born in January, February and March.

The answer turns out to be completely unrelated to numerology or astrology.

It’s simply that in Canada the eligibility cutoff for age-class hockey is January 1. A boy who turns ten on January 2, then, could be playing alongside someone who doesn’t turn ten until the end of the year—and at that age, in preadolescence, a twelve-month gap in age represents an enormous difference in physical maturity.

In Tamil Nadu, students must be 5 years old before entering Class 1. Schools open mid-June. So students born in June 1994 would barely make it in June 1999 – making them the youngest students in the class. July and August students would miss the cutoff – but since many schools implement this policy leniently, they sometimes make it in as well. September-borns are consistently the eldest students in a class.

This pattern is reflected in the marks. The eldest – the September 1993 borns – score the highest. The next eldest, the October 1993 borns, score a bit less. And so on. (There are older students who take the exam – the ones born before September 1993 – but many of these are failed students from the previous year, introducing a bias in the results.)
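To make the age gap concrete, here is a rough sketch of the cutoff arithmetic. It assumes school opens on exactly 15 June 1999 (I only know it’s “mid-June”) and that children are born on the 1st of the month; both are assumptions for illustration, not facts from the data.

```python
from datetime import date

# Assumed exact opening date; all we know is "mid-June".
SCHOOL_OPENS = date(1999, 6, 15)


def age_in_months(born, on):
    """Whole months elapsed between `born` and `on`."""
    months = (on.year - born.year) * 12 + (on.month - born.month)
    if on.day < born.day:
        months -= 1
    return months


def is_eligible(born, on=SCHOOL_OPENS, min_years=5):
    """True if the child has turned `min_years` by the opening date."""
    return age_in_months(born, on) >= min_years * 12


youngest = date(1994, 6, 1)  # June 1994: barely makes the cutoff
eldest = date(1993, 9, 1)    # September 1993: among the eldest in class

gap = age_in_months(eldest, SCHOOL_OPENS) - age_in_months(youngest, SCHOOL_OPENS)
print(f"Age gap on day one of Class 1: {gap} months")
```

Under these assumptions, the September 1993 child walks into Class 1 nine months older than the June 1994 child. At age five, that’s 15% of a lifetime.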

Perhaps this initial advantage that the elder students have over their classmates continues through the years? Whatever the reason, it’s clear that if your child is born in September, he or she already has a 100 mark advantage!

Surviving in prison

As promised, here are some tips from the trenches on surviving in prison. (For those who don’t follow my blog, prison is where your Internet access is restricted.)

There are two things you need to know better: software and people. I’ll try and cover the software in this post, and the more important topic in the next.

Portable apps

You’re often not in control of your laptops / PCs. You don’t have administrator access. You can’t install software. The solution is to install Portable Apps. Most popular applications have been converted into Portable Apps that you can install on to a USB stick. Just plug them into any machine and use them. I use Firefox and Skype quite extensively this way, but increasingly, I have a preference for Portable Apps for just about everything. It makes my bloated Start Menu a lot more manageable. Some of the other portable apps I have are: Audacity, Camstudio, GIMP, Inkscape and Notepad++.

Admin access

The other possibility is that you try and gain admin access. I did this once at a client site (a large bank). We didn’t have admin access. I wasn’t particularly thrilled. So I borrowed a floppy, installed an offline password recovery tool, rebooted, and got the admin password within a few minutes. This is with the full knowledge of the (somewhat worried) client. This is where the people part comes in, and I’ll talk about that later.

Proxies

But before you do any of these, you need to be able to download the files, most of which are executables. Those are probably blocked. Heck, the sites from which you can download these files are probably blocked in the first place.

Sometimes, internal proxies help. Proxies for different geographies may have different degrees of freedom. When I was at IBM, the Internet was accessible from most US proxies, just not from the Indian proxy. So it may just be a matter of finding the right internal proxy.

Or you can search for external public proxies. Sadly, many of these are blocked. Another option is for you to set up your own proxy. You can install mirrorrr on AppEngine for free, for example.

The most effective option, of course, is to use SSH tunnels. I’ve covered this in some detail earlier.

Google

Google has a wide range of tools that can help access blocked sites. If the site you’re accessing provides public RSS feeds, use Google Reader to access these. Public feeds for Twitter, for example, are available as RSS feeds.

Google’s cache is another way of getting the same information. Search for the URL, click on the “Cache” link to read the text even if it’s blocked.

To find more such help, Google for it!

Peopleware

… but all of this is, honestly, just a small part of it. The key, really, is to understand the people restricting your access. I’ll talk about this next.

You are in prison

(I had intended to write this post sarcastically, a bit like my web freedom survey. But sarcasm’s confusing to read. So I’ll just be straight and mild.)

If you’re a well-paid professional in an Indian IT services firm, your freedom is limited.
(This holds if you’re a student, too.)

The last bit worries me the most. Perhaps because in all the other cases, there are humans I can put to shame or fight, face-to-face. Or because I am a Net addict. Don’t know why.

Anyway, here’s the result of my survey (after de-duplicating and eliminating results where the company or geography was not clear).

[Chart: web-freedom-survey results]

Some day, I will follow this up with a post on “Surviving in Prison”, detailing my experiences with the system, and beating it.

The Calvin and Hobbes search Takedown

Eight years ago, I started typing out each of the Calvin and Hobbes strips by hand. Four years ago, I set up a site that let people search for strips. Early this month, I was asked to take it down.

This is the story.


I can’t quite remember when I started reading Calvin & Hobbes. The earliest reference I can find in my blogs is in July 1999. I remember it didn’t take me long to become a fan. I’d read every strip in the newspaper; hunt them out at bookshops; and spend a fair bit of time searching for archives online.

At some point, I discovered a few archives of the complete Calvin & Hobbes images. These aren’t hard to find, and they’re still around in plenty. So that gave me a few more months of delight.

The trouble, though, was that I never could quite find a strip when I wanted to. A friend would refuse to accept something, and I’d want to pull out that strip where Calvin declares that he resides in the state of “Denial”. Or if they said something fancy, I’d want to pull out the one where Hobbes says “I notice your oeuvre is monochromatic”. Or those strips where Calvin’s Dad explains how things work (“They build bigger and bigger trucks over the bridge until it breaks.”)

There were a few Calvin and Hobbes search engines around. None quite did what I wanted them to – which was to search the text, and show me the strip, with a nice scrollable interface.

So I set out to build one. I can’t remember when, exactly, but it was before Sep 11, 2002.

It took me many years. I’d spend several train rides and evenings typing this stuff out. My friends, employers and family were a bit puzzled, but just added it to my list of eccentricities and carried on. I was halfway there in 2005, pushed further in 2006, and with some help, I managed to finally complete it.

I was able to do a lot of cool stuff with this, like statistically improbable phrases and some amusing posts as well.

It also increased traffic to my site, which was a bit disconcerting. I didn’t want to attract attention. In 2007, I removed the page from Google’s indexes, which cut the number of hits a fair bit. Since then, the site was only visited by a few people that knew of it, and the occasional stumblers.

A month ago, I got reddit-ed and MetaFiltered.

It didn’t take me long to figure that a takedown notice would be on its way. It turned out to be quite a friendly mail, actually – scary only in parts. (A bit of a carrot-and-stick approach, perhaps.) Anyway, it took me all of 2 minutes to remove all of the pages and links.

Of course, the reason I went to all of this effort was because the original Calvin & Hobbes site does not have the search feature. I’ve reached out to United Media, offering my transcripts and code. Let’s see what happens.

A sense of proportion

A quote from David Heinemeier Hansson:

So the problem is, a lot of business managers and especially business owners, they have no sense of probability. They can’t fathom that concept. So they treat the probability of 1 to 10 trillion as the same as a 1 to a 100. And like, “We’ve got to deal with this 1 to a trillion probability, because, what if it happens?”

No! Doesn’t matter! I mean, don’t care.

So as soon as that sense of probability spreads, that people can treat that reasonably, I think all this nonsense just goes away.

This lack of proportion, sadly, is at the heart of my everyday problems. (Just watch the video!)

Web freedom survey

There was a time when workers were searched when they left, to make sure they weren’t stealing. They were paid by their hour, and had to clock in/clock out. They had supervisors to ensure that they didn’t slack off. They weren’t allowed to make calls at work. After all, people were lazy and thieving in those days.

Nowadays, we’re enlightened. We respect and trust our employees. Like a family. We don’t micromanage their activities. We don’t tap their phone calls.

We don’t restrict or monitor their web usage.

Now, your company is enlightened, of course. Surely you can access these sites I believe essential for work? (If you work out of different offices, you should fill one for each office.)

So, please tell me: which sites can YOU access?


Recruiting smart people

Recently, I have ended up giving bits of advice to people recruiting at start-ups, and a few patterns have emerged that are worth sharing.

Before I go ahead, I should warn you that I have no qualifications whatsoever. (All consulting advice should come with this caveat, perhaps!) You might be better off reading Joel Spolsky’s Smart and Gets Things Done (read). I haven’t read it myself, but from what little I see of it, the thoughts seem similar.

The key is to realise that smart people are probably 10 times as productive. OK, that may be wrong. It probably originated with Fred Brooks, and has been debated to death. But it seems fairly well accepted that the best people contribute far more than their extra pay would suggest. (The best guy is probably paid twice the average, but is worth more than twice the average guy.)

This isn’t because they do more work. It’s because they solve harder problems. You can get two people to do two people’s work. You can’t solve a problem twice as hard even with twenty people.

For a startup, the problem is acute. You don’t have the luxury of being able to manage a large number of people.

Since smart people typically work for a lot less than they’re probably worth, it’s a bargain to hire smart people. You pay them twice as much, and they’ll solve problems twenty others couldn’t solve.

The problem boils down to finding smart people and getting them on board.

Finding smart people

You need to go after the smart people. They won’t come to you. Many reasons. You’re not big enough. There aren’t that many of them. They’re not in the market that much (no one lets go of them anyway).

So that just demolishes the traditional recruitment model straight away. You don’t advertise for people and filter their resumes. You find the people you want and go after them.

The good thing is, smart people cluster. They tend to know other smart people, meet up with other smart people, read the same things as other smart people, etc. That gives some useful starting points.

Matt Biddulph talks about Algorithmic recruitment with Github. The premise is that smart programmers are at the centre of the social networks in their respective areas. Just go after them. I advised a friend similarly: to look for the network (or at least the smart people) that hang out on Stack Overflow for a given topic. Last year, when I was looking for a Django developer, I scoured the Infosys internal blogs for similar networks. (Found only a few, but it sure introduced me to a lot of really smart people that I didn’t know existed!)
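Here’s a minimal sketch of the “centre of the network” idea. It simply ranks people by how many others in a topic follow them. This is not Matt’s actual method and makes no calls to Github; the names and the follower list are made up for illustration.

```python
from collections import Counter

# Hypothetical (follower, followed) pairs within one topic's community.
follows = [
    ("asha", "ravi"), ("bala", "ravi"), ("chitra", "ravi"),
    ("ravi", "chitra"), ("asha", "chitra"), ("bala", "asha"),
]


def rank_by_followers(edges):
    """Rank people by in-degree: the number of distinct people following them."""
    in_degree = Counter(followed for _, followed in set(edges))
    return in_degree.most_common()


ranking = rank_by_followers(follows)
for name, followers in ranking:
    print(name, followers)
```

Follower count is the crudest measure of centrality there is. Something like PageRank would weight a follow from a well-followed person more heavily, but even this version gives you a shortlist of people to go and meet.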

Conferences are another place to look for them. I tend to periodically check out Upcoming and Meetup to see who’s taking part in what, go over, meet them, and see what they do. I find it a great way of figuring out who’re the experts in a field. (I once met one of the guys who wrote TiddlyWiki, and it was immediately obvious that he was in a different league from the others that day at the Javascript Meetup.)

You can go a step further. Since smart people cluster, they form networks, and control of that network is power. So why not organise those conferences? A lot of these smart people just need a place to hang out and learn from each other. I know the Javascript Meetup was struggling to find a place to meet. Pubs don’t give you the quiet atmosphere needed to learn from each other, and it’s certainly impossible to have a talk there. The folks at Hackspace have done this really well, renting a place and equipment for people to tinker with electronics.

That’s what smart people want, mostly: a nice quiet place, good company, and perhaps pizza. Skills Matter does this beautifully. They organise free workshops every now and then. The list of people that attend these is invaluable.

Getting them on board

Once you’ve spotted a smart person, what do you offer them?

Remember – they’re probably 10 times as productive. Money is quite likely to be worth offering. If that works, great. But if you’re a startup, you probably don’t have the money. You probably could offer a stake in the firm. That might work too.

But, to quote Dan Pink: “One of the most robust findings of social science is that incentives dull the mind and hamper creativity. Yet, businesses ignore it.” Some people aren’t motivated by money. You might get better results if you didn’t pay money than if you did. (Read this story on motivation by Peter Bregman.)

Suppose you said, “I have this problem… I’ve no idea how to solve it. Would you be able to help me?” Most smart people would probably help you. For free. The feel good feeling is worth more than the transaction cost of extracting payment from you.

Or you might be championing a worthy cause – anything from world hunger, rural poverty or a cure for cancer down to organising a scout camp. The thing about such causes is that they are intrinsically attractive. You probably just need to open up and say “This is what I’m doing, can you help?”

The flip side of it is loss of control. Jonty told me about how Hackspace London was run: “it’s as loosely organised as possible without falling apart”. You don’t manage these people like traditional organisations. You manage them like a community of volunteers. Like parents at a school day function. Like family at a wedding. You don’t pay them. You don’t order them around either.

Part of that is the flexibility of being a startup. You can afford that loss of control. Yes, you don’t have the money. No, not everyone’s working for money. (The planet as a whole is fairly well off. Smart people particularly so.) But you might offer something interesting. Just as long as you’re willing to let go of some control in your mind…

Open source in corporates

Last month, my first application went live.

I’ve been writing code for 20 years. Not one line of my code has been officially deployed in a corporate. (Loser…)

It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful.

But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen.

I’ve been living in a nightmare since March 2009. That was when I decided that I’d try and get corporates to use open source.

March 2009
It began with a pitch to a VC firm. They were looking to build a content management system (CMS). Normally we’d pull together slides that say we’ll deliver the moon. This time, we put together a demo based on WordPress’ CMS plugins.

The meeting went fabulously well. We said, “Here’s a demo we’ve built for you. Do you like it?” The business lead (Stuart) was drooling and declared that that’s exactly what they wanted. The IT lead (another Stuart) was happy too, but warned the business users: “Just remember: this isn’t how we do development, so don’t get your hopes up that we can deliver stuff like this :-)”

Time to make my point. I asked, “What’s your policy on open source software?”

The business lead went quiet. “I don’t know,” he finally said. Fair enough.

I turned to the IT lead. “Well, we don’t use it as a matter of policy… there are security concerns…” he said.

“Which web server do you use?”

“Oh, OK. I see what you mean. We use Apache. So on a case to case basis, we have exceptions. But generally we have security concerns.”

“Why? Do you believe open source software is more insecure than commercial software?”

He thought about it for a while. “Well… maybe. I don’t know.” We debated this a bit. Then we found the real issue: “It’s just that we don’t have control over the process. We don’t know enough about it to decide.”

A couple of weeks later, I tried pitching to a newspaper company. This time, it was our sales team that raised the same question. “But… isn’t open source insecure?”

I didn’t even bother pitching any open source stuff to them. But I’d learnt my lessons:

1. Demo the application. Don’t talk about it.
2. Show it to the business first, and then tackle IT.

Aside: June 2009

In June, I got another chance. I was building the website for a large retailer. The very first thing I did was ask to see the Javascript. Total mess, and filled with browser-incompatible DOM requests. So I went over to their web development team.

“Look, why don’t you guys use a Javascript library? It’ll get you cross browser compatibility and compact maintainable code at the same time.”

And, to their credit, they said, “Sure. Which library?”

I showed them this comparison of jQuery (blue), dojo, scriptaculous and mootools…

… and we agreed on jQuery. So, if nothing else, I’ve managed to get one open source library into a corporate.

July 2009

I was also looking at payments, and the retailer was looking to replace their chargeback application. Since I had a week off, I built a working PCI compliant prototype on Django. This time, I applied the lessons I’d learned, and demo-ed it to the business, who were thrilled. Time to tackle IT.

I started with the architecture team. Matt on the architecture team was the most approachable. So I went over, demo-ed it, and said, “Matt, this took a week to put together. It’s based on some new technologies. Are you game to try these out?”

He was. And quite enthused about it too. So we put together a proposal for the architecture review board, proposing a new technology stack: Django / Python and MySQL. As before, I showed the demo before I talked technology. I had prepared answers to all security related questions upfront (and practically memorised section 3 of the PCI guidelines.) The clincher, though, was the business case. To build it on Java, it would cost ~1,000 person days. On Django, I’d mostly done it in 5. There was no way of justifying 1,000 person days for an application that could save, at best £100,000 a year.
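The business case is simple enough to sketch. The only figures from the story are the person-days and the £100,000 annual saving; the one-year payback period is my assumption for illustration, and the function name is made up.

```python
def breakeven_day_rate(annual_saving, person_days, payback_years=1):
    """Highest cost per person-day at which the build still pays for itself
    within `payback_years` (an assumed payback period, not from the story)."""
    return annual_saving * payback_years / person_days


java_rate = breakeven_day_rate(100_000, 1_000)  # ~1,000 person-days on Java
django_rate = breakeven_day_rate(100_000, 5)    # ~5 person-days on Django

print(f"Java breaks even only below £{java_rate:,.0f} per person-day")
print(f"Django breaks even below £{django_rate:,.0f} per person-day")
```

On these assumptions, the Java build pays back within a year only if a person-day costs under £100, even in the best case for savings. The Django build has room to spare.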

So they said “Go ahead, we’re fine if operations and infrastructure are fine.”

It was time to find a Django developer in Infosys. I hunted for a couple of weeks but none was available. (Only 2 people knew Django in the first place.) So that effort got canned, and we were back to the 1,000 person day solution. (Which got canned too, later.)

But in the process, I’d learned my third lesson.

3. If you’re trying new technologies, plan on delivering it yourself.

October 2009

Another application popped up that looked like a prime candidate for introducing open source. They were using an Excel application to fraud screen orders, and wanted to make a web app out of it.

I followed the same route as before. Demo it. Show it to business first, then IT. Built it myself. I skipped Architecture, since they’d already approved the technology stack, and took it straight to Infrastructure.

“This application uses Apache as the web server, MySQL as the database, and uses PHP and Javascript for the application logic. Could we get a Linux server to host it?”

Our entire conversation lasted 30 seconds. He said, “No. We use Windows servers” (I was fine)

“… and you’ll need to change Apache to IIS” (fine again)

“… and we don’t support PHP, so it’ll have to be Java or .NET” (I don’t know .NET or Java… but fine)

“… and we don’t support MySQL, it’ll have to be SQL Server” (fine, I guess)

“… and we don’t have DBAs available until January, so you’ll have to wait.” (definitely not good.)

So back to the drawing board on the technology stack. I needed something in Java (I know very little Java, but nothing at all in .NET) and to avoid the DBA headache, it would have to bundle in a database. I first explored key-value stores like CouchDB, Redis, etc. None of them worked on Java. The only one I found that did was Persevere, and it was a JSON data store, which fit perfectly with my plans.

By this time, I’d also learnt my fourth and most important lesson.

4. Don’t try to promote open source. Just deliver the application.

I said, “This is a custom-built application that runs on Java. Could we get a Windows server to host it?”

The answer was “Yes”, and we had it the next day.

PS: December 2009

The application’s deployed and running. It has about 10,000 orders fraud screened by now.

And the lessons are well learnt. So when someone came over asking if there was any image resizing solution I knew of, I said: “Sure, who’s your business sponsor?” Then I went over and said, “Let me show you this open source application called ImageMagick. It handles aspect ratios correctly, and can crop too. Doesn’t this look professional?” Then I went over to IT and said, “It’s open source, so you can change it. It has Java bindings, so you can integrate it into your environment. It can handle 8 3000×2400 images a second on my puny laptop. It’s used by your competitors. And I can build it for you if you like.”

I might just have my second open source entry into a corporate this year.

Organisational amnesia

It’s amazing how much of a dependency there is on individuals writing IT systems. Reminds me of that Dilbert strip:

[Dilbert strip: 19940610]

A few weeks ago, I was trying to figure out what happens when there are multiple promotions. (Our client is a retailer.) I mean, if there’s a phone that costs £100 and there are 2 promotions: 10% off on phones and £10 off on phones. Do you apply the 10% off first and pay £80 or the £10 off and pay £81?
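The order-dependence is easy to check. This is only a sketch of the arithmetic, not the retailer’s actual logic (which, as you’ll see, nobody could tell me); the function names are made up for illustration.

```python
from decimal import Decimal


def percent_off(percent):
    """Promotion that takes `percent` percent off the current price."""
    factor = (Decimal(100) - Decimal(percent)) / Decimal(100)
    return lambda price: price * factor


def amount_off(amount):
    """Promotion that takes a fixed amount off, never going below zero."""
    return lambda price: max(price - Decimal(amount), Decimal(0))


def apply_in_order(price, promotions):
    """Apply each promotion to the running price, in the order given."""
    price = Decimal(price)
    for promotion in promotions:
        price = promotion(price)
    return price


ten_percent = percent_off(10)
ten_pounds = amount_off(10)

percent_first = apply_in_order(100, [ten_percent, ten_pounds])
amount_first = apply_in_order(100, [ten_pounds, ten_percent])
print("10% off first:", percent_first)
print("£10 off first:", amount_first)
```

Percentage first gives £80 and fixed amount first gives £81, so the two promotions don’t commute. Someone has to decide the order, and someone has to remember what was decided.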

Funnily enough, the organisational answer is, “I don’t know.” The person who determined the logic is no longer with the firm. The person who wrote the code was a contractor and moved on to another project. The vendor hadn’t gotten around to documenting the code. Sure, the code’s there, and you just had to read it to figure out what it does. But no human knew what it was supposed to do.

Last week, there was a decision to rewrite some code that was 10 years old. A colleague who wasn’t quite involved in this work said, “I’m going to have to set aside 2-3 weeks for this. I wrote this stuff when I was a developer. The docs have vanished. The business owners have vanished. I’m the only one who has any clue on what it’s supposed to do.”

This week, we were trying to figure out how their store locator system works. After fiddling around with Fiddler, and seeing that it used Microsoft Virtual Earth, I was able to figure out that it identified stores near a location using a simple JSON API. But can we get the documentation around that? Nope. Tough luck. Nobody knows how it works any longer.

Personally, I don’t think this is unusual. We forget. Companies forget. But it’s usually good if what we forget is derivable. That’s how I got through my high-school physics exams: not by remembering stuff, but by being able to derive the stuff from a few principles.

Organisations can do the same. But to be able to do that, you need to have commonly understood principles. As Fred Brooks put it in The Mythical Man Month,

I contend that conceptual integrity is the most important consideration in system design. It is better to have a system omit certain anomalous features and improvements, but to reflect one set of design ideas, than to have one that contains many good but independent and uncoordinated ideas.

One of the biggest enemies of conceptual integrity is growth. Too many people too soon, and the important decisions are taken by people who’ve never had a long chat about things. There’s another reason not to grow too fast.