Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien? Consider this mini-paper. ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also find significant evidence of improbable events at round numbers. Electricity shortage is a serious problem in most Indian states. Part of this problem is due to the inaccuracy of reporting procedures used in monitoring meter readings. Our focus here is not to document or experimentally determine the degree of inaccuracy. We have adopted a data driven approach to this problem and attempt to model the extent of inaccuracy using basic statistical analysis techniques such as histograms and the comparison of means. Our dataset comprises of the frequency analysis 12-month dataset containing monthly meter readings of 1.8 million customers in the State of Andhra Pradesh. We find that a histogram of these readings shows unexpectedly high values at the slab boundaries: 50 (+45.342%, t > 13.431), 100 (+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300 (+42.138%, t > 19.958). We also detected spikes at round numbers: 10 (+15.341%, t > 5.315), 20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319). The statistical significance of every deviation listed above is over 99.9%. Further, every deviation has a positive mantissa. This leads us to confidently declare the existence of a systematic bias in the meter readings analysed. You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!” ...

Birthday matters

Does it matter which month you’re born in? Based on the results of the 20 lakh students taking the Class XII exams in Tamil Nadu over the last 3 years (via Reportbee), it appears that the month you were born in can make a difference of as much as 120 marks out of 1,200 – or 10%! Most students who took the Class XII exams in 2011 were born between March 1991 and June 1992. The average marks of each student (out of 1,200) are shown in the graph below. ...

Surviving in prison

As promised, here are some tips from the trenches on surviving in prison. (For those who don’t follow my blog, prison is where your Internet access is restricted.) There are two things you need to know better: software and people. I’ll try and cover the software in this post, and the more important topic in the next. Portable apps You’re often not in control of your laptops / PCs. You don’t have administrator access. You can’t install software. The solution is to install Portable Apps. Most popular applications have been converted into Portable Apps that you can install on to a USB stick. Just plug them into any machine and use them. I use Firefox and Skype quite extensively this way, but increasingly, I have a preference for Portable Apps for just about everything. It makes my bloated Start Menu a lot more manageable. Some of the other portable apps I have are: Audacity, Camstudio, GIMP, Inkscape and Notepad++. ...

You are in prison

(I had intended to write this post sarcastically, a bit like my web freedom survey. But sarcasm’s confusing to read. So I’ll just be straight and mild.) If you’re a well-paid professional in an Indian IT services firm, your freedom is limited. (This holds if you’re a student, too.) You clock-in and clock-out. You’re searched on your way in and out. You need your boss’s permission to leave. You work on what you’ve been told to work on. You cannot be trusted to freely access the Internet. The last bit worries me the most. Perhaps because in all the other cases, there are humans I can put to shame or fight, face-to-face. Or because I am a Net addict. Don’t know why. ...

The Calvin and Hobbes search Takedown

Eight years ago, I started typing out each of the Calvin and Hobbes strips by hand. Four years ago, I set up a site that let people search for strips. Early this month, I was asked to take it down. This is the story. I can’t quite remember when I started reading Calvin & Hobbes. The earliest reference I can find in my blogs is in July 1999. I remember it didn’t take me long to become a fan. I’d read every strip on the newspaper; hunt them out at bookshops; and spend a fair bit of time searching for archives online. ...

A sense of proportion

A quote from David Heinemeier Hansson: So the problem is, a lot of business managers and especially business owners, they have no sense of probability. They can’t fathom that concept. So they treat the probability of 1 to 10 trillion as the same as a 1 to a 100. And like, “We’ve got to deal with this 1 to a trillion probability, because, what if it happens?” No! Doesn’t matter! I mean, don’t care. ...

Web freedom survey

There was a time when workers were searched when they left, to make sure they weren’t stealing. They were paid by their hour, and had to clock in/clock out. They had supervisors to ensure that they didn’t slack off. They weren’t allowed to make calls at work. After all, people were lazy and thieving in those days. Nowadays, we’re enlightened. We respect and trust our employees. Like a family. We don’t micromanage their activities. We don’t tap their phone calls. ...

Recruiting smart people

Recently, I have ended up giving bits of advice to people recruiting at start-ups, and a few patterns have emerged that are worth sharing. Before I go ahead, I should warn you that I have no qualifications whatsoever. (All consulting advice should come with this caveat, perhaps!) You might be better off reading Joel Spolsky’s Smart and Get Things Done (read). I haven’t read it myself, but from what little I see of it, the thoughts seem similar. ...

Open source in corporates

Last month, my first application went live. I’ve been writing code for 20 years. Not one line of my code has been officially deployed in a corporate. (Loser…) It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful. But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen. I’ve been living in a nightmare since March 2009. That was when I decided that I’d try and get corporates to use open source. March 2009 It began with a pitch to a VC firm. They were looking to build a content management system (CMS). Normally we’d pull together slides that say we’ll deliver the moon. This time, we put together a demo based on WordPress’ CMS plugins. The meeting went fabulously well. We said, “Here’s a demo we’ve built for you. Do you like it?” The business lead (Stuart) was drooling and declared that that’s exactly what they wanted. The IT lead (another Stuart) was happy too, but warned the business users: “Just remember: this isn’t how we do development, so don’t get your hopes up that we can deliver stuff like this :-)” Time to make my point. I asked, “What’s your policy on open source software?” The business lead went quiet. “I don’t know,” he finally said. Fair enough. I turned to the IT lead. “Well, we don’t use it as a matter of policy… there are security concerns…” he said. “Which web server do you use?” “Oh, OK. I see what you mean. We use Apache. So on a case to case basis, we have exceptions. But generally we have security concerns.” “Why? Do you believe open source software is more insecure than commercial software?” He thought about it for a while. “Well… maybe. I don’t know.” We debated this a bit. Then we found the real issue: “It’s just that we don’t have control over the process. We don’t know enough about it to decide.” A couple of weeks later, I tried pitching to a newspaper company.
This time, it was our sales team that raised the same question. “But… isn’t open source insecure?” I didn’t even bother pitching any open source stuff to them. But I’d learnt my lessons: ...

Organisational amnesia

It’s amazing how much of a dependency there is on individuals writing IT systems. Reminds me of that Dilbert strip: A few weeks ago, I was trying to figure out what happens when there are multiple promotions. (Our client is a retailer.) I mean, if there’s a phone that costs £100 and there are 2 promotions: 10% off on phones and £10 off on phones. Do you apply the 10% off first and pay £80 or the £10 off first and pay £81? ...
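The arithmetic behind the two answers can be checked with a small sketch. The function names are hypothetical, and prices are held in pence (an assumption made here to avoid floating-point rounding); this is not the retailer's actual system.

```python
def percent_off(price_pence, percent):
    # Reduce the price by a percentage, e.g. 10 means 10% off.
    return price_pence * (100 - percent) // 100


def amount_off(price_pence, discount_pence):
    # Reduce the price by a fixed amount, never going below zero.
    return max(price_pence - discount_pence, 0)


PHONE = 100 * 100   # the £100 phone, in pence
TEN_POUNDS = 10 * 100

# 10% off first, then £10 off: 100 -> 90 -> 80
percent_first = amount_off(percent_off(PHONE, 10), TEN_POUNDS)

# £10 off first, then 10% off: 100 -> 90 -> 81
amount_first = percent_off(amount_off(PHONE, TEN_POUNDS), 10)

print(percent_first / 100, amount_first / 100)
```

The two orders differ by £1 because the 10% is taken off a smaller base when the £10 discount goes first.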

The courage to be honest

Some months ago, I was working with a client who wanted to set up a website with social commerce elements. (That’s Web 2.0 in fancy words.) They only seemed to have a very rough idea of what they wanted, so I asked them right at the start of the meeting: “Why do you want social commerce?” Their answer was interesting, and one that I had not expected. They said, “We want to project the image of an honest and open organisation.” ...

Resolving the Prisoner’s Dilemma

If you’ve ever taken a course in Economics, and it discussed Game Theory, you may be familiar with The Prisoner’s Dilemma. Roughly, this is the problem. Assume you possess copious quantities of some item (money, for example), and wish to obtain some amount of another item (perhaps stamps, groceries, diamonds). You arrange a mutually agreeable trade with the only dealer of that item known to you. You are both satisfied with the amounts you will be giving and getting. For some reason, though, your trade must take place in secret. Each of you agrees to leave a bag at a designated place in the forest, and to pick up the other’s bag at the other’s designated place. Suppose it is clear to both of you that the two of you will never meet or have further dealings with each other again. ...

Less is more

The hours in consulting are pretty long. 65 hours a week used to be my norm, and that’s ignoring the travel time to and from work. So there wasn’t too much life outside of work. (I’ve come to realise, though, that what you do outside of work doesn’t change that much with more free time. What does change is that you just enjoy it more – both in and out of work.) ...

Return on effort

If you have a bunch of projects you could do, and want to decide which ones to take up, I was taught a rule: if a project has positive net present value, do it. That is, find out how much money you have to put in (& when), and how much you’ll get out (& when). Adjust for money today being worth more than money tomorrow. If it makes a profit, just do it. ...
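The rule as stated can be sketched in a few lines. The discount rate and the cash flows below are made-up illustrative numbers, and the function names are hypothetical; the sketch assumes one cash flow per year, with the first one happening today.

```python
def net_present_value(rate, cash_flows):
    # cash_flows[0] happens today, cash_flows[1] a year from now, and so on.
    # Money put in is negative, money received is positive.
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def worth_doing(rate, cash_flows):
    # The rule from the post: take the project if its NPV is positive.
    return net_present_value(rate, cash_flows) > 0


# Illustrative project: put in 100 today, get 60 back in each of the next 2 years.
flows = [-100, 60, 60]
print(round(net_present_value(0.10, flows), 2), worth_doing(0.10, flows))
```

At a 10% discount rate this project has a small positive NPV, so the rule says to do it; at a higher rate the same cash flows can turn negative.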

Filtering vs weighting

I am selecting a CRM package for a bank. I asked my colleagues how they’d gone about it, and got 8 responses. Every single one of them had the same weighting approach: Take a huge list of criteria, assign weights, score each package, calculate a weighted-average score, pick the highest one. As I mentioned earlier, I think weighting is a lousy method. (See Errors in multicriteria decision making.) You can’t say “I picked this package because it has X, Y and Z features, which the others don’t.” You can only say, “Oh, overall, it has the highest score…” ...
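The weighting approach my colleagues described looks roughly like the sketch below. The package names, criteria, weights and scores are all invented for illustration, and the function names are hypothetical.

```python
def weighted_score(scores, weights):
    # Weighted average of a package's scores across all criteria.
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def pick_by_weighting(packages, weights):
    # Return the name of the package with the highest weighted-average score.
    return max(packages, key=lambda name: weighted_score(packages[name], weights))


# Made-up criteria weights and scores (0 to 10) for two made-up packages.
weights = {"features": 5, "cost": 3, "support": 2}
packages = {
    "Package A": {"features": 8, "cost": 4, "support": 6},
    "Package B": {"features": 6, "cost": 9, "support": 5},
}

print(pick_by_weighting(packages, weights))
```

Here Package B wins with 6.7 against 6.4, even though Package A scores higher on the criterion with the largest weight. The single number says nothing about why.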

Errors in multicriteria decision making

I talked about my approach for multicriteria decision-making, and mentioned that it was fundamentally flawed. Here’s why. The charts above compared two industries. The bigger the area, the more favourable the industry. The underlying assumptions being: The criteria are comparable. (Points at the same level are of comparable importance. Twice as large is twice as important.) All (and only) relevant criteria have been included. In this particular example, I know for a fact that both these assumptions are invalid. And in every case I used this methodology, the assumptions fail. ...

Multicriteria decision making

Decisions are usually based on multiple criteria. You have to trade off between criteria. I’ve been involved in many such decisions over the last 5 years. Example 1: A conglomerate wanted to identify industries for growth. We shortlisted 19 industries, identified 12 criteria for the attractiveness of an industry, researched each one and plotted them on spidergraphs like below. The intention was that, to identify the most favourable industries, you’d just pick the ones with the largest filled area. ...

Normalising non-normal distributions is bad

I was working with the treasury of a bank. They were trying to estimate how much money could flow out of their savings account in a day, worst case. I took their total savings account balance at the end of each day and found the standard deviation. I took thrice the standard deviation, and said, “You can be 99.7% sure that your daily loss won’t be more than 1.5% of the balance.” ...
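The calculation can be sketched as below. Two things are assumptions here: that the standard deviation is taken over day-to-day percentage changes in the balance, and the balances themselves, which are made up. The 99.7% figure only holds if those changes are normally distributed, which is exactly what the title of this post calls into question.

```python
import statistics


def three_sigma_move(balances):
    # Day-to-day change in the end-of-day balance, as a fraction of the previous day.
    changes = [(today - prev) / prev for prev, today in zip(balances, balances[1:])]
    # Three standard deviations: the "99.7% sure" bound under a normal assumption.
    return 3 * statistics.pstdev(changes)


# Made-up end-of-day balances, in any currency unit.
balances = [100.0, 101.0, 100.0, 99.0, 100.0]
print(f"{three_sigma_move(balances):.1%}")
```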

Normalising non-random samples is bad

I rate movies on a scale of 1 (bad) to 5 (good). This is an absolute scale. Initially, I assumed that I would watch as many good movies as bad ones. So I'd have about as many 1s as 5s, and 2s as 4s. But, when I looked at my ratings for movies over the last year, I had far more 4s than 2s. My movie ratings were not normal. ...

Not all distributions are normal

14 years ago, I was introduced to the process of normalising grades. Professors “fit” students’ marks into a normal distribution and assign grades based on that. (I still don’t know how they do it). Since then, I’ve encountered normalising a lot. My performance at work is normalised. I normalise my song ratings and movie ratings. I’ve normalised all kinds of things at work: lead-time of delivery of fans, movements in savings account balances, calls to a call centre, demand for a resource… you name it. ...