Year: 2008

JPath – XPath for Javascript

XPath is a neat way of navigating deep XML structures. It’s like using a directory structure. /table//td gets all the TDs somewhere below TABLE.

Usually, you don’t need this sort of a thing for data structures, particularly in JavaScript. Something like table.td would already work. But sometimes, it does help to have something like XPath even for data structures, so I built a simple XPath-like processor for Javascript called JPath.

Here are some examples of how it would work:

jpath(context, "para") returns context.para
jpath(context, "*") returns all values of context (for both arrays and objects)
jpath(context, "para[0]") returns context.para[0]
jpath(context, "para[last()]") returns context.para[context.para.length - 1]
jpath(context, "*/para") returns context[all children].para
jpath(context, "/doc/chapter[5]/section[2]") returns context.doc.chapter[5].section[2]
jpath(context, "chapter//para") returns all para elements inside context.chapter
jpath(context, "//para") returns all para elements inside context
jpath(context, "//olist/item") returns all olist.item elements inside context
jpath(context, ".") returns the context
jpath(context, ".//para") same as //para
jpath(context, "//para/..") returns the parent of all para elements inside context

Some caveats:

  • This is an implementation of the abbreviated syntax of XPath. You can’t use axis::nodetest
  • No functions are supported other than last()
  • Only node name tests are allowed, no nodetype tests. So you can’t do text() and node()
  • Indices are zero-based, not 1-based
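To make the examples and caveats above concrete, here is a minimal sketch of how such a processor could work. It is not the JPath source: jpathSketch and its helpers are hypothetical names, it always returns an array of matches, it assumes JSON-like data without cycles, and it leaves out the ".." parent step and any de-duplication of results.

```javascript
// Hypothetical sketch of an XPath-like lookup over plain objects and arrays.
// This is NOT the original JPath implementation.

// All direct child values of a node (works for arrays and objects).
function children(node) {
  if (node === null || typeof node !== "object") { return []; }
  return Object.keys(node).map(function (key) { return node[key]; });
}

// The node itself plus everything nested below it, depth-first.
function descendantsAndSelf(node) {
  var found = [node];
  children(node).forEach(function (child) {
    found = found.concat(descendantsAndSelf(child));
  });
  return found;
}

// Apply one step such as "para", "*", "para[0]" or "para[last()]" to one node.
function applyStep(node, step) {
  var match = /^([^\[]+)(?:\[(\d+|last\(\))\])?$/.exec(step);
  if (!match) { return []; }
  var name = match[1], index = match[2], values;
  if (name === ".") {
    values = [node];
  } else if (name === "*") {
    values = children(node);
  } else if (node !== null && typeof node === "object" &&
             Object.prototype.hasOwnProperty.call(node, name)) {
    values = [node[name]];
  } else {
    values = [];
  }
  if (index === undefined) { return values; }
  // An index picks one element (zero-based) out of each matched array.
  var picked = [];
  values.forEach(function (value) {
    if (!Array.isArray(value)) { return; }
    var i = index === "last()" ? value.length - 1 : Number(index);
    if (i >= 0 && i < value.length) { picked.push(value[i]); }
  });
  return picked;
}

function jpathSketch(context, path) {
  var nodes = [context];
  var descend = false;
  // Splitting on "/" leaves an empty part wherever "//" appeared.
  path.split("/").forEach(function (part, i) {
    if (part === "") {
      // A single leading "/" just means "start at the context".
      if (i > 0) { descend = true; }
      return;
    }
    // After "//", the step applies to every descendant, not just the node.
    var starts = descend
      ? nodes.reduce(function (all, n) { return all.concat(descendantsAndSelf(n)); }, [])
      : nodes;
    descend = false;
    nodes = starts.reduce(function (all, n) { return all.concat(applyStep(n, part)); }, []);
  });
  return nodes;
}

var doc = { feed: { title: "Digg", entries: [{ title: "A" }, { title: "B" }] } };
console.log(jpathSketch(doc, "//title"));               // the values "Digg", "A", "B"
console.log(jpathSketch(doc, "/feed/entries[last()]")); // the entry titled "B"
console.log(jpathSketch([[1, 2], [3, 4]], "*/0"));      // the values 1 and 3
```

Treating "//" as "expand to all descendants, then apply the next step" keeps each step a simple lookup, at the cost of walking the whole structure once per "//".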

There are a couple of reasons why this sort of thing is useful.

  • Extracting attributes deep down. Suppose you had an array of arrays, and you wanted the first element of each array.
    Column Selection
    You could do this the long way:
    for (var list=[], i=0; i < data.length; i++) {
        list.push(data[i][0]);
    }
    

    ... or the short way:

    $.map(data, function(v) {
        return v[0];
    });

    But the best would be something like:

    jpath(data, "*/0")
    
  • Ragged data structures. Take for example the results from Google's AJAX feed API.
    {"responseData": {
     "feed": {
      "title": "Digg",
      "link": "http://digg.com/",
      "author": "",
      "description": "Digg",
      "type": "rss20",
      "entries": [
       {
        "title": "The Pirate Bay Moves Servers to Egypt Due to Copyright Laws",
        "link": "http://digg.com/tech_news/The_Pirate_Bay_Moves_Servers_to_Egypt_Due_to_Copyright_Laws",
        "author": "",
        "publishedDate": "Mon, 31 Mar 2008 23:13:33 -0700",
        "contentSnippet": "Due to the new copyright legislation that are going ...",
        "content": "Due to the new copyright legislation that are going to take...",
        "categories": [
        ]
       },
       {
        "title": "Millions Dead/Dying in Recent Mass-Rick-Rolling by YouTube.",
        "link": "http://digg.com/comedy/Millions_Dead_Dying_in_Recent_Mass_Rick_Rolling_by_YouTube",
        "author": "",
        "publishedDate": "Mon, 31 Mar 2008 22:53:30 -0700",
        "contentSnippet": "Click on any \u0022Featured Videos\u0022. When will the insanity stop?",
        "content": "Click on any \u0022Featured Videos\u0022. When will the insanity stop?",
        "categories": [
        ]
       },
       ...
      ]
     }
    }
    , "responseDetails": null, "responseStatus": 200}
    

    If you wanted all the title entries, including the feed title, the choice is between:

    var titles = [ result.feed.title ];
    for (var i=0, l=result.feed.entries.length; i<l; i++) {
        titles.push(result.feed.entries[i].title);
    }
    

    ... versus...

    titles = jpath(result, '//title');
    

    If, further, you wanted the list of all categories at one shot, you could use:

    jpath(result, "//categories/*")
    

Automating Internet Explorer with jQuery

Most of my screen-scraping so far has been through Perl (typically WWW::Mechanize). The big problem is that it doesn’t support Javascript, which can often be an issue:

  • The content may be Javascript-based. For example, Amazon.com shows the bestseller book list only if you have Javascript enabled. So if you’re scraping the Amazon main page for the books bestseller list, you won’t get it from the static HTML.
  • The navigation may require Javascript. Instead of links or buttons in forms, you might have Javascript functions. Many pages use these, and not all of them degrade gracefully into HTML. (Try using Google Video without Javascript.)
  • The login page uses Javascript. It creates some crazy session ID, and you need Javascript to reproduce what it does.
  • You might be testing a Javascript-based web-page. This was my main problem: how do I automate testing my pages, given that I make a lot of mistakes?

There are many approaches to overcoming this. The easiest is to use Win32::IE::Mechanize, which uses Internet Explorer in the background to actually load the page and do the scraping. It’s a bit slower than scraping just the HTML, but it’ll get the job done.

Another is to use Rhino. John Resig has written env.js that mimics the browser environment, and on most simple pages, it handles the Javascript quite well.

I would rather have a hybrid of both approaches. I don’t like the WWW::Mechanize interface. I’ve gotten used to jQuery’s rather powerful selectors and chainability. So I’ll tell you a way of using jQuery to screen-scrape offline using Python. (It doesn’t have to be Python. Perl, Ruby, Javascript… any scripting language that can use COM on Windows will work.)

Let’s take Google Video. Currently, it relies almost entirely on Javascript. The video marked in red below appears only if you have Javascript.

The left box showing the top video uses Javascript

I’d like an automated way of checking what video is on top on Google Video every hour, and save the details. Clearly a task for automation, and clearly not one for pure HTML-scraping.

I know the video’s details are stored in elements with the following IDs (thanks to XPath checker):

ID                  What’s there
hs_title_link       Link to the video
hs_duration_date    Duration and date
hs_ratings          Ratings. The stars indicate the rating and the span.Votes element inside it has the number of people who rated it.
hs_site             The site that hosts the video
hs_description      Short description

So I could do the following on Win32::IE::Mechanize.

use Win32::IE::Mechanize;
my $ie = Win32::IE::Mechanize->new( visible => 1 );
$ie->get("http://video.google.com/");
my @links = $ie->links;
# ... then what?

I could go through each link to extract the hs_title_link, but there’s no way to get the other stuff.

Instead, we could take advantage of a couple of facts:

  • Internet Explorer exposes a COM interface. That’s what Win32::IE::Mechanize uses. You can use it in any scripting language (Perl, Ruby, Javascript, …) on Windows to control IE.
  • You can load jQuery on to any page. Just add a <script> tag pointing to jQuery. Then, you can call jQuery from the scripting language!

Let’s take this step by step. This Python program opens IE, loads Google Video and prints the text.

# Start Internet Explorer
import win32com.client
ie = win32com.client.Dispatch("InternetExplorer.Application")
 
# Display IE, so you'll know what's happening
ie.visible = 1
 
# Go to Google Video
ie.navigate("http://video.google.com/")
 
# Wait till the page is loaded
from time import sleep
while ie.Busy: sleep(0.2)
 
# Print the contents
# Watch out for Unicode
print ie.document.body.innertext.encode("utf-8")

The next step is to add jQuery to the Google Video page.

# Add the jQuery script to the browser
def addJQuery(browser,
    url="http://jqueryjs.googlecode.com/files/jquery-1.2.4.js"):
 
    document = browser.document
    window = document.parentWindow
    head = document.getElementsByTagName("head")[0]
    script = document.createElement("script")
    script.type = "text/javascript"
    script.src = url
    head.appendChild(script)
    while not window.jQuery: sleep(0.1)
    return window.jQuery
 
jQuery = addJQuery(ie)

Now the variable jQuery contains the Javascript jQuery object. From here on, you can hardly tell if you’re working in Javascript or Python. Below are the expressions (in Python!) to get the video’s details.

# Video title: "McCain's YouTube Problem ..."
jQuery("#hs_title_link").text()
 
# Title link: '/videoplay?docid=1750591377151076231'
jQuery("#hs_title_link").attr("href")
 
# Duration and date: '3 min - May 18, 2008 - '
jQuery("#hs_duration_date").text()
 
# Rating: 5.0
jQuery("#hs_ratings img").length
 
# Number of ratings '(8,288 Ratings) '
jQuery("#hs_ratings span.Votes").text()
 
# Site: 'Watch this video on youtube.com'
jQuery("#hs_site").text()
 
# Video description
jQuery("#hs_description").text()

This wouldn’t have worked out as neatly in Perl, simply because you’d need to use -> instead of . (dot). With Python (and with Ruby and Javascript on cscript), you can almost cut-and-paste jQuery code.

If you want to click on the top video link, use:

jQuery("#hs_title_link").get(0).click()

You can also use the keyboard. If you want to type username TAB password, use this:

shell = win32com.client.Dispatch("WScript.Shell")
shell.sendkeys("username{TAB}password")

You can use any of the arrow keys, control keys, etc. Refer to the SendKeys Method on MSDN.

Statistically improbable phrases on Google AppEngine

I read about Google AppEngine early this morning, and applied for an invite. Google’s issuing beta invites to the first 10,000 users. I was pretty convinced I wasn’t among those, but turns out I was lucky.

AppEngine lets you write web apps that Google hosts. People have been highlighting that it gives you access to the Google File System and BigTable for the first time. But to me, that isn’t a big deal. (I’m not too worried about reliability, and MySQL / flat files work perfectly well for me as a data store.)

What’s more interesting is that, unlike Amazon’s EC2 and S3, this is free up to a certain quota. And you get a fair bit of processing power and bandwidth for free. One of the reasons I’ve held back on creating some apps was simply because it would take away too much bandwidth / CPU cycles from my site. (I’ve had this problem before.) Google’s quota is 10 GB of bandwidth per day (which is about 30 times what my site uses). And this is on Google’s incredibly fast servers. It also offers 200 million megacycles a day. That’s like a dedicated 2.3 GHz processor (200 million megacycles = 200,000 GHz x 1 second ~ 2.3 GHz x 86,400 seconds/day). Better, in fact, because this is the average capacity, not peak capacity. The only restriction that really worries me is that only 3 apps are allowed per developer.
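As a quick check of the megacycle arithmetic above, the conversion can be worked through in a few lines. This is only a sketch of the calculation; the variable names are made up for illustration.

```javascript
// Spread the daily CPU quota evenly over one day to get an average clock rate.
var megacyclesPerDay = 200e6;               // quota quoted in the post
var cyclesPerDay = megacyclesPerDay * 1e6;  // 1 megacycle = 1,000,000 cycles
var secondsPerDay = 24 * 60 * 60;           // 86,400 seconds
var averageGHz = cyclesPerDay / secondsPerDay / 1e9;

console.log(averageGHz.toFixed(2) + " GHz"); // prints "2.31 GHz"
```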

So I decided to give a shot at publishing some code I’d kept in reserve for a long time. You may remember my statistical analysis of Calvin & Hobbes. For this, I’d created a script in Perl that could generate Statistically Improbable Phrases (SIPs) for any text. This is based on (a somewhat limited) 23MB corpus of ebooks that I had. I’d wanted to put that up on my website, but …

AppEngine only uses Python. So the first task was to get Python, and then to learn Python. The only saving grace was that I was just cutting-and-pasting most of the time. Google wasn’t helping:

Google AppEngine Over Quota Error

Anyway, the site is up. You can view it at sip.s-anand.net for now. Just type a URL, and it’ll tell you the improbable words in that site.

Visit sip.s-anand.net

Technical notes

I realise that these are statistically improbable words, not phrases. I’ll get to the phrases in a while.

The logic is simple:

  • Get the frequency of words in a corpus. I pre-generated this file. It has over 100,000 words.
  • Get the URL as text. Rather than muck around with Python, I decided to use the W3 html2txt service.
  • Convert the text to words. Splitting text into words is tricky. For now, I’m simply assuming that any group of letters is a word, and anything that’s not a letter is a word delimiter.
  • Find the relative frequency (improbability) of words. This is the frequency in the URL divided by the frequency in the corpus, normalised (i.e. scale it so that the maximum value is 1.0).
  • Create a tag cloud. I use the word frequency as the size and the improbability as the colour. You need a bit of mathematical jugglery to get the pattern right. Right now, I’m taking the 6th root of the improbability and the logarithm of the frequency to get a reasonably smooth tag cloud.
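The steps above can be sketched roughly as follows. This is an illustration, not the code running on the site: the function names are hypothetical, the tiny corpus counts are made up, words missing from the corpus are assumed to have been seen once, and the "1 +" offset on the size is an assumption so that single occurrences keep a non-zero size.

```javascript
// Any run of letters is a word; anything else is a delimiter.
function toWords(text) {
  return text.toLowerCase().match(/[a-z]+/g) || [];
}

// Count how often each word occurs.
function countWords(words) {
  var counts = Object.create(null); // no inherited keys such as "constructor"
  words.forEach(function (w) { counts[w] = (counts[w] || 0) + 1; });
  return counts;
}

// Improbability = count in the page / count in the corpus, scaled so the
// maximum is 1.0. Raw counts are enough here because the scaling removes
// any constant factor.
function improbability(pageCounts, corpusCounts) {
  var scores = Object.create(null), max = 0;
  Object.keys(pageCounts).forEach(function (w) {
    // ASSUMPTION: a word absent from the corpus is treated as seen once.
    var seen = Object.prototype.hasOwnProperty.call(corpusCounts, w) ? corpusCounts[w] : 1;
    scores[w] = pageCounts[w] / seen;
    if (scores[w] > max) { max = scores[w]; }
  });
  Object.keys(scores).forEach(function (w) { scores[w] /= max; });
  return scores;
}

// Tag cloud inputs: log of the count drives size, 6th root of the
// improbability drives colour intensity.
function tagCloud(pageCounts, scores) {
  return Object.keys(pageCounts).map(function (w) {
    return {
      word: w,
      size: 1 + Math.log(pageCounts[w]),   // "1 +" is an assumption
      colour: Math.pow(scores[w], 1 / 6)
    };
  });
}

// Made-up corpus counts, purely for illustration.
var corpus = { and: 1000, a: 2000, builds: 5, laughs: 10 };
var text = "Calvin and Hobbes: Calvin builds a snowman, and Hobbes laughs.";
var pageCounts = countWords(toWords(text));
var scores = improbability(pageCounts, corpus);
var cloud = tagCloud(pageCounts, scores);
console.log(cloud);
```

With these made-up numbers, "calvin" and "hobbes" score highest because they occur twice in the text and never in the corpus, while "and" scores near zero.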

The source code is at statistically-improbable-phrases.googlecode.com.

Update: 12-Apr-2008. I’ve added some interactivity. You can play with the contrast and font size, and filter out common or infrequent words.

Update: 22-Apr-2008. Added concordance. You can click on a word and see the context in which it appears.

Firefox 3 Beta 5 crashes

I just upgraded from Firefox 3 Beta 4 to Beta 5. It’s amazing how unstable Beta 5 is compared to the earlier version. Gmail crashes. Google Maps crashes. Almost every other site I visit crashes. And it looks like I’m not alone: doing a Google search for “Firefox 3 beta x crash” shows a consistently increasing number of results.

Number of Google search results for Firefox 3 Beta crashes, by Beta version

Update (8/Apr/08): As the comments rightly point out, this could simply be because more people use Beta 5. Here’s the number of Google hits for “Firefox 3 Beta x” — and it shows a clear increasing trend.

Number of Google search results for Firefox 3 Beta, by Beta version

So, adjusting for this, here’s the relative crash frequency:

% of Firefox 3 Beta crash mentions on Google, by Beta version

Beta 5 still stands out.

Maybe Google search results are not a good proxy. Maybe the mention of “crash” doesn’t indicate the software itself crashing. But it sure crashes a lot more for me.

Time management

Some years ago, a friend asked me to write about how I manage my time. It seemed to him I was doing a good job of it, given that I had time to pursue my interests.

It’s something I tried to do consciously. Every few years, I used to go down the route of “time management”. I’d read stuff and try it out.

But over time, I’ve come to believe that “time” is not really “manageable”. Think about it: are most of your actions planned? Me, I just react out of habit, no matter how well planned I try to be. What I do is largely driven by what I’m in the habit of doing.

Not that time management advice is useless, but you’ll end up not following most of it. You act on a fraction of what you read. A fraction of that turns into a habit. That’s still useful. But the point is, rather than pick up 10 tips on time management, it’s more useful to pick one or two pieces of advice that you like, and are likely to act on. (You won’t do things you don’t like anyway.)

So time management is about acquiring habits that save time (and is not about reading tips that are tough to habitualise).

That begs an obvious question and a subtle one. The obvious one is what habits save time? The subtle one is why save time?

Why save time?

You’ve probably heard the phrase “time is money”. For a while, I took that statement literally. I tried to act by assigning monetary value to my time, and by doing the most profitable thing.

I was making Rs 10,000 a month at that time. That’s about Rs 50 an hour. So I figured I wouldn’t do anything that earned me less than Rs 50 an hour outside of work. I mean, if I’m making Rs 50 an hour at work, why should I make any less outside?

One small hitch. I wasn’t making any money outside of work. In fact, I was spending money. So unless I took up a night job, or started freelancing, that rule of thumb was useless. (Besides, I didn’t want to spend time outside of work working. I wanted to have fun. Watch movies, for instance.)

So I needed a different way of handling this. If I spend 3 hours at a movie for Rs 60, that could be a benchmark. If something’s more expensive than Rs 20/hour, I’d rather watch a movie. If it’s less expensive, I’d do that. Take books, for instance. A typical novel would cost Rs 180 and I’d finish it in 12 hours. At Rs 15 / hour it’s a more economical way of spending time.

Except that it doesn’t quite work that way. How much fun I had, had nothing to do with how much I paid for it.

Frankly, in daily life, I don’t think you can treat the phrase “time is money” literally. Time has nothing to do with money.

Time is like money in a different way, though. By itself, it isn’t worth much. Think about it. What can you do with money? Buy stuff you like. And if you can’t, it’s useless.

Obelix: How silly! Fancy throwing out good onion soup to make room for sestertii! Asterix: But Obelix, with sestertii, you can buy onion soup! Obelix: That's the point! Why throw out the onion soup when it was in the cauldron already?

If all you need is onion soup, why throw it out for sestertii?

Time’s like that. What can you do with time? Do stuff you like. And if you can’t, it’s useless.

There are usually two reasons people want to manage time. One is where they don’t enjoy something, and would rather spend as little time at it as possible. But look, if you don’t enjoy that stuff, time management isn’t your problem. You need to get out of your job or whatever. Managing time more efficiently is simply going to let you efficiently waste your time. (Though in the short run, that’s probably the best you can do — efficiently get rid of nuisances. I’ll talk about that shortly.)

The other reason is where they have too many (enjoyable) things to do, and can’t do all of them. But hey, if you have too much enjoyable stuff, you don’t have a problem! In a way, this is like wanting to buy many things and not having enough money. With money, you can earn more or wish for less. With time, you just have to wish for less. (Living longer may not be a practical option.) Just pick anything you like to do. Don’t regret the stuff you can’t. You only have 24 hours, and you’re among the lucky few who can fill it with things you enjoy.

So, I’m effectively saying, there’s no point trying to do things more efficiently in the long run. Picking what you do is more important than doing it efficiently. (That roughly correlates to the third habit in Stephen Covey’s Seven Habits: Put First Things First. It’s the key to time management.)


So, how do you pick what to do? You’d probably want to pick something that you like, or something that’s good for you.

But it’s tricky to predict what you like.

  • We don’t know what we want. Sometimes, it’s as simple as that — we just don’t know what we’d like to do.
  • Too much of anything… I love watching movies, but I’ve never managed to watch more than 4 a day. I’ve tried breaking that record many times. Just doesn’t work. At the end of the 4th movie, I’m sick and my bum is sore. Do I prefer movies to cleaning up? Usually. But by the end of the 4th, I’d rather clean up.
  • Preferences are not consistent. I prefer a 7 megapixel camera to a 2 megapixel one. I prefer a cheaper camera to a more expensive one. So between a $100 2MP camera and a $200 7MP camera, I’m just making a wild guess.
  • Preferences are not static. If I’m tired, I’d rather watch a movie I’ve seen before. If not, I’ll experiment with an art film. There’s no telling beforehand what my mood is going to be at any point.

It’s just as tricky to figure out what’s good for us. We have no clue what will happen tomorrow. We have no clue what consequences our actions will have. (Read The Black Swan to get a flavour of that.) So we’re really guessing and groping — though sometimes with a lot of confidence.

On the whole, it’s difficult to figure out what to pick. So what do you do?

This is completely outside the realm of time management. This is about choice. I have a few (bad) habits that guide me.

  1. Follow your moods
  2. Work less
  3. Procrastinate

Those are my principles. (But like Groucho Marx, I do have others.)

Follow your moods

There are times when people do certain things better. I’ve heard some people study best early in the morning. Others study best late at night. I don’t know if there’s any physiological benefit one way or the other, but even if it’s psychological, it makes a huge difference to study when you think you’ll learn better.

Sometimes I’m in a mood to write articles. When I do, the article usually writes itself. If not, I could spend days at it without any progress.

If there’s any reality to this, then the best thing to do is to do what you feel like doing. You’ll naturally accomplish this faster. That’s typically what I do when I’m given any work. I usually wait until I just feel like it. Then it’s usually a matter of a few hours before the job is done. Sometimes the mood doesn’t quite arrive before the deadline, in which case there’s always inspiration.

Calvin & Hobbes: Do you have an idea for your story yet? No, I'm waiting for inspiration. You can't just turn on creativity like a faucet. You have to be in the right mood. What mood is that? Last-minute panic.

Seriously: do what you feel like doing the most at the moment. That’s a great way of becoming more efficient.

In fact, I would go as far as saying, mood management is more important than time management. Moods are more precious than time. If you’re in a mood to call people, pick up the phone and talk to folks you’ve been out of touch with. That mood is rarer than the time to make calls. (At least for me, the reason I am not in touch is because I’m not in a mood — not because I don’t have time.)

Optimise that mood. Do what you’re in a mood for. And when your mood changes, go with the flow. Do a lot more of what you feel like doing. You’ll do more (which is probably good), and of what you like (which is certainly good).

Work less

I’ve talked about this in Less is more. At the end of the day, 90% of the stuff you do is useless. So why do it? Just focus on the 10%.

Procrastinate

I can’t put this better than Paul Graham’s article on procrastination.

Good procrastination is avoiding errands to do real work.

You won’t know what the important 10% is until much later, so you may as well wait to find out if it’s important, and then do things.


So what am I saying?

  • Time management is about habits, not tips
  • Picking what you do is more important than doing it efficiently
  • But it’s difficult to figure out what to pick
  • So avoid doing stuff until you know it’s worth doing
  • Work when you’re in the mood — it’s faster that way

Think about it.

Reading books on a laptop

I have the habit of reading books on the screen. It’s something that started in the early 90s, when I got a copy of The MIT Guide to Lockpicking. Since I didn’t have access to a printer, I spent hours poring over the document on the screen. And then I discovered Project Gutenberg.

I’ve heard many people ask if I have a problem with this. Personally, no. I’ve been staring at screens from the age of 12, and I’m quite used to it. My job requires me to stare at a screen for most of the day anyway. (I’m not saying there’s no strain on the eye. My eyes are red at the end of the day. I don’t know if they would be less red if I’d been staring at paper instead of a screen. But my glasses have remained roughly the same power over ~15 years, so it’s probably not ruining my eyesight much.)

To me, the main advantage of a book is that a book is a lot easier to handle.

  • You can fit a book into your bag, sometimes into your pocket.
  • You can hold it in your hand comfortably — it’s easy to grip, and light.
  • You can open it instantly (no need to boot up).
  • You can bookmark it (or even just remember the last page number) and quickly flip to that page.

None of these is possible on a computer.

Or is it?

On a desktop, I agree — it’s impossible to read for long. Your back would kill you. I’ve done it for many years, and it’s not worth the pain. With a laptop, however, you can lie down on the bed or sofa and read. It’s a huge advantage. (For just this one reason alone, I’d suggest that everyone buy a laptop.)

As for carrying books, I carry my laptop to work every day, so there’s no incremental burden. But if you weren’t doing that, it’s probably not a great idea. When I travel on weekends, I’d much rather take a physical book than a laptop. This is probably the single biggest problem with a laptop: it doesn’t travel as easily as a book.

That’s probably offset by the advantage that a laptop isn’t really a book — it’s a library. I don’t need to decide which book to read. I can bring them all along, pick what I like, and when I’m done, move on to the next. And I’m not restricted to books. I have a fairly good collection of movie scripts and comics. Depending on how long I have on the train, and my mood, I can pick between these.

One thing that makes a laptop a lot easier to use is to rotate it.

Laptop in landscape mode

Laptop in portrait mode (rotated)

If you hold the laptop this way, it’s surprisingly easy to handle. I find that I can read this way even when standing on a crowded train — which is as much as I can expect from any book. (Strangely enough, it doesn’t seem to attract too much attention on the train either.)

If you have a decent graphics card, you can rotate your screen using the graphics properties. (I’m sure there are hotkeys to do this. My two-year old daughter somehow knows them, and manages to turn the screen upside down in a fraction of a second, while I spend the next 5 minutes struggling to restore an upside-down screen.)

If not, you can just use a PDF reader (like Foxit, which is better than Acrobat Reader) to rotate the page by 90°.

A laptop takes care of the problems of bookmarking and load time as well. I usually leave mine on hibernate, and it takes about 10 seconds to open up to where I left off. Sometimes I just leave the laptop on in the bag — for example if I’m changing trains.

The other solution, of course, is to try an ebook reader. Given my laptop, I haven’t tried one. But other than the ease of holding it, there’s no big advantage I see.


The other question is, how do you find ebooks? Other than buying them, I find that the easiest option is to search on Google. A surprisingly large number of them are indexed.

Here’s a custom search engine for ebooks.

Chaining functions in Javascript

One of the coolest features of jQuery is the ability to chain functions. The output of a function is the calling object. So instead of writing:

var a = $("<div></div>");
a.appendTo($("#id"));
a.hide();

… I can instead write:

$("<div></div>").appendTo($("#id")).hide();

A reasonable number of predefined Javascript functions can be used this way. I make extensive use of it with the String.replace function.

But where this feature is not available, you can create it in a fairly unobtrusive way. Just add this code to your script:

Function.prototype.chain = function() {
    var that = this;
    return function() {
        // New function runs the old function
        var retVal = that.apply(this, arguments);
        // Returns "this" if old function returned nothing
        if (typeof retVal == "undefined") { return this; }
        // else returns old value
        else { return retVal; }
    };
};
var chain = function(obj) {
    for (var fn in obj) {
        if (typeof obj[fn] == "function") {
            obj[fn] = obj[fn].chain();
        }
    }
    return obj;
};

Now, chain(object) returns the same object, with all its functions replaced with chainable versions.
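For a self-contained illustration, here is a variant sketch that wraps an object's methods directly rather than extending Function.prototype. The names makeChainable and counter are hypothetical, and this version only touches the object's own enumerable properties (via Object.keys), whereas the for...in loop above also visits inherited ones.

```javascript
// Replace each method with a wrapper that returns the object itself
// whenever the original method returned nothing.
function makeChainable(obj) {
  Object.keys(obj).forEach(function (name) {
    var original = obj[name];
    if (typeof original !== "function") { return; }
    obj[name] = function () {
      var retVal = original.apply(this, arguments);
      return typeof retVal === "undefined" ? this : retVal;
    };
  });
  return obj;
}

// Hypothetical object used only to demonstrate the effect.
var counter = makeChainable({
  total: 0,
  add: function (n) { this.total += n; },     // returns nothing, so calls can be chained
  value: function () { return this.total; }   // returns a value, which is passed through
});

var result = counter.add(2).add(3).value();
console.log(result); // prints 5
```

Leaving built-in prototypes alone avoids adding a chain method to every function on the page, which matters if other scripts enumerate or extend functions themselves.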

What’s the use? Well, take the Google AJAX search API. Normally, to search for the top 8 “Harry Potter” PDFs on esnips.com, I’d have to do:

    var searcher = new google.search.WebSearch();
    searcher.setQueryAddition("filetype:PDF");
    searcher.setResultSetSize(google.search.Search.LARGE_RESULTSET);
    searcher.setSiteRestriction("esnips.com");
    searcher.setSearchCompleteCallback(onSearch);
    searcher.execute("Harry Potter");

Instead, I can now do this:

chain(new google.search.WebSearch())
.setQueryAddition("filetype:PDF")
.setResultSetSize(google.search.Search.LARGE_RESULTSET)
.setSiteRestriction("esnips.com")
.setSearchCompleteCallback(onSearch)
.execute("Harry Potter");

(On the whole, it’s probably not worth the effort. Somehow, I just like code that looks like this.)

Less is more

The hours in consulting are pretty long. 65 hours a week used to be my norm, and that’s ignoring the travel time to and from work. So there wasn’t too much life outside of work. (I’ve come to realise, though, that what you do outside of work doesn’t change that much with more free time. What does change is that you just enjoy it more — both in and out of work.)

We have a day, once every month or two, where you take time off from whatever project you’re on and head back to the office. One such day featured a session with the managers telling the consultants how to succeed. Pretty good advice, actually… but that’s not what I’m going to talk about. It’s something about the nature of that advice.

The advice had a lot of TO-DOs and suggestions. Do this. Do that. Focus more on this. Focus a lot on that. Great. Now we know what to do more of.

My question, towards the middle of the session, was: OK, so what do we do less of, then?

You can’t do more of something unless you do less of something else. In most places, it’s easy to answer this with: “Oh, you need to be more efficient.” or “Cut the idle gossip”. For us, none of these were applicable.

The question pretty much remained unanswered. And with good reason. It’s a tough question.

Later, I got involved with a proposal. I wrote a few bits of it. (One page, actually.) Others wrote a few bits of it. And then some standard appendices were added to it. Finally, it ended up as a 180-page document.

The interesting thing is, I can bet no human ever read those 180 pages end-to-end.

I know no one at our end did, because we turned it around in 1 week, and I was the last to assemble the document before sending it out.

I’m guessing no one at the client end did, because they’d have gotten 5 such documents, and had a week to shortlist down to 3.

So if we didn’t read it and they didn’t read it, why did we put it in?

I think I know why. In my IBM days, I had to make a presentation to the management on productivity. I knew nothing of management or productivity. So I put in a report that had a lot of high-sounding words (you know… value-add, leverage, etc.) that looked reasonably impressive and had no basis in fact.

I did that mostly because I was scared. Of seeming to know less. Of being wrong. You know.

(Funnily enough, the presentation was pretty well received. I don’t know if it was because they were polite or had become numb to bullshit.)

This fear is pretty common. I know how that 180-page document ended up as a 180-page document, and I’m sure you’ve seen this happening before. First, here’s a sample conversation at the client end, when they’re writing up a request for information.

Martin: So, what do I put in the RFI?

Clive: Here’s a template we used. You can use some of that. Ask Nick for the one he used last month, and Natalie for hers. Maybe you should get something from our procurement team and information security group to be on the safe side.

Martin: And how do I make the RFI out of this? (BTW, this is a “bold” question that’s rarely asked.)

Clive: Well, make sure you cover everything from all of these documents.

So the RFI asks:

  • if any of your 80,000 employees are a member of any one of the following 340 organisations that are considered disruptive,
  • how many employees you have in each geography, function and vertical, where the break-down provided is as per their definitions (we cook up numbers which, if you add them up, total over 200,000)
  • how much you spent on paper-clips last fortnight, and other such intimate corporate P&L secrets

And we answer these. The answers to the above 3 questions were “No”, a table of numbers, and “We are not at liberty to divulge this information…”

Now, looking at the answers above, it still doesn’t add up to 180 pages. It’s hardly half a page. But you’ve got to take the following conversation at our end into account.

Steve: You know, we’ve got to put in some details about our methodologies in this section.

Me: I have.

Steve: Yeah, but maybe we should add more, you know, like supply chain methodologies and change management.

Me: But they’re irrelevant!

Steve: Well, can’t say that. Change management is always relevant. SCM… well, no harm putting it in. They can skip it if they don’t want to read about it.

That’s it, isn’t it? There’s no harm in doing more. I’ll just toss it in. If you don’t want to read it, skip it. I’ll just ask you to do more of these. If you can’t, skip the useless stuff.

An innocuous sounding statement: do more. I tremble whenever anyone suggests it. There’s no defence.

There’s a fundamental belief at work here. That more is better.

This is fueled by a lack of confidence. Put in high-sounding words. They look impressive. What’s missed is that experts use jargon because they understand what it means, and it conveys a lot in few words. Others follow a cargo cult science.

What we lose, though, is subtle.

Firstly, it wastes time. It wastes my time. It wastes your time. But hey, time is not all that important. (I’m not saying this sarcastically. I believe that wasting time is quite OK, really, and it’s not such a big deal.)

What’s more important is that it destroys focus. Some things in the document are important. Most others are not. In a 180-page document, I can’t find the important stuff! It actually does harm to put it in if it’s irrelevant.

That’s the tough tradeoff, really. A tangible incremental value against an intangible loss of focus. The value looks attractive when you’re less confident. The document seems completely unfocused anyway.

So what the heck, put it in.

Do more of this. And that too.


So what can you do? Quite a bit, surprisingly.

Firstly, you’ve got to believe that less is more. The response to “What’s the harm in adding…?” is “It dilutes the message”. There are two things here. Believing it. And having the courage to say it. Trust me, you really believe it only when you say it.

Next, you’ve got to understand — really understand — before you write or speak. That requires not fooling yourself. And it requires a lot of practice. I’ve had nearly 20 years of training in fooling myself, so it’s an uphill task. Many people are worse off, never having tasted true understanding.

Third, you’ve got to be brave enough to shut up, or say “I don’t know”. Initially, this was tough for me, but I learnt from a friend. I always thought him not-so-smart, but honest. He’d ask, “But why?” and when I’d explain, he’d say, “I don’t understand it.” After two hours of trying to get him to understand, I’d realise that I was the one who never got it in the first place. After a while, I got into the habit of being very prepared before I explained anything to him.

Saying “I don’t know” doesn’t make people think less of you, I’ve found. I know a lot of people disagree with me. One of the most consistent pieces of feedback I’ve received in the first half of any project or firm I’ve been in is, “He should speak up.” Dammit, I don’t have anything to say! If I know something, I’ll say it. If not, I’ll shut up. Now, despite this feedback, no one’s quite objected to me. And in the second half, they’re always amazed at how much I’ve improved based on the feedback.

The feedback had nothing to do with it, of course. I just happen to know more in the second half of a project.

There’s a reason why your boss wants you to talk. It makes you appear knowledgeable. In the short term, that’s good. You talk about “value” and “leverage” and people nod wisely.

In the long term, it makes you less able to say “I don’t know.” (What? This brilliant chap who knew all about value and leverage doesn’t understand our way of calculating ROI?)

It makes you less likely to ask questions.

It makes you learn less.

It makes you dumb.

On the other hand, I’ve learnt to plead ignorance up front. “Do you understand ROI?” “No.” Not even an excuse for it. Frankly, it saves time.

Sometimes, a meeting’s running late, I’m hungry, and I just nod at whatever’s said, and lose the window of opportunity to ask. Except, I’ve learnt, there’s no such thing as a window of opportunity. If you don’t get it, ask. If they’ve said it thrice, and you still don’t get it, ask. More likely, they’re not clear about it themselves.


Postscript: This morning, I had to convert a document into a standard template. My document was 3 pages long. The template (just the headings) was 14 pages long.

Why? Because someone wants all documents in that format. Does it help them? Maybe not. But it has to be done. Standards.

Sometimes, it’s easier to give up. The smart thing is to minimise the effort on pointless work. I took 15 minutes. Beyond a point, I protect myself rather than the poor reader.