How I do things

Goodbye Google

Google Reader was where I spent most of my browsing time, but now, it’s shutting down.

Time for alternatives, but not just for Reader: for all Google products. I’m not sure when one of these might go down, become paid, or become unusable.

I just uninstalled Google Drive and Google Talk. but I don’t use it much (I use Skype), so no loss. I’ll leave Chrome for the while, but I’m hearing reports that Firefox is improving faster than Chrome is. Or there’s Chromium.

I’m not worried much about search services (including image, video, scholar and books). When needed, I can switch. Scholar might be a bit sad to lose, but I don’t use it much. Google Translate, too, isn’t essential.

Likewise for content. YouTube’s not a problem. There’re enough other video services. Trends are useful, but not critical. Maps might be, so I’ll try and switch to OpenStreetMap. I don’t use News or Picasa much.

I don’t care much for social media anyway, so Blogger, Orkut and Plus can die any time.

Google’s apps are the worrying ones. Mail and Calendar, in particular. I’ll probably migrate away from them last, but the attempt is on. I’ll be documenting the alternatives I find at https://gist.github.com/sanand0/5176161 (safely cloned locally).

Looks like there’s no safe long-term alternative to being able to host your own apps. Pity.

Streaming audio to iOS via VLC

You can play a song on your PC and listen to it on your iPhone / iPad – converting your PC into a radio station. As with most things VLC related, it’s tough to figure out but obvious in retrospect.

The first thing to do is set up the MIME type for the streaming. This is a bug that has been fixed, but might not have made it into your version of VLC.

Go to Tools – Preferences.

vlc-pref-1

Click on “All” to see all the settings.

vlc-pref-2

Under Stream output – Access output – HTTP, set Mime to audio/x-mpeg.

vlc-pref-3

At this point, you should restart VLC.

As I mentioned earlier, you might not need to do this if you have new enough a version of VLC that auto-detects the content’s MIME type.

Re-open VLC, and go to the Media – Stream menu.

vlc-stream-1

Click Add and choose the file you want to stream. Then click on Stream.

vlc-stream-2

Click Next.

vlc-stream-3

Select HTTP and click Add.

vlc-stream-4

Select Audio – MP3 and click on Stream.

vlc-stream-5

At this point, the audio is being streamed at port 8080 of your machine. You can change the port and path in the menu above. (To find your local IP address, open the Command Prompt and type ipconfig.)

Open Safari on your iPhone or iPad, and visit http://your-ip-address:8080/

vlc-ipad-streaming

I haven’t figured out the right codec and MIME type to do this for videos yet, but hopefully will figure it out soon.

Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien?

Consider this mini-paper.

ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also
find significant evidence of improbably events at round numbers.

Electricity shortage is a serious problem in most Indian states. Part
of this problem is due to the inaccuracy of reporting procedures used
in monitoring meter readings. Our focus here is not to document or
experimentally determine the degree of inaccuracy. We have adopted a
data driven approach to this problem and attempt to model the extent
of inaccuracy using basic statistical analysis techniques such as
histograms and the comparison of means.

Our dataset comprises of the frequency analysis 12-month dataset
containing monthly meter readings of 1.8 million customers in the
State of Andhra Pradesh.

We find that a histogram of these readings shows unexpectedly high
values at the slab boundaries: 50 (+45.342%, t > 13.431), 100
(+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300
(+42.138%, t > 19.958).

We also detected spikes at round numbers: 10 (+15.341%, t > 5.315),
20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319).

The statistical significance of every deviation listed above is over
99.9%. Further, every deviation has a positive mantissa. This leads us
to confidently declare the existence of a systematic bias in the meter
readings analysed.

You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!”

Well, not quite. It’s a good piece of analysis. I did it myself and there’s a fair bit of effort and care behind these short paragraphs.

The trouble is, if I read it out to my daughter, she’d say “What?” and not understand a word. My wife’d say “So what?” and not care a bit. I might as well not have written it.

It’s like that Zen thing: If a tree falls in a forest and no on hears it, does it make a sound?

If you did a piece of analysis, and no one understands or cares about it, why did you do it in the first place?

Why do you do it?

That last question is important: why do we analyse?

Sometimes, we do it for fun. The knowledge is beautiful. Knowing Tetris is NP-Complete is rewarding, even though my colleague sarcastically remarked, “Thank God! I’m sooo relieved now that I know that Tetris is NP whatever.” If that’s the case with you, great. Write the analysis any which way you’ll enjoy.

Sometimes, we do it because we’re forced to. In class. At work. Wherever. But that’s another way of saying “I don’t know why I’m doing it.” In that case, I’d gently recommend watching 3 Idiots.

Most often, we do it to share knowledge and drive actions. In that case, if no on understands it, or does anything with it, why do it?

Keep it simple

We prerajulisation of Farhanitate flagellated with ...

Would your audience understand that? Or are you just scared that simple words indicate a simple mind?

I was once afraid. 15 years ago, when writing a paper on IBM India’s competitive advantage for the CXOs, I was worried about it being too simple. I didn’t know anything about management. So I filled it with jargon. They politely nodded when I presented it, but I wasn’t fooling anyone. If there’s no content, jargon doesn’t help.

Unfortunately, it’s become polite to accept jargon as a substitute for substance. Why were they not ripping me apart? Or at least, kindly asking me what on earth I wanted to say?

My friend Manoj did that. In his nice, humble way, he asked, “But Anand, what does this mean?” When I explained it to him, I found I didn’t have a clue. He was OK with that. He just wanted to make sure he hadn’t missed something.

(That’s the technique I use these days. Ask people to explain things clearly. It’s OK if they’re just lost in jargon. I just want to make sure I haven’t missed something.)

Don’t cloak your ignorance. No one will think less of you. In the long run, you’ll learn more, and won’t need the jargon.

Part 2 of the article will talk about focusing on people and actions; storylining and the pyramid principle; and the structure of messages.

Style of blogging

Until 2007, my blog was mostly just linking to stuff I found interesting on the Web. Since 2007, I’ve tried to write longer articles, mostly based on my own experiences.

At the moment, that’s unsustainable. Right now, being in a startup, I doing more stuff than I ever have in the past. (That does not mean working more hours, by the way.)

My posts, going forward, are likely to be smaller, less original, but hopefully more frequent.

Downloading songs from YouTube

Five years ago, I built a song search engine – mainly because I needed to listen to songs. Three years ago, I stopped updating it – mainly because I stopped listening to songs actively, and have been busy since. For those of you who have been using my site for music: my apologies.

These days, I don’t really find the need to download music. YouTube has most of the songs I need. Bandwidth is pretty good too even when on the move.

But when I do need to download music, this is my new workflow.

  1. Find the song on YouTube. (Misspellings are still an issue, but you’ll usually find what you need)
  2. Download the video. Keepvid is the simple option. youtube-dlis the geek’s option (for multiple downloads)
  3. Use VLC – the swiss-army knife of media – to convert the video into an MP3.

That last step requires a bit of explaining. It’s very simple once you know how, but it took me a few months to get it right. So here goes.

Select the Convert / Save option in the Media menu.

audio-conversion-1

Click on Add to open file you want to convert. You can pick a track from an disk as well if you want to rip an audio CD or a DVD.

audio-conversion-2

Choose the file.

audio-conversion-3

Click on Convert / Save.

audio-conversion-4

Type the destination filename. Make sure you type the full file name, and not just the name of the folder.

audio-conversion-5

Select the output format you want under Settings – Profile. You can tweak the bitrate with the settings button, but I usually don’t bother.

audio-conversion-6

When you click on the Start button, the file will be converted or the CD will be ripped. You’ll see the position marker move fairly fast.

audio-conversion-7

 

The only problem I have with this method is that I can’t seem to do batch conversions easily enough with the GUI. Does anyone have any other workflow they like?

Update (31 Jul 2012): Aditya Sengupta suggests the following: (should’ve guessed VLC would have something up its sleeve)

vlc -I dummy $FILENAME --no-sout-video --sout "#transcode{acodec=mp3,5Dab=AUDIO_BITRATE,channels=2}:std{access=file,mux=raw,dst=$NAME.mp3}" vlc://quit

Scraping for a laptop

I’ve returned my laptop, and it’s time to buy a new one. For the first time in my life, I’m buying a laptop for myself.

I have a fairly clear idea of what I want: a 500GB+ 7200 rpm hard disk with 4GB of RAM and an Intel Core i7. I thought that would make finding one of those powerful laptops for producing music since I record some stuff too out of hobby.

Sheer naïveté. Not a single site let me filter by hard disk rpm in India. (To be fair, I haven’t found any sites outside India that did that either.)

After spending a good two hours hunting for the details and collating it, I did what I normally would: spend 30 minutes writing a scraper. The scraper runs through all laptops on Flipkart and pulls out all of their specs. Thanks to the diligence of the good folks at Flipkart, this information is readily available on each page. The HTML is structured quite neatly too, so it was just a 30-line program to scrape it all. Full credit to ScraperWiki as well — I could use it on a netbook without any developer tools installed.

The scraper took 2 hours to run. Feel free to filter through the output (CSV) for your favourite laptop, or fork the code and pull any other data you like.

The next chapter of my life

I’m writing this post on a one-way flight from London back to India. I’ve moved on from Infosys Consulting, and am starting up on my own.

I’ve wanted to do this for a long time. There’s always more freedom in your own company than someone else’s. There’s often more money in it too, if you’re lucky enough. But my upbringing is a bit too conservative to make that bold step. However, given that my father runs his own firm, I figured it was just a question of time for me to do the same.

Two years ago, in Jan 2010, I picked up Rashmi Bansal’s Stay Hungry Stay Foolish at an airport. That book killed the last bit of resistance I had. If the people in that book could succeed, I felt I could too. And if what they did (building small companies, not huge ones) could be called a success, I could be successful too.

After the flight, it was clear in my mind. I would be an entrepreneur. I would create a small company that would probably fold. Then I’d do it again. And again, 10 times, because 1 in 10 companies survive. And finally, I’d be running a small business that’d be called successful by virtue of having survived. A modest, achievable ambition that I had the courage for.

I usually make big decisions without analysis, by just sleeping over them. I slept over it and announced it to my family the next day. I’m not sure they believed me.

Two months later, along with a friend, I built a dynamic digital image resizing product. We had our wives start a company in the UK, and tried selling it to retailers. There clearly was a demand. The problem was, we didn’t know how to sell. After a year and having spent £500 with no sales, it was clear to us that venture #1 had failed. We eventually shut it down.

In the middle of this, my ex- boss from IBM told me that he was looking to start a venture, focusing on mobile, rural BPO and energy management. This later on changed to data analytics and visualisation. They all sounded like fun, so I said I’ll help out in my spare time.

A few months later, a classmate told me he’d started a business digitising school report cards. That sounded like fun too, so I said I’d help out in my spare time.

Now, if that sounds like I had a lot of spare time on my hands — you’re right, I did. And it’s time to talk about the jobs in my life. My first 3 years at IBM were fun. I was coding, learning, and leading a bachelor’s life with friends, money, and no responsibilities. My 4 years at BCG were strenuous with 80-hour weeks, but it was interesting and challenging. I was newly married, and between work and home responsibilities, I had no time for fun.

I moved to Infosys Consulting in the UK with the specific aim of rectifying that (and for health reasons as well). In the last 7 years, the work has (except on occasion) been a bit boring, but very relaxing. On most days, I would spend 4 hours working, and 4 hours learning new stuff. The things I learnt only helped me be more efficient. So I ended up getting even more work done in less time.

Many things came out of this. Firstly, I recovered my health. We had a daughter, and I spent more time with her. I started coding in earnest again. By 2007, I was writing code as part of my projects — stuff that others whose job it was were unable to. By 2009, I had a few websites running, like an Indian music search engine, an IMDb Top 250 tracker, a few transliterators, and so on.

So when I said I’d help out with these startups, it wasn’t an empty promise. For the last 18 months, I’ve had a day job and three night jobs. I never did justice to any of them in my opinion, but I had more fun than ever in my life, I learnt more than ever in my life, and I produced more tangible output than ever in my life. Sometimes, quantity beats quality or reliability.

Both these startups are doing well today. Gramener.com offers data visualisation and IT services. I will be joining them as Chief Data Scientist. Reportbee.com offers a hosted report card solution. I will continue helping them out. And I will continue working with a few NGOs.

You’ll see me a lot more active online now. I can publicly write about my work — something I’ve been unable to do the last 11 years.

I am relocating to Bangalore. From a professional front, it’s an obvious choice. That’s where the geeks are. In my last visit to India, I was at Bangalore, Chennai and Hyderabad. In the latter two, it’s tough to meet geeks. And when you do, it’s no easier to find the next. Bangalore has many more geeks, and they’re fairly well networked.

From a personal front, too, Bangalore works well. It’s close enough to Chennai without actually being in Chennai.

It’s 10am on Thu 12th Jan. Our flight is descending into Delhi airport. It’s the start of a new chapter in my life. Scary, but exciting. Wish me luck!

Eating more for less

A couple of years ago, I managed to lose a fair bit of weight. At the start of 2010, I started putting it back on, and the trajectory continues. I’m at the stage where I seriously need to lose weight. I subscribe to The Hacker’s Diet principle – that you lose weight by eating less, not exercising.
An hour of jogging is worth about one Cheese Whopper. Now, are you going to really spend an hour on the road every day just to burn off that extra burger? You don't exercise to lose weight (although it certainly helps). You exercise because you'll live longer and you'll feel better.
I’m afraid I’ll live too long anyway, so I won't bother exercising just yet. It's down to eating less. Sadly, I like food. So to make my “diet” work, I need foods that add less calories per gram. Usually, when browsing stores, I check these manually. But being a geek, I figured there’s an easier way. Below is a graph of some foods (the kind I particularly need to avoid, but still end up eating). The ones on the top add a lot of calories (per 100g), and better to avoid. The ones at the right cost a lot more. Now, I’m no longer at the point where I need to worry about food expenses, but still, I can’t quite kick the habit, also you might want to check out this Rootine's comparison of B12 methylcobalamin and cyanocobalamin that will help you in your diet. Hover over the foods to see what they are, and click on them to visit the product. (If you’re using an RSS reader and this doesn’t work, read on my site.)
(The data was picked from Tesco.) It’s interesting that cereals are in the middle of the calorie range. I always thought they’d be low calories per gram. Turns out that if I want to to have such foods, I’m better off with desserts or ice creams (profiterole, lemon meringue or tiramisu). In fact, even jams have less calories than cereals. But there are some desserts to avoid. Nuts are a disaster. So are chocolates. Gums, dates and honey are in the middle – about as good as cereals. Salsa dip seems surprisingly low. Custards seem to hit the sweet spot – cheap, and very low in calories. Same for jellies. So: custards and jelly. My daughter’s going to be happy.

Visualising the IMDb

The IMDb Top 250, as a source of movies, dries out quickly. In my case, I’ve seen about 175/250. Not sure how much I want to see the rest.

When chatting with Col Needham (who’s working his way through every movie with over 40,000 votes), I came up with this as a useful way of finding what movies to watch next.

visualising-the-imdb-1

Each box is one or more movies. Darker boxes mean more movies. Those on the right have more votes.  Those on top have a better rating. The ones I’ve seen are green, the rest are red. (I’ve seen more movies than that – just haven’t marked them green yet 🙂

I think people like to watch the movies on the top right – that popularity compensates (at least partly) for rating, and the number of votes is an indication of popularity.

For example, my movie pattern tells me that I ought to see Cidade de Deus, Inglourious Basterds and Heat – which I knew from the IMDb Top 250, but also that I ought to cover Kick-Ass, The Hangover and Juno.

visualising-the-imdb-2

It’s easy to pick movies in a specific genre as well.

visualising-the-imdb-3

Clearly, there are many more Comedy movies in the list than any other type – though Romance and Action are doing fine too. And I seem the have a strong preference for the Fantasy genre, in stark contrast to Horror.

(Incidentally, I’ve given up trying to see The Shining after three attempts. Stephen King’s scary enough. The novel kept me awake checking under my bed for a week at night. Then there’s Stanley Kubrick’s style. A Clockwork Orange was disturbing enough, but Haley Joel Osment in the first part of A.I. was downright scary. Finally, there’s Jack Nicholson. Sorry, but I won’t risk that combination on a bright sunny day with the doors open.)

You can track your list at http://250.s-anand.net/visual.

For those who want to play with the code, it’s at http://code.google.com/p/two-fifty/source/browse/trunk/visual.html.

Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail.

For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter just using your e-mail (you’ll have to register once via Twitter, though).

Now, you can send an e-mail to twitter@mixamail.com with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets.

It’s the snippets that contain the useful information, as far as I’m concerned. Just yesterday, I managed to find the show timings for Manmadan Ambu at the Ilford Cine World via a search on Mixamail. (Mixamail win, but the movie was a let down, given expectations.)

You don’t need to be registered to use this. So if you’re ever stuck with just e-mail access, just mail twitter@mixamail.com with a subject “Google”.

PS: The code is on Github.