Year: 2012

Style of blogging

Until 2007, my blog was mostly just linking to stuff I found interesting on the Web. Since 2007, I’ve tried to write longer articles, mostly based on my own experiences.

At the moment, that’s unsustainable. Right now, being in a startup, I’m doing more stuff than I ever have in the past. (That does not mean working more hours, by the way.)

My posts, going forward, are likely to be smaller, less original, but hopefully more frequent.

Is Protocol Buffers worth it?

Google’s Protocol Buffers is a “language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.”

XML is slow and large. There’s no doubting that. JSON’s my default alternative, though it’s a bit large. CSV’s ideal for tabular data, but ragged hierarchies are a bit difficult.

I was trying to see if Protocol Buffers would be smaller and faster, at least when using Python. I took JSON as the base, and checked the write speed, read speed and file sizes. Here’s the comparison:

image

Protocol Buffers are 17 times slower to write and almost 6 times slower to read than JSON files. File sizes are smaller, but then, all it takes is a simple gzip operation to compress the JSON files even smaller. Reading json.gz files is just 2% slower than JSON files, and writing them is only 4 times slower.

The code base is at https://bitbucket.org/sanand0/protobuftest

On the whole, it appears that GZipped JSON files are smaller, faster, and just as simple as Protocol Buffers. What am I missing?

Update: When you add GZipped CSV to the mix, it’s twice as fast as GZipped JSON to read: clearly a huge win. It’s only slightly slower to write, and compresses a tiny bit more than JSON.
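
For anyone who wants to try this kind of comparison, here is a minimal sketch. It is not the code from the repository above. The sample records and the names `records`, `timed` and `results` are made up for illustration, and it only measures write time and size; reads can be timed the same way.

```python
import csv
import gzip
import io
import json
import time

# Hypothetical sample data: 10,000 flat records (not the data used in the post).
records = [{"id": i, "name": "item%d" % i, "value": i * 0.5} for i in range(10000)]
FIELDS = ["id", "name", "value"]


def timed(func):
    """Run func once and return (result, seconds elapsed)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


def to_json():
    return json.dumps(records).encode("utf-8")


def to_json_gz():
    return gzip.compress(to_json())


def to_csv_gz():
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return gzip.compress(buf.getvalue().encode("utf-8"))


# Serialise the same records three ways and note the size and time of each.
results = {}
for label, func in [("json", to_json), ("json.gz", to_json_gz), ("csv.gz", to_csv_gz)]:
    blob, seconds = timed(func)
    results[label] = (len(blob), seconds)
    print("%-8s %8d bytes  written in %.4fs" % (label, len(blob), seconds))

# Sanity check: the gzipped JSON decodes back to the original records.
assert json.loads(gzip.decompress(to_json_gz()).decode("utf-8")) == records
```

Timings from a single run are noisy, so repeat the measurement a few times before reading much into small differences.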

Audio data URI

Turns out that you can use data URIs in the <audio> tag.

Just upload an MP3 file to http://dataurl.net/#dataurlmaker and you’ll get a long string starting with data:audio/mp3;base64...

Insert this into your HTML:

<audio controls src="data:audio/mp3;base64..."></audio>

That’s it – the entire MP3 file is embedded into your HTML page without requiring additional downloads.

This takes a bit more bandwidth than the MP3, and won’t work on Internet Explorer. But for modern browsers, and small audio files, it reduces the overall load time – sort of like CSS sprites.
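
If you would rather not upload the file anywhere, the same string can be built locally. This is a rough sketch: `to_data_uri` and `audio_tag` are names invented here, and the placeholder bytes stand in for a real MP3 file.

```python
import base64


def to_data_uri(data, mime_type="audio/mp3"):
    """Return a base64 data URI for the given bytes."""
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime_type, encoded)


def audio_tag(data, mime_type="audio/mp3"):
    """Wrap the data URI in an <audio> element."""
    return '<audio controls src="%s"></audio>' % to_data_uri(data, mime_type)


# Placeholder bytes, not a playable file. For a real file, read it in binary mode:
#   with open("song.mp3", "rb") as handle:
#       data = handle.read()
sample = b"not a real mp3"
print(audio_tag(sample))
```

Base64 turns every 3 bytes into 4 characters, which is where the extra bandwidth comes from: the embedded string is about a third larger than the file.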

So, on my bus ride today, I built a little HTML5 musical keyboard that generates data URIs on the fly. Click to play.

keyboard
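
Generating audio on the fly amounts to synthesising the samples, wrapping them in a file header, and base64-encoding the result. The sketch below shows that idea in Python. It is not the keyboard’s code, which runs in the browser, and the choice of WAV, the sample rate and the name `tone_data_uri` are all assumptions made for illustration.

```python
import base64
import io
import math
import struct
import wave


def tone_data_uri(frequency=440.0, seconds=0.5, rate=8000):
    """Return a data URI holding a mono, 16-bit WAV sine tone."""
    n_frames = int(seconds * rate)
    # Half-amplitude sine wave packed as little-endian signed 16-bit samples.
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * frequency * i / rate)))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes per sample
        wav.setframerate(rate)
        wav.writeframes(frames)
    return "data:audio/wav;base64," + base64.b64encode(buf.getvalue()).decode("ascii")


uri = tone_data_uri(frequency=261.63)  # roughly middle C
print(uri[:60] + "...")
```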

Donations for Sanskrit College

The following article appeared in The Times of India earlier this month.

The institute is struggling for funds. Please contribute, if you could, by calling +91 44 24985320 or via PayPal.

Sanskrit centre struggles to stay alive

The Kuppuswami Sastri Research Institute attached to the Sanskrit College in Mylapore is in doldrums because of lack of government patronage.

The Institute, one of the three involved in Sanskrit research in the country, has been surviving on private donations. With not enough resources, the management is unable to pay the faculty the benefits of the sixth pay commission.

Institute director V Kameswari said the Union government stopped its financial support in 1995, after which it has been solely dependent on donations. “The institute has a trove of rare palm leaf manuscripts and books not just about Sanskrit literature but also on architecture, fine arts, geography, history and astronomy in Sanskrit,” says Kameswari.

The two other such institutes are the R G Bandarkar Sanskrit Institute in Pune and the Ganganath Jha Sanskrit Institute in Allahabad. “We have requested a onetime grant from the Union planning commission and also annual assistance from the Rashtriya Sanskrit Sansthan, but are yet to get any support,” says K S Balasubramanian, deputy director of the institute. The plan panel had given grants to the Mumbai Asiatic Society and Kolkata-based Asiatic Society.

The institute was getting about 10 lakh till 1995 but due to a misunderstanding between the government-appointed members of the governing committee and the management, the aid was stopped. Today, there are 24 scholars at the institute, most of them women doing their PhDs. “Scholars from across the country and world visit the institute. We send out publications to many foreign universities and they in turn send their publications which are preserved here,” says Kameswari.

The institute was started as a private non-profit organisation in 1944 in memory of Kuppuswami Sastri, a renowned Sanskrit scholar. It has a library with books on astronomy, architecture, fine arts, mathematics, Vedas, Puranas, Upanishads and various branches of science.

“A private entrepreneur made a donation with which we have air-conditioned the library. The palm-leaf manuscripts in the library are 600 to 1,000 years old. Many of them are in Grantha script. We also have books on Jainism that speak about solving mathematical equations and explain geographical concepts,” says Kameswari, who is worried about keeping the ancient language alive.

Downloading songs from YouTube

Five years ago, I built a song search engine – mainly because I needed to listen to songs. Three years ago, I stopped updating it – mainly because I stopped listening to songs actively, and have been busy since. For those of you who have been using my site for music: my apologies.

These days, I don’t really find the need to download music. YouTube has most of the songs I need. Bandwidth is pretty good too even when on the move.

But when I do need to download music, this is my new workflow.

  1. Find the song on YouTube. (Misspellings are still an issue, but you’ll usually find what you need)
  2. Download the video. Keepvid is the simple option. youtube-dl is the geek’s option (for multiple downloads)
  3. Use VLC – the Swiss Army knife of media – to convert the video into an MP3.

That last step requires a bit of explaining. It’s very simple once you know how, but it took me a few months to get it right. So here goes.

Select the Convert / Save option in the Media menu.

audio-conversion-1

Click on Add to open the file you want to convert. You can pick a track from a disc as well if you want to rip an audio CD or a DVD.

audio-conversion-2

Choose the file.

audio-conversion-3

Click on Convert / Save.

audio-conversion-4

Type the destination filename. Make sure you type the full file name, and not just the name of the folder.

audio-conversion-5

Select the output format you want under Settings – Profile. You can tweak the bitrate with the settings button, but I usually don’t bother.

audio-conversion-6

When you click on the Start button, the file will be converted or the CD will be ripped. You’ll see the position marker move fairly fast.

audio-conversion-7

 

The only problem I have with this method is that I can’t seem to do batch conversions easily enough with the GUI. Does anyone have any other workflow they like?

Update (31 Jul 2012): Aditya Sengupta suggests the following: (should’ve guessed VLC would have something up its sleeve)

vlc -I dummy $FILENAME --no-sout-video --sout "#transcode{acodec=mp3,ab=AUDIO_BITRATE,channels=2}:std{access=file,mux=raw,dst=$NAME.mp3}" vlc://quit
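
That command makes batch conversion a matter of looping over files. Here is a rough sketch, assuming `vlc` is on your PATH and that `ab` is the transcode option for audio bitrate. The helper names `vlc_command` and `convert_all`, the `*.mp4` pattern and the default bitrate of 128 are made up for illustration.

```python
import subprocess
from pathlib import Path


def vlc_command(source, bitrate=128):
    """Build the VLC command line that converts one file to MP3."""
    target = Path(source).with_suffix(".mp3")
    sout = (
        "#transcode{acodec=mp3,ab=%d,channels=2}"
        ":std{access=file,mux=raw,dst=%s}" % (bitrate, target)
    )
    return ["vlc", "-I", "dummy", str(source), "--no-sout-video",
            "--sout", sout, "vlc://quit"]


def convert_all(folder, pattern="*.mp4", bitrate=128):
    """Convert every matching file in folder, one VLC run per file."""
    for source in sorted(Path(folder).glob(pattern)):
        subprocess.run(vlc_command(source, bitrate), check=True)


# Show the command for one file without running it.
print(" ".join(vlc_command("song.mp4")))
```

Calling `convert_all("downloads")` would then work through a hypothetical `downloads` folder. File names containing commas or braces may confuse the `--sout` string, so rename those first.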

Correlating subjects

A question from Dorai got me thinking: does being good at maths help in programming?

I don’t have a personal view. But since Reportbee has data on the Class 12 examination results for the last three years, we thought we could do a bit of analysis.

Here’s the correlation of the scores of various subjects with Computer Science.

Correlation Subject
0.79 CHEMISTRY
0.79 PHYSICS
0.75 ENGLISH
0.75 MATHEMATICS
0.72 LANGUAGE
0.67 BIOLOGY
0.66 ECONOMICS
0.66 COMMERCE
0.65 ACCOUNTANCY
0.56 HISTORY
0.52 GEOGRAPHY

It almost breaks neatly into four groups.

  1. Physics & Chemistry, both of which have a correlation of 0.79, and clearly are the most correlated with Computer Science
  2. Maths, English & Language, which have a correlation of 0.72 – 0.75
  3. Biology, Economics, Commerce and Accountancy, which hover at around 0.66
  4. History & Geography, which are 0.52 – 0.56

The results in 2010 are almost exactly the same.

Correlation Subject
0.78 PHYSICS
0.78 CHEMISTRY
0.75 ENGLISH
0.75 MATHEMATICS
0.73 LANGUAGE
0.67 ACCOUNTANCY
0.65 ECONOMICS
0.65 COMMERCE
0.64 BIOLOGY
0.60 GEOGRAPHY
0.55 HISTORY
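
For reference, here is a minimal sketch of how such figures can be computed, assuming they are ordinary Pearson correlation coefficients between students’ marks in two subjects. The marks below are invented for five imaginary students and have nothing to do with the real data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up marks, one entry per student, in the same student order for each subject.
scores = {
    "COMPUTER SCIENCE": [62, 75, 81, 90, 55],
    "PHYSICS": [58, 70, 85, 88, 50],
    "HISTORY": [70, 60, 75, 65, 72],
}

base = scores["COMPUTER SCIENCE"]
for subject in ("PHYSICS", "HISTORY"):
    print("%.2f %s" % (pearson(base, scores[subject]), subject))
```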

I’m not sure what it is that leads to this kind of correlation. In fact, the full correlation between every pair of subjects (for 2011) is below:

subject-correlation

What inferences would you draw from this?

And what do you think is the reason for this?

The three Rs

Reading, wRiting and aRithmetic are the 3 ‘R’s that are taught at school. I was thinking about their relevance today.

Reading continues to be relevant. The volume of information available today is more than before. So you need to read faster AND smarter. (If there was one good thing that came out of my IIM coaching classes, it was the ability to read fast, and making it subconscious.)

But I wouldn’t say the same of writing. In the last 10 years, I have typed several hundred more pages than I’ve written. So have all my friends.

Yesterday, I was at a bank with a relationship manager as he was taking notes with paper and pen. I do the same on occasion. I looked at his notes later. I could not understand a single word. “Don’t worry, sir, I can read it. I’ll type it out and mail you,” he said. And he did.

Writing seems to have become a device for personal memory, not communication. He’s faster at writing than typing, perhaps. Or note taking is more convenient on paper. But for communication, he still prefers a typed format. So do I, and most other people.

Perhaps writing will fade. Perhaps not. I don’t know. But what I do know is that typing has become more important than writing. Yet, writing is taught more at school than typing.

(A broader aspect of writing, though, is expressing oneself. That will remain important, of course.)

The third R is aRithmetic. When I was 12, I could multiply four-digit numbers in my head reasonably well. I could recite 50 digits of Pi. I could do long division. Today, I can’t. Nor can my friends. Nor have we needed to. A good feel for the numbers has helped, but not the actual mechanics of the calculations.

We had an undergraduate course in statistics that taught us how to solve a linear regression problem. That skill went completely unused. I’ve never since used regression without a computer. We had a graduate course in statistics that taught us how to INTERPRET the results of a linear regression. That was worth its weight in gold.

This is not a critique of the three Rs. Rather, an attempt to re-interpret them. It’s about comprehension, expression and computation. Two decades ago, it was reading, writing and arithmetic. Today, it’s reading, typing and computing.

Computers will grow more powerful. It may be worth planning for it. Teaching the ability to use them can go a long way. A tool like Excel for general purpose computing puts incredible power in the hands of people. It’s worth training children for that.

If I oversimplified, I’d say children must learn typing and Excel.

Over the next few years, this is something I plan to work on. Making sure schools and parents do this. Any suggestions or leads you may have are welcome!

Scraping for a laptop

I’ve returned my laptop, and it’s time to buy a new one. For the first time in my life, I’m buying a laptop for myself.

I have a fairly clear idea of what I want: a 500GB+ 7200 rpm hard disk with 4GB of RAM and an Intel Core i7. I thought that would make it easy to find one of those powerful laptops for producing music, since I record some stuff too as a hobby.

Sheer naïveté. Not a single site let me filter by hard disk rpm in India. (To be fair, I haven’t found any sites outside India that did that either.)

After spending a good two hours hunting for the details and collating them, I did what I normally would: spend 30 minutes writing a scraper. The scraper runs through all laptops on Flipkart and pulls out all of their specs. Thanks to the diligence of the good folks at Flipkart, this information is readily available on each page. The HTML is structured quite neatly too, so it was just a 30-line program to scrape it all. Full credit to ScraperWiki as well: I could use it on a netbook without any developer tools installed.

The scraper took 2 hours to run. Feel free to filter through the output (CSV) for your favourite laptop, or fork the code and pull any other data you like.
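
The general shape of such a scraper is: pull the key/value pairs out of the spec table on each product page, then write one CSV row per laptop. Below is a minimal sketch of that shape, not the actual scraper. The `SAMPLE_PAGE` markup, its class names and the field names are invented for illustration and will not match Flipkart’s real pages, and fetching the pages is left out.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup, invented for this example.
SAMPLE_PAGE = """
<h1>Example Laptop X1</h1>
<table class="specs">
  <tr><td class="key">Processor</td><td class="value">Core i7</td></tr>
  <tr><td class="key">RAM</td><td class="value">4 GB</td></tr>
  <tr><td class="key">HDD Capacity</td><td class="value">500 GB</td></tr>
  <tr><td class="key">RPM</td><td class="value">7200</td></tr>
</table>
"""
FIELDS = ["Processor", "RAM", "HDD Capacity", "RPM"]


class SpecParser(HTMLParser):
    """Collect key/value pairs from <td class="key"> and <td class="value"> cells."""

    def __init__(self):
        super().__init__()
        self.specs = {}
        self._cell = None  # class of the <td> we are inside, if any
        self._key = None   # most recent key cell text

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._cell = dict(attrs).get("class")

    def handle_endtag(self, tag):
        if tag == "td":
            self._cell = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._cell == "key":
            self._key = text
        elif self._cell == "value" and self._key:
            self.specs[self._key] = text


def scrape(page):
    """Return the spec dictionary for one product page."""
    parser = SpecParser()
    parser.feed(page)
    return parser.specs


def to_csv(rows, fields):
    """Render a list of spec dictionaries as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


specs = scrape(SAMPLE_PAGE)
print(to_csv([specs], FIELDS))
```

With every laptop in one CSV, filtering by something like hard disk rpm is a one-line job in a spreadsheet.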

The next chapter of my life

I’m writing this post on a one-way flight from London back to India. I’ve moved on from Infosys Consulting, and am starting up on my own.

I’ve wanted to do this for a long time. There’s always more freedom in your own company than someone else’s. There’s often more money in it too, if you’re lucky enough. But my upbringing was a bit too conservative for me to take that bold step. However, given that my father runs his own firm, I figured it was just a question of time for me to do the same.

Two years ago, in Jan 2010, I picked up Rashmi Bansal’s Stay Hungry Stay Foolish at an airport. That book killed the last bit of resistance I had. If the people in that book could succeed, I felt I could too. And if what they did (building small companies, not huge ones) could be called a success, I could be successful too.

After the flight, it was clear in my mind. I would be an entrepreneur. I would create a small company that would probably fold. Then I’d do it again. And again, 10 times, because 1 in 10 companies survive. And finally, I’d be running a small business that’d be called successful by virtue of having survived. A modest, achievable ambition that I had the courage for.

I usually make big decisions without analysis, by just sleeping on them. I slept on it and announced it to my family the next day. I’m not sure they believed me.

Two months later, along with a friend, I built a dynamic digital image resizing product. We had our wives start a company in the UK, and tried selling it to retailers. There clearly was a demand. The problem was, we didn’t know how to sell. After a year and having spent £500 with no sales, it was clear to us that venture #1 had failed. We eventually shut it down.

In the middle of this, my ex-boss from IBM told me that he was looking to start a venture, focusing on mobile, rural BPO and energy management. This later on changed to data analytics and visualisation. They all sounded like fun, so I said I’d help out in my spare time.

A few months later, a classmate told me he’d started a business digitising school report cards. That sounded like fun too, so I said I’d help out in my spare time.

Now, if that sounds like I had a lot of spare time on my hands — you’re right, I did. And it’s time to talk about the jobs in my life. My first 3 years at IBM were fun. I was coding, learning, and leading a bachelor’s life with friends, money, and no responsibilities. My 4 years at BCG were strenuous with 80-hour weeks, but it was interesting and challenging. I was newly married, and between work and home responsibilities, I had no time for fun.

I moved to Infosys Consulting in the UK with the specific aim of rectifying that (and for health reasons as well). In the last 7 years, the work has (except on occasion) been a bit boring, but very relaxing. On most days, I would spend 4 hours working, and 4 hours learning new stuff. The things I learnt only helped me be more efficient. So I ended up getting even more work done in less time.

Many things came out of this. Firstly, I recovered my health. We had a daughter, and I spent more time with her. I started coding in earnest again. By 2007, I was writing code as part of my projects: stuff that others, whose job it was, were unable to do. By 2009, I had a few websites running, like an Indian music search engine, an IMDb Top 250 tracker, a few transliterators, and so on.

So when I said I’d help out with these startups, it wasn’t an empty promise. For the last 18 months, I’ve had a day job and three night jobs. I never did justice to any of them in my opinion, but I had more fun than ever in my life, I learnt more than ever in my life, and I produced more tangible output than ever in my life. Sometimes, quantity beats quality or reliability.

Both these startups are doing well today. Gramener.com offers data visualisation and IT services. I will be joining them as Chief Data Scientist. Reportbee.com offers a hosted report card solution. I will continue helping them out. And I will continue working with a few NGOs.

You’ll see me a lot more active online now. I can publicly write about my work — something I’ve been unable to do the last 11 years.

I am relocating to Bangalore. From a professional front, it’s an obvious choice. That’s where the geeks are. In my last visit to India, I was in Bangalore, Chennai and Hyderabad. In the latter two, it’s tough to meet geeks. And when you do, it’s no easier to find the next. Bangalore has many more geeks, and they’re fairly well networked.

From a personal front, too, Bangalore works well. It’s close enough to Chennai without actually being in Chennai.

It’s 10am on Thu 12th Jan. Our flight is descending into Delhi airport. It’s the start of a new chapter in my life. Scary, but exciting. Wish me luck!