Inspecting code in Python

Lisp users would laugh, since they have macros, but Python supports some basic code inspection and modification. Consider the following piece of code:

    margin = lambda v: 1 - v['cost'] / v['sales']

What if you wanted another function that lists all the dictionary indices used in the function? That is, what if you wanted to extract cost and sales? This is a real-life problem I encountered this morning. I have 100 functions, each defining a metric. For example, ...
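The excerpt is cut off before any approach is shown, so the following is only a minimal sketch and not necessarily the method the post goes on to describe. Instead of inspecting the code itself, it calls the function with a stand-in object that records every key looked up. The names KeyRecorder and keys_used are hypothetical, invented for this illustration.

```python
class KeyRecorder:
    """Stand-in for the input dict that remembers every key looked up."""

    def __init__(self):
        self.keys = []

    def __getitem__(self, key):
        if key not in self.keys:
            self.keys.append(key)
        # Return a harmless non-zero number so arithmetic in the metric works.
        return 1.0


def keys_used(func):
    """Call func with a recorder and return the keys it indexed, in order."""
    recorder = KeyRecorder()
    func(recorder)
    return recorder.keys


margin = lambda v: 1 - v['cost'] / v['sales']
print(keys_used(margin))  # ['cost', 'sales']
```

This only sees keys on the code path that actually runs, so a metric with an if/else branch may hide some keys. Inspecting the source or bytecode avoids that, at the cost of more fragile code.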

Restartable and Parallel

When processing data at a large scale, there are two characteristics that make a huge difference to my life. Restartability. When something goes wrong, being able to continue from where it stopped. In my opinion, this is more important than parallelism. There’s nothing as depressing as having to start from scratch every time. Think of it as the ability to save a game as opposed to starting from Level 1 in every life. ...
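As a rough illustration of the "save game" idea, here is a minimal sketch that assumes each unit of work can be identified by a key and that its result fits in JSON. The function name process_all and the checkpoint file layout are assumptions made for this example, not something the post specifies.

```python
import json
import os
import tempfile


def process_all(items, work, checkpoint_path):
    """Run work(item) for each item, skipping items finished in an earlier run."""
    done = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            done = json.load(f)
    for item in items:
        key = str(item)  # JSON object keys are always strings
        if key in done:
            continue  # already processed before the last crash
        done[key] = work(item)
        # Write to a temporary file and rename it, so that a crash in the
        # middle of a write cannot corrupt the existing checkpoint.
        tmp_path = checkpoint_path + '.tmp'
        with open(tmp_path, 'w') as f:
            json.dump(done, f)
        os.replace(tmp_path, checkpoint_path)
    return done


checkpoint = os.path.join(tempfile.mkdtemp(), 'checkpoint.json')
squares = process_all(range(5), lambda n: n * n, checkpoint)
print(squares)
```

If the process dies halfway, running the same call again picks up from the first item that is missing from the checkpoint file. Saving after every item is simple but slow for very cheap work; saving every N items is the usual trade-off.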

Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien? Consider this mini-paper. ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also find significant evidence of improbable events at round numbers. Electricity shortage is a serious problem in most Indian states. Part of this problem is due to the inaccuracy of reporting procedures used in monitoring meter readings. Our focus here is not to document or experimentally determine the degree of inaccuracy. We have adopted a data driven approach to this problem and attempt to model the extent of inaccuracy using basic statistical analysis techniques such as histograms and the comparison of means. Our dataset comprises of the frequency analysis 12-month dataset containing monthly meter readings of 1.8 million customers in the State of Andhra Pradesh. We find that a histogram of these readings shows unexpectedly high values at the slab boundaries: 50 (+45.342%, t > 13.431), 100 (+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300 (+42.138%, t > 19.958). We also detected spikes at round numbers: 10 (+15.341%, t > 5.315), 20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319). The statistical significance of every deviation listed above is over 99.9%. Further, every deviation has a positive mantissa. This leads us to confidently declare the existence of a systematic bias in the meter readings analysed. You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!” ...

Colour spaces

In reality, a colour is a combination of light waves with frequencies between 400 and 700 THz, just like sound is a combination of sound waves with frequencies from 20 to 20,000 Hz. Just like mixing various pure notes produces a new sound, mixing various pure colours (like those in a rainbow) produces new colours (like white, which isn’t on the rainbow). Our eyes aren’t like our ears, though. They have 3 sensors that are triggered differently by different frequencies. The sensors roughly peak around red, green and blue. Roughly. ...

Style of blogging

Until 2007, my blog was mostly just linking to stuff I found interesting on the Web. Since 2007, I’ve tried to write longer articles, mostly based on my own experiences. At the moment, that’s unsustainable. Right now, being in a startup, I’m doing more stuff than I ever have in the past. (That does not mean working more hours, by the way.) My posts, going forward, are likely to be smaller, less original, but hopefully more frequent. ...

Is Protocol buffers worth it?

Google’s Protocol Buffers is a “language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler”. XML is slow and large. There’s no doubting that. JSON’s my default alternative, though it’s a bit large. CSV’s ideal for tabular data, but ragged hierarchies are a bit difficult. I was trying to see if Protocol Buffers would be smaller and faster, at least when using Python. I took JSON as the base, and checked the write speed, read speed and file sizes. Here’s the comparison: ...

Audio data URI

Turns out that you can use data URIs in the <audio> tag. Just upload an MP3 file to http://dataurl.net/#dataurlmaker and you’ll get a long string starting with data:audio/mp3;base64... Insert this into your HTML: <audio controls src="data:audio/mp3;base64..."> That’s it – the entire MP3 file is embedded into your HTML page without requiring additional downloads. This takes a bit more bandwidth than the MP3, and won’t work on Internet Explorer. But for modern browsers, and small audio files, it reduces the overall load time – sort of like CSS sprites. ...
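If you would rather not upload the file anywhere, the same string can be built locally. This is a minimal sketch in Python; the function names audio_data_uri and audio_tag are made up for this example, and the MIME type defaults to the audio/mp3 used above (the officially registered type is audio/mpeg, which may be the safer choice).

```python
import base64


def audio_data_uri(mp3_bytes, mime='audio/mp3'):
    """Return a data URI that embeds the given audio bytes as base64."""
    encoded = base64.b64encode(mp3_bytes).decode('ascii')
    return 'data:{};base64,{}'.format(mime, encoded)


def audio_tag(mp3_bytes):
    """Return an <audio> element with the audio embedded in its src."""
    return '<audio controls src="{}"></audio>'.format(audio_data_uri(mp3_bytes))


# Placeholder bytes for illustration only; a real page would use the
# contents of an MP3 file, e.g. open('song.mp3', 'rb').read()
print(audio_tag(b'abc'))
```

Base64 turns every 3 bytes into 4 characters, which is where the extra bandwidth comes from: the embedded string is roughly a third larger than the original file.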

Recent Tamil Songs Quiz

After a long break, here's another quiz, featuring relatively recent Tamil songs. Can you guess which movie they are from? Don't worry about the spelling. Just spell it like it sounds, and the box will turn green.

Donations for Sanskrit College

The following article appeared in The Times of India earlier this month. The institute is struggling for funds. Please contribute, if you could, by calling +91 44 24985320 or via PayPal. Sanskrit centre struggles to stay alive The Kuppuswami Sastri Research Institute attached to the Sanskrit College in Mylapore is in doldrums because of lack of government patronage. The Institute, one of the three involved in Sanskrit research in the country, has been surviving on private donations. With not enough resources, the management is unable to pay the faculty the benefits of the sixth pay commission. Institute director V Kameswari said the Union government stopped its financial support in 1995, after which it has been solely dependent on donations. "The institute has a trove of rare palm leaf manuscripts and books not just about Sanskrit literature but also on architecture, fine arts, geography, history and astronomy in Sanskrit," says Kameswari. The two other such institutes are the R G Bandarkar Sanskrit Institute in Pune and the Ganganath Jha Sanskrit Institute in Allahabad. "We have requested a onetime grant from the Union planning commission and also annual assistance from the Rashtriya Sanskrit Sansthan, but are yet to get any support," says K S Balasubramanian, deputy director of the institute. The plan panel had given grants to the Mumbai Asiatic Society and Kolkata-based Asiatic Society. The institute was getting about 10 lakh till 1995 but due to a misunderstanding between the government-appointed members of the governing committee and the management, the aid was stopped. Today, there are 24 scholars at the institute, most of them women doing their PhDs. "Scholars from across the country and world visit the institute. We send out publications to many foreign universities and they in turn send their publications which are preserved here," says Kameswari. 
The institute was started as a private non-profit organisation in 1944 in memory of Kuppuswami Sastri, a renowned Sanskrit scholar. It has a library with books on astronomy, architecture, fine arts, mathematics, Vedas, Puranas, Upanishads and various branches of science. "A private entrepreneur made a donation with which we have air-conditioned the library. The palm-leaf manuscripts in the library are 600 to 1,000 years old. Many of them are in Grantha script. We also have books on Jainism that speak about solving mathematical equations and explain geographical concepts," says Kameswari, who is worried about keeping the ancient language alive. ...

Downloading songs from YouTube

Five years ago, I built a song search engine – mainly because I needed to listen to songs. Three years ago, I stopped updating it – mainly because I stopped listening to songs actively, and have been busy since. For those of you who have been using my site for music: my apologies. These days, I don’t really find the need to download music. YouTube has most of the songs I need. Bandwidth is pretty good too even when on the move. But when I do need to download music, this is my new workflow. ...

Correlating subjects

A question from Dorai got me thinking: does being good at maths help in programming? I don’t have a personal view. But since Reportbee has data on the Class 12 examination results for the last three years, we thought we could do a bit of analysis. Here’s the correlation of the scores of various subjects with Computer Science.

Correlation  Subject
0.79         CHEMISTRY
0.79         PHYSICS
0.75         ENGLISH
0.75         MATHEMATICS
0.72         LANGUAGE
0.67         BIOLOGY
0.66         ECONOMICS
0.66         COMMERCE
0.65         ACCOUNTANCY
0.56         HISTORY
0.52         GEOGRAPHY

It almost breaks neatly into four groups. ...
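For readers who want to run a similar comparison on their own data, here is a minimal sketch of the Pearson correlation coefficient in plain Python. The post does not say which correlation measure was used, so Pearson is an assumption, and the scores below are made-up numbers, not Reportbee data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical marks for five students, for illustration only
maths = [55, 62, 70, 81, 90]
computer_science = [58, 60, 75, 79, 94]
print(round(pearson(maths, computer_science), 2))
```

A value near 1 means students who score well in one subject tend to score well in the other; it says nothing about whether one skill causes the other.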

The three Rs

Reading, wRiting and aRithmetic are the 3 ‘R’s that are taught at school. I was thinking about their relevance today. Reading continues to be relevant. The volume of information available today is more than before. So you need to read faster AND smarter. (If there was one good thing that came out of my IIM coaching classes, it was the ability to read fast, and to do it subconsciously.) But I wouldn’t say the same of writing. In the last 10 years, I have typed several hundred more pages than I’ve written. So have all my friends. ...

Scraping for a laptop

I’ve returned my laptop, and it’s time to buy a new one. For the first time in my life, I’m buying a laptop for myself. I have a fairly clear idea of what I want: a 500GB+ 7200 rpm hard disk with 4GB of RAM and an Intel Core i7. I thought that would make it easy to find one of those powerful laptops for producing music, since I also record some stuff as a hobby. Sheer naïveté. Not a single site let me filter by hard disk rpm in India. (To be fair, I haven’t found any sites outside India that did that either.) ...

The next chapter of my life

I’m writing this post on a one-way flight from London back to India. I’ve moved on from Infosys Consulting, and am starting up on my own. I’ve wanted to do this for a long time. There’s always more freedom in your own company than someone else’s. There’s often more money in it too, if you’re lucky enough. But my upbringing is a bit too conservative to make that bold step. However, given that my father runs his own firm, I figured it was just a question of time for me to do the same. ...

Markdress

This year, I’ve converted the bulk of my content into Markdown – a simple way of formatting text files in a way that can be rendered into HTML. Not out of choice, really. It was the only solution if I wanted to: (1) edit files on my iPad / iPhone (I’ve started doing that a lot more recently), (2) allow the contents to be viewable as HTML as well as text, and (3) allow non-techies to edit the file. As a bonus, it’s already the format Github and Bitbucket use for markup. ...

GarageBand in Phir Se Ud Chala

A month ago, I was at the theatre watching Ra.One. The movie was terrible, yet enjoyable. But I’m going to talk about something else – a song I heard that caught my imagination. The song is Phir Se Ud Chala from Rockstar. Around 14 seconds into the video, you’ll hear a guitar start off in the background. That’s what caught my ear first – because I’d heard it before. Listen to this piece below: ...

Protect static files on Apache with OpenID

I moved from static HTML pages to web applications and back to static HTML files. There’s a lot to be said for the simplicity and portability of a bunch of files. Static site generators like Jekyll are increasingly popular; I’ve built a simple publisher that I use extensively. Web apps give you something else, though, that is still useful on a static site: access control. I’ve been resorting to htpasswd to protect static files, and it’s far from optimal. I don’t want to know or manage users’ passwords. I don’t want them to remember a new ID. I just want to allow specific people to log in via their Google Accounts. (OpenID is too confusing, and most people use Google anyway.) ...

Codecasting

The best way to explain code to a group of people is by walking through it. If they’re far away in space or time, then a video is the next best thing. You can also recommend that they try out the best coding apps. The trouble with videos, though, is that they’re big. I can’t host them on my server – I’d need YouTube. Editing them is tough. You can’t copy & paste code from videos. And so on. One interesting alternative is to use presentations with audio. Slideshare, for instance, lets you share slides and sync them with audio. That almost works. But it’s still not good enough. I’d like code to be stored as code. What I really need is codecasting: a YouTube or Slideshare for code. The closest I’d seen until the day before yesterday was etherpad or ttyrec – but neither supports audio. Enter Popcorn. It’s a Javascript library from Mozilla that, among other things, can fire events when an audio/video element reaches a particular point. ...

Javascript arrays vs objects

Summary: Arrays are a lot smaller than objects, but only slightly faster on newer browsers. I’m writing an in-memory Javascript app that handles several thousand rows. Each row could be stored either as an array [1,2,3] or an object {"x":1,"y":2,"z":3}. Having read up on the performance of arrays vs objects, I thought I’d do a few tests on storing numbers from 0 to 1 million. The results for Chrome are below. (Firefox 7 was similar.) ...

Software for my new laptop 2

Time for a new laptop, and to replace software. Here’s my new list. A lot has changed in the last 5 years. Mainly, I use the browser, cygwin and Portable Apps a lot more. (The last is to escape jailers, not registry bloat.)

Media
Chrome [new]: For browsing and development. Fast, light, and stays out of the way.
Firefox: I keep it just for printing. Chrome sucks at printing.
Media Player Classic: Nothing against it, but I decided to stick to just one app, which is…
VLC: Continues to be the best media player, IMHO.
WinAmp: I just manage my playlists as M3U files, using Python programs.
Audacity: Still the easiest way to record audio.
Camstudio: The simplest free portable screen capture software I know.
PicPick [new]: Lightweight, powerful screenshot grabber.
VirtualDub: Not the simplest, but still good for what I need: cropping and joining video.
MediaCoder [new]: Good for video/audio conversions. Maybe I’ll install this later.
Foxit Reader: The simplest free portable PDF reader I know, better than…
NitroPDF Reader [new]: … which is good for printing PDFs – better than…
Primo PDF: … which has trouble on rare occasions.
Microsoft Reader: I have a lot of ebooks in .LIT.
Kindle for PC [new]: I don’t own a Kindle, but I’ve bought a few ebooks.
Paint.NET: Good enough for cropping and adjusting colours on images.
Windows Live Writer [new]: The best way to write this blog WYSIWYG.
Inkscape [new]: I occasionally edit vector graphics.
Google Earth: Google Maps is good enough.
ImgBurn: I no longer use CDs/DVDs. Just flash drives and external hard disks.
Picasa: I’ve stopped browsing pictures. No time.
Sharing ...