Year: 2005

Excel – Avoid manual labour 3

A corollary of Rule 3: Never type the same formula twice. Design the formula so that if you cut and paste it elsewhere, it works correctly. The $ symbol and the F4 key for cell references help in 90% of the cases. For complex requirements and large data, 5 functions come in handy: INDIRECT, OFFSET, ADDRESS, ROW and COLUMN.

I once did a survey, and had data spread across 300 sheets (same format on all sheets). I needed cell D3 across all sheets in a column, to summarise the results. The image explains what I did.

Excel snapshot

INDIRECT returns the value of a cell. INDIRECT(“Sheet2!D3″) is the value of cell D3 in Sheet2. And INDIRECT(CONCATENATE(A2,”!D3″)) will give you the value of cell D3 in whatever sheet A2 specifies! I created a list of sheet names in column A, and column B had “D3” in each of those sheets. In effect, INDIRECT can transpose sheets into columns.

Getting a list of sheet names on to a column is tough, however. If your sheets are sequentially numbered, this image shows a trick that may help.

Excel snapshot

If you need multiple cells from a sheet, say D3:Z3, use the ADDRESS, ROW and COLUMN functions. ROW(D3) returns 3, and COLUMN(D3) returns 4 — the respective row and column. If you copy ROW(D3) to multiple rows, you will see ROW(D3), ROW(D4), ROW(D5), … which are 3, 4, 5 respectively. Similarly for COLUMN. It’s a useful way of linking values to position.

Excel snapshot

ADDRESS does the opposite. ADDRESS(ROW(D3),COLUMN(D3)) = ADDRESS(3,4) = “$D$3”. ADDRESS(3,4,1,1,”Sheet2″) returns “Sheet2!$D$3”. (See help for the ,1,1 in the middle, and just put it in always.) To cells D3:Z3 from all the sheets, copy the formula INDIRECT(ADDRESS(3,COLUMN(D3),1,1,$A2)) to the entire range. The INDIRECT, ADDRESS, ROW/COLUMN combination can slice contiguous data across sheets in any way you want.

Another useful function is OFFSET. OFFSET(D3,2,1) returns the value in cell E5. It shifts the reference D3 down by 2 rows and right by 1 column. OFFSET can be used instead of the INDIRECT and ADDRESS when multiple sheets are not involved. OFFSET can also return a range. OFFSET(D3,0,0,2,2) returns the range D3:E4, which is the 2×2 range starting from 0,0. So SUM(OFFSET(D3,0,0,2,2)) is the same as SUM(D3:E4). With OFFSET, you can specify a range with variable position and variable size (which you can’t with $ references).

Once, we were modelling a leasing company’s accounts. (Warning: this is a complex example.) We knew the volume of loans they would disburse over the next 3 years. The monthly interest rate is, say, 1%. What would be their interest income every month? Well, it’s not just 1% of what they’ve lent out. Customers pay back in equal monthly installments (EMIs). The EMI includes the principal and the interest. Initially, the EMI has a large interest component and very little principal, because there’s a lot left to repay. Towards the end, the balance dies down and so does the interest; it’s mainly the balance principal that’s being repaid. The interest income is not the same every month even for a single lease.

The calculation is conceptually simple. The IPMT function tells you the monthly interest each month. Let’s say all leases are for 36 months. So, to calculate the March interest income, take the January disbursals and multiply it by the third month interest component: IPMT(1%,3,36,-1). Take the Feb disbursals and multiply it by IPMT(1%,2,36,-1). Take the March disbursals and multiply it by IPMT(1%,1,36,-1). And add them up. For April, you’d add 4 terms. And so on. Mathematicians call this a convolution. It’s like a SUMPRODUCT of a series with another series in reverse.

Excel snapshot

Cell E4 on the image alongside does exactly that for month 3 (March). There are 5 columns:
A: Month
B: Amount disbursed that month
C: Months in reverse
D: Interest component for month in reverse
E: Interest income for month
E4 is the sumproduct of B2 to B4 (the Jan, Feb and March disbursals) with something else — an OFFSET. The offset says, from D1, move down C4 (34) rows and select A4 (3) cells further down. This has the interest components for the first, second and third months in reverse. So, the disbursal for Jan is multiplied with the 3rd month’s interest, Feb with the 2nd month’s interest, and Mar with the 1st month’s interest. That’s exactly what we wanted.

It may look complex. But remember: you have to type this complex formula only once, not 36 times. (And in my case, I had 18 product types.) Also, you’re less likely to make typing errors when cutting and pasting. So this saves you debugging time as well.

Imitation is tougher than we thought

Research suggests that chimps learn differently from humans.

When they showed the chimpanzees how to retrieve the food, the researchers added some unnecessary steps. Those chimps could see that the scientists were wasting their time sliding the bolt and tapping the top. None followed suit. They all went straight for the door. The children could see just as easily as the chimps that it was pointless to slide open the bolt or tap on top of the box. Yet 80 percent did so anyway.

… humans are hard-wired to learn by imitation, even when that is clearly not the best way to learn. As human ancestors began to make complicated tools, figuring out goals might not have been good enough anymore. Hominids needed a way to register automatically what other hominids did, even if they didn’t understand the intentions behind them. They needed to imitate. Not long ago, many psychologists thought that imitation was a simple, primitive action compared with figuring out the intentions of others. But that is changing.

Herbert Simon on Information

Quote by Herbert Simon on Information:

What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

It sounds quite like the Scientific American article The Tyranny of Choice which says that after a point, more choice causes unhappiness. The satisfaction of picking the best choice is less, because the second best is almost as good. And you’re more likely to not pick the best — because there are so many choices — and will regret it more often.

Matching misspelt Tamil movie names

I don’t like hunting for new songs either. Too much effort.

External recommendations like Raaga Top 10 help, but not much. I usually like only 1 of the top 10.

I don’t really know the recent music directors. But many interesting songs I’ve heard recently (like Ondra Renda in Kakka Kakka, Vaseegara in Minnale, and Kaadhalikkum in Chellame) are by Harris Jayaraj. So maybe if I can find the music directors I like, other songs by them would be good recommendations.

I have an automated way to find the music director for a movie. First, I spent a few hours renaming my MP3s to a Movie.Song.mp3 filename format (using Excel and Perl liberally). After that, I wrote a Perl program that reads movie names and the movie directors from Raaga and matches the Raaga movie names with my movie names. (Raaga has all but 5 movies whose songs I’ve heard.) Then I rate music directors based on my songs’ ratings.

Unfortunately, the matching worked only for 45% of my 273 movies. The rest were spelt differently on my list and Raaga. I checked CPAN if there was a way to match Tamil words roughly. The closest was Lingua::Phonology, but Jesse, the author, mailed me saying that was “like slicing your bread with a chainsaw”.

So I developed these rules. The -> arrow below is to be read as “is also spelt as”. By just applying them sequentially, I matched 33% more movies.

Vowel rules
AEdhiri -> Edhiri
kadhal kondEIn -> kadhal kondEn
chellamEY at end-> chellamE
sachIEn -> sachIn
marupadIUm -> marupadIYUm
OI, OY, OVI, OYI are all the same
AAthma -> Athma
azhagiya thEEye -> azhagiya thIye
abOOrva ragam -> abUrva ragam
Ignore H. It is redundant.

Consonant rules
arasakTCHi -> arasakSHi
CHippikkul muthu -> Sippikkul muthu
thenNDRal -> thenNRal
devar maHan -> devar maGan
bagaWathi -> bagaVathi
avvai shanmuGi -> avvai shanmuKi
konJi pesalam -> konCHi pesalam
anDha 7 naatkal -> anTha 7 naatkal
aBoorva sagodharargal -> aPoorva sagodharargal
agni natchaTHIRam -> agni natchaTHRam

The remaining movies either had spelling mistakes (e.g. Kilipethcu Ketkavaa) or had structural differences (Ilamai Oonjal Aadugiradhu vs Ilamai Oonjal Aadudhu). By permitting approximate matches using String::Approx, I was able to match 12% more, making my total accuracy ~90%.

Though this is good enough for identifying music directors, I’m working on improving the approximate matching rules. I hope to have 98% accuracy, and then I can match individual songs — and know who the singers are. Hopefully, this can be extended to other sites like MusicIndiaOnline, and who knows — maybe even IMDb.