Correlating subjects

A question from Dorai got me thinking: does being good at maths help in programming?

I don’t have a personal view. But since Reportbee has data on the Class 12 examination results for the last three years, we thought we could do a bit of analysis.

Here’s the correlation of the scores of various subjects with Computer Science.

Correlation Subject
0.79 CHEMISTRY
0.79 PHYSICS
0.75 ENGLISH
0.75 MATHEMATICS
0.72 LANGUAGE
0.67 BIOLOGY
0.66 ECONOMICS
0.66 COMMERCE
0.65 ACCOUNTANCY
0.56 HISTORY
0.52 GEOGRAPHY
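For anyone who wants to try this on their own data, here is a minimal sketch of how such a table could be produced. It assumes the figures are Pearson correlation coefficients (the post does not say which measure was used) and that each student is a dict of subject to marks. The `SAMPLE` marks and the function names are made up for illustration; they are not the Reportbee data. Only students who took both subjects are paired up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of marks."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate_with(records, target):
    """Correlate every other subject with `target`, pairing only students who took both."""
    subjects = {s for marks in records for s in marks} - {target}
    result = {}
    for subject in subjects:
        pairs = [(m[subject], m[target]) for m in records
                 if subject in m and target in m]
        if len(pairs) >= 2:
            xs, ys = zip(*pairs)
            result[subject] = pearson(xs, ys)
    return result


# Hypothetical marks, one dict per student (NOT the Reportbee data).
SAMPLE = [
    {"COMPUTER SCIENCE": 90, "PHYSICS": 85, "HISTORY": 60},
    {"COMPUTER SCIENCE": 70, "PHYSICS": 72, "HISTORY": 75},
    {"COMPUTER SCIENCE": 50, "PHYSICS": 55, "HISTORY": 58},
    {"COMPUTER SCIENCE": 80, "PHYSICS": 78},  # did not take History
]

ranked = sorted(correlate_with(SAMPLE, "COMPUTER SCIENCE").items(),
                key=lambda item: -item[1])
for subject, r in ranked:
    print(f"{r:.2f} {subject}")
```

Pairing only the students who took both subjects matters here, because not every student takes every subject.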

It breaks almost neatly into four groups.

  1. Physics & Chemistry, both of which have a correlation of 0.79, and clearly are the most correlated with Computer Science
  2. Maths, English & Language, which have a correlation of 0.72 – 0.75
  3. Biology, Economics, Commerce and Accountancy, which hover at around 0.66
  4. History & Geography, which are 0.52 – 0.56

The results in 2010 are almost exactly the same.

Correlation Subject
0.78 PHYSICS
0.78 CHEMISTRY
0.75 ENGLISH
0.75 MATHEMATICS
0.73 LANGUAGE
0.67 ACCOUNTANCY
0.65 ECONOMICS
0.65 COMMERCE
0.64 BIOLOGY
0.60 GEOGRAPHY
0.55 HISTORY

I’m not sure what it is that leads to this kind of correlation. In fact, the full correlation between every pair of subjects (for 2011) is below:

[Image: subject-correlation, the correlation between every pair of subjects for 2011]
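A pairwise table like this can be sketched in a few lines. The example below again assumes Pearson correlation, and it assumes every list in `SCORES` holds marks for the same students in the same order. The marks and the name `correlation_matrix` are hypothetical, chosen only to show the shape of the calculation.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of marks."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(spread)


def correlation_matrix(scores):
    """Build a symmetric nested dict: matrix[a][b] is the correlation of a with b."""
    subjects = sorted(scores)
    matrix = {a: {a: 1.0} for a in subjects}  # a subject correlates 1.0 with itself
    for a, b in combinations(subjects, 2):
        matrix[a][b] = matrix[b][a] = pearson(scores[a], scores[b])
    return matrix


# Hypothetical marks: subject -> marks, same five students in every list.
SCORES = {
    "MATHEMATICS": [95, 60, 72, 88, 40],
    "PHYSICS": [90, 65, 70, 85, 50],
    "HISTORY": [55, 70, 62, 60, 66],
}

matrix = correlation_matrix(SCORES)
subjects = sorted(SCORES)
print(" " * 12 + "".join(f"{s[:8]:>10}" for s in subjects))
for a in subjects:
    print(f"{a:<12}" + "".join(f"{matrix[a][b]:>10.2f}" for b in subjects))
```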

What inferences would you draw from this?

And what do you think is the reason for this?

8 thoughts on “Correlating subjects”

  1. My inference is purely anecdotal but might be helpful in explaining this data. I had chosen Computer Science for my +2 in 1998. It was taught in C++ and mostly involved memorising operations on data structures such as lists, stacks and queues. Parts of the C++ standard library had to be memorised too, including the I/O, string and file functions.

    The exams were basically a test of memory rather than of attacking a new problem space mathematically. Guess which other subjects involve memorising a huge set of symbolic facts? Chemistry and, to a certain extent, Physics.

    I believe the data is more revealing of our Computer Science pedagogical and evaluation methods than the subject itself.

  2. My 2 cents based on experience.
    People who take the computer science group generally have to take one language and English, in addition to Physics, Chemistry and Maths. They cannot take classes in accountancy, economics, history, etc.

    Some of the students at the top of the class would have realized that, to come first in the class or school, they need to excel in language and English. It is a given that one needs excellent scores in Physics, Chemistry and Maths to get a good overall percentage. Also, a high total would help them get admission to colleges like BITS Pilani, which look at the overall score.

    Students who take the biology or accountancy group have no such pressure for them to excel in English and language subjects.

    Hope it makes sense.

    Also, if you can do a correlation between computer science students and the language that they take, I believe you will find that the majority of students would have chosen French rather than Tamil or English.

  3. Anand

    Correlation is a misleading statistic if the dependent and independent variables are not “really” related that way. There could be confounding effects within the independent variables, leading to wrong correlation coefficients.

    There are ways to detect and remove confounding; design of experiments (in statistics) deals with this in depth. It is quite possible that there is some relationship between how the computer science and math exams are scheduled and the scores in these subjects. If they are too close to each other, the student may have had less time to prepare for the math exam. For example, if Physics and Math were scheduled one after the other, and the Computer Science test came after the Math test, the student may have prepared better for Physics, not had enough time for Math, and then recovered to do well in Computer Science. This can be a pattern across the entire class, as it is normal for kids to focus more on Physics (the dreaded subject!) than on Math.

    The statistic may not reveal ability or natural alignment of the subjects.

  4. In my experience, I’ve come across rock star programmers who have a sound grasp of mathematics. But I don’t have statistics to prove it. As we all know, programming (especially functional programming) is heavily influenced by mathematics. Coming to inferences, I would take the dataset with a pinch of salt. Is it diverse enough to be statistically significant? I agree with most of the comments relating “mugging up programs” to being good at Chemistry in particular.

  5. I’d rearrange subjects such that they’re more clustered together by correlation, like in a TreeMap view. That way it’s easier to see the relationships.
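The reordering suggested in the last comment could be approximated without a TreeMap. The sketch below swaps in a simple average-linkage agglomerative clustering: it repeatedly merges the two most correlated groups of subjects and reads the display order off the final merged list. The function name `cluster_order` and the correlation values in `PAIRS` are hypothetical, chosen only to illustrate the idea; they are not taken from the data above.

```python
def cluster_order(corr):
    """Order subjects so that strongly correlated ones end up next to each other.

    corr is a nested dict where corr[a][b] is the correlation of a with b.
    Clusters are plain lists; merging concatenates them, so the single
    cluster left at the end doubles as the display order.
    """
    clusters = [[subject] for subject in sorted(corr)]

    def similarity(c1, c2):
        # Average linkage: mean correlation over all cross-cluster pairs.
        return sum(corr[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        candidates = [(i, j) for i in range(len(clusters))
                      for j in range(i + 1, len(clusters))]
        i, j = max(candidates,
                   key=lambda p: similarity(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical pairwise correlations (illustrative values only).
PAIRS = {
    ("CHEMISTRY", "PHYSICS"): 0.90,
    ("GEOGRAPHY", "HISTORY"): 0.80,
    ("CHEMISTRY", "GEOGRAPHY"): 0.50,
    ("CHEMISTRY", "HISTORY"): 0.45,
    ("GEOGRAPHY", "PHYSICS"): 0.55,
    ("HISTORY", "PHYSICS"): 0.50,
}

# Expand the pairs into a symmetric nested dict.
corr = {}
for (a, b), r in PAIRS.items():
    corr.setdefault(a, {})[b] = r
    corr.setdefault(b, {})[a] = r

order = cluster_order(corr)
print(order)
```

Using the resulting order for both the rows and the columns of the correlation table should make blocks of related subjects easier to spot.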
