Normalising non-random samples is bad

I rate movies on a scale of 1 (bad) to 5 (good). This is an absolute scale. Initially, I assumed that I would watch as many good movies as bad ones. So I’d have about as many 1s as 5s, and 2s as 4s. But when I looked at my movie ratings over the last year, I had far more 4s than 2s. My ratings were not normally distributed. They weren’t even symmetric.

Rating  Frequency
1       8
2       31
3       98
4       81
5       18

The reason is clear. I pick good movies rather than bad ones, based on reviews. If I rated every movie there was, the ratings might be normally distributed (or they might not). But when I pick movies, I consciously reject those I expect to rate low (based on reviews), so my ratings cluster near the top.

Even if I redefined my scale, I’d still have more than 50% above the average. This is not a contradiction. I watch a LOT of good movies with very similar ratings, and a few disastrously bad movies. The few bad ones drag the average down, so the many good movies sit above it, and they outnumber the bad ones. This is a skewed, or asymmetric, distribution.
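One way to put a number on that asymmetry is the moment coefficient of skewness, which is zero for a symmetric distribution and negative when the long tail is on the low side. The sketch below computes it from the frequency table above. The variable names are my own, and the choice of this particular skewness measure is an assumption, not something the ratings themselves dictate.

```python
# Frequencies copied from the ratings table above.
ratings = {1: 8, 2: 31, 3: 98, 4: 81, 5: 18}

total = sum(ratings.values())
mean = sum(r * n for r, n in ratings.items()) / total

# Population variance and third central moment, weighted by frequency.
variance = sum(n * (r - mean) ** 2 for r, n in ratings.items()) / total
third_moment = sum(n * (r - mean) ** 3 for r, n in ratings.items()) / total

# Moment coefficient of skewness: 0 for a symmetric distribution,
# negative when the longer tail is on the low side.
skewness = third_moment / variance ** 1.5

low = ratings[1] + ratings[2]   # movies rated below the midpoint of the scale
high = ratings[4] + ratings[5]  # movies rated above the midpoint

print(f"total={total}, mean={mean:.2f}, skewness={skewness:.2f}")
print(f"below 3: {low}, above 3: {high}")
```

For this table the mean comes out near 3.3 rather than 3, the skewness is negative, and the 4s and 5s (99 movies) outnumber the 1s and 2s (39 movies) by more than two to one.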

So, selective picking can wreck the normal curve.

Yet almost everything is selectively picked. Colleges try to pick the best students. Organisations try to pick the best employees. If they rate performance on an absolute scale, they’re likely to find the ratings skewed towards the higher side, at least in the good colleges and organisations. Force-fitting a normal distribution pushes down genuinely good people. (In bad colleges and organisations, it pushes up genuinely bad people.)
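To see what force-fitting does, here is a rough sketch. It reuses my movie-rating frequencies as a stand-in for absolute performance scores, and grades them on a forced curve by rank. The 10/20/40/20/10 quota is a hypothetical symmetric curve I made up for illustration, not any real organisation's policy.

```python
# Absolute scores: the movie-rating frequencies from the table above, reused
# as a stand-in for performance ratings (an assumption for illustration).
absolute_counts = {5: 18, 4: 81, 3: 98, 2: 31, 1: 8}

# Hypothetical forced-curve quotas, in percent, from best grade to worst.
forced_quota_pct = {5: 10, 4: 20, 3: 40, 2: 20, 1: 10}

grades = (5, 4, 3, 2, 1)

# Everyone's absolute score, ranked from best to worst.
ranked = [score for score in grades for _ in range(absolute_counts[score])]
n = len(ranked)

# Hand out forced grades by rank until each cumulative quota is filled.
forced = []
cumulative_pct = 0
for grade in grades:
    cumulative_pct += forced_quota_pct[grade]
    cutoff = round(cumulative_pct * n / 100)
    forced.extend([grade] * (cutoff - len(forced)))

# Compare each person's forced grade with their absolute score.
pushed_down = sum(f < a for a, f in zip(ranked, forced))
pushed_up = sum(f > a for a, f in zip(ranked, forced))
unchanged = n - pushed_down - pushed_up

print(f"pushed down: {pushed_down}, pushed up: {pushed_up}, unchanged: {unchanged}")
```

With these inputs, 76 of the 236 end up with a forced grade below their absolute score, and only 6 end up above it. A different quota or a different set of scores would give different numbers, but whenever the absolute scores lean high, a symmetric curve has to demote more than it promotes.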

2 thoughts on “Normalising non-random samples is bad”

  1. Swapnaa Jayaraman

    I wonder if your movie rating scale is really absolute. I can imagine 1 being “one of the greatest movies I’ve ever seen” and 5 being “can’t get worse than that”, but how about the ones in between? Do you not go “hmm, this one’s nice, but not as nice as Mauna Raagam (which got a 2), so I’ll give this one a 3..”?

  2. This really is an absolute scale. 1 means I stopped watching the movie midway. 2 – I won’t watch it again. 3 – I watched it twice, but won’t watch it any more (or I plan to watch it only once more). 4 – I can watch it a few times. 5 – I can watch it forever. (4 and 5 are slightly fuzzy, I admit)
