Storytelling: Part 1
In a number of sessions I’ve been to, people ask analysts to make their results more interesting: to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How would one teach storytelling to, say, an alien?

Consider this mini-paper.

ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also find significant evidence of improbable events at round numbers.

Electricity shortage is a serious problem in most Indian states. Part of this problem is due to the inaccuracy of the reporting procedures used in monitoring meter readings. Our focus here is not to document or experimentally determine the degree of inaccuracy. We have adopted a data-driven approach to this problem and attempt to model the extent of inaccuracy using basic statistical analysis techniques such as histograms and the comparison of means. Our dataset is the 12-month frequency analysis dataset containing monthly meter readings of 1.8 million customers in the State of Andhra Pradesh.

We find that a histogram of these readings shows unexpectedly high values at the slab boundaries: 50 (+45.342%, t > 13.431), 100 (+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300 (+42.138%, t > 19.958). We also detected spikes at round numbers: 10 (+15.341%, t > 5.315), 20 (+18.576%, t > 6.152), and 30 (+11.341%, t > 4.319). The statistical significance of every deviation listed above is over 99.9%. Further, every deviation has a positive mantissa. This leads us to confidently declare the existence of a systematic bias in the meter readings analysed.

You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!” ...